bobbai00 commented on code in PR #4261:
URL: https://github.com/apache/texera/pull/4261#discussion_r2891347362


##########
common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/visualization/wordCloud/WordCloudOpDesc.scala:
##########
@@ -67,7 +66,7 @@ class WordCloudOpDesc extends PythonOperatorDescriptor {
   def manipulateTable(): PythonTemplateBuilder = {
     pyb"""
        |        table.dropna(subset = [$textColumn], inplace = True) #remove missing values
-       |        table = table[table[$textColumn].str.contains(r'\\w', regex=True)]
+       |        table = table[table[$textColumn].str.contains(r'\w', regex=True)]

Review Comment:
   This was introduced in #4189.
   
   With Scala's `s"..."` interpolator:
   ```
     s"""
        |        table = table[table['$textColumn'].str.contains(r'\\w', regex=True)]
        |""".stripMargin
   ```
   The `s` interpolator processes escape sequences, so `\\` produces a single `\`. The generated Python was therefore `r'\w'`, which is correct.
   
   After switching to the `pyb"""..."""` template:
   ```
     pyb"""
        |        table = table[table[$textColumn].str.contains(r'\\w', regex=True)]
        |"""
   ```
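   With `pyb`, the backslashes are taken literally, with no escape processing: a custom interpolator receives the raw string parts, and `pyb` does not appear to unescape them. So `\\w` stays as `\\w`, producing `r'\\w'` in Python, which matches a literal `\` followed by `w` instead of a word character.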
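
To make the difference concrete, here is a minimal Python sketch comparing the two generated patterns. It uses the standard `re` module rather than pandas, on the assumption that `str.contains(..., regex=True)` applies the pattern with `re.search`-style semantics. The sample strings and the `matches` helper are hypothetical and are not part of the operator.

```python
import re

# The two patterns as they would appear in the generated Python code.
WORD_PATTERN = r'\w'      # regex: any single word character
ESCAPED_PATTERN = r'\\w'  # regex: a literal backslash followed by 'w'

# Hypothetical sample values standing in for rows of the text column.
samples = ["hello", "!!!", "", r"a\wb"]


def matches(pattern, values):
    """Return, for each value, whether the pattern is found anywhere in it."""
    return [bool(re.search(pattern, value)) for value in values]


word_hits = matches(WORD_PATTERN, samples)
escaped_hits = matches(ESCAPED_PATTERN, samples)

print(word_hits)     # [True, False, False, True]
print(escaped_hits)  # [False, False, False, True]
```

With `r'\\w'`, ordinary text such as `"hello"` no longer matches, so under the assumption above the filter would drop nearly every row instead of only the rows without word characters.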
   


