alamb commented on code in PR #20702:
URL: https://github.com/apache/datafusion/pull/20702#discussion_r2886623667


##########
datafusion/optimizer/src/simplify_expressions/regex.rs:
##########
@@ -39,7 +39,7 @@ const ANY_CHAR_REGEX_PATTERN: &str = ".*";
 /// - partial anchored regex patterns (e.g. `^foo`) to `LIKE 'foo%'`
 /// - combinations (alternatives) of the above, will be concatenated with `OR` 
or `AND`
 /// - `EQ .*` to NotNull
-/// - `NE .*` means IS EMPTY
+/// - `NE .*` to false (.* matches non-empty and empty strings, and NULL !~ 
'.*' results in NULL so this can never be true)

Review Comment:
   I think it is only false in the context of filters (and this code can 
currently be used for both filters and regular expressions)
   
   I think `null !~ '.*'` is actually `null` (not `false`)
   
   I think a correct rewrite is 
   ```sql
   CASE 
     WHEN col IS NOT NULL THEN FALSE 
     ELSE NULL
   END
   ```



##########
datafusion/optimizer/src/simplify_expressions/simplify_exprs.rs:
##########
@@ -883,17 +883,17 @@ mod tests {
         "
         )?;
 
-        // Test `!= ".*"` transforms to checking if the column is empty
+        // Test `!~ ".*"` transforms to false

Review Comment:
   can we please also add a test explicitly for a null input?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to