alamb commented on code in PR #20702:
URL: https://github.com/apache/datafusion/pull/20702#discussion_r2886623667
##########
datafusion/optimizer/src/simplify_expressions/regex.rs:
##########
@@ -39,7 +39,7 @@ const ANY_CHAR_REGEX_PATTERN: &str = ".*";
/// - partial anchored regex patterns (e.g. `^foo`) to `LIKE 'foo%'`
/// - combinations (alternatives) of the above, will be concatenated with `OR`
or `AND`
/// - `EQ .*` to NotNull
-/// - `NE .*` means IS EMPTY
+/// - `NE .*` to false (.* matches non-empty and empty strings, and NULL !~
'.*' results in NULL so this can never be true)
Review Comment:
I think it is only false in the context of filters (and this code can
currently be used for both filters and regular expressions)
I think `null !~ '.*'` is actually `null` (not `false`)
I think a correct rewrite is
```sql
CASE
WHEN col IS NOT NULL THEN FALSE
ELSE NULL
END
```
##########
datafusion/optimizer/src/simplify_expressions/simplify_exprs.rs:
##########
@@ -883,17 +883,17 @@ mod tests {
"
)?;
- // Test `!= ".*"` transforms to checking if the column is empty
+ // Test `!~ ".*"` transforms to false
Review Comment:
can we please also add a test explicitly for a null input?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]