alamb opened a new issue #1069:
URL: https://github.com/apache/arrow-rs/issues/1069


   **Describe the bug**
   
   Characters such as `[` and `.` are sometimes treated as regular expression 
metacharacters rather than as literals in `LIKE` patterns
   
   The arrow regular expression kernels such as `like_utf8` 
(https://github.com/apache/arrow-rs/blob/master/arrow/src/compute/kernels/comparison.rs#L311-L323)
take limited SQL-style string matching patterns (e.g. `%`). 
   
   However, under the covers a regular expression matching library is used, 
and special regular expression characters in the pattern are not escaped. 
@ovr added code to handle `(` and `)` in 
https://github.com/apache/arrow-rs/pull/1042, but there are other special 
characters as well.
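
One possible shape of a fix, as a minimal sketch rather than the actual arrow-rs implementation: translate only `%` and `_`, and escape every other regex metacharacter. The name `like_to_regex` is hypothetical, LIKE escape sequences are ignored, and in practice `regex::escape` from the `regex` crate could do the escaping instead of the hand-written match arm.

```rust
/// Hypothetical helper (not part of arrow-rs): convert a SQL LIKE pattern
/// into an anchored regex string, escaping regex metacharacters.
/// Assumption: the LIKE escape character is not handled.
fn like_to_regex(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len() + 2);
    out.push('^');
    for c in pattern.chars() {
        match c {
            // SQL wildcards: any run of characters / exactly one character
            '%' => out.push_str(".*"),
            '_' => out.push('.'),
            // Regex metacharacters must be matched literally
            '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']'
            | '{' | '}' | '^' | '$' | '#' | '&' | '-' | '~' => {
                out.push('\\');
                out.push(c);
            }
            _ => out.push(c),
        }
    }
    out.push('$');
    out
}
```

Under these assumptions, `like_to_regex("foo%.*")` yields `^foo.*\.\*$`, which no longer matches the plain string `foo`.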
   
   **To Reproduce**
   
   
   ```rust
       use arrow::array::{BooleanArray, StringArray};
   
       let array: StringArray = vec!["foo", "bar", "baz"]
           .into_iter()
           .map(Some)
           .collect();
   
       // `.` and `*` should be matched literally, so no value should match
       let comparison = arrow::compute::like_utf8_scalar(&array, "foo%.*").unwrap();
   
       let expected: BooleanArray = vec![false, false, false]
           .into_iter()
           .map(Some)
           .collect();
   
       assert_eq!(comparison, expected);
   ```
   
   **Expected behavior**
   This test should pass, since this is what Postgres produces:
   
   ```sql
   alamb=# select * from foo;
     x  
   -----
    foo
    bar
    baz
   (3 rows)
   
   alamb=# select x, x like 'foo%.*' from foo;
     x  | ?column? 
   -----+----------
    foo | f
    bar | f
    baz | f
   (3 rows)
   ```
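
To make the expected semantics concrete, here is a small reference matcher that does not go through a regex at all. It is only a sketch for checking expectations: `like_match` is a hypothetical name, and it assumes `%` and `_` are the only wildcards with no escape character.

```rust
/// Hypothetical reference matcher (not arrow-rs code): SQL LIKE where `%`
/// matches any run of characters and `_` matches exactly one character.
/// Every other character, including `.` and `*`, matches only itself.
fn like_match(text: &str, pattern: &str) -> bool {
    let t: Vec<char> = text.chars().collect();
    let p: Vec<char> = pattern.chars().collect();
    let (mut ti, mut pi) = (0, 0);
    // Position of the last `%` seen and the text index it currently covers up to
    let mut star: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '%' {
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '_' || p[pi] == t[ti]) {
            ti += 1;
            pi += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last `%` absorb one more character
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Any trailing `%` can match the empty string
    while pi < p.len() && p[pi] == '%' {
        pi += 1;
    }
    pi == p.len()
}
```

With this matcher, `like_match("foo", "foo%.*")` is false, agreeing with the Postgres output above, while `like_match("foo.*", "foo%.*")` is true.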
   
   **Additional context**
   Follow-on to https://github.com/apache/arrow-rs/pull/1042, where @ovr fixed 
the parentheses issue.
   

