gortiz opened a new pull request, #8818:
URL: https://github.com/apache/pinot/pull/8818

   When executing queries like:
   ```sql
   select col1, col2 from Table where text_match(col1, '/r1/') or text_match(col1, '/r2/')
   ```
   Pinot has to scan the referenced column twice. This PR adds an optimization 
that tries to fuse boolean combinations of `text_match` predicates on the same 
column into a single predicate. Specifically, as indicated in the Javadoc:
   - Queries `where text_match(col1, '/r1/') and text_match(col1, '/r2/')` will 
be translated to `where text_match(col1, '/(?=r1)(?=r2)/')`
   - Queries `where text_match(col1, '/r1/') or text_match(col1, '/r2/')` will 
be translated to `where text_match(col1, '/(?:r1)|(?:r2)/')`
   - Queries `where not text_match(col1, '/r1/')` will be translated to `where 
text_match(col1, '/(?!r1)/')`
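
The three rewrites above amount to simple string construction. A minimal sketch in Python, assuming hypothetical helper names (`fuse_and`, `fuse_or`, `fuse_not` and `as_text_match_arg` are illustrative and not taken from this PR) and patterns passed without the surrounding `/` delimiters. Python's `re` module is used only to sanity-check the `or` case; the regex dialect behind `text_match` may behave differently:

```python
import re

def fuse_and(r1: str, r2: str) -> str:
    # AND rule from the list above: one lookahead per operand.
    return f"(?={r1})(?={r2})"

def fuse_or(r1: str, r2: str) -> str:
    # OR rule: alternation of non-capturing groups.
    return f"(?:{r1})|(?:{r2})"

def fuse_not(r1: str) -> str:
    # NOT rule: negative lookahead.
    return f"(?!{r1})"

def as_text_match_arg(pattern: str) -> str:
    # Wrap in the /.../ delimiters used by the text_match examples.
    return f"/{pattern}/"

fused = fuse_or("r1", "r2")
print(as_text_match_arg(fused))  # /(?:r1)|(?:r2)/

# Sanity check of the OR rule under Python's search semantics:
# the fused pattern should agree with evaluating both predicates.
for s in ["xr1", "r2y", "r3", ""]:
    separate = bool(re.search("r1", s) or re.search("r2", s))
    assert separate == bool(re.search(fused, s))
```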
   
   There are some tests that apply the optimization to more advanced cases.
   
   Regexes can be quite complex, and it isn't clear to me how expressive the 
regex language that we (and Lucene) support is. After some analysis, I'm sure 
that this optimization will break some regexes, such as those that use 
backreferences.
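
One concrete way a backreference can break: group numbers are assigned per pattern, so after fusion a `\1` coming from the second operand points at a group that belongs to the first. A small sketch using Python's `re` as a stand-in engine (an assumption; the dialect used by `text_match` may differ):

```python
import re

r1 = r"(a)\1"   # matches "aa"
r2 = r"(b)\1"   # matches "bb"

# OR fusion as described in this PR.
fused = f"(?:{r1})|(?:{r2})"

text = "bb"
separate = bool(re.search(r1, text) or re.search(r2, text))
combined = bool(re.search(fused, text))

# In the fused pattern, (b) becomes group 2, but its backreference
# still says \1, which now refers to the unmatched (a) group.
print(separate, combined)  # True False
```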
   
   To know whether the optimization can be applied, it would be necessary to 
analyze the regex, which would require parsing it into an AST. As far as I know, 
Pinot doesn't have such a parser, so this optimization is disabled by default 
and can be enabled with a new query option.
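
For illustration only, a cheaper alternative to a full AST is a conservative textual guard. The sketch below is not what this PR implements; `looks_safe_to_fuse` and `fuse_or_if_safe` are hypothetical names, and the check only covers the backreference syntaxes listed in the code, so other unsafe constructs could still slip through:

```python
import re

# Conservative textual check, not a real parser. It may reject patterns
# that are actually safe (e.g. an escaped backslash followed by a digit),
# and it only knows the \1..\9, \k<name> and (?P=name) syntaxes.
_BACKREF = re.compile(r"\\[1-9]|\\k<|\(\?P=")

def looks_safe_to_fuse(pattern: str) -> bool:
    return _BACKREF.search(pattern) is None

def fuse_or_if_safe(r1: str, r2: str):
    # Return the fused pattern, or None to keep the predicates separate.
    if looks_safe_to_fuse(r1) and looks_safe_to_fuse(r2):
        return f"(?:{r1})|(?:{r2})"
    return None

print(fuse_or_if_safe("r1", "r2"))      # (?:r1)|(?:r2)
print(fuse_or_if_safe(r"(a)\1", "r2"))  # None
```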
   
   Future improvements may include translating some `text_match` predicates 
that are not regexes into regexes, so that we would be able to merge things like 
`where text_match(col1, '/r1/') or text_match(col1, '"literal1" and "literal2"')` 
into `where text_match(col1, '/(?:r1)|(?=(?:literal1)|(?:literal2))/')`



