gortiz opened a new pull request, #8818: URL: https://github.com/apache/pinot/pull/8818
When executing queries like:

```sql
select col1, col2 from Table where text_match(col1, '/r1/') or text_match(col1, '/r2/')
```

Pinot has to scan the referenced column twice.

This PR adds an optimization that tries to fuse boolean algebra and `text_match` predicates. Specifically, as indicated in the Javadoc:

- Queries `where text_match(col1, '/r1/') and text_match(col1, '/r2/')` will be translated to `where text_match(col1, '/(?=r1)(?=r2)/')`
- Queries `where text_match(col1, '/r1/') or text_match(col1, '/r2/')` will be translated to `where text_match(col1, '/(?:r1)|(?:r2)/')`
- Queries `where not text_match(col1, '/r1/')` will be translated to `where text_match(col1, '/(?!r1)/')`

There are some tests that apply the optimization to more advanced cases.

Regex can be quite complex, and it isn't clear to me how expressive the regex language that we (and Lucene) support is. After some analysis, I'm sure that this optimization will break some regexes, such as the ones that use backreferences. To know whether the optimization can be applied, it would be necessary to analyze the regex, which would require parsing it into an AST. As far as I know, Pinot doesn't have that, so this optimization is disabled by default and can be enabled by activating a new query option.

Future improvements may include translating some `text_match` predicates that are not regexes into regexes, so we would be able to merge things like `where text_match(col1, '/r1/') or text_match(col1, '"literal1" and "literal2"')` into `where text_match(col1, '/(?:r1)|(?=(?:literal1)|(?:literal2))/')`

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
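For readers who want to see the three rewrite rules from the PR description in isolation, here is a minimal string-level sketch. It is not the PR's code: the class `TextMatchFusionSketch` and the methods `body`, `fuseAnd`, `fuseOr` and `negate` are hypothetical names chosen for illustration, and the sketch assumes every input is a `/regex/` query on the same column. It does no regex parsing, so it has the limitation the description mentions. For example, in a regex engine that supports backreferences, a `\1` in the second pattern could end up referring to a group from the first pattern once the two are concatenated.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch; the class and method names are illustrative, not taken from the PR. */
public class TextMatchFusionSketch {

  /** Strips the leading and trailing '/' that mark a text_match regex query. */
  static String body(String regexQuery) {
    if (regexQuery.length() < 2 || !regexQuery.startsWith("/") || !regexQuery.endsWith("/")) {
      throw new IllegalArgumentException("Not a /regex/ query: " + regexQuery);
    }
    return regexQuery.substring(1, regexQuery.length() - 1);
  }

  /** AND rule: wrap each operand in a positive lookahead and concatenate them. */
  static String fuseAnd(List<String> regexQueries) {
    return regexQueries.stream()
        .map(q -> "(?=" + body(q) + ")")
        .collect(Collectors.joining("", "/", "/"));
  }

  /** OR rule: wrap each operand in a non-capturing group and join them with '|'. */
  static String fuseOr(List<String> regexQueries) {
    return regexQueries.stream()
        .map(q -> "(?:" + body(q) + ")")
        .collect(Collectors.joining("|", "/", "/"));
  }

  /** NOT rule: wrap the operand in a negative lookahead. */
  static String negate(String regexQuery) {
    return "/(?!" + body(regexQuery) + ")/";
  }

  public static void main(String[] args) {
    List<String> parts = Arrays.asList("/r1/", "/r2/");
    System.out.println(fuseAnd(parts));  // prints /(?=r1)(?=r2)/
    System.out.println(fuseOr(parts));   // prints /(?:r1)|(?:r2)/
    System.out.println(negate("/r1/"));  // prints /(?!r1)/
  }
}
```

The fused string would then be passed as the single argument of one `text_match` predicate, so the column is scanned once instead of once per operand.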
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org