jon-wei opened a new pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516
 
 
   This PR adjusts the join filter rewrite/pushdown logic in 
`JoinFilterAnalyzer` to avoid redundant computation and wasted memory for filter 
analysis information that is common across segments (converting filters to 
conjunctive normal form, and determining and storing correlated values for 
filter rewrites). 
   
   A new `computeJoinFilterPreAnalysis` method has been added which handles the 
computations described above (called once per query on each node). The result 
of this method is passed to the `splitFilters` method (called once per segment).
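   Splitting the work between these two methods is a compute-once, reuse-per-segment 
pattern. A minimal sketch of that pattern follows, assuming a toy filter 
represented as a flat string of `AND`-ed equality clauses. `PreAnalysis`, 
`computePreAnalysis`, `SplitResult`, and this `splitFilters` are hypothetical 
stand-ins, not Druid's actual classes or signatures. The per-segment rule (push a 
clause down only if its column exists in the segment) is an arbitrary placeholder 
chosen so that two segments split differently; it is not Druid's real pushdown logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "pre-analyze once, split per segment" pattern.
// None of these names are Druid's actual API.
public class JoinFilterPreAnalysisSketch {

  /** Query-wide result; immutable so every segment can share it safely. */
  static final class PreAnalysis {
    final List<String> cnfClauses;

    PreAnalysis(List<String> cnfClauses) {
      this.cnfClauses = Collections.unmodifiableList(cnfClauses);
    }
  }

  /** Per-segment result: clauses pushed below the join vs. evaluated after it. */
  static final class SplitResult {
    final List<String> pushDown = new ArrayList<>();
    final List<String> postJoin = new ArrayList<>();
  }

  /** Counts how often the expensive step runs, to show it happens once per query. */
  static int preAnalysisCalls = 0;

  /**
   * Query-wide step (hypothetical stand-in for computeJoinFilterPreAnalysis).
   * A flat AND of atoms is already in CNF, so "conversion" here is just a split.
   */
  static PreAnalysis computePreAnalysis(String filter) {
    preAnalysisCalls++;
    List<String> clauses = new ArrayList<>();
    for (String clause : filter.split(" AND ")) {
      clauses.add(clause.trim());
    }
    return new PreAnalysis(clauses);
  }

  /**
   * Per-segment step (hypothetical stand-in for splitFilters): only reads the
   * shared pre-analysis. Placeholder rule: push a clause down if its column
   * (the text before " = ") exists in this segment, else keep it post-join.
   */
  static SplitResult splitFilters(PreAnalysis pre, Set<String> segmentColumns) {
    SplitResult result = new SplitResult();
    for (String clause : pre.cnfClauses) {
      String column = clause.split(" = ")[0];
      if (segmentColumns.contains(column)) {
        result.pushDown.add(clause);
      } else {
        result.postJoin.add(clause);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Expensive analysis runs once for the whole query...
    PreAnalysis pre = computePreAnalysis("page = 'Druid' AND j.country = 'US'");
    List<Set<String>> segments = Arrays.asList(
        new HashSet<>(Arrays.asList("page", "user")),
        new HashSet<>(Arrays.asList("user")));
    // ...and its result is reused for every segment.
    for (Set<String> segmentColumns : segments) {
      SplitResult split = splitFilters(pre, segmentColumns);
      System.out.println("pushDown=" + split.pushDown + " postJoin=" + split.postJoin);
    }
    System.out.println("pre-analysis calls: " + preAnalysisCalls);
  }
}
```

   Running `main` splits the filter differently for the two segments while 
`computePreAnalysis` runs only once. Keeping the pre-analysis immutable is what 
makes it safe to share across segments processed in parallel.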
   
   Two new query context parameters are added:
   - `enableJoinFilterRewriteValueColumnFilters` : Controls whether we rewrite 
RHS filters on non-key columns. False by default for performance reasons, since 
rewriting such filters requires a scan of the RHS table.
   - `joinFilterRewriteMaxSize`: Controls the maximum size of the correlated 
value set used for filter rewrites. This limit is in place to prevent excessive 
memory use. The default limit is 10000.
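   To illustrate how the two parameters interact, here is a rough sketch, assuming 
the query context is available as a `Map<String, Object>` and the RHS table is a 
list of `{joinKey, value}` rows. A filter on a non-key RHS column such as 
`j.country = 'US'` could be rewritten into an `IN` filter on the LHS join column 
using the RHS key values correlated with `'US'`; finding those keys requires a 
full scan of the RHS, and the rewrite is abandoned once the set grows past the 
maximum size. The class and method names are hypothetical and do not reflect 
Druid's actual query-context helpers.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; names and structure are illustrative, not Druid's API.
public class JoinFilterRewriteConfigSketch {

  // Defaults as stated in the PR description.
  static final boolean DEFAULT_ENABLE_VALUE_COLUMN_REWRITES = false;
  static final int DEFAULT_REWRITE_MAX_SIZE = 10000;

  static boolean enableValueColumnRewrites(Map<String, Object> context) {
    Object v = context.get("enableJoinFilterRewriteValueColumnFilters");
    return v == null ? DEFAULT_ENABLE_VALUE_COLUMN_REWRITES : Boolean.parseBoolean(v.toString());
  }

  static int rewriteMaxSize(Map<String, Object> context) {
    Object v = context.get("joinFilterRewriteMaxSize");
    return v == null ? DEFAULT_REWRITE_MAX_SIZE : Integer.parseInt(v.toString());
  }

  /**
   * Collects the RHS join-key values whose non-key column equals filterValue.
   * Each row is {joinKey, columnValue}. Returns empty if the set would exceed
   * maxSize, signalling "skip the rewrite" instead of using unbounded memory.
   */
  static Optional<Set<String>> correlatedValues(List<String[]> rhsRows, String filterValue, int maxSize) {
    Set<String> keys = new LinkedHashSet<>();
    for (String[] row : rhsRows) { // scans the whole RHS table
      if (filterValue.equals(row[1])) {
        keys.add(row[0]);
        if (keys.size() > maxSize) {
          return Optional.empty();
        }
      }
    }
    return Optional.of(keys);
  }

  public static void main(String[] args) {
    Map<String, Object> context = new HashMap<>();
    context.put("enableJoinFilterRewriteValueColumnFilters", true);
    context.put("joinFilterRewriteMaxSize", 2);

    List<String[]> rhs = Arrays.asList(
        new String[] {"k1", "US"}, new String[] {"k2", "US"}, new String[] {"k3", "FR"});

    if (enableValueColumnRewrites(context)) {
      // These keys would become an IN filter on the LHS join column.
      System.out.println(correlatedValues(rhs, "US", rewriteMaxSize(context)));
    }
  }
}
```

   Returning an empty result rather than a truncated set matters: a partial 
correlated value set would silently drop matching rows, so exceeding the limit 
should fall back to evaluating the filter after the join.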
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [x] added comments explaining the "why" and the intent of the code 
wherever it would not be obvious to an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
