jihoonson opened a new pull request #9608: Eliminate common subfilters when 
converting it to a CNF
URL: https://github.com/apache/druid/pull/9608
 
 
   ### Description
   
   `Filters.toCNF()` is used in 2 places, 1) to find prefilters when scanning 
segments that bitmaps are available to evaluate them and 2) to find filters 
which can be pushed down in a Join query. This function was adopted from Hive, 
but have some issues with optimization. One of issues is that it didn't 
eliminate common subfilters, which sometimes resulted in a huge number of 
subfilters which in turn could cause an OOM error. This PR is to eliminate 
those common subfilters by changing the type of subfilters of `AndFilter` and 
`OrFilter` from `List` to `Set`. This is to detect common subfilters of 
different orders, otherwise those subfilters should be sorted in some order 
which seems more complicated than using a `Set`.
   
   The approach used in this PR can create a filter in a suboptimal CNF. There 
are at least 2 cases I'm aware of.
   
   - Even though `x IN (1,2,3)` is equivalent to `x = 1 OR x = 2 OR x = 3`, but 
this case is not handled yet. We may need to decide which form is more optimal 
(probably using `IN` filter is better since it will use less memory).
   - There are some missing cases which can be further reduced. For example, 
`(A || ~(E)) && (A || ~(F)) && (A || ~(E) || ~(F))` can be reduced as below. 
This case is not handled yet, but I left a comment about it in 
`FiltersTest.testToCNFWithComplexFilterIncludingNotAndOr()`.
   
   ```
   (A || ~(E)) && (A || ~(F)) && (A || ~(E) || ~(F))
   => ((A && (~(E) || ~(F))) && (A || ~(E) || ~(F)))
   => (A && (~(E) || ~(F))
   => (A || ~(E)) && (A || ~(F))
   ```
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [x] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

Reply via email to