jihoonson opened a new pull request #9608: Eliminate common subfilters when converting it to a CNF URL: https://github.com/apache/druid/pull/9608 ### Description `Filters.toCNF()` is used in 2 places, 1) to find prefilters when scanning segments that bitmaps are available to evaluate them and 2) to find filters which can be pushed down in a Join query. This function was adopted from Hive, but have some issues with optimization. One of issues is that it didn't eliminate common subfilters, which sometimes resulted in a huge number of subfilters which in turn could cause an OOM error. This PR is to eliminate those common subfilters by changing the type of subfilters of `AndFilter` and `OrFilter` from `List` to `Set`. This is to detect common subfilters of different orders, otherwise those subfilters should be sorted in some order which seems more complicated than using a `Set`. The approach used in this PR can create a filter in a suboptimal CNF. There are at least 2 cases I'm aware of. - Even though `x IN (1,2,3)` is equivalent to `x = 1 OR x = 2 OR x = 3`, but this case is not handled yet. We may need to decide which form is more optimal (probably using `IN` filter is better since it will use less memory). - There are some missing cases which can be further reduced. For example, `(A || ~(E)) && (A || ~(F)) && (A || ~(E) || ~(F))` can be reduced as below. This case is not handled yet, but I left a comment about it in `FiltersTest.testToCNFWithComplexFilterIncludingNotAndOr()`. ``` (A || ~(E)) && (A || ~(F)) && (A || ~(E) || ~(F)) => ((A && (~(E) || ~(F))) && (A || ~(E) || ~(F))) => (A && (~(E) || ~(F)) => (A || ~(E)) && (A || ~(F)) ``` <hr> This PR has: - [x] been self-reviewed. - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.) - [ ] added documentation for new or modified features or behaviors. - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links. - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml) - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. - [x] added unit tests or modified existing tests to cover new code paths. - [ ] added integration tests. - [ ] been tested in a test Druid cluster.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org