Dear Calcite Community,

I have a few questions for the community. We have been trying to update Drill's Calcite dependency from version 1.34 to 1.40. It has been quite a challenge, but I feel we are very close to a working solution. We have run into an issue that we cannot seem to solve and would like to ask the Calcite community for help.
Here is a link to the draft PR: https://github.com/apache/drill/pull/3024. It may not reflect all of our latest attempts, but it should be fairly recent.

The gist of the issue: when Drill processes queries with large IN clauses (60+ items), we observe exponential memory growth leading to an OutOfMemoryError or test timeouts. This occurs specifically during the SUBQUERY_REWRITE planning phase. Here is a sample of a failing query:

  SELECT employee_id
  FROM cp.`employee.json`
  WHERE employee_id IN (1, 2, 3, ..., 60) -- 60 items

Basically, we are seeing the following failures:
- Memory explosion during rule matching/firing
- OutOfMemoryError in SubQueryRemoveRule.rewriteIn()
- A complete system hang requiring timeout termination

To solve this, we tried a few different approaches, including:
- Rule-level fixes: modified DrillSubQueryRemoveRule.matches() to detect and skip large IN clauses
- Apply-method handling: added exception catching and fallback logic in apply()
- Threshold tuning: tested various IN clause size limits (50, 100, and 200 items)
- Memory analysis: confirmed the issue exists even with minimal rule configurations

We found that the memory explosion occurs before apply() is called: the issue manifests during the rule matching/firing phase, not during rule execution. The exponential growth pattern appears to be related to variant creation or trait propagation. I will also add that these queries work fine with ~20 items but fail consistently with 60+ items. They worked with Calcite 1.34 and started failing as part of the upgrade to Calcite 1.40.

My questions:

1. Is this a known issue? Are there known changes in Calcite 1.40's SubQueryRemoveRule that could cause this behavior?
2. Configuration options? Are there planner settings or configuration options that could control memory usage during subquery rewriting?
3. Alternative approaches? What is the recommended way to handle large IN clauses in Calcite 1.40 while avoiding the memory explosion?
4. Performance tuning?
Are there specific traits or rule ordering strategies that could mitigate this issue?
5. Do you have any suggestions or advice which could help us resolve this issue and complete the upgrade?

Technical details:
- Environment: Apache Drill 1.23.0-SNAPSHOT
- Java: OpenJDK 11
- Test case: TestInList.testLargeInList1 with a 60-item IN clause
- Memory: OutOfMemoryError with default heap settings

Thank you very much for your assistance.

Best,
— C
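P.S. For concreteness, the rule-level guard we experimented with in DrillSubQueryRemoveRule.matches() looks roughly like the sketch below. This is a simplified, self-contained illustration, not our exact code: the real rule inspects the RexSubQuery's operand list, while here a plain operand count stands in, and the threshold constant is just one of the limits we tested.

```java
// Simplified sketch of the IN-clause size guard tried in
// DrillSubQueryRemoveRule.matches(). A plain int stands in for the
// number of operands in the IN list; the threshold is illustrative.
public class InClauseGuard {

    // One of the IN-list size limits we experimented with (50, 100, 200).
    static final int MAX_IN_LIST_SIZE = 50;

    // Returns true when the subquery rewrite should be skipped because the
    // IN list is large enough to trigger the memory explosion we observed.
    static boolean shouldSkipRewrite(int inListSize) {
        return inListSize > MAX_IN_LIST_SIZE;
    }

    public static void main(String[] args) {
        // A 60-item IN clause (our failing test case) would be skipped,
        // while a ~20-item clause (which plans fine) would not.
        System.out.println(shouldSkipRewrite(60)); // true
        System.out.println(shouldSkipRewrite(20)); // false
    }
}
```

The guard itself works, but as noted above it does not help, since the memory growth happens during rule matching/firing, before matches()/apply() can bail out.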
