PS: If you wonder "why decorrelate?", here is a paper which describes why and how: https://github.com/lonng/db-papers/blob/main/papers/nested-query/unnesting-arbitrary-queries.pdf
The current Calcite decorrelator cannot yet do all the things described in this paper, but it does decorrelate some queries. Mihai ________________________________ From: Diego Fernandez <[email protected]> Sent: Tuesday, October 22, 2024 9:37 PM To: [email protected] <[email protected]> Subject: Help with filter that is turned into a LogicalJoin -> (LogicalTableScan, (LogicalAggregate -> LogicalUnion)) Hi, We're using Calcite to parse a query and extract the selected Metrics & Dimensions and any filters to then make a request to our Semantic Layer. We really deal with fairly basic SQL, where we mostly ignore any agg functions (the aggs are defined in the SL), don't allow joins (a "table" just contains all available metrics and dimensions as columns), and ignore any group bys. We mainly use various visitors to traverse through the parsed query to find all these things. We built it mainly through trial and error using the debugger and following random tidbits we've found online. We have everything working, but recently ran into a bug in our filter parsing when a query contains a big IN clause in the filter. It seems that instead of using a Filter, it's doing a LogicalJoin -> (LogicalTableScan, (LogicalAggregate -> LogicalUnion)) and I don't know how to parse that as a filter. With a smaller set of dates in the IN clause, it generates a Filter and I can parse that just fine. My guess is that this is due to some optimization rule; I'd love to either disable it, or find some way to turn the join back into a filter. Any help would be greatly appreciated! The query is something like: ``` SELECT "ALL"."Dimension" AS "Dimension", SUM("ALL"."Metric") AS "Metric" FROM "METRICS"."ALL" "ALL" WHERE ("ALL"."Metric Time" IN ((DATE '2024-08-01'), (DATE '2024-08-02'), (DATE '2024-08-03'), (DATE '2024-08-04'), (DATE '2024-08-05'), (DATE '2024-08-06'), (DATE '2024-08-07'), (DATE '2024-08-08'), (DATE '2024-08-09'), (DATE '2024-08-10'), (DATE '2024-08-11'), (DATE '2024-08-12'), (DATE '2024-08-13'), (DATE '2024-08-14'), (DATE '2024-08-15'), (DATE '2024-08-16'), (DATE '2024-08-17'), (DATE '2024-08-18'), (DATE '2024-08-19'), (DATE '2024-08-20'), (DATE '2024-08-21'), (DATE '2024-08-22'), (DATE '2024-08-23'), (DATE '2024-08-24'), (DATE '2024-08-25'), (DATE '2024-08-26'), (DATE '2024-08-27'), (DATE '2024-08-28'), (DATE '2024-08-29'), (DATE '2024-08-30'), (DATE '2024-08-31'))) GROUP BY 1 ``` And here's some useful debugger output: [image.png] This is how we generate the plan: ``` private fun getPlan(query: String, namedSchema: NamedSchema): RelRoot { val rootSchema = Frameworks.createRootSchema(true) rootSchema.add(namedSchema.name, namedSchema.schema) // Lenient conformance helps lower parsing errors such as groupBy ordinal on date_trunc // fields val conformance = SqlConformanceEnum.LENIENT // I'm not sure if Babel parser is actually needed here, but found recommendations for it // online val parserConfig = SqlParser.Config.DEFAULT.withParserFactory(SqlBabelParserImpl.FACTORY) .withConformance(conformance) val validatorConfig = SqlValidator.Config.DEFAULT.withConformance(conformance) val operatorTable = SqlLibraryOperatorTableFactory.INSTANCE.getOperatorTable( SqlLibrary.STANDARD, SqlLibrary.POSTGRESQL, SqlLibrary.BIG_QUERY ) val frameworkConfig = Frameworks.newConfigBuilder() .parserConfig(parserConfig) .sqlValidatorConfig(validatorConfig) .defaultSchema(rootSchema) .operatorTable(operatorTable) .ruleSets(listOf()) .programs(listOf()) .costFactory(null) .build() val planner = Frameworks.getPlanner(frameworkConfig) val parsed = planner.parse(query) val validated = planner.validate(parsed) return planner.rel(validated) } ``` -- Diego Fernandez Senior Software Engineer - Semantic Layer P: 336-324-0111<tel:336-324-0111> www.getdbt.com<http://www.getdbt.com/?utm_medium=email&utm_source=opensense> [https://smart.dbtlabs.com/v2/imagebucket/dbtlabs.com/dbt-labs.png]<http://www.getdbt.com/?utm_medium=email&utm_source=opensense> Connect with us [github]<https://github.com/dbt-labs/dbt> [linkedin]<http://linkedin.com/company/dbtlabs/> [https://smart.dbtlabs.com/v2/imagebucket/dbtlabs.com/twitter.png]<https://twitter.com/dbt_labs> [https://smart.dbtlabs.com/v2/imagebucket/dbtlabs.com/youtube.png]<https://www.youtube.com/c/dbt-labs> [2025 State of Analytics Engineering Report]<https://smart.dbtlabs.com/v2/a/2025_state_of_analytics_report/67187d9a6bcf4a2a6738ac12-zA1Wz/httpswww.getdbt.comresourcesreportsstate-of-analytics-engineering-2024utm_mediumemailutm_sourceopensenseutm_campaignfy-2025_the-state-of-analytics-engineering-report_awutm_content____utm_term___>
