If I am not mistaken, this behavior can be tuned via SqlToRelConverter's Config#getInSubQueryThreshold [1]
[1] https://github.com/apache/calcite/blob/72963da2e76e7961bec5aa19aa9bdfdd99c33a05/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L6470 On Wed, Oct 23, 2024 at 6:25 AM Mihai Budiu <[email protected]> wrote: > PS: If you wonder "why decorrelate?", here is a paper which describes why > and how: > > https://github.com/lonng/db-papers/blob/main/papers/nested-query/unnesting-arbitrary-queries.pdf > > The current Calcite decorrelator cannot yet do all the things described in > this paper, but it does decorrelate some queries. > > Mihai > > ________________________________ > From: Diego Fernandez <[email protected]> > Sent: Tuesday, October 22, 2024 9:37 PM > To: [email protected] <[email protected]> > Subject: Help with filter that is turned into a LogicalJoin -> > (LogicalTableScan, (LogicalAggregate -> LogicalUnion)) > > Hi, > > We're using Calcite to parse a query and extract the selected Metrics & > Dimensions and any filters to then make a request to our Semantic Layer. We > really deal with fairly basic SQL, where we mostly ignore any agg functions > (the aggs are defined in the SL), don't allow joins (a "table" just > contains all available metrics and dimensions as columns), and ignore any > group bys. > > We mainly use various visitors to traverse through the parsed query to > find all these things. We built it mainly through trial and error using the > debugger and following random tidbits we've found online. We have > everything working, but recently ran into a bug in our filter parsing when > a query contains a big IN clause in the filter. It seems that instead of > using a Filter, it's doing a LogicalJoin -> (LogicalTableScan, > (LogicalAggregate -> LogicalUnion)) and I don't know how to parse that as a > filter. With a smaller set of dates in the IN clause, it generates a Filter > and I can parse that just fine. My guess is that this is due to some > optimization rule; I'd love to either disable it, or find some way to turn > the join back into a filter. Any help would be greatly appreciated! > > The query is something like: > > ``` > SELECT "ALL"."Dimension" AS "Dimension", > SUM("ALL"."Metric") AS "Metric" > FROM "METRICS"."ALL" "ALL" > WHERE ("ALL"."Metric Time" IN > ((DATE '2024-08-01'), (DATE '2024-08-02'), (DATE > '2024-08-03'), (DATE '2024-08-04'), (DATE '2024-08-05'), (DATE > '2024-08-06'), > (DATE '2024-08-07'), (DATE '2024-08-08'), (DATE > '2024-08-09'), (DATE '2024-08-10'), (DATE '2024-08-11'), (DATE > '2024-08-12'), > (DATE '2024-08-13'), (DATE '2024-08-14'), (DATE > '2024-08-15'), (DATE '2024-08-16'), (DATE '2024-08-17'), (DATE > '2024-08-18'), > (DATE '2024-08-19'), (DATE '2024-08-20'), (DATE > '2024-08-21'), (DATE '2024-08-22'), (DATE '2024-08-23'), (DATE > '2024-08-24'), > (DATE '2024-08-25'), (DATE '2024-08-26'), (DATE > '2024-08-27'), (DATE '2024-08-28'), (DATE '2024-08-29'), (DATE > '2024-08-30'), > (DATE '2024-08-31'))) > GROUP BY 1 > ``` > > And here's some useful debugger output: > [image.png] > > This is how we generate the plan: > > ``` > private fun getPlan(query: String, namedSchema: NamedSchema): RelRoot { > val rootSchema = Frameworks.createRootSchema(true) > rootSchema.add(namedSchema.name, namedSchema.schema) > // Lenient conformance helps lower parsing errors such as groupBy > ordinal on date_trunc > // fields > val conformance = SqlConformanceEnum.LENIENT > > // I'm not sure if Babel parser is actually needed here, but found > recommendations for it > // online > val parserConfig = > > SqlParser.Config.DEFAULT.withParserFactory(SqlBabelParserImpl.FACTORY) > .withConformance(conformance) > val validatorConfig = > SqlValidator.Config.DEFAULT.withConformance(conformance) > val operatorTable = > SqlLibraryOperatorTableFactory.INSTANCE.getOperatorTable( > SqlLibrary.STANDARD, > SqlLibrary.POSTGRESQL, > SqlLibrary.BIG_QUERY > ) > val frameworkConfig = > Frameworks.newConfigBuilder() > .parserConfig(parserConfig) > .sqlValidatorConfig(validatorConfig) > .defaultSchema(rootSchema) > .operatorTable(operatorTable) > .ruleSets(listOf()) > .programs(listOf()) > .costFactory(null) > .build() > val planner = Frameworks.getPlanner(frameworkConfig) > val parsed = planner.parse(query) > val validated = planner.validate(parsed) > return planner.rel(validated) > } > ``` > > > > -- > > > > Diego Fernandez > > Senior Software Engineer - Semantic Layer > > P: 336-324-0111<tel:336-324-0111> > > www.getdbt.com< > http://www.getdbt.com/?utm_medium=email&utm_source=opensense> > > [https://smart.dbtlabs.com/v2/imagebucket/dbtlabs.com/dbt-labs.png]< > http://www.getdbt.com/?utm_medium=email&utm_source=opensense> > > Connect with us > > [github]<https://github.com/dbt-labs/dbt> > > [linkedin]<http://linkedin.com/company/dbtlabs/> > > [https://smart.dbtlabs.com/v2/imagebucket/dbtlabs.com/twitter.png]< > https://twitter.com/dbt_labs> > > [https://smart.dbtlabs.com/v2/imagebucket/dbtlabs.com/youtube.png]< > https://www.youtube.com/c/dbt-labs> > > [2025 State of Analytics Engineering Report]< > https://smart.dbtlabs.com/v2/a/2025_state_of_analytics_report/67187d9a6bcf4a2a6738ac12-zA1Wz/httpswww.getdbt.comresourcesreportsstate-of-analytics-engineering-2024utm_mediumemailutm_sourceopensenseutm_campaignfy-2025_the-state-of-analytics-engineering-report_awutm_content____utm_term___ > > > >
