If I am not mistaken, this behavior can be tuned via SqlToRelConverter's
Config#getInSubQueryThreshold [1]

[1]
https://github.com/apache/calcite/blob/72963da2e76e7961bec5aa19aa9bdfdd99c33a05/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L6470


On Wed, Oct 23, 2024 at 6:25 AM Mihai Budiu <[email protected]> wrote:

> PS: If you wonder "why decorrelate?", here is a paper which describes why
> and how:
>
> https://github.com/lonng/db-papers/blob/main/papers/nested-query/unnesting-arbitrary-queries.pdf
>
> The current Calcite decorrelator cannot yet do all the things described in
> this paper, but it does decorrelate some queries.
>
> Mihai
>
> ________________________________
> From: Diego Fernandez <[email protected]>
> Sent: Tuesday, October 22, 2024 9:37 PM
> To: [email protected] <[email protected]>
> Subject: Help with filter that is turned into a LogicalJoin ->
> (LogicalTableScan, (LogicalAggregate -> LogicalUnion))
>
> Hi,
>
> We're using Calcite to parse a query and extract the selected Metrics &
> Dimensions and any filters to then make a request to our Semantic Layer. We
> really deal with fairly basic SQL, where we mostly ignore any agg functions
> (the aggs are defined in the SL), don't allow joins (a "table" just
> contains all available metrics and dimensions as columns), and ignore any
> group bys.
>
> We mainly use various visitors to traverse through the parsed query to
> find all these things. We built it mainly through trial and error using the
> debugger and following random tidbits we've found online. We have
> everything working, but recently ran into a bug in our filter parsing when
> a query contains a big IN clause in the filter. It seems that instead of
> using a Filter, it's doing a LogicalJoin -> (LogicalTableScan,
> (LogicalAggregate -> LogicalUnion)) and I don't know how to parse that as a
> filter. With a smaller set of dates in the IN clause, it generates a Filter
> and I can parse that just fine. My guess is that this is due to some
> optimization rule; I'd love to either disable it, or find some way to turn
> the join back into a filter. Any help would be greatly appreciated!
>
> The query is something like:
>
> ```
>         SELECT "ALL"."Dimension" AS "Dimension",
>           SUM("ALL"."Metric") AS "Metric"
>         FROM "METRICS"."ALL" "ALL"
>         WHERE ("ALL"."Metric Time" IN
>                ((DATE '2024-08-01'), (DATE '2024-08-02'), (DATE
> '2024-08-03'), (DATE '2024-08-04'), (DATE '2024-08-05'), (DATE
> '2024-08-06'),
>                 (DATE '2024-08-07'), (DATE '2024-08-08'), (DATE
> '2024-08-09'), (DATE '2024-08-10'), (DATE '2024-08-11'), (DATE
> '2024-08-12'),
>                 (DATE '2024-08-13'), (DATE '2024-08-14'), (DATE
> '2024-08-15'), (DATE '2024-08-16'), (DATE '2024-08-17'), (DATE
> '2024-08-18'),
>                 (DATE '2024-08-19'), (DATE '2024-08-20'), (DATE
> '2024-08-21'), (DATE '2024-08-22'), (DATE '2024-08-23'), (DATE
> '2024-08-24'),
>                 (DATE '2024-08-25'), (DATE '2024-08-26'), (DATE
> '2024-08-27'), (DATE '2024-08-28'), (DATE '2024-08-29'), (DATE
> '2024-08-30'),
>                 (DATE '2024-08-31')))
>         GROUP BY 1
> ```
>
> And here's some useful debugger output:
> [image.png]
>
> This is how we generate the plan:
>
> ```
>     private fun getPlan(query: String, namedSchema: NamedSchema): RelRoot {
>         val rootSchema = Frameworks.createRootSchema(true)
>         rootSchema.add(namedSchema.name, namedSchema.schema)
>         // Lenient conformance helps lower parsing errors such as groupBy
> ordinal on date_trunc
>         // fields
>         val conformance = SqlConformanceEnum.LENIENT
>
>         // I'm not sure if Babel parser is actually needed here, but found
> recommendations for it
>         // online
>         val parserConfig =
>
> SqlParser.Config.DEFAULT.withParserFactory(SqlBabelParserImpl.FACTORY)
>                 .withConformance(conformance)
>         val validatorConfig =
> SqlValidator.Config.DEFAULT.withConformance(conformance)
>         val operatorTable =
>             SqlLibraryOperatorTableFactory.INSTANCE.getOperatorTable(
>                 SqlLibrary.STANDARD,
>                 SqlLibrary.POSTGRESQL,
>                 SqlLibrary.BIG_QUERY
>             )
>         val frameworkConfig =
>             Frameworks.newConfigBuilder()
>                 .parserConfig(parserConfig)
>                 .sqlValidatorConfig(validatorConfig)
>                 .defaultSchema(rootSchema)
>                 .operatorTable(operatorTable)
>                 .ruleSets(listOf())
>                 .programs(listOf())
>                 .costFactory(null)
>                 .build()
>         val planner = Frameworks.getPlanner(frameworkConfig)
>         val parsed = planner.parse(query)
>         val validated = planner.validate(parsed)
>         return planner.rel(validated)
>     }
> ```
>
>
>
> --
>
>
>
> Diego Fernandez
>
> Senior Software Engineer - Semantic Layer
>
> P: 336-324-0111<tel:336-324-0111>
>
> www.getdbt.com<
> http://www.getdbt.com/?utm_medium=email&utm_source=opensense>
>
> [https://smart.dbtlabs.com/v2/imagebucket/dbtlabs.com/dbt-labs.png]<
> http://www.getdbt.com/?utm_medium=email&utm_source=opensense>
>
> Connect with us
>
> [github]<https://github.com/dbt-labs/dbt>
>
> [linkedin]<http://linkedin.com/company/dbtlabs/>
>
> [https://smart.dbtlabs.com/v2/imagebucket/dbtlabs.com/twitter.png]<
> https://twitter.com/dbt_labs>
>
> [https://smart.dbtlabs.com/v2/imagebucket/dbtlabs.com/youtube.png]<
> https://www.youtube.com/c/dbt-labs>
>
> [2025 State of Analytics Engineering Report]<
> https://smart.dbtlabs.com/v2/a/2025_state_of_analytics_report/67187d9a6bcf4a2a6738ac12-zA1Wz/httpswww.getdbt.comresourcesreportsstate-of-analytics-engineering-2024utm_mediumemailutm_sourceopensenseutm_campaignfy-2025_the-state-of-analytics-engineering-report_awutm_content____utm_term___
> >
>
>

Reply via email to