[ https://issues.apache.org/jira/browse/CALCITE-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809721#comment-17809721 ]
xjlem edited comment on CALCITE-6216 at 1/23/24 3:41 AM: --------------------------------------------------------- I'm sorry, I may have misled you. Indeed, joins across data sources should not be pushed down. The core issue with the Jira mentioned above is that the following logic: {code:java} LogicalAggregate(group=[\{0}], pv=[COUNT($1)]) LogicalFilter(condition=[=($0, 1)]) LogicalProject(sku_id=[$2], pv=[$3]) JdbcTableScan(table=[[db2, sku_flow]]) {code} will be transformed into: {code:java} EnumerableAggregate(group=[\{2}], pv=[COUNT($3)]) EnumerableCalc(expr#0..4=[\{inputs}], expr#5=[1], expr#6=[=($t2, $t5)], proj#0..4=[\{exprs}], $condition=[$t6]) JdbcToEnumerableConverter JdbcTableScan(table=[[db2, sku_flow]]){code} Operations like filter and aggregate cannot be pushed down to the data source for execution. Calcite will execute {code:java} select * from db2.sku_flow{code} {{query}} in the data source, and operations like filter and aggregate are executed in memory. was (Author: xjlem): I'm sorry, I may have misled you. Indeed, joins across data sources should not be pushed down. The core issue with the Jira mentioned above is that the following logic: {code:java} LogicalAggregate(group=[\{0}], pv=[COUNT($1)]) LogicalFilter(condition=[=($0, 1)]) LogicalProject(sku_id=[$2], pv=[$3]) JdbcTableScan(table=[[db2, sku_flow]]) {code} will be transformed into: {code:java} EnumerableAggregate(group=[\{2}], pv=[COUNT($3)]) EnumerableCalc(expr#0..4=[\{inputs}], expr#5=[1], expr#6=[=($t2, $t5)], proj#0..4=[\{exprs}], $condition=[$t6]) JdbcToEnumerableConverter JdbcTableScan(table=[[db2, sku_flow]]){code} Operations like filter and aggregate cannot be pushed down to the data source for execution. Calcite will execute a {code:java} select * from db2.sku_flow{code} {{query}} in the data source, and operations like filter and aggregate are executed in memory. > The JDBC adapter is unable to push down queries when there is a join involved > in the query. > ------------------------------------------------------------------------------------------- > > Key: CALCITE-6216 > URL: https://issues.apache.org/jira/browse/CALCITE-6216 > Project: Calcite > Issue Type: Improvement > Components: jdbc-adapter > Reporter: xjlem > Priority: Major > > I want to know why the description of JdbcToEnumerableConverterRule in the > JDBC rule includes "JdbcConvention out" information while others do not, > which leads to different data sources sharing the same JDBC rule. > for example: > config: > {code:java} > { > "defaultSchema": "db1", > "schemas": [ > { > "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory", > "name": "db1", > "operand": { > "jdbcDriver": "com.mysql.jdbc.Driver", > "jdbcPassword": "", > "jdbcUrl": "", > "jdbcUser": "" > }, > "type": "custom" > }, > { > "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory", > "name": "db2", > "operand": { > "jdbcDriver": "com.mysql.jdbc.Driver", > "jdbcPassword": "", > "jdbcUrl": "", > "jdbcUser": "" > }, > "type": "custom" > } > ], > "version": "1.0" > } {code} > sql: > {code:java} > SELECT pv / sales, pv, sales > FROM ( > SELECT sku_id, COUNT(*) AS sales > FROM db1.sku_sales > WHERE SKU_ID = 1 > GROUP BY sku_id > ) sku_sales > FULL JOIN ( > SELECT sku_id, COUNT(pv) AS pv > FROM db2.sku_flow > WHERE SKU_ID = 1 > GROUP BY sku_id > ) sku_flow > ON sku_sales.sku_id = sku_flow.sku_id{code} > it will convert to logic relNode: > {code:java} > LogicalProject(EXPR$0=[/($3, $1)], pv=[$3], sales=[$1]) > LogicalJoin(condition=[=($0, $2)], joinType=[full]) > LogicalAggregate(group=[{0}], sales=[COUNT()]) > LogicalFilter(condition=[=($0, 1)]) > LogicalProject(sku_id=[$2]) > JdbcTableScan(table=[[db1, sku_sales]]) > LogicalAggregate(group=[{0}], pv=[COUNT($1)]) > LogicalFilter(condition=[=($0, 1)]) > LogicalProject(sku_id=[$2], pv=[$3]) > JdbcTableScan(table=[[db2, sku_flow]]) {code} > optimize: > When registering the rule, the rule for the DB2 convention will not be > registered because the jdbcRule has the same description as the rule for the > DB1 convention, causing the addRule method to return false. > org.apache.calcite.adapter.jdbc.JdbcConvention#register > {code:java} > @Override public void register(RelOptPlanner planner) { > for (RelOptRule rule : JdbcRules.rules(this)) { > planner.addRule(rule); > } > planner.addRule(FilterSetOpTransposeRule.INSTANCE); > planner.addRule(ProjectRemoveRule.INSTANCE); > } {code} > after optimize: > {code:java} > EnumerableCalc(expr#0..3=[{inputs}], expr#4=[/($t3, $t1)], EXPR$0=[$t4], > pv=[$t3], sales=[$t1]) > EnumerableJoin(condition=[=($0, $2)], joinType=[full]) > JdbcToEnumerableConverter > JdbcAggregate(group=[{2}], sales=[COUNT()]) > JdbcFilter(condition=[=($2, 1)]) > JdbcTableScan(table=[[db1, sku_sales]]) > EnumerableAggregate(group=[{2}], pv=[COUNT($3)]) > EnumerableCalc(expr#0..4=[{inputs}], expr#5=[1], expr#6=[=($t2, $t5)], > proj#0..4=[{exprs}], $condition=[$t6]) > JdbcToEnumerableConverter > JdbcTableScan(table=[[db2, sku_flow]]) > {code} > As a result, we can see that the push down for DB2 fails. > -- This message was sent by Atlassian Jira (v8.20.10#820010)