[ 
https://issues.apache.org/jira/browse/CALCITE-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809721#comment-17809721
 ] 

xjlem edited comment on CALCITE-6216 at 1/23/24 3:41 AM:
---------------------------------------------------------

I'm sorry, I may have misled you. Indeed, joins across data sources should not 
be pushed down. The core issue with the Jira mentioned above is that the 
following logic:
{code:java}
LogicalAggregate(group=[\{0}], pv=[COUNT($1)])
   LogicalFilter(condition=[=($0, 1)])
      LogicalProject(sku_id=[$2], pv=[$3])
         JdbcTableScan(table=[[db2, sku_flow]]) {code}
will be transformed into:
{code:java}
EnumerableAggregate(group=[\{2}], pv=[COUNT($3)])
   EnumerableCalc(expr#0..4=[\{inputs}], expr#5=[1], expr#6=[=($t2, $t5)], 
proj#0..4=[\{exprs}], $condition=[$t6])
     JdbcToEnumerableConverter
        JdbcTableScan(table=[[db2, sku_flow]]){code}
Operations like filter and aggregate cannot be pushed down to the data source 
for execution. Calcite will execute 
{code:java}
select * from db2.sku_flow{code}
{{query}} in the data source, and operations like filter and aggregate are 
executed in memory.


was (Author: xjlem):
I'm sorry, I may have misled you. Indeed, joins across data sources should not 
be pushed down. The core issue with the Jira mentioned above is that the 
following logic:
{code:java}
LogicalAggregate(group=[\{0}], pv=[COUNT($1)])
   LogicalFilter(condition=[=($0, 1)])
      LogicalProject(sku_id=[$2], pv=[$3])
         JdbcTableScan(table=[[db2, sku_flow]]) {code}
will be transformed into:
{code:java}
EnumerableAggregate(group=[\{2}], pv=[COUNT($3)])
   EnumerableCalc(expr#0..4=[\{inputs}], expr#5=[1], expr#6=[=($t2, $t5)], 
proj#0..4=[\{exprs}], $condition=[$t6])
     JdbcToEnumerableConverter
        JdbcTableScan(table=[[db2, sku_flow]]){code}
Operations like filter and aggregate cannot be pushed down to the data source 
for execution. Calcite will execute a 
{code:java}
select * from db2.sku_flow{code}
{{query}} in the data source, and operations like filter and aggregate are 
executed in memory.

> The JDBC adapter is unable to push down queries when there is a join involved 
> in the query.
> -------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-6216
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6216
>             Project: Calcite
>          Issue Type: Improvement
>          Components: jdbc-adapter
>            Reporter: xjlem
>            Priority: Major
>
> I want to know why the description of JdbcToEnumerableConverterRule in the 
> JDBC rule includes "JdbcConvention out" information while others do not, 
> which leads to different data sources sharing the same JDBC rule.
> for example:
> config:
> {code:java}
> {
>   "defaultSchema": "db1",
>   "schemas": [
>     {
>       "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
>       "name": "db1",
>       "operand": {
>         "jdbcDriver": "com.mysql.jdbc.Driver",
>         "jdbcPassword": "",
>         "jdbcUrl": "",
>         "jdbcUser": ""
>       },
>       "type": "custom"
>     },
>     {
>       "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
>       "name": "db2",
>       "operand": {
>         "jdbcDriver": "com.mysql.jdbc.Driver",
>         "jdbcPassword": "",
>         "jdbcUrl": "",
>         "jdbcUser": ""
>       },
>       "type": "custom"
>     }
>   ],
>   "version": "1.0"
> } {code}
> sql:
> {code:java}
> SELECT pv / sales, pv, sales
> FROM (
>     SELECT sku_id, COUNT(*) AS sales
>     FROM db1.sku_sales
>     WHERE SKU_ID = 1
>     GROUP BY sku_id
> ) sku_sales
>     FULL JOIN (
>         SELECT sku_id, COUNT(pv) AS pv
>         FROM db2.sku_flow
>         WHERE SKU_ID = 1
>         GROUP BY sku_id
>     ) sku_flow
>     ON sku_sales.sku_id = sku_flow.sku_id{code}
> it will convert to logic relNode:
> {code:java}
> LogicalProject(EXPR$0=[/($3, $1)], pv=[$3], sales=[$1])
>   LogicalJoin(condition=[=($0, $2)], joinType=[full])
>     LogicalAggregate(group=[{0}], sales=[COUNT()])
>       LogicalFilter(condition=[=($0, 1)])
>         LogicalProject(sku_id=[$2])
>           JdbcTableScan(table=[[db1, sku_sales]])
>     LogicalAggregate(group=[{0}], pv=[COUNT($1)])
>       LogicalFilter(condition=[=($0, 1)])
>         LogicalProject(sku_id=[$2], pv=[$3])
>           JdbcTableScan(table=[[db2, sku_flow]]) {code}
> optimize:
> When registering the rule, the rule for the DB2 convention will not be 
> registered because the jdbcRule has the same description as the rule for the 
> DB1 convention, causing the addRule method to return false. 
> org.apache.calcite.adapter.jdbc.JdbcConvention#register
> {code:java}
> @Override public void register(RelOptPlanner planner) {
>   for (RelOptRule rule : JdbcRules.rules(this)) {
>     planner.addRule(rule);
>   }
>   planner.addRule(FilterSetOpTransposeRule.INSTANCE);
>   planner.addRule(ProjectRemoveRule.INSTANCE);
> } {code}
> after optimize:
> {code:java}
> EnumerableCalc(expr#0..3=[{inputs}], expr#4=[/($t3, $t1)], EXPR$0=[$t4], 
> pv=[$t3], sales=[$t1])
>   EnumerableJoin(condition=[=($0, $2)], joinType=[full])
>     JdbcToEnumerableConverter
>       JdbcAggregate(group=[{2}], sales=[COUNT()])
>         JdbcFilter(condition=[=($2, 1)])
>           JdbcTableScan(table=[[db1, sku_sales]])
>     EnumerableAggregate(group=[{2}], pv=[COUNT($3)])
>       EnumerableCalc(expr#0..4=[{inputs}], expr#5=[1], expr#6=[=($t2, $t5)], 
> proj#0..4=[{exprs}], $condition=[$t6])
>         JdbcToEnumerableConverter
>           JdbcTableScan(table=[[db2, sku_flow]])
>  {code}
> As a result, we can see that the push down for DB2 fails.
>    



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to