Hello, I am new to this mailing list and looking forward to learn more about Calcite. I am starting out with a rather basic task.
Given the set of 100 odd TPC-DS queries I want to achieve the following: 1. Parse them into SqlNodes 2. Validate them via SqlValidator 3. Convert them into RelNodes 4. Decorrelate the resulting RelNodes 5. Use some basic rule sets that are good enough starting point, run these RelNodes through them and observe the optimized RelNodes (1) turned out to be reasonably easy to get working. For (2), I ran into trouble with some queries that had SUBSTR in them. From what I can tell, the validator is expecting me to provide SUBSTRING. I had used the StdOperatorTable but maybe I should be using a different one? Here's the error that I got: No match found for function signature SUBSTR(<CHARACTER>, <NUMERIC>, <NUMERIC>) For (3), I ran into trouble with partition over functions. I got errors that looked like the following: Exception in thread "main" java.lang.RuntimeException: while converting SUM(`WS_EXT_SALES_PRICE`) * 100 / (SUM(SUM(`WS_EXT_SALES_PRICE`)) OVER (PARTITION BY `I_CLASS`)) at org.apache.calcite.sql2rel.ReflectiveConvertletTable.lambda$registerNodeTypeMethod$0(ReflectiveConvertletTable.java:86) at org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall(SqlNodeToRexConverterImpl.java:62) at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:5098) at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4374) at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:139) at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4961) at org.apache.calcite.sql2rel.StandardConvertletTable.lambda$new$9(StandardConvertletTable.java:207) at org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall(SqlNodeToRexConverterImpl.java:62) at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:5098) at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4374) at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:139) at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4961) at org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:3204) at org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:3050) at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:682) at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:644) at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3438) at org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:570) at io.netspring.playground.calcite.CalcitePoc.processQuery(CalcitePoc.java:642) at io.netspring.playground.calcite.CalcitePoc.main(CalcitePoc.java:696) Caused by: java.lang.reflect.InvocationTargetException at jdk.internal.reflect.GeneratedMethodAccessor1.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.apache.calcite.sql2rel.ReflectiveConvertletTable.lambda$registerNodeTypeMethod$0(ReflectiveConvertletTable.java:83) ... 19 more Caused by: java.lang.UnsupportedOperationException: class org.apache.calcite.sql.SqlBasicCall: SUM(SUM(`WS_EXT_SALES_PRICE`)) OVER (PARTITION BY `I_CLASS`) at org.apache.calcite.util.Util.needToImplement(Util.java:1075) at org.apache.calcite.sql.validate.SqlValidatorImpl.getValidatedNodeType(SqlValidatorImpl.java:1658) at org.apache.calcite.sql2rel.SqlToRelConverter.convertOver(SqlToRelConverter.java:2022) at org.apache.calcite.sql2rel.SqlToRelConverter.access$1900(SqlToRelConverter.java:221) at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4954) at org.apache.calcite.sql2rel.StandardConvertletTable.convertExpressionList(StandardConvertletTable.java:839) at org.apache.calcite.sql2rel.StandardConvertletTable.convertCall(StandardConvertletTable.java:815) at org.apache.calcite.sql2rel.StandardConvertletTable.convertCall(StandardConvertletTable.java:802) ... 23 more On (4), I am unsure if I am doing something wrong but I am simply not able to get decorrelation to work. Here's an example plan for query1 which is indeed a correlated query but calcite fails to decorrelate it. with customer_total_return as ( select sr_customer_sk as ctr_customer_sk, sr_store_sk as ctr_store_sk, sum(sr_return_amt) as ctr_total_return from store_returns, date_dim where sr_returned_date_sk = d_date_sk and d_year = 2000 group by sr_customer_sk, sr_store_sk) select c_customer_id from customer_total_return ctr1, store, customer where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2 from customer_total_return ctr2 where ctr1.ctr_store_sk = ctr2.ctr_store_sk) and s_store_sk = ctr1.ctr_store_sk and s_state = 'TN' and ctr1.ctr_customer_sk = c_customer_sk order by c_customer_id limit 100 *[Initial Logical Plan]*LogicalSort(sort0=[$0], dir0=[ASC], fetch=[100]) LogicalProject(C_CUSTOMER_ID=[$33]) LogicalFilter(condition=[AND(>($2, $50), =($3, $1), =($27, 'TN'), =($0, $32))]) LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{1}]) LogicalJoin(condition=[true], joinType=[inner]) LogicalJoin(condition=[true], joinType=[inner]) LogicalProject(CTR_CUSTOMER_SK=[$0], CTR_STORE_SK=[$1], CTR_TOTAL_RETURN=[$2]) LogicalAggregate(group=[{0, 1}], CTR_TOTAL_RETURN=[SUM($2)]) LogicalProject(sr_customer_sk=[$3], sr_store_sk=[$7], sr_return_amt=[$11]) LogicalFilter(condition=[AND(=($0, $20), =($26, 2000))]) LogicalJoin(condition=[true], joinType=[inner]) LogicalTableScan(table=[[store_returns]]) LogicalTableScan(table=[[date_dim]]) LogicalTableScan(table=[[store]]) LogicalTableScan(table=[[customer]]) LogicalProject(EXPR$0=[*($0, 1.2:DECIMAL(2, 1))]) LogicalAggregate(group=[{}], agg#0=[AVG($0)]) LogicalProject(CTR_TOTAL_RETURN=[$2]) LogicalFilter(condition=[=($cor0.CTR_STORE_SK, $1)]) LogicalProject(CTR_CUSTOMER_SK=[$0], CTR_STORE_SK=[$1], CTR_TOTAL_RETURN=[$2]) LogicalAggregate(group=[{0, 1}], CTR_TOTAL_RETURN=[SUM($2)]) LogicalProject(sr_customer_sk=[$3], sr_store_sk=[$7], sr_return_amt=[$11]) LogicalFilter(condition=[AND(=($0, $20), =($26, 2000))]) LogicalJoin(condition=[true], joinType=[inner]) LogicalTableScan(table=[[store_returns]]) LogicalTableScan(table=[[date_dim]]) *[Decorrelated Logical Plan]*LogicalSort(sort0=[$0], dir0=[ASC], fetch=[100]) LogicalProject(C_CUSTOMER_ID=[$33]) LogicalProject(CTR_CUSTOMER_SK=[$0], CTR_STORE_SK=[$1], CTR_TOTAL_RETURN=[$2], s_store_sk=[$3], s_store_id=[$4], s_rec_start_date=[$5], s_rec_end_date=[$6], s_closed_date_sk=[$7], s_store_name=[$8], s_number_employees=[$9], s_floor_space=[$10], s_hours=[$11], s_manager=[$12], s_market_id=[$13], s_geography_class=[$14], s_market_desc=[$15], s_market_manager=[$16], s_division_id=[$17], s_division_name=[$18], s_company_id=[$19], s_company_name=[$20], s_street_number=[$21], s_street_name=[$22], s_street_type=[$23], s_suite_number=[$24], s_city=[$25], s_county=[$26], s_state=[$27], s_zip=[$28], s_country=[$29], s_gmt_offset=[$30], s_tax_precentage=[$31], c_customer_sk=[$32], c_customer_id=[$33], c_current_cdemo_sk=[$34], c_current_hdemo_sk=[$35], c_current_addr_sk=[$36], c_first_shipto_date_sk=[$37], c_first_sales_date_sk=[$38], c_salutation=[$39], c_first_name=[$40], c_last_name=[$41], c_preferred_cust_flag=[$42], c_birth_day=[$43], c_birth_month=[$44], c_birth_year=[$45], c_birth_country=[$46], c_login=[$47], c_email_address=[$48], c_last_review_date_sk=[$49], EXPR$0=[*($50, 1.2:DECIMAL(2, 1))]) LogicalFilter(condition=[>($2, *($50, 1.2:DECIMAL(2, 1)))]) LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{1}]) LogicalFilter(condition=[AND(=($3, $1), =($27, 'TN'), =($0, $32))]) LogicalJoin(condition=[true], joinType=[inner]) LogicalJoin(condition=[true], joinType=[inner]) LogicalProject(CTR_CUSTOMER_SK=[$0], CTR_STORE_SK=[$1], CTR_TOTAL_RETURN=[$2]) LogicalAggregate(group=[{0, 1}], CTR_TOTAL_RETURN=[SUM($2)]) LogicalProject(sr_customer_sk=[$3], sr_store_sk=[$7], sr_return_amt=[$11]) LogicalJoin(condition=[=($0, $20)], joinType=[inner]) LogicalTableScan(table=[[store_returns]]) LogicalFilter(condition=[=($6, 2000)]) LogicalTableScan(table=[[date_dim]]) LogicalTableScan(table=[[store]]) LogicalTableScan(table=[[customer]]) LogicalAggregate(group=[{}], agg#0=[AVG($0)]) LogicalProject(CTR_TOTAL_RETURN=[$2]) LogicalFilter(condition=[=($cor0.CTR_STORE_SK, $1)]) LogicalProject(CTR_CUSTOMER_SK=[$0], CTR_STORE_SK=[$1], CTR_TOTAL_RETURN=[$2]) LogicalAggregate(group=[{0, 1}], CTR_TOTAL_RETURN=[SUM($2)]) LogicalProject(sr_customer_sk=[$3], sr_store_sk=[$7], sr_return_amt=[$11]) LogicalJoin(condition=[=($0, $20)], joinType=[inner]) LogicalTableScan(table=[[store_returns]]) LogicalFilter(condition=[=($6, 2000)]) LogicalTableScan(table=[[date_dim]]) As far as I can tell, no decorrelation happened for this query. What could I be doing wrong? I am including my source code here for reference in case it may help point out my issue. The source code is rather large because of the repetitive code for setting up the schema. But the rest of the code should be fairly compact and basic. Thanks in advance for your help! import java.io.File; import java.io.IOException; import java.nio.file.Files; import java.nio.file.Path; import java.nio.file.Paths; import java.util.Arrays; import java.util.Collections; import java.util.List; import java.util.Properties; import java.util.Set; import org.apache.calcite.avatica.util.Quoting; import org.apache.calcite.config.CalciteConnectionConfig; import org.apache.calcite.config.CalciteConnectionProperty; import org.apache.calcite.jdbc.CalciteSchema; import org.apache.calcite.jdbc.JavaTypeFactoryImpl; import org.apache.calcite.plan.ConventionTraitDef; import org.apache.calcite.plan.RelOptCluster; import org.apache.calcite.plan.RelOptTable; import org.apache.calcite.plan.RelOptUtil; import org.apache.calcite.plan.volcano.VolcanoPlanner; import org.apache.calcite.prepare.CalciteCatalogReader; import org.apache.calcite.rel.RelRoot; import org.apache.calcite.rel.type.RelDataType; import org.apache.calcite.rel.type.RelDataTypeFactory; import org.apache.calcite.rex.RexBuilder; import org.apache.calcite.schema.impl.AbstractTable; import org.apache.calcite.sql.SqlExplainFormat; import org.apache.calcite.sql.SqlExplainLevel; import org.apache.calcite.sql.fun.SqlStdOperatorTable; import org.apache.calcite.sql.parser.SqlParseException; import org.apache.calcite.sql.parser.SqlParser; import org.apache.calcite.sql.type.SqlTypeName; import org.apache.calcite.sql.validate.SqlValidator; import org.apache.calcite.sql.validate.SqlValidatorUtil; import org.apache.calcite.sql2rel.SqlToRelConverter; import org.apache.calcite.sql2rel.StandardConvertletTable; // This program serves as a template for how we can use Calcite for Flux // query planning. public class CalcitePoc { private static class Column { Column(String name, SqlTypeName dataType) { this.name = name; this.dataType = dataType; } private String name; private SqlTypeName dataType; } private static class Table extends AbstractTable { private final RelDataType rowType_; Table(RelDataTypeFactory typeFactory, Column... columns) { var builder = new RelDataTypeFactory.Builder(typeFactory); for (var column : columns) builder.add(column.name, column.dataType); rowType_ = builder.build(); } @Override public RelDataType getRowType(final RelDataTypeFactory typeFactory) { return rowType_; } } // This function returns a tpcds schema. We intentionally do not use the // implementation at org.apache.calcite.adapter.tpcds.TpcdsSchema which // is already provided by Calcite to illustrate how we may setup the // schema for NetSpring tables. // TODO: Should we convert this into a Schema instead of CalciteSchema? private static CalciteSchema newTpcdsSchema( RelDataTypeFactory typeFactory) { var schema = CalciteSchema.createRootSchema(true); schema.add( "catalog_sales", new Table( typeFactory, new Column("cs_sold_date_sk", SqlTypeName.BIGINT), new Column("cs_sold_time_sk", SqlTypeName.BIGINT), new Column("cs_ship_date_sk", SqlTypeName.BIGINT), new Column("cs_bill_customer_sk", SqlTypeName.BIGINT), new Column("cs_bill_cdemo_sk", SqlTypeName.BIGINT), new Column("cs_bill_hdemo_sk", SqlTypeName.BIGINT), new Column("cs_bill_addr_sk", SqlTypeName.BIGINT), new Column("cs_ship_customer_sk", SqlTypeName.BIGINT), new Column("cs_ship_cdemo_sk", SqlTypeName.BIGINT), new Column("cs_ship_hdemo_sk", SqlTypeName.BIGINT), new Column("cs_ship_addr_sk", SqlTypeName.BIGINT), new Column("cs_call_center_sk", SqlTypeName.BIGINT), new Column("cs_catalog_page_sk", SqlTypeName.BIGINT), new Column("cs_ship_mode_sk", SqlTypeName.BIGINT), new Column("cs_warehouse_sk", SqlTypeName.BIGINT), new Column("cs_item_sk", SqlTypeName.BIGINT), new Column("cs_promo_sk", SqlTypeName.BIGINT), new Column("cs_order_number", SqlTypeName.BIGINT), new Column("cs_quantity", SqlTypeName.BIGINT), new Column("cs_wholesale_cost", SqlTypeName.DOUBLE), new Column("cs_list_price", SqlTypeName.DOUBLE), new Column("cs_sales_price", SqlTypeName.DOUBLE), new Column("cs_ext_discount_amt", SqlTypeName.DOUBLE), new Column("cs_ext_sales_price", SqlTypeName.DOUBLE), new Column("cs_ext_wholesale_cost", SqlTypeName.DOUBLE), new Column("cs_ext_list_price", SqlTypeName.DOUBLE), new Column("cs_ext_tax", SqlTypeName.DOUBLE), new Column("cs_coupon_amt", SqlTypeName.DOUBLE), new Column("cs_ext_ship_cost", SqlTypeName.DOUBLE), new Column("cs_net_paid", SqlTypeName.DOUBLE), new Column("cs_net_paid_inc_tax", SqlTypeName.DOUBLE), new Column("cs_net_paid_inc_ship", SqlTypeName.DOUBLE), new Column("cs_net_paid_inc_ship_tax", SqlTypeName.DOUBLE), new Column("cs_net_profit", SqlTypeName.DOUBLE))); schema.add( "catalog_returns", new Table( typeFactory, new Column("cr_returned_date_sk", SqlTypeName.BIGINT), new Column("cr_returned_time_sk", SqlTypeName.BIGINT), new Column("cr_item_sk", SqlTypeName.BIGINT), new Column("cr_refunded_customer_sk", SqlTypeName.BIGINT), new Column("cr_refunded_cdemo_sk", SqlTypeName.BIGINT), new Column("cr_refunded_hdemo_sk", SqlTypeName.BIGINT), new Column("cr_refunded_addr_sk", SqlTypeName.BIGINT), new Column("cr_returning_customer_sk", SqlTypeName.BIGINT), new Column("cr_returning_cdemo_sk", SqlTypeName.BIGINT), new Column("cr_returning_hdemo_sk", SqlTypeName.BIGINT), new Column("cr_returning_addr_sk", SqlTypeName.BIGINT), new Column("cr_call_center_sk", SqlTypeName.BIGINT), new Column("cr_catalog_page_sk", SqlTypeName.BIGINT), new Column("cr_ship_mode_sk", SqlTypeName.BIGINT), new Column("cr_warehouse_sk", SqlTypeName.BIGINT), new Column("cr_reason_sk", SqlTypeName.BIGINT), new Column("cr_order_number", SqlTypeName.BIGINT), new Column("cr_return_quantity", SqlTypeName.BIGINT), new Column("cr_return_amount", SqlTypeName.DOUBLE), new Column("cr_return_tax", SqlTypeName.DOUBLE), new Column("cr_return_amt_inc_tax", SqlTypeName.DOUBLE), new Column("cr_fee", SqlTypeName.DOUBLE), new Column("cr_return_ship_cost", SqlTypeName.DOUBLE), new Column("cr_refunded_cash", SqlTypeName.DOUBLE), new Column("cr_reversed_charge", SqlTypeName.DOUBLE), new Column("cr_store_credit", SqlTypeName.DOUBLE), new Column("cr_net_loss", SqlTypeName.DOUBLE))); schema.add( "inventory", new Table( typeFactory, new Column("inv_date_sk", SqlTypeName.BIGINT), new Column("inv_item_sk", SqlTypeName.BIGINT), new Column("inv_warehouse_sk", SqlTypeName.BIGINT), new Column("inv_quantity_on_hand", SqlTypeName.BIGINT))); schema.add( "store_sales", new Table( typeFactory, new Column("ss_sold_date_sk", SqlTypeName.BIGINT), new Column("ss_sold_time_sk", SqlTypeName.BIGINT), new Column("ss_item_sk", SqlTypeName.BIGINT), new Column("ss_customer_sk", SqlTypeName.BIGINT), new Column("ss_cdemo_sk", SqlTypeName.BIGINT), new Column("ss_hdemo_sk", SqlTypeName.BIGINT), new Column("ss_addr_sk", SqlTypeName.BIGINT), new Column("ss_store_sk", SqlTypeName.BIGINT), new Column("ss_promo_sk", SqlTypeName.BIGINT), new Column("ss_ticket_number", SqlTypeName.BIGINT), new Column("ss_quantity", SqlTypeName.BIGINT), new Column("ss_wholesale_cost", SqlTypeName.DOUBLE), new Column("ss_list_price", SqlTypeName.DOUBLE), new Column("ss_sales_price", SqlTypeName.DOUBLE), new Column("ss_ext_discount_amt", SqlTypeName.DOUBLE), new Column("ss_ext_sales_price", SqlTypeName.DOUBLE), new Column("ss_ext_wholesale_cost", SqlTypeName.DOUBLE), new Column("ss_ext_list_price", SqlTypeName.DOUBLE), new Column("ss_ext_tax", SqlTypeName.DOUBLE), new Column("ss_coupon_amt", SqlTypeName.DOUBLE), new Column("ss_net_paid", SqlTypeName.DOUBLE), new Column("ss_net_paid_inc_tax", SqlTypeName.DOUBLE), new Column("ss_net_profit", SqlTypeName.DOUBLE))); schema.add( "store_returns", new Table( typeFactory, new Column("sr_returned_date_sk", SqlTypeName.BIGINT), new Column("sr_return_time_sk", SqlTypeName.BIGINT), new Column("sr_item_sk", SqlTypeName.BIGINT), new Column("sr_customer_sk", SqlTypeName.BIGINT), new Column("sr_cdemo_sk", SqlTypeName.BIGINT), new Column("sr_hdemo_sk", SqlTypeName.BIGINT), new Column("sr_addr_sk", SqlTypeName.BIGINT), new Column("sr_store_sk", SqlTypeName.BIGINT), new Column("sr_reason_sk", SqlTypeName.BIGINT), new Column("sr_ticket_number", SqlTypeName.BIGINT), new Column("sr_return_quantity", SqlTypeName.BIGINT), new Column("sr_return_amt", SqlTypeName.DOUBLE), new Column("sr_return_tax", SqlTypeName.DOUBLE), new Column("sr_return_amt_inc_tax", SqlTypeName.DOUBLE), new Column("sr_fee", SqlTypeName.DOUBLE), new Column("sr_return_ship_cost", SqlTypeName.DOUBLE), new Column("sr_refunded_cash", SqlTypeName.DOUBLE), new Column("sr_reversed_charge", SqlTypeName.DOUBLE), new Column("sr_store_credit", SqlTypeName.DOUBLE), new Column("sr_net_loss", SqlTypeName.DOUBLE))); schema.add( "web_sales", new Table( typeFactory, new Column("ws_sold_date_sk", SqlTypeName.BIGINT), new Column("ws_sold_time_sk", SqlTypeName.BIGINT), new Column("ws_ship_date_sk", SqlTypeName.BIGINT), new Column("ws_item_sk", SqlTypeName.BIGINT), new Column("ws_bill_customer_sk", SqlTypeName.BIGINT), new Column("ws_bill_cdemo_sk", SqlTypeName.BIGINT), new Column("ws_bill_hdemo_sk", SqlTypeName.BIGINT), new Column("ws_bill_addr_sk", SqlTypeName.BIGINT), new Column("ws_ship_customer_sk", SqlTypeName.BIGINT), new Column("ws_ship_cdemo_sk", SqlTypeName.BIGINT), new Column("ws_ship_hdemo_sk", SqlTypeName.BIGINT), new Column("ws_ship_addr_sk", SqlTypeName.BIGINT), new Column("ws_web_page_sk", SqlTypeName.BIGINT), new Column("ws_web_site_sk", SqlTypeName.BIGINT), new Column("ws_ship_mode_sk", SqlTypeName.BIGINT), new Column("ws_warehouse_sk", SqlTypeName.BIGINT), new Column("ws_promo_sk", SqlTypeName.BIGINT), new Column("ws_order_number", SqlTypeName.BIGINT), new Column("ws_quantity", SqlTypeName.BIGINT), new Column("ws_wholesale_cost", SqlTypeName.DOUBLE), new Column("ws_list_price", SqlTypeName.DOUBLE), new Column("ws_sales_price", SqlTypeName.DOUBLE), new Column("ws_ext_discount_amt", SqlTypeName.DOUBLE), new Column("ws_ext_sales_price", SqlTypeName.DOUBLE), new Column("ws_ext_wholesale_cost", SqlTypeName.DOUBLE), new Column("ws_ext_list_price", SqlTypeName.DOUBLE), new Column("ws_ext_tax", SqlTypeName.DOUBLE), new Column("ws_coupon_amt", SqlTypeName.DOUBLE), new Column("ws_ext_ship_cost", SqlTypeName.DOUBLE), new Column("ws_net_paid", SqlTypeName.DOUBLE), new Column("ws_net_paid_inc_tax", SqlTypeName.DOUBLE), new Column("ws_net_paid_inc_ship", SqlTypeName.DOUBLE), new Column("ws_net_paid_inc_ship_tax", SqlTypeName.DOUBLE), new Column("ws_net_profit", SqlTypeName.DOUBLE))); schema.add( "web_returns", new Table( typeFactory, new Column("wr_returned_date_sk", SqlTypeName.BIGINT), new Column("wr_returned_time_sk", SqlTypeName.BIGINT), new Column("wr_item_sk", SqlTypeName.BIGINT), new Column("wr_refunded_customer_sk", SqlTypeName.BIGINT), new Column("wr_refunded_cdemo_sk", SqlTypeName.BIGINT), new Column("wr_refunded_hdemo_sk", SqlTypeName.BIGINT), new Column("wr_refunded_addr_sk", SqlTypeName.BIGINT), new Column("wr_returning_customer_sk", SqlTypeName.BIGINT), new Column("wr_returning_cdemo_sk", SqlTypeName.BIGINT), new Column("wr_returning_hdemo_sk", SqlTypeName.BIGINT), new Column("wr_returning_addr_sk", SqlTypeName.BIGINT), new Column("wr_web_page_sk", SqlTypeName.BIGINT), new Column("wr_reason_sk", SqlTypeName.BIGINT), new Column("wr_order_number", SqlTypeName.BIGINT), new Column("wr_return_quantity", SqlTypeName.BIGINT), new Column("wr_return_amt", SqlTypeName.DOUBLE), new Column("wr_return_tax", SqlTypeName.DOUBLE), new Column("wr_return_amt_inc_tax", SqlTypeName.DOUBLE), new Column("wr_fee", SqlTypeName.DOUBLE), new Column("wr_return_ship_cost", SqlTypeName.DOUBLE), new Column("wr_refunded_cash", SqlTypeName.DOUBLE), new Column("wr_reversed_charge", SqlTypeName.DOUBLE), new Column("wr_account_credit", SqlTypeName.DOUBLE), new Column("wr_net_loss", SqlTypeName.DOUBLE))); schema.add( "call_center", new Table( typeFactory, new Column("cc_call_center_sk", SqlTypeName.BIGINT), new Column("cc_call_center_id", SqlTypeName.VARCHAR), new Column("cc_rec_start_date", SqlTypeName.DATE), new Column("cc_rec_end_date", SqlTypeName.DATE), new Column("cc_closed_date_sk", SqlTypeName.BIGINT), new Column("cc_open_date_sk", SqlTypeName.BIGINT), new Column("cc_name", SqlTypeName.VARCHAR), new Column("cc_class", SqlTypeName.VARCHAR), new Column("cc_employees", SqlTypeName.BIGINT), new Column("cc_sq_ft", SqlTypeName.BIGINT), new Column("cc_hours", SqlTypeName.VARCHAR), new Column("cc_manager", SqlTypeName.VARCHAR), new Column("cc_mkt_id", SqlTypeName.BIGINT), new Column("cc_mkt_class", SqlTypeName.VARCHAR), new Column("cc_mkt_desc", SqlTypeName.VARCHAR), new Column("cc_market_manager", SqlTypeName.VARCHAR), new Column("cc_division", SqlTypeName.BIGINT), new Column("cc_division_name", SqlTypeName.VARCHAR), new Column("cc_company", SqlTypeName.BIGINT), new Column("cc_company_name", SqlTypeName.VARCHAR), new Column("cc_street_number", SqlTypeName.VARCHAR), new Column("cc_street_name", SqlTypeName.VARCHAR), new Column("cc_street_type", SqlTypeName.VARCHAR), new Column("cc_suite_number", SqlTypeName.VARCHAR), new Column("cc_city", SqlTypeName.VARCHAR), new Column("cc_county", SqlTypeName.VARCHAR), new Column("cc_state", SqlTypeName.VARCHAR), new Column("cc_zip", SqlTypeName.VARCHAR), new Column("cc_country", SqlTypeName.VARCHAR), new Column("cc_gmt_offset", SqlTypeName.DOUBLE), new Column("cc_tax_percentage", SqlTypeName.DOUBLE))); schema.add( "catalog_page", new Table( typeFactory, new Column("cp_catalog_page_sk", SqlTypeName.BIGINT), new Column("cp_catalog_page_id", SqlTypeName.VARCHAR), new Column("cp_start_date_sk", SqlTypeName.BIGINT), new Column("cp_end_date_sk", SqlTypeName.BIGINT), new Column("cp_department", SqlTypeName.VARCHAR), new Column("cp_catalog_number", SqlTypeName.BIGINT), new Column("cp_catalog_page_number", SqlTypeName.BIGINT), new Column("cp_description", SqlTypeName.VARCHAR), new Column("cp_type", SqlTypeName.VARCHAR))); schema.add( "customer", new Table( typeFactory, new Column("c_customer_sk", SqlTypeName.BIGINT), new Column("c_customer_id", SqlTypeName.VARCHAR), new Column("c_current_cdemo_sk", SqlTypeName.BIGINT), new Column("c_current_hdemo_sk", SqlTypeName.BIGINT), new Column("c_current_addr_sk", SqlTypeName.BIGINT), new Column("c_first_shipto_date_sk", SqlTypeName.BIGINT), new Column("c_first_sales_date_sk", SqlTypeName.BIGINT), new Column("c_salutation", SqlTypeName.VARCHAR), new Column("c_first_name", SqlTypeName.VARCHAR), new Column("c_last_name", SqlTypeName.VARCHAR), new Column("c_preferred_cust_flag", SqlTypeName.VARCHAR), new Column("c_birth_day", SqlTypeName.BIGINT), new Column("c_birth_month", SqlTypeName.BIGINT), new Column("c_birth_year", SqlTypeName.BIGINT), new Column("c_birth_country", SqlTypeName.VARCHAR), new Column("c_login", SqlTypeName.VARCHAR), new Column("c_email_address", SqlTypeName.VARCHAR), new Column("c_last_review_date_sk", SqlTypeName.BIGINT))); schema.add( "customer_address", new Table( typeFactory, new Column("ca_address_sk", SqlTypeName.BIGINT), new Column("ca_address_id", SqlTypeName.VARCHAR), new Column("ca_street_number", SqlTypeName.VARCHAR), new Column("ca_street_name", SqlTypeName.VARCHAR), new Column("ca_street_type", SqlTypeName.VARCHAR), new Column("ca_suite_number", SqlTypeName.VARCHAR), new Column("ca_city", SqlTypeName.VARCHAR), new Column("ca_county", SqlTypeName.VARCHAR), new Column("ca_state", SqlTypeName.VARCHAR), new Column("ca_zip", SqlTypeName.VARCHAR), new Column("ca_country", SqlTypeName.VARCHAR), new Column("ca_gmt_offset", SqlTypeName.DOUBLE), new Column("ca_location_type", SqlTypeName.VARCHAR))); schema.add( "customer_demographics", new Table( typeFactory, new Column("cd_demo_sk", SqlTypeName.BIGINT), new Column("cd_gender", SqlTypeName.VARCHAR), new Column("cd_marital_status", SqlTypeName.VARCHAR), new Column("cd_education_status", SqlTypeName.VARCHAR), new Column("cd_purchase_estimate", SqlTypeName.BIGINT), new Column("cd_credit_rating", SqlTypeName.VARCHAR), new Column("cd_dep_count", SqlTypeName.BIGINT), new Column("cd_dep_employed_count", SqlTypeName.BIGINT), new Column("cd_dep_college_count", SqlTypeName.BIGINT))); schema.add( "date_dim", new Table( typeFactory, new Column("d_date_sk", SqlTypeName.BIGINT), new Column("d_date_id", SqlTypeName.VARCHAR), new Column("d_date", SqlTypeName.DATE), new Column("d_month_seq", SqlTypeName.BIGINT), new Column("d_week_seq", SqlTypeName.BIGINT), new Column("d_quarter_seq", SqlTypeName.BIGINT), new Column("d_year", SqlTypeName.BIGINT), new Column("d_dow", SqlTypeName.BIGINT), new Column("d_moy", SqlTypeName.BIGINT), new Column("d_dom", SqlTypeName.BIGINT), new Column("d_qoy", SqlTypeName.BIGINT), new Column("d_fy_year", SqlTypeName.BIGINT), new Column("d_fy_quarter_seq", SqlTypeName.BIGINT), new Column("d_fy_week_seq", SqlTypeName.BIGINT), new Column("d_day_name", SqlTypeName.VARCHAR), new Column("d_quarter_name", SqlTypeName.VARCHAR), new Column("d_holiday", SqlTypeName.VARCHAR), new Column("d_weekend", SqlTypeName.VARCHAR), new Column("d_following_holiday", SqlTypeName.VARCHAR), new Column("d_first_dom", SqlTypeName.BIGINT), new Column("d_last_dom", SqlTypeName.BIGINT), new Column("d_same_day_ly", SqlTypeName.BIGINT), new Column("d_same_day_lq", SqlTypeName.BIGINT), new Column("d_current_day", SqlTypeName.VARCHAR), new Column("d_current_week", SqlTypeName.VARCHAR), new Column("d_current_month", SqlTypeName.VARCHAR), new Column("d_current_quarter", SqlTypeName.VARCHAR), new Column("d_current_year", SqlTypeName.VARCHAR))); schema.add( "household_demographics", new Table( typeFactory, new Column("hd_demo_sk", SqlTypeName.BIGINT), new Column("hd_income_band_sk", SqlTypeName.BIGINT), new Column("hd_buy_potential", SqlTypeName.VARCHAR), new Column("hd_dep_count", SqlTypeName.BIGINT), new Column("hd_vehicle_count", SqlTypeName.BIGINT))); schema.add( "income_band", new Table( typeFactory, new Column("ib_income_band_sk", SqlTypeName.BIGINT), new Column("ib_lower_bound", SqlTypeName.BIGINT), new Column("ib_upper_bound", SqlTypeName.BIGINT))); schema.add( "item", new Table( typeFactory, new Column("i_item_sk", SqlTypeName.BIGINT), new Column("i_item_id", SqlTypeName.VARCHAR), new Column("i_rec_start_date", SqlTypeName.DATE), new Column("i_rec_end_date", SqlTypeName.DATE), new Column("i_item_desc", SqlTypeName.VARCHAR), new Column("i_current_price", SqlTypeName.DOUBLE), new Column("i_wholesale_cost", SqlTypeName.DOUBLE), new Column("i_brand_id", SqlTypeName.BIGINT), new Column("i_brand", SqlTypeName.VARCHAR), new Column("i_class_id", SqlTypeName.BIGINT), new Column("i_class", SqlTypeName.VARCHAR), new Column("i_category_id", SqlTypeName.BIGINT), new Column("i_category", SqlTypeName.VARCHAR), new Column("i_manufact_id", SqlTypeName.BIGINT), new Column("i_manufact", SqlTypeName.VARCHAR), new Column("i_size", SqlTypeName.VARCHAR), new Column("i_formulation", SqlTypeName.VARCHAR), new Column("i_color", SqlTypeName.VARCHAR), new Column("i_units", SqlTypeName.VARCHAR), new Column("i_container", SqlTypeName.VARCHAR), new Column("i_manager_id", SqlTypeName.BIGINT), new Column("i_product_name", SqlTypeName.VARCHAR))); schema.add( "promotion", new Table( typeFactory, new Column("p_promo_sk", SqlTypeName.BIGINT), new Column("p_promo_id", SqlTypeName.VARCHAR), new Column("p_start_date_sk", SqlTypeName.BIGINT), new Column("p_end_date_sk", SqlTypeName.BIGINT), new Column("p_item_sk", SqlTypeName.BIGINT), new Column("p_cost", SqlTypeName.DOUBLE), new Column("p_response_target", SqlTypeName.BIGINT), new Column("p_promo_name", SqlTypeName.VARCHAR), new Column("p_channel_dmail", SqlTypeName.VARCHAR), new Column("p_channel_email", SqlTypeName.VARCHAR), new Column("p_channel_catalog", SqlTypeName.VARCHAR), new Column("p_channel_tv", SqlTypeName.VARCHAR), new Column("p_channel_radio", SqlTypeName.VARCHAR), new Column("p_channel_press", SqlTypeName.VARCHAR), new Column("p_channel_event", SqlTypeName.VARCHAR), new Column("p_channel_demo", SqlTypeName.VARCHAR), new Column("p_channel_details", SqlTypeName.VARCHAR), new Column("p_purpose", SqlTypeName.VARCHAR), new Column("p_discount_active", SqlTypeName.VARCHAR))); schema.add( "reason", new Table( typeFactory, new Column("r_reason_sk", SqlTypeName.BIGINT), new Column("r_reason_id", SqlTypeName.VARCHAR), new Column("r_reason_desc", SqlTypeName.VARCHAR))); schema.add( "ship_mode", new Table( typeFactory, new Column("sm_ship_mode_sk", SqlTypeName.BIGINT), new Column("sm_ship_mode_id", SqlTypeName.VARCHAR), new Column("sm_type", SqlTypeName.VARCHAR), new Column("sm_code", SqlTypeName.VARCHAR), new Column("sm_carrier", SqlTypeName.VARCHAR), new Column("sm_contract", SqlTypeName.VARCHAR))); schema.add( "store", new Table( typeFactory, new Column("s_store_sk", SqlTypeName.BIGINT), new Column("s_store_id", SqlTypeName.VARCHAR), new Column("s_rec_start_date", SqlTypeName.DATE), new Column("s_rec_end_date", SqlTypeName.DATE), new Column("s_closed_date_sk", SqlTypeName.BIGINT), new Column("s_store_name", SqlTypeName.VARCHAR), new Column("s_number_employees", SqlTypeName.BIGINT), new Column("s_floor_space", SqlTypeName.BIGINT), new Column("s_hours", SqlTypeName.VARCHAR), new Column("s_manager", SqlTypeName.VARCHAR), new Column("s_market_id", SqlTypeName.BIGINT), new Column("s_geography_class", SqlTypeName.VARCHAR), new Column("s_market_desc", SqlTypeName.VARCHAR), new Column("s_market_manager", SqlTypeName.VARCHAR), new Column("s_division_id", SqlTypeName.BIGINT), new Column("s_division_name", SqlTypeName.VARCHAR), new Column("s_company_id", SqlTypeName.BIGINT), new Column("s_company_name", SqlTypeName.VARCHAR), new Column("s_street_number", SqlTypeName.VARCHAR), new Column("s_street_name", SqlTypeName.VARCHAR), new Column("s_street_type", SqlTypeName.VARCHAR), new Column("s_suite_number", SqlTypeName.VARCHAR), new Column("s_city", SqlTypeName.VARCHAR), new Column("s_county", SqlTypeName.VARCHAR), new Column("s_state", SqlTypeName.VARCHAR), new Column("s_zip", SqlTypeName.VARCHAR), new Column("s_country", SqlTypeName.VARCHAR), new Column("s_gmt_offset", SqlTypeName.DOUBLE), new Column("s_tax_precentage", SqlTypeName.DOUBLE))); schema.add( "time_dim", new Table( typeFactory, new Column("t_time_sk", SqlTypeName.BIGINT), new Column("t_time_id", SqlTypeName.VARCHAR), new Column("t_time", SqlTypeName.BIGINT), new Column("t_hour", SqlTypeName.BIGINT), new Column("t_minute", SqlTypeName.BIGINT), new Column("t_second", SqlTypeName.BIGINT), new Column("t_am_pm", SqlTypeName.VARCHAR), new Column("t_shift", SqlTypeName.VARCHAR), new Column("t_sub_shift", SqlTypeName.VARCHAR), new Column("t_meal_time", SqlTypeName.VARCHAR))); schema.add( "warehouse", new Table( typeFactory, new Column("w_warehouse_sk", SqlTypeName.BIGINT), new Column("w_warehouse_id", SqlTypeName.VARCHAR), new Column("w_warehouse_name", SqlTypeName.VARCHAR), new Column("w_warehouse_sq_ft", SqlTypeName.BIGINT), new Column("w_street_number", SqlTypeName.VARCHAR), new Column("w_street_name", SqlTypeName.VARCHAR), new Column("w_street_type", SqlTypeName.VARCHAR), new Column("w_suite_number", SqlTypeName.VARCHAR), new Column("w_city", SqlTypeName.VARCHAR), new Column("w_county", SqlTypeName.VARCHAR), new Column("w_state", SqlTypeName.VARCHAR), new Column("w_zip", SqlTypeName.VARCHAR), new Column("w_country", SqlTypeName.VARCHAR), new Column("w_gmt_offset", SqlTypeName.DOUBLE))); schema.add( "web_page", new Table( typeFactory, new Column("wp_web_page_sk", SqlTypeName.BIGINT), new Column("wp_web_page_id", SqlTypeName.VARCHAR), new Column("wp_rec_start_date", SqlTypeName.DATE), new Column("wp_rec_end_date", SqlTypeName.DATE), new Column("wp_creation_date_sk", SqlTypeName.BIGINT), new Column("wp_access_date_sk", SqlTypeName.BIGINT), new Column("wp_autogen_flag", SqlTypeName.VARCHAR), new Column("wp_customer_sk", SqlTypeName.BIGINT), new Column("wp_url", SqlTypeName.VARCHAR), new Column("wp_type", SqlTypeName.VARCHAR), new Column("wp_char_count", SqlTypeName.BIGINT), new Column("wp_link_count", SqlTypeName.BIGINT), new Column("wp_image_count", SqlTypeName.BIGINT), new Column("wp_max_ad_count", SqlTypeName.BIGINT))); schema.add( "web_site", new Table( typeFactory, new Column("web_site_sk", SqlTypeName.BIGINT), new Column("web_site_id", SqlTypeName.VARCHAR), new Column("web_rec_start_date", SqlTypeName.DATE), new Column("web_rec_end_date", SqlTypeName.DATE), new Column("web_name", SqlTypeName.VARCHAR), new Column("web_open_date_sk", SqlTypeName.BIGINT), new Column("web_close_date_sk", SqlTypeName.BIGINT), new Column("web_class", SqlTypeName.VARCHAR), new Column("web_manager", SqlTypeName.VARCHAR), new Column("web_mkt_id", SqlTypeName.BIGINT), new Column("web_mkt_class", SqlTypeName.VARCHAR), new Column("web_mkt_desc", SqlTypeName.VARCHAR), new Column("web_market_manager", SqlTypeName.VARCHAR), new Column("web_company_id", SqlTypeName.BIGINT), new Column("web_company_name", SqlTypeName.VARCHAR), new Column("web_street_number", SqlTypeName.VARCHAR), new Column("web_street_name", SqlTypeName.VARCHAR), new Column("web_street_type", SqlTypeName.VARCHAR), new Column("web_suite_number", SqlTypeName.VARCHAR), new Column("web_city", SqlTypeName.VARCHAR), new Column("web_county", SqlTypeName.VARCHAR), new Column("web_state", SqlTypeName.VARCHAR), new Column("web_zip", SqlTypeName.VARCHAR), new Column("web_country", SqlTypeName.VARCHAR), new Column("web_gmt_offset", SqlTypeName.DOUBLE), new Column("web_tax_percentage", SqlTypeName.DOUBLE))); return schema; } private static void processQuery( RelDataTypeFactory typeFactory, CalciteSchema schema, String queryFile) throws IOException, SqlParseException { System.out.println(queryFile); var sqlQuery = Files.readString(Path.of(queryFile)); System.out.println(sqlQuery); var parser = SqlParser.create( sqlQuery, SqlParser.Config.DEFAULT.withQuoting(Quoting.BACK_TICK)); var sqlNode = parser.parseQuery(); // 3> Validate the query var catalogReader = new CalciteCatalogReader( schema, Collections.emptyList(), typeFactory, CalciteConnectionConfig.DEFAULT.set( CalciteConnectionProperty.CASE_SENSITIVE, "false")); var validator = SqlValidatorUtil.newValidator( SqlStdOperatorTable.instance(), catalogReader, typeFactory, SqlValidator.Config.DEFAULT); sqlNode = validator.validate(sqlNode); // 4> Convert the query into logical plan. var planner = new VolcanoPlanner(); planner.addRelTraitDef(ConventionTraitDef.INSTANCE); var cluster = RelOptCluster.create(planner, new RexBuilder(typeFactory)); var relConverter = new SqlToRelConverter( new RelOptTable.ViewExpander() { // no-op expander @Override public RelRoot expandView( final RelDataType rowType, final String queryString, final List<String> schemaPath, final List<String> viewPath) { return null; } }, validator, catalogReader, cluster, StandardConvertletTable.INSTANCE); var logicalPlan = relConverter.convertQuery(sqlNode, false, true).rel; System.out.println(RelOptUtil.dumpPlan( "[Initial Logical Plan]", logicalPlan, SqlExplainFormat.TEXT, SqlExplainLevel.EXPPLAN_ATTRIBUTES)); logicalPlan = relConverter.decorrelate(sqlNode, logicalPlan); System.out.println(RelOptUtil.dumpPlan( "[Decorrelated Logical Plan]", logicalPlan, SqlExplainFormat.TEXT, SqlExplainLevel.EXPPLAN_ATTRIBUTES)); System.out.println("---"); } public static void main(String[] args) throws IOException, SqlParseException { // TODO: Should we use the SqlTypeFactoryImpl? var typeFactory = new JavaTypeFactoryImpl(); var schema = newTpcdsSchema(typeFactory); var queryFiles = new File("blaze/testing/tpcds/queries").list(); Arrays.sort( queryFiles, (a, b) -> a.toString().compareTo(b.toString())); int numQueries = 0; for (var queryFile : queryFiles) { var path = Paths.get("blaze/testing/tpcds/queries", queryFile); processQuery(typeFactory, schema, path.toString()); ++numQueries; } System.out.printf("---\nProcessed %d queries\n", numQueries); } }