Hello all,

I've been trying to run the Beam implementation [1] of the TPC-DS benchmark [2], and I observe that most of the queries fail, for a variety of reasons (see below). I ran it with the Spark Runner, but I believe the issues are mostly related to either query parsing or query planning, so we can expect the same results with other runners too. For now, only ~22% (23/103) of TPC-DS queries pass successfully via Beam SQL / CalciteSQL.
The most common issues are the following:

“Caused by: java.lang.UnsupportedOperationException: Non equi-join is not supported”
“Caused by: java.lang.UnsupportedOperationException: ORDER BY without a LIMIT is not supported!”
“Caused by: org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not enough rules to produce a node with desired properties: convention=BEAM_LOGICAL. All the inputs have relevant nodes, however the cost is still infinite.”
“Caused by: org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlValidatorException: No match found for function signature substr(<CHARACTER>, <NUMERIC>, <NUMERIC>)”

The full list of query statuses is available here [3]. The generated TPC-DS SQL queries can be found in the Beam repository [4].

I'm not very familiar with the current status of ongoing work on Beam SQL, so I'm sorry in advance if my questions sound naive. Could you please advise on the following:

1. Is there any chance that we can resolve, at least partly, the current query parsing/planning limitations mentioned above? Are there any fundamental blockers among them?
2. Are there any plans or ongoing work related to this?
3. Are there any plans to upgrade the vendored Calcite version (currently 1.20.0) to a more recent one? Would that reduce the number of current limitations?
4. Do you think it would be valuable for Beam SQL to run the TPC-DS benchmark on a regular basis (as we do for Nexmark, for example), even if not all queries currently pass with Beam SQL?

I'd appreciate any additional information, docs, details, or opinions on this topic.

Alexey

[1] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
[2] http://www.tpc.org/tpcds/
[3] https://docs.google.com/spreadsheets/d/1Gya9Xoa6uWwORHSrRqpkfSII4ajYvDpUTt0cNJCRHjE/edit?usp=sharing
[4] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds/src/main/resources/queries