Hello all,

I am trying to run a Beam implementation [1] of the TPC-DS benchmark [2], and 
I observe that most of the queries fail for various reasons (see below). I run 
it with the Spark Runner, but I believe the issues are mostly related to either 
query parsing or query planning, so we can expect the same with other runners 
too. For now, only ~22% (23/103) of the TPC-DS queries pass successfully via 
Beam SQL / CalciteSQL.

The most common issues are the following:

- “Caused by: java.lang.UnsupportedOperationException: Non equi-join is not 
  supported”
- “Caused by: java.lang.UnsupportedOperationException: ORDER BY without a LIMIT 
  is not supported!”
- “Caused by: 
  org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner$CannotPlanException: 
  There are not enough rules to produce a node with desired properties: 
  convention=BEAM_LOGICAL. All the inputs have relevant nodes, however the cost 
  is still infinite.”
- “Caused by: 
  org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlValidatorException: 
  No match found for function signature substr(<CHARACTER>, <NUMERIC>, 
  <NUMERIC>)”

The full list of query statuses is available here [3]. The generated TPC-DS SQL 
queries can be found here [4].
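
In case it helps with triage, here is a minimal sketch of how the failures 
could be grouped by root cause from the "Caused by" messages. The patterns are 
taken from the messages quoted above; the category labels and the function 
names (classify, tally) are my own and purely illustrative, not part of Beam 
or of the TPC-DS test suite.

```python
import re
from collections import Counter

# Patterns come from the error messages quoted above.
# The labels on the left are hypothetical names chosen for this sketch.
PATTERNS = [
    ("non-equi-join", re.compile(r"Non equi-join is not supported")),
    ("order-by-without-limit", re.compile(r"ORDER BY without a LIMIT is not supported")),
    ("cannot-plan", re.compile(r"CannotPlanException")),
    ("missing-function", re.compile(r"No match found for function signature")),
]


def classify(message):
    """Return the label of the first pattern found in the message, else 'other'."""
    for label, pattern in PATTERNS:
        if pattern.search(message):
            return label
    return "other"


def tally(messages):
    """Count how many messages fall into each category."""
    return Counter(classify(m) for m in messages)
```

Feeding it one "Caused by" line per failed query would give a count per 
category, which could show which limitation blocks the most queries.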

I’m not very familiar with the current status of ongoing work on Beam SQL, so 
I apologize in advance if my questions sound naive.

Could you please guide me on the following:

1. Is there any chance that we can resolve, at least partly, the current 
limitations of query parsing/planning mentioned above? Are there any 
fundamental blockers among them?
2. Are there any plans or ongoing work related to this?
3. Are there any plans to upgrade the vendored Calcite version to a more recent 
one? Would that reduce the number of current limitations?
4. Do you think it would be valuable for Beam SQL to run the TPC-DS benchmark 
on a regular basis (as we do for Nexmark, for example), even if not all queries 
can pass with Beam SQL?

I’d appreciate any additional information, docs, details, or opinions on this topic.

—
Alexey

[1] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
[2] http://www.tpc.org/tpcds/
[3] https://docs.google.com/spreadsheets/d/1Gya9Xoa6uWwORHSrRqpkfSII4ajYvDpUTt0cNJCRHjE/edit?usp=sharing
[4] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds/src/main/resources/queries
