On Tue, Apr 27, 2021 at 9:10 AM Alexey Romanenko <[email protected]> wrote:
> Hello all,
>
> I try to run a Beam implementation [1] of TPC-DS benchmark [2] and I
> observe that most of the queries don’t pass because of different reasons
> (see below). I run it with Spark Runner but the issues, I believe, are
> mostly related to either query parsing or query planning, so we can expect
> the same with other runners too. For now, only ~22% (23/103) of TPC-DS
> queries passed successfully via Beam SQL / CalciteSQL.
>
> The most common issues are the following ones:
>
>    1. *“Caused by: java.lang.UnsupportedOperationException: Non equi-join
>    is not supported”*
>    2. *“Caused by: java.lang.UnsupportedOperationException: ORDER BY
>    without a LIMIT is not supported!”*
>    3. *“Caused by:
>    org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
>    There are not enough rules to produce a node with desired
>    properties: convention=BEAM_LOGICAL. All the inputs have relevant nodes,
>    however the cost is still infinite.”*
>    4. *“Caused by:
>    org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlValidatorException:
>    No match found for function signature substr(<CHARACTER>, <NUMERIC>,
>    <NUMERIC>)”*
>
> The full list of query statuses is available here [3]. The generated
> TPC-DS SQL queries can be found there as well [4].
>

Not every query can be supported by BeamSQL easily; non-equi-join support
(BEAM-2194) is one example. For cause 2, we had discussions that led to the
limitation that BeamSQL only supports ORDER BY with LIMIT (LIMIT is
required). Cause 3 needs case-by-case investigation; some of those queries
might be fixable. Cause 4 looks like the function was not found in the
catalog.

> I’m not very familiar with a current status of ongoing work for Beam SQL,
> so I’m sorry in advance if my questions will sound naive.
>
> Please, guide me on this:
>
> 1. Are there any chances that we can resolve, at least, partly the current
> limitations of the query parsing/planning, mentioned above? Are there any
> principal blockers among them?
> 2. Are there any plans or ongoing work related to this?
> 3. Are there any plans to upgrade vendored Calcite version to more recent
> one? Should it reduce the number of current limitations or not?
> 4. Do you think it could be valuable for Beam SQL to run TPC-DS benchmark
> on a regular basis (as we do for Nexmark, for example) even if not all
> queries can pass with Beam SQL?
>

This is definitely valuable for BeamSQL if we have enough resources to run
such queries regularly.

> I’d appreciate any additional information/docs/details/opinions on this
> topic.
>
> --
> Alexey
>
> [1] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
> [2] http://www.tpc.org/tpcds/
> [3] https://docs.google.com/spreadsheets/d/1Gya9Xoa6uWwORHSrRqpkfSII4ajYvDpUTt0cNJCRHjE/edit?usp=sharing
> [4] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds/src/main/resources/queries
>
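
If the benchmark were run regularly, it could help to track how the failures
split across the four causes quoted above. The following is only a minimal
sketch of such a triage step, not part of the Beam TPC-DS module: the class
name TpcdsFailureTriage, the Cause enum, and the sample messages in main are
hypothetical, and only the marker strings are taken from the errors listed in
the original message.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: buckets failed TPC-DS queries by error message. */
public class TpcdsFailureTriage {

    /** The four causes from the report, plus a catch-all bucket. */
    enum Cause {
        NON_EQUI_JOIN("Non equi-join is not supported"),
        ORDER_BY_WITHOUT_LIMIT("ORDER BY without a LIMIT is not supported"),
        CANNOT_PLAN("There are not enough rules to produce a node"),
        UNKNOWN_FUNCTION("No match found for function signature"),
        OTHER(null);

        final String marker;

        Cause(String marker) {
            this.marker = marker;
        }
    }

    /** Returns the first cause whose marker text occurs in the error. */
    static Cause classify(String errorText) {
        // Collapse whitespace so a marker still matches if the log wrapped it.
        String normalized = errorText.replaceAll("\\s+", " ");
        for (Cause cause : Cause.values()) {
            if (cause.marker != null && normalized.contains(cause.marker)) {
                return cause;
            }
        }
        return Cause.OTHER;
    }

    /** Counts how many errors fall into each cause. */
    static Map<Cause, Integer> tally(List<String> errors) {
        Map<Cause, Integer> counts = new EnumMap<>(Cause.class);
        for (String error : errors) {
            counts.merge(classify(error), 1, Integer::sum);
        }
        return counts;
    }

    /** Pass rate in percent, e.g. for the 23 of 103 queries reported. */
    static double passRatePercent(int passed, int total) {
        return 100.0 * passed / total;
    }

    public static void main(String[] args) {
        // Sample messages are illustrative, shortened from the quoted errors.
        List<String> sample = List.of(
            "Caused by: java.lang.UnsupportedOperationException: "
                + "Non equi-join is not supported",
            "Caused by: java.lang.UnsupportedOperationException: "
                + "ORDER BY without a LIMIT is not supported!",
            "SqlValidatorException: No match found for function signature "
                + "substr(<CHARACTER>, <NUMERIC>, <NUMERIC>)");
        System.out.println(tally(sample));
        System.out.println("pass rate (%): " + passRatePercent(23, 103));
    }
}
```

Matching on substrings keeps the sketch independent of the vendored Calcite
package prefix (v1_20_0), which would change with a Calcite upgrade.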
