> Not every query can be supported by BeamSQL easily.

I have one related question: would we be able to apply SQL-specific optimizations that apply only to batch-only pipelines? I ask because I can imagine that covering the full Beam model constrains the optimization possibilities, no?
On Tue, Apr 27, 2021 at 7:25 PM Rui Wang <ruw...@google.com> wrote:

> On Tue, Apr 27, 2021 at 9:10 AM Alexey Romanenko <aromanenko....@gmail.com>
> wrote:
>
>> Hello all,
>>
>> I am trying to run a Beam implementation [1] of the TPC-DS benchmark [2], and I
>> observe that most of the queries fail for various reasons (see below).
>> I run it with the Spark Runner, but I believe the issues are mostly related
>> to either query parsing or query planning, so we can expect the same with
>> other runners too. For now, only ~22% (23/103) of the TPC-DS queries pass
>> successfully via Beam SQL / CalciteSQL.
>>
>> The most common issues are the following:
>>
>> 1. *“Caused by: java.lang.UnsupportedOperationException: Non equi-join is not supported”*
>> 2. *“Caused by: java.lang.UnsupportedOperationException: ORDER BY without a LIMIT is not supported!”*
>> 3. *“Caused by: org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not enough rules to produce a node with desired properties: convention=BEAM_LOGICAL. All the inputs have relevant nodes, however the cost is still infinite.”*
>> 4. *“Caused by: org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlValidatorException: No match found for function signature substr(<CHARACTER>, <NUMERIC>, <NUMERIC>)”*
>>
>> The full list of query statuses is available here [3]. The generated
>> TPC-DS SQL queries can be found there as well [4].
>>
>
> Not every query can be supported by BeamSQL easily. For example, non-equi-join
> support (BEAM-2194). We had discussions about cause 2, to add the limitation
> that BeamSQL only supports ORDER BY with a LIMIT (LIMIT is required).
> Cause 3 needs case-by-case investigation; some of those might be fixable.
> Cause 4 looks like the function is simply not found in the catalog.
>
>> I’m not very familiar with the current status of ongoing work on Beam SQL,
>> so I’m sorry in advance if my questions sound naive.
>>
>> Please guide me on this:
>>
>> 1. Is there any chance that we can resolve, at least partly, the current
>> limitations of query parsing/planning mentioned above? Are there any
>> principal blockers among them?
>> 2. Are there any plans or ongoing work related to this?
>> 3. Are there any plans to upgrade the vendored Calcite version to a more
>> recent one? Would that reduce the number of current limitations or not?
>> 4. Do you think it would be valuable for Beam SQL to run the TPC-DS benchmark
>> on a regular basis (as we do for Nexmark, for example) even if not all
>> queries pass with Beam SQL?
>>
>
> This is definitely valuable for BeamSQL if we have enough resources to run
> such queries regularly.
>
>>
>> I’d appreciate any additional information/docs/details/opinions on this
>> topic.
>>
>> --
>> Alexey
>>
>> [1] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
>> [2] http://www.tpc.org/tpcds/
>> [3] https://docs.google.com/spreadsheets/d/1Gya9Xoa6uWwORHSrRqpkfSII4ajYvDpUTt0cNJCRHjE/edit?usp=sharing
>> [4] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds/src/main/resources/queries
>>
>
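If TPC-DS runs become regular, a small helper could bucket failing queries into the four causes listed in the original report and track the pass rate (23 of 103 today). This is only a hypothetical sketch: the class and method names are made up, it matches on fragments of the quoted error messages rather than on any Beam or Calcite API, and anything it does not recognize falls into OTHER.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class TpcdsFailureTriage {

    /** The four failure causes quoted in the thread, plus a catch-all (hypothetical names). */
    enum Cause { NON_EQUI_JOIN, ORDER_BY_WITHOUT_LIMIT, CANNOT_PLAN, UNKNOWN_FUNCTION, OTHER }

    /** Maps one exception message to a cause by substring matching on the quoted error texts. */
    static Cause classify(String message) {
        if (message.contains("Non equi-join is not supported")) return Cause.NON_EQUI_JOIN;
        if (message.contains("ORDER BY without a LIMIT is not supported")) return Cause.ORDER_BY_WITHOUT_LIMIT;
        if (message.contains("CannotPlanException")) return Cause.CANNOT_PLAN;
        if (message.contains("No match found for function signature")) return Cause.UNKNOWN_FUNCTION;
        return Cause.OTHER;
    }

    /** Counts how many failure messages fall into each cause. */
    static Map<Cause, Integer> tally(List<String> messages) {
        Map<Cause, Integer> counts = new EnumMap<>(Cause.class);
        for (String m : messages) {
            counts.merge(classify(m), 1, Integer::sum);
        }
        return counts;
    }

    /** Pass rate as a percentage of all queries. */
    static double passRatePercent(int passed, int total) {
        return 100.0 * passed / total;
    }

    public static void main(String[] args) {
        // Sample messages shortened from the errors quoted in the thread.
        List<String> failures = List.of(
            "java.lang.UnsupportedOperationException: Non equi-join is not supported",
            "java.lang.UnsupportedOperationException: ORDER BY without a LIMIT is not supported!",
            "RelOptPlanner$CannotPlanException: There are not enough rules to produce a node",
            "SqlValidatorException: No match found for function signature substr(<CHARACTER>, <NUMERIC>, <NUMERIC>)");
        System.out.println(tally(failures));
        System.out.printf("pass rate: %.1f%%%n", passRatePercent(23, 103));
    }
}
```

Substring matching is deliberately crude; since the exception class names live under the vendored `org.apache.beam.vendor.calcite.v1_20_0` package, matching on message text avoids depending on a particular Calcite version.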