> Not every query can be supported by BeamSQL easily.

I have one related question. Would we be able to apply SQL-specific
optimizations that apply only to batch pipelines? I ask because
I imagine that covering the full Beam model constrains the
optimization possibilities, no?

On Tue, Apr 27, 2021 at 7:25 PM Rui Wang <ruw...@google.com> wrote:

>
>
> On Tue, Apr 27, 2021 at 9:10 AM Alexey Romanenko <aromanenko....@gmail.com>
> wrote:
>
>> Hello all,
>>
>> I am trying to run a Beam implementation [1] of the TPC-DS benchmark [2] and I
>> observe that most of the queries don’t pass, for various reasons
>> (see below). I run it with the Spark Runner, but the issues, I believe, are
>> mostly related to either query parsing or query planning, so we can expect
>> the same with other runners too. For now, only ~22% (23/103) of TPC-DS
>> queries pass successfully via Beam SQL / CalciteSQL.
>>
>> The most common issues are the following ones:
>>
>>    1. *“Caused by: java.lang.UnsupportedOperationException: Non
>>    equi-join is not supported”*
>>    2. *“Caused by: java.lang.UnsupportedOperationException: ORDER BY
>>    without a LIMIT is not supported!”*
>>    3. *“Caused by:
>>    org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
>>    There are not enough rules to produce a node with desired
>>    properties: convention=BEAM_LOGICAL. All the inputs have relevant nodes,
>>    however the cost is still infinite.”*
>>    4. *“Caused by:
>>    org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlValidatorException:
>>    No match found for function signature substr(<CHARACTER>, <NUMERIC>,
>>    <NUMERIC>)”*
>>
>> The full list of query statuses is available here [3]. The generated
>> TPC-DS SQL queries can be found there as well [4].
>>
>
> Not every query can be supported by BeamSQL easily. One example is
> supporting non-equi-joins (BEAM-2194). For cause 2, we had discussions that
> led to the limitation that BeamSQL only supports ORDER BY together with
> LIMIT (LIMIT is required). Cause 3 needs case-by-case investigation; some
> cases might be fixable. Cause 4 looks like the function was not found in
> the catalog.
>
>>
>> I’m not very familiar with the current status of ongoing work on Beam SQL,
>> so I apologize in advance if my questions sound naive.
>>
>> Please guide me on this:
>>
>> 1. Is there any chance that we can resolve, at least partly, the
>> current limitations of query parsing/planning mentioned above? Are
>> there any fundamental blockers among them?
>> 2. Are there any plans or ongoing work related to this?
>> 3. Are there any plans to upgrade the vendored Calcite version to a more
>> recent one? Would that reduce the number of current limitations or not?
>> 4. Do you think it would be valuable for Beam SQL to run the TPC-DS benchmark
>> on a regular basis (as we do for Nexmark, for example) even if not all
>> queries can pass with Beam SQL?
>>
>
> This is definitely valuable for BeamSQL if we have enough resources to run
> such queries regularly.
>
>>
>> I’d appreciate any additional information/docs/details/opinions on this
>> topic.
>>
>> —
>> Alexey
>>
>> [1] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
>> [2] http://www.tpc.org/tpcds/
>> [3]
>> https://docs.google.com/spreadsheets/d/1Gya9Xoa6uWwORHSrRqpkfSII4ajYvDpUTt0cNJCRHjE/edit?usp=sharing
>> [4]
>> https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds/src/main/resources/queries
>>
>
