On Tue, Apr 27, 2021 at 9:10 AM Alexey Romanenko <[email protected]>
wrote:

> Hello all,
>
> I am trying to run a Beam implementation [1] of the TPC-DS benchmark [2],
> and I observe that most of the queries don’t pass, for various reasons
> (see below). I run it with the Spark Runner but the issues, I believe, are
> mostly related to either query parsing or query planning, so we can expect
> the same with other runners too. For now, only ~22% (23/103) of TPC-DS
> queries passed successfully via Beam SQL / CalciteSQL.
>
> The most common issues are the following ones:
>
>    1. *“Caused by: java.lang.UnsupportedOperationException: Non equi-join
>    is not supported”*
>    2. *“Caused by: java.lang.UnsupportedOperationException: ORDER BY
>    without a LIMIT is not supported!”*
>    3. *“Caused by:
>    org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
>    There are not enough rules to produce a node with desired
>    properties: convention=BEAM_LOGICAL. All the inputs have relevant nodes,
>    however the cost is still infinite.”*
>    4. *“Caused by:
>    org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlValidatorException:
>    No match found for function signature substr(<CHARACTER>, <NUMERIC>,
>    <NUMERIC>)”*
>
> The full list of query statuses is available here [3]. The generated
> TPC-DS SQL queries can be found there as well [4].
>

Not every query can be supported by BeamSQL easily. For example, cause 1
requires support for non equi-joins (BEAM-2194). For cause 2, we had
discussions that led to adding the limitation that BeamSQL only supports
ORDER BY with a LIMIT (LIMIT is required). Cause 3 needs a case-by-case
investigation; some cases might be fixable. Cause 4 looks like the function
was not found in the catalog.
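
To see how the failing queries split across the four causes above, the error messages could be bucketed with a small script. This is only a rough sketch: the cause labels and the function names `classify` and `tally` are made up for illustration, the patterns are substrings copied from the exception messages quoted in this thread, and it assumes the failure messages have already been collected as plain strings.

```python
import re
from collections import Counter

# Hypothetical labels; each pattern is a substring of one of the four
# exception messages quoted in this thread.
CAUSE_PATTERNS = [
    ("non_equi_join", re.compile(r"Non equi-join is not supported")),
    ("order_by_without_limit", re.compile(r"ORDER BY without a LIMIT is not supported")),
    ("cannot_plan", re.compile(r"CannotPlanException")),
    ("missing_function", re.compile(r"No match found for function signature")),
]


def classify(message: str) -> str:
    """Return the label of the first matching cause, or "other"."""
    # Collapse line wraps and repeated spaces so wrapped messages still match.
    normalized = " ".join(message.split())
    for label, pattern in CAUSE_PATTERNS:
        if pattern.search(normalized):
            return label
    return "other"


def tally(messages):
    """Count how many messages fall under each cause label."""
    return Counter(classify(m) for m in messages)
```

Anything that lands in "other" or "cannot_plan" would still need to be looked at by hand, since cause 3 covers several unrelated planner failures.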

>
> I’m not very familiar with the current status of ongoing work for Beam SQL,
> so I’m sorry in advance if my questions sound naive.
>
> Please, guide me on this:
>
> 1. Is there any chance that we can resolve, at least partly, the current
> limitations of the query parsing/planning mentioned above? Are there any
> fundamental blockers among them?
> 2. Are there any plans or ongoing work related to this?
> 3. Are there any plans to upgrade the vendored Calcite version to a more
> recent one? Would it reduce the number of current limitations or not?
> 4. Do you think it could be valuable for Beam SQL to run TPC-DS benchmark
> on a regular basis (as we do for Nexmark, for example) even if not all
> queries can pass with Beam SQL?
>

This is definitely valuable for BeamSQL if we have enough resources to run
such queries regularly.

>
> I’d appreciate any additional information/docs/details/opinions on this
> topic.
>
> —
> Alexey
>
> [1] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
> [2] http://www.tpc.org/tpcds/
> [3]
> https://docs.google.com/spreadsheets/d/1Gya9Xoa6uWwORHSrRqpkfSII4ajYvDpUTt0cNJCRHjE/edit?usp=sharing
> [4]
> https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds/src/main/resources/queries
>
