Re: [DISCUSSION] TPC-DS benchmark via Beam SQL, issues

Rui Wang Wed, 28 Apr 2021 12:16:04 -0700

>Could you point out why a "non equi-join" can’t be supported? Or can it
be, and this is just a question of implementation?


It is a question of implementation. Assuming joins are implemented with
CoGBK (CoGroupByKey), a non-equi-join probably means you have to generate
the key space first and then use CoGBK (which is an equi-join) to do the
join.
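
To make "generate the key space" concrete, here is a minimal sketch in plain
Python (not Beam code) of one possible reading: a band join on
abs(l - r) <= width, done with nothing but an equi-join on bucket ids. The
names co_group_by_key and band_join are hypothetical, and this is not how
Beam SQL implements anything today.

```python
from collections import defaultdict


def co_group_by_key(left, right):
    """Group two lists of (key, value) pairs by key, like an equi-join on key."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    return groups


def band_join(left, right, width):
    """Return all (l, r) with abs(l - r) <= width; width is a positive integer."""
    # Key space generation: every left value goes to exactly one bucket.
    keyed_left = [(l // width, l) for l in left]
    # Every right value is copied to its own bucket and both neighbours,
    # so any matching left value is guaranteed to share a bucket with it.
    keyed_right = [(r // width + d, r) for r in right for d in (-1, 0, 1)]

    joined = []
    for lefts, rights in co_group_by_key(keyed_left, keyed_right).values():
        for l in lefts:
            for r in rights:
                # The equi-join only narrows candidates; the real
                # non-equi predicate is still checked per pair.
                if abs(l - r) <= width:
                    joined.append((l, r))
    return sorted(joined)
```

The cost is the replication of one side (three copies here); a predicate
with no natural bucketing would need the key space to cover a full cross
product.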

>I’m curious what the current implementation of "ORDER BY LIMIT" is, and
whether it can be applied, at least for a bounded collection in the global
window, in the same way to "ORDER BY" without a limit?

IIRC, the implementation is based on the Top transform. I think the real
question is when supporting ORDER BY alone, e.g. for a bounded collection
in the global window, is useful.
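
As a rough illustration of why the LIMIT matters, here is a minimal sketch
in plain Python (not the Beam Top transform itself) of a top-N combine:
each partition keeps only its N smallest elements, and the partial results
are merged. The function name top_n and the list-of-lists "partitions"
input are assumptions made for the example.

```python
import heapq


def top_n(partitions, n, key=None):
    """Emulate ORDER BY key ASC LIMIT n over data split into partitions."""
    # Per-partition step: at most n elements survive from each partition.
    partial = [heapq.nsmallest(n, part, key=key) for part in partitions]
    # Merge step: combine the small partial results into the final top n.
    merged = (item for part in partial for item in part)
    return heapq.nsmallest(n, merged, key=key)
```

With a limit, no step ever holds more than n elements per partition. A
plain ORDER BY has no such bound, so the whole collection would have to be
sorted and emitted in order.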

>I have one related question. Would we be able to apply SQL-specific
optimizations that apply only to batch-only pipelines? I ask because I can
imagine that covering the full Beam model constrains the optimization
possibilities, no?

I am not sure whether we can tell that a pipeline is batch-only during the
SQL optimization process. But as I recall, we can see whether inputs are
bounded or unbounded, so we can probably restrict some optimizations to
bounded PCollections.
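
A minimal sketch of that gating idea, in plain Python with hypothetical
names (Boundedness, Input, applicable_rules); it does not reflect the
actual Beam SQL planner code, only the shape of the check:

```python
from dataclasses import dataclass
from enum import Enum


class Boundedness(Enum):
    BOUNDED = "bounded"
    UNBOUNDED = "unbounded"


@dataclass
class Input:
    name: str
    boundedness: Boundedness


def applicable_rules(inputs, general_rules, bounded_only_rules):
    """Pick optimizer rules; bounded-only rules need every input bounded."""
    if all(i.boundedness is Boundedness.BOUNDED for i in inputs):
        return list(general_rules) + list(bounded_only_rules)
    # Any unbounded input falls back to rules valid for the full model.
    return list(general_rules)
```

For example, a query reading one bounded and one unbounded input would get
only the general rules.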

On Wed, Apr 28, 2021 at 9:33 AM Alexey Romanenko <aromanenko....@gmail.com>
wrote:

>
>
> Cause 4 looks like the function was not found in the catalog.
>
>
> I guess it should be
> *"SUBSTRING(<CHARACTER> FROM <NUMERIC> FOR <NUMERIC>)"* instead of
> *"substr(<CHARACTER>, <NUMERIC>, <NUMERIC>)"*?
>
>
> Well, s/substr/substring/ seems to fix this problem.
>
> —
> Alexey
>
>
>> I’m not very familiar with the current status of ongoing work for Beam
>> SQL, so I’m sorry in advance if my questions sound naive.
>>
>> Please, guide me on this:
>>
>> 1. Is there any chance that we can resolve, at least partly, the current
>> limitations of query parsing/planning mentioned above? Are there any
>> fundamental blockers among them?
>> 2. Are there any plans or ongoing work related to this?
>> 3. Are there any plans to upgrade the vendored Calcite version to a more
>> recent one? Would that reduce the number of current limitations?
>> 4. Do you think it could be valuable for Beam SQL to run the TPC-DS
>> benchmark on a regular basis (as we do for Nexmark, for example) even if
>> not all queries can pass with Beam SQL?
>>
>
> This is definitely valuable for BeamSQL if we have enough resources to run
> such queries regularly.
>
>>
>> I’d appreciate any additional information/docs/details/opinions on this
>> topic.
>>
>> —
>> Alexey
>>
>> [1] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
>> [2] http://www.tpc.org/tpcds/
>> [3]
>> https://docs.google.com/spreadsheets/d/1Gya9Xoa6uWwORHSrRqpkfSII4ajYvDpUTt0cNJCRHjE/edit?usp=sharing
>> [4]
>> https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds/src/main/resources/queries
>>
>
>
>
