Re: [ANNOUNCE][Testing] TPC-DS benchmark suite in Beam

2022-09-16 Thread Kenneth Knowles
Awesome. Thank you (all)! We've had so many conversations about it and it
is great to have it running continuously.

Kenn

On Fri, Sep 16, 2022 at 9:29 AM Sachin Agarwal via dev wrote:

> This is wonderful - thank you so much to you and the whole Talend team for
> making Beam better!
>
> On Fri, Sep 16, 2022 at 9:11 AM Alexey Romanenko wrote:
>
>> Hi everybody,
>>
>> As some of you may know, at Talend, we’ve been working for a while to add
>> the TPC-DS benchmark suite to Beam. We believe that having TPC-DS as a part
>> of the Beam testing workflow and release routine will help the community
>> quickly detect performance regressions or improvements, identify
>> missing or incorrect Beam SQL features, and execute Beam SQL on different
>> runtime environments with different runners.
>>
>> What is TPC-DS? From the TPC-DS specification document [1]:
>>
>> *“TPC-DS is a decision support benchmark that models several generally
>> applicable aspects of a decision support system, including queries and data
>> maintenance. The benchmark provides a representative evaluation of
>> performance as a general purpose decision support system.”*
>>
>> The TPC-DS benchmark suite for Beam is implemented as a separate testing tool
>> for the Java SDK (like the well-known Nexmark benchmark suite) [2]. It supports a
>> limited number of TPC-DS SQL queries for now (mostly because of limited SQL
>> syntax support in Beam), accepts CSV and Parquet as input data formats, and
>> runs on Jenkins with the three most popular Beam runners (Spark [3], Flink [4],
>> Dataflow [5]). The job metrics are stored in InfluxDB and can be accessed
>> through Grafana dashboards [6][7][8].
>>
>> More details can be found in the Beam documentation [9].
>>
>> There are certainly still plenty of things to do, like adding new runners and
>> supporting other SDKs, data formats, etc., so your contributions are very
>> welcome in any form. Still, at least for now, we already have a first
>> working and automated version that the community can use.
>>
>> Also, I’d like to thank everybody who worked on this improvement!
>>
>> —
>> Alexey
>>
>>
>> [1]
>> https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp
>> [2] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
>> [3] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Spark/
>> [4] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Flink/
>> [5] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Dataflow/
>> [6]
>> http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
>> [7] http://metrics.beam.apache.org/d/8INnSY9Mv/tpc-ds-flink-sql?orgId=1
>> [8]
>> http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
>> [9] https://beam.apache.org/documentation/sdks/java/testing/tpcds/


Re: [ANNOUNCE][Testing] TPC-DS benchmark suite in Beam

2022-09-16 Thread Sachin Agarwal via dev
This is wonderful - thank you so much to you and the whole Talend team for
making Beam better!

On Fri, Sep 16, 2022 at 9:11 AM Alexey Romanenko wrote:

> Hi everybody,
>
> As some of you may know, at Talend, we’ve been working for a while to add
> the TPC-DS benchmark suite to Beam. We believe that having TPC-DS as a part
> of the Beam testing workflow and release routine will help the community
> quickly detect performance regressions or improvements, identify
> missing or incorrect Beam SQL features, and execute Beam SQL on different
> runtime environments with different runners.
>
> What is TPC-DS? From the TPC-DS specification document [1]:
>
> *“TPC-DS is a decision support benchmark that models several generally
> applicable aspects of a decision support system, including queries and data
> maintenance. The benchmark provides a representative evaluation of
> performance as a general purpose decision support system.”*
>
> The TPC-DS benchmark suite for Beam is implemented as a separate testing tool
> for the Java SDK (like the well-known Nexmark benchmark suite) [2]. It supports a
> limited number of TPC-DS SQL queries for now (mostly because of limited SQL
> syntax support in Beam), accepts CSV and Parquet as input data formats, and
> runs on Jenkins with the three most popular Beam runners (Spark [3], Flink [4],
> Dataflow [5]). The job metrics are stored in InfluxDB and can be accessed
> through Grafana dashboards [6][7][8].
>
> More details can be found in the Beam documentation [9].
>
> There are certainly still plenty of things to do, like adding new runners and
> supporting other SDKs, data formats, etc., so your contributions are very
> welcome in any form. Still, at least for now, we already have a first
> working and automated version that the community can use.
>
> Also, I’d like to thank everybody who worked on this improvement!
>
> —
> Alexey
>
>
> [1]
> https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp
> [2] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
> [3] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Spark/
> [4] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Flink/
> [5] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Dataflow/
> [6]
> http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
> [7] http://metrics.beam.apache.org/d/8INnSY9Mv/tpc-ds-flink-sql?orgId=1
> [8]
> http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
> [9] https://beam.apache.org/documentation/sdks/java/testing/tpcds/