Re: [ANNOUNCE][Testing] TPC-DS benchmark suite in Beam

2022-09-16 Thread Kenneth Knowles
Awesome. Thank you (all)! We've had so many conversations about it and it
is great to have it running continuously.

Kenn



Re: [ANNOUNCE][Testing] TPC-DS benchmark suite in Beam

2022-09-16 Thread Sachin Agarwal via dev
This is wonderful - thank you so much to you and the whole Talend team for
making Beam better!



[ANNOUNCE][Testing] TPC-DS benchmark suite in Beam

2022-09-16 Thread Alexey Romanenko
Hi everybody,

As some of you may know, at Talend we’ve been working for a while on adding a
TPC-DS benchmark suite to Beam. We believe that having TPC-DS as part of the
Beam testing workflow and release routine will help the community quickly
detect performance regressions or improvements, identify missing or
incorrect Beam SQL features, and execute Beam SQL on different runtime
environments with different runners.

What is TPC-DS? From the TPC-DS specification document [1]:

“TPC-DS is a decision support benchmark that models several generally 
applicable aspects of a decision support system, including queries and data 
maintenance. The benchmark provides a representative evaluation of performance 
as a general purpose decision support system.” 

The TPC-DS benchmark suite for Beam is implemented as a separate testing tool
for the Java SDK (like the well-known Nexmark benchmark suite) [2]. For now it
supports a limited number of TPC-DS SQL queries (mostly because of limited SQL
syntax support in Beam) and CSV and Parquet as input data formats, and it runs
on Jenkins with the three most popular Beam runners (Spark [3], Flink [4],
Dataflow [5]). The job metrics are stored in InfluxDB and can be accessed
through Grafana dashboards [6][7][8].

More details can be found in the Beam documentation [9].

There are certainly still plenty of things to do, such as adding new runners
and support for other SDKs, data formats, etc., so your contributions are very
welcome in any form. Still, we already have a first working, automated version
that the community can use.

Also, I’d like to thank everybody who worked on this improvement!

—
Alexey


[1] https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp
[2] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
[3] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Spark/
[4] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Flink/
[5] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Dataflow/
[6] http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
[7] http://metrics.beam.apache.org/d/8INnSY9Mv/tpc-ds-flink-sql?orgId=1
[8] http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
[9] https://beam.apache.org/documentation/sdks/java/testing/tpcds/