This is wonderful - thank you so much to you and the whole Talend team for making Beam better!

On Fri, Sep 16, 2022 at 9:11 AM Alexey Romanenko <aromanenko....@gmail.com> wrote:

> Hi everybody,
>
> As some of you may know, at Talend we’ve been working for a while to add
> the TPC-DS benchmark suite to Beam. We believe that having TPC-DS as part
> of the Beam testing workflow and release routine will help the community
> quickly detect performance regressions or improvements, identify missing
> or incorrect Beam SQL features, and execute Beam SQL on different runtime
> environments with different runners.
>
> What is TPC-DS? From the TPC-DS specification document [1]:
>
> *“TPC-DS is a decision support benchmark that models several generally
> applicable aspects of a decision support system, including queries and data
> maintenance. The benchmark provides a representative evaluation of
> performance as a general purpose decision support system.”*
>
> The TPC-DS benchmark suite for Beam is implemented as a separate testing
> tool for the Java SDK (like the well-known Nexmark benchmark suite) [2]. It
> supports a limited number of TPC-DS SQL queries for now (mostly because of
> limited SQL syntax support in Beam), CSV and Parquet as input data formats,
> and it runs on Jenkins with the three most popular Beam runners (Spark [3],
> Flink [4], Dataflow [5]). The job metrics are stored in InfluxDB and can be
> accessed through Grafana dashboards [6][7][8].
>
> More details can be found in the Beam documentation [9].
>
> There are still plenty of things to do, like adding new runners and
> support for other SDKs, data formats, etc., so your contributions are very
> welcome in any form. Still, at least for now, we already have a first
> working and automated version that can be used by the community.
>
> Also, I’d like to thank everybody who worked on this improvement!
>
> --
> Alexey
>
> [1] https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp
> [2] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
> [3] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Spark/
> [4] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Flink/
> [5] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Dataflow/
> [6] http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
> [7] http://metrics.beam.apache.org/d/8INnSY9Mv/tpc-ds-flink-sql?orgId=1
> [8] http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
> [9] https://beam.apache.org/documentation/sdks/java/testing/tpcds/