The Jenkins jobs for the Flink load tests:
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_LoadTests_ParDo_Flink_Python.groovy
The documentation for the test describes how to run it on each runner:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/load_tests/pardo_test.py#L17

I assume that standing up the Flink cluster should be done separately
(a rough example invocation is sketched below, after the quoted messages).
LMK if that helps, Robert.

-P.

On Fri, Dec 20, 2019 at 9:59 AM Robert Bradshaw <[email protected]> wrote:
> Yes, it is possible that this had an influence--Reads are now all
> implemented as SDFs and Creates involve a reshuffle to better
> redistribute data. This much of a change is quite surprising. Where is
> the pipeline for, say, "Python | ParDo | 2GB, 100 byte records, 10
> iterations | Batch" and how does one run it?
>
> On Fri, Dec 20, 2019 at 6:50 AM Kamil Wasilewski
> <[email protected]> wrote:
> >
> > Hi all,
> >
> > We have a couple of Python load tests running on Flink in which we are
> > testing the performance of ParDo, GroupByKey, CoGroupByKey and Combine
> > operations.
> >
> > Recently, I've discovered that the runtime of all those tests rose
> > significantly. It happened between the 6th and 7th of December (the
> > tests are running daily). Here are the dashboards where you can see
> > the results:
> >
> > https://apache-beam-testing.appspot.com/explore?dashboard=5649695233802240
> > https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792
> > https://apache-beam-testing.appspot.com/explore?dashboard=5698549949923328
> > https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536
> >
> > I've seen that in that period we submitted some changes to the core,
> > including the Read transform. Do you think this might have influenced
> > the results?
> >
> > Thanks,
> > Kamil
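For reference, an invocation along these lines should exercise the
"2GB, 100 byte records, 10 iterations" case against a portable Flink job
server. This is only a sketch: the job endpoint, environment type, and
input_options values are placeholders, and the docstring linked above is the
authoritative source for the exact flags and how the Jenkins jobs invoke them.

    # Sketch only: assumes a Flink cluster and a Beam job server are already
    # running (here, the job server is assumed to listen on localhost:8099).
    # 20,000,000 records of 10-byte keys + 90-byte values ~= 2GB of 100-byte records.
    python -m apache_beam.testing.load_tests.pardo_test \
        --test-pipeline-options="
            --runner=PortableRunner
            --job_endpoint=localhost:8099
            --environment_type=DOCKER
            --publish_to_big_query=false
            --input_options='{\"num_records\": 20000000, \"key_size\": 10, \"value_size\": 90}'
            --iterations=10
        "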
