Hi Beamers,

An update on this. Together with Kasia and Michał, and in close cooperation with Pablo, we have created and scheduled a cron job that runs 7 tests for GroupByKey batch scenarios daily. The tests are described in the proposal [1] and will be documented later. The dashboards for the tests:

- showing run times [2]
- showing total load size (bytes) [3]
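To make the scenario concrete, here is a rough plain-Python sketch of what a GroupByKey load shape looks like: synthetic KV<byte[], byte[]> input, a grouping step, and the two quantities the dashboards track (runtime and total load size in bytes). This is just an illustration, not the actual test code; the function names and parameters are made up for the sketch.

```python
import time

def synthetic_records(num_keys, values_per_key, value_size):
    """Generate KV<bytes, bytes> pairs, the element shape the load tests use.
    Illustrative stand-in for a synthetic source, not Beam's actual API."""
    for k in range(num_keys):
        key = k.to_bytes(4, "big")
        for _ in range(values_per_key):
            yield key, b"\x00" * value_size

def group_by_key(records):
    """A local stand-in for the GroupByKey transform under test."""
    groups = {}
    for key, value in records:
        groups.setdefault(key, []).append(value)
    return groups

start = time.time()
groups = group_by_key(synthetic_records(num_keys=100, values_per_key=10, value_size=16))
runtime = time.time() - start  # the "run time" metric shown on the dashboard

# The "total load size (bytes)" metric: all key bytes plus all value bytes.
total_bytes = sum(len(k) + sum(len(v) for v in vs) for k, vs in groups.items())
print(len(groups), total_bytes)
```

In the real suite the input sizes, key distribution, and delays are driven by pipeline options rather than hard-coded arguments.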
All the metrics are collected using Beam's Metrics API. Things we have on our horizon:

- the same set of tests for Java, but in streaming mode
- similar jobs for the Python SDK
- running similar suites on the Flink runner

We have also created a set of Dataproc bash scripts that can be used to set up a Flink cluster that supports portability [4]. It is ready to use, and I've already successfully run the word count example on it using the Python SDK. I'm hoping and aiming to run the load tests on it soon. :)

Last but not least: we also reused some code to collect metrics via the Metrics API in TextIOIT, and we plan to make a similar change for the other IOITs. Dashboards for TextIOIT: [5].

Thanks,
Łukasz

[1] https://s.apache.org/load-test-basic-operations
[2] https://apache-beam-testing.appspot.com/explore?dashboard=5643144871804928
[3] https://apache-beam-testing.appspot.com/explore?dashboard=5701325169885184
[4] https://github.com/apache/beam/blob/b1ed061fd0c1ed1da562089c939d55715907769d/.test-infra/dataproc/create_flink_cluster.sh
[5] https://apache-beam-testing.appspot.com/explore?dashboard=5629522644828160

On Wed, Sep 12, 2018 at 2:23 PM Etienne Chauchot <[email protected]> wrote:

> Let me elaborate a bit on my last sentence.
>
> On Tuesday, September 11, 2018 at 11:29 +0200, Etienne Chauchot wrote:
>
> Hi Lukasz,
>
> Well, having low-level byte[]-based pure performance tests makes sense. And having a high-level realistic model (the Nexmark auction system) also makes sense, to avoid testing unrealistic pipelines as you describe.
>
> Having common code between the two seems difficult, as both the architecture and the model are different.
>
> I'm more concerned about having two CI mechanisms to detect functional/performance regressions.
>
> Even if parts of Nexmark and the performance tests are the same, they could target different objectives: raw performance tests (the new framework) and user-oriented tests (Nexmark). So they might be complementary.
>
> We just have to choose how to run them.
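For readers less familiar with the Metrics API mentioned above, the idea of the counters and distributions the tests report can be sketched in plain Python. These are minimal stand-ins, not Beam's actual `Metrics` classes; the class and attribute names are made up for the illustration.

```python
class Counter:
    """Minimal stand-in for a Metrics API counter (not Beam's actual class)."""
    def __init__(self, namespace, name):
        self.namespace, self.name = namespace, name
        self.value = 0
    def inc(self, n=1):
        self.value += n

class Distribution:
    """Minimal stand-in for a Metrics API distribution: tracks sum/count/min/max."""
    def __init__(self, namespace, name):
        self.namespace, self.name = namespace, name
        self.sum = self.count = 0
        self.min = self.max = None
    def update(self, v):
        self.sum += v
        self.count += 1
        self.min = v if self.min is None else min(self.min, v)
        self.max = v if self.max is None else max(self.max, v)

# A step would bump these per element; a runner then aggregates them.
elements = Counter("loadtest", "elements")
sizes = Distribution("loadtest", "value_bytes")
for value in (b"a" * 10, b"b" * 30):
    elements.inc()
    sizes.update(len(value))
print(elements.value, sizes.sum, sizes.min, sizes.max)
```

In Beam the runner aggregates such metrics across workers, which is what makes them usable from IOITs and load tests alike.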
> I think we need to have only one automatic regression detection tool. IMHO, the most relevant one for functional/performance regressions is Nexmark, because it represents what a real user could do (it simulates an auction system). So let's keep it as post-commits. Post-commits make it possible to pinpoint the particular commit that introduced a regression.
>
> We could schedule the new performance tests.
>
> Best,
> Etienne
>
> Best,
> Etienne
>
> On Monday, September 10, 2018 at 18:33 +0200, Łukasz Gajowy wrote:
>
> In my opinion, and as far as I understand Nexmark, there are benefits to having both types of tests. The load tests we propose can be very straightforward and clearly show what is being tested, thanks to the fact that there is no fixed model, only very "low-level" KV<byte[], byte[]> collections. They are more flexible in the shapes of pipelines they can express (e.g. fanout_64), without having to think about specific use cases.
>
> Having both types would allow developers to decide whether they want to create a new Nexmark query for their specific case or develop a new load test (whichever is easier and fits their case better). However, there is a risk: with KV<byte[], byte[]> a developer can overemphasize cases that can never happen in practice, so we need to be careful about the exact configurations we run.
>
> Still, I can imagine that there will surely be code that should be common to both types of tests, and we will seek ways to not duplicate it.
>
> WDYT?
>
> Regards,
> Łukasz
>
> On Mon, Sep 10, 2018 at 4:36 PM Etienne Chauchot <[email protected]> wrote:
>
> Hi,
> It seems that there is notable overlap with what Nexmark already does: Nexmark measures performance and regressions by exercising all of the Beam model in both batch and streaming modes with several runners. It also computes on synthetic data. Also, Nexmark is already included as post-commits in the CI and in dashboards.
>
> Shall we merge the two?
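The "fanout_64" shape mentioned above can be sketched in plain Python: one synthetic input feeding many identical downstream branches. The function name and per-branch work are made up for the illustration; the real tests express this as parallel Beam transforms over one source.

```python
def fan_out(records, fanout):
    """Feed the same input into `fanout` independent branches, mimicking the
    fanout_64 pipeline shape (one synthetic source, many identical consumers).
    Each branch here just sums its value bytes as a stand-in for real work."""
    records = list(records)  # materialize so every branch sees the same input
    return [sum(len(value) for _, value in records) for _ in range(fanout)]

# Two tiny KV<bytes, bytes> records, fanned out to 64 branches.
records = [(b"k", b"x" * 8), (b"k", b"y" * 8)]
branch_totals = fan_out(records, fanout=64)
print(len(branch_totals), branch_totals[0])
```

The point of the shape is stress on the runner (64x the downstream work for one source), which a fixed-model benchmark like Nexmark would not express directly.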
> Best,
> Etienne
>
> On Monday, September 10, 2018 at 12:56 +0200, Łukasz Gajowy wrote:
>
> Hello everyone,
>
> thank you for all your comments on the proposal. To sum up:
>
> A set of performance tests exercising core Beam transforms (ParDo, GroupByKey, CoGroupByKey, Combine) will be implemented for the Java and Python SDKs. Those tests will allow us to:
>
> - measure the performance of the transforms on various runners
> - exercise the transforms under stressful conditions and big loads using the Synthetic Source and Synthetic Step API (delays, keeping the CPU busy or asleep, processing large keys and values, performing fanout or reiteration of inputs)
> - run in both batch and streaming contexts
> - gather various metrics
> - notice regressions by comparing data from consecutive Jenkins runs
>
> Metrics (runtime, consumed bytes, memory usage, split/bundle count) can be gathered during test invocations. We will start with runtime and leverage the Metrics API to collect the other metrics in later phases of development. The tests will be fully configurable through pipeline options, and it will be possible to run any custom scenario manually. However, a representative set of testing scenarios will be run periodically using Jenkins.
>
> Regards,
> Łukasz
>
> On Wed, Sep 5, 2018 at 8:31 PM Rafael Fernandez <[email protected]> wrote:
>
> neat! left a comment or two
>
> On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <[email protected]> wrote:
>
> Hi all!
>
> I'm bumping this (in case you missed it). Any feedback and questions are welcome!
>
> Best regards,
> Łukasz
>
> On Mon, Aug 13, 2018 at 1:51 PM Jean-Baptiste Onofré <[email protected]> wrote:
>
> Hi Lukasz,
>
> Thanks for the update; the abstract looks promising.
>
> Let me take a look at the doc.
> Regards,
> JB
>
> On 13/08/2018 13:24, Łukasz Gajowy wrote:
> > Hi all,
> >
> > since the Synthetic Sources API has been introduced in the Java and Python SDKs, it can be used to test some basic Apache Beam operations (i.e. GroupByKey, CoGroupByKey, Combine, ParDo, and ParDo with SideInput) in terms of performance. This, in brief, is why we'd like to share the proposal below:
> >
> > https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing
> >
> > Let us know what you think in the document's comments. Thank you in advance for all the feedback!
> >
> > Łukasz
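The "stressful conditions" knobs listed earlier in the thread (delays, keeping the CPU busy or asleep, large values) can be pictured as a per-element step. This is a hypothetical plain-Python sketch of the idea behind a Synthetic Step, not Beam's actual API; all parameter names are made up.

```python
import time

def synthetic_step(element, delay_s=0.0, spin_s=0.0, extra_output_bytes=0):
    """Hypothetical per-element stressor mirroring Synthetic Step knobs:
    sleep (CPU asleep), busy-wait (CPU busy), and inflated output size."""
    if delay_s:
        time.sleep(delay_s)           # simulate I/O-style latency
    end = time.monotonic() + spin_s
    while time.monotonic() < end:     # keep the CPU busy for spin_s seconds
        pass
    key, value = element
    # Grow the value to stress shuffle/serialization downstream.
    return key, value + b"\x00" * extra_output_bytes

out = synthetic_step((b"k", b"v"), delay_s=0.0, spin_s=0.0, extra_output_bytes=1024)
print(out[0], len(out[1]))
```

In the actual suite such knobs are set via pipeline options, so the same pipeline shape can be re-run as many different load scenarios.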
