Hi Beamers,

An update on this. Together with Kasia and Michał, and in close cooperation with Pablo, we have created and scheduled a cron job that runs 7 tests for GroupByKey batch scenarios daily. The tests are described in the proposal [1] and will be documented later. The dashboards for the tests:

- showing run times [2]
- showing total load size (bytes) [3]
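To make the scenario concrete, here is a rough plain-Python sketch of what a GroupByKey load shape looks like: synthetic KV<byte[], byte[]> input, a grouping step, and the two quantities the dashboards track (runtime and total load size in bytes). This is just an illustration, not the actual test code; the function names and parameters are made up for the sketch.

```python
import time

def synthetic_records(num_keys, values_per_key, value_size):
    """Generate KV<bytes, bytes> pairs, the element shape the load tests use.
    Illustrative stand-in for a synthetic source, not Beam's actual API."""
    for k in range(num_keys):
        key = k.to_bytes(4, "big")
        for _ in range(values_per_key):
            yield key, b"\x00" * value_size

def group_by_key(records):
    """A local stand-in for the GroupByKey transform under test."""
    groups = {}
    for key, value in records:
        groups.setdefault(key, []).append(value)
    return groups

start = time.time()
groups = group_by_key(synthetic_records(num_keys=100, values_per_key=10, value_size=16))
runtime = time.time() - start  # the "run time" metric shown on the dashboard

# The "total load size (bytes)" metric: all key bytes plus all value bytes.
total_bytes = sum(len(k) + sum(len(v) for v in vs) for k, vs in groups.items())
print(len(groups), total_bytes)
```

In the real suite the input sizes, key distribution, and delays are driven by pipeline options rather than hard-coded arguments.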
All the metrics are collected using Beam's Metrics API. Things we have on our horizon:

- the same set of tests for Java, but in streaming mode
- similar jobs for the Python SDK
- running similar suites on the Flink runner

We have also created a set of Dataproc bash scripts that can be used to set up a Flink cluster that supports portability [4]. It is ready to use, and I've already successfully run the word count example on it using the Python SDK. I'm hoping and aiming to run the load tests on it soon. :)

Last but not least: we also reused some code to collect metrics via the Metrics API in TextIOIT, and we plan to make a similar change for the other IOITs. Dashboards for TextIOIT: [5].

Thanks,
Łukasz

[1] https://s.apache.org/load-test-basic-operations
[2] https://apache-beam-testing.appspot.com/explore?dashboard=5643144871804928
[3] https://apache-beam-testing.appspot.com/explore?dashboard=5701325169885184
[4] https://github.com/apache/beam/blob/b1ed061fd0c1ed1da562089c939d55715907769d/.test-infra/dataproc/create_flink_cluster.sh
[5] https://apache-beam-testing.appspot.com/explore?dashboard=5629522644828160

On Wed, Sep 12, 2018 at 2:23 PM Etienne Chauchot <[email protected]> wrote:

> Let me elaborate a bit on my last sentence.
>
> On Tuesday, September 11, 2018 at 11:29 +0200, Etienne Chauchot wrote:
>
> Hi Lukasz,
>
> Well, having low-level byte[]-based pure performance tests makes sense. And having a high-level realistic model (the Nexmark auction system) also makes sense, to avoid testing unrealistic pipelines as you describe.
>
> Having common code between the two seems difficult, as both the architecture and the model are different.
>
> I'm more concerned about having two CI mechanisms to detect functional/performance regressions.
>
> Even if parts of Nexmark and the performance tests are the same, they could target different objectives: raw performance tests (the new framework) and user-oriented tests (Nexmark). So they might be complementary.
>
> We just have to choose how to run them.
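For readers less familiar with the Metrics API mentioned above, the idea of the counters and distributions the tests report can be sketched in plain Python. These are minimal stand-ins, not Beam's actual `Metrics` classes; the class and attribute names are made up for the illustration.

```python
class Counter:
    """Minimal stand-in for a Metrics API counter (not Beam's actual class)."""
    def __init__(self, namespace, name):
        self.namespace, self.name = namespace, name
        self.value = 0
    def inc(self, n=1):
        self.value += n

class Distribution:
    """Minimal stand-in for a Metrics API distribution: tracks sum/count/min/max."""
    def __init__(self, namespace, name):
        self.namespace, self.name = namespace, name
        self.sum = self.count = 0
        self.min = self.max = None
    def update(self, v):
        self.sum += v
        self.count += 1
        self.min = v if self.min is None else min(self.min, v)
        self.max = v if self.max is None else max(self.max, v)

# A step would bump these per element; a runner then aggregates them.
elements = Counter("loadtest", "elements")
sizes = Distribution("loadtest", "value_bytes")
for value in (b"a" * 10, b"b" * 30):
    elements.inc()
    sizes.update(len(value))
print(elements.value, sizes.sum, sizes.min, sizes.max)
```

In Beam the runner aggregates such metrics across workers, which is what makes them usable from IOITs and load tests alike.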
> I think we need to have only one automatic regression detection tool. IMHO, the most relevant one for functional/performance regressions is Nexmark, because it represents what a real user could do (it simulates an auction system). So let's keep it as post-commits. Post-commits make it possible to pinpoint the particular commit that introduced a regression.
>
> We could schedule the new performance tests.
>
> Best,
> Etienne
>
> Best,
> Etienne
>
> On Monday, September 10, 2018 at 18:33 +0200, Łukasz Gajowy wrote:
>
> In my opinion, and as far as I understand Nexmark, there are benefits to having both types of tests. The load tests we propose can be very straightforward and clearly show what is being tested, thanks to the fact that there is no fixed model, only very "low-level" KV<byte[], byte[]> collections. They are more flexible in the shapes of pipelines they can express (e.g. fanout_64), without having to think about specific use cases.
>
> Having both types would allow developers to decide whether they want to create a new Nexmark query for their specific case or develop a new load test (whichever is easier and fits their case better). However, there is a risk: with KV<byte[], byte[]> a developer can overemphasize cases that can never happen in practice, so we need to be careful about the exact configurations we run.
>
> Still, I can imagine that there will surely be code that should be common to both types of tests, and we will seek ways to not duplicate it.
>
> WDYT?
>
> Regards,
> Łukasz
>
> On Mon, Sep 10, 2018 at 4:36 PM Etienne Chauchot <[email protected]> wrote:
>
> Hi,
> It seems that there is notable overlap with what Nexmark already does: Nexmark measures performance and regressions by exercising all of the Beam model in both batch and streaming modes with several runners. It also computes on synthetic data. Also, Nexmark is already included as post-commits in the CI and in dashboards.
>
> Shall we merge the two?
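The "fanout_64" shape mentioned above can be sketched in plain Python: one synthetic input feeding many identical downstream branches. The function name and per-branch work are made up for the illustration; the real tests express this as parallel Beam transforms over one source.

```python
def fan_out(records, fanout):
    """Feed the same input into `fanout` independent branches, mimicking the
    fanout_64 pipeline shape (one synthetic source, many identical consumers).
    Each branch here just sums its value bytes as a stand-in for real work."""
    records = list(records)  # materialize so every branch sees the same input
    return [sum(len(value) for _, value in records) for _ in range(fanout)]

# Two tiny KV<bytes, bytes> records, fanned out to 64 branches.
records = [(b"k", b"x" * 8), (b"k", b"y" * 8)]
branch_totals = fan_out(records, fanout=64)
print(len(branch_totals), branch_totals[0])
```

The point of the shape is stress on the runner (64x the downstream work for one source), which a fixed-model benchmark like Nexmark would not express directly.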
> Best,
> Etienne
>
> On Monday, September 10, 2018 at 12:56 +0200, Łukasz Gajowy wrote:
>
> Hello everyone,
>
> thank you for all your comments on the proposal. To sum up:
>
> A set of performance tests exercising core Beam transforms (ParDo, GroupByKey, CoGroupByKey, Combine) will be implemented for the Java and Python SDKs. Those tests will allow us to:
>
> - measure the performance of the transforms on various runners
> - exercise the transforms under stressful conditions and big loads using the Synthetic Source and Synthetic Step API (delays, keeping the CPU busy or asleep, processing large keys and values, performing fanout or reiteration of inputs)
> - run in both batch and streaming contexts
> - gather various metrics
> - notice regressions by comparing data from consecutive Jenkins runs
>
> Metrics (runtime, consumed bytes, memory usage, split/bundle count) can be gathered during test invocations. We will start with runtime and leverage the Metrics API to collect the other metrics in later phases of development. The tests will be fully configurable through pipeline options, and it will be possible to run any custom scenario manually. However, a representative set of testing scenarios will be run periodically using Jenkins.
>
> Regards,
> Łukasz
>
> On Wed, Sep 5, 2018 at 8:31 PM Rafael Fernandez <[email protected]> wrote:
>
> neat! left a comment or two
>
> On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <[email protected]> wrote:
>
> Hi all!
>
> I'm bumping this (in case you missed it). Any feedback and questions are welcome!
>
> Best regards,
> Łukasz
>
> On Mon, Aug 13, 2018 at 1:51 PM Jean-Baptiste Onofré <[email protected]> wrote:
>
> Hi Lukasz,
>
> Thanks for the update; the abstract looks promising.
>
> Let me take a look at the doc.
> Regards,
> JB
>
> On 13/08/2018 13:24, Łukasz Gajowy wrote:
> > Hi all,
> >
> > since the Synthetic Sources API has been introduced in the Java and Python SDKs, it can be used to test some basic Apache Beam operations (i.e. GroupByKey, CoGroupByKey, Combine, ParDo, and ParDo with SideInput) in terms of performance. This, in brief, is why we'd like to share the proposal below:
> >
> > https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing
> >
> > Let us know what you think in the document's comments. Thank you in advance for all the feedback!
> >
> > Łukasz
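The "stressful conditions" knobs listed earlier in the thread (delays, keeping the CPU busy or asleep, large values) can be pictured as a per-element step. This is a hypothetical plain-Python sketch of the idea behind a Synthetic Step, not Beam's actual API; all parameter names are made up.

```python
import time

def synthetic_step(element, delay_s=0.0, spin_s=0.0, extra_output_bytes=0):
    """Hypothetical per-element stressor mirroring Synthetic Step knobs:
    sleep (CPU asleep), busy-wait (CPU busy), and inflated output size."""
    if delay_s:
        time.sleep(delay_s)           # simulate I/O-style latency
    end = time.monotonic() + spin_s
    while time.monotonic() < end:     # keep the CPU busy for spin_s seconds
        pass
    key, value = element
    # Grow the value to stress shuffle/serialization downstream.
    return key, value + b"\x00" * extra_output_bytes

out = synthetic_step((b"k", b"v"), delay_s=0.0, spin_s=0.0, extra_output_bytes=1024)
print(out[0], len(out[1]))
```

In the actual suite such knobs are set via pipeline options, so the same pipeline shape can be re-run as many different load scenarios.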
