Could any Googlers help run NexMark on Dataflow in streaming mode and share
the numbers with the community?
--
Pei

On Fri, Aug 25, 2017 at 11:28 PM, Lukasz Cwik <[email protected]> wrote:

> Etienne, please file JIRAs for the improvements to the Nexmark suite that
> you think are worthwhile, such as ValidatesRunner tests. Some of them might
> make good 'starter' tasks as well.
>
> On Fri, Aug 25, 2017 at 1:43 AM, Etienne Chauchot <[email protected]> wrote:
>
> > Hi guys,
> >
> > There are also some points to discuss:
> >
> > - I think some of the tests in this suite should be generalized into
> > ValidatesRunner tests, as was done, for example, for custom window
> > merging
> > (https://github.com/apache/beam/blob/5181e619f17e1f69fabe8d5bdfc7a3a6a2142cde/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowTest.java#L591).
> >
> > - We have run almost no tests on Dataflow, so anyone who could run the
> > suite on Dataflow is very welcome to do so. All the needed information is
> > still in the README, but I'll move it to the website.
> >
> > - Other points?
> >
> > WDYT?
> >
> > Best,
> >
> > Etienne
> >
> >
> >
> > On 24/08/2017 at 18:35, Lukasz Cwik wrote:
> >
> >> Yeah, was looking forward to this.
> >>
> >> On Thu, Aug 24, 2017 at 9:20 AM, Tyler Akidau <[email protected]> wrote:
> >>
> >>> Awesome news, thank you! :-D
> >>>
> >>> On Thu, Aug 24, 2017 at 12:40 AM, Etienne Chauchot <[email protected]> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I wanted to let you know that the Nexmark PR has been merged into
> >>>> master. Feel free to use it (e.g. for performance testing, release
> >>>> testing, ...).
> >>>>
> >>>> Etienne
> >>>>
> >>>> On 12/05/2017 at 10:55, Etienne Chauchot wrote:
> >>>>
> >>>>> Hi guys,
> >>>>>
> >>>>> I wanted to let you know that I have just submitted a PR for NexMark.
> >>>>> It is a port of the NexMark queries to Beam, to be used as integration
> >>>>> tests. It can also be used for A-B testing (non-regression or
> >>>>> performance comparison between two versions of the same engine or of
> >>>>> the same runner).
> >>>>>
> >>>>> This is a continuation of the previous PR (#99) from Mark Shields.
> >>>>> The code has changed quite a bit: some queries have been changed to use
> >>>>> new Beam APIs, and there were some big refactorings. More importantly,
> >>>>> we can now run all the queries on all the runners.
> >>>>>
> >>>>> Nevertheless, there are still some open issues in Nexmark
> >>>>> (https://github.com/iemejia/beam/issues) and in Beam upstream (see
> >>>>> issue links in https://issues.apache.org/jira/browse/BEAM-160)
> >>>>>
> >>>>> I wanted to submit the PR before our NexMark talk (Ismaël's and mine)
> >>>>> at ApacheCon. The PR is not perfect, but it is in good enough shape to
> >>>>> share.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Etienne
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 22/03/2017 at 04:51, Kenneth Knowles wrote:
> >>>>>
> >>>>>> This is great! Having a variety of realistic-ish pipelines running on
> >>>>>> all runners complements the validation suite and IO IT work.
> >>>>>>
> >>>>>> If I recall correctly, some of these involve heavy and esoteric uses of
> >>>>>> state, so definitely ping me if you run into any trouble.
> >>>>>>
> >>>>>> Kenn
> >>>>>>
> >>>>>> On Tue, Mar 21, 2017 at 9:38 AM, Etienne Chauchot <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> Ismael and I are working on upgrading the Nexmark implementation for
> >>>>>>> Beam. See https://github.com/iemejia/beam/tree/BEAM-160-nexmark and
> >>>>>>> https://issues.apache.org/jira/browse/BEAM-160. We are continuing the
> >>>>>>> work done by Mark Shields. See https://github.com/apache/beam/pull/366
> >>>>>>> for the original PR.
> >>>>>>>
> >>>>>>> The PR contains queries that broadly cover the Beam model and
> >>>>>>> represent realistic end-user use cases (some come from customer
> >>>>>>> experience on Google Cloud Dataflow).
> >>>>>>>
> >>>>>>> So far, we have upgraded the implementation to the latest Beam
> >>>>>>> snapshot, and we are able to execute a good subset of the queries on
> >>>>>>> the different runners. To do so, we upgraded the Nexmark drivers (the
> >>>>>>> direct driver, upgraded from inProcessDriver, and the Flink driver)
> >>>>>>> and added a new one for Spark.
> >>>>>>>
> >>>>>>> There is still a good amount of work to do, and we would like to know
> >>>>>>> whether you think this contribution could eventually have a place in
> >>>>>>> Beam.
> >>>>>>>
> >>>>>>> The benefits of having Nexmark in Beam that we have identified so far
> >>>>>>> are:
> >>>>>>>
> >>>>>>> - Rich batch/streaming test
> >>>>>>>
> >>>>>>> - A-B testing of runners or runtimes (non-regression, performance
> >>>>>>> comparison between versions ...)
> >>>>>>>
> >>>>>>> - Integration testing (sdk/runners, runner/runtime, ...)
> >>>>>>>
> >>>>>>> - Validating the Beam capability matrix
> >>>>>>>
> >>>>>>> - It can be used as part of the ongoing PerfKit work (if there is
> >>>>>>> any interest).
> >>>>>>>
> >>>>>>> As a final note, we are tracking the issues in the same repo. If you
> >>>>>>> are interested in contributing or have more ideas, you are welcome :)
> >>>>>>>
> >>>>>>> Etienne
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>
> >
>
