Could any Googlers help run NexMark on Dataflow streaming and share the numbers with the community?

--
Pei

On Fri, Aug 25, 2017 at 11:28 PM, Lukasz Cwik <[email protected]> wrote:

> Etienne, cut some JIRAs for improvements like ValidatesRunner for the
> Nexmark suite that you think are worthy. Some of them might be good
> 'starter' tasks as well.
>
> On Fri, Aug 25, 2017 at 1:43 AM, Etienne Chauchot <[email protected]>
> wrote:
>
> > Hi guys,
> >
> > There are also some points to discuss:
> >
> > - I think some of the tests in this test suite should be generalized
> > as validatesRunner tests, as was done for example for custom window
> > merging
> > (https://github.com/apache/beam/blob/5181e619f17e1f69fabe8d5bdfc7a3a6a2142cde/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowTest.java#L591)
> >
> > - We have run almost no tests on Dataflow, so anyone who could run
> > the test suite on Dataflow is very welcome. All the needed
> > information is still in the README, but I'll move it to the website.
> >
> > - other points?
> >
> > WDYT?
> >
> > Best,
> >
> > Etienne
> >
> > On 24/08/2017 at 18:35, Lukasz Cwik wrote:
> >
> >> Yeah, was looking forward to this.
> >>
> >> On Thu, Aug 24, 2017 at 9:20 AM, Tyler Akidau <[email protected]>
> >> wrote:
> >>
> >>> Awesome news, thank you! :-D
> >>>
> >>> On Thu, Aug 24, 2017 at 12:40 AM Etienne Chauchot <[email protected]>
> >>> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I wanted to let you know that the Nexmark PR is merged into master.
> >>>> Feel free to use it (e.g. performance testing, release testing ...).
> >>>>
> >>>> Etienne
> >>>>
> >>>> On 12/05/2017 at 10:55, Etienne Chauchot wrote:
> >>>>
> >>>>> Hi guys,
> >>>>>
> >>>>> I wanted to let you know that I have just submitted a PR around
> >>>>> NexMark. This is a port of the NexMark queries to Beam, to be used as
> >>>>> integration tests.
> >>>>>
> >>>>> This can also be used for A-B testing (non-regression or performance
> >>>>> comparison between 2 versions of the same engine or of the same
> >>>>> runner).
> >>>>>
> >>>>> This is a continuation of the previous PR (#99) from Mark Shields.
> >>>>> The code has changed quite a bit: some queries have changed to use
> >>>>> new Beam APIs and there were some big refactorings. More importantly,
> >>>>> we can now run all the queries in all the runners.
> >>>>>
> >>>>> Nevertheless, there are still some open issues in Nexmark
> >>>>> (https://github.com/iemejia/beam/issues) and in Beam upstream (see
> >>>>> issue links in https://issues.apache.org/jira/browse/BEAM-160).
> >>>>>
> >>>>> I wanted to submit the PR before our (Ismaël and I) NexMark talk at
> >>>>> the ApacheCon. The PR is not perfect but it is in good shape to
> >>>>> share.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Etienne
> >>>>>
> >>>>> On 22/03/2017 at 04:51, Kenneth Knowles wrote:
> >>>>>
> >>>>>> This is great! Having a variety of realistic-ish pipelines running
> >>>>>> on all runners complements the validation suite and IO IT work.
> >>>>>>
> >>>>>> If I recall, some of these involve heavy and esoteric uses of state,
> >>>>>> so definitely give me a ping if you hit any trouble.
> >>>>>>
> >>>>>> Kenn
> >>>>>>
> >>>>>> On Tue, Mar 21, 2017 at 9:38 AM, Etienne Chauchot
> >>>>>> <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> Ismael and I are working on upgrading the Nexmark implementation
> >>>>>>> for Beam.
> >>>>>>> See https://github.com/iemejia/beam/tree/BEAM-160-nexmark and
> >>>>>>> https://issues.apache.org/jira/browse/BEAM-160. We are continuing
> >>>>>>> the work done by Mark Shields. See
> >>>>>>> https://github.com/apache/beam/pull/366 for the original PR.
> >>>>>>>
> >>>>>>> The PR contains queries that have a wide coverage of the Beam model
> >>>>>>> and that represent realistic end-user use cases (some come from
> >>>>>>> client experience on Google Cloud Dataflow).
> >>>>>>>
> >>>>>>> So far, we have upgraded the implementation to the latest Beam
> >>>>>>> snapshot, and we are able to execute a good subset of the queries
> >>>>>>> in the different runners. We upgraded the nexmark drivers to do so:
> >>>>>>> direct driver (upgraded from inProcessDriver) and flink driver, and
> >>>>>>> we added a new one for spark.
> >>>>>>>
> >>>>>>> There is still a good amount of work to do and we would like to
> >>>>>>> know if you think that this contribution can have its place in Beam
> >>>>>>> eventually.
> >>>>>>>
> >>>>>>> The benefits of having Nexmark on Beam that we have seen so far
> >>>>>>> are:
> >>>>>>>
> >>>>>>> - Rich batch/streaming test
> >>>>>>>
> >>>>>>> - A-B testing of runners or runtimes (non-regression, performance
> >>>>>>> comparison between versions ...)
> >>>>>>>
> >>>>>>> - Integration testing (sdk/runners, runner/runtime, ...)
> >>>>>>>
> >>>>>>> - Validate the Beam capability matrix
> >>>>>>>
> >>>>>>> - It can be used as part of the ongoing PerfKit work (if there is
> >>>>>>> any interest).
> >>>>>>>
> >>>>>>> As a final note, we are tracking the issues in the same repo. If
> >>>>>>> anyone is interested in contributing, or has more ideas, you are
> >>>>>>> welcome :)
> >>>>>>>
> >>>>>>> Etienne
