Not a deep response, but this is awesome! We'd really like to have some good benchmarks, and I'm excited you're updating Nexmark. This will be great!
On Tue, Mar 21, 2017 at 9:38 AM, Etienne Chauchot <[email protected]> wrote:

> Hi all,
>
> Ismael and I are working on upgrading the Nexmark implementation for Beam.
> See https://github.com/iemejia/beam/tree/BEAM-160-nexmark and
> https://issues.apache.org/jira/browse/BEAM-160. We are continuing the
> work done by Mark Shields. See https://github.com/apache/beam/pull/366
> for the original PR.
>
> The PR contains queries that have a wide coverage of the Beam model and
> that represent a realistic end user use case (some come from client
> experience on Google Cloud Dataflow).
>
> So far, we have upgraded the implementation to the latest Beam snapshot.
> And we are able to execute a good subset of the queries in the different
> runners. We upgraded the nexmark drivers to do so: direct driver (upgraded
> from inProcessDriver) and flink driver and we added a new one for spark.
>
> There is still a good amount of work to do and we would like to know if
> you think that this contribution can have its place into Beam eventually.
>
> The interests of having Nexmark on Beam that we have seen so far are:
>
> - Rich batch/streaming test
>
> - A-B testing of runners or runtimes (non-regression, performance
> comparison between versions ...)
>
> - Integration testing (sdk/runners, runner/runtime, ...)
>
> - Validate beam capability matrix
>
> - It can be used as part of the ongoing PerfKit work (if there is any
> interest).
>
> As a final note, we are tracking the issues in the same repo. If someone
> is interested in contributing, or have more ideas, you are welcome :)
>
> Etienne
