+1

As the runner seems almost "equivalent" to the one we have, it makes sense.
The question is: do we keep the "old" Spark runner for a while or not (or just keep it in a previous version/tag in git)?

Regards
JB

On 10/10/2019 09:39, Etienne Chauchot wrote:
> Hi guys,
>
> You probably know that work has been under way for several months on a
> new Spark runner based on the Spark Structured Streaming framework.
> This work is located in a feature branch here:
> https://github.com/apache/beam/tree/spark-runner_structured-streaming
>
> To attract more contributors and get some user feedback, we think it is
> time to merge it to master. Before doing so, some steps need to be
> completed:
>
> - finish the work on Spark Encoders (which allow calling Beam coders)
> because, right now, the runner is in an unstable state (some transforms
> use the new way of doing ser/de and some use the old one, making a
> pipeline inconsistent with respect to serialization)
>
> - clean history: the history contains commits from November 2018, so
> there is a good amount of work, and thus a considerable number of
> commits. They were already squashed, but those since September 2019
> have not been
>
> Regarding status:
>
> - the runner passes 89% of the ValidatesRunner tests in batch mode. We
> hope to pass more with the new Encoders
>
> - Streaming mode is barely started (waiting for multi-aggregation
> support in the Spark SS framework from the Spark community)
>
> - Runner can execute Nexmark
>
> - Some things are not wired up yet
>
>     - Beam Schemas not wired with Spark Schemas
>
>     - Optional features of the model not implemented: state API, timer
> API, splittable DoFn API, …
>
> WDYT, can we merge it to master once the 2 steps are done?
>
> Best
>
> Etienne
>

-- 
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com
