+1

I think our experiences with things that go to master early have been
very good, so I am in favor of merging ASAP. We can easily exclude it
from releases until it is ready for end users.
I have the same question as Robert: how much of this is modifications
and how much is new code? I notice it is in a subdirectory of the
beam-runners-spark module. I did not see any major changes to
dependencies, but I will also ask: does it have major version
differences that would warrant a separate artifact?

Kenn

On Thu, Oct 10, 2019 at 11:50 AM Robert Bradshaw <rober...@google.com> wrote:
> On Thu, Oct 10, 2019 at 12:39 AM Etienne Chauchot <echauc...@apache.org> wrote:
> >
> > Hi guys,
> >
> > You probably know that there has been, for several months, work on a
> > new Spark runner based on the Spark Structured Streaming framework.
> > This work lives in a feature branch here:
> > https://github.com/apache/beam/tree/spark-runner_structured-streaming
> >
> > To attract more contributors and get some user feedback, we think it
> > is time to merge it to master. Before doing so, two steps need to be
> > completed:
> >
> > - Finish the work on Spark Encoders (which allow calling Beam coders),
> > because right now the runner is in an unstable state: some transforms
> > use the new way of doing ser/de and some use the old one, making a
> > pipeline inconsistent with respect to serialization.
> >
> > - Clean up the history. The history contains commits going back to
> > November 2018, so there is a good amount of work and, consequently, a
> > large number of commits. They have already been squashed, but not
> > since September 2019.
>
> I don't think the number of commits should be an issue -- we shouldn't
> just squash years' worth of history away. (On the other hand, if this
> branch contains lots of little, irrelevant commits that would normally
> have been squashed away in the review process we do for the main
> branch, then yes, some cleanup could be nice.)
>
> > Regarding status:
> >
> > - The runner passes 89% of the ValidatesRunner tests in batch mode.
> > We hope to pass more with the new Encoders.
> >
> > - Streaming mode is barely started (waiting for multi-aggregation
> > support in the Spark Structured Streaming framework from the Spark
> > community).
> >
> > - The runner can execute Nexmark.
> >
> > - Some things are not wired up yet:
> >
> >   - Beam schemas are not wired to Spark schemas.
> >
> >   - Optional features of the model are not implemented: state API,
> >     timer API, splittable DoFn API, ...
> >
> > WDYT, can we merge it to master once the two steps are done?
>
> I think that as long as it sits parallel to the existing runner, and
> is clearly marked with its status, it makes sense to me. How many
> changes does it make to the existing codebase (as opposed to adding
> new code)?
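[Editor's note] The "clean history" step discussed in this thread is one
of the standard git workflows. As an illustration only, here is a minimal
sketch of squashing a long-lived feature branch into a single commit with
`git merge --squash`, run in a throwaway repository. The repo, branch
name, and commit messages are illustrative stand-ins, not the actual Beam
branch; the thread also notes reasons not to squash real history away.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repo for the demonstration
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev
git commit -q --allow-empty -m "initial"
base=$(git symbolic-ref --short HEAD)   # default branch name varies by git config

# Stand-in for months of WIP commits on the feature branch.
git checkout -q -b spark-runner_structured-streaming
for i in 1 2 3; do
  echo "$i" >> work.txt
  git add work.txt
  git commit -q -m "wip $i"
done

# --squash stages the branch's combined changes without committing,
# so the base branch ends up with a single commit for the whole branch.
git checkout -q "$base"
git merge --squash -q spark-runner_structured-streaming
git commit -q -m "Add Spark Structured Streaming runner (squashed)"

git rev-list --count HEAD         # 2: the initial commit plus one squashed commit
```

An alternative that preserves more granularity is `git rebase -i` on the
feature branch, squashing only the small fixup commits and keeping the
meaningful ones, which matches Robert's suggestion in the thread.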