+1 on merging the runner into master, which makes it more discoverable and easier to contribute to (I am also interested in contributing).
-Rui

On Tue, Sep 17, 2019 at 3:36 AM Alexey Romanenko <aromanenko....@gmail.com> wrote:

> Hi Xinyu,
>
> Great to hear that you wish to contribute to the new Spark runner! We used to have sync meetings about all the Spark runners in general every two weeks, so feel free to let us know if you want to participate too.
>
> Also, as one of the contributors to the Structured Streaming Spark runner (yes, we need to find a shorter name for it =), I agree that it's a good time to merge it into master (even if it's not 100% ready). Then we can create a roadmap with Jira tasks and push code in the normal PR-based way, so it will be easier to discover new changes and track the work in progress. We only need to warn users that it's still under development and not ready for production use.
>
> The schema part is still quite vague and hazy, so it's a good topic for a separate discussion. I believe it would be much more effective, in terms of performance, if we are able to end up with a strong relation between Beam and Spark schemas.
>
> On 13 Sep 2019, at 21:16, Xinyu Liu <xinyuliu...@gmail.com> wrote:
>
> Hi, Etienne,
>
> The slides are very informative! Thanks for sharing the details about how the Beam API is mapped onto Spark Structured Streaming. We (LinkedIn) are also interested in trying the new SparkRunner to run Beam pipelines in batch, and in contributing to it too. From my understanding, it seems the functionality on the batch side is mostly complete and covers quite a large percentage of the tests (with a few missing pieces like state and timers in ParDo, and SDF). If so, is it possible to merge the new runner into master sooner, so it's much easier for us to pull it in (we have an internal fork) and contribute back?
>
> Also curious about the schema part of the runner. It seems we can leverage the schema-aware work in PCollection and translate from Beam schemas to Spark schemas, so they can be optimized in the planner layer.
> It would be great to hear your plans on that.
>
> Congrats on this great work!
>
> Thanks,
> Xinyu
>
> On Wed, Sep 11, 2019 at 6:02 PM Rui Wang <ruw...@google.com> wrote:
>
>> Hello Etienne,
>>
>> Your slides mentioned that streaming-mode development is blocked because Spark lacks support for multiple aggregations in its streaming mode, but that a design is ongoing. Do you have a link or something else to their design discussion/doc?
>>
>> -Rui
>>
>> On Wed, Sep 11, 2019 at 5:10 PM Etienne Chauchot <echauc...@apache.org> wrote:
>>
>>> Hi Rahul,
>>> Sure, and great! Thanks for proposing!
>>> If you want details, here is the presentation I did 30 minutes ago at ApacheCon. You will find the video on YouTube shortly, but in the meantime, here are my presentation slides.
>>>
>>> And here is the structured streaming branch. I'll be happy to review your PRs, thanks!
>>>
>>> https://github.com/apache/beam/tree/spark-runner_structured-streaming
>>>
>>> Best,
>>> Etienne
>>>
>>> On Wednesday, September 11, 2019 at 16:37 +0530, rahul patwari wrote:
>>>
>>> Hi Etienne,
>>>
>>> I came to know about the work going on in the Structured Streaming Spark Runner from the Apache Beam Wiki - Works in Progress.
>>> I have contributed to BeamSql earlier, and I am working on supporting PCollectionView in BeamSql.
>>>
>>> I would love to understand the Runner side of Apache Beam and contribute to the Structured Streaming Spark Runner.
>>>
>>> Can you please point me in the right direction?
>>>
>>> Thanks,
>>> Rahul
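[Editor's note] To illustrate the schema-translation idea discussed in the thread, here is a toy sketch. This is not the runner's actual code (the real translation would operate on `org.apache.beam.sdk.schemas.Schema` and `org.apache.spark.sql.types.StructType` in Java); the type-name mapping table and function below are assumptions chosen purely to show the field-by-field mapping that would let Spark's planner reason about Beam row columns:

```python
# Toy sketch (hypothetical, simplified): map a Beam-style schema, given as
# (field_name, beam_type_name) pairs, onto a Spark-SQL-style StructType
# description. Only the mapping idea is real; the names are illustrative.

# Hypothetical correspondence between Beam FieldType names and Spark SQL types.
BEAM_TO_SPARK = {
    "STRING": "StringType",
    "INT32": "IntegerType",
    "INT64": "LongType",
    "DOUBLE": "DoubleType",
    "BOOLEAN": "BooleanType",
}

def to_spark_struct(beam_schema):
    """Translate [(field_name, beam_type_name)] into a StructType-like dict."""
    fields = []
    for name, beam_type in beam_schema:
        if beam_type not in BEAM_TO_SPARK:
            # Types without a direct Spark equivalent would need a fallback
            # (e.g. encoding the value as binary) in a real translation.
            raise ValueError(f"No Spark mapping for Beam type {beam_type}")
        fields.append({
            "name": name,
            "type": BEAM_TO_SPARK[beam_type],
            "nullable": True,  # simplification: treat every field as nullable
        })
    return {"type": "struct", "fields": fields}

# Example: a simple two-field record.
schema = to_spark_struct([("id", "INT64"), ("name", "STRING")])
```

Once every PCollection with a Beam schema can be described this way, Spark's Catalyst planner can see individual columns instead of opaque encoded blobs, which is what enables the planner-layer optimization mentioned above.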