Re: Pointers on Contributing to Structured Streaming Spark Runner

Etienne Chauchot Thu, 19 Sep 2019 05:13:57 -0700
Hi Rahul and Xinyu,
I just added you to the list of guests for the meeting. The time is 5pm GMT+2.
That being said, for some reason the last scheduled meeting was 08/28. Ismael
initially created the meeting, and I do not have the rights to add a new date.
Ismael, can you add a date? I suggest 09/25. WDYT?
Best,
Etienne
On Thursday, 19 September 2019 at 00:49 +0530, rahul patwari wrote:
> Hi, 
> I would love to join the call. 
> Can you also share the meeting invitation with me?
> 
> Thanks,
> Rahul
> On Wed 18 Sep, 2019, 11:48 PM Xinyu Liu, <[email protected]> wrote:
> > Alexey and Etienne: I'm very happy to join the sync-up meeting. Please 
> > forward the meeting info to me. I am based in
> > California, US and hopefully the time will work :).
> > Thanks,
> > Xinyu
> > On Wed, Sep 18, 2019 at 6:39 AM Etienne Chauchot <[email protected]> 
> > wrote:
> > > Hi Xinyu,
> > > Thanks for offering to help! My comments are inline:
> > > On Friday, 13 September 2019 at 12:16 -0700, Xinyu Liu wrote:
> > > > Hi, Etienne,
> > > > The slides are very informative! Thanks for sharing the details about 
> > > > how the Beam APIs are mapped onto Spark
> > > > Structured Streaming. 
> > > 
> > > Thanks !
> > > > We (LinkedIn) are also interested in trying the new SparkRunner to run 
> > > > Beam pipelines in batch, and in contributing to
> > > > it too. From my understanding, the functionality on the batch side seems 
> > > > mostly complete and covers quite a large
> > > > percentage of the tests (with a few missing pieces like state and timers in 
> > > > ParDo, and SDF). 
> > > 
> > > Correct, it passes 89% of the tests, but more than SDF, state 
> > > and timers is missing: there is also ongoing
> > > encoders work that I would like to commit/push before merging.
> > > > If so, is it possible to merge the new runner sooner into master so 
> > > > it's much easier for us to pull it in (we
> > > > have an internal fork) and contribute back?
> > > 
> > > Sure, see my other mail on this thread. As Alexey mentioned, please join 
> > > the sync meeting we have, the more the
> > > merrier !
> > > > I am also curious about the schema part of the runner. It seems we can leverage 
> > > > the schema-aware work in PCollection and
> > > > translate from Beam schema to Spark, so it can be optimized in the 
> > > > planner layer. It would be great to hear
> > > > about your plans for that.
> > > 
> > > Well, it is not designed yet but, if you remember my talk, we need to 
> > > store Beam windowing information with the
> > > data itself, so we end up with a Dataset<WindowedValue>. One lead that 
> > > was discussed is to store it as a Spark
> > > schema such as this:
> > > 1. field1: binary data for Beam windowing information (it cannot be mapped 
> > > to fields because Beam windowing info is a
> > > complex structure)
> > > 2. fields of data as defined in the Beam schema, if there is one 
> > > 
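The two-part layout described above can be illustrated with a minimal plain-Python sketch. It uses neither the Spark nor the Beam API. The column name `windowing_info`, the helper names, the sample fields, and the use of JSON as a stand-in for a real binary encoding of the windowing information are all assumptions made for illustration; the thread states that the actual design does not exist yet.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical name for the binary column; not taken from Beam or Spark.
WINDOWING_FIELD = "windowing_info"


def windowed_schema(data_fields: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Prepend one binary column for windowing info to the data columns."""
    return [(WINDOWING_FIELD, "binary")] + list(data_fields)


def to_row(windowing: Dict[str, Any], values: Dict[str, Any],
           schema: List[Tuple[str, str]]) -> Tuple[Any, ...]:
    """Pack windowing info as opaque bytes, then the data values in schema order."""
    # JSON is only a stand-in here; a real runner would use its own encoder.
    blob = json.dumps(windowing, sort_keys=True).encode("utf-8")
    return (blob,) + tuple(values[name] for name, _ in schema[1:])


# Illustrative data fields, as if they came from a Beam schema.
schema = windowed_schema([("user", "string"), ("clicks", "long")])
row = to_row(
    {"timestamp": 1568894037000, "windows": ["global"], "pane": "ON_TIME"},
    {"user": "a", "clicks": 3},
    schema,
)
```

Keeping the windowing information in a single opaque column leaves the remaining columns as ordinary typed fields, which is what would let a query planner reason about them.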
> > > > Congrats on this great work!
> > > Thanks !
> > > Best,
> > > Etienne
> > > > Thanks,
> > > > Xinyu
> > > > On Wed, Sep 11, 2019 at 6:02 PM Rui Wang <[email protected]> wrote:
> > > > > Hello Etienne,
> > > > > > Your slides mentioned that streaming mode development is blocked 
> > > > > > because Spark lacks support for multiple
> > > > > > aggregations in its streaming mode, but that the design is ongoing. Do you have 
> > > > > > a link or something else to their design
> > > > > > discussion/doc?
> > > > > 
> > > > > 
> > > > > -Rui  
> > > > > On Wed, Sep 11, 2019 at 5:10 PM Etienne Chauchot 
> > > > > <[email protected]> wrote:
> > > > > > > Hi Rahul,
> > > > > > > Sure, and great! Thanks for proposing! If you want 
> > > > > > > details, here is the presentation I gave 30 minutes
> > > > > > > ago at ApacheCon. You will find the video on YouTube shortly, 
> > > > > > > but in the meantime, here are my
> > > > > > > presentation slides.
> > > > > > And here is the structured streaming branch. I'll be happy to 
> > > > > > review your PRs, thanks ! 
> > > > > > https://github.com/apache/beam/tree/spark-runner_structured-streaming
> > > > > > > Best,
> > > > > > > Etienne
> > > > > > > On Wednesday, 11 September 2019 at 16:37 +0530, rahul patwari wrote:
> > > > > > > Hi Etienne,
> > > > > > > 
> > > > > > > I came to know about the work going on in Structured Streaming 
> > > > > > > Spark Runner from Apache Beam Wiki - Works
> > > > > > > in Progress.
> > > > > > > I have contributed to BeamSql earlier. And I am working on 
> > > > > > > supporting PCollectionView in BeamSql.
> > > > > > > 
> > > > > > > I would love to understand the Runner's side of Apache Beam and 
> > > > > > > contribute to the Structured Streaming
> > > > > > > Spark Runner.
> > > > > > > 
> > > > > > > Can you please point me in the right direction?
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Rahul