+1 on merging the runner into master, which makes it more discoverable and easier to contribute to (I am also interested in contributing).
-Rui

On Tue, Sep 17, 2019 at 3:36 AM Alexey Romanenko <aromanenko....@gmail.com> wrote:

> Hi Xinyu,
>
> Great to hear that you wish to contribute to the new Spark runner! We used to have sync meetings about all the Spark runners in general every two weeks, so feel free to let us know if you want to participate too.
>
> Also, as one of the contributors to the Structured Streaming Spark runner (yes, we need to find a shorter name for it =), I agree that it's a good time to merge it into master (even if it's not 100% ready). Then we can create a roadmap with Jira tasks and push code in the normal PR-based way, so it will be easier to discover new changes and track the work in progress. We only need to warn users that it's still under development and not ready for production use.
>
> The schema part is still quite vague and hazy, so it's a good topic for a separate discussion. I believe it would be much more effective, in terms of performance, if we are able to end up with a strong relation between Beam and Spark schemas.
>
> On 13 Sep 2019, at 21:16, Xinyu Liu <xinyuliu...@gmail.com> wrote:
>
> Hi, Etienne,
>
> The slides are very informative! Thanks for sharing the details about how the Beam API is mapped onto Spark Structured Streaming. We (LinkedIn) are also interested in trying the new SparkRunner to run Beam pipelines in batch, and in contributing to it too. From my understanding, it seems the functionality on the batch side is mostly complete and covers quite a large percentage of the tests (with a few missing pieces like state and timers in ParDo, and SDF). If so, is it possible to merge the new runner into master sooner, so it's much easier for us to pull it in (we have an internal fork) and contribute back?
>
> Also curious about the schema part of the runner. It seems we can leverage the schema-aware work in PCollection and translate from Beam schemas to Spark schemas, so they can be optimized in the planner layer.
> It would be great to hear your plans on that.
>
> Congrats on this great work!
>
> Thanks,
> Xinyu
>
> On Wed, Sep 11, 2019 at 6:02 PM Rui Wang <ruw...@google.com> wrote:
>
>> Hello Etienne,
>>
>> Your slides mentioned that streaming-mode development is blocked because Spark lacks support for multiple aggregations in its streaming mode, but that a design is ongoing. Do you have a link or something else to their design discussion/doc?
>>
>> -Rui
>>
>> On Wed, Sep 11, 2019 at 5:10 PM Etienne Chauchot <echauc...@apache.org> wrote:
>>
>>> Hi Rahul,
>>> Sure, and great! Thanks for proposing!
>>> If you want details, here is the presentation I did 30 minutes ago at ApacheCon. You will find the video on YouTube shortly, but in the meantime, here are my presentation slides.
>>>
>>> And here is the structured streaming branch. I'll be happy to review your PRs, thanks!
>>>
>>> https://github.com/apache/beam/tree/spark-runner_structured-streaming
>>>
>>> Best,
>>> Etienne
>>>
>>> On Wednesday, September 11, 2019 at 16:37 +0530, rahul patwari wrote:
>>>
>>> Hi Etienne,
>>>
>>> I came to know about the work going on in the Structured Streaming Spark Runner from the Apache Beam Wiki - Works in Progress.
>>> I have contributed to BeamSql earlier, and I am working on supporting PCollectionView in BeamSql.
>>>
>>> I would love to understand the Runner side of Apache Beam and contribute to the Structured Streaming Spark Runner.
>>>
>>> Can you please point me in the right direction?
>>>
>>> Thanks,
>>> Rahul
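[Editor's note] To illustrate the schema-translation idea discussed in the thread, here is a toy sketch. This is not the runner's actual code (the real translation would operate on `org.apache.beam.sdk.schemas.Schema` and `org.apache.spark.sql.types.StructType` in Java); the type-name mapping table and function below are assumptions chosen purely to show the field-by-field mapping that would let Spark's planner reason about Beam row columns:

```python
# Toy sketch (hypothetical, simplified): map a Beam-style schema, given as
# (field_name, beam_type_name) pairs, onto a Spark-SQL-style StructType
# description. Only the mapping idea is real; the names are illustrative.

# Hypothetical correspondence between Beam FieldType names and Spark SQL types.
BEAM_TO_SPARK = {
    "STRING": "StringType",
    "INT32": "IntegerType",
    "INT64": "LongType",
    "DOUBLE": "DoubleType",
    "BOOLEAN": "BooleanType",
}

def to_spark_struct(beam_schema):
    """Translate [(field_name, beam_type_name)] into a StructType-like dict."""
    fields = []
    for name, beam_type in beam_schema:
        if beam_type not in BEAM_TO_SPARK:
            # Types without a direct Spark equivalent would need a fallback
            # (e.g. encoding the value as binary) in a real translation.
            raise ValueError(f"No Spark mapping for Beam type {beam_type}")
        fields.append({
            "name": name,
            "type": BEAM_TO_SPARK[beam_type],
            "nullable": True,  # simplification: treat every field as nullable
        })
    return {"type": "struct", "fields": fields}

# Example: a simple two-field record.
schema = to_spark_struct([("id", "INT64"), ("name", "STRING")])
```

Once every PCollection with a Beam schema can be described this way, Spark's Catalyst planner can see individual columns instead of opaque encoded blobs, which is what enables the planner-layer optimization mentioned above.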