Re: Beam spark 2.x runner status

Jean-Baptiste Onofré Wed, 15 Mar 2017 05:58:18 -0700

Hi Amit,

What do you think of the following:

- in the meantime, while you reintroduce the Spark 2 branch, what about "extending" the supported version in the current Spark runner? Still using RDD/DStream, I think we can support Spark 2.x even if we don't yet leverage the new features it provides.


Thoughts?

Regards
JB

On 03/15/2017 07:39 PM, Amit Sela wrote:

Hi Cody,

I will re-introduce this branch soon as part of the work on BEAM-913
<https://issues.apache.org/jira/browse/BEAM-913>.
For now, and from previous experience with the mentioned branch, the batch
implementation should be straightforward.
The only issue is with streaming support: in the current runner (Spark 1.x) we
have experimental support for windows/triggers and we're working towards
full streaming support.
With Spark 2.x, there is no "general-purpose" stateful operator for the
Dataset API, so I was waiting to see if the new operator
<https://github.com/apache/spark/pull/17179> planned for the next version could
help with that.

To summarize, I will introduce a skeleton for the Spark 2 runner with batch
support as soon as I can as a separate branch.

Thanks,
Amit

On Wed, Mar 15, 2017 at 9:07 AM Cody Innowhere <[email protected]> wrote:

Hi guys,
Is anybody currently working on a Spark 2.x runner? An old PR for a
spark 2.x runner was closed a few days ago, so I wonder what's the status
now, and is there a roadmap for this?
Thanks~


--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com
