Re: Beam spark 2.x runner status

Amit Sela Thu, 23 Mar 2017 00:54:59 -0700

If StreamingContext is valid and we don't have to use SparkSession, and
Accumulators are valid as well and we don't need AccumulatorV2, I don't
see a reason this shouldn't work (which means there are still tons of
reasons this could break, but I can't think of them off the top of my head
right now).


@JB simply add a profile for the Spark dependencies and run the tests -
you'll have a very definitive answer ;-) .
If this passes, try on a cluster running Spark 2 as well.

Let me know if I can assist.

On Thu, Mar 23, 2017 at 6:55 AM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi guys,
>
> Ismaël summarized well what I have in mind.
>
> I'm a bit late on the PoC around that (I started a branch already).
> I will move forward over the weekend.
>
> Regards
> JB
>
> On 03/22/2017 11:42 PM, Ismaël Mejía wrote:
> > Amit, I suppose JB is talking about the RDD based version, so no need
> > to worry about SparkSession or different incompatible APIs.
> >
> > Remember the idea we are discussing is to have in master both the
> > spark 1 and spark 2 runners using the RDD based translation. At the
> > same time we can have a feature branch to evolve the DataSet based
> > translator (this one will replace the RDD based translator for spark 2
> > once it is mature).
> >
> > The advantages have already been discussed, as well as the possible
> > issues, so I think we now have to see whether JB's idea is feasible and
> > how hard it would be to live with this while the DataSet version evolves.
> >
> > I think what we are trying to avoid is having a long-lived branch
> > for a spark 2 runner based on RDD, because the maintenance burden
> > would be even worse. We would have to fight not only with the double
> > merge of fixes (in case the profile idea does not work), but also with
> > the continued evolution of Beam, and we would end up in the long-lived
> > branch mess that other runners have dealt with (e.g. the Apex runner):
> >
> >
> > https://lists.apache.org/thread.html/12cc086f5ffe331cc70b89322ce5416c3112b87efc3393e3e16032a2@%3Cdev.beam.apache.org%3E
> >
> > What do you think about this, Amit? Would you be ok to go with it if
> > JB's profile idea proves to help with the maintenance issues?
> >
> > Ismaël
> >
> >
> >
> > On Wed, Mar 22, 2017 at 5:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >> hbase-spark module doesn't use SparkSession. So the situation there is
> >> simpler :-)
> >>
> >> On Wed, Mar 22, 2017 at 5:35 AM, Amit Sela <amitsel...@gmail.com> wrote:
> >>
> >>> I'm still wondering how we'll do this - it's not just different
> >>> implementations of the same Class, but completely different concepts,
> >>> such as using SparkSession in Spark 2 instead of
> >>> SparkContext/StreamingContext in Spark 1.
> >>>
> >>> On Tue, Mar 21, 2017 at 7:25 PM Ted Yu <yuzhih...@gmail.com> wrote:
> >>>
> >>>> I have done some work over in HBASE-16179 where compatibility modules
> >>>> are created to isolate changes in Spark 2.x API so that code in
> >>>> hbase-spark module can be reused.
> >>>>
> >>>> FYI
> >>>>
> >>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
