So, if everything is in place in Spark 2.x and we use provided dependencies for Spark in Beam, then theoretically we could run the same code on 2.x without any need for a branch?
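For illustration, a minimal sketch of what that could look like in the runner's pom.xml (the property name, profile id, and versions below are assumptions for the sketch, not the actual Beam build files; in practice the Scala binary version, 2.10 vs 2.11, would also need handling):

<properties>
  <spark.version>1.6.3</spark.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>${spark.version}</version>
    <!-- provided: the cluster supplies Spark, so the runner is not pinned to one build -->
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>

<profiles>
  <!-- Hypothetical profile: only the dependency version changes, the code stays the same -->
  <profile>
    <id>spark2</id>
    <properties>
      <spark.version>2.1.0</spark.version>
    </properties>
  </profile>
</profiles>

Then something like "mvn test -Pspark2" would run the existing test suite against Spark 2, which is essentially the profile experiment proposed below.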
2017-03-23 9:47 GMT+02:00 Amit Sela <amitsel...@gmail.com>:

> If StreamingContext is valid and we don't have to use SparkSession, and
> Accumulators are valid as well and we don't need AccumulatorV2, I don't
> see a reason this shouldn't work (which means there are still tons of
> reasons this could break, but I can't think of them off the top of my head
> right now).
>
> @JB simply add a profile for the Spark dependencies and run the tests -
> you'll have a very definitive answer ;-)
> If this passes, try on a cluster running Spark 2 as well.
>
> Let me know if I can assist.
>
> On Thu, Mar 23, 2017 at 6:55 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > Hi guys,
> >
> > Ismaël summarized well what I have in mind.
> >
> > I'm a bit late on the PoC around that (I started a branch already).
> > I will move forward over the weekend.
> >
> > Regards
> > JB
> >
> > On 03/22/2017 11:42 PM, Ismaël Mejía wrote:
> > > Amit, I suppose JB is talking about the RDD-based version, so no need
> > > to worry about SparkSession or different incompatible APIs.
> > >
> > > Remember, the idea we are discussing is to have in master both the
> > > Spark 1 and Spark 2 runners using the RDD-based translation. At the
> > > same time we can have a feature branch to evolve the Dataset-based
> > > translator (this one will replace the RDD-based translator for Spark 2
> > > once it is mature).
> > >
> > > The advantages have already been discussed, as well as the possible
> > > issues, so I think we have to see now whether JB's idea is feasible and
> > > how hard it would be to live with this while the Dataset version evolves.
> > >
> > > I think what we are trying to avoid is having a long-living branch
> > > for a Spark 2 runner based on RDDs, because the maintenance burden
> > > would be even worse. We would have to fight not only with the double
> > > merge of fixes (in case the profile idea does not work), but also with
> > > the continued evolution of Beam, and we would end up in the long-living
> > > branch mess that other runners have dealt with (e.g. the Apex runner):
> > >
> > > https://lists.apache.org/thread.html/12cc086f5ffe331cc70b89322ce5416c3112b87efc3393e3e16032a2@%3Cdev.beam.apache.org%3E
> > >
> > > What do you think about this, Amit? Would you be OK to go with it if
> > > JB's profile idea proves to help with the maintenance issues?
> > >
> > > Ismaël
> > >
> > > On Wed, Mar 22, 2017 at 5:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >> The hbase-spark module doesn't use SparkSession, so the situation
> > >> there is simpler :-)
> > >>
> > >> On Wed, Mar 22, 2017 at 5:35 AM, Amit Sela <amitsel...@gmail.com>
> > >> wrote:
> > >>
> > >>> I'm still wondering how we'll do this - it's not just different
> > >>> implementations of the same class, but completely different concepts,
> > >>> such as using SparkSession in Spark 2 instead of
> > >>> SparkContext/StreamingContext in Spark 1.
> > >>>
> > >>> On Tue, Mar 21, 2017 at 7:25 PM Ted Yu <yuzhih...@gmail.com> wrote:
> > >>>
> > >>>> I have done some work over in HBASE-16179 where compatibility
> > >>>> modules are created to isolate changes in the Spark 2.x API so that
> > >>>> code in the hbase-spark module can be reused.
> > >>>>
> > >>>> FYI
> > >>>
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>
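For reference, the compatibility bet in Amit's first paragraph can be checked with a few lines against the RDD API - a sketch only, assuming the runner sticks to JavaSparkContext and the old Accumulator API (which Spark 2.x still ships, deprecated in favor of AccumulatorV2); the class name and values are made up for the illustration:

import java.util.Arrays;

import org.apache.spark.Accumulator;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical smoke test: this should compile and run unchanged against both
// a Spark 1.6 and a Spark 2.x provided dependency, since neither JavaSparkContext
// nor the old Accumulator API was removed in 2.x (Accumulator is only deprecated).
public class RddApiCompatCheck {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("rdd-compat");
    JavaSparkContext jsc = new JavaSparkContext(conf);

    Accumulator<Integer> sum = jsc.intAccumulator(0);
    JavaRDD<Integer> rdd = jsc.parallelize(Arrays.asList(1, 2, 3));
    rdd.foreach(sum::add);

    System.out.println("sum = " + sum.value()); // 6 under either Spark version
    jsc.stop();
  }
}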