So, if everything is in place in Spark 2.x and we use provided dependencies for Spark in Beam, then theoretically we could run the same code on 2.x without any need for a branch?
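For illustration, a minimal sketch of what that could look like in the runner's pom.xml (the property name, profile id, and versions below are assumptions for the sketch, not the actual Beam build files; in practice the Scala binary version, 2.10 vs 2.11, would also need handling):

<properties>
  <spark.version>1.6.3</spark.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>${spark.version}</version>
    <!-- provided: the cluster supplies Spark, so the runner is not pinned to one build -->
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>

<profiles>
  <!-- Hypothetical profile: only the dependency version changes, the code stays the same -->
  <profile>
    <id>spark2</id>
    <properties>
      <spark.version>2.1.0</spark.version>
    </properties>
  </profile>
</profiles>

Then something like "mvn test -Pspark2" would run the existing test suite against Spark 2, which is essentially the profile experiment proposed below.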
2017-03-23 9:47 GMT+02:00 Amit Sela <amitsel...@gmail.com>:

> If StreamingContext is valid and we don't have to use SparkSession, and
> Accumulators are valid as well and we don't need AccumulatorV2, I don't
> see a reason this shouldn't work (which means there are still tons of
> reasons this could break, but I can't think of them off the top of my head
> right now).
>
> @JB simply add a profile for the Spark dependencies and run the tests -
> you'll have a very definitive answer ;-)
> If this passes, try on a cluster running Spark 2 as well.
>
> Let me know if I can assist.
>
> On Thu, Mar 23, 2017 at 6:55 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > Hi guys,
> >
> > Ismaël summarized well what I have in mind.
> >
> > I'm a bit late on the PoC around that (I started a branch already).
> > I will move forward over the weekend.
> >
> > Regards
> > JB
> >
> > On 03/22/2017 11:42 PM, Ismaël Mejía wrote:
> > > Amit, I suppose JB is talking about the RDD-based version, so no need
> > > to worry about SparkSession or different incompatible APIs.
> > >
> > > Remember, the idea we are discussing is to have in master both the
> > > Spark 1 and Spark 2 runners using the RDD-based translation. At the
> > > same time we can have a feature branch to evolve the Dataset-based
> > > translator (this one will replace the RDD-based translator for Spark 2
> > > once it is mature).
> > >
> > > The advantages have already been discussed, as well as the possible
> > > issues, so I think we have to see now whether JB's idea is feasible and
> > > how hard it would be to live with this while the Dataset version evolves.
> > >
> > > I think what we are trying to avoid is having a long-living branch
> > > for a Spark 2 runner based on RDDs, because the maintenance burden
> > > would be even worse. We would have to fight not only with the double
> > > merge of fixes (in case the profile idea does not work), but also with
> > > the continued evolution of Beam, and we would end up in the long-living
> > > branch mess that other runners have dealt with (e.g. the Apex runner):
> > >
> > > https://lists.apache.org/thread.html/12cc086f5ffe331cc70b89322ce5416c3112b87efc3393e3e16032a2@%3Cdev.beam.apache.org%3E
> > >
> > > What do you think about this, Amit? Would you be OK to go with it if
> > > JB's profile idea proves to help with the maintenance issues?
> > >
> > > Ismaël
> > >
> > > On Wed, Mar 22, 2017 at 5:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >> The hbase-spark module doesn't use SparkSession, so the situation
> > >> there is simpler :-)
> > >>
> > >> On Wed, Mar 22, 2017 at 5:35 AM, Amit Sela <amitsel...@gmail.com>
> > >> wrote:
> > >>
> > >>> I'm still wondering how we'll do this - it's not just different
> > >>> implementations of the same class, but completely different concepts,
> > >>> such as using SparkSession in Spark 2 instead of
> > >>> SparkContext/StreamingContext in Spark 1.
> > >>>
> > >>> On Tue, Mar 21, 2017 at 7:25 PM Ted Yu <yuzhih...@gmail.com> wrote:
> > >>>
> > >>>> I have done some work over in HBASE-16179 where compatibility
> > >>>> modules are created to isolate changes in the Spark 2.x API so that
> > >>>> code in the hbase-spark module can be reused.
> > >>>>
> > >>>> FYI
> > >>>
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>
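For reference, the compatibility bet in Amit's first paragraph can be checked with a few lines against the RDD API - a sketch only, assuming the runner sticks to JavaSparkContext and the old Accumulator API (which Spark 2.x still ships, deprecated in favor of AccumulatorV2); the class name and values are made up for the illustration:

import java.util.Arrays;

import org.apache.spark.Accumulator;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical smoke test: this should compile and run unchanged against both
// a Spark 1.6 and a Spark 2.x provided dependency, since neither JavaSparkContext
// nor the old Accumulator API was removed in 2.x (Accumulator is only deprecated).
public class RddApiCompatCheck {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("rdd-compat");
    JavaSparkContext jsc = new JavaSparkContext(conf);

    Accumulator<Integer> sum = jsc.intAccumulator(0);
    JavaRDD<Integer> rdd = jsc.parallelize(Arrays.asList(1, 2, 3));
    rdd.foreach(sum::add);

    System.out.println("sum = " + sum.value()); // 6 under either Spark version
    jsc.stop();
  }
}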