If StreamingContext is still valid in Spark 2 and we don't have to use SparkSession, and the old Accumulators are still valid so that we don't need AccumulatorV2, I don't see a reason this shouldn't work (which probably means there are still plenty of ways it could break, but I can't think of any off the top of my head right now).
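
One way to check this is to build the same RDD-based code against both Spark lines using Maven profiles. Below is a rough sketch of how that could look in the runner's pom.xml. The profile ids, the property names and the exact Spark/Scala versions are assumptions for illustration only, not taken from an existing branch:

```xml
<!-- Sketch only: profile ids, property names and versions are illustrative assumptions. -->
<profiles>
  <!-- Default build: compile and test against Spark 1.x -->
  <profile>
    <id>spark1</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <properties>
      <spark.version>1.6.3</spark.version>
      <spark.scala.binary>2.10</spark.scala.binary>
    </properties>
  </profile>
  <!-- Opt-in build: same sources, Spark 2.x dependencies -->
  <profile>
    <id>spark2</id>
    <properties>
      <spark.version>2.1.0</spark.version>
      <spark.scala.binary>2.11</spark.scala.binary>
    </properties>
  </profile>
</profiles>

<dependencies>
  <!-- Both artifacts resolve through the properties set by the active profile -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${spark.scala.binary}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_${spark.scala.binary}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

With a layout like this, `mvn clean verify` would run the tests against Spark 1 and `mvn clean verify -Pspark2` against Spark 2, since activating a profile explicitly switches off the default one.
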
@JB simply add a profile for the Spark dependencies and run the tests - you'll have a very definitive answer ;-) . If this passes, try on a cluster running Spark 2 as well.

Let me know if I can assist.

On Thu, Mar 23, 2017 at 6:55 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi guys,
>
> Ismaël summarized well what I have in mind.
>
> I'm a bit late on the PoC around that (I started a branch already).
> I will move forward over the weekend.
>
> Regards
> JB
>
> On 03/22/2017 11:42 PM, Ismaël Mejía wrote:
> > Amit, I suppose JB is talking about the RDD based version, so no need
> > to worry about SparkSession or different incompatible APIs.
> >
> > Remember the idea we are discussing is to have in master both the
> > spark 1 and spark 2 runners using the RDD based translation. At the
> > same time we can have a feature branch to evolve the DataSet based
> > translator (this one will replace the RDD based translator for spark 2
> > once it is mature).
> >
> > The advantages have been already discussed as well as the possible
> > issues so I think we have to see now if JB's idea is feasible and how
> > hard it would be to live with this while the DataSet version evolves.
> >
> > I think what we are trying to avoid is to have a long living branch
> > for a spark 2 runner based on RDD because the maintenance burden
> > would be even worse. We would have to fight not only with the double
> > merge of fixes (in case the profile idea does not work), but also with
> > the continued evolution of Beam and we would end up in the long living
> > branch mess that other runners have dealt with (e.g. the Apex runner)
> >
> > https://lists.apache.org/thread.html/12cc086f5ffe331cc70b89322ce5416c3112b87efc3393e3e16032a2@%3Cdev.beam.apache.org%3E
> >
> > What do you think about this Amit ? Would you be ok to go with it if
> > JB's profile idea proves to help with the maintenance issues ?
> >
> > Ismaël
> >
> > On Wed, Mar 22, 2017 at 5:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >> hbase-spark module doesn't use SparkSession. So situation there is simpler
> >> :-)
> >>
> >> On Wed, Mar 22, 2017 at 5:35 AM, Amit Sela <amitsel...@gmail.com> wrote:
> >>
> >>> I'm still wondering how we'll do this - it's not just different
> >>> implementations of the same Class, but completely different concepts such
> >>> as using SparkSession in Spark 2 instead of SparkContext/StreamingContext
> >>> in Spark 1.
> >>>
> >>> On Tue, Mar 21, 2017 at 7:25 PM Ted Yu <yuzhih...@gmail.com> wrote:
> >>>
> >>>> I have done some work over in HBASE-16179 where compatibility modules are
> >>>> created to isolate changes in Spark 2.x API so that code in hbase-spark
> >>>> module can be reused.
> >>>>
> >>>> FYI
> >>>>
> >>>
> >
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>