Any updates on upgrading to Spark 2.x? I tried to replace the dependency and hit a compile error from implementing a Scala trait:

    org.apache.beam.runners.spark.io.SourceRDD.SourcePartition is not abstract and does not
    override abstract method org$apache$spark$Partition$$super$equals(java.lang.Object)
    in org.apache.spark.Partition

(The Spark-side change was introduced in https://github.com/apache/spark/pull/12157.)

Does anyone have ideas about this compile error?
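For context, the error suggests that in Spark 2.x the Partition trait's equals now delegates to super.equals, so scalac publishes a synthetic super-accessor method on the interface that a plain Java implementer never gets a body for. Below is a minimal, untested sketch of one possible Java-side workaround, assuming the accessor returns boolean; the class and field names are illustrative only, not Beam's actual fix:

    import org.apache.spark.Partition;

    // Sketch only: a Java Partition implementation that also supplies the
    // synthetic super-accessor javac reports as missing, plus explicit
    // equals/hashCode, since a Java class does not inherit the trait's
    // default method bodies.
    class SourcePartitionSketch implements Partition {
      private final int index;

      SourcePartitionSketch(int index) {
        this.index = index;
      }

      @Override
      public int index() {
        return index;
      }

      // Satisfies the abstract org$apache$spark$Partition$$super$equals method
      // named in the compile error; it simply forwards to Object.equals, which
      // is what the trait's own equals would end up calling anyway.
      public boolean org$apache$spark$Partition$$super$equals(Object other) {
        return super.equals(other);
      }

      @Override
      public boolean equals(Object other) {
        return this == other;
      }

      @Override
      public int hashCode() {
        // Mirrors the hashCode-is-the-index convention of Spark's Partition trait.
        return index;
      }
    }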
On Wed, May 3, 2017 at 1:32 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Hi Ted,
>
> My branch used Spark 2.1.0 and I just updated to 2.1.1.
>
> As discussed with Aviem, I should be able to create the pull request later today.
>
> Regards
> JB
>
> On 05/03/2017 02:50 AM, Ted Yu wrote:
>> Spark 2.1.1 has been released.
>>
>> Consider using the new release in this work.
>>
>> Thanks
>>
>> On Wed, Mar 29, 2017 at 5:43 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>> Cool for the PR merge, I will rebase my branch on it.
>>>
>>> Thanks!
>>> Regards
>>> JB
>>>
>>> On 03/29/2017 01:58 PM, Amit Sela wrote:
>>>> @Ted definitely makes sense.
>>>> @JB I'm merging https://github.com/apache/beam/pull/2354 soon so any
>>>> deprecated Spark API issues should be resolved.
>>>>
>>>> On Wed, Mar 29, 2017 at 2:46 PM Ted Yu <yuzhih...@gmail.com> wrote:
>>>>> This is what I did over HBASE-16179:
>>>>>
>>>>>  -        f.call((asJavaIterator(it), conn)).iterator()
>>>>>  +        // the return type is different in spark 1.x & 2.x, we handle both cases
>>>>>  +        f.call(asJavaIterator(it), conn) match {
>>>>>  +          // spark 1.x
>>>>>  +          case iterable: Iterable[R] => iterable.iterator()
>>>>>  +          // spark 2.x
>>>>>  +          case iterator: Iterator[R] => iterator
>>>>>  +        }
>>>>>           )
>>>>>
>>>>> FYI
>>>>>
>>>>> On Wed, Mar 29, 2017 at 1:47 AM, Amit Sela <amitsel...@gmail.com> wrote:
>>>>>> Just tried to replace dependencies and see what happens:
>>>>>>
>>>>>> Most required changes are about the runner using deprecated Spark APIs, and
>>>>>> after fixing them the only real issue is with the Java API for
>>>>>> Pair/FlatMapFunction, which changed its return value to Iterator (in 1.6 it's
>>>>>> Iterable).
>>>>>>
>>>>>> So I'm not sure that a profile that simply sets the dependency on 1.6.3/2.1.0
>>>>>> is feasible.
>>>>>>
>>>>>> On Thu, Mar 23, 2017 at 10:22 AM Kobi Salant <kobi.sal...@gmail.com> wrote:
>>>>>>> So, if everything is in place in Spark 2.X and we use provided dependencies
>>>>>>> for Spark in Beam, theoretically you can run the same code on 2.X without
>>>>>>> any need for a branch?
>>>>>>>
>>>>>>> 2017-03-23 9:47 GMT+02:00 Amit Sela <amitsel...@gmail.com>:
>>>>>>>> If StreamingContext is valid and we don't have to use SparkSession, and
>>>>>>>> Accumulators are valid as well and we don't need AccumulatorsV2, I don't
>>>>>>>> see a reason this shouldn't work (which means there are still tons of
>>>>>>>> reasons this could break, but I can't think of them off the top of my
>>>>>>>> head right now).
>>>>>>>>
>>>>>>>> @JB simply add a profile for the Spark dependencies and run the tests -
>>>>>>>> you'll have a very definitive answer ;-) .
>>>>>>>> If this passes, try on a cluster running Spark 2 as well.
>>>>>>>>
>>>>>>>> Let me know if I can assist.
>>>>>>>>
>>>>>>>> On Thu, Mar 23, 2017 at 6:55 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>>>> Hi guys,
>>>>>>>>>
>>>>>>>>> Ismaël summarized well what I have in mind.
>>>>>>>>>
>>>>>>>>> I'm a bit late on the PoC around that (I started a branch already).
>>>>>>>>> I will move forward over the weekend.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> JB
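Regarding the Pair/FlatMapFunction change Amit mentions above: in Spark 1.6 the Java functional interfaces return an Iterable, while in Spark 2.x they return an Iterator, so the same user code cannot compile against both without an adapter (this is the same incompatibility Ted's Scala match handles). A minimal sketch compiled against Spark 2.x, with an illustrative method name:

    import java.util.Arrays;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.FlatMapFunction;

    // Sketch of the signature difference, written for Spark 2.x where
    // FlatMapFunction<T, R>.call(T) must return an Iterator<R>. Against
    // Spark 1.6 the same call returns an Iterable<R>, so the lambda body
    // would simply be Arrays.asList(line.split(" ")).
    public class FlatMapSignatureSketch {
      static JavaRDD<String> words(JavaRDD<String> lines) {
        FlatMapFunction<String, String> splitWords =
            // Spark 2.x: adapt the Iterable produced by Arrays.asList to the
            // Iterator the new signature requires.
            line -> Arrays.asList(line.split(" ")).iterator();
        return lines.flatMap(splitWords);
      }
    }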
>>>>>>>>> On 03/22/2017 11:42 PM, Ismaël Mejía wrote:
>>>>>>>>>> Amit, I suppose JB is talking about the RDD based version, so no need
>>>>>>>>>> to worry about SparkSession or different incompatible APIs.
>>>>>>>>>>
>>>>>>>>>> Remember the idea we are discussing is to have in master both the
>>>>>>>>>> spark 1 and spark 2 runners using the RDD based translation. At the
>>>>>>>>>> same time we can have a feature branch to evolve the DataSet based
>>>>>>>>>> translator (this one will replace the RDD based translator for spark 2
>>>>>>>>>> once it is mature).
>>>>>>>>>>
>>>>>>>>>> The advantages have already been discussed as well as the possible
>>>>>>>>>> issues, so I think we have to see now if JB's idea is feasible and how
>>>>>>>>>> hard it would be to live with this while the DataSet version evolves.
>>>>>>>>>>
>>>>>>>>>> I think what we are trying to avoid is to have a long living branch
>>>>>>>>>> for a spark 2 runner based on RDD because the maintenance burden would
>>>>>>>>>> be even worse. We would have to fight not only with the double merge
>>>>>>>>>> of fixes (in case the profile idea does not work), but also with the
>>>>>>>>>> continued evolution of Beam, and we would end up in the long living
>>>>>>>>>> branch mess that other runners have dealt with (e.g. the Apex runner):
>>>>>>>>>>
>>>>>>>>>> https://lists.apache.org/thread.html/12cc086f5ffe331cc70b89322ce5416c3112b87efc3393e3e16032a2@%3Cdev.beam.apache.org%3E
>>>>>>>>>>
>>>>>>>>>> What do you think about this Amit? Would you be ok to go with it if
>>>>>>>>>> JB's profile idea proves to help with the maintenance issues?
>>>>>>>>>>
>>>>>>>>>> Ismaël
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 22, 2017 at 5:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>> hbase-spark module doesn't use SparkSession. So the situation there is
>>>>>>>>>>> simpler :-)
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 22, 2017 at 5:35 AM, Amit Sela <amitsel...@gmail.com> wrote:
>>>>>>>>>>>> I'm still wondering how we'll do this - it's not just different
>>>>>>>>>>>> implementations of the same Class, but completely different concepts,
>>>>>>>>>>>> such as using SparkSession in Spark 2 instead of
>>>>>>>>>>>> SparkContext/StreamingContext in Spark 1.
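On Amit's point above about SparkSession replacing SparkContext/StreamingContext: a rough Java sketch of what changes between the two entry points, assuming an RDD-based translation that keeps programming against JavaSparkContext, which Spark 2 still exposes underneath the session. App names and the local master are placeholders:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class EntryPointsSketch {

      // Spark 1.x style: SparkContext/StreamingContext are the entry points.
      static JavaStreamingContext spark1Style() {
        SparkConf conf = new SparkConf().setAppName("beam-on-spark-1").setMaster("local[2]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        return new JavaStreamingContext(jsc, Durations.seconds(1));
      }

      // Spark 2.x style: SparkSession is the entry point, but an RDD-based
      // translator can still pull the underlying SparkContext out of it.
      static JavaSparkContext spark2Style() {
        SparkSession session = SparkSession.builder()
            .appName("beam-on-spark-2")
            .master("local[2]")
            .getOrCreate();
        return new JavaSparkContext(session.sparkContext());
      }
    }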
>>>>>>>>>>>> On Tue, Mar 21, 2017 at 7:25 PM Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>>> I have done some work over in HBASE-16179 where compatibility modules
>>>>>>>>>>>>> are created to isolate changes in the Spark 2.x API so that code in the
>>>>>>>>>>>>> hbase-spark module can be reused.
>>>>>>>>>>>>>
>>>>>>>>>>>>> FYI
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
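Finally, on the compatibility-module approach Ted describes from HBASE-16179: the idea boils down to keeping version-specific calls behind a small facade, with one implementation module compiled per Spark version. A hypothetical Java sketch of that shape, using the Accumulator vs AccumulatorV2 split mentioned earlier in the thread; the Counter and SparkCompat names are made up for illustration and are not the actual HBase or Beam classes:

    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.util.LongAccumulator;

    // Hypothetical version-neutral facade: shared code only sees Counter and
    // SparkCompat, and each Spark-version module supplies its own implementation.
    interface Counter {
      void inc(long amount);
      long value();
    }

    interface SparkCompat {
      Counter newCounter(JavaSparkContext jsc, String name);
    }

    // Lives in the module built against Spark 2.x and wraps the AccumulatorV2-based
    // LongAccumulator. A sibling module built against Spark 1.6 would wrap the old
    // Accumulator API behind the same Counter interface.
    class Spark2Compat implements SparkCompat {
      @Override
      public Counter newCounter(JavaSparkContext jsc, String name) {
        LongAccumulator acc = jsc.sc().longAccumulator(name);
        return new Counter() {
          @Override
          public void inc(long amount) {
            acc.add(amount);
          }

          @Override
          public long value() {
            return acc.value();
          }
        };
      }
    }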