You will need to replace the return value of the callback with an Iterator.

On Wed, Mar 29, 2017, 12:19 Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
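The change described above — Spark 2.x's Java `FlatMapFunction.call()` returns an `Iterator` where Spark 1.6's returned an `Iterable` — can be sketched without a Spark dependency. The two interfaces below are simplified local stand-ins for `org.apache.spark.api.java.function.FlatMapFunction` in each version (the real interfaces also declare `throws Exception`), not the actual Spark classes:

```java
import java.util.Arrays;
import java.util.Iterator;

public class FlatMapSignatureChange {

    // Simplified stand-in for Spark 1.6's FlatMapFunction: call() returns Iterable.
    interface FlatMapFunctionSpark1<T, R> {
        Iterable<R> call(T t);
    }

    // Simplified stand-in for Spark 2.x's FlatMapFunction: call() returns Iterator.
    interface FlatMapFunctionSpark2<T, R> {
        Iterator<R> call(T t);
    }

    // Spark 1.6 style: return the Iterable directly.
    static final FlatMapFunctionSpark1<String, String> SPARK1_STYLE =
        line -> Arrays.asList(line.split(" "));

    // Spark 2.x style: the same logic must now hand back an Iterator.
    static final FlatMapFunctionSpark2<String, String> SPARK2_STYLE =
        line -> Arrays.asList(line.split(" ")).iterator();

    public static void main(String[] args) {
        System.out.println(SPARK1_STYLE.call("a b").iterator().next()); // prints "a"
        System.out.println(SPARK2_STYLE.call("a b").next());            // prints "a"
    }
}
```

The body of a user function is often unchanged; only the trailing `.iterator()` (or equivalent) is new, which is why a single source tree targeting both versions is awkward — the overriding method's return type must match the compiled-against Spark version.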
> I tested a workaround with reflection and it seems to work (at least it
> compiles ;)).
>
> I will share the PR asap.
>
> Regards
> JB
>
> On 03/29/2017 10:47 AM, Amit Sela wrote:
> > Just tried to replace the dependencies and see what happens:
> >
> > Most of the required changes are about the runner using deprecated Spark
> > APIs, and after fixing them the only real issue is with the Java API for
> > Pair/FlatMapFunction, which changed its return value to Iterator (in 1.6
> > it is Iterable).
> >
> > So I'm not sure that a profile that simply sets the dependency on
> > 1.6.3/2.1.0 is feasible.
> >
> > On Thu, Mar 23, 2017 at 10:22 AM Kobi Salant <kobi.sal...@gmail.com>
> > wrote:
> >
> >> So, if everything is in place in Spark 2.X and we use provided
> >> dependencies for Spark in Beam, then theoretically you can run the same
> >> code on 2.X without any need for a branch?
> >>
> >> 2017-03-23 9:47 GMT+02:00 Amit Sela <amitsel...@gmail.com>:
> >>
> >>> If StreamingContext is valid and we don't have to use SparkSession, and
> >>> Accumulators are valid as well and we don't need AccumulatorV2, I don't
> >>> see a reason this shouldn't work (which means there are still tons of
> >>> reasons this could break, but I can't think of them off the top of my
> >>> head right now).
> >>>
> >>> @JB simply add a profile for the Spark dependencies and run the tests -
> >>> you'll have a very definitive answer ;-).
> >>> If this passes, try on a cluster running Spark 2 as well.
> >>>
> >>> Let me know if I can assist.
> >>>
> >>> On Thu, Mar 23, 2017 at 6:55 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> >>> wrote:
> >>>
> >>>> Hi guys,
> >>>>
> >>>> Ismaël summarized well what I have in mind.
> >>>>
> >>>> I'm a bit late on the PoC around that (I started a branch already).
> >>>> I will move forward over the weekend.
> >>>>
> >>>> Regards
> >>>> JB
> >>>>
> >>>> On 03/22/2017 11:42 PM, Ismaël Mejía wrote:
> >>>>> Amit, I suppose JB is talking about the RDD-based version, so no need
> >>>>> to worry about SparkSession or different incompatible APIs.
> >>>>>
> >>>>> Remember, the idea we are discussing is to have in master both the
> >>>>> Spark 1 and Spark 2 runners using the RDD-based translation. At the
> >>>>> same time we can have a feature branch to evolve the Dataset-based
> >>>>> translator (this one will replace the RDD-based translator for
> >>>>> Spark 2 once it is mature).
> >>>>>
> >>>>> The advantages have already been discussed, as well as the possible
> >>>>> issues, so I think we have to see now if JB's idea is feasible and
> >>>>> how hard it would be to live with this while the Dataset version
> >>>>> evolves.
> >>>>>
> >>>>> I think what we are trying to avoid is to have a long-living branch
> >>>>> for a Spark 2 runner based on RDDs, because the maintenance burden
> >>>>> would be even worse. We would have to fight not only with the double
> >>>>> merge of fixes (in case the profile idea does not work), but also
> >>>>> with the continued evolution of Beam, and we would end up in the
> >>>>> long-living branch mess that other runners have dealt with (e.g. the
> >>>>> Apex runner):
> >>>>>
> >>>>> https://lists.apache.org/thread.html/12cc086f5ffe331cc70b89322ce5416c3112b87efc3393e3e16032a2@%3Cdev.beam.apache.org%3E
> >>>>>
> >>>>> What do you think about this, Amit? Would you be OK to go with it if
> >>>>> JB's profile idea proves to help with the maintenance issues?
> >>>>>
> >>>>> Ismaël
> >>>>>
> >>>>> On Wed, Mar 22, 2017 at 5:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>>> The hbase-spark module doesn't use SparkSession.
> >>>>>> So the situation there is simpler :-)
> >>>>>>
> >>>>>> On Wed, Mar 22, 2017 at 5:35 AM, Amit Sela <amitsel...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> I'm still wondering how we'll do this - it's not just different
> >>>>>>> implementations of the same class, but completely different
> >>>>>>> concepts, such as using SparkSession in Spark 2 instead of
> >>>>>>> SparkContext/StreamingContext in Spark 1.
> >>>>>>>
> >>>>>>> On Tue, Mar 21, 2017 at 7:25 PM Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> I have done some work over in HBASE-16179, where compatibility
> >>>>>>>> modules are created to isolate changes in the Spark 2.x API so
> >>>>>>>> that code in the hbase-spark module can be reused.
> >>>>>>>>
> >>>>>>>> FYI
> >>>>>>>
> >>>>
> >>>> --
> >>>> Jean-Baptiste Onofré
> >>>> jbono...@apache.org
> >>>> http://blog.nanthrax.net
> >>>> Talend - http://www.talend.com
> >>>>
> >>>
> >>
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
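The thread does not show the code behind JB's reflection-based workaround, so the following is only a plausible sketch of the general idea, with hypothetical names of my own choosing: a small adapter that inspects the runtime type of whatever the user callback returned and normalizes it to the `Iterator` that Spark 2.x expects. This is runtime type checking rather than full `java.lang.reflect` use, and it is not the actual code from JB's PR:

```java
import java.util.Arrays;
import java.util.Iterator;

public class ResultAdapter {

    // Accepts whatever a user callback returned - an Iterable under the
    // Spark 1.6 signature or an Iterator under the Spark 2.x one - and
    // normalizes it to the Iterator form.
    @SuppressWarnings("unchecked")
    static <T> Iterator<T> toIterator(Object callbackResult) {
        if (callbackResult instanceof Iterable) {
            // Spark 1.6-style result: unwrap the Iterable.
            return ((Iterable<T>) callbackResult).iterator();
        }
        if (callbackResult instanceof Iterator) {
            // Spark 2.x-style result: pass through unchanged.
            return (Iterator<T>) callbackResult;
        }
        throw new IllegalArgumentException(
            "Expected Iterable or Iterator, got: " + callbackResult.getClass());
    }

    public static void main(String[] args) {
        Iterator<String> fromIterable = toIterator(Arrays.asList("x", "y"));
        Iterator<String> fromIterator = toIterator(Arrays.asList("x", "y").iterator());
        System.out.println(fromIterable.next() + fromIterator.next()); // prints "xx"
    }
}
```

An adapter like this only helps on the runner's side, where results are consumed; the harder problem discussed in the thread remains that a user-visible `FlatMapFunction` implementation cannot compile against both return types at once, which is what motivates the per-version Maven profile idea.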