I tested a workaround with reflection and it seems to work (at least it compiles ;)).
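JB does not describe the reflection workaround here, so the following is only one possible shape for it. It assumes the goal is to call a function's `call` method without knowing at compile time whether it returns an `Iterable` (Spark 1.6) or an `Iterator` (Spark 2.x). `IterableFn`, `IteratorFn` and `callAsIterator` are hypothetical names, not Spark or Beam classes.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

public class ReflectiveCall {

    // Hypothetical stand-ins for the Spark 1.6 (Iterable) and 2.x (Iterator) shapes.
    // In a real runner only one of these shapes would exist on the classpath.
    public interface IterableFn {
        Iterable<String> call(String s);
    }

    public interface IteratorFn {
        Iterator<String> call(String s);
    }

    // Looks up a one-argument method named "call" on the interfaces the function
    // directly implements, invokes it reflectively, and normalizes the result to an
    // Iterator whether that version returns an Iterable or an Iterator.
    @SuppressWarnings("unchecked")
    public static <R> Iterator<R> callAsIterator(Object fn, Object input) {
        for (Class<?> iface : fn.getClass().getInterfaces()) {
            for (Method m : iface.getMethods()) {
                if (!m.getName().equals("call") || m.getParameterCount() != 1) {
                    continue;
                }
                Object result;
                try {
                    result = m.invoke(fn, input);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Reflective call failed", e);
                }
                if (result instanceof Iterator) {
                    return (Iterator<R>) result;
                }
                if (result instanceof Iterable) {
                    return ((Iterable<R>) result).iterator();
                }
                throw new IllegalStateException("Unexpected result type: " + m.getReturnType());
            }
        }
        throw new IllegalArgumentException("No one-argument call(...) found on " + fn.getClass());
    }

    public static void main(String[] args) {
        IterableFn v1Style = s -> Arrays.asList(s.split(" "));
        IteratorFn v2Style = s -> Collections.singletonList(s).iterator();

        Iterator<String> a = callAsIterator(v1Style, "x y");
        Iterator<String> b = callAsIterator(v2Style, "z");
        System.out.println(a.next() + a.next() + b.next()); // xyz
    }
}
```

Reflection lets a single binary work against either shape, at the cost of compile-time checking and a method lookup on every call; a real implementation would likely cache the `Method`.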

I will share the PR asap.

Regards
JB

On 03/29/2017 10:47 AM, Amit Sela wrote:
Just tried to replace dependencies and see what happens:

Most required changes are about the runner using deprecated Spark APIs, and
after fixing them the only real issue is with the Java API for
Pair/FlatMapFunction, which changed its return type to Iterator (in 1.6 it's
Iterable).
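To illustrate the incompatibility: logic written against the 1.6 `Iterable` shape can be adapted to the 2.x `Iterator` shape by calling `iterator()` on each result. This is a minimal sketch using hypothetical, simplified stand-in interfaces (`FlatMapFunctionV1`, `FlatMapFunctionV2`, `toV2`) rather than the real Spark types, which also declare `throws Exception`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FlatMapShapes {

    // Hypothetical, simplified stand-in for the Spark 1.6 shape: call() returns an Iterable.
    public interface FlatMapFunctionV1<T, R> {
        Iterable<R> call(T t);
    }

    // Hypothetical, simplified stand-in for the Spark 2.x shape: call() returns an Iterator.
    public interface FlatMapFunctionV2<T, R> {
        Iterator<R> call(T t);
    }

    // Wraps an Iterable-returning function so it satisfies the Iterator-returning
    // shape by calling iterator() on each result.
    public static <T, R> FlatMapFunctionV2<T, R> toV2(FlatMapFunctionV1<T, R> fn) {
        return t -> fn.call(t).iterator();
    }

    public static void main(String[] args) {
        // Logic written once against the Iterable shape.
        FlatMapFunctionV1<String, String> splitWords = line -> Arrays.asList(line.split(" "));

        // Adapted for a consumer that expects the Iterator shape.
        FlatMapFunctionV2<String, String> adapted = toV2(splitWords);
        List<String> out = new ArrayList<>();
        adapted.call("hello beam spark").forEachRemaining(out::add);
        System.out.println(out); // [hello, beam, spark]
    }
}
```

Source that implements `FlatMapFunction` directly still compiles against only one of the two signatures, which is why a profile that only switches the dependency version is not enough on its own.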

So I'm not sure that a profile that simply sets dependency on 1.6.3/2.1.0
is feasible.

On Thu, Mar 23, 2017 at 10:22 AM Kobi Salant <kobi.sal...@gmail.com> wrote:

So, if everything is in place in Spark 2.X and we use provided dependencies
for Spark in Beam, then theoretically you can run the same code on 2.X
without any need for a branch?

2017-03-23 9:47 GMT+02:00 Amit Sela <amitsel...@gmail.com>:

If StreamingContext is valid and we don't have to use SparkSession, and
Accumulators are valid as well and we don't need AccumulatorV2, I don't
see a reason this shouldn't work (which means there are still tons of
reasons this could break, but I can't think of them off the top of my
head right now).

@JB, simply add a profile for the Spark dependencies and run the tests -
you'll have a very definitive answer ;-).
If this passes, try it on a cluster running Spark 2 as well.

Let me know if I can assist.

On Thu, Mar 23, 2017 at 6:55 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

Hi guys,

Ismaël summarized what I have in mind well.

I'm a bit late on the PoC around that (I already started a branch).
I will move it forward over the weekend.

Regards
JB

On 03/22/2017 11:42 PM, Ismaël Mejía wrote:
Amit, I suppose JB is talking about the RDD based version, so no need
to worry about SparkSession or different incompatible APIs.

Remember that the idea we are discussing is to have in master both the
Spark 1 and Spark 2 runners using the RDD-based translation. At the
same time we can have a feature branch to evolve the Dataset-based
translator (this one will replace the RDD-based translator for Spark 2
once it is mature).

The advantages have already been discussed, as well as the possible
issues, so I think we now have to see whether JB's idea is feasible and
how hard it would be to live with this while the Dataset version evolves.

I think what we are trying to avoid is a long-living branch for a
Spark 2 runner based on RDD, because the maintenance burden would be
even worse. We would have to fight not only with the double merge of
fixes (in case the profile idea does not work), but also with the
continuous evolution of Beam, and we would end up in the long-living
branch mess that other runners have dealt with (e.g. the Apex runner):


https://lists.apache.org/thread.html/12cc086f5ffe331cc70b89322ce5416c3112b87efc3393e3e16032a2@%3Cdev.beam.apache.org%3E

What do you think about this, Amit? Would you be OK going with it if
JB's profile idea proves to help with the maintenance issues?

Ismaël



On Wed, Mar 22, 2017 at 5:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:
The hbase-spark module doesn't use SparkSession, so the situation there
is simpler :-)

On Wed, Mar 22, 2017 at 5:35 AM, Amit Sela <amitsel...@gmail.com> wrote:

I'm still wondering how we'll do this - it's not just different
implementations of the same class, but completely different concepts,
such as using SparkSession in Spark 2 instead of
SparkContext/StreamingContext in Spark 1.

On Tue, Mar 21, 2017 at 7:25 PM Ted Yu <yuzhih...@gmail.com> wrote:

I have done some work over in HBASE-16179, where compatibility modules
are created to isolate changes in the Spark 2.x API so that code in the
hbase-spark module can be reused.

FYI
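The compatibility-module idea can be sketched as a version-neutral interface that shared code depends on, with one implementation per Spark version chosen at build or run time. This is a minimal sketch of the general pattern, not code from HBASE-16179; `SparkCompat`, `Spark1Compat`, `Spark2Compat` and `forVersion` are hypothetical names, and both implementations sit in one file only for illustration (in a real build each would be compiled against its own Spark version).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CompatLayer {

    // Version-neutral interface that shared runner code would depend on.
    // Hypothetical: it only illustrates the "compatibility module" idea.
    public interface SparkCompat {
        String versionLabel();

        // Turns values produced by shared code into the shape the target Spark
        // version expects a flat-map style function to return.
        Object adaptFlatMapResult(Iterable<?> values);
    }

    // In a real build this would live in its own module compiled against Spark 1.6.
    static final class Spark1Compat implements SparkCompat {
        @Override public String versionLabel() { return "1.x"; }
        @Override public Object adaptFlatMapResult(Iterable<?> values) { return values; }
    }

    // In a real build this would live in its own module compiled against Spark 2.x.
    static final class Spark2Compat implements SparkCompat {
        @Override public String versionLabel() { return "2.x"; }
        @Override public Object adaptFlatMapResult(Iterable<?> values) { return values.iterator(); }
    }

    // Chooses an implementation from a version string (assumed to come from the
    // build profile or from the Spark version detected at runtime).
    public static SparkCompat forVersion(String sparkVersion) {
        return sparkVersion.startsWith("1.") ? new Spark1Compat() : new Spark2Compat();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b");
        SparkCompat v1 = forVersion("1.6.3");
        SparkCompat v2 = forVersion("2.1.0");
        System.out.println(v1.versionLabel() + " " + (v1.adaptFlatMapResult(words) instanceof Iterable)); // 1.x true
        System.out.println(v2.versionLabel() + " " + (v2.adaptFlatMapResult(words) instanceof Iterator)); // 2.x true
    }
}
```

The shared code only ever sees `SparkCompat`, so version-specific differences stay confined to the small per-version modules.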



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com




