Re: [DISCUSS] Drop Spark 1.x support to focus on Spark 2.x

2017-11-26 Thread Jean-Baptiste Onofré

Hi all,

Quick update about the Spark 2.x runner: I updated the PR with the Spark 2.x 
update only:


https://github.com/apache/beam/pull/3808

I will rebase and run new tests as soon as GitBox is back.

Don't hesitate to take a look and review.

Thanks !
Regards
JB

On 11/21/2017 08:32 AM, Jean-Baptiste Onofré wrote:

Hi Tim,

I will update the PR today for a new review round. Yes, you are correct: the 
target is 2.3.0 at the end of this year (with an announcement in the Release Notes).


Regards
JB

On 11/20/2017 10:09 PM, Tim wrote:

Thanks JB

 From which release will Spark 1.x be dropped please? Is this slated for 2.3.0 
at the end of the year?


Thanks,
Tim
Sent from my iPhone


On 20 Nov 2017, at 21:21, Jean-Baptiste Onofré  wrote:

Hi,
it seems we have a consensus to upgrade to Spark 2.x, dropping Spark 1.x. I 
will upgrade the PR accordingly.


Thanks all for your input and feedback.

Regards
JB


On 11/13/2017 09:32 AM, Jean-Baptiste Onofré wrote:
Hi Beamers,
I'm forwarding this discussion & vote from the dev mailing list to the user 
mailing list.

The goal is to have your feedback as user.
Basically, we have two options:
1. Right now, in the PR, we support both Spark 1.x and 2.x using three 
artifacts (common, spark1, spark2). You, as users, pick up spark1 or spark2 
in your dependency set depending on the Spark target version you want.
2. The other option is to upgrade and focus on Spark 2.x in Beam 2.3.0. If 
you still want to use Spark 1.x, then you will be stuck at Beam 2.2.0.

Thoughts ?
Thanks !
Regards
JB
 Forwarded Message 
Subject: [VOTE] Drop Spark 1.x support to focus on Spark 2.x
Date: Wed, 8 Nov 2017 08:27:58 +0100
From: Jean-Baptiste Onofré 
Reply-To: d...@beam.apache.org
To: d...@beam.apache.org
Hi all,
as you might know, we are working on Spark 2.x support in the Spark runner.
I'm working on a PR about that:
https://github.com/apache/beam/pull/3808
Today, we have something working with both Spark 1.x and 2.x from a code 
standpoint, but I have to deal with dependencies. It's the first step of the 
update, as I'm still using RDDs; the second step would be to support DataFrames 
(but for that, I would need PCollection elements with schemas, which is 
another topic that Eugene, Reuven and I are discussing).
However, as all major distributions now ship Spark 2.x, I don't think it's 
required anymore to support Spark 1.x.
If we agree, I will update and clean up the PR to support and focus on 
Spark 2.x only.

So, that's why I'm calling for a vote:
   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
   [ ] 0 (I don't care ;))
   [ ] -1, I would like to still support Spark 1.x, and so having support of 
both Spark 1.x and 2.x (please provide specific comment)
This vote is open for 48 hours (I have the commits ready, just waiting for the 
end of the vote to push to the PR).

Thanks !
Regards
JB


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com




--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [DISCUSS] Drop Spark 1.x support to focus on Spark 2.x

2017-11-20 Thread Jean-Baptiste Onofré

Hi Tim,

I will update the PR today for a new review round. Yes, you are correct: the 
target is 2.3.0 at the end of this year (with an announcement in the Release Notes).


Regards
JB

On 11/20/2017 10:09 PM, Tim wrote:

Thanks JB

 From which release will Spark 1.x be dropped please? Is this slated for 2.3.0 
at the end of the year?

Thanks,
Tim
Sent from my iPhone


On 20 Nov 2017, at 21:21, Jean-Baptiste Onofré  wrote:

Hi,
it seems we have a consensus to upgrade to Spark 2.x, dropping Spark 1.x. I 
will upgrade the PR accordingly.

Thanks all for your input and feedback.

Regards
JB


On 11/13/2017 09:32 AM, Jean-Baptiste Onofré wrote:
Hi Beamers,
I'm forwarding this discussion & vote from the dev mailing list to the user 
mailing list.
The goal is to have your feedback as user.
Basically, we have two options:
1. Right now, in the PR, we support both Spark 1.x and 2.x using three 
artifacts (common, spark1, spark2). You, as users, pick up spark1 or spark2 in 
your dependency set depending on the Spark target version you want.
2. The other option is to upgrade and focus on Spark 2.x in Beam 2.3.0. If you 
still want to use Spark 1.x, then you will be stuck at Beam 2.2.0.
Thoughts ?
Thanks !
Regards
JB
 Forwarded Message 
Subject: [VOTE] Drop Spark 1.x support to focus on Spark 2.x
Date: Wed, 8 Nov 2017 08:27:58 +0100
From: Jean-Baptiste Onofré 
Reply-To: d...@beam.apache.org
To: d...@beam.apache.org
Hi all,
as you might know, we are working on Spark 2.x support in the Spark runner.
I'm working on a PR about that:
https://github.com/apache/beam/pull/3808
Today, we have something working with both Spark 1.x and 2.x from a code 
standpoint, but I have to deal with dependencies. It's the first step of the 
update, as I'm still using RDDs; the second step would be to support DataFrames 
(but for that, I would need PCollection elements with schemas, which is another 
topic that Eugene, Reuven and I are discussing).
However, as all major distributions now ship Spark 2.x, I don't think it's 
required anymore to support Spark 1.x.
If we agree, I will update and clean up the PR to support and focus on 
Spark 2.x only.
So, that's why I'm calling for a vote:
   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
   [ ] 0 (I don't care ;))
   [ ] -1, I would like to still support Spark 1.x, and so having support of 
both Spark 1.x and 2.x (please provide specific comment)
This vote is open for 48 hours (I have the commits ready, just waiting for the 
end of the vote to push to the PR).
Thanks !
Regards
JB


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [DISCUSS] Drop Spark 1.x support to focus on Spark 2.x

2017-11-20 Thread Tim
Thanks JB

From which release will Spark 1.x be dropped please? Is this slated for 2.3.0 
at the end of the year?

Thanks,
Tim
Sent from my iPhone

> On 20 Nov 2017, at 21:21, Jean-Baptiste Onofré  wrote:
> 
> Hi,
> 
> it seems we have a consensus to upgrade to Spark 2.x, dropping Spark 1.x. I 
> will upgrade the PR accordingly.
> 
> Thanks all for your input and feedback.
> 
> Regards
> JB
> 
>> On 11/13/2017 09:32 AM, Jean-Baptiste Onofré wrote:
>> Hi Beamers,
>> I'm forwarding this discussion & vote from the dev mailing list to the user 
>> mailing list.
>> The goal is to have your feedback as user.
>> Basically, we have two options:
>> 1. Right now, in the PR, we support both Spark 1.x and 2.x using three 
>> artifacts (common, spark1, spark2). You, as users, pick up spark1 or spark2 
>> in your dependency set depending on the Spark target version you want.
>> 2. The other option is to upgrade and focus on Spark 2.x in Beam 2.3.0. If 
>> you still want to use Spark 1.x, then you will be stuck at Beam 2.2.0.
>> Thoughts ?
>> Thanks !
>> Regards
>> JB
>>  Forwarded Message 
>> Subject: [VOTE] Drop Spark 1.x support to focus on Spark 2.x
>> Date: Wed, 8 Nov 2017 08:27:58 +0100
>> From: Jean-Baptiste Onofré 
>> Reply-To: d...@beam.apache.org
>> To: d...@beam.apache.org
>> Hi all,
>> as you might know, we are working on Spark 2.x support in the Spark runner.
>> I'm working on a PR about that:
>> https://github.com/apache/beam/pull/3808
>> Today, we have something working with both Spark 1.x and 2.x from a code 
>> standpoint, but I have to deal with dependencies. It's the first step of the 
>> update, as I'm still using RDDs; the second step would be to support DataFrames 
>> (but for that, I would need PCollection elements with schemas, which is 
>> another topic that Eugene, Reuven and I are discussing).
>> However, as all major distributions now ship Spark 2.x, I don't think it's 
>> required anymore to support Spark 1.x.
>> If we agree, I will update and clean up the PR to support and focus on 
>> Spark 2.x only.
>> So, that's why I'm calling for a vote:
>>   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
>>   [ ] 0 (I don't care ;))
>>   [ ] -1, I would like to still support Spark 1.x, and so having support of 
>> both Spark 1.x and 2.x (please provide specific comment)
>> This vote is open for 48 hours (I have the commits ready, just waiting for 
>> the end of the vote to push to the PR).
>> Thanks !
>> Regards
>> JB
> 
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: [DISCUSS] Drop Spark 1.x support to focus on Spark 2.x

2017-11-20 Thread Jean-Baptiste Onofré

Hi,

it seems we have a consensus to upgrade to Spark 2.x, dropping Spark 1.x. I will 
upgrade the PR accordingly.


Thanks all for your input and feedback.

Regards
JB

On 11/13/2017 09:32 AM, Jean-Baptiste Onofré wrote:

Hi Beamers,

I'm forwarding this discussion & vote from the dev mailing list to the user 
mailing list.

The goal is to have your feedback as user.

Basically, we have two options:
1. Right now, in the PR, we support both Spark 1.x and 2.x using three artifacts 
(common, spark1, spark2). You, as users, pick up spark1 or spark2 in your 
dependency set depending on the Spark target version you want.
2. The other option is to upgrade and focus on Spark 2.x in Beam 2.3.0. If you 
still want to use Spark 1.x, then you will be stuck at Beam 2.2.0.


Thoughts ?

Thanks !
Regards
JB


 Forwarded Message 
Subject: [VOTE] Drop Spark 1.x support to focus on Spark 2.x
Date: Wed, 8 Nov 2017 08:27:58 +0100
From: Jean-Baptiste Onofré 
Reply-To: d...@beam.apache.org
To: d...@beam.apache.org

Hi all,

as you might know, we are working on Spark 2.x support in the Spark runner.

I'm working on a PR about that:

https://github.com/apache/beam/pull/3808

Today, we have something working with both Spark 1.x and 2.x from a code 
standpoint, but I have to deal with dependencies. It's the first step of the 
update, as I'm still using RDDs; the second step would be to support DataFrames 
(but for that, I would need PCollection elements with schemas, which is another 
topic that Eugene, Reuven and I are discussing).


However, as all major distributions now ship Spark 2.x, I don't think it's 
required anymore to support Spark 1.x.


If we agree, I will update and clean up the PR to support and focus on Spark 
2.x only.


So, that's why I'm calling for a vote:

   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
   [ ] 0 (I don't care ;))
   [ ] -1, I would like to still support Spark 1.x, and so having support of 
both Spark 1.x and 2.x (please provide specific comment)


This vote is open for 48 hours (I have the commits ready, just waiting for the 
end of the vote to push to the PR).


Thanks !
Regards
JB


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [DISCUSS] Drop Spark 1.x support to focus on Spark 2.x

2017-11-15 Thread Jean-Baptiste Onofré

Any additional feedback about that ?

I will update the thread with the two branches later today: the one with Spark 
1.x & 2.x support, and the one with the Spark 2.x upgrade.


Thanks
Regards
JB

On 11/13/2017 09:32 AM, Jean-Baptiste Onofré wrote:

Hi Beamers,

I'm forwarding this discussion & vote from the dev mailing list to the user 
mailing list.

The goal is to have your feedback as user.

Basically, we have two options:
1. Right now, in the PR, we support both Spark 1.x and 2.x using three artifacts 
(common, spark1, spark2). You, as users, pick up spark1 or spark2 in your 
dependency set depending on the Spark target version you want.
2. The other option is to upgrade and focus on Spark 2.x in Beam 2.3.0. If you 
still want to use Spark 1.x, then you will be stuck at Beam 2.2.0.


Thoughts ?

Thanks !
Regards
JB


 Forwarded Message 
Subject: [VOTE] Drop Spark 1.x support to focus on Spark 2.x
Date: Wed, 8 Nov 2017 08:27:58 +0100
From: Jean-Baptiste Onofré 
Reply-To: d...@beam.apache.org
To: d...@beam.apache.org

Hi all,

as you might know, we are working on Spark 2.x support in the Spark runner.

I'm working on a PR about that:

https://github.com/apache/beam/pull/3808

Today, we have something working with both Spark 1.x and 2.x from a code 
standpoint, but I have to deal with dependencies. It's the first step of the 
update, as I'm still using RDDs; the second step would be to support DataFrames 
(but for that, I would need PCollection elements with schemas, which is another 
topic that Eugene, Reuven and I are discussing).


However, as all major distributions now ship Spark 2.x, I don't think it's 
required anymore to support Spark 1.x.


If we agree, I will update and clean up the PR to support and focus on Spark 
2.x only.


So, that's why I'm calling for a vote:

   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
   [ ] 0 (I don't care ;))
   [ ] -1, I would like to still support Spark 1.x, and so having support of 
both Spark 1.x and 2.x (please provide specific comment)


This vote is open for 48 hours (I have the commits ready, just waiting for the 
end of the vote to push to the PR).


Thanks !
Regards
JB


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [DISCUSS] Drop Spark 1.x support to focus on Spark 2.x

2017-11-13 Thread Jean-Baptiste Onofré

Hi Tim,

Basically, if a user still wants to use Spark 1.x, he would just be "stuck" 
with Beam 2.2.0.


I would like to see Beam 2.3.0 at the end of December/beginning of January with 
Spark 2.x support (either exclusive or alongside 1.x).


The goal of the discussion is just to know whether it's worth maintaining both 
Spark 1.x and 2.x, or whether I can do the cleanup to support only 2.x ;)


Regards
JB

On 11/13/2017 10:56 AM, Tim Robertson wrote:

Thanks JB

On "thoughts":

- Cloudera 5.13 will still default to 1.6 even though a 2.2 parcel is available 
(HWX provides both)
- Cloudera support for Spark 2 has a list of exceptions 
(https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html)

   - I am not sure if the HBaseIO would be affected
   - I am not sure if structured streaming would have implications
   - it might stop customers from being able to run Spark 2 at all due to 
support agreements

- Spark 2.3 (EOY) will drop Scala 2.10 support
- IBM's now defunct distro only has 1.6
- Oozie doesn't have a Spark 2 action (need to use a shell action)
- There are a lot of folks with code running on 1.3, 1.4 and 1.5 still
- Spark 2.2+ requires Java 8 too, while <2.2 was Java 7 like Beam (not sure if this 
has other implications for the cross-deployment nature of Beam)


My first impressions of Beam were really favourable, as it all just worked first 
time on a CDH Spark 1.6 cluster. For us it is a lack of resources to refactor 
legacy code that delays the 2.2 push.


With that said, I think it is very reasonable to have a clear cut-off in Beam, 
especially if it limits progress / causes headaches in packaging, robustness 
etc. I'd recommend putting it in a 6-month timeframe, which might align with 2.3?


Hope this helps,
Tim


On Mon, Nov 13, 2017 at 10:07 AM, Neville Dipale wrote:


Hi JB,


   [X ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
   [ ] 0 (I don't care ;))
   [ ] -1, I would like to still support Spark 1.x, and so having support of
both Spark 1.x and 2.x (please provide specific comment)

On 13 November 2017 at 10:32, Jean-Baptiste Onofré wrote:

Hi Beamers,

I'm forwarding this discussion & vote from the dev mailing list to the
user mailing list.
The goal is to have your feedback as user.

Basically, we have two options:
1. Right now, in the PR, we support both Spark 1.x and 2.x using three
artifacts (common, spark1, spark2). You, as users, pick up spark1 or
spark2 in your dependency set depending on the Spark target version you
want.
2. The other option is to upgrade and focus on Spark 2.x in Beam 2.3.0.
If you still want to use Spark 1.x, then you will be stuck at Beam
2.2.0.

Thoughts ?

Thanks !
Regards
JB


 Forwarded Message 
Subject: [VOTE] Drop Spark 1.x support to focus on Spark 2.x
Date: Wed, 8 Nov 2017 08:27:58 +0100
From: Jean-Baptiste Onofré
Reply-To: d...@beam.apache.org 
To: d...@beam.apache.org 

Hi all,

as you might know, we are working on Spark 2.x support in the Spark 
runner.

I'm working on a PR about that:

https://github.com/apache/beam/pull/3808


Today, we have something working with both Spark 1.x and 2.x from a code
standpoint, but I have to deal with dependencies. It's the first step of
the update, as I'm still using RDDs; the second step would be to support
DataFrames (but for that, I would need PCollection elements with schemas,
which is another topic that Eugene, Reuven and I are discussing).

However, as all major distributions now ship Spark 2.x, I don't think
it's required anymore to support Spark 1.x.

If we agree, I will update and clean up the PR to support and focus
on Spark 2.x only.

So, that's why I'm calling for a vote:

   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
   [ ] 0 (I don't care ;))
   [ ] -1, I would like to still support Spark 1.x, and so having
support of both Spark 1.x and 2.x (please provide specific comment)

This vote is open for 48 hours (I have the commits ready, just waiting
for the end of the vote to push to the PR).

Thanks !
Regards
JB
-- 
Jean-Baptiste Onofré

jbono...@apache.org 
http://blog.nanthrax.net
Talend - http://www.talend.com





--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [DISCUSS] Drop Spark 1.x support to focus on Spark 2.x

2017-11-13 Thread Tim Robertson
Thanks JB

On "thoughts":

- Cloudera 5.13 will still default to 1.6 even though a 2.2 parcel is
available (HWX provides both)
- Cloudera support for Spark 2 has a list of exceptions 
(https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html)
  - I am not sure if the HBaseIO would be affected
  - I am not sure if structured streaming would have implications
  - it might stop customers from being able to run Spark 2 at all due to
support agreements
- Spark 2.3 (EOY) will drop Scala 2.10 support
- IBM's now defunct distro only has 1.6
- Oozie doesn't have a Spark 2 action (need to use a shell action)
- There are a lot of folks with code running on 1.3, 1.4 and 1.5 still
- Spark 2.2+ requires Java 8 too, while <2.2 was Java 7 like Beam (not sure if
this has other implications for the cross-deployment nature of Beam)

My first impressions of Beam were really favourable, as it all just worked
first time on a CDH Spark 1.6 cluster. For us it is a lack of resources to
refactor legacy code that delays the 2.2 push.

With that said, I think it is very reasonable to have a clear cut-off in
Beam, especially if it limits progress / causes headaches in packaging,
robustness etc. I'd recommend putting it in a 6-month timeframe, which
might align with 2.3?

Hope this helps,
Tim


On Mon, Nov 13, 2017 at 10:07 AM, Neville Dipale 
wrote:

> Hi JB,
>
>
>   [X ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
>   [ ] 0 (I don't care ;))
>   [ ] -1, I would like to still support Spark 1.x, and so having support
> of both Spark 1.x and 2.x (please provide specific comment)
>
> On 13 November 2017 at 10:32, Jean-Baptiste Onofré 
> wrote:
>
>> Hi Beamers,
>>
>> I'm forwarding this discussion & vote from the dev mailing list to the
>> user mailing list.
>> The goal is to have your feedback as user.
>>
>> Basically, we have two options:
>> 1. Right now, in the PR, we support both Spark 1.x and 2.x using three
>> artifacts (common, spark1, spark2). You, as users, pick up spark1 or spark2
>> in your dependency set depending on the Spark target version you want.
>> 2. The other option is to upgrade and focus on Spark 2.x in Beam 2.3.0.
>> If you still want to use Spark 1.x, then you will be stuck at Beam
>> 2.2.0.
>>
>> Thoughts ?
>>
>> Thanks !
>> Regards
>> JB
>>
>>
>>  Forwarded Message 
>> Subject: [VOTE] Drop Spark 1.x support to focus on Spark 2.x
>> Date: Wed, 8 Nov 2017 08:27:58 +0100
>> From: Jean-Baptiste Onofré 
>> Reply-To: d...@beam.apache.org
>> To: d...@beam.apache.org
>>
>> Hi all,
>>
>> as you might know, we are working on Spark 2.x support in the Spark
>> runner.
>>
>> I'm working on a PR about that:
>>
>> https://github.com/apache/beam/pull/3808
>>
>> Today, we have something working with both Spark 1.x and 2.x from a code
>> standpoint, but I have to deal with dependencies. It's the first step of
>> the update, as I'm still using RDDs; the second step would be to support
>> DataFrames (but for that, I would need PCollection elements with schemas,
>> which is another topic that Eugene, Reuven and I are discussing).
>>
>> However, as all major distributions now ship Spark 2.x, I don't think
>> it's required anymore to support Spark 1.x.
>>
>> If we agree, I will update and clean up the PR to support and focus
>> on Spark 2.x only.
>>
>> So, that's why I'm calling for a vote:
>>
>>   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
>>   [ ] 0 (I don't care ;))
>>   [ ] -1, I would like to still support Spark 1.x, and so having support
>> of both Spark 1.x and 2.x (please provide specific comment)
>>
>> This vote is open for 48 hours (I have the commits ready, just waiting
>> for the end of the vote to push to the PR).
>>
>> Thanks !
>> Regards
>> JB
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>


Re: [DISCUSS] Drop Spark 1.x support to focus on Spark 2.x

2017-11-13 Thread Neville Dipale
Hi JB,


  [X ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
  [ ] 0 (I don't care ;))
  [ ] -1, I would like to still support Spark 1.x, and so having support of
both Spark 1.x and 2.x (please provide specific comment)

On 13 November 2017 at 10:32, Jean-Baptiste Onofré  wrote:

> Hi Beamers,
>
> I'm forwarding this discussion & vote from the dev mailing list to the
> user mailing list.
> The goal is to have your feedback as user.
>
> Basically, we have two options:
> 1. Right now, in the PR, we support both Spark 1.x and 2.x using three
> artifacts (common, spark1, spark2). You, as users, pick up spark1 or spark2
> in your dependency set depending on the Spark target version you want.
> 2. The other option is to upgrade and focus on Spark 2.x in Beam 2.3.0. If
> you still want to use Spark 1.x, then you will be stuck at Beam 2.2.0.
>
> Thoughts ?
>
> Thanks !
> Regards
> JB
>
>
>  Forwarded Message 
> Subject: [VOTE] Drop Spark 1.x support to focus on Spark 2.x
> Date: Wed, 8 Nov 2017 08:27:58 +0100
> From: Jean-Baptiste Onofré 
> Reply-To: d...@beam.apache.org
> To: d...@beam.apache.org
>
> Hi all,
>
> as you might know, we are working on Spark 2.x support in the Spark runner.
>
> I'm working on a PR about that:
>
> https://github.com/apache/beam/pull/3808
>
> Today, we have something working with both Spark 1.x and 2.x from a code
> standpoint, but I have to deal with dependencies. It's the first step of
> the update, as I'm still using RDDs; the second step would be to support
> DataFrames (but for that, I would need PCollection elements with schemas,
> which is another topic that Eugene, Reuven and I are discussing).
>
> However, as all major distributions now ship Spark 2.x, I don't think it's
> required anymore to support Spark 1.x.
>
> If we agree, I will update and clean up the PR to support and focus on
> Spark 2.x only.
>
> So, that's why I'm calling for a vote:
>
>   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
>   [ ] 0 (I don't care ;))
>   [ ] -1, I would like to still support Spark 1.x, and so having support
> of both Spark 1.x and 2.x (please provide specific comment)
>
> This vote is open for 48 hours (I have the commits ready, just waiting for
> the end of the vote to push to the PR).
>
> Thanks !
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


[DISCUSS] Drop Spark 1.x support to focus on Spark 2.x

2017-11-13 Thread Jean-Baptiste Onofré

Hi Beamers,

I'm forwarding this discussion & vote from the dev mailing list to the user 
mailing list.

The goal is to have your feedback as user.

Basically, we have two options:
1. Right now, in the PR, we support both Spark 1.x and 2.x using three artifacts 
(common, spark1, spark2). You, as users, pick up spark1 or spark2 in your 
dependency set depending on the Spark target version you want (see the runner 
sketch below).
2. The other option is to upgrade and focus on Spark 2.x in Beam 2.3.0. If you 
still want to use Spark 1.x, then you will be stuck at Beam 2.2.0.
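
Whichever option wins, the Beam pipeline code itself does not change; only the 
Spark runner artifact you put on the classpath does. As a rough illustration 
only (a minimal sketch against the Beam Java SDK of that time; the input/output 
paths, the master URL and the spark1/spark2 module names are placeholders, not 
the final layout from the PR), running a pipeline on the Spark runner looks 
like this:

import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class SparkRunnerSketch {
  public static void main(String[] args) {
    // Option 1 would mean choosing a spark1 or spark2 runner module as a
    // dependency (placeholder names); option 2 leaves a single Spark 2.x-only
    // runner. The pipeline code below stays the same either way.
    SparkPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .as(SparkPipelineOptions.class);
    options.setRunner(SparkRunner.class);
    options.setSparkMaster("local[2]"); // placeholder: local test run

    Pipeline p = Pipeline.create(options);
    p.apply("ReadLines", TextIO.read().from("/tmp/input.txt"))
     .apply("WriteLines", TextIO.write().to("/tmp/output"));
    p.run().waitUntilFinish();
  }
}

In other words, the choice mostly affects your dependency set, not the 
user-facing API.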


Thoughts ?

Thanks !
Regards
JB


 Forwarded Message 
Subject: [VOTE] Drop Spark 1.x support to focus on Spark 2.x
Date: Wed, 8 Nov 2017 08:27:58 +0100
From: Jean-Baptiste Onofré 
Reply-To: d...@beam.apache.org
To: d...@beam.apache.org

Hi all,

as you might know, we are working on Spark 2.x support in the Spark runner.

I'm working on a PR about that:

https://github.com/apache/beam/pull/3808

Today, we have something working with both Spark 1.x and 2.x from a code 
standpoint, but I have to deal with dependencies. It's the first step of the 
update, as I'm still using RDDs; the second step would be to support DataFrames 
(but for that, I would need PCollection elements with schemas, which is another 
topic that Eugene, Reuven and I are discussing).
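
To make the two steps concrete: the RDD API that the runner currently targets 
exists in both Spark 1.x and 2.x, while the DataFrame/Dataset API that the 
second step would target is only reachable through the Spark 2.x SparkSession. 
A minimal sketch in plain Spark (nothing Beam-specific; app name and master 
are placeholders):

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class RddVsDataset {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("rdd-vs-dataset").setMaster("local[2]");

    // First step: the RDD API, present in Spark 1.x and 2.x alike, so an
    // RDD-based translation can compile against either version.
    JavaSparkContext jsc = new JavaSparkContext(conf);
    JavaRDD<String> words = jsc.parallelize(Arrays.asList("beam", "spark", "runner"));
    System.out.println("RDD count: " + words.count());
    jsc.stop();

    // Second step: the SparkSession/Dataset API, Spark 2.x only, which a
    // DataFrame-based translation (once PCollections carry schemas) would use.
    SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
    Dataset<Row> rows = spark.range(3).toDF("id");
    System.out.println("Dataset count: " + rows.count());
    spark.stop();
  }
}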


However, as all major distributions now ship Spark 2.x, I don't think it's 
required anymore to support Spark 1.x.


If we agree, I will update and clean up the PR to support and focus on Spark 
2.x only.


So, that's why I'm calling for a vote:

  [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
  [ ] 0 (I don't care ;))
  [ ] -1, I would like to still support Spark 1.x, and so having support of 
both Spark 1.x and 2.x (please provide specific comment)


This vote is open for 48 hours (I have the commits ready, just waiting for the 
end of the vote to push to the PR).


Thanks !
Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com