That's a good point about Oozie only supporting Spark 1 or Spark 2 at a time on a cluster -- but do we know of people using Oozie and Spark 1 who would still be on Spark 1 by the time of the next Beam release? The last Spark 1 release was a year ago (and the last non-maintenance release almost 20 months ago).
On Wed, Nov 8, 2017 at 9:30 PM, NerdyNick <[email protected]> wrote:

> I don't know if ditching Spark 1 outright right now would be a great move, given that a lot of the main supporting applications around Spark haven't fully moved to Spark 2 yet, let alone have support for having a cluster with both. Oozie, for example, is still pre-stable-release for its Spark 1 support and can't support a cluster with mixed Spark versions. I think doing as suggested above with the common, spark1, spark2 packaging might be best during this carry-over phase. Maybe even flagging Spark 1 as deprecated and maintenance-only would be enough.
>
> On Wed, Nov 8, 2017 at 10:25 PM, Holden Karau <[email protected]> wrote:
>
> > Also, upgrading Spark 1 to 2 is generally easier than changing JVM versions. For folks using YARN or the hosted environments it's pretty much trivial, since you can effectively have distinct Spark clusters for each job.
> >
> > On Wed, Nov 8, 2017 at 9:19 PM, Holden Karau <[email protected]> wrote:
> >
> > > I'm +1 on dropping Spark 1. There are a lot of exciting improvements in Spark 2, and trying to write efficient code that runs on both Spark 1 and Spark 2 is super painful in the long term. It would be one thing if there were a lot of people available to work on the Spark runners, but it seems our energy would be better spent focusing on the future.
> > >
> > > I don't know a lot of folks who are stuck on Spark 1, and the few that I know are planning to migrate in the next few months anyway.
> > >
> > > Note: this is a non-binding vote as I'm not a committer or PMC member.
> > >
> > > On Wed, Nov 8, 2017 at 3:43 AM, Ted Yu <[email protected]> wrote:
> > >
> > > > Having both Spark1 and Spark2 modules would benefit a wider user base.
> > > >
> > > > I would vote for that.
> > > >
> > > > Cheers
> > > >
> > > > On Wed, Nov 8, 2017 at 12:51 AM, Jean-Baptiste Onofré <[email protected]> wrote:
> > > >
> > > > > Hi Robert,
> > > > >
> > > > > Thanks for your feedback!
> > > > >
> > > > > From a user perspective, with the current state of the PR, the same pipelines can run on both Spark 1.x and 2.x: the only difference is the dependency set.
> > > > >
> > > > > I'm calling the vote to get exactly this kind of feedback: if we consider that Spark 1.x still needs to be supported, no problem, I will improve the PR to have three modules (common, spark1, spark2) and let users pick the desired version.
> > > > >
> > > > > Let's wait a bit for other feedback, and I will update the PR accordingly.
> > > > >
> > > > > Regards
> > > > > JB
> > > > >
> > > > > On 11/08/2017 09:47 AM, Robert Bradshaw wrote:
> > > > >
> > > > > > I'm generally a -0.5 on this change, or at least on doing it so hastily.
> > > > > >
> > > > > > As with dropping Java 7 support, I think we should at least announce in the release notes that we're considering dropping support in the subsequent release, as this dev list likely does not reach a substantial portion of the userbase.
> > > > > >
> > > > > > How much work is it to move from a Spark 1.x cluster to a Spark 2.x cluster? I get the feeling it's not nearly as transparent as upgrading Java versions. Can Spark 1.x pipelines be run on Spark 2.x clusters, or is a new cluster (and/or upgrading all pipelines) required (e.g. for those who operate Spark clusters shared among their many users)?
> > > > > >
> > > > > > It looks like the latest release of Spark 1.x was about a year ago, overlapping a bit with the 2.x series, which is coming up on 1.5 years old, so I could see a lot of people still using 1.x even if 2.x is clearly the future. But it sure doesn't seem very backwards compatible.
> > > > > >
> > > > > > Mostly I'm not comfortable with dropping 1.x in the same release as adding support for 2.x, giving no transition period, but I could be convinced if this transition is mostly a no-op or no one's still using 1.x. If there are non-trivial code complexity issues, I would perhaps revisit the idea of having a single Spark runner that chooses the backend implicitly, in favor of simply having two runners which share the code that's easy to share and diverge otherwise (which seems like it would be much simpler both to implement and to explain to users). I would be OK with even letting the Spark 1.x runner be somewhat stagnant (e.g. few or no new features) until we decide we can kill it off.
> > > > > >
> > > > > > On Tue, Nov 7, 2017 at 11:27 PM, Jean-Baptiste Onofré <[email protected]> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > As you might know, we are working on Spark 2.x support in the Spark runner.
> > > > > > >
> > > > > > > I'm working on a PR about that:
> > > > > > >
> > > > > > > https://github.com/apache/beam/pull/3808
> > > > > > >
> > > > > > > Today, we have something working with both Spark 1.x and 2.x from a code standpoint, but I have to deal with dependencies. It's the first step of the update, as I'm still using RDDs; the second step would be to support DataFrames (but for that, I would need PCollection elements with schemas, which is another topic that Eugene, Reuven and I are discussing).
> > > > > > >
> > > > > > > However, as all major distributions now ship Spark 2.x, I don't think it's required anymore to support Spark 1.x.
> > > > > > >
> > > > > > > If we agree, I will update and clean up the PR to only support and focus on Spark 2.x.
> > > > > > >
> > > > > > > So, that's why I'm calling for a vote:
> > > > > > >
> > > > > > > [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
> > > > > > > [ ] 0 (I don't care ;))
> > > > > > > [ ] -1, I would like to still support Spark 1.x, and so have support for both Spark 1.x and 2.x (please provide a specific comment)
> > > > > > >
> > > > > > > This vote is open for 48 hours (I have the commits ready, just waiting for the end of the vote to push them to the PR).
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Regards
> > > > > > > JB
> > > > > > > --
> > > > > > > Jean-Baptiste Onofré
> > > > > > > [email protected]
> > > > > > > http://blog.nanthrax.net
> > > > > > > Talend - http://www.talend.com
> > > > >
> > > > > --
> > > > > Jean-Baptiste Onofré
> > > > > [email protected]
> > > > > http://blog.nanthrax.net
> > > > > Talend - http://www.talend.com
> > >
> > > --
> > > Twitter: https://twitter.com/holdenkarau
> >
> > --
> > Twitter: https://twitter.com/holdenkarau
>
> --
> Nick Verbeck - NerdyNick
> ----------------------------------------------------
> NerdyNick.com
> TrailsOffroad.com
> NoKnownBoundaries.com

--
Twitter: https://twitter.com/holdenkarau
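[Editor's note] To illustrate JB's point above that "the same pipelines can run on both Spark 1.x and 2.x: the only difference is the dependency set", here is a minimal sketch (not taken from the thread) of a runner-agnostic Beam pipeline in Java. The pipeline never references Spark directly; the runner is selected at launch time (e.g. with --runner=SparkRunner) and the Spark version would be determined by whichever runner module is on the classpath. The class name and the spark1/spark2 module names in the comments are illustrative assumptions based on the (common, spark1, spark2) split proposed in the thread, not released artifacts.

    // Minimal, hypothetical sketch: the pipeline code stays the same whether a
    // "spark1" or "spark2" runner module (names per the proposal above, not
    // real artifacts) is on the classpath; only the dependency and the
    // --runner flag passed at launch change.
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;

    public class RunnerAgnosticPipeline {
      public static void main(String[] args) {
        // e.g. pass --runner=SparkRunner to run on whichever Spark runner
        // module was provided; with no flag the direct runner is used.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        // A trivial transform chain: create a few elements and upper-case them.
        p.apply(Create.of("drop", "spark", "one"))
         .apply(ParDo.of(new DoFn<String, String>() {
           @ProcessElement
           public void processElement(ProcessContext c) {
             c.output(c.element().toUpperCase());
           }
         }));

        p.run().waitUntilFinish();
      }
    }

Under the proposed split, users would keep exactly this kind of code and swap only the runner dependency when moving from Spark 1.x to 2.x.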
