The burden may be a little more apparent in the day-to-day merging and
fixing of breaks. The upside, though, may be the more compelling argument.
For example, lambda-fying all the Java code, supporting java.time, and
taking advantage of some newer Hadoop/YARN APIs would be a moderate win for
users too, and there's also a cost to not doing that.
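
To make that concrete, here is a rough sketch of the kind of cleanup Java 8
enables. This is plain standard-library Java, not code from the Spark repo:
anonymous classes collapse into lambdas / method references, and java.time
replaces the old Date / SimpleDateFormat pair.

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class Java8Cleanup {
  public static void main(String[] args) {
    List<String> modules =
        new ArrayList<>(Arrays.asList("streaming", "sql", "core"));

    // Pre-Java 8: an anonymous inner class just to pass a one-line comparator.
    Collections.sort(modules, new Comparator<String>() {
      @Override
      public int compare(String a, String b) {
        return Integer.compare(a.length(), b.length());
      }
    });

    // Java 8: the same comparator as a method reference.
    modules.sort(Comparator.comparingInt(String::length));

    // java.time replaces the mutable java.util.Date / SimpleDateFormat pair.
    LocalDate target =
        LocalDate.parse("2017-03-01", DateTimeFormatter.ISO_LOCAL_DATE);
    System.out.println(modules + " -> " + target.plusMonths(1));
  }
}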

I must say I don't see the risk of fragmentation as nearly the problem it's
made out to be here. We are, after all, here discussing _beginning_ to
remove support _in 6 months_, for long-since non-current versions of
things. An org's decision not to, say, use Java 8 is a decision not to use
the new versions of lots of things. It's not clear this constituency is
either large or one we can reasonably serve indefinitely.

In the end, the Scala issue may be decisive. Supporting 2.10 through 2.12
simultaneously is a bridge too far, and if 2.12 requires Java 8, that's a
good reason for Spark to require Java 8. And Steve suggests that means a
minimum of Hadoop 2.6 too. (I still profess ignorance of the Python part of
the issue.)

Put another way, I'm not sure what the criteria are, if not the above.

I support deprecating all of these things, at the least, in 2.1.0. Although
it's a separate question, I believe it will be necessary to remove support
in ~6 months, in 2.2.0.
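
For what it's worth, the 2.1.0 warning Reynold describes below could be as
simple as a version check at startup. A hypothetical sketch, not actual
Spark code; the message text is just for illustration:

public class DeprecationCheck {
  public static void main(String[] args) {
    // "java.specification.version" reports "1.7" on Java 7 and "1.8" on Java 8.
    String javaVersion = System.getProperty("java.specification.version");
    if ("1.7".equals(javaVersion)) {
      System.err.println("WARNING: Support for Java 7 is deprecated as of "
          + "Spark 2.1.0 and may be removed in Spark 2.2.0.");
    }
  }
}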


On Thu, Oct 27, 2016 at 4:36 PM Matei Zaharia <matei.zaha...@gmail.com>
wrote:

> Just to comment on this, I'm generally against removing these types of
> things unless they create a substantial burden on project contributors. It
> doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 might,
> but then of course we need to wait for 2.12 to be out and stable.
>
> In general, this type of stuff only hurts users, and doesn't have a huge
> impact on Spark contributors' productivity (sure, it's a bit unpleasant,
> but that's life). If we break compatibility this way too quickly, we
> fragment the user community, and then either people have a crappy
> experience with Spark because their corporate IT doesn't yet have an
> environment that can run the latest version, or worse, they create more
> maintenance burden for us because they ask for more patches to be
> backported to old Spark versions (1.6.x, 2.0.x, etc). Python in particular
> is pretty fundamental to many Linux distros.
>
> In the future, rather than just looking at when some software came out, it
> may be good to have some criteria for when to drop support for something.
> For example, if there are really nice libraries in Python 2.7 or Java 8
> that we're missing out on, that may be a good reason. The maintenance
> burden for multiple Scala versions is definitely painful but I also think
> we should always support the latest two Scala releases.
>
> Matei
>
> On Oct 27, 2016, at 12:15 PM, Reynold Xin <r...@databricks.com> wrote:
>
> I created a JIRA ticket to track this:
> https://issues.apache.org/jira/browse/SPARK-18138
>
>
>
> On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran <ste...@hortonworks.com>
> wrote:
>
>
> On 27 Oct 2016, at 10:03, Sean Owen <so...@cloudera.com> wrote:
>
> Seems OK by me.
> How about Hadoop < 2.6, Python 2.6? Those seem more removable. I'd like
> to add that to a list of things that will begin to be unsupported 6 months
> from now.
>
>
> If you go to Java 8 only, then Hadoop 2.6+ is mandatory.
>
>
> On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>
> that sounds good to me
>
> On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <r...@databricks.com> wrote:
>
> We can do the following concrete proposal:
>
> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr
> 2017).
>
> 2. In the Spark 2.1.0 release, aggressively and explicitly announce the
> deprecation of Java 7 / Scala 2.10 support.
>
> (a) It should appear in the release notes and in documentation that
> mentions how to build Spark,
>
> (b) and a warning should be shown every time SparkContext is started using
> Scala 2.10 or Java 7.
