The burden may be a little more apparent in the day-to-day merging and fixing of breaks. The upside, though, may be the more compelling argument. For example, lambda-fying all the Java code, supporting java.time, and taking advantage of some newer Hadoop/YARN APIs is a moderate win for users too, and there's also a cost to not doing that.
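(Purely to illustrate the "lambda-fying" point -- this is a made-up sketch, not actual Spark source; the class and values are invented for the example -- the same Java API call shrinks considerably once Java 8 is assumed:)

    // Illustrative only: the kind of Java 7 -> Java 8 cleanup being discussed.
    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    public class LambdaExample {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("lambda-example").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> words = sc.parallelize(Arrays.asList("spark", "java"));

        // Java 7: anonymous inner class boilerplate.
        JavaRDD<Integer> lengths7 = words.map(new Function<String, Integer>() {
          @Override
          public Integer call(String s) {
            return s.length();
          }
        });

        // Java 8: the same transformation as a lambda.
        JavaRDD<Integer> lengths8 = words.map(s -> s.length());

        System.out.println(lengths7.collect());
        System.out.println(lengths8.collect());
        sc.stop();
      }
    }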
I must say I don't see the risk of fragmentation as nearly the problem it's made out to be here. We are, after all, discussing _beginning_ to remove support _in 6 months_, for long-since non-current versions of things. An org's decision not to use, say, Java 8 is a decision not to use the new versions of lots of things. It's not clear this is a constituency that is either large or one we can reasonably serve indefinitely.

In the end, the Scala issue may be decisive. Supporting 2.10 through 2.12 simultaneously is a bridge too far, and if 2.12 requires Java 8, that's a good reason for Spark to require Java 8. And Steve suggests that means a minimum of Hadoop 2.6 too. (I still profess ignorance of the Python part of the issue.)

Put another way, I'm not sure what the criteria are, if not the above. I support deprecating all of these things, at the least, in 2.1.0. Although it's a separate question, I believe it's going to be necessary to remove support in ~6 months, in 2.2.0.

On Thu, Oct 27, 2016 at 4:36 PM Matei Zaharia <matei.zaha...@gmail.com> wrote:

> Just to comment on this, I'm generally against removing these types of
> things unless they create a substantial burden on project contributors. It
> doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 might,
> but then of course we need to wait for 2.12 to be out and stable.
>
> In general, this type of stuff only hurts users, and doesn't have a huge
> impact on Spark contributors' productivity (sure, it's a bit unpleasant,
> but that's life). If we break compatibility this way too quickly, we
> fragment the user community, and then either people have a crappy
> experience with Spark because their corporate IT doesn't yet have an
> environment that can run the latest version, or worse, they create more
> maintenance burden for us because they ask for more patches to be
> backported to old Spark versions (1.6.x, 2.0.x, etc). Python in particular
> is pretty fundamental to many Linux distros.
>
> In the future, rather than just looking at when some software came out, it
> may be good to have some criteria for when to drop support for something.
> For example, if there are really nice libraries in Python 2.7 or Java 8
> that we're missing out on, that may be a good reason. The maintenance
> burden for multiple Scala versions is definitely painful, but I also think
> we should always support the latest two Scala releases.
>
> Matei
>
> On Oct 27, 2016, at 12:15 PM, Reynold Xin <r...@databricks.com> wrote:
>
> I created a JIRA ticket to track this:
> https://issues.apache.org/jira/browse/SPARK-18138
>
> On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran <ste...@hortonworks.com>
> wrote:
>
> On 27 Oct 2016, at 10:03, Sean Owen <so...@cloudera.com> wrote:
>
> Seems OK by me.
> How about Hadoop < 2.6, Python 2.6? Those seem more removable. I'd like
> to add that to a list of things that will begin to be unsupported 6 months
> from now.
>
> If you go to Java 8 only, then Hadoop 2.6+ is mandatory.
>
> On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>
> that sounds good to me
>
> On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <r...@databricks.com> wrote:
>
> We can do the following concrete proposal:
>
> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr
> 2017).
>
> 2. In the Spark 2.1.0 release, aggressively and explicitly announce the
> deprecation of Java 7 / Scala 2.10 support.
>
> (a) It should appear in the release notes and in documentation that
> mentions how to build Spark.
>
> (b) A warning should be shown every time SparkContext is started using
> Scala 2.10 or Java 7.
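For what it's worth, here is a rough sketch of what the runtime warning in (b) could look like. The class and method names are hypothetical, not existing Spark code, and the exact wiring into SparkContext startup is left open:

    // Hypothetical sketch of a deprecation warning emitted at SparkContext startup.
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class DeprecationWarnings {
      private static final Logger log = LoggerFactory.getLogger(DeprecationWarnings.class);

      // Hypothetical helper, imagined as being called once when a SparkContext starts.
      static void warnOnDeprecatedVersions() {
        String javaSpec = System.getProperty("java.specification.version");
        String scalaVersion = scala.util.Properties$.MODULE$.versionNumberString();
        if ("1.7".equals(javaSpec)) {
          log.warn("Support for Java 7 is deprecated as of Spark 2.1.0 and will be removed in Spark 2.2.0.");
        }
        if (scalaVersion.startsWith("2.10")) {
          log.warn("Support for Scala 2.10 is deprecated as of Spark 2.1.0 and will be removed in Spark 2.2.0.");
        }
      }
    }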