BTW maybe one key point that isn't obvious is that with YARN and Mesos, the 
version of Spark used can be solely up to the developer who writes an app, not 
to the cluster administrator. So even in very conservative orgs, developers can 
download a new version of Spark, run it, and demonstrate value, which is good 
both for them and for the project. On the other hand, if they were stuck with, 
say, Spark 1.3, they'd have a much worse experience and perhaps get a worse 
impression of the project.

Matei

> On Oct 28, 2016, at 9:58 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> 
> Deprecating them is fine (and I know they're already deprecated); the 
> question is just whether to remove them. For example, what exactly is the 
> downside of keeping Python 2.6 or Java 7 support right now? If the cost is 
> high, then we can remove them, but I just haven't seen a ton of details. It 
> also sounded like fairly recent versions of CDH, HDP, RHEL, etc. still ship 
> old versions of these.
> 
> Just talking with users, I've seen many people who say "we have a Hadoop 
> cluster from $VENDOR, but we just download Spark from Apache and run newer 
> versions of that". That's great for Spark IMO, and we need to stay compatible 
> even with somewhat older Hadoop installs because they are time-consuming to 
> update. Having the whole community on a small set of versions leads to a 
> better experience for everyone and also to more of a "network effect": more 
> people can battle-test new versions, answer questions about them online, 
> write libraries that easily reach the majority of Spark users, etc.
> 
> Matei
> 
>> On Oct 27, 2016, at 11:51 PM, Ofir Manor <ofir.ma...@equalum.io> wrote:
>> 
>> I totally agree with Sean, just a small correction:
>> Java 7 and Python 2.6 have already been deprecated since Spark 2.0 (after a 
>> lengthy discussion), so there is no need to discuss whether they should be 
>> deprecated in 2.1:
>>   http://spark.apache.org/releases/spark-release-2-0-0.html#deprecations
>> The discussion is whether Scala 2.10 should also be marked as deprecated (no 
>> one is objecting to that) and, more importantly, when to actually move from 
>> deprecation to dropping support for any combination of JDK / Scala / Hadoop 
>> / Python.
>> 
>> Ofir Manor
>> 
>> Co-Founder & CTO | Equalum
>> 
>> 
>> Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io
>> On Fri, Oct 28, 2016 at 12:13 AM, Sean Owen <so...@cloudera.com> wrote:
>> The burden may be a little more apparent when dealing with the day to day 
>> merging and fixing of breaks. The upside is maybe the more compelling 
>> argument though. For example, lambda-fying all the Java code, supporting 
>> java.time, and taking advantage of some newer Hadoop/YARN APIs is a moderate 
>> win for users too, and there's also a cost to not doing that.
>> 
>> I must say I don't see the risk of fragmentation as nearly the problem it's 
>> made out to be here. We are, after all, discussing _beginning_ to remove 
>> support _in 6 months_, for versions that have long since stopped being 
>> current. An org's decision not to use, say, Java 8 is a decision not to use 
>> the new versions of lots of things. It's not clear this is a constituency 
>> that is either large or one we can reasonably serve indefinitely.
>> 
>> In the end, the Scala issue may be decisive. Supporting 2.10 through 2.12 
>> simultaneously is a bridge too far, and if 2.12 requires Java 8, that's a 
>> good reason for Spark to require Java 8. And Steve suggests that means a 
>> minimum of Hadoop 2.6 too. (I still profess ignorance of the Python part of 
>> the issue.)
>> 
>> Put another way, I am not sure what the criteria are, if not the above.
>> 
>> I support deprecating all of these things, at the least, in 2.1.0. Although 
>> it's a separate question, I believe it's going to be necessary to remove 
>> support in ~6 months in 2.2.0.
>> 
>> 
>> On Thu, Oct 27, 2016 at 4:36 PM Matei Zaharia <matei.zaha...@gmail.com> wrote:
>> Just to comment on this, I'm generally against removing these types of 
>> things unless they create a substantial burden on project contributors. It 
>> doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 might, 
>> but then of course we need to wait for 2.12 to be out and stable.
>> 
>> In general, this type of stuff only hurts users, and doesn't have a huge 
>> impact on Spark contributors' productivity (sure, it's a bit unpleasant, but 
>> that's life). If we break compatibility this way too quickly, we fragment 
>> the user community, and then either people have a crappy experience with 
>> Spark because their corporate IT doesn't yet have an environment that can 
>> run the latest version, or worse, they create more maintenance burden for us 
>> because they ask for more patches to be backported to old Spark versions 
>> (1.6.x, 2.0.x, etc). Python in particular is pretty fundamental to many 
>> Linux distros.
>> 
>> In the future, rather than just looking at when some software came out, it 
>> may be good to have some criteria for when to drop support for something. 
>> For example, if there are really nice libraries in Python 2.7 or Java 8 that 
>> we're missing out on, that may be a good reason. The maintenance burden for 
>> multiple Scala versions is definitely painful but I also think we should 
>> always support the latest two Scala releases.
>> 
>> Matei
>> 
>>> On Oct 27, 2016, at 12:15 PM, Reynold Xin <r...@databricks.com> wrote:
>>> 
>>> I created a JIRA ticket to track this: 
>>> https://issues.apache.org/jira/browse/SPARK-18138
>>> 
>>> 
>>> 
>>> On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran <ste...@hortonworks.com> wrote:
>>> 
>>>> On 27 Oct 2016, at 10:03, Sean Owen <so...@cloudera.com> wrote:
>>>> 
>>>> Seems OK by me.
>>>> How about Hadoop < 2.6 and Python 2.6? Those seem more removable. I'd like 
>>>> to add that to a list of things that will begin to be unsupported 6 months 
>>>> from now.
>>>> 
>>> 
>>> If you go to Java 8 only, then Hadoop 2.6+ is mandatory. 
>>> 
>>> 
>>>> On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>>>> that sounds good to me
>>>> 
>>>> On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <r...@databricks.com> wrote:
>>>> We can do the following concrete proposal:
>>>> 
>>>> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr 
>>>> 2017).
>>>> 
>>>> 2. In the Spark 2.1.0 release, aggressively and explicitly announce the 
>>>> deprecation of Java 7 / Scala 2.10 support.
>>>> 
>>>> (a) It should appear in the release notes and in any documentation that 
>>>> mentions how to build Spark,
>>>> 
>>>> (b) and a warning should be shown every time SparkContext is started using 
>>>> Scala 2.10 or Java 7.
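>>>> 
>>>> As a rough sketch only -- hypothetical code, not something that exists in 
>>>> Spark today; the real check would live wherever SparkContext validates its 
>>>> environment, and it assumes an slf4j logger on the classpath (which Spark 
>>>> already depends on) -- the warning could look something like:
>>>> 
>>>>   // Sketch: warn at startup when running on a deprecated Java or Scala version.
>>>>   object DeprecationWarnings {
>>>>     private val log = org.slf4j.LoggerFactory.getLogger(getClass)
>>>> 
>>>>     def warnOnDeprecatedVersions(): Unit = {
>>>>       // "1.7" on a Java 7 JVM, "1.8" on Java 8
>>>>       val javaVersion = System.getProperty("java.specification.version")
>>>>       if (javaVersion == "1.7") {
>>>>         log.warn("Support for Java 7 is deprecated as of Spark 2.1.0 " +
>>>>           "and may be removed in Spark 2.2.0.")
>>>>       }
>>>>       // e.g. "2.10.6" when built against Scala 2.10
>>>>       val scalaVersion = scala.util.Properties.versionNumberString
>>>>       if (scalaVersion.startsWith("2.10")) {
>>>>         log.warn("Support for Scala 2.10 is deprecated as of Spark 2.1.0 " +
>>>>           "and may be removed in Spark 2.2.0.")
>>>>       }
>>>>     }
>>>>   }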
>>>> 
>>> 
>>> 
>> 
>> 
> 
