(PS CDH5 runs fine with Java 8, but I understand your more general point.)

This is a familiar situation indeed, but in that context, would a group
that doesn't want to update to Java 8 want to manually put Spark 2.0
into the mix? That is, if the cluster is purposefully some stable mix
of components, would you be updating just one?

You make a good point about Scala being more a library than an
infrastructure component, so it can be updated on a per-app basis.
While it's harder to handle different Scala versions from the
framework side, it's less hard on the deployment side.
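
For illustration (a sketch, not from the thread itself): the per-app
choice is just which Scala-suffixed artifact the application builds
against, since Spark publishes one per supported Scala version:

    groupId:    org.apache.spark
    artifactId: spark-core_2.10  (or spark-core_2.11)
    version:    1.6.1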

On Thu, Mar 24, 2016 at 4:27 PM, Koert Kuipers <ko...@tresata.com> wrote:
> i think the arguments are convincing, but it also makes me wonder if i live
> in some kind of alternate universe... we deploy on customers' clusters, where
> the OS, python version, java version and hadoop distro are not chosen by us.
> so think centos 6, cdh5 or hdp 2.3, java 7 and python 2.6. we simply have
> access to a single proxy machine and launch through yarn. asking them to
> upgrade java is pretty much out of the question, or a 6+ month ordeal. of the
> 10 client clusters i can think of off the top of my head, all of them are on
> java 7 and none are on java 8. so by doing this you would make spark 2
> basically unusable for us (unless most of them have plans to upgrade to
> java 8 in the near term; i will ask around and report back...).
>
> on a side note, it's particularly interesting to me that spark 2 chose to
> continue support for scala 2.10, because even for us, in our very constrained
> client environments, the scala version is something we can easily upgrade (we
> just deploy a custom build of spark for the relevant scala version and
> hadoop distro). and because scala is not a dependency of any hadoop distro
> (so it's not on the classpath, which i am very happy about) we can use
> whatever scala version we like. also, i found the upgrade path from scala
> 2.10 to 2.11 to be very easy, so i have a hard time understanding why anyone
> would stay on scala 2.10. and finally, with scala 2.12 around the corner, you
> really don't want to be supporting 3 versions. so clearly i am missing
> something here.
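>
> for reference, a custom build for one scala version and hadoop distro is
> a two-liner (a sketch; the yarn/hadoop profiles here are illustrative and
> depend on the distro):
>
>     # switch the build to scala 2.11, then build for yarn + hadoop 2.6
>     ./dev/change-scala-version.sh 2.11
>     ./build/mvn -Pyarn -Phadoop-2.6 -Dscala-2.11 -DskipTests clean package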
>
>
>
> On Thu, Mar 24, 2016 at 8:52 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>>
>> +1 to support Java 8 (and future) *only* in Spark 2.0, and end support of
>> Java 7. It makes sense.
>>
>> Regards
>> JB
>>
>>
>> On 03/24/2016 08:27 AM, Reynold Xin wrote:
>>>
>>> About a year ago we decided to drop Java 6 support in Spark 1.5. I am
>>> wondering if we should also just drop Java 7 support in Spark 2.0 (i.e.
>>> Spark 2.0 would require Java 8 to run).
>>>
>>> Oracle ended public updates for JDK 7 about a year ago (Apr 2015), and
>>> removed public downloads for JDK 7 in July 2015. In the past I've
>>> actually been against dropping Java 7, but today I ran into an issue
>>> with the new Dataset API not working well with Java 8 lambdas, and that
>>> changed my opinion on this.
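>>>
>>> (A sketch of what's at stake, assuming ds is a Dataset<Integer>; the
>>> variable names below are illustrative:)
>>>
>>>     // imports assumed:
>>>     //   import org.apache.spark.api.java.function.MapFunction;
>>>     //   import org.apache.spark.sql.Dataset;
>>>     //   import org.apache.spark.sql.Encoders;
>>>
>>>     // Java 8: a lambda, cast to select the MapFunction overload of map()
>>>     Dataset<Integer> doubled =
>>>         ds.map((MapFunction<Integer, Integer>) x -> x * 2, Encoders.INT());
>>>
>>>     // Java 7: the same operation needs an anonymous inner class
>>>     Dataset<Integer> doubled7 = ds.map(new MapFunction<Integer, Integer>() {
>>>         @Override
>>>         public Integer call(Integer x) {
>>>             return x * 2;
>>>         }
>>>     }, Encoders.INT());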
>>>
>>> I've been thinking more about this issue today and also talked with a
>>> lot people offline to gather feedback, and I actually think the pros
>>> outweighs the cons, for the following reasons (in some rough order of
>>> importance):
>>>
>>> 1. It is complicated to test how well Spark APIs work with Java lambdas
>>> if we support Java 7. Jenkins machines need to have both Java 7 and Java
>>> 8 installed, and we must run through a set of test suites in Java 7 and
>>> then the lambda tests in Java 8. This complicates build
>>> environments/scripts and makes them less robust. Without good testing
>>> infrastructure, I have no confidence in building good APIs for Java 8.
>>>
>>> 2. Dataset/DataFrame performance will be between 1x and 10x slower in
>>> Java 7. The primary APIs we want users to use in Spark 2.x are
>>> Dataset/DataFrame, and this impacts pretty much everything from machine
>>> learning to structured streaming. We have made great progress on their
>>> performance through extensive use of code generation. (In many
>>> dimensions, Spark 2.0 with DataFrames/Datasets looks more like a compiler
>>> than a MapReduce or query engine.) These optimizations don't work well
>>> in Java 7 due to broken code cache flushing; that problem has been fixed
>>> by Oracle in Java 8. In addition, Java 8 comes with better support for
>>> Unsafe and SIMD.
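>>>
>>> (A purely illustrative side note: codegen-heavy jobs put pressure on the
>>> JVM code cache, and a common mitigation is to enlarge it, e.g. in
>>> spark-defaults.conf -- the 512m figure below is just an example:)
>>>
>>>     spark.driver.extraJavaOptions    -XX:ReservedCodeCacheSize=512m
>>>     spark.executor.extraJavaOptions  -XX:ReservedCodeCacheSize=512m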
>>>
>>> 3. Scala 2.12 will come out soon, and we will want to add support for
>>> that. Scala 2.12 only works on Java 8. If we do support Java 7, we'd
>>> have a fairly complicated compatibility matrix and testing
>>> infrastructure.
>>>
>>> 4. There are libraries that I've looked into in the past that support
>>> only Java 8. This is more common in high-performance libraries such as
>>> Aeron (a messaging library). Having to support Java 7 means we are not
>>> able to use these. It is not that big of a deal right now, but it will
>>> become increasingly difficult as we optimize performance.
>>>
>>>
>>> The downside of not supporting Java 7 is also obvious. Some
>>> organizations are stuck with Java 7, and they wouldn't be able to use
>>> Spark 2.0 without upgrading Java.
>>>
>>>
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
