i think the arguments are convincing, but it also makes me wonder if i live
in some kind of alternate universe... we deploy on customers' clusters,
where the OS, python version, java version and hadoop distro are not chosen
by us. think centos 6, cdh5 or hdp 2.3, java 7 and python 2.6. we simply
have access to a single proxy machine and launch through yarn. asking them
to upgrade java is pretty much out of the question, or a 6+ month ordeal.
of the 10 client clusters i can think of off the top of my head, all of
them are on java 7 and none are on java 8. so by doing this you would make
spark 2 basically unusable for us (unless most of them plan to upgrade to
java 8 in the near term; i will ask around and report back...).

on a side note, it's particularly interesting to me that spark 2 chose to
continue support for scala 2.10, because even in our very constrained
client environments the scala version is something we can easily upgrade
(we just deploy a custom build of spark for the relevant scala version and
hadoop distro). and because scala is not a dependency of any hadoop distro
(so it's not on the classpath, which i am very happy about), we can use
whatever scala version we like. also, i found the upgrade path from scala
2.10 to 2.11 to be very easy, so i have a hard time understanding why
anyone would stay on scala 2.10. and finally, with scala 2.12 around the
corner, you really don't want to be supporting 3 scala versions at once.
so clearly i am missing something here.



On Thu, Mar 24, 2016 at 8:52 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> +1 to support Java 8 (and future) *only* in Spark 2.0, and end support of
> Java 7. It makes sense.
>
> Regards
> JB
>
>
> On 03/24/2016 08:27 AM, Reynold Xin wrote:
>
>> About a year ago we decided to drop Java 6 support in Spark 1.5. I am
>> wondering if we should also just drop Java 7 support in Spark 2.0 (i.e.
>> Spark 2.0 would require Java 8 to run).
>>
>> Oracle ended public updates for JDK 7 one year ago (Apr 2015), and
>> removed public downloads for JDK 7 in July 2015. In the past I've
>> actually been against dropping Java 7, but today I ran into an issue
>> with the new Dataset API not working well with Java 8 lambdas, and that
>> changed my opinion on this.
>>
>> I've been thinking more about this issue today and also talked with a
>> lot of people offline to gather feedback, and I actually think the pros
>> outweigh the cons, for the following reasons (in some rough order of
>> importance):
>>
>> 1. It is complicated to test how well Spark APIs work for Java lambdas
>> if we support Java 7. Jenkins machines need to have both Java 7 and Java
>> 8 installed, we must run the main test suites on Java 7, and then run
>> the lambda tests on Java 8. This complicates build environments/scripts
>> and makes them less robust. Without good testing infrastructure, I have
>> no confidence in building good APIs for Java 8.
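To make the dual-testing burden concrete, here is a minimal plain-Java sketch. This is not Spark's actual API: `MapFunction` and `map` are illustrative stand-ins for a Spark-style Java function interface and a functional-style API method. The same operation has to be exercised both ways, in the anonymous-class form that Java 7 callers must use and in the lambda form that only Java 8 callers can use:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LambdaCompat {
    // Illustrative stand-in for a Spark-style Java function interface
    // (not Spark's actual MapFunction).
    public interface MapFunction<T, U> {
        U call(T value);
    }

    // Illustrative stand-in for a functional-style API method.
    public static <T, U> List<U> map(List<T> input, MapFunction<T, U> f) {
        List<U> out = new ArrayList<U>();
        for (T t : input) {
            out.add(f.call(t));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3);

        // Java 7 callers: anonymous inner class, the only option without lambdas.
        List<Integer> viaAnonClass = map(nums, new MapFunction<Integer, Integer>() {
            public Integer call(Integer v) { return v * 2; }
        });

        // Java 8 callers: a lambda targeting the same interface.
        List<Integer> viaLambda = map(nums, v -> v * 2);

        System.out.println(viaAnonClass); // prints [2, 4, 6]
        System.out.println(viaLambda);    // prints [2, 4, 6]
    }
}
```

Supporting both call styles means the test matrix has to cover both: the anonymous-class form compiled and run under Java 7, and the lambda form under Java 8, which is the build-infrastructure burden described above.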
>>
>> 2. Dataset/DataFrame performance will be between 1x and 10x slower in
>> Java 7. The primary APIs we want users to use in Spark 2.x are
>> Dataset/DataFrame, and this impacts pretty much everything from machine
>> learning to structured streaming. We have made great progress in their
>> performance through extensive use of code generation. (In many
>> dimensions Spark 2.0 with DataFrames/Datasets looks more like a compiler
>> than a MapReduce or query engine.) These optimizations don't work well
>> in Java 7 due to broken code cache flushing. This problem has been fixed
>> by Oracle in Java 8. In addition, Java 8 comes with better support for
>> Unsafe and SIMD.
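For intuition on why code generation changes the performance picture, here is a toy sketch only; this is not Spark's Tungsten generator, and the `Expr`/`FUSED` names are made up for illustration. Evaluating an expression tree costs one virtual call per node for every row, while a generated version collapses the whole tree into one flat method that the JIT can compile and keep hot, which is exactly where JDK 7's broken code cache flushing hurt:

```java
import java.util.function.IntUnaryOperator;

public class CodegenSketch {
    // Interpreted path: an expression tree evaluated node by node,
    // one virtual call per node for every row.
    public interface Expr { int eval(int row); }

    public static final Expr COL = new Expr() {          // a "column" leaf
        public int eval(int row) { return row; }
    };
    public static final Expr PLUS_ONE = new Expr() {     // col + 1
        public int eval(int row) { return COL.eval(row) + 1; }
    };
    public static final Expr TIMES_TWO = new Expr() {    // (col + 1) * 2
        public int eval(int row) { return PLUS_ONE.eval(row) * 2; }
    };

    // "Generated" path: the same expression fused into one straight-line
    // method, analogous in spirit to what code generation emits.
    public static final IntUnaryOperator FUSED = row -> (row + 1) * 2;

    public static void main(String[] args) {
        for (int row = 0; row < 3; row++) {
            // Both paths compute the same value; only their shape
            // (and JIT-friendliness) differs.
            System.out.println(TIMES_TWO.eval(row) + " == " + FUSED.applyAsInt(row));
        }
    }
}
```

The benefit of the fused form only materializes if the JIT can compile it and keep the compiled code around, which is why the code cache fix in Java 8 matters for this style of engine.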
>>
>> 3. Scala 2.12 will come out soon, and we will want to add support for
>> that. Scala 2.12 only works on Java 8. If we do support Java 7, we'd
>> have a fairly complicated compatibility matrix and testing infrastructure.
>>
>> 4. There are libraries that I've looked into in the past that support
>> only Java 8. This is more common in high performance libraries such as
>> Aeron (a messaging library). Having to support Java 7 means we are not
>> able to use these. It is not that big of a deal right now, but it will
>> become increasingly difficult as we optimize performance.
>>
>>
>> The downside of not supporting Java 7 is also obvious. Some
>> organizations are stuck with Java 7, and they wouldn't be able to use
>> Spark 2.0 without upgrading Java.
>>
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>
