> On 24 Mar 2016, at 07:27, Reynold Xin <r...@databricks.com> wrote:
> 
> About a year ago we decided to drop Java 6 support in Spark 1.5. I am 
> wondering if we should also just drop Java 7 support in Spark 2.0 (i.e. Spark 
> 2.0 would require Java 8 to run).
> 
> Oracle ended public updates for JDK 7 a year ago (Apr 2015), and removed 
> public downloads for JDK 7 in July 2015.

Still there; Jan 2016 was the last public one.

> In the past I've actually been against dropping Java 7, but today I ran into 
> an issue with the new Dataset API not working well with Java 8 lambdas, and 
> that changed my opinion on this.
> 
> I've been thinking more about this issue today and also talked with a lot of 
> people offline to gather feedback, and I actually think the pros outweigh 
> the cons, for the following reasons (in some rough order of importance):
> 
> 1. It is complicated to test how well Spark APIs work for Java lambdas if we 
> support Java 7. Jenkins machines need to have both Java 7 and Java 8 
> installed and we must run through a set of test suites in Java 7, and then 
> the lambda tests in Java 8. This complicates build environments/scripts, and 
> makes them less robust. Without good testing infrastructure, I have no 
> confidence in building good APIs for Java 8.

+ complicates the test matrix when triaging failures: if something works on 
Java 8 and fails on Java 7, is that a Java 8 problem or a Java 7 one?
+ most developers would want to be on Java 8 on their desktop if they could; 
the risk is that people accidentally code against Java 8 without realising it, 
just by using Java 8 libraries, etc.

> 
> 2. Dataset/DataFrame performance will be between 1x and 10x slower in Java 7. 
> The primary APIs we want users to use in Spark 2.x are Dataset/DataFrame, and 
> this impacts pretty much everything from machine learning to structured 
> streaming. We have made great progress in their performance through extensive 
> use of code generation. (In many dimensions Spark 2.0 with 
> DataFrames/Datasets looks more like a compiler than a MapReduce or query 
> engine.) These optimizations don't work well in Java 7 due to broken code 
> cache flushing. This problem has been fixed by Oracle in Java 8. In addition, 
> Java 8 comes with better support for Unsafe and SIMD.
> 
> 3. Scala 2.12 will come out soon, and we will want to add support for that. 
> Scala 2.12 only works on Java 8. If we do support Java 7, we'd have a fairly 
> complicated compatibility matrix and testing infrastructure.
> 
> 4. There are libraries that I've looked into in the past that support only 
> Java 8. This is more common in high performance libraries such as Aeron (a 
> messaging library). Having to support Java 7 means we are not able to use 
> these. It is not that big of a deal right now, but it will become 
> increasingly difficult as we optimize performance.
> 
> 
> The downside of not supporting Java 7 is also obvious. Some organizations are 
> stuck with Java 7, and they wouldn't be able to use Spark 2.0 without 
> upgrading Java.
> 


One thing you have to consider here is: will the organisations that don't want 
to upgrade to Java 8 want to be upgrading to Spark 2.0 anyway?

If there is a price, it is that all apps that use any remote Spark APIs will 
also have to be on Java 8. Something like a REST API is less of an issue, but 
anything loading a JAR in the group org.apache.spark will have to be Java 8+. 
That's what held Hadoop back on Java 7 in 2015: Twitter made the case that it 
shouldn't be the Hadoop cluster forcing them to upgrade all their client apps 
just to use the IPC and filesystem code. I don't believe that's as much of a 
constraint on Spark.
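
The mechanics live in the class file format: a jar built with -target 1.8 
carries class file major version 52, and a Java 7 JVM (which accepts up to 51) 
throws UnsupportedClassVersionError at load time, regardless of which language 
features the class actually uses. A minimal sketch that reads the version from 
a .class file (the ClassVersion name is just for illustration):

    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class ClassVersion {
      public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
          int magic = in.readInt();           // always 0xCAFEBABE
          int minor = in.readUnsignedShort();
          int major = in.readUnsignedShort(); // 51 = Java 7, 52 = Java 8
          System.out.printf("magic=%x, class file version %d.%d%n", magic, major, minor);
        }
      }
    }

So once the org.apache.spark artifacts are compiled for 8, even a client that 
only touches the RPC or launcher classes has to run on an 8+ JVM.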

Finally, Java 8 lines you up better for worrying about Java 9, which is on the 
horizon.
