About a year ago we decided to drop Java 6 support in Spark 1.5. I am wondering if we should also just drop Java 7 support in Spark 2.0 (i.e. Spark 2.0 would require Java 8 to run).
Oracle ended public updates for JDK 7 one year ago (April 2015) and removed public downloads for JDK 7 in July 2015. In the past I've actually been against dropping Java 7, but today I ran into an issue with the new Dataset API not working well with Java 8 lambdas, and that changed my opinion. I've been thinking more about this today and also talked with a lot of people offline to gather feedback, and I now think the pros outweigh the cons, for the following reasons (in rough order of importance):

1. It is complicated to test how well Spark APIs work with Java lambdas if we support Java 7. Jenkins machines need to have both Java 7 and Java 8 installed, and we must run one set of test suites on Java 7 and then the lambda tests on Java 8. This complicates build environments and scripts and makes them less robust. Without good testing infrastructure, I have no confidence that we can build good APIs for Java 8.

2. Dataset/DataFrame performance can be anywhere from 1x to 10x slower on Java 7. The primary APIs we want users to use in Spark 2.x are Dataset and DataFrame, and this impacts pretty much everything from machine learning to structured streaming. We have made great progress on their performance through extensive use of code generation. (In many dimensions, Spark 2.0 with DataFrames/Datasets looks more like a compiler than a MapReduce or query engine.) These optimizations don't work well on Java 7 because of broken code cache flushing, a problem Oracle fixed in Java 8. In addition, Java 8 comes with better support for Unsafe and SIMD.

3. Scala 2.12 will come out soon, and we will want to support it. Scala 2.12 only works on Java 8. If we continue to support Java 7, we'd end up with a fairly complicated compatibility matrix and testing infrastructure.

4. Some libraries I've looked into in the past support only Java 8. This is more common among high-performance libraries such as Aeron (a messaging library). Having to support Java 7 means we cannot use them. It is not a big deal right now, but it will become more of a limitation as we continue to optimize performance.

The downside of not supporting Java 7 is also obvious: some organizations are stuck on Java 7, and they wouldn't be able to use Spark 2.0 without upgrading Java.