Can I note that if Spark 2.0 is going to be Java 8+ only, then Hadoop 2.6.x should be the minimum Hadoop version.
https://issues.apache.org/jira/browse/HADOOP-11090

Where things get complicated is the situation of: Hadoop services on Java 7, Spark on Java 8 in its own JVM. I'm not sure you could get away with having the newer version of the Hadoop classes in the Spark assembly/lib dir without coming up against incompatibilities with the Hadoop JNI libraries. These are currently backwards compatible, but trying to link Hadoop 2.7 against a Hadoop 2.6 native hadoop lib will generate an UnsatisfiedLinkError. Meaning: the whole cluster's hadoop libs have to be in sync, or at least the main cluster release must be on a version of hadoop 2.x >= the Spark-bundled edition.

Ignoring that detail:

Hadoop 2.6.1+
Guava >= 15? 17?

I think the outcome of Hadoop < 2.6 and JDK >= 8 is "undefined"; all bug reports will be met with a "please upgrade, re-open if the problem is still there".

Kerberos is a particular trouble spot here: you need Hadoop 2.6.1+ for Kerberos to work on Java 8 and recent versions of Java 7 (HADOOP-10786).

Note also that HADOOP-11628 (SPNEGO + CNAMEs) is in 2.8 only. I'll see about pulling that into 2.7.x, though I'm reluctant to go near 2.6 just to keep that extra stable.

Thomas: you've got the big clusters; what versions of Hadoop will they be on by the time you look at Spark 2.0?

-Steve
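PS: a minimal sketch of the mixed-version failure mode described above. The class and native method names here are hypothetical, standing in for a JNI entry point present in the Hadoop 2.7 jars but absent from an older libhadoop.so on the cluster; the point is that the mismatch surfaces as an UnsatisfiedLinkError at call time, not at class-load time.

```java
// Illustrative only: simulates calling a JNI entry point (hypothetical
// name) whose implementation is missing from the loaded native library.
public class NativeMismatchDemo {
    // Stands in for a native method added in a newer Hadoop release but
    // not exported by the older libhadoop.so actually on the cluster.
    private static native void newerNativeCall();

    public static void main(String[] args) {
        try {
            // No matching native symbol has been loaded, so the JVM
            // throws UnsatisfiedLinkError (an Error, not an Exception).
            newerNativeCall();
        } catch (UnsatisfiedLinkError e) {
            System.out.println("caught: " + e.getClass().getSimpleName());
        }
    }
}
```

Because it is an Error rather than an Exception, framework code that only catches Exception will not mask it, which is why a jar/native skew tends to fail loudly in production.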