Steve, those are good points; I had forgotten Hadoop had those issues. We run 
with JDK 8 (Hadoop is built for JDK 7 compatibility) and Hadoop 2.7 on our 
clusters, and by the time Spark 2.0 is out I would expect a mix of Hadoop 2.7 
and 2.8. We also don't use SPNEGO.
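For anyone wanting to sanity-check which mix they are actually on, here is a 
minimal sketch, assuming only the stock VersionInfo API from hadoop-common 
(present throughout Hadoop 2.x):

    // Sketch: print the JVM version and the Hadoop version the client classes report.
    import org.apache.hadoop.util.VersionInfo

    object VersionCheck {
      def main(args: Array[String]): Unit = {
        println(s"java.version    = ${System.getProperty("java.version")}")
        println(s"hadoop (client) = ${VersionInfo.getVersion}")
      }
    }

Run it with the same classpath your jobs use, so the Hadoop version it reports 
is the one Spark actually links against.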
I didn't quite follow what you were saying about the Hadoop services being on 
JDK 7. Are you describing building Spark against, say, the Hadoop 2.8 
libraries while the Hadoop cluster itself runs Hadoop 2.6 or earlier? If so, I 
agree that isn't a good idea.
Personally, and from Yahoo's point of view, I'm still fine with going to JDK 
8, but I can see how it might be a problem for people on older versions of 
Hadoop.
Tom 

On Wednesday, March 30, 2016 5:42 AM, Steve Loughran <ste...@hortonworks.com> wrote:
Can I note that if Spark 2.0 is going to be Java 8+ only, then Hadoop 2.6.x 
should be the minimum Hadoop version.
https://issues.apache.org/jira/browse/HADOOP-11090
Where things get complicated is the situation of Hadoop services on Java 7 and 
Spark on Java 8 in its own JVM.
I'm not sure that you could get away with having a newer version of the Hadoop 
classes in the Spark assembly/lib dir without coming up against 
incompatibilities with the Hadoop JNI libraries. These are currently backwards 
compatible, but trying to link the Hadoop 2.7 classes against a Hadoop 2.6 
native library will generate an UnsatisfiedLinkError. Meaning: the whole 
cluster's Hadoop libs have to be in sync, or at least the main cluster release 
has to be on a version of Hadoop 2.x >= the Spark-bundled edition.
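One cheap way to probe for that kind of skew at startup is to ask whether the 
native hadoop library loaded at all; a minimal sketch, assuming only 
NativeCodeLoader and VersionInfo from hadoop-common (the warning text is mine):

    // Sketch: isNativeCodeLoaded is false when libhadoop failed to load,
    // which is one symptom of the classes/native-library skew described above.
    import org.apache.hadoop.util.{NativeCodeLoader, VersionInfo}

    object NativeLibProbe {
      def main(args: Array[String]): Unit = {
        if (!NativeCodeLoader.isNativeCodeLoaded) {
          System.err.println(
            s"WARNING: no native hadoop library loaded for Hadoop ${VersionInfo.getVersion}; " +
              "check that the cluster's libhadoop matches the bundled jars")
        }
      }
    }

It won't catch every mismatch (a stale but loadable libhadoop can still throw 
UnsatisfiedLinkError later), but it flags the common case early.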
Ignoring that detail: Hadoop 2.6.1 + Guava >= 15? 17?
I think the outcome of Hadoop < 2.6 and JDK >= 8 is "undefined"; all bug 
reports will be met with a "please upgrade, re-open if the problem is still 
there".
Kerberos is a particular trouble spot here: you need Hadoop 2.6.1+ for 
Kerberos to work on Java 8 and on recent versions of Java 7 (HADOOP-10786).
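For context, the code path that bites is the standard keytab login; roughly 
the sketch below, where the principal and keytab path are hypothetical 
placeholders and UserGroupInformation is the real hadoop-common API:

    // Sketch: the usual keytab login that can fail on older Hadoop + newer JDKs.
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.UserGroupInformation

    object KerberosLogin {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        conf.set("hadoop.security.authentication", "kerberos")
        UserGroupInformation.setConfiguration(conf)
        UserGroupInformation.loginUserFromKeytab(
          "spark/host.example.com@EXAMPLE.COM",  // hypothetical principal
          "/etc/security/keytabs/spark.keytab")  // hypothetical keytab path
        println(s"Logged in as ${UserGroupInformation.getCurrentUser}")
      }
    }

On a pre-2.6.1 Hadoop under Java 8 that login can fail even with valid 
credentials, which is why the version floor matters.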
Note also that HADOOP-11628 (SPNEGO + CNAMEs) is in 2.8 only. I'll see about 
pulling that into 2.7.x, though I'm reluctant to go near 2.6, to keep that 
branch extra stable.

Thomas: you've got the big clusters; what versions of Hadoop will they be on 
by the time you look at Spark 2.0?
-Steve