I think this is causing issues upgrading ADAM <https://github.com/bigdatagenomics/adam> to Spark 1.3.1 (cf. adam#690 <https://github.com/bigdatagenomics/adam/pull/690#issuecomment-107769383>); attempting to build against Hadoop 1.0.4 yields errors like:
    2015-06-02 15:57:44 ERROR Executor:96 - Exception in task 0.0 in stage 0.0 (TID 0)
    java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
            at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:95)
            at org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:106)
            at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1082)
            at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
            at org.apache.spark.scheduler.Task.run(Task.scala:64)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)

    2015-06-02 15:57:44 WARN TaskSetManager:71 - Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
            at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:95)
            at org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:106)
            at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1082)
            at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
            at org.apache.spark.scheduler.Task.run(Task.scala:64)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)

TaskAttemptContext is a class in Hadoop 1.0.4 but an interface in Hadoop 2; Spark 1.3.1, as published, expects the interface and finds the class, hence the IncompatibleClassChangeError.

It sounds like, while I *can* declare dependencies on Spark 1.3.1 and Hadoop 1.0.4 together, I then have to hope I don't exercise Spark code paths (such as the commitTask path above) that run afoul of the differences between Hadoop 1 and 2; does that seem correct?

Thanks!
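For concreteness, the arrangement I'm describing looks roughly like this in a pom (a sketch, not our exact build; the hadoop-client exclusion and the 1.0.4 pin are the parts in question):

    <!-- Compile against the Spark 1.3.1 API, excluding its transitive
         hadoop-client (which on repo1.maven.org is the Hadoop 2 one)... -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.3.1</version>
      <exclusions>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <!-- ...and pin Hadoop 1 ourselves. This compiles fine, but Spark
         bytecode built against the Hadoop 2 TaskAttemptContext interface
         can still fail at runtime, as in the trace above. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>1.0.4</version>
    </dependency>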
On Wed, May 20, 2015 at 1:52 PM Sean Owen <so...@cloudera.com> wrote:

> I don't think any of those problems are related to Hadoop. Have you looked
> at the userClassPathFirst settings?
>
> On Wed, May 20, 2015 at 6:46 PM, Edward Sargisson <ejsa...@gmail.com> wrote:
>
>> Hi Sean and Ted,
>> Thanks for your replies.
>>
>> I don't have our current problems nicely written up as good questions
>> yet; I'm still sorting out classpath issues, etc. In case it is of help,
>> I'm seeing:
>>
>> * Exception in thread "Spark Context Cleaner"
>>   java.lang.NoClassDefFoundError: 0
>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:149)
>> * Clashing dependencies between a colleague and me, because of the
>>   aforementioned classpath issue.
>> * The clashing dependencies are also causing trouble over which Jetty
>>   libraries Spark's classloader supplies, and whether they clash with
>>   libraries we already have.
>>
>> More anon.
>>
>> Cheers,
>> Edward
>>
>> -------- Original Message --------
>> Subject: Re: spark 1.3.1 jars in repo1.maven.org
>> Date: 2015-05-20 00:38
>> From: Sean Owen <so...@cloudera.com>
>> To: Edward Sargisson <esa...@pobox.com>
>> Cc: user <user@spark.apache.org>
>>
>> Yes, the published artifacts can only refer to one version of anything
>> (OK, modulo publishing a large number of variants under classifiers).
>>
>> You aren't intended to rely on Spark's transitive dependencies for
>> anything. Compiling against the Spark API has no relation to which
>> version of Hadoop it binds against, because that isn't part of any API.
>> You can mark the Spark dependency as "provided" in your build and get
>> all the Spark/Hadoop bindings at runtime from your cluster.
>>
>> What problem are you experiencing?
>>
>> On Wed, May 20, 2015 at 2:17 AM, Edward Sargisson <esa...@pobox.com> wrote:
>>
>> Hi,
>> I'd like to confirm an observation I've just made: specifically, that
>> Spark is available in repo1.maven.org for only one Hadoop variant.
>>
>> The Spark source can be compiled against a number of different Hadoops
>> using profiles. Yay.
>>
>> However, the Spark jars in repo1.maven.org appear to be compiled against
>> one specific Hadoop, and no other differentiation is made. (I can see a
>> difference, with hadoop-client being 2.2.0 in repo1.maven.org and 1.0.4
>> in the version I compiled locally.)
>>
>> The implication is that if you have a pom file asking for spark-core_2.10
>> version 1.3.1, Maven will only give you a Hadoop 2 version. Maven assumes
>> that non-snapshot artifacts never change, so trying to load a Hadoop 1
>> version will end in tears.
>>
>> This then means that if you compile code against spark-core, there will
>> probably be NoClassDefFound classpath issues unless the Hadoop 2 version
>> is exactly the one you want.
>>
>> Have I gotten this correct?
>>
>> It happens that our little app uses a Spark context directly from a
>> Jetty webapp, and the classpath differences were/are causing some
>> confusion. We are currently installing a Hadoop 1 Spark master and
>> worker.
>>
>> Thanks a lot!
>> Edward
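P.S., mostly for anyone reading this in the archives: as I understand it, the "provided" arrangement Sean describes above would look roughly like this (a sketch; it assumes the Spark and Hadoop jars are supplied by the cluster at runtime, e.g. via spark-submit, rather than bundled into the app):

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.3.1</version>
      <!-- "provided": compile against the Spark API, but don't ship Spark
           (or its Hadoop bindings) with the app; the cluster supplies
           whichever Spark/Hadoop combination it was built with. -->
      <scope>provided</scope>
    </dependency>

(And the userClassPathFirst settings Sean mentions are, I believe, spark.driver.userClassPathFirst and spark.executor.userClassPathFirst as of Spark 1.3.)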