I think this is causing issues upgrading ADAM <https://github.com/bigdatagenomics/adam> to Spark 1.3.1 (cf. adam#690 <https://github.com/bigdatagenomics/adam/pull/690#issuecomment-107769383>); attempting to build against Hadoop 1.0.4 yields errors like:
    2015-06-02 15:57:44 ERROR Executor:96 - Exception in task 0.0 in stage 0.0 (TID 0)
    java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
            at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:95)
            at org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:106)
            at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1082)
            at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
            at org.apache.spark.scheduler.Task.run(Task.scala:64)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)

    2015-06-02 15:57:44 WARN TaskSetManager:71 - Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
            at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:95)
            at org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:106)
            at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1082)
            at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
            at org.apache.spark.scheduler.Task.run(Task.scala:64)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)

TaskAttemptContext is a class in Hadoop 1.0.4 but an interface in Hadoop 2; Spark 1.3.1, as published, expects the interface and finds the class, hence the IncompatibleClassChangeError.

It sounds like, while I *can* declare dependencies on Spark 1.3.1 and Hadoop 1.0.4 together, I then have to hope I don't exercise Spark code paths (such as the commitTask path above) that run afoul of the differences between Hadoop 1 and 2; does that seem correct?

Thanks!
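For concreteness, the arrangement I'm describing looks roughly like this in a pom (a sketch, not our exact build; the hadoop-client exclusion and the 1.0.4 pin are the parts in question):

    <!-- Compile against the Spark 1.3.1 API, excluding its transitive
         hadoop-client (which on repo1.maven.org is the Hadoop 2 one)... -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.3.1</version>
      <exclusions>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <!-- ...and pin Hadoop 1 ourselves. This compiles fine, but Spark
         bytecode built against the Hadoop 2 TaskAttemptContext interface
         can still fail at runtime, as in the trace above. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>1.0.4</version>
    </dependency>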
On Wed, May 20, 2015 at 1:52 PM Sean Owen <so...@cloudera.com> wrote:

> I don't think any of those problems are related to Hadoop. Have you looked
> at the userClassPathFirst settings?
>
> On Wed, May 20, 2015 at 6:46 PM, Edward Sargisson <ejsa...@gmail.com> wrote:
>
>> Hi Sean and Ted,
>> Thanks for your replies.
>>
>> I don't have our current problems nicely written up as good questions
>> yet; I'm still sorting out classpath issues, etc. In case it is of help,
>> I'm seeing:
>>
>> * Exception in thread "Spark Context Cleaner"
>>   java.lang.NoClassDefFoundError: 0
>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:149)
>> * Clashing dependencies between a colleague and me, because of the
>>   aforementioned classpath issue.
>> * The clashing dependencies are also causing trouble over which Jetty
>>   libraries Spark's classloader supplies, and whether they clash with
>>   libraries we already have.
>>
>> More anon.
>>
>> Cheers,
>> Edward
>>
>> -------- Original Message --------
>> Subject: Re: spark 1.3.1 jars in repo1.maven.org
>> Date: 2015-05-20 00:38
>> From: Sean Owen <so...@cloudera.com>
>> To: Edward Sargisson <esa...@pobox.com>
>> Cc: user <user@spark.apache.org>
>>
>> Yes, the published artifacts can only refer to one version of anything
>> (OK, modulo publishing a large number of variants under classifiers).
>>
>> You aren't intended to rely on Spark's transitive dependencies for
>> anything. Compiling against the Spark API has no relation to which
>> version of Hadoop it binds against, because that isn't part of any API.
>> You can mark the Spark dependency as "provided" in your build and get
>> all the Spark/Hadoop bindings at runtime from your cluster.
>>
>> What problem are you experiencing?
>>
>> On Wed, May 20, 2015 at 2:17 AM, Edward Sargisson <esa...@pobox.com> wrote:
>>
>> Hi,
>> I'd like to confirm an observation I've just made: specifically, that
>> Spark is available in repo1.maven.org for only one Hadoop variant.
>>
>> The Spark source can be compiled against a number of different Hadoops
>> using profiles. Yay.
>>
>> However, the Spark jars in repo1.maven.org appear to be compiled against
>> one specific Hadoop, and no other differentiation is made. (I can see a
>> difference, with hadoop-client being 2.2.0 in repo1.maven.org and 1.0.4
>> in the version I compiled locally.)
>>
>> The implication is that if you have a pom file asking for spark-core_2.10
>> version 1.3.1, Maven will only give you a Hadoop 2 version. Maven assumes
>> that non-snapshot artifacts never change, so trying to load a Hadoop 1
>> version will end in tears.
>>
>> This then means that if you compile code against spark-core, there will
>> probably be NoClassDefFound classpath issues unless the Hadoop 2 version
>> is exactly the one you want.
>>
>> Have I gotten this correct?
>>
>> It happens that our little app uses a Spark context directly from a
>> Jetty webapp, and the classpath differences were/are causing some
>> confusion. We are currently installing a Hadoop 1 Spark master and
>> worker.
>>
>> Thanks a lot!
>> Edward
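P.S., mostly for anyone reading this in the archives: as I understand it, the "provided" arrangement Sean describes above would look roughly like this (a sketch; it assumes the Spark and Hadoop jars are supplied by the cluster at runtime, e.g. via spark-submit, rather than bundled into the app):

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.3.1</version>
      <!-- "provided": compile against the Spark API, but don't ship Spark
           (or its Hadoop bindings) with the app; the cluster supplies
           whichever Spark/Hadoop combination it was built with. -->
      <scope>provided</scope>
    </dependency>

(And the userClassPathFirst settings Sean mentions are, I believe, spark.driver.userClassPathFirst and spark.executor.userClassPathFirst as of Spark 1.3.)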