True, although a number of other little issues make me, personally, not want to continue down this road:
- There are already a lot of build profiles trying to cover Hadoop versions
- I don't think it's quite right to have vendor-specific builds in Spark to begin with
- We should be moving to only support Hadoop 2 soon IMHO anyway
- CDH4 is EOL in a few months, I think

On Fri, Feb 20, 2015 at 8:30 AM, Mingyu Kim <m...@palantir.com> wrote:
> Hi all,
>
> Related to https://issues.apache.org/jira/browse/SPARK-3039, the default
> CDH4 build, which is built with "mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0
> -DskipTests clean package", pulls in avro-mapred hadoop1, as opposed to
> avro-mapred hadoop2. This ends up in the same error as the one mentioned
> in the linked bug (pasted below).
>
> The right solution would be to create a hadoop-2.0 profile that sets
> avro.mapred.classifier to hadoop2, and to build the CDH4 build with the
> "-Phadoop-2.0" option.
>
> What do people think?
>
> Mingyu
>
> ------------------------------
>
> java.lang.IncompatibleClassChangeError: Found interface
> org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>         at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133)
>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:107)
>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>         at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
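
For context, the error arises because org.apache.hadoop.mapreduce.TaskAttemptContext is a class in Hadoop 1 but an interface in Hadoop 2, so an avro-mapred artifact compiled against one fails at runtime against the other. Below is a minimal sketch of what the proposed hadoop-2.0 profile could look like in the root pom.xml. It assumes the avro-mapred dependency is declared with <classifier>${avro.mapred.classifier}</classifier>, which is what would make the property take effect; the profile id and placement come from the proposal above, not from any committed change:

    <profile>
      <id>hadoop-2.0</id>
      <properties>
        <!-- Pull in avro-mapred with the hadoop2 classifier instead of hadoop1 -->
        <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
      </properties>
    </profile>

Under that assumption, the CDH4 build command would become "mvn -Phadoop-2.0 -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package", as proposed.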