Thanks for the explanation. To be clear, I meant any Hadoop 2 release before 2.2 that has a profile in Spark. I referred to CDH4 since that's the only Hadoop 2.0/2.1 version Spark ships a prebuilt package for.
I understand the hesitation to make a code change if Spark doesn't plan to support Hadoop 2.0/2.1 in general. (Please note, this is not specific to CDH4.) If so, may I propose alternative options until Spark moves to supporting only Hadoop 2?

- Build the CDH4 package with "-Davro.mapred.classifier=hadoop2" (see the command sketched below), and update http://spark.apache.org/docs/latest/building-spark.html for all "2.0.*" examples.
- Build the CDH4 package as is, but note the known issue clearly on the "download" page.
- Simply do not ship a CDH4 prebuilt package, and let people figure it out themselves. Preferably, note in the documentation that "-Davro.mapred.classifier=hadoop2" should be used for all Hadoop "2.0.*" builds.
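For reference, the first option would amount to building the CDH4 package with something like the following; the avro.mapred.classifier flag is the only change from the current command, and the exact hadoop.version string varies by CDH release:

    mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 \
        -Davro.mapred.classifier=hadoop2 \
        -DskipTests clean package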
Please let me know what you think!

Mingyu

On 2/20/15, 2:34 AM, "Sean Owen" <so...@cloudera.com> wrote:

>True, although a number of other little issues make me, personally,
>not want to continue down this road:
>
>- There are already a lot of build profiles to try to cover Hadoop versions
>- I don't think it's quite right to have vendor-specific builds in Spark to begin with
>- We should be moving to only support Hadoop 2 soon IMHO anyway
>- CDH4 is EOL in a few months I think
>
>On Fri, Feb 20, 2015 at 8:30 AM, Mingyu Kim <m...@palantir.com> wrote:
>> Hi all,
>>
>> Related to https://issues.apache.org/jira/browse/SPARK-3039, the default
>> CDH4 build, which is built with "mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0
>> -DskipTests clean package", pulls in avro-mapred hadoop1, as opposed to
>> avro-mapred hadoop2. This ends up in the same error as mentioned in the
>> linked bug (pasted below).
>>
>> The right solution would be to create a hadoop-2.0 profile that sets
>> avro.mapred.classifier to hadoop2, and to build the CDH4 package with
>> the "-Phadoop-2.0" option.
>>
>> What do people think?
>>
>> Mingyu
>>
>> ----------
>>
>> java.lang.IncompatibleClassChangeError: Found interface
>> org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>>         at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
>>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133)
>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:107)
>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
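P.S. For concreteness, the hadoop-2.0 profile proposed in my original mail above would be roughly the following in the parent pom.xml. This is only a sketch: it assumes the avro-mapred dependency already reads its classifier from the avro.mapred.classifier property (which is what the -D flag sets on the command line), and the profile id and placement are illustrative:

    <profile>
      <id>hadoop-2.0</id>
      <properties>
        <!-- resolve avro-mapred with the hadoop2 classifier instead of hadoop1 -->
        <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
      </properties>
    </profile>

With that in place, the CDH4 package could be built with:

    mvn -Phadoop-2.0 -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package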