[ https://issues.apache.org/jira/browse/SPARK-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252194#comment-14252194 ]
Sean Owen commented on SPARK-2075:
----------------------------------

[~shivaram] No, I don't think that's the case. Certainly not Hadoop 1 vs 2. Here is how the downloadable distros are built:

{code}
make_binary_release "hadoop1" "-Phive -Phive-thriftserver -Dhadoop.version=1.0.4" &
make_binary_release "hadoop1-scala2.11" "-Phive -Dscala-2.11" &
make_binary_release "cdh4" "-Phive -Phive-thriftserver -Dhadoop.version=2.0.0-mr1-cdh4.2.0" &
make_binary_release "hadoop2.3" "-Phadoop-2.3 -Phive -Phive-thriftserver -Pyarn" &
make_binary_release "hadoop2.4" "-Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn" &
make_binary_release "mapr3" "-Pmapr3 -Phive -Phive-thriftserver" &
make_binary_release "mapr4" "-Pmapr4 -Pyarn -Phive -Phive-thriftserver" &
make_binary_release "hadoop2.4-without-hive" "-Phadoop-2.4 -Pyarn" &
{code}

A default {{mvn release}} would use {{hadoop.version=1.0.4}}. Somebody can correct me if this isn't the case, but I assume that this is what goes to Maven Central.

Of course, this behavior is not desirable and not by design, and worth a mention. As far as I can tell it has only been observed arising in Hadoop 1 vs Hadoop 2-compiled artifacts. Although the intended public API is identical, it's not actually 100% compatible. Ideally you could get away with using any copy of the public API. In these particular cases, the safe practice of always harmonizing binaries on client and server is actually necessary, which is no terrible thing I think.

Nothing about this requires using {{spark-submit}}, although, that's a way to make sure you're using the same Spark in your app and cluster.

> Anonymous classes are missing from Spark distribution
> -----------------------------------------------------
>
>                 Key: SPARK-2075
>                 URL: https://issues.apache.org/jira/browse/SPARK-2075
>             Project: Spark
>          Issue Type: Bug
>          Components: Build, Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Paul R. Brown
>            Priority: Critical
>
> Running a job built against the Maven dep for 1.0.0 and the hadoop1 distribution produces:
> {code}
> java.lang.ClassNotFoundException:
> org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
> {code}
> Here's what's in the Maven dep as of 1.0.0:
> {code}
> jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'rdd/RDD' | grep 'saveAs'
> 1519 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$1.class
> 1560 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$2.class
> {code}
> And here's what's in the hadoop1 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar| grep 'rdd/RDD' | grep 'saveAs'
> {code}
> I.e., it's not there. It is in the hadoop2 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar| grep 'rdd/RDD' | grep 'saveAs'
> 1519 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$1.class
> 1560 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$2.class
> {code}