[ https://issues.apache.org/jira/browse/SPARK-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254720#comment-14254720 ]
Sun Rui edited comment on SPARK-2075 at 12/20/14 1:37 PM:
----------------------------------------------------------

[~srowen] I assumed that the mvn jars were built for Hadoop 1.x, as you said: "A default mvn release would use hadoop.version=1.0.4. Somebody can correct me if this isn't the case, but I assume that this is what goes to Maven Central." :) But it is very possible that the mvn jars were actually built for Hadoop 2.x, which would be the cause of this issue.

Yes, I agree that we should use matching binaries. The problem is that I thought I was using the matching binaries but actually was not. So we can:

1. For existing releases, document the fact that the mvn jars are for Hadoop 2.x. If an app depends on the mvn jars, it should be used with Hadoop 2.x. If the app is intended to work with Hadoop 1.x, one way is to rebuild the Spark source against Hadoop 1.x and publish the module jars to the local mvn repo.

2. For future releases, we may eliminate the Spark core bytecode incompatibility between Hadoop 1.x and 2.x, just as Shixiong's PR is trying to do.

was (Author: sunrui):

[~srowen] I assumed that the mvn jars were built for Hadoop 1.x, as you said: "A default mvn release would use hadoop.version=1.0.4. Somebody can correct me if this isn't the case, but I assume that this is what goes to Maven Central." :) But it is very possible that the mvn jars were actually built for Hadoop 2.x, which would be the cause of this issue.

Yes, I agree that we should use matching binaries. The problem is that we thought we were using the matching binaries but actually were not. So we need a way to avoid this (documentation, or a fix).

> Anonymous classes are missing from Spark distribution
> -----------------------------------------------------
>
>                 Key: SPARK-2075
>                 URL: https://issues.apache.org/jira/browse/SPARK-2075
>             Project: Spark
>          Issue Type: Bug
>          Components: Build, Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Paul R. Brown
>            Assignee: Shixiong Zhu
>            Priority: Critical
>
> Running a job built against the Maven dep for 1.0.0 and the hadoop1 distribution produces:
> {code}
> java.lang.ClassNotFoundException: org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
> {code}
> Here's what's in the Maven dep as of 1.0.0:
> {code}
> jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'rdd/RDD' | grep 'saveAs'
> 1519 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
> 1560 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
> {code}
> And here's what's in the hadoop1 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar | grep 'rdd/RDD' | grep 'saveAs'
> {code}
> I.e., it's not there. It is in the hadoop2 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar | grep 'rdd/RDD' | grep 'saveAs'
> 1519 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
> 1560 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
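[Editor's note] Sun Rui's first option above (rebuild the Spark source against Hadoop 1.x and publish the module jars to the local mvn repo) can be sketched roughly as follows. This is a hedged sketch, not a command taken from this thread: the source directory name, Hadoop version, and flags are assumptions based on the Spark 1.x build documentation, and should be checked against the release's own building guide.

{code}
# Sketch (assumptions noted above): build Spark against Hadoop 1.0.4 and
# install the per-module jars (spark-core_2.10 etc.) into ~/.m2/repository,
# so an application's Maven build resolves Hadoop-1.x-matched binaries.
cd spark-1.0.0                                        # unpacked source release (assumed path)
mvn -Dhadoop.version=1.0.4 -DskipTests clean install

# Afterwards, the same check used in this issue can confirm the classes exist:
jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'saveAs'
{code}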