[ https://issues.apache.org/jira/browse/SPARK-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021465#comment-14021465 ]
Patrick Wendell commented on SPARK-2075:
----------------------------------------

Okay, I did some more digging. I think the issue is that the anonymous classes used by saveAsTextFile are not guaranteed to be compiled to the same name every time you compile them in Scala. In the Hadoop 1 build these end up with shortened names (the numbered forms such as RDD$$anonfun$30 in the diff below), whereas in the Hadoop 2 build they use the longer names (RDD$$anonfun$saveAsTextFile$1). saveAsTextFile seems to, strangely, be the only affected function. I confirmed this by looking at the difference in the hadoop 1 and 2 jars:

{code}
$ jar tvf spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop*.jar |grep "rdd\/RDD\\$" | awk '{ print $8;}' | sort > hadoop1
$ jar tvf spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop*.jar |grep "rdd\/RDD\\$" | awk '{ print $8;}' | sort > hadoop2
$ diff hadoop1 hadoop2
23a24
> org/apache/spark/rdd/RDD$$anonfun$28$$anonfun$apply$13.class
27,29d27
< org/apache/spark/rdd/RDD$$anonfun$30$$anonfun$apply$13.class
< org/apache/spark/rdd/RDD$$anonfun$30.class
< org/apache/spark/rdd/RDD$$anonfun$31.class
90a89,90
> org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
> org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
{code}

I'm still a bit confused, though, because I didn't think these anonymous classes would show up in the bytecode of the user application, so I don't think the naming should matter (i.e. this is probably why Scala allows it).

[~paulrbrown] could you explain how you are bundling and submitting your application to the Spark cluster?

> Hadoop1 distribution of 1.0.0 does not contain classes expected by the Maven 1.0.0 artifact
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2075
>                 URL: https://issues.apache.org/jira/browse/SPARK-2075
>             Project: Spark
>          Issue Type: Bug
>          Components: Build, Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Paul R. Brown
>
> Running a job built against the Maven dep for 1.0.0 and the hadoop1 distribution produces:
> {code}
> java.lang.ClassNotFoundException:
> org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
> {code}
> Here's what's in the Maven dep as of 1.0.0:
> {code}
> jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
> {code}
> And here's what's in the hadoop1 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar| grep 'rdd/RDD' | grep 'saveAs'
> {code}
> I.e., it's not there. It is in the hadoop2 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar| grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
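
For anyone who wants to repeat the jar comparison from the comment above, here is a rough sketch of the same check as a reusable shell function. The function name `only_in_one` and the listing-file arguments are hypothetical (not part of Spark or the JIRA thread); it assumes each input file holds one jar entry per line, for example from `jar tf`.

```shell
#!/bin/sh
# Hypothetical helper, not part of Spark: given two jar listings (one entry
# per line), print the RDD inner/anonymous class entries found in only one.
only_in_one() {
    # $1, $2: listing files, e.g. produced with `jar tf <assembly>.jar`
    a=$(mktemp) && b=$(mktemp) || return 1
    # Keep only entries of the form rdd/RDD$...; comm needs sorted input.
    grep 'rdd/RDD\$' "$1" | LC_ALL=C sort -u > "$a"
    grep 'rdd/RDD\$' "$2" | LC_ALL=C sort -u > "$b"
    # "<" marks entries only in the first listing, ">" only in the second.
    LC_ALL=C comm -23 "$a" "$b" | sed 's/^/< /'
    LC_ALL=C comm -13 "$a" "$b" | sed 's/^/> /'
    rm -f "$a" "$b"
}
```

A possible usage, assuming the two assemblies have been listed first: `jar tf spark-assembly-1.0.0-hadoop1.0.4.jar > hadoop1.txt`, the same for the hadoop2 assembly into `hadoop2.txt`, then `only_in_one hadoop1.txt hadoop2.txt`. Unlike `diff`, this reports set differences only, so it does not show line positions.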