[ https://issues.apache.org/jira/browse/SPARK-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025255#comment-14025255 ]
Paul R. Brown commented on SPARK-2075:
--------------------------------------

The job is run by a Java client that connects to the master (using a SparkContext). Bundling is performed by a Maven build with two shade plugin invocations: one to package a "driver" uberjar and one to package a "worker" uberjar. The worker flavor is sent to the worker nodes; the driver contains the code to connect to the master and run the job.

The Maven build runs against the JAR from Maven Central, and the deployment uses the Spark 1.0.0 hadoop1 download. (The Spark distribution is staged to S3 once, then downloaded onto the master and worker nodes and set up during cluster provisioning.)

The Maven build uses the usual Scala setup, with the library as a dependency and the plugin:

{code}
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.10.3</version>
</dependency>
{code}

{code}
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>compile</goal>
        <goal>testCompile</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <scalaVersion>2.10.3</scalaVersion>
    <jvmArgs>
      <jvmArg>-Xms64m</jvmArg>
      <jvmArg>-Xmx4096m</jvmArg>
    </jvmArgs>
  </configuration>
</plugin>
{code}

> Hadoop1 distribution of 1.0.0 does not contain classes expected by the Maven 1.0.0 artifact
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2075
>                 URL: https://issues.apache.org/jira/browse/SPARK-2075
>             Project: Spark
>          Issue Type: Bug
>          Components: Build, Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Paul R. Brown
>
> Running a job built against the Maven dependency for 1.0.0 and the hadoop1 distribution produces:
> {code}
> java.lang.ClassNotFoundException: org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
> {code}
> Here's what's in the Maven dependency as of 1.0.0:
> {code}
> jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$2.class
> {code}
> And here's what's in the hadoop1 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar | grep 'rdd/RDD' | grep 'saveAs'
> {code}
> I.e., the grep finds nothing: the classes are not there. They are in the hadoop2 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$2.class
> {code}
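The `jar tvf ... | grep` check above can also be scripted from Java with only the JDK, which may be handy when comparing several assemblies. The sketch below is illustrative and not part of the original report: the class name `JarClassCheck` and its methods `listMatching` and `containsClass` are hypothetical. `listMatching` mirrors the two chained greps; `containsClass` maps a binary class name, as printed in a `ClassNotFoundException`, to its jar entry path (dots become slashes, `$` is kept, `.class` is appended) and checks whether that entry exists, without loading the class.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: inspect a jar (e.g. a spark-assembly jar) for class entries. */
public class JarClassCheck {

    /** Returns the names of all entries whose path contains every given substring (like chained greps). */
    static List<String> listMatching(Path jar, String... substrings) throws IOException {
        List<String> matches = new ArrayList<>();
        try (JarFile jf = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                boolean all = true;
                for (String s : substrings) {
                    if (!name.contains(s)) {
                        all = false;
                        break;
                    }
                }
                if (all) {
                    matches.add(name);
                }
            }
        }
        return matches;
    }

    /**
     * Maps a binary class name such as "org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1"
     * to the entry "org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class" and checks
     * that the entry is present. The class itself is never loaded.
     */
    static boolean containsClass(Path jar, String binaryName) throws IOException {
        String entry = binaryName.replace('.', '/') + ".class";
        try (JarFile jf = new JarFile(jar.toFile())) {
            return jf.getJarEntry(entry) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java JarClassCheck <jar> <binary class name>");
            System.exit(2);
        }
        Path jar = Paths.get(args[0]);
        // Same filter as: jar tvf <jar> | grep 'rdd/RDD' | grep 'saveAs'
        for (String name : listMatching(jar, "rdd/RDD", "saveAs")) {
            System.out.println(name);
        }
        System.out.println(args[1] + " present: " + containsClass(jar, args[1]));
    }
}
```

For example, `javac JarClassCheck.java && java JarClassCheck spark-assembly-1.0.0-hadoop1.0.4.jar 'org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1'`. The single quotes keep the shell from expanding the `$` sequences in the class name.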