[ https://issues.apache.org/jira/browse/SPARK-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025255#comment-14025255 ]

Paul R. Brown commented on SPARK-2075:
--------------------------------------

The job is run by a Java client that connects to the master (using a 
SparkContext).
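
For context, the client side amounts to something like the following (the class
name, master host, and jar path are illustrative assumptions, not our actual
code):

{code}
// Sketch only: names, host, and paths below are illustrative assumptions.
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class DriverMain {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("example-job")
            .setMaster("spark://master-host:7077")
            // Ships the worker uberjar to the executors.
            .setJars(new String[] { "/path/to/worker-uberjar.jar" });
        JavaSparkContext sc = new JavaSparkContext(conf);
        // saveAsTextFile is what requires the RDD$$anonfun$saveAsTextFile$1
        // closure class to be resolvable on the worker side.
        sc.parallelize(Arrays.asList("a", "b", "c")).saveAsTextFile("/tmp/example-out");
        sc.stop();
    }
}
{code}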

Bundling is performed by a Maven build with two shade plugin invocations, one 
to package a "driver" uberjar and one to package a "worker" uberjar.  The 
worker flavor is sent to the worker nodes; the driver flavor contains the code 
to connect to the master and run the job.  The Maven build runs against the 
JAR from Maven Central, and the deployment uses the Spark 1.0.0 hadoop1 
download.  (The Spark distribution is staged to S3 once, then downloaded onto 
the master/worker nodes and set up during cluster provisioning.)
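
For concreteness, each shade invocation follows the usual pattern; a minimal 
sketch of the "worker" one (the execution id and classifier are illustrative 
assumptions, not the actual build):

{code}
<!-- Sketch only: execution id and classifier are illustrative assumptions. -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
        <execution>
            <id>worker-uberjar</id>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <!-- Attach the shaded jar as a secondary "worker" artifact. -->
                <shadedArtifactAttached>true</shadedArtifactAttached>
                <shadedClassifierName>worker</shadedClassifierName>
            </configuration>
        </execution>
    </executions>
</plugin>
{code}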

The Maven build uses the usual Scala setup, with scala-library as a dependency 
and the scala-maven-plugin:

{code}
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.10.3</version>
        </dependency>
{code}

{code}
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <scalaVersion>2.10.3</scalaVersion>
                    <jvmArgs>
                        <jvmArg>-Xms64m</jvmArg>
                        <jvmArg>-Xmx4096m</jvmArg>
                    </jvmArgs>
                </configuration>
            </plugin>
{code}


> Hadoop1 distribution of 1.0.0 does not contain classes expected by the Maven 
> 1.0.0 artifact
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2075
>                 URL: https://issues.apache.org/jira/browse/SPARK-2075
>             Project: Spark
>          Issue Type: Bug
>          Components: Build, Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Paul R. Brown
>
> Running a job built against the Maven dep for 1.0.0 and the hadoop1 
> distribution produces:
> {code}
> java.lang.ClassNotFoundException:
> org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
> {code}
> Here's what's in the Maven dep as of 1.0.0:
> {code}
> jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
> {code}
> And here's what's in the hadoop1 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar | grep 'rdd/RDD' | grep 'saveAs'
> {code}
> I.e., it's not there.  It is in the hadoop2 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
> {code}


