[jira] [Commented] (SPARK-2075) Hadoop1 distribution of 1.0.0 does not contain classes expected by the Maven 1.0.0 artifact

Patrick Wendell (JIRA) Mon, 09 Jun 2014 10:47:38 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025432#comment-14025432
 ]


Patrick Wendell commented on SPARK-2075:
----------------------------------------

I see - so the issue is that your closures are compiled to call that anonymous 
class in your driver. Then on the cluster the anonymous class is named 
differently. I wonder if scala can produce stable naming for anonymous classes, 
that would be the best solution because otherwise it sort of breaks closures. I 
looked around the compiler flags but I don't see anything useful.

One solution here is to see if we can avoid re-compiling spark-core when we 
make the different builds. The tricky bit though is that if users compile their 
own downstream version of Spark we'd have to make sure they don't end up with 
different versions as well.

> Hadoop1 distribution of 1.0.0 does not contain classes expected by the Maven 
> 1.0.0 artifact
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2075
>                 URL: https://issues.apache.org/jira/browse/SPARK-2075
>             Project: Spark
>          Issue Type: Bug
>          Components: Build, Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Paul R. Brown
>
> Running a job built against the Maven dep for 1.0.0 and the hadoop1 
> distribution produces:
> {code}
> java.lang.ClassNotFoundException:
> org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
> {code}
> Here's what's in the Maven dep as of 1.0.0:
> {code}
> jar tvf 
> ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar
>  | grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 13:57:58 PDT 2014 
> org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 13:57:58 PDT 2014 
> org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$2.class
> {code}
> And here's what's in the hadoop1 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar| grep 'rdd/RDD' | grep 'saveAs'
> {code}
> I.e., it's not there.  It is in the hadoop2 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar| grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 07:29:54 PDT 2014 
> org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 07:29:54 PDT 2014 
> org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$2.class
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (SPARK-2075) Hadoop1 distribution of 1.0.0 does not contain classes expected by the Maven 1.0.0 artifact

Reply via email to