[ https://issues.apache.org/jira/browse/SPARK-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254720#comment-14254720 ]
Sun Rui edited comment on SPARK-2075 at 12/20/14 1:37 PM:
----------------------------------------------------------

[~srowen] I assumed that the mvn jars were built for Hadoop 1.x, as you said: "A default mvn release would use hadoop.version=1.0.4. Somebody can correct me if this isn't the case, but I assume that this is what goes to Maven Central." :) But it is very possible that the mvn jars were actually built for Hadoop 2.x, which would be the cause of this issue.

Yes, I agree that we should use matching binaries. The problem is that I thought I was using the matching binaries but actually was not. So we can:

1. For existing releases, document the fact that the mvn jars are for Hadoop 2.x. If an app depends on the mvn jars, it should be used with Hadoop 2.x. If the app is intended to work with Hadoop 1.x, one way is to rebuild the Spark source against Hadoop 1.x and publish the module jars to the local mvn repo.

2. For future releases, we may eliminate the Spark core bytecode incompatibility between Hadoop 1.x and 2.x, just as Shixiong's PR is trying to do.

was (Author: sunrui):

[~srowen] I assumed that the mvn jars were built for Hadoop 1.x, as you said: "A default mvn release would use hadoop.version=1.0.4. Somebody can correct me if this isn't the case, but I assume that this is what goes to Maven Central." :) But it is very possible that the mvn jars were actually built for Hadoop 2.x, which would be the cause of this issue.

Yes, I agree that we should use matching binaries. The problem is that we thought we were using the matching binaries but actually were not. So we need a way to avoid this (documentation, or a fix).

> Anonymous classes are missing from Spark distribution
> -----------------------------------------------------
>
>                 Key: SPARK-2075
>                 URL: https://issues.apache.org/jira/browse/SPARK-2075
>             Project: Spark
>          Issue Type: Bug
>          Components: Build, Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Paul R. Brown
>            Assignee: Shixiong Zhu
>            Priority: Critical
>
> Running a job built against the Maven dep for 1.0.0 and the hadoop1 distribution produces:
> {code}
> java.lang.ClassNotFoundException: org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
> {code}
> Here's what's in the Maven dep as of 1.0.0:
> {code}
> jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'rdd/RDD' | grep 'saveAs'
> 1519 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
> 1560 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
> {code}
> And here's what's in the hadoop1 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar | grep 'rdd/RDD' | grep 'saveAs'
> {code}
> I.e., it's not there. It is in the hadoop2 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar | grep 'rdd/RDD' | grep 'saveAs'
> 1519 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
> 1560 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
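[Editor's note] Sun Rui's first option above (rebuild the Spark source against Hadoop 1.x and publish the module jars to the local mvn repo) can be sketched roughly as follows. This is a hedged sketch, not a command taken from this thread: the source directory name, Hadoop version, and flags are assumptions based on the Spark 1.x build documentation, and should be checked against the release's own building guide.

{code}
# Sketch (assumptions noted above): build Spark against Hadoop 1.0.4 and
# install the per-module jars (spark-core_2.10 etc.) into ~/.m2/repository,
# so an application's Maven build resolves Hadoop-1.x-matched binaries.
cd spark-1.0.0                                        # unpacked source release (assumed path)
mvn -Dhadoop.version=1.0.4 -DskipTests clean install

# Afterwards, the same check used in this issue can confirm the classes exist:
jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'saveAs'
{code}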