[jira] [Comment Edited] (SPARK-2075) Hadoop1 distribution of 1.0.0 does not contain classes expected by the Maven 1.0.0 artifact

2014-06-09 Thread Paul R. Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025296#comment-14025296
 ] 

Paul R. Brown edited comment on SPARK-2075 at 6/9/14 3:36 PM:
--

As food for thought, 
[here|http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.7.6] 
is the {{InnerClass}} section of the JVM spec.  It looks like there 
have been some changes from 2.10.3 to 2.10.4 (e.g., 
[SI-6546|https://issues.scala-lang.org/browse/SI-6546]), but I didn't dig in.

I think the thing most likely to work is to ensure that exactly the same bits 
are used by all of the distributions and posted to Maven Central.  (For some 
discussion on inner class naming stability, there was quite a bit of it on the 
Java 8 lambda discussion list, e.g., [this 
message|http://mail.openjdk.java.net/pipermail/lambda-spec-experts/2013-July/000316.html].)
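
The "exactly the same bits" check above can be sketched as a shell comparison of the synthetic class names in two builds. The two listings below are faked with {{printf}} so the sketch is self-contained; in practice each would come from {{jar tf <assembly.jar> | sort}}, and the file names are illustrative:

```shell
# Sketch of a build-consistency check: dump the anonymous-function class names
# from two builds and diff them. Stand-in listings; real ones would come from:
#   jar tf <assembly.jar> | grep '\$\$anonfun' | sort
printf '%s\n' 'RDD$$anonfun$30.class' 'RDD$$anonfun$saveAsTextFile$1.class' | sort > build_a.txt
printf '%s\n' 'RDD$$anonfun$28.class' 'RDD$$anonfun$saveAsTextFile$1.class' | sort > build_b.txt

# Any difference means the distributions were compiled separately and the
# compiler assigned different names to the anonymous function classes.
if ! diff build_a.txt build_b.txt > /dev/null; then
  echo "distributions differ"
fi
```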



 Hadoop1 distribution of 1.0.0 does not contain classes expected by the Maven 
 1.0.0 artifact
 ---

 Key: SPARK-2075
 URL: https://issues.apache.org/jira/browse/SPARK-2075
 Project: Spark
  Issue Type: Bug
  Components: Build, Spark Core
Affects Versions: 1.0.0
Reporter: Paul R. Brown

 Running a job built against the Maven dep for 1.0.0 and the hadoop1 
 distribution produces:
 {code}
 java.lang.ClassNotFoundException:
 org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
 {code}
 Here's what's in the Maven dep as of 1.0.0:
 {code}
 jar tvf 
 ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar
  | grep 'rdd/RDD' | grep 'saveAs'
   1519 Mon May 26 13:57:58 PDT 2014 
 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
   1560 Mon May 26 13:57:58 PDT 2014 
 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
 {code}
 And here's what's in the hadoop1 distribution:
 {code}
 jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar | grep 'rdd/RDD' | grep 'saveAs'
 {code}
 I.e., it's not there.  It is in the hadoop2 distribution:
 {code}
 jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar | grep 'rdd/RDD' | grep 'saveAs'
   1519 Mon May 26 07:29:54 PDT 2014 
 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
   1560 Mon May 26 07:29:54 PDT 2014 
 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
 {code}
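
A minimal way to surface exactly this mismatch before deploying is to diff the class listings of the compile-time artifact against the assembly. The listings below are faked with {{printf}} so the sketch runs as-is; real ones would come from {{jar tf <jar> | grep '\.class$' | sort}}:

```shell
# Hypothetical class listings; in practice generate each with:
#   jar tf <jar> | grep '\.class$' | sort > listing.txt
printf '%s\n' 'org/apache/spark/rdd/RDD.class' \
              'org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class' | sort > maven.txt
printf '%s\n' 'org/apache/spark/rdd/RDD.class' | sort > assembly.txt

# comm -23 prints lines only in the first file: classes the user jar was
# compiled against that the assembly cannot provide at runtime.
comm -23 maven.txt assembly.txt
```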



--
This message was sent by Atlassian JIRA
(v6.2#6252)




[jira] [Comment Edited] (SPARK-2075) Hadoop1 distribution of 1.0.0 does not contain classes expected by the Maven 1.0.0 artifact

2014-06-08 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14021465#comment-14021465
 ] 

Patrick Wendell edited comment on SPARK-2075 at 6/8/14 10:13 PM:
-

Okay I did some more digging. I think the issue is that the anonymous classes 
used by saveAsTextFile are not guaranteed to be compiled to the same name every 
time you compile them in Scala. In the Hadoop 1 build these end up being 
shortened, whereas in the Hadoop 2 build they use the longer names. 
saveAsTextFile seems, strangely, to be the only affected function. I confirmed 
this by looking at the difference between the hadoop 1 and 2 jars:

{code}
$ jar tvf spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop*.jar | grep 
rdd\/RDD\\$ | awk '{ print $8;}' | sort > hadoop1
$ jar tvf spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop*.jar | grep 
rdd\/RDD\\$ | awk '{ print $8;}' | sort > hadoop2
$ diff hadoop1 hadoop2
23a24
> org/apache/spark/rdd/RDD$$anonfun$28$$anonfun$apply$13.class
27,29d27
< org/apache/spark/rdd/RDD$$anonfun$30$$anonfun$apply$13.class
< org/apache/spark/rdd/RDD$$anonfun$30.class
< org/apache/spark/rdd/RDD$$anonfun$31.class
90a89,90
> org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
> org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
{code}


I'm still a bit confused though because I didn't think these anonymous classes 
would show up in the byte code of the user application, so I don't think it 
should matter (i.e. this is why Scala probably allows this).

{code}
javap RDD | grep saveAsText
  public void saveAsTextFile(java.lang.String);
  public void saveAsTextFile(java.lang.String, java.lang.Class<? extends 
org.apache.hadoop.io.compress.CompressionCodec>);
{code}
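
One plausible resolution of that confusion: the name does not need to appear in the user application's bytecode, because the driver serializes the closure *object* and the Java serialization stream records its class name; the executor then resolves that name against its own assembly jar. A toy model of that lookup (file and class names illustrative, the assembly modeled as a directory):

```shell
# Model the executor's classpath as a directory of .class files. The serialized
# closure names an exact class; if the assembly lacks that file, deserialization
# fails with ClassNotFoundException -- which is what the reporter observed.
mkdir -p assembly_model
touch 'assembly_model/RDD$$anonfun$saveAsTextFile$2.class'   # what this build shipped (illustrative)
wanted='RDD$$anonfun$saveAsTextFile$1.class'                 # name recorded in the serialized closure
if [ ! -f "assembly_model/$wanted" ]; then
  echo "java.lang.ClassNotFoundException: $wanted"
fi
```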

[~paulrbrown] could you explain how you are bundling and submitting your 
application to the Spark cluster?


