[jira] [Comment Edited] (SPARK-2075) Hadoop1 distribution of 1.0.0 does not contain classes expected by the Maven 1.0.0 artifact
[ https://issues.apache.org/jira/browse/SPARK-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025296#comment-14025296 ]

Paul R. Brown edited comment on SPARK-2075 at 6/9/14 3:54 PM:
--

As food for thought, [here|http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.7.6] is the {{InnerClasses}} section of the JVM spec. It looks like there have been some changes from 2.10.3 to 2.10.4 (e.g., [SI-6546|https://issues.scala-lang.org/browse/SI-6546]), but I didn't dig in. I think the approach most likely to work is to ensure that exactly the same bits are used by all of the distributions and posted to Maven Central. (For some discussion of inner class naming stability, see the Java 8 lambda discussion list, e.g., [this message|http://mail.openjdk.java.net/pipermail/lambda-spec-experts/2013-July/000316.html].)

Hadoop1 distribution of 1.0.0 does not contain classes expected by the Maven 1.0.0 artifact
---

Key: SPARK-2075
URL: https://issues.apache.org/jira/browse/SPARK-2075
Project: Spark
Issue Type: Bug
Components: Build, Spark Core
Affects Versions: 1.0.0
Reporter: Paul R. Brown

Running a job built against the Maven dependency for 1.0.0 on the hadoop1 distribution produces:

{code}
java.lang.ClassNotFoundException: org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
{code}

Here's what's in the Maven dependency as of 1.0.0:

{code}
jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'rdd/RDD' | grep 'saveAs'
  1519 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
  1560 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
{code}

And here's what's in the hadoop1 distribution:

{code}
jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar | grep 'rdd/RDD' | grep 'saveAs'
{code}

I.e., it's not there. It is in the hadoop2 distribution:

{code}
jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar | grep 'rdd/RDD' | grep 'saveAs'
  1519 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
  1560 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
{code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
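The missing class here is a compiler-generated anonymous-function class, and the JVM must load it by its exact synthetic name. A minimal sketch of why a name mismatch surfaces as a ClassNotFoundException (plain Java, not Spark code; the class names are illustrative, assuming only the standard library):

```java
// Anonymous classes get compiler-assigned binary names like Outer$1,
// and the runtime resolves them by that exact name.
public class AnonNameDemo {
    public static void main(String[] args) {
        Runnable r = new Runnable() {          // compiles to AnonNameDemo$1.class
            @Override public void run() { }
        };
        // The runtime name is synthetic and chosen by the compiler, not the source.
        System.out.println(r.getClass().getName());

        // If a jar on the classpath lacks the class file for a synthetic name the
        // caller expects (as with RDD$$anonfun$saveAsTextFile$1 above), loading fails:
        try {
            Class.forName("AnonNameDemo$99");  // no such synthetic class was compiled
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }
    }
}
```

Because nothing in the source code pins these names, two separate compilations of the same library can legitimately disagree about them, which is why mixing class files from different builds is fragile.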
[jira] [Comment Edited] (SPARK-2075) Hadoop1 distribution of 1.0.0 does not contain classes expected by the Maven 1.0.0 artifact
[ https://issues.apache.org/jira/browse/SPARK-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021465#comment-14021465 ]

Patrick Wendell edited comment on SPARK-2075 at 6/8/14 10:13 PM:
-

Okay, I did some more digging. I think the issue is that the anonymous classes used by saveAsTextFile are not guaranteed to be compiled to the same name every time you compile them in Scala. In the Hadoop 1 build these end up being shortened, whereas in the Hadoop 2 build they use the longer names. saveAsTextFile seems, strangely, to be the only affected function. I confirmed this by looking at the difference between the hadoop1 and hadoop2 jars:

{code}
$ jar tvf spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop*.jar | grep rdd\/RDD\\$ | awk '{ print $8;}' | sort > hadoop1
$ jar tvf spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop*.jar | grep rdd\/RDD\\$ | awk '{ print $8;}' | sort > hadoop2
$ diff hadoop1 hadoop2
23a24
> org/apache/spark/rdd/RDD$$anonfun$28$$anonfun$apply$13.class
27,29d27
< org/apache/spark/rdd/RDD$$anonfun$30$$anonfun$apply$13.class
< org/apache/spark/rdd/RDD$$anonfun$30.class
< org/apache/spark/rdd/RDD$$anonfun$31.class
90a89,90
> org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
> org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
{code}

I'm still a bit confused, though, because I didn't think these anonymous classes would show up in the bytecode of the user application, so I don't think it should matter (i.e., this is presumably why Scala allows it):

{code}
javap RDD | grep saveAsText
  public void saveAsTextFile(java.lang.String);
  public void saveAsTextFile(java.lang.String, java.lang.Class<? extends org.apache.hadoop.io.compress.CompressionCodec>);
{code}

[~paulrbrown] could you explain how you are bundling and submitting your application to the Spark cluster?

--
This message was sent by Atlassian JIRA
(v6.2#6252)
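The `jar tvf | sort` and `diff` check above can also be sketched programmatically. A minimal sketch in plain Java (assumption: the two in-memory jars below are stand-ins for the real hadoop1/hadoop2 assemblies; only java.util.zip is used):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class JarDiffDemo {
    // Build a tiny jar (zip) containing the given entry names.
    static byte[] makeJar(String... entries) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            for (String name : entries) {
                zip.putNextEntry(new ZipEntry(name));
                zip.closeEntry();
            }
        }
        return buf.toByteArray();
    }

    // Collect sorted entry names, like `jar tvf ... | awk '{print $8}' | sort`.
    static TreeSet<String> entryNames(byte[] jar) throws Exception {
        TreeSet<String> names = new TreeSet<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(jar))) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                names.add(e.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        byte[] hadoop1 = makeJar("org/apache/spark/rdd/RDD$$anonfun$30.class");
        byte[] hadoop2 = makeJar("org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class",
                                 "org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class");
        // Like `diff hadoop1 hadoop2`: report entries only present in the second jar.
        TreeSet<String> onlyInSecond = new TreeSet<>(entryNames(hadoop2));
        onlyInSecond.removeAll(entryNames(hadoop1));
        for (String name : onlyInSecond) {
            System.out.println("> " + name);
        }
    }
}
```

To run this against the real artifacts, feed `entryNames` the bytes of an actual assembly jar instead of the synthetic ones; any `saveAsTextFile` anonfun entries missing from one build would show up in the set difference.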