[jira] [Commented] (SPARK-1464) Update MLLib Examples to Use Breeze
[ https://issues.apache.org/jira/browse/SPARK-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972342#comment-13972342 ]

Sean Owen commented on SPARK-1464:
--
This is a duplicate of https://issues.apache.org/jira/browse/SPARK-1462, which is resolved now.

Update MLLib Examples to Use Breeze
---
Key: SPARK-1464
URL: https://issues.apache.org/jira/browse/SPARK-1464
Project: Spark
Issue Type: Task
Components: MLlib
Reporter: Patrick Wendell
Assignee: Xiangrui Meng
Priority: Blocker
Fix For: 1.0.0

If we want to deprecate the vector class, we need to update all of the examples to use Breeze.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1520) Spark assembly fails with Java 6
Patrick Wendell created SPARK-1520:
--
Summary: Spark assembly fails with Java 6
Key: SPARK-1520
URL: https://issues.apache.org/jira/browse/SPARK-1520
Project: Spark
Issue Type: Bug
Reporter: Patrick Wendell
Priority: Blocker
Fix For: 1.0.0

This is a real doozie: when compiling a Spark assembly with JDK7, the produced jar does not work with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.

{code}
patrick@patrick-t430s: sbt/sbt assembly/assembly
patrick@patrick-t430s:~/Documents/spark$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
patrick@patrick-t430s:~/Documents/spark$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}

I also noticed that if the jar is unzipped and the classpath is set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works.
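The major-version check mentioned in the report can be reproduced without starting a JVM. Below is a minimal sketch, assuming Python 3 and only its standard library: it opens the assembly jar as a zip archive, reads the 8-byte header of each `.class` entry (magic `0xCAFEBABE`, then big-endian minor and major versions), and tallies the major versions found; 50 corresponds to Java 6 and 51 to Java 7. The script name `class_versions.py` and the helper `class_major_versions` are hypothetical, not part of Spark's build.

```python
import struct
import sys
import zipfile
from collections import Counter

# Every class file starts with the magic number 0xCAFEBABE, followed by a
# big-endian u2 minor_version and a u2 major_version.
CLASS_MAGIC = 0xCAFEBABE


def class_major_versions(jar_path):
    """Return a Counter mapping class-file major version -> number of classes.

    Hypothetical diagnostic helper: it only inspects entry headers and never
    loads any class.
    """
    counts = Counter()
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if not name.endswith(".class"):
                continue
            with jar.open(name) as entry:
                header = entry.read(8)
            if len(header) < 8:
                continue  # too short to be a valid class file; skip it
            magic, _minor, major = struct.unpack(">IHH", header)
            if magic == CLASS_MAGIC:
                counts[major] += 1
    return counts


if __name__ == "__main__":
    # Usage: python3 class_versions.py path/to/assembly.jar [more.jar ...]
    for jar_path in sys.argv[1:]:
        for major, n in sorted(class_major_versions(jar_path).items()):
            print(f"{jar_path}: major version {major}: {n} classes")
```

For example, run it against `assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar`. A result containing only major version 50 would match the report's observation; since the report already states the byte code is Java 6 compatible, this check helps rule out a class-version mismatch rather than explaining the failure.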
[jira] [Updated] (SPARK-1520) Spark assembly fails with Java 6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-1520:
---
Description: This is a real doozie: when compiling a Spark assembly with JDK7, the produced jar does not work with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}

I also noticed that if the jar is unzipped and the classpath is set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works.
[jira] [Updated] (SPARK-1520) Spark assembly fails with JRE6 for unknown reason
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-1520:
---
Summary: Spark assembly fails with JRE6 for unknown reason (was: Spark assembly fails with Java 6)
[jira] [Updated] (SPARK-1520) Spark assembly fails with Java 6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-1520:
---
Description: This is a real doozie: when compiling a Spark assembly with JDK7, the produced jar does not work with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}

I also noticed that if the jar is unzipped and the classpath is set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works.

The error is seen with any class, not just the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in master.
[jira] [Updated] (SPARK-1520) Spark assembly fails with JRE6 for unknown reason
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-1520:
---
Component/s: MLlib
[jira] [Updated] (SPARK-1520) Spark assembly fails with JRE6 for unknown reason
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-1520:
---
Component/s: Spark Core
[jira] [Updated] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-1520:
---
Summary: Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6 (was: Spark assembly fails with JRE6 for unknown reason)
[jira] [Updated] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-1520:
---
Description: This is a real doozie: when compiling a Spark assembly with JDK7, the produced jar does not work with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}

I also noticed that if the jar is unzipped and the classpath is set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works.

The error is seen with any class, not just the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in master.

I ran a git bisection and this appeared after the MLlib sparse vector patch was merged: https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534 SPARK-1212

I narrowed this down specifically to the inclusion of the breeze library. Just adding breeze to an older (unaffected) build triggered the issue.
[jira] [Updated] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-1520:
---
Description: This is a real doozie: when compiling a Spark assembly with JDK7, the produced jar does not work with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}

I also noticed that if the jar is unzipped and the classpath is set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works.

The error is seen with any class, not just the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in master.

*Isolation*
-I ran a git bisection and this appeared after the MLlib sparse vector patch was merged:- https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534 SPARK-1212
-I narrowed this down specifically to the inclusion of the breeze library. Just adding breeze to an older (unaffected) build triggered the issue.-
I narrowed this down to one or more classes inside of breeze/linalg/operators. If this directory is deleted and the jar is re-assembled, things work fine.
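The isolation step described in this update (deleting `breeze/linalg/operators` and re-packing the jar) can be scripted so it is easy to repeat while narrowing things down further. A minimal sketch, assuming Python 3's standard `zipfile` module; `strip_prefix` and the script name `strip_jar.py` are hypothetical. It rewrites an existing jar instead of re-running `sbt/sbt assembly/assembly`, so the output archive is produced by Python's zip writer and may differ in layout from what the build tool emits. It is a diagnostic aid only: any code that needs the removed breeze classes will fail against the stripped jar.

```python
import sys
import zipfile


def strip_prefix(src_jar, dst_jar, prefix="breeze/linalg/operators/"):
    """Copy src_jar to dst_jar, dropping every entry whose name starts with prefix.

    Returns the number of entries removed. Hypothetical diagnostic helper.
    """
    removed = 0
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dst_jar, "w", compression=zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename.startswith(prefix):
                removed += 1
                continue
            # Copy the entry's bytes under the same name (timestamps are not preserved).
            dst.writestr(info.filename, src.read(info))
    return removed


if __name__ == "__main__":
    # Usage: python3 strip_jar.py in.jar out.jar
    if len(sys.argv) == 3:
        n = strip_prefix(sys.argv[1], sys.argv[2])
        print(f"removed {n} entries")
```

The stripped jar can then be passed to the same `java -cp ... org.apache.spark.ui.UIWorkloadGenerator` command under JRE6 to check whether the class now loads.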
[jira] [Created] (SPARK-1521) Take character set size into account when compressing in-memory string columns
Cheng Lian created SPARK-1521:
-
Summary: Take character set size into account when compressing in-memory string columns
Key: SPARK-1521
URL: https://issues.apache.org/jira/browse/SPARK-1521
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.1.0
Reporter: Cheng Lian

Quoted from [a blog post|https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/] from Facebook:

bq. Strings dominate the largest tables in our warehouse and make up about 80% of the columns across the warehouse, so optimizing compression for string columns was important. By using a threshold on observed number of distinct column values per stripe, we modified the ORCFile writer to apply dictionary encoding to a stripe only when beneficial. Additionally, we sample the column values and take the character set of the column into account, since a small character set can be leveraged by codecs like Zlib for good compression and dictionary encoding then becomes unnecessary or sometimes even detrimental if applied.
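The heuristic in the quote (sample the column, consider its character set, and dictionary-encode only when the distinct-value count is low enough) can be sketched as a per-column decision function. This is a minimal sketch under stated assumptions, not Spark's in-memory columnar code or ORC's actual writer logic: the function `choose_string_encoding`, the sample size, and both thresholds (`max_distinct_ratio`, `small_charset`) are hypothetical, and a distinct-value ratio stands in for the quote's "threshold on observed number of distinct column values".

```python
import random
from typing import Sequence


def choose_string_encoding(values: Sequence[str],
                           sample_size: int = 1000,
                           max_distinct_ratio: float = 0.5,
                           small_charset: int = 16,
                           seed: int = 0) -> str:
    """Return "dictionary" or "plain" for a column of strings.

    Hypothetical heuristic inspired by the quoted ORCFile change:
    1. sample the column values;
    2. if the sample uses only a small character set, skip dictionary
       encoding and leave compression to a general-purpose codec (e.g. zlib);
    3. otherwise dictionary-encode only when the distinct ratio is low.
    """
    if not values:
        return "plain"
    rng = random.Random(seed)  # fixed seed keeps the decision reproducible
    if len(values) <= sample_size:
        sample = list(values)
    else:
        sample = rng.sample(list(values), sample_size)

    # Step 2: size of the character set observed in the sample.
    charset = set()
    for v in sample:
        charset.update(v)
    if len(charset) <= small_charset:
        return "plain"

    # Step 3: fraction of distinct values in the sample.
    distinct_ratio = len(set(sample)) / len(sample)
    return "dictionary" if distinct_ratio <= max_distinct_ratio else "plain"
```

With these defaults, a column of a few repeated city names gets dictionary encoding, while a digits-only column is left to the codec even though it has few distinct values. The thresholds would need tuning against real data.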
[jira] [Updated] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-1520:

Description:

This is a real doozie - when compiling a Spark assembly with JDK7, the produced jar does not work well with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.
{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}
I also noticed that if the jar is unzipped and the classpath is set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works. The error is seen with any class, not just the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in master.

*Isolation*
-I ran a git bisection and this appeared after the MLlib sparse vector patch was merged:- https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534 SPARK-1212
-I narrowed this down specifically to the inclusion of the breeze library. Just adding breeze to an older (unaffected) build triggered the issue.-
I (maybe) narrowed this down to one or more classes inside of breeze/linalg/operators. If this directory is deleted and the jar is re-assembled, things work fine.
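The description says the byte code was confirmed to be major version 50 (Java 6). A minimal sketch of how such a check can be automated, relying only on the standard class-file header layout (4-byte magic `0xCAFEBABE`, then a 2-byte minor and a 2-byte major version, big-endian); the class `ClassFileVersion` and its methods are hypothetical helpers, not part of Spark or the JDK. `javap -verbose` reports the same "major version" field for a single class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: read class-file major versions from raw headers or from a whole jar. */
final class ClassFileVersion {
    static final int JAVA_6_MAJOR = 50;

    /** Returns the major version from the first 8 bytes of a class file, or -1 if invalid. */
    static int majorVersion(byte[] header) {
        if (header.length < 8) {
            return -1;
        }
        int magic = ((header[0] & 0xFF) << 24) | ((header[1] & 0xFF) << 16)
                | ((header[2] & 0xFF) << 8) | (header[3] & 0xFF);
        if (magic != 0xCAFEBABE) {
            return -1;
        }
        // Bytes 4-5 hold minor_version; bytes 6-7 hold major_version (big-endian).
        return ((header[6] & 0xFF) << 8) | (header[7] & 0xFF);
    }

    /** Lists .class entries whose major version is newer than Java 6 (i.e. above 50). */
    static List<String> classesNewerThanJava6(String jarPath) throws IOException {
        List<String> offenders = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.getName().endsWith(".class")) {
                    continue;
                }
                byte[] header = new byte[8];
                try (InputStream in = jar.getInputStream(entry)) {
                    if (readFully(in, header) == 8 && majorVersion(header) > JAVA_6_MAJOR) {
                        offenders.add(entry.getName());
                    }
                }
            }
        }
        return offenders;
    }

    /** Reads until the buffer is full or the stream ends; returns the number of bytes read. */
    private static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int read = in.read(buf, total, buf.length - total);
            if (read < 0) {
                break;
            }
            total += read;
        }
        return total;
    }
}
```

Compiling the helper needs Java 7 or later (try-with-resources), but it only reads bytes, so it can inspect an assembly regardless of which JRE is meant to run it. An empty result would match the report that the byte code itself is Java 6 compatible.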
[jira] [Updated] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-1520:

Description:

This is a real doozie - when compiling a Spark assembly with JDK7, the produced jar does not work well with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.
{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}
I also noticed that if the jar is unzipped and the classpath is set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works. The error is seen with any class, not just the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in master.

*Isolation*
-I ran a git bisection and this appeared after the MLlib sparse vector patch was merged:- https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534 SPARK-1212
-I narrowed this down specifically to the inclusion of the breeze library. Just adding breeze to an older (unaffected) build triggered the issue.-
I've found that if I just unpack and re-pack the jar, it sometimes works:
{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}
[jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972397#comment-13972397 ]

Sean Owen commented on SPARK-1520:

Madness. One wild guess is that the breeze .jar files have something in META-INF that, when merged together into the assembly jar, conflicts with other META-INF items. In particular I'm thinking of MANIFEST.MF entries. It's worth diffing those, if you can, from before and after. However, this would still require that Java 7 and 6 behave differently with respect to the entries to explain your findings. It's possible.

Your last comment, however, suggests it's something strange with the byte code that gets output for a few classes. Java 7 is stricter about byte code. For example: https://weblogs.java.net/blog/fabriziogiudici/archive/2012/05/07/understanding-subtle-new-behaviours-jdk-7 However, I would think these would manifest as quite different errors. What about running with -verbose:class to print classloading messages? It might point directly to what's failing to load, if that's it.

Of course you can always build with Java 6, since that's supposed to be all that's supported/required now (see my other JIRA about making Jenkins do this), although I agree that it would be nice to get to the bottom of this, as there is no obvious reason this shouldn't work.

Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6

Key: SPARK-1520
URL: https://issues.apache.org/jira/browse/SPARK-1520
Project: Spark
Issue Type: Bug
Components: MLlib, Spark Core
Reporter: Patrick Wendell
Priority: Blocker
Fix For: 1.0.0
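Sean's suggestion to diff the MANIFEST.MF entries from before and after can be sketched with `java.util.jar.JarFile.getManifest()`, which returns null when a jar has no manifest. A minimal sketch, assuming you have two jars to compare (for example assemblies built with and without breeze); the class `ManifestDiff` and its methods are hypothetical, and only the main manifest section is compared. For the class-loading side, adding `-verbose:class` to the failing `java` command prints a line for each class the JVM loads.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper: compare the main-section attributes of two jar manifests. */
final class ManifestDiff {

    /** Reads main-section manifest attributes into a sorted map (empty if there is no manifest). */
    static Map<String, String> mainAttributes(String jarPath) throws IOException {
        Map<String, String> result = new TreeMap<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return result;
            }
            for (Map.Entry<Object, Object> e : manifest.getMainAttributes().entrySet()) {
                result.put(e.getKey().toString(), String.valueOf(e.getValue()));
            }
        }
        return result;
    }

    /** One line per attribute added (+), removed (-) or changed (~), sorted by attribute name. */
    static List<String> diff(Map<String, String> before, Map<String, String> after) {
        List<String> lines = new ArrayList<>();
        Set<String> names = new TreeSet<>(before.keySet());
        names.addAll(after.keySet());
        for (String name : names) {
            String oldValue = before.get(name);
            String newValue = after.get(name);
            if (oldValue == null) {
                lines.add("+ " + name + ": " + newValue);
            } else if (newValue == null) {
                lines.add("- " + name + ": " + oldValue);
            } else if (!oldValue.equals(newValue)) {
                lines.add("~ " + name + ": " + oldValue + " -> " + newValue);
            }
        }
        return lines;
    }
}
```

Usage would be `ManifestDiff.diff(ManifestDiff.mainAttributes(oldJar), ManifestDiff.mainAttributes(newJar))`; an empty result would make the MANIFEST.MF theory less likely.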
[jira] [Updated] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-1520:

Description:

This is a real doozie - when compiling a Spark assembly with JDK7, the produced jar does not work well with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.
{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}
I also noticed that if the jar is unzipped and the classpath is set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works. The error is seen with any class, not just the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in master.

*Isolation*
-I ran a git bisection and this appeared after the MLlib sparse vector patch was merged:- https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534 SPARK-1212
-I narrowed this down specifically to the inclusion of the breeze library. Just adding breeze to an older (unaffected) build triggered the issue.-
I've found that if I just unpack and re-pack the jar (using `jar` from Java 7), it always works:
{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}
I also noticed something else of note: the Breeze package contains single directories with huge numbers of files in them (e.g. 2000+ class files in one directory). It's possible we are hitting some weird bugs/corner cases in the compatibility of the jar's internal storage format.
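The observation about directories holding 2000+ class files can be checked by counting entries per directory, together with the total entry count, straight from the jar. A minimal sketch using `java.util.zip.ZipFile`; the class `JarEntryStats` and its methods are hypothetical. It also flags totals of 65,535 or more: the classic ZIP end-of-central-directory record stores the entry count in 16 bits and 0xFFFF doubles as the ZIP64 marker, so such archives generally need ZIP64 records, which older readers may not handle. That is offered only as one hypothesis worth ruling out, not as a confirmed cause.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: summarize how entries are spread across directories in a jar. */
final class JarEntryStats {
    /** 0xFFFF in the 16-bit entry-count field is the ZIP64 marker value. */
    static final int ZIP64_MARKER_COUNT = 0xFFFF;

    /** Counts file entries per parent directory ("" for the jar root); directory entries are skipped. */
    static Map<String, Integer> countPerDirectory(List<String> entryNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : entryNames) {
            if (name.endsWith("/")) {
                continue; // explicit directory entry, not a file
            }
            int slash = name.lastIndexOf('/');
            String dir = slash < 0 ? "" : name.substring(0, slash);
            counts.merge(dir, 1, Integer::sum);
        }
        return counts;
    }

    /** True if the total entry count is large enough that ZIP64 records are likely required. */
    static boolean mayRequireZip64(int totalEntries) {
        return totalEntries >= ZIP64_MARKER_COUNT;
    }

    /** Reads all entry names (files and directories) from a jar or zip on disk. */
    static List<String> entryNames(String path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }
}
```

For example, `entryNames(jarPath).size()` gives the total to pass to `mayRequireZip64`, and the per-directory map shows which breeze packages dominate the count; comparing the original assembly with the re-packed one would show whether re-packing changes either number.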
[jira] [Updated] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-1520:

Description:

This is a real doozie - when compiling a Spark assembly with JDK7, the produced jar does not work well with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.
{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}
I also noticed that if the jar is unzipped and the classpath is set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works. The error is seen with any class, not just the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in master.

*Isolation*
-I ran a git bisection and this appeared after the MLlib sparse vector patch was merged:- https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534 SPARK-1212
-I narrowed this down specifically to the inclusion of the breeze library. Just adding breeze to an older (unaffected) build triggered the issue.-
I've found that if I just unpack and re-pack the jar (using `jar` from Java 6 or 7), it always works:
{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}
I also noticed something else of note: the Breeze package contains single directories with huge numbers of files in them (e.g. 2000+ class files in one directory). It's possible we are hitting some weird bugs/corner cases in the compatibility of the jar's internal storage format.
[jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972523#comment-13972523 ]

Sean Owen commented on SPARK-1520:

Regarding large numbers of files: are there INDEX.LIST files used anywhere in the jars? If one gets munged or truncated while building the assembly jar, that might cause all kinds of havoc. It could be omitted. http://docs.oracle.com/javase/7/docs/technotes/guides/jar/jar.html#Index_File_Specification
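Checking for the index file Sean mentions only requires looking at entry names; per the JAR File Specification it lives at `META-INF/INDEX.LIST`. A minimal sketch that flags and prints any such index in the assembly (or in the input jars before merging); the class `JarIndexCheck` and its methods are hypothetical helpers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import java.util.zip.ZipEntry;

/** Hypothetical helper: detect and read a jar index (META-INF/INDEX.LIST). */
final class JarIndexCheck {
    static final String INDEX_ENTRY = "META-INF/INDEX.LIST";

    /** Returns the entry names that match the index file path, ignoring case. */
    static List<String> findIndexEntries(List<String> entryNames) {
        List<String> found = new ArrayList<>();
        for (String name : entryNames) {
            if (name.equalsIgnoreCase(INDEX_ENTRY)) {
                found.add(name);
            }
        }
        return found;
    }

    /** Returns the lines of META-INF/INDEX.LIST in the given jar, or an empty list if absent. */
    static List<String> readIndex(String jarPath) throws IOException {
        List<String> lines = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            ZipEntry entry = jar.getEntry(INDEX_ENTRY);
            if (entry == null) {
                return lines;
            }
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }
}
```

An index lists jar file names and the packages each one contains, so an index copied from a single input jar would not describe the merged assembly; whether that is what happens here is exactly what this check would show.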
[jira] [Created] (SPARK-1522) YARN ClientBase will throw a NPE if there is no YARN application specific classpath.
Bernardo Gomez Palacio created SPARK-1522:

Summary: YARN ClientBase will throw a NPE if there is no YARN application specific classpath.
Key: SPARK-1522
URL: https://issues.apache.org/jira/browse/SPARK-1522
Project: Spark
Issue Type: Bug
Components: YARN
Affects Versions: 0.9.0, 1.0.0, 0.9.1
Reporter: Bernardo Gomez Palacio
Priority: Critical

The current implementation of ClientBase.getDefaultYarnApplicationClasspath inspects the MRJobConfig class for the field DEFAULT_YARN_APPLICATION_CLASSPATH, when it should really be looking in YarnConfiguration. If the application configuration has no yarn.application.classpath defined, an NPE will be thrown.
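The null-safe behaviour the report asks for can be sketched without Hadoop on the classpath. In this hypothetical sketch a plain `Map` stands in for the Hadoop `Configuration`, and `DEFAULT_CLASSPATH` stands in for the default the report says should come from YarnConfiguration; its values are placeholders, not YARN's real defaults. The point is only that a missing `yarn.application.classpath` falls back to a default instead of producing a null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: resolve the YARN application classpath without risking an NPE. */
final class YarnClasspathSketch {
    static final String YARN_APPLICATION_CLASSPATH = "yarn.application.classpath";

    // Placeholder defaults for illustration; real code would read them from YarnConfiguration.
    static final List<String> DEFAULT_CLASSPATH =
            Collections.unmodifiableList(Arrays.asList("/placeholder/default/a/*", "/placeholder/default/b/*"));

    /** Returns the configured comma-separated entries, or the defaults if the key is unset or blank. */
    static List<String> applicationClasspath(Map<String, String> conf) {
        String raw = conf.get(YARN_APPLICATION_CLASSPATH);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_CLASSPATH; // fall back instead of returning or dereferencing null
        }
        List<String> entries = new ArrayList<>();
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }
}
```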
[jira] [Commented] (SPARK-1522) YARN ClientBase will throw a NPE if there is no YARN application specific classpath.
[ https://issues.apache.org/jira/browse/SPARK-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972803#comment-13972803 ]

Bernardo Gomez Palacio commented on SPARK-1522:

https://github.com/apache/spark/pull/433

YARN ClientBase will throw a NPE if there is no YARN application specific classpath.

Key: SPARK-1522
URL: https://issues.apache.org/jira/browse/SPARK-1522
Project: Spark
Issue Type: Bug
Components: YARN
Affects Versions: 0.9.0, 1.0.0, 0.9.1
Reporter: Bernardo Gomez Palacio
Priority: Critical
Labels: YARN
[jira] [Created] (SPARK-1524) TaskSetManager'd better not schedule tasks which has no preferred executorId using PROCESS_LOCAL in the first search process
YanTang Zhai created SPARK-1524:

Summary: TaskSetManager'd better not schedule tasks which has no preferred executorId using PROCESS_LOCAL in the first search process
Key: SPARK-1524
URL: https://issues.apache.org/jira/browse/SPARK-1524
Project: Spark
Issue Type: Improvement
Components: Spark Core
Reporter: YanTang Zhai
Priority: Minor

In DAGScheduler, a ShuffleMapTask is constructed with a TaskLocation that has only a host, not a (host, executorID) pair. When TaskSetManager schedules a ShuffleMapTask that has no preferred executorId using a specific execId, host and the PROCESS_LOCAL locality level, no task matches the given locality constraint in the first search pass.

We also found that in our cluster the host used by the scheduler is a hostname, while the host used by TaskLocation is an IP address. The two do not match, so the lookup in the pendingTasksForHost HashMap comes back empty and the task search does not behave as expected.
[jira] [Commented] (SPARK-1524) TaskSetManager'd better not schedule tasks which has no preferred executorId using PROCESS_LOCAL in the first search process
[ https://issues.apache.org/jira/browse/SPARK-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972864#comment-13972864 ]

Mridul Muralidharan commented on SPARK-1524:

The expectation is to fall back to a previous schedule type in case the higher level is not valid, though this is tricky in the general case. Will need to take a look at it - though given that I am tied up with other things, if someone else wants to take a crack, please feel free to do so!

Btw, use of IPs and multiple hostnames for a host is not supported in Spark - so that is something that will need to be resolved at the deployment end.
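The fallback discussed here (a level that cannot apply to a task should not block it, and the search moves on to a less local level) can be sketched as below. This is a simplified, hypothetical model: the enum reuses the names of Spark's locality levels, but `PendingTask`, `bestLevel` and `canLaunch` are illustrative, not TaskSetManager's actual code. It also shows the host mismatch from the report: hosts are compared as plain strings, so an IP address and a hostname for the same machine never match.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified model of locality matching for one pending task. */
final class LocalitySketch {
    /** Ordered from most to least local, reusing the names of Spark's locality levels. */
    enum Locality { PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY }

    /** Placement preferences of a pending task; any of the sets may be empty. */
    static final class PendingTask {
        final Set<String> preferredExecutors;
        final Set<String> preferredHosts;
        final Set<String> preferredRacks;

        PendingTask(Set<String> executors, Set<String> hosts, Set<String> racks) {
            this.preferredExecutors = Collections.unmodifiableSet(new HashSet<>(executors));
            this.preferredHosts = Collections.unmodifiableSet(new HashSet<>(hosts));
            this.preferredRacks = Collections.unmodifiableSet(new HashSet<>(racks));
        }
    }

    /** Most local level at which the task could run on this executor; ANY if nothing matches. */
    static Locality bestLevel(PendingTask task, String execId, String host, String rack) {
        if (task.preferredExecutors.contains(execId)) {
            return Locality.PROCESS_LOCAL;
        }
        // Plain string comparison: "10.0.0.1" and "host1" never match even if they are one machine.
        if (task.preferredHosts.contains(host)) {
            return Locality.NODE_LOCAL;
        }
        if (rack != null && task.preferredRacks.contains(rack)) {
            return Locality.RACK_LOCAL;
        }
        return Locality.ANY;
    }

    /** True if the task may launch when the scheduler currently allows at most maxLocality. */
    static boolean canLaunch(PendingTask task, String execId, String host, String rack, Locality maxLocality) {
        return bestLevel(task, execId, host, rack).compareTo(maxLocality) <= 0;
    }
}
```

A task with only a host preference can never qualify at PROCESS_LOCAL, which is the situation the issue describes for the first search pass; it becomes launchable once the allowed level is NODE_LOCAL.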
[jira] [Commented] (SPARK-1525) TaskSchedulerImpl should decrease availableCpus by spark.task.cpus not 1
[ https://issues.apache.org/jira/browse/SPARK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972957#comment-13972957 ] witgo commented on SPARK-1525: -- The latest code already fixes the bug. TaskSchedulerImpl should decrease availableCpus by spark.task.cpus not 1 Key: SPARK-1525 URL: https://issues.apache.org/jira/browse/SPARK-1525 Project: Spark Issue Type: Bug Components: Spark Core Reporter: YanTang Zhai Priority: Minor TaskSchedulerImpl always decreases availableCpus by 1 in resourceOffers, even when spark.task.cpus is greater than 1, so it schedules more tasks onto some nodes than their CPUs can run.
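To make the bookkeeping concrete, here is a hypothetical Java sketch in which each launched task subtracts cpusPerTask (the value spark.task.cpus would supply) from an offer's free cores, rather than 1. The CpuOfferSketch class, its assign method, and the round-robin placement are illustrative assumptions, not TaskSchedulerImpl's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of resource-offer bookkeeping; names are illustrative, not Spark's code.
public class CpuOfferSketch {
    /**
     * Places up to numTasks tasks on offers, where availableCpus[i] is the free core count of offer i.
     * Each launched task consumes cpusPerTask cores. Returns the chosen offer index per task, in launch order.
     */
    public static List<Integer> assign(int[] availableCpus, int cpusPerTask, int numTasks) {
        int[] free = availableCpus.clone(); // do not mutate the caller's array
        List<Integer> placements = new ArrayList<Integer>();
        boolean launched = true;
        // Sweep the offers repeatedly until no offer can fit another task.
        while (launched && placements.size() < numTasks) {
            launched = false;
            for (int i = 0; i < free.length && placements.size() < numTasks; i++) {
                if (free[i] >= cpusPerTask) {
                    free[i] -= cpusPerTask; // subtract cpusPerTask rather than 1
                    placements.add(i);
                    launched = true;
                }
            }
        }
        return placements;
    }
}
```

With offers of 4 and 2 free cores and cpusPerTask = 2, this places three tasks (two on the first offer, one on the second); decrementing by 1 instead would let the free-core counts overstate capacity and overcommit the nodes.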
[jira] [Closed] (SPARK-1511) Update TestUtils.createCompiledClass() API to work with creating class file on different filesystem
[ https://issues.apache.org/jira/browse/SPARK-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Xianjin closed SPARK-1511. - Resolution: Fixed Fix Version/s: 1.0.0 Update TestUtils.createCompiledClass() API to work with creating class file on different filesystem --- Key: SPARK-1511 URL: https://issues.apache.org/jira/browse/SPARK-1511 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.8.1, 0.9.0, 1.0.0 Environment: Mac OS X, two disks. Reporter: Ye Xianjin Priority: Minor Labels: starter Fix For: 1.0.0 Original Estimate: 24h Remaining Estimate: 24h The createCompiledClass method uses Java's File.renameTo method to rename the source file to the destination file, which fails if the source and destination files are on different disks (or partitions). See http://apache-spark-developers-list.1001551.n3.nabble.com/Tests-failed-after-assembling-the-latest-code-from-github-td6315.html for more details. Using com.google.common.io.Files.move instead of renameTo will solve this issue.
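A Java 6-compatible sketch of a rename-then-copy fallback. The MoveSketch.move helper is hypothetical; as far as we know, Guava's Files.move follows a similar approach (rename, else copy and delete), but this is not Guava's code. File.renameTo is tried first, and when it returns false, for example because source and destination are on different partitions, the bytes are copied and the source is deleted.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper; not Spark's TestUtils and not Guava's implementation.
public class MoveSketch {
    // Moves src to dst: tries File.renameTo first and, if that fails (e.g. across
    // disks or partitions), copies the bytes and then deletes the source.
    public static void move(File src, File dst) throws IOException {
        if (src.renameTo(dst)) {
            return;
        }
        InputStream in = new FileInputStream(src);
        try {
            OutputStream out = new FileOutputStream(dst);
            try {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } finally {
                out.close();
            }
        } finally {
            in.close();
        }
        if (!src.delete()) {
            // Undo the copy so the caller does not end up with two copies.
            dst.delete();
            throw new IOException("Unable to delete " + src);
        }
    }
}
```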
[jira] [Commented] (SPARK-1476) 2GB limit in spark for blocks
[ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972978#comment-13972978 ] Mridul Muralidharan commented on SPARK-1476: [~matei] We are having some issues porting the netty shuffle copier code to support blocks over 2G, since only ByteBuf seems to be exposed. Before I dig into netty more, I wanted to know if you or someone else among the Spark developers knows how to add support for large buffers in our netty code. Thanks! 2GB limit in spark for blocks - Key: SPARK-1476 URL: https://issues.apache.org/jira/browse/SPARK-1476 Project: Spark Issue Type: Bug Components: Spark Core Environment: all Reporter: Mridul Muralidharan Assignee: Mridul Muralidharan Priority: Critical Fix For: 1.1.0 The underlying abstraction for blocks in Spark is a ByteBuffer, which limits the size of a block to 2GB. This has implications not just for managed blocks in use, but also for shuffle blocks (memory-mapped blocks are limited to 2GB, even though the API allows for a long), ser-deser via byte-array-backed output streams (SPARK-1391), etc. This is a severe limitation when Spark is used on non-trivial datasets.
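A hypothetical sketch of one common way around the int-sized ByteBuffer limit: describe a block by a long length and back it with several ByteBuffers, each at most Integer.MAX_VALUE bytes. The ChunkedBlockSketch class is illustrative only and says nothing about how the netty shuffle path would adopt it. FileChannel.map is subject to the same cap, since its size argument must be no greater than Integer.MAX_VALUE, so memory-mapped blocks would need the same chunking.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: represent a block larger than 2GB as several ByteBuffers.
public class ChunkedBlockSketch {
    // Splits a long total length into chunk lengths that each fit in a ByteBuffer's int capacity.
    public static List<Integer> chunkLengths(long totalBytes, int maxChunkBytes) {
        if (totalBytes < 0 || maxChunkBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be >= 0 and maxChunkBytes > 0");
        }
        List<Integer> lengths = new ArrayList<Integer>();
        long remaining = totalBytes;
        while (remaining > 0) {
            int len = (int) Math.min(remaining, maxChunkBytes);
            lengths.add(len);
            remaining -= len;
        }
        return lengths;
    }

    // Allocates one heap ByteBuffer per chunk; only sensible for small demonstrations.
    public static List<ByteBuffer> allocate(long totalBytes, int maxChunkBytes) {
        List<ByteBuffer> chunks = new ArrayList<ByteBuffer>();
        for (int len : chunkLengths(totalBytes, maxChunkBytes)) {
            chunks.add(ByteBuffer.allocate(len));
        }
        return chunks;
    }
}
```

For a 5GiB block and Integer.MAX_VALUE as the chunk cap, this yields three chunks, the last holding the remaining 1073741826 bytes.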
[jira] [Created] (SPARK-1526) Running spark driver program from my local machine
Idan Zalzberg created SPARK-1526: Summary: Running spark driver program from my local machine Key: SPARK-1526 URL: https://issues.apache.org/jira/browse/SPARK-1526 Project: Spark Issue Type: Wish Components: Spark Core Reporter: Idan Zalzberg Currently it seems that the design choice is that the driver program should be close, network-wise, to the workers and allow connections to be created from either side. This makes using Spark somewhat harder, since when I develop locally I need to package not only my program but also all its local dependencies. Let's say I have a local DB with the names of files in Hadoop that I want to process with Spark: now I need my local DB to be accessible from the cluster so it can fetch the file names at runtime. The driver program is an awesome thing, but it loses some of its strength if you can't really run it anywhere. It seems to me that the problem is that the DAGScheduler needs to be close to the workers; maybe it shouldn't be embedded in the driver, then?
[jira] [Created] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1
Ye Xianjin created SPARK-1527: - Summary: rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1 Key: SPARK-1527 URL: https://issues.apache.org/jira/browse/SPARK-1527 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0 Reporter: Ye Xianjin Priority: Minor In core/src/test/scala/org/apache/storage/DiskBlockManagerSuite.scala: {{val rootDir0 = Files.createTempDir(); rootDir0.deleteOnExit(); val rootDir1 = Files.createTempDir(); rootDir1.deleteOnExit(); val rootDirs = rootDir0.getName + "," + rootDir1.getName}}. rootDir0 and rootDir1 are in the system's temporary directory, but rootDir0.getName returns only the last component of the directory's path, not the full path. When rootDirs is passed to the DiskBlockManager constructor, DiskBlockManager creates the directories in the current working directory rather than in the temporary directory. Using rootDir0.toString will fix this issue.
[jira] [Commented] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1
[ https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973067#comment-13973067 ] Sean Owen commented on SPARK-1527: -- {{toString()}} returns {{getPath()}}, which may still be relative. {{getAbsolutePath()}} is better, and {{getCanonicalPath()}} may be better still. rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1 --- Key: SPARK-1527 URL: https://issues.apache.org/jira/browse/SPARK-1527 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0 Reporter: Ye Xianjin Priority: Minor Labels: starter Original Estimate: 24h Remaining Estimate: 24h
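A small Java illustration of the difference. RootDirsSketch and joinAbsolute are hypothetical names, and the suite itself is Scala: for a File built from a relative parent (what a relative java.io.tmpdir would produce), getName returns only the last component, getPath/toString stays relative, and getAbsolutePath resolves against the working directory.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical illustration; not DiskBlockManagerSuite's code.
public class RootDirsSketch {
    // Joins directories into a comma-separated string using absolute paths,
    // so the result does not depend on how each File was constructed.
    public static String joinAbsolute(File... dirs) {
        StringBuilder sb = new StringBuilder();
        for (File d : dirs) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(d.getAbsolutePath());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A relative parent, as a relative java.io.tmpdir would give.
        File relative = new File("tmp-parent", "blockmgr-0");
        System.out.println("getName:          " + relative.getName());          // last component only
        System.out.println("toString/getPath: " + relative.getPath());          // still relative
        System.out.println("getAbsolutePath:  " + relative.getAbsolutePath());  // resolved against user.dir
        System.out.println("getCanonicalPath: " + relative.getCanonicalPath()); // also normalizes, resolves symlinks if present
        System.out.println(joinAbsolute(relative, new File("tmp-parent", "blockmgr-1")));
    }
}
```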
[jira] [Commented] (SPARK-1518) Spark master doesn't compile against hadoop-common trunk
[ https://issues.apache.org/jira/browse/SPARK-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973084#comment-13973084 ] witgo commented on SPARK-1518: -- As the Hadoop API changes, some methods get removed. The Hadoop-related code in Spark core could be made independent and moved into new modules, as was done for YARN. Spark master doesn't compile against hadoop-common trunk Key: SPARK-1518 URL: https://issues.apache.org/jira/browse/SPARK-1518 Project: Spark Issue Type: Bug Reporter: Marcelo Vanzin FSDataOutputStream::sync() has disappeared from trunk in Hadoop; FileLogger.scala is calling it. I've changed it locally to hsync() so I can compile the code, but haven't checked yet whether those are equivalent. hsync() seems to have been there forever, so it hopefully works with all versions Spark cares about.
[jira] [Commented] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1
[ https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973087#comment-13973087 ] Ye Xianjin commented on SPARK-1527: --- Yes, you are right: toString() may give a relative path, since the path is determined by the java.io.tmpdir system property (see https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/io/Files.java line 591). It's possible that DiskBlockManager will create directories other than the original temp dir when java.io.tmpdir is a relative path. So should we use getAbsolutePath, as I did in my last PR? But I saw toString() being called in other places! Should we do something about that?
[jira] [Commented] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1
[ https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973091#comment-13973091 ] Sean Owen commented on SPARK-1527: -- If the paths are only used locally, then an absolute path never hurts (except to be a bit longer). Since these are references to a temp directory, which is by definition only valid locally, I assume an absolute path is the right thing to use. In other cases, similar logic may apply, though I could imagine cases where the right thing to do is transmit a relative path.
[jira] [Created] (SPARK-1528) Spark on Yarn: Add option for user to specify additional namenodes to get tokens from
Thomas Graves created SPARK-1528: Summary: Spark on Yarn: Add option for user to specify additional namenodes to get tokens from Key: SPARK-1528 URL: https://issues.apache.org/jira/browse/SPARK-1528 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 1.0.0 Reporter: Thomas Graves Some users running Spark on YARN may wish to contact HDFS clusters other than the one they are running on. We should add an option for them to specify those namenodes so that we can get the credentials the application needs to contact them.
[jira] [Commented] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1
[ https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973096#comment-13973096 ] Ye Xianjin commented on SPARK-1527: --- Yes, of course: sometimes we want an absolute path, and sometimes we want to transmit a relative path. It depends on the logic. But I think we should review these usages to make sure absolute and relative paths are used appropriately. I may have time to review them after I finish another JIRA issue. If you want to take it over, please do! Anyway, thanks for your comments and help.
[jira] [Commented] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1
[ https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973148#comment-13973148 ] Sean Owen commented on SPARK-1527: -- There are a number of other uses of File.getName(), but a quick glance suggests all the others are appropriate. There are a number of other uses of File.toString(), almost all in tests. I suspect the Files in question already have absolute paths, and that even relative paths happen to work fine in a test since the working dir doesn't change. So those could change, but are probably not a concern. The only one that gave me pause was the use in HttpBroadcast.scala, though I suspect it turns out to work fine for similar reasons. If reviewers are interested in changing the toString()s I'll test and submit a PR for that.
[jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973291#comment-13973291 ] Xiangrui Meng commented on SPARK-1520: -- I'm using the Java 6 JDK located at /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home on a Mac. It can create a jar with more than 65536 files. I also found this JIRA: https://bugs.openjdk.java.net/browse/JDK-4828461 (Support Zip files with more than 64k entries), which was fixed in version 6. Note that this is for OpenJDK. I'm going to check the headers of the assembly jars created by Java 6 and 7. Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6 - Key: SPARK-1520 URL: https://issues.apache.org/jira/browse/SPARK-1520 Project: Spark Issue Type: Bug Components: MLlib, Spark Core Reporter: Patrick Wendell Priority: Blocker Fix For: 1.0.0 This is a real doozie - when compiling a Spark assembly with JDK7, the produced jar does not work well with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.
{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}
I also noticed that if the jar is unzipped and the classpath is set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works. The error is seen with any class, not just the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in master. *Isolation* -I ran a git bisection and this appeared after the MLLib sparse vector patch was merged:- https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534 SPARK-1212 -I narrowed this down specifically to the inclusion of the breeze library. Just adding breeze to an older (unaffected) build triggered the issue.- I've found that if I just unpack and re-pack the jar (using `jar` from java 6 or 7) it always works:
{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}
Something else worth noting: the Breeze package contains single directories with huge numbers of files in them (e.g. 2000+ class files in one directory). It's possible we are hitting some weird bug or corner case in the compatibility of the jar's internal storage format.
[jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973306#comment-13973306 ] Xiangrui Meng commented on SPARK-1520: -- When I try to use jar-1.6 to unpack the assembly jar created by Java 7:
~~~
java.util.zip.ZipException: invalid CEN header (bad signature)
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:128)
    at java.util.zip.ZipFile.<init>(ZipFile.java:89)
    at sun.tools.jar.Main.list(Main.java:977)
    at sun.tools.jar.Main.run(Main.java:222)
    at sun.tools.jar.Main.main(Main.java:1147)
~~~
7z shows:
~~~
Path = spark-assembly-1.6.jar
Type = zip
Physical Size = 119682511

Path = spark-assembly-1.7.jar
Type = zip
64-bit = +
Physical Size = 119682587
~~~
I think the limit on the number of files was already raised in Java 6 (at least in the latest update), but Java 7 uses the zip64 format for more than 64k files, and Java 6 cannot recognize that format.
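One hedged way to check the zip64 finding on a given jar without 7z: look for the ZIP64 end-of-central-directory locator signature (PK\6\7) immediately before the classic end-of-central-directory record (PK\5\6). The Zip64Check class below is a hypothetical, Java 6-compatible sketch of that check, not part of Spark's build. It assumes the archive comment, if any, does not itself contain the end-record signature.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical diagnostic: reports whether a zip/jar carries ZIP64 end records.
public class Zip64Check {
    private static final int EOCD_SIG = 0x06054b50;          // "PK\5\6", little-endian
    private static final int ZIP64_LOCATOR_SIG = 0x07064b50; // "PK\6\7", little-endian
    private static final int EOCD_MIN_LEN = 22;               // EOCD record without comment
    private static final int LOCATOR_LEN = 20;                // ZIP64 EOCD locator size
    private static final int MAX_COMMENT_LEN = 0xFFFF;

    // Reads a little-endian 32-bit int from b at off.
    private static int le32(byte[] b, int off) {
        return (b[off] & 0xff) | (b[off + 1] & 0xff) << 8
             | (b[off + 2] & 0xff) << 16 | (b[off + 3] & 0xff) << 24;
    }

    public static boolean isZip64(String path) throws IOException {
        RandomAccessFile f = new RandomAccessFile(path, "r");
        try {
            long len = f.length();
            // The EOCD record sits within the last 22 + 65535 bytes; the locator precedes it.
            int tail = (int) Math.min(len, EOCD_MIN_LEN + MAX_COMMENT_LEN + LOCATOR_LEN);
            byte[] buf = new byte[tail];
            f.seek(len - tail);
            f.readFully(buf);
            // Scan backwards for the last EOCD signature.
            for (int i = tail - EOCD_MIN_LEN; i >= 0; i--) {
                if (le32(buf, i) == EOCD_SIG) {
                    int loc = i - LOCATOR_LEN;
                    return loc >= 0 && le32(buf, loc) == ZIP64_LOCATOR_SIG;
                }
            }
            throw new IOException("No end-of-central-directory record in " + path);
        } finally {
            f.close();
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + ": zip64=" + isZip64(path));
        }
    }
}
```

Run as `java Zip64Check spark-assembly-1.7.jar`, it would be expected to report zip64=true for an archive that 7z lists as 64-bit; we have not run it against these specific assembly jars.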
[jira] [Comment Edited] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973326#comment-13973326 ] Xiangrui Meng edited comment on SPARK-1520 at 4/17/14 7:59 PM: --- The quick fix may be removing fastutil, so Java 7 still generates the assembly jar in zip format instead of zip64. In RDD#countApproxDistinct, we use HyperLogLog from com.clearspring.analytics:stream, which depends on fastutil. If this is the only place that introduces fastutil dependency, we should implement HyperLogLog and remove fastutil completely from Spark's dependencies. was (Author: mengxr): The quick fix may be removing fastutil. In RDD#countApproxDistinct, we use HyperLogLog from com.clearspring.analytics:stream, which depends on fastutil. If this is the only place that introduces fastutil dependency, we should implement HyperLogLog and remove fastutil completely from Spark's dependencies.
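If the in-house route is taken, a textbook HyperLogLog fits in a few dozen lines of plain Java. The HyperLogLogSketch class below is a hypothetical minimal version, not stream-lib's class and not what Spark necessarily adopts. It hashes 64-bit values with MurmurHash3's 64-bit finalizer (an assumption; any well-mixed 64-bit hash would do), uses the top p bits to pick a register, and applies only the original small-range (linear counting) correction.

```java
// Hypothetical minimal HyperLogLog; not stream-lib's implementation.
public class HyperLogLogSketch {
    private final int p;            // number of index bits
    private final int m;            // number of registers, 2^p
    private final byte[] registers; // max observed rank per register

    public HyperLogLogSketch(int p) {
        // The alpha constant used below is the m >= 128 form, hence p >= 7.
        if (p < 7 || p > 18) {
            throw new IllegalArgumentException("p must be in [7, 18]");
        }
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    // MurmurHash3's 64-bit finalizer (fmix64), used to spread the input bits.
    static long mix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    public void add(long value) {
        long h = mix64(value);
        int idx = (int) (h >>> (64 - p)); // top p bits select a register
        long rest = h << p;               // remaining 64 - p bits, left-aligned
        // Rank = position of the first 1-bit, capped when the remaining bits are all zero.
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    // Register-wise max: the merged sketch describes the union of both inputs.
    public void merge(HyperLogLogSketch other) {
        if (other.p != p) {
            throw new IllegalArgumentException("precision mismatch");
        }
        for (int i = 0; i < m; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    public double estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);
        double e = alpha * m * m / sum;
        // Small-range correction: use linear counting while some registers are still empty.
        if (e <= 2.5 * m && zeros > 0) {
            e = m * Math.log((double) m / zeros);
        }
        return e;
    }
}
```

With p = 14 the standard error is roughly 1.04/sqrt(16384), about 0.8%. The register-wise max in merge is what lets per-partition sketches be combined, which is the property an RDD-level aggregation such as countApproxDistinct needs.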
[jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973326#comment-13973326 ] Xiangrui Meng commented on SPARK-1520: -- The quick fix may be removing fastutil. In RDD#countApproxDistinct, we use HyperLogLog from com.clearspring.analytics:stream, which depends on fastutil. If this is the only place that introduces fastutil dependency, we should implement HyperLogLog and remove fastutil completely from Spark's dependencies.
[jira] [Updated] (SPARK-1520) Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1520: --- Summary: Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6 (was: Assembly Jar with more than JDK7 and run on JDK6) Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6 - Key: SPARK-1520 URL: https://issues.apache.org/jira/browse/SPARK-1520 Project: Spark Issue Type: Bug Components: MLlib, Spark Core Reporter: Patrick Wendell Priority: Blocker Fix For: 1.0.0
[jira] [Updated] (SPARK-1520) Assembly Jar with more than JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1520: --- Summary: Assembly Jar with more than JDK7 and run on JDK6 (was: Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6) Assembly Jar with more than JDK7 and run on JDK6 - Key: SPARK-1520 URL: https://issues.apache.org/jira/browse/SPARK-1520 Project: Spark Issue Type: Bug Components: MLlib, Spark Core Reporter: Patrick Wendell Priority: Blocker Fix For: 1.0.0
[jira] [Created] (SPARK-1529) Support setting spark.local.dirs to a hadoop FileSystem
Patrick Wendell created SPARK-1529: -- Summary: Support setting spark.local.dirs to a hadoop FileSystem Key: SPARK-1529 URL: https://issues.apache.org/jira/browse/SPARK-1529 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Patrick Wendell Assignee: Cheng Lian Fix For: 1.1.0 In some environments, like with MapR, local volumes are accessed through the Hadoop filesystem interface. We should allow setting spark.local.dir to a Hadoop filesystem location.
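For illustration only, here is a minimal Java sketch of a first step such support would need: telling plain local paths apart from Hadoop-style URIs in a comma-separated `spark.local.dir` value. The class `LocalDirSchemes` and its methods are hypothetical, the example values (`maprfs:///...`, `hdfs://...`) are assumptions, and nothing here talks to a Hadoop `FileSystem`; it only inspects URI schemes with `java.net.URI`.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not Spark code): classify entries of a spark.local.dir value.
public class LocalDirSchemes {

    // True if the entry names a non-local filesystem, e.g. maprfs:// or hdfs://.
    static boolean isHadoopUri(String entry) {
        try {
            String scheme = new URI(entry).getScheme();
            // No scheme (a plain path) or "file" means an ordinary local directory.
            return scheme != null && !scheme.equalsIgnoreCase("file");
        } catch (URISyntaxException e) {
            // Strings that are not valid URIs are treated as plain local paths.
            return false;
        }
    }

    // Splits the value on commas and keeps only the non-local entries.
    static List<String> hadoopEntries(String localDirValue) {
        List<String> result = new ArrayList<>();
        for (String raw : localDirValue.split(",")) {
            String entry = raw.trim();
            if (!entry.isEmpty() && isHadoopUri(entry)) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String value = "/tmp/spark,maprfs:///var/spark-local,file:///mnt/disk1";
        System.out.println(hadoopEntries(value)); // [maprfs:///var/spark-local]
    }
}
```

Treating a missing scheme or `file:` as local would keep existing configurations working unchanged, which seems like the natural default for such a change.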
[jira] [Updated] (SPARK-1520) Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1520: --- Description: This is a real doozie - when compiling a Spark assembly with JDK7, the produced jar does not work well with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.
{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}
I also noticed that if the jar is unzipped and the classpath is set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works. The error is seen with any class, not just the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in master.
h1. Isolation and Cause
This issue is caused by the following
-I've found that if I just unpack and re-pack the jar (using `jar`) it always works:-
{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}
-I also noticed something of note. The Breeze package contains single directories that have huge numbers of files in them (e.g. 2000+ class files in one directory). It's possible we are hitting some weird bugs/corner cases with compatibility of the internal storage format of the jar itself.-
-I narrowed this down specifically to the inclusion of the breeze library. Just adding breeze to an older (unaffected) build triggered the issue.-
-I ran a git bisection and this appeared after the MLLib sparse vector patch was merged:- https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534 SPARK-1212
[jira] [Updated] (SPARK-1520) Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1520: --- Description: This is a real doozie - when compiling a Spark assembly with JDK7, the produced jar does not work well with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.
{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}
I also noticed that if the jar is unzipped and the classpath is set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works. The error is seen with any class, not just the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in master.
h1. Isolation and Cause
The package-time behavior of Java 6 and 7 differs with respect to the format used for jar files:
||Number of entries||JDK 6||JDK 7||
|<= 65536|zip|zip|
|> 65536|zip*|zip64|
zip* is a workaround for the original zip format, [described in JDK-4828461|https://bugs.openjdk.java.net/browse/JDK-4828461)], that allows some versions of Java 6 to support larger assembly jars.
The Scala libraries we depend on have added a large number of classes, which bumped us over the limit. As a result, jars packaged with Java 7 do not work on Java 6. We can probably go back under the limit by clearing out some accidental inclusion of FastUtil, but eventually we'll go over again. The real answer is to force people to build with JDK 6 if they want to run Spark on JRE 6.
-I've found that if I just unpack and re-pack the jar (using `jar`) it always works:-
{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}
-I also noticed something of note. The Breeze package contains single directories that have huge numbers of files in them (e.g. 2000+ class files in one directory). It's possible we are hitting some weird bugs/corner cases with compatibility of the internal storage format of the jar itself.-
-I narrowed this down specifically to the inclusion of the breeze library. Just adding breeze to an older (unaffected) build triggered the issue.-
-I ran a git bisection and this appeared after the MLLib sparse vector patch was merged:- https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534 SPARK-1212
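The zip/zip64 distinction in the table can be checked on a built jar. In the classic zip format, the end-of-central-directory (EOCD) record stores the entry count in a 16-bit field; archives that need more use the ZIP64 extension, which places a ZIP64 EOCD locator (signature `0x07064b50`) immediately before the classic EOCD (signature `0x06054b50`). The hypothetical `Zip64Check` class below is a minimal sketch that scans the end of a file for those two signatures. It assumes the archive comment does not itself contain an EOCD signature, so treat it as a quick diagnostic rather than a full zip parser.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical diagnostic: does a zip/jar file use the ZIP64 end-of-central-directory format?
public class Zip64Check {

    static final int EOCD_SIG = 0x06054b50;          // "PK\5\6": classic end of central directory
    static final int ZIP64_LOCATOR_SIG = 0x07064b50; // "PK\6\7": ZIP64 EOCD locator
    static final int EOCD_MIN_SIZE = 22;             // EOCD size without a trailing comment
    static final int LOCATOR_SIZE = 20;              // ZIP64 locator size

    // Reads a little-endian 32-bit value from buf at offset off.
    static int readIntLE(byte[] buf, int off) {
        return (buf[off] & 0xff)
                | (buf[off + 1] & 0xff) << 8
                | (buf[off + 2] & 0xff) << 16
                | (buf[off + 3] & 0xff) << 24;
    }

    static boolean usesZip64(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long length = file.length();
            // Cover the EOCD, a comment of up to 65535 bytes, and the locator before the EOCD.
            int tailSize = (int) Math.min(length, EOCD_MIN_SIZE + 0xFFFF + LOCATOR_SIZE);
            byte[] tail = new byte[tailSize];
            file.seek(length - tailSize);
            file.readFully(tail);
            // Scan backwards for the last classic EOCD signature.
            for (int i = tailSize - EOCD_MIN_SIZE; i >= 0; i--) {
                if (readIntLE(tail, i) == EOCD_SIG) {
                    // A ZIP64 archive has its locator immediately before the EOCD.
                    int loc = i - LOCATOR_SIZE;
                    return loc >= 0 && readIntLE(tail, loc) == ZIP64_LOCATOR_SIG;
                }
            }
            throw new IOException("no end-of-central-directory record found in " + path);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Zip64Check path/to/spark-assembly.jar
        System.out.println(usesZip64(args[0]) ? "zip64" : "zip");
    }
}
```

If the analysis in the description holds, running this on a JDK 7 build of the oversized assembly should print `zip64`, while a JDK 6 build would print `zip`, since Java 6's `ZipOutputStream` has no ZIP64 support.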
[jira] [Updated] (SPARK-1520) Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1520: --- Description: This is a real doozie - when compiling a Spark assembly with JDK7, the produced jar does not work well with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.
{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}
I also noticed that if the jar is unzipped and the classpath is set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works. The error is seen with any class, not just the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in master.
h1. Isolation and Cause
The package-time behavior of Java 6 and 7 differs with respect to the format used for jar files:
||Number of entries||JDK 6||JDK 7||
|<= 65536|zip|zip|
|> 65536|zip*|zip64|
zip* is a workaround for the original zip format, [described in JDK-4828461|https://bugs.openjdk.java.net/browse/JDK-4828461], that allows some versions of Java 6 to support larger assembly jars.
The Scala libraries we depend on have added a large number of classes, which bumped us over the limit. As a result, jars packaged with Java 7 do not work on Java 6. We can probably go back under the limit by clearing out some accidental inclusion of FastUtil, but eventually we'll go over again. The real answer is to force people to build with JDK 6 if they want to run Spark on JRE 6.
-I've found that if I just unpack and re-pack the jar (using `jar`) it always works:-
{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}
-I also noticed something of note. The Breeze package contains single directories that have huge numbers of files in them (e.g. 2000+ class files in one directory). It's possible we are hitting some weird bugs/corner cases with compatibility of the internal storage format of the jar itself.-
-I narrowed this down specifically to the inclusion of the breeze library. Just adding breeze to an older (unaffected) build triggered the issue.-
-I ran a git bisection and this appeared after the MLLib sparse vector patch was merged:- https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534 SPARK-1212
[jira] [Commented] (SPARK-1520) Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973463#comment-13973463 ] Xiangrui Meng commented on SPARK-1520: -- It seems HyperLogLog doesn't need fastutil, so we can exclude fastutil directly. Will send a patch. Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6 - Key: SPARK-1520 URL: https://issues.apache.org/jira/browse/SPARK-1520 Project: Spark Issue Type: Bug Components: MLlib, Spark Core Reporter: Patrick Wendell Priority: Blocker Fix For: 1.0.0
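Whether excluding fastutil alone would bring the assembly back under the limit can be estimated before touching the build. The hypothetical `PrefixBudget` helper below counts all entries in a jar and those under one package prefix, then compares the remainder with 65,535 (`0xFFFF`), the largest value of the classic 16-bit entry-count field. The default prefix `it/unimi/dsi/fastutil/` assumes fastutil's usual `it.unimi.dsi.fastutil` package. This is only an estimate: a real dependency exclusion may also drop or add other transitive classes.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: estimate a jar's entry count after dropping one package prefix.
public class PrefixBudget {

    static final int CLASSIC_ZIP_MAX_ENTRIES = 0xFFFF; // 16-bit count field in the classic EOCD

    // Returns {totalEntries, entriesUnderPrefix}.
    static int[] count(String jarPath, String prefix) throws IOException {
        int total = 0;
        int underPrefix = 0;
        try (ZipFile zip = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                total++;
                if (name.startsWith(prefix)) {
                    underPrefix++;
                }
            }
        }
        return new int[] {total, underPrefix};
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PrefixBudget path/to/spark-assembly.jar [prefix]
        String prefix = args.length > 1 ? args[1] : "it/unimi/dsi/fastutil/";
        int[] c = count(args[0], prefix);
        int remaining = c[0] - c[1];
        System.out.println("total entries:        " + c[0]);
        System.out.println("entries under prefix: " + c[1]);
        System.out.println("remaining:            " + remaining
                + (remaining < CLASSIC_ZIP_MAX_ENTRIES ? " (fits classic zip)" : " (still needs zip64)"));
    }
}
```

As the description notes, this only buys headroom; any future growth in dependencies can push the count back over the limit.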
[jira] [Resolved] (SPARK-1464) Update MLLib Examples to Use Breeze
[ https://issues.apache.org/jira/browse/SPARK-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-1464. -- Resolution: Duplicate Update MLLib Examples to Use Breeze --- Key: SPARK-1464 URL: https://issues.apache.org/jira/browse/SPARK-1464 Project: Spark Issue Type: Task Components: MLlib Reporter: Patrick Wendell Assignee: Xiangrui Meng Priority: Blocker Fix For: 1.0.0 If we want to deprecate the vector class we need to update all of the examples to use Breeze.
[jira] [Updated] (SPARK-1496) SparkContext.jarOfClass should return Option instead of a sequence
[ https://issues.apache.org/jira/browse/SPARK-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1496: --- Labels: release-notes (was: releasenotes) SparkContext.jarOfClass should return Option instead of a sequence -- Key: SPARK-1496 URL: https://issues.apache.org/jira/browse/SPARK-1496 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Patrick Wendell Assignee: Patrick Wendell Labels: release-notes Fix For: 1.0.0 This is pretty confusing, especially since addJar expects to take a single jar.
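The proposed change concerns the return type: a class is defined from a single location, so "at most one jar path" matches what `addJar` takes better than a sequence does. The sketch below illustrates the idea in Java, using `java.util.Optional` (Java 8+) as a stand-in for Scala's `Option`. It looks up the class file as a resource and extracts the jar path from a `jar:file:...!/...` URL. The names `JarOfClassSketch`, `jarOfClass` and `jarPathFromResourceUrl` are hypothetical, and this is not Spark's implementation.

```java
import java.net.URL;
import java.util.Optional;

// Hypothetical sketch: find the jar a class was loaded from, returning at most one path.
public class JarOfClassSketch {

    // Extracts "/path/to/app.jar" from "jar:file:/path/to/app.jar!/com/example/Foo.class".
    // Any other URL form (class directories, jrt:, etc.) yields Optional.empty().
    // The path is returned as it appears in the URL; percent-encoding is not decoded.
    static Optional<String> jarPathFromResourceUrl(String url) {
        String prefix = "jar:file:";
        int bang = url.indexOf("!/");
        if (url.startsWith(prefix) && bang > prefix.length()) {
            return Optional.of(url.substring(prefix.length(), bang));
        }
        return Optional.empty();
    }

    static Optional<String> jarOfClass(Class<?> cls) {
        // Class files live on the classpath as e.g. "/com/example/Foo.class".
        String resource = "/" + cls.getName().replace('.', '/') + ".class";
        URL url = cls.getResource(resource);
        return url == null ? Optional.empty() : jarPathFromResourceUrl(url.toString());
    }

    public static void main(String[] args) {
        // Prints Optional.empty when run from a directory of compiled classes.
        System.out.println(jarOfClass(JarOfClassSketch.class));
    }
}
```

Keeping the URL parsing in a separate pure function makes the single-jar logic testable without building a jar.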
[jira] [Updated] (SPARK-1496) SparkContext.jarOfClass should return Option instead of a sequence
[ https://issues.apache.org/jira/browse/SPARK-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1496: --- Labels: releasenotes (was: ) SparkContext.jarOfClass should return Option instead of a sequence -- Key: SPARK-1496 URL: https://issues.apache.org/jira/browse/SPARK-1496 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Patrick Wendell Assignee: Patrick Wendell Labels: release-notes Fix For: 1.0.0 This is pretty confusing, especially since addJar expects to take a single jar.
[jira] [Updated] (SPARK-1496) SparkContext.jarOfClass should return Option instead of a sequence
[ https://issues.apache.org/jira/browse/SPARK-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1496: --- Labels: api-change (was: release-notes) SparkContext.jarOfClass should return Option instead of a sequence -- Key: SPARK-1496 URL: https://issues.apache.org/jira/browse/SPARK-1496 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Patrick Wendell Assignee: Patrick Wendell Labels: api-change Fix For: 1.0.0 This is pretty confusing, especially since addJar expects to take a single jar.
[jira] [Updated] (SPARK-964) Investigate the potential for using JDK 8 lambda expressions for the Java/Scala APIs
[ https://issues.apache.org/jira/browse/SPARK-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-964: -- Labels: api-change (was: ) Investigate the potential for using JDK 8 lambda expressions for the Java/Scala APIs Key: SPARK-964 URL: https://issues.apache.org/jira/browse/SPARK-964 Project: Spark Issue Type: Story Reporter: Marek Kolodziej Assignee: Marek Kolodziej Labels: api-change Fix For: 1.0.0 JDK 8 (to be released soon) will have lambda expressions. The question is whether they can be leveraged so that Java code can use Scala's Spark API (perhaps with some modifications), or whether a new functional API would need to be developed for Java 8+.
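One language rule shapes the answer: a Java 8 lambda can only target an interface with a single abstract method (a functional interface), not an abstract class. The sketch below uses a hypothetical `Function` interface with a `call` method, loosely shaped like a Java-facing function type (an assumption for illustration, not Spark's actual API), plus a hypothetical `map` helper, and shows the same transformation written as an anonymous class and as a lambda.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LambdaSketch {

    // Hypothetical single-abstract-method interface, so a lambda can implement it.
    // If this were an abstract class instead, the lambda below would not compile.
    interface Function<T, R> {
        R call(T t) throws Exception;
    }

    // Hypothetical stand-in for a map operation on a distributed collection.
    static <T, R> List<R> map(List<T> input, Function<T, R> f) throws Exception {
        List<R> out = new ArrayList<>();
        for (T t : input) {
            out.add(f.call(t));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        List<String> words = Arrays.asList("spark", "jdk", "lambda");

        // Pre-Java 8 style: anonymous inner class.
        List<Integer> lengths1 = map(words, new Function<String, Integer>() {
            @Override
            public Integer call(String s) {
                return s.length();
            }
        });

        // Java 8 style: a lambda targeting the same interface.
        List<Integer> lengths2 = map(words, s -> s.length());

        System.out.println(lengths1 + " " + lengths2); // [5, 3, 6] [5, 3, 6]
    }
}
```

Under this constraint, whether the existing Java API can be reused with lambdas largely depends on whether its function types are (or can become) interfaces.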
[jira] [Created] (SPARK-1530) Streaming UI test can hang indefinitely
Patrick Wendell created SPARK-1530: -- Summary: Streaming UI test can hang indefinitely Key: SPARK-1530 URL: https://issues.apache.org/jira/browse/SPARK-1530 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Assignee: Tathagata Das
This has been causing Jenkins to hang recently:
{code}
pool-1-thread-1 prio=10 tid=0x7f4b9449f000 nid=0x6c37 runnable [0x7f4b8a26c000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
	- locked 0x0007cad700d0 (a java.io.BufferedInputStream)
	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
	- locked 0x0007cad662b8 (a sun.net.www.protocol.http.HttpURLConnection)
	at java.net.URL.openStream(URL.java:1037)
	at scala.io.Source$.fromURL(Source.scala:140)
	at scala.io.Source$.fromURL(Source.scala:130)
	at org.apache.spark.ui.UISuite$$anonfun$2$$anonfun$apply$mcV$sp$2$$anonfun$apply$2.apply$mcV$sp(UISuite.scala:57)
	at org.apache.spark.ui.UISuite$$anonfun$2$$anonfun$apply$mcV$sp$2$$anonfun$apply$2.apply(UISuite.scala:56)
	at org.apache.spark.ui.UISuite$$anonfun$2$$anonfun$apply$mcV$sp$2$$anonfun$apply$2.apply(UISuite.scala:56)
	at org.scalatest.concurrent.Eventually$class.makeAValiantAttempt$1(Eventually.scala:394)
	at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:408)
	at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:437)
	at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:477)
	at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307)
	at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:477)
	at org.apache.spark.ui.UISuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(UISuite.scala:56)
	at org.apache.spark.ui.UISuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(UISuite.scala:54)
	at org.apache.spark.LocalSparkContext$.withSpark(LocalSparkContext.scala:60)
	at org.apache.spark.ui.UISuite$$anonfun$2.apply$mcV$sp(UISuite.scala:54)
	at org.apache.spark.ui.UISuite$$anonfun$2.apply(UISuite.scala:54)
	at org.apache.spark.ui.UISuite$$anonfun$2.apply(UISuite.scala:54)
	at org.scalatest.FunSuite$$anon$1.apply(FunSuite.scala:1265)
	at org.scalatest.Suite$class.withFixture(Suite.scala:1974)
	at org.apache.spark.ui.UISuite.withFixture(UISuite.scala:37)
	at org.scalatest.FunSuite$class.invokeWithFixture$1(FunSuite.scala:1262)
	at org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271)
	at org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271)
	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:198)
	at org.scalatest.FunSuite$class.runTest(FunSuite.scala:1271)
	at org.apache.spark.ui.UISuite.runTest(UISuite.scala:37)
	at org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304)
	at org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304)
	at org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:260)
	at org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:249)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:249)
	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:326)
	at org.scalatest.FunSuite$class.runTests(FunSuite.scala:1304)
	at org.apache.spark.ui.UISuite.runTests(UISuite.scala:37)
	at org.scalatest.Suite$class.run(Suite.scala:2303)
	at org.apache.spark.ui.UISuite.org$scalatest$FunSuite$$super$run(UISuite.scala:37)
	at org.scalatest.FunSuite$$anonfun$run$1.apply(FunSuite.scala:1310)
	at org.scalatest.FunSuite$$anonfun$run$1.apply(FunSuite.scala:1310)
	at org.scalatest.SuperEngine.runImpl(Engine.scala:362)
	at org.scalatest.FunSuite$class.run(FunSuite.scala:1310)
	at org.apache.spark.ui.UISuite.run(UISuite.scala:37)
	at org.scalatest.tools.ScalaTestFramework$ScalaTestRunner.run(ScalaTestFramework.scala:214)
	at
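In the thread dump for SPARK-1530, the test thread is blocked in `SocketInputStream.socketRead0` underneath `scala.io.Source.fromURL`. `fromURL` goes through `URL.openStream`, which uses the JVM's default timeouts, and those are infinite unless configured. ScalaTest's `eventually` only checks its deadline between attempts, so it cannot rescue a single attempt that never returns. The sketch below shows one way to bound such a fetch with a read timeout so a stalled server produces an exception instead of a hang. `TimedFetch`, `fetch` and the timeout defaults are hypothetical illustrations, not necessarily how this issue was resolved.

```scala
import java.net.{HttpURLConnection, URL}
import scala.io.Source

object TimedFetch {
  /** Fetch a URL's body, failing instead of blocking forever if the server stalls.
    * Hypothetical helper; timeout defaults are illustrative assumptions. */
  def fetch(url: String, connectTimeoutMs: Int = 5000, readTimeoutMs: Int = 5000): String = {
    val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
    conn.setConnectTimeout(connectTimeoutMs) // bound the TCP connect
    conn.setReadTimeout(readTimeoutMs)       // bound each blocking read (socketRead0 above)
    val in = conn.getInputStream             // throws SocketTimeoutException if no response arrives in time
    try Source.fromInputStream(in, "UTF-8").mkString
    finally in.close()
  }
}
```

Inside an `eventually` block, a call like `TimedFetch.fetch(uiUrl)` (where `uiUrl` is a placeholder) would then fail the attempt and let `eventually` retry or give up, rather than hanging the build.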
[jira] [Commented] (SPARK-1473) Feature selection for high dimensional datasets
[ https://issues.apache.org/jira/browse/SPARK-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973631#comment-13973631 ] Ignacio Zendejas commented on SPARK-1473: - Thanks for pointing this out, Martin. We'll definitely take this into consideration. Feature selection for high dimensional datasets --- Key: SPARK-1473 URL: https://issues.apache.org/jira/browse/SPARK-1473 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Ignacio Zendejas Priority: Minor Labels: features Fix For: 1.1.0
For classification tasks with large feature spaces, on the order of tens of thousands of features or more (e.g., text classification with n-grams, where n > 1), it is often useful to rank features and filter out the irrelevant ones. This can shrink the feature space by one or two orders of magnitude or more without degrading key evaluation metrics (accuracy/precision/recall). A flexible feature evaluation interface needs to be designed, and at least two methods should be implemented. Information Gain should be a priority, since it has been shown to be among the most reliable. The design should also account for wrapper methods (see the research papers below), which are more practical for lower-dimensional data. Relevant research:
* Brown, G., Pocock, A., Zhao, M. J., & Luján, M. (2012). Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. *The Journal of Machine Learning Research*, *13*, 27-66.
* Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. *The Journal of Machine Learning Research*, *3*, 1289-1305.
-- This message was sent by Atlassian JIRA (v6.2#6252)
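For a discrete feature X and class label Y, Information Gain is IG(Y; X) = H(Y) - sum over x of P(X = x) * H(Y | X = x), where H is Shannon entropy. The sketch below scores and ranks features on plain Scala collections. It assumes discrete features, such as binary n-gram presence indicators. `InfoGain`, `informationGain` and `topK` are hypothetical names, not an MLlib API. A distributed version would presumably derive the same per-feature counts with RDD aggregations.

```scala
object InfoGain {
  private def log2(x: Double): Double = math.log(x) / math.log(2)

  /** Shannon entropy (in bits) of the empirical distribution of `values`. */
  def entropy[A](values: Seq[A]): Double = {
    val n = values.size.toDouble
    values.groupBy(identity).values.map { group =>
      val p = group.size / n
      -p * log2(p)
    }.sum
  }

  /** IG(Y; X) = H(Y) - sum_x P(X = x) * H(Y | X = x) for one discrete feature column. */
  def informationGain[X, Y](feature: Seq[X], labels: Seq[Y]): Double = {
    require(feature.size == labels.size, "feature and labels must have the same length")
    val n = labels.size.toDouble
    // Weighted entropy of the labels within each group of rows sharing a feature value.
    val conditional = feature.zip(labels).groupBy(_._1).values.map { rows =>
      (rows.size / n) * entropy(rows.map(_._2))
    }.sum
    entropy(labels) - conditional
  }

  /** Indices of the k feature columns with the highest information gain (hypothetical filter step). */
  def topK[Y](columns: IndexedSeq[Seq[Int]], labels: Seq[Y], k: Int): Seq[Int] =
    columns.indices.sortBy(i => -informationGain(columns(i), labels)).take(k)
}
```

Ranking each feature independently like this is a filter method. The wrapper methods mentioned in the issue would instead score feature subsets by training a model, which is why they scale less well to very high dimensions.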
[jira] [Commented] (SPARK-1530) Streaming UI test can hang indefinitely
[ https://issues.apache.org/jira/browse/SPARK-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973737#comment-13973737 ] Tathagata Das commented on SPARK-1530: -- I wonder what in the Jenkins environment is causing this. How frequently does it happen? Streaming UI test can hang indefinitely --- Key: SPARK-1530 URL: https://issues.apache.org/jira/browse/SPARK-1530 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Assignee: Tathagata Das This has been causing Jenkins to hang recently.
[jira] [Assigned] (SPARK-1520) Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng reassigned SPARK-1520: Assignee: Xiangrui Meng Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6 - Key: SPARK-1520 URL: https://issues.apache.org/jira/browse/SPARK-1520 Project: Spark Issue Type: Bug Components: MLlib, Spark Core Reporter: Patrick Wendell Assignee: Xiangrui Meng Priority: Blocker Fix For: 1.0.0
This is a real doozie: when a Spark assembly is compiled with JDK7, the resulting jar does not work with JRE6. I confirmed that the bytecode being produced is JDK 6 compatible (major version 50). The JRE silently refuses to load any class files from the assembled jar.
{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}
I also noticed that if the jar is unzipped and the classpath is set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works. The error is seen with any class, not just the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in master.
h1. Isolation and Cause
The package-time behavior of Java 6 and 7 differs with respect to the format used for jar files:
||Number of entries||JDK 6||JDK 7||
|<= 65536|zip|zip|
|> 65536|zip*|zip64|
zip* is a workaround for the original zip format, [described in JDK-4828461|https://bugs.openjdk.java.net/browse/JDK-4828461], that allows some versions of Java 6 to support larger assembly jars.
The Scala libraries we depend on have added a large number of classes, which bumped us over the limit. As a result, a jar packaged with Java 7 does not work with Java 6. We can probably get back under the limit by clearing out some accidental inclusion of FastUtil, but eventually we'll go over again. The real answer is to force people to build with JDK 6 if they want to run Spark on JRE 6.
-I've found that if I just unpack and re-pack the jar (using `jar`) it always works:-
{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}
-I also noticed something of note. The Breeze package contains single directories that have huge numbers of files in them (e.g. 2000+ class files in one directory).
It's possible we are hitting some weird bugs/corner cases with compatibility of the internal storage format of the jar itself.- -I narrowed this down specifically to the inclusion of the breeze library. Just adding breeze to an older (unaffected) build triggered the issue.- -I ran a git bisection and this appeared after the MLLib sparse vector patch was merged:- https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534 SPARK-1212
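The root cause described in SPARK-1520 comes down to how many entries the assembly jar holds. The classic (pre-zip64) end-of-central-directory record stores the total entry count in a 16-bit field, so the most it can record is 65535 entries; the issue quotes the boundary as 65536. The sketch below counts entries with `java.util.zip.ZipFile.size()` so a build could flag an oversized assembly before it ships. It is meant to be run on a JDK 7+ JVM, which can read zip64 archives. `JarEntryCheck` and its members are hypothetical names, not part of Spark's build.

```scala
import java.io.File
import java.util.zip.ZipFile

object JarEntryCheck {
  // Largest entry count the classic end-of-central-directory record can hold (16-bit field).
  val MaxClassicZipEntries: Int = 0xFFFF

  /** Number of entries (class files, resources and directories) in a jar. */
  def entryCount(jar: File): Int = {
    val zf = new ZipFile(jar)
    try zf.size() finally zf.close()
  }

  /** True when the jar holds more entries than the classic zip format can record. */
  def exceedsClassicZipLimit(jar: File): Boolean =
    entryCount(jar) > MaxClassicZipEntries

  // Usage: JarEntryCheck <path-to-assembly.jar>
  def main(args: Array[String]): Unit = {
    val jar = new File(args(0))
    val n = entryCount(jar)
    val note = if (n > MaxClassicZipEntries) " (too many for the classic zip format)" else ""
    println(s"${jar.getName}: $n entries$note")
  }
}
```

Pointed at the spark-assembly jar, a check like this would show how close the build is to the limit after changes such as removing the accidental FastUtil inclusion.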
[jira] [Created] (SPARK-1531) GraphX should have messageRDD to enable OutOfCore messages
Jianfeng created SPARK-1531: --- Summary: GraphX should have messageRDD to enable OutOfCore messages Key: SPARK-1531 URL: https://issues.apache.org/jira/browse/SPARK-1531 Project: Spark Issue Type: Improvement Components: GraphX Affects Versions: 0.9.1 Reporter: Jianfeng
There is no `messageRDD` in the Pregel function. Most sendMessage functions directly return a Scala Iterator, like this one in staticPageRank:
{code}
def sendMessage(edge: EdgeTriplet[Double, Double]) =
  Iterator((edge.dstId, edge.srcAttr * edge.attr))
{code}
For message-intensive computations on larger graphs, this can throw OOM exceptions. If we had a more general messageRDD, we could at least call messageRDD.persist(StorageLevel.DISK_ONLY) to let it spill to disk.
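In GraphX's Pregel, each sendMessage call yields (destination vertex id, message) pairs, and a merge function combines the messages bound for the same vertex. The plain-Scala sketch below mimics that flow for the staticPageRank message quoted in SPARK-1531. It shows why memory grows with message volume: every message is generated and merged in memory. `Triplet`, `MessageSketch` and `aggregate` are hypothetical stand-ins, not GraphX API. The issue proposes materializing these messages as an RDD that could be persisted to disk instead.

```scala
// Hypothetical stand-in for GraphX's EdgeTriplet[Double, Double]:
// srcAttr is the source vertex's attribute, attr is the edge weight.
final case class Triplet(srcId: Long, dstId: Long, srcAttr: Double, attr: Double)

object MessageSketch {
  // Same shape as the staticPageRank sendMessage quoted above:
  // one (destination, message) pair per edge.
  def sendMessage(edge: Triplet): Iterator[(Long, Double)] =
    Iterator((edge.dstId, edge.srcAttr * edge.attr))

  // Pregel-style mergeMsg: messages bound for the same vertex are summed,
  // with all intermediate state held in an in-memory Map.
  def aggregate(edges: Iterator[Triplet]): Map[Long, Double] =
    edges.flatMap(sendMessage).foldLeft(Map.empty[Long, Double]) {
      case (acc, (dst, msg)) => acc.updated(dst, acc.getOrElse(dst, 0.0) + msg)
    }
}
```

For example, edges 1->2 (srcAttr 0.5, weight 1.0), 3->2 (0.25, 1.0) and 1->3 (0.5, 0.5) aggregate to a message of 0.75 for vertex 2 and 0.25 for vertex 3.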