[jira] [Commented] (SPARK-1464) Update MLLib Examples to Use Breeze

2014-04-17 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972342#comment-13972342
 ] 

Sean Owen commented on SPARK-1464:
--

This is a duplicate of https://issues.apache.org/jira/browse/SPARK-1462, which 
is resolved now.

 Update MLLib Examples to Use Breeze
 ---

 Key: SPARK-1464
 URL: https://issues.apache.org/jira/browse/SPARK-1464
 Project: Spark
  Issue Type: Task
  Components: MLlib
Reporter: Patrick Wendell
Assignee: Xiangrui Meng
Priority: Blocker
 Fix For: 1.0.0


 If we want to deprecate the vector class, we need to update all of the 
 examples to use Breeze.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1520) Spark assembly fails with Java 6

2014-04-17 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-1520:
--

 Summary: Spark assembly fails with Java 6
 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
patrick@patrick-t430s:~/Documents/spark$ sbt/sbt assembly/assembly
patrick@patrick-t430s:~/Documents/spark$ 
/usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]
patrick@patrick-t430s:~/Documents/spark$ 
/usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works.
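
For reference, one way to spot-check the class file version claim above (a 
hypothetical session, not part of the original report; it assumes unzip and 
javap are on the PATH, and that major version 50 corresponds to Java 6):

{code}
# Pull one class out of the assembly and ask javap which class file
# version javac actually emitted.
$ unzip -p spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar \
    org/apache/spark/ui/UIWorkloadGenerator.class > UIWorkloadGenerator.class
$ javap -verbose UIWorkloadGenerator.class | grep 'major version'
  major version: 50
{code}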



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1520) Spark assembly fails with Java 6

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Description: 
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works.

  was:
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
patrick@patrick-t430s: sbt/sbt assembly/assembly
patrick@patrick-t430s:~/Documents/spark$ 
/usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]
patrick@patrick-t430s:~/Documents/spark$ 
/usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works.


 Spark assembly fails with Java 6
 

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: java.lang.ClassNotFoundException: 
 

[jira] [Updated] (SPARK-1520) Spark assembly fails with JRE6 for unknown reason

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Summary: Spark assembly fails with JRE6 for unknown reason  (was: Spark 
assembly fails with Java 6)

 Spark assembly fails with JRE6 for unknown reason
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.spark.ui.UIWorkloadGenerator
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. 
 Program will exit.
 {code}
 I also noticed that if the jar is unzipped, and the classpath set to the 
 current directory, it just works. Finally, if the assembly jar is 
 compiled with JDK6, it also works. The error is seen with any class, not just 
 the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only 
 in master.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1520) Spark assembly fails with Java 6

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Description: 
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 
UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in 
master.

  was:
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works.


 Spark assembly fails with Java 6
 

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: 

[jira] [Updated] (SPARK-1520) Spark assembly fails with JRE6 for unknown reason

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Component/s: MLlib

 Spark assembly fails with JRE6 for unknown reason
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.spark.ui.UIWorkloadGenerator
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. 
 Program will exit.
 {code}
 I also noticed that if the jar is unzipped, and the classpath set to the 
 current directory, it just works. Finally, if the assembly jar is 
 compiled with JDK6, it also works. The error is seen with any class, not just 
 the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only 
 in master.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1520) Spark assembly fails with JRE6 for unknown reason

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Component/s: Spark Core

 Spark assembly fails with JRE6 for unknown reason
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
  Components: MLlib, Spark Core
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.spark.ui.UIWorkloadGenerator
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. 
 Program will exit.
 {code}
 I also noticed that if the jar is unzipped, and the classpath set to the 
 current directory, it just works. Finally, if the assembly jar is 
 compiled with JDK6, it also works. The error is seen with any class, not just 
 the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only 
 in master.
 I ran a git bisection and this appeared after the MLLib sparse vector patch 
 was merged:
 https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
 SPARK-1212



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Summary: Inclusion of breeze corrupts assembly when compiled with JDK7 and 
run on JDK6  (was: Spark assembly fails with JRE6 for unknown reason)

 Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
  Components: MLlib, Spark Core
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.spark.ui.UIWorkloadGenerator
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. 
 Program will exit.
 {code}
 I also noticed that if the jar is unzipped, and the classpath set to the 
 current directory, it just works. Finally, if the assembly jar is 
 compiled with JDK6, it also works. The error is seen with any class, not just 
 the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only 
 in master.
 I ran a git bisection and this appeared after the MLLib sparse vector patch 
 was merged:
 https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
 SPARK-1212



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Description: 
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 
UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in 
master.

I ran a git bisection and this appeared after the MLLib sparse vector patch was 
merged:
https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
SPARK-1212

I narrowed this down specifically to the inclusion of the breeze library. Just 
adding breeze to an older (unaffected) build triggered the issue.

  was:
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 
UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in 
master.

I ran a git bisection and this appeared after the MLLib sparse vector patch was 
merged:
https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
SPARK-1212


 Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
  Components: MLlib, Spark Core
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 

[jira] [Updated] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Description: 
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 
UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in 
master.

*Isolation*

-I ran a git bisection and this appeared after the MLLib sparse vector patch 
was merged:-
https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
SPARK-1212

-I narrowed this down specifically to the inclusion of the breeze library. Just 
adding breeze to an older (unaffected) build triggered the issue.-

I narrowed this down to one or more classes inside of breeze/linalg/operators. 
If this directory is deleted and the jar is re-assembled, things work fine.

  was:
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 
UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in 
master.

I ran a git bisection and this appeared after the MLLib sparse vector patch was 
merged:
https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
SPARK-1212

I narrowed this down specifically to the inclusion of the breeze library. Just 
adding breeze to an older (unaffected) build triggered the issue.


 Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue 

[jira] [Created] (SPARK-1521) Take character set size into account when compressing in-memory string columns

2014-04-17 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-1521:
-

 Summary: Take character set size into account when compressing 
in-memory string columns
 Key: SPARK-1521
 URL: https://issues.apache.org/jira/browse/SPARK-1521
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.1.0
Reporter: Cheng Lian


Quoted from [a blog 
post|https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/]
 by Facebook:

bq. Strings dominate the largest tables in our warehouse and make up about 80% 
of the columns across the warehouse, so optimizing compression for string 
columns was important. By using a threshold on observed number of distinct 
column values per stripe, we modified the ORCFile writer to apply dictionary 
encoding to a stripe only when beneficial. Additionally, we sample the column 
values and take the character set of the column into account, since a small 
character set can be leveraged by codecs like Zlib for good compression and 
dictionary encoding then becomes unnecessary or sometimes even detrimental if 
applied.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Description: 
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 
UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in 
master.

*Isolation*

-I ran a git bisection and this appeared after the MLLib sparse vector patch 
was merged:-
https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
SPARK-1212

-I narrowed this down specifically to the inclusion of the breeze library. Just 
adding breeze to an older (unaffected) build triggered the issue.-

I (maybe) narrowed this down to one or more classes inside of 
breeze/linalg/operators. If this directory is deleted and the jar is 
re-assembled, things work fine.

  was:
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 
UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in 
master.

*Isolation*

-I ran a git bisection and this appeared after the MLLib sparse vector patch 
was merged:-
https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
SPARK-1212

-I narrowed this down specifically to the inclusion of the breeze library. Just 
adding breeze to an older (unaffected) build triggered the issue.-

I narrowed this down to one or more classes inside of breeze/linalg/operators. 
If this directory is deleted and the jar is re-assembled, things work fine.


 Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
 

[jira] [Updated] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Description: 
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 
UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in 
master.

*Isolation*

-I ran a git bisection and this appeared after the MLLib sparse vector patch 
was merged:-
https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
SPARK-1212

-I narrowed this down specifically to the inclusion of the breeze library. Just 
adding breeze to an older (unaffected) build triggered the issue.-

I've found that if I just unpack and re-pack the jar, it sometimes works:

{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}

  was:
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 
UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in 
master.

*Isolation*

-I ran a git bisection and this appeared after the MLLib sparse vector patch 
was merged:-
https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
SPARK-1212

-I narrowed this down specifically to the inclusion 

[jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6

2014-04-17 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972397#comment-13972397
 ] 

Sean Owen commented on SPARK-1520:
--

Madness. One wild guess is that the breeze .jar files have something in 
META-INF that, when merged together into the assembly jar, conflicts with other 
META-INF items. In particular I'm thinking of MANIFEST.MF entries. It's worth 
diffing those if you can from before and after. However, this would still 
require that Java 7 and 6 behave differently with respect to the entries, to 
explain your findings. It's possible.
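
For example, a rough manifest diff (hypothetical session; the before/after jar 
names are placeholders for assemblies built without and with breeze):

{code}
# Extract META-INF/MANIFEST.MF from each assembly and compare.
$ unzip -p assembly-before.jar META-INF/MANIFEST.MF > manifest-before.txt
$ unzip -p assembly-after.jar  META-INF/MANIFEST.MF > manifest-after.txt
$ diff manifest-before.txt manifest-after.txt
{code}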

Your last comment however suggests it's something strange with the byte code 
that gets output for a few classes. Java 7 is stricter about byte code. For 
example: 
https://weblogs.java.net/blog/fabriziogiudici/archive/2012/05/07/understanding-subtle-new-behaviours-jdk-7
However, I would think these would manifest as quite different errors.

What about running with -verbose:class to print classloading messages? It might 
point directly to what's failing to load, if that's it.
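
Something along these lines (a hypothetical invocation, reusing the assembly 
jar from the report):

{code}
# -verbose:class makes the JVM print every class it loads,
# which should show where class loading from the assembly stops.
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -verbose:class \
    -cp spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar \
    org.apache.spark.ui.UIWorkloadGenerator 2>&1 | tail -n 20
{code}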

Of course you can always build with Java 6 since that's supposed to be all 
that's supported/required now (see my other JIRA about making Jenkins do this), 
although I agree that it would be nice to get to the bottom of this, as there 
is no obvious reason this shouldn't work.

 Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
  Components: MLlib, Spark Core
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.spark.ui.UIWorkloadGenerator
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. 
 Program will exit.
 {code}
 I also noticed that if the jar is unzipped, and the classpath set to the 
 current directory, it just works. Finally, if the assembly jar is 
 compiled with JDK6, it also works. The error is seen with any class, not just 
 the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only 
 in master.
 *Isolation*
 -I ran a git bisection and this appeared after the MLLib sparse vector patch 
 was merged:-
 https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
 SPARK-1212
 -I narrowed this down specifically to the inclusion of the breeze library. 
 Just adding breeze to an older (unaffected) build triggered the issue.-
 I've found that if I just unpack and re-pack the jar, it sometimes works:
 {code}
 $ cd assembly/target/scala-2.10/
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # fails
 $ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 $ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # succeeds
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Description: 
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 
UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in 
master.

*Isolation*

-I ran a git bisection and this appeared after the MLLib sparse vector patch 
was merged:-
https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
SPARK-1212

-I narrowed this down specifically to the inclusion of the breeze library. Just 
adding breeze to an older (unaffected) build triggered the issue.-

I've found that if I just unpack and re-pack the jar (using `jar` from Java 7), 
it always works:

{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}

I also noticed something of note. The Breeze package contains single 
directories that have huge numbers of files in them (e.g. 2000+ class files in 
one directory). It's possible we are hitting some weird bugs/corner cases with 
compatibility of the internal storage format of the jar itself.
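
One way to see which directories are that large (hypothetical session; counts 
class files per directory inside the assembly):

{code}
# List all entries, keep only .class files, and count them per directory.
$ unzip -l spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar \
    | awk '$4 ~ /\.class$/ {print $4}' \
    | xargs -n1 dirname | sort | uniq -c | sort -rn | head
{code}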

  was:
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 

[jira] [Updated] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Description: 
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 
UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in 
master.

*Isolation*

-I ran a git bisection and this appeared after the MLLib sparse vector patch 
was merged:-
https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
SPARK-1212

-I narrowed this down specifically to the inclusion of the breeze library. Just 
adding breeze to an older (unaffected) build triggered the issue.-

I've found that if I just unpack and re-pack the jar (using `jar` from Java 6 
or 7), it always works:

{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}

I also noticed something of note. The Breeze package contains single 
directories that have huge numbers of files in them (e.g. 2000+ class files in 
one directory). It's possible we are hitting some weird bugs/corner cases with 
compatibility of the internal storage format of the jar itself.

  was:
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 
UIWorkloadGenerator.

[jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6

2014-04-17 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972523#comment-13972523
 ] 

Sean Owen commented on SPARK-1520:
--

Regarding large numbers of files: are there INDEX.LST files used anywhere in 
the jars? If this gets munged or truncated while building the assembly jar, 
that might cause all kinds of havoc. It could be omitted.

http://docs.oracle.com/javase/7/docs/technotes/guides/jar/jar.html#Index_File_Specification
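
A quick way to check (a sketch; the jar path is a placeholder) whether an index 
file - META-INF/INDEX.LIST per the JAR spec - made it into the assembly:

{code}
import java.util.jar.JarFile

def hasJarIndex(path: String): Boolean = {
  val jar = new JarFile(path)
  try jar.getEntry("META-INF/INDEX.LIST") != null
  finally jar.close()
}

// e.g. hasJarIndex("spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar")
{code}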

 Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
  Components: MLlib, Spark Core
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.spark.ui.UIWorkloadGenerator
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. 
 Program will exit.
 {code}
 I also noticed that if the jar is unzipped, and the classpath set to the 
 current directory, it just works. Finally, if the assembly jar is 
 compiled with JDK6, it also works. The error is seen with any class, not just 
 the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only 
 in master.
 *Isolation*
 -I ran a git bisection and this appeared after the MLLib sparse vector patch 
 was merged:-
 https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
 SPARK-1212
 -I narrowed this down specifically to the inclusion of the breeze library. 
 Just adding breeze to an older (unaffected) build triggered the issue.-
 I've found that if I just unpack and re-pack the jar (using `jar` from java 6 
 or 7) it always works:
 {code}
 $ cd assembly/target/scala-2.10/
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # fails
 $ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 $ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # succeeds
 {code}
 I also noticed something of note. The Breeze package contains single 
 directories that have huge numbers of files in them (e.g. 2000+ class files 
 in one directory). It's possible we are hitting some weird bugs/corner cases 
 with compatibility of the internal storage format of the jar itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1522) YARN ClientBase will throw a NPE if there is no YARN application specific classpath.

2014-04-17 Thread Bernardo Gomez Palacio (JIRA)
Bernardo Gomez Palacio created SPARK-1522:
-

 Summary: YARN ClientBase will throw a NPE if there is no YARN 
application specific classpath.
 Key: SPARK-1522
 URL: https://issues.apache.org/jira/browse/SPARK-1522
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 0.9.0, 1.0.0, 0.9.1
Reporter: Bernardo Gomez Palacio
Priority: Critical


The current implementation of ClientBase.getDefaultYarnApplicationClasspath 
inspects the MRJobConfig class for the field DEFAULT_YARN_APPLICATION_CLASSPATH 
when it should really be looking into YarnConfiguration.

If the application configuration has no yarn.application.classpath defined, an 
NPE will be thrown.
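
A minimal sketch (assumed code, not the actual patch) of the null-safe lookup 
being suggested: read the default classpath from YarnConfiguration via 
reflection and fall back to an empty default instead of throwing:

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration

def defaultYarnApplicationClasspath: Seq[String] =
  try {
    // YarnConfiguration (not MRJobConfig) is where the YARN default lives
    val field = classOf[YarnConfiguration]
      .getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
    Option(field.get(null).asInstanceOf[Array[String]])
      .map(_.toSeq)
      .getOrElse(Seq.empty)
  } catch {
    case _: NoSuchFieldException => Seq.empty // field absent in this Hadoop version
  }
{code}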



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1522) YARN ClientBase will throw a NPE if there is no YARN application specific classpath.

2014-04-17 Thread Bernardo Gomez Palacio (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972803#comment-13972803
 ] 

Bernardo Gomez Palacio commented on SPARK-1522:
---

https://github.com/apache/spark/pull/433

 YARN ClientBase will throw a NPE if there is no YARN application specific 
 classpath.
 

 Key: SPARK-1522
 URL: https://issues.apache.org/jira/browse/SPARK-1522
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 0.9.0, 1.0.0, 0.9.1
Reporter: Bernardo Gomez Palacio
Priority: Critical
  Labels: YARN

 The current implementation of ClientBase.getDefaultYarnApplicationClasspath 
 inspects the MRJobConfig class for the field 
 DEFAULT_YARN_APPLICATION_CLASSPATH when it should really be looking into 
 YarnConfiguration.
 If the application configuration has no yarn.application.classpath defined, 
 an NPE will be thrown.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1524) TaskSetManager'd better not schedule tasks which has no preferred executorId using PROCESS_LOCAL in the first search process

2014-04-17 Thread YanTang Zhai (JIRA)
YanTang Zhai created SPARK-1524:
---

 Summary: TaskSetManager'd better not schedule tasks which has no 
preferred executorId using PROCESS_LOCAL in the first search process
 Key: SPARK-1524
 URL: https://issues.apache.org/jira/browse/SPARK-1524
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: YanTang Zhai
Priority: Minor


In DAGScheduler, ShuffleMapTask is constructed with a TaskLocation that carries 
only a host, not a (host, executorID) pair.
When TaskSetManager schedules a ShuffleMapTask that has no preferred executorId, 
using a specific execId's host and the PROCESS_LOCAL locality level, no tasks 
match the given locality constraint in the first search pass.
We also find that the host used by the scheduler is a hostname while the host 
used by TaskLocation is an IP address in our cluster. The two hosts do not 
match, which leaves the pendingTasksForHost HashMap empty and makes the task 
search behave against our expectation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1524) TaskSetManager'd better not schedule tasks which has no preferred executorId using PROCESS_LOCAL in the first search process

2014-04-17 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972864#comment-13972864
 ] 

Mridul Muralidharan commented on SPARK-1524:


The expectation is to fall back to a previous schedule level in case the higher 
level is not valid, though this is tricky in the general case.
Will need to take a look at it - though given that I am tied up with other 
things, if someone else wants to take a crack at it, please feel free to do so!

Btw, use of IPs and multiple hostnames for a host is not supported in Spark - 
so that is something that will need to be resolved at the deployment end.

 TaskSetManager'd better not schedule tasks which has no preferred executorId 
 using PROCESS_LOCAL in the first search process
 

 Key: SPARK-1524
 URL: https://issues.apache.org/jira/browse/SPARK-1524
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: YanTang Zhai
Priority: Minor

 In DAGScheduler, ShuffleMapTask is constructed with a TaskLocation that 
 carries only a host, not a (host, executorID) pair.
 When TaskSetManager schedules a ShuffleMapTask that has no preferred 
 executorId, using a specific execId's host and the PROCESS_LOCAL locality 
 level, no tasks match the given locality constraint in the first search pass.
 We also find that the host used by the scheduler is a hostname while the host 
 used by TaskLocation is an IP address in our cluster. The two hosts do not 
 match, which leaves the pendingTasksForHost HashMap empty and makes the task 
 search behave against our expectation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1525) TaskSchedulerImpl should decrease availableCpus by spark.task.cpus not 1

2014-04-17 Thread witgo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972957#comment-13972957
 ] 

witgo commented on SPARK-1525:
--

The latest code already fixes this bug.

 TaskSchedulerImpl should decrease availableCpus by spark.task.cpus not 1
 

 Key: SPARK-1525
 URL: https://issues.apache.org/jira/browse/SPARK-1525
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: YanTang Zhai
Priority: Minor

 TaskSchedulerImpl always decreases availableCpus by 1 in the resourceOffers 
 process, even when spark.task.cpus is more than 1, which will schedule too 
 many tasks to a node in that case.
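
A standalone sketch of the intended accounting (names assumed, not the actual 
TaskSchedulerImpl code): free CPUs should be decremented by spark.task.cpus 
rather than by a constant 1.

{code}
// availableCpus mirrors the per-executor free-CPU array in resourceOffers
def offerResources(availableCpus: Array[Int], cpusPerTask: Int): Int = {
  var launched = 0
  for (i <- availableCpus.indices) {
    while (availableCpus(i) >= cpusPerTask) {
      availableCpus(i) -= cpusPerTask // the fix: decrement by spark.task.cpus, not 1
      launched += 1                   // one more task scheduled on executor i
    }
  }
  launched
}
{code}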



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (SPARK-1511) Update TestUtils.createCompiledClass() API to work with creating class file on different filesystem

2014-04-17 Thread Ye Xianjin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ye Xianjin closed SPARK-1511.
-

   Resolution: Fixed
Fix Version/s: 1.0.0

 Update TestUtils.createCompiledClass() API to work with creating class file 
 on different filesystem
 ---

 Key: SPARK-1511
 URL: https://issues.apache.org/jira/browse/SPARK-1511
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.8.1, 0.9.0, 1.0.0
 Environment: Mac OS X, two disks. 
Reporter: Ye Xianjin
Priority: Minor
  Labels: starter
 Fix For: 1.0.0

   Original Estimate: 24h
  Remaining Estimate: 24h

 The createCompiledClass method uses the java.io.File.renameTo method to rename 
 the source file to the destination file, which will fail if the source and 
 destination files are on different disks (or partitions).
 See 
 http://apache-spark-developers-list.1001551.n3.nabble.com/Tests-failed-after-assembling-the-latest-code-from-github-td6315.html
  for more details.
 Using com.google.common.io.Files.move instead of renameTo will solve this 
 issue.
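
A minimal sketch of the suggested fix, assuming Guava is on the classpath (the 
helper name is hypothetical):

{code}
import java.io.File
import com.google.common.io.Files

def moveCompiledClass(src: File, dest: File): Unit = {
  // File#renameTo returns false across disks/partitions; Guava's
  // Files.move falls back to copy-and-delete in that case.
  Files.move(src, dest)
}
{code}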



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1476) 2GB limit in spark for blocks

2014-04-17 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972978#comment-13972978
 ] 

Mridul Muralidharan commented on SPARK-1476:



[~matei] We are having some issues porting the netty shuffle copier code to 
support > 2G blocks since only ByteBuf seems to be exposed.
Before I dig into netty more, I wanted to know if you or someone else among the 
Spark developers knew how to add support for large buffers in our netty code. 
Thanks!

 2GB limit in spark for blocks
 -

 Key: SPARK-1476
 URL: https://issues.apache.org/jira/browse/SPARK-1476
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
 Environment: all
Reporter: Mridul Muralidharan
Assignee: Mridul Muralidharan
Priority: Critical
 Fix For: 1.1.0


 The underlying abstraction for blocks in Spark is a ByteBuffer, which limits 
 the size of a block to 2GB.
 This has implications not just for managed blocks in use, but also for shuffle 
 blocks (memory-mapped blocks are limited to 2GB, even though the API allows 
 for a long), ser/deser via byte-array-backed output streams (SPARK-1391), etc.
 This is a severe limitation when Spark is used on non-trivial datasets.
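
For illustration, a minimal sketch (an assumed helper, not Spark code) of the 
usual workaround: FileChannel.map accepts a long size, but a single 
MappedByteBuffer is still capped at Integer.MAX_VALUE bytes, so a large block 
has to be mapped as a sequence of sub-2GB chunks.

{code}
import java.io.RandomAccessFile
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel.MapMode

def mapLargeFile(path: String): Seq[MappedByteBuffer] = {
  val channel = new RandomAccessFile(path, "r").getChannel
  val chunk = Int.MaxValue.toLong // each mapped region must stay under 2GB
  (0L until channel.size by chunk).map { offset =>
    channel.map(MapMode.READ_ONLY, offset, math.min(chunk, channel.size - offset))
  }
}
{code}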



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1526) Running spark driver program from my local machine

2014-04-17 Thread Idan Zalzberg (JIRA)
Idan Zalzberg created SPARK-1526:


 Summary: Running spark driver program from my local machine
 Key: SPARK-1526
 URL: https://issues.apache.org/jira/browse/SPARK-1526
 Project: Spark
  Issue Type: Wish
  Components: Spark Core
Reporter: Idan Zalzberg


Currently it seems that the design choice is that the driver program should be 
network-wise close to the workers and allow connections to be created from 
either side.

This makes using Spark somewhat harder, since when I develop locally I need to 
package not only my program but also all its local dependencies.
Let's say I have a local DB with names of files in HADOOP that I want to 
process with Spark; now I need my local DB to be accessible from the cluster so 
it can fetch the file names at runtime.

The driver program is an awesome thing, but it loses some of its strength if 
you can't really run it anywhere.

It seems to me that the problem is with the DAGScheduler, which needs to be 
close to the workers - maybe it shouldn't be embedded in the driver then?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1

2014-04-17 Thread Ye Xianjin (JIRA)
Ye Xianjin created SPARK-1527:
-

 Summary: rootDirs in DiskBlockManagerSuite doesn't get full path 
from rootDir0, rootDir1
 Key: SPARK-1527
 URL: https://issues.apache.org/jira/browse/SPARK-1527
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0
Reporter: Ye Xianjin
Priority: Minor


In core/src/test/scala/org/apache/spark/storage/DiskBlockManagerSuite.scala

  val rootDir0 = Files.createTempDir()
  rootDir0.deleteOnExit()
  val rootDir1 = Files.createTempDir()
  rootDir1.deleteOnExit()
  val rootDirs = rootDir0.getName + "," + rootDir1.getName

rootDir0 and rootDir1 are in the system's temporary directory. 
rootDir0.getName will not give the full path of the directory but only its last 
component. When passed to the DiskBlockManager constructor, DiskBlockManager 
creates directories in the working directory, not the temporary directory.

rootDir0.toString will fix this issue.
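
A quick illustration of the difference, using the same Guava call as the suite 
(the printed paths are examples, not guaranteed output):

{code}
import com.google.common.io.Files

val dir = Files.createTempDir()
dir.deleteOnExit()
println(dir.getName)         // last path component only, e.g. "1397761234567-0"
println(dir.toString)        // same as getPath; relative if java.io.tmpdir is relative
println(dir.getAbsolutePath) // full path, e.g. "/tmp/1397761234567-0"
{code}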



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1

2014-04-17 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973067#comment-13973067
 ] 

Sean Owen commented on SPARK-1527:
--

{{toString()}} returns {{getPath()}} which may still be relative. 
{{getAbsolutePath()}} is better, but even {{getCanonicalPath()}} may be better 
still.

 rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, 
 rootDir1
 ---

 Key: SPARK-1527
 URL: https://issues.apache.org/jira/browse/SPARK-1527
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0
Reporter: Ye Xianjin
Priority: Minor
  Labels: starter
   Original Estimate: 24h
  Remaining Estimate: 24h

 In core/src/test/scala/org/apache/spark/storage/DiskBlockManagerSuite.scala
   val rootDir0 = Files.createTempDir()
   rootDir0.deleteOnExit()
   val rootDir1 = Files.createTempDir()
   rootDir1.deleteOnExit()
   val rootDirs = rootDir0.getName + "," + rootDir1.getName
 rootDir0 and rootDir1 are in the system's temporary directory. 
 rootDir0.getName will not give the full path of the directory but only its 
 last component. When passed to the DiskBlockManager constructor, 
 DiskBlockManager creates directories in the working directory, not the 
 temporary directory.
 rootDir0.toString will fix this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1518) Spark master doesn't compile against hadoop-common trunk

2014-04-17 Thread witgo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973084#comment-13973084
 ] 

witgo commented on SPARK-1518:
--

As the Hadoop API changes, some methods have been removed.
The Hadoop-related code in Spark core should be split out into new modules, as 
in the case of YARN.

 Spark master doesn't compile against hadoop-common trunk
 

 Key: SPARK-1518
 URL: https://issues.apache.org/jira/browse/SPARK-1518
 Project: Spark
  Issue Type: Bug
Reporter: Marcelo Vanzin

 FSDataOutputStream::sync() has disappeared from trunk in Hadoop; 
 FileLogger.scala is calling it.
 I've changed it locally to hsync() so I can compile the code, but haven't 
 checked yet whether those are equivalent. hsync() seems to have been there 
 forever, so it hopefully works with all versions Spark cares about.
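
A minimal sketch of the local change described, assuming a Hadoop version where 
FSDataOutputStream implements Syncable#hsync() (the helper is hypothetical, not 
the actual FileLogger code):

{code}
import org.apache.hadoop.fs.FSDataOutputStream

def flushLog(hadoopDataStream: FSDataOutputStream): Unit = {
  // FSDataOutputStream.sync() was removed from Hadoop trunk;
  // hsync() is the Syncable equivalent that flushes to disk.
  hadoopDataStream.hsync()
}
{code}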



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1

2014-04-17 Thread Ye Xianjin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973087#comment-13973087
 ] 

Ye Xianjin commented on SPARK-1527:
---

Yes, you are right: toString() may give a relative path, since it's determined 
by the java.io.tmpdir system property (see 
https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/io/Files.java
 line 591). It's possible that the DiskBlockManager will create different 
directories than the original temp dir when java.io.tmpdir is a relative path. 

So use getAbsolutePath, as I did in my last PR?

But I saw toString() called in other places! Should we do something about 
that?

 rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, 
 rootDir1
 ---

 Key: SPARK-1527
 URL: https://issues.apache.org/jira/browse/SPARK-1527
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0
Reporter: Ye Xianjin
Priority: Minor
  Labels: starter
   Original Estimate: 24h
  Remaining Estimate: 24h

 In core/src/test/scala/org/apache/spark/storage/DiskBlockManagerSuite.scala
   val rootDir0 = Files.createTempDir()
   rootDir0.deleteOnExit()
   val rootDir1 = Files.createTempDir()
   rootDir1.deleteOnExit()
   val rootDirs = rootDir0.getName + "," + rootDir1.getName
 rootDir0 and rootDir1 are in the system's temporary directory. 
 rootDir0.getName will not give the full path of the directory but only its 
 last component. When passed to the DiskBlockManager constructor, 
 DiskBlockManager creates directories in the working directory, not the 
 temporary directory.
 rootDir0.toString will fix this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1

2014-04-17 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973091#comment-13973091
 ] 

Sean Owen commented on SPARK-1527:
--

If the paths are only used locally, then an absolute path never hurts (except 
to be a bit longer). I assume that since these are references to a temp 
directory that is by definition only valid locally, an absolute path is the 
right thing to use.

In other cases, similar logic may apply. I could imagine in some cases the 
right thing to do is transmit a relative path. 

 rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, 
 rootDir1
 ---

 Key: SPARK-1527
 URL: https://issues.apache.org/jira/browse/SPARK-1527
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0
Reporter: Ye Xianjin
Priority: Minor
  Labels: starter
   Original Estimate: 24h
  Remaining Estimate: 24h

 In core/src/test/scala/org/apache/spark/storage/DiskBlockManagerSuite.scala
   val rootDir0 = Files.createTempDir()
   rootDir0.deleteOnExit()
   val rootDir1 = Files.createTempDir()
   rootDir1.deleteOnExit()
   val rootDirs = rootDir0.getName + "," + rootDir1.getName
 rootDir0 and rootDir1 are in the system's temporary directory. 
 rootDir0.getName will not give the full path of the directory but only its 
 last component. When passed to the DiskBlockManager constructor, 
 DiskBlockManager creates directories in the working directory, not the 
 temporary directory.
 rootDir0.toString will fix this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1528) Spark on Yarn: Add option for user to specify additional namenodes to get tokens from

2014-04-17 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-1528:


 Summary: Spark on Yarn: Add option for user to specify additional 
namenodes to get tokens from
 Key: SPARK-1528
 URL: https://issues.apache.org/jira/browse/SPARK-1528
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 1.0.0
Reporter: Thomas Graves


Some users running Spark on YARN may wish to contact other HDFS clusters than 
the one they are running on. We should add an option for them to specify those 
namenodes so that we can get the credentials needed for the application to 
contact them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1

2014-04-17 Thread Ye Xianjin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973096#comment-13973096
 ] 

Ye Xianjin commented on SPARK-1527:
---

Yes, of course - sometimes we want an absolute path, sometimes we want to 
transmit a relative path; it depends on the logic. 
But I think maybe we should review these usages so that we can make sure 
absolute and relative paths are used appropriately.

I may have time to review it after I finish another JIRA issue. If you want to 
take it over, please do!

Anyway, thanks for your comments and help.


 rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, 
 rootDir1
 ---

 Key: SPARK-1527
 URL: https://issues.apache.org/jira/browse/SPARK-1527
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0
Reporter: Ye Xianjin
Priority: Minor
  Labels: starter
   Original Estimate: 24h
  Remaining Estimate: 24h

 In core/src/test/scala/org/apache/spark/storage/DiskBlockManagerSuite.scala
   val rootDir0 = Files.createTempDir()
   rootDir0.deleteOnExit()
   val rootDir1 = Files.createTempDir()
   rootDir1.deleteOnExit()
   val rootDirs = rootDir0.getName + "," + rootDir1.getName
 rootDir0 and rootDir1 are in the system's temporary directory. 
 rootDir0.getName will not give the full path of the directory but only its 
 last component. When passed to the DiskBlockManager constructor, 
 DiskBlockManager creates directories in the working directory, not the 
 temporary directory.
 rootDir0.toString will fix this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1

2014-04-17 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973148#comment-13973148
 ] 

Sean Owen commented on SPARK-1527:
--

There are a number of other uses of File.getName(), but a quick glance suggests 
all the others are appropriate.

There are a number of other uses of File.toString(), almost all in tests. I 
suspect the Files in question already have absolute paths, and that even 
relative paths happen to work fine in a test since the working dir doesn't 
change. So those could change, but are probably not a concern.

The only one that gave me pause was the use in HttpBroadcast.scala, though I 
suspect it turns out to work fine for similar reasons.

If reviewers are interested in changing the toString()s I'll test and submit a 
PR for that.

 rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, 
 rootDir1
 ---

 Key: SPARK-1527
 URL: https://issues.apache.org/jira/browse/SPARK-1527
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0
Reporter: Ye Xianjin
Priority: Minor
  Labels: starter
   Original Estimate: 24h
  Remaining Estimate: 24h

 In core/src/test/scala/org/apache/spark/storage/DiskBlockManagerSuite.scala
   val rootDir0 = Files.createTempDir()
   rootDir0.deleteOnExit()
   val rootDir1 = Files.createTempDir()
   rootDir1.deleteOnExit()
   val rootDirs = rootDir0.getName + "," + rootDir1.getName
 rootDir0 and rootDir1 are in the system's temporary directory. 
 rootDir0.getName will not give the full path of the directory but only its 
 last component. When passed to the DiskBlockManager constructor, 
 DiskBlockManager creates directories in the working directory, not the 
 temporary directory.
 rootDir0.toString will fix this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6

2014-04-17 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973291#comment-13973291
 ] 

Xiangrui Meng commented on SPARK-1520:
--

I'm using the Java 6 JDK located at 
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home on a Mac. It 
can create a jar with more than 65536 files. I also found this JIRA:

https://bugs.openjdk.java.net/browse/JDK-4828461 (Support Zip files with more 
than 64k entries)

which was fixed in version 6. Note that this is for OpenJDK.

I'm going to check the headers of the assembly jars created by Java 6 and 7.

 Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
  Components: MLlib, Spark Core
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.spark.ui.UIWorkloadGenerator
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. 
 Program will exit.
 {code}
 I also noticed that if the jar is unzipped, and the classpath set to the 
 current directory, it just works. Finally, if the assembly jar is 
 compiled with JDK6, it also works. The error is seen with any class, not just 
 the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only 
 in master.
 *Isolation*
 -I ran a git bisection and this appeared after the MLLib sparse vector patch 
 was merged:-
 https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
 SPARK-1212
 -I narrowed this down specifically to the inclusion of the breeze library. 
 Just adding breeze to an older (unaffected) build triggered the issue.-
 I've found that if I just unpack and re-pack the jar (using `jar` from java 6 
 or 7) it always works:
 {code}
 $ cd assembly/target/scala-2.10/
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # fails
 $ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 $ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # succeeds
 {code}
 I also noticed something of note. The Breeze package contains single 
 directories that have huge numbers of files in them (e.g. 2000+ class files 
 in one directory). It's possible we are hitting some weird bugs/corner cases 
 with compatibility of the internal storage format of the jar itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6

2014-04-17 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973306#comment-13973306
 ] 

Xiangrui Meng commented on SPARK-1520:
--

When I try to use jar from Java 1.6 to unpack the assembly jar created by Java 7:

~~~
java.util.zip.ZipException: invalid CEN header (bad signature)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:128)
at java.util.zip.ZipFile.<init>(ZipFile.java:89)
at sun.tools.jar.Main.list(Main.java:977)
at sun.tools.jar.Main.run(Main.java:222)
at sun.tools.jar.Main.main(Main.java:1147)
~~~

7z shows:

~~~
Path = spark-assembly-1.6.jar
Type = zip
Physical Size = 119682511

Path = spark-assembly-1.7.jar
Type = zip
64-bit = +
Physical Size = 119682587
~~~

I think the limit on the number of files was already raised in Java 6 (at 
least in the latest update), but Java 7 will use the zip64 format for more than 
64k files, and this format cannot be recognized by Java 6.
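
One rough way to check which format an assembly uses (a sketch, not a full zip 
parser) is to scan the tail of the file for the ZIP64 end-of-central-directory 
signature; Java 6's ZipFile rejects archives that carry one, which matches the 
CEN header error above:

{code}
import java.io.RandomAccessFile

def usesZip64(path: String): Boolean = {
  val raf = new RandomAccessFile(path, "r")
  try {
    // The record sits near the end of the archive; 128KB covers the
    // end-of-central-directory area plus any trailing comment.
    val tail = new Array[Byte](math.min(raf.length(), 128 * 1024L).toInt)
    raf.seek(raf.length() - tail.length)
    raf.readFully(tail)
    // the little-endian signature 0x06064b50 appears in the file as 50 4b 06 06
    tail.sliding(4).exists(s =>
      s(0) == 0x50 && s(1) == 0x4b && s(2) == 0x06 && s(3) == 0x06)
  } finally raf.close()
}
{code}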

 Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
  Components: MLlib, Spark Core
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.spark.ui.UIWorkloadGenerator
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. 
 Program will exit.
 {code}
 I also noticed that if the jar is unzipped, and the classpath set to the 
 current directory, it just works. Finally, if the assembly jar is 
 compiled with JDK6, it also works. The error is seen with any class, not just 
 the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only 
 in master.
 *Isolation*
 -I ran a git bisection and this appeared after the MLLib sparse vector patch 
 was merged:-
 https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
 SPARK-1212
 -I narrowed this down specifically to the inclusion of the breeze library. 
 Just adding breeze to an older (unaffected) build triggered the issue.-
 I've found that if I just unpack and re-pack the jar (using `jar` from java 6 
 or 7) it always works:
 {code}
 $ cd assembly/target/scala-2.10/
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # fails
 $ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 $ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # succeeds
 {code}
 I also noticed something of note. The Breeze package contains single 
 directories that have huge numbers of files in them (e.g. 2000+ class files 
 in one directory). It's possible we are hitting some weird bugs/corner cases 
 with compatibility of the internal storage format of the jar itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6

2014-04-17 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973326#comment-13973326
 ] 

Xiangrui Meng edited comment on SPARK-1520 at 4/17/14 7:59 PM:
---

The quick fix may be removing fastutil, so Java 7 still generates the assembly 
jar in zip format instead of zip64.

In RDD#countApproxDistinct, we use HyperLogLog from 
com.clearspring.analytics:stream, which depends on fastutil. If this is the 
only place that introduces fastutil dependency, we should implement HyperLogLog 
and remove fastutil completely from Spark's dependencies.


was (Author: mengxr):
The quick fix may be removing fastutil.

In RDD#countApproxDistinct, we use HyperLogLog from 
com.clearspring.analytics:stream, which depends on fastutil. If this is the 
only place that introduces fastutil dependency, we should implement HyperLogLog 
and remove fastutil completely from Spark's dependencies.

 Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
  Components: MLlib, Spark Core
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.spark.ui.UIWorkloadGenerator
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. 
 Program will exit.
 {code}
 I also noticed that if the jar is unzipped, and the classpath set to the 
 current directory, it just works. Finally, if the assembly jar is 
 compiled with JDK6, it also works. The error is seen with any class, not just 
 the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only 
 in master.
 *Isolation*
 -I ran a git bisection and this appeared after the MLLib sparse vector patch 
 was merged:-
 https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
 SPARK-1212
 -I narrowed this down specifically to the inclusion of the breeze library. 
 Just adding breeze to an older (unaffected) build triggered the issue.-
 I've found that if I just unpack and re-pack the jar (using `jar` from java 6 
 or 7) it always works:
 {code}
 $ cd assembly/target/scala-2.10/
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # fails
 $ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 $ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # succeeds
 {code}
 I also noticed something of note. The Breeze package contains single 
 directories that have huge numbers of files in them (e.g. 2000+ class files 
 in one directory). It's possible we are hitting some weird bugs/corner cases 
 with compatibility of the internal storage format of the jar itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6

2014-04-17 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973326#comment-13973326
 ] 

Xiangrui Meng commented on SPARK-1520:
--

The quick fix may be removing fastutil.

In RDD#countApproxDistinct, we use HyperLogLog from 
com.clearspring.analytics:stream, which depends on fastutil. If this is the 
only place that introduces fastutil dependency, we should implement HyperLogLog 
and remove fastutil completely from Spark's dependencies.
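
For context, a small usage sketch of the RDD method mentioned above (assuming a 
local SparkContext; the argument is the target relative accuracy):

{code}
import org.apache.spark.SparkContext

val sc = new SparkContext("local", "hll-example")
val approx = sc.parallelize(1 to 1000000)
  .map(_ % 1024)
  .countApproxDistinct(0.05) // HyperLogLog-based, ~5% relative error
println(approx)              // close to the true distinct count of 1024
{code}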

 Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
  Components: MLlib, Spark Core
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.spark.ui.UIWorkloadGenerator
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. 
 Program will exit.
 {code}
 I also noticed that if the jar is unzipped, and the classpath set to the 
 current directory, it just works. Finally, if the assembly jar is 
 compiled with JDK6, it also works. The error is seen with any class, not just 
 the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only 
 in master.
 *Isolation*
 -I ran a git bisection and this appeared after the MLLib sparse vector patch 
 was merged:-
 https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
 SPARK-1212
 -I narrowed this down specifically to the inclusion of the breeze library. 
 Just adding breeze to an older (unaffected) build triggered the issue.-
 I've found that if I just unpack and re-pack the jar (using `jar` from java 6 
 or 7) it always works:
 {code}
 $ cd assembly/target/scala-2.10/
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # fails
 $ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 $ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # succeeds
 {code}
 I also noticed something of note. The Breeze package contains single 
 directories that have huge numbers of files in them (e.g. 2000+ class files 
 in one directory). It's possible we are hitting some weird bugs/corner cases 
 with compatibility of the internal storage format of the jar itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1520) Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Summary: Assembly Jar with more than 65536 files won't work when compiled 
on JDK7 and run on JDK6 (was: Assembly Jar with more than  JDK7 and run on 
JDK6)

 Assembly Jar with more than 65536 files won't work when compiled on  JDK7 and 
 run on JDK6
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
  Components: MLlib, Spark Core
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.spark.ui.UIWorkloadGenerator
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. 
 Program will exit.
 {code}
 I also noticed that if the jar is unzipped, and the classpath set to the 
 current directory, it just works. Finally, if the assembly jar is 
 compiled with JDK6, it also works. The error is seen with any class, not just 
 the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only 
 in master.
 *Isolation*
 -I ran a git bisection and this appeared after the MLLib sparse vector patch 
 was merged:-
 https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
 SPARK-1212
 -I narrowed this down specifically to the inclusion of the breeze library. 
 Just adding breeze to an older (unaffected) build triggered the issue.-
 I've found that if I just unpack and re-pack the jar (using `jar` from java 6 
 or 7) it always works:
 {code}
 $ cd assembly/target/scala-2.10/
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # fails
 $ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 $ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # succeeds
 {code}
 I also noticed something of note. The Breeze package contains single 
 directories that have huge numbers of files in them (e.g. 2000+ class files 
 in one directory). It's possible we are hitting some weird bugs/corner cases 
 with compatibility of the internal storage format of the jar itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1520) Assembly Jar with more than JDK7 and run on JDK6

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Summary: Assembly Jar with more than  JDK7 and run on JDK6  (was: Inclusion 
of breeze corrupts assembly when compiled with JDK7 and run on JDK6)

 Assembly Jar with more than  JDK7 and run on JDK6
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
  Components: MLlib, Spark Core
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.spark.ui.UIWorkloadGenerator
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. 
 Program will exit.
 {code}
 I also noticed that if the jar is unzipped, and the classpath set to the 
 current directory, it just works. Finally, if the assembly jar is 
 compiled with JDK6, it also works. The error is seen with any class, not just 
 the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only 
 in master.
 *Isolation*
 -I ran a git bisection and this appeared after the MLLib sparse vector patch 
 was merged:-
 https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
 SPARK-1212
 -I narrowed this down specifically to the inclusion of the breeze library. 
 Just adding breeze to an older (unaffected) build triggered the issue.-
 I've found that if I just unpack and re-pack the jar (using `jar` from java 6 
 or 7) it always works:
 {code}
 $ cd assembly/target/scala-2.10/
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # fails
 $ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 $ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # succeeds
 {code}
 I also noticed something of note. The Breeze package contains single 
 directories that have huge numbers of files in them (e.g. 2000+ class files 
 in one directory). It's possible we are hitting some weird bugs/corner cases 
 with compatibility of the internal storage format of the jar itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1529) Support setting spark.local.dirs to a hadoop FileSystem

2014-04-17 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-1529:
--

 Summary: Support setting spark.local.dirs to a hadoop FileSystem 
 Key: SPARK-1529
 URL: https://issues.apache.org/jira/browse/SPARK-1529
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Patrick Wendell
Assignee: Cheng Lian
 Fix For: 1.1.0


In some environments, like with MapR, local volumes are accessed through the 
Hadoop filesystem interface. We should allow setting spark.local.dir to a 
Hadoop filesystem location. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1520) Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Description: 
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 
UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in 
master.

h1. Isolation and Cause

This issue is caused by the following:

-I've found that if I just unpack and re-pack the jar (using `jar`) it always 
works:-

{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}

-I also noticed something of note. The Breeze package contains single 
directories that have huge numbers of files in them (e.g. 2000+ class files in 
one directory). It's possible we are hitting some weird bugs/corner cases with 
compatibility of the internal storage format of the jar itself.-

-I narrowed this down specifically to the inclusion of the breeze library. Just 
adding breeze to an older (unaffected) build triggered the issue.-

-I ran a git bisection and this appeared after the MLLib sparse vector patch 
was merged:-
https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
SPARK-1212

  was:
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen 

[jira] [Updated] (SPARK-1520) Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Description: 
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 
UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in 
master.

h1. Isolation and Cause

The package-time behavior of Java 6 and 7 differ with respect to the format 
used for jar files:
||Number of entries||JDK 6||JDK 7||
|<= 65536|zip|zip|
|> 65536|zip*|zip64|

zip* is a workaround for the original zip format, [described in 
JDK-6828461|https://bugs.openjdk.java.net/browse/JDK-4828461], that allows some 
versions of Java 6 to support larger assembly jars.

The Scala libraries we depend on have added a large number of classes which 
bumped us over the limit. This causes the Java 7 packaging to not work with 
Java 6. We can probably go back under the limit by clearing out some accidental 
inclusion of FastUtil, but eventually we'll go over again.

The real answer is to force people to build with JDK 6 if they want to run 
Spark on JRE 6.
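
As a rough check of whether a given assembly has crossed the limit, one can 
count the entries in the jar. A minimal sketch in Scala (illustrative only, 
not part of any patch):

{code}
import java.io.FileInputStream
import java.util.zip.ZipInputStream

// Counts entries in a jar; above 65536, JDK 7 packaging switches to zip64,
// which some Java 6 runtimes cannot read.
object JarEntryCount {
  def main(args: Array[String]): Unit = {
    val in = new ZipInputStream(new FileInputStream(args(0)))
    try {
      var count = 0
      while (in.getNextEntry != null) count += 1
      println(s"$count entries" +
        (if (count > 65536) " (over the zip limit; zip64 required)" else ""))
    } finally {
      in.close()
    }
  }
}
{code}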

-I've found that if I just unpack and re-pack the jar (using `jar`) it always 
works:-

{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}

-I also noticed something of note. The Breeze package contains single 
directories that have huge numbers of files in them (e.g. 2000+ class files in 
one directory). It's possible we are hitting some weird bugs/corner cases with 
compatibility of the internal storage format of the jar itself.-

-I narrowed this down specifically to the inclusion of the breeze library. Just 
adding breeze to an older (unaffected) build triggered the issue.-

-I ran a git bisection and this appeared after the MLLib sparse vector patch 
was merged:-
https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
SPARK-1212

  was:
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 

[jira] [Updated] (SPARK-1520) Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1520:
---

Description: 
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.ui.UIWorkloadGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program 
will exit.

{code}

I also noticed that if the jar is unzipped, and the classpath set to the 
current directory, it just works. Finally, if the assembly jar is compiled 
with JDK6, it also works. The error is seen with any class, not just the 
UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in 
master.

h1. Isolation and Cause

The package-time behavior of Java 6 and 7 differ with respect to the format 
used for jar files:
||Number of entries||JDK 6||JDK 7||
|<= 65536|zip|zip|
|> 65536|zip*|zip64|

zip* is a workaround for the original zip format, [described in 
JDK-6828461|https://bugs.openjdk.java.net/browse/JDK-4828461], that allows some 
versions of Java 6 to support larger assembly jars.

The Scala libraries we depend on have added a large number of classes which 
bumped us over the limit. This causes the Java 7 packaging to not work with 
Java 6. We can probably go back under the limit by clearing out some accidental 
inclusion of FastUtil, but eventually we'll go over again.

The real answer is to force people to build with JDK 6 if they want to run 
Spark on JRE 6.

-I've found that if I just unpack and re-pack the jar (using `jar`) it always 
works:-

{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}

-I also noticed something of note. The Breeze package contains single 
directories that have huge numbers of files in them (e.g. 2000+ class files in 
one directory). It's possible we are hitting some weird bugs/corner cases with 
compatibility of the internal storage format of the jar itself.-

-I narrowed this down specifically to the inclusion of the breeze library. Just 
adding breeze to an older (unaffected) build triggered the issue.-

-I ran a git bisection and this appeared after the MLLib sparse vector patch 
was merged:-
https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
SPARK-1212

  was:
This is a real doozie - when compiling a Spark assembly with JDK7, the produced 
jar does not work well with JRE6. I confirmed the byte code being produced is 
JDK 6 compatible (major version 50). What happens is that, silently, the JRE 
will not load any class files from the assembled jar.

{code}
$ sbt/sbt assembly/assembly

$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
[FIFO|FAIR]

$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
/home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: 

[jira] [Commented] (SPARK-1520) Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6

2014-04-17 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973463#comment-13973463
 ] 

Xiangrui Meng commented on SPARK-1520:
--

It seems HyperLogLog doesn't need fastutil, so we can exclude fastutil 
directly. Will send a patch.
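
For illustration, an exclusion along these lines could go into the build 
definition; the coordinates and version below are assumptions, not the actual 
patch:

{code}
// Hypothetical sbt exclusion: keep the HyperLogLog provider but drop its
// transitive fastutil dependency from the assembly.
libraryDependencies += "com.clearspring.analytics" % "stream" % "2.5.1" excludeAll(
  ExclusionRule(organization = "it.unimi.dsi")
)
{code}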

 Assembly Jar with more than 65536 files won't work when compiled on  JDK7 and 
 run on JDK6
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
  Components: MLlib, Spark Core
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.spark.ui.UIWorkloadGenerator
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. 
 Program will exit.
 {code}
 I also noticed that if the jar is unzipped, and the classpath set to the 
 current directory, it just works. Finally, if the assembly jar is 
 compiled with JDK6, it also works. The error is seen with any class, not just 
 the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only 
 in master.
 h1. Isolation and Cause
 The package-time behavior of Java 6 and 7 differ with respect to the format 
 used for jar files:
 ||Number of entries||JDK 6||JDK 7||
 |<= 65536|zip|zip|
 |> 65536|zip*|zip64|
 zip* is a workaround for the original zip format, [described in 
 JDK-6828461|https://bugs.openjdk.java.net/browse/JDK-4828461], that allows 
 some versions of Java 6 to support larger assembly jars.
 The Scala libraries we depend on have added a large number of classes which 
 bumped us over the limit. This causes the Java 7 packaging to not work with 
 Java 6. We can probably go back under the limit by clearing out some 
 accidental inclusion of FastUtil, but eventually we'll go over again.
 The real answer is to force people to build with JDK 6 if they want to run 
 Spark on JRE 6.
 -I've found that if I just unpack and re-pack the jar (using `jar`) it always 
 works:-
 {code}
 $ cd assembly/target/scala-2.10/
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # fails
 $ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 $ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # succeeds
 {code}
 -I also noticed something of note. The Breeze package contains single 
 directories that have huge numbers of files in them (e.g. 2000+ class files 
 in one directory). It's possible we are hitting some weird bugs/corner cases 
 with compatibility of the internal storage format of the jar itself.-
 -I narrowed this down specifically to the inclusion of the breeze library. 
 Just adding breeze to an older (unaffected) build triggered the issue.-
 -I ran a git bisection and this appeared after the MLLib sparse vector patch 
 was merged:-
 https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
 SPARK-1212



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1464) Update MLLib Examples to Use Breeze

2014-04-17 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-1464.
--

Resolution: Duplicate

 Update MLLib Examples to Use Breeze
 ---

 Key: SPARK-1464
 URL: https://issues.apache.org/jira/browse/SPARK-1464
 Project: Spark
  Issue Type: Task
  Components: MLlib
Reporter: Patrick Wendell
Assignee: Xiangrui Meng
Priority: Blocker
 Fix For: 1.0.0


 If we want to deprecate the vector class we need to update all of the 
 examples to use Breeze.
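 (For reference, a minimal sketch of what the migrated examples would use in 
 place of the deprecated vector class, using the standard Breeze API:)
 {code}
 import breeze.linalg.DenseVector
 // Breeze dense vectors in place of the deprecated MLlib Vector class.
 val v = DenseVector(1.0, 2.0, 3.0)
 val w = DenseVector(4.0, 5.0, 6.0)
 println(v + w)    // elementwise addition: DenseVector(5.0, 7.0, 9.0)
 println(v dot w)  // inner product: 32.0
 {code}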



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1496) SparkContext.jarOfClass should return Option instead of a sequence

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1496:
---

Labels: release-notes  (was: releasenotes)

 SparkContext.jarOfClass should return Option instead of a sequence
 --

 Key: SPARK-1496
 URL: https://issues.apache.org/jira/browse/SPARK-1496
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Patrick Wendell
Assignee: Patrick Wendell
  Labels: release-notes
 Fix For: 1.0.0


 This is pretty confusing, especially since addJar expects to take a single 
 jar.
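 (A minimal sketch of the intended usage once jarOfClass returns an Option; 
 illustrative only:)
 {code}
 import org.apache.spark.SparkContext
 // With an Option result, the located jar (if any) feeds addJar directly,
 // which expects a single jar.
 val sc = new SparkContext("local", "jarOfClass-example")
 SparkContext.jarOfClass(classOf[SparkContext]).foreach(sc.addJar)
 {code}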



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1496) SparkContext.jarOfClass should return Option instead of a sequence

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1496:
---

Labels: releasenotes  (was: )

 SparkContext.jarOfClass should return Option instead of a sequence
 --

 Key: SPARK-1496
 URL: https://issues.apache.org/jira/browse/SPARK-1496
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Patrick Wendell
Assignee: Patrick Wendell
  Labels: release-notes
 Fix For: 1.0.0


 This is pretty confusing, especially since addJar expects to take a single 
 jar.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1496) SparkContext.jarOfClass should return Option instead of a sequence

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1496:
---

Labels: api-change  (was: release-notes)

 SparkContext.jarOfClass should return Option instead of a sequence
 --

 Key: SPARK-1496
 URL: https://issues.apache.org/jira/browse/SPARK-1496
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Patrick Wendell
Assignee: Patrick Wendell
  Labels: api-change
 Fix For: 1.0.0


 This is pretty confusing, especially since addJar expects to take a single 
 jar.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-964) Investigate the potential for using JDK 8 lambda expressions for the Java/Scala APIs

2014-04-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-964:
--

Labels: api-change  (was: )

 Investigate the potential for using JDK 8 lambda expressions for the 
 Java/Scala APIs
 

 Key: SPARK-964
 URL: https://issues.apache.org/jira/browse/SPARK-964
 Project: Spark
  Issue Type: Story
Reporter: Marek Kolodziej
Assignee: Marek Kolodziej
  Labels: api-change
 Fix For: 1.0.0


 JDK 8 (to be released soon) will have lambda expressions. The question is 
 whether they can be leveraged for Java to use Scala's Spark API (perhaps with 
 some modifications), or whether a new functional API would need to be 
 developed for Java 8+.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1530) Streaming UI test can hang indefinitely

2014-04-17 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-1530:
--

 Summary: Streaming UI test can hang indefinitely
 Key: SPARK-1530
 URL: https://issues.apache.org/jira/browse/SPARK-1530
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
Assignee: Tathagata Das


This has been causing Jenkins to hang recently:

{code}
"pool-1-thread-1" prio=10 tid=0x7f4b9449f000 nid=0x6c37 runnable 
[0x7f4b8a26c000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked <0x0007cad700d0> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
- locked <0x0007cad662b8> (a 
sun.net.www.protocol.http.HttpURLConnection)
at java.net.URL.openStream(URL.java:1037)
at scala.io.Source$.fromURL(Source.scala:140)
at scala.io.Source$.fromURL(Source.scala:130)
at 
org.apache.spark.ui.UISuite$$anonfun$2$$anonfun$apply$mcV$sp$2$$anonfun$apply$2.apply$mcV$sp(UISuite.scala:57)
at 
org.apache.spark.ui.UISuite$$anonfun$2$$anonfun$apply$mcV$sp$2$$anonfun$apply$2.apply(UISuite.scala:56)
at 
org.apache.spark.ui.UISuite$$anonfun$2$$anonfun$apply$mcV$sp$2$$anonfun$apply$2.apply(UISuite.scala:56)
at 
org.scalatest.concurrent.Eventually$class.makeAValiantAttempt$1(Eventually.scala:394)
at 
org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:408)
at 
org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:437)
at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:477)
at 
org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307)
at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:477)
at 
org.apache.spark.ui.UISuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(UISuite.scala:56)
at 
org.apache.spark.ui.UISuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(UISuite.scala:54)
at 
org.apache.spark.LocalSparkContext$.withSpark(LocalSparkContext.scala:60)
at org.apache.spark.ui.UISuite$$anonfun$2.apply$mcV$sp(UISuite.scala:54)
at org.apache.spark.ui.UISuite$$anonfun$2.apply(UISuite.scala:54)
at org.apache.spark.ui.UISuite$$anonfun$2.apply(UISuite.scala:54)
at org.scalatest.FunSuite$$anon$1.apply(FunSuite.scala:1265)
at org.scalatest.Suite$class.withFixture(Suite.scala:1974)
at org.apache.spark.ui.UISuite.withFixture(UISuite.scala:37)
at org.scalatest.FunSuite$class.invokeWithFixture$1(FunSuite.scala:1262)
at org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271)
at org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:198)
at org.scalatest.FunSuite$class.runTest(FunSuite.scala:1271)
at org.apache.spark.ui.UISuite.runTest(UISuite.scala:37)
at org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304)
at org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304)
at 
org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:260)
at 
org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:249)
at scala.collection.immutable.List.foreach(List.scala:318)
at 
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:249)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:326)
at org.scalatest.FunSuite$class.runTests(FunSuite.scala:1304)
at org.apache.spark.ui.UISuite.runTests(UISuite.scala:37)
at org.scalatest.Suite$class.run(Suite.scala:2303)
at 
org.apache.spark.ui.UISuite.org$scalatest$FunSuite$$super$run(UISuite.scala:37)
at org.scalatest.FunSuite$$anonfun$run$1.apply(FunSuite.scala:1310)
at org.scalatest.FunSuite$$anonfun$run$1.apply(FunSuite.scala:1310)
at org.scalatest.SuperEngine.runImpl(Engine.scala:362)
at org.scalatest.FunSuite$class.run(FunSuite.scala:1310)
at org.apache.spark.ui.UISuite.run(UISuite.scala:37)
at 
org.scalatest.tools.ScalaTestFramework$ScalaTestRunner.run(ScalaTestFramework.scala:214)
at 

[jira] [Commented] (SPARK-1473) Feature selection for high dimensional datasets

2014-04-17 Thread Ignacio Zendejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973631#comment-13973631
 ] 

Ignacio Zendejas commented on SPARK-1473:
-

Thanks for pointing this out, Martin. We'll definitely take this into 
consideration.

 Feature selection for high dimensional datasets
 ---

 Key: SPARK-1473
 URL: https://issues.apache.org/jira/browse/SPARK-1473
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Ignacio Zendejas
Priority: Minor
  Labels: features
 Fix For: 1.1.0


 For classification tasks involving large feature spaces, on the order of tens 
 of thousands of features or more (e.g., text classification with n-grams, 
 where n > 1), it is often useful to rank and filter out irrelevant features, 
 thereby reducing the feature space by at least one or two orders of magnitude 
 without impacting key evaluation metrics (accuracy/precision/recall).
 A flexible feature evaluation interface needs to be designed, and at least 
 two methods should be implemented, with Information Gain as a priority since 
 it has been shown to be amongst the most reliable.
 Special consideration should be taken in the design to account for wrapper 
 methods (see research papers below) which are more practical for lower 
 dimensional data.
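 (For illustration, a minimal sketch in Scala of the Information Gain 
 computation such an interface would rank features by; not a proposed API:)
 {code}
 // Information gain of a discrete feature X for label Y:
 //   IG(Y; X) = H(Y) - H(Y | X), computed from joint counts.
 def entropy(counts: Seq[Long]): Double = {
   val n = counts.sum.toDouble
   counts.filter(_ > 0).map { c => val p = c / n; -p * math.log(p) }.sum
 }
 // counts(x)(y) = number of examples with feature value x and label y
 def infoGain(counts: Array[Array[Long]]): Double = {
   val n = counts.map(_.sum).sum.toDouble
   val hY = entropy(counts.transpose.map(_.sum))
   val hYgivenX = counts.map(r => (r.sum / n) * entropy(r)).sum
   hY - hYgivenX
 }
 {code}
 Features with higher information gain reduce more label uncertainty; ranking 
 by it and keeping the top k shrinks the feature space.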
 Relevant research:
 * Brown, G., Pocock, A., Zhao, M. J., & Luján, M. (2012). Conditional 
 likelihood maximisation: a unifying framework for information theoretic 
 feature selection. *The Journal of Machine Learning Research*, *13*, 27-66.
 * Forman, G. (2003). An extensive empirical study of feature selection 
 metrics for text classification. *The Journal of Machine Learning Research*, 
 *3*, 1289-1305.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1530) Streaming UI test can hang indefinitely

2014-04-17 Thread Tathagata Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973737#comment-13973737
 ] 

Tathagata Das commented on SPARK-1530:
--

I wonder what it is about the Jenkins environment that is causing this.
And how frequently does this happen?
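
The stack trace shows the test blocked inside Source.fromURL, which opens the 
connection without any timeout. A minimal sketch of a timeout-bounded fetch the 
test could use instead (the helper name and timeout value are assumptions):

{code}
import java.net.URL
import scala.io.Source

// Fetch a UI page with explicit connect/read timeouts so a wedged server
// fails the test instead of hanging it indefinitely.
def fetchWithTimeout(url: String, timeoutMs: Int = 5000): String = {
  val conn = new URL(url).openConnection()
  conn.setConnectTimeout(timeoutMs)
  conn.setReadTimeout(timeoutMs)
  val source = Source.fromInputStream(conn.getInputStream)
  try source.mkString finally source.close()
}
{code}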


 Streaming UI test can hang indefinitely
 ---

 Key: SPARK-1530
 URL: https://issues.apache.org/jira/browse/SPARK-1530
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
Assignee: Tathagata Das

 This has been causing Jenkins to hang recently:
 {code}
 "pool-1-thread-1" prio=10 tid=0x7f4b9449f000 nid=0x6c37 runnable 
 [0x7f4b8a26c000]
java.lang.Thread.State: RUNNABLE
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.read(SocketInputStream.java:152)
 at java.net.SocketInputStream.read(SocketInputStream.java:122)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
 - locked <0x0007cad700d0> (a java.io.BufferedInputStream)
 at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
 at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
 - locked <0x0007cad662b8> (a 
 sun.net.www.protocol.http.HttpURLConnection)
 at java.net.URL.openStream(URL.java:1037)
 at scala.io.Source$.fromURL(Source.scala:140)
 at scala.io.Source$.fromURL(Source.scala:130)
 at 
 org.apache.spark.ui.UISuite$$anonfun$2$$anonfun$apply$mcV$sp$2$$anonfun$apply$2.apply$mcV$sp(UISuite.scala:57)
 at 
 org.apache.spark.ui.UISuite$$anonfun$2$$anonfun$apply$mcV$sp$2$$anonfun$apply$2.apply(UISuite.scala:56)
 at 
 org.apache.spark.ui.UISuite$$anonfun$2$$anonfun$apply$mcV$sp$2$$anonfun$apply$2.apply(UISuite.scala:56)
 at 
 org.scalatest.concurrent.Eventually$class.makeAValiantAttempt$1(Eventually.scala:394)
 at 
 org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:408)
 at 
 org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:437)
 at 
 org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:477)
 at 
 org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307)
 at 
 org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:477)
 at 
 org.apache.spark.ui.UISuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(UISuite.scala:56)
 at 
 org.apache.spark.ui.UISuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(UISuite.scala:54)
 at 
 org.apache.spark.LocalSparkContext$.withSpark(LocalSparkContext.scala:60)
 at 
 org.apache.spark.ui.UISuite$$anonfun$2.apply$mcV$sp(UISuite.scala:54)
 at org.apache.spark.ui.UISuite$$anonfun$2.apply(UISuite.scala:54)
 at org.apache.spark.ui.UISuite$$anonfun$2.apply(UISuite.scala:54)
 at org.scalatest.FunSuite$$anon$1.apply(FunSuite.scala:1265)
 at org.scalatest.Suite$class.withFixture(Suite.scala:1974)
 at org.apache.spark.ui.UISuite.withFixture(UISuite.scala:37)
 at 
 org.scalatest.FunSuite$class.invokeWithFixture$1(FunSuite.scala:1262)
 at 
 org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271)
 at 
 org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271)
 at org.scalatest.SuperEngine.runTestImpl(Engine.scala:198)
 at org.scalatest.FunSuite$class.runTest(FunSuite.scala:1271)
 at org.apache.spark.ui.UISuite.runTest(UISuite.scala:37)
 at 
 org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304)
 at 
 org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304)
 at 
 org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:260)
 at 
 org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:249)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:249)
 at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:326)
 at org.scalatest.FunSuite$class.runTests(FunSuite.scala:1304)
 at org.apache.spark.ui.UISuite.runTests(UISuite.scala:37)
 at org.scalatest.Suite$class.run(Suite.scala:2303)
 at 
 org.apache.spark.ui.UISuite.org$scalatest$FunSuite$$super$run(UISuite.scala:37)
 at org.scalatest.FunSuite$$anonfun$run$1.apply(FunSuite.scala:1310)
 at 

[jira] [Assigned] (SPARK-1520) Assembly Jar with more than 65536 files won't work when compiled on JDK7 and run on JDK6

2014-04-17 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng reassigned SPARK-1520:


Assignee: Xiangrui Meng

 Assembly Jar with more than 65536 files won't work when compiled on  JDK7 and 
 run on JDK6
 -

 Key: SPARK-1520
 URL: https://issues.apache.org/jira/browse/SPARK-1520
 Project: Spark
  Issue Type: Bug
  Components: MLlib, Spark Core
Reporter: Patrick Wendell
Assignee: Xiangrui Meng
Priority: Blocker
 Fix For: 1.0.0


 This is a real doozie - when compiling a Spark assembly with JDK7, the 
 produced jar does not work well with JRE6. I confirmed the byte code being 
 produced is JDK 6 compatible (major version 50). What happens is that, 
 silently, the JRE will not load any class files from the assembled jar.
 {code}
 $ sbt/sbt assembly/assembly
 $ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
 [FIFO|FAIR]
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
  org.apache.spark.ui.UIWorkloadGenerator
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/spark/ui/UIWorkloadGenerator
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.spark.ui.UIWorkloadGenerator
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. 
 Program will exit.
 {code}
 I also noticed that if the jar is unzipped, and the classpath set to the 
 current directory, it just works. Finally, if the assembly jar is 
 compiled with JDK6, it also works. The error is seen with any class, not just 
 the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only 
 in master.
 h1. Isolation and Cause
 The package-time behavior of Java 6 and 7 differ with respect to the format 
 used for jar files:
 ||Number of entries||JDK 6||JDK 7||
 |<= 65536|zip|zip|
 |> 65536|zip*|zip64|
 zip* is a workaround for the original zip format, [described in 
 JDK-6828461|https://bugs.openjdk.java.net/browse/JDK-4828461], that allows 
 some versions of Java 6 to support larger assembly jars.
 The Scala libraries we depend on have added a large number of classes which 
 bumped us over the limit. This causes the Java 7 packaging to not work with 
 Java 6. We can probably go back under the limit by clearing out some 
 accidental inclusion of FastUtil, but eventually we'll go over again.
 The real answer is to force people to build with JDK 6 if they want to run 
 Spark on JRE 6.
 -I've found that if I just unpack and re-pack the jar (using `jar`) it always 
 works:-
 {code}
 $ cd assembly/target/scala-2.10/
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # fails
 $ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
 $ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
 $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
 ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
 org.apache.spark.ui.UIWorkloadGenerator # succeeds
 {code}
 -I also noticed something of note. The Breeze package contains single 
 directories that have huge numbers of files in them (e.g. 2000+ class files 
 in one directory). It's possible we are hitting some weird bugs/corner cases 
 with compatibility of the internal storage format of the jar itself.-
 -I narrowed this down specifically to the inclusion of the breeze library. 
 Just adding breeze to an older (unaffected) build triggered the issue.-
 -I ran a git bisection and this appeared after the MLLib sparse vector patch 
 was merged:-
 https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
 SPARK-1212



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1531) GraphX should have messageRDD to enable OutOfCore messages

2014-04-17 Thread Jianfeng (JIRA)
Jianfeng created SPARK-1531:
---

 Summary: GraphX should have messageRDD to enable OutOfCore messages
 Key: SPARK-1531
 URL: https://issues.apache.org/jira/browse/SPARK-1531
 Project: Spark
  Issue Type: Improvement
  Components: GraphX
Affects Versions: 0.9.1
Reporter: Jianfeng


There is no such `messageRDD` in the Pregel function.
Most sendMessage implementations directly return a Scala Iterator, like the one 
below in staticPageRank:
{code}
def sendMessage(edge: EdgeTriplet[Double, Double]) =
  Iterator((edge.dstId, edge.srcAttr * edge.attr))
{code}
For message-intensive computation on larger graphs, this will throw OOM 
exceptions. If we had a more general messageRDD, we could at least set 
MessageRDD.persist(DISK) to let it flush onto the disk, as sketched below.
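
A minimal sketch of the idea in current GraphX terms, assuming the messages are 
first materialized as an RDD (an illustration, not an existing API):

{code}
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Materialize Pregel-style messages as an RDD so they can be spilled to disk,
// instead of existing only as per-partition Iterators.
def pageRankMessages(graph: Graph[Double, Double]): RDD[(VertexId, Double)] = {
  val messages = graph.triplets.flatMap { t =>
    Iterator((t.dstId, t.srcAttr * t.attr)) // same message as staticPageRank
  }
  messages.persist(StorageLevel.DISK_ONLY)  // flush onto disk under memory pressure
}
{code}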




--
This message was sent by Atlassian JIRA
(v6.2#6252)