[ 
https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973306#comment-13973306
 ] 

Xiangrui Meng commented on SPARK-1520:
--------------------------------------

When I try to open the assembly jar created by Java 7 with the Java 6 jar tool, it fails:

~~~
java.util.zip.ZipException: invalid CEN header (bad signature)
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:128)
        at java.util.zip.ZipFile.<init>(ZipFile.java:89)
        at sun.tools.jar.Main.list(Main.java:977)
        at sun.tools.jar.Main.run(Main.java:222)
        at sun.tools.jar.Main.main(Main.java:1147)
~~~

7z shows that only the jar built with Java 7 is flagged as 64-bit:

~~~
Path = spark-assembly-1.6.jar
Type = zip
Physical Size = 119682511

Path = spark-assembly-1.7.jar
Type = zip
64-bit = +
Physical Size = 119682587
~~~

I think the limit on the number of files was already raised in Java 6 (at 
least in the latest update), but Java 7 switches to the zip64 format when an 
archive has more than 64k files, and Java 6 cannot read that format.
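
For anyone who wants to check an assembly jar for this, here is a minimal sketch in Python (not part of the Spark build). It counts the entries and looks for the zip64 end-of-central-directory locator that sits directly before the regular end-of-central-directory record. The function name `inspect_archive` is made up for illustration, and the check assumes the archive comment does not itself contain the end-of-central-directory signature.

```python
import zipfile

EOCD_SIG = b"PK\x05\x06"           # end of central directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # zip64 end of central directory locator
ZIP_ENTRY_LIMIT = 0xFFFF           # 65535, the largest count a 16-bit field holds


def inspect_archive(path):
    """Return (entry_count, has_zip64_locator) for the zip/jar at `path`.

    `inspect_archive` is a hypothetical helper name, not an existing API.
    """
    with zipfile.ZipFile(path) as zf:
        entries = len(zf.infolist())

    # The EOCD record is 22 bytes plus a comment of at most 65535 bytes;
    # the 20-byte zip64 locator, when present, sits immediately before it.
    with open(path, "rb") as f:
        f.seek(0, 2)
        size = f.tell()
        tail_len = min(size, 22 + 0xFFFF + 20)
        f.seek(size - tail_len)
        tail = f.read(tail_len)

    eocd = tail.rfind(EOCD_SIG)
    has_zip64 = eocd >= 20 and tail[eocd - 20:eocd - 16] == ZIP64_LOCATOR_SIG
    return entries, has_zip64
```

If the diagnosis above is right, running this on the two jars should report a zip64 locator only for the Java 7 build, with an entry count above `ZIP_ENTRY_LIMIT`.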

> Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-1520
>                 URL: https://issues.apache.org/jira/browse/SPARK-1520
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, Spark Core
>            Reporter: Patrick Wendell
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> This is a real doozie - when compiling a Spark assembly with JDK7, the 
> produced jar does not work well with JRE6. I confirmed the byte code being 
> produced is JDK 6 compatible (major version 50). What happens is that, 
> silently, the JRE will not load any class files from the assembled jar.
> {code}
> $> sbt/sbt assembly/assembly
> $> /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp 
> /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
>  org.apache.spark.ui.UIWorkloadGenerator
> usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] 
> [FIFO|FAIR]
> $> /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
> /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
>  org.apache.spark.ui.UIWorkloadGenerator
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/spark/ui/UIWorkloadGenerator
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.spark.ui.UIWorkloadGenerator
>       at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
> Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. 
> Program will exit.
> {code}
> I also noticed that if the jar is unzipped and the classpath is set to the 
> current directory, it "just works". Finally, if the assembly jar is 
> compiled with JDK6, it also works. The error is seen with any class, not just 
> the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only 
> in master.
> *Isolation*
> -I ran a git bisection and this appeared after the MLLib sparse vector patch 
> was merged:-
> https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
> SPARK-1212
> -I narrowed this down specifically to the inclusion of the breeze library. 
> Just adding breeze to an older (unaffected) build triggered the issue.-
> I've found that if I just unpack and re-pack the jar (using `jar` from java 6 
> or 7) it always works:
> {code}
> $ cd assembly/target/scala-2.10/
> $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
> ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
> org.apache.spark.ui.UIWorkloadGenerator # fails
> $ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
> $ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
> $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp 
> ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar 
> org.apache.spark.ui.UIWorkloadGenerator # succeeds
> {code}
> One more observation: the Breeze package contains single directories that 
> hold huge numbers of files (e.g. 2000+ class files in one directory). It's 
> possible we are hitting some weird bugs/corner cases with compatibility of 
> the internal storage format of the jar itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
