[ 
https://issues.apache.org/jira/browse/SPARK-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589004#comment-14589004
 ] 

Harry Brundage commented on SPARK-7009:
---------------------------------------

We're still experiencing this issue with PySpark on YARN with the recently 
released 1.4 artifacts, and are stuck on a forked version of Spark with 
GitHub pull #5637 merged in. Doesn't this seem rather severe to you 
folks? You can't use PySpark on YARN with the official artifacts unless you use 
hadoop-provided, which doesn't make sense in our case because a lot of users 
submit jobs locally or from driver nodes that aren't part of the Hadoop 
cluster. [~joshrosen] do you have any ideas on this one? 


Interestingly, merging #5637 into the current apache/spark master also doesn't 
seem to produce a Python-importable JAR file, so something seems to have broken 
there. I fought with the pom.xml for a while and solved an initial problem: 
because the ant-run plugin is now included for other reasons, the repackage 
profile was running before the assembly had actually completed. Once the two 
are in the right order and the repackage completes successfully, the JAR file 
still can't be imported. Another data point: my previously working artifact, 
built from my previous Spark version with #5637 merged in, has 98018 files in 
it, and the version with #5637 merged into master has 101473 files, which 
doesn't seem like a large enough jump to break something. This seems odd to me, 
and I am not super confident I am doing everything correctly. 
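
For anyone who wants to compare artifacts the same way, here is a minimal 
sketch (not part of the Spark build) that counts the entries in a JAR and 
checks the count against the 16-bit entry-count field of the classic zip 
format. The function names and the constant are illustrative assumptions, 
not anything from Spark or #5637.

```python
import zipfile

# The classic (non-zip64) end-of-central-directory record stores the number
# of entries in a 16-bit field, so 65535 is the most it can represent.
ZIP64_ENTRY_LIMIT = 65535


def count_entries(jar_path):
    """Return the number of entries (files and directories) in a JAR/zip."""
    with zipfile.ZipFile(jar_path) as jar:
        return len(jar.namelist())


def exceeds_classic_entry_limit(entry_count):
    """True if this many entries cannot be recorded without zip64."""
    return entry_count > ZIP64_ENTRY_LIMIT
```

By this check both of the counts quoted above (98018 and 101473) are past the 
classic limit, so the entry count by itself may not be what separates the 
working artifact from the broken one.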

> Build assembly JAR via ant to avoid zip64 problems
> --------------------------------------------------
>
>                 Key: SPARK-7009
>                 URL: https://issues.apache.org/jira/browse/SPARK-7009
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 1.3.0
>         Environment: Java 7+
>            Reporter: Steve Loughran
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> SPARK-1911 shows the problem that JDK7+ is using zip64 to build large JARs; a 
> format incompatible with Java and pyspark.
> Provided the total number of .class files+resources is <64K, ant can be used 
> to make the final JAR instead, perhaps by unzipping the maven-generated JAR 
> then rezipping it with zip64=never, before publishing the artifact via maven.
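
The unzip-and-rezip idea in the description can be illustrated outside of ant. 
The following is only a sketch in Python using the standard zipfile module; it 
is not the ant task the issue proposes, and the function name and paths are 
hypothetical. It copies every entry into a new archive with zip64 extensions 
disabled.

```python
import zipfile


def rezip_without_zip64(src_jar, dst_jar):
    """Copy all entries of src_jar into dst_jar with zip64 disabled.

    With allowZip64=False, zipfile refuses to write an archive that would
    need zip64 extensions instead of silently emitting them.
    """
    with zipfile.ZipFile(src_jar) as src:
        with zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED,
                             allowZip64=False) as dst:
            for name in src.namelist():
                # Entry timestamps are not preserved in this simple copy.
                dst.writestr(name, src.read(name))
    return dst_jar
```

As with the ant approach, this only works when the archive fits within the 
classic zip limits; otherwise the write is expected to fail rather than 
produce a zip64 file.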



