[ https://issues.apache.org/jira/browse/SPARK-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589004#comment-14589004 ]
Harry Brundage commented on SPARK-7009:
---------------------------------------

We're still experiencing this issue with PySpark on YARN with the recently released 1.4 artifacts, and we are stuck using a forked version of Spark into which we've merged GitHub pull #5637. Doesn't this seem rather severe to you folks? You can't use PySpark on YARN with the official artifacts unless you use hadoop-provided, which doesn't make sense in our case because a lot of users submit jobs locally or from driver nodes that aren't part of the Hadoop cluster. [~joshrosen] do you have any ideas on this one?

Interestingly, merging #5637 into the current apache/spark master also doesn't seem to produce a Python-importable JAR file, so something seems to have broken there. I fought with the pom.xml for a while and solved an initial problem: because the ant-run plugin is now included for other reasons, the repackage profile was running before the assembly had actually completed. Once they are in the right order the repackage completes successfully, but the JAR file still can't be imported.

Another data point: my previously working artifact, built from my previous Spark version with #5637 merged in, has 98018 files in it, and the version with #5637 merged into master has 101473 files, which doesn't seem like a large enough jump to break something. This seems odd to me, and I am not super confident I am doing everything correctly.

> Build assembly JAR via ant to avoid zip64 problems
> --------------------------------------------------
>
>                 Key: SPARK-7009
>                 URL: https://issues.apache.org/jira/browse/SPARK-7009
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 1.3.0
>         Environment: Java 7+
>            Reporter: Steve Loughran
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> SPARK-1911 shows the problem that JDK7+ is using zip64 to build large JARs; a
> format incompatible with Java and pyspark.
> Provided the total number of .class files+resources is <64K, ant can be used
> to make the final JAR instead, perhaps by unzipping the maven-generated JAR
> then rezipping it with zip64=never, before publishing the artifact via maven.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
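
For anyone comparing assembly JARs as described in the comment above, here is a minimal Python sketch for checking two things: how many entries a JAR holds, and whether it was written with zip64 records. The helper names (`count_entries`, `uses_zip64`, `describe`) are hypothetical and not part of Spark or its build. The zip64 check assumes the standard ZIP layout, in which a 20-byte zip64 locator sits immediately before the end-of-central-directory record; it could be fooled by an archive comment that happens to contain the same signature bytes.

```python
import zipfile

ZIP_ENTRY_LIMIT = 65535            # largest entry count the classic (non-zip64) format can record
EOCD_SIG = b"PK\x05\x06"           # end-of-central-directory record signature
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # zip64 end-of-central-directory locator signature
ZIP64_LOCATOR_LEN = 20             # the locator has a fixed size of 20 bytes


def count_entries(jar_path):
    """Return the number of entries listed in the archive's central directory."""
    with zipfile.ZipFile(jar_path) as jar:
        return len(jar.infolist())


def uses_zip64(jar_path):
    """Return True if a zip64 locator directly precedes the end record."""
    # The end record is 22 bytes plus an optional comment of up to 65535 bytes,
    # so reading this much of the file's tail is enough to contain both records.
    tail_len = 22 + 65535 + ZIP64_LOCATOR_LEN
    with open(jar_path, "rb") as f:
        f.seek(0, 2)                       # jump to end of file to learn its size
        size = f.tell()
        f.seek(max(0, size - tail_len))
        tail = f.read()
    eocd = tail.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    start = eocd - ZIP64_LOCATOR_LEN
    return start >= 0 and tail[start:start + 4] == ZIP64_LOCATOR_SIG


def describe(jar_path):
    """Summarize entry count, whether it exceeds the classic limit, and zip64 use."""
    n = count_entries(jar_path)
    return {
        "entries": n,
        "over_limit": n > ZIP_ENTRY_LIMIT,
        "zip64": uses_zip64(jar_path),
    }
```

Calling `describe()` on both the working and the non-working assembly would show whether either one actually carries zip64 records. Both file counts quoted in the comment (98018 and 101473) are above 65535, so the entry count alone may not distinguish the two artifacts.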