[jira] [Commented] (SPARK-1718) pyspark doesn't work with assembly jar containing over 65536 files/dirs built on redhat

2014-06-12 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029422#comment-14029422
 ] 

Thomas Graves commented on SPARK-1718:
--

So this actually appears to be caused by building with jdk7 on redhat.  If I 
switch back to jdk6 for the build, then the build works and pyspark works.  Note 
that in both cases I'm running with jdk7, so it doesn't appear to be the same 
issue as SPARK-1520.


> pyspark doesn't work with assembly jar containing over 65536 files/dirs built 
> on redhat 
> 
>
> Key: SPARK-1718
> URL: https://issues.apache.org/jira/browse/SPARK-1718
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.0.0
> Reporter: Thomas Graves
>
> Recently pyspark was ported to yarn (pr 30), but when I went to try it I 
> couldn't get it to work.  I was building on a redhat 6 box.  I figured out 
> that if the assembly jar file contained over 65536 files/directories then it 
> wouldn't work.  If I unjarred the assembly, removed some stuff to get it 
> under 65536, and jarred it back up, then it would work. 
> It appears to only be an issue when building on a redhat box, as I can build 
> on my mac and it works just fine there.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1718) pyspark doesn't work with assembly jar containing over 65536 files/dirs built on redhat

2014-05-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989906#comment-13989906
 ] 

Sean Owen commented on SPARK-1718:
--

Yeah, I may not be adding anything here. I'd just advise double-checking 
what's being used to build, to run, and anything in between (like zip or jar). 

For example, does the python-related part of the build zip or jar anything? (I 
don't know that part of the build.) That could reintroduce the problem if 
something outside of Java land is not using the zip64 format.
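
One way to check which format a given jar actually ended up in is to count its entries and look for the zip64 end-of-central-directory locator. The following is a minimal sketch using only the Python standard library; `inspect_archive` is a hypothetical helper name, not part of the Spark build, and it assumes the archive is a well-formed zip/jar.

```python
import zipfile

EOCD_SIG = b"PK\x05\x06"           # classic end-of-central-directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # zip64 end-of-central-directory locator
EOCD_MIN_SIZE = 22                 # size of the classic record without a comment
ZIP64_LOCATOR_SIZE = 20            # size of the zip64 locator
MAX_COMMENT = 65535                # longest possible archive comment


def inspect_archive(path):
    """Return (entry_count, uses_zip64) for a zip or jar file."""
    with zipfile.ZipFile(path) as zf:
        entry_count = len(zf.infolist())

    with open(path, "rb") as f:
        f.seek(0, 2)
        size = f.tell()
        # Both end records live in the last few bytes of the file.
        tail_len = min(size, MAX_COMMENT + EOCD_MIN_SIZE + ZIP64_LOCATOR_SIZE)
        f.seek(size - tail_len)
        tail = f.read(tail_len)

    eocd = tail.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    # When zip64 is in use, the locator sits directly before the classic record.
    start = eocd - ZIP64_LOCATOR_SIZE
    uses_zip64 = start >= 0 and tail[start:start + 4] == ZIP64_LOCATOR_SIG
    return entry_count, uses_zip64
```

Running this on the jar produced by each build (and again after any unjar/rejar step) would show whether the entry count crosses the classic limit and whether the tool that wrote the file switched to zip64.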



[jira] [Commented] (SPARK-1718) pyspark doesn't work with assembly jar containing over 65536 files/dirs built on redhat

2014-05-05 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989903#comment-13989903
 ] 

Thomas Graves commented on SPARK-1718:
--

I am running with jdk7 and building with jdk7.



[jira] [Commented] (SPARK-1718) pyspark doesn't work with assembly jar containing over 65536 files/dirs built on redhat

2014-05-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989892#comment-13989892
 ] 

Sean Owen commented on SPARK-1718:
--

I could be mistaken here. The primary problem combination is building with 7 
and running with 6. But I also thought I understood that building or running 
with an older JDK 6 could be a problem too. (That, I am not 100% sure of.)

If you are running with 6, then you definitely don't want to build with 7. (The 
source/target can't be set to 7 in this case; either you build with 6 and it 
balks, or you successfully build with 7 but at runtime 6 won't accept the 
bytecode.) It sounds like you are building with 7, then? But is that your Mac 
build? If your RedHat build is using JDK 7, then I think this is just the same 
problem as in SPARK-1520 and you should use JDK 6 to build on that machine.

(Keep in mind that unzipping / rezipping, and unjarring / rejarring, might 
affect the result, as it changes the format of the .jar file! Worth noting 
whether that alone is causing or solving the issue.)

If you are sure you're building with 6, then my next question would be whether 
it's actually building with an older JDK 6, and whether that can be upgraded 
perhaps, and whether that resolves it. 

Running on JDK 7 should be fine either way. I wasn't clear whether Andrew was 
saying that didn't work either: 
https://github.com/apache/spark/pull/30#issuecomment-42057384   But I assume 
the question is how to get it running on 6.



[jira] [Commented] (SPARK-1718) pyspark doesn't work with assembly jar containing over 65536 files/dirs built on redhat

2014-05-05 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989854#comment-13989854
 ] 

Thomas Graves commented on SPARK-1718:
--

I can build the jar on my mac, copy it over to the same redhat boxes I run on, 
and it works fine.  If the runtime environment were using jdk6 then that 
wouldn't work.  I assume you are saying that if you build the jar with jdk6 and 
try to run on jdk7 it has the same issue?

I also checked the MANIFEST to verify it was built with jdk7.

$ cat MANIFEST.MF 
Build-Jdk: 1.7.0_25
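
The same check can be done without unpacking the jar. This is a rough sketch, assuming the manifest is at the usual `META-INF/MANIFEST.MF` path; `parse_manifest_main_section` and `manifest_attributes` are hypothetical helper names. Note that `Build-Jdk` reports the JDK that ran the build, which is not necessarily the bytecode level of the compiled classes.

```python
import zipfile


def parse_manifest_main_section(text):
    """Parse the main section of a manifest into a dict of attributes."""
    attrs = {}
    last_key = None
    for line in text.splitlines():
        if not line:
            # A blank line ends the main section.
            break
        if line.startswith(" ") and last_key is not None:
            # Continuation line: append to the previous value.
            attrs[last_key] += line[1:]
            continue
        key, sep, value = line.partition(": ")
        if sep:
            attrs[key] = value
            last_key = key
    return attrs


def manifest_attributes(jar_path):
    """Read META-INF/MANIFEST.MF from a jar and return its main attributes."""
    with zipfile.ZipFile(jar_path) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    return parse_manifest_main_section(text)
```

For example, `manifest_attributes(path).get("Build-Jdk")` would return the value shown above for a jar whose manifest carries that attribute.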

I also went in and changed the pom.xml to use java version 1.7 as source and 
target, but that doesn't look like it's working: when I check the .class files 
for the major version it comes back as 50 (jdk6), so perhaps this is what is 
causing the issue.
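
For reference, the major version lives in bytes 6-7 of every .class file, after the 0xCAFEBABE magic number and the two-byte minor version; 50 corresponds to Java 6 and 51 to Java 7. A minimal sketch of checking it across a whole jar follows; the function names are hypothetical and only the Python standard library is used.

```python
import struct
import zipfile


def class_major_version(data):
    """Return the major version from the first 8 bytes of a .class file."""
    # Layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major


def jar_major_versions(jar_path):
    """Return the set of class-file major versions found in a jar."""
    versions = set()
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.endswith(".class"):
                with jar.open(name) as f:
                    versions.add(class_major_version(f.read(8)))
    return versions
```

An assembly jar bundles classes from many dependencies, so a mix of versions in the result would not by itself be surprising; the interesting entries are the project's own classes.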

It could still be possible there is something in my environment causing it, but 
I haven't yet figured out what, so I wanted to file a jira to track the issue. 
I took Andrew's comment to mean he also tried it and ran into the same issue, 
but perhaps I misunderstood.
Do you happen to have a redhat box you could try it on?



[jira] [Commented] (SPARK-1718) pyspark doesn't work with assembly jar containing over 65536 files/dirs built on redhat

2014-05-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989666#comment-13989666
 ] 

Sean Owen commented on SPARK-1718:
--

Yeah, but it seems like it could well be that the JDK binaries being used 
during the build aren't quite what is expected, because some home or path 
variable points to JDK6. That was the substance of Patrick's last comment, and 
I wasn't sure whether Andrew was definitely confirming the build happened with 
Java 7, or just that it was installed. (?) 

I suppose it could also be some old version of zip being used to re-zip jars 
and such, though it strikes me as less likely, but hey.



[jira] [Commented] (SPARK-1718) pyspark doesn't work with assembly jar containing over 65536 files/dirs built on redhat

2014-05-05 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989658#comment-13989658
 ] 

Thomas Graves commented on SPARK-1718:
--

No, see discussion on https://github.com/apache/spark/pull/30.   This happens 
if you build on a redhat box and you build and run with jdk7.  




[jira] [Commented] (SPARK-1718) pyspark doesn't work with assembly jar containing over 65536 files/dirs built on redhat

2014-05-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989642#comment-13989642
 ] 

Sean Owen commented on SPARK-1718:
--

This is the same issue as https://issues.apache.org/jira/browse/SPARK-1520 
right?
