Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-18 Thread Imran Rashid
On Fri, May 8, 2015 at 4:16 AM, Steve Loughran ste...@hortonworks.com wrote: Would there be a place in the code tree for some tests to run against things like this? They're cloud integration tests rather than unit tests and nobody would want them to be on by default, but it could be good for

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-08 Thread Steve Loughran
2. I can add a hadoop-2.6 profile that sets things up for s3a, azure and openstack swift. Added: https://issues.apache.org/jira/browse/SPARK-7481 One thing to consider here is testing; the s3x clients themselves have some tests that individuals/orgs can run against different S3

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-08 Thread Steve Loughran
On 7 May 2015, at 18:02, Matei Zaharia matei.zaha...@gmail.com wrote: We should make sure to update our docs to mention s3a as well, since many people won't look at Hadoop's docs for this. Matei 1. To use s3a you'll also need an Amazon toolkit JAR on the classpath. 2. I can add a hadoop-2.6

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-07 Thread Nicholas Chammas
I can try that, but the issue is I understand this is supposed to work out of the box (like it does with all the other Spark/Hadoop pre-built packages). On Thu, May 7, 2015 at 12:35 PM Peter Rudenko petro.rude...@gmail.com wrote: Try to download this jar:

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-07 Thread Peter Rudenko
Yep it's a Hadoop issue: https://issues.apache.org/jira/browse/HADOOP-11863 http://mail-archives.apache.org/mod_mbox/hadoop-user/201504.mbox/%3CCA+XUwYxPxLkfhOxn1jNkoUKEQQMcPWFzvXJ=u+kp28kdejo...@mail.gmail.com%3E http://stackoverflow.com/a/28033408/3271168 So for now need to manually add that

Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-07 Thread Nicholas Chammas
Details are here: https://issues.apache.org/jira/browse/SPARK-7442 It looks like something specific to building against Hadoop 2.6? Nick

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-07 Thread Reynold Xin
Is this related to s3a update in 2.6? On Thursday, May 7, 2015, Nicholas Chammas nicholas.cham...@gmail.com wrote: Details are here: https://issues.apache.org/jira/browse/SPARK-7442 It looks like something specific to building against Hadoop 2.6? Nick

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-07 Thread Nicholas Chammas
Hmm, I just tried changing s3n to s3a: py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found Nick On Thu, May
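
The ClassNotFoundException above suggests that no JAR on the classpath ships the S3A filesystem class. A minimal diagnostic sketch, assuming you have a candidate JAR on local disk: the helper name `jar_has_class` is hypothetical and not part of Spark or Hadoop, and it relies only on the fact that a JAR is a zip archive whose entries mirror Java package paths.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the JAR at jar_path contains the given Java class.

    class_name is a fully qualified name such as
    'org.apache.hadoop.fs.s3a.S3AFileSystem'. This helper is illustrative
    only; it is not a Spark or Hadoop API.
    """
    # Inside a JAR, a class lives at its package path with a .class suffix.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, `jar_has_class("hadoop-aws-2.6.0.jar", "org.apache.hadoop.fs.s3a.S3AFileSystem")` would tell you whether a downloaded JAR is a plausible fix before you relaunch.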

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-07 Thread Peter Rudenko
Try to download this jar: http://search.maven.org/remotecontent?filepath=org/apache/hadoop/hadoop-aws/2.6.0/hadoop-aws-2.6.0.jar And add: export CLASSPATH=$CLASSPATH:hadoop-aws-2.6.0.jar And try to relaunch. Thanks, Peter Rudenko On 2015-05-07 19:30, Nicholas Chammas wrote: Hmm, I just
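
The `export CLASSPATH=...` step above can also be done from a launcher script. A rough sketch, assuming the JAR has already been downloaded next to the script: the helper name `env_with_jar` is hypothetical, and the commented launch command `./bin/pyspark` is an assumption about your install layout. Unlike the plain shell one-liner, it avoids producing a leading empty entry when CLASSPATH is unset, which Java treats as the current directory.

```python
import os


def env_with_jar(jar_path, env=None):
    """Return a copy of the environment with jar_path appended to CLASSPATH.

    Illustrative helper only, not a Spark API. Pass env explicitly to work
    on a custom mapping instead of the current process environment.
    """
    new_env = dict(os.environ if env is None else env)
    # Drop empty entries so an unset CLASSPATH does not yield ':jar'.
    entries = [e for e in new_env.get("CLASSPATH", "").split(os.pathsep) if e]
    if jar_path not in entries:
        entries.append(jar_path)
    new_env["CLASSPATH"] = os.pathsep.join(entries)
    return new_env


# Hypothetical usage (launch path is an assumption, adjust to your install):
# import subprocess
# subprocess.call(["./bin/pyspark"], env=env_with_jar("hadoop-aws-2.6.0.jar"))
```

Per the note elsewhere in this thread, s3a also needs an Amazon toolkit JAR on the classpath, so the same helper could be called a second time with that JAR's path.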

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-07 Thread Nicholas Chammas
Ah, thanks for the pointers. So as far as Spark is concerned, is this a breaking change? Is it possible that people who have working code that accesses S3 will upgrade to use Spark-against-Hadoop-2.6 and find their code is not working all of a sudden? Nick On Thu, May 7, 2015 at 12:48 PM Peter

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-07 Thread Matei Zaharia
We should make sure to update our docs to mention s3a as well, since many people won't look at Hadoop's docs for this. Matei On May 7, 2015, at 12:57 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Ah, thanks for the pointers. So as far as Spark is concerned, is this a breaking