We should make sure to update our docs to mention s3a as well, since many people won't look at Hadoop's docs for this.
Matei

> On May 7, 2015, at 12:57 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
> Ah, thanks for the pointers.
>
> So as far as Spark is concerned, is this a breaking change? Is it possible
> that people who have working code that accesses S3 will upgrade to use
> Spark-against-Hadoop-2.6 and find their code is not working all of a sudden?
>
> Nick
>
> On Thu, May 7, 2015 at 12:48 PM Peter Rudenko <petro.rude...@gmail.com> wrote:
>
>> Yep it's a Hadoop issue:
>> https://issues.apache.org/jira/browse/HADOOP-11863
>> http://mail-archives.apache.org/mod_mbox/hadoop-user/201504.mbox/%3CCA+XUwYxPxLkfhOxn1jNkoUKEQQMcPWFzvXJ=u+kp28kdejo...@mail.gmail.com%3E
>> http://stackoverflow.com/a/28033408/3271168
>>
>> So for now need to manually add that jar to classpath on hadoop-2.6.
>>
>> Thanks,
>> Peter Rudenko
>>
>> On 2015-05-07 19:41, Nicholas Chammas wrote:
>>
>> I can try that, but the issue is I understand this is supposed to work out
>> of the box (like it does with all the other Spark/Hadoop pre-built
>> packages).
>>
>> On Thu, May 7, 2015 at 12:35 PM Peter Rudenko <petro.rude...@gmail.com> wrote:
>>
>>> Try to download this jar:
>>> http://search.maven.org/remotecontent?filepath=org/apache/hadoop/hadoop-aws/2.6.0/hadoop-aws-2.6.0.jar
>>>
>>> And add:
>>>
>>> export CLASSPATH=$CLASSPATH:hadoop-aws-2.6.0.jar
>>>
>>> And try to relaunch.
>>>
>>> Thanks,
>>> Peter Rudenko
>>>
>>> On 2015-05-07 19:30, Nicholas Chammas wrote:
>>>
>>> Hmm, I just tried changing s3n to s3a:
>>>
>>> py4j.protocol.Py4JJavaError: An error occurred while calling
>>> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
>>> : java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>
>>> Nick
>>>
>>> On Thu, May 7, 2015 at 12:29 PM Peter Rudenko <petro.rude...@gmail.com> wrote:
>>>
>>>> Hi Nick, had the same issue.
>>>> By default it should work with the s3a protocol:
>>>>
>>>> sc.textFile('s3a://bucket/file_*').count()
>>>>
>>>> If you want to use the s3n protocol you need to add hadoop-aws.jar to
>>>> Spark's classpath. Which Hadoop vendor (Hortonworks, Cloudera, MapR) do you
>>>> use?
>>>>
>>>> Thanks,
>>>> Peter Rudenko
>>>>
>>>> On 2015-05-07 19:25, Nicholas Chammas wrote:
>>>>
>>>> Details are here: https://issues.apache.org/jira/browse/SPARK-7442
>>>>
>>>> It looks like something specific to building against Hadoop 2.6?
>>>>
>>>> Nick
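[Editor's note] The workaround pieces scattered through the thread (download hadoop-aws-2.6.0.jar, put it on the classpath, then use the s3a scheme) can be combined into a single launch command. This is a hedged sketch, not an official recipe from the thread: `--jars`, `--driver-class-path`, and the `spark.hadoop.*` property pass-through are standard Spark options, but the jar location and the credential values are placeholders you must adapt; the `fs.s3a.access.key` / `fs.s3a.secret.key` property names are the ones hadoop-aws 2.6 reads and are not mentioned in the messages above.

```shell
# Sketch of the thread's workaround, assuming a Spark-against-Hadoop-2.6
# build and hadoop-aws-2.6.0.jar (the Maven Central link Peter posted)
# downloaded into the working directory. Key values are placeholders.
pyspark \
  --jars hadoop-aws-2.6.0.jar \
  --driver-class-path hadoop-aws-2.6.0.jar \
  --conf spark.hadoop.fs.s3a.access.key=YOUR_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=YOUR_SECRET_KEY
```

With the jar on both the driver and executor classpaths, Nick's test from the thread, `sc.textFile('s3a://bucket/file_*').count()`, should no longer fail with `ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem`.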