We should make sure to update our docs to mention s3a as well, since many people won't look at Hadoop's docs for this.
Matei

> On May 7, 2015, at 12:57 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
> Ah, thanks for the pointers.
>
> So as far as Spark is concerned, is this a breaking change? Is it possible
> that people who have working code that accesses S3 will upgrade to use
> Spark-against-Hadoop-2.6 and find their code is not working all of a sudden?
>
> Nick
>
> On Thu, May 7, 2015 at 12:48 PM Peter Rudenko <petro.rude...@gmail.com> wrote:
>
>> Yep it's a Hadoop issue:
>> https://issues.apache.org/jira/browse/HADOOP-11863
>> http://mail-archives.apache.org/mod_mbox/hadoop-user/201504.mbox/%3CCA+XUwYxPxLkfhOxn1jNkoUKEQQMcPWFzvXJ=u+kp28kdejo...@mail.gmail.com%3E
>> http://stackoverflow.com/a/28033408/3271168
>>
>> So for now need to manually add that jar to classpath on hadoop-2.6.
>>
>> Thanks,
>> Peter Rudenko
>>
>> On 2015-05-07 19:41, Nicholas Chammas wrote:
>>
>> I can try that, but the issue is I understand this is supposed to work out
>> of the box (like it does with all the other Spark/Hadoop pre-built
>> packages).
>>
>> On Thu, May 7, 2015 at 12:35 PM Peter Rudenko <petro.rude...@gmail.com> wrote:
>>
>>> Try to download this jar:
>>> http://search.maven.org/remotecontent?filepath=org/apache/hadoop/hadoop-aws/2.6.0/hadoop-aws-2.6.0.jar
>>>
>>> And add:
>>>
>>> export CLASSPATH=$CLASSPATH:hadoop-aws-2.6.0.jar
>>>
>>> And try to relaunch.
>>>
>>> Thanks,
>>> Peter Rudenko
>>>
>>> On 2015-05-07 19:30, Nicholas Chammas wrote:
>>>
>>> Hmm, I just tried changing s3n to s3a:
>>>
>>> py4j.protocol.Py4JJavaError: An error occurred while calling
>>> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
>>> : java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>
>>> Nick
>>>
>>> On Thu, May 7, 2015 at 12:29 PM Peter Rudenko <petro.rude...@gmail.com> wrote:
>>>
>>>> Hi Nick, had the same issue.
>>>> By default it should work with the s3a protocol:
>>>>
>>>> sc.textFile('s3a://bucket/file_*').count()
>>>>
>>>> If you want to use the s3n protocol you need to add hadoop-aws.jar to
>>>> Spark's classpath. Which Hadoop vendor (Hortonworks, Cloudera, MapR) do you
>>>> use?
>>>>
>>>> Thanks,
>>>> Peter Rudenko
>>>>
>>>> On 2015-05-07 19:25, Nicholas Chammas wrote:
>>>>
>>>> Details are here: https://issues.apache.org/jira/browse/SPARK-7442
>>>>
>>>> It looks like something specific to building against Hadoop 2.6?
>>>>
>>>> Nick
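[Editor's note] The workaround pieces scattered through the thread (download hadoop-aws-2.6.0.jar, put it on the classpath, then use the s3a scheme) can be combined into a single launch command. This is a hedged sketch, not an official recipe from the thread: `--jars`, `--driver-class-path`, and the `spark.hadoop.*` property pass-through are standard Spark options, but the jar location and the credential values are placeholders you must adapt; the `fs.s3a.access.key` / `fs.s3a.secret.key` property names are the ones hadoop-aws 2.6 reads and are not mentioned in the messages above.

```shell
# Sketch of the thread's workaround, assuming a Spark-against-Hadoop-2.6
# build and hadoop-aws-2.6.0.jar (the Maven Central link Peter posted)
# downloaded into the working directory. Key values are placeholders.
pyspark \
  --jars hadoop-aws-2.6.0.jar \
  --driver-class-path hadoop-aws-2.6.0.jar \
  --conf spark.hadoop.fs.s3a.access.key=YOUR_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=YOUR_SECRET_KEY
```

With the jar on both the driver and executor classpaths, Nick's test from the thread, `sc.textFile('s3a://bucket/file_*').count()`, should no longer fail with `ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem`.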