Ah, thanks for the pointers. So as far as Spark is concerned, is this a breaking change? Is it possible that people who have working code that accesses S3 will upgrade to use Spark-against-Hadoop-2.6 and find their code is not working all of a sudden?
Nick

On Thu, May 7, 2015 at 12:48 PM Peter Rudenko <petro.rude...@gmail.com> wrote:

> Yep it's a Hadoop issue:
> https://issues.apache.org/jira/browse/HADOOP-11863
>
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201504.mbox/%3CCA+XUwYxPxLkfhOxn1jNkoUKEQQMcPWFzvXJ=u+kp28kdejo...@mail.gmail.com%3E
> http://stackoverflow.com/a/28033408/3271168
>
> So for now need to manually add that jar to classpath on hadoop-2.6.
>
> Thanks,
> Peter Rudenko
>
> On 2015-05-07 19:41, Nicholas Chammas wrote:
>
> I can try that, but the issue is I understand this is supposed to work out
> of the box (like it does with all the other Spark/Hadoop pre-built
> packages).
>
> On Thu, May 7, 2015 at 12:35 PM Peter Rudenko <petro.rude...@gmail.com>
> wrote:
>
>> Try to download this jar:
>>
>> http://search.maven.org/remotecontent?filepath=org/apache/hadoop/hadoop-aws/2.6.0/hadoop-aws-2.6.0.jar
>>
>> And add:
>>
>> export CLASSPATH=$CLASSPATH:hadoop-aws-2.6.0.jar
>>
>> And try to relaunch.
>>
>> Thanks,
>> Peter Rudenko
>>
>> On 2015-05-07 19:30, Nicholas Chammas wrote:
>>
>> Hmm, I just tried changing s3n to s3a:
>>
>> py4j.protocol.Py4JJavaError: An error occurred while calling
>> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
>> : java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>
>> Nick
>>
>> On Thu, May 7, 2015 at 12:29 PM Peter Rudenko <petro.rude...@gmail.com>
>> wrote:
>>
>>> Hi Nick, had the same issue.
>>> By default it should work with s3a protocol:
>>>
>>> sc.textFile('s3a://bucket/file_*').count()
>>>
>>> If you want to use s3n protocol you need to add hadoop-aws.jar to
>>> spark's classpath. Which hadoop vendor (Hortonworks, Cloudera, MapR) do you
>>> use?
>>>
>>> Thanks,
>>> Peter Rudenko
>>>
>>> On 2015-05-07 19:25, Nicholas Chammas wrote:
>>>
>>> Details are here: https://issues.apache.org/jira/browse/SPARK-7442
>>>
>>> It looks like something specific to building against Hadoop 2.6?
>>>
>>> Nick
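
For anyone trying the classpath workaround quoted in this thread, here is a rough sketch of a pre-launch check in Python. It only inspects a CLASSPATH-style string for a hadoop-aws jar and appends one if none is listed. The helper names (hadoop_aws_jars, with_hadoop_aws) are made up for illustration and are not part of Spark or Hadoop. The default jar name is taken from the export line in the thread. The sketch does not expand wildcard entries such as lib/*.

```python
import os


def hadoop_aws_jars(classpath):
    """Return the classpath entries whose file name looks like a hadoop-aws jar.

    Hypothetical helper for illustration; it only does string matching and
    does not check that the files exist or expand wildcard entries.
    """
    entries = [e for e in classpath.split(os.pathsep) if e]
    return [
        e for e in entries
        if os.path.basename(e).startswith("hadoop-aws") and e.endswith(".jar")
    ]


def with_hadoop_aws(classpath, jar="hadoop-aws-2.6.0.jar"):
    """Append `jar` to the classpath unless a hadoop-aws jar is already listed."""
    if hadoop_aws_jars(classpath):
        return classpath
    # Skip the empty string so an unset CLASSPATH does not yield a leading separator.
    return os.pathsep.join(p for p in (classpath, jar) if p)


if __name__ == "__main__":
    current = os.environ.get("CLASSPATH", "")
    print(with_hadoop_aws(current))
```

Run as a script, it prints the value you could export as CLASSPATH before relaunching. Whether Spark picks the jar up from CLASSPATH depends on how it is launched, so treat this as a starting point rather than a confirmed fix.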