Have you tried reading with s3n, which is a slightly older protocol? I'm not sure how compatible s3a is with older versions of Spark.
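Concretely, trying s3n is just a change of URI scheme plus the matching credential properties. A minimal sketch for a Spark 1.6 spark-shell session (so `sc` and `sqlContext` already exist); the bucket, path, and environment-variable names below are placeholders, not anything from the original report:

```scala
// Sketch: read the same Parquet file through the older s3n connector.
// fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey are the Hadoop
// properties the s3n filesystem looks up for credentials.
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
hadoopConf.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

// Placeholder bucket/path -- substitute the real location.
val df = sqlContext.read.parquet("s3n://my-bucket/path/to/file.parquet")
df.count()  // the action that previously failed under s3a
```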
Femi

On Fri, Feb 24, 2017 at 2:18 AM, Benjamin Kim <bbuil...@gmail.com> wrote:

> Hi Gourav,
>
> My answers are below.
>
> Cheers,
> Ben
>
>> On Feb 23, 2017, at 10:57 PM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>>
>> Can I ask where you are running your CDH? Is it on premise, or have you created a cluster for yourself in AWS?
>
> Our cluster is on premise in our data center.
>
>> Also, I have really never seen s3a used before; it was used way back when writing S3 files took a long time, but I think that you are reading.
>>
>> Any ideas why you are not migrating to Spark 2.1? Besides speed, there are lots of new APIs, and the existing ones are being deprecated. There is therefore a very high chance that you are already working on code which is being deprecated by the Spark community right now.
>
> We use CDH and upgrade with whatever Spark version it includes, which is 1.6.0. We are waiting for the move to Spark 2.0/2.1.
>
>> And besides that, would you not want to work on a platform which is at least 10 times faster?
>
> What would that be?
>
>> Regards,
>> Gourav Sengupta
>>
>> On Thu, Feb 23, 2017 at 6:23 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> We are trying to use Spark 1.6 within CDH 5.7.1 to retrieve a 1.3GB Parquet file from AWS S3. We can read the schema and show some data when the file is loaded into a DataFrame, but when we try to do some operations, such as count, we get the error below.
>>>
>>> com.cloudera.com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
>>>     at com.cloudera.com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>>>     at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3779)
>>>     at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1107)
>>>     at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1070)
>>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:239)
>>>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2711)
>>>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:97)
>>>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2748)
>>>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2730)
>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
>>>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>>>     at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:385)
>>>     at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
>>>     at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:145)
>>>     at org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.<init>(SqlNewHadoopRDD.scala:180)
>>>     at org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:126)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:229)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>     at java.lang.Thread.run(Thread.java:745)
>>>
>>> Can anyone help?
>>>
>>> Cheers,
>>> Ben
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org

--
http://www.femibyte.com/twiki5/bin/view/Tech/
http://www.nextmatrix.com
"Great spirits have always encountered violent opposition from mediocre minds." - Albert Einstein.
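One more observation on the trace itself: it fails inside S3AFileSystem.initialize because AWSCredentialsProviderChain finds no credentials at all, so before switching connectors it may be worth handing the keys to s3a explicitly. A sketch for a Spark 1.6 spark-shell session (`sc` and `sqlContext` assumed to exist; the bucket, path, and environment-variable names are placeholders):

```scala
// Sketch: supply AWS credentials to the s3a connector explicitly so
// AWSCredentialsProviderChain does not come up empty.
// fs.s3a.access.key / fs.s3a.secret.key are the Hadoop properties the
// S3A filesystem reads; values here come from the local environment.
sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

// Placeholder bucket/path -- substitute the real location.
val df = sqlContext.read.parquet("s3a://my-bucket/path/to/file.parquet")
df.count()
```

Since the schema read succeeded and only the count (an action run on executors) failed, the credentials may be visible on the driver but not on the executor JVMs; setting the same properties at submit time with `--conf spark.hadoop.fs.s3a.access.key=...` and `--conf spark.hadoop.fs.s3a.secret.key=...` is one way to make sure every JVM sees them.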