Re: [Error:] while reading S3 buckets in Spark 1.6 with spark-submit
Hi Steve,

I am trying to read it from s3n://<bucket> and have already included
aws-java-sdk 1.7.4 in my classpath. My machine is an AWS EMR cluster with
Hadoop 2.7.2 and Spark 1.6.1 installed. As per the post below, it looks
like there is a known issue with S3A on EMR's Hadoop 2.7.2:
http://stackoverflow.com/questions/30385981/how-to-access-s3a-files-from-apache-spark

Is that really the issue? Could somebody help me validate the above?

Thanks,
Divya

On 1 September 2016 at 16:59, Steve Loughran wrote:

>> On 1 Sep 2016, at 03:45, Divya Gehlot wrote:
>>
>> Hi,
>> I am using Spark 1.6.1 on an EMR machine and am trying to read S3 buckets
>> in my Spark job. When I read them through the Spark shell I am able to,
>> but when I package the job and run it with spark-submit I get the error
>> below:
>>
>>     16/08/31 07:36:38 INFO ApplicationMaster: Registered signal handlers
>>     for [TERM, HUP, INT]
>>     16/08/31 07:36:39 INFO ApplicationMaster: ApplicationAttemptId:
>>     appattempt_1468570153734_2851_01
>>     Exception in thread "main" java.util.ServiceConfigurationError:
>>     org.apache.hadoop.fs.FileSystem: Provider
>>     org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
>>         at java.util.ServiceLoader.fail(ServiceLoader.java:224)
>>         at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
>>         at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
>>
>> I have already included
>>
>>     "com.amazonaws" % "aws-java-sdk-s3" % "1.11.15",
>>
>> in my build.sbt.
>
> Assuming you are using a released version of Hadoop 2.6 or 2.7 underneath
> Spark, you will need to make sure your classpath has aws-java-sdk 1.7.4 on
> it. You can't just drop in a newer JAR, as it is incompatible at the API
> level (https://issues.apache.org/jira/browse/HADOOP-12269):
>
>     <dependency>
>       <groupId>com.amazonaws</groupId>
>       <artifactId>aws-java-sdk</artifactId>
>       <version>1.7.4</version>
>       <scope>compile</scope>
>     </dependency>
>
> and keep the Jackson databind and annotations artifacts in sync with the
> rest of your app:
>
>     <dependency>
>       <groupId>com.fasterxml.jackson.core</groupId>
>       <artifactId>jackson-databind</artifactId>
>     </dependency>
>     <dependency>
>       <groupId>com.fasterxml.jackson.core</groupId>
>       <artifactId>jackson-annotations</artifactId>
>     </dependency>
>
>> I tried providing the access key in my job as well; the same error still
>> persists. When I googled it, I read that if you have an IAM role created
>> there is no need to provide an access key.
>
> You don't get IAM support until Hadoop 2.8 ships, sorry. It needed a fair
> amount of reworking of how S3A does authentication.
>
> Note that if you launch Spark jobs with the AWS environment variables set,
> these will be automatically picked up and used to set the relevant
> properties in the configuration.
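For an sbt build like the one described above, Steve's Maven coordinates
translate roughly to the following sketch. The Jackson version shown is an
assumption (Spark 1.6.x ships Jackson 2.4.x); pin it to whatever your Spark
and Hadoop build actually use.

    // build.sbt -- sketch: pin the AWS SDK to the version hadoop-aws 2.7.x
    // was built against, and keep Jackson in sync with Spark's own version.
    libraryDependencies ++= Seq(
      "com.amazonaws" % "aws-java-sdk" % "1.7.4",                     // not 1.11.x
      "com.fasterxml.jackson.core" % "jackson-databind" % "2.4.4",    // assumed; match your Spark
      "com.fasterxml.jackson.core" % "jackson-annotations" % "2.4.4"  // assumed; match your Spark
    )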
Re: [Error:] while reading S3 buckets in Spark 1.6 with spark-submit
On 1 Sep 2016, at 03:45, Divya Gehlot wrote:

> Hi,
> I am using Spark 1.6.1 on an EMR machine and am trying to read S3 buckets
> in my Spark job. When I read them through the Spark shell I am able to,
> but when I package the job and run it with spark-submit I get the error
> below:
>
>     16/08/31 07:36:38 INFO ApplicationMaster: Registered signal handlers
>     for [TERM, HUP, INT]
>     16/08/31 07:36:39 INFO ApplicationMaster: ApplicationAttemptId:
>     appattempt_1468570153734_2851_01
>     Exception in thread "main" java.util.ServiceConfigurationError:
>     org.apache.hadoop.fs.FileSystem: Provider
>     org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
>         at java.util.ServiceLoader.fail(ServiceLoader.java:224)
>         at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
>         at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
>
> I have already included
>
>     "com.amazonaws" % "aws-java-sdk-s3" % "1.11.15",
>
> in my build.sbt.

Assuming you are using a released version of Hadoop 2.6 or 2.7 underneath
Spark, you will need to make sure your classpath has aws-java-sdk 1.7.4 on
it. You can't just drop in a newer JAR, as it is incompatible at the API
level (https://issues.apache.org/jira/browse/HADOOP-12269):

    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk</artifactId>
      <version>1.7.4</version>
      <scope>compile</scope>
    </dependency>

and keep the Jackson databind and annotations artifacts in sync with the
rest of your app:

    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-annotations</artifactId>
    </dependency>

> I tried providing the access key in my job as well; the same error still
> persists. When I googled it, I read that if you have an IAM role created
> there is no need to provide an access key.

You don't get IAM support until Hadoop 2.8 ships, sorry. It needed a fair
amount of reworking of how S3A does authentication.

Note that if you launch Spark jobs with the AWS environment variables set,
these will be automatically picked up and used to set the relevant
properties in the configuration.
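As a concrete illustration of the last two points, here is a minimal sketch
of reading over s3a:// on Hadoop 2.6/2.7, setting the credentials explicitly
from the AWS environment variables rather than relying on an IAM role. The
bucket and prefix are placeholders; fs.s3a.access.key and fs.s3a.secret.key
are the standard S3A property names on these Hadoop versions.

    // Sketch: explicit S3A credentials on Hadoop 2.6/2.7 (no IAM support yet).
    import org.apache.spark.{SparkConf, SparkContext}

    object S3AReadSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("s3a-read"))
        val hc = sc.hadoopConfiguration
        // Copy the AWS environment variables, if present, into the S3A config.
        sys.env.get("AWS_ACCESS_KEY_ID").foreach(hc.set("fs.s3a.access.key", _))
        sys.env.get("AWS_SECRET_ACCESS_KEY").foreach(hc.set("fs.s3a.secret.key", _))
        // Placeholder path; any RDD action will exercise the filesystem.
        val lines = sc.textFile("s3a://some-bucket/some/prefix/")
        println(s"count = ${lines.count()}")
        sc.stop()
      }
    }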
[Error:] while reading S3 buckets in Spark 1.6 with spark-submit
Hi,

I am using Spark 1.6.1 on an EMR machine and am trying to read S3 buckets
in my Spark job. When I read them through the Spark shell I am able to,
but when I package the job and run it with spark-submit I get the error
below:

    16/08/31 07:36:38 INFO ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
    16/08/31 07:36:39 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1468570153734_2851_01
    Exception in thread "main" java.util.ServiceConfigurationError:
    org.apache.hadoop.fs.FileSystem: Provider
    org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
        at java.util.ServiceLoader.fail(ServiceLoader.java:224)
        at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
        at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
        at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
        at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2673)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2684)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2701)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2737)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2719)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:375)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:174)
        at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:142)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:653)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:651)
        at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
    Caused by: java.lang.NoClassDefFoundError: com/amazonaws/services/s3/AmazonS3
        at java.lang.Class.getDeclaredConstructors0(Native Method)
        at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
        at java.lang.Class.getConstructor0(Class.java:2895)
        at java.lang.Class.newInstance(Class.java:354)
        at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
        ... 19 more
    Caused by: java.lang.ClassNotFoundException: com.amazonaws.services.s3.AmazonS3
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 24 more
    End of LogType:stderr

I have already included

    "com.amazonaws" % "aws-java-sdk-s3" % "1.11.15",

in my build.sbt.

I tried providing the access key in my job as well; the same error still
persists. When I googled it, I read that if you have an IAM role created
there is no need to provide an access key.

Would really appreciate the help.

Thanks,
Divya
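Given the ClassNotFoundException above (com.amazonaws.services.s3.AmazonS3
missing when the application master instantiates S3AFileSystem), one way to
get a compatible AWS SDK onto the application classpath at submit time is
the --jars option. This is a sketch only; the jar paths, main class, and
bucket are placeholders that depend on your EMR layout:

    spark-submit \
      --class com.example.MyS3Job \
      --master yarn \
      --jars /path/to/hadoop-aws-2.7.2.jar,/path/to/aws-java-sdk-1.7.4.jar \
      my-job-assembly.jar s3a://some-bucket/input/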