Re: [Error] while reading S3 buckets in Spark 1.6 with spark-submit

2016-09-02 Thread Divya Gehlot
Hi Steve,
I am trying to read from s3n://"bucket" and have already included aws-java-sdk
1.7.4 in my classpath.
My machine is an AWS EMR instance with Hadoop 2.7.2 and Spark 1.6.1 installed.
As per the post below, it seems there is an issue with s3a on EMR's Hadoop 2.7.2:
http://stackoverflow.com/questions/30385981/how-to-access-s3a-files-from-apache-spark
Is that really the issue?
Could somebody help me validate the above?
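
For reference, a minimal sketch of the kind of read being attempted is below; the
bucket name, path and credential handling are placeholders rather than the actual
job, and it uses the s3a:// scheme that the error message refers to:

import org.apache.spark.{SparkConf, SparkContext}

object S3ReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("s3-read-sketch"))

    // Explicit keys for the s3a connector on Hadoop 2.7; relying on an IAM role
    // alone apparently needs Hadoop 2.8 (see Steve's reply below).
    sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env.getOrElse("AWS_ACCESS_KEY_ID", ""))
    sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", ""))

    // Placeholder bucket and prefix
    val lines = sc.textFile("s3a://example-bucket/path/to/data")
    println(s"line count: ${lines.count()}")

    sc.stop()
  }
}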


Thanks,
Divya



On 1 September 2016 at 16:59, Steve Loughran  wrote:

>
> On 1 Sep 2016, at 03:45, Divya Gehlot  wrote:
>
> Hi,
> I am using Spark 1.6.1 on an EMR machine.
> I am trying to read S3 buckets in my Spark job.
> When I read them through the Spark shell I am able to, but when I package
> the job and run it with spark-submit I get the error below:
>
> 16/08/31 07:36:38 INFO ApplicationMaster: Registered signal handlers for
> [TERM, HUP, INT]
>
>> 16/08/31 07:36:39 INFO ApplicationMaster: ApplicationAttemptId:
>> appattempt_1468570153734_2851_01
>> Exception in thread "main" java.util.ServiceConfigurationError:
>> org.apache.hadoop.fs.FileSystem: Provider 
>> org.apache.hadoop.fs.s3a.S3AFileSystem
>> could not be instantiated
>> at java.util.ServiceLoader.fail(ServiceLoader.java:224)
>> at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
>> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
>>
> I have already included
>
>  "com.amazonaws" % "aws-java-sdk-s3" % "1.11.15",
>
> in my build.sbt
>
>
> Assuming you are using a released version of Hadoop 2.6 or 2.7 underneath
> Spark, you will need to make sure aws-java-sdk 1.7.4 is on your classpath.
> You can't just drop in a new JAR as it is incompatible at the API level
> ( https://issues.apache.org/jira/browse/HADOOP-12269 )
>
>
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.7.4</version>
>   <scope>compile</scope>
> </dependency>
>
>
> and keep the Jackson artifacts (databind and annotations) in sync with the
> rest of your app:
>
>
> <dependency>
>   <groupId>com.fasterxml.jackson.core</groupId>
>   <artifactId>jackson-databind</artifactId>
> </dependency>
> <dependency>
>   <groupId>com.fasterxml.jackson.core</groupId>
>   <artifactId>jackson-annotations</artifactId>
> </dependency>
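
(As an aside for the sbt build used here: a rough build.sbt equivalent of the two
Maven snippets quoted above might look like the following. The hadoop-aws version
is an assumption matching the EMR Hadoop 2.7.2 mentioned earlier, and the Jackson
versions are placeholders that should be aligned with whatever the rest of the
application already pulls in.)

// build.sbt sketch (assumes the Hadoop 2.7.x line on the cluster, hence the 1.7.4 SDK)
libraryDependencies ++= Seq(
  // hadoop-aws supplies org.apache.hadoop.fs.s3a.S3AFileSystem
  "org.apache.hadoop" % "hadoop-aws"   % "2.7.2",
  // the SDK line Hadoop 2.7 was built against; 1.11.x is API-incompatible (HADOOP-12269)
  "com.amazonaws"     % "aws-java-sdk" % "1.7.4"
)

// keep Jackson consistent with the rest of the app; versions are placeholders (sbt 0.13 syntax)
dependencyOverrides ++= Set(
  "com.fasterxml.jackson.core" % "jackson-databind"    % "2.4.4",
  "com.fasterxml.jackson.core" % "jackson-annotations" % "2.4.4"
)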
>
>
> I tried providing the access key in my job as well, but the same error
> persists.
>
> When I googled it, I found that if an IAM role has been created there is no
> need to provide an access key.
>
>
>
> You don't get IAM support until Hadoop 2.8 ships, sorry. It needed a fair
> amount of reworking of how S3A does authentication.
>
> Note that if you launch spark jobs with the AWS environment variables set,
> these will be automatically picked up and used to set the relevant
> properties in the configuration.
>


Re: [Error] while reading S3 buckets in Spark 1.6 with spark-submit

2016-09-01 Thread Steve Loughran

On 1 Sep 2016, at 03:45, Divya Gehlot  wrote:

Hi,
I am using Spark 1.6.1 on an EMR machine.
I am trying to read S3 buckets in my Spark job.
When I read them through the Spark shell I am able to, but when I package
the job and run it with spark-submit I get the error below:

16/08/31 07:36:38 INFO ApplicationMaster: Registered signal handlers for [TERM, 
HUP, INT]

16/08/31 07:36:39 INFO ApplicationMaster: ApplicationAttemptId: 
appattempt_1468570153734_2851_01
Exception in thread "main" java.util.ServiceConfigurationError: 
org.apache.hadoop.fs.FileSystem: Provider 
org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:224)
at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)

I have already included

 "com.amazonaws" % "aws-java-sdk-s3" % "1.11.15",

in my build.sbt


Assuming you are using a released version of Hadoop 2.6 or 2.7 underneath
Spark, you will need to make sure aws-java-sdk 1.7.4 is on your classpath.
You can't just drop in a new JAR as it is incompatible at the API level
( https://issues.apache.org/jira/browse/HADOOP-12269 )



<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <version>1.7.4</version>
  <scope>compile</scope>
</dependency>



and keep the Jackson artifacts (databind and annotations) in sync with the rest of your app:



<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-annotations</artifactId>
</dependency>




I tried providing the access key in my job as well, but the same error
persists.

When I googled it, I found that if an IAM role has been created there is no
need to provide an access key.



You don't get IAM support until Hadoop 2.8 ships, sorry. It needed a fair
amount of reworking of how S3A does authentication.

Note that if you launch spark jobs with the AWS environment variables set, 
these will be automatically picked up and used to set the relevant properties 
in the configuration.
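
To illustrate what "picked up automatically" means in practice, Spark's Hadoop
configuration setup does roughly the following when the standard AWS environment
variables are present on the submitting process (a paraphrase for illustration,
not the exact Spark source):

import org.apache.hadoop.conf.Configuration

// Rough paraphrase: if AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are set,
// copy them into the s3n and s3a credential properties of the Hadoop config.
def appendS3Credentials(hadoopConf: Configuration): Unit = {
  for {
    keyId  <- sys.env.get("AWS_ACCESS_KEY_ID")
    secret <- sys.env.get("AWS_SECRET_ACCESS_KEY")
  } {
    hadoopConf.set("fs.s3n.awsAccessKeyId", keyId)
    hadoopConf.set("fs.s3n.awsSecretAccessKey", secret)
    hadoopConf.set("fs.s3a.access.key", keyId)
    hadoopConf.set("fs.s3a.secret.key", secret)
  }
}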


[Error] while reading S3 buckets in Spark 1.6 with spark-submit

2016-08-31 Thread Divya Gehlot
Hi,
I am using Spark 1.6.1 on an EMR machine.
I am trying to read S3 buckets in my Spark job.
When I read them through the Spark shell I am able to, but when I package
the job and run it with spark-submit I get the error below:

16/08/31 07:36:38 INFO ApplicationMaster: Registered signal handlers for
[TERM, HUP, INT]

> 16/08/31 07:36:39 INFO ApplicationMaster: ApplicationAttemptId:
> appattempt_1468570153734_2851_01
> Exception in thread "main" java.util.ServiceConfigurationError:
> org.apache.hadoop.fs.FileSystem: Provider
> org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
> at java.util.ServiceLoader.fail(ServiceLoader.java:224)
> at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
> at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
> at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2673)
> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2684)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2701)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2737)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2719)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:375)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:174)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:142)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:653)
> at
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
> at
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:651)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
> Caused by: java.lang.NoClassDefFoundError:
> com/amazonaws/services/s3/AmazonS3
> at java.lang.Class.getDeclaredConstructors0(Native Method)
> at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
> at java.lang.Class.getConstructor0(Class.java:2895)
> at java.lang.Class.newInstance(Class.java:354)
> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
> ... 19 more
> Caused by: java.lang.ClassNotFoundException:
> com.amazonaws.services.s3.AmazonS3
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> ... 24 more
> End of LogType:stderr



I have already included

 "com.amazonaws" % "aws-java-sdk-s3" % "1.11.15",

in my build.sbt


I tried providing the access key in my job as well, but the same error
persists.

When I googled it, I found that if an IAM role has been created there is no
need to provide an access key.

I would really appreciate any help.


Thanks,

Divya