[ https://issues.apache.org/jira/browse/SPARK-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333527#comment-15333527 ]

thauvin damien commented on SPARK-15965:
----------------------------------------

So I succeeded in getting the "s3n" driver to run with this configuration, following these tutorials:
http://deploymentzone.com/2015/12/20/s3a-on-spark-on-aws-ec2/
https://gist.github.com/chicagobuss/6557dbf1ad97e5a09709
https://gist.github.com/thekensta/21068ef1b6f4af08eb09

1) Download the "spark-1.6.1-bin-hadoop2.4.tgz" distribution and unpack it.
2) Download "aws-java-sdk-1.7.4.jar" and "hadoop-aws-2.7.1.jar".
3) Copy "aws-java-sdk-1.7.4.jar" and "hadoop-aws-2.7.1.jar" to $SPARK_HOME/lib (or another directory).
4) Edit $SPARK_HOME/conf/spark-defaults.conf and add the following two properties (without a leading "#", which would comment the lines out):
spark.executor.extraClassPath  $SPARK_HOME/lib/aws-java-sdk-1.7.4.jar:$SPARK_HOME/lib/hadoop-aws-2.7.1.jar
spark.driver.extraClassPath    $SPARK_HOME/lib/aws-java-sdk-1.7.4.jar:$SPARK_HOME/lib/hadoop-aws-2.7.1.jar
5) In spark-shell:
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "XXXZZZHHH")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxxxxxxxxxxxxxxxxxxxxxxxxxx")
val lines = sc.textFile("s3n://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
lines.count

Works!
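
A quick way to confirm that the two jars really landed on the driver classpath, before touching any s3n/s3a URL, is to probe for the relevant classes from spark-shell. A sketch; the class names below are the ones shipped in hadoop-aws 2.7.1 and aws-java-sdk 1.7.4:

// Each of these throws ClassNotFoundException if the corresponding jar
// is missing from the driver classpath configured in step 4.
Class.forName("org.apache.hadoop.fs.s3native.NativeS3FileSystem") // s3n (hadoop-aws)
Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem")           // s3a (hadoop-aws)
Class.forName("com.amazonaws.services.s3.AmazonS3Client")         // aws-java-sdk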

This is BAD, because those links talk about the S3A driver, not S3N. With the same configuration I still get this error:
java.io.IOException: No FileSystem for scheme: s3a

The error appears when I run these commands:
sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", "XXXZZZHHH")
sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", "xxxxxxxxxxxxxxxxxxxxxxxxxxx")
val lines = sc.textFile("s3a://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
lines.count
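
For what it's worth, a Hadoop 2.4-based build does not register the s3a scheme in core-default.xml, which is exactly what "No FileSystem for scheme: s3a" means; and the S3A driver reads its credentials from "fs.s3a.access.key" / "fs.s3a.secret.key" rather than the s3n-style property names used above. A sketch of a possible workaround, assuming the two jars from step 3 are on the classpath (bucket and keys are the placeholders from above):

// Map the s3a:// scheme to its implementation class explicitly,
// since Hadoop 2.4's core-default.xml has no fs.s3a.impl entry.
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
// S3A uses its own credential property names:
sc.hadoopConfiguration.set("fs.s3a.access.key", "XXXZZZHHH")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "xxxxxxxxxxxxxxxxxxxxxxxxxxx")
val lines = sc.textFile("s3a://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
lines.count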


> No FileSystem for scheme: s3n or s3a  spark-2.0.0 and spark-1.6.1
> -----------------------------------------------------------------
>
>                 Key: SPARK-15965
>                 URL: https://issues.apache.org/jira/browse/SPARK-15965
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.6.1
>         Environment: Debian GNU/Linux 8
> java version "1.7.0_79"
>            Reporter: thauvin damien
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> The Spark programming guide explains that Spark can create distributed 
> datasets on Amazon S3.
> But since the "Hadoop 2.6" pre-built package, S3 access doesn't work with 
> either s3n or s3a.
> sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", "XXXZZZHHH")
> sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", "xxxxxxxxxxxxxxxxxxxxxxxxxxx")
> val lines = sc.textFile("s3a://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> This happens with every version of Spark: spark-1.3.1, spark-1.6.1, and even 
> spark-2.0.0 with Hadoop 2.7.2.
> I understand this is a Hadoop issue (SPARK-7442), but could you add some 
> documentation explaining which jars we need to add and where (for a 
> standalone installation)?
> Are "hadoop-aws-x.x.x.jar" and "aws-java-sdk-x.x.x.jar" enough?
> Which environment variables do we need to set, and which files do we need 
> to modify?
> Is it "$CLASSPATH", or the "spark.driver.extraClassPath" and 
> "spark.executor.extraClassPath" properties in "spark-defaults.conf"?
> It still works with spark-1.6.1 pre-built with hadoop2.4, though.
> Thanks 


