[jira] [Commented] (SPARK-15965) No FileSystem for scheme: s3n or s3a spark-2.0.0 and spark-1.6.1

2016-08-23 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432432#comment-15432432 ]

Steve Loughran commented on SPARK-15965:


This is being fixed, with tests, in my work on SPARK-7481; the manual
workaround is:

Spark 2:

1) Get the same Hadoop version that your Spark version is built against.
2) Add hadoop-aws and everything matching amazon-*.jar into the jars
subdirectory. (A quick verification sketch follows.)
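A minimal spark-shell sketch to confirm the JARs landed on the classpath,
assuming they were copied into $SPARK_HOME/jars; the bucket, path, and
credential values are placeholders. Note that S3A reads its credentials from
fs.s3a.access.key / fs.s3a.secret.key, not the s3n-style property names:

// Fails fast with ClassNotFoundException if hadoop-aws is still missing:
Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem")

// S3A credential properties (placeholder values, not real keys):
sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

// Any small object in a bucket you can read; the path is a placeholder.
val lines = sc.textFile("s3a://your-bucket/path/sample.log.gz")
println(lines.count())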

Spark 1.6+:

This needs my patch and a rebuild of the Spark assembly. However, once that
patch is in, trying to use the assembly without the AWS JARs will stop Spark
from starting, unless you move up to Hadoop 2.7.3.

> No FileSystem for scheme: s3n or s3a  spark-2.0.0 and spark-1.6.1
> -----------------------------------------------------------------
>
> Key: SPARK-15965
> URL: https://issues.apache.org/jira/browse/SPARK-15965
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.6.1
> Environment: Debian GNU/Linux 8
> java version "1.7.0_79"
>Reporter: thauvin damien
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> The Spark programming guide explains that Spark can create distributed
> datasets on Amazon S3.
> But since the pre-built "Hadoop 2.6" packages, S3 access doesn't work with
> s3n or s3a:
> sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", "XXXZZZHHH")
> sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", "xxx")
> val lines = sc.textFile("s3a://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> This happens with any version of Spark: spark-1.3.1, spark-1.6.1, even
> spark-2.0.0 with hadoop-2.7.2.
> I understand this is a Hadoop issue (SPARK-7442), but could you add some
> documentation explaining which JARs we need to add and where, for a
> standalone installation?
> Are hadoop-aws-x.x.x.jar and aws-java-sdk-x.x.x.jar enough?
> Which environment variables do we need to set, and which files do we need
> to modify? Is it $CLASSPATH, or the spark.driver.extraClassPath and
> spark.executor.extraClassPath variables in spark-defaults.conf?
> Note that it still works with spark-1.6.1 pre-built with hadoop2.4.
> Thanks





[jira] [Commented] (SPARK-15965) No FileSystem for scheme: s3n or s3a spark-2.0.0 and spark-1.6.1

2016-06-16 Thread thauvin damien (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333527#comment-15333527 ]

thauvin damien commented on SPARK-15965:


So I succeeded in getting the "s3n" driver to run with this configuration,
following these tutorials:
http://deploymentzone.com/2015/12/20/s3a-on-spark-on-aws-ec2/
https://gist.github.com/chicagobuss/6557dbf1ad97e5a09709
https://gist.github.com/thekensta/21068ef1b6f4af08eb09

1) Download the "spark-1.6.1-bin-hadoop2.4.tgz" distribution and unpack it.
2) Download "aws-java-sdk-1.7.4.jar" and "hadoop-aws-2.7.1.jar".
3) Copy "aws-java-sdk-1.7.4.jar" and "hadoop-aws-2.7.1.jar" to $SPARK_HOME/lib
(or another dir).
4) Edit $SPARK_HOME/conf/spark-defaults.conf and add:
spark.executor.extraClassPath $SPARK_HOME/lib/aws-java-sdk-1.7.4.jar:$SPARK_HOME/lib/hadoop-aws-2.7.1.jar
spark.driver.extraClassPath $SPARK_HOME/lib/aws-java-sdk-1.7.4.jar:$SPARK_HOME/lib/hadoop-aws-2.7.1.jar
5) In spark-shell:
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "XXXZZZHHH")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxx")
val lines = sc.textFile("s3n://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
lines.count

It works!

This is bad, because those links talk about the S3A driver, not S3N. With the
same configuration I still get this error:
java.io.IOException: No FileSystem for scheme: s3a

when I run these commands:
sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", "XXXZZZHHH")
sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", "xxx")
val lines = sc.textFile("s3a://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
lines.count
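A hedged sketch of what typically unblocks this, assuming hadoop-aws-2.7.1 and
its matching aws-java-sdk are on both the driver and executor classpaths: a
Hadoop 2.4-based build has no fs.s3a.impl entry (s3a only arrived in Hadoop
2.6), so the scheme must be bound to its implementation class explicitly, and
S3A uses different credential property names than s3n:

// Sketch, not a confirmed fix; credentials and bucket are placeholders.
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
// S3A uses access.key/secret.key, not awsAccessKeyId/awsSecretAccessKey:
sc.hadoopConfiguration.set("fs.s3a.access.key", "XXXZZZHHH")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "xxx")
val lines = sc.textFile("s3a://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
lines.count

Even with this, mixing hadoop-aws 2.7.1 classes with a Hadoop 2.4 core may
still fail; matching the hadoop-aws version to the Hadoop the Spark build
ships with, as described above, is the safer route.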


[jira] [Commented] (SPARK-15965) No FileSystem for scheme: s3n or s3a spark-2.0.0 and spark-1.6.1

2016-06-15 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332369#comment-15332369 ]

Sean Owen commented on SPARK-15965:
---

CC [~steve_l], but I think this is a classpath issue on your side or a Hadoop
issue, not a Spark one.
