[jira] [Commented] (SPARK-15965) No FileSystem for scheme: s3n or s3a spark-2.0.0 and spark-1.6.1
[ https://issues.apache.org/jira/browse/SPARK-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432432#comment-15432432 ]

Steve Loughran commented on SPARK-15965:
----------------------------------------

This is being fixed, with tests, in my work on SPARK-7481. The manual workaround is:

Spark 2:
1) Get the same Hadoop version that your Spark version is built against.
2) Add hadoop-aws, along with all the amazon-*.jar files, into the jars subdirectory.

Spark 1.6+:
This needs my patch and a rebuild of the Spark assembly. However, once that patch is in, trying to use the assembly without the AWS JARs will stop Spark from starting, unless you move up to Hadoop 2.7.3.

> No FileSystem for scheme: s3n or s3a spark-2.0.0 and spark-1.6.1
> ----------------------------------------------------------------
>
>                 Key: SPARK-15965
>                 URL: https://issues.apache.org/jira/browse/SPARK-15965
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.6.1
>         Environment: Debian GNU/Linux 8
>                      java version "1.7.0_79"
>            Reporter: thauvin damien
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> The Spark programming guide explains that Spark can create distributed datasets on Amazon S3.
> But since the pre-built "Hadoop 2.6" packages, S3 access doesn't work with s3n or s3a:
>
> sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", "XXXZZZHHH")
> sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", "xxx")
> val lines = sc.textFile("s3a://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
>
> This happens with any version of Spark: spark-1.3.1, spark-1.6.1, even spark-2.0.0 with hadoop-2.7.2.
> I understand this is a Hadoop issue (SPARK-7442), but could you add some documentation explaining which JARs we need to add and where, for a standalone installation?
> Are hadoop-aws-x.x.x.jar and aws-java-sdk-x.x.x.jar enough?
> Which environment variables do we need to set, and which files do we need to modify?
> Is it $CLASSPATH, or the spark.driver.extraClassPath and spark.executor.extraClassPath settings in spark-defaults.conf?
> Note that it still works with spark-1.6.1 pre-built with hadoop2.4.
> Thanks
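[Editor's note] For anyone verifying the Spark 2 workaround above, a minimal smoke test from spark-shell might look like the sketch below. This is a sketch, not part of the original comment: it reuses the reporter's bucket path as a placeholder and assumes the Hadoop 2.6+ S3A property names (fs.s3a.access.key / fs.s3a.secret.key).

// Minimal S3A smoke test (sketch). Assumes hadoop-aws plus the matching
// amazon-*.jar files are already in $SPARK_HOME/jars, per the workaround above.
// The bucket/path is the reporter's placeholder, not a real object.
sc.hadoopConfiguration.set("fs.s3a.access.key", "XXXZZZHHH")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "xxx")
val lines = sc.textFile("s3a://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
lines.count() // fails fast with ClassNotFoundException if the JARs are missing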
[jira] [Commented] (SPARK-15965) No FileSystem for scheme: s3n or s3a spark-2.0.0 and spark-1.6.1
[ https://issues.apache.org/jira/browse/SPARK-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333527#comment-15333527 ]

thauvin damien commented on SPARK-15965:
----------------------------------------

So I succeeded in running the "s3n" driver with this configuration, following this tutorial:
http://deploymentzone.com/2015/12/20/s3a-on-spark-on-aws-ec2/
https://gist.github.com/chicagobuss/6557dbf1ad97e5a09709
https://gist.github.com/thekensta/21068ef1b6f4af08eb09

1) Download the "spark-1.6.1-bin-hadoop2.4.tgz" distribution and unpack it.
2) Download "aws-java-sdk-1.7.4.jar" and "hadoop-aws-2.7.1.jar".
3) Copy "aws-java-sdk-1.7.4.jar" and "hadoop-aws-2.7.1.jar" to $SPARK_HOME/lib (or another directory).
4) Edit $SPARK_HOME/conf/spark-defaults.conf and add:

spark.executor.extraClassPath $SPARK_HOME/lib/aws-java-sdk-1.7.4.jar:$SPARK_HOME/lib/hadoop-aws-2.7.1.jar
spark.driver.extraClassPath $SPARK_HOME/lib/aws-java-sdk-1.7.4.jar:$SPARK_HOME/lib/hadoop-aws-2.7.1.jar

5) In spark-shell:

sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "XXXZZZHHH")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxx")
val lines = sc.textFile("s3n://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
lines.count

Works!

This is strange, because those links talk about the S3A driver, not S3N. With the same configuration I still get this error:

java.io.IOException: No FileSystem for scheme: s3a

when I run these commands:

sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", "XXXZZZHHH")
sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", "xxx")
val lines = sc.textFile("s3a://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
lines.count
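[Editor's note] A side note for readers hitting the same "No FileSystem for scheme: s3a" error on a hadoop2.4-based build: S3A only shipped with Hadoop 2.6+, so Hadoop 2.4's core-default.xml never registers the s3a scheme. A commonly suggested workaround, sketched below under that assumption and not verified in this thread, is to register the filesystem class explicitly and use the S3A-specific credential keys, which differ from the fs.s3n.awsAccessKeyId-style names above.

// Sketch only: register s3a by hand, since a Hadoop 2.4-based build has no
// fs.s3a.impl entry in core-default.xml. Even then, hadoop-aws-2.7.1 may need
// a matching hadoop-common 2.7.x on the classpath to load S3AFileSystem.
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
sc.hadoopConfiguration.set("fs.s3a.access.key", "XXXZZZHHH") // not fs.s3a.awsAccessKeyId
sc.hadoopConfiguration.set("fs.s3a.secret.key", "xxx")
val lines = sc.textFile("s3a://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
lines.count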
[jira] [Commented] (SPARK-15965) No FileSystem for scheme: s3n or s3a spark-2.0.0 and spark-1.6.1
[ https://issues.apache.org/jira/browse/SPARK-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332369#comment-15332369 ]

Sean Owen commented on SPARK-15965:
-----------------------------------

CC [~steve_l], but I think this is a classpath issue on your side or a Hadoop issue, not Spark.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org