Re: spark 1.3.1 : unable to access s3n:// urls (no file system for scheme s3n:)

2015-07-22 Thread Eugene Morozov
Hi, 

I’m stuck with the same issue, but I do see 
org.apache.hadoop.fs.s3native.NativeS3FileSystem in hadoop-core:1.0.4 
(the hadoop-client I currently use), and so far that is a transitive dependency 
that comes from Spark itself. I’m using a custom build of Spark 1.3.1 with 
hadoop-client 1.0.4. 

[INFO] +- 
org.apache.spark:spark-core_2.10:jar:1.3.1-hadoop-client-1.0.4:provided
...
[INFO] |  +- org.apache.hadoop:hadoop-client:jar:1.0.4:provided
[INFO] |  |  \- org.apache.hadoop:hadoop-core:jar:1.0.4:provided

The thing is, I don’t have any direct usage of any hadoop-client version, so in 
my understanding I should be able to run my jar on any build of Spark (1.3.1 
with hadoop-client 2.2.0 up to 2.2.6, or 1.3.1 with hadoop-client 1.0.4 up to 
1.2.1). In reality, though, running it on a live cluster I get a 
ClassNotFoundException. I’ve checked Spark’s über-jar, and NativeS3FileSystem 
is there, so I don’t really understand where the error comes from.
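
One way to verify which jar actually serves the class on the driver (a sketch; 
the assembly jar path below is an assumption, adjust it to your build):

  jar tf lib/spark-assembly-1.3.1-hadoop1.0.4.jar | grep NativeS3FileSystem

or from the REPL:

  scala> Class.forName("org.apache.hadoop.fs.s3native.NativeS3FileSystem").
       |   getProtectionDomain.getCodeSource.getLocation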

java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)


I’ve just got an idea: is it possible that the Executor’s classpath is different 
from the Worker’s classpath? How can I check the Executor’s classpath?
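
One way to check, as a minimal sketch (this assumes a live SparkContext sc, 
e.g. in spark-shell), is to ask the executors themselves for their JVM 
classpath and whether the class resolves there:

  import java.lang.management.ManagementFactory

  // One task per partition; each reports its executor's host, whether the
  // class loads there, and the executor JVM's classpath.
  sc.parallelize(1 to 4, 4).map { _ =>
    val host = java.net.InetAddress.getLocalHost.getHostName
    val found =
      try { Class.forName("org.apache.hadoop.fs.s3native.NativeS3FileSystem"); true }
      catch { case _: ClassNotFoundException => false }
    val cp = ManagementFactory.getRuntimeMXBean.getClassPath
    s"$host found=$found cp=$cp"
  }.collect().distinct.foreach(println)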

On 23 Apr 2015, at 17:35, Ted Yu yuzhih...@gmail.com wrote:

 NativeS3FileSystem class is in the hadoop-aws jar.
 Looks like it was not on the classpath.
 
 Cheers
 
 On Thu, Apr 23, 2015 at 7:30 AM, Sujee Maniyam su...@sujee.net wrote:
 Thanks all...
 
 btw, s3n load works without any issues with spark-1.3.1-built-for-hadoop 2.4 
 
 I tried this on 1.3.1-hadoop26:
   sc.hadoopConfiguration.set("fs.s3n.impl",
     "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
   val f = sc.textFile("s3n://bucket/file")
   f.count
 
 Now it can't find the implementation class. Looks like some jar is missing?
 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
 at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
 
 On Wednesday, April 22, 2015, Shuai Zheng szheng.c...@gmail.com wrote:
 Below is my code to access s3n without problems (only for 1.3.1; there is a 
 bug in 1.3.0).
 
  
 
   Configuration hadoopConf = ctx.hadoopConfiguration();
   hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
   hadoopConf.set("fs.s3n.awsAccessKeyId", awsAccessKeyId);
   hadoopConf.set("fs.s3n.awsSecretAccessKey", awsSecretAccessKey);
 
  
 
 Regards,
 
  
 
 Shuai
 
  
 
 From: Sujee Maniyam [mailto:su...@sujee.net] 
 Sent: Wednesday, April 22, 2015 12:45 PM
 To: Spark User List
 Subject: spark 1.3.1 : unable to access s3n:// urls (no file system for 
 scheme s3n:)
 
  
 
 Hi all
 
 I am unable to access s3n:// URLs using sc.textFile(); I get a 'no file 
 system for scheme s3n' error.
 
  
 
 Is this a bug, or are some conf settings missing?
 
  
 
 See below for details:
 
  
 
 env variables : 
 
 AWS_SECRET_ACCESS_KEY=set
 
 AWS_ACCESS_KEY_ID=set
 
  
 
 spark/RELEASE:
 
 Spark 1.3.1 (git revision 908a0bf) built for Hadoop 2.6.0
 
 Build flags: -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver 
 -Pyarn -DzincPort=3034
 
  
 
  
 
 ./bin/spark-shell
 
  val f = sc.textFile("s3n://bucket/file")
 
  f.count
 
  
 
 error ==
 java.io.IOException: No FileSystem for scheme: s3n
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
 at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
 at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
 at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
 at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
 at scala.Option.getOrElse(Option.scala:120)
 at

Re: spark 1.3.1 : unable to access s3n:// urls (no file system for scheme s3n:)

2015-04-23 Thread Sujee Maniyam
Thanks all...

btw, s3n load works without any issues with spark-1.3.1-built-for-hadoop 2.4

I tried this on 1.3.1-hadoop26:
  sc.hadoopConfiguration.set("fs.s3n.impl",
    "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
  val f = sc.textFile("s3n://bucket/file")
  f.count

Now it can't find the implementation class. Looks like some jar is missing?

java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)

On Wednesday, April 22, 2015, Shuai Zheng szheng.c...@gmail.com wrote:

 Below is my code to access s3n without problems (only for 1.3.1; there is a
 bug in 1.3.0).



   Configuration hadoopConf = ctx.hadoopConfiguration();
   hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
   hadoopConf.set("fs.s3n.awsAccessKeyId", awsAccessKeyId);
   hadoopConf.set("fs.s3n.awsSecretAccessKey", awsSecretAccessKey);



 Regards,



 Shuai



 *From:* Sujee Maniyam [mailto:su...@sujee.net]
 *Sent:* Wednesday, April 22, 2015 12:45 PM
 *To:* Spark User List
 *Subject:* spark 1.3.1 : unable to access s3n:// urls (no file system for
 scheme s3n:)



 Hi all

 I am unable to access s3n:// URLs using sc.textFile(); I get a 'no file
 system for scheme s3n' error.



 Is this a bug, or are some conf settings missing?



 See below for details:



 env variables :

 AWS_SECRET_ACCESS_KEY=set

 AWS_ACCESS_KEY_ID=set



 spark/RELEASE:

 Spark 1.3.1 (git revision 908a0bf) built for Hadoop 2.6.0

 Build flags: -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive
 -Phive-thriftserver -Pyarn -DzincPort=3034





 ./bin/spark-shell

  val f = sc.textFile("s3n://bucket/file")

  f.count



 error ==
 java.io.IOException: No FileSystem for scheme: s3n
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
 at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
 at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
 at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
 at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
 at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512)
 at org.apache.spark.rdd.RDD.count(RDD.scala:1006)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
 at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
 at $iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
 at $iwC$$iwC$$iwC.<init>(<console>:37)
 at $iwC$$iwC.<init>(<console>:39)
 at $iwC.<init>(<console>:41)
 at <init>(<console>:43)
 at .<init>(<console>:47)
 at .<clinit>(<console>)
 at .<init>(<console>:7)
 at .<clinit>(<console>)
 at $print(<console>)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at

Re: spark 1.3.1 : unable to access s3n:// urls (no file system for scheme s3n:)

2015-04-23 Thread Ted Yu
NativeS3FileSystem class is in the hadoop-aws jar.
Looks like it was not on the classpath.
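
One way to get it there (a sketch; the jar paths and versions below are
assumptions, match them to your Hadoop build) is to pass the AWS jars when
launching the shell:

  ./bin/spark-shell --jars /path/to/hadoop-aws-2.6.0.jar,/path/to/aws-java-sdk-1.7.4.jar

or to add them to both driver and executor classpaths via the
spark.driver.extraClassPath and spark.executor.extraClassPath settings.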

Cheers

On Thu, Apr 23, 2015 at 7:30 AM, Sujee Maniyam su...@sujee.net wrote:

 Thanks all...

 btw, s3n load works without any issues with spark-1.3.1-built-for-hadoop 2.4

 I tried this on 1.3.1-hadoop26:
   sc.hadoopConfiguration.set("fs.s3n.impl",
     "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
   val f = sc.textFile("s3n://bucket/file")
   f.count

 Now it can't find the implementation class. Looks like some jar is missing?

 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
 org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
 at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)

 On Wednesday, April 22, 2015, Shuai Zheng szheng.c...@gmail.com wrote:

 Below is my code to access s3n without problems (only for 1.3.1; there is
 a bug in 1.3.0).



   Configuration hadoopConf = ctx.hadoopConfiguration();
   hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
   hadoopConf.set("fs.s3n.awsAccessKeyId", awsAccessKeyId);
   hadoopConf.set("fs.s3n.awsSecretAccessKey", awsSecretAccessKey);



 Regards,



 Shuai



 *From:* Sujee Maniyam [mailto:su...@sujee.net]
 *Sent:* Wednesday, April 22, 2015 12:45 PM
 *To:* Spark User List
 *Subject:* spark 1.3.1 : unable to access s3n:// urls (no file system
 for scheme s3n:)



 Hi all

 I am unable to access s3n:// URLs using sc.textFile(); I get a 'no file
 system for scheme s3n' error.



 Is this a bug, or are some conf settings missing?



 See below for details:



 env variables :

 AWS_SECRET_ACCESS_KEY=set

 AWS_ACCESS_KEY_ID=set



 spark/RELEASE:

 Spark 1.3.1 (git revision 908a0bf) built for Hadoop 2.6.0

 Build flags: -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive
 -Phive-thriftserver -Pyarn -DzincPort=3034





 ./bin/spark-shell

  val f = sc.textFile("s3n://bucket/file")

  f.count



 error ==
 java.io.IOException: No FileSystem for scheme: s3n
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
 at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
 at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
 at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
 at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
 at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512)
 at org.apache.spark.rdd.RDD.count(RDD.scala:1006)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
 at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
 at $iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
 at $iwC$$iwC$$iwC.<init>(<console>:37)
 at $iwC$$iwC.<init>(<console>:39)
 at $iwC.<init>(<console>:41)
 at <init>(<console>:43)
 at .<init>(<console>:47)
 at .<clinit>(<console>)
 at .<init>(<console>:7)
 at .<clinit>(<console>)
 at $print(<console>)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at

RE: spark 1.3.1 : unable to access s3n:// urls (no file system for scheme s3n:)

2015-04-22 Thread Shuai Zheng
Below is my code to access s3n without problems (only for 1.3.1; there is a bug 
in 1.3.0).

 

  Configuration hadoopConf = ctx.hadoopConfiguration();
  hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
  hadoopConf.set("fs.s3n.awsAccessKeyId", awsAccessKeyId);
  hadoopConf.set("fs.s3n.awsSecretAccessKey", awsSecretAccessKey);
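
The same settings from spark-shell, as a minimal sketch (assuming a live
SparkContext sc, and that awsAccessKeyId / awsSecretAccessKey hold your
credentials; note this only makes the s3n scheme resolve if the implementation
class is actually on the classpath):

  sc.hadoopConfiguration.set("fs.s3n.impl",
    "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
  sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", awsAccessKeyId)
  sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", awsSecretAccessKey)
  val f = sc.textFile("s3n://bucket/file")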

 

Regards,

 

Shuai

 

From: Sujee Maniyam [mailto:su...@sujee.net] 
Sent: Wednesday, April 22, 2015 12:45 PM
To: Spark User List
Subject: spark 1.3.1 : unable to access s3n:// urls (no file system for scheme 
s3n:)

 

Hi all

I am unable to access s3n:// URLs using sc.textFile(); I get a 'no file 
system for scheme s3n' error.

 

Is this a bug, or are some conf settings missing?

 

See below for details:

 

env variables : 

AWS_SECRET_ACCESS_KEY=set

AWS_ACCESS_KEY_ID=set

 

spark/RELEASE:

Spark 1.3.1 (git revision 908a0bf) built for Hadoop 2.6.0

Build flags: -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver 
-Pyarn -DzincPort=3034

 

 

./bin/spark-shell

 val f = sc.textFile("s3n://bucket/file")

 f.count

 

error ==
java.io.IOException: No FileSystem for scheme: s3n
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512)
at org.apache.spark.rdd.RDD.count(RDD.scala:1006)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC$$iwC.<init>(<console>:37)
at $iwC$$iwC.<init>(<console>:39)
at $iwC.<init>(<console>:41)
at <init>(<console>:43)
at .<init>(<console>:47)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:669)
at

Re: spark 1.3.1 : unable to access s3n:// urls (no file system for scheme s3n:)

2015-04-22 Thread Ted Yu
This thread from the hadoop mailing list should give you some clues:
http://search-hadoop.com/m/LgpTk2df7822

On Wed, Apr 22, 2015 at 9:45 AM, Sujee Maniyam su...@sujee.net wrote:

 Hi all
 I am unable to access s3n:// URLs using sc.textFile(); I get a 'no file
 system for scheme s3n' error.

 Is this a bug, or are some conf settings missing?

 See below for details:

 env variables :
 AWS_SECRET_ACCESS_KEY=set
 AWS_ACCESS_KEY_ID=set

 spark/RELEASE:
 Spark 1.3.1 (git revision 908a0bf) built for Hadoop 2.6.0
 Build flags: -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive
 -Phive-thriftserver -Pyarn -DzincPort=3034


 ./bin/spark-shell
  val f = sc.textFile("s3n://bucket/file")
  f.count

 error==
 java.io.IOException: No FileSystem for scheme: s3n
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
 at
 org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
 at
 org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
 at
 org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
 at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
 at
 org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512)
 at org.apache.spark.rdd.RDD.count(RDD.scala:1006)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
 at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
 at $iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
 at $iwC$$iwC$$iwC.<init>(<console>:37)
 at $iwC$$iwC.<init>(<console>:39)
 at $iwC.<init>(<console>:41)
 at <init>(<console>:43)
 at .<init>(<console>:47)
 at .<clinit>(<console>)
 at .<init>(<console>:7)
 at .<clinit>(<console>)
 at $print(<console>)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
 at
 org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
 at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
 at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
 at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
 at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
 at
 org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
 at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
 at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)
 at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)
 at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:669)
 at
 org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:996)
 at
 org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
 at
 org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
 at
 scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
 at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:944)
 at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1058)
 at org.apache.spark.repl.Main$.main(Main.scala:31)
 at org.apache.spark.repl.Main.main(Main.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
 at 

Re: spark 1.3.1 : unable to access s3n:// urls (no file system for scheme s3n:)

2015-04-22 Thread Sujee Maniyam
Thanks all...

btw, s3n load works without any issues with spark-1.3.1-built-for-hadoop 2.4

I tried this on 1.3.1-hadoop26:
  sc.hadoopConfiguration.set("fs.s3n.impl",
    "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
  val f = sc.textFile("s3n://bucket/file")
  f.count

Now it can't find the implementation class. Looks like some jar is missing?

java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)


Sujee Maniyam (http://sujee.net | http://www.linkedin.com/in/sujeemaniyam )

On Wed, Apr 22, 2015 at 9:49 AM, Shuai Zheng szheng.c...@gmail.com wrote:

 Below is my code to access s3n without problems (only for 1.3.1; there is a
 bug in 1.3.0).



   Configuration hadoopConf = ctx.hadoopConfiguration();
   hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
   hadoopConf.set("fs.s3n.awsAccessKeyId", awsAccessKeyId);
   hadoopConf.set("fs.s3n.awsSecretAccessKey", awsSecretAccessKey);



 Regards,



 Shuai



 *From:* Sujee Maniyam [mailto:su...@sujee.net]
 *Sent:* Wednesday, April 22, 2015 12:45 PM
 *To:* Spark User List
 *Subject:* spark 1.3.1 : unable to access s3n:// urls (no file system for
 scheme s3n:)



 Hi all

 I am unable to access s3n:// URLs using sc.textFile(); I get a 'no file
 system for scheme s3n' error.



 Is this a bug, or are some conf settings missing?



 See below for details:



 env variables :

 AWS_SECRET_ACCESS_KEY=set

 AWS_ACCESS_KEY_ID=set



 spark/RELEASE:

 Spark 1.3.1 (git revision 908a0bf) built for Hadoop 2.6.0

 Build flags: -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive
 -Phive-thriftserver -Pyarn -DzincPort=3034





 ./bin/spark-shell

  val f = sc.textFile("s3n://bucket/file")

  f.count



 error ==
 java.io.IOException: No FileSystem for scheme: s3n
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
 at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
 at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
 at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
 at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
 at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512)
 at org.apache.spark.rdd.RDD.count(RDD.scala:1006)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
 at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
 at $iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
 at $iwC$$iwC$$iwC.<init>(<console>:37)
 at $iwC$$iwC.<init>(<console>:39)
 at $iwC.<init>(<console>:41)
 at <init>(<console>:43)
 at .<init>(<console>:47)
 at .<clinit>(<console>)
 at .<init>(<console>:7)
 at .<clinit>(<console>)
 at $print(<console>)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at