Thanks Akhil, that solved the problem.

Best,
/Shahab
On Fri, Jun 12, 2015 at 8:50 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Looks like your Spark is not able to pick up the HADOOP_CONF. To fix this,
> you can add jets3t-0.9.0.jar to the classpath
> (sc.addJar("/path/to/jets3t-0.9.0.jar")).
>
> Thanks
> Best Regards
>
> On Thu, Jun 11, 2015 at 6:44 PM, shahab <shahab.mok...@gmail.com> wrote:
>
>> Hi,
>>
>> I tried to read a CSV file from Amazon S3, but I get the following
>> exception, which I have no clue how to solve. I tried both Spark 1.3.1
>> and 1.2.1, but no success. Any idea how to solve this is appreciated.
>>
>> best,
>> /Shahab
>>
>> The code:
>>
>> val hadoopConf = sc.hadoopConfiguration
>>
>> hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>>
>> hadoopConf.set("fs.s3.awsAccessKeyId", aws_access_key_id)
>>
>> hadoopConf.set("fs.s3.awsSecretAccessKey", aws_secret_access_key)
>>
>> val csv = sc.textFile("s3n://mybucket/info.csv") // original file
>>
>> val data = csv.map(line => line.split(",").map(elem => elem.trim)) // lines in rows
>>
>> Here is the exception I faced:
>>
>> Exception in thread "main" java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
>>
>> at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:280)
>>
>> at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:270)
>>
>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
>>
>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
>>
>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
>>
>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
>>
>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
>>
>> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>>
>> at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
>>
>> at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>>
>> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
>>
>> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
>>
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>
>> at scala.Option.getOrElse(Option.scala:120)
>>
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>
>> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>
>> at scala.Option.getOrElse(Option.scala:120)
>>
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>
>> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>
>> at scala.Option.getOrElse(Option.scala:120)
>>
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>
>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512)
>>
>> at org.apache.spark.rdd.RDD.count(RDD.scala:1006)
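For later readers, a minimal self-contained sketch that puts the pieces of this thread together: registering the jets3t jar and setting the s3n credentials before reading the file. The jar path, the app name, and the environment-variable lookup for the credentials are placeholders for illustration, not values from the thread. One detail worth noting: for the s3n:// scheme Hadoop reads the fs.s3n.* credential keys (the fs.s3.* keys used above apply to the s3:// scheme).

import org.apache.spark.{SparkConf, SparkContext}

object S3CsvRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("S3CsvRead"))

    // Ship the jets3t classes to the executors; the path is a placeholder
    // for wherever the jar lives locally.
    sc.addJar("/path/to/jets3t-0.9.0.jar")

    // Credentials for the s3n:// scheme; read from the environment here
    // rather than hard-coded (an assumed convention, not from the thread).
    val hadoopConf = sc.hadoopConfiguration
    hadoopConf.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    hadoopConf.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

    val csv = sc.textFile("s3n://mybucket/info.csv")
    val data = csv.map(line => line.split(",").map(_.trim))

    // count() forces evaluation, which is where the original job failed.
    println(data.count())

    sc.stop()
  }
}

An alternative to sc.addJar is passing the jar at submit time, e.g. spark-submit --jars /path/to/jets3t-0.9.0.jar ..., which places it on both the driver and executor classpaths.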