Re: Reading file from S3, facing java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException

2015-06-15 Thread shahab
Thanks Akhil, it solved the problem.

best
/Shahab

On Fri, Jun 12, 2015 at 8:50 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 Looks like your Spark is not able to pick up the HADOOP_CONF. To fix this,
 you can actually add jets3t-0.9.0.jar to the classpath, e.g.
 sc.addJar("/path/to/jets3t-0.9.0.jar").

 Thanks
 Best Regards


Re: Reading file from S3, facing java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException

2015-06-12 Thread Akhil Das
Looks like your Spark is not able to pick up the HADOOP_CONF. To fix this,
you can actually add jets3t-0.9.0.jar to the classpath, e.g.
sc.addJar("/path/to/jets3t-0.9.0.jar").
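For reference, a minimal self-contained sketch of that fix (the app name,
master URL, and jar path below are placeholders, not taken from this thread;
in spark-shell, sc already exists and only the addJar call is needed):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("s3-read-example").setMaster("local[*]")
val sc = new SparkContext(conf)

// Make the JetS3t jar available so NativeS3FileSystem can load
// org.jets3t.service.ServiceException at runtime.
sc.addJar("/path/to/jets3t-0.9.0.jar")

sc.addJar ships the jar to the executors once the context is up; an
alternative that also covers the driver classpath is to pass the jar at
launch time, e.g. spark-submit --jars /path/to/jets3t-0.9.0.jar ...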

Thanks
Best Regards

Reading file from S3, facing java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException

2015-06-11 Thread shahab
Hi,

I tried to read a CSV file from Amazon S3, but I get the following
exception, which I have no clue how to solve. I tried both Spark 1.3.1
and 1.2.1, with no success. Any idea how to solve this is appreciated.


best,
/Shahab

the code:

val hadoopConf = sc.hadoopConfiguration

hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", aws_access_key_id)
hadoopConf.set("fs.s3.awsSecretAccessKey", aws_secret_access_key)

val csv = sc.textFile("s3n://mybucket/info.csv")  // original file
val data = csv.map(line => line.split(",").map(elem => elem.trim))  // lines in rows


Here is the exception I faced:

Exception in thread "main" java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:280)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:270)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512)
at org.apache.spark.rdd.RDD.count(RDD.scala:1006)