Re: Reading file from S3, facing java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
Thanks Akhil, it solved the problem. best /Shahab On Fri, Jun 12, 2015 at 8:50 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Looks like your spark is not able to pick up the HADOOP_CONF. To fix this, you can actually add jets3t-0.9.0.jar to the classpath (sc.addJar(/path/to/jets3t-0.9.0.jar). Thanks Best Regards On Thu, Jun 11, 2015 at 6:44 PM, shahab shahab.mok...@gmail.com wrote: Hi, I tried to read a csv file from amazon s3, but I get the following exception which I have no clue how to solve this. I tried both spark 1.3.1 and 1.2.1, but no success. Any idea how to solve this is appreciated. best, /Shahab the code: val hadoopConf=sc.hadoopConfiguration; hadoopConf.set(fs.s3.impl, org.apache.hadoop.fs.s3native.NativeS3FileSystem) hadoopConf.set(fs.s3.awsAccessKeyId, aws_access_key_id) hadoopConf.set(fs.s3.awsSecretAccessKey, aws_secret_access_key) val csv = sc.textFile(s3n://mybucket/info.csv) // original file val data = csv.map(line = line.split(,).map(elem = elem.trim)) //lines in rows Here is the exception I faced: Exception in thread main java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore( NativeS3FileSystem.java:280) at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize( NativeS3FileSystem.java:270) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431 ) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus( FileInputFormat.java:256) at org.apache.hadoop.mapred.FileInputFormat.listStatus( FileInputFormat.java:228) at org.apache.hadoop.mapred.FileInputFormat.getSplits( FileInputFormat.java:304) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions( MapPartitionsRDD.scala:32) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions( MapPartitionsRDD.scala:32) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512) at org.apache.spark.rdd.RDD.count(RDD.scala:1006)
Re: Reading file from S3, facing java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
Looks like your spark is not able to pick up the HADOOP_CONF. To fix this, you can actually add jets3t-0.9.0.jar to the classpath (sc.addJar(/path/to/jets3t-0.9.0.jar). Thanks Best Regards On Thu, Jun 11, 2015 at 6:44 PM, shahab shahab.mok...@gmail.com wrote: Hi, I tried to read a csv file from amazon s3, but I get the following exception which I have no clue how to solve this. I tried both spark 1.3.1 and 1.2.1, but no success. Any idea how to solve this is appreciated. best, /Shahab the code: val hadoopConf=sc.hadoopConfiguration; hadoopConf.set(fs.s3.impl, org.apache.hadoop.fs.s3native.NativeS3FileSystem) hadoopConf.set(fs.s3.awsAccessKeyId, aws_access_key_id) hadoopConf.set(fs.s3.awsSecretAccessKey, aws_secret_access_key) val csv = sc.textFile(s3n://mybucket/info.csv) // original file val data = csv.map(line = line.split(,).map(elem = elem.trim)) //lines in rows Here is the exception I faced: Exception in thread main java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore( NativeS3FileSystem.java:280) at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize( NativeS3FileSystem.java:270) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus( FileInputFormat.java:256) at org.apache.hadoop.mapred.FileInputFormat.listStatus( FileInputFormat.java:228) at org.apache.hadoop.mapred.FileInputFormat.getSplits( FileInputFormat.java:304) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions( MapPartitionsRDD.scala:32) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions( MapPartitionsRDD.scala:32) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512) at org.apache.spark.rdd.RDD.count(RDD.scala:1006)
Reading file from S3, facing java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
Hi, I tried to read a csv file from amazon s3, but I get the following exception which I have no clue how to solve this. I tried both spark 1.3.1 and 1.2.1, but no success. Any idea how to solve this is appreciated. best, /Shahab the code: val hadoopConf=sc.hadoopConfiguration; hadoopConf.set(fs.s3.impl, org.apache.hadoop.fs.s3native.NativeS3FileSystem) hadoopConf.set(fs.s3.awsAccessKeyId, aws_access_key_id) hadoopConf.set(fs.s3.awsSecretAccessKey, aws_secret_access_key) val csv = sc.textFile(s3n://mybucket/info.csv) // original file val data = csv.map(line = line.split(,).map(elem = elem.trim)) //lines in rows Here is the exception I faced: Exception in thread main java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore( NativeS3FileSystem.java:280) at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize( NativeS3FileSystem.java:270) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus( FileInputFormat.java:256) at org.apache.hadoop.mapred.FileInputFormat.listStatus( FileInputFormat.java:228) at org.apache.hadoop.mapred.FileInputFormat.getSplits( FileInputFormat.java:304) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions( MapPartitionsRDD.scala:32) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions( MapPartitionsRDD.scala:32) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512) at org.apache.spark.rdd.RDD.count(RDD.scala:1006)