Hello, I'm trying to read from S3 using a simple Spark Java app:
---------------------
SparkConf sparkConf = new SparkConf().setAppName("TestApp");
sparkConf.setMaster("local");
JavaSparkContext sc = new JavaSparkContext(sparkConf);

// S3 credentials for the s3:// filesystem, set on the Hadoop configuration
sc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "XXXXXX");
sc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", "XXXXXX");

String path = "s3://bucket/test/testdata";
JavaRDD<String> textFile = sc.textFile(path);
System.out.println(textFile.count());
---------------------

But I'm getting this error:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3://bucket/test/testdata
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:175)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1097)
    at org.apache.spark.rdd.RDD.count(RDD.scala:861)
    at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:365)
    at org.apache.spark.api.java.JavaRDD.count(JavaRDD.scala:29)
    ....

Looking at the debug log, I see that org.jets3t.service.impl.rest.httpclient.RestS3Service returned a 404 error while trying to locate the file. A simple standalone Java program using com.amazonaws.services.s3.AmazonS3Client reads the same object just fine (roughly the sketch below).

Any idea?

Thanks,
Tomer
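P.S. For reference, the standalone check was roughly along these lines (a minimal sketch reconstructed from memory; the bucket/key are the same placeholders as above, and the class name is made up):

---------------------
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.S3Object;

public class S3SanityCheck {
    public static void main(String[] args) throws Exception {
        // Same credentials as in the Spark app (placeholders here)
        AmazonS3Client s3 = new AmazonS3Client(
                new BasicAWSCredentials("XXXXXX", "XXXXXX"));

        // Fetch the same object the Spark job claims does not exist
        S3Object object = s3.getObject("bucket", "test/testdata");
        System.out.println(object.getObjectMetadata().getContentLength());
        object.getObjectContent().close();
    }
}
---------------------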