Hi all,
I'm writing a Spark application to load S3 data into HDFS.
The HDFS version is 2.3.0, so I have to compile Spark against Hadoop 2.3.0.
After I execute
val allfiles = sc.textFile("s3n://abc/*.txt")
allfiles.saveAsTextFile("hdfs://x.x.x.x:9000/dataset")  // saveAsTextFile returns Unit
Spark throws an exception:
I ran into the same issue. The problem seems to be with the jets3t library
that Spark uses in project/SparkBuild.scala.
Change this:
"net.java.dev.jets3t" % "jets3t" % "0.7.1"
to
"net.java.dev.jets3t" % "jets3t" % "0.9.0"
0.7.1 is not the right version of jets3t for Hadoop 2.3.0.
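For reference, the corrected dependency would be declared roughly like this in sbt (a minimal sketch; in the real project/SparkBuild.scala this line sits inside a larger dependency list, and the surrounding build definition varies across Spark versions):

```scala
// Sketch of the fix: pin jets3t to 0.9.0, the version that Hadoop 2.3.0's
// s3n:// filesystem code expects (0.7.1 predates it and causes the failure above).
libraryDependencies += "net.java.dev.jets3t" % "jets3t" % "0.9.0"
```

After changing the version, a clean rebuild of Spark is needed so the old 0.7.1 jar does not remain on the classpath.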
Yes, I fixed it the same way, but didn't get a chance to get back here.
I also made a PR: https://github.com/apache/spark/pull/468
Best,
--
Nan Zhu
On Monday, April 21, 2014 at 8:19 PM, Parviz Deyhim wrote: