From your stack trace, it appears that the S3 writer first writes the data to a temp file on the local file system. Taking a guess: that local directory doesn't exist, or you don't have permission to write to it.

-Sven
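One quick way to check that guess on a cluster node is a sketch like the following. The `BUFFER_DIR` value is an assumption: EMRFS stages uploads in a local buffer directory (commonly governed by `fs.s3.buffer.dir`, or falling back to the JVM's `java.io.tmpdir`), so substitute whatever your cluster is actually configured with.

```shell
# Sketch: verify the local directory EMRFS uses for temp files exists
# and is writable by the user running the executors.
# BUFFER_DIR is an assumption -- check fs.s3.buffer.dir / java.io.tmpdir
# on your cluster for the real path.
BUFFER_DIR="${BUFFER_DIR:-/tmp}"

if [ -d "$BUFFER_DIR" ] && [ -w "$BUFFER_DIR" ]; then
  echo "ok: $BUFFER_DIR exists and is writable"
else
  echo "problem: $BUFFER_DIR is missing or not writable (try: mkdir -p $BUFFER_DIR)"
fi
```

If the directory is missing on only some nodes, the failure will appear intermittently, depending on which executor the task lands on.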
On Fri, Jan 30, 2015 at 6:44 AM, Aniket Bhatnagar <aniket.bhatna...@gmail.com> wrote:

> I am programmatically submitting Spark jobs in yarn-client mode on EMR.
> Whenever a job tries to save a file to S3, it throws the exception below.
> I think the issue might be that EMR is not set up properly, as I have to
> set all Hadoop configurations manually in SparkContext. However, I am not
> sure which configuration I am missing (if any).
>
> Configurations that I am using in SparkContext to set up EMRFS:
>
>   "spark.hadoop.fs.s3n.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
>   "spark.hadoop.fs.s3.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
>   "spark.hadoop.fs.emr.configuration.version": "1.0",
>   "spark.hadoop.fs.s3n.multipart.uploads.enabled": "true",
>   "spark.hadoop.fs.s3.enableServerSideEncryption": "false",
>   "spark.hadoop.fs.s3.serverSideEncryptionAlgorithm": "AES256",
>   "spark.hadoop.fs.s3.consistent": "true",
>   "spark.hadoop.fs.s3.consistent.retryPolicyType": "exponential",
>   "spark.hadoop.fs.s3.consistent.retryPeriodSeconds": "10",
>   "spark.hadoop.fs.s3.consistent.retryCount": "5",
>   "spark.hadoop.fs.s3.maxRetries": "4",
>   "spark.hadoop.fs.s3.sleepTimeSeconds": "10",
>   "spark.hadoop.fs.s3.consistent.throwExceptionOnInconsistency": "true",
>   "spark.hadoop.fs.s3.consistent.metadata.autoCreate": "true",
>   "spark.hadoop.fs.s3.consistent.metadata.tableName": "EmrFSMetadata",
>   "spark.hadoop.fs.s3.consistent.metadata.read.capacity": "500",
>   "spark.hadoop.fs.s3.consistent.metadata.write.capacity": "100",
>   "spark.hadoop.fs.s3.consistent.fastList": "true",
>   "spark.hadoop.fs.s3.consistent.fastList.prefetchMetadata": "false",
>   "spark.hadoop.fs.s3.consistent.notification.CloudWatch": "false",
>   "spark.hadoop.fs.s3.consistent.notification.SQS": "false",
>
> Exception:
>
>   java.io.IOException: No such file or directory
>     at java.io.UnixFileSystem.createFileExclusively(Native Method)
>     at java.io.File.createNewFile(File.java:1006)
>     at java.io.File.createTempFile(File.java:1989)
>     at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.startNewTempFile(S3FSOutputStream.java:269)
>     at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.writeInternal(S3FSOutputStream.java:205)
>     at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.flush(S3FSOutputStream.java:136)
>     at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.close(S3FSOutputStream.java:156)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>     at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
>     at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:109)
>     at org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.close(MultipleOutputFormat.java:116)
>     at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1047)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>     at org.apache.spark.scheduler.Task.run(Task.scala:56)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> Hints? Suggestions?
>
> --
> http://sites.google.com/site/krasser/?utm_source=sig
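For reference, the configuration list in the quoted message could be applied programmatically along these lines. This is an untested sketch, not the poster's actual code: the app name is hypothetical, only a few of the listed keys are shown, and it relies on Spark copying any `spark.hadoop.*`-prefixed key into the Hadoop Configuration.

```scala
// Sketch (untested): setting EMRFS options on SparkConf before creating
// the SparkContext. Keys prefixed "spark.hadoop." are forwarded by Spark
// into the Hadoop Configuration used by file-system operations.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("emr-s3-example")  // hypothetical app name
  .set("spark.hadoop.fs.s3.impl", "com.amazon.ws.emr.hadoop.fs.EmrFileSystem")
  .set("spark.hadoop.fs.s3n.impl", "com.amazon.ws.emr.hadoop.fs.EmrFileSystem")
  .set("spark.hadoop.fs.s3.consistent", "true")
  // ... remaining fs.s3.* options from the list above ...

val sc = new SparkContext(conf)
```

On a properly provisioned EMR cluster these values would normally come from the cluster's own Hadoop configuration files rather than being set by hand, which is consistent with the poster's suspicion that the cluster setup itself is incomplete.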