From your stack trace, it appears that the S3 writer first writes the data to a temp file on the local file system. Taking a guess: that local directory doesn't exist, or you don't have permission to write to it.

-Sven
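One quick way to check that guess on a cluster node is a sketch like the following. The `BUFFER_DIR` value is an assumption: EMRFS stages uploads in a local buffer directory (commonly governed by `fs.s3.buffer.dir`, or falling back to the JVM's `java.io.tmpdir`), so substitute whatever your cluster is actually configured with.

```shell
# Sketch: verify the local directory EMRFS uses for temp files exists
# and is writable by the user running the executors.
# BUFFER_DIR is an assumption -- check fs.s3.buffer.dir / java.io.tmpdir
# on your cluster for the real path.
BUFFER_DIR="${BUFFER_DIR:-/tmp}"

if [ -d "$BUFFER_DIR" ] && [ -w "$BUFFER_DIR" ]; then
  echo "ok: $BUFFER_DIR exists and is writable"
else
  echo "problem: $BUFFER_DIR is missing or not writable (try: mkdir -p $BUFFER_DIR)"
fi
```

If the directory is missing on only some nodes, the failure will appear intermittently, depending on which executor the task lands on.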
On Fri, Jan 30, 2015 at 6:44 AM, Aniket Bhatnagar <aniket.bhatna...@gmail.com> wrote:

> I am programmatically submitting Spark jobs in yarn-client mode on EMR.
> Whenever a job tries to save a file to S3, it throws the exception below.
> I think the issue might be that EMR is not set up properly, as I have to
> set all Hadoop configurations manually in SparkContext. However, I am not
> sure which configuration I am missing (if any).
>
> Configurations that I am using in SparkContext to set up EMRFS:
>
>   "spark.hadoop.fs.s3n.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
>   "spark.hadoop.fs.s3.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
>   "spark.hadoop.fs.emr.configuration.version": "1.0",
>   "spark.hadoop.fs.s3n.multipart.uploads.enabled": "true",
>   "spark.hadoop.fs.s3.enableServerSideEncryption": "false",
>   "spark.hadoop.fs.s3.serverSideEncryptionAlgorithm": "AES256",
>   "spark.hadoop.fs.s3.consistent": "true",
>   "spark.hadoop.fs.s3.consistent.retryPolicyType": "exponential",
>   "spark.hadoop.fs.s3.consistent.retryPeriodSeconds": "10",
>   "spark.hadoop.fs.s3.consistent.retryCount": "5",
>   "spark.hadoop.fs.s3.maxRetries": "4",
>   "spark.hadoop.fs.s3.sleepTimeSeconds": "10",
>   "spark.hadoop.fs.s3.consistent.throwExceptionOnInconsistency": "true",
>   "spark.hadoop.fs.s3.consistent.metadata.autoCreate": "true",
>   "spark.hadoop.fs.s3.consistent.metadata.tableName": "EmrFSMetadata",
>   "spark.hadoop.fs.s3.consistent.metadata.read.capacity": "500",
>   "spark.hadoop.fs.s3.consistent.metadata.write.capacity": "100",
>   "spark.hadoop.fs.s3.consistent.fastList": "true",
>   "spark.hadoop.fs.s3.consistent.fastList.prefetchMetadata": "false",
>   "spark.hadoop.fs.s3.consistent.notification.CloudWatch": "false",
>   "spark.hadoop.fs.s3.consistent.notification.SQS": "false",
>
> Exception:
>
>   java.io.IOException: No such file or directory
>     at java.io.UnixFileSystem.createFileExclusively(Native Method)
>     at java.io.File.createNewFile(File.java:1006)
>     at java.io.File.createTempFile(File.java:1989)
>     at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.startNewTempFile(S3FSOutputStream.java:269)
>     at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.writeInternal(S3FSOutputStream.java:205)
>     at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.flush(S3FSOutputStream.java:136)
>     at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.close(S3FSOutputStream.java:156)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>     at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
>     at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:109)
>     at org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.close(MultipleOutputFormat.java:116)
>     at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1047)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>     at org.apache.spark.scheduler.Task.run(Task.scala:56)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> Hints? Suggestions?
>
> --
> http://sites.google.com/site/krasser/?utm_source=sig
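For reference, the configuration list in the quoted message could be applied programmatically along these lines. This is an untested sketch, not the poster's actual code: the app name is hypothetical, only a few of the listed keys are shown, and it relies on Spark copying any `spark.hadoop.*`-prefixed key into the Hadoop Configuration.

```scala
// Sketch (untested): setting EMRFS options on SparkConf before creating
// the SparkContext. Keys prefixed "spark.hadoop." are forwarded by Spark
// into the Hadoop Configuration used by file-system operations.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("emr-s3-example")  // hypothetical app name
  .set("spark.hadoop.fs.s3.impl", "com.amazon.ws.emr.hadoop.fs.EmrFileSystem")
  .set("spark.hadoop.fs.s3n.impl", "com.amazon.ws.emr.hadoop.fs.EmrFileSystem")
  .set("spark.hadoop.fs.s3.consistent", "true")
  // ... remaining fs.s3.* options from the list above ...

val sc = new SparkContext(conf)
```

On a properly provisioned EMR cluster these values would normally come from the cluster's own Hadoop configuration files rather than being set by hand, which is consistent with the poster's suspicion that the cluster setup itself is incomplete.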