[ https://issues.apache.org/jira/browse/SPARK-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hari Shreedharan resolved SPARK-5545.
-------------------------------------
    Resolution: Duplicate

> [STREAMING] DStream#saveAs**Files can fail after app restarts
> -------------------------------------------------------------
>
>                 Key: SPARK-5545
>                 URL: https://issues.apache.org/jira/browse/SPARK-5545
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>            Priority: Critical
>
> After an app restart, calls to saveAs**Files can sometimes fail. This happens
> when the driver dies while an RDD is being written to HDFS: at that point the
> rdd-<timestamp> directory has already been created, but the batch has not yet
> been marked as completely processed. After the restart, the RDD is written
> again into the same directory, which can cause the underlying MR API to throw
> an exception that looks like this:
> {code}
> 15/02/02 13:16:41 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Output directory hdfs://wypoon-cdhx-1.ent.cloudera.com:8020/user/systest/flumetest/rdd-1422911774000 already exists)
> Exception in thread "Driver" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://wypoon-cdhx-1.ent.cloudera.com:8020/user/systest/flumetest/rdd-1422911774000 already exists
> 	at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
> 	at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1041)
> 	at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:940)
> 	at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:849)
> 	at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1164)
> ...
> {code}
> Thanks to [~wypoon] for finding this issue!
> I have a PR coming up for this.
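For reference, a minimal sketch of the caller-side mitigation this failure mode suggests: before re-executing the save for a batch, delete any partially written output directory left behind by the pre-crash attempt, so the re-run does not trip FileOutputFormat.checkOutputSpecs. This is illustrative only, not the PR mentioned above; the saveIdempotently name and the prefix-based directory naming are assumptions modeled on the rdd-<timestamp> paths in the stack trace.

{code}
import org.apache.hadoop.fs.Path
import org.apache.spark.streaming.dstream.DStream

// Illustrative sketch (not the actual fix): make the per-batch save
// idempotent by removing a directory left over from a pre-restart attempt.
def saveIdempotently(stream: DStream[String], prefix: String): Unit = {
  stream.foreachRDD { (rdd, time) =>
    // Mirrors the <prefix>-<timestamp> naming seen in the stack trace above.
    val dir = s"$prefix-${time.milliseconds}"
    val path = new Path(dir)
    val fs = path.getFileSystem(rdd.sparkContext.hadoopConfiguration)
    if (fs.exists(path)) {
      // A previous attempt died mid-write; clear the directory so
      // checkOutputSpecs does not throw FileAlreadyExistsException on re-run.
      fs.delete(path, true)
    }
    rdd.saveAsTextFile(dir)
  }
}
{code}

Note this only works around the symptom at the call site; the underlying fix is tracked in the duplicate issue referenced in the Resolution above.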