[ https://issues.apache.org/jira/browse/SPARK-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214486#comment-14214486 ]
Sean Owen commented on SPARK-4402:
----------------------------------

Can the Spark code go back and check this before any of it is called, at the start of your program? No, that isn't possible. It wouldn't even know which RDDs may be executed at the outset, and it couldn't be sure that the output dir isn't cleared by your own code before output happens. Here the check seems to happen just before the output operation starts, which is about as early as possible.

I suggest this is the correct behavior, and it is the current behavior. It's even configurable whether it overwrites or fails when the output dir exists.

Of course you can and should check the output directory in your program. In fact, your program is in a better position to know whether this should be an error, a warning, or whether you should just overwrite the output.

> Output path validation of an action statement resulting in runtime exception
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-4402
>                 URL: https://issues.apache.org/jira/browse/SPARK-4402
>             Project: Spark
>          Issue Type: Wish
>            Reporter: Vijay
>            Priority: Minor
>
> Output path validation happens at the time of statement execution, as part of
> lazy evaluation of the action statement. If the path already exists, a runtime
> exception is thrown, and all the processing completed up to that point is
> lost, which wastes resources (processing time and CPU usage).
> If this I/O-related validation were done before the RDD action runs, the
> runtime exception could be avoided.
> I believe a similar validation/feature is implemented in Hadoop as well.
> Example: SchemaRDD.saveAsTextFile() evaluates the path at runtime.
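For illustration, the following is a minimal sketch of the kind of up-front check Sean describes, done in the driver before any action is triggered. It uses the Hadoop FileSystem API via the SparkContext's Hadoop configuration; the object name, the input/output paths, and the fail-fast policy are placeholders chosen for this sketch, not anything from the ticket.

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object OutputPathCheckExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("output-path-check"))

    // Placeholder paths; substitute your own input and output locations.
    val outputPath = new Path("hdfs:///tmp/example-output")
    val fs: FileSystem = outputPath.getFileSystem(sc.hadoopConfiguration)

    // Decide up front what an existing output directory means for this job:
    // fail fast, warn, or delete and overwrite. Here we fail fast, before any
    // computation runs, so no work is wasted.
    if (fs.exists(outputPath)) {
      sc.stop()
      sys.error(s"Output path $outputPath already exists; aborting before any work is done")
    }

    sc.textFile("hdfs:///tmp/example-input")
      .map(_.toUpperCase)
      .saveAsTextFile(outputPath.toString)

    sc.stop()
  }
}

Deleting the directory instead (fs.delete(outputPath, true)) would give overwrite semantics; which policy is right is exactly the decision Sean argues belongs in the application. The built-in configurability he mentions is presumably the spark.hadoop.validateOutputSpecs setting, which, when set to false, skips Spark's own existence check on the output directory.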