[ 
https://issues.apache.org/jira/browse/SPARK-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213729#comment-14213729
 ] 

Vijay commented on SPARK-4402:
------------------------------

Thanks for the reply [~srowen]

This is a different scenario from SPARK-1100.

SPARK-1100 reports that the output directory is overwritten if it exists.
I think that fix works fine.

My concern is that Spark throws a runtime exception if the output 
directory exists. This happens only after all the preceding action 
statements have executed, so the program terminates abruptly and the 
results of those action statements are lost.

Please confirm whether this abrupt program termination is expected.
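
To illustrate the kind of up-front check I have in mind, here is a minimal
sketch in plain Python. It is not Spark code: it assumes a local filesystem,
and the names run_job, OutputPathExistsError and the "part-00000" file name
are hypothetical, chosen only for this example. On HDFS the existence check
would have to go through the Hadoop FileSystem API instead of os.path.

```python
import os


class OutputPathExistsError(Exception):
    """Raised before any work starts when the output path already exists."""


def run_job(output_dir, actions):
    """Validate output_dir first, then run the actions and save the results.

    `actions` is a list of zero-argument callables standing in for the
    expensive action statements (an assumption made for this sketch).
    """
    # Fail fast: check the output path before any expensive work begins.
    if os.path.exists(output_dir):
        raise OutputPathExistsError(output_dir)

    # The expensive part only runs once the path has been validated.
    results = [action() for action in actions]

    # Write the results, one per line, into the new output directory.
    os.makedirs(output_dir)
    with open(os.path.join(output_dir, "part-00000"), "w") as f:
        f.write("\n".join(str(r) for r in results))
    return results
```

With this ordering, a second call with the same output_dir raises
OutputPathExistsError before any action is invoked, so no processing time is
spent on work that cannot be saved. The check is only a pre-flight test: the
directory could still be created by someone else while the actions run, so it
does not replace the check at write time.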

> Output path validation of an action statement resulting in runtime exception
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-4402
>                 URL: https://issues.apache.org/jira/browse/SPARK-4402
>             Project: Spark
>          Issue Type: Wish
>            Reporter: Vijay
>            Priority: Minor
>
> Output path validation happens at statement execution time, as part of the 
> lazy evaluation of the action statement. If the path already exists, a 
> runtime exception is thrown. All the processing completed up to that point 
> is therefore lost, which wastes resources (processing time and CPU usage).
> If this I/O-related validation were done before the RDD action operations, 
> this runtime exception could be avoided.
> I believe a similar validation feature is implemented in Hadoop as well.
> Example:
> SchemaRDD.saveAsTextFile() evaluates the path at runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

