[ https://issues.apache.org/jira/browse/SPARK-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688502#comment-15688502 ]
Yang Li commented on SPARK-1677:
--------------------------------

Hi Spark Community,

I'm curious about the behavior of the "spark.hadoop.validateOutputSpecs" option. If I set it to 'false', will existing files in the output directory get wiped out beforehand? For example, if a Spark job is to output file Y under directory A, which already contains file X, do we expect both files X and Y under folder A, or will only Y be retained after the job completes?

Thanks!

> Allow users to avoid Hadoop output checks if desired
> ----------------------------------------------------
>
>                 Key: SPARK-1677
>                 URL: https://issues.apache.org/jira/browse/SPARK-1677
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Patrick Wendell
>            Assignee: Nan Zhu
>             Fix For: 1.0.1, 1.1.0
>
> For compatibility with older versions of Spark it would be nice to have an
> option `spark.hadoop.validateOutputSpecs` (default true) and a description
> "If set to true, validates the output specification used in saveAsHadoopFile
> and other variants. This can be disabled to silence exceptions due to
> pre-existing output directories."
>
> This would just wrap the checking done in this PR:
> https://issues.apache.org/jira/browse/SPARK-1100
> https://github.com/apache/spark/pull/11
> By first checking the spark conf.
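For illustration, here is a minimal sketch of how the option described above would be set; it assumes Spark 1.0.1 or later (per the Fix Versions), a local master, and a hypothetical output path, and it only demonstrates that the pre-existing-directory check is skipped, not what happens to any files already in the directory:

{code}
import org.apache.spark.{SparkConf, SparkContext}

object ValidateOutputSpecsSketch {
  def main(args: Array[String]): Unit = {
    // Disable the output-spec validation introduced in SPARK-1100 so that
    // saveAsTextFile / saveAsHadoopFile no longer fail fast when the
    // output directory already exists.
    val conf = new SparkConf()
      .setAppName("validateOutputSpecs-sketch")
      .setMaster("local[*]")                              // assumption: local run
      .set("spark.hadoop.validateOutputSpecs", "false")
    val sc = new SparkContext(conf)

    // With the check disabled, this call does not throw even if the
    // directory below (a hypothetical path) already contains files.
    sc.parallelize(Seq("a", "b", "c"))
      .saveAsTextFile("/tmp/validate-output-specs-demo")

    sc.stop()
  }
}
{code}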