So overwriting existing directories is, IMO, a behavior we don't want to
encourage. The reason the Hadoop client has these checks is that without
them it's very easy for users to do unsafe things. For instance, a user
could overwrite an RDD that had 100 partitions with an RDD that has only
10 partitions... and when they read the directory back they would get a
corrupted RDD containing a mix of data from the old and new RDDs.
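
To make that concrete, here's a rough sketch of that failure mode in Scala.
The app name and HDFS path are made up, and the output-spec check is turned
off so the second write is allowed to proceed at all:

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("overwrite-sketch")                  // hypothetical app name
    // Disabling the Hadoop output check -- this is the check being discussed.
    .set("spark.hadoop.validateOutputSpecs", "false")
  val sc = new SparkContext(conf)
  val path = "hdfs:///tmp/overwrite-sketch"          // hypothetical path

  // First job writes 100 partitions: part-00000 .. part-00099.
  sc.parallelize(1 to 1000000, 100).saveAsTextFile(path)

  // Second job writes only 10 partitions to the same path: part-00000 ..
  // part-00009 are rewritten, but part-00010 .. part-00099 from the first
  // job may be left behind in the directory.
  sc.parallelize(1 to 1000, 10).saveAsTextFile(path)

  // A later read of the directory can then return a mix of old and new data.
  println(sc.textFile(path).count())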

If users want to circumvent these safety checks, they should have to
disable them explicitly. Given that, I think a config option is as
reasonable as any alternative, and it's already pretty easy to set IMO.
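
For reference, the explicit opt-out and the "delete it yourself" alternative
look roughly like this (the path and app name below are hypothetical; the
config can equally be passed as spark-submit --conf
spark.hadoop.validateOutputSpecs=false):

  import org.apache.hadoop.fs.{FileSystem, Path}
  import org.apache.spark.{SparkConf, SparkContext}

  // Explicit opt-out: the user deliberately disables the output check.
  val conf = new SparkConf()
    .setAppName("explicit-overwrite")                // hypothetical app name
    .set("spark.hadoop.validateOutputSpecs", "false")
  val sc = new SparkContext(conf)

  // Alternative that avoids stale part-files entirely: delete the existing
  // directory before writing, so no old files can survive the overwrite.
  val out = new Path("hdfs:///tmp/output")           // hypothetical path
  val fs = FileSystem.get(sc.hadoopConfiguration)
  if (fs.exists(out)) fs.delete(out, true)           // recursive, done explicitly
  sc.parallelize(1 to 1000, 10).saveAsTextFile(out.toString)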

- Patrick

On Wed, Dec 24, 2014 at 11:28 PM, Cheng, Hao <hao.ch...@intel.com> wrote:
> I am wondering if we can provide a friendlier API, rather than a configuration
> option, for this purpose. What do you think, Patrick?
>
> Cheng Hao
>
> -----Original Message-----
> From: Patrick Wendell [mailto:pwend...@gmail.com]
> Sent: Thursday, December 25, 2014 3:22 PM
> To: Shao, Saisai
> Cc: u...@spark.apache.org; dev@spark.apache.org
> Subject: Re: Question on saveAsTextFile with overwrite option
>
> Is it sufficient to set "spark.hadoop.validateOutputSpecs" to false?
>
> http://spark.apache.org/docs/latest/configuration.html
>
> - Patrick
>
> On Wed, Dec 24, 2014 at 10:52 PM, Shao, Saisai <saisai.s...@intel.com> wrote:
>> Hi,
>>
>> We have a requirement to save RDD output to HDFS with a saveAsTextFile-like
>> API, but we need to overwrite the data if it already exists. I'm not sure
>> whether current Spark supports this kind of operation, or whether I need to
>> handle this manually.
>>
>> There's a thread in the mailing list that discussed this
>> (http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-make-Spark-1-0-saveAsTextFile-to-overwrite-existing-file-td6696.html),
>> but I'm not sure whether this feature is available, or whether it needs some
>> configuration.
>>
>> Appreciate your suggestions.
>>
>> Thanks a lot
>>
>> Jerry
>

