Re: No overwrite flag for saveAsXXFile

2015-03-06 Thread Ted Yu
Found this thread:
http://search-hadoop.com/m/JW1q5HMrge2

Cheers




Re: No overwrite flag for saveAsXXFile

2015-03-06 Thread Nan Zhu
Actually, besides setting spark.hadoop.validateOutputSpecs to false, which
disables output validation for the whole program, Spark's implementation
also uses a DynamicVariable (in object PairRDDFunctions) internally to
disable it on a case-by-case basis:

val disableOutputSpecValidation: DynamicVariable[Boolean] = new DynamicVariable[Boolean](false)

I'm not sure the benefit is large enough to make it worth exposing this
variable to users.
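
For reference, a minimal sketch of the program-wide switch mentioned above, assuming a plain SparkConf/SparkContext setup. The application name, master URL, and output path are placeholders for illustration and do not come from this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DisableOutputValidation {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("overwrite-example") // placeholder name
      .setMaster("local[2]")           // placeholder master
      // Skip the "output directory already exists" check for every save
      // in this application.
      .set("spark.hadoop.validateOutputSpecs", "false")

    val sc = new SparkContext(conf)
    try {
      // Placeholder path: with validation disabled, this save no longer
      // fails just because the directory is already there.
      sc.parallelize(1 to 100, 4).saveAsTextFile("/tmp/overwrite-example-output")
    } finally {
      sc.stop()
    }
  }
}
```

Note that this only skips the existence check. It does not clear the directory first, so part files left by an earlier job with more partitions would still be there afterwards.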

Best,  

--  
Nan Zhu
http://codingcat.me


  



Re: No overwrite flag for saveAsXXFile

2015-03-06 Thread Sean Owen
This was discussed in the past and viewed as dangerous to enable. The
biggest problem, by far, comes when you have a job that outputs M
partitions, 'overwriting' a directory of data containing N > M old
partitions. You suddenly have a mix of new and old data.
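
The mixing problem described above can be reproduced without Spark. The following is a small simulation under stated assumptions: the object and method names are hypothetical, and the part-NNNNN file naming only mimics Hadoop-style output. It uses nothing but the JDK file API.

```scala
import java.nio.file.{Files, Path}

object StalePartitions {
  // Writes `count` part files tagged with `tag`. Same-named files are
  // replaced, but every other file in `dir` is left alone, which is what
  // a naive "overwrite" of an existing output directory would do.
  def writeParts(dir: Path, count: Int, tag: String): Unit =
    for (i <- 0 until count)
      Files.write(dir.resolve(f"part-$i%05d"), tag.getBytes("UTF-8"))

  // Maps each file name in `dir` to its contents.
  def contents(dir: Path): Map[String, String] =
    dir.toFile.listFiles()
      .map(f => f.getName -> new String(Files.readAllBytes(f.toPath), "UTF-8"))
      .toMap

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("overwrite-demo")
    writeParts(dir, 5, "old") // earlier job: N = 5 partitions
    writeParts(dir, 3, "new") // later job:   M = 3 partitions
    contents(dir).toSeq.sorted.foreach { case (name, tag) =>
      println(s"$name -> $tag")
    }
  }
}
```

After both writes the directory holds five files: the first three carry the new data, and the last two are leftovers from the earlier job that a reader of the directory would pick up as if they were current.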

It doesn't match Hadoop's semantics either, which won't let you do
this. You can of course simply remove the output directory.
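
Removing the output directory before saving might look like the sketch below, which uses the Hadoop FileSystem API. The object name OutputDirs and the helper deleteIfExists are hypothetical names chosen for illustration; they are not part of Spark or Hadoop.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object OutputDirs {
  // Recursively deletes `dir` if it exists on the file system that the
  // path resolves to. Returns true if something was deleted.
  def deleteIfExists(dir: String, hadoopConf: Configuration): Boolean = {
    val path = new Path(dir)
    val fs = path.getFileSystem(hadoopConf)
    if (fs.exists(path)) fs.delete(path, true) // true = recursive
    else false
  }
}

// Possible usage inside a Spark job, assuming `sc` is the SparkContext
// and `rdd` is the data to write:
//   OutputDirs.deleteIfExists(outputPath, sc.hadoopConfiguration)
//   rdd.saveAsTextFile(outputPath)
```

Because the whole directory is removed first, no part files from an earlier run can survive. The delete is not atomic with the save, though, so a job that fails after the delete leaves no output at all.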


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: No overwrite flag for saveAsXXFile

2015-03-06 Thread Ted Yu
Adding support for an overwrite flag would make saveAsXXFile more user-friendly.

Cheers







Re: No overwrite flag for saveAsXXFile

2015-03-06 Thread Ted Yu
Since we already have the spark.hadoop.validateOutputSpecs config, I don't
think there is much need to expose disableOutputSpecValidation.

Cheers

 






No overwrite flag for saveAsXXFile

2015-03-06 Thread Jeff Zhang
Hi folks,

I found that RDD.saveAsXXFile has no overwrite flag, which I think would be
very helpful. Is there any reason for this?



-- 
Best Regards

Jeff Zhang