Re: No overwrite flag for saveAsXXFile
Found this thread: http://search-hadoop.com/m/JW1q5HMrge2

Cheers

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: No overwrite flag for saveAsXXFile
Actually, besides setting spark.hadoop.validateOutputSpecs to false, which disables output validation for the whole program, the Spark implementation internally uses a DynamicVariable (in object PairRDDFunctions) to disable it on a case-by-case basis:

val disableOutputSpecValidation: DynamicVariable[Boolean] = new DynamicVariable[Boolean](false)

I’m not sure there is enough benefit to make it worth exposing this variable to the user…

Best,
--
Nan Zhu
http://codingcat.me
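A minimal standalone sketch of the case-by-case pattern described above, using only scala.util.DynamicVariable. The object OutputSpecFlag and the method shouldValidate are hypothetical stand-ins, not Spark's code; they assume the global setting and the dynamic flag are combined as "validate only if the config allows it and the flag is not set".

```scala
import scala.util.DynamicVariable

object OutputSpecFlag {
  // Same declaration as quoted in the thread: defaults to false,
  // i.e. output-spec validation stays enabled unless a caller opts out.
  val disableOutputSpecValidation: DynamicVariable[Boolean] =
    new DynamicVariable[Boolean](false)

  // Hypothetical stand-in for the check a save method would perform.
  // validateOutputSpecs plays the role of the application-wide config value.
  def shouldValidate(validateOutputSpecs: Boolean): Boolean =
    validateOutputSpecs && !disableOutputSpecValidation.value
}
```

Wrapping a call in disableOutputSpecValidation.withValue(true) { ... } turns the check off only for the code inside that block (on the current thread and threads it spawns); the default value is restored as soon as the block returns, which is what makes the per-call behavior possible without touching the global setting.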
Re: No overwrite flag for saveAsXXFile
This was discussed in the past and viewed as dangerous to enable. The biggest problem, by far, comes when you have a job that outputs M partitions 'overwriting' a directory of data containing N > M old partitions. You suddenly have a mix of new and old data. It doesn't match Hadoop's semantics either, which won't let you do this. You can of course simply remove the output directory.
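A small local-filesystem simulation of the failure mode described above, with N = 4 old partitions and M = 2 new ones. It does not use Spark or Hadoop; StalePartitions and its methods are hypothetical helpers that only mimic the part-NNNNN file naming of job output. Writing fewer part files into a directory that already holds more leaves the extra old files in place, while removing the directory first leaves only new data.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object StalePartitions {
  // Write n part files (part-00000, part-00001, ...) each containing `tag`.
  // Files.write truncates a file that already exists, like a naive overwrite.
  def writeParts(dir: Path, n: Int, tag: String): Unit =
    (0 until n).foreach { i =>
      Files.write(dir.resolve(f"part-$i%05d"), tag.getBytes(StandardCharsets.UTF_8))
    }

  // Contents of every part file in the directory, ordered by file name.
  def readParts(dir: Path): Seq[String] =
    dir.toFile.listFiles().toSeq
      .filter(_.getName.startsWith("part-"))
      .sortBy(_.getName)
      .map(f => new String(Files.readAllBytes(f.toPath), StandardCharsets.UTF_8))

  // Recursively delete a directory: the "remove the output directory" remedy.
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory)
      Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    f.delete()
  }
}
```

After writeParts(dir, 4, "old") followed by writeParts(dir, 2, "new"), readParts returns new, new, old, old: the mix of new and old data. Deleting and recreating the directory before the second write leaves only new, new.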
Re: No overwrite flag for saveAsXXFile
Adding support for an overwrite flag would make saveAsXXFile more user-friendly.

Cheers
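If the goal is mainly convenience, the "remove the output directory" approach can be wrapped in a user-side helper. A sketch, assuming Spark and the Hadoop client libraries are on the classpath; saveAsTextFileOverwrite is a hypothetical name, not a Spark API. It asks the path for the Hadoop FileSystem that owns it, so the same code should work for local paths and HDFS alike.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.rdd.RDD

object OverwriteHelper {
  // Hypothetical helper, not part of Spark: emulates an "overwrite" flag by
  // deleting the output directory before calling saveAsTextFile.
  def saveAsTextFileOverwrite[T](rdd: RDD[T], dir: String): Unit = {
    val path = new Path(dir)
    // Resolve the FileSystem responsible for this path (local, HDFS, ...).
    val fs = path.getFileSystem(rdd.sparkContext.hadoopConfiguration)
    if (fs.exists(path)) {
      // Recursive delete removes every old part file, so none can survive
      // next to the new output.
      fs.delete(path, true)
    }
    rdd.saveAsTextFile(dir)
  }
}
```

The delete and the write are separate steps, so if the job fails after the delete, the old data is already gone. This avoids the mixed-partition problem but is not an atomic overwrite.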
Re: No overwrite flag for saveAsXXFile
Since we already have the spark.hadoop.validateOutputSpecs config, I think there is not much need to expose disableOutputSpecValidation.

Cheers
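For reference, the application-wide switch mentioned in the thread can be set on a SparkConf, as in this sketch (the object name and app name are placeholders). Going by the discussion above, it only turns off the existing-output check; it does not clear stale part files left by an earlier run, so the mixed-data risk remains.

```scala
import org.apache.spark.SparkConf

object NoValidationConf {
  // Disables output-spec validation for every save in this application.
  // This skips the "output directory already exists" check only; it does
  // not delete anything that is already in the output directory.
  val conf: SparkConf = new SparkConf()
    .setAppName("no-output-validation-example") // placeholder app name
    .set("spark.hadoop.validateOutputSpecs", "false")
}
```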
No overwrite flag for saveAsXXFile
Hi folks,

I found that RDD.saveAsXXFile has no overwrite flag, which I think would be very helpful. Is there any reason for this?

--
Best Regards
Jeff Zhang