Ah, so the output directory check simply wasn't executed at all in the past; I had thought it deleted the existing files. A third way indeed.
FWIW I also think (B) is best. (A) and (C) both have their risks, but if
they're non-default and everyone's willing to entertain a new arg to the
API method, sure. (A) seems more surprising, as its failure mode is
subtler and does not mirror Hadoop semantics. I would have suggested (C)
as the fallback alternative. But I suppose you can always make (C) happen
yourself.

On Tue, Jun 3, 2014 at 4:33 AM, Kexin Xie <kexin....@bigcommerce.com> wrote:
> +1 on Option (B), with a flag to allow the semantics in (A) for back
> compatibility.
>
> Kexin
>
> On Tue, Jun 3, 2014 at 1:18 PM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
>>
>> On Mon, Jun 2, 2014 at 10:39 PM, Patrick Wendell <pwend...@gmail.com>
>> wrote:
>>>
>>> (B) Semantics in Spark 1.0 and earlier:
>>
>> Do you mean 1.0 and later?
>>
>> Option (B) with the exception-on-clobber sounds fine to me, btw. My use
>> pattern is probably common but not universal, and deleting user files is
>> indeed scary.
>>
>> Nick
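
For concreteness, here's a rough Scala sketch of the (B)-plus-flag
behaviour being proposed. The config key and the exact exception shown
are my assumptions for illustration, not a committed API:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("clobber-demo")
    // Hypothetical back-compat opt-out under discussion: skip the
    // existence check and restore the old overwrite-prone behaviour.
    // The key name here is illustrative, not a committed API:
    // conf.set("spark.hadoop.validateOutputSpecs", "false")
    val sc = new SparkContext(conf)

    val data = sc.parallelize(Seq("a", "b", "c"))

    // First write succeeds: the output directory does not exist yet.
    data.saveAsTextFile("hdfs:///tmp/out")

    // Under (B), a second write to the same path fails fast (e.g. an
    // org.apache.hadoop.mapred.FileAlreadyExistsException), mirroring
    // Hadoop's FileOutputFormat.checkOutputSpecs(), rather than
    // silently deleting or clobbering the existing output.
    data.saveAsTextFile("hdfs:///tmp/out")

The nice property of (B) is that the failure is loud and immediate, so a
job that would have clobbered user data dies before writing anything,
and anyone who really wants the old semantics has to opt in explicitly.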