Fair enough. That rationale makes sense.

I would prefer that a Spark clobber option also delete the destination
files, but as long as it's a non-default option I can see the "caller
beware" side of that argument as well.

Nick


On Monday, June 2, 2014, Sean Owen <so...@cloudera.com> wrote:

> I assume the idea is for Spark to "rm -r dir/", which would clean out
> everything that was there before. It's just doing this instead of the
> caller. Hadoop still won't let you write into a location that already
> exists regardless, and part of the reason is exactly this: you might
> end up with files from different jobs mixed together.
>
> This doesn't need a change to Hadoop and probably shouldn't; it's a
> change to semantics provided by Spark to do the delete for you if you
> set a flag. Viewed that way, meh, seems like the caller could just do
> that themselves rather than expand the Spark API (via a utility method
> if you like), but I can see it both ways. Caller beware.
>
> On Mon, Jun 2, 2014 at 10:08 PM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
> > OK, thanks for confirming. Is there something we can do about that
> > leftover part- files problem in Spark, or is that for the Hadoop team?
> >
> >
> > On Monday, June 2, 2014, Aaron Davidson <ilike...@gmail.com> wrote:
> >
> >> Yes.
> >>
> >>
> >> On Mon, Jun 2, 2014 at 1:23 PM, Nicholas Chammas
> >> <nicholas.cham...@gmail.com> wrote:
> >>
> >> So in summary:
> >>
> >> - As of Spark 1.0.0, saveAsTextFile() will no longer clobber by default.
> >> - There is an open JIRA issue to add an option to allow clobbering.
> >> - Even when clobbering, part- files may be left over from previous
> >>   saves, which is dangerous.
> >>
> >> Is this correct?
>
