Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-12 Thread Daniel Siegmann
This is not very convenient... Thanks. -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Tuesday, June 03, 2014 11:40 AM To: user@spark.apache.org Subject: Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file (A) Semantics in Spark 0.9

RE: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-08 Thread innowireless TaeYun Kim
, 2014 11:40 AM To: user@spark.apache.org Subject: Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file (A) Semantics in Spark 0.9 and earlier: Spark will ignore Hadoop's output format check and overwrite files in the destination directory. But it won't clobber the directory entirely

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-03 Thread Sean Owen
Ah, the output directory check was just not executed in the past. I thought it deleted the files. A third way indeed. FWIW I also think (B) is best. (A) and (C) both have their risks, but if they're non-default and everyone's willing to entertain a new arg to the API method, sure. (A) seems more

How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Kexin Xie
Hi, Spark 1.0 changes the default behaviour of RDD.saveAsTextFile to throw org.apache.hadoop.mapred.FileAlreadyExistsException when file already exists. Is there a way I can allow Spark to overwrite the existing file? Cheers, Kexin
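
The immediate workaround discussed in the replies below is to delete the output directory yourself before saving. Here is a minimal, self-contained sketch of that pattern (the `OverwriteSketch` object and its toy single-part writer are hypothetical; on a real cluster you would delete via Hadoop's `FileSystem` API and then call `saveAsTextFile` as usual):

```scala
import java.nio.file.{Files, Path}
import java.util.Comparator

object OverwriteSketch {
  // Mirrors the Spark 1.0 check: refuse to write into an existing directory.
  def saveStrict(out: Path, lines: Seq[String]): Unit = {
    if (Files.exists(out))
      throw new IllegalStateException(s"Output directory $out already exists")
    Files.createDirectories(out)
    Files.write(out.resolve("part-00000"), lines.mkString("\n").getBytes)
  }

  // The workaround: recursively delete the target first, then save as usual.
  def deleteRecursively(p: Path): Unit =
    if (Files.exists(p))
      Files.walk(p).sorted(Comparator.reverseOrder[Path]()).forEach(q => Files.delete(q))

  def main(args: Array[String]): Unit = {
    val out = Files.createTempDirectory("out") // exists, so saveStrict would throw
    deleteRecursively(out)                     // clear it first
    saveStrict(out, Seq("a", "b"))
    println(Files.exists(out.resolve("part-00000"))) // prints: true
  }
}
```

As Patrick warns later in the thread, the recursive delete is exactly the scary part, so guard the path carefully before wiring this into a job.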

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Pierre Borckmans
+1 Same question here... Message sent from a mobile device - excuse typos and abbreviations On 2 June 2014, at 10:08, Kexin Xie kexin@bigcommerce.com wrote: Hi, Spark 1.0 changes the default behaviour of RDD.saveAsTextFile to throw

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Michael Cutler
The function saveAsTextFile https://github.com/apache/spark/blob/7d9cc9214bd06495f6838e355331dd2b5f1f7407/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L1066 is a wrapper around saveAsHadoopFile

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Pierre B
Hi Michaël, Thanks for this. We could indeed do that. But I guess the question is more about the change of behaviour from 0.9.1 to 1.0.0. We never had to care about that in previous versions. Does that mean we have to manually remove existing files or is there a way to automatically overwrite

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Pierre Borckmans
Indeed, the behavior has changed for good or for bad. I mean, I agree with the danger you mention but I'm not sure it's happening like that. Isn't there a mechanism for overwrite in Hadoop that automatically removes part files, then writes a _temporary folder and then only the part files along

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Patrick Wendell
Hey There, The issue was that the old behavior could cause users to silently overwrite data, which is pretty bad, so to be conservative we decided to enforce the same checks that Hadoop does. This was documented by this JIRA: https://issues.apache.org/jira/browse/SPARK-1100

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Nan Zhu
Hi, Patrick, I think https://issues.apache.org/jira/browse/SPARK-1677 is talking about the same thing? How about assigning it to me? I think I missed the configuration part in my previous commit, though I declared that in the PR description…. Best, -- Nan Zhu On Monday, June 2,

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Patrick Wendell
Thanks for pointing that out. I've assigned you to SPARK-1677 (I think I accidentally assigned myself way back when I created it). This should be an easy fix. On Mon, Jun 2, 2014 at 12:19 PM, Nan Zhu zhunanmcg...@gmail.com wrote: Hi, Patrick, I think

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Aaron Davidson
+1 please re-add this feature On Mon, Jun 2, 2014 at 12:44 PM, Patrick Wendell pwend...@gmail.com wrote: Thanks for pointing that out. I've assigned you to SPARK-1677 (I think I accidentally assigned myself way back when I created it). This should be an easy fix. On Mon, Jun 2, 2014 at

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Nicholas Chammas
So in summary: - As of Spark 1.0.0, saveAsTextFile() will no longer clobber by default. - There is an open JIRA issue to add an option to allow clobbering. - Even when clobbering, part- files may be left over from previous saves, which is dangerous. Is this correct? On Mon, Jun 2,

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Aaron Davidson
Yes. On Mon, Jun 2, 2014 at 1:23 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: So in summary: - As of Spark 1.0.0, saveAsTextFile() will no longer clobber by default. - There is an open JIRA issue to add an option to allow clobbering. - Even when clobbering, part-

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Nicholas Chammas
OK, thanks for confirming. Is there something we can do about that leftover part- files problem in Spark, or is that for the Hadoop team? On Monday, June 2, 2014, Aaron Davidson ilike...@gmail.com wrote: Yes. On Mon, Jun 2, 2014 at 1:23 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Pierre Borckmans
I'm a bit confused because the PR mentioned by Patrick seems to address all these issues: https://github.com/apache/spark/commit/3a8b698e961ac05d9d53e2bbf0c2844fcb1010d1 Was it not accepted? Or is the description of this PR not completely implemented? Message sent from a mobile device - excuse

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Sean Owen
I assume the idea is for Spark to rm -r dir/, which would clean out everything that was there before. It's just doing this instead of the caller. Hadoop still won't let you write into a location that already exists regardless, and part of that is for this reason that you might end up with files
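
The hazard Sean describes can be demonstrated with a purely local simulation (no Spark involved; `StalePartsDemo` and its toy writer are hypothetical): overwriting files in place, as the pre-1.0 semantics did, leaves extra part files behind whenever the second job writes fewer partitions than the first.

```scala
import java.nio.file.{Files, Path}

object StalePartsDemo {
  // Overwrite-in-place, as in the old 0.9 semantics: write new part files
  // without first removing the directory's previous contents.
  def writeParts(dir: Path, parts: Seq[String]): Unit = {
    Files.createDirectories(dir)
    parts.zipWithIndex.foreach { case (data, i) =>
      Files.write(dir.resolve(f"part-$i%05d"), data.getBytes)
    }
  }

  def listParts(dir: Path): Seq[String] =
    dir.toFile.listFiles().map(_.getName).toSeq.sorted

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("job-output")
    writeParts(dir, Seq("a", "b", "c", "d")) // first job: part-00000..part-00003
    writeParts(dir, Seq("e", "f"))           // second job: only part-00000..part-00001
    // part-00002 and part-00003 from the first run are still present,
    // silently polluting the second job's output:
    println(listParts(dir))
  }
}
```

This is why Hadoop insists the target directory not exist at all: either the caller deletes it, or stale parts like these end up mixed into downstream reads.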

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Nicholas Chammas
Fair enough. That rationale makes sense. I would prefer that a Spark clobber option also delete the destination files, but as long as it's a non-default option I can see the caller beware side of that argument as well. Nick On Monday, June 2, 2014, Sean Owen so...@cloudera.com wrote: I assume the

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Nan Zhu
I made the PR; the problem is …after many rounds of review, that configuration part is missed…. Sorry about that, I will fix it. Best, -- Nan Zhu On Monday, June 2, 2014 at 5:13 PM, Pierre Borckmans wrote: I'm a bit confused because the PR mentioned by Patrick seems to address all

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Patrick Wendell
We can just add back a flag to make it backwards compatible - it was just missed during the original PR. Adding a *third* set of clobber semantics, I'm slightly -1 on that for the following reasons: 1. It's scary to have Spark recursively deleting user files, could easily lead to users deleting
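
For context, the backwards-compatibility flag Patrick mentions did, in later Spark releases, take the shape of a configuration setting (`spark.hadoop.validateOutputSpecs`). That setting postdates the 1.0 release under discussion, so treat the sketch below as a hedged illustration rather than something available to the thread's participants at the time:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Opting back into the old overwrite-in-place semantics (A):
// disabling output-spec validation skips Hadoop's directory-exists check.
val conf = new SparkConf()
  .setAppName("overwrite-example")
  .set("spark.hadoop.validateOutputSpecs", "false")
val sc = new SparkContext(conf)

// Overwrites matching part files under /tmp/out, but does not clobber
// the directory, so unrelated part files there survive.
sc.parallelize(Seq("a", "b")).saveAsTextFile("/tmp/out")
sc.stop()
```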

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Sean Owen
Is there a third way? Unless I miss something. Hadoop's OutputFormat wants the target dir to not exist no matter what, so it's just a question of whether Spark deletes it for you or errors. On Tue, Jun 3, 2014 at 12:22 AM, Patrick Wendell pwend...@gmail.com wrote: We can just add back a flag to

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Patrick Wendell
(A) Semantics in Spark 0.9 and earlier: Spark will ignore Hadoop's output format check and overwrite files in the destination directory. But it won't clobber the directory entirely. I.e. if the directory already had part1 part2 part3 part4 and you write a new job outputting only two files (part1,

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Nan Zhu
I remember that in the earlier version of that PR, I deleted files by calling HDFS API we discussed and concluded that, it’s a bit scary to have something directly deleting user’s files in Spark Best, -- Nan Zhu On Monday, June 2, 2014 at 10:39 PM, Patrick Wendell wrote: (A) Semantics

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Nicholas Chammas
On Mon, Jun 2, 2014 at 10:39 PM, Patrick Wendell pwend...@gmail.com wrote: (B) Semantics in Spark 1.0 and earlier: Do you mean 1.0 and later? Option (B) with the exception-on-clobber sounds fine to me, btw. My use pattern is probably common but not universal, and deleting user files is indeed

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Kexin Xie
+1 on Option (B) with flag to allow semantics in (A) for back compatibility. Kexin On Tue, Jun 3, 2014 at 1:18 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: On Mon, Jun 2, 2014 at 10:39 PM, Patrick Wendell pwend...@gmail.com wrote: (B) Semantics in Spark 1.0 and earlier: Do

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Patrick Wendell
Good catch! Yes I meant 1.0 and later. On Mon, Jun 2, 2014 at 8:33 PM, Kexin Xie kexin@bigcommerce.com wrote: +1 on Option (B) with flag to allow semantics in (A) for back compatibility. Kexin On Tue, Jun 3, 2014 at 1:18 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: On Mon,