Github user pwendell commented on the pull request:

    https://github.com/apache/incubator-spark/pull/626#issuecomment-35841445
  
    Hey @CodingCat, this approach has a few drawbacks. First, it will mean a 
pretty bad regression for some users. For instance, say that a user is calling 
saveAsHadoopFile(/my-dir) and that directory has some other random stuff in it 
as well. Previously it would have written Spark files alongside the other 
stuff, but with this patch it will silently delete the other data and recreate 
the directory. Second, this changes the APIs all over the place, which we are 
trying not to do. Third, it's a little scary to have code in Spark that's 
deleting HDFS directories - I'd rather make the user do it explicitly.
    
    What if we did the following: We look in the output directory and see if 
there are any part-XX files in there already, and if so we throw an exception 
and say that the directory already has output data in it.
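
    A minimal sketch of that check, using the local filesystem and a plain 
    Python function purely for illustration (the function name is made up here; 
    the actual implementation would go through Hadoop's FileSystem API in 
    Scala):

    ```python
    import os

    def assert_no_existing_output(path):
        # Sketch of the proposed pre-write check: only refuse to write when the
        # directory already contains part-* output files, so unrelated files
        # alongside the output are still tolerated.
        if os.path.isdir(path):
            existing = [f for f in os.listdir(path) if f.startswith("part-")]
            if existing:
                raise RuntimeError(
                    "Output directory %s already contains output data: %s"
                    % (path, existing))
    ```

    With this, an empty directory or one holding only unrelated files passes, 
    while a directory with a leftover part-00000 raises instead of being 
    silently deleted.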

