[ https://issues.apache.org/jira/browse/HADOOP-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15807864#comment-15807864 ]
Steve Loughran commented on HADOOP-11452:
-----------------------------------------

We've kind of gone round in circles on the "what features" probe, because it's so fluid. HADOOP-9565 has discussed this. I think it's time to look at the method again, with a list of well-known strings to look for. Blobstores can add their own, "atomic-put-on-close", etc.

Now, regarding a patch to say "I must have atomic": well, yes, if you declare you want it, why not have the thing fail fast? As it is, right now you get non-atomic renames *and don't even know*.

W.r.t. S3A, we are going to do things which rely on PUT being atomic; see HADOOP-13786 for the full algorithm. All I was proposing was a way for people to say "This really, really must be atomic", so that people's code which contains fundamental requirements of rename semantics isn't going to get deep into trouble on S3 or Swift (but not Azure). What gets into trouble? MRv1 and MRv2 committers, for example.

Making things public? Well, FileStatus is ubiquitous; too late to remove. And, because it lets the underlying implementation do what it wants, it is great to work with from blobstore code, as we can do lots to minimise overhead. For example, {{FileContext.listFiles()}} implements its own recursive treewalk, which would seemingly make HADOOP-13208 impossible to support. I know FC is cleaner, but for playing blobstore games, the simpler FS API is easier to improve, despite its lack of consistency across impls.

So instead we have the classic {{boolean rename(src, dest)}}, where nobody really knows what to do when, say, the source doesn't exist, dest is "/", etc. And we have a rename(src, dest, options), where the base implementation, the protected one in {{FileSystem}}, is in fact broken, as in "will delete your data" broken. I consider that important to fix, even if it currently only bites anyone using FileContext.rename(src, src, overwrite).
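
To make the fail-fast idea concrete, here is a minimal, self-contained sketch of a feature probe keyed on well-known strings. It is not Hadoop code: the {{Store}} interface, {{SimpleStore}}, {{hasFeature}}, {{requireFeature}} and the "atomic-rename" string are all hypothetical names invented for illustration; only "atomic-put-on-close" is taken from the comment above.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FeatureProbeSketch {
    // Hypothetical well-known feature strings (assumptions, not Hadoop constants).
    static final String ATOMIC_RENAME = "atomic-rename";
    static final String ATOMIC_PUT_ON_CLOSE = "atomic-put-on-close";

    /** Hypothetical stand-in for a filesystem client; not the Hadoop FileSystem class. */
    interface Store {
        boolean hasFeature(String feature);
    }

    /** A store that simply declares the set of features it supports. */
    static final class SimpleStore implements Store {
        private final String name;
        private final Set<String> features;

        SimpleStore(String name, String... features) {
            this.name = name;
            this.features = new HashSet<>(Arrays.asList(features));
        }

        @Override
        public boolean hasFeature(String feature) {
            return features.contains(feature);
        }

        @Override
        public String toString() {
            return name;
        }
    }

    /** Fail fast: throw if the caller's declared requirement cannot be met. */
    static void requireFeature(Store store, String feature) throws IOException {
        if (!store.hasFeature(feature)) {
            throw new IOException(store + " does not support required feature: " + feature);
        }
    }

    public static void main(String[] args) {
        Store hdfsLike = new SimpleStore("hdfs-like", ATOMIC_RENAME);
        Store blobLike = new SimpleStore("blobstore-like", ATOMIC_PUT_ON_CLOSE);
        for (Store s : new Store[] {hdfsLike, blobLike}) {
            try {
                requireFeature(s, ATOMIC_RENAME);
                System.out.println(s + ": atomic rename available");
            } catch (IOException e) {
                System.out.println("fail-fast: " + e.getMessage());
            }
        }
    }
}
```

The point of the design is that a committer-style caller states its requirement once, up front, and gets an exception instead of silently receiving a non-atomic rename.
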

Now, the current patch *doesn't* do anything w.r.t. renames. It opens up the method, fixes its base rename call to not delete the source, tries to specify what actually goes on in HDFS, and pulls the error strings out of DFS and makes them shared constants, so that the other implementations can raise exceptions with identical messages.

Do you want to review it? I know it's not complete, it doesn't have the tests for the corner cases I've managed to identify, but at least have a look at the FS spec document and show me where I've misunderstood things.

> Revisit FileSystem.rename(path, path, options)
> ----------------------------------------------
>
>                 Key: HADOOP-11452
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11452
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: fs
>    Affects Versions: 2.7.3
>            Reporter: Yi Liu
>            Assignee: Steve Loughran
>         Attachments: HADOOP-11452-001.patch, HADOOP-11452-002.patch
>
>
> Currently in {{FileSystem}}, {{rename}} with _Rename options_ is protected
> and with _deprecated_ annotation. And the default implementation is not
> atomic.
> So this method is not able to be used outside. On the other hand, HDFS has a
> good and atomic implementation. (Also an interesting thing in {{DFSClient}},
> the _deprecated_ annotations for these two methods are opposite).
> It makes sense to make public for {{rename}} with _Rename options_, since
> it's atomic for rename+overwrite, also it saves RPC calls if user desires
> rename+overwrite.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)