[ 
https://issues.apache.org/jira/browse/HADOOP-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15807864#comment-15807864
 ] 

Steve Loughran commented on HADOOP-11452:
-----------------------------------------

We've kind of gone round in circles on the "what features" probe, because it's 
so fluid. HADOOP-9565 has discussed this. I think it's time to look at the 
method again, with a list of well known strings to look for. Blobstores can add 
their own "atomic-put-on-close", etc.

Now regarding a patch to say "I must have atomic", well, yes, if you declare 
you want it, why not have the thing fail-fast? As it is, right now you get 
non-atomic renames *and don't even know*.
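
To make the probe and the fail-fast idea concrete, here is a minimal sketch in plain Java. It is not an existing Hadoop API: {{Store}}, {{hasFeature()}}, {{require()}} and the "atomic-rename" string are all hypothetical names invented for illustration; only "atomic-put-on-close" is taken from the text above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; none of these names are existing Hadoop APIs. */
final class FeatureProbeSketch {

    /** Well-known feature names; a blobstore could add its own strings. */
    static final String ATOMIC_RENAME = "atomic-rename";             // assumed name
    static final String ATOMIC_PUT_ON_CLOSE = "atomic-put-on-close"; // from the comment

    /** A store that can be asked whether it supports a named feature. */
    interface Store {
        boolean hasFeature(String feature);
    }

    /** Toy store backed by a fixed set of feature names. */
    static Store storeWith(String... features) {
        Set<String> supported = new HashSet<>(Arrays.asList(features));
        return supported::contains;
    }

    /** Fail fast when a feature the caller declared as required is missing. */
    static void require(Store store, String feature) {
        if (!store.hasFeature(feature)) {
            throw new UnsupportedOperationException(
                "Required feature not supported: " + feature);
        }
    }

    public static void main(String[] args) {
        Store hdfsLike = storeWith(ATOMIC_RENAME);
        Store blobstoreLike = storeWith(ATOMIC_PUT_ON_CLOSE);

        require(hdfsLike, ATOMIC_RENAME); // passes silently
        try {
            require(blobstoreLike, ATOMIC_RENAME);
        } catch (UnsupportedOperationException e) {
            System.out.println("Failed fast: " + e.getMessage());
        }
    }
}
```

The point of the string-based probe is that new features need no interface change; the point of {{require()}} is that a committer which depends on atomic rename finds out at setup time rather than after data has been written.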

w.r.t. S3A, we are going to do things which rely on PUT being atomic; see 
HADOOP-13786 for the full algorithm. All I was proposing was a way for people 
to say "This really, really must be atomic", so that people's code which has 
fundamental requirements on rename semantics isn't going to get deep into 
trouble on S3 or Swift (but not Azure). What gets into trouble? MRv1 and MRv2 
committers, for example.

Making things public? Well, FileStatus is ubiquitous; too late to remove. And, 
because it lets the underlying implementation do what it wants, it is great to 
work with from blobstore code, as we can do lots to minimise overhead. By 
contrast, {{FileContext.listFiles()}} implements its own recursive treewalk, 
which would seemingly make HADOOP-13208 impossible to support. I know FC is 
cleaner, but for playing blobstore games, the simpler FS API is easier to 
improve, despite its lack of consistency across impls.

So instead we have classic {{boolean rename(src, dest)}} where nobody really 
knows what to do when, say, the source doesn't exist, dest is "/", etc, etc. 
And we have a rename(src, dest, options), where the base implementation, the 
protected one in {{FileSystem}}, is in fact broken as in "will delete your 
data" broken. I consider that important to fix, even if it currently only bites 
anyone using FileContext.rename(src, src, overwrite).
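
To illustrate the kind of failure meant here, below is a toy in-memory model in plain Java. It is not the Hadoop code and uses no Hadoop classes; {{RenameOverwriteSketch}} and its methods are hypothetical. It assumes an overwrite that is done as "delete dest, then rename", and shows what that does when source and destination are the same path. The guarded variant's choice to throw on src == dest is this sketch's own, not a statement of what the spec requires.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory model of rename-with-overwrite; not Hadoop code. */
final class RenameOverwriteSketch {
    private final Map<String, String> files = new HashMap<>();

    void put(String path, String data) { files.put(path, data); }
    boolean exists(String path) { return files.containsKey(path); }
    String read(String path) { return files.get(path); }

    /** Classic rename: returns false on failure without saying why. */
    boolean rename(String src, String dest) {
        if (!files.containsKey(src) || files.containsKey(dest)) {
            return false;
        }
        files.put(dest, files.remove(src));
        return true;
    }

    /** Naive overwrite: delete dest, then rename. Loses data if src equals dest. */
    boolean naiveRenameOverwrite(String src, String dest) {
        files.remove(dest);       // when dest == src this deletes the source
        return rename(src, dest); // ...so the rename then fails
    }

    /** Guarded variant: validate everything before the destructive step. */
    boolean guardedRenameOverwrite(String src, String dest) {
        if (!files.containsKey(src)) {
            throw new IllegalArgumentException("Source not found: " + src);
        }
        if (src.equals(dest)) {
            throw new IllegalArgumentException("Source and destination are the same: " + src);
        }
        files.remove(dest);
        return rename(src, dest);
    }

    public static void main(String[] args) {
        RenameOverwriteSketch fs = new RenameOverwriteSketch();
        fs.put("/a", "data");
        boolean renamed = fs.naiveRenameOverwrite("/a", "/a");
        System.out.println("renamed=" + renamed + ", /a exists=" + fs.exists("/a"));
    }
}
```

Even the guarded variant is still delete-then-rename, so it is not atomic; it only avoids destroying the source before the preconditions have been checked.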

Now, the current patch *doesn't* do anything w.r.t. renames: it opens up the 
method, fixes its base rename call to not delete the source, tries to specify 
what actually goes on in HDFS, and pulls the error strings out of DFS to make 
them shared constants, so that the other implementations can raise exceptions 
with identical messages.

Do you want to review it? I know it's not complete: it doesn't have the tests 
for the corner cases I've managed to identify. But at least have a look at the 
FS spec document and show me where I've misunderstood things.

> Revisit FileSystem.rename(path, path, options)
> ----------------------------------------------
>
>                 Key: HADOOP-11452
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11452
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: fs
>    Affects Versions: 2.7.3
>            Reporter: Yi Liu
>            Assignee: Steve Loughran
>         Attachments: HADOOP-11452-001.patch, HADOOP-11452-002.patch
>
>
> Currently in {{FileSystem}}, {{rename}} with _Rename options_ is protected 
> and with _deprecated_ annotation. And the default implementation is not 
> atomic.
> So this method is not able to be used outside. On the other hand, HDFS has a 
> good and atomic implementation. (Also an interesting thing in {{DFSClient}}, 
> the _deprecated_ annotations for these two methods are opposite).
> It makes sense to make public for {{rename}} with _Rename options_, since 
> it's atomic for rename+overwrite, also it saves RPC calls if user desires 
> rename+overwrite.



