[ 
https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276796#comment-16276796
 ] 

Steve Loughran commented on HADOOP-15086:
-----------------------------------------

I don't disagree with you about the existence of the problem; I just don't think 
it's easily fixed. Essentially: blobstores tend not to have a rename() (or 
indeed create(overwrite=false) or delete(directory)), and the things we do to 
mimic these in our connectors aren't atomic.

1. We cover this in [Object Stores|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/introduction.html#Object_Stores_vs._Filesystems]
2. This is also common to: S3x, Swift, OSS, ADL, ...
3. By inference, the Hadoop FileOutputCommitter protocol, which commits work 
by rename(), is not atomic on object stores either. 
4. Compare with the requirements of rename() as covered in 
[rename()|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_renamePath_src_Path_d]
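
To make the race concrete, here is a minimal sketch of a rename() emulated as "check destination, copy, delete source". It is not the wasb connector code: the class `EmulatedRename`, its in-memory map, and the method names are all hypothetical stand-ins for a blob store. The two callers are interleaved by hand rather than with threads, so the outcome is deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a blob store; not the real connector code.
class EmulatedRename {
    private final Map<String, String> blobs = new ConcurrentHashMap<>();

    void put(String key, String data) { blobs.put(key, data); }
    String get(String key) { return blobs.get(key); }

    // Step 1 of the emulated rename: probe for the destination. Nothing is locked.
    boolean destinationFree(String dst) { return !blobs.containsKey(dst); }

    // Steps 2 and 3: copy the source blob to the destination, then delete the source.
    void copyThenDelete(String src, String dst) {
        blobs.put(dst, blobs.get(src));
        blobs.remove(src);
    }

    // Replays two callers renaming different sources to the same target, with
    // both checks happening before either copy. Returns how many callers
    // believe their rename succeeded.
    static int interleavedWinners(EmulatedRename store) {
        store.put("a", "from-a");
        store.put("b", "from-b");
        boolean aMayProceed = store.destinationFree("target"); // caller A checks
        boolean bMayProceed = store.destinationFree("target"); // caller B checks before A copies
        int winners = 0;
        if (aMayProceed) { store.copyThenDelete("a", "target"); winners++; }
        if (bMayProceed) { store.copyThenDelete("b", "target"); winners++; } // overwrites A's data
        return winners;
    }

    public static void main(String[] args) {
        EmulatedRename store = new EmulatedRename();
        System.out.println("winners = " + interleavedWinners(store));
        System.out.println("target  = " + store.get("target"));
    }
}
```

Both callers pass the check because neither copy has happened yet, so both report success and the second copy silently replaces the first. On HDFS the check and the move are a single namenode operation, which is why the same interleaving cannot occur there.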
 

There is actually special support in Azure for atomic rename of HBase 
directories; this is done with leasing, recovery and stuff. It manages 
exclusivity, but it is still not an O(1) operation.
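
As a rough illustration only (this is not how the Azure connector is written), the sketch below shows how a lease can give exclusivity while the rename itself stays proportional to the number of blobs. The class `LeasedRename`, the choice to key the lease on the destination prefix, and the in-memory maps are assumptions made for the example; recovery after a failed rename is not modelled at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a lease serialises renames, but each blob is still copied.
class LeasedRename {
    private final Map<String, String> blobs = new ConcurrentHashMap<>();
    private final Map<String, String> leases = new ConcurrentHashMap<>();

    void put(String key, String data) { blobs.put(key, data); }
    String get(String key) { return blobs.get(key); }

    // putIfAbsent is atomic, so at most one owner holds the lease on a path.
    boolean acquireLease(String path, String owner) {
        return leases.putIfAbsent(path, owner) == null;
    }

    // Only the current owner can release the lease.
    void releaseLease(String path, String owner) { leases.remove(path, owner); }

    // Directory rename under a lease: exclusive, but O(number of blobs).
    boolean renameDir(String srcPrefix, String dstPrefix, String owner) {
        if (!acquireLease(dstPrefix, owner)) {
            return false; // someone else is working on this destination
        }
        try {
            List<String> sources = new ArrayList<>();
            for (String key : blobs.keySet()) {
                if (key.startsWith(dstPrefix)) return false; // destination already exists
                if (key.startsWith(srcPrefix)) sources.add(key);
            }
            if (sources.isEmpty()) return false; // nothing to rename
            for (String key : sources) { // one copy and one delete per blob
                blobs.put(dstPrefix + key.substring(srcPrefix.length()), blobs.get(key));
                blobs.remove(key);
            }
            return true;
        } finally {
            releaseLease(dstPrefix, owner);
        }
    }
}
```

With the lease in place, a second caller either fails to acquire it or acquires it later and finds the destination populated, so only one rename wins. The per-blob loop is what keeps the cost from being O(1).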

If you look at where we are going with this, the work is in moving to 
object-store specific committers which provide the commit semantics without 
relying on renames. HADOOP-13786 is the initial implementation of this for S3A, 
but the hooks put into FileOutputFormat are designed to support 
filesystem-specific committers for any store which implements one. 

I'm closing as a WONTFIX. Sorry. It's not that we don't want to, it's just 
that directory operations are where the metaphor "object stores are like 
filesystems" fails if you look closely enough.

(On a brighter note: wasb is consistent for both metadata and data)

> NativeAzureFileSystem.rename is not atomic
> ------------------------------------------
>
>                 Key: HADOOP-15086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15086
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 2.7.3
>            Reporter: Shixiong Zhu
>         Attachments: RenameReproducer.java
>
>
> When multiple threads rename files to the same target path, more than one 
> thread can succeed, because the check and the file copy in `rename` are not 
> atomic.
> I would expect it to be atomic, just like HDFS.


