[ https://issues.apache.org/jira/browse/HDFS-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170419#comment-15170419 ]
Haohui Mai commented on HDFS-9763:
----------------------------------

I agree with [~cmccabe]. I don't think this is a good idea. The concerns are definitely valid. Addressing them with arbitrary settings usually indicates that the design is problematic.

If the whole point is to batch RPCs and avoid TOCTOU, maybe you want to adopt the design of transactional file systems:

http://www3.cs.stonybrook.edu/~porter/pubs/porter09hotos.pdf

> Add merge api
> -------------
>
>                 Key: HDFS-9763
>                 URL: https://issues.apache.org/jira/browse/HDFS-9763
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Ashutosh Chauhan
>            Assignee: Xiaobing Zhou
>         Attachments: HDFS_Merge_API_Proposal.pdf
>
>
> It would be good to add a merge(Path dir1, Path dir2, ... ) API to HDFS.
> The semantics would be to move all files under dir1 to dir2, renaming
> files in case of collisions.
> In the absence of this API, Hive[1] has to check for a collision for each
> file, then come up with a unique name and try again, and so on. This is
> inefficient in multiple ways:
> 1) It generates a huge number of calls on the NN (at least 2 * the number of
> source files in dir1).
> 2) It suffers from a TOCTOU[2] bug for the client-picked name in case of
> a collision.
> 3) The whole operation is not atomic.
> A merge API as outlined above would be immensely useful for Hive and
> potentially for other HDFS users.
> [1]
> https://github.com/apache/hive/blob/release-2.0.0-rc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2576
> [2] https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use
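
For illustration only, the client-side workaround described in the issue (check for a collision, pick a new name, retry, then rename) can be sketched roughly as follows. This is a hypothetical Python sketch that uses a local directory as a stand-in for HDFS; it is not Hive's actual code at [1]. The names `merge_client_side` and `demo`, the `_copy_N` naming scheme, and the call counter are all assumptions made for the example.

```python
import os
import tempfile
from pathlib import Path


def merge_client_side(src_dir: Path, dst_dir: Path) -> int:
    """Move every file in src_dir into dst_dir, renaming on collision.

    Returns the number of per-file filesystem calls issued (existence
    checks plus renames), as a rough stand-in for NameNode RPCs. The
    initial directory listing is not counted.
    """
    calls = 0
    # sorted() materializes the listing before any file is moved.
    for src in sorted(src_dir.iterdir()):
        candidate = dst_dir / src.name
        suffix = 0
        while True:
            calls += 1  # existence check (the "time of check")
            if not candidate.exists():
                break
            # Hypothetical naming scheme for the client-picked name.
            suffix += 1
            candidate = dst_dir / f"{src.stem}_copy_{suffix}{src.suffix}"
        # Another writer could create `candidate` between the check above
        # and the rename below. That window is the TOCTOU race.
        calls += 1  # rename (the "time of use")
        os.rename(src, candidate)
    return calls


def demo():
    """Merge two files into a directory that already holds one of the names."""
    with tempfile.TemporaryDirectory() as tmp:
        dir1, dir2 = Path(tmp, "dir1"), Path(tmp, "dir2")
        dir1.mkdir()
        dir2.mkdir()
        (dir1 / "a.txt").write_text("from dir1")
        (dir1 / "b.txt").write_text("from dir1")
        (dir2 / "a.txt").write_text("already in dir2")
        calls = merge_client_side(dir1, dir2)
        return calls, sorted(p.name for p in dir2.iterdir())


if __name__ == "__main__":
    print(demo())
```

In this demo, two source files with one collision cost five calls, which is in line with the "at least 2 * the number of source files" estimate in the issue. Nothing in the loop makes the whole merge atomic: a failure partway through leaves some files moved and others not.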