[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294861#comment-13294861 ]
M. C. Srivas commented on HDFS-3370:
------------------------------------

@Karthik: using hard-links for backup accomplishes exactly the opposite. The expectation with a correctly-implemented hardlink is that when the original is modified, the change is reflected in the file, no matter which path-name was used to access it. Isn't that exactly the opposite effect of what a backup/snapshot is supposed to do? Unless of course you are committing to never ever being able to modify a file once written (although that would be viewed by most as a major step backwards in the evolution of Hadoop).

Another major problem is that the scalability of the NN gets reduced by a factor of 10 (i.e., your cluster can now hold only 10 million files instead of the 100 million it used to be able to hold). Imagine someone doing a backup every 6 hours. Let's say the backups are to be retained as follows: 4 for the past 24 hrs, 1 daily for a week, and 1 per week for 1 month. Total: 4 + 7 + 4 = 15 backups, i.e., 15 hard-links to the files, one from each backup. So each file is pointed to by 15 names, or, in other words, the NN now holds 15 names instead of 1 for each file. I think that would reduce the number of files held by the cluster, practically speaking, by a factor of 10, no?

Thirdly, hard-links don't work with directories. What is the scheme to back up directories?

(If this scheme is only usable for HBase backups and nothing else, then I agree with Konstantin that it belongs in the HBase layer and not here.)

> HDFS hardlink
> -------------
>
>                 Key: HDFS-3370
>                 URL: https://issues.apache.org/jira/browse/HDFS-3370
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Liyin Tang
>         Attachments: HDFS-HardLink.pdf
>
>
> We'd like to add a new feature hardlink to HDFS that allows hardlinked files
> to share data without copying. Currently we will support hardlinking only
> closed files, but it could be extended to unclosed files as well.
> Among many potential use cases of the feature, the following two are
> primarily used in Facebook:
> 1. This provides a lightweight way for applications like HBase to create a
> snapshot;
> 2. This also allows an application like Hive to move a table to a different
> directory without breaking currently running Hive queries.
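
For illustration only, the first two objections in the comment can be reproduced with a short Python sketch. It runs against the local file system through `os.link`, not against HDFS (the hardlink feature is only a proposal in this issue), and it assumes a file system that supports hard links. The helper names `hardlink_reflects_modification` and `backup_links_per_file` are hypothetical and appear nowhere in the issue.

```python
import os
import tempfile


def hardlink_reflects_modification() -> bool:
    """Return True if a write through one name is visible through the other.

    Local-file-system demonstration only; this is not an HDFS API.
    """
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "data.txt")
        backup = os.path.join(tmp, "backup.txt")

        with open(original, "w") as f:
            f.write("v1")

        # Create a second name for the same underlying file.
        os.link(original, backup)

        # Overwrite in place through the original name (same file, truncated).
        with open(original, "w") as f:
            f.write("v2")

        # The "backup" name sees the new contents, so it is not a snapshot.
        with open(backup) as f:
            return f.read() == "v2"


def backup_links_per_file(recent: int, daily: int, weekly: int) -> int:
    """Count hard links per file under the retention scheme in the comment."""
    return recent + daily + weekly


if __name__ == "__main__":
    print(hardlink_reflects_modification())
    # 4 backups in the past 24 hrs, 7 dailies, 4 weeklies, as in the comment.
    print(backup_links_per_file(recent=4, daily=7, weekly=4))
```

The second function only restates the 4 + 7 + 4 arithmetic from the comment. How many extra names actually translate into reduced NameNode capacity depends on per-name memory cost, which the sketch does not model.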