[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294861#comment-13294861 ]

M. C. Srivas commented on HDFS-3370:
------------------------------------

@Karthik: using hard-links for backup accomplishes exactly the opposite. The
expectation with a correctly implemented hardlink is that when the original is
modified, the change is reflected in the file no matter which path name was
used to access it. Isn't that exactly the opposite of what a
backup/snapshot is supposed to do? Unless, of course, you are committing to
never ever being able to modify a file once it is written (although most would
view that as a major step backwards in the evolution of Hadoop).
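
To make the hardlink semantics concrete, here is a minimal sketch using POSIX-style hardlinks on a local filesystem through Python's `os.link`. This is an illustration under that assumption only: it does not touch HDFS or the API proposed in HDFS-3370, and the file names are hypothetical. A write through the original path is visible through the "backup" name, because both names refer to the same underlying file.

```python
import os
import tempfile

def hardlink_sees_modification() -> tuple[str, str]:
    """Create a file, hardlink it, modify the original, then read through both names."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "data.txt")
        backup = os.path.join(d, "data.backup")  # hypothetical "backup" name

        with open(original, "w") as f:
            f.write("v1")

        # The second name points at the same file; no data is copied.
        os.link(original, backup)

        # Modify the file in place through the original path.
        with open(original, "a") as f:
            f.write(" -> v2")

        with open(original) as f:
            via_original = f.read()
        with open(backup) as f:
            via_backup = f.read()
        return via_original, via_backup

print(hardlink_sees_modification())  # ('v1 -> v2', 'v1 -> v2')
```

A backup would be expected to still return `'v1'` through the second name; a hardlink cannot provide that unless files are never modified after being written.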

Another major problem is that the scalability of the NN is reduced by a factor
of 10 (i.e., your cluster can now hold only 10 million files instead of the 100
million it used to be able to hold). Imagine someone doing a backup every 6
hours, with the backups retained as follows: 4 for the past 24 hours, 1 daily
for a week, and 1 weekly for 1 month. Total: 4 + 7 + 4 = 15 backups, i.e., 15
hard-links to each file, one from each backup. So each file is pointed to by 15
names; in other words, the NN now holds 15 names instead of 1 for each file.
Practically speaking, I think that would reduce the number of files the cluster
can hold by a factor of 10, no?
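
The retention arithmetic above can be checked with a short sketch. It assumes, as the comment does, that every retained backup adds one hardlink (one extra name) per file; the 100-million-file figure is the example from the text, and `names_held` is a hypothetical helper for illustration.

```python
# Retention schedule from the comment, with a backup taken every 6 hours.
HOURS_PER_BACKUP = 6
recent = 24 // HOURS_PER_BACKUP  # 4 backups covering the past 24 hours
daily = 7                        # 1 per day for a week
weekly = 4                       # 1 per week for a month

retained = recent + daily + weekly
print(retained)  # 15

def names_held(files: int, names_per_file: int) -> int:
    """Path names the NN must track if each file is reachable through names_per_file names."""
    return files * names_per_file

# 100 million files, each reachable through 15 names (one per retained backup).
print(names_held(100_000_000, retained))  # 1500000000
```

The raw name count grows 15x; the comment rounds the practical impact on NN capacity to a factor of about 10.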

Thirdly, hard-links don't work with directories. What is the scheme for backing
up directories? (If this scheme is only usable for HBase backups and nothing
else, then I agree with Konstantin that it belongs in the HBase layer and not
here.)
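
On POSIX filesystems the kernel normally refuses to hardlink a directory, which is the limitation referred to above. A minimal sketch with Python's `os.link` on a local filesystem (again not HDFS; the directory names are hypothetical, and the exact error varies by OS, commonly `EPERM` on Linux):

```python
import os
import tempfile

def try_link_directory() -> bool:
    """Return True if hardlinking a directory succeeded, False if the OS refused."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "table_dir")  # hypothetical directory to back up
        os.mkdir(src)
        try:
            os.link(src, os.path.join(d, "table_dir.link"))
        except OSError as e:
            # The OS rejects directory hardlinks; the error text is platform-specific.
            print("refused:", e.strerror)
            return False
        return True

print(try_link_directory())
```

Any directory-level backup would therefore need a scheme other than plain hardlinks.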

> HDFS hardlink
> -------------
>
>                 Key: HDFS-3370
>                 URL: https://issues.apache.org/jira/browse/HDFS-3370
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Liyin Tang
>         Attachments: HDFS-HardLink.pdf
>
>
> We'd like to add a new feature, hardlink, to HDFS that allows hardlinked files
> to share data without copying. Currently we will support hardlinking only
> closed files, but it could be extended to unclosed files as well.
> Among many potential use cases of the feature, the following two are the
> primary ones at Facebook:
> 1. This provides a lightweight way for applications like HBase to create a
> snapshot;
> 2. This also allows an application like Hive to move a table to a different
> directory without breaking currently running Hive queries.

--
This message is automatically generated by JIRA.