[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270589#comment-13270589 ]

Daryn Sharp commented on HDFS-3370:
-----------------------------------

While I really like the idea of hardlinks, I believe there are more 
non-trivial considerations in this proposed implementation.  I'm by no means 
a SME, but I experimented with a very different approach a while ago.  Here 
are some of the issues I encountered:

I think the quota considerations may be trickier than described.  The 
original creator of the file takes both the nsquota and dsquota hit.  The 
links take just the nsquota hit.  However, when the original creator's path 
is removed, one of the other links must absorb the dsquota.  If there are 
multiple remaining links, which one takes the hit?
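
To make the accounting concrete, here's a rough sketch of the model as I 
understand it (plain Java with made-up names, not actual NameNode code): the 
creator's directory is charged both quotas, link directories are charged 
namespace only, and deleting the creator forces some surviving link's 
directory to absorb the diskspace charge.

    import java.util.ArrayList;
    import java.util.List;

    // Toy model of the quota question; all names here are hypothetical.
    class QuotaDir {
        final String path;
        long nsUsed;   // counts against nsquota
        long dsUsed;   // counts against dsquota (bytes)
        QuotaDir(String path) { this.path = path; }
    }

    class HardlinkedFile {
        final long sizeBytes;
        QuotaDir owner;                                    // charged ns + ds
        final List<QuotaDir> linkDirs = new ArrayList<>(); // charged ns only

        HardlinkedFile(long sizeBytes, QuotaDir creatorDir) {
            this.sizeBytes = sizeBytes;
            this.owner = creatorDir;
            creatorDir.nsUsed += 1;
            creatorDir.dsUsed += sizeBytes;
        }

        void link(QuotaDir dir) {
            dir.nsUsed += 1;            // nsquota hit only
            linkDirs.add(dir);
        }

        // Removing the creator's path: the dsquota charge must go somewhere.
        void unlinkOwner() {
            owner.nsUsed -= 1;
            owner.dsUsed -= sizeBytes;
            // Which surviving link absorbs it?  Oldest?  Least loaded?  Any
            // choice is a policy decision with user-visible consequences.
            QuotaDir heir = linkDirs.remove(0);
            heir.dsUsed += sizeBytes;   // heir may now exceed its own dsquota
            owner = heir;
        }
    }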

What if none of the remaining links have available quota?  If the dsquota can 
always be exceeded, I can bypass my quota by creating the file in one dir, 
hardlinking from my out-of-dsquota dir, then removing the original.  If the 
dsquota cannot be exceeded, I can (maliciously?) hardlink from my 
out-of-dsquota dir to deny the original creator the ability to delete the file 
-- perhaps causing them to be unable to reduce their quota usage.
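
Spelled out as client calls -- FileSystem has no hardlink method today, so 
the link step below is a commented-out stand-in for whatever API this 
feature would add:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class QuotaBypass {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            Path original = new Path("/user/me/fresh/big.dat"); // dsquota available
            Path link     = new Path("/user/me/full/big.dat");  // dsquota exhausted

            // 1. Create the file where quota is still available.
            fs.create(original).close();

            // 2. Link from the exhausted dir -- only an nsquota hit, so it works.
            // fs.createHardlink(original, link);   // hypothetical API

            // 3. Remove the original: the exhausted dir must now absorb the
            //    dsquota.  Allow that, and quotas are bypassable; forbid it,
            //    and the delete is blocked by the other links' quota state.
            fs.delete(original, false);
        }
    }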

Block management will also be impacted.  The block manager currently 
operates on a block-to-inode mapping (though that is changing to an 
interface), but which of the hardlinked inodes will a block map to?  The 
original?  When that link is removed, how will the block manager be updated 
to point at another hardlink's inode?
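
A toy model of the mapping problem (hypothetical names throughout):

    import java.util.HashMap;
    import java.util.Map;

    // Toy model of the block -> inode question; names are hypothetical.
    class Block { final long id; Block(long id) { this.id = id; } }
    class INodeFile { final String name; INodeFile(String n) { this.name = n; } }

    class BlocksMap {
        private final Map<Long, INodeFile> blockToInode = new HashMap<>();

        void add(Block b, INodeFile owner) { blockToInode.put(b.id, owner); }
        INodeFile owner(Block b) { return blockToInode.get(b.id); }

        // If the owning hardlink inode is deleted, every one of its blocks
        // must be re-pointed at a surviving hardlink's inode -- an O(blocks)
        // pass, unless the map gains a level of indirection.
        void repoint(Block b, INodeFile survivor) { blockToInode.put(b.id, survivor); }
    }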

When a file is open for writing, its inode converts to under construction, 
so there would need to be an under-construction form of a hardlinked inode 
as well.  You will have to think about how the other hardlinks are 
affected/handled.  The same applies to hardlinking a file during creation 
and append.
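
The shared state is what makes this tricky: in a toy model (hypothetical 
names), opening any one link for append flips the state that every other 
link observes.

    import java.util.ArrayList;
    import java.util.List;

    // Toy model, hypothetical names: all hardlinks resolve to one file, so
    // opening any path for write converts the *shared* state.
    class SharedFile {
        enum State { COMPLETE, UNDER_CONSTRUCTION }
        State state = State.COMPLETE;
        final List<String> linkPaths = new ArrayList<>();

        void openForAppend(String path) {
            if (!linkPaths.contains(path))
                throw new IllegalArgumentException("not a link: " + path);
            if (state == State.UNDER_CONSTRUCTION)
                throw new IllegalStateException("already open via another link");
            state = State.UNDER_CONSTRUCTION;  // now visible through every link
        }
    }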

There may also be an impact on file leases.  I believe they are path-based, 
so leases will now need to be enforced across multiple paths.
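
In other words, a path-keyed lease table can't tell that two links are the 
same file, so a second writer coming in through another link would slip past 
the single-writer check.  Something keyed on file identity becomes necessary 
(toy model, hypothetical names):

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Toy model, hypothetical names.  Keyed by path alone, the second check
    // below is the one hardlinks would force the lease manager to add.
    class LeaseTable {
        private final Map<String, String> holderByPath = new HashMap<>();
        private final Set<Long> leasedFileIds = new HashSet<>();

        boolean tryAcquire(String path, long fileId, String holder) {
            if (holderByPath.containsKey(path)) return false; // today's check
            if (!leasedFileIds.add(fileId)) return false;     // needed check
            holderByPath.put(path, holder);
            return true;
        }
    }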

What if one hardlink changes the replication factor?  The maximum 
replication factor across all hardlinks should probably be obeyed, but then 
a setrep that lowers the value will never succeed, since the command waits 
for the replication to actually change.
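
For example (toy model, hypothetical names): a file linked at replication 3 
and 10 stays at an effective 10 under max-wins semantics, so "setrep -w 3" 
issued through the first link would wait forever.

    // Toy model, hypothetical names: effective replication is the max over
    // all links, so lowering one link's value changes nothing on disk.
    class ReplicatedFile {
        private final short[] perLinkReplication;
        ReplicatedFile(short... perLink) { this.perLinkReplication = perLink; }

        short effectiveReplication() {
            short max = 0;
            for (short r : perLinkReplication) if (r > max) max = r;
            return max;
        }

        public static void main(String[] args) {
            ReplicatedFile f = new ReplicatedFile((short) 3, (short) 10);
            // setrep -w polls for the new value but will always observe 10.
            System.out.println(f.effectiveReplication());  // prints 10
        }
    }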
                
> HDFS hardlink
> -------------
>
>                 Key: HDFS-3370
>                 URL: https://issues.apache.org/jira/browse/HDFS-3370
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Liyin Tang
>         Attachments: HDFS-HardLinks.pdf
>
>
> We'd like to add a new feature, hardlink, to HDFS that allows hardlinked 
> files to share data without copying. Currently we will support hardlinking 
> only closed files, but it could be extended to unclosed files as well.
> Among many potential use cases of the feature, the following two are 
> primarily used at Facebook:
> 1. This provides a lightweight way for applications like HBase to create a 
> snapshot;
> 2. This also allows an application like Hive to move a table to a different 
> directory without breaking currently running Hive queries.


        
