[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405654#comment-13405654 ]

Jesse Yates commented on HDFS-3370:
-----------------------------------

Sorry for the slow reply, been a bit busy of late...
@Daryn
bq. Retaining ref-counted paths after deletion in the origin namespace requires 
an "inode id". A new api to reference paths based on the id is required. We 
aren't so soft anymore...

That's why I'd argue for doing it in the file metadata with periodic rewrites, 
so we can just do appends. We will still need to maintain references if we do 
hardlinks, so this is just a single method call to do the update - arguably a 
pretty simple code path that doesn't need to be highly optimized for multiple 
writers, since we can argue that hardlinks are "rare". 
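
To make that concrete, here's a rough sketch of the append-only bookkeeping 
I'm picturing (all class and method names here are hypothetical, nothing that 
exists in HDFS today):

{code:java}
import java.io.DataOutputStream;
import java.io.IOException;

class RefCountLog {
  // Append-only stream backing the per-file ref-count metadata.
  private final DataOutputStream out;

  RefCountLog(DataOutputStream out) {
    this.out = out;
  }

  // Each link/unlink appends a single fixed-size record (inode id + delta);
  // this is the "single method call to do the update" mentioned above.
  void recordDelta(long inodeId, int delta) throws IOException {
    out.writeLong(inodeId);
    out.writeInt(delta);
    out.flush();
  }
}
{code}

A background task could periodically replay the log and rewrite a compacted 
snapshot (one summed count per inode), so the hot path never has to rewrite 
anything.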

bq. The inode id needs to be secured since it bypasses all parent dir 
permissions, 

Yeah, that's a bit of a pain... Maybe a bit more metadata to store with the 
file...?
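
One purely hypothetical way to do that: have the NN hand out the inode id 
together with an HMAC over it and store the tag in that extra file metadata, 
so a bare id can't be forged to sidestep the parent-dir permission checks. 
Something like:

{code:java}
import java.nio.ByteBuffer;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class InodeIdSigner {
  private final Mac mac;

  InodeIdSigner(byte[] namenodeSecret) throws Exception {
    // Hypothetical: the NN holds a secret and tags every inode id it issues.
    mac = Mac.getInstance("HmacSHA256");
    mac.init(new SecretKeySpec(namenodeSecret, "HmacSHA256"));
  }

  // The tag is handed out with the id and stored in the file metadata;
  // a request presenting an id without a valid tag gets rejected.
  synchronized byte[] sign(long inodeId) {
    return mac.doFinal(ByteBuffer.allocate(8).putLong(inodeId).array());
  }
}
{code}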

@Konstantin
bq. Do I understand correctly that your hidden inodes can be regular HDFS 
files, and that then the whole implementation can be done on top of existing 
HDFS, as a stand alone library supporting calls

Yeah, I guess that's a possibility. But you would probably need some sort of 
"namespace managers" to handle hardlinks across different namespaces (something 
like the sketch below), which fits comfortably with the distributed namenode 
design. 
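
Purely for illustration, the role I'm imagining for such a manager (none of 
these interfaces exist):

{code:java}
// Illustrative only: how I picture the responsibilities splitting up.
interface NamespaceManager {
  // Create a link to an inode that may live in another namespace.
  void createHardlink(String srcNamespace, long srcInodeId, String dstPath);

  // Apply a ref-count change on behalf of a remote namespace.
  void updateRefCount(long inodeId, int delta);
}
{code}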

bq. ref-counted links, creating hidden "only accessible to the namenode" 
inodes, leases on arbitrated NN ownership, retention of deleted files with 
non-zero ref count, etc. Those aren't client-side operations.

Since you keep the data along with the file (including the current file owner), 
you could do it all from a library. However, since the lease needs to be 
periodically regained, you will see temporary unavailability of the hardlinked 
files in the managed namespace. If you couple the hardlink management with the 
namenode managing that space, you can then do a forced reassign of the hardlinks 
to the backup namenode and get the same availability as for other files in 
that namespace, in terms of creating new hardlinks (reads would still work 
since all the important data can be replicated across the different namespaces).
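
Roughly the renewal behavior I mean, as a sketch (everything here is 
hypothetical, including the renew() RPC):

{code:java}
class HardlinkLeaseRenewer implements Runnable {
  private final long renewIntervalMs;
  private volatile boolean owned = true;

  HardlinkLeaseRenewer(long renewIntervalMs) {
    this.renewIntervalMs = renewIntervalMs;
  }

  @Override
  public void run() {
    while (owned) {
      try {
        renew();                       // hypothetical RPC to whoever arbitrates ownership
        Thread.sleep(renewIntervalMs);
      } catch (Exception e) {
        // Lease lapsed: creating new hardlinks blocks until the backup NN
        // force-reassigns ownership; reads keep working because the
        // metadata is replicated across namespaces.
        owned = false;
      }
    }
  }

  private void renew() throws Exception { /* placeholder */ }
}
{code}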

@Andy: I don't know if I've seen a compelling reason that we _need_ to have 
cross-namespace hardlinks, particularly since they are _hard_, to say the 
least. 
                
> HDFS hardlink
> -------------
>
>                 Key: HDFS-3370
>                 URL: https://issues.apache.org/jira/browse/HDFS-3370
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Liyin Tang
>         Attachments: HDFS-HardLink.pdf
>
>
> We'd like to add a new feature, hardlink, to HDFS that allows hardlinked 
> files to share data without copying. Initially we will support hardlinking 
> only closed files, but it could be extended to unclosed files as well.
> Among the many potential use cases of this feature, the following two are 
> the primary ones at Facebook:
> 1. It provides a lightweight way for applications like HBase to create a 
> snapshot;
> 2. It also allows an application like Hive to move a table to a different 
> directory without breaking currently running Hive queries.
