[
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405654#comment-13405654
]
Jesse Yates commented on HDFS-3370:
-----------------------------------
Sorry for the slow reply, been a bit busy of late...
@Daryn
bq. Retaining ref-counted paths after deletion in the origin namespace requires
an "inode id". A new api to reference paths based on the id is required. We
aren't so soft anymore...
That's why I'd argue for keeping the reference count in file metadata with
periodic rewrites, so updates can just be appends. We will need to maintain
references anyway if we do hardlinks, so the update is a single method call -
arguably a pretty simple code path that doesn't need to be heavily optimized
for concurrent writers, since we can argue that hardlinks are "rare".
bq. The inode id needs to be secured since it bypasses all parent dir
permissions,
Yeah, that's a bit of a pain... Maybe a bit more metadata to store with the
file...?
@Konstantin
bq. Do I understand correctly that your hidden inodes can be regular HDFS
files, and that then the whole implementation can be done on top of existing
HDFS, as a stand alone library supporting calls
Yeah, I guess that's a possibility. But you would probably need some sort of
"namespace manager" to handle hardlinks across different namespaces, which
fits comfortably with the distributed namenode design.
bq. ref-counted links, creating hidden "only accessible to the namenode"
inodes, leases on arbitrated NN ownership, retention of deleted files with
non-zero ref count, etc. Those aren't client-side operations.
Since you keep the metadata along with the file (including the current file
owner), you could do it all from a library. However, since the lease needs to
be periodically renewed, you would see temporary unavailability of the
hardlinked files in the managed namespace. If you instead couple hardlink
management with the namenode managing that namespace, you can force-reassign
the hardlinks to the backup namenode and get the same availability as for
other files in that namespace, at least for creating new hardlinks (reads
would still work, since all the important metadata can be replicated across
the different namespaces).
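The lease mechanics above can be sketched roughly as follows (hypothetical names and a toy clock, purely illustrative): the owning namenode must keep renewing its lease, and only once the lease has expired can a backup force-reassign ownership to itself, which bounds the unavailability window for creating new hardlinks.

```java
/**
 * Hypothetical sketch: lease-based ownership of a hardlink group.
 * The owner renews its lease periodically; after the lease expires,
 * a backup may force-reassign ownership to itself.
 */
class HardlinkLease {
    private String owner;
    private long expiresAtMillis;
    private final long leasePeriodMillis;

    HardlinkLease(String owner, long leasePeriodMillis, long nowMillis) {
        this.owner = owner;
        this.leasePeriodMillis = leasePeriodMillis;
        this.expiresAtMillis = nowMillis + leasePeriodMillis;
    }

    /** Only the current owner may renew, and only while unexpired. */
    boolean renew(String who, long nowMillis) {
        if (!who.equals(owner) || nowMillis >= expiresAtMillis) return false;
        expiresAtMillis = nowMillis + leasePeriodMillis;
        return true;
    }

    /** Forced reassignment: allowed only after the lease has expired. */
    boolean forceReassign(String newOwner, long nowMillis) {
        if (nowMillis < expiresAtMillis) return false;  // owner still live
        owner = newOwner;
        expiresAtMillis = nowMillis + leasePeriodMillis;
        return true;
    }

    String owner() { return owner; }
}
```

The "temporary unavailability" in the library-only approach corresponds to the window between the owner disappearing and its lease expiring; coupling the lease to the namenode lets the backup take over as soon as that window closes.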
@Andy: I don't know if I've seen a compelling reason that we _need_ to have
cross-namespace hardlinks, particularly since they are _hard_, to say the
least.
> HDFS hardlink
> -------------
>
> Key: HDFS-3370
> URL: https://issues.apache.org/jira/browse/HDFS-3370
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Hairong Kuang
> Assignee: Liyin Tang
> Attachments: HDFS-HardLink.pdf
>
>
> We'd like to add a new hardlink feature to HDFS that allows hardlinked files
> to share data without copying. Initially we will support hardlinking only
> closed files, but it could be extended to unclosed files as well.
> Among many potential use cases of the feature, the following two are the
> primary ones at Facebook:
> 1. This provides a lightweight way for applications like HBase to create a
> snapshot;
> 2. This also allows an application like Hive to move a table to a different
> directory without breaking currently running Hive queries.
--
This message is automatically generated by JIRA.