[
https://issues.apache.org/jira/browse/HDFS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731849#action_12731849
]
dhruba borthakur commented on HDFS-487:
---------------------------------------
> So truncating a file would change the fileid?
Truncating a file does not change the fileid. No operation can change the
fileid of an existing file. The fileid is assigned to a file at file creation
time. If you delete a file and then recreate a file with the same pathname, the
new file gets a new fileid. I mentioned truncate to illustrate that the
heuristic used by the "distcp -update" option might not work well once HDFS
supports truncate. "distcp -update" could use the fileid to reduce the
probability of missing modified files.
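To make the point above concrete, here is a minimal sketch. It assumes the
update heuristic compares only file lengths, and that the copy tool remembers
the source fileid it saw at the last copy. FileSnapshot, UpdateCheckSketch and
their methods are hypothetical names for illustration, not HDFS or distcp APIs.

```java
// Hypothetical sketch: none of these types exist in HDFS or distcp.
final class FileSnapshot {
    final String path;
    final long length;
    final long fileId; // assumed: assigned once, at file creation time

    FileSnapshot(String path, long length, long fileId) {
        this.path = path;
        this.length = length;
        this.fileId = fileId;
    }
}

final class UpdateCheckSketch {
    // Length-only heuristic (assumed): a file that was deleted and recreated
    // with the same length looks unchanged.
    static boolean needsCopyByLength(FileSnapshot source, FileSnapshot lastCopied) {
        return source.length != lastCopied.length;
    }

    // Length plus fileid: a recreated file has a new fileid, so it is
    // detected even when the length happens to match.
    static boolean needsCopyByLengthAndId(FileSnapshot source, FileSnapshot lastCopied) {
        return source.length != lastCopied.length
                || source.fileId != lastCopied.fileId;
    }
}
```

Since truncate does not change the fileid, a file truncated and rewritten in
place to the same length would still slip past this check. The fileid only
reduces the probability of a miss; it does not eliminate it.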
> I am still not clear about block placement use case.. may be it can use id of
> the first block (it comes for free).
The full identifier of a block is the concatenation of a 64-bit blockid and a
64-bit generation stamp. An error while writing to a block causes the
generation stamp of that block to be modified. So the identifier of the first
block of a file does not remain fixed for the lifetime of that file. That
means it cannot be used as a unique identifier for the file.
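A small sketch of why this rules out the first block as a file identifier.
BlockKey and afterWriteError are hypothetical names used only for
illustration; they are not the actual HDFS classes.

```java
// Hypothetical sketch: models a block identity as (blockid, generation stamp).
final class BlockKey {
    final long blockId;          // 64-bit blockid
    final long generationStamp;  // 64-bit generation stamp

    BlockKey(long blockId, long generationStamp) {
        this.blockId = blockId;
        this.generationStamp = generationStamp;
    }

    // Assumed behaviour: a write error leaves the blockid alone but
    // replaces the generation stamp.
    BlockKey afterWriteError(long newGenerationStamp) {
        return new BlockKey(blockId, newGenerationStamp);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BlockKey)) return false;
        BlockKey other = (BlockKey) o;
        return blockId == other.blockId
                && generationStamp == other.generationStamp;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(blockId) + Long.hashCode(generationStamp);
    }
}
```

Two keys for the same first block compare unequal once the generation stamp
has changed, so anything keyed on the full block identity would lose track of
the file after a write error.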
> (3) separation of block management.
UUIDs probably make it somewhat future-proof, but we can also upgrade the
unique-within-filesystem-fileid to a globally-unique-fileid when the use case
arises. Such an upgrade will be easy to do. (The tradeoff with UUIDs is that
they use more memory in the NN.)
> HDFS should expose a fileid to uniquely identify a file
> -------------------------------------------------------
>
> Key: HDFS-487
> URL: https://issues.apache.org/jira/browse/HDFS-487
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: fileid1.txt
>
>
> HDFS should expose an id that uniquely identifies a file. This helps in
> developing applications that work correctly even when files are moved from
> one directory to another. A typical use-case is to make the Pluggable Block
> Placement Policy (HDFS-385) use fileid instead of filename.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.