[ 
https://issues.apache.org/jira/browse/HDFS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731849#action_12731849
 ] 

dhruba borthakur commented on HDFS-487:
---------------------------------------

> So truncating a file would change the fileid?

Truncating a file does not change the fileid. No operation can change the 
fileid of an existing file: the fileid is associated with a file at file 
creation time. If you delete a file and then recreate a file with the same 
pathname, the new file will get a new fileid. I mention truncate because the 
heuristic used by the "distcp -update" option might not work very well once 
HDFS supports truncates. "distcp -update" could use the fileid to reduce the 
probability of missing modified files.
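
As a rough sketch of that idea (Python for brevity, not distcp code): assume 
the update heuristic compares file lengths, and assume the copier remembers 
the source fileid it saw at the last copy. FileStatus, needs_copy_by_size and 
needs_copy_with_fileid are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical stand-in for the metadata an update-style copy would see."""
    path: str
    length: int
    fileid: int  # assumed to be exposed by HDFS as proposed in this issue


def needs_copy_by_size(src: FileStatus, dst: Optional[FileStatus]) -> bool:
    # Size-only heuristic (assumed): copy if dst is missing or differs in length.
    return dst is None or src.length != dst.length


def needs_copy_with_fileid(src: FileStatus, dst: Optional[FileStatus],
                           fileid_at_last_copy: Optional[int]) -> bool:
    # Same heuristic, plus: copy if the source is no longer the file that was
    # copied last time (for example, it was deleted and recreated).
    return needs_copy_by_size(src, dst) or src.fileid != fileid_at_last_copy


# /logs/a was deleted and recreated with the same length after the last copy.
dst = FileStatus("/logs/a", length=100, fileid=501)
recreated_src = FileStatus("/logs/a", length=100, fileid=8)
print(needs_copy_by_size(recreated_src, dst))         # False
print(needs_copy_with_fileid(recreated_src, dst, 7))  # True
```

A file that is modified in place and ends up with the same length still keeps 
its fileid, so this check lowers the chance of a miss but does not remove it.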

> I am still not clear about block placement use case.. may be it can use id of 
> the first block (it comes for free).

The full identifier of a block is a concatenation of a 64-bit blockid and a 
64-bit generation stamp. An error while writing to a block causes the 
generation stamp of that block to be modified. So the identifier of the first 
block of a file does not remain fixed for the lifetime of that file. That 
means it cannot be used as a unique identifier for a file.
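
To illustrate (a Python sketch, not the HDFS classes): Block, full_id and 
recover_after_write_error are hypothetical names, and bumping the stamp by one 
is an assumption. The only point is that the stamp changes after a write 
error, so the combined identifier changes too.

```python
from dataclasses import dataclass, replace

MASK_64 = (1 << 64) - 1


@dataclass(frozen=True)
class Block:
    block_id: int
    generation_stamp: int

    def full_id(self) -> int:
        # Concatenate the two 64-bit fields into one 128-bit value.
        return ((self.block_id & MASK_64) << 64) | (self.generation_stamp & MASK_64)


def recover_after_write_error(block: Block) -> Block:
    # Assumption: recovery assigns a different (here: larger) generation stamp.
    return replace(block, generation_stamp=block.generation_stamp + 1)


first_block = Block(block_id=42, generation_stamp=1000)
after_error = recover_after_write_error(first_block)
print(first_block.block_id == after_error.block_id)    # True
print(first_block.full_id() == after_error.full_id())  # False
```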

> (3) separation of block management.

UUIDs probably make it somewhat futureproof, but we can also upgrade the 
unique-within-filesystem fileid to a globally unique fileid when the use case 
arises. Such an upgrade will be easy to do. (The tradeoff with UUIDs is that 
they use more memory in the NN.)
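
A back-of-the-envelope sketch of that memory tradeoff (Python): it assumes the 
filesystem-local fileid is a 64-bit value, which this comment does not state, 
and it counts raw id bytes only, ignoring any per-object overhead in the NN. 
The names are hypothetical.

```python
import uuid

LOCAL_FILEID_BYTES = 8                    # assumption: 64-bit local fileid
UUID_BYTES = len(uuid.uuid4().bytes)      # a UUID is 128 bits, i.e. 16 bytes


def extra_id_bytes(num_files: int) -> int:
    # Additional raw bytes needed if every file carried a UUID instead.
    return num_files * (UUID_BYTES - LOCAL_FILEID_BYTES)


print(extra_id_bytes(1_000_000))  # 8000000
```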

> HDFS should expose a fileid to uniquely identify a file
> -------------------------------------------------------
>
>                 Key: HDFS-487
>                 URL: https://issues.apache.org/jira/browse/HDFS-487
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: fileid1.txt
>
>
> HDFS should expose an id that uniquely identifies a file. This helps in 
> developing applications that work correctly even when files are moved from 
> one directory to another. A typical use-case is to make the Pluggable Block 
> Placement Policy (HDFS-385) use fileid instead of filename.
