[ https://issues.apache.org/jira/browse/HDFS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731849#action_12731849 ]
dhruba borthakur commented on HDFS-487:
---------------------------------------

> So truncating a file would change the fileid?

Truncating a file does not change the fileid. No operation can change the fileid of an existing file: the fileid is associated with a file when the file is created. If you delete a file and then create a new file with the same pathname, the new file gets a new fileid.

I mentioned truncate to illustrate that the heuristic the "distcp -update" option uses to detect modified files might not work well once HDFS supports truncate. "distcp -update" could use the fileid to reduce the chance of missing a modified file.

> I am still not clear about block placement use case.. may be it can use id of
> the first block (it comes for free).

The identifier of a block is the concatenation of a 64-bit block ID and a 64-bit generation stamp. An error while writing to a block causes the generation stamp of that block to change. So the identifier of a file's first block does not remain fixed for the lifetime of the file, which means it cannot be used as a unique identifier for the file.

> (3) separation of block management.

UUIDs probably make it somewhat future-proof, but we can also upgrade the unique-within-filesystem fileid to a globally unique fileid when the use case arises. Such an upgrade will be easy to do. (The tradeoff is that a globally unique fileid uses more memory in the NN.)

> HDFS should expose a fileid to uniquely identify a file
> -------------------------------------------------------
>
>                 Key: HDFS-487
>                 URL: https://issues.apache.org/jira/browse/HDFS-487
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: fileid1.txt
>
>
> HDFS should expose an id that uniquely identifies a file. This helps in
> developing applications that work correctly even when files are moved from
> one directory to another.
> A typical use-case is to make the Pluggable Block
> Placement Policy (HDFS-385) use fileid instead of filename.
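The fileid semantics described in this comment (assigned once at create time, kept across rename and truncate, replaced only when a file is deleted and recreated) can be sketched with a toy namespace. Everything here is hypothetical: `Namespace`, `INode`, `changed_by_size`, and `changed_by_fileid_or_size` are illustrative names, not HDFS or DistCp APIs, and the size-only comparison is an assumed stand-in for a size-based update heuristic, not DistCp's actual logic.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class INode:
    """A file entry; its fileid is fixed at creation and never reassigned."""
    fileid: int
    length: int = 0


class Namespace:
    """Hypothetical toy namespace (not HDFS's NameNode code).

    Fileids are unique within this one filesystem: every create takes the
    next value from a counter, and no operation changes an existing fileid.
    """

    def __init__(self) -> None:
        self._ids = count(1)
        self._files: dict[str, INode] = {}

    def create(self, path: str) -> int:
        if path in self._files:
            raise FileExistsError(path)
        inode = INode(fileid=next(self._ids))
        self._files[path] = inode
        return inode.fileid

    def append(self, path: str, nbytes: int) -> None:
        self._files[path].length += nbytes

    def truncate(self, path: str, new_length: int) -> None:
        inode = self._files[path]
        if not 0 <= new_length <= inode.length:
            raise ValueError("truncate can only shrink a file")
        inode.length = new_length  # content changes, fileid does not

    def rename(self, src: str, dst: str) -> None:
        if dst in self._files:
            raise FileExistsError(dst)
        # The inode moves to a new path and keeps its fileid.
        self._files[dst] = self._files.pop(src)

    def delete(self, path: str) -> None:
        del self._files[path]

    def stat(self, path: str) -> tuple[int, int]:
        inode = self._files[path]
        return inode.fileid, inode.length


def changed_by_size(prev: tuple[int, int], cur: tuple[int, int]) -> bool:
    """Size-only comparison (assumed stand-in for an update heuristic)."""
    return prev[1] != cur[1]


def changed_by_fileid_or_size(prev: tuple[int, int], cur: tuple[int, int]) -> bool:
    """Also treat a different fileid (the file was replaced) as a change."""
    return prev != cur


ns = Namespace()
ns.create("/data/a")
ns.append("/data/a", 10)
ns.rename("/data/a", "/data/b")
prev = ns.stat("/data/b")    # (1, 10): the move kept fileid 1

ns.delete("/data/b")
ns.create("/data/b")         # same pathname, new fileid 2
ns.append("/data/b", 10)
cur = ns.stat("/data/b")     # (2, 10): same size, different file

print(changed_by_size(prev, cur))            # False: size alone misses it
print(changed_by_fileid_or_size(prev, cur))  # True: the fileid catches it

# An in-place truncate followed by an append back to the same length
# keeps both the fileid and the size, so neither check notices it.
ns.truncate("/data/b", 4)
ns.append("/data/b", 6)
later = ns.stat("/data/b")   # (2, 10)
```

The last case is why, in this model, the fileid only reduces the chance of missing a modified file rather than eliminating it.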
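To illustrate why the first block's identifier is not a stable file identifier, here is a toy model of a block identifier as the comment describes it: a 64-bit block ID concatenated with a 64-bit generation stamp, where a write error bumps the generation stamp. `BlockKey`, `FileEntry`, and `recover_after_write_error` are hypothetical names for this sketch, not HDFS classes, and treating each half as unsigned is an assumption made only so the packing is well defined.

```python
from dataclasses import dataclass, replace

MASK64 = (1 << 64) - 1


@dataclass(frozen=True)
class BlockKey:
    """Toy block identifier: 64-bit block ID plus 64-bit generation stamp.

    Hypothetical model; not HDFS's Block class.
    """
    block_id: int
    gen_stamp: int

    def packed(self) -> int:
        # Concatenate the two halves (each treated as unsigned 64-bit)
        # into one 128-bit value, block ID in the high bits.
        return ((self.block_id & MASK64) << 64) | (self.gen_stamp & MASK64)


def recover_after_write_error(block: BlockKey) -> BlockKey:
    """Model of the behavior described in the comment: a write error
    bumps the generation stamp while the block ID stays the same."""
    return replace(block, gen_stamp=block.gen_stamp + 1)


@dataclass
class FileEntry:
    fileid: int             # assigned at create time, never changes
    blocks: list[BlockKey]  # the first block's key may change over time


f = FileEntry(fileid=42, blocks=[BlockKey(block_id=1001, gen_stamp=1)])
before = f.blocks[0]

# Model a failed write to the first block.
f.blocks[0] = recover_after_write_error(f.blocks[0])
after = f.blocks[0]

print(before.packed() == after.packed())  # False: the first block's identifier changed
print(f.fileid)                           # 42: the file's identity did not
```

Keying a placement policy on `f.fileid` rather than on `f.blocks[0]` survives this kind of write error in the model, which matches the argument made in the comment.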