[ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642290#comment-13642290 ]

Sanjay Radia commented on HDFS-4489:
------------------------------------

Nathan, a question.
Suresh is willing to do the performance benchmark, but I am trying to
understand where you are coming from. Yahoo and FB create very large
namespaces simply by buying more memory and increasing the heap size. Do
you worry about cache pollution when you create 50K more files? Given that the
NN heap (many GBs) is so much larger than the cache, does the additional inode
and inode-map size affect overall system performance? Suresh has argued
that a 24GB heap grows by 625MB. Measuring this feature's memory growth as a
percentage of the total heap size is a more realistic way to assess its
impact than measuring the growth of an individual data structure such as
the inode.
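To make the percentage framing concrete, here is a minimal Python sketch that uses only the two figures from the comment (a 24GB heap growing by 625MB). It assumes binary units (1 GB = 1024 MB); the helper name `heap_overhead_pct` is hypothetical.

```python
def heap_overhead_pct(extra_mb: float, heap_gb: float) -> float:
    """Return extra memory as a percentage of the heap.

    Assumes binary units: 1 GB = 1024 MB.
    """
    heap_mb = heap_gb * 1024
    return 100.0 * extra_mb / heap_mb


# Figures quoted in the thread: 625MB of growth on a 24GB NameNode heap.
pct = heap_overhead_pct(extra_mb=625, heap_gb=24)
print(f"{pct:.2f}% of the heap")  # prints "2.54% of the heap"
```

With decimal units (1 GB = 1000 MB) the result is about 2.60%, so either way the growth is roughly 2.5% of the heap.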

IMHO, not having an inode-map and inode number was a serious limitation in the
original implementation of the NN. I am willing to pay for the extra memory given
the value that inode-id and inode-map bring (as described by Suresh at the
beginning of this Jira). Permissions, access times, etc. also added to the memory
cost of the NN and were accepted because of the value they bring.

> Use InodeID as an identifier of a file in HDFS protocols and APIs
> -----------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>             Fix For: 2.0.5-beta
>
>
> Using InodeID to uniquely identify a file has multiple benefits. Here are a
> few of them:
> 1. uniquely identifying a file across renames; related JIRAs include
> HDFS-4258 and HDFS-4437.
> 2. modification checks in tools like distcp. Since a file could have been
> replaced, or another file renamed to its path, the combination of file name
> and size is not reliable, but the combination of file id and size is unique.
> 3. id-based protocol support (e.g., NFS)
> 4. making the pluggable block placement policy use fileid instead of
> filename (HDFS-385).
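The first two benefits in the description can be illustrated with a toy in-memory model. This is a hypothetical Python sketch, not HDFS code: `ToyNamespace`, `same_by_name` and `same_by_id` are invented names, and the model assumes inode ids come from an increasing counter and are never reused. It shows that an id still resolves through the inode-map after a rename, and that a (file id, size) check catches a same-size file renamed onto the original path where a (name, size) check does not.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class INode:
    inode_id: int  # assumed never reused and never changed by a rename
    size: int


class ToyNamespace:
    """Hypothetical model: path -> inode, plus an inode-map (id -> inode)."""

    def __init__(self) -> None:
        self._ids = count(start=1)  # monotonically increasing inode ids
        self.by_path: dict[str, INode] = {}
        self.inode_map: dict[int, INode] = {}

    def create(self, path: str, size: int) -> int:
        inode = INode(next(self._ids), size)
        self.by_path[path] = inode
        self.inode_map[inode.inode_id] = inode
        return inode.inode_id

    def delete(self, path: str) -> None:
        inode = self.by_path.pop(path)
        del self.inode_map[inode.inode_id]

    def rename(self, src: str, dst: str) -> None:
        if dst in self.by_path:  # an overwriting rename removes the old dst
            self.delete(dst)
        self.by_path[dst] = self.by_path.pop(src)  # inode id is unchanged


def same_by_name(ns: ToyNamespace, path: str, size: int) -> bool:
    """(name, size) check: fooled when another file takes over the path."""
    inode = ns.by_path.get(path)
    return inode is not None and inode.size == size


def same_by_id(ns: ToyNamespace, path: str, inode_id: int, size: int) -> bool:
    """(file id, size) check: a different file has a different id."""
    inode = ns.by_path.get(path)
    return inode is not None and inode.inode_id == inode_id and inode.size == size


ns = ToyNamespace()
old_id = ns.create("/data/part-0", size=128)

# Benefit 1: the id survives a rename and still resolves via the inode-map.
ns.rename("/data/part-0", "/data/part-0.tmp")
print(ns.inode_map[old_id] is ns.by_path["/data/part-0.tmp"])  # True

# Benefit 2: a different file of the same size is renamed onto the old path.
ns.create("/staging/new", size=128)
ns.rename("/staging/new", "/data/part-0")
print(same_by_name(ns, "/data/part-0", 128))          # True (false match)
print(same_by_id(ns, "/data/part-0", old_id, 128))    # False (change detected)
```

The name-based check reports "unchanged" even though the path now holds a different file, while the id-based check, under this model's assumption that ids are unique, detects the change.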

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira