[ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614721#comment-13614721 ]
Suresh Srinivas commented on HDFS-4489:
---------------------------------------

bq. for a total of 40 bytes on a 64-bit JVM. So, adding 16-24 bytes is a pretty substantial new memory use.

Here is what goes into the ~180 bytes. INode is an object, so it carries 16 bytes of object header overhead. Its members include:
# byte[] name - I assume typically ~56 bytes for this (16 bytes object overhead, 8 bytes length, plus the bytes that make up the file name, say 32)
# reference to byte[] name - 8 bytes
# long permission - 8 bytes
# parent reference - 8 bytes
# modification time - 8 bytes
# accessTime - 8 bytes

That is roughly 112 bytes. Most INodes are INode files (I will leave the other types of inodes as an exercise), which add:
# BlockInfo[] - again 16 bytes of object overhead and 8 bytes length, plus, say, two blocks in a file with two references, for a cost of 40 bytes
# long header - another 8 bytes

Total ~160 bytes. So it is not very far off, and the number I had posted was based on what I had calculated long back.

That said, 16-24 bytes might seem like a huge percentage (10 to 15%) of INode size. But how much of the NN heap is allocated for INodes? Assuming INodes make up 1/3, blocks make up another 1/3, and the remaining 1/3 goes to floating garbage, head room etc., the net impact on NN heap is 3 to 5%. That is not far off from the analysis posted above.

I believe half of the work is already in trunk. The remaining two jiras need to go in. I believe doing a branch at this point in time is unnecessary work. If you are concerned about memory usage of your installs, I can add a config option and not instantiate the map.
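The arithmetic above can be reproduced with a small sketch. This is only a back-of-the-envelope calculator, not NameNode code: the class name, method names, and constants are hypothetical, and the per-field sizes are the assumptions stated in the comment (64-bit JVM, 8-byte references, 8 bytes counted for an array length), not measured values.

```java
// Back-of-the-envelope INode size estimate. All sizes are the assumptions
// quoted in the comment above, not values measured on a real JVM.
final class InodeMemoryEstimate {
    static final int OBJECT_HEADER = 16; // assumed object header overhead
    static final int ARRAY_LENGTH = 8;   // array length, as counted in the comment
    static final int REFERENCE = 8;      // assumed size of one reference
    static final int LONG = 8;           // one long field

    /** Bytes for the common INode part, given the file name length in bytes. */
    static int inodeBytes(int nameLength) {
        int nameArray = OBJECT_HEADER + ARRAY_LENGTH + nameLength;
        return OBJECT_HEADER
             + nameArray
             + REFERENCE   // reference to byte[] name
             + LONG        // permission
             + REFERENCE   // parent
             + LONG        // modification time
             + LONG;       // accessTime
    }

    /** Bytes for an INode file: the common part plus BlockInfo[] and the long header. */
    static int inodeFileBytes(int nameLength, int blockCount) {
        int blockArray = OBJECT_HEADER + ARRAY_LENGTH + blockCount * REFERENCE;
        return inodeBytes(nameLength) + blockArray + LONG;
    }

    /** Heap growth in percent if each inode grows by extraBytes and inodes fill inodeHeapShare of the heap. */
    static double heapImpactPercent(int extraBytes, int inodeBytes, double inodeHeapShare) {
        return 100.0 * extraBytes / inodeBytes * inodeHeapShare;
    }

    public static void main(String[] args) {
        int perFile = inodeFileBytes(32, 2); // 32-byte name, two blocks
        System.out.println("INode file bytes: " + perFile);
        System.out.println("Heap impact, +16 bytes: " + heapImpactPercent(16, perFile, 1.0 / 3) + "%");
        System.out.println("Heap impact, +24 bytes: " + heapImpactPercent(24, perFile, 1.0 / 3) + "%");
    }
}
```

With a 32-byte name and two blocks, inodeFileBytes returns 160, matching the total above. Changing the name length, block count, or the assumed inode share of the heap shows how sensitive the 3 to 5% figure is to those guesses.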
> Use InodeID as an identifier of a file in HDFS protocols and APIs
> -----------------------------------------------------------------
>
> Key: HDFS-4489
> URL: https://issues.apache.org/jira/browse/HDFS-4489
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Brandon Li
> Assignee: Brandon Li
>
> The benefit of using InodeID to uniquely identify a file can be multiple
> folds. Here are a few of them:
> 1. uniquely identify a file cross rename, related JIRAs include HDFS-4258,
> HDFS-4437.
> 2. modification checks in tools like distcp. Since a file could have been
> replaced or renamed to, the file name and size combination is not reliable,
> but the combination of file id and size is unique.
> 3. id based protocol support (e.g., NFS)
> 4. to make the pluggable block placement policy use fileid instead of
> filename (HDFS-385).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira