[jira] [Commented] (HDFS-4489) Use InodeID as as an identifier of a file in HDFS protocols and APIs

Suresh Srinivas (JIRA) Tue, 26 Mar 2013 17:19:17 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614721#comment-13614721
 ]


Suresh Srinivas commented on HDFS-4489:
---------------------------------------

bq. for a total of 40 bytes on a 64-bit JVM. So, adding 16-24 bytes is a pretty 
substantial new memory use.
Here are the things that go into ~180 bytes:
INode is an object. It comes with the cost of 16 bytes object header overhead. 
Members include:
# byte[] name - I assume typically ~56 bytes for this. That is 16 bytes object 
overhead + 8 bytes length + the bytes that make up the file name, say 32.
# reference to byte[] name - 8 bytes
# long permission at the cost of 8 bytes.
# parent reference at 8 bytes cost
# modification time at 8 bytes cost
# accessTime at 8 bytes cost

That is roughly 112 bytes.

Most INodes are typically INode files (I will leave the other types of 
inodes as an exercise).
# It has BlockInfo[]. This is again 16 bytes of object overhead, 8 bytes of 
length, and, say, two references for a file with two blocks, at a cost of 40 bytes.
# It has long header that adds another 8 bytes.

Total ~160 bytes. So the ~180 bytes I had posted is not very far off; that 
number was based on what I had calculated a long time back.
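
The sums above can be reproduced with a small sketch. The class and method names are made up for illustration, and the constants are the rough 64-bit JVM estimates from this comment (no compressed oops, no alignment padding), not measured values.

```java
// Back-of-the-envelope INode size estimate. All names here are hypothetical;
// the sizes are the assumptions stated in this comment, not measurements.
final class INodeSizeEstimate {
    static final int OBJECT_HEADER = 16; // assumed per-object overhead
    static final int ARRAY_LENGTH = 8;   // assumed cost of the array length field
    static final int REFERENCE = 8;      // assumed 64-bit reference
    static final int LONG = 8;

    // byte[] name: object overhead + length + the name bytes themselves
    static int nameArray(int nameBytes) {
        return OBJECT_HEADER + ARRAY_LENGTH + nameBytes;
    }

    // Fields common to every INode
    static int inodeBase(int nameBytes) {
        return OBJECT_HEADER
            + nameArray(nameBytes)
            + REFERENCE   // reference to byte[] name
            + LONG        // permission
            + REFERENCE   // parent
            + LONG        // modification time
            + LONG;       // accessTime
    }

    // INode file: base fields + BlockInfo[] + long header
    static int inodeFile(int nameBytes, int blocks) {
        int blockArray = OBJECT_HEADER + ARRAY_LENGTH + blocks * REFERENCE;
        return inodeBase(nameBytes) + blockArray + LONG;
    }

    public static void main(String[] args) {
        System.out.println(inodeBase(32));    // 112
        System.out.println(inodeFile(32, 2)); // 160
    }
}
```

Changing the name length or block count shows how sensitive the estimate is to those two assumptions.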

That said, 16-24 bytes might seem like a huge percentage (10 to 15%) of INode 
size. But how much of the NN heap is allocated to INodes? Assuming INodes make 
up 1/3, blocks make up another 1/3, and the remaining 1/3 goes to floating 
garbage, headroom, etc., the net impact on NN heap is 3 to 5%. That is not far 
off from the analysis posted above.
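
As a quick check of that percentage, here is a minimal sketch. The 160-byte INode size and the 1/3 heap share are the assumptions stated above, and the class and method names are hypothetical.

```java
// Net heap impact of adding bytes to every INode, under the assumptions
// in this comment (160-byte INode, INodes occupy ~1/3 of the NN heap).
final class HeapImpact {
    // Percentage growth of the whole heap when each INode grows by addedBytes.
    static double heapImpactPercent(int addedBytes, int inodeBytes, double inodeShareOfHeap) {
        double inodeGrowthPercent = 100.0 * addedBytes / inodeBytes;
        return inodeGrowthPercent * inodeShareOfHeap;
    }

    public static void main(String[] args) {
        double share = 1.0 / 3.0;
        System.out.println(String.format("%.1f", heapImpactPercent(16, 160, share))); // 3.3
        System.out.println(String.format("%.1f", heapImpactPercent(24, 160, share))); // 5.0
    }
}
```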

I believe half of the work is already in trunk. The remaining two JIRAs need 
to go in. I believe creating a branch at this point is unnecessary work.

If you are concerned about the memory usage of your installs, I can add a 
config option so that the map is not instantiated.
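
To illustrate the idea only: a minimal sketch of a map that is allocated only when a flag is set. The class name, the flag, and the Long-to-String mapping are all hypothetical and do not reflect the actual NameNode code or any existing config key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an id-to-inode map that is only instantiated when enabled.
// A String stands in for the inode object to keep the example self-contained.
final class OptionalInodeMap {
    private final Map<Long, String> idToInode; // null when the feature is disabled

    OptionalInodeMap(boolean enabled) {
        this.idToInode = enabled ? new HashMap<Long, String>() : null;
    }

    boolean isEnabled() {
        return idToInode != null;
    }

    // No-op when disabled, so callers do not need to check the flag.
    void put(long id, String inode) {
        if (idToInode != null) {
            idToInode.put(id, inode);
        }
    }

    // Returns null when disabled or when the id is unknown.
    String get(long id) {
        return idToInode == null ? null : idToInode.get(id);
    }
}
```

With the flag off, no map is allocated and lookups by id simply return null, so any feature that depends on the map would be unavailable on such installs.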



                
> Use InodeID as as an identifier of a file in HDFS protocols and APIs
> --------------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>
> The benefits of using InodeID to uniquely identify a file are manifold. 
> Here are a few of them:
> 1. uniquely identify a file across renames, related JIRAs include HDFS-4258, 
> HDFS-4437.
> 2. modification checks in tools like distcp. Since a file could have been 
> replaced or renamed to, the file name and size combination is not reliable, 
> but the combination of file id and size is unique.
> 3. id based protocol support (e.g., NFS)
> 4. to make the pluggable block placement policy use fileid instead of 
> filename (HDFS-385).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
