[ https://issues.apache.org/jira/browse/HDFS-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875805#action_12875805 ]
Konstantin Shvachko commented on HDFS-1110:
-------------------------------------------

# variable {{cache}} should read {{nameCache}}
# comment for it should be transformed to JavaDoc comments.
# {{FSDirectory.cache}} should be initialized in the constructor rather than during declaration. And 10 should be declared as a constant.
# I would consider using NameCache<byte[]> instead of NameCache<ByteArray>. You get fewer objects and conversions, if of course I didn't miss anything here.
# Introduce {{FSDirectory.cache(INode)}} method, which calls NameCache.put().
# In NameCache some comments need clarification:
#- "This class has two phases" - probably something else has 2 phases.
#- "This class must be synchronized externally"
#- Member inline comments should be transformed into javadoc.
# NameCache.cache should be initialized in the constructor rather than during declaration.
# {{UseCount}} should probably be a private inner (rather than static) class, and should use the same parameter K with which NameCache<K> is parametrized.
{code}
private class UseCount {
  int count;       // Number of times a name occurs
  final K value;   // Internal value for the name

  UseCount(final K value) {
    this.value = value;
  }
}
{code}
# {{UseCount.count}} should be initialized in the constructor. It is better to have increment() and get() methods rather than accessing count directly from the outside.

I like the idea of using the useThreshold to determine names that should be promoted to the nameCache. My main concern is that the threshold is 10. This means there will be a lot of names in the cache. And all these names are in a HashTable, which has a huge overhead, as we know from another jira. We still save space, but for names that occur only 10 times the savings are probably negligible. I would imagine that only 5% or 10% of the most frequently used names get promoted. It is fine with me to use this simple promoting scheme as a starting point, with an intention to optimize it later.
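To make the points above concrete, here is a minimal sketch of the shape the review suggests: maps initialized in the constructor, the threshold passed in rather than hard-coded, and an inner {{UseCount}} sharing the type parameter K with increment()/get() accessors. This is illustrative only, not the actual patch code; all names besides UseCount and useThreshold are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch, not the actual HDFS NameCache.
// Must be synchronized externally, as the review notes.
class NameCache<K> {
  // Number of occurrences before a name is promoted to the cache.
  private final int useThreshold;

  // Promoted names: canonical instances handed out for reuse.
  private final Map<K, K> cache;

  // Names seen so far with their use counts, awaiting promotion.
  private final Map<K, UseCount> transientMap;

  NameCache(int useThreshold) {
    this.useThreshold = useThreshold;
    this.cache = new HashMap<K, K>();
    this.transientMap = new HashMap<K, UseCount>();
  }

  // Returns the canonical instance for name, promoting it once its
  // use count reaches useThreshold.
  K put(K name) {
    K cached = cache.get(name);
    if (cached != null) {
      return cached;                  // already promoted: reuse it
    }
    UseCount uc = transientMap.get(name);
    if (uc == null) {
      transientMap.put(name, new UseCount(name));
      return name;                    // first occurrence
    }
    uc.increment();
    if (uc.get() >= useThreshold) {
      cache.put(uc.value, uc.value);  // promote to the cache
      transientMap.remove(name);
    }
    return uc.value;
  }

  // Private inner class sharing K, with count set in the constructor.
  private class UseCount {
    private int count;   // number of times the name has occurred
    final K value;       // canonical instance for the name

    UseCount(final K value) {
      this.value = value;
      this.count = 1;
    }

    void increment() { count++; }
    int get()        { return count; }
  }
}
```

With a threshold of 3, the third put() of an equal name promotes it, and every later put() returns the same canonical instance.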
But I would increase the useThreshold to 1000 or so. Should we make it configurable? Could be useful for testing.

> Namenode heap optimization - reuse objects for commonly used file names
> -----------------------------------------------------------------------
>
>                 Key: HDFS-1110
>                 URL: https://issues.apache.org/jira/browse/HDFS-1110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: 0.22.0
>
>         Attachments: hdfs-1110.2.patch, hdfs-1110.3.patch, hdfs-1110.4.patch, hdfs-1110.patch
>
>
> There are a lot of common file names used in HDFS, mainly created by mapreduce, such as file names starting with "part". Reusing byte[] corresponding to these recurring file names will save significant heap space used for storing the file names in millions of INodeFile objects.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
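The heap saving the issue description aims at - one shared byte[] per recurring name instead of one copy per INodeFile - can be illustrated with a toy interner. This is a hypothetical sketch, not the patch's code; the class and method names are invented.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy illustration of the idea in HDFS-1110: many INode-like objects
// share a single canonical byte[] per recurring file name instead of
// each holding its own copy. Not Hadoop code.
class NameInterner {
  // One canonical byte[] per distinct name.
  private final Map<String, byte[]> canonical = new HashMap<String, byte[]>();

  // Returns the shared byte[] for name, creating it on first use.
  byte[] intern(String name) {
    byte[] bytes = canonical.get(name);
    if (bytes == null) {
      bytes = name.getBytes(StandardCharsets.UTF_8);
      canonical.put(name, bytes);
    }
    return bytes;
  }
}
```

Every file named "part-00000" then references the same array, so millions of INodeFile objects cost one byte[] per distinct name rather than one per file.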