[ https://issues.apache.org/jira/browse/HDFS-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875240#action_12875240 ]

Jakob Homan commented on HDFS-1110:
-----------------------------------

Looking good. Review:
* If you keep {{NamespaceDedupe}}, which I would recommend since I think it adds value in and of itself, it's probably best to move its user-facing bits in with the rest of the offline image viewers. {{OfflineImageViewer.java}} handles all the command-line arguments and such.
* {{NamespaceDedupe.java}}: line 51 exceeds 80 characters.
* Nit: {{TestNameDictionary::testNameReuse()}} at first looked to me like a unit test that hadn't been annotated. Maybe {{verifyNameReuse}}?
* The static class {{ByteArray}} seems like a candidate for either being a stand-alone class or being wrapped by {{NameDictionary}}; it's not really an integral part of {{FSDirectory}}.
* The {{NameDictionary.lookup(name, value)}} method seems a bit odd in its usage. Both times it's called as {{dictionary.lookup(name, name)}}, which makes me wonder if this is the right API. Do we expect {{NameDictionary}} to be used elsewhere, such that this abstraction is worth the odd API?

Overall I think this is a good thing to do. The 12-second startup cost compared to the almost 2 GB savings seems worth it to me. There should be a linear tradeoff, such that small clusters see essentially no impact and large clusters pay a very small penalty at startup but have the benefits for their entire runtime. A useful improvement later on may be a safemode command to repopulate the dictionary, which would take into account changes since cluster startup, particularly newly popular filenames.
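
To make the {{lookup(name, name)}} point concrete, here is a minimal sketch of what a single-argument API could look like. This is not the code in the patch: the class names {{NameInternSketch}} and {{InternTable}} and the method {{intern}} are hypothetical, and only the idea of a {{ByteArray}} wrapper keyed by content comes from the review above. It assumes callers never mutate a name after interning it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the HDFS-1110 patch.
public class NameInternSketch {

  /** Wraps a byte[] so it can serve as a hash key compared by content. */
  static final class ByteArray {
    private final byte[] bytes;
    private final int hash; // cached, since the array is assumed immutable

    ByteArray(byte[] bytes) {
      this.bytes = bytes;
      this.hash = Arrays.hashCode(bytes);
    }

    @Override
    public int hashCode() {
      return hash;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof ByteArray
          && Arrays.equals(bytes, ((ByteArray) o).bytes);
    }
  }

  /** Maps each distinct name to one canonical byte[] instance. */
  static final class InternTable {
    private final Map<ByteArray, byte[]> table = new HashMap<>();

    /** Returns the canonical array whose content equals name. */
    byte[] intern(byte[] name) {
      ByteArray key = new ByteArray(name);
      byte[] canonical = table.get(key);
      if (canonical == null) {
        // First time this content is seen: name becomes the canonical copy.
        table.put(key, name);
        canonical = name;
      }
      return canonical;
    }

    int size() {
      return table.size();
    }
  }

  public static void main(String[] args) {
    InternTable names = new InternTable();
    byte[] a = "part-00000".getBytes(StandardCharsets.UTF_8);
    byte[] b = "part-00000".getBytes(StandardCharsets.UTF_8);
    // Two equal-content arrays resolve to the same instance.
    System.out.println(names.intern(a) == names.intern(b)); // prints "true"
    System.out.println(names.size()); // prints "1"
  }
}
```

With this shape the caller passes the name once and stores whatever comes back, so the duplicate array becomes garbage. Whether that fits depends on the question raised above, namely whether {{NameDictionary}} is meant to map names to values other than themselves.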

> Namenode heap optimization - reuse objects for commonly used file names
> -----------------------------------------------------------------------
>
>                 Key: HDFS-1110
>                 URL: https://issues.apache.org/jira/browse/HDFS-1110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: 0.22.0
>
>         Attachments: hdfs-1110.2.patch, hdfs-1110.3.patch, hdfs-1110.patch
>
>
> There are a lot of common file names used in HDFS, mainly created by
> mapreduce, such as file names starting with "part". Reusing byte[]
> corresponding to these recurring file names will save significant heap space
> used for storing the file names in millions of INodeFile objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.