[ https://issues.apache.org/jira/browse/HDFS-12922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Misha Dmitriev updated HDFS-12922:
----------------------------------
    Attachment: screenshot-1.png

> Arrays of length 1 cause 9.2% memory overhead
> ---------------------------------------------
>
>                 Key: HDFS-12922
>                 URL: https://issues.apache.org/jira/browse/HDFS-12922
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>         Attachments: screenshot-1.png
>
>
> I recently obtained a big (over 60GiB) heap dump from a customer and analyzed
> it using jxray (www.jxray.com). One source of memory waste that the tool
> detected is arrays of length 1 that come from {{BlockInfo[]
> org.apache.hadoop.hdfs.server.namenode.INodeFile.blocks}} and
> {{INode$Feature[]
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features}}. Only a small
> fraction of these arrays (less than 10%) have a length greater than 1.
> Collectively these arrays waste 5.5GiB, or 9.2% of the heap. See the attached
> screenshot for more details.
> An array of length 1 is problematic because every array in the JVM has a
> header that takes between 16 and 20 bytes, depending on the JVM
> configuration. For a big enough array this 16-20 byte overhead is not a
> concern, but if the array has only one element (which takes 4-8 bytes,
> depending on the JVM configuration), the overhead becomes bigger than the
> array's "workload".
> In such a situation it makes sense to replace the array data field {{Foo[]
> ar}} with an {{Object obj}} that contains either a direct reference to the
> array's single workload element, or a reference to the array if there is
> more than one element. This change will require further code changes and type
> casts. For example, code like {{return ar[i];}} becomes {{return (obj
> instanceof Foo) ? (Foo) obj : ((Foo[]) obj)[i];}} and so on. This doesn't
> look very pretty, but as far as I can see, the code that deals with e.g.
> INodeFile.blocks already contains various null checks, etc.
> So we will not make the code much less readable.
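
To make the proposed field replacement concrete, here is a minimal sketch of the idea in isolation. It is not HDFS code: {{Foo}} stands in for an element type such as BlockInfo, and the {{FooHolder}} class with its {{size}}, {{get}}, {{add}} and {{isArrayBacked}} methods is hypothetical, introduced only for illustration. It assumes elements are never null and that {{Foo}} is not itself an array type, so that {{instanceof Foo}} can tell the two representations apart.

```java
import java.util.Arrays;

/** Hypothetical element type standing in for something like BlockInfo. */
final class Foo {
    final long id;

    Foo(long id) {
        this.id = id;
    }
}

/** Hypothetical holder illustrating the "Object obj instead of Foo[] ar" idea. */
final class FooHolder {
    // One of: null (no elements), a Foo (exactly one element),
    // or a Foo[] of length >= 2. No array is allocated for the common
    // single-element case, so no array header is paid for it.
    private Object obj;

    int size() {
        if (obj == null) {
            return 0;
        }
        return (obj instanceof Foo) ? 1 : ((Foo[]) obj).length;
    }

    Foo get(int i) {
        if (i < 0 || i >= size()) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        // Same expression as in the issue text, after the bounds check.
        return (obj instanceof Foo) ? (Foo) obj : ((Foo[]) obj)[i];
    }

    void add(Foo e) {
        if (e == null) {
            throw new NullPointerException("e");
        }
        if (obj == null) {
            obj = e;                              // 0 -> 1: store the element directly
        } else if (obj instanceof Foo) {
            obj = new Foo[] { (Foo) obj, e };     // 1 -> 2: switch to an array
        } else {
            Foo[] old = (Foo[]) obj;
            Foo[] grown = Arrays.copyOf(old, old.length + 1);
            grown[old.length] = e;                // n -> n+1: grow the array by one
            obj = grown;
        }
    }

    /** True only when a real array has been allocated. */
    boolean isArrayBacked() {
        return obj instanceof Foo[];
    }
}
```

With this layout, a holder that received a single {{add}} call keeps a direct reference to the element, and only the second {{add}} allocates an array. The cost is the extra {{instanceof}} check and cast on every access, which is the readability trade-off the description mentions.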