[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224638#comment-14224638 ]
Daryn Sharp commented on HDFS-7435:
-----------------------------------

We're definitely on the same page. I meant "questionably" in the sense that the garbage generation rate is so high that CMS (you mention fragmentation) will be slow, bloated with free lists, and unable to keep up. Now granted, a 20M allocation is likely to be prematurely tenured, along with the IPC protobuf containing the large report. One solution/workaround to both is reducing the size of each individual BR via multiple storages. The storages don't have to be individual drives; they can just be subdirs.

Segmenting shouldn't be an outright replacement. The decode will emit a long[][], which requires updating {{StorageBlockReport}} and {{BlockListAsLongs}}. Similar changes to the datanode, although not required for the namenode changes, will be more complex. I already considered segmenting. :) I found the benefit, weighed against the complexity and time required, to be much lower than for improvements to other subsystems.

> PB encoding of block reports is very inefficient
> ------------------------------------------------
>
>                 Key: HDFS-7435
>                 URL: https://issues.apache.org/jira/browse/HDFS-7435
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-7435.patch
>
>
> Block reports are encoded as a PB repeating long. Repeating fields use an
> {{ArrayList}} with a default capacity of 10. A block report containing tens or
> hundreds of thousands of longs (3 for each replica) is extremely expensive,
> since the {{ArrayList}} must realloc many times. Also, decoding repeating
> fields will box the primitive longs, which must then be unboxed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
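To illustrate the direction discussed above, here is a minimal, self-contained sketch (not the actual HDFS patch; the class and method names are illustrative) of encoding a primitive long[] as a single blob of protobuf-style base-128 varints and decoding it straight back into a long[]. Because the element count is carried separately, the decoder allocates one primitive array up front, avoiding both the growable boxed {{ArrayList}} and the per-element box/unbox that a PB repeated long field incurs.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch: pack longs as base-128 varints into one byte blob
// instead of a PB repeated int64 field. Decoding fills a primitive long[]
// directly, so there is no ArrayList realloc and no Long boxing.
public class VarintLongs {

    // Encode each long as a varint: 7 payload bits per byte, high bit set
    // while more bytes follow.
    static byte[] encode(long[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long v : values) {
            while ((v & ~0x7FL) != 0) {
                out.write((int) ((v & 0x7F) | 0x80));
                v >>>= 7;  // unsigned shift so negative longs also terminate
            }
            out.write((int) v);
        }
        return out.toByteArray();
    }

    // Decode 'count' varints from the blob into a preallocated long[].
    // In a real report the count would come from a header field.
    static long[] decode(byte[] buf, int count) {
        long[] values = new long[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            long v = 0;
            int shift = 0;
            byte b;
            do {
                b = buf[pos++];
                v |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            values[i] = v;
        }
        return values;
    }
}
```

Small replica values (the common case for block lengths and generation stamps relative to a base) encode in a byte or two, so the blob is also typically smaller on the wire than fixed-width longs.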