[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daryn Sharp updated HDFS-7435:
------------------------------
    Attachment: HDFS-7435.patch

# Added javadoc.
# There are compelling reasons I must include the state. See below.
# iter.hasNext is insufficient to replace currentBlockIndex, i.e. the iter running dry doesn't mean all the expected blocks were read.
# Yes, the iterator is/was one-shot for decoding. I added multi-iterator support, although I think it's over-engineering for a currently unneeded use case.
# Gone by virtue of #5.
# Better yet: use an abstract factory pattern.

Kihwal expressed offline concern at bumping the layout version. I added a capabilities long to the version request. It's currently expected to act as a bitmask, but we can change that later. If we find that compression of block reports, per Colin, is viable, then we can add another bit here so the DN can conditionally compress.

Regarding always including the 1-byte state: the format, among other reasons, was intended as a precursor to improved block report creation on the DN. Currently, the DN does 3 passes over the blocks: one to filter finalized vs. under-construction; a second for BlockListAsLongs to encode into a List; a third to transfer into the PB. I initially reduced it to 2 passes by removing the filtering, but intended to reduce it to 1 in the future, which I did last night for this patch. 1-pass precludes the ability to build a buffer of blocks with varying fields.

The extra bits in the 1 byte can be used for other purposes. Kihwal wants to pass back preferred block "stickiness" via this field. I thought about robbing the upper bits of the block size; however, that increases the size of that varint by many bytes. As performance conscious as we are with our clusters, which is why I'm doing this, a few tens of KB per storage report isn't even a concern. The size of the reports is not a problem so much as the many hundreds of ms of processing time and garbage generation.
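To illustrate the capabilities-long idea: the NN advertises a bitmask in the version response and the DN tests individual feature bits before using them. This is only a sketch of the general pattern; the constant names and method are hypothetical, not taken from the patch.

```java
// Illustrative only: a capabilities long treated as a feature bitmask.
// Bit names are hypothetical, not from the HDFS-7435 patch.
public class Capabilities {
    static final long STORAGE_BLOCK_REPORT_BUFFERS = 1L << 0; // new BR encoding
    static final long COMPRESSED_BLOCK_REPORTS     = 1L << 1; // possible future bit

    // True iff every bit of 'feature' is set in 'capabilities'.
    static boolean supports(long capabilities, long feature) {
        return (capabilities & feature) == feature;
    }

    public static void main(String[] args) {
        long nnCaps = STORAGE_BLOCK_REPORT_BUFFERS; // NN advertises new encoding only
        System.out.println(supports(nnCaps, STORAGE_BLOCK_REPORT_BUFFERS)); // true
        System.out.println(supports(nnCaps, COMPRESSED_BLOCK_REPORTS));     // false
    }
}
```

The advantage over a layout version bump is that each feature can be negotiated independently without forcing an upgrade ordering.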
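The varint cost of robbing the block size's upper bits can be made concrete. Protobuf varints use 7 payload bits per byte, so setting high-order bits forces the maximum width. This sketch (plain Java, not Hadoop code) computes the width for a 128 MB block size with and without two state bits packed into the top of the long:

```java
// Sketch: protobuf varint width of a block size, with vs. without
// state bits packed into its high bits. Not Hadoop code.
public class VarintSize {
    // Bytes a value occupies as an unsigned protobuf varint (7 bits/byte).
    static int varintSize(long v) {
        int bytes = 1;
        while ((v & ~0x7FL) != 0) { // more than 7 significant bits remain
            v >>>= 7;
            bytes++;
        }
        return bytes;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;        // 128 MB block: 4-byte varint
        long packed = blockSize | (3L << 61);       // 2 state bits robbed from the top
        System.out.println(varintSize(blockSize));  // 4
        System.out.println(varintSize(packed));     // 9
        System.out.println(varintSize(3));          // 1: separate state byte
    }
}
```

So packing state into the size field inflates a typical 4-byte varint to 9 bytes per replica, while a separate state field costs a flat 1 byte.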
This patch will greatly reduce the garbage on both the NN and DN, reduce the expense of encoding/decoding, and add flexibility for the cost of 1 byte. If compression is viable, then it becomes a moot point.

> PB encoding of block reports is very inefficient
> ------------------------------------------------
>
>                 Key: HDFS-7435
>                 URL: https://issues.apache.org/jira/browse/HDFS-7435
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch
>
> Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs, which must then be unboxed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
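The inefficiency described in the issue text can be sketched outside of protobuf: accumulating N longs in a default-sized {{ArrayList}} boxes every value and repeatedly grows the backing array, whereas a pre-sized primitive array needs one allocation and no boxing. This is an illustrative standalone example, not code from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the cost pattern the issue describes; not Hadoop code.
public class BoxingCost {
    static long sumBoxed(int n) {
        List<Long> longs = new ArrayList<>(); // default capacity 10; grows via copy
        for (long i = 0; i < n; i++) {
            longs.add(i);                     // each add boxes a primitive long
        }
        long sum = 0;
        for (Long v : longs) {
            sum += v;                         // unboxed again on every read
        }
        return sum;
    }

    static long sumPrimitive(int n) {
        long[] longs = new long[n];           // one allocation, no boxing
        for (int i = 0; i < n; i++) {
            longs[i] = i;
        }
        long sum = 0;
        for (long v : longs) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 300_000; // e.g. ~100k replicas * 3 longs each
        System.out.println(sumBoxed(n) == sumPrimitive(n)); // same result, very different garbage
    }
}
```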