[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-7435:
------------------------------
    Attachment: HDFS-7435.patch

# Added javadoc.
# There are compelling reasons I must include the state.  See below.
# iter.hasNext is insufficient to replace currentBlockIndex, i.e. the iter 
running dry doesn't mean all the expected blocks were read.
# Yes, the iterator is/was one-shot for decoding.  I added multi-iterator 
support although I think it's over-engineering for a currently unneeded use 
case.
# Gone by virtue of #5
# Better yet: use an abstract factory pattern

Kihwal expressed offline concern at bumping the layout version.  I added a 
capabilities long to the version request.  It's currently expected to act as a 
bitmask but we can change that later.  If we find that compression of block 
reports, per Colin, is viable then we can add another bit here so the DN can 
conditionally compress.
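
A minimal sketch of how such a capabilities long could be tested as a bitmask. The class name, the constant names and the bit assignments are hypothetical and are not taken from the patch:

```java
class CapabilitiesSketch {
    // Hypothetical capability bits; names and positions are illustrative only.
    static final long CAP_BLOCK_REPORT_BUFFERS = 1L << 0;
    static final long CAP_COMPRESSED_BLOCK_REPORTS = 1L << 1;

    // True if every bit of the requested capability is set in the advertised mask.
    static boolean supports(long advertised, long capability) {
        return (advertised & capability) == capability;
    }

    public static void main(String[] args) {
        // Assumed scenario: the NN advertises only the new buffer encoding.
        long nnCapabilities = CAP_BLOCK_REPORT_BUFFERS;
        System.out.println(supports(nnCapabilities, CAP_BLOCK_REPORT_BUFFERS));
        System.out.println(supports(nnCapabilities, CAP_COMPRESSED_BLOCK_REPORTS));
    }
}
```

With this shape, a DN would compress a report only when the corresponding bit is set, so older NNs that never set it keep receiving uncompressed reports.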

Regarding always including the 1-byte state: the format, among other reasons, 
was intended as a precursor to improved block report creation on the DN.  
Currently, the DN does 3 passes over the blocks: one to filter finalized vs uc; 
second for BlockListAsLongs to encode into List; third to transfer into the PB. 
 I initially reduced it to 2 passes by removing the filtering, but intended to 
reduce it to 1 in the future, which I did last night for this patch.  1-pass 
precludes the ability to build a buffer of blocks with varying fields.
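
To illustrate the 1-pass idea, here is a rough sketch that writes every replica with the same fixed set of fields straight into one buffer. The {{Replica}} class, the field order and the varint helper are assumptions made for the example and do not reflect the actual wire format in the patch:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

class OnePassEncodeSketch {
    // Hypothetical replica record: the three longs per replica plus a state byte.
    static final class Replica {
        final long blockId, numBytes, genStamp;
        final byte state;
        Replica(long blockId, long numBytes, long genStamp, byte state) {
            this.blockId = blockId;
            this.numBytes = numBytes;
            this.genStamp = genStamp;
            this.state = state;
        }
    }

    // Base-128 varint: 7 payload bits per byte, high bit marks continuation.
    static void writeVarint(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Single pass: no pre-filtering by state and no intermediate List of longs.
    static byte[] encode(Iterable<Replica> replicas) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Replica r : replicas) {
            out.write(r.state);          // always present, even for finalized
            writeVarint(out, r.blockId);
            writeVarint(out, r.numBytes);
            writeVarint(out, r.genStamp);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<Replica> replicas = Arrays.asList(
            new Replica(1L, 300L, 2L, (byte) 0),
            new Replica(2L, 5L, 2L, (byte) 1));
        System.out.println(encode(replicas).length + " bytes");
    }
}
```

Because each record carries the state byte, finalized and under-construction replicas can share one buffer, which is what removes the separate filtering pass.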

The extra bits in the 1-byte can be used for other purposes.  Kihwal wants to 
pass back preferred block "stickiness" via this field.  I thought about robbing 
the upper bits of the block size.  However that increases the size of that 
varint by many bytes.
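
The varint cost can be checked with a small sketch. It assumes the usual base-128 varint (7 payload bits per byte) and uses a 128 MB block size as an illustrative value; the flag positions are hypothetical:

```java
class VarintSizeSketch {
    // Number of bytes a base-128 varint needs for v, treated as unsigned.
    static int varintSize(long v) {
        int size = 1;
        while ((v & ~0x7FL) != 0) {
            size++;
            v >>>= 7;
        }
        return size;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;     // illustrative 128 MB block
        long withTopBit = blockSize | (1L << 63); // hypothetical flag in bit 63
        System.out.println(varintSize(blockSize));
        System.out.println(varintSize(withTopBit));
    }
}
```

Under these assumptions the plain size needs 4 bytes, while the same value with a flag in the top bit needs 10, so a separate 1-byte field is cheaper than stealing upper bits.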

As performance conscious as we are with our clusters, which is why I'm doing 
this, a few tens of KB per storage report isn't even a concern.  The size of 
the reports is not a problem so much as the many hundreds of ms of processing 
time and garbage generation.  This patch will greatly reduce the garbage on 
both the NN and DN, cut the expense of encoding/decoding, and add flexibility, 
all for the cost of 1 byte.  If compression is viable, then it becomes a moot 
point.


> PB encoding of block reports is very inefficient
> ------------------------------------------------
>
>                 Key: HDFS-7435
>                 URL: https://issues.apache.org/jira/browse/HDFS-7435
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
> HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch
>
>
> Block reports are encoded as a PB repeating long.  Repeating fields use an 
> {{ArrayList}} with default capacity of 10.  A block report containing tens or 
> hundreds of thousands of longs (3 for each replica) is extremely expensive 
> since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
> fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
