[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

Mingliang Liu (JIRA) Mon, 30 Nov 2015 11:15:32 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032285#comment-15032285
 ]


Mingliang Liu commented on HDFS-7435:
-------------------------------------

Thanks for your comments, [~daryn] and [~shv]. I started working on 
{{NNThroughputBenchmark}} only recently and knew nothing about it before that. 
If the empty block report list in this patch was not intentional, I don't 
need any more context on this patch. Sure, I'd be glad to make the change we 
discussed above. The jira is [HDFS-9484]. Let's continue the discussion of 
the fix there.

The unit tests probably passed because {{TestNNThroughputBenchmark}} is more 
of a driver that runs the benchmark with default parameters than a real unit 
test that asserts expected behavior for different scenarios. If we need a 
more sophisticated unit test, perhaps we can address it separately.

> PB encoding of block reports is very inefficient
> ------------------------------------------------
>
>                 Key: HDFS-7435
>                 URL: https://issues.apache.org/jira/browse/HDFS-7435
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>             Fix For: 2.7.0
>
>         Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
> HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch
>
>
> Block reports are encoded as a PB repeated long.  Repeated fields use an 
> {{ArrayList}} with a default capacity of 10.  A block report containing tens or 
> hundreds of thousands of longs (3 for each replica) is extremely expensive 
> since the {{ArrayList}} must reallocate many times.  Also, decoding repeated 
> fields boxes the primitive longs, which must then be unboxed.
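
To make the cost described above concrete, here is a minimal sketch. It is 
not code from any of the attached patches. The class and method names are 
hypothetical, and the growth model (start at capacity 10, grow by about 1.5x) 
is an assumption based on how OpenJDK's {{ArrayList}} behaves. It compares 
filling a default-capacity {{ArrayList<Long>}} with filling a pre-sized 
{{long[]}}, and counts how many times the list would have to grow.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockReportSketch {
    /** Longs per replica, as stated in the issue description. */
    static final int LONGS_PER_REPLICA = 3;

    /** Boxed path: default-capacity list, every value boxed on add and unboxed on read. */
    static long sumBoxed(int replicas) {
        List<Long> values = new ArrayList<>(); // no size hint, so the backing array regrows
        int total = replicas * LONGS_PER_REPLICA;
        for (long i = 0; i < total; i++) {
            values.add(i); // autoboxes long -> Long
        }
        long sum = 0;
        for (Long v : values) {
            sum += v; // unboxes Long -> long
        }
        return sum;
    }

    /** Primitive path: one allocation of the exact size, no boxing. */
    static long sumPrimitive(int replicas) {
        long[] values = new long[replicas * LONGS_PER_REPLICA];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    /**
     * Number of times a list would regrow to hold the given element count,
     * ASSUMING an initial capacity of 10 and growth of capacity + capacity/2.
     */
    static int growthSteps(int elements) {
        int capacity = 10;
        int steps = 0;
        while (capacity < elements) {
            capacity += capacity >> 1;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        int replicas = 100_000; // hypothetical report size
        System.out.println("same sum: " + (sumBoxed(replicas) == sumPrimitive(replicas)));
        System.out.println("growth steps: " + growthSteps(replicas * LONGS_PER_REPLICA));
    }
}
```

Both paths produce the same values; the difference is that the boxed path 
allocates a {{Long}} per element and copies the backing array on each growth 
step, while the primitive path allocates once.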



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
