[ 
https://issues.apache.org/jira/browse/HDFS-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954355#comment-14954355
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9011:
-------------------------------------------

Here is a new idea: we may partition the block ID space so that a datanode can 
send multiple small block reports, one per partition, instead of a single full 
block report.  The partitions need not be fixed.
- When a full block report is larger than a threshold, the report is split into 
two reports, one for blocks with odd IDs and one for blocks with even IDs.  If 
these reports are still too large, they are split into four reports with binary 
ID suffixes 00, 01, 10 and 11.  The process continues until the reports are 
smaller than the threshold.  The datanode sends each partitioned report with 
its suffix.
- Since the block ID space is partitioned, the Namenode can process each 
partitioned report without knowing the remaining partitioned reports.
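
The two points above can be sketched roughly as follows.  This is only an 
illustration of the idea, not code from any HDFS patch: the function names, the 
max_bits cap, and the choice to treat a report as small enough when its size is 
at most the threshold are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple


def split_block_report(block_ids: Iterable[int], threshold: int,
                       max_bits: int = 16) -> Dict[str, List[int]]:
    """Datanode side (hypothetical): partition block IDs by binary suffix.

    The suffix length grows by one bit at a time (1 report, then 2, then 4,
    ...) until every partition holds at most `threshold` IDs, or until the
    assumed safety cap `max_bits` is reached.
    """
    ids = list(block_ids)
    bits = 0
    while True:
        mask = (1 << bits) - 1
        # Create every suffix up front so empty partitions are reported too.
        parts: Dict[int, List[int]] = {s: [] for s in range(1 << bits)}
        for bid in ids:
            parts[bid & mask].append(bid)
        if bits >= max_bits or all(len(p) <= threshold for p in parts.values()):
            break
        bits += 1
    # Key each partition by its suffix string; "" means an unsplit report.
    return {(format(s, f"0{bits}b") if bits else ""): p
            for s, p in parts.items()}


def diff_partition(known_ids: Iterable[int], suffix: str,
                   reported: Iterable[int]) -> Tuple[Set[int], Set[int]]:
    """Namenode side (hypothetical): process one partitioned report alone.

    Only known blocks whose ID ends in `suffix` are compared, so the other
    partitioned reports are not needed.  Returns (to_add, to_remove).
    """
    bits = len(suffix)
    value = int(suffix, 2) if suffix else 0
    mask = (1 << bits) - 1
    known_in_part = {b for b in known_ids if (b & mask) == value}
    reported_set = set(reported)
    return reported_set - known_in_part, known_in_part - reported_set


reports = split_block_report(range(1, 11), threshold=3)
to_add, to_remove = diff_partition({1, 2, 3, 5, 13}, "01", reports["01"])
```

In this sketch the datanode also emits the empty partitions, so that the 
Namenode can tell "no blocks with this suffix" apart from "report not yet 
received".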

> Support splitting BlockReport of a storage into multiple RPC
> ------------------------------------------------------------
>
>                 Key: HDFS-9011
>                 URL: https://issues.apache.org/jira/browse/HDFS-9011
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-9011.000.patch, HDFS-9011.001.patch, 
> HDFS-9011.002.patch
>
>
> Currently, if a DataNode has too many blocks (more than 1m by default), it 
> sends multiple RPCs to the NameNode for the block report, each RPC containing 
> the report for a single storage. However, in practice we have seen that even 
> a single storage can contain a large number of blocks, so that its report 
> exceeds the max RPC data length. It may be helpful to support sending 
> multiple RPCs for the block report of a single storage. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
