[ https://issues.apache.org/jira/browse/HDFS-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733188#comment-14733188 ]

Yi Liu commented on HDFS-9011:
------------------------------

Thanks [~jingzhao] for working on this. A few comments in addition to Nicholas':
*1.* In BlockPoolSlice
{code}
+  private void saveReplicas(List<BlockListAsLongs> persistList) {
+    if (persistList == null || persistList.isEmpty()) {
       return;
     }
     File tmpFile = new File(currentDir, REPLICA_CACHE_FILE + ".tmp");
@@ -787,7 +787,9 @@ private void saveReplicas(BlockListAsLongs blocksListToPersist) {
     FileOutputStream out = null;
     try {
       out = new FileOutputStream(tmpFile);
-      blocksListToPersist.writeTo(out);
+      for (BlockListAsLongs blockLists : persistList) {
+        blockLists.writeTo(out);
+      }
{code}
Now we write a *list* of {{BlockListAsLongs}} to {{REPLICA_CACHE_FILE}}, so we
should also change the logic of {{readReplicasFromCache}}:
{code}
BlockListAsLongs blocksList =  BlockListAsLongs.readFrom(inputStream);
{code}
It currently reads only the first {{BlockListAsLongs}}.

Also in {{saveReplicas}}, if a {{BlockListAsLongs}} contains zero blocks, it's
better not to persist it; otherwise there will be a NullPointerException while
reading replicas from the cache file.
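The two points above can be sketched with a self-contained model. Note this is a hypothetical count-prefixed encoding for illustration only, not the real {{BlockListAsLongs}} wire format: the writer skips empty lists, and the reader loops until EOF instead of stopping after the first list.
{code:java}
import java.io.*;
import java.util.*;

public class ReplicaCacheSketch {
  // Hypothetical encoding: each list is an int count followed by that many
  // longs. The real BlockListAsLongs serialization differs; only the
  // skip-empty / read-until-EOF pattern is the point here.
  static void saveReplicas(List<long[]> persistList, OutputStream out)
      throws IOException {
    DataOutputStream dos = new DataOutputStream(out);
    for (long[] list : persistList) {
      if (list.length == 0) {
        continue;  // skip empty lists so the reader never sees an empty record
      }
      dos.writeInt(list.length);
      for (long id : list) {
        dos.writeLong(id);
      }
    }
    dos.flush();
  }

  // Read *all* lists until EOF, not just the first one.
  static List<long[]> readReplicas(InputStream in) throws IOException {
    DataInputStream dis = new DataInputStream(in);
    List<long[]> result = new ArrayList<>();
    while (true) {
      int count;
      try {
        count = dis.readInt();
      } catch (EOFException eof) {
        break;  // no more lists in the cache file
      }
      long[] list = new long[count];
      for (int i = 0; i < count; i++) {
        list[i] = dis.readLong();
      }
      result.add(list);
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    saveReplicas(Arrays.asList(
        new long[]{1, 2, 3}, new long[0], new long[]{4, 5}), buf);
    List<long[]> lists = readReplicas(new ByteArrayInputStream(buf.toByteArray()));
    System.out.println(lists.size());     // the empty list was skipped
    System.out.println(lists.get(1)[1]);
  }
}
{code}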

*2.* We should also update the description of
{{dfs.blockreport.split.threshold}} in hdfs-default.xml.
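For example, something along these lines (suggested wording only, and the default value shown should be verified against the branch; the final text should match the patch's behavior):
{code:xml}
<property>
  <name>dfs.blockreport.split.threshold</name>
  <value>1000000</value>
  <description>If the number of blocks reported by a DataNode is below this
    threshold, the block report is sent as a single RPC. Otherwise the report
    is split into multiple RPCs, and with this change the report of a single
    storage may itself span more than one RPC.
  </description>
</property>
{code}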

Nits: some lines are longer than 80 characters in the patch.

> Support splitting BlockReport of a storage into multiple RPC
> ------------------------------------------------------------
>
>                 Key: HDFS-9011
>                 URL: https://issues.apache.org/jira/browse/HDFS-9011
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-9011.000.patch, HDFS-9011.001.patch, 
> HDFS-9011.002.patch
>
>
> Currently if a DataNode has too many blocks (more than 1m by default), it 
> sends multiple RPCs to the NameNode for the block report, each RPC containing 
> the report for a single storage. However, in practice we've seen that 
> sometimes even a single storage can contain a large number of blocks, and the 
> report can exceed the max RPC data length. It may be helpful to support 
> sending multiple RPCs for the block report of a single storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)