[ https://issues.apache.org/jira/browse/HDFS-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719619#comment-17719619 ]

ASF GitHub Bot commented on HDFS-16999:
---------------------------------------

Hexiaoqiao commented on PR #5622:
URL: https://github.com/apache/hadoop/pull/5622#issuecomment-1535670070

   @zhangshuyan0 Thanks for your proposal and for trying to fix this issue. At a 
quick glance, the PR proposes using `processReport` instead of 
`processFirstBlockReport` only when a DataNode restarts, right? I am concerned 
about the performance impact if we do that.
   a. Is it possible to improve `processFirstBlockReport` itself to solve this issue?
   b. Will it affect the processing logic or performance when a new DataNode is 
added to the cluster?
   Thanks again.




> Fix wrong use of processFirstBlockReport()
> ------------------------------------------
>
>                 Key: HDFS-16999
>                 URL: https://issues.apache.org/jira/browse/HDFS-16999
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Shuyan Zhang
>            Assignee: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> `processFirstBlockReport()` is used to process the first block report from a 
> datanode. It does not calculate the `toRemove` list because it assumes that 
> the namenode holds no metadata about the datanode. However, if a datanode 
> re-registers after restarting, its `blockReportCount` is reset to 0. That is, 
> the first block report after a datanode restarts is also processed by 
> `processFirstBlockReport()`. This is unreasonable because the namenode already 
> holds metadata for that datanode at this point. If the stale replica metadata 
> is not removed in time, blocks that have actually lost replicas still appear 
> sufficiently replicated and cannot be reconstructed in time, which increases 
> the risk of missing blocks. In summary, `processFirstBlockReport()` should 
> only be used when the namenode restarts, not when a datanode restarts.
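The behaviour described above can be illustrated with a small Java toy model. This is a sketch under stated assumptions, not Hadoop's actual `BlockManager` code: the class `BlockReportSketch`, its fields, and the `onReportCountOnly` / `onReportIfEmpty` dispatch methods are hypothetical. Only the names `processFirstBlockReport`, `processReport`, `toRemove`, and `blockReportCount` come from the issue text. The model shows how keying the fast path on `blockReportCount == 0` alone leaves a stale replica behind after a datanode restart, and how one hypothetical guard (taking the fast path only when the namenode holds no replica metadata for the storage) avoids it.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical toy model (NOT Hadoop's BlockManager) of how a namenode
 * might apply block reports from a single datanode storage.
 */
public class BlockReportSketch {
    // Replica metadata the namenode holds for this storage (block IDs).
    private final Set<Long> storedBlocks = new HashSet<>();
    private int blockReportCount = 0;

    /** Fast path: only adds reported blocks and never computes toRemove. */
    void processFirstBlockReport(Set<Long> reported) {
        storedBlocks.addAll(reported);
    }

    /** Full path: diffs the report against stored metadata; returns toRemove. */
    Set<Long> processReport(Set<Long> reported) {
        Set<Long> toRemove = new HashSet<>(storedBlocks);
        toRemove.removeAll(reported);     // stored but no longer reported
        storedBlocks.removeAll(toRemove); // drop stale replica metadata
        storedBlocks.addAll(reported);
        return toRemove;
    }

    /** Re-registration after a datanode restart resets the counter. */
    void reRegister() {
        blockReportCount = 0;
    }

    /** Dispatch keyed only on blockReportCount, as described in the issue. */
    void onReportCountOnly(Set<Long> reported) {
        if (blockReportCount == 0) {
            processFirstBlockReport(reported);
        } else {
            processReport(reported);
        }
        blockReportCount++;
    }

    /** Hypothetical guard: fast path only if no metadata exists yet. */
    void onReportIfEmpty(Set<Long> reported) {
        if (blockReportCount == 0 && storedBlocks.isEmpty()) {
            processFirstBlockReport(reported);
        } else {
            processReport(reported);
        }
        blockReportCount++;
    }

    Set<Long> stored() {
        return storedBlocks;
    }

    public static void main(String[] args) {
        BlockReportSketch a = new BlockReportSketch();
        a.onReportCountOnly(Set.of(1L, 2L, 3L)); // initial registration
        a.reRegister();                          // restart; block 3 was lost
        a.onReportCountOnly(Set.of(1L, 2L));
        System.out.println("count-only keeps block 3: " + a.stored().contains(3L));

        BlockReportSketch b = new BlockReportSketch();
        b.onReportIfEmpty(Set.of(1L, 2L, 3L));
        b.reRegister();
        b.onReportIfEmpty(Set.of(1L, 2L));
        System.out.println("if-empty keeps block 3:   " + b.stored().contains(3L));
    }
}
```

Running `main` prints `true` for the count-only dispatch (the stale replica of block 3 survives the restart) and `false` for the guarded dispatch. In this model the guard still lets a genuinely empty storage, such as a freshly added datanode, take the cheaper add-only path, which is the trade-off raised in the review comment; the real cost profile of Hadoop's report processing is not modelled here.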



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
