[ 
https://issues.apache.org/jira/browse/HDFS-16898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683978#comment-17683978
 ] 

ASF GitHub Bot commented on HDFS-16898:
---------------------------------------

virajjasani commented on PR #5330:
URL: https://github.com/apache/hadoop/pull/5330#issuecomment-1416204631

   > Hi, @virajjasani. Thanks for your careful review. Indeed, before 
[HDFS-6788](https://issues.apache.org/jira/browse/HDFS-6788), this part was 
covered by a synchronized lock. But the methods `processCommandFromActive` and 
`processCommandFromStandby` use the actor parameter only to print log info. 
The lock here only serves to decide whether actor is bpServiceToActive, and 
thus whether to execute processCommandFromActive or 
processCommandFromStandby.
   > 
   > When a switchover occurs between the active namenode and the standby 
namenode, the datanodes are marked stale. While they are stale, we are not 
allowed to delete blocks directly; we put those blocks into 
postponedMisreplicatedBlocks. So even if we execute the DatanodeCommand from 
the previous active namenode (now standby), it is okay.
   
   Thank you @hfutatzhanghb.
   I was just going to say that we don't need the write lock to verify whether 
the current actor is the one connected to the active namenode; the read lock 
would be sufficient. But it looks like you already made the change.
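   
   To illustrate the read-lock point, here is a minimal, self-contained sketch. 
It assumes a simplified stand-in class: `ActorDispatchSketch`, `setActiveActor`, 
and the string-returning handlers are hypothetical names for illustration, not 
the actual `BPOfferService` code or the exact change in the PR.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for BPOfferService; not the real Hadoop class. */
class ActorDispatchSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Guarded by lock; plays the role of bpServiceToActive.
  private Object activeActor;

  /** Mimics the heartbeat path, which takes the write lock to switch actors. */
  void setActiveActor(Object actor) {
    lock.writeLock().lock();
    try {
      activeActor = actor;
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Holds the read lock only for the identity check, then dispatches. */
  String processCommandFromActor(String cmd, Object actor) {
    boolean isActive;
    lock.readLock().lock();
    try {
      isActive = (actor == activeActor);
    } finally {
      lock.readLock().unlock();
    }
    // The potentially slow command handling runs with no lock held,
    // so a concurrent setActiveActor() call is not blocked by it.
    return isActive
        ? processCommandFromActive(cmd)
        : processCommandFromStandby(cmd);
  }

  // Placeholder handlers (assumed); the real ones execute DatanodeCommands.
  private String processCommandFromActive(String cmd) {
    return "active:" + cmd;
  }

  private String processCommandFromStandby(String cmd) {
    return "standby:" + cmd;
  }
}
```

   The trade-off in this sketch is that the actor's role can change between 
the check and the dispatch, which is the case the quoted comment above argues 
is acceptable.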
   
   I took a quick look and we haven't hit this log line in our clusters so 
far, but this PR has an interesting fix. I will check further for any other 
resource contention.
   




> Make write lock fine-grain in processCommandFromActor method
> ------------------------------------------------------------
>
>                 Key: HDFS-16898
>                 URL: https://issues.apache.org/jira/browse/HDFS-16898
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.3.4
>            Reporter: ZhangHB
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, in method processCommandFromActor, we have code like this:
>  
> {code:java}
> writeLock();
> try {
>   if (actor == bpServiceToActive) {
>     return processCommandFromActive(cmd, actor);
>   } else {
>     return processCommandFromStandby(cmd, actor);
>   }
> } finally {
>   writeUnlock();
> } {code}
> If processCommandFromActive takes a long time, the write lock is not 
> released until it returns.
>  
> This may block the updateActorStatesFromHeartbeat method in 
> offerService. Furthermore, it can drive the datanode's lastcontact very 
> high, and the datanode is even marked dead once lastcontact exceeds 600s.
> {code:java}
> bpos.updateActorStatesFromHeartbeat(
>     this, resp.getNameNodeHaState());{code}
> Here we can make the write lock fine-grained in the processCommandFromActor 
> method to address this problem.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

