[ https://issues.apache.org/jira/browse/HDFS-16898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683978#comment-17683978 ]
ASF GitHub Bot commented on HDFS-16898:
---------------------------------------

virajjasani commented on PR #5330:
URL: https://github.com/apache/hadoop/pull/5330#issuecomment-1416204631

> Hi, @virajjasani. Thanks for your careful review. Indeed, before [HDFS-6788](https://issues.apache.org/jira/browse/HDFS-6788), this part was covered by a synchronized lock. But the methods `processCommandFromActive` and `processCommandFromStandby` use the parameter actor only to print log info. The lock here only decides whether actor is bpServiceToActive, and therefore whether to execute processCommandFromActive or processCommandFromStandby.
>
> When a switchover occurs between the active namenode and the standby namenode, the datanodes are set to stale status. In stale status we are not allowed to delete blocks directly; we put those blocks into postponedMisreplicatedBlocks. So even if we execute the DatanodeCommand from the previous active namenode (now standby), it is okay.

Thank you @hfutatzhanghb. I was just going to state that we don't need the write lock to verify whether the current actor is the one connected to the active namenode; the read lock would be sufficient. But it looks like you already made that change.
From a quick glance, we have not hit this log line in our clusters so far, but this PR has an interesting fix. I will check this further for any more resource contention.
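To illustrate the read-lock point, here is a minimal, self-contained sketch. It is not the code from the PR: the class `FineGrainedLockSketch`, the field `activeActor` (a stand-in for bpServiceToActive), and the string-returning handlers are all hypothetical. It assumes the lock only has to guard the comparison against the active actor, so the command itself can be processed after the lock is released.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the locking pattern discussed above; not Hadoop code.
class FineGrainedLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Assumption: plays the role of bpServiceToActive.
    private Object activeActor;
    // Read holds observed by the current thread while a command was processed.
    int readHoldsWhileProcessing = -1;

    // State changes (e.g. after a heartbeat reports a new active) take the write lock.
    void setActive(Object actor) {
        lock.writeLock().lock();
        try {
            activeActor = actor;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Only the identity check is done under the (shared) read lock.
    String processCommand(String cmd, Object actor) {
        boolean fromActive;
        lock.readLock().lock();
        try {
            fromActive = (actor == activeActor);
        } finally {
            lock.readLock().unlock();
        }
        // Potentially slow work runs with no lock held.
        return fromActive ? processFromActive(cmd) : processFromStandby(cmd);
    }

    private String processFromActive(String cmd) {
        readHoldsWhileProcessing = lock.getReadHoldCount();
        return "active:" + cmd;
    }

    private String processFromStandby(String cmd) {
        readHoldsWhileProcessing = lock.getReadHoldCount();
        return "standby:" + cmd;
    }

    public static void main(String[] args) {
        FineGrainedLockSketch s = new FineGrainedLockSketch();
        Object nn1 = new Object();
        Object nn2 = new Object();
        s.setActive(nn1);
        System.out.println(s.processCommand("cmd", nn1));
        System.out.println(s.processCommand("cmd", nn2));
    }
}
```

With this shape, a slow command handler no longer delays a writer such as the heartbeat state update. The trade-off is the one described in the quoted reply: the active actor may change between the check and the processing, which is acceptable only if executing a command from the previous active is harmless.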
> Make write lock fine-grain in processCommandFromActor method
> ------------------------------------------------------------
>
>                 Key: HDFS-16898
>                 URL: https://issues.apache.org/jira/browse/HDFS-16898
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>   Affects Versions: 3.3.4
>            Reporter: ZhangHB
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, the method processCommandFromActor contains code like the following:
>
> {code:java}
> writeLock();
> try {
>   if (actor == bpServiceToActive) {
>     return processCommandFromActive(cmd, actor);
>   } else {
>     return processCommandFromStandby(cmd, actor);
>   }
> } finally {
>   writeUnlock();
> } {code}
> If processCommandFromActive takes a long time, the write lock is not
> released until it returns.
>
> This may block the updateActorStatesFromHeartbeat method in offerService.
> Furthermore, it can make the datanode's lastcontact very high, and the
> datanode may even be marked dead when lastcontact exceeds 600s.
> {code:java}
> bpos.updateActorStatesFromHeartbeat(
>     this, resp.getNameNodeHaState());{code}
> We can make the write lock fine-grained in the processCommandFromActor
> method to address this problem.
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org