[ https://issues.apache.org/jira/browse/HDFS-14361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793450#comment-16793450 ]
star edited comment on HDFS-14361 at 3/15/19 8:30 AM:
------------------------------------------------------

[~brahmareddy], please wait. Maybe we should reconsider the design doc [^Multiple-Standby-NameNodes_V1.pdf]. The following code is from that doc:
{code:java}
boolean sendRequest = needCheckpoint || isPrimaryCheckPointer || secsSinceLast >= checkpointConf.getQuietPeriod();{code}
Furthermore, according to the comments above line 429, the SNN sends the download image request in 3 cases:
 * rollback request
 * are the checkpointer
 * are outside the quiet period

But in the patch, the SNN sends the download request only in the latter two cases. I think this causes issue HDFS-12248.
{code:java}
boolean sendRequest = isPrimaryCheckPointer || secsSinceLast >= checkpointConf.getQuietPeriod();{code}
I think we should move the isPrimaryCheckPointer assignment outside of the 'if' block to avoid an inconsistent state in which more than one SNN has isPrimaryCheckPointer = true, though that state will not break anything.

Fix code like this:
{code:java}
boolean sendRequest = needRollbackCheckpoint || isPrimaryCheckPointer || secsSinceLast >= checkpointConf.getQuietPeriod();{code}
Or we should fix the comments.

was (Author: starphin):
[~brahmareddy], please wait. Maybe we should reconsider the design doc [^Multiple-Standby-NameNodes_V1.pdf]. The following code is from that doc:
{code:java}
boolean sendRequest = needCheckpoint || isPrimaryCheckPointer || secsSinceLast >= checkpointConf.getQuietPeriod();{code}
Furthermore, according to the comments above line 429, the SNN sends the download image request in 3 cases:
 * rollback request
 * are the checkpointer
 * are outside the quiet period

But in the patch, the SNN sends the download request only in the latter two cases. I think this causes issue HDFS-12248.
boolean sendRequest = isPrimaryCheckPointer || secsSinceLastUpload >= checkpointConf.getQuietPeriod();

I think we should move the isPrimaryCheckPointer assignment outside of the 'if' block to avoid an inconsistent state in which more than one SNN has isPrimaryCheckPointer = true, though that state will not break anything.

Fix code like this:

boolean sendRequest = needRollbackCheckpoint || isPrimaryCheckPointer || secsSinceLastUpload >= checkpointConf.getQuietPeriod();

Thus all SNNs will send the request every time rollbackCheckpoint is triggered. Or we should fix the comments.

> SNN will always upload fsimage
> ------------------------------
>
>                 Key: HDFS-14361
>                 URL: https://issues.apache.org/jira/browse/HDFS-14361
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, namenode
>    Affects Versions: 3.2.0
>            Reporter: hunshenshi
>            Priority: Major
>             Fix For: 3.2.0
>
>
> Related to -HDFS-12248.-
> {code:java}
> boolean sendRequest = isPrimaryCheckPointer
>     || secsSinceLastUpload >= checkpointConf.getQuietPeriod();
> doCheckpoint(sendRequest);
> {code}
> If sendRequest is true, the SNN will upload the fsimage. But isPrimaryCheckPointer is always true:
> {code:java}
> if (ie == null && ioe == null) {
>   //Update only when response from remote about success or
>   lastUploadTime = monotonicNow();
>   // we are primary if we successfully updated the ANN
>   this.isPrimaryCheckPointer = success;
> }
> {code}
> The isPrimaryCheckPointer assignment should be outside the if condition.
> If the ANN update was not successful, then isPrimaryCheckPointer should be set to false.
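
For reference, the two changes discussed in this thread can be sketched in isolation as follows. This is only an illustrative sketch, not the actual patch: the class CheckpointDecisionSketch, the methods shouldSendRequest and recordUploadResult, and the hadException parameter are hypothetical names introduced here. It also assumes that an upload attempt that ends in an exception counts as unsuccessful.

```java
// Illustrative sketch only; this class and its methods are hypothetical
// and are not part of the Hadoop code base.
final class CheckpointDecisionSketch {

    // Mirrors the isPrimaryCheckPointer flag discussed in the issue.
    private boolean isPrimaryCheckPointer = false;

    // The three cases from the code comments: a rollback request, being the
    // primary checkpointer, or being outside the quiet period.
    static boolean shouldSendRequest(boolean needRollbackCheckpoint,
                                     boolean isPrimaryCheckPointer,
                                     long secsSinceLastUpload,
                                     long quietPeriodSecs) {
        return needRollbackCheckpoint
            || isPrimaryCheckPointer
            || secsSinceLastUpload >= quietPeriodSecs;
    }

    // Updates the flag on every attempt, not only when no exception occurred,
    // so that a failed update to the ANN resets it to false.
    // Assumption: an attempt that threw an exception is treated as a failure.
    void recordUploadResult(boolean success, boolean hadException) {
        this.isPrimaryCheckPointer = success && !hadException;
    }

    boolean isPrimaryCheckPointer() {
        return isPrimaryCheckPointer;
    }

    public static void main(String[] args) {
        CheckpointDecisionSketch snn = new CheckpointDecisionSketch();
        snn.recordUploadResult(false, true); // failed upload attempt
        System.out.println("primary after failed upload: "
            + snn.isPrimaryCheckPointer());
        System.out.println("send on rollback inside quiet period: "
            + shouldSendRequest(true, snn.isPrimaryCheckPointer(), 10L, 100L));
    }
}
```

With the flag reset on failure, an SNN that is not the primary checkpointer sends the request only for a rollback checkpoint or once the quiet period has elapsed.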