[ https://issues.apache.org/jira/browse/HDFS-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jing Zhao updated HDFS-6019:
----------------------------

    Attachment: HDFS-6019.001.patch

Thanks for the nice test and fix, Haohui! +1

Only one nit: the {{(timeout = 300000)}} in {{TestRollingUpgrade#testQuery}} is still commented out. The 001 patch makes this trivial change.

> Standby NN might not checkpoint when processing the rolling upgrade marker
> --------------------------------------------------------------------------
>
>                 Key: HDFS-6019
>                 URL: https://issues.apache.org/jira/browse/HDFS-6019
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, ha, hdfs-client, namenode
>            Reporter: Haohui Mai
>            Assignee: Haohui Mai
>         Attachments: HDFS-6019.000.patch, HDFS-6019.001.patch
>
>
> {{FSEditLogLoader}} calls {{FSNamesystem#triggerRollbackCheckpoint()}} when processing the rollback marker. That method looks like this:
> {code}
> void triggerRollbackCheckpoint() {
>   if (standbyCheckpointer != null) {
>     standbyCheckpointer.triggerRollbackCheckpoint();
>   }
> }
> {code}
> There is a race condition in which {{standbyCheckpointer}} can still be {{null}}: in the {{NameNode}} constructor, the {{initialize()}} method eventually starts the edit log tailer, but the standby checkpointer is only created later, in {{HAState#enterState()}}. If the tailer applies the marker before the checkpointer exists, the call above is silently skipped, so the standby NN might not perform the rollback checkpoint.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
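A minimal, deterministic Java sketch of the race described in HDFS-6019. Every class and method here ({{NullCheckNamesystem}}, {{FlaggedNamesystem}}, {{Checkpointer}}, {{checkpointerTick}}, and so on) is a hypothetical stand-in, not Hadoop's real {{FSNamesystem}} or {{StandbyCheckpointer}}. It replays the unlucky ordering (the tailer applies the marker before HA state entry creates the checkpointer) and contrasts the null-check pattern quoted in the issue with one possible alternative that records a pending request flag. This alternative is an illustration only and is not necessarily what the attached patches do.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the HDFS-6019 ordering problem; not Hadoop code.
public class RollbackCheckpointRace {

  /** Hypothetical stand-in for the standby checkpointer; counts rollback checkpoints. */
  static final class Checkpointer {
    int rollbackCheckpoints = 0;

    void triggerRollbackCheckpoint() {
      rollbackCheckpoints++;
    }
  }

  /** Mirrors the quoted snippet: a request arriving before the checkpointer exists is dropped. */
  static final class NullCheckNamesystem {
    volatile Checkpointer standbyCheckpointer; // created later, on HA state entry

    void triggerRollbackCheckpoint() {
      if (standbyCheckpointer != null) {
        standbyCheckpointer.triggerRollbackCheckpoint();
      }
      // Otherwise the request is silently lost.
    }

    void enterStandbyState() {
      standbyCheckpointer = new Checkpointer();
    }
  }

  /** Hypothetical alternative: remember the request so a late-created checkpointer honors it. */
  static final class FlaggedNamesystem {
    private final AtomicBoolean needRollbackCheckpoint = new AtomicBoolean(false);
    volatile Checkpointer standbyCheckpointer;

    void triggerRollbackCheckpoint() {
      needRollbackCheckpoint.set(true); // survives even if no checkpointer exists yet
    }

    void enterStandbyState() {
      standbyCheckpointer = new Checkpointer();
    }

    /** One iteration of the checkpointer's work loop: consume any pending request. */
    void checkpointerTick() {
      Checkpointer c = standbyCheckpointer;
      if (c != null && needRollbackCheckpoint.getAndSet(false)) {
        c.triggerRollbackCheckpoint();
      }
    }
  }

  /** Unlucky ordering: the tailer applies the marker, then the checkpointer is created. */
  static int checkpointsWithNullCheck() {
    NullCheckNamesystem ns = new NullCheckNamesystem();
    ns.triggerRollbackCheckpoint(); // edit log tailer sees the marker first
    ns.enterStandbyState();         // checkpointer created afterwards
    return ns.standbyCheckpointer.rollbackCheckpoints;
  }

  /** Same ordering, but the request is recorded as a flag and consumed later. */
  static int checkpointsWithFlag() {
    FlaggedNamesystem ns = new FlaggedNamesystem();
    ns.triggerRollbackCheckpoint();
    ns.enterStandbyState();
    ns.checkpointerTick();
    return ns.standbyCheckpointer.rollbackCheckpoints;
  }

  public static void main(String[] args) {
    System.out.println("null check: " + checkpointsWithNullCheck()); // prints "null check: 0"
    System.out.println("flag:       " + checkpointsWithFlag());      // prints "flag:       1"
  }
}
```

The flag variant changes the question from "is there a checkpointer right now?" to "has a rollback checkpoint been requested?", so the outcome no longer depends on whether the tailer or {{HAState#enterState()}} runs first.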