[ 
https://issues.apache.org/jira/browse/HDFS-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6019:
----------------------------

    Attachment: HDFS-6019.001.patch

Thanks for the nice test and fix, Haohui! +1

Only one nit: the (timeout = 300000) is still commented out in 
TestRollingUpgrade#testQuery. The 001 patch makes this trivial change.

> Standby NN might not checkpoint when processing the rolling upgrade marker
> --------------------------------------------------------------------------
>
>                 Key: HDFS-6019
>                 URL: https://issues.apache.org/jira/browse/HDFS-6019
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, ha, hdfs-client, namenode
>            Reporter: Haohui Mai
>            Assignee: Haohui Mai
>         Attachments: HDFS-6019.000.patch, HDFS-6019.001.patch
>
>
> {{FSEditLogLoader}} will call {{FSNamesystem#triggerRollbackCheckpoint()}} 
> when processing the rollback marker, which looks like the following:
> {code}
> void triggerRollbackCheckpoint() {
>   if (standbyCheckpointer != null) {
>     standbyCheckpointer.triggerRollbackCheckpoint();
>   }
> }
> {code}
> There is a race condition where {{standbyCheckpointer}} can be {{null}}: 
> in the constructor of the {{NameNode}}, the {{initialize()}} method 
> eventually starts the edit log tailer, but the standby checkpointer is only 
> created later, in {{HAState#enterState()}}. If the tailer processes the 
> marker before the checkpointer exists, the call above does nothing and the 
> rollback checkpoint might never be taken.
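
The ordering problem described above can be reproduced in isolation. What follows is only a minimal sketch and not the actual HDFS code or the attached patch: every class and method name in it (Checkpointer, DroppingNamesystem, DeferringNamesystem, enterStandbyState) is a hypothetical stand-in. The first variant mirrors the null check quoted in the description; the second shows one possible way to close the window, by remembering the request until the checkpointer exists, which is an assumption for illustration and not necessarily what the patch does.

```java
public class RollbackCheckpointRaceSketch {

  /** Hypothetical stand-in for the standby checkpointer; it only counts requests. */
  static class Checkpointer {
    private int rollbackCheckpoints;
    void triggerRollbackCheckpoint() { rollbackCheckpoints++; }
    int rollbackCheckpoints() { return rollbackCheckpoints; }
  }

  /** Mirrors the quoted snippet: a request is silently dropped while the field is null. */
  static class DroppingNamesystem {
    private Checkpointer standbyCheckpointer;

    synchronized void triggerRollbackCheckpoint() {
      if (standbyCheckpointer != null) {
        standbyCheckpointer.triggerRollbackCheckpoint();
      }
    }

    // Stand-in for the state transition that creates the checkpointer.
    synchronized void enterStandbyState() {
      standbyCheckpointer = new Checkpointer();
    }

    synchronized int rollbackCheckpoints() {
      return standbyCheckpointer == null ? 0 : standbyCheckpointer.rollbackCheckpoints();
    }
  }

  /** Assumed alternative: remember an early request and replay it once the checkpointer exists. */
  static class DeferringNamesystem {
    private Checkpointer standbyCheckpointer;
    private boolean rollbackCheckpointPending;

    synchronized void triggerRollbackCheckpoint() {
      if (standbyCheckpointer != null) {
        standbyCheckpointer.triggerRollbackCheckpoint();
      } else {
        rollbackCheckpointPending = true; // do not lose the request
      }
    }

    synchronized void enterStandbyState() {
      standbyCheckpointer = new Checkpointer();
      if (rollbackCheckpointPending) {
        rollbackCheckpointPending = false;
        standbyCheckpointer.triggerRollbackCheckpoint();
      }
    }

    synchronized int rollbackCheckpoints() {
      return standbyCheckpointer == null ? 0 : standbyCheckpointer.rollbackCheckpoints();
    }
  }

  /** The marker is processed before the checkpointer is created (the bad interleaving). */
  public static int markerBeforeCheckpointerDropping() {
    DroppingNamesystem ns = new DroppingNamesystem();
    ns.triggerRollbackCheckpoint(); // edit log tailer sees the marker first
    ns.enterStandbyState();         // checkpointer is created too late
    return ns.rollbackCheckpoints();
  }

  /** Same interleaving, but the early request is kept and replayed. */
  public static int markerBeforeCheckpointerDeferring() {
    DeferringNamesystem ns = new DeferringNamesystem();
    ns.triggerRollbackCheckpoint();
    ns.enterStandbyState();
    return ns.rollbackCheckpoints();
  }

  public static void main(String[] args) {
    System.out.println("dropping:  " + markerBeforeCheckpointerDropping());  // prints "dropping:  0"
    System.out.println("deferring: " + markerBeforeCheckpointerDeferring()); // prints "deferring: 1"
  }
}
```

The sketch forces the bad interleaving sequentially so the outcome is deterministic; in the real NameNode the two steps run on different threads, so the lost checkpoint would only show up intermittently.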



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
