[jira] [Updated] (HDFS-5291) Standby namenode after transition to active goes into safemode

2013-10-07 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-5291:


Fix Version/s: 2.2.1  (was: 2.2.0)

> Standby namenode after transition to active goes into safemode
> --
>
> Key: HDFS-5291
> URL: https://issues.apache.org/jira/browse/HDFS-5291
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Affects Versions: 2.1.1-beta
> Reporter: Arpit Gupta
> Assignee: Jing Zhao
> Priority: Critical
> Fix For: 2.2.1
>
> Attachments: HDFS-5291.000.patch, HDFS-5291.001.patch, 
> HDFS-5291.002.patch, HDFS-5291.003.patch, HDFS-5291.003.patch, nn.log
>
>
> Some log snippets
> standby state to active transition
> {code}
> 2013-10-02 00:13:49,482 INFO  ipc.Server (Server.java:run(2068)) - IPC Server 
> handler 69 on 8020, call 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from IP:33911 
> Call#1483 Retry#1: error: org.apache.hadoop.ipc.StandbyException: Operation 
> category WRITE is not supported in state standby
> 2013-10-02 00:13:49,689 INFO  ipc.Server (Server.java:saslProcess(1342)) - 
> Auth successful for nn/hostn...@example.com (auth:SIMPLE)
> 2013-10-02 00:13:49,696 INFO  authorize.ServiceAuthorizationManager 
> (ServiceAuthorizationManager.java:authorize(111)) - Authorization successful 
> for nn/hostn...@example.com (auth:KERBEROS) for protocol=interface 
> org.apache.hadoop.ha.HAServiceProtocol
> 2013-10-02 00:13:49,700 INFO  namenode.FSNamesystem 
> (FSNamesystem.java:stopStandbyServices(1013)) - Stopping services started for 
> standby state
> 2013-10-02 00:13:49,701 WARN  ha.EditLogTailer 
> (EditLogTailer.java:doWork(336)) - Edit log tailer interrupted
> java.lang.InterruptedException: sleep interrupted
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:334)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:356)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1463)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:454)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTail
> 2013-10-02 00:13:49,704 INFO  namenode.FSNamesystem 
> (FSNamesystem.java:startActiveServices(885)) - Starting services required for 
> active state
> 2013-10-02 00:13:49,719 INFO  client.QuorumJournalManager 
> (QuorumJournalManager.java:recoverUnfinalizedSegments(419)) - Starting 
> recovery process for unclosed journal segments...
> 2013-10-02 00:13:49,755 INFO  ipc.Server (Server.java:saslProcess(1342)) - 
> Auth successful for hbase/hostn...@example.com (auth:SIMPLE)
> 2013-10-02 00:13:49,761 INFO  authorize.ServiceAuthorizationManager 
> (ServiceAuthorizationManager.java:authorize(111)) - Authorization successful 
> for hbase/hostn...@example.com (auth:KERBEROS) for protocol=interface 
> org.apache.hadoop.hdfs.protocol.ClientProtocol
> 2013-10-02 00:13:49,839 INFO  client.QuorumJournalManager 
> (QuorumJournalManager.java:recoverUnfinalizedSegments(421)) - Successfully 
> started new epoch 85
> 2013-10-02 00:13:49,839 INFO  client.QuorumJournalManager 
> (QuorumJournalManager.java:recoverUnclosedSegment(249)) - Beginning recovery 
> of unclosed segment starting at txid 887112
> 2013-10-02 00:13:49,874 INFO  client.QuorumJournalManager 
> (QuorumJournalManager.java:recoverUnclosedSegment(258)) - Recovery prepare 
> phase complete. Responses:
> IP:8485: segmentState { startTxId: 887112 endTxId: 887531 isInProgress: true 
> } lastWriterEpoch: 84 lastCommittedTxId: 887530
> 172.18.145.97:8485: segmentState { startTxId: 887112 endTxId: 887531 
> isInProgress: true } lastWriterEpoch: 84 lastCommittedTxId: 887530
> 2013-10-02 00:13:49,875 INFO  client.QuorumJournalManager 
> (QuorumJournalManager.java:recover
> {code}
> And then we get into safemode
> {code}
> Construction[IP:1019|RBW]]} size 0
> 2013-10-02 00:13:50,277 INFO  BlockStateChange 
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: blockMap 
> updated: IP:1019 is added to blk_IP157{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[IP:1019|RBW], 
> ReplicaUnderConstruction[172.18.145.96:1019|RBW], 
> ReplicaUnderConstruction[IP:1019|RBW]]} size 0
> 2013-10-02 00:13:50,279 INFO  hdfs.StateChange 
> (FSNamesystem.

[jira] [Updated] (HDFS-5291) Standby namenode after transition to active goes into safemode

2013-10-07 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5291:


   Resolution: Fixed
Fix Version/s: 2.2.0
 Hadoop Flags: Reviewed
       Status: Resolved  (was: Patch Available)

Thanks again for the review, Suresh! I've committed this to trunk, branch-2 and 
branch-2.2.

> Standby namenode after transition to active goes into safemode
> --
>
> Key: HDFS-5291
> URL: https://issues.apache.org/jira/browse/HDFS-5291
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Affects Versions: 2.1.1-beta
> Reporter: Arpit Gupta
> Assignee: Jing Zhao
> Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HDFS-5291.000.patch, HDFS-5291.001.patch, 
> HDFS-5291.002.patch, HDFS-5291.003.patch, HDFS-5291.003.patch, nn.log
>
>

[jira] [Updated] (HDFS-5291) Standby namenode after transition to active goes into safemode

2013-10-07 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5291:


Attachment: HDFS-5291.003.patch

The new javadoc warning appears to be a false report. Updated the patch to see 
if we can get rid of the complaint.

> Standby namenode after transition to active goes into safemode
> --
>
> Key: HDFS-5291
> URL: https://issues.apache.org/jira/browse/HDFS-5291
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Affects Versions: 2.1.1-beta
> Reporter: Arpit Gupta
> Assignee: Jing Zhao
> Priority: Critical
> Attachments: HDFS-5291.000.patch, HDFS-5291.001.patch, 
> HDFS-5291.002.patch, HDFS-5291.003.patch, HDFS-5291.003.patch, nn.log
>
>

[jira] [Updated] (HDFS-5291) Standby namenode after transition to active goes into safemode

2013-10-07 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5291:


Attachment: HDFS-5291.003.patch

Thanks for the review, Suresh! Updated the patch to address your comments.

> Standby namenode after transition to active goes into safemode
> --
>
> Key: HDFS-5291
> URL: https://issues.apache.org/jira/browse/HDFS-5291
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Affects Versions: 2.1.1-beta
> Reporter: Arpit Gupta
> Assignee: Jing Zhao
> Priority: Critical
> Attachments: HDFS-5291.000.patch, HDFS-5291.001.patch, 
> HDFS-5291.002.patch, HDFS-5291.003.patch, nn.log
>
>

[jira] [Updated] (HDFS-5291) Standby namenode after transition to active goes into safemode

2013-10-07 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5291:


Attachment: HDFS-5291.002.patch

Added a new unit test to make sure the client retry happens.
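
For readers unfamiliar with the idea of a client retry, here is a minimal, generic sketch in Python. It is not taken from the attached patches and does not reflect the HDFS client code: `RetriableError`, `call_with_retry` and `make_flaky_operation` are hypothetical names made up for illustration. It assumes the server signals "try again later" with a dedicated error and that the client backs off between attempts.

```python
import time


class RetriableError(Exception):
    """Hypothetical stand-in for a server-side 'try again later' signal."""


def call_with_retry(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying on RetriableError with exponential backoff.

    The sleep function is injectable so that a test can record the delays
    instead of actually waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RetriableError:
            if attempt == max_attempts:
                # Out of attempts: surface the error to the caller.
                raise
            # Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky_operation(failures):
    """Build an operation that fails `failures` times, then succeeds.

    This mimics a server that rejects calls for a while (for example while
    it is still in safe mode) and then starts serving them. The return
    value is the number of calls made so far.
    """
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RetriableError("server not ready yet")
        return state["calls"]

    return operation
```

A test in this style would drive `call_with_retry(make_flaky_operation(2), sleep=recorded.append)` and check both that the call eventually succeeds and that the expected number of backoff delays was recorded.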

> Standby namenode after transition to active goes into safemode
> --
>
> Key: HDFS-5291
> URL: https://issues.apache.org/jira/browse/HDFS-5291
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Affects Versions: 2.1.1-beta
> Reporter: Arpit Gupta
> Assignee: Jing Zhao
> Priority: Critical
> Attachments: HDFS-5291.000.patch, HDFS-5291.001.patch, 
> HDFS-5291.002.patch, nn.log
>
>

[jira] [Updated] (HDFS-5291) Standby namenode after transition to active goes into safemode

2013-10-06 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5291:


Attachment: HDFS-5291.001.patch

Fixed the failing unit tests.

> Standby namenode after transition to active goes into safemode
> --
>
> Key: HDFS-5291
> URL: https://issues.apache.org/jira/browse/HDFS-5291
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Affects Versions: 2.1.1-beta
> Reporter: Arpit Gupta
> Assignee: Jing Zhao
> Priority: Critical
> Attachments: HDFS-5291.000.patch, HDFS-5291.001.patch, nn.log
>
>

[jira] [Updated] (HDFS-5291) Standby namenode after transition to active goes into safemode

2013-10-06 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5291:


Status: Patch Available  (was: Open)

> Standby namenode after transition to active goes into safemode
> --
>
> Key: HDFS-5291
> URL: https://issues.apache.org/jira/browse/HDFS-5291
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Affects Versions: 2.1.1-beta
> Reporter: Arpit Gupta
> Assignee: Jing Zhao
> Priority: Critical
> Attachments: HDFS-5291.000.patch, nn.log
>
>
> Some log snippets
> standby state to active transition
> {code}
> 2013-10-02 00:13:49,482 INFO  ipc.Server (Server.java:run(2068)) - IPC Server 
> handler 69 on 8020, call 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from IP:33911 
> Call#1483 Retry#1: error: org.apache.hadoop.ipc.StandbyException: Operation 
> category WRITE is not supported in state standby
> 2013-10-02 00:13:49,689 INFO  ipc.Server (Server.java:saslProcess(1342)) - 
> Auth successful for nn/hostn...@example.com (auth:SIMPLE)
> 2013-10-02 00:13:49,696 INFO  authorize.ServiceAuthorizationManager 
> (ServiceAuthorizationManager.java:authorize(111)) - Authorization successful 
> for nn/hostn...@example.com (auth:KERBEROS) for protocol=interface 
> org.apache.hadoop.ha.HAServiceProtocol
> 2013-10-02 00:13:49,700 INFO  namenode.FSNamesystem 
> (FSNamesystem.java:stopStandbyServices(1013)) - Stopping services started for 
> standby state
> 2013-10-02 00:13:49,701 WARN  ha.EditLogTailer 
> (EditLogTailer.java:doWork(336)) - Edit log tailer interrupted
> java.lang.InterruptedException: sleep interrupted
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:334)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:356)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1463)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:454)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTail
> 2013-10-02 00:13:49,704 INFO  namenode.FSNamesystem 
> (FSNamesystem.java:startActiveServices(885)) - Starting services required for 
> active state
> 2013-10-02 00:13:49,719 INFO  client.QuorumJournalManager 
> (QuorumJournalManager.java:recoverUnfinalizedSegments(419)) - Starting 
> recovery process for unclosed journal segments...
> 2013-10-02 00:13:49,755 INFO  ipc.Server (Server.java:saslProcess(1342)) - 
> Auth successful for hbase/hostn...@example.com (auth:SIMPLE)
> 2013-10-02 00:13:49,761 INFO  authorize.ServiceAuthorizationManager 
> (ServiceAuthorizationManager.java:authorize(111)) - Authorization successful 
> for hbase/hostn...@example.com (auth:KERBEROS) for protocol=interface 
> org.apache.hadoop.hdfs.protocol.ClientProtocol
> 2013-10-02 00:13:49,839 INFO  client.QuorumJournalManager 
> (QuorumJournalManager.java:recoverUnfinalizedSegments(421)) - Successfully 
> started new epoch 85
> 2013-10-02 00:13:49,839 INFO  client.QuorumJournalManager 
> (QuorumJournalManager.java:recoverUnclosedSegment(249)) - Beginning recovery 
> of unclosed segment starting at txid 887112
> 2013-10-02 00:13:49,874 INFO  client.QuorumJournalManager 
> (QuorumJournalManager.java:recoverUnclosedSegment(258)) - Recovery prepare 
> phase complete. Responses:
> IP:8485: segmentState { startTxId: 887112 endTxId: 887531 isInProgress: true 
> } lastWriterEpoch: 84 lastCommittedTxId: 887530
> 172.18.145.97:8485: segmentState { startTxId: 887112 endTxId: 887531 
> isInProgress: true } lastWriterEpoch: 84 lastCommittedTxId: 887530
> 2013-10-02 00:13:49,875 INFO  client.QuorumJournalManager 
> (QuorumJournalManager.java:recover
> {code}
> And then we get into safemode
> {code}
> Construction[IP:1019|RBW]]} size 0
> 2013-10-02 00:13:50,277 INFO  BlockStateChange 
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: blockMap 
> updated: IP:1019 is added to blk_IP157{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[IP:1019|RBW], 
> ReplicaUnderConstruction[172.18.145.96:1019|RBW], 
> ReplicaUnderConstruction[IP:1019|RBW]]} size 0
> 2013-10-02 00:13:50,279 INFO  hdfs.StateChange 
> (FSNamesystem.java:reportStatus(4703)) - STATE* Safe mode ON.
> The reported blocks 1071 needs additional 5 blocks to reach the threshold 
> 1. of total blo

[jira] [Updated] (HDFS-5291) Standby namenode after transition to active goes into safemode

2013-10-06 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5291:


Attachment: HDFS-5291.000.patch

Uploaded an initial patch for review.

This patch adds a new RetriableException and wraps the SafeModeException in it 
when 1) HA is set up, and 2) the NN is in the active state. When the client 
side gets a RetriableException, it retries the request against the same NN.

I will add more javadoc and unit tests shortly.
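
To make the idea concrete, here is a minimal, self-contained sketch of the wrap-and-retry behaviour described above. It is not the code from the patch: the exception classes are local stand-ins, and the helper names (wrapIfRetriable, callWithRetry), the attempt limit, and the sleep interval are all assumptions made up for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class RetriableSafeModeSketch {

    /** Local stand-in for the NN's safe-mode error (not Hadoop's class). */
    static class SafeModeException extends IOException {
        SafeModeException(String msg) { super(msg); }
    }

    /** Local stand-in for the new exception type the comment describes. */
    static class RetriableException extends IOException {
        RetriableException(Throwable cause) { super(cause.getMessage(), cause); }
    }

    /** Server side: wrap only if HA is set up and this NN is in active state. */
    static IOException wrapIfRetriable(SafeModeException e,
                                       boolean haEnabled, boolean active) {
        return (haEnabled && active) ? new RetriableException(e) : e;
    }

    /**
     * Client side: on RetriableException, repeat the same call (same NN)
     * up to maxAttempts times. Any other exception propagates immediately.
     */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts,
                               long sleepMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RetriableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (RetriableException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis); // hypothetical fixed back-off
                }
            }
        }
        throw last; // every attempt hit RetriableException
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Simulated NN that stays in safe mode for the first two calls.
        String result = callWithRetry(() -> {
            if (++attempts[0] < 3) {
                throw wrapIfRetriable(
                    new SafeModeException("Safe mode ON"), true, true);
            }
            return "lease renewed";
        }, 5, 10L);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

In this sketch, a SafeModeException raised when HA is not set up or the NN is not active is returned unwrapped, so the client surfaces it to the caller rather than retrying.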


[jira] [Updated] (HDFS-5291) Standby namenode after transition to active goes into safemode

2013-10-02 Thread Arpit Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Gupta updated HDFS-5291:
--

Attachment: nn.log
