[ https://issues.apache.org/jira/browse/HDDS-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mukul Kumar Singh updated HDDS-4224:
------------------------------------
    Labels: MiniOzoneChaosCluster  (was: )

> OM failed to install snapshots after OM failover
> ------------------------------------------------
>
>                 Key: HDDS-4224
>                 URL: https://issues.apache.org/jira/browse/HDDS-4224
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Manager
>            Reporter: Mukul Kumar Singh
>            Priority: Major
>              Labels: MiniOzoneChaosCluster
>
> OM failed to install snapshots after OM failover
> {code}
> 2020-09-09 22:07:13,746 [org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$380/117485186@47069ab2] INFO  server.GrpcLogAppender (GrpcLogAppender.java:installSnapshot(495)) - omNode-1@group-D62218D261DE->omNode-2-GrpcLogAppender: followerNextIndex = 65949 but logStartIndex = 68440, notify follower to install snapshot-(t:2, i:68440)
> 2020-09-09 22:07:13,746 [grpc-default-executor-52] INFO  impl.RaftServerImpl (RaftServerImpl.java:notifyStateMachineToInstallSnapshot(1282)) - omNode-2@group-D62218D261DE: Snapshot Installation by StateMachine is in progress.
> 2020-09-09 22:07:13,752 [grpc-default-executor-52] INFO  impl.RaftServerImpl (RaftServerImpl.java:installSnapshot(1127)) - omNode-2@group-D62218D261DE: reply installSnapshot: omNode-1<-omNode-2#0:FAIL-t2,IN_PROGRESS
> 2020-09-09 22:07:13,746 [grpc-default-executor-51] INFO  server.GrpcLogAppender (GrpcLogAppender.java:onNext(375)) - omNode-1@group-D62218D261DE->omNode-2-InstallSnapshotResponseHandler: received a reply omNode-1<-omNode-2#0:FAIL-t2,IN_PROGRESS
> 2020-09-09 22:07:13,752 [grpc-default-executor-51] INFO  server.GrpcLogAppender (GrpcLogAppender.java:onNext(392)) - omNode-1@group-D62218D261DE->omNode-2-InstallSnapshotResponseHandler: InstallSnapshot in progress.
> 2020-09-09 22:07:13,746 [grpc-default-executor-22] INFO  server.GrpcServerProtocolService (GrpcServerProtocolService.java:onCompleted(138)) - omNode-2: Completed INSTALL_SNAPSHOT, lastRequest: omNode-1->omNode-2#0-t2,notify:(t:2, i:68440)
> 2020-09-09 22:07:13,753 [grpc-default-executor-51] INFO  server.GrpcLogAppender (GrpcLogAppender.java:onNext(375)) - omNode-1@group-D62218D261DE->omNode-2-InstallSnapshotResponseHandler: received a reply omNode-1<-omNode-2#0:FAIL-t2,IN_PROGRESS
> 2020-09-09 22:07:13,753 [grpc-default-executor-51] INFO  server.GrpcLogAppender (GrpcLogAppender.java:onNext(392)) - omNode-1@group-D62218D261DE->omNode-2-InstallSnapshotResponseHandler: InstallSnapshot in progress.
> 2020-09-09 22:07:13,752 [grpc-default-executor-52] INFO  server.GrpcServerProtocolService (GrpcServerProtocolService.java:onCompleted(138)) - omNode-2: Completed INSTALL_SNAPSHOT, lastRequest: omNode-1->omNode-2#0-t2,notify:(t:2, i:68440)
> 2020-09-09 22:07:13,747 [org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$380/117485186@47069ab2] INFO  server.GrpcLogAppender (GrpcLogAppender.java:installSnapshot(503)) - omNode-1@group-D62218D261DE->omNode-2-GrpcLogAppender: send omNode-1->omNode-2#0-t2,notify:(t:2, i:68440)
> 2020-09-09 22:07:13,756 [pool-144-thread-1] ERROR om.OzoneManager (OzoneManager.java:installCheckpoint(3178)) - Failed to stop/ pause the services. Cannot proceed with installing the new checkpoint.
> 2020-09-09 22:07:13,759 [pool-144-thread-1] ERROR om.OzoneManager (OzoneManager.java:installSnapshotFromLeader(3141)) - Failed to install snapshot from Leader OM: {}
> java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:omNode-2:group-D62218D261DE, PAUSED -> PAUSING
>         at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:63)
>         at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:115)
>         at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:155)
>         at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.pause(OzoneManagerStateMachine.java:305)
>         at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3176)
>         at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3162)
>         at org.apache.hadoop.ozone.om.OzoneManager.installSnapshotFromLeader(OzoneManager.java:3139)
>         at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$notifyInstallSnapshotFromLeader$4(OzoneManagerStateMachine.java:372)
>         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2020-09-09 22:07:13,760 [org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$380/117485186@47069ab2] INFO  server.GrpcLogAppender (GrpcLogAppender.java:installSnapshot(495)) - omNode-1@group-D62218D261DE->omNode-2-GrpcLogAppender: followerNextIndex = 65949 but logStartIndex = 68440, notify follower to install snapshot-(t:2, i:68440)
> 2020-09-09 22:07:13,759 [grpc-default-executor-52] INFO  impl.RaftServerImpl (RaftServerImpl.java:installSnapshot(1117)) - omNode-2@group-D62218D261DE: receive installSnapshot: omNode-1->omNode-2#0-t2,notify:(t:2, i:68440)
> 2020-09-09 22:07:13,765 [org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$380/117485186@47069ab2] INFO  server.GrpcLogAppender (GrpcLogAppender.java:installSnapshot(503)) - omNode-1@group-D62218D261DE->omNode-2-GrpcLogAppender: send omNode-1->omNode-2#0-t2,notify:(t:2, i:68440)
> 2020-09-09 22:07:13,765 [grpc-default-executor-52] INFO  impl.RaftServerImpl (RaftServerImpl.java:notifyStateMachineToInstallSnapshot(1251)) - omNode-2@group-D62218D261DE: notifyInstallSnapshot: nextIndex is 67621 but the leader's first available index is 68440.
> 2020-09-09 22:07:13,766 [grpc-default-executor-52] INFO  ratis.OzoneManagerStateMachine (OzoneManagerStateMachine.java:notifyInstallSnapshotFromLeader(368)) - Received install snapshot notification from OM leader: omNode-1 with term index: (t:2, i:68440)
> 2020-09-09 22:07:13,766 [grpc-default-executor-52] INFO  impl.RaftServerImpl (RaftServerImpl.java:installSnapshot(1127)) - omNode-2@group-D62218D261DE: reply installSnapshot: omNode-1<-omNode-2#0:FAIL-t2,IN_PROGRESS
> {code}
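
The final error in the log is a lifecycle check rejecting PAUSED -> PAUSING: when OzoneManager.installCheckpoint called OzoneManagerStateMachine.pause(), the state machine was already PAUSED, while the leader kept re-sending install-snapshot notifications for (t:2, i:68440). The Java sketch below is a minimal model of that failure mode, assuming a hypothetical three-state lifecycle (RUNNING, PAUSING, PAUSED) and a transition table invented for illustration. It is not Ratis's LifeCycle class, and its transition table is not Ratis's real one.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a guarded lifecycle, NOT org.apache.ratis.util.LifeCycle.
class LifeCycleSketch {
  // State names mirror the log; the set of states is a simplifying assumption.
  public enum State { RUNNING, PAUSING, PAUSED }

  // Assumed transition table: only the edges needed to show the failure.
  private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
  static {
    ALLOWED.put(State.RUNNING, EnumSet.of(State.PAUSING));
    ALLOWED.put(State.PAUSING, EnumSet.of(State.PAUSED));
    ALLOWED.put(State.PAUSED, EnumSet.of(State.RUNNING));
  }

  private State current = State.RUNNING;

  public State getCurrent() {
    return current;
  }

  // Validates the requested edge and throws on an illegal one,
  // similar in spirit to the "ILLEGAL TRANSITION" in the stack trace.
  public void transition(State to) {
    if (!ALLOWED.get(current).contains(to)) {
      throw new IllegalStateException("ILLEGAL TRANSITION: " + current + " -> " + to);
    }
    current = to;
  }

  // Hypothetical pause(): always starts with X -> PAUSING, so calling it
  // again while already PAUSED (with no resume in between) fails.
  public void pause() {
    transition(State.PAUSING);
    transition(State.PAUSED);
  }

  public static void main(String[] args) {
    LifeCycleSketch sm = new LifeCycleSketch();
    sm.pause(); // first install attempt: RUNNING -> PAUSING -> PAUSED
    try {
      sm.pause(); // next notification arrives while still PAUSED
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // prints "ILLEGAL TRANSITION: PAUSED -> PAUSING"
    }
  }
}
```

One plausible reading of the log, assumed here rather than confirmed by it, is that an earlier install attempt paused the state machine without resuming it, so every later notification's pause() hits the same check.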


