[ https://issues.apache.org/jira/browse/HDDS-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mukul Kumar Singh updated HDDS-4224:
------------------------------------
    Labels: MiniOzoneChaosCluster  (was: )

> OM failed to install snapshots after OM failover
> ------------------------------------------------
>
>                 Key: HDDS-4224
>                 URL: https://issues.apache.org/jira/browse/HDDS-4224
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Manager
>            Reporter: Mukul Kumar Singh
>            Priority: Major
>              Labels: MiniOzoneChaosCluster
>
> OM failed to install snapshots after OM failover
> {code}
> 2020-09-09 22:07:13,746 [org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$380/117485186@47069ab2] INFO server.GrpcLogAppender (GrpcLogAppender.java:installSnapshot(495)) - omNode-1@group-D62218D261DE->omNode-2-GrpcLogAppender: followerNextIndex = 65949 but logStartIndex = 68440, notify follower to install snapshot-(t:2, i:68440)
> 2020-09-09 22:07:13,746 [grpc-default-executor-52] INFO impl.RaftServerImpl (RaftServerImpl.java:notifyStateMachineToInstallSnapshot(1282)) - omNode-2@group-D62218D261DE: Snapshot Installation by StateMachine is in progress.
> 2020-09-09 22:07:13,752 [grpc-default-executor-52] INFO impl.RaftServerImpl (RaftServerImpl.java:installSnapshot(1127)) - omNode-2@group-D62218D261DE: reply installSnapshot: omNode-1<-omNode-2#0:FAIL-t2,IN_PROGRESS
> 2020-09-09 22:07:13,746 [grpc-default-executor-51] INFO server.GrpcLogAppender (GrpcLogAppender.java:onNext(375)) - omNode-1@group-D62218D261DE->omNode-2-InstallSnapshotResponseHandler: received a reply omNode-1<-omNode-2#0:FAIL-t2,IN_PROGRESS
> 2020-09-09 22:07:13,752 [grpc-default-executor-51] INFO server.GrpcLogAppender (GrpcLogAppender.java:onNext(392)) - omNode-1@group-D62218D261DE->omNode-2-InstallSnapshotResponseHandler: InstallSnapshot in progress.
> 2020-09-09 22:07:13,746 [grpc-default-executor-22] INFO server.GrpcServerProtocolService (GrpcServerProtocolService.java:onCompleted(138)) - omNode-2: Completed INSTALL_SNAPSHOT, lastRequest: omNode-1->omNode-2#0-t2,notify:(t:2, i:68440)
> 2020-09-09 22:07:13,753 [grpc-default-executor-51] INFO server.GrpcLogAppender (GrpcLogAppender.java:onNext(375)) - omNode-1@group-D62218D261DE->omNode-2-InstallSnapshotResponseHandler: received a reply omNode-1<-omNode-2#0:FAIL-t2,IN_PROGRESS
> 2020-09-09 22:07:13,753 [grpc-default-executor-51] INFO server.GrpcLogAppender (GrpcLogAppender.java:onNext(392)) - omNode-1@group-D62218D261DE->omNode-2-InstallSnapshotResponseHandler: InstallSnapshot in progress.
> 2020-09-09 22:07:13,752 [grpc-default-executor-52] INFO server.GrpcServerProtocolService (GrpcServerProtocolService.java:onCompleted(138)) - omNode-2: Completed INSTALL_SNAPSHOT, lastRequest: omNode-1->omNode-2#0-t2,notify:(t:2, i:68440)
> 2020-09-09 22:07:13,747 [org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$380/117485186@47069ab2] INFO server.GrpcLogAppender (GrpcLogAppender.java:installSnapshot(503)) - omNode-1@group-D62218D261DE->omNode-2-GrpcLogAppender: send omNode-1->omNode-2#0-t2,notify:(t:2, i:68440)
> 2020-09-09 22:07:13,756 [pool-144-thread-1] ERROR om.OzoneManager (OzoneManager.java:installCheckpoint(3178)) - Failed to stop/ pause the services. Cannot proceed with installing the new checkpoint.
> 2020-09-09 22:07:13,759 [pool-144-thread-1] ERROR om.OzoneManager (OzoneManager.java:installSnapshotFromLeader(3141)) - Failed to install snapshot from Leader OM: {}
> java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:omNode-2:group-D62218D261DE, PAUSED -> PAUSING
>         at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:63)
>         at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:115)
>         at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:155)
>         at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.pause(OzoneManagerStateMachine.java:305)
>         at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3176)
>         at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3162)
>         at org.apache.hadoop.ozone.om.OzoneManager.installSnapshotFromLeader(OzoneManager.java:3139)
>         at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$notifyInstallSnapshotFromLeader$4(OzoneManagerStateMachine.java:372)
>         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2020-09-09 22:07:13,760 [org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$380/117485186@47069ab2] INFO server.GrpcLogAppender (GrpcLogAppender.java:installSnapshot(495)) - omNode-1@group-D62218D261DE->omNode-2-GrpcLogAppender: followerNextIndex = 65949 but logStartIndex = 68440, notify follower to install snapshot-(t:2, i:68440)
> 2020-09-09 22:07:13,759 [grpc-default-executor-52] INFO impl.RaftServerImpl (RaftServerImpl.java:installSnapshot(1117)) - omNode-2@group-D62218D261DE: receive installSnapshot: omNode-1->omNode-2#0-t2,notify:(t:2, i:68440)
> 2020-09-09 22:07:13,765 [org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$380/117485186@47069ab2] INFO server.GrpcLogAppender (GrpcLogAppender.java:installSnapshot(503)) - omNode-1@group-D62218D261DE->omNode-2-GrpcLogAppender: send omNode-1->omNode-2#0-t2,notify:(t:2, i:68440)
> 2020-09-09 22:07:13,765 [grpc-default-executor-52] INFO impl.RaftServerImpl (RaftServerImpl.java:notifyStateMachineToInstallSnapshot(1251)) - omNode-2@group-D62218D261DE: notifyInstallSnapshot: nextIndex is 67621 but the leader's first available index is 68440.
> 2020-09-09 22:07:13,766 [grpc-default-executor-52] INFO ratis.OzoneManagerStateMachine (OzoneManagerStateMachine.java:notifyInstallSnapshotFromLeader(368)) - Received install snapshot notification from OM leader: omNode-1 with term index: (t:2, i:68440)
> 2020-09-09 22:07:13,766 [grpc-default-executor-52] INFO impl.RaftServerImpl (RaftServerImpl.java:installSnapshot(1127)) - omNode-2@group-D62218D261DE: reply installSnapshot: omNode-1<-omNode-2#0:FAIL-t2,IN_PROGRESS
> {code}