[ https://issues.apache.org/jira/browse/RATIS-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950357#comment-17950357 ]
Tsz-wo Sze commented on RATIS-2192:
-----------------------------------

BTW, there is a discussion in the dev@ mailing list for moving the (incomplete) zero-copy feature to a development branch:
- https://lists.apache.org/thread/y2y7dyff1mmctgbo84lsytwg8bdzsq35

At the same time, we propose to rework RATIS-2129.

> Lots of errors after applying RATIS-2129
> ----------------------------------------
>
>                 Key: RATIS-2192
>                 URL: https://issues.apache.org/jira/browse/RATIS-2192
>             Project: Ratis
>          Issue Type: Bug
>    Affects Versions: 3.2.0
>            Reporter: Wei-Chiu Chuang
>            Priority: Blocker
>         Attachments: ozone-datanode.1.tgz, ozone-datanode.2.tgz, ozone-datanode.3.tgz
>
>
> Ok to be honest I am not sure if it's related to RATIS-2129. But I'm using a build that is Ratis 3.1.1 + RATIS-2129, and I am seeing all kinds of errors running HBase on Ozone.
>
> failed to take snapshot due to last applied txn not current:
> {noformat}
> 2024-11-16 00:10:31,035 INFO [grpc-default-executor-22]-org.apache.ratis.server.RaftServer: e693615a-d484-4165-8446-dff08cac5978: remove FOLLOWER e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1:t229, leader=67eefe63-0930-42d7-a364-e46fde563ff1, voted=67eefe63-0930-42d7-a364-e46fde563ff1, raftlog=Memoized:e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-SegmentedRaftLog:OPENED:c1613342:last(t:229, i:1613343), conf=conf: {index: 1613340, cur=peers:[e693615a-d484-4165-8446-dff08cac5978|10.140.146.67:9856, 67eefe63-0930-42d7-a364-e46fde563ff1|10.140.86.199:9856, 7cc563b3-14b5-4334-820b-5c3bbecffad8|10.140.20.0:9856]|listeners:[], old=null} RUNNING
> 2024-11-16 00:10:31,038 INFO [grpc-default-executor-22]-org.apache.ratis.server.RaftServer$Division: e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1: shutdown
> 2024-11-16 00:10:31,039 INFO [grpc-default-executor-22]-org.apache.ratis.util.JmxRegister: Successfully un-registered JMX Bean with object name Ratis:service=RaftServer,group=group-AF4CEBD817A1,id=e693615a-d484-4165-8446-dff08cac5978
> 2024-11-16 00:10:31,039 INFO [grpc-default-executor-22]-org.apache.ratis.server.impl.RoleInfo: e693615a-d484-4165-8446-dff08cac5978: shutdown e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-FollowerState
> 2024-11-16 00:10:31,039 INFO [grpc-default-executor-22]-org.apache.ratis.server.impl.StateMachineUpdater: e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater: set stopIndex = 1613342
> 2024-11-16 00:10:31,039 INFO [e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-FollowerState]-org.apache.ratis.server.impl.FollowerState: e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-FollowerState was interrupted
> 2024-11-16 00:10:31,043 ERROR [e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine: Failed to take snapshot for group-AF4CEBD817A1 as the stateMachine is unhealthy. The last applied index is at (t:216, i:1613313)
> 2024-11-16 00:10:31,043 ERROR [e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater: e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater: Failed to take snapshot
> org.apache.ratis.protocol.exceptions.StateMachineException: Failed to take snapshot for group-AF4CEBD817A1 as the stateMachine is unhealthy. The last applied index is at (t:216, i:1613313)
>     at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.takeSnapshot(ContainerStateMachine.java:356)
>     at org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:286)
>     at org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:278)
>     at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:194)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
>
> Log entry not found
> {noformat}
> 2024-11-14 01:59:37,516 WARN [7cc563b3-14b5-4334-820b-5c3bbecffad8@group-0C8C280DCAED->67eefe63-0930-42d7-a364-e46fde563ff1-GrpcLogAppender-LogAppenderDaemon]-org.apache.ratis.server.leader.LogAppenderDaemon: 7cc563b3-14b5-4334-820b-5c3bbecffad8@group-0C8C280DCAED->67eefe63-0930-42d7-a364-e46fde563ff1-GrpcLogAppender-LogAppenderDaemon failed
> org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found: index = 3205
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:301)
>     at org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:240)
>     at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:387)
>     at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:262)
>     at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:80)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
>
> HDDS-11720 seems to be related too.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)