[ https://issues.apache.org/jira/browse/IOTDB-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
刘珍 reopened IOTDB-4027:
-----------------------

master_0906_0095eb3
Follower ip2 was taken down before a snapshot was taken. After ip2 came back online, its raft log did not synchronize.
ip2 log:
2022-09-06 15:53:31,056 [null-request--thread1] INFO o.a.r.g.s.GrpcClientProtocolService$UnorderedRequestStreamObserver:284 - Failed RaftClientRequest:client-0DF34DD58B52->172.20.70.2_40010@group-000100000002, cid=36, seq=0, RW, Message:000d00000012726f6f74...(size=68644), reply=RaftClientReply:client-0DF34DD58B52->172.20.70.2_40010@group-000100000002, cid=36, FAILED org.apache.ratis.protocol.exceptions.NotLeaderException: Server 172.20.70.2_40010@group-000100000002 is not the leader, logIndex=0, commits[172.20.70.2_40010:c202840, 172.20.70.14_40010:c556712, 172.20.70.18_40010:c556711]

> ERROR o.a.i.d.e.s.SnapshotLoader:94 - Exception occurs when creating links
> from snapshot directory to data directory
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: IOTDB-4027
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4027
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Liuxuxin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>         Attachments: image-2022-08-03-09-39-10-230.png,
> image-2022-08-03-09-39-48-739.png, image-2022-09-06-17-05-21-387.png,
> ip18_befor_stop_datanode_log.tar.gz, ip18_restart_with-error_log.tar.gz,
> ip4_2000_config.properties, screenshot-1.png
>
>
> master_0801_55b5b17
> Problem description
> RatisConsensus, 3 replicas, 3C9D. One benchmark (bm) connected to one datanode performs concurrent writes. Stop one follower node and start it again after 5 minutes; {color:#DE350B}*then stop another follower node and start it again after 10 minutes. This second node reports an error during startup and ends up missing data*{color}:
> 2022-08-02 18:04:17,376 [pool-4-thread-1] ERROR o.a.i.d.e.s.SnapshotLoader:94
> - Exception occurs when creating links from snapshot directory to data
> directory
> java.io.IOException: Cannot find
> /data/iotdb/master_0801_2de0dd8/datanode/./sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/sm/1_354536/sequence/root.ip4.g_0
> or
> /data/iotdb/master_0801_2de0dd8/datanode/./sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/sm/1_354536/unsequence/root.ip4.g_0
>       at org.apache.iotdb.db.engine.snapshot.SnapshotLoader.createLinksFromSnapshotDirToDataDir(SnapshotLoader.java:163)
>       at org.apache.iotdb.db.engine.snapshot.SnapshotLoader.loadSnapshotForStateMachine(SnapshotLoader.java:91)
>       at org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.loadSnapshot(DataRegionStateMachine.java:93)
>       at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.loadSnapshot(ApplicationStateMachineProxy.java:188)
>       at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.lambda$initialize$0(ApplicationStateMachineProxy.java:73)
>       at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270)
>       at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.initialize(ApplicationStateMachineProxy.java:69)
>       at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:136)
>       at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:201)
>       at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$5(RaftServerProxy.java:274)
>       at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> 2022-08-02 18:04:17,376 [pool-4-thread-1] ERROR
> o.a.i.d.c.s.DataRegionStateMachine:95 - Fail to load snapshot from
> /data/iotdb/master_0801_2de0dd8/datanode/./sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/sm/1_354536
> ip18 is missing data; the expected count for the series is 20000 points.
> !screenshot-1.png!
> 1. Reproduction steps
> Private cloud: 172.20.70.2/3/4/5/13/14/16/18/19
> The benchmark runs on ip15 (connected to ip4)
> Stop ip4 / start ip4, then stop ip18 / start ip18; ip18 reports an error
> !image-2022-08-03-09-39-10-230.png!
> !image-2022-08-03-09-39-48-739.png!
> 2. Start the benchmark
> 2022-08-02 17:34:57 start bm
> 3. Stop the datanode on ip4
> 2022-08-02 17:45:42 stop datanode
> sleep 300
> start ip4
> 4. Stop the datanode on ip18
> 2022-08-02 17:54:11 stop the datanode on ip18
> sleep 600
> start ip18
> {color:#DE350B}*An error is reported during startup*{color}:
> see the problem description
> After bm finished writing and all nodes finished synchronizing, {color:#DE350B}*the ip18 node is missing data*{color}; the data on ip16 and ip4 is correct.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)