刘珍 created IOTDB-4809:
-------------------------

Summary: [ remove datanode ] ConsensusGroupNotExistException: The consensus group DataRegion[11] doesn't exist
Key: IOTDB-4809
URL: https://issues.apache.org/jira/browse/IOTDB-4809
Project: Apache IoTDB
Issue Type: Bug
Components: mpp-cluster
Affects Versions: 0.14.0-SNAPSHOT
Reporter: 刘珍
Assignee: Haonan Hou

m_1031_76b947f
3 replicas, 3C5D (3 ConfigNodes, 5 DataNodes)
schema region: ratis
data region: multiLeader

The benchmark runs for 1 hour, and then the ip72 DataNode is removed. The ip72 DataNode keeps printing ERROR logs:

2022-10-31 17:48:06,277 [pool-25-IoTDB-ClientRPC-Processor-85$20221031_094806_31457_3.1.0] ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally failed. TSStatus: TSStatus(code:412, message:org.apache.iotdb.consensus.exception.ConsensusGroupNotExistException: The consensus group DataRegion[13] doesn't exist), message: org.apache.iotdb.consensus.exception.ConsensusGroupNotExistException: The consensus group DataRegion[13] doesn't exist
2022-10-31 17:48:06,285 [pool-25-IoTDB-ClientRPC-Processor-93$20221031_094806_31458_3.1.0] ERROR o.a.i.d.m.e.e.RegionWriteExecutor$WritePlanNodeExecutionVisitor:235 - Something wrong happened while calling consensus layer's write API.
org.apache.iotdb.consensus.exception.ConsensusGroupNotExistException: The consensus group DataRegion[11] doesn't exist
    at org.apache.iotdb.consensus.multileader.MultiLeaderConsensus.write(MultiLeaderConsensus.java:155)
    at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.fireTriggerAndInsert(RegionWriteExecutor.java:101)
    at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:215)
    at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:163)
    at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:117)
    at org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1085)
    at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:83)
    at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchLocally(FragmentInstanceDispatcherImpl.java:232)
    at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchOneInstance(FragmentInstanceDispatcherImpl.java:137)
    at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchWriteSync(FragmentInstanceDispatcherImpl.java:119)
    at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatch(FragmentInstanceDispatcherImpl.java:90)
    at org.apache.iotdb.db.mpp.plan.scheduler.ClusterScheduler.start(ClusterScheduler.java:102)
    at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.schedule(QueryExecution.java:283)
    at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:201)
    at org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:146)
    at org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:160)
    at org.apache.iotdb.db.service.thrift.impl.ClientRPCServiceImpl.insertTablet(ClientRPCServiceImpl.java:1198)
    at org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:4078)
    at org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:4058)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

See the attachment for region information before and after the remove operation.

Test ENV:
1.
192.168.10.72, 73, 74, 75, 76: 48 CPUs, 384 GB

ConfigNode:
MAX_HEAP_SIZE="8G"
cn_connection_timeout_ms=120000

Common:
connection_timeout_ms=120000
max_connection_for_internal_service=200
query_timeout_threshold=36000000
multi_leader_throttle_threshold_in_byte=536870912000
max_waiting_time_when_insert_blocked=120000
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
schema_replication_factor=3
data_replication_factor=3

DataNode:
MAX_HEAP_SIZE="256G"
MAX_DIRECT_MEMORY_SIZE="32G"

2. benchmark configuration
See the attachment.

3. remove cmd:
fit-72:/data/mpp_test/m_1031_76b947f$ cat rm.sh
#!/bin/bash
sleep 1h
./sbin/start-cli.sh -h 192.168.10.76 -e "show cluster" >> bef_rm_info.out
./sbin/start-cli.sh -h 192.168.10.76 -e "show regions" >> bef_rm_info.out
./sbin/remove-datanode.sh 192.168.10.72:6667 > rm_ip72.out

--
This message was sent by Atlassian Jira
(v8.20.10#820010)