[ https://issues.apache.org/jira/browse/IOTDB-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yongzao Dan reassigned IOTDB-4809:
----------------------------------

    Assignee: 陈哲涵  (was: Yongzao Dan)

> [ remove datanode ] ConsensusGroupNotExistException: The consensus group DataRegion[11] doesn't exist
> -----------------------------------------------------------------------------------------------------
>
> Key: IOTDB-4809
> URL: https://issues.apache.org/jira/browse/IOTDB-4809
> Project: Apache IoTDB
> Issue Type: Bug
> Components: mpp-cluster
> Affects Versions: 0.14.0-SNAPSHOT
> Reporter: 刘珍
> Assignee: 陈哲涵
> Priority: Major
> Attachments: after_remove_regions_info.out, before_remove_regions_info.out, more_dev.conf, screenshot-1.png, screenshot-2.png
>
>
> m_1031_76b947f
> 3 replicas, 3C5D (3 ConfigNodes, 5 DataNodes)
> Schema region consensus: Ratis
> Data region consensus: MultiLeader
> {color:#DE350B}*This issue contains 3 bugs*{color}
> The benchmark runs for 1 hour, and then the DataNode on ip72 is removed. The ip72 DataNode then floods its log with ERROR messages:
> 2022-10-31 17:48:06,277 [pool-25-IoTDB-ClientRPC-Processor-85$20221031_094806_31457_3.1.0] ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally failed. TSStatus: TSStatus(code:412, message:org.apache.iotdb.consensus.exception.ConsensusGroupNotExistException: The consensus group DataRegion[13] doesn't exist), message: org.apache.iotdb.consensus.exception.ConsensusGroupNotExistException: The consensus group DataRegion[13] doesn't exist
> 2022-10-31 17:48:06,285 [pool-25-IoTDB-ClientRPC-Processor-93$20221031_094806_31458_3.1.0] ERROR o.a.i.d.m.e.e.RegionWriteExecutor$WritePlanNodeExecutionVisitor:235 - {color:#DE350B}*Something wrong happened while calling consensus layer's write API.
> org.apache.iotdb.consensus.exception.ConsensusGroupNotExistException: The consensus group DataRegion[11] doesn't exist*{color}
> 	at org.apache.iotdb.consensus.multileader.MultiLeaderConsensus.write(MultiLeaderConsensus.java:155)
> 	at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.fireTriggerAndInsert(RegionWriteExecutor.java:101)
> 	at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:215)
> 	at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:163)
> 	at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:117)
> 	at org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1085)
> 	at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:83)
> 	at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchLocally(FragmentInstanceDispatcherImpl.java:232)
> 	at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchOneInstance(FragmentInstanceDispatcherImpl.java:137)
> 	at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchWriteSync(FragmentInstanceDispatcherImpl.java:119)
> 	at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatch(FragmentInstanceDispatcherImpl.java:90)
> 	at org.apache.iotdb.db.mpp.plan.scheduler.ClusterScheduler.start(ClusterScheduler.java:102)
> 	at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.schedule(QueryExecution.java:283)
> 	at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:201)
> 	at org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:146)
> 	at org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:160)
> 	at org.apache.iotdb.db.service.thrift.impl.ClientRPCServiceImpl.insertTablet(ClientRPCServiceImpl.java:1198)
> 	at org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:4078)
> 	at org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:4058)
> 	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
> 	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
> 	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 	at java.base/java.lang.Thread.run(Thread.java:834)
> {color:#DE350B}*See the attachments for region information before and after the remove operation. Incorrect behavior:*{color}
> !screenshot-1.png!
> {color:#DE350B}*During removal, a new DataRegion was created, but it contains no data:*{color}
> !screenshot-2.png!
> Test ENV:
> 1. 192.168.10.72, 73, 74, 75, 76: 48 CPUs, 384 GB memory
> ConfigNode:
> MAX_HEAP_SIZE="8G"
> cn_connection_timeout_ms=120000
> Common:
> connection_timeout_ms=120000
> max_connection_for_internal_service=200
> query_timeout_threshold=36000000
> multi_leader_throttle_threshold_in_byte=536870912000
> max_waiting_time_when_insert_blocked=120000
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=3
> data_replication_factor=3
> DataNode:
> MAX_HEAP_SIZE="256G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> 2. Benchmark configuration: see the attachment.
> 3. Remove command:
> {color:#DE350B}*fit-72*{color}:/data/mpp_test/m_1031_76b947f$ cat rm.sh
> #!/bin/bash
> sleep 1h
> ./sbin/start-cli.sh -h 192.168.10.76 -e "show cluster" >> bef_rm_info.out
> ./sbin/start-cli.sh -h 192.168.10.76 -e "show regions" >> bef_rm_info.out
> ./sbin/remove-datanode.sh 192.168.10.72:6667 > rm_ip72.out



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
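Note on the main error: the stack trace shows an insert being dispatched locally on ip72 (`FragmentInstanceDispatcherImpl.dispatchLocally`). `MultiLeaderConsensus.write` then rejects it with `ConsensusGroupNotExistException`, because the node no longer hosts a consensus group for that region. One plausible reading, not a confirmed root cause, is a window in which a write is still routed to ip72's replica of a region just as the DataNode removal deletes that replica.

The following is a deliberately simplified, hypothetical Java model of that sequence. The class `StaleRouteDemo`, its methods, and `GroupNotExistException` are invented for illustration. They do not reflect IoTDB's actual `MultiLeaderConsensus` implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, NOT IoTDB code: a DataNode-local registry of
// consensus groups keyed by region id. It only mimics the "look up the
// group, fail if it is missing" check seen at the top of the stack trace.
public class StaleRouteDemo {

  // Hypothetical stand-in for ConsensusGroupNotExistException.
  static class GroupNotExistException extends Exception {
    GroupNotExistException(int regionId) {
      super("The consensus group DataRegion[" + regionId + "] doesn't exist");
    }
  }

  // Region id -> rows written to this node's replica of that region.
  private final Map<Integer, List<String>> localGroups = new HashMap<>();

  // Start hosting a replica of the region on this node.
  void createGroup(int regionId) {
    localGroups.putIfAbsent(regionId, new ArrayList<>());
  }

  // Drop the local replica, e.g. while this DataNode is being removed.
  void deleteGroup(int regionId) {
    localGroups.remove(regionId);
  }

  // Write path: the group must still exist locally, otherwise the write fails.
  void write(int regionId, String row) throws GroupNotExistException {
    List<String> rows = localGroups.get(regionId);
    if (rows == null) {
      throw new GroupNotExistException(regionId);
    }
    rows.add(row);
  }

  public static void main(String[] args) {
    StaleRouteDemo node72 = new StaleRouteDemo();
    node72.createGroup(11);

    // 1. The write is routed to the local replica of DataRegion[11] ...
    int routedRegion = 11;

    // 2. ... the replica is deleted as part of removing the DataNode ...
    node72.deleteGroup(11);

    // 3. ... so the already-routed write hits a missing consensus group.
    try {
      node72.write(routedRegion, "root.test.d1.s1=1");
      System.out.println("write ok");
    } catch (GroupNotExistException e) {
      System.out.println("write locally failed: " + e.getMessage());
    }
  }
}
```

Running `main` prints `write locally failed: The consensus group DataRegion[11] doesn't exist`. This mirrors the shape of the ERROR log above without claiming to reproduce IoTDB's actual control flow.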