刘珍 created IOTDB-4809:
-------------------------

             Summary: [ remove datanode ] ConsensusGroupNotExistException: The consensus group DataRegion[11] doesn't exist
                 Key: IOTDB-4809
                 URL: https://issues.apache.org/jira/browse/IOTDB-4809
             Project: Apache IoTDB
          Issue Type: Bug
          Components: mpp-cluster
    Affects Versions: 0.14.0-SNAPSHOT
            Reporter: 刘珍
            Assignee: Haonan Hou


m_1031_76b947f
3 replicas, 3C5D (3 ConfigNodes, 5 DataNodes)
schema region: ratis
data region: multiLeader

After the benchmark had run for 1 hour, the remove-datanode operation was executed for ip72. The ip72 DataNode then printed ERROR logs continuously:
2022-10-31 17:48:06,277 [pool-25-IoTDB-ClientRPC-Processor-85$20221031_094806_31457_3.1.0] ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally failed. TSStatus: TSStatus(code:412, message:org.apache.iotdb.consensus.exception.ConsensusGroupNotExistException: The consensus group DataRegion[13] doesn't exist), message: org.apache.iotdb.consensus.exception.ConsensusGroupNotExistException: The consensus group DataRegion[13] doesn't exist
2022-10-31 17:48:06,285 [pool-25-IoTDB-ClientRPC-Processor-93$20221031_094806_31458_3.1.0] ERROR o.a.i.d.m.e.e.RegionWriteExecutor$WritePlanNodeExecutionVisitor:235 - Something wrong happened while calling consensus layer's write API.
org.apache.iotdb.consensus.exception.ConsensusGroupNotExistException: The consensus group DataRegion[11] doesn't exist
        at org.apache.iotdb.consensus.multileader.MultiLeaderConsensus.write(MultiLeaderConsensus.java:155)
        at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.fireTriggerAndInsert(RegionWriteExecutor.java:101)
        at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:215)
        at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:163)
        at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:117)
        at org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1085)
        at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:83)
        at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchLocally(FragmentInstanceDispatcherImpl.java:232)
        at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchOneInstance(FragmentInstanceDispatcherImpl.java:137)
        at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchWriteSync(FragmentInstanceDispatcherImpl.java:119)
        at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatch(FragmentInstanceDispatcherImpl.java:90)
        at org.apache.iotdb.db.mpp.plan.scheduler.ClusterScheduler.start(ClusterScheduler.java:102)
        at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.schedule(QueryExecution.java:283)
        at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:201)
        at org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:146)
        at org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:160)
        at org.apache.iotdb.db.service.thrift.impl.ClientRPCServiceImpl.insertTablet(ClientRPCServiceImpl.java:1198)
        at org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:4078)
        at org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:4058)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
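
The two ERROR entries above name different regions (DataRegion[13] and DataRegion[11]). To see which regions are affected across a whole DataNode log, something like the following could be used. This is only a sketch: the function name count_missing_regions is hypothetical, and the log file path is whatever the deployment uses. It counts occurrences of the message text, not log entries, so an entry that repeats the message (like the first one above) is counted twice.

```shell
#!/bin/bash
# Hypothetical helper: count "consensus group <Region>[id] doesn't exist"
# occurrences per region. Reads the files given as arguments, or stdin.
count_missing_regions() {
  # -o prints each match on its own line, -h suppresses file name prefixes.
  grep -ohE "consensus group [A-Za-z]+Region\[[0-9]+\] doesn't exist" "$@" \
    | awk '{ count[$3]++ } END { for (r in count) print r, count[r] }' \
    | sort
}

# Usage (the log path is an assumption, adjust to the actual DataNode log):
# count_missing_regions logs/log_datanode_error.log
```

Each output line is a region name followed by the number of times it appears in the error text.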


See the attachment for region information before and after the remove operation.

Test ENV:
1. 192.168.10.72, 73, 74, 75, 76: 48 CPU, 384 GB
ConfigNode:
MAX_HEAP_SIZE="8G"
cn_connection_timeout_ms=120000

Common:
connection_timeout_ms=120000
max_connection_for_internal_service=200
query_timeout_threshold=36000000
multi_leader_throttle_threshold_in_byte=536870912000
max_waiting_time_when_insert_blocked=120000
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
schema_replication_factor=3
data_replication_factor=3


DataNode:
MAX_HEAP_SIZE="256G"
MAX_DIRECT_MEMORY_SIZE="32G"

2. Benchmark configuration:
See the attachment.
3. Remove command:
fit-72:/data/mpp_test/m_1031_76b947f$ cat rm.sh
#!/bin/bash
sleep 1h
./sbin/start-cli.sh -h 192.168.10.76 -e "show cluster" >> bef_rm_info.out
./sbin/start-cli.sh -h 192.168.10.76 -e "show regions" >> bef_rm_info.out

./sbin/remove-datanode.sh  192.168.10.72:6667 > rm_ip72.out
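
rm.sh as listed records the cluster and region state only before the removal. A possible way to capture the same information after the removal, for comparison, is sketched below. The function name snapshot_cluster, the IOTDB_CLI override and the file name aft_rm_info.out are hypothetical; the CLI invocation and the "show cluster" / "show regions" statements are the ones already used in rm.sh.

```shell
#!/bin/bash
# Hypothetical helper: append "show cluster" and "show regions" output to a file.
# IOTDB_CLI is an assumed override so the function can be dry-run without a cluster.
snapshot_cluster() {
  local out_file="$1"
  local host="${2:-192.168.10.76}"
  local cli="${IOTDB_CLI:-./sbin/start-cli.sh}"
  "$cli" -h "$host" -e "show cluster" >> "$out_file"
  "$cli" -h "$host" -e "show regions" >> "$out_file"
}

# Usage mirroring rm.sh:
# snapshot_cluster bef_rm_info.out
# ./sbin/remove-datanode.sh 192.168.10.72:6667 > rm_ip72.out
# snapshot_cluster aft_rm_info.out
```

Comparing the two output files with diff would then show which regions moved off ip72.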






--
This message was sent by Atlassian Jira
(v8.20.10#820010)
