刘珍 created IOTDB-4553:
-------------------------

             Summary: [remove datanode ] SchemaRegion migration failed
                 Key: IOTDB-4553
                 URL: https://issues.apache.org/jira/browse/IOTDB-4553
             Project: Apache IoTDB
          Issue Type: Bug
          Components: mpp-cluster
    Affects Versions: 0.14.0-SNAPSHOT
            Reporter: 刘珍
            Assignee: Song Ziyang
         Attachments: image-2022-09-28-18-03-13-622.png

master_0928_e5cc456
SchemaRegion : ratis
DataRegion : multiLeader
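Both use 3 replicas. Start a 3C3D cluster (3 ConfigNodes, 3 DataNodes), write data with the benchmark (bm), add one DataNode (ip40), then remove the DataNode on ip39.
After the removal of ip39 succeeded, the SchemaRegion migration failed.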
 !image-2022-09-28-18-03-13-622.png! 

ERROR log on the ip40 DataNode:
2022-09-28 17:37:55,449 [pool-21-IoTDB-DataNodeInternalRPC-Processor-3] ERROR o.a.i.d.s.t.i.DataNodeInternalRPCServiceImpl:1002 - CreateNewRegionPeer error, peers: [Peer{groupId=SchemaRegion[0], endpoint=TEndPoint(ip:172.20.70.37, port:50010)}, Peer{groupId=SchemaRegion[0], endpoint=TEndPoint(ip:172.20.70.38, port:50010)}, Peer{groupId=SchemaRegion[0], endpoint=TEndPoint(ip:172.20.70.39, port:50010)}, Peer{groupId=SchemaRegion[0], endpoint=TEndPoint(ip:172.20.70.40, port:50010)}], regionId: SchemaRegion[0], errorMessage
org.apache.iotdb.consensus.exception.RatisRequestFailedException: Ratis request failed
        at org.apache.iotdb.consensus.ratis.RatisConsensus.createPeer(RatisConsensus.java:332)
        at org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.createNewRegionPeer(DataNodeInternalRPCServiceImpl.java:999)
        at org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.createNewRegionPeer(DataNodeInternalRPCServiceImpl.java:838)
        at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$createNewRegionPeer.getResult(IDataNodeRPCService.java:3237)
        at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$createNewRegionPeer.getResult(IDataNodeRPCService.java:3217)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
        at org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:92)
        at org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:234)
        at org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:181)
        at org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:98)
        at org.apache.ratis.client.impl.BlockingImpl.sendRequest(BlockingImpl.java:132)
        at org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:98)
        at org.apache.ratis.client.impl.GroupManagementImpl.add(GroupManagementImpl.java:51)
        at org.apache.iotdb.consensus.ratis.RatisConsensus.createPeer(RatisConsensus.java:327)
        ... 10 common frames omitted
Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
        at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
        at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
        at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
        at org.apache.ratis.proto.grpc.AdminProtocolServiceGrpc$AdminProtocolServiceBlockingStub.groupManagement(AdminProtocolServiceGrpc.java:507)
        at org.apache.ratis.grpc.client.GrpcClientProtocolClient.lambda$groupAdd$5(GrpcClientProtocolClient.java:183)
        at org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:232)
        ... 16 common frames omitted
Caused by: org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: /172.20.70.40:50010
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
        at org.apache.ratis.thirdparty.io.netty.channel.unix.Errors.newConnectException0(Errors.java:155)
        at org.apache.ratis.thirdparty.io.netty.channel.unix.Errors.handleConnectErrno(Errors.java:128)
        at org.apache.ratis.thirdparty.io.netty.channel.unix.Socket.finishConnect(Socket.java:320)
        at org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:710)
        at org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:687)
        at org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567)
        at org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:470)
        at org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
        at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
        at org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
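
The innermost cause in the trace above is "Connection refused: /172.20.70.40:50010", which suggests that nothing was accepting TCP connections on that port when the peer was being created. A minimal diagnostic sketch for checking this from another machine follows. It is not part of IoTDB; the helper names `port_is_open` and `unreachable_peers` are hypothetical, and the peer addresses are simply the ones listed in the log.

```python
import socket
from typing import Iterable, List, Tuple


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def unreachable_peers(
    peers: Iterable[Tuple[str, int]], timeout: float = 2.0
) -> List[Tuple[str, int]]:
    """Return the peers whose port does not accept TCP connections."""
    return [(h, p) for h, p in peers if not port_is_open(h, p, timeout)]


# Peer endpoints as they appear in the CreateNewRegionPeer error message.
SCHEMA_REGION_PEERS = [
    ("172.20.70.37", 50010),
    ("172.20.70.38", 50010),
    ("172.20.70.39", 50010),
    ("172.20.70.40", 50010),
]
```

Calling `unreachable_peers(SCHEMA_REGION_PEERS)` from a host inside the test network would list the endpoints that refuse connections. A successful connect only shows that something is listening on the port, not that the Ratis server behind it is healthy.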

Test environment
1. Private cloud, 172.20.70.34..40, 8 CPUs, 32 GB
34, 35, 36 are ConfigNodes
37..40 are DataNodes
The benchmark runs on ip21

2. Cluster configuration parameters
ConfigNode
MAX_HEAP_SIZE="8G"
MAX_DIRECT_MEMORY_SIZE="4G"
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
time_partition_interval_for_routing=86400000
schema_replication_factor=3
data_replication_factor=3


DataNode
MAX_HEAP_SIZE="20G"
MAX_DIRECT_MEMORY_SIZE="6G"

wal_buffer_size_in_byte=1048576
enable_timed_flush_seq_memtable=true
seq_memtable_flush_interval_in_ms=3600000
seq_memtable_flush_check_interval_in_ms=600000
enable_timed_flush_unseq_memtable=true
unseq_memtable_flush_interval_in_ms=3600000
unseq_memtable_flush_check_interval_in_ms=600000
query_timeout_threshold=36000000

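Start the 3 ConfigNodes first: 34, 35, 36
Then start the 3 DataNodes: 37, 38, 39

2. For the bm configuration, see the attachment

3. Start the DataNode on ip40

4. After bm has run for about 30 minutes, remove ip39

5. Check the removal result

See the attachment for logs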



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
