刘珍 created IOTDB-4652:
-------------------------

             Summary: [ MultiLeaderConsensus ] The data on the replicas is inconsistent
                 Key: IOTDB-4652
                 URL: https://issues.apache.org/jira/browse/IOTDB-4652
             Project: Apache IoTDB
          Issue Type: Bug
          Components: mpp-cluster
    Affects Versions: 0.14.0-SNAPSHOT
            Reporter: 刘珍
            Assignee: Jinrui Zhang
         Attachments: image-2022-10-14-16-04-28-847.png, image-2022-10-14-16-13-37-165.png

master_1013_00dc222
schema : ratis
data : multiLeader

Setup: 3 replicas, 3C3D (3 ConfigNodes, 3 DataNodes).
The benchmark (bm) write completed (all writes reported as successful), followed by a flush.
Querying the data afterwards shows that the replicas are inconsistent.

Query against ip68 (final state: leader of this region):
./sbin/start-cli.sh -h 192.168.10.68 -e "select count(s_0) from root.test.g_13.d_1013"
6 data points are missing.
!image-2022-10-14-16-04-28-847.png!

Data for device root.test.g_13.d_1013 on ip68/ip62/ip66:
ip68: 94 points, 6 missing
ip62: 100 points, correct
ip66: 100 points, correct

ip66 was the leader at one point (relatively little data was written to it directly). When ip66 synced this region's data to ip68, the following ERROR was logged:
2022-10-14 10:55:02,593 [pool-96-IoTDB-LogDispatcher-DataRegion[66]-2] ERROR o.a.i.c.m.l.LogDispatcher$LogDispatcherThread:415 - Can not sync logs to peer Peer{groupId=DataRegion[66], endpoint=TEndPoint(ip:192.168.10.68, port:40010)} because
java.io.IOException: Borrow client from pool for node TEndPoint(ip:192.168.10.68, port:40010) failed.
        at org.apache.iotdb.commons.client.ClientManager.borrowClient(ClientManager.java:61)
        at org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.sendBatchAsync(LogDispatcher.java:404)
        at org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.run(LogDispatcher.java:289)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object, borrowMaxWaitMillis=10000
        at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:453)
        at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:350)
        at org.apache.iotdb.commons.client.ClientManager.borrowClient(ClientManager.java:50)
        ... 7 common frames omitted

Also note that ip66 logged an error in which Ratis detected an off-heap (direct) memory leak:
2022-10-14 10:39:26,022 [grpc-default-worker-ELG-3-40] ERROR o.a.r.t.i.n.u.ResourceLeakDetector:319 - LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records:
Created at:
        org.apache.ratis.thirdparty.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:401)
        org.apache.ratis.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
        org.apache.ratis.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
        org.apache.ratis.thirdparty.io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53)
        org.apache.ratis.thirdparty.io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:120)
        org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollRecvByteAllocatorHandle.allocate(EpollRecvByteAllocatorHandle.java:75)
        org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:780)
        org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
        org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
        org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
        org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        java.lang.Thread.run(Thread.java:748)

Test environment
1. 192.168.10.62/66/68: physical machines, 72 CPUs, 256 GB. The benchmark (bm) runs on ip64.
Configuration (see attachments):
ConfigNode
MAX_HEAP_SIZE="16G"
MAX_DIRECT_MEMORY_SIZE="8G"
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
schema_replication_factor=3
data_replication_factor=3
connection_timeout_ms=1200000

DataNode
MAX_HEAP_SIZE="192G"
MAX_DIRECT_MEMORY_SIZE="32G"
connection_timeout_ms=1200000
max_waiting_time_when_insert_blocked=3600000
query_timeout_threshold=36000000
enable_auto_create_schema=false

2. Write data with bm. Configuration: see attachment.
!image-2022-10-14-16-13-37-165.png!

3. Query the data, verify its correctness, analyze the results, and analyze the cluster logs.
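
The per-replica comparison in this report (run the same count query against each DataNode and compare the results) could be scripted roughly as follows. This is only a sketch: the helper names `build_count_command` and `find_divergent_replicas` are hypothetical, and using the majority count as the reference value is an assumption, not something IoTDB provides. The only CLI flags used are the `-h` and `-e` options that appear in the command above.

```python
from collections import Counter
from typing import Dict, List


def build_count_command(host: str, series: str, device: str) -> List[str]:
    """Build the CLI invocation from this report for one replica (hypothetical helper)."""
    sql = f"select count({series}) from {device}"
    return ["./sbin/start-cli.sh", "-h", host, "-e", sql]


def find_divergent_replicas(counts: Dict[str, int]) -> Dict[str, int]:
    """Return {host: reference - count} for replicas that differ from the majority.

    Assumption: the most common count across replicas is taken as the
    reference value. A positive number means that replica is missing points.
    """
    if not counts:
        return {}
    reference, _ = Counter(counts.values()).most_common(1)[0]
    return {
        host: reference - count
        for host, count in counts.items()
        if count != reference
    }
```

With the counts reported here (ip68: 94, ip62: 100, ip66: 100), `find_divergent_replicas` flags only 192.168.10.68, with 6 points missing. The command list returned by `build_count_command` can be passed to `subprocess.run` on a machine where the IoTDB CLI is installed; parsing the CLI output into an integer is left out because its exact format is not shown in this report.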