[ https://issues.apache.org/jira/browse/IOTDB-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
刘珍 reopened IOTDB-4334:
-----------------------

master_1013_00dc222: the problem still exists.
3-replica 3C3D cluster, clean environment. During a benchmark (bm) write, I manually simulated a full disk on ip3, which put that node into read-only mode. I then killed the bm process and started a new write with different device_name values. Creating the new regions failed, and the ConfigNode reported:
2022-10-17 11:29:41,112 [pool-4-IoTDB-ConfigNodeRPC-Processor-4] ERROR o.a.i.c.m.p.PartitionManager:298 - There are no available RegionGroups currently, please check the status of cluster DataNodes
The configuration of the second bm is attached.

> No disk space, no load balancing to new datanode
> ------------------------------------------------
>
>                 Key: IOTDB-4334
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4334
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Yongzao Dan
>            Priority: Major
>         Attachments: cf_partition.conf, image-2022-09-05-16-46-15-829.png, image-2022-09-05-16-51-07-405.png, image-2022-09-05-16-52-07-143.png, image-2022-09-05-16-54-09-763.png, ip3_log.tar.gz, screenshot-1.png
>
>
> master_0904_2db66c6
> Time partitioning is enabled on the ConfigNode:
> !image-2022-09-05-16-46-15-829.png!
> Start a clean 3-replica 3C3D cluster and start the benchmark to write data. After metadata creation had completed and some data had been written, add 2 DataNodes.
> The disk on ip3 (follower) became full, but {color:#DE350B}*new partitions were not routed to the new DataNodes and no load balancing took place*{color}.
> {color:#DE350B}*After 20 minutes, writes on ip5 (leader) stopped (MultiLeader throttling, WAL at 51 GB)*{color} (the bm client did not stop writing).
> Error on ip3:
> 2022-09-05 16:30:04,560 [pool-29-IoTDB-WAL-Sync(node-root.test.g0_0-1)-1] ERROR o.a.i.d.w.b.WALBuffer$SyncBufferTask:427 - Fail to sync wal node-root.test.g0_0-1's buffer, change system mode to error.
> java.io.IOException: No space left on device
>       at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>       at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
>       at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>       at sun.nio.ch.IOUtil.write(IOUtil.java:51)
>       at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
>       at org.apache.iotdb.db.wal.io.LogWriter.write(LogWriter.java:58)
>       at org.apache.iotdb.db.wal.io.WALWriter.write(WALWriter.java:50)
>       at org.apache.iotdb.db.wal.buffer.WALBuffer$SyncBufferTask.run(WALBuffer.java:425)
>       at org.apache.iotdb.commons.concurrent.WrappedRunnable$1.runMayThrow(WrappedRunnable.java:44)
>       at org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> 2022-09-05 16:30:04,637 [pool-19-IoTDB-MultiLeaderConsensusRPC-Client-15] ERROR o.a.i.c.m.t.MultiLeaderConsensusIService$AsyncProcessor$syncLog$1:215 - Exception inside handler
> org.apache.iotdb.commons.exception.IoTDBException: Fail to sync log because system is read-only.
>       at org.apache.iotdb.consensus.multileader.service.MultiLeaderRPCServiceProcessor.syncLog(MultiLeaderRPCServiceProcessor.java:76)
>       at org.apache.iotdb.consensus.multileader.thrift.MultiLeaderConsensusIService$AsyncProcessor$syncLog.start(MultiLeaderConsensusIService.java:234)
>       at org.apache.iotdb.consensus.multileader.thrift.MultiLeaderConsensusIService$AsyncProcessor$syncLog.start(MultiLeaderConsensusIService.java:177)
>       at org.apache.thrift.TBaseAsyncProcessor.process(TBaseAsyncProcessor.java:103)
>       at org.apache.thrift.server.AbstractNonblockingServer$AsyncFrameBuffer.invoke(AbstractNonblockingServer.java:603)
>       at org.apache.thrift.server.Invocation.run(Invocation.java:18)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> Test procedure:
> 1. 3C3D on 192.168.130.3/4/5, 16 cores, 32 GB.
> 2. The benchmark runs on ip2; its configuration file is attached.
> /home/benchmark/bm_0620_7ec96c1
> Metadata creation completes and data writing begins.
> Cluster and region information:
> !image-2022-09-05-16-51-07-405.png!
> 3. Add DataNodes ip1 and ip2.
> Cluster and region information:
> !image-2022-09-05-16-52-07-143.png!
> 4. Copy some data onto the disk that holds the data on ip3 until ip3's disk is full.
> With its disk full, ip3 becomes read-only (see the error messages in the problem description); detailed logs are attached.
> The cluster did not rebalance load. Cluster and region information:
> !image-2022-09-05-16-54-09-763.png!
> The WAL size on ip5 (leader) reached the throttling threshold:
> !screenshot-1.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)