[ https://issues.apache.org/jira/browse/IOTDB-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
刘珍 reopened IOTDB-5061:
-----------------------

rc/1.0.1 2023-01-07_09dd173
We need to wait for the Ratis release. With the current IoTDB build, DataNode removal (scale-in) fails with an error:

> Failed to rename mtree.snapshot.tmp to mtree.snapshot while creating mtree snapshot
> -----------------------------------------------------------------------------------
>
>                 Key: IOTDB-5061
>                 URL: https://issues.apache.org/jira/browse/IOTDB-5061
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Song Ziyang
>            Priority: Blocker
>              Labels: pull-request-available
>         Attachments: image-2022-12-17-08-39-00-520.png, iotdb_4593.conf,
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> m_1127_ffbdaf3
> 1. Start a 3-replica cluster with 3 ConfigNodes and 5 DataNodes (3C5D).
> 2. Write data with BM; after 1 hour, remove the DataNode on IP72.
> 3. Start the removal. After 1 hour 40 minutes, IP72 logs a flood of ERRORs (308 ERROR log files, NPE):
> 2022-11-27 15:42:25,876 [3@group-000200000006-StateMachineUpdater] ERROR o.a.i.d.m.m.s.MemMTreeSnapshotUtil:89 - {color:#DE350B}Failed to rename mtree.snapshot.tmp to mtree.snapshot while creating mtree snapshot.{color}
> 2022-11-27 15:42:26,157 [3@group-000200000006-StateMachineUpdater] ERROR o.a.r.s.i.StateMachineUpdater:194 - 3@group-000200000006-StateMachineUpdater caught a Throwable.
> {color:#DE350B}java.lang.NullPointerException: null{color}
>       at org.apache.iotdb.db.metadata.tag.TagManager.createSnapshot(TagManager.java:79)
>       at org.apache.iotdb.db.metadata.schemaregion.SchemaRegionMemoryImpl.createSnapshot(SchemaRegionMemoryImpl.java:456)
>       at org.apache.iotdb.db.consensus.statemachine.SchemaRegionStateMachine.takeSnapshot(SchemaRegionStateMachine.java:62)
>       at org.apache.iotdb.consensus.IStateMachine.takeSnapshot(IStateMachine.java:82)
>       at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.takeSnapshot(ApplicationStateMachineProxy.java:212)
>       at org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:270)
>       at org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:262)
>       at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:186)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> 2022-11-27 15:42:26,158 [3@group-000200000006-StateMachineUpdater] ERROR o.a.r.s.i.StateMachineUpdater:194 - 3@group-000200000006-StateMachineUpdater caught a Throwable.
> java.lang.NullPointerException: null
>       at org.apache.iotdb.db.metadata.tag.TagManager.createSnapshot(TagManager.java:79)
>       at org.apache.iotdb.db.metadata.schemaregion.SchemaRegionMemoryImpl.createSnapshot(SchemaRegionMemoryImpl.java:456)
>       at org.apache.iotdb.db.consensus.statemachine.SchemaRegionStateMachine.takeSnapshot(SchemaRegionStateMachine.java:62)
>       at org.apache.iotdb.consensus.IStateMachine.takeSnapshot(IStateMachine.java:82)
>       at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.takeSnapshot(ApplicationStateMachineProxy.java:212)
>       at org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:270)
>       at org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:262)
>       at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:183)
>       at java.base/java.lang.Thread.run(Thread.java:834)
>
> Test environment
> 1. 192.168.10.72~76
> ConfigNode
> MAX_HEAP_SIZE="8G"
> cn_connection_timeout_ms=120000
> Common
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus
> schema_replication_factor=3
> data_replication_factor=3
> connection_timeout_ms=120000
> max_connection_for_internal_service=200
> max_waiting_time_when_insert_blocked=600000
> query_timeout_threshold=36000000
> DataNode
> MAX_HEAP_SIZE="256G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> 2. See the attachment for the BM configuration.
> 3. Script under ${iotdb_dir} on IP72:
> sleep 1h
> ./sbin/start-cli.sh -h 192.168.10.72 -e "show cluster" > bef_remove.out
> ./sbin/start-cli.sh -h 192.168.10.72 -e "show regions" >> bef_remove.out
> ./sbin/start-cli.sh -h 192.168.10.72 -e "show storage group" >> bef_remove.out
> ./sbin/remove-datanode.sh "192.168.10.72:6667" >> remove_ip72.out
> 4. Check the removal result and the logs on each node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)