刘珍 created IOTDB-4380:
-------------------------

    Summary: delete storage group : wal file corrupt o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL entry ready, execute rollWALFile.
    Key: IOTDB-4380
    URL: https://issues.apache.org/jira/browse/IOTDB-4380
    Project: Apache IoTDB
    Issue Type: Bug
    Components: mpp-cluster
    Affects Versions: 0.14.0-SNAPSHOT
    Reporter: 刘珍
    Assignee: 张洪胤
    Attachments: more_metadata.conf

m_0908_7915b3f.

Problem description

The datanode fails to restart:

2022-09-09 16:32:00,011 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL entry ready, execute rollWALFile. {color:#DE350B}*Current search index in wal buffer is 2959, and next target index is 2501*{color}

MultiLeaderConsensus, 3 replicas, 3 nodes.

1. While the metadata is being created, kill ip74.
The benchmark configuration file is attached.

2. Clear the operating system cache on ip74 and start the datanode on ip74.

3. Run the benchmark again with the same configuration and IS_DELETE_DATA=true.
With this parameter set to true, the benchmark first executes delete storage group root.test.*;
After the benchmark finishes, stop the datanode service on ip74.
Back up data to /data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test

4. Clear the operating system cache on ip74 and start the datanode service.
Run the benchmark again with the same configuration. After the benchmark finishes, check the log on ip74, which shows:

2022-09-09 15:43:13,691 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-40] ERROR o.a.t.ProcessFunction:47 - Internal error processing getDataBlock
org.apache.thrift.TException: Source fragment instance not found. Fragment instance ID: TFragmentInstanceId(queryId:20220909_074205_19400_3, fragmentId:2, instanceId:0).
        at org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:90)
        at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326)
        at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
2022-09-09 15:43:15,312 [20220909_074205_19400_3.2.0.SinkHandle-3074] ERROR o.a.i.d.m.e.e.SinkHandle:281 - The TsBlock doesn't exist. Sequence ID is 1, remaining map is [0=<org.apache.iotdb.tsfile.read.common.block.TsBlock@5f617979,1048576>]
2022-09-09 15:43:17,119 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-22] ERROR o.a.t.ProcessFunction:47 - Internal error processing getDataBlock
java.lang.IllegalStateException: The data block doesn't exist. Sequence ID: 1
        at org.apache.iotdb.db.mpp.execution.exchange.SinkHandle.getSerializedTsBlock(SinkHandle.java:285)
        at org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:97)
        at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326)
        at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)

5. Stop the datanode service on ip74.
Back up data to /data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test_2
Clear the operating system cache on ip74 and start the datanode on ip74. Startup fails:

2022-09-09 16:44:00,039 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL entry ready, execute rollWALFile. Current search index in wal buffer is 2959, and next target index is 2501

--
This message was sent by Atlassian Jira
(v8.20.10#820010)