刘珍 created IOTDB-4380:
-------------------------

             Summary: delete storage group :  wal file corrupt 
o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL 
entry ready, execute rollWALFile.
                 Key: IOTDB-4380
                 URL: https://issues.apache.org/jira/browse/IOTDB-4380
             Project: Apache IoTDB
          Issue Type: Bug
          Components: mpp-cluster
    Affects Versions: 0.14.0-SNAPSHOT
            Reporter: 刘珍
            Assignee: 张洪胤
         Attachments: more_metadata.conf

m_0908_7915b3f.
Problem description
The datanode fails to restart:
2022-09-09 16:32:00,011 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO  o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL entry ready, execute rollWALFile. Current search index in wal buffer is 2959, and next target index is 2501

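MultiLeaderConsensus, 3 replicas, 3 nodes
1. While the metadata is being created, kill ip74.
The benchmark configuration file is attached.
2. Clear the operating system cache on ip74 and start the datanode on ip74.
3. Run the benchmark again with the same configuration, with IS_DELETE_DATA=true.
When this parameter is true, the benchmark first executes delete storage group root.test.*;
After the benchmark finishes, stop the datanode service on ip74.
Back up data to /data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test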

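4. Clear the operating system cache on ip74 and start the datanode service.
Run the benchmark again with the same configuration. After the benchmark finishes,
check the log on ip74, which shows: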
2022-09-09 15:43:13,691 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-40] ERROR o.a.t.ProcessFunction:47 - Internal error processing getDataBlock
org.apache.thrift.TException: Source fragment instance not found. Fragment instance ID: TFragmentInstanceId(queryId:20220909_074205_19400_3, fragmentId:2, instanceId:0).
        at org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:90)
        at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326)
        at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
2022-09-09 15:43:15,312 [20220909_074205_19400_3.2.0.SinkHandle-3074] ERROR o.a.i.d.m.e.e.SinkHandle:281 - The TsBlock doesn't exist. Sequence ID is 1, remaining map is [0=<org.apache.iotdb.tsfile.read.common.block.TsBlock@5f617979,1048576>]
2022-09-09 15:43:17,119 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-22] ERROR o.a.t.ProcessFunction:47 - Internal error processing getDataBlock
java.lang.IllegalStateException: The data block doesn't exist. Sequence ID: 1
        at org.apache.iotdb.db.mpp.execution.exchange.SinkHandle.getSerializedTsBlock(SinkHandle.java:285)
        at org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:97)
        at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326)
        at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)

5. Stop the datanode service on ip74.
Back up data to /data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test_2
Clear the operating system cache on ip74 and start the datanode on ip74; startup fails:
2022-09-09 16:44:00,039 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO  o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL entry ready, execute rollWALFile. Current search index in wal buffer is 2959, and next target index is 2501






--
This message was sent by Atlassian Jira
(v8.20.10#820010)
