[ 
https://issues.apache.org/jira/browse/IOTDB-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

刘珍 reopened IOTDB-4380:
-----------------------

> The log dispatcher is inconsistent with the search index of the wal node
> ------------------------------------------------------------------------
>
>                 Key: IOTDB-4380
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4380
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: 张洪胤
>            Priority: Major
>             Fix For: 1.0.0
>
>         Attachments: ip74_logs.tar.gz, more_metadata.conf, screenshot-1.png
>
>
> m_0908_7915b3f。
> 问题描述
> {color:#DE350B}*log dispatcher与wal node的search index不一致 , 
> datanode重启成功后日志一直刷*{color}:
> 2022-09-09 16:32:00,011 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO  
> o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL 
> entry ready, execute rollWALFile. {color:#DE350B}*Current search index in wal 
> buffer is 2959, and next target index is 2501 *{color}
> MultiLeaderConsensus,3副本3节点
> 1. 创建元数据过程中,kill ip74 的datanode PID
> benchmark配置文件见附件。
> 2. 清空ip74 的操作系统缓存,启动ip74的datanode
> 3. 再次重新运行benchmark同一配置,IS_DELETE_DATA=true
> 这个参数为true,会先执行delete storage group root.test.*;
> benchmark运行完成,stop ip74的datanode服务
> 备份data 为/data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test
> 4. 清ip74操作系统缓存,启动datanode服务
> 再次运行benchmark同一配置,benchmark运行完成,
> 查看ip74的日志,看到
> 2022-09-09 15:43:13,691 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-40] ERROR 
> o.a.t.ProcessFunction:47 - Internal error processing getDataBlock
> org.apache.thrift.TException: Source fragment instance not found. Fragment 
> instance ID: TFragmentInstanceId(queryId:20220909_074205_19400_3, 
> fragmentId:2, instanceId:0).
>         at 
> org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:90)
>         at 
> org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326)
>         at 
> org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>         at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> 2022-09-09 15:43:15,312 [20220909_074205_19400_3.2.0.SinkHandle-3074] ERROR 
> o.a.i.d.m.e.e.SinkHandle:281 - The TsBlock doesn't exist. Sequence ID is 1, 
> remaining map is 
> [0=<org.apache.iotdb.tsfile.read.common.block.TsBlock@5f617979,1048576>]
> 2022-09-09 15:43:17,119 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-22] ERROR 
> o.a.t.ProcessFunction:47 - Internal error processing getDataBlock
> java.lang.IllegalStateException: The data block doesn't exist. Sequence ID: 1
>         at 
> org.apache.iotdb.db.mpp.execution.exchange.SinkHandle.getSerializedTsBlock(SinkHandle.java:285)
>         at 
> org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:97)
>         at 
> org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326)
>         at 
> org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>         at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> 5. 停止ip74的datanode服务
> 备份data 到/data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test_2
> 清ip74操作系统缓存,启动ip74的datanode ,可以成功,日志一直刷(此节点不能继续同步):
> 2022-09-09 16:44:00,039 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO  
> o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL 
> entry ready, execute rollWALFile. Current search index in wal buffer is 2959, 
> and next target index is 2501 
> 机器 与 集群配置
> 1. 192.168.10.72/ 73/74 48核384G
> benchmark 在71
> 2. 集群参数
> confignode
> MAX_HEAP_SIZE="8G"
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=3
> data_replication_factor=3
> datanode
> MAX_HEAP_SIZE="256G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> max_connection_for_internal_service=300
> enable_timed_flush_seq_memtable=true
> seq_memtable_flush_interval_in_ms=600000
> seq_memtable_flush_check_interval_in_ms=300000
> enable_timed_flush_unseq_memtable=true
> unseq_memtable_flush_interval_in_ms=600000
> unseq_memtable_flush_check_interval_in_ms=300000
> max_waiting_time_when_insert_blocked=3600000
> query_timeout_threshold=3600000



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to