[ https://issues.apache.org/jira/browse/IOTDB-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
刘珍 reopened IOTDB-4380: ----------------------- > The log dispatcher is inconsistent with the search index of the wal node > ------------------------------------------------------------------------ > > Key: IOTDB-4380 > URL: https://issues.apache.org/jira/browse/IOTDB-4380 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster > Affects Versions: 0.14.0-SNAPSHOT > Reporter: 刘珍 > Assignee: 张洪胤 > Priority: Major > Fix For: 1.0.0 > > Attachments: ip74_logs.tar.gz, more_metadata.conf, screenshot-1.png > > > m_0908_7915b3f。 > 问题描述 > {color:#DE350B}*log dispatcher与wal node的search index不一致 , > datanode重启成功后日志一直刷*{color}: > 2022-09-09 16:32:00,011 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO > o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL > entry ready, execute rollWALFile. {color:#DE350B}*Current search index in wal > buffer is 2959, and next target index is 2501 *{color} > MultiLeaderConsensus,3副本3节点 > 1. 创建元数据过程中,kill ip74 的datanode PID > benchmark配置文件见附件。 > 2. 清空ip74 的操作系统缓存,启动ip74的datanode > 3. 再次重新运行benchmark同一配置,IS_DELETE_DATA=true > 这个参数为true,会先执行delete storage group root.test.*; > benchmark运行完成,stop ip74的datanode服务 > 备份data 为/data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test > 4. 清ip74操作系统缓存,启动datanode服务 > 再次运行benchmark同一配置,benchmark运行完成, > 查看ip74的日志,看到 > 2022-09-09 15:43:13,691 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-40] ERROR > o.a.t.ProcessFunction:47 - Internal error processing getDataBlock > org.apache.thrift.TException: Source fragment instance not found. Fragment > instance ID: TFragmentInstanceId(queryId:20220909_074205_19400_3, > fragmentId:2, instanceId:0). > at > org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:90) > at > org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326) > at > org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2022-09-09 15:43:15,312 [20220909_074205_19400_3.2.0.SinkHandle-3074] ERROR > o.a.i.d.m.e.e.SinkHandle:281 - The TsBlock doesn't exist. Sequence ID is 1, > remaining map is > [0=<org.apache.iotdb.tsfile.read.common.block.TsBlock@5f617979,1048576>] > 2022-09-09 15:43:17,119 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-22] ERROR > o.a.t.ProcessFunction:47 - Internal error processing getDataBlock > java.lang.IllegalStateException: The data block doesn't exist. Sequence ID: 1 > at > org.apache.iotdb.db.mpp.execution.exchange.SinkHandle.getSerializedTsBlock(SinkHandle.java:285) > at > org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:97) > at > org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326) > at > org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 5. 停止ip74的datanode服务 > 备份data 到/data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test_2 > 清ip74操作系统缓存,启动ip74的datanode ,可以成功,日志一直刷(此节点不能继续同步): > 2022-09-09 16:44:00,039 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO > o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL > entry ready, execute rollWALFile. Current search index in wal buffer is 2959, > and next target index is 2501 > 机器 与 集群配置 > 1. 192.168.10.72/ 73/74 48核384G > benchmark 在71 > 2. 集群参数 > confignode > MAX_HEAP_SIZE="8G" > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_factor=3 > datanode > MAX_HEAP_SIZE="256G" > MAX_DIRECT_MEMORY_SIZE="32G" > max_connection_for_internal_service=300 > enable_timed_flush_seq_memtable=true > seq_memtable_flush_interval_in_ms=600000 > seq_memtable_flush_check_interval_in_ms=300000 > enable_timed_flush_unseq_memtable=true > unseq_memtable_flush_interval_in_ms=600000 > unseq_memtable_flush_check_interval_in_ms=300000 > max_waiting_time_when_insert_blocked=3600000 > query_timeout_threshold=3600000 -- This message was sent by Atlassian Jira (v8.20.10#820010)