[ https://issues.apache.org/jira/browse/IOTDB-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
刘珍 reopened IOTDB-4294:
-----------------------

> [ mem leak ] Request metadata timed out, causing memory leak
> ------------------------------------------------------------
>
>                 Key: IOTDB-4294
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4294
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Yuan Tian
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: add_dn.conf, screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> master_0830_42fcbfc
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.standalone.{color:#DE350B}*StandAloneConsensus*{color}
> {color:#DE350B}*schemaregion: 1 replica*{color}
> dataregion: 3 replicas
> Start 1 confignode and 3 datanodes first, then start the benchmark (write-only). {color:#DE350B}After 15 hours, a large number of writes failed{color}. The figure below shows the amount of data written per hour:
> !screenshot-1.png!
> ip5 error
> 2022-08-30 18:01:04,256 [20220830_094446_42966_3.1.0-1068] ERROR o.a.i.d.m.e.f.FragmentInstanceManager:157 - Execute error caused by
> org.apache.iotdb.db.mpp.exception.MemoryNotEnoughException: There is not enough memory to execute current fragment instance, current remaining free memory is 1014007, estimated memory usage for current fragment instance is 1048576
>         at org.apache.iotdb.db.mpp.plan.planner.LocalExecutionPlanner.checkMemory(LocalExecutionPlanner.java:132)
>         at org.apache.iotdb.db.mpp.plan.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:104)
>         at org.apache.iotdb.db.mpp.execution.fragment.FragmentInstanceManager.lambda$execSchemaQueryFragmentInstance$3(FragmentInstanceManager.java:147)
>         at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
>         at org.apache.iotdb.db.mpp.execution.fragment.FragmentInstanceManager.execSchemaQueryFragmentInstance(FragmentInstanceManager.java:133)
>         at org.apache.iotdb.db.consensus.statemachine.SchemaRegionStateMachine.read(SchemaRegionStateMachine.java:94)
>         at org.apache.iotdb.consensus.standalone.StandAloneServerImpl.read(StandAloneServerImpl.java:72)
>         at org.apache.iotdb.consensus.standalone.StandAloneConsensus.read(StandAloneConsensus.java:135)
>         at org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.sendFragmentInstance(DataNodeInternalRPCServiceImpl.java:169)
>         at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendFragmentInstance.getResult(IDataNodeRPCService.java:2136)
>         at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendFragmentInstance.getResult(IDataNodeRPCService.java:2116)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>
> Steps to reproduce
> 1. Environment
> 172.20.70.3/4/5: datanodes (8 cores, 32 GB)
> 172.20.70.31: confignode (8 cores, 32 GB)
> The benchmark is on ip15, at /data/benchmark/bm_0620_7ec96c1
> Cluster and regions information:
> !screenshot-2.png!
> 2. Database configuration parameters
> confignode
> MAX_HEAP_SIZE="16G"
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.standalone.StandAloneConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=1
> data_replication_factor=3
> datanode
> MAX_HEAP_SIZE="16G"
> wal_buffer_size_in_byte=1048576
> max_waiting_time_when_insert_blocked=3600000
> 3. Start the benchmark; see the attachment for its configuration file.
> 4. Afterwards, datanodes were added (1 datanode every 20 minutes, 6 in total):
> ip 2/13/14/16/18/19
> However, there were no new write operations, so these new datanodes hold no data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)