[ https://issues.apache.org/jira/browse/IOTDB-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

刘珍 reopened IOTDB-4294:
-----------------------

> [ mem leak ] Request metadata timed out, causing memory leak
> ------------------------------------------------------------
>
>                 Key: IOTDB-4294
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4294
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Yuan Tian
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: add_dn.conf, screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> master_0830_42fcbfc
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.standalone.{color:#DE350B}*StandAloneConsensus*{color}
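> {color:#DE350B}*schemaregion: 1 replica*{color}
> dataregion: 3 replicas
> First started 1 confignode and 3 datanodes, then started the benchmark (writes only). {color:#DE350B}After 15 hours, a large number of writes failed{color}. The figure below shows the amount of data written per hour: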
>   !screenshot-1.png! 
> Error on ip5:
> 2022-08-30 18:01:04,256 [20220830_094446_42966_3.1.0-1068] ERROR o.a.i.d.m.e.f.FragmentInstanceManager:157 - Execute error caused by
> org.apache.iotdb.db.mpp.exception.MemoryNotEnoughException: There is not enough memory to execute current fragment instance, current remaining free memory is 1014007, estimated memory usage for current fragment instance is 1048576
>         at org.apache.iotdb.db.mpp.plan.planner.LocalExecutionPlanner.checkMemory(LocalExecutionPlanner.java:132)
>         at org.apache.iotdb.db.mpp.plan.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:104)
>         at org.apache.iotdb.db.mpp.execution.fragment.FragmentInstanceManager.lambda$execSchemaQueryFragmentInstance$3(FragmentInstanceManager.java:147)
>         at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
>         at org.apache.iotdb.db.mpp.execution.fragment.FragmentInstanceManager.execSchemaQueryFragmentInstance(FragmentInstanceManager.java:133)
>         at org.apache.iotdb.db.consensus.statemachine.SchemaRegionStateMachine.read(SchemaRegionStateMachine.java:94)
>         at org.apache.iotdb.consensus.standalone.StandAloneServerImpl.read(StandAloneServerImpl.java:72)
>         at org.apache.iotdb.consensus.standalone.StandAloneConsensus.read(StandAloneConsensus.java:135)
>         at org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.sendFragmentInstance(DataNodeInternalRPCServiceImpl.java:169)
>         at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendFragmentInstance.getResult(IDataNodeRPCService.java:2136)
>         at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendFragmentInstance.getResult(IDataNodeRPCService.java:2116)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Steps to reproduce
> 1. 172.20.70.3/4/5: datanodes, 8c32G (8 cores, 32 GB each)
> 172.20.70.31: confignode, 8c32G
> The benchmark runs on ip15, in /data/benchmark/bm_0620_7ec96c1
> Cluster and region information:
>  !screenshot-2.png! 
> 2. Database configuration parameters
> confignode
> MAX_HEAP_SIZE="16G"
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.standalone.StandAloneConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=1
> data_replication_factor=3
> datanode
> MAX_HEAP_SIZE="16G"
> wal_buffer_size_in_byte=1048576
> max_waiting_time_when_insert_blocked=3600000
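> 3. Start the benchmark; its configuration file is attached.
> 4. Afterwards, datanodes were added, one every 20 minutes, 6 in total:
> ip 2/13/14/16/18/19
> However, there were no new write operations, so these new datanodes hold no data.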
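The issue title and the ip5 log point to one plausible failure mode. Suppose each fragment instance reserves an estimated amount of memory before it is planned. If a timed-out metadata request never returns its reservation, the free pool shrinks with every leak. Eventually the 1048576-byte (1 MiB) estimate no longer fits: the log shows 1014007 bytes left, which is 34569 bytes short.

The sketch below illustrates only that general pattern, under the assumption that reservation and release are paired per request. `FragmentMemoryPool`, `runLeaky` and `runSafe` are hypothetical names. They are not IoTDB's `LocalExecutionPlanner` API and are not the actual code path behind this bug. The pool size is picked so that the final numbers match the log.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration, not IoTDB code: a fixed memory budget from which
// each fragment instance reserves its estimated usage before it is planned.
class FragmentMemoryPool {
    private final AtomicLong freeBytes;

    FragmentMemoryPool(long totalBytes) {
        this.freeBytes = new AtomicLong(totalBytes);
    }

    // Reserve the estimate, or fail in the spirit of MemoryNotEnoughException.
    void reserve(long estimatedBytes) {
        while (true) {
            long free = freeBytes.get();
            if (free < estimatedBytes) {
                throw new IllegalStateException(
                        "not enough memory: free=" + free + ", estimated=" + estimatedBytes);
            }
            if (freeBytes.compareAndSet(free, free - estimatedBytes)) {
                return;
            }
        }
    }

    void release(long bytes) {
        freeBytes.addAndGet(bytes);
    }

    long free() {
        return freeBytes.get();
    }

    // Leaky path: an early exit on timeout skips release(), so the bytes are lost.
    void runLeaky(long estimatedBytes, boolean timedOut) {
        reserve(estimatedBytes);
        if (timedOut) {
            return;
        }
        release(estimatedBytes);
    }

    // Safe path: the finally block returns the reservation on every exit,
    // including timeouts and exceptions.
    void runSafe(long estimatedBytes, boolean timedOut) {
        reserve(estimatedBytes);
        try {
            if (timedOut) {
                return;
            }
            // ... execute the fragment instance here ...
        } finally {
            release(estimatedBytes);
        }
    }

    public static void main(String[] args) {
        final long estimate = 1L << 20; // 1048576 bytes, the estimate in the log
        // Sized so that exactly 1014007 bytes remain after three leaks.
        FragmentMemoryPool pool = new FragmentMemoryPool(3 * estimate + 1_014_007L);
        for (int i = 0; i < 3; i++) {
            pool.runLeaky(estimate, true); // three timed-out requests leak 3 MiB
        }
        System.out.println("free after leaks: " + pool.free()); // prints 1014007
        try {
            pool.reserve(estimate); // the next 1 MiB request no longer fits
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Releasing in `finally` makes every exit path return the memory, and timeouts are included. A try-with-resources reservation handle would achieve the same. This is one common way to rule out this class of leak, not necessarily the change made for IOTDB-4294.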



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
