[ https://issues.apache.org/jira/browse/IOTDB-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603548#comment-17603548 ]
Jinrui Zhang commented on IOTDB-4367:
-------------------------------------

We need to confirm whether this issue is caused by a deadlock or by a lack of resources.

> [MultiLeaderConsensus] thread leak
> ----------------------------------
>
>                 Key: IOTDB-4367
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4367
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Jinrui Zhang
>            Priority: Major
>         Attachments: image-2022-09-08-18-17-19-185.png, more_metadata.conf
>
>
> m_0908_7915b3f, 3 replicas, 3C3D (3 ConfigNodes, 3 DataNodes)
> benchmark: 300 concurrent users, 3000 devices, 10,000 series per device.
> Writing {color:#DE350B}1 point{color} to each series causes a {color:#DE350B}thread leak (IoTDB-DataNodeInternalRPC-Processor){color}, and all 3 nodes have {color:#DE350B}unsequence (out-of-order) files{color}:
> !image-2022-09-08-18-17-19-185.png!
> 田原 inspected the logs and stack traces (the benchmark connects to 72):
> 72 forwards write requests to 73 and 74. 73 and 74 cannot keep up, so many of these write requests block on 73 and 74. The forwarding thread on 72 then times out (the handler threads on 73 and 74 do not terminate; they keep waiting until processing completes) and reports a write failure to the benchmark. The benchmark's write thread records the failure and moves on to the next write. When 72 receives the new write, it again forwards it to 73 and 74, which create yet more handler threads for it.
> Steps to reproduce:
> 1. 192.168.10.72/73/74: 48 cores, 384 GB
> benchmark runs on 71
> 2. Cluster parameters
> confignode
> MAX_HEAP_SIZE="8G"
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=3
> data_replication_factor=3
> datanode
> MAX_HEAP_SIZE="256G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> {color:#DE350B}max_connection_for_internal_service=300{color}
> enable_timed_flush_seq_memtable=true
> seq_memtable_flush_interval_in_ms=600000
> seq_memtable_flush_check_interval_in_ms=300000
> enable_timed_flush_unseq_memtable=true
> unseq_memtable_flush_interval_in_ms=600000
> unseq_memtable_flush_check_interval_in_ms=300000
> max_waiting_time_when_insert_blocked=3600000
> query_timeout_threshold=3600000
> 3. The benchmark configuration is in the attachment.
> Start the run: metadata creation succeeds, but data writes report errors, which reproduces the issue. On the benchmark side:
> 2022-09-08 18:05:05,571 ERROR cn.edu.tsinghua.iotdb.benchmark.tsdb.DBWrapper:131 - Insert batch failed because
> org.apache.iotdb.rpc.StatementExecutionException: 400: [EXECUTE_STATEMENT_ERROR(400)] Exception occurred: insertTablet failed. org.apache.iotdb.commons.exception.IoTDBException: some children of root.test have already been set to storage group
>       at org.apache.iotdb.rpc.RpcUtils.verifySuccess(RpcUtils.java:94)
>       at org.apache.iotdb.rpc.RpcUtils.verifySuccessWithRedirection(RpcUtils.java:115)
>       at org.apache.iotdb.session.SessionConnection.insertTablet(SessionConnection.java:589)
>       at org.apache.iotdb.session.Session.insertTablet(Session.java:1573)
>       at org.apache.iotdb.session.Session.insertTablet(Session.java:1560)
>       at cn.edu.tsinghua.iotdb.benchmark.iotdb013.IoTDBSession.insertOneBatchByTablet(IoTDBSession.java:168)
>       at cn.edu.tsinghua.iotdb.benchmark.iotdb013.IoTDBSessionBase.insertOneBatch(IoTDBSessionBase.java:156)
>       at cn.edu.tsinghua.iotdb.benchmark.tsdb.DBWrapper.insertOneBatch(DBWrapper.java:83)
>       at cn.edu.tsinghua.iotdb.benchmark.client.generate.GenerateDataMixClient.ingestionOperation(GenerateDataMixClient.java:135)
>       at cn.edu.tsinghua.iotdb.benchmark.client.generate.GenerateDataMixClient.doTest(GenerateDataMixClient.java:50)
>       at cn.edu.tsinghua.iotdb.benchmark.client.DataClient.run(DataClient.java:144)
>       at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>       at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:834)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
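
The forwarding pattern described in the log analysis (the forwarder times out and reports failure while the handler threads on the receiving nodes keep waiting) can be illustrated with a small standalone simulation. This is only a sketch under stated assumptions, not IoTDB code: the class `ThreadLeakSketch`, the method `simulate`, and the use of an unbounded cached thread pool as a stand-in for the receiving node's RPC processor pool are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLeakSketch {

  /**
   * Simulates a forwarder sending `requests` writes to a stalled receiver.
   * Returns how many receiver handler threads are still blocked after every
   * forward has timed out on the forwarder side.
   */
  public static int simulate(int requests, long timeoutMs) throws Exception {
    // Hypothetical stand-in for the receiver's RPC processor pool: unbounded,
    // so every forwarded request gets its own handler thread.
    ExecutorService receiverPool = Executors.newCachedThreadPool();
    // Never counted down during the run: models a write that cannot proceed.
    CountDownLatch writeCanProceed = new CountDownLatch(1);
    AtomicInteger blockedHandlers = new AtomicInteger();

    for (int i = 0; i < requests; i++) {
      CountDownLatch handlerStarted = new CountDownLatch(1);
      Future<?> forward = receiverPool.submit(() -> {
        blockedHandlers.incrementAndGet();
        handlerStarted.countDown();
        try {
          writeCanProceed.await(); // handler waits until processing can complete
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          blockedHandlers.decrementAndGet();
        }
      });
      handlerStarted.await();
      try {
        forward.get(timeoutMs, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        // The forwarder gives up and reports failure to the client, but the
        // handler is not cancelled, so it stays blocked. The client then
        // sends its next write, which starts another handler.
      }
    }

    int leaked = blockedHandlers.get();
    receiverPool.shutdownNow(); // clean up: interrupt the blocked handlers
    receiverPool.awaitTermination(1, TimeUnit.SECONDS);
    return leaked;
  }

  public static void main(String[] args) throws Exception {
    System.out.println("blocked handler threads: " + simulate(5, 20));
  }
}
```

In this sketch the number of blocked handlers grows by one per timed-out request, because the timeout only releases the caller and never the handler. Whether the real issue follows this pattern or is a deadlock is exactly what the comment above asks to confirm.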