[ 
https://issues.apache.org/jira/browse/IOTDB-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603548#comment-17603548
 ] 

Jinrui Zhang commented on IOTDB-4367:
-------------------------------------

We need to confirm whether this issue is caused by a deadlock or by a lack of resources.
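
One possible way to tell the two cases apart is to ask the JVM whether it detects a deadlock and to count the threads carrying the leaked name. The sketch below uses only the standard `java.lang.management` API. The class name `ThreadLeakProbe` and its helper methods are hypothetical, and it inspects the JVM it runs in, so checking a DataNode would additionally need a remote JMX connection or a `jstack` dump (not shown here).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadLeakProbe {
    // Thread-name prefix taken from the issue description.
    static final String PREFIX = "IoTDB-DataNodeInternalRPC-Processor";

    /** Number of threads the JVM reports as deadlocked (0 if none). */
    static int deadlockedCount(ThreadMXBean mx) {
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is detected
        return ids == null ? 0 : ids.length;
    }

    /** Number of live threads whose name starts with the given prefix. */
    static long countByPrefix(ThreadMXBean mx, String prefix) {
        long n = 0;
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadName().startsWith(prefix)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        System.out.println("deadlocked threads: " + deadlockedCount(mx));
        System.out.println(PREFIX + " threads: " + countByPrefix(mx, PREFIX));
    }
}
```

If the processor-thread count keeps growing while no deadlock is reported, that would point toward resource shortage rather than a deadlock, although it would not prove it.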

> [MultiLeaderConsensus] thread leak
> ----------------------------------
>
>                 Key: IOTDB-4367
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4367
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Jinrui Zhang
>            Priority: Major
>         Attachments: image-2022-09-08-18-17-19-185.png, more_metadata.conf
>
>
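> m_0908_7915b3f, 3 replicas, 3C3D
> benchmark: 300 concurrent users, 3000 devices, 10,000 series per device.
> Writing {color:#DE350B}1 point{color} to each series causes a {color:#DE350B}thread leak (IoTDB-DataNodeInternalRPC-Processor){color}, and all 3 nodes have {color:#DE350B}out-of-order (unsequence) files{color}: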
>  !image-2022-09-08-18-17-19-185.png! 
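> 田原 checked the logs and stack traces (the benchmark connects to 72):
> 72 forwards write requests to 73 and 74, but 73 and 74 cannot keep up, so many of these write requests block on 73 and 74. The forwarding thread on 72 then times out (the processing threads on 72 and 73 do not end; they keep waiting until processing completes) and reports a write failure to the benchmark. After the benchmark's write thread records the failure, it starts the next write. When 72 receives the new write, it again forwards to 73 and 74, which creates new write-processing threads on 73 and 74.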
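
The pattern described above (the caller times out, the handler keeps blocking, and the next request gets a fresh handler thread) can be illustrated with a toy model. This is not IoTDB code: the class `TimeoutLeakSketch`, the cached thread pool standing in for the receiver's RPC processor pool, and the latch modelling an overloaded receiver are all assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class TimeoutLeakSketch {
    /**
     * Sends `requests` writes one after another. Each handler blocks until released,
     * while the caller gives up after `callerTimeoutMs`. Returns the number of handlers
     * still blocked after the caller has given up on every request.
     */
    static int run(int requests, long callerTimeoutMs) throws Exception {
        ExecutorService handlers = Executors.newCachedThreadPool(); // hypothetical receiver pool
        CountDownLatch unblock = new CountDownLatch(1);             // models the overloaded receiver
        CountDownLatch started = new CountDownLatch(requests);
        AtomicInteger blocked = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            Future<?> f = handlers.submit(() -> {
                blocked.incrementAndGet();
                started.countDown();
                try {
                    unblock.await(); // the handler waits until it can finish its work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    blocked.decrementAndGet();
                }
            });
            try {
                f.get(callerTimeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // The caller reports a failure and moves on; the handler is NOT cancelled.
            }
        }
        started.await();                 // make sure every handler has actually started
        int stillBlocked = blocked.get();
        unblock.countDown();             // release the handlers so the JVM can exit
        handlers.shutdown();
        handlers.awaitTermination(5, TimeUnit.SECONDS);
        return stillBlocked;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("handlers still blocked: " + run(5, 10));
    }
}
```

In this model every timed-out request leaves one blocked handler thread behind, so the thread count grows with the number of retries for as long as the receiver stays overloaded.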
> Steps to reproduce:
> 1. 192.168.10.72/73/74: 48 cores, 384 GB
> benchmark runs on 71
> 2. Cluster parameters
> confignode
> MAX_HEAP_SIZE="8G"
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=3
> data_replication_factor=3
> datanode
> MAX_HEAP_SIZE="256G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> {color:#DE350B}max_connection_for_internal_service=300{color}
> enable_timed_flush_seq_memtable=true
> seq_memtable_flush_interval_in_ms=600000
> seq_memtable_flush_check_interval_in_ms=300000
> enable_timed_flush_unseq_memtable=true
> unseq_memtable_flush_interval_in_ms=600000
> unseq_memtable_flush_check_interval_in_ms=300000
> max_waiting_time_when_insert_blocked=3600000
> query_timeout_threshold=3600000
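> 3. See the attachment for the benchmark configuration.
> Start the run: metadata creation succeeds, but writing data reports errors, which reproduces the issue. On the benchmark side: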
> 2022-09-08 18:05:05,571 ERROR cn.edu.tsinghua.iotdb.benchmark.tsdb.DBWrapper:131 - Insert batch failed because
> org.apache.iotdb.rpc.StatementExecutionException: 400: [EXECUTE_STATEMENT_ERROR(400)] Exception occurred: insertTablet failed. org.apache.iotdb.commons.exception.IoTDBException: some children of root.test have already been set to storage group
>         at org.apache.iotdb.rpc.RpcUtils.verifySuccess(RpcUtils.java:94)
>         at org.apache.iotdb.rpc.RpcUtils.verifySuccessWithRedirection(RpcUtils.java:115)
>         at org.apache.iotdb.session.SessionConnection.insertTablet(SessionConnection.java:589)
>         at org.apache.iotdb.session.Session.insertTablet(Session.java:1573)
>         at org.apache.iotdb.session.Session.insertTablet(Session.java:1560)
>         at cn.edu.tsinghua.iotdb.benchmark.iotdb013.IoTDBSession.insertOneBatchByTablet(IoTDBSession.java:168)
>         at cn.edu.tsinghua.iotdb.benchmark.iotdb013.IoTDBSessionBase.insertOneBatch(IoTDBSessionBase.java:156)
>         at cn.edu.tsinghua.iotdb.benchmark.tsdb.DBWrapper.insertOneBatch(DBWrapper.java:83)
>         at cn.edu.tsinghua.iotdb.benchmark.client.generate.GenerateDataMixClient.ingestionOperation(GenerateDataMixClient.java:135)
>         at cn.edu.tsinghua.iotdb.benchmark.client.generate.GenerateDataMixClient.doTest(GenerateDataMixClient.java:50)
>         at cn.edu.tsinghua.iotdb.benchmark.client.DataClient.run(DataClient.java:144)
>         at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
