[jira] [Created] (IOTDB-6266) Add the ability to flush syncIndex and update reader periodically for IoTConsensus
Xinyu Tan created IOTDB-6266:

Summary: Add the ability to flush syncIndex and update reader periodically for IoTConsensus
Key: IOTDB-6266
URL: https://issues.apache.org/jira/browse/IOTDB-6266
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan
Attachments: image-2023-12-11-17-43-20-151.png

After the PR is merged, the safeDeletedSearchIndex that IoTConsensus passes to the WAL reader is no longer the syncIndex that has been synchronized, but the syncIndex that has been flushed to disk. When a leader migration is triggered, the WAL of the old leader may never be deleted, so WAL files pile up.

!image-2023-12-11-17-43-20-151.png!

To solve this problem, IoTConsensus can flush the syncIndex and update the reader periodically, which avoids the accumulation of the old leader's log after a leader switch.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
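The periodic flush proposed in IOTDB-6266 could look roughly like the sketch below. It is only an illustration under stated assumptions, not IoTConsensus code: the class `PeriodicSyncIndexFlusher`, its methods, and the `LongConsumer` that stands in for the WAL reader's "safe to delete" setter are all hypothetical, and the disk write is simulated by an in-memory field.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

/** Hypothetical sketch: periodically persist the syncIndex and publish it to the WAL reader. */
final class PeriodicSyncIndexFlusher implements AutoCloseable {
  private final AtomicLong inMemorySyncIndex = new AtomicLong();
  private final AtomicLong flushedSyncIndex = new AtomicLong();
  // Assumption: stands in for whatever call updates the reader's safe-to-delete index.
  private final LongConsumer walReaderUpdater;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  PeriodicSyncIndexFlusher(LongConsumer walReaderUpdater) {
    this.walReaderUpdater = walReaderUpdater;
  }

  /** Starts the background task, so the flush also happens when no new logs are sent. */
  void start(long periodMillis) {
    scheduler.scheduleWithFixedDelay(
        this::flushOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
  }

  /** Records replication progress; the index never moves backwards. */
  void advance(long index) {
    inMemorySyncIndex.accumulateAndGet(index, Math::max);
  }

  /** One round: "flush" the index (simulated here), then tell the reader about it. */
  void flushOnce() {
    long current = inMemorySyncIndex.get();
    if (current > flushedSyncIndex.get()) {
      flushedSyncIndex.set(current); // a real implementation would write to disk here
      walReaderUpdater.accept(current);
    }
  }

  long flushedSyncIndex() {
    return flushedSyncIndex.get();
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

The point of the timer is that an old leader, which no longer sends logs after the switch, still gets a chance to flush its last syncIndex and release its WAL files.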
[jira] [Assigned] (IOTDB-6190) Increase the threshold for Ratis to shut itself down if it detects that a process is stuck
[ https://issues.apache.org/jira/browse/IOTDB-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinyu Tan reassigned IOTDB-6190:

Assignee: Xinyu Tan

> Increase the threshold for Ratis to shut itself down if it detects that a process is stuck
>
> Key: IOTDB-6190
> URL: https://issues.apache.org/jira/browse/IOTDB-6190
> Project: Apache IoTDB
> Issue Type: Improvement
> Reporter: Xinyu Tan
> Assignee: Xinyu Tan
> Priority: Major
>
> Currently, Ratis shuts itself down after detecting a 10-minute GC-free pause in the process. However, in many user scenarios it is common to pause VMs for hours at a time, which may force users to restart frequently after a Ratis shutdown, so we plan to adjust this parameter to 3 days.
[jira] [Created] (IOTDB-6190) Increase the threshold for Ratis to shut itself down if it detects that a process is stuck
Xinyu Tan created IOTDB-6190:

Summary: Increase the threshold for Ratis to shut itself down if it detects that a process is stuck
Key: IOTDB-6190
URL: https://issues.apache.org/jira/browse/IOTDB-6190
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan

Currently, Ratis shuts itself down after detecting a 10-minute GC-free pause in the process. However, in many user scenarios it is common to pause VMs for hours at a time, which may force users to restart frequently after a Ratis shutdown, so we plan to adjust this parameter to 3 days.
[jira] [Created] (IOTDB-6183) Optimize the timeout retry logic of IoTConsensus sending RPCS
Xinyu Tan created IOTDB-6183:

Summary: Optimize the timeout retry logic of IoTConsensus sending RPCS
Key: IOTDB-6183
URL: https://issues.apache.org/jira/browse/IOTDB-6183
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan

We should never let the RPC time out, because the handling of a timeout is also a retry, which might actually worsen the situation, for example by significantly increasing the number of file handles.
[jira] [Created] (IOTDB-6156) Fixed TConfiguration invalidly in Thrift AsyncServer For IoTConsensus
Xinyu Tan created IOTDB-6156:

Summary: Fixed TConfiguration invalidly in Thrift AsyncServer For IoTConsensus
Key: IOTDB-6156
URL: https://issues.apache.org/jira/browse/IOTDB-6156
Project: Apache IoTDB
Issue Type: Bug
Reporter: Xinyu Tan
Assignee: Xinyu Tan

In a user scenario, the machine configuration is as follows:

3c3d, 3 replicas
1 database, 1 device, 2 measurements
1 client, insertAlignTablet interface, batchSize 1000
time_partition_interval=314496000

An IoTConsensus data synchronization error occurs after writing to the cluster.

{code:java}
2023-09-14 12:11:50,888 [pool-19-IoTDB-IoTConsensusRPC-Processor-5] WARN o.a.t.s.AbstractNonblockingServer$AsyncFrameBuffer:606 - Exception while invoking!
org.apache.thrift.transport.TTransportException: MaxMessageSize reached
	at org.apache.thrift.transport.TEndpointTransport.countConsumedMessageBytes(TEndpointTransport.java:96)
	at org.apache.thrift.transport.TMemoryInputTransport.read(TMemoryInputTransport.java:97)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:109)
	at org.apache.iotdb.rpc.AutoScalingBufferReadTransport.fill(AutoScalingBufferReadTransport.java:38)
	at org.apache.iotdb.rpc.TElasticFramedTransport.readFrame(TElasticFramedTransport.java:128)
	at org.apache.iotdb.rpc.TElasticFramedTransport.read(TElasticFramedTransport.java:108)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:109)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:463)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:361)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:244)
	at org.apache.thrift.TBaseAsyncProcessor.process(TBaseAsyncProcessor.java:52)
	at org.apache.thrift.server.AbstractNonblockingServer$AsyncFrameBuffer.invoke(AbstractNonblockingServer.java:603)
	at org.apache.thrift.server.Invocation.run(Invocation.java:18)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}

This is mainly due to the use of AsyncServer in IoTConsensus. At present, the default maximum message size there is 100M instead of 512M, so it needs to be updated.
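To illustrate why a 100M limit produces the "MaxMessageSize reached" error above while a 512M limit would not, here is a simplified model of a per-message byte budget. It is a sketch only: `MessageSizeBudget` and its methods are hypothetical and are not Thrift's classes, and the 200M frame in the usage note is an assumed size chosen for illustration, not a value from the report.

```java
/** Simplified, hypothetical model of a per-message byte budget (not Thrift's actual class). */
final class MessageSizeBudget {
  static final long MB = 1024L * 1024L;

  private long remaining;

  MessageSizeBudget(long maxMessageSize) {
    this.remaining = maxMessageSize;
  }

  /** Deducts consumed bytes and fails once the budget is exhausted. */
  void consume(long numBytes) {
    if (remaining < numBytes) {
      remaining = 0;
      throw new IllegalStateException("MaxMessageSize reached");
    }
    remaining -= numBytes;
  }

  /** Returns true if a frame of the given size fits into a fresh budget. */
  static boolean fits(long frameBytes, long maxMessageSize) {
    try {
      new MessageSizeBudget(maxMessageSize).consume(frameBytes);
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }
}
```

With these definitions, `MessageSizeBudget.fits(200 * MB, 100 * MB)` is false and `MessageSizeBudget.fits(200 * MB, 512 * MB)` is true, which mirrors the difference between the two limits mentioned in the issue.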
[jira] [Created] (IOTDB-6144) Adjust the default thrift timeout parameter to 60s
Xinyu Tan created IOTDB-6144:

Summary: Adjust the default thrift timeout parameter to 60s
Key: IOTDB-6144
URL: https://issues.apache.org/jira/browse/IOTDB-6144
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan

After conducting research on systems such as HBase, Doris, TiDB, and others, we have found that the default RPC timeout for many systems is set to 60 seconds. Therefore, we plan to adjust the default timeout for IoTDB.
[jira] [Assigned] (IOTDB-6144) Adjust the default thrift timeout parameter to 60s
[ https://issues.apache.org/jira/browse/IOTDB-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinyu Tan reassigned IOTDB-6144:

Assignee: Xinyu Tan

> Adjust the default thrift timeout parameter to 60s
>
> Key: IOTDB-6144
> URL: https://issues.apache.org/jira/browse/IOTDB-6144
> Project: Apache IoTDB
> Issue Type: Improvement
> Reporter: Xinyu Tan
> Assignee: Xinyu Tan
> Priority: Major
>
> After conducting research on systems such as HBase, Doris, TiDB, and others, we have found that the default RPC timeout for many systems is set to 60 seconds. Therefore, we plan to adjust the default timeout for IoTDB.
[jira] [Commented] (IOTDB-6099) Increase the printing threshold when ratis follower sleep detects gc
[ https://issues.apache.org/jira/browse/IOTDB-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760273#comment-17760273 ]

Xinyu Tan commented on IOTDB-6099:

This has been fixed in [https://github.com/apache/iotdb/pull/10996].

> Increase the printing threshold when ratis follower sleep detects gc
>
> Key: IOTDB-6099
> URL: https://issues.apache.org/jira/browse/IOTDB-6099
> Project: Apache IoTDB
> Issue Type: Improvement
> Reporter: Xinyu Tan
> Assignee: Xinyu Tan
> Priority: Major
> Attachments: image-2023-08-04-15-19-08-623.png
>
> !image-2023-08-04-15-19-08-623.png!
> Currently, Ratis followers print a log entry whenever they detect a GC pause longer than 300ms, which fills the online system with useless logs. We will increase the threshold to 4s.
[jira] [Created] (IOTDB-6121) Consensus layer interface and exception handling refactoring
Xinyu Tan created IOTDB-6121:

Summary: Consensus layer interface and exception handling refactoring
Key: IOTDB-6121
URL: https://issues.apache.org/jira/browse/IOTDB-6121
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan

1. Rename the interfaces of the consensus layer to reduce ambiguity
2. Refactor the consensus interface to throw exceptions, forcing the upper layer to handle each exception type
3. Improve the comments of the consensus layer
4. Refactor the DataNode consensus singleton
[jira] [Created] (IOTDB-6116) Disassociate the IoTConsensus retry logic from the forkjoinPool
Xinyu Tan created IOTDB-6116:

Summary: Disassociate the IoTConsensus retry logic from the forkjoinPool
Key: IOTDB-6116
URL: https://issues.apache.org/jira/browse/IOTDB-6116
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan

The current IoTConsensus batch retry logic relies on the forkjoinPool and puts the thread to sleep synchronously. This may lead to frequent "waiting target request timeout. current index" timeouts in the follower, so we use a ScheduledExecutorService to decouple the retry logic from the forkjoinPool.
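The idea in IOTDB-6116 can be sketched as follows: instead of calling `Thread.sleep` on a shared pool thread between attempts, the retry is handed to a `ScheduledExecutorService`, so the calling thread is released immediately. This is a minimal sketch, not the IoTConsensus implementation; `ScheduledRetrySketch`, `sendWithRetry`, and `demo` are hypothetical names, and the attempt is modeled as a `BooleanSupplier` that returns true on success.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: retries are scheduled rather than slept on the caller's thread. */
final class ScheduledRetrySketch {
  private final ScheduledExecutorService retryScheduler =
      Executors.newSingleThreadScheduledExecutor();

  /** Runs one attempt; on failure, schedules the next attempt after delayMillis. */
  void sendWithRetry(
      BooleanSupplier attempt, long delayMillis, int retriesLeft, Runnable onFinished) {
    if (attempt.getAsBoolean() || retriesLeft <= 0) {
      onFinished.run();
      return;
    }
    // No Thread.sleep here: the current thread returns to its pool right away.
    retryScheduler.schedule(
        () -> sendWithRetry(attempt, delayMillis, retriesLeft - 1, onFinished),
        delayMillis,
        TimeUnit.MILLISECONDS);
  }

  void shutdown() {
    retryScheduler.shutdownNow();
  }

  /** Demo helper: the attempt fails `failures` times, then succeeds; returns attempts made. */
  static int demo(int failures) {
    AtomicInteger attempts = new AtomicInteger();
    CountDownLatch done = new CountDownLatch(1);
    ScheduledRetrySketch sketch = new ScheduledRetrySketch();
    sketch.sendWithRetry(() -> attempts.incrementAndGet() > failures, 10, 5, done::countDown);
    try {
      done.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      sketch.shutdown();
    }
    return attempts.get();
  }
}
```

A dedicated scheduler keeps sleeping retries from occupying worker threads that other tasks depend on, which is the coupling the issue wants to remove.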
[jira] [Created] (IOTDB-6106) Fixed the timeout parameter not working in thrift asyncClient
Xinyu Tan created IOTDB-6106:

Summary: Fixed the timeout parameter not working in thrift asyncClient
Key: IOTDB-6106
URL: https://issues.apache.org/jira/browse/IOTDB-6106
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan
Attachments: image-2023-08-09-21-13-05-225.png, image-2023-08-09-21-13-37-644.png, image-2023-08-09-21-15-02-347.png

!image-2023-08-09-21-13-05-225.png!

!image-2023-08-09-21-13-37-644.png!

Currently, the async Thrift timeout parameter is passed to the [TNonblockingSocket|https://people.apache.org/~thejas/thrift-0.9/javadoc/org/apache/thrift/transport/TNonblockingSocket.html] constructor, but the constructor does not use it. This completely disables our asyncClient timeout parameter.

!image-2023-08-09-21-15-02-347.png!

We meet the timeout control requirement by setting the timeout on TAsyncClient instead.
[jira] [Commented] (IOTDB-5557) [ metadata ] The metadata query results are inconsistent
[ https://issues.apache.org/jira/browse/IOTDB-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751560#comment-17751560 ]

Xinyu Tan commented on IOTDB-5557:

The cause of the only abnormal log entry in the first test is analyzed in https://issues.apache.org/jira/browse/IOTDB-6102. It is unrelated to this PR, and the problem addressed by the current PR has been resolved.

> [ metadata ] The metadata query results are inconsistent
>
> Key: IOTDB-5557
> URL: https://issues.apache.org/jira/browse/IOTDB-5557
> Project: Apache IoTDB
> Issue Type: Bug
> Components: Core/Schema Manager, mpp-cluster
> Affects Versions: 1.1.0-SNAPSHOT
> Reporter: 刘珍
> Assignee: Song Ziyang
> Priority: Blocker
> Labels: pull-request-available
> Attachments: IOTDB_5557.conf, image-2023-02-20-14-04-32-611.png, image-2023-07-29-08-21-43-740.png, screenshot-1.png
>
> master : 0219_0cd4461
> After the cluster is started and "enjoy" appears in log_datanode_all.log, metadata queries return inconsistent results (the result keeps growing until all metadata has been loaded into memory).
> Expected: as soon as the cluster starts serving queries, query results must be consistent.
> Test environment:
> 1. 192.168.10.76, 48 CPUs, 384GB memory
> Metadata: 1 database, 10,000 devices, 600 series per device.
> ConfigNode:
> MAX_HEAP_SIZE="8G"
> DataNode:
> MAX_HEAP_SIZE="256G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> COMMON configuration
> time_partition_interval=6048000
> query_timeout_threshold=3600
> enable_seq_space_compaction=false
> enable_unseq_space_compaction=false
> enable_cross_space_compaction=false
> 2. Clear the operating system cache, start the database, and once "enjoy" appears, run count devices and check the result
> cat check_device_count.sh
> # Wait until the DataNode log reports startup, then run "count devices" 100 times
> while true
> do
>   v_start=`grep enjoy logs/log_datanode_all.log|wc -l`
>   if [[ ${v_start} = "1" ]];then
>     for i in {1..100}
>     do
>       ./sbin/start-cli.sh -h 192.168.10.76 -e "count devices;" >> dev_count_during_start.out
>     done
>     break
>   fi
> done
> The figure below shows that the result of count devices keeps increasing up to 1, at which point the metadata is fully loaded into memory:
> !image-2023-02-20-14-04-32-611.png!
[jira] [Created] (IOTDB-6102) Enhance ConfigNode state maintenance during DataNode startup
Xinyu Tan created IOTDB-6102:

Summary: Enhance ConfigNode state maintenance during DataNode startup
Key: IOTDB-6102
URL: https://issues.apache.org/jira/browse/IOTDB-6102
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Yongzao Dan
Attachments: image-2023-08-07-16-34-13-559.png, image-2023-08-07-16-34-30-834.png

The current DataNode will send a request to the ConfigNode upon startup. However, the status of the DataNode maintained by the ConfigNode will be updated only when the ConfigNode sends its first heartbeat to the DataNode. During this period, new write requests to the DataNode may fail due to the absence of partition information in the ConfigNode. Although this time window is small (about 1 second), we need to enhance the state maintenance logic during this process to avoid the occurrence of the following errors.

!image-2023-08-07-16-34-13-559.png|thumbnail!

!image-2023-08-07-16-34-30-834.png|thumbnail!
[jira] [Created] (IOTDB-6099) Increase the printing threshold when ratis follower sleep detects gc
Xinyu Tan created IOTDB-6099:

Summary: Increase the printing threshold when ratis follower sleep detects gc
Key: IOTDB-6099
URL: https://issues.apache.org/jira/browse/IOTDB-6099
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan
Attachments: image-2023-08-04-15-19-08-623.png

!image-2023-08-04-15-19-08-623.png!

Currently, Ratis followers print a log entry whenever they detect a GC pause longer than 300ms, which fills the online system with useless logs. We will increase the threshold to 5s.
[jira] [Assigned] (IOTDB-6012) Client receive The consensus group SchemaRegion[0] doesn't exist error
[ https://issues.apache.org/jira/browse/IOTDB-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinyu Tan reassigned IOTDB-6012:

Assignee: LiYuheng (was: Song Ziyang)

> Client receive The consensus group SchemaRegion[0] doesn't exist error
>
> Key: IOTDB-6012
> URL: https://issues.apache.org/jira/browse/IOTDB-6012
> Project: Apache IoTDB
> Issue Type: Bug
> Components: Core/Cluster
> Reporter: Yuan Tian
> Assignee: LiYuheng
> Priority: Major
> Attachments: SessionExample.java, image-2023-06-20-09-41-11-877.png
>
> Use GraalVM CE 22.3.1 as your JDK and run mvn clean package to build the distribution package.
> Execute the SessionExample in the attachment and you will get the error message:
> !image-2023-06-20-09-41-11-877.png!
[jira] [Commented] (IOTDB-5860) Total Number of file is wrong
[ https://issues.apache.org/jira/browse/IOTDB-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746276#comment-17746276 ]

Xinyu Tan commented on IOTDB-5860:

!screenshot-1.png!

This has been fixed in rc/1.2.0.

> Total Number of file is wrong
>
> Key: IOTDB-5860
> URL: https://issues.apache.org/jira/browse/IOTDB-5860
> Project: Apache IoTDB
> Issue Type: Bug
> Reporter: Yuan Tian
> Assignee: Hongyin Zhang
> Priority: Major
> Attachments: image-2023-05-10-17-41-08-561.png, screenshot-1.png
>
> We should not add open_file_handlers to the total; more exactly, we should not put the open_file_handlers metric in this chart.
> !image-2023-05-10-17-41-08-561.png!
[jira] [Created] (IOTDB-6061) Fix the instability failure caused by initServer in IoTConsensus UT not binding to the corresponding port
Xinyu Tan created IOTDB-6061:

Summary: Fix the instability failure caused by initServer in IoTConsensus UT not binding to the corresponding port
Key: IOTDB-6061
URL: https://issues.apache.org/jira/browse/IOTDB-6061
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan

!image-2023-07-12-16-30-40-880.png|thumbnail!

Add logic that waits until the port is available before executing the test.
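One way to wait for a port in a unit test is sketched below. It assumes that "available" means the port is free to bind, which the issue does not spell out; if the intent is instead to wait until the server is already listening, the check would be inverted. `PortWaitSketch` and its methods are hypothetical names and not part of the IoTConsensus test code.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical test helper: poll until a local port can be bound (assumed meaning). */
final class PortWaitSketch {

  /** Returns true if the port can currently be bound on this host. */
  static boolean isPortFree(int port) {
    try (ServerSocket probe = new ServerSocket(port)) {
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  /** Polls every 50 ms until the port is free or the timeout elapses. */
  static boolean waitUntilPortFree(int port, long timeoutMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (System.nanoTime() < deadline) {
      if (isPortFree(port)) {
        return true;
      }
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return isPortFree(port);
  }
}
```

A test would call `PortWaitSketch.waitUntilPortFree(port, timeoutMillis)` before starting its server and fail fast if the method returns false.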
[jira] [Created] (IOTDB-6051) Fixed concurrency error in IoTConsensus UT when stopping cluster
Xinyu Tan created IOTDB-6051:

Summary: Fixed concurrency error in IoTConsensus UT when stopping cluster
Key: IOTDB-6051
URL: https://issues.apache.org/jira/browse/IOTDB-6051
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Attachments: image-2023-07-05-20-18-03-912.png, screenshot-1.png

!image-2023-07-05-20-18-03-912.png|thumbnail!
[jira] [Assigned] (IOTDB-6051) Fixed concurrency error in IoTConsensus UT when stopping cluster
[ https://issues.apache.org/jira/browse/IOTDB-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinyu Tan reassigned IOTDB-6051:

Assignee: Xinyu Tan

> Fixed concurrency error in IoTConsensus UT when stopping cluster
>
> Key: IOTDB-6051
> URL: https://issues.apache.org/jira/browse/IOTDB-6051
> Project: Apache IoTDB
> Issue Type: Improvement
> Reporter: Xinyu Tan
> Assignee: Xinyu Tan
> Priority: Major
> Attachments: image-2023-07-05-20-18-03-912.png, screenshot-1.png
>
> !image-2023-07-05-20-18-03-912.png|thumbnail!
[jira] [Created] (IOTDB-6022) The WAL piles up when multi-replica iotconsensus is written at high concurrency
Xinyu Tan created IOTDB-6022:

Summary: The WAL piles up when multi-replica iotconsensus is written at high concurrency
Key: IOTDB-6022
URL: https://issues.apache.org/jira/browse/IOTDB-6022
Project: Apache IoTDB
Issue Type: Bug
Reporter: Xinyu Tan
Attachments: image-2023-06-21-22-48-38-945.png

!image-2023-06-21-22-48-38-945.png|thumbnail!
[jira] [Assigned] (IOTDB-5931) The "show cluster" command displays nodes with "Unknown" status, but these nodes can still perform read and write operations normally.
[ https://issues.apache.org/jira/browse/IOTDB-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinyu Tan reassigned IOTDB-5931:

Assignee: huxiangpeng (was: Xinyu Tan)

> The "show cluster" command displays nodes with "Unknown" status, but these nodes can still perform read and write operations normally.
>
> Key: IOTDB-5931
> URL: https://issues.apache.org/jira/browse/IOTDB-5931
> Project: Apache IoTDB
> Issue Type: Bug
> Components: Core/Cluster, mpp-cluster
> Reporter: 刘珍
> Assignee: huxiangpeng
> Priority: Major
> Attachments: exp.out, image-2023-05-29-15-30-45-953.png, image-2023-05-29-15-31-02-797.png, ip23_cn_logs.tar.gz, ip24_cn_logs.tar.gz, ip25_cn_logs.tar.gz, load_insert_drop_db_1.sh, run_2_client.sh
>
> Test version: iotdb master 0524_12d67e0
> Problem 1:
> On a 3-replica 3C21D cluster, after running "load tsfile; delete all data" in a loop for a long time, show cluster displays the 21 DataNodes with status Unknown, but clients can still read and write normally.
> !image-2023-05-29-15-30-45-953.png!
> Problem 2: show cluster returns different results on different DataNodes
> !image-2023-05-29-15-31-02-797.png!
> Test environment: private cloud phase 1, 172.16.2.2 - 25
> 1. Configuration parameters
> COMMON configuration
> schema_region_group_extension_policy=CUSTOM
> default_schema_region_group_num_per_database=10
> data_region_group_extension_policy=CUSTOM
> default_data_region_group_num_per_database=42
> min_cross_compaction_unseq_file_level=0
> schema_replication_factor=3
> data_replication_factor=3
> default_storage_group_level=2
> compaction_write_throughput_mb_per_sec=64
> confignode
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> cn_target_config_node_list=172.16.2.23:10710
> DATANODE:
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> dn_target_config_node_list=172.16.2.23:10710,172.16.2.24:10710,172.16.2.25:10710
> 2. The client test script is on 172.16.2.2 under /data1/iotdb/i_m_0524_12d67e0
> cat load_insert_drop_db_1.sh
> v_host="172.16.2.2"
> cluster_dir="/data1/iotdb"
> db_commit="i_m_0524_12d67e0"
> db_dir="${cluster_dir}/${db_commit}"
> u_name="root"
> # Delete old data, load the TsFiles, flush, and compare the query result with exp.out
> ${db_dir}/sbin/start-cli.sh -h ${v_host} -e "delete from root.test.g_0.**;"
> ${db_dir}/sbin/start-cli.sh -h ${v_host} -e 'load "/data/iotdb/load_tsfile/load_tsfile_level_1/" verify=false sglevel=2 onSuccess=none'
> ${db_dir}/sbin/start-cli.sh -h ${v_host} -e 'load "/data/iotdb/load_tsfile/load_tsfile_1" verify=false sglevel=2 onSuccess=none'
> ${db_dir}/sbin/start-cli.sh -h ${v_host} -e 'load "/data/iotdb/load_tsfile/load_tsfile_2" verify=false sglevel=2 onSuccess=none'
> ${db_dir}/sbin/start-cli.sh -h ${v_host} -e "flush"
> ${db_dir}/sbin/start-cli.sh -h ${v_host} -e "select count(s_0) from root.test.g_0.** align by device;" >act.out
> v_diff=`diff ${db_dir}/exp.out ${db_dir}/act.out|grep root|wc -l`
> if [[ ${v_diff} = 0 ]];then
>   echo "query pass." >> query_res.out
> else
>   v_date=`date "+%Y-%m-%d_%H_%M_%S"`
>   echo "${v_date} query fail." >> aft_load_query_res.out
> fi
> # As soon as any DataNode is compacting, delete the data again and query what is left
> exec 3<./dn.txt
> while read node <&3
> do
>   v_comp=`ssh ${u_name}@${node} "find ${db_dir}/data/ -name '*compaction.log'|wc -l"`
>   if [[ ${v_comp} -gt 0 ]];then
>     sleep 2
>     ${db_dir}/sbin/start-cli.sh -h ${v_host} -e "delete from root.test.g_0.**;"
>     ${db_dir}/sbin/start-cli.sh -h ${v_host} -e "select count(s_0) from root.test.g_0.** having count(s_0)>0;" >> del_data_q.out
>     break
>   fi
> done
> sleep 10
> # Check three times whether any DataNode is still compacting after the delete
> for i in {1..3}
> do
>   exec 3<./dn.txt
>   while read node <&3
>   do
>     v_comp=`ssh ${u_name}@${node} "find ${db_dir}/data/ -name '*compaction.log'|wc -l"`
>     if [[ ${v_comp} -gt 0 ]];then
>       echo "${node} after delete from root.test.g_0 still compacting." >> not_expect_res.out
>     fi
>   done
> done
[jira] [Assigned] (IOTDB-5864) Print failed to install snapshot warn log while restarting
[ https://issues.apache.org/jira/browse/IOTDB-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinyu Tan reassigned IOTDB-5864:

Assignee: Song Ziyang (was: Xinyu Tan)

> Print failed to install snapshot warn log while restarting
>
> Key: IOTDB-5864
> URL: https://issues.apache.org/jira/browse/IOTDB-5864
> Project: Apache IoTDB
> Issue Type: Bug
> Reporter: Yuan Tian
> Assignee: Song Ziyang
> Priority: Major
> Attachments: image-2023-05-11-15-28-57-190.png, image-2023-05-11-15-34-54-296.png
>
> The write throughput is also very low all the time after restarting.
> !image-2023-05-11-15-28-57-190.png!
> !image-2023-05-11-15-34-54-296.png!
[jira] [Created] (IOTDB-5850) [CI stability] Write error because failed to get replicaSet of consensus group
Xinyu Tan created IOTDB-5850:

Summary: [CI stability] Write error because failed to get replicaSet of consensus group
Key: IOTDB-5850
URL: https://issues.apache.org/jira/browse/IOTDB-5850
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Attachments: IoTDBSortedShowTimeseriesIT_showTimeseriesOrderByHeatWithLimitTest[SchemaEngineMode=SchemaFile].zip, image-2023-05-08-19-29-44-282.png, image-2023-05-08-19-30-06-447.png, image-2023-05-08-19-30-27-561.png

!image-2023-05-08-19-29-44-282.png|thumbnail!

!image-2023-05-08-19-30-06-447.png|thumbnail!

!image-2023-05-08-19-30-27-561.png|thumbnail!
[jira] [Created] (IOTDB-5841) Modify IoTConsensus default parameters to improve performance in more scenarios
Xinyu Tan created IOTDB-5841:

Summary: Modify IoTConsensus default parameters to improve performance in more scenarios
Key: IOTDB-5841
URL: https://issues.apache.org/jira/browse/IOTDB-5841
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan

* Add the pipelineNum metric.
* Change maxPendingBatch from 5 to 12 to improve replication performance. This change may create more IoTConsensusServiceThread instances in some cases, so we cannot make it too large.
* Change maxLogEntriesNumPerBatch from 30 to 1024. For small-request scenarios, this change increases the size of each batch and thereby the synchronization speed. It does not affect large requests, because the size of each batch is limited to no more than 16M.
* Change the queue from an unbounded LinkedBlockingQueue to an ArrayBlockingQueue to prevent a single queue from taking up too much memory.
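The interaction between the entry-count limit and the byte limit, and the effect of a bounded queue, can be illustrated with a small sketch. It is not the IoTConsensus batching code: `BoundedBatchSketch` and `nextBatch` are hypothetical, log entries are modeled only by their size in bytes, the queue capacity used in any caller is an arbitrary assumption, and a single consumer thread is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch of batching with an entry-count limit and a byte-size limit. */
final class BoundedBatchSketch {
  static final int MAX_ENTRIES_PER_BATCH = 1024; // new maxLogEntriesNumPerBatch from the issue
  static final long MAX_BATCH_BYTES = 16L * 1024 * 1024; // the 16M batch size limit from the issue

  /**
   * Drains entries (represented by their sizes in bytes) into one batch. A batch always takes at
   * least one entry and stops at whichever limit is reached first. Assumes a single consumer.
   */
  static List<Integer> nextBatch(Queue<Integer> pending) {
    List<Integer> batch = new ArrayList<>();
    long bytes = 0;
    while (batch.size() < MAX_ENTRIES_PER_BATCH) {
      Integer head = pending.peek();
      if (head == null) {
        break; // queue is empty
      }
      if (!batch.isEmpty() && bytes + head > MAX_BATCH_BYTES) {
        break; // adding this entry would exceed the byte limit
      }
      batch.add(pending.poll());
      bytes += head;
    }
    return batch;
  }
}
```

With many small entries the count limit decides the batch size, while with large entries the byte limit is reached first. If `pending` is an `ArrayBlockingQueue`, `offer` returns false once the fixed capacity is reached, so memory use per queue stays bounded, whereas an unbounded `LinkedBlockingQueue` keeps accepting entries.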
[jira] [Created] (IOTDB-5840) Avoid the problem that the insertRecords interface may cause the number of threads to balloon when there are too many data regions
Xinyu Tan created IOTDB-5840:

Summary: Avoid the problem that the insertRecords interface may cause the number of threads to balloon when there are too many data regions
Key: IOTDB-5840
URL: https://issues.apache.org/jira/browse/IOTDB-5840
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan

On a machine with sufficient CPU resources (for example, 32 cores), if the number of DataRegions is too small, the write pressure in the cluster is concentrated on the locks of these regions. As a result, write latency is high and throughput cannot be increased.

When the number of DataRegions is large, an InsertRecords request with a large batchSize such as 1 may involve many DataRegions. Once concurrency is high, hundreds of internalServiceClient instances are needed to dispatch the PlanNodes. Under the current BIO threading model, this also increases the number of InternalServiceRPC threads in the cluster to hundreds or thousands.

For example, in a user test environment, coreSize of the clientManager is set to 600 and maxSize is set to 1000 to prevent concurrent write requests from blocking each other while obtaining an internalServiceClient. The result is that each node has nearly 1000 InternalServiceRPC threads. If the client increases concurrency further, a "connection reset by peer" error is reported. This error is probably caused by the default parameters of the Linux kernel not supporting so many connections.

The current MPP framework splits PlanNodes by region only. Therefore, the number of RPCs sent per write request depends on the number of DataRegions involved in the request rather than on the number of DataNodes.

The solution to this problem is to aggregate the RPC requests sent to the same DataNode. This reduces the pressure on the clientManager, reduces the number of InternalServiceRPC threads, and prevents the "connection reset by peer" error from being returned to the client.

After the optimization, the number of RPC service threads dropped from 1000 to 200 and the "connection reset by peer" error no longer occurred. We can now increase the number of regions to make full use of the cluster's CPU resources.
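The aggregation described in IOTDB-5840 amounts to grouping per-region requests by their target DataNode before dispatch, so that one RPC is sent per node instead of one per region. The sketch below shows only that grouping step under simplified assumptions; `RpcAggregationSketch`, `RegionRequest`, and the string node identifiers are hypothetical and do not correspond to IoTDB classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: group per-region write fragments by the DataNode that hosts them. */
final class RpcAggregationSketch {

  /** A write fragment that targets one DataRegion located on one DataNode. */
  static final class RegionRequest {
    final int regionId;
    final String dataNode;

    RegionRequest(int regionId, String dataNode) {
      this.regionId = regionId;
      this.dataNode = dataNode;
    }
  }

  /** Returns one entry per DataNode; each entry would be sent as a single RPC. */
  static Map<String, List<RegionRequest>> groupByDataNode(List<RegionRequest> requests) {
    Map<String, List<RegionRequest>> byNode = new LinkedHashMap<>();
    for (RegionRequest request : requests) {
      byNode.computeIfAbsent(request.dataNode, node -> new ArrayList<>()).add(request);
    }
    return byNode;
  }
}
```

For a request that touches 6 regions spread over 3 DataNodes, the map has 3 entries, so 3 RPCs are sent instead of 6, and the number of clients and service threads needed scales with the node count rather than the region count.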
[jira] [Assigned] (IOTDB-5780) Let users know a node was successfully removed and data is recovered
[ https://issues.apache.org/jira/browse/IOTDB-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinyu Tan reassigned IOTDB-5780:

Assignee: Xinyu Tan

> Let users know a node was successfully removed and data is recovered
>
> Key: IOTDB-5780
> URL: https://issues.apache.org/jira/browse/IOTDB-5780
> Project: Apache IoTDB
> Issue Type: Improvement
> Reporter: Jialin Qiao
> Assignee: Xinyu Tan
> Priority: Minor
> Attachments: screenshot-1.png
>
> When a DataNode is removed, we copy the data asynchronously to a new node to keep the replication_factor. Users need to know whether the asynchronous job has finished, so we need to provide a way to inspect it.
[jira] [Commented] (IOTDB-5780) Let users know a node was successfully removed and data is recovered
[ https://issues.apache.org/jira/browse/IOTDB-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717593#comment-17717593 ]

Xinyu Tan commented on IOTDB-5780:

A procedure task is generated when the cluster is scaled in. After all regions are migrated, the system attempts to stop the corresponding node. During this process, a log is printed that lets users follow the data migration.

!screenshot-1.png!

In addition, you can run the show cluster command to view the status of the regions. We are still refactoring these status types, which is expected to take 1-2 months.

> Let users know a node was successfully removed and data is recovered
>
> Key: IOTDB-5780
> URL: https://issues.apache.org/jira/browse/IOTDB-5780
> Project: Apache IoTDB
> Issue Type: Improvement
> Reporter: Jialin Qiao
> Assignee: Xinyu Tan
> Priority: Minor
> Attachments: screenshot-1.png
>
> When a DataNode is removed, we copy the data asynchronously to a new node to keep the replication_factor. Users need to know whether the asynchronous job has finished, so we need to provide a way to inspect it.
[jira] [Created] (IOTDB-5835) Fix wal accumulation caused by datanode restart
Xinyu Tan created IOTDB-5835:

Summary: Fix wal accumulation caused by datanode restart
Key: IOTDB-5835
URL: https://issues.apache.org/jira/browse/IOTDB-5835
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan
Attachments: image-2023-04-28-11-08-43-542.png, image-2023-04-28-11-08-51-622.png, image-2023-04-28-11-08-57-549.png, image-2023-04-28-11-09-03-902.png

When the cluster is running properly, if replica A of a consensus group becomes the Leader, it continuously sends logs to the followers and updates the WAL's safelyDeletedSearchIndex after sending them. WAL files are deleted asynchronously, so if a restart occurs, some logs that have already been synchronized to other nodes may not have been deleted yet. After the restart, another replica B may become the Leader, and replica A becomes a Follower that only receives logs. Because IoTConsensus currently does not use its recovered syncIndex to set the safelyDeletedSearchIndex of the underlying WAL node at startup, replica A cannot delete these WAL files, which results in the accumulation of WAL files. Write requests of all regions on the node are affected.

!image-2023-04-28-11-08-43-542.png|thumbnail!

!image-2023-04-28-11-08-51-622.png|thumbnail!

!image-2023-04-28-11-08-57-549.png|thumbnail!

!image-2023-04-28-11-09-03-902.png|thumbnail!

The solution to this problem is to update the safelyDeletedSearchIndex of the reader at startup.
[jira] [Created] (IOTDB-5828) Optimize the implementation of some metric items in the metric module to prevent Prometheus pull timeouts
Xinyu Tan created IOTDB-5828:

Summary: Optimize the implementation of some metric items in the metric module to prevent Prometheus pull timeouts
Key: IOTDB-5828
URL: https://issues.apache.org/jira/browse/IOTDB-5828
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Liuxuxin
Attachments: image-2023-04-27-17-01-37-144.png, image-2023-04-27-17-03-29-978.png

!image-2023-04-27-17-03-29-978.png!

!image-2023-04-27-17-01-37-144.png!

Under high write pressure, even without Full GC, individual metric items in the monitoring framework take so long to compute that the Prometheus pull times out. This results in missing monitoring data, which ultimately hampers the troubleshooting of performance problems. According to JProfiler sampling, the three most time-consuming items are the number of file handles, the number of concurrent clients, and the number of threads. Their implementation needs to be optimized.
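The issue does not say how the slow metric items should be optimized. One common direction, shown here purely as an assumption and not as the change that was made, is to cache an expensive value for a short time so that a scrape never triggers the slow computation more than once per interval. `CachedGaugeSketch` is a hypothetical class, and the clock is injected as a `LongSupplier` only to keep the example deterministic.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch: cache an expensive metric value for ttlNanos between recomputations. */
final class CachedGaugeSketch {
  private final LongSupplier expensiveSource; // e.g. counting open file handles
  private final LongSupplier nanoClock; // injected clock, System::nanoTime in practice
  private final long ttlNanos;

  private long cachedValue;
  private long lastRefreshNanos;
  private boolean initialized;

  CachedGaugeSketch(LongSupplier expensiveSource, long ttlNanos, LongSupplier nanoClock) {
    this.expensiveSource = expensiveSource;
    this.ttlNanos = ttlNanos;
    this.nanoClock = nanoClock;
  }

  /** Returns the cached value and recomputes it only when the cached copy is older than the TTL. */
  synchronized long value() {
    long now = nanoClock.getAsLong();
    if (!initialized || now - lastRefreshNanos >= ttlNanos) {
      cachedValue = expensiveSource.getAsLong();
      lastRefreshNanos = now;
      initialized = true;
    }
    return cachedValue;
  }
}
```

The trade-off is staleness: a scrape may see a value that is up to one TTL old, which is usually acceptable for counts such as file handles or threads.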
[jira] [Assigned] (IOTDB-5777) When writing data using non-root users, the permission authentication module takes too long
[ https://issues.apache.org/jira/browse/IOTDB-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinyu Tan reassigned IOTDB-5777:

Assignee: Hongyin Zhang

> When writing data using non-root users, the permission authentication module takes too long
>
> Key: IOTDB-5777
> URL: https://issues.apache.org/jira/browse/IOTDB-5777
> Project: Apache IoTDB
> Issue Type: Improvement
> Reporter: Liuxuxin
> Assignee: Hongyin Zhang
> Priority: Major
> Fix For: master branch, 1.1.0
> Attachments: 20230414-162617.html, image-2023-04-17-11-27-41-532.png
>
> When writing data using non-root users, the time consumption of the permission authentication module is too high, accounting for about 2/3 of the total write time. The flame graph shows that the time consumption is mainly concentrated on the initialization of PartialPath.
> !image-2023-04-17-11-27-41-532.png!
[jira] [Created] (IOTDB-5731) Reconstructs the cli to support printing the enterprise logo when connecting to the Enterprise Edition
Xinyu Tan created IOTDB-5731: Summary: Reconstructs the cli to support printing the enterprise logo when connecting to the Enterprise Edition Key: IOTDB-5731 URL: https://issues.apache.org/jira/browse/IOTDB-5731 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan see [doc|https://apache-iotdb.feishu.cn/docx/KCj5dYt3FoZNvrxS0Slc4mLMntd] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5725) Make internal report recording measurements asynchronous
Xinyu Tan created IOTDB-5725: Summary: Make internal report recording measurements asynchronous Key: IOTDB-5725 URL: https://issues.apache.org/jira/browse/IOTDB-5725 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan The InternalReporter of the current metric module writes synchronously to IoTDB, which may cause a slow flush. In particular, when the system records flush points for the first time, it needs to create the related regions of root.__system, which takes a long time and may result in system reject errors. This issue makes the writes to IoTDB asynchronous and ensures that all of the metric module's current operations are memory operations only and do not involve time-consuming RPCs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
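The change described in IOTDB-5725 can be illustrated with a small sketch, assuming a single background writer thread and an in-memory hand-off. AsyncInternalReporter and its methods are hypothetical names; the slow sink stands in for the RPC-backed write to IoTDB and is not the real reporter API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical reporter: callers only enqueue, a background thread performs the slow write. */
final class AsyncInternalReporter implements AutoCloseable {
  // A single worker keeps records in submission order. Its queue is unbounded in this sketch;
  // a production version would probably bound it and drop or sample under pressure.
  private final ExecutorService writer = Executors.newSingleThreadExecutor();
  private final Consumer<String> slowSink;

  AsyncInternalReporter(Consumer<String> slowSink) {
    this.slowSink = slowSink;
  }

  /** Returns immediately; for the caller this is a memory-only operation. */
  void report(String metricLine) {
    writer.execute(() -> slowSink.accept(metricLine));
  }

  /** Stops accepting records and waits up to the timeout for queued writes to finish. */
  boolean shutdownAndWait(long timeout, TimeUnit unit) {
    writer.shutdown();
    try {
      return writer.awaitTermination(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  @Override
  public void close() {
    writer.shutdownNow();
  }
}
```

The flush path then calls report(...) and continues, while region creation or any other slow step is paid for on the writer thread.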
[jira] [Created] (IOTDB-5697) Only record engine cost for DataRegion in Performance Overview Dashboard
Xinyu Tan created IOTDB-5697: Summary: Only record engine cost for DataRegion in Performance Overview Dashboard Key: IOTDB-5697 URL: https://issues.apache.org/jira/browse/IOTDB-5697 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan Currently, when we record the write state machine cost in the Performance Overview panel, we record it for more than just the DataRegion, which may make the time inaccurate and thus inconsistent with the latency shown in the downstream breakdown. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5695) Ensures backward compatibility between 1.0 and 1.1 for ConfigNode when using SimpleConsensus
Xinyu Tan created IOTDB-5695: Summary: Ensures backward compatibility between 1.0 and 1.1 for ConfigNode when using SimpleConsensus Key: IOTDB-5695 URL: https://issues.apache.org/jira/browse/IOTDB-5695 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan In version 1.1, we fixed a 1.0 SimpleConsensus bug that set the consensus directory incorrectly. For backward compatibility, we need to rename a directory. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5368) DataNode launching error when the internal_port and rpc_port are same
[ https://issues.apache.org/jira/browse/IOTDB-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-5368: Assignee: Yufeng Liu > DataNode launching error when the internal_port and rpc_port are same > - > > Key: IOTDB-5368 > URL: https://issues.apache.org/jira/browse/IOTDB-5368 > Project: Apache IoTDB > Issue Type: Bug >Reporter: Gaofei Cao >Assignee: Yufeng Liu >Priority: Minor > Labels: pull-request-available > Attachments: image-2023-01-05-19-51-39-734.png > > > > In this case, the DataNode fails to launch with an error, but the ConfigNode still > registers this DataNode. > !image-2023-01-05-19-51-39-734.png|width=448,height=268! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5684) [Uncertain Path] Got a folder named ‘target’ in iotdb
[ https://issues.apache.org/jira/browse/IOTDB-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-5684: Assignee: Yongzao Dan (was: Xinyu Tan) > [Uncertain Path] Got a folder named ‘target’ in iotdb > -- > > Key: IOTDB-5684 > URL: https://issues.apache.org/jira/browse/IOTDB-5684 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Cluster >Affects Versions: 1.1.0-SNAPSHOT >Reporter: Qingxin Feng >Assignee: Yongzao Dan >Priority: Major > Attachments: image-2023-03-16-11-19-47-106.png > > > Got a folder named ‘target’ in iotdb,but it is generated at the location > where the startup script is running. > Please check this issue.Thanks. > B.R > !image-2023-03-16-11-19-47-106.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5674) Remove useless log in MicrometerAutoGauge
Xinyu Tan created IOTDB-5674: Summary: Remove useless log in MicrometerAutoGauge Key: IOTDB-5674 URL: https://issues.apache.org/jira/browse/IOTDB-5674 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan Currently MicrometerAutoGauge prints all monitors registered with it, causing a lot of useless logging when the cluster starts up, so this issue will remove these logs -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5616) [Sonar]Fix some code smells and bugs given by SonarLint and sonalcloud
[ https://issues.apache.org/jira/browse/IOTDB-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-5616: Assignee: (was: Xinyu Tan) > [Sonar]Fix some code smells and bugs given by SonarLint and sonalcloud > -- > > Key: IOTDB-5616 > URL: https://issues.apache.org/jira/browse/IOTDB-5616 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Yufeng Liu >Priority: Major > Labels: pull-request-available > Fix For: master branch > > Original Estimate: 504h > Remaining Estimate: 504h > > There are 300+ bugs and 19k+ code smells in IoTDB now. This issue will try to > fix some of them.For details, please see [Apache IoTDB Project Parent > POM|https://sonarcloud.io/project/overview?id=apache_incubator-iotdb]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5616) [Sonar]Fix some code smells and bugs given by SonarLint and sonalcloud
[ https://issues.apache.org/jira/browse/IOTDB-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-5616: Assignee: Xinyu Tan > [Sonar]Fix some code smells and bugs given by SonarLint and sonalcloud > -- > > Key: IOTDB-5616 > URL: https://issues.apache.org/jira/browse/IOTDB-5616 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Yufeng Liu >Assignee: Xinyu Tan >Priority: Major > Labels: pull-request-available > Fix For: master branch > > Original Estimate: 504h > Remaining Estimate: 504h > > There are 300+ bugs and 19k+ code smells in IoTDB now. This issue will try to > fix some of them.For details, please see [Apache IoTDB Project Parent > POM|https://sonarcloud.io/project/overview?id=apache_incubator-iotdb]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5368) DataNode launching error when the internal_port and rpc_port are same
[ https://issues.apache.org/jira/browse/IOTDB-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-5368: Assignee: (was: Yongzao Dan) > DataNode launching error when the internal_port and rpc_port are same > - > > Key: IOTDB-5368 > URL: https://issues.apache.org/jira/browse/IOTDB-5368 > Project: Apache IoTDB > Issue Type: Bug >Reporter: Gaofei Cao >Priority: Minor > Labels: pull-request-available > Attachments: image-2023-01-05-19-51-39-734.png > > > > In this case, the DataNode fails to launch with an error, but the ConfigNode still > registers this DataNode. > !image-2023-01-05-19-51-39-734.png|width=448,height=268! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5300) [MIGRATE REGION] Meets error in region migrate state
[ https://issues.apache.org/jira/browse/IOTDB-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698392#comment-17698392 ] Xinyu Tan commented on IOTDB-5300: -- Currently, the storage engine does not allow writing logs in the readonly state, so data migration cannot be supported. > [MIGRATE REGION] Meets error in region migrate state > > > Key: IOTDB-5300 > URL: https://issues.apache.org/jira/browse/IOTDB-5300 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: master branch, 1.1.0-SNAPSHOT >Reporter: 刘珍 >Assignee: 陈哲涵 >Priority: Major > Labels: pull-request-available > Attachments: image-2022-12-28-10-39-02-695.png, > image-2022-12-28-10-40-59-938.png, image-2022-12-28-10-41-57-776.png, > image-2022-12-28-10-42-35-206.png, image-2022-12-28-10-45-20-688.png, > image-2022-12-29-12-10-15-335.png, screenshot-1.png > > > master_1227_65fb480 > Region migration failed. > 1. Check the region information: RegionId=1 is in the ReadOnly state > !image-2022-12-28-10-39-02-695.png! > 2. Migrate the region > ./sbin/start-cli.sh -h 172.16.2.5 -e "migrate region 1 from 3 to 13" > !image-2022-12-28-10-40-59-938.png! > The migration did not actually succeed > !image-2022-12-28-10-41-57-776.png! > Migrating again reports that the region being migrated already exists on the target node > !image-2022-12-28-10-42-35-206.png! > 3. ConfigNode log > !image-2022-12-28-10-45-20-688.png! > Test environment > Private cloud phase 1; the test procedure is the same as in > https://issues.apache.org/jira/browse/IOTDB-5298 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5613) Remove unnecessary serialization in IoTConsensus when replicaNum is 1 to improve write performance
[ https://issues.apache.org/jira/browse/IOTDB-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695694#comment-17695694 ] Xinyu Tan commented on IOTDB-5613: -- I tested with 1c1d on the machine and the throughput improved by 70% Before: !screenshot-1.png! After: !screenshot-2.png! > Remove unnecessary serialization in IoTConsensus when replicaNum is 1 to > improve write performance > -- > > Key: IOTDB-5613 > URL: https://issues.apache.org/jira/browse/IOTDB-5613 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Xinyu Tan >Assignee: Xinyu Tan >Priority: Major > Labels: pull-request-available > Attachments: image-2023-03-02-19-01-00-991.png, screenshot-1.png, > screenshot-2.png > > > The current IoTConsensus still serializes each request at the consensus layer > when replicaNum = 1, which significantly increases the time spent at the > consensus layer in the full-link tracking panel. > !image-2023-03-02-19-01-00-991.png! > Although ISSUE [4855|https://github.com/apache/iotdb/pull/8025] refactored > the serialization-related code, the problem predates that refactoring. > This issue will avoid these unnecessary serializations. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5613) Remove unnecessary serialization in IoTConsensus when replicaNum is 1 to improve write performance
Xinyu Tan created IOTDB-5613: Summary: Remove unnecessary serialization in IoTConsensus when replicaNum is 1 to improve write performance Key: IOTDB-5613 URL: https://issues.apache.org/jira/browse/IOTDB-5613 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan Attachments: image-2023-03-02-19-01-00-991.png The current IoTConsensus still serializes each request at the consensus layer when replicaNum = 1, which significantly increases the time spent at the consensus layer in the full-link tracking panel. !image-2023-03-02-19-01-00-991.png! Although ISSUE [4855|https://github.com/apache/iotdb/pull/8025] refactored the serialization-related code, the problem predates that refactoring. This issue will avoid these unnecessary serializations. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5601) [Refactor] Remove AsyncConfigNodeHeartbeatServiceClient and AsyncDataNodeHeartbeatServiceClient as there core logic are duplicated
Xinyu Tan created IOTDB-5601: Summary: [Refactor] Remove AsyncConfigNodeHeartbeatServiceClient and AsyncDataNodeHeartbeatServiceClient as there core logic are duplicated Key: IOTDB-5601 URL: https://issues.apache.org/jira/browse/IOTDB-5601 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan AsyncConfigNodeHeartbeatServiceClient differs from AsyncConfigNodeIServiceClient, and AsyncDataNodeHeartbeatServiceClient differs from AsyncDataNodeInternalServiceClient, only in whether a log is printed when an exception occurs. This issue therefore deletes the two heartbeat client classes and adds a print-log parameter to thriftClientProperty, which reduces redundant code. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5596) Rename ConfigNodeRegion to ConfigRegion
Xinyu Tan created IOTDB-5596: Summary: Rename ConfigNodeRegion to ConfigRegion Key: IOTDB-5596 URL: https://issues.apache.org/jira/browse/IOTDB-5596 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan There are currently three consensus layer types in the cluster: * DataRegion * SchemaRegion * ConfigNodeRegion As you can see, the name ConfigNodeRegion clearly doesn't match the other two names, while the previous name PartitionRegion covers too little of the region's responsibility, so we plan to name it ConfigRegion. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5595) Fix memory leak for TsFileProcessorInfoMetrics in TsFileProcessorInfo
Xinyu Tan created IOTDB-5595: Summary: Fix memory leak for TsFileProcessorInfoMetrics in TsFileProcessorInfo Key: IOTDB-5595 URL: https://issues.apache.org/jira/browse/IOTDB-5595 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan Currently, each memtable corresponds to a TsFileProcessorInfo, where it registers itself with the metric module, but there is no logic to remove it after memtable is flushed. This results in a memory leak. -- This message was sent by Atlassian Jira (v8.20.10#820010)
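The leak pattern in IOTDB-5595 and its fix can be sketched as follows. GaugeRegistry and ProcessorInfo are hypothetical stand-ins for the metric module and TsFileProcessorInfo; the only point illustrated is that whatever registers a gauge must also remove it when the memtable is flushed, because the registry otherwise keeps the object reachable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the metric module's gauge registry. */
final class GaugeRegistry {
  private final Map<String, LongSupplier> gauges = new ConcurrentHashMap<>();

  void register(String name, LongSupplier gauge) {
    gauges.put(name, gauge);
  }

  void remove(String name) {
    gauges.remove(name);
  }

  int size() {
    return gauges.size();
  }
}

/** Hypothetical stand-in for the per-memtable info object. */
final class ProcessorInfo implements AutoCloseable {
  private final GaugeRegistry registry;
  private final String gaugeName;
  private volatile long memCost;

  ProcessorInfo(GaugeRegistry registry, String id) {
    this.registry = registry;
    this.gaugeName = "mem_cost_" + id;
    // The lambda captures `this`, so the registry pins this object until it is removed.
    registry.register(gaugeName, () -> memCost);
  }

  void addMemCost(long delta) {
    memCost += delta;
  }

  /** Called once the memtable has been flushed: removes the gauge so this object can be collected. */
  @Override
  public void close() {
    registry.remove(gaugeName);
  }
}
```

Without the close() step, every flushed memtable would leave one entry behind in the registry, which matches the growth described in the issue.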
[jira] [Created] (IOTDB-5585) Change InternalReporterType from IoTDB to Memory to reduce performance degradation
Xinyu Tan created IOTDB-5585: Summary: Change InternalReporterType from IoTDB to Memory to reduce performance degradation Key: IOTDB-5585 URL: https://issues.apache.org/jira/browse/IOTDB-5585 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan At present, the default value of InternalReporterType is IoTDB, which may affect performance. We plan to change the default back to Memory. Once an IoTDB version has optimized this function, IoTDB can become the default again. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5566) Give a interface to show the configurations of IoTDB in command window
[ https://issues.apache.org/jira/browse/IOTDB-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691426#comment-17691426 ] Xinyu Tan commented on IOTDB-5566: -- We currently support this sql, see the [documentation|https://iotdb.apache.org/UserGuide/Master/Cluster/Cluster-Maintenance.html#show-variables] for details > Give a interface to show the configurations of IoTDB in command window > -- > > Key: IOTDB-5566 > URL: https://issues.apache.org/jira/browse/IOTDB-5566 > Project: Apache IoTDB > Issue Type: New Feature >Reporter: changxue >Assignee: Xinyu Tan >Priority: Major > > Give a interface to show the configurations of IoTDB in command window > The configurations of iotdb-common.properties, iotdb-confignode.properties > and iotdb-datanode.properties should be shown in command window(cli), because > users may update these configurations and would like to make sure whether the > modification is effective. > Like mysql: show variables mem% -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5564) Modify the consensus layer to service read requests after the restart recovery is complete
Xinyu Tan created IOTDB-5564: Summary: Modify the consensus layer to service read requests after the restart recovery is complete Key: IOTDB-5564 URL: https://issues.apache.org/jira/browse/IOTDB-5564 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan The underlying metadata engine and storage engine can serve read requests only after the recovery is complete during the restart. However, the restart and recovery of the Ratis are asynchronous. Therefore, if a region uses the ratis consensus algorithm, read requests may be served before the recovery is complete during the restart, resulting in incomplete status being returned. We need to ensure that the logs of ratis are all recovered before serving read requests -- This message was sent by Atlassian Jira (v8.20.10#820010)
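A minimal sketch of the guarantee asked for in IOTDB-5564, assuming the asynchronous recovery can signal its completion exactly once. RecoveryGate is a hypothetical name, and how Ratis reports the end of log recovery is not shown here; the sketch only demonstrates reads being held back until that signal arrives.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical gate: read requests wait until recovery has been marked complete. */
final class RecoveryGate {
  private final CountDownLatch recovered = new CountDownLatch(1);

  /** To be called by the recovery path once all logs have been replayed. */
  void markRecovered() {
    recovered.countDown();
  }

  /** Runs the query only after recovery; fails fast if recovery does not finish within the timeout. */
  <T> T read(Supplier<T> query, long timeout, TimeUnit unit) {
    try {
      if (!recovered.await(timeout, unit)) {
        throw new IllegalStateException("recovery not finished, refusing to serve read");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for recovery", e);
    }
    return query.get();
  }
}
```

Rejecting the read after a timeout, rather than serving it, is a design choice of this sketch: an error lets the client retry, whereas an early answer could expose incomplete state.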
[jira] [Created] (IOTDB-5562) Change the data type of AutoGauge from long to double in metric module
Xinyu Tan created IOTDB-5562: Summary: Change the data type of AutoGauge from long to double in metric module Key: IOTDB-5562 URL: https://issues.apache.org/jira/browse/IOTDB-5562 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan The current metric module's AutoGauge type supports long rather than double, but our default metric module class, MicrometerAutoGauge, expects a double data type, so we do a type cast internally for all long values. Some users expect AutoGauge to support the double data type, which not only allows recording decimals but can also avoid a two-step type cast such as "double->long->double". After this change, an AutoGauge over a long value still requires one type cast, but an AutoGauge over a double value goes from two type casts to zero, resulting in a small performance gain. -- This message was sent by Atlassian Jira (v8.20.10#820010)
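The cast chain discussed in IOTDB-5562 can be made concrete with a short sketch. DoubleAutoGauge and viaLong are hypothetical names chosen for illustration and do not mirror the real metric module classes; the sketch only contrasts a long-typed gauge path with a double-typed one.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical double-typed gauge: reads a double directly from the observed object. */
final class DoubleAutoGauge<T> {
  private final T target;
  private final ToDoubleFunction<T> reader;

  DoubleAutoGauge(T target, ToDoubleFunction<T> reader) {
    this.target = target;
    this.reader = reader;
  }

  /** No intermediate long: a double source is passed through unchanged. */
  double value() {
    return reader.applyAsDouble(target);
  }

  /** Models the old path for a double source: double -> long -> double, which drops the fraction. */
  static double viaLong(double raw) {
    long asLong = (long) raw; // first cast, forced by the long-typed gauge
    return (double) asLong; // second cast, forced by the double-typed backend
  }
}
```

For a ratio such as 0.75 the long-typed path reports 0.0, while the double-typed gauge reports 0.75; a long source such as a counter still needs its single long-to-double conversion inside the reader function.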
[jira] [Created] (IOTDB-5560) Increase default consensusLogAppenderBufferSize from 4M to 16M to reduce the probability of large request write failures
Xinyu Tan created IOTDB-5560: Summary: Increase default consensusLogAppenderBufferSize from 4M to 16M to reduce the probability of large request write failures Key: IOTDB-5560 URL: https://issues.apache.org/jira/browse/IOTDB-5560 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan An existing [issue|https://github.com/apache/iotdb/issues/8403] reports that IoTDB 1.0 cannot support write requests larger than 4M, which is mainly related to a configuration within Ratis. Although a larger value may put the cluster into an unhealthy state, the current 4M is too small and interferes with normal use in some scenarios, so we plan to increase it to 16M. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5512) [ IoTConsensus Resend log ] The unsequence tsfile is generated after the cluster is restarted (
[ https://issues.apache.org/jira/browse/IOTDB-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-5512: Assignee: huxiangpeng (was: Xinyu Tan) > [ IoTConsensus Resend log ] The unsequence tsfile is generated after the > cluster is restarted ( > --- > > Key: IOTDB-5512 > URL: https://issues.apache.org/jira/browse/IOTDB-5512 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Reporter: 刘珍 >Assignee: huxiangpeng >Priority: Major > Attachments: image-2023-02-09-15-45-05-253.png, > image-2023-02-09-15-52-51-085.png, insert_no_overflow.config.properties > > > Test version: rc/1.0.1 20230202 63b16f2 > Problem description: > In a 3-replica, 3-node cluster, the benchmark writes sequential data. Before the restart, the data is checked and is fully sequential; after the cluster is restarted, some unsequence tsfiles are generated. > Before restarting the cluster, the log synchronization status of all dataregions was checked, and synchronization had completed: > !image-2023-02-09-15-45-05-253.png! > The synced log index recorded in the consensus file is 10 > !image-2023-02-09-15-52-51-085.png! > After the cluster is restarted, these 90 logs are resent, which causes unsequence tsfiles to be generated. This could be optimized to solve the problem. > Test procedure: > 1. Private cloud phase 3 > 1ConfigNode 172.20.70.5 > 3DataNode 172.20.70.2/4/14 > benchmark on 172.20.70.13 (see the attachment for the configuration) > Cluster configuration parameters > ConfigNode > MAX_HEAP_SIZE="20G" > MAX_DIRECT_MEMORY_SIZE="6G" > DataNode > MAX_HEAP_SIZE="20G" > MAX_DIRECT_MEMORY_SIZE="6G" > dn_max_connection_for_internal_service=300 > common file > schema_replication_factor=3 > data_replication_factor=3 > enable_seq_space_compaction=false > enable_unseq_space_compaction=false > enable_cross_space_compaction=false > config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus > 2. Run the benchmark > 3. After the writes complete, check the data: no unsequence tsfile, log synchronization complete; restart the cluster. > Expected result: no unsequence tsfile after the restart. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5507) Optimized the logic that Datanodes can be added to the cluster after 20 seconds after restart
[ https://issues.apache.org/jira/browse/IOTDB-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-5507: Assignee: Yongzao Dan (was: Xinyu Tan) > Optimized the logic that Datanodes can be added to the cluster after 20 > seconds after restart > - > > Key: IOTDB-5507 > URL: https://issues.apache.org/jira/browse/IOTDB-5507 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: changxue >Assignee: Yongzao Dan >Priority: Major > Attachments: image-2023-02-09-11-43-44-065.png > > > [start] It's not a good idea to wait 20s to restart a datanode > !image-2023-02-09-11-43-44-065.png|width=800! > suppose: > I change some configurations and need to restart for them to take effect; then I want > to restart immediately, rather than waiting 20s > It's not a good design. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5507) [start] It's not a good idea to wait 20s to restart a datanode
[ https://issues.apache.org/jira/browse/IOTDB-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686242#comment-17686242 ] Xinyu Tan commented on IOTDB-5507: -- How do you restart the node? Do you use kill -9? Or do you use stop-datanode.sh? > [start] It's not a good idea to wait 20s to restart a datanode > --- > > Key: IOTDB-5507 > URL: https://issues.apache.org/jira/browse/IOTDB-5507 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: changxue >Assignee: Xinyu Tan >Priority: Major > Attachments: image-2023-02-09-11-43-44-065.png > > > [start] It's not a good idea to wait 20s to restart a datanode > !image-2023-02-09-11-43-44-065.png|width=800! > suppose: > I change some configurations and need to restart for them to take effect; then I want > to restart immediately, rather than waiting 20s > It's not a good design. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5112) Fixed IoTConsensus synchronization stuck under low load or during restart
[ https://issues.apache.org/jira/browse/IOTDB-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-5112: Assignee: huxiangpeng (was: Xinyu Tan) > Fixed IoTConsensus synchronization stuck under low load or during restart > - > > Key: IOTDB-5112 > URL: https://issues.apache.org/jira/browse/IOTDB-5112 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Reporter: Chao Wang >Assignee: huxiangpeng >Priority: Major > > error log: waiting target request timeout. current index: 20, target index: > -1. > Cause: when requestCache.size() != MAX_REQUEST_CACHE_SIZE, nextSyncIndex > is not reassigned a value > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5466) [ratis]Write logs every 2 minutes:[pool-21-IoTDB-ratis-bg-disk-guardian-1] INFO o.a.i.c.r.RatisConsensus:709 - Raft group group-000200000000 took snapshot successfully
[ https://issues.apache.org/jira/browse/IOTDB-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-5466: Assignee: Song Ziyang (was: Xinyu Tan) > [ratis]Write logs every 2 minutes:[pool-21-IoTDB-ratis-bg-disk-guardian-1] > INFO o.a.i.c.r.RatisConsensus:709 - Raft group group-0002 took > snapshot successfully > > > Key: IOTDB-5466 > URL: https://issues.apache.org/jira/browse/IOTDB-5466 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 1.0.1 >Reporter: 刘珍 >Assignee: Song Ziyang >Priority: Major > Attachments: image-2023-02-03-10-35-48-860.png, lt.conf > > > Test version: rc/1.0.1 20230129 573097a > Problem description: > 3 replicas, 3C3D; every node is in a normal state and the Benchmark is performing reads and writes. Every 2 minutes, the datanode keeps printing the following log: > !image-2023-02-03-10-35-48-860.png! > Test environment > 1. 192.168.10.62/66/68/64 72CPU 256GB > ConfigNode and DataNode are on 192.168.10.62/66/68 > The path is /data/liuzhen_test/r_0129_573097a > Benchmark is on 192.168.10.64, /data/liuzhen_test/3c3d_longtest/bm_v1 > 2. Configuration parameters > ConfigNode parameters: > MAX_HEAP_SIZE="8G" > cn_target_config_node_list=192.168.10.62:10710 > DataNode parameters: > MAX_HEAP_SIZE="192G" > MAX_DIRECT_MEMORY_SIZE="32G" > dn_max_connection_for_internal_service=300 > dn_target_config_node_list=192.168.10.62:10710,192.168.10.66:10710,192.168.10.68:10710 > Common parameters: > schema_replication_factor=3 > data_replication_factor=3 > iot_consensus_throttle_threshold_in_byte=536870912000 > disk_space_warning_threshold=0.01 > 3. Start the Benchmark for 7*24-hour reads and writes > See the attachment for the configuration file. > 4. Check the datanode logs > All 3 nodes show the behavior described in the problem description. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5383) [confignode]start-confignode fail with NPE
[ https://issues.apache.org/jira/browse/IOTDB-5383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-5383: Assignee: Song Ziyang (was: Xinyu Tan) > [confignode]start-confignode fail with NPE > -- > > Key: IOTDB-5383 > URL: https://issues.apache.org/jira/browse/IOTDB-5383 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Server >Affects Versions: 1.0.1 >Reporter: changxue >Assignee: Song Ziyang >Priority: Major > Attachments: conf-46.tar.gz, confignode-npe_allnodes-log.tar.gz > > > [confignode]start-confignode fail with NPE > reproduction: > 1. config_node_ratis_snapshot_trigger_threshold=30 append in > iotdb-common.properties > 2. start 3C3D cluster > expect: start successfully > actual result: > 2C3D start successfully but it failed with NPE when start the third > confignode > {code} > show cluster > +--+--+---+---++ > |NodeID| NodeType| Status|InternalAddress|InternalPort| > +--+--+---+---++ > | 0|ConfigNode|Running| 172.20.70.44| 10710| > | 2|ConfigNode|Running| 172.20.70.45| 10710| > | 1| DataNode|Running| 172.20.70.44| 10730| > | 3| DataNode|Running| 172.20.70.45| 10730| > | 5| DataNode|Running| 172.20.70.46| 10730| > +--+--+---+---++ > {code} > {code} > 2023-01-07 14:42:11,745 [grpc-default-executor-0] INFO > o.a.r.g.s.GrpcServerProtocolService$ServerRequestStreamObserver:143 - 8: > Completed INSTALL_SNAPSHOT, lastRequest: > 0->8#0-t1,chunk:ba310edb-b921-452d-8023-4ef2ad4f51f9,8 > 2023-01-07 14:42:11,746 [8@group--StateMachineUpdater] ERROR > o.a.r.s.i.StateMachineUpdater:194 - 8@group--StateMachineUpdater > caught a Throwable. 
> java.lang.NullPointerException: snapshot == null > at java.util.Objects.requireNonNull(Objects.java:228) > at > org.apache.ratis.server.impl.StateMachineUpdater.reload(StateMachineUpdater.java:219) > at > org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:179) > at java.lang.Thread.run(Thread.java:748) > {code} > My guess is that this is related to config_node_ratis_snapshot_trigger_threshold > being configured too small. The third confignode cannot start, and show timeseries root.** cannot run either, i.e. the cluster is unavailable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5411) Write an error using the session interface
[ https://issues.apache.org/jira/browse/IOTDB-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681727#comment-17681727 ] Xinyu Tan commented on IOTDB-5411: -- https://github.com/apache/iotdb/pull/8840 > Write an error using the session interface > -- > > Key: IOTDB-5411 > URL: https://issues.apache.org/jira/browse/IOTDB-5411 > Project: Apache IoTDB > Issue Type: Bug > Components: Client/Java >Reporter: sunhao >Assignee: Hongyin Zhang >Priority: Major > Attachments: image-2023-01-12-18-17-00-706.png > > > !image-2023-01-12-18-17-00-706.png|width=1035,height=188! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5425) Consolidate all ConfigNodeClient to be managed by clientManager
Xinyu Tan created IOTDB-5425: Summary: Consolidate all ConfigNodeClient to be managed by clientManager Key: IOTDB-5425 URL: https://issues.apache.org/jira/browse/IOTDB-5425 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan On the one hand, it makes the code logical, and on the other hand, it may resolve ConfigNodeClient leaks -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5384) add core_client_count_for_each_node_in_client_manager and max_client_count_for_each_node_in_client_manager parameters for confignode and datanode
Xinyu Tan created IOTDB-5384: Summary: add core_client_count_for_each_node_in_client_manager and max_client_count_for_each_node_in_client_manager parameters for confignode and datanode Key: IOTDB-5384 URL: https://issues.apache.org/jira/browse/IOTDB-5384 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan Two parameters are added to confignode.properties: * cn_core_client_count_for_each_node_in_client_manager * cn_max_client_count_for_each_node_in_client_manager Two parameters are added to datanode.properties: * dn_core_client_count_for_each_node_in_client_manager * dn_max_client_count_for_each_node_in_client_manager This issue also causes all clientManager initializations to use these parameters, as well as updating the documentation -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5345) Use the logical clock to identify the snapshot version of IoTConsensus
Xinyu Tan created IOTDB-5345: Summary: Use the logical clock to identify the snapshot version of IoTConsensus Key: IOTDB-5345 URL: https://issues.apache.org/jira/browse/IOTDB-5345 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: huxiangpeng Attachments: image-2023-01-03-23-45-07-397.png The current IoTConsensus uses physical clocks to identify different snapshot versions. In some operation scenarios, the physical clock of the machine may be rolled back. This may cause IoTConsensus to label the latest snapshot as the old snapshot version. Therefore, we need to use logical timestamps to mark different snapshot versions. For example, use a self-maintaining increment index. In addition, this work needs to ensure forward compatibility with 1.0.0 !image-2023-01-03-23-45-07-397.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
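The "self-maintaining increment index" suggested in IOTDB-5345 could look roughly like the sketch below. The snapshot_ directory prefix and the SnapshotIndex helper are assumptions made for illustration only; the issue does not specify the naming scheme or where the index is persisted.

```java
import java.util.List;

/** Hypothetical helper: derives the next snapshot version from existing names, not from the wall clock. */
final class SnapshotIndex {
  private static final String PREFIX = "snapshot_"; // assumed naming scheme

  /** Largest numeric suffix among the existing snapshot directory names, or 0 if there is none. */
  static long latest(List<String> dirNames) {
    long max = 0;
    for (String name : dirNames) {
      if (!name.startsWith(PREFIX)) {
        continue; // unrelated entry, e.g. a temporary folder
      }
      try {
        max = Math.max(max, Long.parseLong(name.substring(PREFIX.length())));
      } catch (NumberFormatException ignored) {
        // Suffix is not a number; skip it rather than fail the snapshot.
      }
    }
    return max;
  }

  /** Name for the next snapshot: always strictly newer than everything already on disk. */
  static String next(List<String> dirNames) {
    return PREFIX + (latest(dirNames) + 1);
  }
}
```

Because the next version is computed from what already exists, rolling the machine clock back cannot make a new snapshot look older than a previous one. Under this assumed scheme, older directories whose suffix is a timestamp would simply parse as large indices and still sort before any newly created snapshot.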
[jira] [Reopened] (IOTDB-5111) [ ratis ] Data is distributed across disks ,after the cluster is restarted, all data is lost
[ https://issues.apache.org/jira/browse/IOTDB-5111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reopened IOTDB-5111: -- > [ ratis ] Data is distributed across disks ,after the cluster is restarted, > all data is lost > > > Key: IOTDB-5111 > URL: https://issues.apache.org/jira/browse/IOTDB-5111 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 1.0.0 >Reporter: 刘珍 >Assignee: Song Ziyang >Priority: Major > Attachments: image-2022-12-02-17-58-45-096.png, > image-2022-12-02-17-59-05-010.png > > > rel/1.0 > All 3 protocols (config/schema/data) are ratis, > dn_data_dirs=data/datanode/data,/data1/iotdb/datanode/data > so data is stored across disks. > After writing data and restarting the cluster, {color:#DE350B}*all data is lost*{color}. > There is one more problem: {color:#DE350B}folder names containing .tmp. still exist under the snapshot directory{color}: > !image-2022-12-02-17-59-05-010.png! > Test environment: private cloud phase 1, 8C32GB > 1. 3 replicas, 3C7D > Common > data_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > schema_replication_factor=3 > data_replication_factor=3 > wal_buffer_size_in_byte=1048576 > max_waiting_time_when_insert_blocked=360 > query_timeout_threshold=3600 > ConfigNode > MAX_HEAP_SIZE="20G" > MAX_DIRECT_MEMORY_SIZE="6G" > DataNode > MAX_HEAP_SIZE="20G" > MAX_DIRECT_MEMORY_SIZE="6G" > dn_data_dirs=data/datanode/data,/data1/iotdb/datanode/data > 2. Start BM to write data > GROUP_NUMBER=1 > DEVICE_NUMBER=1000 > REAL_INSERT_RATE=1.0 > SENSOR_NUMBER=1000 > IS_SENSOR_TS_ALIGNMENT=true > IS_OUT_OF_ORDER=false > OUT_OF_ORDER_RATIO=0.5 > OPERATION_PROPORTION=1:0:0:0:0:0:0:0:0:0:0 > CLIENT_NUMBER=50 > LOOP=1 > BATCH_SIZE_PER_WRITE=10 > START_TIME=2018-8-30T00:00:00+08:00 > POINT_STEP=200 > OP_MIN_INTERVAL=0 > OP_MIN_INTERVAL_RANDOM=false > INSERT_DATATYPE_PROPORTION=1:1:1:1:1:1 > ENCODINGS=PLAIN/PLAIN/PLAIN/PLAIN/PLAIN/PLAIN > COMPRESSOR=SNAPPY > IS_DELETE_DATA=false > CREATE_SCHEMA=true > BENCHMARK_CLUSTER=false > !image-2022-12-02-17-58-45-096.png! > 3. Restart the cluster -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (IOTDB-5231) [monitor]datanode could not start when binding 9091 error
[ https://issues.apache.org/jira/browse/IOTDB-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reopened IOTDB-5231: -- > [monitor]datanode could not start when binding 9091 error > -- > > Key: IOTDB-5231 > URL: https://issues.apache.org/jira/browse/IOTDB-5231 > Project: Apache IoTDB > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: changxue >Assignee: Hongyin Zhang >Priority: Major > Labels: pull-request-available > Attachments: config.tar.gz, monitor_error_log.tar.gz > > > [monitor]datanode could not start when binding 9091 error > environment: > 3C3D cluster, rel/1.0 branch > 1. enable prometheus monitor > 2. the prometheus service has not been started > problem: > 1. Monitoring is an add-on feature. If it is enabled but does not work properly, a warning is acceptable, but there should be no error and it should not affect the startup of the rpc service and other services. > 2. In this case, stop-datanode.sh cannot stop the datanode successfully; kill is required. > 3. The confignode started successfully and bound 9091; the datanode then tried to bind 9091 and failed. This bind needs to succeed. > {code} > 2022-12-19 10:26:23,574 [main] INFO o.a.i.m.AbstractMetricService:130 - > Detect more than one MetricManager, will use > org.apache.iotdb.metrics.micrometer.MicrometerMetricManager > 2022-12-19 10:26:23,574 [main] INFO o.a.i.m.AbstractMetricService:137 - Load > metric reporters, type: [PROMETHEUS] > 2022-12-19 10:26:23,939 [main] ERROR o.a.i.c.s.m.MetricService:52 - Failed to > start Metrics ServerService because: > reactor.netty.ChannelBindException: Failed to bind on [0.0.0.0:9091] > Suppressed: java.lang.Exception: #block terminated with an error > at > reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:139) > at reactor.core.publisher.Mono.block(Mono.java:1731) > at > reactor.netty.transport.ServerTransport.bindNow(ServerTransport.java:145) > at > reactor.netty.transport.ServerTransport.bindNow(ServerTransport.java:130) > at > org.apache.iotdb.metrics.reporter.prometheus.PrometheusReporter.start(PrometheusReporter.java:81) > at > org.apache.iotdb.metrics.CompositeReporter.startAll(CompositeReporter.java:38) > at > 
org.apache.iotdb.metrics.AbstractMetricService.startAllReporter(AbstractMetricService.java:193) > at > org.apache.iotdb.metrics.AbstractMetricService.startCoreModule(AbstractMetricService.java:98) > at > org.apache.iotdb.metrics.AbstractMetricService.startService(AbstractMetricService.java:76) > at > org.apache.iotdb.commons.service.metric.MetricService.start(MetricService.java:49) > at > org.apache.iotdb.commons.service.RegisterManager.register(RegisterManager.java:51) > at > org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:162) > at > org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:95) > at > org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:58) > at > org.apache.iotdb.db.service.DataNode.main(DataNode.java:131) > 2022-12-19 10:26:23,940 [main] ERROR o.a.i.db.service.DataNode:178 - Fail to > start server > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4986) Too many IoTDB-DataNodeInternalRPC-Processor threads are open
[ https://issues.apache.org/jira/browse/IOTDB-4986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653767#comment-17653767 ] Xinyu Tan commented on IOTDB-4986: -- Not yet. This issue requires continuous optimization of the Thrift threading model over time, and I've broken it down into a few issues that might take a sprint or two to complete > Too many IoTDB-DataNodeInternalRPC-Processor threads are open > - > > Key: IOTDB-4986 > URL: https://issues.apache.org/jira/browse/IOTDB-4986 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Critical > > m_1118_3d5eeae > 1. Start a 3-replica 3C21D cluster > 2. Start 7 Benchmarks sequentially > 3. On one node, the datanode opens a large number of IoTDB-DataNodeInternalRPC-Processor threads, 2k+ > (the count slowly drops), but OOM occasionally occurs > 2022-11-18 14:26:48,320 > [pool-22-IoTDB-DataNodeInternalRPC-Processor-374$20221118_062422_29227_16.1.0] > ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally > failed. 
TSStatus: TSStatus(code:506, subStatus:[]), message: null > 2022-11-18 14:29:44,568 [DataNodeInternalRPC-Service]{color:red}* ERROR > o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > DataNodeInternalRPC-Service-40 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached*{color} > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:29:53,751 [ClientRPC-Service] ERROR > o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > ClientRPC-Service-42 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.d.e.s.TsFileProcessor:1095 - root.test.g_0-6: > 
/data/iotdb/m_1118_3d5eeae/sbin/../data/datanode/data/unsequence/root.test.g_0/6/2538/1668752675355-5-0-0.tsfile > meet error when flushing a memtable, change system mode to error > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118) > at > org.apache.iotdb.db.rescon.AbstractPoolManager.submit(AbstractPoolManager.java:56) > at > org.apache.iotdb.db.engine.flush.MemTableFlushTask.(MemTableFlushTask.java:88) > at > org.apache.iotdb.db.engine.storagegroup.TsFileProcessor.flushOneMemTable(TsFileProcessor.java:1082) > at > org.apache.iotdb.db.engine.flush.FlushManager$FlushThread.runMayThrow(FlushManager.java:108) > at > org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2022-
[jira] [Reopened] (IOTDB-5060) Control the ratis log size
[ https://issues.apache.org/jira/browse/IOTDB-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reopened IOTDB-5060: -- > Control the ratis log size > -- > > Key: IOTDB-5060 > URL: https://issues.apache.org/jira/browse/IOTDB-5060 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jialin Qiao >Assignee: Song Ziyang >Priority: Major > > Currently, we limit the number of operations, but when an operation is large, > the log occupies too much disk space. > We need to control the total raft log size. -- This message was sent by Atlassian Jira (v8.20.10#820010)
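A size-based limit of the kind requested in IOTDB-5060 could look roughly like the sketch below. It is only an illustration: `SegmentSizeLimiter`, `segmentsToPurge` and the byte threshold are hypothetical names, not Ratis or IoTDB APIs, and the issue does not say how the limit is actually implemented. The idea is to count bytes rather than operations and drop the oldest closed segments until the total fits.

```java
import java.util.List;

/** Hypothetical helper, not a Ratis or IoTDB API: chooses old log segments to purge by total size. */
final class SegmentSizeLimiter {
  private SegmentSizeLimiter() {}

  /**
   * @param segmentSizes sizes in bytes of closed log segments, ordered oldest first
   * @param maxTotalBytes assumed upper bound for the total log size
   * @return how many of the oldest segments must be purged to get under the bound
   */
  static int segmentsToPurge(List<Long> segmentSizes, long maxTotalBytes) {
    long total = 0;
    for (long size : segmentSizes) {
      total += size;
    }
    int purge = 0;
    // Drop the oldest segments first until the remaining bytes fit the bound.
    while (purge < segmentSizes.size() && total > maxTotalBytes) {
      total -= segmentSizes.get(purge);
      purge++;
    }
    return purge;
  }
}
```

Counting bytes instead of entries means a single large operation is reflected in the decision, which is the gap the issue describes for the existing operation-number limit.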
[jira] [Assigned] (IOTDB-5324) [migrate region] 1rep1C4D ,after the region is migrated successfully, wal cannot be deleted from destDataNode
[ https://issues.apache.org/jira/browse/IOTDB-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-5324: Assignee: Xinyu Tan (was: Gaofei Cao) > [migrate region] 1rep1C4D ,after the region is migrated successfully, wal > cannot be deleted from destDataNode > -- > > Key: IOTDB-5324 > URL: https://issues.apache.org/jira/browse/IOTDB-5324 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: master branch >Reporter: 刘珍 >Assignee: Xinyu Tan >Priority: Major > Attachments: 40971672369689_.pic.jpg, mig.conf, screenshot-1.png, > screenshot-2.png > > > m_1229_0fedffd > Problem description > In a 1-replica 1C4D cluster, a region migration (Id=1 from ip4 to ip14) during data writing succeeds, but the wal on the destination node cannot be deleted. > 1. Start a 1-replica 1C4D cluster > config/schema/data use the ratis/ratis/IoT protocols > 2. BM writes data (see the attachment for the configuration) > After 9 minutes, migrate the region > ./sbin/start-cli.sh -h 172.20.70.4 -e "migrate region 1 from 2 to 3" > The migration succeeds in 20 seconds (2022-12-29 18:25:17,621 to 2022-12-29 18:25:37,676) > However, the wal of regionId=1 on the ip14 datanode > cannot be deleted, so it grows to 50GB, throttling WARN logs keep appearing, and BM has not finished after more than 16 hours, although in theory it should complete in a little over 1 hour: > 2022-12-30 10:14:07,669 > [pool-25-IoTDB-ClientRPC-Processor-59$20221230_021337_10719_3.1.0] WARN > o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:243 - write locally failed. > TSStatus: TSStatus(code:606, message:Reject write because there are too many > requests need to process), message: Reject write because there are too many > requests need to process > Test environment: private cloud phase 3 > DataNode configuration > MAX_HEAP_SIZE="20G" > MAX_DIRECT_MEMORY_SIZE="6G" > dn_max_connection_for_internal_service=300 > ConfigNode configuration > MAX_HEAP_SIZE="20G" > MAX_DIRECT_MEMORY_SIZE="6G" > Region information before the migration > !screenshot-1.png! > Region information after the successful migration > !screenshot-2.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5312) Consolidate ClientManagers in Datanodes for unified management
Xinyu Tan created IOTDB-5312: Summary: Consolidate ClientManagers in Datanodes for unified management Key: IOTDB-5312 URL: https://issues.apache.org/jira/browse/IOTDB-5312 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan The ClientManager of Datanodes is divided into different modules. On the one hand, thrift client reuse rate is not high. On the other hand, under the current thriftServer thread model of BIO, thread explosion may occur. The PR will mainly consolidate ClientManagers in Datanodes and do some necessary reconstruction * Consolidate ClientManagers in Datanodes for unified management * Move some clientFactory from DataNodeClientPoolFactory to ClientPoolFactory * Add thrift related parameters to CommonConfig so that they can be retrieved by ClientPoolFactory * By introducing ThriftClientFactory, the BaseClientFactory is not bound to thrift, so that the RatisClientFactory does not depend on thrift related parameters in the future * Enhance clientManager's handling of null, adding necessary judgments and removing unwanted ones * Adds invalidation logic for exceptions that occur when an asynchronous client fails -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5260) Refactoring ClientManager API and Exception
Xinyu Tan created IOTDB-5260: Summary: Refactoring ClientManager API and Exception Key: IOTDB-5260 URL: https://issues.apache.org/jira/browse/IOTDB-5260 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan * ClientManagerException is introduced to facilitate ClientManager users to distinguish borrowClient exception from other business exception. * remove purelyBorrowClient API to make ClientManager API clearer -- This message was sent by Atlassian Jira (v8.20.10#820010)
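The separation described in IOTDB-5260 can be pictured with the following minimal sketch. The `ClientManagerException` class here is a stand-in written for illustration, and `Pool`, `BorrowDemo`, `borrowClient` and `call` are hypothetical names; the real IoTDB signatures may differ. The point is only that a failure to borrow a client surfaces as one dedicated checked exception, so callers can handle it apart from business failures.

```java
/** Stand-in for the exception type named in the issue; the real class may differ. */
class ClientManagerException extends Exception {
  ClientManagerException(String message, Throwable cause) {
    super(message, cause);
  }
}

/** Hypothetical caller showing how borrow failures are told apart from business failures. */
final class BorrowDemo {
  /** Hypothetical pool abstraction; borrowing may fail for any reason. */
  interface Pool {
    String borrow(String endpoint) throws Exception;
  }

  /** Wraps every borrow failure into the dedicated exception type. */
  static String borrowClient(Pool pool, String endpoint) throws ClientManagerException {
    try {
      return pool.borrow(endpoint);
    } catch (Exception e) {
      throw new ClientManagerException("Borrow client for " + endpoint + " failed", e);
    }
  }

  /** Handles the borrow failure explicitly; business exceptions would propagate separately. */
  static String call(Pool pool, String endpoint) {
    String client;
    try {
      client = borrowClient(pool, endpoint);
    } catch (ClientManagerException e) {
      return "borrow-failed";
    }
    return "ok:" + client;
  }
}
```

With a single borrow entry point that throws one well-defined exception, a second variant such as purelyBorrowClient is no longer needed to signal "could not get a client".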
[jira] [Created] (IOTDB-5246) Enhance IoTConsensus field name
Xinyu Tan created IOTDB-5246: Summary: Enhance IoTConsensus field name Key: IOTDB-5246 URL: https://issues.apache.org/jira/browse/IOTDB-5246 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan rename: * PendingBatch -> Batch * TSyncLogReq -> TSyncLogEntriesReq * TSyncLogRes -> TSyncLogEntriesRes * TLogBatch -> TLogEntry * syncLog -> syncLogEntries -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5174) Use filename format such as NodeID-Index rather than Endpoint-Index to track follower sync progress
Xinyu Tan created IOTDB-5174: Summary: Use filename format such as NodeID-Index rather than Endpoint-Index to track follower sync progress Key: IOTDB-5174 URL: https://issues.apache.org/jira/browse/IOTDB-5174 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan This work can not only solve the bug in this [issue|https://github.com/apache/iotdb/issues/8334], but also facilitate the future peer ip/port update. In addition, this work needs to be compatible with version 1.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
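As a rough illustration of the NodeID-Index naming in IOTDB-5174, the sketch below formats and parses such a file name. The class `ProgressFileName` and the exact `<nodeId>-<index>` layout are assumptions made for this example; the issue does not spell out the on-disk format or how compatibility with version 1.0 is handled.

```java
/** Hypothetical sketch of a "NodeID-Index" progress file name; the on-disk format is an assumption. */
final class ProgressFileName {
  final int nodeId;
  final long index;

  ProgressFileName(int nodeId, long index) {
    this.nodeId = nodeId;
    this.index = index;
  }

  /** Builds the file name from the node ID and the synced index. */
  String format() {
    return nodeId + "-" + index;
  }

  /** Parses a name of the form nodeId-index and rejects anything else. */
  static ProgressFileName parse(String fileName) {
    int sep = fileName.lastIndexOf('-');
    if (sep <= 0 || sep == fileName.length() - 1) {
      throw new IllegalArgumentException("Not a NodeID-Index name: " + fileName);
    }
    return new ProgressFileName(
        Integer.parseInt(fileName.substring(0, sep)),
        Long.parseLong(fileName.substring(sep + 1)));
  }
}
```

Because the name no longer embeds an IP address or port, a follower whose endpoint changes keeps the same progress file, which matches the motivation given in the issue.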
[jira] [Commented] (IOTDB-4350) [ MultiLeader Throttle Down] Performance does not return to normal after “Throttle Down“
[ https://issues.apache.org/jira/browse/IOTDB-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645746#comment-17645746 ] Xinyu Tan commented on IOTDB-4350: -- Since the IoTConsensus memory control that takes SyncStatus into account was completed, this phenomenon no longer occurs. I suggest retesting; if there are no problems, let's close it. > [ MultiLeader Throttle Down] Performance does not return to normal after > “Throttle Down“ > - > > Key: IOTDB-4350 > URL: https://issues.apache.org/jira/browse/IOTDB-4350 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: 张洪胤 >Priority: Major > Fix For: 1.0.0 > > Attachments: image-2022-09-07-14-52-58-266.png, net_restart.conf, > screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png, > screenshot-5.png, screenshot-6.png > > > m_0905_0095eb3, 3 replicas, 3C3D > 3 dataregions, 1 leader on each node. > ip72 was disconnected from the network for 3 minutes (16:52 ~ 16:55); after checking the cluster status and confirming that the leader switch succeeded, > ip73 was disconnected for 2 minutes, and no further fault operations were performed. > Synchronization is slow and multiLeader keeps throttling writes, but performance does not return to normal after throttling. The amount of data written per minute (batches in bm) is shown below: > !screenshot-6.png! 
> IoTDB> select count(latency) from root.result.moresession_2022_09_06_04_47_03.INGESTION where okPoint>0 group by ([1662454041076000186,1662459764764000179),1m);
> |Time|count(root.result.moresession_2022_09_06_04_47_03.INGESTION.latency)|
> |2022-09-06T16:47:21.076000186+08:00|5544|
> |2022-09-06T16:48:21.076000186+08:00|6282|
> |2022-09-06T16:49:21.076000186+08:00|5671|
> |2022-09-06T16:50:21.076000186+08:00|4589|
> |2022-09-06T16:51:21.076000186+08:00|5350|
> |2022-09-06T16:52:21.076000186+08:00|1121|
> |2022-09-06T16:53:21.076000186+08:00|901|
> |2022-09-06T16:54:21.076000186+08:00|201|
> |2022-09-06T16:55:21.076000186+08:00|334|
> |2022-09-06T16:56:21.076000186+08:00|3501|
> |2022-09-06T16:57:21.076000186+08:00|3677|
> |2022-09-06T16:58:21.076000186+08:00|3111|
> |2022-09-06T16:59:21.076000186+08:00|1948|
> |2022-09-06T17:00:21.076000186+08:00|3889|
> |2022-09-06T17:01:21.076000186+08:00|2982|
> |2022-09-06T17:02:21.076000186+08:00|4465|
> |2022-09-06T17:03:21.076000186+08:00|4871|
> |2022-09-06T17:04:21.076000186+08:00|4478|
> |2022-09-06T17:05:21.076000186+08:00|3242|
> |2022-09-06T17:06:21.076000186+08:00|2545|
> |2022-09-06T17:07:21.076000186+08:00|2579|
> |2022-09-06T17:08:21.076000186+08:00|133|
> |2022-09-06T17:09:21.076000186+08:00|488|
> |2022-09-06T17:10:21.076000186+08:00|253|
> |2022-09-06T17:11:21.076000186+08:00|445|
> |2022-09-06T17:12:21.076000186+08:00|2122|
> |2022-09-06T17:13:21.076000186+08:00|1799|
> |2022-09-06T17:14:21.076000186+08:00|1568|
> |2022-09-06T17:15:21.076000186+08:00|
[jira] [Assigned] (IOTDB-5112) IoTConsesus retry timeout util after restart
[ https://issues.apache.org/jira/browse/IOTDB-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-5112: Assignee: Xinyu Tan > IoTConsesus retry timeout util after restart > > > Key: IOTDB-5112 > URL: https://issues.apache.org/jira/browse/IOTDB-5112 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Reporter: Chao Wang >Assignee: Xinyu Tan >Priority: Major > > error log: waiting target request timeout. current index: 20, target index: > -1. > Cause: when requestCache.size() != MAX_REQUEST_CACHE_SIZE, nextSyncIndex > is not reassigned a value > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-4855) [MultiLeader] Strength the memory control
[ https://issues.apache.org/jira/browse/IOTDB-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-4855: Assignee: Xinyu Tan (was: 张洪胤) > [MultiLeader] Strength the memory control > - > > Key: IOTDB-4855 > URL: https://issues.apache.org/jira/browse/IOTDB-4855 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: 张洪胤 >Assignee: Xinyu Tan >Priority: Major > Labels: pull-request-available > > We need to strengthen the control of multiLeader memory by taking into account the size of > syncStatus and of the pendingBatch read from the WAL -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-3559) Add metrics for the consensus module
Xinyu Tan created IOTDB-3559: Summary: Add metrics for the consensus module Key: IOTDB-3559 URL: https://issues.apache.org/jira/browse/IOTDB-3559 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IOTDB-3570) Extend the Peer structure of the consensus layer to embed the ID of the upper layer
Xinyu Tan created IOTDB-3570: Summary: Extend the Peer structure of the consensus layer to embed the ID of the upper layer Key: IOTDB-3570 URL: https://issues.apache.org/jira/browse/IOTDB-3570 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Currently, the Peer returned by the getLeader interface only contains the IP address and port corresponding to the consensus layer of the Leader node. However, the port that the upper layer wants to obtain may be the service port that the upper layer RPC can connect to, such as internalService, etc. Therefore, we can consider extending the structure of the Peer so that it can be packed with a business custom ID structure, so that the upper layer can be returned at getLeader with the ID with business semantics defined for each Peer at AddConsensusGroup, thus reducing the coding burden on the upper layer. For example, DataNode can encode the TEndpoint of internalService into the ID -- This message was sent by Atlassian Jira (v8.20.7#820007)
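The extension proposed in IOTDB-3570 might be sketched as follows. All type names here (`ConsensusEndpoint`, `PeerWithId`, `GroupRegistry`) are hypothetical and simplified, and the ports in any usage are illustrative; the actual Peer class in the consensus module may be shaped differently. The sketch only shows a peer carrying a business-defined ID that is supplied when the group is added and handed back unchanged by getLeader.

```java
import java.util.HashMap;
import java.util.Map;

/** Consensus-layer address of a node (hypothetical simplified type). */
final class ConsensusEndpoint {
  final String ip;
  final int port;

  ConsensusEndpoint(String ip, int port) {
    this.ip = ip;
    this.port = port;
  }
}

/** A peer that carries an opaque, business-defined ID next to its consensus endpoint. */
final class PeerWithId<I> {
  final ConsensusEndpoint endpoint;
  final I businessId;

  PeerWithId(ConsensusEndpoint endpoint, I businessId) {
    this.endpoint = endpoint;
    this.businessId = businessId;
  }
}

/** Toy registry standing in for the consensus layer's group bookkeeping. */
final class GroupRegistry<I> {
  private final Map<String, PeerWithId<I>> leaders = new HashMap<>();

  /** Records the leader peer, including its business ID, when a group is added. */
  void addConsensusGroup(String groupId, PeerWithId<I> leader) {
    leaders.put(groupId, leader);
  }

  /** Returns the leader with the business ID supplied at registration, or null if unknown. */
  PeerWithId<I> getLeader(String groupId) {
    return leaders.get(groupId);
  }
}
```

In this shape the upper layer could, for example, store the internalService endpoint as the business ID and read it straight from the getLeader result instead of maintaining its own mapping from consensus ports to service ports.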
[jira] [Created] (IOTDB-3569) Use iterator batch interface under MultiLeaderConsensus to get logs from WAL logs at high speed
Xinyu Tan created IOTDB-3569: Summary: Use iterator batch interface under MultiLeaderConsensus to get logs from WAL logs at high speed Key: IOTDB-3569 URL: https://issues.apache.org/jira/browse/IOTDB-3569 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Using the batch interface directly may result in OOM -- This message was sent by Atlassian Jira (v8.20.7#820007)
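To make the OOM concern in IOTDB-3569 concrete, here is a minimal sketch of an iterator that yields bounded batches. `BatchIterator` and its parameters are hypothetical names and this is not the WAL reader's real interface; it only shows why pulling entries lazily keeps at most one batch in memory, whereas materializing all pending entries through a plain batch call does not.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical adapter: groups entries from a lazy source into batches of bounded size. */
final class BatchIterator<T> implements Iterator<List<T>> {
  private final Iterator<T> source;
  private final int maxBatchSize;

  BatchIterator(Iterator<T> source, int maxBatchSize) {
    if (maxBatchSize <= 0) {
      throw new IllegalArgumentException("maxBatchSize must be positive");
    }
    this.source = source;
    this.maxBatchSize = maxBatchSize;
  }

  @Override
  public boolean hasNext() {
    return source.hasNext();
  }

  /** Reads at most maxBatchSize entries from the source; earlier batches can be garbage collected. */
  @Override
  public List<T> next() {
    if (!source.hasNext()) {
      throw new NoSuchElementException();
    }
    List<T> batch = new ArrayList<>(maxBatchSize);
    while (batch.size() < maxBatchSize && source.hasNext()) {
      batch.add(source.next());
    }
    return batch;
  }
}
```

A byte-based bound instead of an entry count would follow the same pattern and may fit log entries of very different sizes better; the choice here is only for brevity.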
[jira] [Created] (IOTDB-3568) Support linearizable read for RatisConsensus
Xinyu Tan created IOTDB-3568: Summary: Support linearizable read for RatisConsensus Key: IOTDB-3568 URL: https://issues.apache.org/jira/browse/IOTDB-3568 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan * We can contribute to the Ratis community to support linearizable reads * It is also possible to add additional coordination logic on top of the RatisConsensus to satisfy linearizable read -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IOTDB-3564) Reduce the number of I/O threads using thrift asynchronous server mode for MultiLeaderConsensusRPC
Xinyu Tan created IOTDB-3564: Summary: Reduce the number of I/O threads using thrift asynchronous server mode for MultiLeaderConsensusRPC Key: IOTDB-3564 URL: https://issues.apache.org/jira/browse/IOTDB-3564 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Consider using selector mode or HsHa mode and abstracting out the corresponding parameters -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IOTDB-3561) Support snapshot transfer under MultiLeaderConsensus
Xinyu Tan created IOTDB-3561: Summary: Support snapshot transfer under MultiLeaderConsensus Key: IOTDB-3561 URL: https://issues.apache.org/jira/browse/IOTDB-3561 Project: Apache IoTDB Issue Type: New Feature Reporter: Xinyu Tan * On the one hand, we can delete some unsynchronized wal after snapshot. On the other hand, we can make the old node catchup faster. * BTW, member changes must be transferred through snapshot because the corresponding WAL may have been deleted -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (IOTDB-3548) [cluster]can not create timeseries when start 3C2D
[ https://issues.apache.org/jira/browse/IOTDB-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-3548: Assignee: (was: Xinyu Tan) > [cluster]can not create timeseries when start 3C2D > -- > > Key: IOTDB-3548 > URL: https://issues.apache.org/jira/browse/IOTDB-3548 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: FengQingxin >Priority: Major > Attachments: iotdb-confignode.properties, iotdb-engine.properties, > log_all.log > > > > commit c42cfe5fbee50b24cc1a1078cd5af1ee69930881 > Author: YongzaoDan <33111881+crzbulab...@users.noreply.github.com> > Date: Mon Jun 20 13:54:12 2022 +0800 > [IOTDB-3510] Read/Write Routing policy (Routing to DataNode with the > lowest-loaded) (#6308) > Reproduce steps: > 1. Modify config file as 3C3D: > schema_replication_factor=3 > data_replication_factor=1 > 2. Start 3C > 3.Start 2D > 4. using iotdb-cli to execute below sql: > set storage group to root.sg; > create timeseries root.sg.d.s1 with > datatype=INT32,encoding=RLE,compression=snappy; > create timeseries root.sg.d.s2 with > datatype=INT32,encoding=RLE,compression=snappy; > create timeseries root.sg.d.s3 with > datatype=INT32,encoding=RLE,compression=snappy; > insert into root.sg.d(time,s1,s2,s3) values(1,1,2,3); > insert into root.sg.d(time,s1,s2,s3) values(2,1,2,3); > 5.Got below error msg: > Msg: 500: [INTERNAL_SERVER_ERROR(500)] Exception occurred: "create timeseries > root.sg.d.s1 with datatype=INT32,encoding=RLE,compression=snappy". > executeStatement failed. null > !image-2022-06-20-17-13-31-196.png! > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (IOTDB-3551) [ thread ] Thread control is required
[ https://issues.apache.org/jira/browse/IOTDB-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-3551: Assignee: (was: Xinyu Tan) > [ thread ] Thread control is required > - > > Key: IOTDB-3551 > URL: https://issues.apache.org/jira/browse/IOTDB-3551 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Priority: Major > Attachments: stack_1.out > > > On a 72-cpu machine with 21 dataregions and 1 schemaregion, a single datanode process shows Threads: 1200 total; thread control needs to be done properly. > TAsyncClientManager : 378 > Compaction related: 72 > Flush related: 76 > MultiLeaderConsensusRPC : 65 > WAL related: 43 > LogDispatcher : 42 > grpc-default-worker-ELG : 72 > Threads named like 20220620_090240_53823: 161 > See the attachment stack_1.out for details. > Database configuration parameters: > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > schema_replication_factor=3 > data_replication_factor=3 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IOTDB-3554) Controls the number of rpc threads under the MultiLeaderConsensus
Xinyu Tan created IOTDB-3554: Summary: Controls the number of rpc threads under the MultiLeaderConsensus Key: IOTDB-3554 URL: https://issues.apache.org/jira/browse/IOTDB-3554 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan * Make all regions share a clientManager * Reduce the number of pipelines because concurrency in the same region is generally not very large -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IOTDB-3513) Avoid double-writing of the write ahead log for data under RatisConsensus
Xinyu Tan created IOTDB-3513: Summary: Avoid double-writing of the write ahead log for data under RatisConsensus Key: IOTDB-3513 URL: https://issues.apache.org/jira/browse/IOTDB-3513 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IOTDB-3448) Migrate the logic of deleteRegion onto the consensus module
Xinyu Tan created IOTDB-3448: Summary: Migrate the logic of deleteRegion onto the consensus module Key: IOTDB-3448 URL: https://issues.apache.org/jira/browse/IOTDB-3448 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan The deletion of a region is used as a raft log to synchronize inside the region. If the underlying state machine fails to recover to the previous state after the restart, NPE problems may occur during the restart. In addition, executing a raft log that removes itself is very strange for the consensus layer because we still end up removing the corresponding region in the consensus layer, which is not done in current implementation So we can move the deleteRegion operation above the consensus layer -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IOTDB-3446) Deleting a storage group requires that datanode delete all data and wal files and directories related to the storage group
Xinyu Tan created IOTDB-3446: Summary: Deleting a storage group requires that datanode delete all data and wal files and directories related to the storage group Key: IOTDB-3446 URL: https://issues.apache.org/jira/browse/IOTDB-3446 Project: Apache IoTDB Issue Type: Bug Reporter: Xinyu Tan Attachments: image-2022-06-10-11-28-47-233.png !image-2022-06-10-11-28-47-233.png! Yet none of them have been deleted -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IOTDB-3445) Deleting a storage group requires that datanode delete all metadata and directories related to the storage group
Xinyu Tan created IOTDB-3445: Summary: Deleting a storage group requires that datanode delete all metadata and directories related to the storage group Key: IOTDB-3445 URL: https://issues.apache.org/jira/browse/IOTDB-3445 Project: Apache IoTDB Issue Type: Bug Reporter: Xinyu Tan Attachments: image-2022-06-10-11-26-26-980.png Currently, however, only files can be deleted, not directories !image-2022-06-10-11-26-26-980.png! -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (IOTDB-3382) Adjust the default preAllocateSize of the ratisConsensus RaftLog
[ https://issues.apache.org/jira/browse/IOTDB-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551571#comment-17551571 ] Xinyu Tan commented on IOTDB-3382: -- [Analyse doc|https://apache-iotdb.feishu.cn/docx/doxcn4CnBOLzbOwmkpDeitTHqOg] > Adjust the default preAllocateSize of the ratisConsensus RaftLog > > > Key: IOTDB-3382 > URL: https://issues.apache.org/jira/browse/IOTDB-3382 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Xinyu Tan >Assignee: Song Ziyang >Priority: Major > Attachments: image-2022-06-02-16-36-35-968.png > > > need some theoretical analysis, maybe some testing > !image-2022-06-02-16-36-35-968.png! -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (IOTDB-3382) Adjust the default preAllocateSize of the ratisConsensus RaftLog
[ https://issues.apache.org/jira/browse/IOTDB-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-3382: Assignee: Song Ziyang (was: Xinyu Tan) > Adjust the default preAllocateSize of the ratisConsensus RaftLog > > > Key: IOTDB-3382 > URL: https://issues.apache.org/jira/browse/IOTDB-3382 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Xinyu Tan >Assignee: Song Ziyang >Priority: Major > Attachments: image-2022-06-02-16-36-35-968.png > > > need some theoretical analysis, maybe some testing > !image-2022-06-02-16-36-35-968.png! -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IOTDB-3395) Use thrift server to fix clientManagerTest bind address already used issue
Xinyu Tan created IOTDB-3395: Summary: Use thrift server to fix clientManagerTest bind address already used issue Key: IOTDB-3395 URL: https://issues.apache.org/jira/browse/IOTDB-3395 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (IOTDB-3359) Refactor the serialization interface for the consensus layer to avoid hard-coding size ByteBuffers
[ https://issues.apache.org/jira/browse/IOTDB-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-3359: Assignee: Xinyu Tan > Refactor the serialization interface for the consensus layer to avoid > hard-coding size ByteBuffers > -- > > Key: IOTDB-3359 > URL: https://issues.apache.org/jira/browse/IOTDB-3359 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Xinyu Tan >Assignee: Xinyu Tan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IOTDB-3386) Avoid the double-write problem of raftlog and write-ahead log at the Datanode consensus layer
Xinyu Tan created IOTDB-3386: Summary: Avoid the double-write problem of raftlog and write-ahead log at the Datanode consensus layer Key: IOTDB-3386 URL: https://issues.apache.org/jira/browse/IOTDB-3386 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan see [doc|https://apache-iotdb.feishu.cn/docs/doccnuowRHp8qgyDOBFdSfsxUw1]
[jira] [Assigned] (IOTDB-3385) Reduce the serialization size for the Datanode consensus layer
[ https://issues.apache.org/jira/browse/IOTDB-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-3385: Assignee: Xinyu Tan > Reduce the serialization size for the Datanode consensus layer > -- > > Key: IOTDB-3385 > URL: https://issues.apache.org/jira/browse/IOTDB-3385 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Xinyu Tan >Assignee: Xinyu Tan >Priority: Major > Attachments: image-2022-06-02-17-59-20-779.png > > > The DataNode currently uses FI to pass changes to the consensus layer, but > its serialized form contains many unnecessary parts, such as replication > group endpoints, which makes it write much more data than the WAL or MLOG > and hurts performance. We need to think about reducing its size. > !image-2022-06-02-17-59-20-779.png!
[jira] [Created] (IOTDB-3385) Reduce the serialization size for the Datanode consensus layer
Xinyu Tan created IOTDB-3385: Summary: Reduce the serialization size for the Datanode consensus layer Key: IOTDB-3385 URL: https://issues.apache.org/jira/browse/IOTDB-3385 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Attachments: image-2022-06-02-17-59-20-779.png The DataNode currently uses FI to pass changes to the consensus layer, but its serialized form contains many unnecessary parts, such as replication group endpoints, which makes it write much more data than the WAL or MLOG and hurts performance. We need to think about reducing its size. !image-2022-06-02-17-59-20-779.png!
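The size concern in IOTDB-3385 can be illustrated with a minimal Java sketch. This is not IoTDB's actual serialization code: the class name, the field layout, and the sample endpoint strings are all hypothetical. It only compares the bytes written when every request carries its replication-group endpoints against the bytes written for the payload alone.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not IoTDB's real wire format.
class SerializationSizeSketch {

  // Serializes the payload together with per-request endpoint metadata.
  static byte[] withEndpoints(byte[] payload, List<String> endpoints) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.writeInt(endpoints.size());
      for (String endpoint : endpoints) {
        byte[] bytes = endpoint.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length); // length prefix for each endpoint string
        out.write(bytes);
      }
      out.writeInt(payload.length);
      out.write(payload);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  // Serializes only the length-prefixed payload.
  static byte[] payloadOnly(byte[] payload) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.writeInt(payload.length);
      out.write(payload);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  public static void main(String[] args) {
    byte[] payload = new byte[32]; // stand-in for one small write request
    // Made-up endpoints for a three-replica group.
    List<String> endpoints =
        Arrays.asList("127.0.0.1:40010", "127.0.0.1:40011", "127.0.0.1:40012");
    System.out.println("with endpoints: " + withEndpoints(payload, endpoints).length);
    System.out.println("payload only:   " + payloadOnly(payload).length);
  }
}
```

For small writes the fixed metadata can outweigh the payload itself, which is the kind of overhead the issue proposes to remove.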
[jira] [Assigned] (IOTDB-3382) Adjust the default preAllocateSize of the ratisConsensus RaftLog
[ https://issues.apache.org/jira/browse/IOTDB-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-3382: Assignee: Xinyu Tan > Adjust the default preAllocateSize of the ratisConsensus RaftLog > > > Key: IOTDB-3382 > URL: https://issues.apache.org/jira/browse/IOTDB-3382 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Xinyu Tan >Assignee: Xinyu Tan >Priority: Major > Attachments: image-2022-06-02-16-36-35-968.png > > > need some theoretical analysis, maybe some testing > !image-2022-06-02-16-36-35-968.png!
[jira] [Created] (IOTDB-3382) Adjust the default preAllocateSize of the ratisConsensus RaftLog
Xinyu Tan created IOTDB-3382: Summary: Adjust the default preAllocateSize of the ratisConsensus RaftLog Key: IOTDB-3382 URL: https://issues.apache.org/jira/browse/IOTDB-3382 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Attachments: image-2022-06-02-16-36-35-968.png need some theoretical analysis, maybe some testing !image-2022-06-02-16-36-35-968.png!
[jira] [Created] (IOTDB-3359) Refactor the serialization interface for the consensus layer to avoid hard-coding size ByteBuffers
Xinyu Tan created IOTDB-3359: Summary: Refactor the serialization interface for the consensus layer to avoid hard-coding size ByteBuffers Key: IOTDB-3359 URL: https://issues.apache.org/jira/browse/IOTDB-3359 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan
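As a rough illustration of the problem named in the IOTDB-3359 summary, the Java sketch below contrasts a ByteBuffer allocated with a hard-coded capacity against a growable stream whose final size comes from what was actually written. The class and method names are hypothetical and are not taken from the IoTDB code base; the issue does not say which approach the refactoring adopted.

```java
import java.io.ByteArrayOutputStream;
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Hypothetical illustration; names are not from IoTDB.
class GrowableBufferSketch {

  // Hard-coded capacity: the write fails once the data outgrows the buffer.
  static boolean fitsInFixedBuffer(byte[] data, int hardCodedCapacity) {
    ByteBuffer buffer = ByteBuffer.allocate(hardCodedCapacity);
    try {
      buffer.put(data);
      return true;
    } catch (BufferOverflowException e) {
      return false;
    }
  }

  // Growable stream: the resulting ByteBuffer is sized from the written bytes.
  static ByteBuffer serializeGrowable(byte[] data) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(data, 0, data.length);
    return ByteBuffer.wrap(out.toByteArray());
  }

  public static void main(String[] args) {
    byte[] large = new byte[4096];
    System.out.println("fits in 1024-byte buffer: " + fitsInFixedBuffer(large, 1024));
    System.out.println("growable size: " + serializeGrowable(large).remaining());
  }
}
```

The trade-off in this sketch is an extra copy in `toByteArray()` in exchange for never having to guess a capacity up front.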
[jira] [Reopened] (IOTDB-3195) Added a configuration interface for the consensus layer
[ https://issues.apache.org/jira/browse/IOTDB-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reopened IOTDB-3195: -- > Added a configuration interface for the consensus layer > --- > > Key: IOTDB-3195 > URL: https://issues.apache.org/jira/browse/IOTDB-3195 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Xinyu Tan >Assignee: Xinyu Tan >Priority: Major >
[jira] [Assigned] (IOTDB-3240) [ErrorMSG]java.lang.IllegalStateException: Client has an error!Caused by: java.net.ConnectException: Connection refused
[ https://issues.apache.org/jira/browse/IOTDB-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Tan reassigned IOTDB-3240: Assignee: Xinyu Tan (was: Quan Siyi) > [ErrorMSG]java.lang.IllegalStateException: Client has an error!Caused by: > java.net.ConnectException: Connection refused > --- > > Key: IOTDB-3240 > URL: https://issues.apache.org/jira/browse/IOTDB-3240 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: FengQingxin >Assignee: Xinyu Tan >Priority: Major > Attachments: image-2022-05-19-20-22-13-041.png, > image-2022-05-19-20-24-38-004.png > > > [ErrorMSG] When starting a 3C3D cluster with the default config file, there > is an error in the log of the leader ConfigNode. > Steps to reproduce: > 1. Make three copies of the compiled distribution (default configuration). > 2. In the sbin directory under the confignode folder, start three ConfigNodes with start-confignode.sh (works normally). > 3. In the sbin directory under the datanode folder, start three DataNodes with start-datanode.sh (the DataNode logs are normal; the leader ConfigNode logs the error below). > java.lang.IllegalStateException: Client has an error! 
> at > org.apache.thrift.async.TAsyncClient.checkReady(TAsyncClient.java:83) > at > org.apache.iotdb.commons.client.async.AsyncDataNodeInternalServiceClient.isReady(AsyncDataNodeInternalServiceClient.java:109) > at > org.apache.iotdb.commons.client.async.AsyncDataNodeInternalServiceClient$Factory.validateObject(AsyncDataNodeInternalServiceClient.java:154) > at > org.apache.iotdb.commons.client.async.AsyncDataNodeInternalServiceClient$Factory.validateObject(AsyncDataNodeInternalServiceClient.java:122) > at > org.apache.commons.pool2.impl.GenericKeyedObjectPool.returnObject(GenericKeyedObjectPool.java:1470) > at > org.apache.iotdb.commons.client.ClientManager.returnClient(ClientManager.java:70) > at > org.apache.iotdb.commons.client.async.AsyncDataNodeInternalServiceClient.returnSelf(AsyncDataNodeInternalServiceClient.java:83) > at > org.apache.iotdb.commons.client.async.AsyncDataNodeInternalServiceClient.onError(AsyncDataNodeInternalServiceClient.java:104) > at > org.apache.thrift.async.TAsyncMethodCall.onError(TAsyncMethodCall.java:215) > at > org.apache.thrift.async.TAsyncMethodCall.transition(TAsyncMethodCall.java:210) > at > org.apache.thrift.async.TAsyncClientManager$SelectThread.transitionMethods(TAsyncClientManager.java:143) > at > org.apache.thrift.async.TAsyncClientManager$SelectThread.run(TAsyncClientManager.java:113) > Caused by: java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) > at > org.apache.thrift.transport.TNonblockingSocket.finishConnect(TNonblockingSocket.java:217) > at > org.apache.thrift.async.TAsyncMethodCall.doConnecting(TAsyncMethodCall.java:279) > at > org.apache.thrift.async.TAsyncMethodCall.transition(TAsyncMethodCall.java:189) > ... 2 common frames omitted > > !image-2022-05-19-20-22-13-041.png! > > > Expected: no error messages; the cluster starts normally.
[jira] [Created] (IOTDB-3195) Added a configuration interface for the consensus layer
Xinyu Tan created IOTDB-3195: Summary: Added a configuration interface for the consensus layer Key: IOTDB-3195 URL: https://issues.apache.org/jira/browse/IOTDB-3195 Project: Apache IoTDB Issue Type: Improvement Reporter: Xinyu Tan Assignee: Xinyu Tan