[jira] [Created] (IOTDB-6266) Add the ability to flush syncIndex and update reader periodically for IoTConsensus

2023-12-11 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-6266:


 Summary: Add the ability to flush syncIndex and update reader 
periodically for IoTConsensus
 Key: IOTDB-6266
 URL: https://issues.apache.org/jira/browse/IOTDB-6266
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan
 Attachments: image-2023-12-11-17-43-20-151.png

After the PR is merged, the safeDeletedSearchIndex that IoTConsensus passes to 
the WAL reader is no longer the syncIndex that has been synchronized, but the 
syncIndex that has been flushed to disk.

When a leader migration is triggered, the WAL of the old leader may never be 
deleted, so its WAL files pile up.

!image-2023-12-11-17-43-20-151.png!

To solve this problem, IoTConsensus can flush the syncIndex and update the 
reader periodically, which avoids the accumulation of the old leader's log 
after a leader switch.
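
The periodic flush proposed above could look roughly like the following sketch. 
It is only an illustration under assumptions: PeriodicSyncIndexFlusher, 
updateSyncIndex, flushOnce and the LongConsumer standing in for the WAL reader 
are hypothetical names, not IoTDB classes, and the "flush" is an in-memory 
assignment where a real implementation would write to disk.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

/** Hypothetical helper; none of these names come from the IoTDB code base. */
final class PeriodicSyncIndexFlusher implements AutoCloseable {
  // Highest index known to be synchronized (kept in memory only).
  private final AtomicLong syncIndex = new AtomicLong();
  // Last index that has been "flushed"; a real implementation would persist it.
  private final AtomicLong flushedIndex = new AtomicLong();
  // Stand-in for the WAL reader: receives the index that is safe to delete up to.
  private final LongConsumer walReader;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  PeriodicSyncIndexFlusher(LongConsumer walReader) {
    this.walReader = walReader;
  }

  /** Starts the background task; it keeps running even if no new log is sent. */
  void start(long period, TimeUnit unit) {
    scheduler.scheduleWithFixedDelay(this::flushOnce, period, period, unit);
  }

  /** Records progress; the index never moves backwards. */
  void updateSyncIndex(long index) {
    syncIndex.accumulateAndGet(index, Math::max);
  }

  /** One flush round: persist the index if it advanced, then tell the reader. */
  void flushOnce() {
    long current = syncIndex.get();
    if (current > flushedIndex.get()) {
      flushedIndex.set(current); // assumption: replaced by a disk write in practice
      walReader.accept(current);
    }
  }

  long flushedIndex() {
    return flushedIndex.get();
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

Because the task is driven by a timer rather than by outgoing log batches, an 
old leader that no longer sends anything would still get its last syncIndex 
flushed and handed to the reader.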



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IOTDB-6190) Increase the threshold for Ratis to shut itself down if it detects that a process is stuck

2023-10-13 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-6190:


Assignee: Xinyu Tan

> Increase the threshold for Ratis to shut itself down if it detects that a 
> process is stuck
> --
>
> Key: IOTDB-6190
> URL: https://issues.apache.org/jira/browse/IOTDB-6190
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Xinyu Tan
>Assignee: Xinyu Tan
>Priority: Major
>
> Currently, Ratis shuts itself down after detecting a 10-minute GC-free pause 
> in the process. However, in many user scenarios it is common to pause VMs for 
> hours at a time, which forces users to restart frequently after they detect 
> a Ratis shutdown, so we plan to adjust this parameter to 3 days.





[jira] [Created] (IOTDB-6190) Increase the threshold for Ratis to shut itself down if it detects that a process is stuck

2023-10-13 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-6190:


 Summary: Increase the threshold for Ratis to shut itself down if 
it detects that a process is stuck
 Key: IOTDB-6190
 URL: https://issues.apache.org/jira/browse/IOTDB-6190
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan


Currently, Ratis shuts itself down after detecting a 10-minute GC-free pause in 
the process. However, in many user scenarios it is common to pause VMs for 
hours at a time, which forces users to restart frequently after they detect a 
Ratis shutdown, so we plan to adjust this parameter to 3 days.
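
To illustrate why the threshold matters, here is a small sketch of a pause 
check. It assumes the monitor measures how much longer than requested a 
sleeping thread was paused and shuts down when that extra time exceeds a 
threshold. PauseThresholds and its members are hypothetical names, not the 
Ratis API.

```java
import java.time.Duration;

/** Hypothetical stand-in for a pause monitor; not Ratis's actual class or API. */
final class PauseThresholds {
  enum Action { NONE, SHUTDOWN }

  private final Duration closeThreshold;

  PauseThresholds(Duration closeThreshold) {
    this.closeThreshold = closeThreshold;
  }

  /**
   * Decides what to do after the monitor thread wakes up.
   * The extra pause is the time slept beyond what was requested.
   */
  Action onObservedPause(Duration requestedSleep, Duration actualSleep) {
    Duration extra = actualSleep.minus(requestedSleep);
    return extra.compareTo(closeThreshold) > 0 ? Action.SHUTDOWN : Action.NONE;
  }
}
```

With a 10-minute threshold, a VM that was suspended for two hours triggers 
SHUTDOWN as soon as it resumes; with a 3-day threshold the same pause is 
tolerated.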





[jira] [Created] (IOTDB-6183) Optimize the timeout retry logic of IoTConsensus sending RPCS

2023-10-10 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-6183:


 Summary: Optimize the timeout retry logic of IoTConsensus sending 
RPCS
 Key: IOTDB-6183
 URL: https://issues.apache.org/jira/browse/IOTDB-6183
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


We should never let the RPC time out, because the handling of a timeout is 
also to retry, which can make the situation worse, for example by 
significantly increasing the number of file handles.





[jira] [Created] (IOTDB-6156) Fixed TConfiguration invalidly in Thrift AsyncServer For IoTConsensus

2023-09-14 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-6156:


 Summary: Fixed TConfiguration invalidly in Thrift AsyncServer For 
IoTConsensus
 Key: IOTDB-6156
 URL: https://issues.apache.org/jira/browse/IOTDB-6156
 Project: Apache IoTDB
  Issue Type: Bug
Reporter: Xinyu Tan
Assignee: Xinyu Tan


In a user scenario, the setup is as follows:
3C3D cluster (3 ConfigNodes, 3 DataNodes), 3 replicas
1 database
1 device
2 measurements
1 client
insertAlignTablet interface
batchSize 1000
time_partition_interval=314496000

An IoTConsensus data synchronization error occurs after writing to the cluster.

{code:java}
2023-09-14 12:11:50,888 [pool-19-IoTDB-IoTConsensusRPC-Processor-5] WARN  o.a.t.s.AbstractNonblockingServer$AsyncFrameBuffer:606 - Exception while invoking! 
org.apache.thrift.transport.TTransportException: MaxMessageSize reached
 at org.apache.thrift.transport.TEndpointTransport.countConsumedMessageBytes(TEndpointTransport.java:96)
 at org.apache.thrift.transport.TMemoryInputTransport.read(TMemoryInputTransport.java:97)
 at org.apache.thrift.transport.TTransport.readAll(TTransport.java:109)
 at org.apache.iotdb.rpc.AutoScalingBufferReadTransport.fill(AutoScalingBufferReadTransport.java:38)
 at org.apache.iotdb.rpc.TElasticFramedTransport.readFrame(TElasticFramedTransport.java:128)
 at org.apache.iotdb.rpc.TElasticFramedTransport.read(TElasticFramedTransport.java:108)
 at org.apache.thrift.transport.TTransport.readAll(TTransport.java:109)
 at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:463)
 at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:361)
 at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:244)
 at org.apache.thrift.TBaseAsyncProcessor.process(TBaseAsyncProcessor.java:52)
 at org.apache.thrift.server.AbstractNonblockingServer$AsyncFrameBuffer.invoke(AbstractNonblockingServer.java:603)
 at org.apache.thrift.server.Invocation.run(Invocation.java:18)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
{code}


This is mainly because IoTConsensus uses the Thrift AsyncServer. At present, 
its default maximum message size is 100M instead of 512M, so it needs to be 
updated.
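
The "MaxMessageSize reached" failure in the stack trace can be pictured with a 
byte budget that is decremented as a message is read. The sketch below uses 
only the JDK and is not Thrift's implementation. MessageBudget is a 
hypothetical name, "M" is assumed to mean MiB, and the 200 MiB message size in 
the usage note is made up for illustration.

```java
/** Hypothetical model of a per-message byte budget; not Thrift's own class. */
final class MessageBudget {
  static final long MIB = 1024L * 1024L;

  private long remaining;

  MessageBudget(long maxMessageSize) {
    this.remaining = maxMessageSize;
  }

  /** Consumes bytes from the budget and fails once the limit is exceeded. */
  void consume(long numBytes) {
    if (numBytes > remaining) {
      remaining = 0;
      throw new IllegalStateException("MaxMessageSize reached");
    }
    remaining -= numBytes;
  }

  long remaining() {
    return remaining;
  }
}
```

Under these assumptions, reading a 200 MiB message fails with a 100 MiB budget 
and succeeds with a 512 MiB budget.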





[jira] [Created] (IOTDB-6144) Adjust the default thrift timeout parameter to 60s

2023-09-07 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-6144:


 Summary: Adjust the default thrift timeout parameter to 60s
 Key: IOTDB-6144
 URL: https://issues.apache.org/jira/browse/IOTDB-6144
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan


After conducting research on systems such as HBase, Doris, TiDB, and others, we 
have found that the default RPC timeout for many systems is set to 60 seconds. 
Therefore, we plan to adjust the default timeout for IoTDB.





[jira] [Assigned] (IOTDB-6144) Adjust the default thrift timeout parameter to 60s

2023-09-07 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-6144:


Assignee: Xinyu Tan

> Adjust the default thrift timeout parameter to 60s
> --
>
> Key: IOTDB-6144
> URL: https://issues.apache.org/jira/browse/IOTDB-6144
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Xinyu Tan
>Assignee: Xinyu Tan
>Priority: Major
>
> After conducting research on systems such as HBase, Doris, TiDB, and others, 
> we have found that the default RPC timeout for many systems is set to 60 
> seconds. Therefore, we plan to adjust the default timeout for IoTDB.





[jira] [Commented] (IOTDB-6099) Increase the printing threshold when ratis follower sleep detects gc

2023-08-30 Thread Xinyu Tan (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760273#comment-17760273
 ] 

Xinyu Tan commented on IOTDB-6099:
--

This has been fixed by [https://github.com/apache/iotdb/pull/10996].

> Increase the printing threshold when ratis follower sleep detects gc
> 
>
> Key: IOTDB-6099
> URL: https://issues.apache.org/jira/browse/IOTDB-6099
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Xinyu Tan
>Assignee: Xinyu Tan
>Priority: Major
> Attachments: image-2023-08-04-15-19-08-623.png
>
>
> !image-2023-08-04-15-19-08-623.png!
> Currently, Ratis followers print a log whenever they detect a GC pause longer 
> than 300ms, which floods the online system with useless logs. We will 
> increase the threshold to 4s.





[jira] [Created] (IOTDB-6121) Consensus layer interface and exception handling refactoring

2023-08-17 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-6121:


 Summary: Consensus layer interface and exception handling 
refactoring
 Key: IOTDB-6121
 URL: https://issues.apache.org/jira/browse/IOTDB-6121
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


1. Rename the interface of the consensus layer to reduce ambiguity
2. Refactor the consensus interface to throw exceptions, forcing the upper 
layer to handle the exception type
3. Improve the code comments of the consensus layer
4. Refactor the DataNode consensus singleton





[jira] [Created] (IOTDB-6116) Disassociate the IoTConsensus retry logic from the forkjoinPool

2023-08-16 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-6116:


 Summary: Disassociate the IoTConsensus retry logic from the 
forkjoinPool
 Key: IOTDB-6116
 URL: https://issues.apache.org/jira/browse/IOTDB-6116
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


The current IoTConsensus batch retry logic runs on the ForkJoinPool and puts 
the thread to sleep synchronously. This may lead to frequent timeouts on the 
follower, logged as "waiting target request timeout. current index", so we use 
a ScheduledExecutorService to decouple the retry logic from the ForkJoinPool.
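
A minimal sketch of the idea: instead of sleeping inside a pooled worker 
thread between attempts, each retry is handed to a ScheduledExecutorService 
with a delay, so no thread is blocked while waiting. ScheduledRetry and its 
methods are hypothetical names for illustration, not the IoTConsensus 
implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical retry helper; names are illustrative, not IoTConsensus code. */
final class ScheduledRetry {
  private final ScheduledExecutorService scheduler;

  ScheduledRetry(ScheduledExecutorService scheduler) {
    this.scheduler = scheduler;
  }

  /**
   * Runs the attempt until it returns true or maxAttempts is reached.
   * Completes with the number of attempts used, or -1 if all attempts failed.
   */
  CompletableFuture<Integer> run(BooleanSupplier attempt, int maxAttempts, long delayMillis) {
    CompletableFuture<Integer> result = new CompletableFuture<>();
    tryOnce(attempt, 1, maxAttempts, delayMillis, result);
    return result;
  }

  private void tryOnce(
      BooleanSupplier attempt, int n, int max, long delayMillis,
      CompletableFuture<Integer> result) {
    try {
      if (attempt.getAsBoolean()) {
        result.complete(n);
      } else if (n >= max) {
        result.complete(-1);
      } else {
        // No Thread.sleep here: the next attempt is scheduled and this thread is freed.
        scheduler.schedule(
            () -> tryOnce(attempt, n + 1, max, delayMillis, result),
            delayMillis, TimeUnit.MILLISECONDS);
      }
    } catch (RuntimeException e) {
      result.completeExceptionally(e);
    }
  }
}
```

The first attempt runs on the caller's thread. Only the delayed retries run on 
the scheduler, which can be a small dedicated pool.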





[jira] [Created] (IOTDB-6106) Fixed the timeout parameter not working in thrift asyncClient

2023-08-09 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-6106:


 Summary: Fixed the timeout parameter not working in thrift 
asyncClient
 Key: IOTDB-6106
 URL: https://issues.apache.org/jira/browse/IOTDB-6106
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan
 Attachments: image-2023-08-09-21-13-05-225.png, 
image-2023-08-09-21-13-37-644.png, image-2023-08-09-21-15-02-347.png

!image-2023-08-09-21-13-05-225.png! 
!image-2023-08-09-21-13-37-644.png! 
Currently, the async Thrift timeout is set through the 
[TNonblockingSocket|https://people.apache.org/~thejas/thrift-0.9/javadoc/org/apache/thrift/transport/TNonblockingSocket.html]
 constructor, but that timeout parameter is not used. This completely 
disables our async client timeout parameter.

!image-2023-08-09-21-15-02-347.png!

We restore timeout control by setting the timeout on TAsyncClient instead.





[jira] [Commented] (IOTDB-5557) [ metadata ] The metadata query results are inconsistent

2023-08-07 Thread Xinyu Tan (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751560#comment-17751560
 ] 

Xinyu Tan commented on IOTDB-5557:
--

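The cause of the only exception log in the first test is analyzed in 
https://issues.apache.org/jira/browse/IOTDB-6102 and is unrelated to this PR. 
The problem addressed by the current PR has been resolved.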

> [ metadata ] The metadata query results are inconsistent
> 
>
> Key: IOTDB-5557
> URL: https://issues.apache.org/jira/browse/IOTDB-5557
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: Core/Schema Manager, mpp-cluster
>Affects Versions: 1.1.0-SNAPSHOT
>Reporter: 刘珍
>Assignee: Song Ziyang
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: IOTDB_5557.conf, image-2023-02-20-14-04-32-611.png, 
> image-2023-07-29-08-21-43-740.png, screenshot-1.png
>
>
> master : 0219_0cd4461
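> After starting the cluster, once "enjoy" appears in log_datanode_all.log, 
> metadata queries return inconsistent results (the results grow dynamically 
> until all metadata has been loaded into memory).
> Expected: once the cluster has started serving queries, query results must 
> be consistent.
> Test environment:
> 1. 192.168.10.76, 48 CPUs, 384 GB memory
> Metadata: 1 database, 10,000 devices, 600 series per device.
> ConfigNode:
> MAX_HEAP_SIZE="8G"
> DataNode:
> MAX_HEAP_SIZE="256G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> COMMON configuration:
> time_partition_interval=6048000
> query_timeout_threshold=3600
> enable_seq_space_compaction=false
> enable_unseq_space_compaction=false
> enable_cross_space_compaction=false
> 2. Clear the operating system cache, start the database, and once "enjoy" 
> appears, run count devices to check the result.
> cat check_device_count.sh 
> while true
> do
>   v_start=`grep enjoy logs/log_datanode_all.log|wc -l`
>   if [[ ${v_start} = "1" ]];then
>     for i in {1..100}
>     do
>       ./sbin/start-cli.sh -h 192.168.10.76 -e "count devices;" >> dev_count_during_start.out
>     done
>     break
>   fi
> done
> The figure below shows that the count devices result keeps increasing until 
> it reaches the full device count, when everything is loaded into memory: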
>  !image-2023-02-20-14-04-32-611.png! 





[jira] [Created] (IOTDB-6099) Increase the printing threshold when ratis follower sleep detects gc

2023-08-04 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-6099:


 Summary: Increase the printing threshold when ratis follower sleep 
detects gc
 Key: IOTDB-6099
 URL: https://issues.apache.org/jira/browse/IOTDB-6099
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan
 Attachments: image-2023-08-04-15-19-08-623.png

!image-2023-08-04-15-19-08-623.png!
Currently, Ratis followers print a log whenever they detect a GC pause longer 
than 300ms, which floods the online system with useless logs. We will 
increase the threshold to 5s.





[jira] [Assigned] (IOTDB-6012) Client receive The consensus group SchemaRegion[0] doesn't exist error

2023-07-26 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-6012:


Assignee: LiYuheng  (was: Song Ziyang)

> Client receive The consensus group SchemaRegion[0] doesn't exist error
> --
>
> Key: IOTDB-6012
> URL: https://issues.apache.org/jira/browse/IOTDB-6012
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: Core/Cluster
>Reporter: Yuan Tian
>Assignee: LiYuheng
>Priority: Major
> Attachments: SessionExample.java, image-2023-06-20-09-41-11-877.png
>
>
> Use GraalVM CE 22.3.1 as your JDK, then run mvn clean package to build your 
> distribution package.
> Execute the SessionExample in the attachment and you will get the error 
> message:
> !image-2023-06-20-09-41-11-877.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IOTDB-5860) Total Number of file is wrong

2023-07-24 Thread Xinyu Tan (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746276#comment-17746276
 ] 

Xinyu Tan commented on IOTDB-5860:
--

 !screenshot-1.png! 
 This has been fixed in rc/1.2.0.

> Total Number of file is wrong
> -
>
> Key: IOTDB-5860
> URL: https://issues.apache.org/jira/browse/IOTDB-5860
> Project: Apache IoTDB
>  Issue Type: Bug
>Reporter: Yuan Tian
>Assignee: Hongyin Zhang
>Priority: Major
> Attachments: image-2023-05-10-17-41-08-561.png, screenshot-1.png
>
>
> The total should not include open_file_handlers; more precisely, the 
> open_file_handlers metric should not be in this chart.
> !image-2023-05-10-17-41-08-561.png!





[jira] [Created] (IOTDB-6061) Fix the instability failure caused by initServer in IoTConsensus UT not binding to the corresponding port

2023-07-12 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-6061:


 Summary: Fix the instability failure caused by initServer in 
IoTConsensus UT not binding to the corresponding port
 Key: IOTDB-6061
 URL: https://issues.apache.org/jira/browse/IOTDB-6061
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


 !image-2023-07-12-16-30-40-880.png|thumbnail! 

Add logic that waits until the port is available before executing the test.
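
One way to wait for a port in a test is to poll until it can be bound. The 
sketch below takes "available" to mean that a ServerSocket can bind the port, 
which is an assumption about what the fix waits for. PortWaiter and its 
methods are hypothetical names, not the code added to the IoTConsensus UT.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical test helper; not the actual IoTConsensus UT code. */
final class PortWaiter {
  /** Polls until the port can be bound; returns false if the timeout expires first. */
  static boolean waitUntilFree(int port, long timeoutMillis, long pollMillis)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (canBind(port)) {
        return true;
      }
      if (System.nanoTime() >= deadline) {
        return false;
      }
      Thread.sleep(pollMillis);
    }
  }

  /** Tries to bind the port and releases it immediately. */
  static boolean canBind(int port) {
    try (ServerSocket ignored = new ServerSocket(port)) {
      return true;
    } catch (IOException e) {
      return false;
    }
  }
}
```

A test would call waitUntilFree before starting its server and fail fast with 
a clear message if the port never becomes free.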





[jira] [Created] (IOTDB-6051) Fixed concurrency error in IoTConsensus UT when stopping cluster

2023-07-05 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-6051:


 Summary: Fixed concurrency error in IoTConsensus UT when stopping 
cluster
 Key: IOTDB-6051
 URL: https://issues.apache.org/jira/browse/IOTDB-6051
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
 Attachments: image-2023-07-05-20-18-03-912.png, screenshot-1.png

 !image-2023-07-05-20-18-03-912.png|thumbnail! 





[jira] [Assigned] (IOTDB-6051) Fixed concurrency error in IoTConsensus UT when stopping cluster

2023-07-05 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-6051:


Assignee: Xinyu Tan

> Fixed concurrency error in IoTConsensus UT when stopping cluster
> 
>
> Key: IOTDB-6051
> URL: https://issues.apache.org/jira/browse/IOTDB-6051
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Xinyu Tan
>Assignee: Xinyu Tan
>Priority: Major
> Attachments: image-2023-07-05-20-18-03-912.png, screenshot-1.png
>
>
>  !image-2023-07-05-20-18-03-912.png|thumbnail! 





[jira] [Created] (IOTDB-6022) The WAL piles up when multi-replica iotconsensus is written at high concurrency

2023-06-21 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-6022:


 Summary: The WAL piles up when multi-replica iotconsensus is 
written at high concurrency
 Key: IOTDB-6022
 URL: https://issues.apache.org/jira/browse/IOTDB-6022
 Project: Apache IoTDB
  Issue Type: Bug
Reporter: Xinyu Tan
 Attachments: image-2023-06-21-22-48-38-945.png

 !image-2023-06-21-22-48-38-945.png|thumbnail! 





[jira] [Assigned] (IOTDB-5931) The "show cluster" command displays nodes with "Unknown" status, but these nodes can still perform read and write operations normally.

2023-06-07 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5931:


Assignee: huxiangpeng  (was: Xinyu Tan)

> The "show cluster" command displays nodes with "Unknown" status, but these 
> nodes can still perform read and write operations normally.
> --
>
> Key: IOTDB-5931
> URL: https://issues.apache.org/jira/browse/IOTDB-5931
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: Core/Cluster, mpp-cluster
>Reporter: 刘珍
>Assignee: huxiangpeng
>Priority: Major
> Attachments: exp.out, image-2023-05-29-15-30-45-953.png, 
> image-2023-05-29-15-31-02-797.png, ip23_cn_logs.tar.gz, ip24_cn_logs.tar.gz, 
> ip25_cn_logs.tar.gz, load_insert_drop_db_1.sh, run_2_client.sh
>
>
> Test version: iotdb master 0524_12d67e0
> Problem 1:
> On a 3-replica 3C21D cluster, run load tsfile; delete all data; show cluster 
> in a loop for a long time. The 21 DataNodes are displayed with status 
> Unknown, but clients can still read and write normally.
>  !image-2023-05-29-15-30-45-953.png! 
> Problem 2: show cluster returns different results on different DataNodes.
>  !image-2023-05-29-15-31-02-797.png! 
> Test environment: private cloud phase 1, 172.16.2.2 - 25
> 1. Configuration parameters
> COMMON configuration:
> schema_region_group_extension_policy=CUSTOM
> default_schema_region_group_num_per_database=10
> data_region_group_extension_policy=CUSTOM
> default_data_region_group_num_per_database=42
> min_cross_compaction_unseq_file_level=0
> schema_replication_factor=3
> data_replication_factor=3
> default_storage_group_level=2
> compaction_write_throughput_mb_per_sec=64
> ConfigNode:
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> cn_target_config_node_list=172.16.2.23:10710
> DataNode:
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> dn_target_config_node_list=172.16.2.23:10710,172.16.2.24:10710,172.16.2.25:10710
> 2. The client test script is on 172.16.2.2 under 
> /data1/iotdb/i_m_0524_12d67e0:
> cat load_insert_drop_db_1.sh
> v_host="172.16.2.2"
> cluster_dir="/data1/iotdb"
> db_commit="i_m_0524_12d67e0"
> db_dir="${cluster_dir}/${db_commit}"
> u_name="root"
> ${db_dir}/sbin/start-cli.sh -h ${v_host}  -e "delete from  root.test.g_0.**;"
> ${db_dir}/sbin/start-cli.sh  -h ${v_host} -e 'load "/data/iotdb/load_tsfile/load_tsfile_level_1/"  verify=false sglevel=2 onSuccess=none'
> ${db_dir}/sbin/start-cli.sh  -h ${v_host} -e 'load "/data/iotdb/load_tsfile/load_tsfile_1"  verify=false sglevel=2 onSuccess=none'
> ${db_dir}/sbin/start-cli.sh  -h ${v_host} -e 'load "/data/iotdb/load_tsfile/load_tsfile_2"  verify=false sglevel=2 onSuccess=none'
> ${db_dir}/sbin/start-cli.sh -h ${v_host}  -e "flush"
> ${db_dir}/sbin/start-cli.sh -h ${v_host}  -e "select count(s_0) from root.test.g_0.** align by device;" >act.out
> v_diff=`diff ${db_dir}/exp.out ${db_dir}/act.out|grep root|wc -l`
> if [[ ${v_diff} = 0 ]];then
>   echo "query pass." >> query_res.out
> else
>   v_date=`date "+%Y-%m-%d_%H_%M_%S"`
>   echo "${v_date} query fail." >> aft_load_query_res.out
> fi
> exec 3<./dn.txt
> while read node <&3
> do
>   v_comp=`ssh ${u_name}@${node} "find ${db_dir}/data/ -name *compaction.log|wc -l"`
>   if [[ ${v_comp} -gt 0 ]];then
>     sleep 2
>     ${db_dir}/sbin/start-cli.sh -h ${v_host}  -e "delete from  root.test.g_0.**;"
>     ${db_dir}/sbin/start-cli.sh -h ${v_host}  -e "select count(s_0) from  root.test.g_0.** having count(s_0)>0;" >> del_data_q.out
>     break
>   fi
> done
> sleep 10
> for i in {1..3}
> do
>   exec 3<./dn.txt
>   while read node <&3
>   do
>     v_comp=`ssh ${u_name}@${node} "find ${db_dir}/data/ -name *compaction.log|wc -l"`
>     if [[ ${v_comp} -gt 0 ]];then
>       echo "${node} after delete from root.test.g_0 still compacting." >> not_expect_res.out
>     fi
>   done
> done





[jira] [Assigned] (IOTDB-5864) Print failed to install snapshot warn log while restarting

2023-05-11 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5864:


Assignee: Song Ziyang  (was: Xinyu Tan)

> Print failed to install snapshot warn log while restarting
> --
>
> Key: IOTDB-5864
> URL: https://issues.apache.org/jira/browse/IOTDB-5864
> Project: Apache IoTDB
>  Issue Type: Bug
>Reporter: Yuan Tian
>Assignee: Song Ziyang
>Priority: Major
> Attachments: image-2023-05-11-15-28-57-190.png, 
> image-2023-05-11-15-34-54-296.png
>
>
> The write throughput is also very low all the time after restarting.
> !image-2023-05-11-15-28-57-190.png!
> !image-2023-05-11-15-34-54-296.png!





[jira] [Created] (IOTDB-5850) [CI stability] Write error because failed to get replicaSet of consensus group

2023-05-08 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5850:


 Summary: [CI stability] Write error because failed to get 
replicaSet of consensus group
 Key: IOTDB-5850
 URL: https://issues.apache.org/jira/browse/IOTDB-5850
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
 Attachments: 
IoTDBSortedShowTimeseriesIT_showTimeseriesOrderByHeatWithLimitTest[SchemaEngineMode=SchemaFile].zip,
 image-2023-05-08-19-29-44-282.png, image-2023-05-08-19-30-06-447.png, 
image-2023-05-08-19-30-27-561.png

 !image-2023-05-08-19-29-44-282.png|thumbnail! 
 !image-2023-05-08-19-30-06-447.png|thumbnail! 
 !image-2023-05-08-19-30-27-561.png|thumbnail! 





[jira] [Created] (IOTDB-5841) Modify IoTConsensus default parameters to improve performance in more scenarios

2023-05-06 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5841:


 Summary: Modify IoTConsensus default parameters to improve 
performance in more scenarios
 Key: IOTDB-5841
 URL: https://issues.apache.org/jira/browse/IOTDB-5841
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


* Add the pipelineNum metric.
* Change maxPendingBatch from 5 to 12 to improve replication performance. This 
change may create more IoTConsensusServiceThread instances in some cases, so 
we cannot make it too large.
* Change maxLogEntriesNumPerBatch from 30 to 1024. For small-request scenarios, 
this change increases the size of each batch and thereby the synchronization 
speed. It does not affect large requests, because the size of each batch is 
limited to no more than 16M.
* Change the queue from an unbounded LinkedBlockingQueue to an 
ArrayBlockingQueue so that a single queue cannot take up too much memory.
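
The difference between the two queue types in the last item can be seen in a 
few lines of JDK-only code. QueueBoundDemo is a hypothetical name, and the 
capacity of 1000 in the usage note is an arbitrary illustrative value, not the 
capacity chosen in IoTConsensus.

```java
import java.util.concurrent.BlockingQueue;

/** Hypothetical demo class; the capacity used with it is illustrative only. */
final class QueueBoundDemo {
  /** Offers count items without blocking and returns how many were accepted. */
  static int offerAll(BlockingQueue<Integer> queue, int count) {
    int accepted = 0;
    for (int i = 0; i < count; i++) {
      // offer() returns false instead of blocking when a bounded queue is full.
      if (queue.offer(i)) {
        accepted++;
      }
    }
    return accepted;
  }
}
```

Offering 5000 items to new ArrayBlockingQueue<Integer>(1000) accepts only 1000 
of them, while a default-constructed LinkedBlockingQueue<Integer> accepts all 
5000 and keeps growing with the backlog.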





[jira] [Created] (IOTDB-5840) Avoid the problem that the insertRecords interface may cause the number of threads to balloon when there are too many data regions

2023-05-05 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5840:


 Summary: Avoid the problem that the insertRecords interface may 
cause the number of threads to balloon when there are too many data regions
 Key: IOTDB-5840
 URL: https://issues.apache.org/jira/browse/IOTDB-5840
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


On a machine with sufficient CPU resources (for example, 32 cores), if the 
number of DataRegions is too small, the write pressure in the cluster is 
concentrated on the locks of these regions. As a result, write latency is 
high and throughput cannot be increased. When the number of DataRegions is 
large, an insertRecords request with a large batchSize such as 1 may involve 
many DataRegions. Once concurrency is high, it takes hundreds of 
internalServiceClient instances to dispatch the PlanNodes. Under the current 
BIO threading model, this also increases the number of InternalServiceRPC 
threads in the cluster to hundreds or thousands.

For example, in a user test environment, the coreSize of the clientManager is 
set to 600 and maxSize to 1000 to prevent concurrent write requests from 
blocking each other while obtaining an internalServiceClient. As a result, 
each node has nearly 1000 InternalServiceRPC threads. If the client increases 
concurrency further, a "connection reset by peer" error is reported. This 
error is probably caused by the default Linux kernel parameters not 
supporting so many connections.

The current MPP framework splits PlanNodes by region only. Therefore, the 
number of RPCs sent per write request depends on the number of DataRegions 
involved in the request rather than on the number of DataNodes.

The solution to this problem is to aggregate the RPC requests sent to the same 
DataNode. This reduces the pressure on the clientManager and the number of 
InternalServiceRPC threads, and it prevents the "connection reset by peer" 
error from being returned to the client.

After the optimization, the number of RPC service threads dropped from 1000 to 
200 and the "connection reset by peer" error disappeared. We can now also 
increase the number of regions to make full use of the cluster's CPU resources.
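
The aggregation step can be sketched as a simple grouping of per-region writes 
by their target DataNode, so that one RPC is sent per node instead of one per 
region. RpcAggregator and RegionWrite are hypothetical types for illustration, 
not the IoTDB dispatcher's classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of per-DataNode batching; not the IoTDB dispatcher. */
final class RpcAggregator {
  /** A write split for one region, plus the DataNode that should receive it. */
  static final class RegionWrite {
    final int regionId;
    final String dataNode;

    RegionWrite(int regionId, String dataNode) {
      this.regionId = regionId;
      this.dataNode = dataNode;
    }
  }

  /** Groups region-level writes so that each DataNode gets a single batch. */
  static Map<String, List<RegionWrite>> groupByDataNode(List<RegionWrite> writes) {
    Map<String, List<RegionWrite>> batches = new LinkedHashMap<>();
    for (RegionWrite write : writes) {
      batches.computeIfAbsent(write.dataNode, node -> new ArrayList<>()).add(write);
    }
    return batches;
  }
}
```

With six region writes spread over three DataNodes, the request needs three 
RPCs instead of six, and the gap widens as the number of regions per node 
grows.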





[jira] [Assigned] (IOTDB-5780) Let users know a node was successfully removed and data is recovered

2023-04-28 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5780:


Assignee: Xinyu Tan

> Let users know a node was successfully removed and data is recovered
> 
>
> Key: IOTDB-5780
> URL: https://issues.apache.org/jira/browse/IOTDB-5780
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Jialin Qiao
>Assignee: Xinyu Tan
>Priority: Minor
> Attachments: screenshot-1.png
>
>
> When a DataNode is removed, we copy its data asynchronously to a new node to 
> maintain the replication_factor. Users need to know when this asynchronous 
> job has finished, so we need to provide a way to inspect it.





[jira] [Commented] (IOTDB-5780) Let users know a node was successfully removed and data is recovered

2023-04-28 Thread Xinyu Tan (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17717593#comment-17717593
 ] 

Xinyu Tan commented on IOTDB-5780:
--

A procedure task is generated when the cluster is scaled in. After all regions 
have been migrated, the system attempts to stop the corresponding node. During 
this process, a log is printed so that users can track the data migration.
 !screenshot-1.png! 

In addition, you can run the show cluster command to view the status of the 
regions. We are still refactoring these status types, which is expected to 
take 1-2 months.

> Let users know a node was successfully removed and data is recovered
> 
>
> Key: IOTDB-5780
> URL: https://issues.apache.org/jira/browse/IOTDB-5780
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Jialin Qiao
>Assignee: Xinyu Tan
>Priority: Minor
> Attachments: screenshot-1.png
>
>
> When a DataNode is removed, we copy its data asynchronously to a new node to 
> maintain the replication_factor. Users need to know when this asynchronous 
> job has finished, so we need to provide a way to inspect it.





[jira] [Created] (IOTDB-5835) Fix wal accumulation caused by datanode restart

2023-04-27 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5835:


 Summary: Fix wal accumulation caused by datanode restart
 Key: IOTDB-5835
 URL: https://issues.apache.org/jira/browse/IOTDB-5835
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan
 Attachments: image-2023-04-28-11-08-43-542.png, 
image-2023-04-28-11-08-51-622.png, image-2023-04-28-11-08-57-549.png, 
image-2023-04-28-11-09-03-902.png

When the cluster is running properly and replica A of a consensus group is the 
Leader, it continuously sends logs to the followers and updates the WAL's 
safelyDeletedSearchIndex after sending them. WAL files are deleted 
asynchronously. Therefore, if a restart occurs, some logs that have already 
been synchronized to other nodes may not have been deleted yet. After the 
restart, another replica B may become the Leader, and replica A becomes a 
Follower that receives logs.
Because IoTConsensus currently does not use its recovered syncIndex to set the 
safelyDeletedSearchIndex of the underlying WALNode at startup, replica A 
cannot delete WAL files at this point, which results in the accumulation of 
WAL files. Write requests of all regions on the node are affected.
 !image-2023-04-28-11-08-43-542.png|thumbnail! 
 !image-2023-04-28-11-08-51-622.png|thumbnail! 
 !image-2023-04-28-11-08-57-549.png|thumbnail!  
!image-2023-04-28-11-09-03-902.png|thumbnail! 
The solution to this problem is to update the safelyDeletedSearchIndex of the 
reader at startup.
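
The effect of the fix can be modeled with a toy WAL that only deletes files 
whose entries are all at or below the safelyDeletedSearchIndex. This is a 
simplified model under assumptions: WalRetentionModel is a hypothetical class, 
the initial index of 0 after a restart is assumed for illustration, and the 
file boundaries and the recovered syncIndex of 250 in the usage note are made 
up.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of WAL retention; not IoTDB's WALNode. */
final class WalRetentionModel {
  // Assumption for this model: after a restart the index starts at 0,
  // so nothing is considered safe to delete until it is set again.
  private long safelyDeletedSearchIndex = 0;
  // Each entry is the largest search index stored in one WAL file.
  private final List<Long> fileMaxIndexes = new ArrayList<>();

  void addFile(long maxSearchIndex) {
    fileMaxIndexes.add(maxSearchIndex);
  }

  void setSafelyDeletedSearchIndex(long index) {
    safelyDeletedSearchIndex = index;
  }

  /** Deletes every file whose entries are all safe to delete; returns the count. */
  int deleteObsoleteFiles() {
    int before = fileMaxIndexes.size();
    fileMaxIndexes.removeIf(max -> max <= safelyDeletedSearchIndex);
    return before - fileMaxIndexes.size();
  }
}
```

With files ending at indexes 100, 200 and 300, deleteObsoleteFiles() removes 
nothing right after the simulated restart. Once the recovered syncIndex of 250 
is applied at startup, the first two files become deletable.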





[jira] [Created] (IOTDB-5828) Optimize the implementation of some metric items in the metric module to prevent Prometheus pull timeouts

2023-04-27 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5828:


 Summary: Optimize the implementation of some metric items in the 
metric module to prevent Prometheus pull timeouts
 Key: IOTDB-5828
 URL: https://issues.apache.org/jira/browse/IOTDB-5828
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Liuxuxin
 Attachments: image-2023-04-27-17-01-37-144.png, 
image-2023-04-27-17-03-29-978.png

 !image-2023-04-27-17-03-29-978.png! 
!image-2023-04-27-17-01-37-144.png! 
Under high write pressure, even without Full GC, the time taken to compute 
individual metric items in the monitoring framework can cause Prometheus pulls 
to time out, resulting in missing monitoring data, which ultimately hampers 
the troubleshooting of performance problems.
Sampling with jprofile shows that the three main time consumers are the number 
of file handles, the number of concurrent clients, and the number of threads. 
Their implementation needs to be optimized.
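
The issue does not say how these items will be optimized. One possible 
approach, shown here purely as a sketch, is to cache an expensive value and 
refresh it at most once per interval so that a Prometheus pull never waits on 
the computation. CachedLongGauge and its parameters are hypothetical names, 
not part of the IoTDB metric module.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: wraps an expensive metric computation and
// recomputes it at most once per ttlNanos.
class CachedLongGauge {
  private final LongSupplier expensive; // e.g. counting open file handles
  private final long ttlNanos;          // minimum time between recomputations
  private final LongSupplier clock;     // injected clock, e.g. System::nanoTime
  private long cachedValue;
  private long lastRefreshNanos;
  private boolean initialized;

  CachedLongGauge(LongSupplier expensive, long ttlNanos, LongSupplier clock) {
    this.expensive = expensive;
    this.ttlNanos = ttlNanos;
    this.clock = clock;
  }

  // Returns the cached value, recomputing it only if it is older than the TTL.
  synchronized long value() {
    long now = clock.getAsLong();
    if (!initialized || now - lastRefreshNanos >= ttlNanos) {
      cachedValue = expensive.getAsLong();
      lastRefreshNanos = now;
      initialized = true;
    }
    return cachedValue;
  }
}
```

The trade-off is staleness: a pulled value may be up to one TTL old, which is 
usually acceptable for slowly changing counts such as thread numbers.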






[jira] [Assigned] (IOTDB-5777) When writing data using non-root users, the permission authentication module takes too long

2023-04-27 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5777:


Assignee: Hongyin Zhang

> When writing data using non-root users, the permission authentication module 
> takes too long
> ---
>
> Key: IOTDB-5777
> URL: https://issues.apache.org/jira/browse/IOTDB-5777
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Liuxuxin
>Assignee: Hongyin Zhang
>Priority: Major
> Fix For: master branch, 1.1.0
>
> Attachments: 20230414-162617.html, image-2023-04-17-11-27-41-532.png
>
>
> When writing data using non-root users, the time consumption of the 
> permission authentication module is too high, accounting for about 2/3 of the 
> total write time. The flame graph shows that the time consumption is mainly 
> concentrated on the initialization of PartialPath.
> !image-2023-04-17-11-27-41-532.png!





[jira] [Created] (IOTDB-5731) Reconstructs the cli to support printing the enterprise logo when connecting to the Enterprise Edition

2023-03-26 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5731:


 Summary: Reconstructs the cli to support printing the enterprise 
logo when connecting to the Enterprise Edition
 Key: IOTDB-5731
 URL: https://issues.apache.org/jira/browse/IOTDB-5731
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


see [doc|https://apache-iotdb.feishu.cn/docx/KCj5dYt3FoZNvrxS0Slc4mLMntd]





[jira] [Created] (IOTDB-5725) Make internal report recording measurements asynchronous

2023-03-23 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5725:


 Summary: Make internal report recording measurements asynchronous
 Key: IOTDB-5725
 URL: https://issues.apache.org/jira/browse/IOTDB-5725
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


The InternalReporter of the current metric module writes synchronously to 
IoTDB, which may cause a slow flush. In particular, when the system records 
flush points for the first time, it needs to create the related regions of 
root.__system, which takes a long time and may result in system reject errors.

This issue makes the writes to IoTDB asynchronous and ensures that all of the 
metric module's current operations are memory operations only and do not 
involve time-consuming RPCs.
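
A minimal sketch of the asynchronous idea, assuming a bounded in-memory buffer 
that is drained by a background task. AsyncReporterSketch, record and 
flushOnce are hypothetical names for illustration and do not reflect the 
actual InternalReporter code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical reporter: the hot path only touches memory, while the slow
// write (for example an RPC) happens later in a background task.
class AsyncReporterSketch {
  private final BlockingQueue<String> queue;
  private final Consumer<List<String>> slowWriter;

  AsyncReporterSketch(int capacity, Consumer<List<String>> slowWriter) {
    this.queue = new LinkedBlockingQueue<>(capacity);
    this.slowWriter = slowWriter;
  }

  // Called on the hot path: never blocks. Returns false if the buffer is
  // full and the sample was dropped.
  boolean record(String sample) {
    return queue.offer(sample);
  }

  // Called from a background thread: drains everything buffered so far and
  // hands it to the slow writer in one batch. Returns the batch size.
  int flushOnce() {
    List<String> batch = new ArrayList<>();
    queue.drainTo(batch);
    if (!batch.isEmpty()) {
      slowWriter.accept(batch);
    }
    return batch.size();
  }
}
```

In practice flushOnce() could be scheduled periodically, for example with a 
ScheduledExecutorService; dropping samples when the buffer is full is a design 
choice of this sketch, not something the issue specifies.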





[jira] [Created] (IOTDB-5697) Only record engine cost for DataRegion in Performance Overview Dashboard

2023-03-17 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5697:


 Summary: Only record engine cost for DataRegion in Performance 
Overview Dashboard
 Key: IOTDB-5697
 URL: https://issues.apache.org/jira/browse/IOTDB-5697
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


Currently, when we record the write state machine cost in the Performance 
Overview panel, we do not restrict the recording to the DataRegion, which may 
make the time inaccurate and thus inconsistent with the latency of the 
downstream breakdown.





[jira] [Created] (IOTDB-5695) Ensures backward compatibility between 1.0 and 1.1 for ConfigNode when using SimpleConsensus

2023-03-17 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5695:


 Summary: Ensures backward compatibility between 1.0 and 1.1 for 
ConfigNode when using SimpleConsensus
 Key: IOTDB-5695
 URL: https://issues.apache.org/jira/browse/IOTDB-5695
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


In version 1.1, we fixed a 1.0 SimpleConsensus bug that incorrectly set the 
consensus directory. For backward compatibility, we need to rename that 
directory.





[jira] [Assigned] (IOTDB-5368) DataNode launching error when the internal_port and rpc_port are same

2023-03-16 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5368:


Assignee: Yufeng Liu

> DataNode launching error when the internal_port and rpc_port are same
> -
>
> Key: IOTDB-5368
> URL: https://issues.apache.org/jira/browse/IOTDB-5368
> Project: Apache IoTDB
>  Issue Type: Bug
>Reporter: Gaofei Cao
>Assignee: Yufeng Liu
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2023-01-05-19-51-39-734.png
>
>
>  
> In this case, launching the DataNode fails with an error, but the ConfigNode 
> still registers this DataNode.
> !image-2023-01-05-19-51-39-734.png|width=448,height=268!





[jira] [Assigned] (IOTDB-5684) [Uncertain Path] Got a folder named ‘target’ in iotdb

2023-03-16 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5684:


Assignee: Yongzao Dan  (was: Xinyu Tan)

> [Uncertain Path] Got a folder named ‘target’  in iotdb
> --
>
> Key: IOTDB-5684
> URL: https://issues.apache.org/jira/browse/IOTDB-5684
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: Core/Cluster
>Affects Versions: 1.1.0-SNAPSHOT
>Reporter: Qingxin Feng
>Assignee: Yongzao Dan
>Priority: Major
> Attachments: image-2023-03-16-11-19-47-106.png
>
>
> Got a folder named ‘target’ in iotdb, but it is generated at the location 
> where the startup script is run.
> Please check this issue. Thanks.
> B.R
> !image-2023-03-16-11-19-47-106.png!





[jira] [Created] (IOTDB-5674) Remove useless log in MicrometerAutoGauge

2023-03-14 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5674:


 Summary: Remove useless log in MicrometerAutoGauge
 Key: IOTDB-5674
 URL: https://issues.apache.org/jira/browse/IOTDB-5674
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


Currently, MicrometerAutoGauge prints every monitor registered with it, causing 
a lot of useless logging when the cluster starts up, so this issue will remove 
these logs.





[jira] [Assigned] (IOTDB-5616) [Sonar]Fix some code smells and bugs given by SonarLint and sonalcloud

2023-03-14 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5616:


Assignee: (was: Xinyu Tan)

> [Sonar]Fix some code smells and bugs given by SonarLint and sonalcloud
> --
>
> Key: IOTDB-5616
> URL: https://issues.apache.org/jira/browse/IOTDB-5616
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Yufeng Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: master branch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> There are 300+ bugs and 19k+ code smells in IoTDB now. This issue will try to 
> fix some of them. For details, please see [Apache IoTDB Project Parent 
> POM|https://sonarcloud.io/project/overview?id=apache_incubator-iotdb].





[jira] [Assigned] (IOTDB-5616) [Sonar]Fix some code smells and bugs given by SonarLint and sonalcloud

2023-03-14 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5616:


Assignee: Xinyu Tan

> [Sonar]Fix some code smells and bugs given by SonarLint and sonalcloud
> --
>
> Key: IOTDB-5616
> URL: https://issues.apache.org/jira/browse/IOTDB-5616
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Yufeng Liu
>Assignee: Xinyu Tan
>Priority: Major
>  Labels: pull-request-available
> Fix For: master branch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> There are 300+ bugs and 19k+ code smells in IoTDB now. This issue will try to 
> fix some of them. For details, please see [Apache IoTDB Project Parent 
> POM|https://sonarcloud.io/project/overview?id=apache_incubator-iotdb].





[jira] [Assigned] (IOTDB-5368) DataNode launching error when the internal_port and rpc_port are same

2023-03-14 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5368:


Assignee: (was: Yongzao Dan)

> DataNode launching error when the internal_port and rpc_port are same
> -
>
> Key: IOTDB-5368
> URL: https://issues.apache.org/jira/browse/IOTDB-5368
> Project: Apache IoTDB
>  Issue Type: Bug
>Reporter: Gaofei Cao
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2023-01-05-19-51-39-734.png
>
>
>  
> In this case, launching the DataNode fails with an error, but the ConfigNode 
> still registers this DataNode.
> !image-2023-01-05-19-51-39-734.png|width=448,height=268!





[jira] [Commented] (IOTDB-5613) Remove unnecessary serialization in IoTConsensus when replicaNum is 1 to improve write performance

2023-03-02 Thread Xinyu Tan (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695694#comment-17695694
 ] 

Xinyu Tan commented on IOTDB-5613:
--

I tested with 1c1d on the machine and the throughput improved by 70%

Before:
 !screenshot-1.png! 
After:
 !screenshot-2.png! 

> Remove unnecessary serialization in IoTConsensus when replicaNum is 1 to 
> improve write performance
> --
>
> Key: IOTDB-5613
> URL: https://issues.apache.org/jira/browse/IOTDB-5613
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Xinyu Tan
>Assignee: Xinyu Tan
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-03-02-19-01-00-991.png, screenshot-1.png, 
> screenshot-2.png
>
>
> The current IoTConsensus still serializes each request at the consensus layer 
> when replicaNum = 1, which significantly increases the time spent at the 
> consensus layer in the full-link tracking panel.
> !image-2023-03-02-19-01-00-991.png!
> Although ISSUE [4855|https://github.com/apache/iotdb/pull/8025] refactored 
> the serialization-related code, the problem predates that refactoring. 
> This issue will avoid these unwanted serializations
>  





[jira] [Created] (IOTDB-5613) Remove unnecessary serialization in IoTConsensus when replicaNum is 1 to improve write performance

2023-03-02 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5613:


 Summary: Remove unnecessary serialization in IoTConsensus when 
replicaNum is 1 to improve write performance
 Key: IOTDB-5613
 URL: https://issues.apache.org/jira/browse/IOTDB-5613
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan
 Attachments: image-2023-03-02-19-01-00-991.png

The current IoTConsensus still serializes each request at the consensus layer 
when replicaNum = 1, which significantly increases the time spent at the 
consensus layer in the full-link tracking panel.

!image-2023-03-02-19-01-00-991.png!

Although ISSUE [4855|https://github.com/apache/iotdb/pull/8025] refactored the 
serialization-related code, the problem predates that refactoring. 

This issue will avoid these unwanted serializations
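
The idea can be sketched as below: with a single replica there is no follower 
to ship the request to, so the serialization step can be skipped entirely. 
SingleReplicaWriteSketch and its methods are hypothetical and only illustrate 
the control flow, not the real IoTConsensus write path.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical write path of a consensus layer.
class SingleReplicaWriteSketch {
  private final int replicaNum;
  int serializeCount = 0; // counts how often the costly step actually runs
  int appliedCount = 0;   // counts requests applied to the local state machine

  SingleReplicaWriteSketch(int replicaNum) {
    this.replicaNum = replicaNum;
  }

  // Stand-in for the costly serialization of a request.
  private ByteBuffer serialize(String request) {
    serializeCount++;
    return ByteBuffer.wrap(request.getBytes(StandardCharsets.UTF_8));
  }

  // Applies the request locally and returns the bytes to replicate,
  // or an empty Optional when there is no follower to send them to.
  Optional<ByteBuffer> write(String request) {
    appliedCount++;
    if (replicaNum == 1) {
      return Optional.empty(); // nothing to replicate, so skip serialization
    }
    return Optional.of(serialize(request));
  }
}
```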

 





[jira] [Created] (IOTDB-5601) [Refactor] Remove AsyncConfigNodeHeartbeatServiceClient and AsyncDataNodeHeartbeatServiceClient as there core logic are duplicated

2023-02-28 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5601:


 Summary: [Refactor] Remove AsyncConfigNodeHeartbeatServiceClient 
and AsyncDataNodeHeartbeatServiceClient as there core logic are duplicated
 Key: IOTDB-5601
 URL: https://issues.apache.org/jira/browse/IOTDB-5601
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


For AsyncConfigNodeHeartbeatServiceClient and AsyncConfigNodeIServiceClient, 
and for AsyncDataNodeHeartbeatServiceClient and 
AsyncDataNodeInternalServiceClient, the only difference within each pair is 
whether a log is printed when an exception occurs. This issue therefore deletes 
the two heartbeat client classes and adds a print-log parameter to 
thriftClientProperty, which reduces redundant code.
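
A rough sketch of the refactoring, assuming the property object carries a 
single boolean switch. ClientPropertySketch, AsyncClientSketch and the field 
printLogOnException are hypothetical names; the real thriftClientProperty may 
look different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client property holding the new switch.
class ClientPropertySketch {
  final boolean printLogOnException;

  ClientPropertySketch(boolean printLogOnException) {
    this.printLogOnException = printLogOnException;
  }
}

// One client class serves both use cases; only the property differs.
class AsyncClientSketch {
  final List<String> log = new ArrayList<>(); // stand-in for a real logger
  private final ClientPropertySketch property;

  AsyncClientSketch(ClientPropertySketch property) {
    this.property = property;
  }

  // Error callback: logs only when the property asks for it.
  void onError(Exception e) {
    if (property.printLogOnException) {
      log.add("client error: " + e.getMessage());
    }
  }
}
```

A heartbeat client would then be created with the switch turned off, and a 
regular client with it turned on, instead of maintaining two classes.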





[jira] [Created] (IOTDB-5596) Rename ConfigNodeRegion to ConfigRegion

2023-02-28 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5596:


 Summary: Rename ConfigNodeRegion to ConfigRegion 
 Key: IOTDB-5596
 URL: https://issues.apache.org/jira/browse/IOTDB-5596
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


There are currently three consensus layer types in the cluster:
 * DataRegion
 * SchemaRegion
 * ConfigNodeRegion

As you can see, the name ConfigNodeRegion clearly doesn't match the other two 
names, while the previous name PartitionRegion describes too narrow a 
responsibility, so we plan to name it ConfigRegion.





[jira] [Created] (IOTDB-5595) Fix memory leak for TsFileProcessorInfoMetrics in TsFileProcessorInfo

2023-02-27 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5595:


 Summary: Fix memory leak for TsFileProcessorInfoMetrics in 
TsFileProcessorInfo
 Key: IOTDB-5595
 URL: https://issues.apache.org/jira/browse/IOTDB-5595
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


Currently, each memtable corresponds to a TsFileProcessorInfo, which 
registers itself with the metric module, but there is no logic to remove it 
after the memtable is flushed. This results in a memory leak.
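
The leak and its fix can be illustrated with the following sketch. 
MetricRegistrySketch and ProcessorInfoSketch are hypothetical stand-ins; the 
point is only that every registration needs a matching removal once the 
memtable has been flushed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical metric registry that keeps a reference to every gauge.
class MetricRegistrySketch {
  private final Map<String, LongSupplier> gauges = new ConcurrentHashMap<>();

  void register(String name, LongSupplier gauge) {
    gauges.put(name, gauge);
  }

  void remove(String name) {
    gauges.remove(name);
  }

  int size() {
    return gauges.size();
  }
}

// Hypothetical per-memtable info object.
class ProcessorInfoSketch implements AutoCloseable {
  private final MetricRegistrySketch registry;
  private final String name;
  private long memCost;

  ProcessorInfoSketch(MetricRegistrySketch registry, String id) {
    this.registry = registry;
    this.name = "mem_cost_" + id;
    // The lambda captures this object, so the registry keeps it reachable.
    registry.register(name, () -> memCost);
  }

  void addMemCost(long delta) {
    memCost += delta;
  }

  // Call after the memtable is flushed; without this the registry would
  // hold the object forever, which is the leak described above.
  @Override
  public void close() {
    registry.remove(name);
  }
}
```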





[jira] [Created] (IOTDB-5585) Change InternalReporterType from IoTDB to Memory to reduce performance degradation

2023-02-26 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5585:


 Summary: Change InternalReporterType from IoTDB to Memory to 
reduce performance degradation
 Key: IOTDB-5585
 URL: https://issues.apache.org/jira/browse/IOTDB-5585
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


At present, the default value of InternalReporterType is IoTDB, which may 
affect performance. We plan to change the default back to Memory. Once a later 
version has optimized this function, IoTDB can become the default again.





[jira] [Commented] (IOTDB-5566) Give a interface to show the configurations of IoTDB in command window

2023-02-20 Thread Xinyu Tan (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691426#comment-17691426
 ] 

Xinyu Tan commented on IOTDB-5566:
--

We currently support this sql, see the 
[documentation|https://iotdb.apache.org/UserGuide/Master/Cluster/Cluster-Maintenance.html#show-variables]
 for details

> Give a interface to show the configurations of IoTDB in command window
> --
>
> Key: IOTDB-5566
> URL: https://issues.apache.org/jira/browse/IOTDB-5566
> Project: Apache IoTDB
>  Issue Type: New Feature
>Reporter: changxue
>Assignee: Xinyu Tan
>Priority: Major
>
> Give a interface to show the configurations of IoTDB in command window
> The configurations of iotdb-common.properties, iotdb-confignode.properties 
> and iotdb-datanode.properties should be shown in command window(cli), because 
> users may update these configurations and would like to make sure whether the 
> modification is effective.
> Like mysql: show variables mem%





[jira] [Created] (IOTDB-5562) Change the data type of AutoGuage from long to double in metric module

2023-02-20 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5562:


 Summary: Change the data type of AutoGuage from long to double in 
metric module
 Key: IOTDB-5562
 URL: https://issues.apache.org/jira/browse/IOTDB-5562
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


The current metric module's AutoGauge type supports long rather than double, 
but our default metric module class, MicrometerAutoGauge, expects a double data 
type, so we do a type cast internally for all long values.

Some users expect AutoGauge to support the double data type, which not only 
allows recording decimals, but also avoids casting twice, as in 
"double->long->double".

After this change, an AutoGauge backed by a long value still requires one type 
cast, but an AutoGauge backed by a double value goes from two type casts to 
zero, resulting in a small performance gain.
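
The difference between the two signatures can be sketched with the standard 
java.util.function interfaces. GaugeSketch, readViaLong and readDirect are 
hypothetical names that only mirror the cast chains described above.

```java
import java.util.function.ToDoubleFunction;
import java.util.function.ToLongFunction;

// Hypothetical illustration of the old and new gauge value functions.
class GaugeSketch {
  // Before: the gauge takes a long-valued function, so a double source is
  // truncated to long by the caller and then widened to double again here.
  static <T> double readViaLong(T obj, ToLongFunction<T> f) {
    return (double) f.applyAsLong(obj);
  }

  // After: the gauge takes a double-valued function, so a double source
  // needs no cast at all and keeps its fractional part.
  static <T> double readDirect(T obj, ToDoubleFunction<T> f) {
    return f.applyAsDouble(obj);
  }
}
```

For example, a ratio of 0.75 read through the long-valued path is reported as 
0.0, while the double-valued path reports 0.75.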





[jira] [Created] (IOTDB-5560) Increase default consensusLogAppenderBufferSize from 4M to 16M to reduce the probability of large request write failures

2023-02-20 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5560:


 Summary: Increase default consensusLogAppenderBufferSize from 4M 
to 16M to reduce the probability of large request write failures
 Key: IOTDB-5560
 URL: https://issues.apache.org/jira/browse/IOTDB-5560
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


Some [issues|https://github.com/apache/iotdb/issues/8403] have reported 
that IoTDB 1.0 cannot support write requests larger than 4M, mainly because of 
a configuration within Ratis. Although a larger value may put the cluster 
into an unhealthy state, the current 4M is too small and interferes 
with normal use in some scenarios, so we plan to increase it to 16M.





[jira] [Assigned] (IOTDB-5512) [ IoTConsensus Resend log ] The unsequence tsfile is generated after the cluster is restarted (

2023-02-14 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5512:


Assignee: huxiangpeng  (was: Xinyu Tan)

> [ IoTConsensus Resend log ] The unsequence tsfile is generated after the 
> cluster is restarted (
> ---
>
> Key: IOTDB-5512
> URL: https://issues.apache.org/jira/browse/IOTDB-5512
> Project: Apache IoTDB
>  Issue Type: Improvement
>  Components: mpp-cluster
>Reporter: 刘珍
>Assignee: huxiangpeng
>Priority: Major
> Attachments: image-2023-02-09-15-45-05-253.png, 
> image-2023-02-09-15-52-51-085.png, insert_no_overflow.config.properties
>
>
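> Test version: rc/1.0.1 20230202  63b16f2
> Problem description:
> In a 3-replica, 3-node cluster, benchmark writes sequential data. Before the 
> restart the data is checked and is fully sequential; after the cluster is 
> restarted, some unsequence tsfiles are generated.
> Before restarting the cluster, the log synchronization status of all 
> dataregions was checked, and synchronization had completed:
>  !image-2023-02-09-15-45-05-253.png! 
> The synchronized log index recorded in the consensus file is 10
>  !image-2023-02-09-15-52-51-085.png! 
> After the cluster is restarted, these 90 logs are resent, which causes 
> unsequence tsfiles to be generated. This can be optimized to solve the problem.
> Test procedure:
> 1. Private cloud phase 3
> 1 ConfigNode 172.20.70.5
> 3 DataNode 172.20.70.2/4/14
> benchmark is on 172.20.70.13 (see the attachment for the configuration)
> Cluster configuration parameters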
> ConfigNode
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> DataNode
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> dn_max_connection_for_internal_service=300
> common file
> schema_replication_factor=3
> data_replication_factor=3
> enable_seq_space_compaction=false
> enable_unseq_space_compaction=false
> enable_cross_space_compaction=false
> config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus
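> 2. Run benchmark
> 3. After writing completes, check the data: there are no unsequence tsfiles 
> and log synchronization is complete. Restart the cluster.
> Expected result: no unsequence tsfiles after the restart.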





[jira] [Assigned] (IOTDB-5507) Optimized the logic that Datanodes can be added to the cluster after 20 seconds after restart

2023-02-14 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5507:


Assignee: Yongzao Dan  (was: Xinyu Tan)

> Optimized the logic that Datanodes can be added to the cluster after 20 
> seconds after restart
> -
>
> Key: IOTDB-5507
> URL: https://issues.apache.org/jira/browse/IOTDB-5507
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: changxue
>Assignee: Yongzao Dan
>Priority: Major
> Attachments: image-2023-02-09-11-43-44-065.png
>
>
> [start] It's not a good idea to wait 20s to restart a datanode 
>  !image-2023-02-09-11-43-44-065.png|width=800! 
> suppose: 
> I change some configurations and need to restart for them to take effect, so 
> I want to restart immediately, rather than waiting 20s.
> It's not a good design.





[jira] [Commented] (IOTDB-5507) [start] It's not a good idea to wait 20s to restart a datanode

2023-02-08 Thread Xinyu Tan (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686242#comment-17686242
 ] 

Xinyu Tan commented on IOTDB-5507:
--

How do you restart the node? Do you use kill -9, or do you use stop-datanode.sh?

> [start] It's not a good idea to wait 20s to restart a datanode 
> ---
>
> Key: IOTDB-5507
> URL: https://issues.apache.org/jira/browse/IOTDB-5507
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: changxue
>Assignee: Xinyu Tan
>Priority: Major
> Attachments: image-2023-02-09-11-43-44-065.png
>
>
> [start] It's not a good idea to wait 20s to restart a datanode 
>  !image-2023-02-09-11-43-44-065.png|width=800! 
> suppose: 
> I change some configurations and need to restart for them to take effect, so 
> I want to restart immediately, rather than waiting 20s.
> It's not a good design.





[jira] [Assigned] (IOTDB-5112) Fixed IoTConsensus synchronization stuck under low load or during restart

2023-02-02 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5112:


Assignee: huxiangpeng  (was: Xinyu Tan)

> Fixed IoTConsensus synchronization stuck under low load or during restart
> -
>
> Key: IOTDB-5112
> URL: https://issues.apache.org/jira/browse/IOTDB-5112
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: mpp-cluster
>Reporter: Chao Wang
>Assignee: huxiangpeng
>Priority: Major
>
>  error log: waiting target request timeout. current index: 20,  target index: 
> -1.
>  Because when requestCache.size() != MAX_REQUEST_CACHE_SIZE, nextSyncIndex 
> is not reassigned a value
>  
>  





[jira] [Assigned] (IOTDB-5466) [ratis]Write logs every 2 minutes:[pool-21-IoTDB-ratis-bg-disk-guardian-1] INFO o.a.i.c.r.RatisConsensus:709 - Raft group group-000200000000 took snapshot successfully

2023-02-02 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5466:


Assignee: Song Ziyang  (was: Xinyu Tan)

> [ratis]Write logs every 2 minutes:[pool-21-IoTDB-ratis-bg-disk-guardian-1] 
> INFO  o.a.i.c.r.RatisConsensus:709 - Raft group group-0002 took 
> snapshot successfully
> 
>
> Key: IOTDB-5466
> URL: https://issues.apache.org/jira/browse/IOTDB-5466
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: mpp-cluster
>Affects Versions: 1.0.1
>Reporter: 刘珍
>Assignee: Song Ziyang
>Priority: Major
> Attachments: image-2023-02-03-10-35-48-860.png, lt.conf
>
>
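> Test version: rc/1.0.1 20230129  573097a
> Problem description:
> 3 replicas, 3C3D; all nodes are in a normal state and Benchmark is performing 
> reads and writes. Every 2 minutes the datanodes keep printing the following 
> log:
>  !image-2023-02-03-10-35-48-860.png! 
> Test environment
> 1. 192.168.10.62/66/68/64   72CPU 256GB
> ConfigNode and DataNode are on 192.168.10.62/66/68
> The path is /data/liuzhen_test/r_0129_573097a
> Benchmark is on 192.168.10.64, /data/liuzhen_test/3c3d_longtest/bm_v1
> 2. Configuration parameters
> ConfigNode parameters: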
> MAX_HEAP_SIZE="8G"
> cn_target_config_node_list=192.168.10.62:10710
> DataNode parameters:
> MAX_HEAP_SIZE="192G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> dn_max_connection_for_internal_service=300
> dn_target_config_node_list=192.168.10.62:10710,192.168.10.66:10710,192.168.10.68:10710
> Common parameters:
> schema_replication_factor=3
> data_replication_factor=3
> iot_consensus_throttle_threshold_in_byte=536870912000
> disk_space_warning_threshold=0.01
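> 3. Start Benchmark for 7*24-hour reads and writes
> See the attachment for the configuration file.
> 4. Check the datanode logs
> All 3 nodes show the behavior described in the problem description.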





[jira] [Assigned] (IOTDB-5383) [confignode]start-confignode fail with NPE

2023-02-02 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5383:


Assignee: Song Ziyang  (was: Xinyu Tan)

> [confignode]start-confignode fail with NPE
> --
>
> Key: IOTDB-5383
> URL: https://issues.apache.org/jira/browse/IOTDB-5383
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: Core/Server
>Affects Versions: 1.0.1
>Reporter: changxue
>Assignee: Song Ziyang
>Priority: Major
> Attachments: conf-46.tar.gz, confignode-npe_allnodes-log.tar.gz
>
>
> [confignode]start-confignode fail with NPE
> reproduction:
> 1. Append config_node_ratis_snapshot_trigger_threshold=30 to 
> iotdb-common.properties
> 2. Start a 3C3D cluster
> expect: start successfully
> actual result:
> 2C3D started successfully, but starting the third confignode failed with an 
> NPE 
> {code}
> show cluster
> +------+----------+-------+---------------+------------+
> |NodeID|  NodeType| Status|InternalAddress|InternalPort|
> +------+----------+-------+---------------+------------+
> |     0|ConfigNode|Running|   172.20.70.44|       10710|
> |     2|ConfigNode|Running|   172.20.70.45|       10710|
> |     1|  DataNode|Running|   172.20.70.44|       10730|
> |     3|  DataNode|Running|   172.20.70.45|       10730|
> |     5|  DataNode|Running|   172.20.70.46|       10730|
> +------+----------+-------+---------------+------------+
> {code}
> {code}
> 2023-01-07 14:42:11,745 [grpc-default-executor-0] INFO  
> o.a.r.g.s.GrpcServerProtocolService$ServerRequestStreamObserver:143 - 8: 
> Completed INSTALL_SNAPSHOT, lastRequest: 
> 0->8#0-t1,chunk:ba310edb-b921-452d-8023-4ef2ad4f51f9,8 
> 2023-01-07 14:42:11,746 [8@group--StateMachineUpdater] ERROR 
> o.a.r.s.i.StateMachineUpdater:194 - 8@group--StateMachineUpdater 
> caught a Throwable. 
> java.lang.NullPointerException: snapshot == null
>   at java.util.Objects.requireNonNull(Objects.java:228)
>   at 
> org.apache.ratis.server.impl.StateMachineUpdater.reload(StateMachineUpdater.java:219)
>   at 
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:179)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
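> My guess is that this is related to 
> config_node_ratis_snapshot_trigger_threshold being set too small. The third 
> confignode cannot start, and show timeseries root.** cannot run either, so 
> the cluster is unavailable.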





[jira] [Commented] (IOTDB-5411) Write an error using the session interface

2023-01-29 Thread Xinyu Tan (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17681727#comment-17681727
 ] 

Xinyu Tan commented on IOTDB-5411:
--

https://github.com/apache/iotdb/pull/8840

> Write an error using the session interface
> --
>
> Key: IOTDB-5411
> URL: https://issues.apache.org/jira/browse/IOTDB-5411
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: Client/Java
>Reporter: sunhao
>Assignee: Hongyin Zhang
>Priority: Major
> Attachments: image-2023-01-12-18-17-00-706.png
>
>
> !image-2023-01-12-18-17-00-706.png|width=1035,height=188!





[jira] [Created] (IOTDB-5425) Consolidate all ConfigNodeClient to be managed by clientManager

2023-01-16 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5425:


 Summary: Consolidate all ConfigNodeClient to be managed by 
clientManager
 Key: IOTDB-5425
 URL: https://issues.apache.org/jira/browse/IOTDB-5425
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


On the one hand, this makes the code more consistent; on the other hand, it 
may resolve ConfigNodeClient leaks.





[jira] [Created] (IOTDB-5384) add core_client_count_for_each_node_in_client_manager and max_client_count_for_each_node_in_client_manager parameters for confignode and datanode

2023-01-06 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5384:


 Summary: add core_client_count_for_each_node_in_client_manager and 
max_client_count_for_each_node_in_client_manager parameters for confignode and 
datanode
 Key: IOTDB-5384
 URL: https://issues.apache.org/jira/browse/IOTDB-5384
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


Two parameters are added to confignode.properties:
 * cn_core_client_count_for_each_node_in_client_manager
 * cn_max_client_count_for_each_node_in_client_manager

Two parameters are added to datanode.properties:
 * dn_core_client_count_for_each_node_in_client_manager
 * dn_max_client_count_for_each_node_in_client_manager

This issue also makes all clientManager initializations use these 
parameters and updates the documentation.





[jira] [Created] (IOTDB-5345) Use the logical clock to identify the snapshot version of IoTConsensus

2023-01-03 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5345:


 Summary: Use the logical clock to identify the snapshot version of 
IoTConsensus
 Key: IOTDB-5345
 URL: https://issues.apache.org/jira/browse/IOTDB-5345
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: huxiangpeng
 Attachments: image-2023-01-03-23-45-07-397.png

The current IoTConsensus uses physical clocks to identify different snapshot 
versions. 
In some operation scenarios, the physical clock of the machine may be rolled 
back. This may cause IoTConsensus to label the latest snapshot as the old 
snapshot version. Therefore, we need to use logical timestamps to mark 
different snapshot versions. For example, use a self-maintaining increment 
index. 

In addition, this work needs to ensure forward compatibility with 1.0.0
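
A self-maintained increment index could look roughly like the sketch below. 
The directory naming scheme "snapshot_<index>" and the class 
SnapshotVersionSketch are assumptions made for illustration; the sketch does 
not cover compatibility with snapshots created by 1.0.0.

```java
import java.util.List;

// Hypothetical helper that derives the next snapshot version from the
// snapshot directories that already exist, instead of from the wall clock.
class SnapshotVersionSketch {
  private static final String PREFIX = "snapshot_"; // assumed naming scheme

  // Returns the index encoded in a directory name, or -1 if the name does
  // not follow the assumed scheme.
  static long indexOf(String dirName) {
    if (!dirName.startsWith(PREFIX)) {
      return -1L;
    }
    try {
      return Long.parseLong(dirName.substring(PREFIX.length()));
    } catch (NumberFormatException e) {
      return -1L;
    }
  }

  // The next version is one more than the largest existing index, so it
  // keeps increasing even if the machine's physical clock is rolled back.
  static long nextIndex(List<String> existingDirNames) {
    long max = -1L;
    for (String name : existingDirNames) {
      max = Math.max(max, indexOf(name));
    }
    return max + 1;
  }
}
```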

 !image-2023-01-03-23-45-07-397.png! 





[jira] [Reopened] (IOTDB-5111) [ ratis ] Data is distributed across disks ,after the cluster is restarted, all data is lost

2023-01-03 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reopened IOTDB-5111:
--

> [ ratis ] Data is distributed across disks ,after the cluster is restarted, 
> all data is lost
> 
>
> Key: IOTDB-5111
> URL: https://issues.apache.org/jira/browse/IOTDB-5111
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: mpp-cluster
>Affects Versions: 1.0.0
>Reporter: 刘珍
>Assignee: Song Ziyang
>Priority: Major
> Attachments: image-2022-12-02-17-58-45-096.png, 
> image-2022-12-02-17-59-05-010.png
>
>
> rel/1.0
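> All three protocols (config/schema/data) are ratis,
> dn_data_dirs=data/datanode/data,/data1/iotdb/datanode/data
> so data is stored across disks.
> After writing data and restarting the cluster, {color:#DE350B}*all data is lost*{color}.
> There is one more problem: {color:#DE350B}there are still folders named .tmp. under the snapshot directory{color}:
>  !image-2022-12-02-17-59-05-010.png! 
> Test environment: private cloud phase 1, 8C32GB
> 1. 3 replicas, 3C7D
> Common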
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> schema_replication_factor=3
> data_replication_factor=3
> wal_buffer_size_in_byte=1048576
> max_waiting_time_when_insert_blocked=360
> query_timeout_threshold=3600
> ConfigNode
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> DataNode
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> dn_data_dirs=data/datanode/data,/data1/iotdb/datanode/data
> 2. Start the benchmark (BM) to write data
> GROUP_NUMBER=1
> DEVICE_NUMBER=1000
> REAL_INSERT_RATE=1.0
> SENSOR_NUMBER=1000
> IS_SENSOR_TS_ALIGNMENT=true
> IS_OUT_OF_ORDER=false
> OUT_OF_ORDER_RATIO=0.5
> OPERATION_PROPORTION=1:0:0:0:0:0:0:0:0:0:0
> CLIENT_NUMBER=50
> LOOP=1
> BATCH_SIZE_PER_WRITE=10
> START_TIME=2018-8-30T00:00:00+08:00
> POINT_STEP=200
> OP_MIN_INTERVAL=0
> OP_MIN_INTERVAL_RANDOM=false
> INSERT_DATATYPE_PROPORTION=1:1:1:1:1:1
> ENCODINGS=PLAIN/PLAIN/PLAIN/PLAIN/PLAIN/PLAIN
> COMPRESSOR=SNAPPY
> IS_DELETE_DATA=false
> CREATE_SCHEMA=true
> BENCHMARK_CLUSTER=false
>  !image-2022-12-02-17-58-45-096.png! 
> 3. Restart the cluster





[jira] [Reopened] (IOTDB-5231) [monitor]datanode could not start when binding 9091 error

2023-01-03 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reopened IOTDB-5231:
--

> [monitor]datanode could not start when binding 9091 error 
> --
>
> Key: IOTDB-5231
> URL: https://issues.apache.org/jira/browse/IOTDB-5231
> Project: Apache IoTDB
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: changxue
>Assignee: Hongyin Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: config.tar.gz, monitor_error_log.tar.gz
>
>
> [monitor]datanode could not start when binding 9091 error 
> environment:
> 3C3D cluster, rel/1.0 branch
> 1. enable prometheus monitor
> 2. the prometheus service has not been started
> problem:
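> 1. Monitoring is an add-on feature. If it is enabled and does not work properly, a warning is acceptable, but it should not raise an error or affect the startup of the rpc service and other services.
> 2. In this situation, stop-datanode.sh cannot stop the datanode; it has to be killed.
> 3. The confignode started successfully and bound port 9091; the datanode then tried to bind 9091 and failed. This needs to succeed.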
> {code}
> 2022-12-19 10:26:23,574 [main] INFO  o.a.i.m.AbstractMetricService:130 - 
> Detect more than one MetricManager, will use 
> org.apache.iotdb.metrics.micrometer.MicrometerMetricManager
> 2022-12-19 10:26:23,574 [main] INFO  o.a.i.m.AbstractMetricService:137 - Load 
> metric reporters, type: [PROMETHEUS]
> 2022-12-19 10:26:23,939 [main] ERROR o.a.i.c.s.m.MetricService:52 - Failed to 
> start Metrics ServerService because:
> reactor.netty.ChannelBindException: Failed to bind on [0.0.0.0:9091]
> Suppressed: java.lang.Exception: #block terminated with an error
> at 
> reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:139)
> at reactor.core.publisher.Mono.block(Mono.java:1731)
> at 
> reactor.netty.transport.ServerTransport.bindNow(ServerTransport.java:145)
> at 
> reactor.netty.transport.ServerTransport.bindNow(ServerTransport.java:130)
> at 
> org.apache.iotdb.metrics.reporter.prometheus.PrometheusReporter.start(PrometheusReporter.java:81)
> at 
> org.apache.iotdb.metrics.CompositeReporter.startAll(CompositeReporter.java:38)
> at 
> org.apache.iotdb.metrics.AbstractMetricService.startAllReporter(AbstractMetricService.java:193)
> at 
> org.apache.iotdb.metrics.AbstractMetricService.startCoreModule(AbstractMetricService.java:98)
> at 
> org.apache.iotdb.metrics.AbstractMetricService.startService(AbstractMetricService.java:76)
> at 
> org.apache.iotdb.commons.service.metric.MetricService.start(MetricService.java:49)
> at 
> org.apache.iotdb.commons.service.RegisterManager.register(RegisterManager.java:51)
> at 
> org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:162)
> at 
> org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:95)
> at 
> org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:58)
> at 
> org.apache.iotdb.db.service.DataNode.main(DataNode.java:131)
> 2022-12-19 10:26:23,940 [main] ERROR o.a.i.db.service.DataNode:178 - Fail to 
> start server
> {code}





[jira] [Commented] (IOTDB-4986) Too many IoTDB-DataNodeInternalRPC-Processor threads are open

2023-01-02 Thread Xinyu Tan (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-4986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653767#comment-17653767
 ] 

Xinyu Tan commented on IOTDB-4986:
--

Not yet. This issue requires continuous optimization of the Thrift threading 
model over time, and I've broken it down into a few issues that might take a sprint 
or two to complete.

> Too many IoTDB-DataNodeInternalRPC-Processor threads are open
> -
>
> Key: IOTDB-4986
> URL: https://issues.apache.org/jira/browse/IOTDB-4986
> Project: Apache IoTDB
>  Issue Type: Improvement
>  Components: mpp-cluster
>Affects Versions: 0.14.0-SNAPSHOT
>Reporter: 刘珍
>Assignee: Haiming Zhu
>Priority: Critical
>
> m_1118_3d5eeae
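> 1. Start a 3-replica 3C21D cluster
> 2. Start 7 Benchmark instances one after another
> 3. The datanode on one of the nodes opens a very large number of IoTDB-DataNodeInternalRPC-Processor threads, 2k+ 
> (the count slowly drops), but OOM occurs occasionally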
> 2022-11-18 14:26:48,320 
> [pool-22-IoTDB-DataNodeInternalRPC-Processor-374$20221118_062422_29227_16.1.0]
>  ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally 
> failed. TSStatus: TSStatus(code:506, subStatus:[]), message: null
> 2022-11-18 14:29:44,568 [DataNodeInternalRPC-Service]{color:red}* ERROR 
> o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread 
> DataNodeInternalRPC-Service-40
> java.lang.OutOfMemoryError: unable to create native thread: possibly out of 
> memory or process/resource limits reached*{color}
> at java.base/java.lang.Thread.start0(Native Method)
> at java.base/java.lang.Thread.start(Thread.java:803)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354)
> at 
> org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155)
> at 
> org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139)
> at 
> org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258)
> 2022-11-18 14:29:53,751 [ClientRPC-Service] ERROR 
> o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread 
> ClientRPC-Service-42
> java.lang.OutOfMemoryError: unable to create native thread: possibly out of 
> memory or process/resource limits reached
> at java.base/java.lang.Thread.start0(Native Method)
> at java.base/java.lang.Thread.start(Thread.java:803)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354)
> at 
> org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155)
> at 
> org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139)
> at 
> org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258)
> 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR 
> o.a.i.d.e.s.TsFileProcessor:1095 - root.test.g_0-6: 
> /data/iotdb/m_1118_3d5eeae/sbin/../data/datanode/data/unsequence/root.test.g_0/6/2538/1668752675355-5-0-0.tsfile
>  meet error when flushing a memtable, change system mode to error
> java.lang.OutOfMemoryError: unable to create native thread: possibly out of 
> memory or process/resource limits reached
> at java.base/java.lang.Thread.start0(Native Method)
> at java.base/java.lang.Thread.start(Thread.java:803)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354)
> at 
> java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118)
> at 
> org.apache.iotdb.db.rescon.AbstractPoolManager.submit(AbstractPoolManager.java:56)
> at 
> org.apache.iotdb.db.engine.flush.MemTableFlushTask.<init>(MemTableFlushTask.java:88)
> at 
> org.apache.iotdb.db.engine.storagegroup.TsFileProcessor.flushOneMemTable(TsFileProcessor.java:1082)
> at 
> org.apache.iotdb.db.engine.flush.FlushManager$FlushThread.runMayThrow(FlushManager.java:108)
> at 
> org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> 2022-11-18 

[jira] [Reopened] (IOTDB-5060) Control the ratis log size

2023-01-02 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reopened IOTDB-5060:
--

> Control the ratis log size
> --
>
> Key: IOTDB-5060
> URL: https://issues.apache.org/jira/browse/IOTDB-5060
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Jialin Qiao
>Assignee: Song Ziyang
>Priority: Major
>
> Currently, we limit the number of operations, but when a large operation is encountered, 
> the log occupies too much disk space.
> We need to control the total raft log size.





[jira] [Reopened] (IOTDB-5111) [ ratis ] Data is distributed across disks ,after the cluster is restarted, all data is lost

2023-01-02 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reopened IOTDB-5111:
--

> [ ratis ] Data is distributed across disks ,after the cluster is restarted, 
> all data is lost
> 
>
> Key: IOTDB-5111
> URL: https://issues.apache.org/jira/browse/IOTDB-5111
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: mpp-cluster
>Affects Versions: 1.0.0
>Reporter: 刘珍
>Assignee: Song Ziyang
>Priority: Major
> Attachments: image-2022-12-02-17-58-45-096.png, 
> image-2022-12-02-17-59-05-010.png
>
>
> rel/1.0
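> All three consensus protocols (config/schema/data) are ratis,
> dn_data_dirs=data/datanode/data,/data1/iotdb/datanode/data
> so data is stored across disks.
> After writing data and restarting the cluster, {color:#DE350B}*all data is lost*{color}.
> There is one more problem: {color:#DE350B}the snapshot directory still contains a folder named .tmp.{color}:
>  !image-2022-12-02-17-59-05-010.png! 
> Test environment: private cloud phase 1, 8C32GB
> 1. 3 replicas, 3C7D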
> Common
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> schema_replication_factor=3
> data_replication_factor=3
> wal_buffer_size_in_byte=1048576
> max_waiting_time_when_insert_blocked=360
> query_timeout_threshold=3600
> ConfigNode
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> DataNode
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> dn_data_dirs=data/datanode/data,/data1/iotdb/datanode/data
> 2. Start the benchmark (BM) to write data
> GROUP_NUMBER=1
> DEVICE_NUMBER=1000
> REAL_INSERT_RATE=1.0
> SENSOR_NUMBER=1000
> IS_SENSOR_TS_ALIGNMENT=true
> IS_OUT_OF_ORDER=false
> OUT_OF_ORDER_RATIO=0.5
> OPERATION_PROPORTION=1:0:0:0:0:0:0:0:0:0:0
> CLIENT_NUMBER=50
> LOOP=1
> BATCH_SIZE_PER_WRITE=10
> START_TIME=2018-8-30T00:00:00+08:00
> POINT_STEP=200
> OP_MIN_INTERVAL=0
> OP_MIN_INTERVAL_RANDOM=false
> INSERT_DATATYPE_PROPORTION=1:1:1:1:1:1
> ENCODINGS=PLAIN/PLAIN/PLAIN/PLAIN/PLAIN/PLAIN
> COMPRESSOR=SNAPPY
> IS_DELETE_DATA=false
> CREATE_SCHEMA=true
> BENCHMARK_CLUSTER=false
>  !image-2022-12-02-17-58-45-096.png! 
> 3. Restart the cluster





[jira] [Assigned] (IOTDB-5324) [migrate region] 1rep1C4D ,after the region is migrated successfully, wal cannot be deleted from destDataNode

2022-12-30 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5324:


Assignee: Xinyu Tan  (was: Gaofei Cao)

> [migrate region] 1rep1C4D ,after the region is migrated successfully, wal 
> cannot be deleted from destDataNode 
> --
>
> Key: IOTDB-5324
> URL: https://issues.apache.org/jira/browse/IOTDB-5324
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: mpp-cluster
>Affects Versions: master branch
>Reporter: 刘珍
>Assignee: Xinyu Tan
>Priority: Major
> Attachments: 40971672369689_.pic.jpg, mig.conf, screenshot-1.png, 
> screenshot-2.png
>
>
> m_1229_0fedffd
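> Problem description
> In a 1-replica 1C4D cluster, a region migration (Id=1 from ip4 to ip14) performed while data was being written succeeded, but the wal on the destination node cannot be deleted.
> 1. Start a 1-replica 1C4D cluster
> The config/schema/data protocols are ratis/ratis/IoT
> 2. Write data with the benchmark (BM); see the attachment for the configuration
> After 9 minutes, migrate the region
> ./sbin/start-cli.sh -h 172.20.70.4 -e "migrate region 1 from 2 to 3"
> The migration succeeded and took 20 seconds (2022-12-29 18:25:17,621 to 2022-12-29 18:25:37,676)
> However, the wal of regionId=1 on the ip14 datanode 
> cannot be deleted, so it has grown to 50GB. Throttling WARN logs keep appearing and the BM has not finished after more than 16 hours, although in theory it should complete in a little over 1 hour: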
> 2022-12-30 10:14:07,669 
> [pool-25-IoTDB-ClientRPC-Processor-59$20221230_021337_10719_3.1.0] WARN  
> o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:243 - write locally failed. 
> TSStatus: TSStatus(code:606, message:Reject write because there are too many 
> requests need to process), message: Reject write because there are too many 
> requests need to process
> Test environment: private cloud phase 3
> DataNode configuration
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> dn_max_connection_for_internal_service=300
> ConfigNode configuration
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> Region information before the region migration
>  !screenshot-1.png! 
> Region information after the region migration succeeded
>  !screenshot-2.png! 





[jira] [Created] (IOTDB-5312) Consolidate ClientManagers in Datanodes for unified management

2022-12-28 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5312:


 Summary: Consolidate ClientManagers in Datanodes for unified 
management
 Key: IOTDB-5312
 URL: https://issues.apache.org/jira/browse/IOTDB-5312
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


The ClientManagers of DataNodes are scattered across different modules. On the one 
hand, the thrift client reuse rate is low. On the other hand, under the 
current blocking I/O (BIO) thread model of the thrift server, thread explosion may occur. This PR 
mainly consolidates the ClientManagers in DataNodes and does some necessary 
refactoring:

* Consolidate ClientManagers in Datanodes for unified management
* Move some clientFactory classes from DataNodeClientPoolFactory to ClientPoolFactory
* Add thrift-related parameters to CommonConfig so that they can be retrieved 
by ClientPoolFactory
* Introduce ThriftClientFactory so that BaseClientFactory is not bound to 
thrift and RatisClientFactory will not depend on thrift-related 
parameters in the future
* Improve clientManager's handling of null by adding necessary checks and 
removing unnecessary ones
* Add invalidation logic for exceptions that occur when an asynchronous client 
fails






[jira] [Created] (IOTDB-5260) Refactoring ClientManager API and Exception

2022-12-21 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5260:


 Summary: Refactoring ClientManager API and Exception
 Key: IOTDB-5260
 URL: https://issues.apache.org/jira/browse/IOTDB-5260
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


* Introduce ClientManagerException so that ClientManager users can 
distinguish borrowClient exceptions from other business exceptions.
* Remove the purelyBorrowClient API to make the ClientManager API clearer.
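
The benefit of a dedicated exception type can be sketched as follows. This is an illustration only: `SimpleClientManager` and `BorrowDemo` are hypothetical stand-ins, and the `ClientManagerException` below is a simplified local class, not the real IoTDB one. The point is that a caller can tell "could not obtain a client" apart from "the request itself failed" by exception type.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified stand-in: signals that borrowing a client itself failed. */
class ClientManagerException extends Exception {
  ClientManagerException(String message) {
    super(message);
  }
}

/** Hypothetical, heavily simplified pool; clients are just strings here. */
class SimpleClientManager {
  private final Deque<String> idleClients = new ArrayDeque<>();

  void returnClient(String client) {
    idleClients.push(client);
  }

  String borrowClient() throws ClientManagerException {
    String client = idleClients.poll(); // null when the pool is empty
    if (client == null) {
      throw new ClientManagerException("no client available");
    }
    return client;
  }
}

class BorrowDemo {
  /** Returns a label describing which kind of failure occurred, if any. */
  static String call(SimpleClientManager manager, boolean businessFails) {
    try {
      String client = manager.borrowClient();
      try {
        if (businessFails) {
          throw new IllegalStateException("request rejected by peer");
        }
        return "ok";
      } finally {
        manager.returnClient(client); // always hand the client back
      }
    } catch (ClientManagerException e) {
      return "borrow-failed"; // pool-level problem, e.g. retry elsewhere
    } catch (IllegalStateException e) {
      return "business-failed"; // the request itself failed
    }
  }

  public static void main(String[] args) {
    SimpleClientManager manager = new SimpleClientManager();
    System.out.println(call(manager, false));
    manager.returnClient("client-1");
    System.out.println(call(manager, true));
  }
}
```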
 





[jira] [Created] (IOTDB-5246) Enhance IoTConsensus field name

2022-12-19 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5246:


 Summary: Enhance IoTConsensus field name
 Key: IOTDB-5246
 URL: https://issues.apache.org/jira/browse/IOTDB-5246
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


rename:
* PendingBatch -> Batch
* TSyncLogReq -> TSyncLogEntriesReq
* TSyncLogRes -> TSyncLogEntriesRes
* TLogBatch -> TLogEntry
* syncLog -> syncLogEntries





[jira] [Created] (IOTDB-5174) Use filename format such as NodeID-Index rather than Endpoint-Index to track follower sync progress

2022-12-11 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-5174:


 Summary: Use filename format such as NodeID-Index rather than 
Endpoint-Index to track follower sync progress 
 Key: IOTDB-5174
 URL: https://issues.apache.org/jira/browse/IOTDB-5174
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


This work not only fixes the bug in this 
[issue|https://github.com/apache/iotdb/issues/8334], but also makes 
future peer ip/port updates easier. In addition, it needs to be compatible with 
version 1.0.
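
As a rough illustration of why a NodeID-based name helps, the sketch below formats and parses a `NodeID-Index` file name. The class name, the `-` separator, and the numeric types are assumptions made for this example; the real on-disk format may differ. Because the name contains no ip or port, it stays valid when a peer's endpoint changes.

```java
/** Hypothetical value type for a "NodeID-Index" progress file name. */
final class SyncProgressFileName {
  final int nodeId;
  final long index;

  SyncProgressFileName(int nodeId, long index) {
    this.nodeId = nodeId;
    this.index = index;
  }

  /** Builds the file name; no ip or port is involved. */
  String format() {
    return nodeId + "-" + index;
  }

  /** Parses a name produced by format(); rejects anything without both parts. */
  static SyncProgressFileName parse(String fileName) {
    int dash = fileName.indexOf('-');
    if (dash <= 0 || dash == fileName.length() - 1) {
      throw new IllegalArgumentException("not a NodeID-Index name: " + fileName);
    }
    int nodeId = Integer.parseInt(fileName.substring(0, dash));
    long index = Long.parseLong(fileName.substring(dash + 1));
    return new SyncProgressFileName(nodeId, index);
  }
}

class SyncProgressFileNameDemo {
  public static void main(String[] args) {
    String name = new SyncProgressFileName(3, 42L).format();
    SyncProgressFileName parsed = SyncProgressFileName.parse(name);
    System.out.println(name + " -> node " + parsed.nodeId + ", index " + parsed.index);
  }
}
```

Compatibility with version 1.0 would additionally require recognizing the older Endpoint-Index names, which is not shown here because their exact format is not given in this issue.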





[jira] [Commented] (IOTDB-4350) [ MultiLeader Throttle Down] Performance does not return to normal after “Throttle Down“

2022-12-11 Thread Xinyu Tan (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17645746#comment-17645746
 ] 

Xinyu Tan commented on IOTDB-4350:
--

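Since the IoTConsensus memory control that takes SyncStatus into account was completed, this phenomenon no longer occurs. I suggest retesting; if there are no problems, let's close the issue.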

> [ MultiLeader  Throttle Down] Performance does not return to normal after 
> “Throttle Down“
> -
>
> Key: IOTDB-4350
> URL: https://issues.apache.org/jira/browse/IOTDB-4350
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: mpp-cluster
>Affects Versions: 0.14.0-SNAPSHOT
>Reporter: 刘珍
>Assignee: 张洪胤
>Priority: Major
> Fix For: 1.0.0
>
> Attachments: image-2022-09-07-14-52-58-266.png, net_restart.conf, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png, 
> screenshot-5.png, screenshot-6.png
>
>
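> m_0905_0095eb3, 3 replicas, 3C3D
> 3 dataregions, with 1 leader on each node.
> ip72 was disconnected from the network for 3 minutes (16:52 ~ 16:55). After checking the cluster status and confirming that the leader switch succeeded,
> ip73 was disconnected for 2 minutes; no further fault operations were performed afterwards.
> Synchronization is slow and multiLeader keeps throttling writes, but performance does not recover after the throttling. The amount of data written per minute (batches in the bm) is shown below: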
>  !screenshot-6.png! 
> IoTDB> select count(latency) from 
> root.result.moresession_2022_09_06_04_47_03.INGESTION where okPoint>0 group 
> by ([1662454041076000186,1662459764764000179),1m);
> +---++
> |Time|count(root.result.moresession_2022_09_06_04_47_03.INGESTION.latency)|
> +---++
> |2022-09-06T16:47:21.076000186+08:00|5544|
> |2022-09-06T16:48:21.076000186+08:00|6282|
> |2022-09-06T16:49:21.076000186+08:00|5671|
> |2022-09-06T16:50:21.076000186+08:00|4589|
> |2022-09-06T16:51:21.076000186+08:00|5350|
> |2022-09-06T16:52:21.076000186+08:00|1121|
> |2022-09-06T16:53:21.076000186+08:00| 901|
> |2022-09-06T16:54:21.076000186+08:00| 201|
> |2022-09-06T16:55:21.076000186+08:00| 334|
> |2022-09-06T16:56:21.076000186+08:00|3501|
> |2022-09-06T16:57:21.076000186+08:00|3677|
> |2022-09-06T16:58:21.076000186+08:00|3111|
> |2022-09-06T16:59:21.076000186+08:00|1948|
> |2022-09-06T17:00:21.076000186+08:00|3889|
> |2022-09-06T17:01:21.076000186+08:00|2982|
> |2022-09-06T17:02:21.076000186+08:00|4465|
> |2022-09-06T17:03:21.076000186+08:00|4871|
> |2022-09-06T17:04:21.076000186+08:00|4478|
> |2022-09-06T17:05:21.076000186+08:00|3242|
> |2022-09-06T17:06:21.076000186+08:00|2545|
> |2022-09-06T17:07:21.076000186+08:00|2579|
> |2022-09-06T17:08:21.076000186+08:00| 133|
> |2022-09-06T17:09:21.076000186+08:00| 488|
> |2022-09-06T17:10:21.076000186+08:00| 253|
> |2022-09-06T17:11:21.076000186+08:00| 445|
> |2022-09-06T17:12:21.076000186+08:00|2122|
> |2022-09-06T17:13:21.076000186+08:00|1799|
> |2022-09-06T17:14:21.076000186+08:00|1568|
> |2022-09-06T17:15:21.076000186+08:00| 
> 

[jira] [Assigned] (IOTDB-5112) IoTConsesus retry timeout util after restart

2022-12-05 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-5112:


Assignee: Xinyu Tan

> IoTConsesus retry timeout util after restart
> 
>
> Key: IOTDB-5112
> URL: https://issues.apache.org/jira/browse/IOTDB-5112
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: mpp-cluster
>Reporter: Chao Wang
>Assignee: Xinyu Tan
>Priority: Major
>
>  error log: waiting target request timeout. current index: 20,  target index: 
> -1.
>  The cause is that when requestCache.size() != MAX_REQUEST_CACHE_SIZE, nextSyncIndex 
> is not reassigned.
>  
>  





[jira] [Assigned] (IOTDB-4855) [MultiLeader] Strength the memory control

2022-11-16 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-4855:


Assignee: Xinyu Tan  (was: 张洪胤)

> [MultiLeader] Strength the memory control
> -
>
> Key: IOTDB-4855
> URL: https://issues.apache.org/jira/browse/IOTDB-4855
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: 张洪胤
>Assignee: Xinyu Tan
>Priority: Major
>  Labels: pull-request-available
>
> We need to strengthen multiLeader memory control and take into account the size of 
> syncStatus and of the pendingBatch read from the WAL.





[jira] [Created] (IOTDB-3559) Add metrics for the consensus module

2022-06-20 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3559:


 Summary: Add metrics for the consensus module
 Key: IOTDB-3559
 URL: https://issues.apache.org/jira/browse/IOTDB-3559
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (IOTDB-3570) Extend the Peer structure of the consensus layer to embed the ID of the upper layer

2022-06-20 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3570:


 Summary: Extend the Peer structure of the consensus layer to embed 
the ID of the upper layer
 Key: IOTDB-3570
 URL: https://issues.apache.org/jira/browse/IOTDB-3570
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan


Currently, the Peer returned by the getLeader interface only contains the IP 
address and port of the Leader node's consensus layer. 
However, the upper layer may actually want the service port that its own 
RPC can connect to, such as internalService. 
Therefore, we can consider extending the Peer structure so that it can 
carry a business-defined ID. The upper layer defines this ID, with its own 
business semantics, for each Peer at AddConsensusGroup and gets it back from 
getLeader, which reduces the coding burden on the upper layer. For 
example, DataNode can encode the TEndpoint of internalService into the ID.
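
The proposal can be sketched roughly as below. All types here (`Endpoint`, `Peer`, `ToyConsensus`) are simplified, hypothetical stand-ins rather than the real consensus-layer classes, and the addresses and ports are example values. The consensus layer stores the ID without interpreting it and returns it unchanged with the leader.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain ip/port pair; a stand-in for a thrift endpoint type. */
final class Endpoint {
  final String ip;
  final int port;

  Endpoint(String ip, int port) {
    this.ip = ip;
    this.port = port;
  }
}

/** Hypothetical Peer that carries an opaque, caller-defined ID next to its consensus address. */
final class Peer<I> {
  final Endpoint consensusEndpoint;
  final I businessId;

  Peer(Endpoint consensusEndpoint, I businessId) {
    this.consensusEndpoint = consensusEndpoint;
    this.businessId = businessId;
  }
}

/** Toy consensus facade: it keeps the ID untouched and returns it with the leader. */
final class ToyConsensus<I> {
  private final Map<String, Peer<I>> leaders = new HashMap<>();

  /** For brevity, the peer passed here is simply recorded as the group's leader. */
  void addConsensusGroup(String groupId, Peer<I> leader) {
    leaders.put(groupId, leader);
  }

  /** Returns the recorded leader, or null for an unknown group. */
  Peer<I> getLeader(String groupId) {
    return leaders.get(groupId);
  }
}

class PeerIdDemo {
  public static void main(String[] args) {
    ToyConsensus<Endpoint> consensus = new ToyConsensus<>();
    Endpoint consensusAddr = new Endpoint("10.0.0.1", 40010); // example values
    Endpoint internalService = new Endpoint("10.0.0.1", 9003); // example values
    consensus.addConsensusGroup("DataRegion-1", new Peer<>(consensusAddr, internalService));

    // The upper layer gets back the endpoint it can actually call.
    Endpoint target = consensus.getLeader("DataRegion-1").businessId;
    System.out.println(target.ip + ":" + target.port);
  }
}
```

Making the ID a type parameter keeps the consensus layer free of upper-layer types; an opaque byte array would be an alternative with the same effect.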





[jira] [Created] (IOTDB-3569) Use iterator batch interface under MultiLeaderConsensus to get logs from WAL logs at high speed

2022-06-20 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3569:


 Summary: Use iterator batch interface under MultiLeaderConsensus 
to get logs from WAL logs at high speed
 Key: IOTDB-3569
 URL: https://issues.apache.org/jira/browse/IOTDB-3569
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan


Using the batch interface directly may result in OOM
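
A minimal sketch of the iterator-style batching idea, assuming the WAL can be exposed as a plain `Iterator` of entries. `BatchingIterator` is a hypothetical name and not an existing IoTDB class. Only one bounded batch is materialized at a time, so memory use does not grow with the total number of pending entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical adapter: pulls entries lazily and groups them into bounded batches. */
final class BatchingIterator<T> implements Iterator<List<T>> {
  private final Iterator<T> source;
  private final int maxBatchSize;

  BatchingIterator(Iterator<T> source, int maxBatchSize) {
    if (maxBatchSize <= 0) {
      throw new IllegalArgumentException("maxBatchSize must be positive");
    }
    this.source = source;
    this.maxBatchSize = maxBatchSize;
  }

  @Override
  public boolean hasNext() {
    return source.hasNext();
  }

  /** Returns the next batch, holding at most maxBatchSize entries in memory. */
  @Override
  public List<T> next() {
    if (!source.hasNext()) {
      throw new NoSuchElementException();
    }
    List<T> batch = new ArrayList<>(maxBatchSize);
    while (batch.size() < maxBatchSize && source.hasNext()) {
      batch.add(source.next());
    }
    return batch;
  }
}

class BatchingIteratorDemo {
  public static void main(String[] args) {
    Iterator<Integer> walEntries = Arrays.asList(1, 2, 3, 4, 5).iterator();
    BatchingIterator<Integer> batches = new BatchingIterator<>(walEntries, 2);
    while (batches.hasNext()) {
      System.out.println(batches.next());
    }
  }
}
```

The sketch bounds a batch by entry count for simplicity; bounding by serialized size in bytes would be closer to what memory control needs.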





[jira] [Created] (IOTDB-3568) Support linearizable read for RatisConsensus

2022-06-20 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3568:


 Summary: Support linearizable read for RatisConsensus
 Key: IOTDB-3568
 URL: https://issues.apache.org/jira/browse/IOTDB-3568
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan


* We can contribute to the Ratis community to support linearizable reads
* It is also possible to add additional coordination logic on top of 
RatisConsensus to provide linearizable reads





[jira] [Created] (IOTDB-3564) Reduce the number of I/O threads using thrift asynchronous server mode for MultiLeaderConsensusRPC

2022-06-20 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3564:


 Summary: Reduce the number of I/O threads using thrift 
asynchronous server mode for MultiLeaderConsensusRPC
 Key: IOTDB-3564
 URL: https://issues.apache.org/jira/browse/IOTDB-3564
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan


Consider using selector mode or HsHa (half-sync/half-async) mode and abstracting out the corresponding 
parameters





[jira] [Created] (IOTDB-3561) Support snapshot transfer under MultiLeaderConsensus

2022-06-20 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3561:


 Summary: Support snapshot transfer under MultiLeaderConsensus
 Key: IOTDB-3561
 URL: https://issues.apache.org/jira/browse/IOTDB-3561
 Project: Apache IoTDB
  Issue Type: New Feature
Reporter: Xinyu Tan


* On the one hand, we can delete WAL that has not yet been synchronized once a snapshot is taken. On the 
other hand, an old node can catch up faster.
* In addition, member changes must be transferred through a snapshot because the 
corresponding WAL may have been deleted





[jira] [Assigned] (IOTDB-3548) [cluster]can not create timeseries when start 3C2D

2022-06-20 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-3548:


Assignee: (was: Xinyu Tan)

> [cluster]can not create timeseries when start 3C2D
> --
>
> Key: IOTDB-3548
> URL: https://issues.apache.org/jira/browse/IOTDB-3548
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: Core/Cluster
>Affects Versions: 0.14.0-SNAPSHOT
>Reporter: FengQingxin
>Priority: Major
> Attachments: iotdb-confignode.properties, iotdb-engine.properties, 
> log_all.log
>
>
>  
> commit c42cfe5fbee50b24cc1a1078cd5af1ee69930881
> Author: YongzaoDan <33111881+crzbulab...@users.noreply.github.com>
> Date:   Mon Jun 20 13:54:12 2022 +0800
>     [IOTDB-3510] Read/Write Routing policy (Routing to DataNode with the 
> lowest-loaded) (#6308)
> Reproduce steps:
> 1. Modify the config file for 3C3D:
> schema_replication_factor=3
> data_replication_factor=1
> 2. Start 3C
> 3. Start 2D
> 4. Use iotdb-cli to execute the SQL below:
> set storage group to root.sg;
> create timeseries root.sg.d.s1 with 
> datatype=INT32,encoding=RLE,compression=snappy;
> create timeseries root.sg.d.s2 with 
> datatype=INT32,encoding=RLE,compression=snappy;
> create timeseries root.sg.d.s3 with 
> datatype=INT32,encoding=RLE,compression=snappy;
> insert into root.sg.d(time,s1,s2,s3) values(1,1,2,3);
> insert into root.sg.d(time,s1,s2,s3) values(2,1,2,3);
> 5. The following error message is returned:
> Msg: 500: [INTERNAL_SERVER_ERROR(500)] Exception occurred: "create timeseries 
> root.sg.d.s1 with datatype=INT32,encoding=RLE,compression=snappy". 
> executeStatement failed. null
> !image-2022-06-20-17-13-31-196.png!
>  
>  





[jira] [Assigned] (IOTDB-3551) [ thread ] Thread control is required

2022-06-20 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-3551:


Assignee: (was: Xinyu Tan)

> [ thread ] Thread control is required
> -
>
> Key: IOTDB-3551
> URL: https://issues.apache.org/jira/browse/IOTDB-3551
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: mpp-cluster
>Affects Versions: 0.14.0-SNAPSHOT
>Reporter: 刘珍
>Priority: Major
> Attachments: stack_1.out
>
>
> On a 72-CPU machine with 21 dataregions and 1 schemaregion, a single datanode process has Threads: 1200 total; thread control is needed.
>  TAsyncClientManager : 378
>  Compaction related: 72
>  Flush related: 76
> MultiLeaderConsensusRPC : 65
> WAL related: 43
> LogDispatcher : 42
> grpc-default-worker-ELG : 72
> Threads named like 20220620_090240_53823: 161
> See the attachment stack_1.out for details.
> Database configuration parameters:
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> schema_replication_factor=3
> data_replication_factor=3





[jira] [Created] (IOTDB-3554) Controls the number of rpc threads under the MultiLeaderConsensus

2022-06-20 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3554:


 Summary: Controls the number of rpc threads under the 
MultiLeaderConsensus
 Key: IOTDB-3554
 URL: https://issues.apache.org/jira/browse/IOTDB-3554
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


* Make all regions share a clientManager
 * Reduce the number of pipelines because concurrency in the same region is 
generally not very large





[jira] [Created] (IOTDB-3513) Avoid double-writing of the write ahead log for data under RatisConsensus

2022-06-15 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3513:


 Summary: Avoid double-writing of the write ahead log for data 
under RatisConsensus
 Key: IOTDB-3513
 URL: https://issues.apache.org/jira/browse/IOTDB-3513
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan








[jira] [Created] (IOTDB-3448) Migrate the logic of deleteRegion onto the consensus module

2022-06-09 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3448:


 Summary: Migrate the logic of deleteRegion onto the consensus 
module
 Key: IOTDB-3448
 URL: https://issues.apache.org/jira/browse/IOTDB-3448
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


Currently, the deletion of a region is replicated inside the region as a raft log entry.

If the underlying state machine fails to recover to its previous state after 
a restart, NPE problems may occur during the restart.

In addition, it is strange for the consensus layer to execute a raft log entry 
that removes the region itself, because the corresponding region still has to be 
removed from the consensus layer afterwards, which the current implementation does not do.

So we can move the deleteRegion operation above the consensus layer.





[jira] [Created] (IOTDB-3446) Deleting a storage group requires that datanode delete all data and wal files and directories related to the storage group

2022-06-09 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3446:


 Summary: Deleting a storage group requires that datanode delete 
all data and wal  files and directories related to the storage group
 Key: IOTDB-3446
 URL: https://issues.apache.org/jira/browse/IOTDB-3446
 Project: Apache IoTDB
  Issue Type: Bug
Reporter: Xinyu Tan
 Attachments: image-2022-06-10-11-28-47-233.png

!image-2022-06-10-11-28-47-233.png!

Yet none of them have been deleted





[jira] [Created] (IOTDB-3445) Deleting a storage group requires that datanode delete all metadata and directories related to the storage group

2022-06-09 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3445:


 Summary: Deleting a storage group requires that datanode delete 
all metadata and directories related to the storage group
 Key: IOTDB-3445
 URL: https://issues.apache.org/jira/browse/IOTDB-3445
 Project: Apache IoTDB
  Issue Type: Bug
Reporter: Xinyu Tan
 Attachments: image-2022-06-10-11-26-26-980.png

Currently, however, only files can be deleted, not directories

!image-2022-06-10-11-26-26-980.png!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (IOTDB-3382) Adjust the default preAllocateSize of the ratisConsensus RaftLog

2022-06-08 Thread Xinyu Tan (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551571#comment-17551571
 ] 

Xinyu Tan commented on IOTDB-3382:
--

 [Analysis doc|https://apache-iotdb.feishu.cn/docx/doxcn4CnBOLzbOwmkpDeitTHqOg] 

> Adjust the default preAllocateSize of the ratisConsensus RaftLog
> 
>
> Key: IOTDB-3382
> URL: https://issues.apache.org/jira/browse/IOTDB-3382
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Xinyu Tan
>Assignee: Song Ziyang
>Priority: Major
> Attachments: image-2022-06-02-16-36-35-968.png
>
>
> This needs some theoretical analysis and possibly some testing.
> !image-2022-06-02-16-36-35-968.png!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (IOTDB-3382) Adjust the default preAllocateSize of the ratisConsensus RaftLog

2022-06-08 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-3382:


Assignee: Song Ziyang  (was: Xinyu Tan)

> Adjust the default preAllocateSize of the ratisConsensus RaftLog
> 
>
> Key: IOTDB-3382
> URL: https://issues.apache.org/jira/browse/IOTDB-3382
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Xinyu Tan
>Assignee: Song Ziyang
>Priority: Major
> Attachments: image-2022-06-02-16-36-35-968.png
>
>
> This needs some theoretical analysis and possibly some testing.
> !image-2022-06-02-16-36-35-968.png!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (IOTDB-3395) Use thrift server to fix clientManagerTest bind address already used issue

2022-06-05 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3395:


 Summary: Use thrift server to fix clientManagerTest bind address 
already used issue
 Key: IOTDB-3395
 URL: https://issues.apache.org/jira/browse/IOTDB-3395
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (IOTDB-3359) Refactor the serialization interface for the consensus layer to avoid hard-coding size ByteBuffers

2022-06-02 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-3359:


Assignee: Xinyu Tan

> Refactor the serialization interface for the consensus layer to avoid 
> hard-coding size ByteBuffers
> --
>
> Key: IOTDB-3359
> URL: https://issues.apache.org/jira/browse/IOTDB-3359
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Xinyu Tan
>Assignee: Xinyu Tan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (IOTDB-3386) Avoid the double-write problem of raftlog and write-ahead log at the Datanode consensus layer

2022-06-02 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3386:


 Summary: Avoid the double-write problem of raftlog and write-ahead 
log at the Datanode consensus layer
 Key: IOTDB-3386
 URL: https://issues.apache.org/jira/browse/IOTDB-3386
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


see [doc|https://apache-iotdb.feishu.cn/docs/doccnuowRHp8qgyDOBFdSfsxUw1]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (IOTDB-3385) Reduce the serialization size for the Datanode consensus layer

2022-06-02 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-3385:


Assignee: Xinyu Tan

> Reduce the serialization size for the Datanode consensus layer
> --
>
> Key: IOTDB-3385
> URL: https://issues.apache.org/jira/browse/IOTDB-3385
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Xinyu Tan
>Assignee: Xinyu Tan
>Priority: Major
> Attachments: image-2022-06-02-17-59-20-779.png
>
>
> The DataNode currently uses FI to pass changes to the consensus layer, but 
> its serialized form contains many unnecessary parts, such as the replication 
> group endpoints. As a result it writes much more data than the WAL or MLOG, 
> which hurts performance. We need to consider how to reduce its size.
> !image-2022-06-02-17-59-20-779.png!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (IOTDB-3385) Reduce the serialization size for the Datanode consensus layer

2022-06-02 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3385:


 Summary: Reduce the serialization size for the Datanode consensus 
layer
 Key: IOTDB-3385
 URL: https://issues.apache.org/jira/browse/IOTDB-3385
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
 Attachments: image-2022-06-02-17-59-20-779.png

The DataNode currently uses FI to pass changes to the consensus layer, but its 
serialized form contains many unnecessary parts, such as the replication group 
endpoints. As a result it writes much more data than the WAL or MLOG, which 
hurts performance. We need to consider how to reduce its size.

!image-2022-06-02-17-59-20-779.png!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (IOTDB-3382) Adjust the default preAllocateSize of the ratisConsensus RaftLog

2022-06-02 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-3382:


Assignee: Xinyu Tan

> Adjust the default preAllocateSize of the ratisConsensus RaftLog
> 
>
> Key: IOTDB-3382
> URL: https://issues.apache.org/jira/browse/IOTDB-3382
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Xinyu Tan
>Assignee: Xinyu Tan
>Priority: Major
> Attachments: image-2022-06-02-16-36-35-968.png
>
>
> This needs some theoretical analysis and possibly some testing.
> !image-2022-06-02-16-36-35-968.png!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (IOTDB-3382) Adjust the default preAllocateSize of the ratisConsensus RaftLog

2022-06-02 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3382:


 Summary: Adjust the default preAllocateSize of the ratisConsensus 
RaftLog
 Key: IOTDB-3382
 URL: https://issues.apache.org/jira/browse/IOTDB-3382
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
 Attachments: image-2022-06-02-16-36-35-968.png

This needs some theoretical analysis and possibly some testing.

!image-2022-06-02-16-36-35-968.png!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (IOTDB-3359) Refactor the serialization interface for the consensus layer to avoid hard-coding size ByteBuffers

2022-05-31 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3359:


 Summary: Refactor the serialization interface for the consensus 
layer to avoid hard-coding size ByteBuffers
 Key: IOTDB-3359
 URL: https://issues.apache.org/jira/browse/IOTDB-3359
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Reopened] (IOTDB-3195) Added a configuration interface for the consensus layer

2022-05-29 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reopened IOTDB-3195:
--

> Added a configuration interface for the consensus layer
> ---
>
> Key: IOTDB-3195
> URL: https://issues.apache.org/jira/browse/IOTDB-3195
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Xinyu Tan
>Assignee: Xinyu Tan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (IOTDB-3240) [ErrorMSG]java.lang.IllegalStateException: Client has an error!Caused by: java.net.ConnectException: Connection refused

2022-05-26 Thread Xinyu Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Tan reassigned IOTDB-3240:


Assignee: Xinyu Tan  (was: Quan Siyi)

> [ErrorMSG]java.lang.IllegalStateException: Client has an error!Caused by: 
> java.net.ConnectException: Connection refused
> ---
>
> Key: IOTDB-3240
> URL: https://issues.apache.org/jira/browse/IOTDB-3240
> Project: Apache IoTDB
>  Issue Type: Bug
>  Components: Core/Cluster
>Affects Versions: 0.14.0-SNAPSHOT
>Reporter: FengQingxin
>Assignee: Xinyu Tan
>Priority: Major
> Attachments: image-2022-05-19-20-22-13-041.png, 
> image-2022-05-19-20-24-38-004.png
>
>
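> [ErrorMSG] When starting a 3C3D cluster with the default config file, there 
> is an error in the log of the leader ConfigNode.
> Steps to reproduce:
> 1. Make three copies of the compiled distribution (default configuration).
> 2. In the sbin directory under the confignode folder, start three ConfigNodes with start-confignode.sh (works normally).
> 3. In the sbin directory under the datanode folder, start three DataNodes with start-datanode.sh (the DataNode logs are normal; the leader ConfigNode logs the following error):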
> java.lang.IllegalStateException: Client has an error!
>         at org.apache.thrift.async.TAsyncClient.checkReady(TAsyncClient.java:83)
>         at org.apache.iotdb.commons.client.async.AsyncDataNodeInternalServiceClient.isReady(AsyncDataNodeInternalServiceClient.java:109)
>         at org.apache.iotdb.commons.client.async.AsyncDataNodeInternalServiceClient$Factory.validateObject(AsyncDataNodeInternalServiceClient.java:154)
>         at org.apache.iotdb.commons.client.async.AsyncDataNodeInternalServiceClient$Factory.validateObject(AsyncDataNodeInternalServiceClient.java:122)
>         at org.apache.commons.pool2.impl.GenericKeyedObjectPool.returnObject(GenericKeyedObjectPool.java:1470)
>         at org.apache.iotdb.commons.client.ClientManager.returnClient(ClientManager.java:70)
>         at org.apache.iotdb.commons.client.async.AsyncDataNodeInternalServiceClient.returnSelf(AsyncDataNodeInternalServiceClient.java:83)
>         at org.apache.iotdb.commons.client.async.AsyncDataNodeInternalServiceClient.onError(AsyncDataNodeInternalServiceClient.java:104)
>         at org.apache.thrift.async.TAsyncMethodCall.onError(TAsyncMethodCall.java:215)
>         at org.apache.thrift.async.TAsyncMethodCall.transition(TAsyncMethodCall.java:210)
>         at org.apache.thrift.async.TAsyncClientManager$SelectThread.transitionMethods(TAsyncClientManager.java:143)
>         at org.apache.thrift.async.TAsyncClientManager$SelectThread.run(TAsyncClientManager.java:113)
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at org.apache.thrift.transport.TNonblockingSocket.finishConnect(TNonblockingSocket.java:217)
>         at org.apache.thrift.async.TAsyncMethodCall.doConnecting(TAsyncMethodCall.java:279)
>         at org.apache.thrift.async.TAsyncMethodCall.transition(TAsyncMethodCall.java:189)
>         ... 2 common frames omitted
>  
> !image-2022-05-19-20-22-13-041.png!
>  
>  
> Expected: no error messages; the cluster starts normally.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (IOTDB-3195) Added a configuration interface for the consensus layer

2022-05-15 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3195:


 Summary: Added a configuration interface for the consensus layer
 Key: IOTDB-3195
 URL: https://issues.apache.org/jira/browse/IOTDB-3195
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (IOTDB-3188) Multi leader consensus algorithm implementation

2022-05-15 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3188:


 Summary: Multi leader consensus algorithm implementation
 Key: IOTDB-3188
 URL: https://issues.apache.org/jira/browse/IOTDB-3188
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (IOTDB-3167) Add wait_leader_ready logic for RatisConsensus

2022-05-11 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3167:


 Summary: Add wait_leader_ready logic for RatisConsensus
 Key: IOTDB-3167
 URL: https://issues.apache.org/jira/browse/IOTDB-3167
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan


In the Ratis implementation, a newly elected leader needs to commit a log entry 
with the latest configuration in order to commit the operations of the previous 
term. Until that log entry is applied, a write may still fail even though the 
node is already the leader, so we need to add blocking wait logic.
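
The blocking wait described above could look roughly like the sketch below. It is only an illustration under stated assumptions: LeaderReadyWaiter, waitUntilLeaderReady and the isLeaderReady supplier are hypothetical names, and how readiness is actually queried from Ratis is deliberately left abstract behind a BooleanSupplier rather than tied to any real Ratis call.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of IoTDB or Ratis: blocks until the leader is ready. */
final class LeaderReadyWaiter {

  private LeaderReadyWaiter() {}

  /**
   * Polls isLeaderReady until it returns true or the timeout elapses.
   *
   * @return true if the leader became ready in time, false if the wait timed out
   */
  static boolean waitUntilLeaderReady(
      BooleanSupplier isLeaderReady, Duration timeout, Duration pollInterval)
      throws InterruptedException {
    final long deadline = System.nanoTime() + timeout.toNanos();
    while (!isLeaderReady.getAsBoolean()) {
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return false; // Give up; the caller decides whether to retry or fail the write.
      }
      // Sleep for one poll interval, but never past the deadline.
      long remainingMillis = Math.max(1, remainingNanos / 1_000_000);
      Thread.sleep(Math.min(pollInterval.toMillis(), remainingMillis));
    }
    return true;
  }
}
```

A write path would call waitUntilLeaderReady before submitting the request and reject or retry the write if it returns false. Polling with a bounded timeout is just one option; a condition variable signalled when the configuration entry is applied would avoid the polling delay.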



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (IOTDB-3104) Add Consensus Module StateMachine Event API

2022-05-05 Thread Xinyu Tan (Jira)
Xinyu Tan created IOTDB-3104:


 Summary: Add Consensus Module StateMachine Event API 
 Key: IOTDB-3104
 URL: https://issues.apache.org/jira/browse/IOTDB-3104
 Project: Apache IoTDB
  Issue Type: Wish
Reporter: Xinyu Tan
Assignee: Xinyu Tan


We can register some event APIs on the state machine so that it can handle 
upper-level changes specifically.

Currently we will support two APIs:
 * notifyLeaderChanged: Notify the {@link IStateMachine} that a new leader has 
been elected. Note that the new leader can possibly be this server.
 * notifyConfigurationChanged: Notify the {@link IStateMachine} of a configuration 
change. This method will be invoked when a new configuration is processed.
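
The two callbacks listed above could be shaped roughly as follows. This is a sketch only: the interface name StateMachineEvents, the parameter lists (group id, leader id, term, index, peer list as strings) and the RecordingStateMachine example are assumptions made for illustration, since the issue names the two methods but not their signatures.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event API; the parameter types are assumptions, not IoTDB's actual ones. */
interface StateMachineEvents {

  /** Called when a new leader has been elected; the new leader may be this server. */
  default void notifyLeaderChanged(String groupId, String newLeaderId) {}

  /** Called when a new configuration (set of peers) has been processed. */
  default void notifyConfigurationChanged(long term, long index, List<String> newPeers) {}
}

/** Example state machine that only records the events it receives. */
final class RecordingStateMachine implements StateMachineEvents {

  final List<String> events = new ArrayList<>();

  @Override
  public void notifyLeaderChanged(String groupId, String newLeaderId) {
    events.add("leader:" + groupId + ":" + newLeaderId);
  }

  @Override
  public void notifyConfigurationChanged(long term, long index, List<String> newPeers) {
    events.add("config:" + term + ":" + index + ":" + newPeers);
  }
}
```

Declaring the callbacks as default no-op methods means existing state machines keep compiling and only override the events they care about.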

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

