[jira] [Created] (IOTDB-6095) Tsfiles in sequence space may overlap with each other due to LastFlushTime bug
Jinrui Zhang created IOTDB-6095: --- Summary: Tsfiles in sequence space may overlap with each other due to LastFlushTime bug Key: IOTDB-6095 URL: https://issues.apache.org/jira/browse/IOTDB-6095 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang Fix For: 1.2.1 This issue may lead to overlapped TsFiles in sequence space. For example, when recovering the last flush time map from two sequence TsFiles: * TsFile A only contains device1 with end time = 1 * TsFile B only contains device2 with end time = And the resources of these two TsFiles have been downgraded to FileTimeIndex. The previous code will use TsFile B with end time = to recover the last flush time of device1, which would cause sequence files to overlap. This is due to a bug in the recovery step of LastFlushTime maintenance -- This message was sent by Atlassian Jira (v8.20.10#820010)
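The recovery flaw described above can be sketched in a few lines. This is a minimal illustration, not IoTDB's actual code: the class and function names are made up, and since TsFile B's end time is elided in the report, it is assumed to be 0 here purely so the example runs.

```python
# Illustrative sketch of the recovery flaw; names are made up, not IoTDB's API.
# TsFile B's end time is elided in the report above; 0 is assumed for illustration.

class TsFileResource:
    """Stand-in for a sequence TsFile resource degraded to FileTimeIndex:
    the per-device list is lost, only a file-level end time remains."""
    def __init__(self, end_time):
        self.end_time = end_time

def recover_buggy(resources, devices):
    # Flawed recovery: walk files in order and let every degraded file
    # overwrite the entry for every device it *may* contain, so the last
    # file wins even when an earlier file holds a later timestamp.
    last_flush = {}
    for res in resources:
        for d in devices:  # a degraded index cannot exclude any device
            last_flush[d] = res.end_time
    return last_flush

def recover_fixed(resources, devices):
    # Fix: keep the maximum candidate end time per device instead.
    last_flush = {}
    for res in resources:
        for d in devices:
            last_flush[d] = max(last_flush.get(d, float("-inf")), res.end_time)
    return last_flush

# TsFile A holds device1 with end time 1; TsFile B holds only device2.
file_a, file_b = TsFileResource(1), TsFileResource(0)
devices = ["device1", "device2"]
print(recover_buggy([file_a, file_b], devices)["device1"])  # 0 (wrong)
print(recover_fixed([file_a, file_b], devices)["device1"])  # 1 (correct)
# With the wrong value 0, a new point for device1 at t = 1 is judged
# "sequence" and flushed into a new sequence file overlapping TsFile A.
```

Under the assumed numbers, the buggy map makes later in-range writes look like sequence data, which is exactly how the overlapping sequence files arise.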
[jira] [Assigned] (IOTDB-5964) Add correctness check for target file of compaction
[ https://issues.apache.org/jira/browse/IOTDB-5964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5964: --- Sprint: 2023-3-Storage Assignee: 周沛辰 Description: # Add correctness check for target file of compaction. We can use TsFileSequenceRead or some other ways to do the basic file check. # Make this feature controllable by configuration and disabled by default Summary: Add correctness check for target file of compaction (was: Add) > Add correctness check for target file of compaction > --- > > Key: IOTDB-5964 > URL: https://issues.apache.org/jira/browse/IOTDB-5964 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: 周沛辰 >Priority: Major > > # Add correctness check for target file of compaction. We can use > TsFileSequenceRead or some other ways to do the basic file check. > # Make this feature controllable by configuration and disabled by > default -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5964) Add
Jinrui Zhang created IOTDB-5964: --- Summary: Add Key: IOTDB-5964 URL: https://issues.apache.org/jira/browse/IOTDB-5964 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5928) DeadLock between TTL and Compaction
Jinrui Zhang created IOTDB-5928: --- Summary: DeadLock between TTL and Compaction Key: IOTDB-5928 URL: https://issues.apache.org/jira/browse/IOTDB-5928 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang Fix For: 1.1.1 h4. Version {panel} Enterprise version 1.1.1-SNAPSHOT (Build: a8387f1) {panel} h4. Reproduction Steps {panel} Problem description: concurrent TTL and compaction cause a deadlock, and data can no longer be written (with no error message). Test procedure: 1. Test version Enterprise version 1.1.1-SNAPSHOT (Build: a8387f1). Start a 3-replica 3C5D cluster; the configuration parameters below take ip74 as an example: liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/confignode-env.sh MAX_HEAP_SIZE="8G" liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/iotdb-confignode.properties cn_internal_address=192.168.10.74 cn_target_config_node_list=192.168.10.72:10710 cn_connection_timeout_ms=12 cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9081 liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/datanode-env.sh MAX_HEAP_SIZE="256G" MAX_DIRECT_MEMORY_SIZE="32G" liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/iotdb-datanode.properties dn_rpc_address=192.168.10.74 dn_internal_address=192.168.10.74 dn_target_config_node_list=192.168.10.72:10710,192.168.10.73:10710,192.168.10.74:10710 dn_connection_timeout_ms=12 dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/iotdb-common.properties schema_replication_factor=3 data_replication_factor=3 series_slot_num=1000 schema_region_group_extension_policy=CUSTOM default_schema_region_group_num_per_database=10 data_region_group_extension_policy=CUSTOM default_data_region_group_num_per_database=20 disk_space_warning_threshold=0.01 query_timeout_threshold=3600 iot_consensus_throttle_threshold_in_byte=536870912000 *2. Start benchmark reads and writes; see the attached configuration file 0517_rc4_lt.conf* *3. Start the TTL script; see the attached configuration file set_ttl.sh* {*}Every 48 hours{*}, first set the cluster to READONLY, then set a TTL to delete all TsFiles (unflushed, unsealed TsFiles are not deleted), unset the TTL, and set the cluster back to RUNNING. ({*}The benchmark clients keep reading and writing throughout.{*}) !image-2023-05-25-15-07-11-025.png! 
*4.{color:#de350b}After running for 4 days, a deadlock occurred and data could not be written.{color}* Monitoring shows write points per second is 0. !image-2023-05-25-15-02-55-676.png! {panel} h4. Bug Symptom {panel} Concurrent TTL and compaction cause a deadlock {panel} h4. Expected Result {panel} No deadlock {panel} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5859) Compaction error when using Version as first sort dimension
Jinrui Zhang created IOTDB-5859: --- Summary: Compaction error when using Version as first sort dimension Key: IOTDB-5859 URL: https://issues.apache.org/jira/browse/IOTDB-5859 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: 周沛辰 Fix For: 1.1.1 In the current implementation of compaction, the default sort dimension when selecting compaction tasks is the file version. This leads to compaction errors in the TsFile load scenario, because a TsFile with a higher version may not have greater timestamps when the TsFile is loaded by external tools. Solution: change the sort dimension from file version to timestamp. -- This message was sent by Atlassian Jira (v8.20.10#820010)
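A minimal sketch of why the sort dimension matters. The names (`CandidateFile`, the fields, and the concrete numbers) are assumptions for illustration, not IoTDB's actual API: a loaded TsFile receives a fresh, higher version number even though its data timestamps may be older.

```python
# Sketch: file version order vs. data timestamp order for compaction candidates.
from dataclasses import dataclass

@dataclass
class CandidateFile:
    version: int     # assigned at creation/load time, monotonically increasing
    start_time: int  # minimum data timestamp in the file

def sort_by_version(files):
    # Previous behavior: later version assumed to mean later data.
    return sorted(files, key=lambda f: f.version)

def sort_by_timestamp(files):
    # Proposed fix: order by the data itself.
    return sorted(files, key=lambda f: f.start_time)

# A natively written file (old version, recent data) and a file loaded by a
# tool (new version, historical data).
native = CandidateFile(version=10, start_time=5_000)
loaded = CandidateFile(version=42, start_time=100)

# Version order places the historical data *after* the recent data, so any
# compaction logic assuming "later in the list => later timestamps" breaks.
print([f.version for f in sort_by_version([native, loaded])])    # [10, 42]
print([f.version for f in sort_by_timestamp([native, loaded])])  # [42, 10]
```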
[jira] [Assigned] (IOTDB-5843) Write operation won't be rejected even if the NodeStatus is ReadOnly
[ https://issues.apache.org/jira/browse/IOTDB-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5843: --- Assignee: Song Ziyang > Write operation won't be rejected even if the NodeStatus is ReadOnly > > > Key: IOTDB-5843 > URL: https://issues.apache.org/jira/browse/IOTDB-5843 > Project: Apache IoTDB > Issue Type: Bug >Reporter: Jinrui Zhang >Assignee: Song Ziyang >Priority: Major > Attachments: image-2023-05-06-19-15-45-109.png > > > *Description:* > # Ctrl + C to stop a running DataNode > # `show cluster` shows that the DataNode is ReadOnly > # writes towards this DataNode won't be rejected > *Analysis:* > When using Ctrl+C to stop a DataNode, the shutdown hook is invoked, and > inside the shutdown hook the DataNode is set to status `Stopping` and > `Readonly`. > Because of the change in PR > [https://github.com/apache/iotdb/pull/9274/files,] isReadOnly() will > return false in DataRegionStatemachine, so write > operations won't be rejected. > !image-2023-05-06-19-15-45-109.png|width=550,height=182! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5843) Write operation won't be rejected even if the NodeStatus
Jinrui Zhang created IOTDB-5843: --- Summary: Write operation won't be rejected even if the NodeStatus Key: IOTDB-5843 URL: https://issues.apache.org/jira/browse/IOTDB-5843 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5827) Change default multi_dir_strategy to SequenceStrategy and fix original bug
Jinrui Zhang created IOTDB-5827: --- Summary: Change default multi_dir_strategy to SequenceStrategy and fix original bug Key: IOTDB-5827 URL: https://issues.apache.org/jira/browse/IOTDB-5827 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Fix For: 1.1.1 # change the default multi_dir_strategy to SequenceStrategy # fix the original bug in SequenceStrategy where one folder won't be used if the other folders' space is limited # use {{diskSpaceWarningThreshold}} to decide whether a folder is full or not -- This message was sent by Atlassian Jira (v8.20.10#820010)
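Items 2 and 3 above can be sketched roughly as follows. All function and variable names here are hypothetical, not IoTDB's actual SequenceStrategy implementation; the point is only that a "full" folder (free ratio below the warning threshold) is skipped rather than ending the rotation.

```python
# Hypothetical sketch of a round-robin folder strategy that skips full folders.

def next_folder(folders, free_ratio, start_index, threshold=0.05):
    """Return (folder, next_cursor), or (None, start_index) if every folder
    is below the warning threshold.

    folders     -- list of data dirs
    free_ratio  -- dict folder -> free-space ratio (0.0 - 1.0)
    start_index -- rotation cursor carried over from the previous call
    """
    n = len(folders)
    for step in range(n):
        i = (start_index + step) % n
        if free_ratio[folders[i]] >= threshold:  # not "full": usable
            return folders[i], (i + 1) % n
    return None, start_index  # all folders full => caller may go read-only

dirs = ["/data1", "/data2", "/data3"]
ratios = {"/data1": 0.50, "/data2": 0.01, "/data3": 0.30}  # /data2 is full
f1, cur = next_folder(dirs, ratios, 0)
f2, cur = next_folder(dirs, ratios, cur)
print(f1, f2)  # /data1 /data3 -- the full folder is skipped, rotation continues
```

In the buggy behavior being fixed, one constrained folder could stop the rotation entirely; here it is simply passed over.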
[jira] [Commented] (IOTDB-4593) [Remove-DataNode] Removing nodes writes data
[ https://issues.apache.org/jira/browse/IOTDB-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714495#comment-17714495 ] Jinrui Zhang commented on IOTDB-4593: - Not a blocking issue > [Remove-DataNode] Removing nodes writes data > > > Key: IOTDB-4593 > URL: https://issues.apache.org/jira/browse/IOTDB-4593 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 1.1.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Xinyu Tan >Priority: Major > Attachments: image-2022-10-10-13-36-14-475.png, > image-2023-03-08-11-29-52-352.png, image-2023-03-08-11-30-38-559.png, > image-2023-03-08-11-30-51-220.png, image-2023-03-08-11-33-49-278.png, > more_dev.conf, screenshot-1.png > > > m_0930_2a30316 > Problem description: > When removing a DataNode, {color:#DE350B}*the node is set to Removing status but keeps accepting writes*{color} (benchmark ran for 1 hour, then the removal was executed; it *took 3 hours* for the removal to complete): > 2022-10-08 13:23:54,686 [pool-20-IoTDB-DataNodeInternalRPC-Processor-148] > INFO o.a.i.c.conf.CommonConfig:305 - *Set system mode from Running to > Removing*. > After entering Removing status (207 TsFiles were created): > !image-2022-10-10-13-36-14-475.png! > Test environment: > 1. 192.168.10.71-76, 6 physical machines, 48 CPUs, 384 GB > 3C : 192.168.10.72 , 73,74 > 5D : 192.168.10.72 , 73,74 , 75 , 76 > benchmark: 192.168.10.71 > ConfigNode configuration: > MAX_HEAP_SIZE="8G" > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_factor=3 > connection_timeout_ms=12 > DataNode configuration: > MAX_HEAP_SIZE="256G" > MAX_DIRECT_MEMORY_SIZE="32G" > connection_timeout_ms=12 > max_connection_for_internal_service=200 > max_waiting_time_when_insert_blocked=60 > query_timeout_threshold=3600 > 2. See the attachment for the benchmark configuration file > GROUP_NUMBER=10 > DEVICE_NUMBER=5 > SENSOR_NUMBER=600 > IS_OUT_OF_ORDER=false > OPERATION_PROPORTION=1:0:0:0:0:0:0:0:0:0:0 > CLIENT_NUMBER=100 > LOOP=100 > BATCH_SIZE_PER_WRITE=100 > 3. 
After running for 1 hour, remove ip72 > liuzhen@fit-72:/data/mpp_test/m_0930_2a30316/datanode$ cat > 1008_test_remove_1h.sh > sleep 1h > /data/mpp_test/m_0930_2a30316/datanode/sbin/start-cli.sh -h 192.168.10.72 -e > "show cluster" > 1008_3c5d_bef_remove.out > /data/mpp_test/m_0930_2a30316/datanode/sbin/start-cli.sh -h 192.168.10.72 -e > "show regions" >> 1008_3c5d_bef_remove.out > /data/mpp_test/m_0930_2a30316/datanode/sbin/remove-datanode.sh > "192.168.10.72:6667" >> 1008_3c5d_1hour_remove_ip72.out > 4. See the backup on the machine for ip72's logs > /data/mpp_test/m_0930_2a30316/datanode/logs_bm_1h_remove_ip72 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5063) [ start datanode ] Failed to start Grpc server
[ https://issues.apache.org/jira/browse/IOTDB-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714494#comment-17714494 ] Jinrui Zhang commented on IOTDB-5063: - Suggest re-testing this scenario > [ start datanode ] Failed to start Grpc server > -- > > Key: IOTDB-5063 > URL: https://issues.apache.org/jira/browse/IOTDB-5063 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Blocker > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png, screenshot-5.png > > Original Estimate: 48h > Remaining Estimate: 48h > > master : 1127_4d7c15d > 1. Start 3 ConfigNodes > 2. Start 21 DataNodes; there is always 1 DataNode that fails to start ({color:#DE350B}reproduced in all 3 attempts{color}); there are 2 kinds of error messages: > Error 1 (occurred twice): > 2022-11-28 09:44:11,906 [main] ERROR o.a.ratis.util.ExitUtils:133 - > Terminating with exit status 1: Failed to start Grpc server > java.io.IOException: Failed to bind to address 0.0.0.0/0.0.0.0:50010 > at > org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:328) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:183) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:92) > at > org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:266) > at > org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270) > at > org.apache.ratis.server.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:72) > at > org.apache.ratis.server.impl.RaftServerProxy.startImpl(RaftServerProxy.java:394) > at > org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270) > at > org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:387) > at > org.apache.iotdb.consensus.ratis.RatisConsensus.start(RatisConsensus.java:156) > at org.apache.iotdb.db.service.DataNode.active(DataNode.java:319) > at 
org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:162) > at > org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:95) > at > org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:58) > at org.apache.iotdb.db.service.DataNode.main(DataNode.java:132) > Caused by: > org.apache.ratis.thirdparty.io.netty.channel.unix.Errors$NativeIoException: > bind(..) failed: Address already in use > 2022-11-28 09:44:11,910 [Thread-0] ERROR o.a.ratis.util.ExitUtils:133 - > Terminating with exit status -1: Thread[Thread-0,5,main] has thrown an > uncaught exception > java.lang.NullPointerException: null > at > org.apache.iotdb.db.service.IoTDBShutdownHook.run(IoTDBShutdownHook.java:60) > Port information of the DataNode process on this node: > !screenshot-2.png! > Error 2 (occurred once): > !screenshot-3.png! > Port information of the DataNode process on this node: > !screenshot-4.png! > Port information of a successfully started DataNode: > !screenshot-5.png! > Test environment: private cloud phase 1, 8C32GB, 24 machines > 1. ConfigNode configuration > MAX_HEAP_SIZE="20G" > MAX_DIRECT_MEMORY_SIZE="6G" > 2. DataNode configuration > MAX_HEAP_SIZE="20G" > MAX_DIRECT_MEMORY_SIZE="6G" > 3. Common configuration > schema_replication_factor=3 > data_replication_factor=3 > 4. Start 3 ConfigNodes (ip23, 24, 25) > 5. Start 21 DataNodes; startup script (start commands for 21 DataNodes, 1-second interval): > [root@i-66xazbht deploy_mpp_scripts]# cat 4_start_data_node.sh > #!/bin/bash > cluster_dir="/data/iotdb" > cur_cluster="m_1127_4d7c15d" > u_name="root" > exec 3 while read line <&3 > do > ssh ${u_name}@${line} "source > /etc/profile;${cluster_dir}/${cur_cluster}/sbin/start-datanode.sh > /dev/null > 2>&1 &" > sleep 1 > done > 6. Check the cluster information; there is always 1 DataNode in Unknown status; check the log on that node: > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5557) [ metadata ] The metadata query results are inconsistent
[ https://issues.apache.org/jira/browse/IOTDB-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714493#comment-17714493 ] Jinrui Zhang commented on IOTDB-5557: - The DataNode should not be ready (visible to clients) until all the replays of metadata operations finish > [ metadata ] The metadata query results are inconsistent > > > Key: IOTDB-5557 > URL: https://issues.apache.org/jira/browse/IOTDB-5557 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Schema Manager, mpp-cluster >Affects Versions: 1.1.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Song Ziyang >Priority: Blocker > Attachments: image-2023-02-20-14-04-32-611.png > > > master : 0219_0cd4461 > Start the cluster; after "enjoy" appears in log_datanode_all.log, query the metadata; the query results are inconsistent (they keep growing until all metadata is loaded into memory). > Expectation: once the cluster starts serving queries, query results must be consistent. > Test environment: > 1. 192.168.10.76, 48 CPUs, 384 GB memory > Metadata: 1 database, 10,000 devices, 600 series per device. > ConfigNode: > MAX_HEAP_SIZE="8G" > DataNode: > MAX_HEAP_SIZE="256G" > MAX_DIRECT_MEMORY_SIZE="32G" > COMMON configuration: > time_partition_interval=6048000 > query_timeout_threshold=3600 > enable_seq_space_compaction=false > enable_unseq_space_compaction=false > enable_cross_space_compaction=false > 2. Clear the OS cache, start the database, and after "enjoy" appears, run count devices and check the result > cat check_device_count.sh > while true > do > v_start=`grep enjoy logs/log_datanode_all.log|wc -l` > if [[ ${v_start} = "1" ]];then > for i in {1..100} > do >./sbin/start-cli.sh -h 192.168.10.76 -e "count devices;" > >> dev_count_during_start.out > done > break > fi > done > The results below show that the count devices result keeps increasing until 1, fully loaded into memory: > !image-2023-02-20-14-04-32-611.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5781) Change the default strategy to SequenceStrategy
[ https://issues.apache.org/jira/browse/IOTDB-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713918#comment-17713918 ] Jinrui Zhang commented on IOTDB-5781: - In other words, only when all the data_dirs' free space is less than 5% should the system change to read-only. > Change the default strategy to SequenceStrategy > --- > > Key: IOTDB-5781 > URL: https://issues.apache.org/jira/browse/IOTDB-5781 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jialin Qiao >Priority: Major > > Currently, we do not allow any strategy other than > MaxDiskUsableSpaceFirstStrategy, which prevents us from accelerating writes with > multiple data dirs. > > So, we need to refine the SequenceStrategy: if the remaining space of a disk > is less than > disk_space_warning_threshold, we stop allocating to it. Then, the default > strategy can be SequenceStrategy. -- This message was sent by Atlassian Jira (v8.20.10#820010)
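The rule in the comment above boils down to a single predicate over all data dirs. A hypothetical sketch (function and dir names are illustrative, not IoTDB's actual code):

```python
# Hypothetical sketch: the node turns read-only only when *every* data dir is
# below the warning threshold, not when the first one fills up.

def should_become_read_only(free_ratio_by_dir, threshold=0.05):
    """free_ratio_by_dir: dict data_dir -> free-space ratio (0.0 - 1.0)."""
    return all(ratio < threshold for ratio in free_ratio_by_dir.values())

# One full dir alone must not force read-only:
print(should_become_read_only({"/data1": 0.01, "/data2": 0.40}))  # False
# All dirs full => read-only:
print(should_become_read_only({"/data1": 0.01, "/data2": 0.02}))  # True
```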
[jira] [Commented] (IOTDB-5703) Memtable won't be flushed for a long while even if the time_partition is inactive
[ https://issues.apache.org/jira/browse/IOTDB-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702430#comment-17702430 ] Jinrui Zhang commented on IOTDB-5703: - Checked with [~HeimingZ] : the current timed_flush mechanism works fine, but the default interval is too long (3 hours). > Memtable won't be flushed for a long while even if the time_partition is > inactive > - > > Key: IOTDB-5703 > URL: https://issues.apache.org/jira/browse/IOTDB-5703 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Haiming Zhu >Priority: Major > > *Description* > During our tests, we found that the memtable of some time partitions won't be > flushed even if there is no data insertion towards the time partition. > *Impact* > If the memtable is not flushed, there will be an unclosed TsFile inside the > time partition, which blocks the inner compaction of this time partition. > *Solution* > We do have a timed flush strategy, but it seems it is not working now. We need > to fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5704) Optimize default parameters in iotdb-common for WAL part
Jinrui Zhang created IOTDB-5704: --- Summary: Optimize default parameters in iotdb-common for WAL part Key: IOTDB-5704 URL: https://issues.apache.org/jira/browse/IOTDB-5704 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang During customer tests, we found that some WAL-related parameters always end up being changed to more suitable values, so we'd better optimize their defaults. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5703) Memtable won't be flushed for a long while even if the time_partition is inactive
Jinrui Zhang created IOTDB-5703: --- Summary: Memtable won't be flushed for a long while even if the time_partition is inactive Key: IOTDB-5703 URL: https://issues.apache.org/jira/browse/IOTDB-5703 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Haiming Zhu *Description* During our tests, we found that the memtable of some time partitions won't be flushed even if there is no data insertion towards the time partition. *Impact* If the memtable is not flushed, there will be an unclosed TsFile inside the time partition, which blocks the inner compaction of this time partition. *Solution* We do have a timed flush strategy, but it seems it is not working now. We need to fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5667) Compaction scheduler strategy issue
[ https://issues.apache.org/jira/browse/IOTDB-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5667: --- Assignee: 周沛辰 (was: Jinrui Zhang) > Compaction scheduler strategy issue > --- > > Key: IOTDB-5667 > URL: https://issues.apache.org/jira/browse/IOTDB-5667 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: 周沛辰 >Priority: Major > > # compaction speed cannot keep up with the rate at which new files arrive > # high-level compaction is executed while there are still lots of 0-level files > # supply more ITs for inner compaction file selection > # the queue is occupied by out-of-date / low-priority tasks -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5667) Compaction scheduler strategy issue
Jinrui Zhang created IOTDB-5667: --- Summary: Compaction scheduler strategy issue Key: IOTDB-5667 URL: https://issues.apache.org/jira/browse/IOTDB-5667 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang # compaction speed cannot keep up with the rate at which new files arrive # high-level compaction is executed while there are still lots of 0-level files # supply more ITs for inner compaction file selection # the queue is occupied by out-of-date / low-priority tasks -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5628) SchemaRegion with 1 replica occurs error when restarting IoTDB server
Jinrui Zhang created IOTDB-5628: --- Summary: SchemaRegion with 1 replica occurs error when restarting IoTDB server Key: IOTDB-5628 URL: https://issues.apache.org/jira/browse/IOTDB-5628 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang *Background:* The IoTDB server has run for a very long time with lots of timeseries/data. *Operation* Shut down the server and restart it. Start a client to write data again. *Error* -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5626) SchemaRegion with 1 replica occurs error when restarting IoTDB server
Jinrui Zhang created IOTDB-5626: --- Summary: SchemaRegion with 1 replica occurs error when restarting IoTDB server Key: IOTDB-5626 URL: https://issues.apache.org/jira/browse/IOTDB-5626 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang Attachments: image-2023-03-06-15-43-48-809.png *Background:* The IoTDB server has run for a very long time with lots of timeseries/data. *Operation* Shut down the server and restart it. Start a client to write data again. *Error* *!image-2023-03-06-15-43-48-809.png|width=1278,height=248!* -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5625) SchemaRegion with 1 replica occurs error when restarting IoTDB server
Jinrui Zhang created IOTDB-5625: --- Summary: SchemaRegion with 1 replica occurs error when restarting IoTDB server Key: IOTDB-5625 URL: https://issues.apache.org/jira/browse/IOTDB-5625 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang Attachments: image-2023-03-06-15-41-51-456.png *Background:* The IoTDB server has run for a very long time with lots of timeseries/data. *Operation* Shut down the server and restart it. Start a client to write data again. *Error* *!image-2023-03-06-15-41-51-456.png|width=1278,height=248!* -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5614) Error when restarting cluster and write client in GeelyCarTest
Jinrui Zhang created IOTDB-5614: --- Summary: Error when restarting cluster and write client in GeelyCarTest Key: IOTDB-5614 URL: https://issues.apache.org/jira/browse/IOTDB-5614 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang Attachments: image-2023-03-03-11-41-12-903.png !image-2023-03-03-11-41-12-903.png|width=1236,height=454! This error occurred when restarting the write client, and it disappeared after the write client was restarted. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5472) [Atmos]The number of tsfiles went up between db6f17e 【02/01】and 5602d0e【02/05】
[ https://issues.apache.org/jira/browse/IOTDB-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5472: --- Assignee: Haiming Zhu > [Atmos]The number of tsfiles went up between db6f17e 【02/01】and 5602d0e【02/05】 > -- > > Key: IOTDB-5472 > URL: https://issues.apache.org/jira/browse/IOTDB-5472 > Project: Apache IoTDB > Issue Type: Improvement > Components: Core/Engine >Reporter: Qingxin Feng >Assignee: Haiming Zhu >Priority: Minor > Attachments: image-2023-02-06-09-18-27-596.png, > image-2023-02-06-09-22-50-304.png, image-2023-02-08-08-41-55-425.png, > image-2023-02-10-08-55-18-990.png, image-2023-02-10-17-31-06-102.png > > > The number of tsfiles went up between db6f17e 【02/01】and 5602d0e【02/05】 > Please refer to the picture below. > > !image-2023-02-06-09-22-50-304.png|width=661,height=380! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5511) Metrics of running compaction task is not accurate
Jinrui Zhang created IOTDB-5511: --- Summary: Metrics of running compaction task is not accurate Key: IOTDB-5511 URL: https://issues.apache.org/jira/browse/IOTDB-5511 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Liuxuxin Attachments: image-2023-02-09-15-25-28-982.png, image-2023-02-09-15-25-58-380.png Currently 10 compaction tasks are running, but the dashboard says 9. See the snapshots below. Logs: !image-2023-02-09-15-25-28-982.png|width=603,height=101! Dashboard: !image-2023-02-09-15-25-58-380.png|width=604,height=201! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5509) Limit the level of unseq file in cross compaction
Jinrui Zhang created IOTDB-5509: --- Summary: Limit the level of unseq file in cross compaction Key: IOTDB-5509 URL: https://issues.apache.org/jira/browse/IOTDB-5509 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5508) System backup Tools
Jinrui Zhang created IOTDB-5508: --- Summary: System backup Tools Key: IOTDB-5508 URL: https://issues.apache.org/jira/browse/IOTDB-5508 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: 马子坤 Investigate and design the implementation of System Backup -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5140) Add metrics for compaction deserializing pages or writing chunks
[ https://issues.apache.org/jira/browse/IOTDB-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681995#comment-17681995 ] Jinrui Zhang commented on IOTDB-5140: - The PR towards rel/1.0 is not merged > Add metrics for compaction deserializing pages or writing chunks > > > Key: IOTDB-5140 > URL: https://issues.apache.org/jira/browse/IOTDB-5140 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Liuxuxin >Assignee: Liuxuxin >Priority: Major > Labels: pull-request-available > Original Estimate: 24h > Remaining Estimate: 24h > > We want to trace the count of deserialized chunks or pages during compaction. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5206) Fix when target file is deleted in Compaction exception handler and recover
[ https://issues.apache.org/jira/browse/IOTDB-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681993#comment-17681993 ] Jinrui Zhang commented on IOTDB-5206: - The fix is approved but not merged because of IT failures > Fix when target file is deleted in Compaction exception handler and recover > --- > > Key: IOTDB-5206 > URL: https://issues.apache.org/jira/browse/IOTDB-5206 > Project: Apache IoTDB > Issue Type: Bug >Affects Versions: master branch, 1.0.0 >Reporter: 周沛辰 >Assignee: 周沛辰 >Priority: Major > Labels: pull-request-available > Original Estimate: 72h > Remaining Estimate: 72h > > *Description* > After compaction, if the target file is empty, its corresponding disk file > will be deleted. If an exception or system interruption occurs, restart recovery > will fail and set allowCompaction to false. > 2022-12-20 09:23:53,086 [pool-12-IoTDB-Recovery-Thread-Pool-1] ERROR > o.a.i.d.e.c.t.CompactionRecoverTask:300 - root.iot-0 > [Compaction][ExceptionHandler] target file > sequence/root.iot/0/0/1670572962795-1051-2-1.inner is not complete, and some > source files is lost, do nothing. Set allowCompaction to false > 2022-12-20 09:23:53,087 [pool-12-IoTDB-Recovery-Thread-Pool-1] ERROR > o.a.i.d.e.c.t.CompactionRecoverTask:133 - root.iot-0 [Compaction][Recover] > Failed to recover compaction, set allowCompaction to false > *Reason* > Empty target files are deleted in compaction. During recovery, the system > reports that source files are lost when the empty target file has simply been deleted. > *Solution* > Empty target files are not deleted until the end of the > compaction. However, after recovery, the empty target file will not be > deleted, but this does not affect the correctness of the system. -- This message was sent by Atlassian Jira (v8.20.10#820010)
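The *Solution* above amounts to a commit-then-delete ordering: the empty target file stays on disk until the task has committed, so crash recovery can distinguish "target incomplete" from "target legitimately removed". A rough sketch, with entirely hypothetical function names (not IoTDB's actual recovery code):

```python
# Hypothetical sketch of deferring empty-target deletion until after commit.
import os
import tempfile

def finish_compaction(target_path, target_is_empty, commit):
    """commit() atomically records that the task finished (e.g. removes the
    compaction log). Only afterwards may an empty target be deleted."""
    commit()                    # 1. mark the task as complete first
    if target_is_empty:
        os.remove(target_path)  # 2. only now is the empty target dropped

def recover(target_path, committed):
    # After a crash: a missing target is an error only if the task had NOT
    # committed; post-commit, the file may be legitimately absent.
    if not os.path.exists(target_path) and not committed:
        return "abort: target lost, set allowCompaction to false"
    return "ok"

with tempfile.TemporaryDirectory() as d:
    tgt = os.path.join(d, "target.inner")
    open(tgt, "w").close()
    finish_compaction(tgt, target_is_empty=True, commit=lambda: None)
    print(recover(tgt, committed=True))  # ok
```

With the old ordering (delete before commit), a crash between the two steps made recovery see a lost target and disable compaction, which is the failure the log lines above show.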
[jira] [Commented] (IOTDB-5140) Add metrics for compaction deserializing pages or writing chunks
[ https://issues.apache.org/jira/browse/IOTDB-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681841#comment-17681841 ] Jinrui Zhang commented on IOTDB-5140: - Need to open another PR against rel/1.0 > Add metrics for compaction deserializing pages or writing chunks > > > Key: IOTDB-5140 > URL: https://issues.apache.org/jira/browse/IOTDB-5140 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Liuxuxin >Assignee: Liuxuxin >Priority: Major > Labels: pull-request-available > Original Estimate: 24h > Remaining Estimate: 24h > > We want to trace the count of deserialized chunks or pages during compaction. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (IOTDB-5140) Add metrics for compaction deserializing pages or writing chunks
[ https://issues.apache.org/jira/browse/IOTDB-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reopened IOTDB-5140: - > Add metrics for compaction deserializing pages or writing chunks > > > Key: IOTDB-5140 > URL: https://issues.apache.org/jira/browse/IOTDB-5140 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Liuxuxin >Assignee: Liuxuxin >Priority: Major > Labels: pull-request-available > Original Estimate: 24h > Remaining Estimate: 24h > > We want to trace the count of deserialized chunks or pages during compaction. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5164) [disk]datanode takes too much disk space, should improve
[ https://issues.apache.org/jira/browse/IOTDB-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681832#comment-17681832 ] Jinrui Zhang commented on IOTDB-5164: - Can this issue be closed? > [disk]datanode takes too much disk space, should improve > > > Key: IOTDB-5164 > URL: https://issues.apache.org/jira/browse/IOTDB-5164 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: changxue >Assignee: Jinrui Zhang >Priority: Major > Attachments: iotdb-common.properties, iotdb-confignode.properties, > iotdb-datanode.properties > > > [disk]datanode takes too much disk space, should improve > Here is the disk usage of one node: it shows 124G of data would take 230G > on one node, and there are 3 nodes with 3 replicas, so 124G of data takes 6 > times its real size. This is too much. > {code} > 124G ./datanode/data/sequence > 51M ./datanode/data/unsequence > 104G ./datanode/data/snapshot > 228G ./datanode/data > 414M ./datanode/wal/root.test-0 > 401M ./datanode/wal/root.test-3 > 394M ./datanode/wal/root.test-1 > 410M ./datanode/wal/root.test-2 > 394M ./datanode/wal/root.test-4 > 2.0G ./datanode/wal > 4.0K ./datanode/system/compression_ratio > 16K ./datanode/system/schema > 4.0K ./datanode/system/roles > 8.0K ./datanode/system/users > 48K ./datanode/system/databases > 4.0K ./datanode/system/upgrade > 8.0K ./datanode/system/udf > 100K ./datanode/system > 5.2M ./datanode/consensus/schema_region > 356K ./datanode/consensus/data_region > 5.6M ./datanode/consensus > 230G ./datanode > 4.0K ./confignode/system/roles > 8.0K ./confignode/system/users > 4.0K ./confignode/system/procedure > 24K ./confignode/system > 4.1M ./confignode/consensus/47474747-4747-4747-4747- > 4.1M ./confignode/consensus > 4.1M ./confignode > 230G . > {code} > 124G of data takes 230G of space on a single node; this is a 3-node cluster configured with 3 replicas, so in total it takes 6 times the disk space. That is far too much, and I think it needs optimization. Is part of our snapshot design redundant? Can that space be reused? > Note: the read-only status may have been caused by insufficient disk space, followed by a snapshot. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-4001) Only when unseq files reach a certain number can they be selected in cross compaction
[ https://issues.apache.org/jira/browse/IOTDB-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4001: --- Sprint: 2023-1-Storage (was: StorageEngine-Backlog) Assignee: Wenwei Shu (was: 周沛辰) > Only when unseq files reach a certain number can they be selected in cross > compaction > > > Key: IOTDB-4001 > URL: https://issues.apache.org/jira/browse/IOTDB-4001 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: 周沛辰 >Assignee: Wenwei Shu >Priority: Major > > Unseq files should be selected to participate in cross-space compaction only > when their number reaches a certain threshold. This avoids an unseq file > being selected for compaction immediately after it is written, which would > result in write amplification. -- This message was sent by Atlassian Jira (v8.20.10#820010)
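The selection rule described above amounts to a simple threshold check before cross-space compaction task creation. The sketch below is a hypothetical Python model of that logic; the constant and function names are illustrative and are not IoTDB's actual configuration keys or APIs.

```python
# Illustrative threshold; not IoTDB's real configuration key.
MIN_UNSEQ_FILES_FOR_CROSS_COMPACTION = 10

def select_cross_compaction_candidates(unseq_files):
    """Return the unseq files to compact, or [] until enough accumulate.

    Selecting a lone unseq file immediately would rewrite the overlapping
    sequence files for every small batch of out-of-order data -- the write
    amplification the issue wants to avoid.
    """
    if len(unseq_files) < MIN_UNSEQ_FILES_FOR_CROSS_COMPACTION:
        return []  # wait for more unseq files before scheduling a task
    return list(unseq_files)
```

With three unseq files no task is scheduled; once a dozen have accumulated, all of them are compacted in one pass.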
[jira] [Commented] (IOTDB-5319) The write speed in atoms testing declined after merging commit 5126711d
[ https://issues.apache.org/jira/browse/IOTDB-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655937#comment-17655937 ] Jinrui Zhang commented on IOTDB-5319: - We have reverted this PR for release 1.0.1. Let's keep tracking this issue for the upcoming release > The write speed in atoms testing declined after merging commit 5126711d > --- > > Key: IOTDB-5319 > URL: https://issues.apache.org/jira/browse/IOTDB-5319 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Liuxuxin >Priority: Major > Attachments: image-2022-12-29-18-37-25-693.png, > image-2022-12-29-18-38-45-394.png > > Original Estimate: 72h > Remaining Estimate: 72h > > !image-2022-12-29-18-37-25-693.png|width=701,height=156! > > After merging this commit, the write speed in atoms testing declined. > > We inferred that this change leads compaction to grab more CPU/IO resources, > which decreases the resources available for writes and reads. > > After we changed the parameter `iops_per_min` from 50 to 30, the problem > still exists. > See this snapshot, > !image-2022-12-29-18-38-45-394.png|width=541,height=346! > > Let's investigate the details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5385) Optimize the DataRegion leader calculation policy when ConfigNode leader changes
Jinrui Zhang created IOTDB-5385: --- Summary: Optimize the DataRegion leader calculation policy when ConfigNode leader changes Key: IOTDB-5385 URL: https://issues.apache.org/jira/browse/IOTDB-5385 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Gaofei Cao We found a phenomenon in recent testing: the leaders of DataRegions are updated by the newly elected ConfigNode leader. This sometimes adds instability risks to our system, and it should be optimized. See this doc for details. https://apache-iotdb.feishu.cn/docx/ZTlkdiPwRoXGi0xs2cacYpSlnYb?from=space_persnoal_filelist&pre_pathname=%2Fdrive%2Ffolder%2Ffldcnf4szpemAst96rw3XajU5jb -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5363) Try to construct InsertTablet from InsertRowsNode to speed up write operation
Jinrui Zhang created IOTDB-5363: --- Summary: Try to construct InsertTablet from InsertRowsNode to speed up write operation Key: IOTDB-5363 URL: https://issues.apache.org/jira/browse/IOTDB-5363 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Haiming Zhu -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5206) Fix when target file is deleted in Compaction exception handler and recover
[ https://issues.apache.org/jira/browse/IOTDB-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654819#comment-17654819 ] Jinrui Zhang commented on IOTDB-5206: - Execution-time strategy: [https://apache-iotdb.feishu.cn/docs/doccnOxzotqCdP94MuO1HIDjv4c] Recovery-time strategy: https://apache-iotdb.feishu.cn/docx/I9yIdIoRgo5dBCxb1Svcoil7nDg#T8UmdA088oY2CwxyAfVcXHUanSh > Fix when target file is deleted in Compaction exception handler and recover > --- > > Key: IOTDB-5206 > URL: https://issues.apache.org/jira/browse/IOTDB-5206 > Project: Apache IoTDB > Issue Type: Bug >Affects Versions: master branch, 1.0.0 >Reporter: 周沛辰 >Assignee: 周沛辰 >Priority: Major > Labels: pull-request-available > Original Estimate: 72h > Remaining Estimate: 72h > > *Description* > After compaction, if the target file is empty, its corresponding disk file > is deleted. If an exception or system interruption occurs at that point, > restart recovery will fail and allowCompaction will be set to false. > 2022-12-20 09:23:53,086 [pool-12-IoTDB-Recovery-Thread-Pool-1] ERROR > o.a.i.d.e.c.t.CompactionRecoverTask:300 - root.iot-0 > [Compaction][ExceptionHandler] target file > sequence/root.iot/0/0/1670572962795-1051-2-1.inner is not complete, and some > source files is lost, do nothing. Set allowCompaction to false > 2022-12-20 09:23:53,087 [pool-12-IoTDB-Recovery-Thread-Pool-1] ERROR > o.a.i.d.e.c.t.CompactionRecoverTask:133 - root.iot-0 [Compaction][Recover] > Failed to recover compaction, set allowCompaction to false > *Reason* > Empty target files are deleted during compaction. In recovery, the system > therefore reports that source files are lost while the empty target file has > already been deleted. > *Solution* > Do not delete empty target files during compaction; delete them only at the > end of the compaction. After a recovery, an empty target file may be left > undeleted, but this does not affect the correctness of the system. -- This message was sent by Atlassian Jira (v8.20.10#820010)
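The recovery decision described above can be modeled as a small decision table. This is a hedged Python sketch of the reported behavior, not the actual CompactionRecoverTask code; the state names and return values are illustrative.

```python
def recovery_action(target_complete, sources_exist):
    """Hedged model of the compaction recovery decision described above.

    target_complete: the target TsFile on disk is complete (or legitimately
    empty but still present, per the fix); sources_exist: all source files
    are still on disk.
    """
    if sources_exist:
        # Crash before the compaction committed: discard the partial target
        # and keep serving from the source files.
        return "rollback"
    if target_complete:
        # Compaction had committed: keep the target, clean up leftovers.
        return "finish"
    # Pre-fix failure mode: an empty target was already deleted mid-compaction,
    # so this state looked like data loss and compaction got disabled
    # (allowCompaction = false).
    return "disable_compaction"
```

Keeping the empty target on disk until the compaction ends means recovery never lands in the ambiguous third branch for a legitimately empty result.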
[jira] [Created] (IOTDB-5338) WAL buffer flush threshold optimization
Jinrui Zhang created IOTDB-5338: --- Summary: WAL buffer flush threshold optimization Key: IOTDB-5338 URL: https://issues.apache.org/jira/browse/IOTDB-5338 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Haiming Zhu -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5337) Parallelization of write operation in FragmentInstanceDispatcher
[ https://issues.apache.org/jira/browse/IOTDB-5337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5337: --- Sprint: 2023-1-Storage Assignee: Haiming Zhu Remaining Estimate: 72h Original Estimate: 72h > Parallelization of write operation in FragmentInstanceDispatcher > > > Key: IOTDB-5337 > URL: https://issues.apache.org/jira/browse/IOTDB-5337 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Haiming Zhu >Priority: Major > Original Estimate: 72h > Remaining Estimate: 72h > > In the current implementation, the split write operations are dispatched one > by one. > > We can try to dispatch them in parallel to improve speed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5337) Parallelization of write operation in FragmentInstanceDispatcher
Jinrui Zhang created IOTDB-5337: --- Summary: Parallelization of write operation in FragmentInstanceDispatcher Key: IOTDB-5337 URL: https://issues.apache.org/jira/browse/IOTDB-5337 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang In the current implementation, the split write operations are dispatched one by one. We can try to dispatch them in parallel to improve speed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
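The proposed change can be sketched with a thread pool. This hypothetical Python model only illustrates the serial-versus-parallel dispatch idea; it is not IoTDB's actual FragmentInstanceDispatcher API, and the function names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch_serial(dispatch, instances):
    """Current behavior: each split write operation waits for the previous
    one to be acknowledged before the next is sent."""
    return [dispatch(inst) for inst in instances]

def dispatch_parallel(dispatch, instances, max_workers=8):
    """Proposed behavior: send all splits concurrently and gather results.
    Overall latency becomes roughly the slowest single dispatch instead of
    the sum of all of them."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with instances.
        return list(pool.map(dispatch, instances))
```

Both functions return the same results for the same inputs; only the wall-clock time differs when `dispatch` involves network round trips.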
[jira] [Created] (IOTDB-5336) Investigation regarding write interface used by TSBS in IoTDB
Jinrui Zhang created IOTDB-5336: --- Summary: Investigation regarding write interface used by TSBS in IoTDB Key: IOTDB-5336 URL: https://issues.apache.org/jira/browse/IOTDB-5336 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Haiming Zhu -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5335) InsertRecords performance optimization
Jinrui Zhang created IOTDB-5335: --- Summary: InsertRecords performance optimization Key: IOTDB-5335 URL: https://issues.apache.org/jira/browse/IOTDB-5335 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Haiming Zhu -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5273) [fast compaction] The performance is slow, and there are out-of-order tsfiles after compaction
[ https://issues.apache.org/jira/browse/IOTDB-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5273: --- Sprint: 2023-1-Storage (was: 2022-12-Storage) Assignee: Wenwei Shu (was: 周沛辰) > [fast compaction] The performance is slow, and there are out-of-order tsfiles > after compaction > - > > Key: IOTDB-5273 > URL: https://issues.apache.org/jira/browse/IOTDB-5273 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: master branch >Reporter: 刘珍 >Assignee: Wenwei Shu >Priority: Major > Attachments: 1_luanxu.conf, 2_luanxu.conf, 3_luanxu.conf, > 4_luanxu.conf, image-2022-12-23-18-10-25-061.png > > Original Estimate: 96h > Remaining Estimate: 96h > > master 1222_656d281 > Problem description: > On private cloud phase 1, with the weekly out-of-order test configuration, fast > compaction is slow, and out-of-order TsFiles remain after compaction finishes. > !image-2022-12-23-18-10-25-061.png|width=979,height=481! > Test environment: > 1. Private cloud phase 1. > Disable compaction and generate data. > The configuration files are attached. > 2. Compare compaction performance, with no other read/write operations. > ConfigNode configuration: > MAX_HEAP_SIZE="2G" > DataNode configuration: > MAX_HEAP_SIZE="18G" > MAX_DIRECT_MEMORY_SIZE="6G" > Common configuration: > time_partition_interval=6048000 > compaction_io_rate_per_sec=1000 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5156) The backup data is twice the size of the source data
[ https://issues.apache.org/jira/browse/IOTDB-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5156: --- Assignee: Liuxuxin > The backup data is twice the size of the source data > > > Key: IOTDB-5156 > URL: https://issues.apache.org/jira/browse/IOTDB-5156 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Reporter: 刘珍 >Assignee: Liuxuxin >Priority: Major > Attachments: image-2022-12-08-21-08-54-494.png > > > master > Problem description: > Backing up IoTDB's data directory produces a backup twice the size of the source data. > cp -rp m_1207_a0b2c8c_fast2/data m_1208_7f2218b_fast2/ > The backed-up data is twice the size of the source data. > The files in the source snapshot are hard links, but the corresponding files in the backed-up snapshot become regular files: > !image-2022-12-08-21-08-54-494.png! > Test procedure: > 1. Start a 1-replica 1C1D cluster (sbin/start-standalone.sh); > the config, schema, and data consensus protocols are Ratis, Ratis, and IoT, respectively. > 2. Write data. > 3. Stop the datanode normally (a snapshot will be taken). > 4. Back up the data. -- This message was sent by Atlassian Jira (v8.20.10#820010)
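The doubling happens because a plain `cp -rp` materializes each hard-linked snapshot entry as an independent copy of the TsFile. A hard-link-aware copy keeps snapshot entries pointing at the same inode as the data files (at the shell level, `rsync -H` or GNU `cp -a` preserves hard links within the copied tree). Below is a minimal Python sketch of the idea, assuming a POSIX filesystem; it is an illustration, not a proposed IoTDB backup tool.

```python
import os
import shutil

def copy_tree_preserving_hardlinks(src, dst):
    """Copy a directory tree, re-creating hard links instead of duplicating
    their contents, so snapshot entries that share an inode with data files
    do not double the size of the backup."""
    seen = {}  # (device, inode) -> path already created under dst
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            target = os.path.join(target_dir, name)
            stat = os.stat(source)
            key = (stat.st_dev, stat.st_ino)
            if key in seen:
                os.link(seen[key], target)  # re-create the hard link
            else:
                shutil.copy2(source, target)
                seen[key] = target
    return dst
```

With this approach the backup's snapshot directory costs only directory entries, matching the source layout instead of doubling it.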
[jira] [Assigned] (IOTDB-4665) "tsfile" (all data deleted) lacks a periodic deletion policy
[ https://issues.apache.org/jira/browse/IOTDB-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4665: --- Assignee: Jinrui Zhang > "tsfile" (all data deleted) lacks a periodic deletion policy > > > Key: IOTDB-4665 > URL: https://issues.apache.org/jira/browse/IOTDB-4665 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Major > Attachments: image-2022-10-17-14-37-00-495.png, > image-2022-10-17-14-38-08-095.png > > > delete timeseries root.** has been executed, > and all data and metadata have been deleted, > but there is no periodic cleanup (deletion) policy for the TsFiles. > !image-2022-10-17-14-37-00-495.png! > !image-2022-10-17-14-38-08-095.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5319) The write speed in atoms testing declined after merging commit 5126711d
[ https://issues.apache.org/jira/browse/IOTDB-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5319: --- Assignee: Liuxuxin (was: Jinrui Zhang) > The write speed in atoms testing declined after merging commit 5126711d > --- > > Key: IOTDB-5319 > URL: https://issues.apache.org/jira/browse/IOTDB-5319 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Liuxuxin >Priority: Major > Attachments: image-2022-12-29-18-37-25-693.png, > image-2022-12-29-18-38-45-394.png > > > !image-2022-12-29-18-37-25-693.png|width=701,height=156! > > After merging this commit, the write speed in atoms testing declined. > > We inferred that this change leads compaction to grab more CPU/IO resources, > which decreases the resources available for writes and reads. > > After we changed the parameter `iops_per_min` from 50 to 30, the problem > still exists. > See this snapshot, > !image-2022-12-29-18-38-45-394.png|width=541,height=346! > > Let's investigate the details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5289) [Metric]Only the leader confignode can show the number of datanode and confignode
[ https://issues.apache.org/jira/browse/IOTDB-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5289: --- Assignee: Hongyin Zhang (was: Gaofei Cao) > [Metric]Only the leader confignode can show the number of datanode and > confignode > - > > Key: IOTDB-5289 > URL: https://issues.apache.org/jira/browse/IOTDB-5289 > Project: Apache IoTDB > Issue Type: Improvement > Components: Core/Cluster >Affects Versions: 1.0.0 >Reporter: Qingxin Feng >Assignee: Hongyin Zhang >Priority: Minor > Attachments: image-2022-12-27-10-14-27-396.png, > image-2022-12-27-10-14-39-563.png > > > Only the leader confignode can show the number of datanodes and confignodes. > Please refer to the pictures below: > Can we change it so the numbers can be shown on both the leader and the followers? > !image-2022-12-27-10-14-39-563.png|width=638,height=388! > !image-2022-12-27-10-14-27-396.png|width=637,height=335! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5289) [Metric]Only the leader confignode can show the number of datanode and confignode
[ https://issues.apache.org/jira/browse/IOTDB-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5289: --- Assignee: Gaofei Cao (was: Hongyin Zhang) > [Metric]Only the leader confignode can show the number of datanode and > confignode > - > > Key: IOTDB-5289 > URL: https://issues.apache.org/jira/browse/IOTDB-5289 > Project: Apache IoTDB > Issue Type: Improvement > Components: Core/Cluster >Affects Versions: 1.0.0 >Reporter: Qingxin Feng >Assignee: Gaofei Cao >Priority: Minor > Attachments: image-2022-12-27-10-14-27-396.png, > image-2022-12-27-10-14-39-563.png > > > Only the leader confignode can show the number of datanodes and confignodes. > Please refer to the pictures below: > Can we change it so the numbers can be shown on both the leader and the followers? > !image-2022-12-27-10-14-39-563.png|width=638,height=388! > !image-2022-12-27-10-14-27-396.png|width=637,height=335! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4986) Too many IoTDB-DataNodeInternalRPC-Processor threads are open
[ https://issues.apache.org/jira/browse/IOTDB-4986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653755#comment-17653755 ] Jinrui Zhang commented on IOTDB-4986: - Let's confirm with [~tanxinyu] whether this issue is fixed or not > Too many IoTDB-DataNodeInternalRPC-Processor threads are open > - > > Key: IOTDB-4986 > URL: https://issues.apache.org/jira/browse/IOTDB-4986 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Critical > > m_1118_3d5eeae > 1. Start a 3-replica 3C21D cluster > 2. Start 7 Benchmark instances in sequence > 3. On one node, the datanode opens a very large number of IoTDB-DataNodeInternalRPC-Processor threads, 2k+ > (the number slowly drops back down), but OOM occasionally occurs > 2022-11-18 14:26:48,320 > [pool-22-IoTDB-DataNodeInternalRPC-Processor-374$20221118_062422_29227_16.1.0] > ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally > failed. TSStatus: TSStatus(code:506, subStatus:[]), message: null > 2022-11-18 14:29:44,568 [DataNodeInternalRPC-Service]{color:red}* ERROR > o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > DataNodeInternalRPC-Service-40 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached*{color} > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:29:53,751 [ClientRPC-Service] ERROR > o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > 
ClientRPC-Service-42 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.d.e.s.TsFileProcessor:1095 - root.test.g_0-6: > /data/iotdb/m_1118_3d5eeae/sbin/../data/datanode/data/unsequence/root.test.g_0/6/2538/1668752675355-5-0-0.tsfile > meet error when flushing a memtable, change system mode to error > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118) > at > org.apache.iotdb.db.rescon.AbstractPoolManager.submit(AbstractPoolManager.java:56) > at > org.apache.iotdb.db.engine.flush.MemTableFlushTask.(MemTableFlushTask.java:88) > at > org.apache.iotdb.db.engine.storagegroup.TsFileProcessor.flushOneMemTable(TsFileProcessor.java:1082) > at > org.apache.iotdb.db.engine.flush.FlushManager$FlushThread.runMayThrow(FlushManager.java:108) > at > 
org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.c.e.HandleSystemErrorStrategy:37 - Unrecovera
[jira] [Commented] (IOTDB-4164) [ wal_mode=SYNC ] Performance needs to be optimized
[ https://issues.apache.org/jira/browse/IOTDB-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653754#comment-17653754 ] Jinrui Zhang commented on IOTDB-4164: - The priority of this feature is not high. Let's move it to the Backlog > [ wal_mode=SYNC ] Performance needs to be optimized > --- > > Key: IOTDB-4164 > URL: https://issues.apache.org/jira/browse/IOTDB-4164 > Project: Apache IoTDB > Issue Type: Improvement > Components: Core/WAL, mpp-cluster >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Major > Attachments: image-2022-08-17-11-38-15-849.png, > image-2022-10-10-09-59-07-692.png > > > The performance of wal_mode=SYNC needs to be optimized: with the same (bm) configuration, SYNC elapsed time / ASYNC elapsed time = 2.77. > !image-2022-08-17-11-38-15-849.png! > See the reproduction steps at > https://issues.apache.org/jira/browse/IOTDB-4161?filter=-2 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5275) [compaction][aligned ts] compaction is slow
[ https://issues.apache.org/jira/browse/IOTDB-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653735#comment-17653735 ] Jinrui Zhang commented on IOTDB-5275: - What's the problem? > [compaction][aligned ts] compaction is slow > --- > > Key: IOTDB-5275 > URL: https://issues.apache.org/jira/browse/IOTDB-5275 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: master branch >Reporter: 刘珍 >Assignee: 周沛辰 >Priority: Major > Attachments: 1_shunxu.conf, 2_shunxu.conf, 3_shunxu.conf, > 4_shunxu.conf, image-2022-12-23-22-27-49-731.png > > > m_1222_656d281 > Problem description: > Aligned series, all sequential data, no other read/write operations, > with the default configuration (cross_performer=read_point > inner_seq_performer=read_chunk > inner_unseq_performer=read_point > ), > {color:#de350b}compaction is slow{color} > !image-2022-12-23-22-27-49-731.png|width=987,height=390! > Test environment: > 1. Private cloud phase 1. > Disable compaction and generate data. > The configuration files are attached. > 2. Compare compaction performance, with no other read/write operations. > ConfigNode configuration: > MAX_HEAP_SIZE="2G" > DataNode configuration: > MAX_HEAP_SIZE="18G" > MAX_DIRECT_MEMORY_SIZE="6G" > Common configuration: > {color:#de350b}*time_partition_interval=6048000 > compaction_io_rate_per_sec=1000*{color} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5319) The write speed in atoms testing declined after merging commit 5126711d
[ https://issues.apache.org/jira/browse/IOTDB-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653053#comment-17653053 ] Jinrui Zhang commented on IOTDB-5319: - We have found the commit that led to the decline. The commit changed the compaction task throttling from I/O size to IOPS, which leads compaction to grab more resources from reads and writes. [~marklau99] Please investigate the issue > The write speed in atoms testing declined after merging commit 5126711d > --- > > Key: IOTDB-5319 > URL: https://issues.apache.org/jira/browse/IOTDB-5319 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Jinrui Zhang >Priority: Major > Attachments: image-2022-12-29-18-37-25-693.png, > image-2022-12-29-18-38-45-394.png > > > !image-2022-12-29-18-37-25-693.png|width=701,height=156! > > After merging this commit, the write speed in atoms testing declined. > > We inferred that this change leads compaction to grab more CPU/IO resources, > which decreases the resources available for writes and reads. > > After we changed the parameter `iops_per_min` from 50 to 30, the problem > still exists. > See this snapshot, > !image-2022-12-29-18-38-45-394.png|width=541,height=346! > > Let's investigate the details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
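The difference between the two throttling styles the comment describes can be sketched with a token bucket. This hypothetical Python model charges either bytes or operations against the same budget; the class and function names are illustrative, not IoTDB's actual rate-limiter API.

```python
import time

class TokenBucket:
    """Minimal token-bucket throttle: acquire(n) charges n tokens and sleeps
    when the per-second budget is exhausted."""
    def __init__(self, tokens_per_sec):
        self.rate = float(tokens_per_sec)
        self.tokens = float(tokens_per_sec)
        self.last = time.monotonic()

    def acquire(self, n):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at one second's budget.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < n:
            time.sleep((n - self.tokens) / self.rate)  # wait for refill
            self.tokens = 0.0
        else:
            self.tokens -= n

def write_chunk(limiter, chunk, by_iops):
    """Throttling by I/O size charges a large chunk write its full byte count;
    throttling by IOPS charges the same write a single op-token. Large
    sequential compaction writes are therefore barely limited under an IOPS
    budget, which is consistent with compaction grabbing more disk bandwidth
    after the switch described above."""
    limiter.acquire(1 if by_iops else len(chunk))
```

Under a byte budget a 10 MB chunk costs ten million tokens; under an op budget it costs one, so the two limits shape compaction I/O very differently.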
[jira] [Assigned] (IOTDB-5319) The write speed in atoms testing declined after merging commit 5126711d
[ https://issues.apache.org/jira/browse/IOTDB-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5319: --- Assignee: Jinrui Zhang > The write speed in atoms testing declined after merging commit 5126711d > --- > > Key: IOTDB-5319 > URL: https://issues.apache.org/jira/browse/IOTDB-5319 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Jinrui Zhang >Priority: Major > Attachments: image-2022-12-29-18-37-25-693.png, > image-2022-12-29-18-38-45-394.png > > > !image-2022-12-29-18-37-25-693.png|width=701,height=156! > > After merging this commit, the write speed in atoms testing declined. > > We inferred that this change leads compaction to grab more CPU/IO resources, > which decreases the resources available for writes and reads. > > After we changed the parameter `iops_per_min` from 50 to 30, the problem > still exists. > See this snapshot, > !image-2022-12-29-18-38-45-394.png|width=541,height=346! > > Let's investigate the details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5319) The write speed in atoms testing declined after merging commit 5126711d
[ https://issues.apache.org/jira/browse/IOTDB-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5319: --- Assignee: (was: Jinrui Zhang) > The write speed in atoms testing declined after merging commit 5126711d > --- > > Key: IOTDB-5319 > URL: https://issues.apache.org/jira/browse/IOTDB-5319 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Priority: Major > Attachments: image-2022-12-29-18-37-25-693.png, > image-2022-12-29-18-38-45-394.png > > > !image-2022-12-29-18-37-25-693.png|width=701,height=156! > > After merging this commit, the write speed in atoms testing declined. > > We inferred that this change leads compaction to grab more CPU/IO resources, > which decreases the resources available for writes and reads. > > After we changed the parameter `iops_per_min` from 50 to 30, the problem > still exists. > See this snapshot, > !image-2022-12-29-18-38-45-394.png|width=541,height=346! > > Let's investigate the details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5279) [Metric] Got a wrong number after restart the cluster
[ https://issues.apache.org/jira/browse/IOTDB-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653047#comment-17653047 ] Jinrui Zhang commented on IOTDB-5279: - Let's paste the PR here once it is completed > [Metric] Got a wrong number after restart the cluster > - > > Key: IOTDB-5279 > URL: https://issues.apache.org/jira/browse/IOTDB-5279 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Cluster >Affects Versions: 1.0.0 >Reporter: Qingxin Feng >Assignee: Liuxuxin >Priority: Minor > Attachments: image-2022-12-26-11-30-04-470.png > > > Reproduce: > commit version: 1.0.1-SNAPSHOT (Build: a7908ab-dev) > Steps: > 1. Set up the cluster (3C3D, 3 replicas) > 2. Use BM to insert data > 3. After all tests finished, restart the cluster > 4. Check the result in iotdb-metric, as in the picture below > http://111.202.73.147:13000/d/TbEVYRw7A/apache-iotdb-datanode-dashboard?orgId=1&from=1672013871154&to=1672024516629&var-job=datanode&var-instance=172.20.70.22:9091 > !image-2022-12-26-11-30-04-470.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5319) The write speed in atoms testing declined after merging commit 5126711d
[ https://issues.apache.org/jira/browse/IOTDB-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5319: --- Assignee: Jinrui Zhang > The write speed in atoms testing declined after merging commit 5126711d > --- > > Key: IOTDB-5319 > URL: https://issues.apache.org/jira/browse/IOTDB-5319 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Jinrui Zhang >Priority: Major > Attachments: image-2022-12-29-18-37-25-693.png, > image-2022-12-29-18-38-45-394.png > > > !image-2022-12-29-18-37-25-693.png|width=701,height=156! > > After merging this commit, the write speed in atoms testing declined. > > We inferred that this change leads compaction to grab more CPU/IO resources, > which decreases the resources available for writes and reads. > > After we changed the parameter `iops_per_min` from 50 to 30, the problem > still exists. > See this snapshot, > !image-2022-12-29-18-38-45-394.png|width=541,height=346! > > Let's investigate the details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5319) The write speed in atoms testing declined after merging commit 5126711d
Jinrui Zhang created IOTDB-5319: --- Summary: The write speed in atoms testing declined after merging commit 5126711d Key: IOTDB-5319 URL: https://issues.apache.org/jira/browse/IOTDB-5319 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Attachments: image-2022-12-29-18-37-25-693.png !image-2022-12-29-18-37-25-693.png|width=701,height=156! After merging this commit, the write speed in atoms testing declined. We inferred that this change leads compaction to grab more CPU/IO resources, which decreases the resources available for writes and reads. After we changed the parameter `iops_per_min` from 50 to 30, the problem still exists. See this snapshot, !image-2022-12-29-18-35-42-134.png|width=521,height=333! Let's investigate the details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4684) Devices with the same name but different alignment properties are compacted into the wrong alignment property
[ https://issues.apache.org/jira/browse/IOTDB-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652719#comment-17652719 ] Jinrui Zhang commented on IOTDB-4684: - The schema processing in the compaction task execution stage is not accurate. > Devices with the same name but different alignment properties are compacted > into the wrong alignment property > - > > Key: IOTDB-4684 > URL: https://issues.apache.org/jira/browse/IOTDB-4684 > Project: Apache IoTDB > Issue Type: Bug >Affects Versions: 0.13.0 >Reporter: 周沛辰 >Assignee: 周沛辰 >Priority: Major > > *Description* > After a nonAligned device is deleted and an aligned device with the same name > is created, the new device will be compacted into a nonAligned device. > Similarly, after an aligned device is deleted and a nonAligned device with the > same name is created, the new device will be compacted into an aligned device. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5147) Optimize compaction schedule when priority is BALANCE
[ https://issues.apache.org/jira/browse/IOTDB-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652718#comment-17652718 ] Jinrui Zhang commented on IOTDB-5147: - Still in the discussion stage; it won't be completed in 1.0.1 > Optimize compaction schedule when priority is BALANCE > - > > Key: IOTDB-5147 > URL: https://issues.apache.org/jira/browse/IOTDB-5147 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: 周沛辰 >Assignee: 周沛辰 >Priority: Major > > When the priority is BALANCE, there is a problem with the compaction > schedule: when new inner space compaction tasks are continuously submitted to > the priority queue, cross space compaction tasks will be starved. -- This message was sent by Atlassian Jira (v8.20.10#820010)
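One common fix for this kind of starvation is to alternate fairly between the two task queues instead of draining a single shared priority queue. The following is a hedged Python sketch of that idea only, not IoTDB's actual compaction scheduler.

```python
from collections import deque

def balanced_schedule(inner_tasks, cross_tasks):
    """Alternate between the inner- and cross-space queues so a continuous
    stream of inner tasks cannot starve cross tasks; falls back to whichever
    queue still has work when the other is empty."""
    inner, cross = deque(inner_tasks), deque(cross_tasks)
    order, take_inner = [], True
    while inner or cross:
        # Prefer the queue whose turn it is, unless it is empty.
        queue = inner if (take_inner and inner) or not cross else cross
        order.append(queue.popleft())
        take_inner = not take_inner
    return order
```

Even with a long backlog of inner tasks, the single cross task runs second rather than waiting for the inner queue to empty.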
[jira] [Commented] (IOTDB-5189) Optimize the memory usage of fast compaction
[ https://issues.apache.org/jira/browse/IOTDB-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652717#comment-17652717 ] Jinrui Zhang commented on IOTDB-5189: - Still in progress. It won't be merged into 1.0.1 > Optimize the memory usage of fast compaction > > > Key: IOTDB-5189 > URL: https://issues.apache.org/jira/browse/IOTDB-5189 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: 周沛辰 >Assignee: 周沛辰 >Priority: Major > Fix For: 1.0.1 > > > Only read the chunks that need to be used into memory each time, instead of > reading all the overlapping chunks into memory at once. -- This message was sent by Atlassian Jira (v8.20.10#820010)
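The optimization described above can be sketched as a generator that deserializes one chunk at a time. This is a hypothetical Python model; the field names and the `read_chunk` callback are illustrative, not IoTDB's actual chunk-reading API.

```python
def read_chunks_lazily(chunk_metadata, read_chunk):
    """Yield chunks one at a time in start-time order instead of loading every
    overlapping chunk up front; peak memory is a single chunk rather than the
    whole overlap set."""
    for meta in sorted(chunk_metadata, key=lambda m: m["start_time"]):
        # Deserialized on demand; the previous chunk can be garbage-collected
        # as soon as the consumer moves on.
        yield read_chunk(meta)
```

The caller iterates the generator and merges point by point, so memory stays bounded by one chunk regardless of how many chunks overlap.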
[jira] [Commented] (IOTDB-4986) Too many IoTDB-DataNodeInternalRPC-Processor threads are open
[ https://issues.apache.org/jira/browse/IOTDB-4986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652409#comment-17652409 ] Jinrui Zhang commented on IOTDB-4986: - This issue is related to the ClientManager used in the IoTDB cluster. [~LebronAl] is fixing it > Too many IoTDB-DataNodeInternalRPC-Processor threads are open > - > > Key: IOTDB-4986 > URL: https://issues.apache.org/jira/browse/IOTDB-4986 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Critical > > m_1118_3d5eeae > 1. Start a 3-replica 3C21D cluster > 2. Start 7 Benchmark instances in sequence > 3. On one node, the datanode opens a very large number of IoTDB-DataNodeInternalRPC-Processor threads, 2k+ > (the number slowly drops back down), but OOM occasionally occurs > 2022-11-18 14:26:48,320 > [pool-22-IoTDB-DataNodeInternalRPC-Processor-374$20221118_062422_29227_16.1.0] > ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally > failed. TSStatus: TSStatus(code:506, subStatus:[]), message: null > 2022-11-18 14:29:44,568 [DataNodeInternalRPC-Service]{color:red}* ERROR > o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > DataNodeInternalRPC-Service-40 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached*{color} > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:29:53,751 [ClientRPC-Service] ERROR > 
o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > ClientRPC-Service-42 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.d.e.s.TsFileProcessor:1095 - root.test.g_0-6: > /data/iotdb/m_1118_3d5eeae/sbin/../data/datanode/data/unsequence/root.test.g_0/6/2538/1668752675355-5-0-0.tsfile > meet error when flushing a memtable, change system mode to error > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118) > at > org.apache.iotdb.db.rescon.AbstractPoolManager.submit(AbstractPoolManager.java:56) > at > org.apache.iotdb.db.engine.flush.MemTableFlushTask.(MemTableFlushTask.java:88) > at > org.apache.iotdb.db.engine.storagegroup.TsFileProcessor.flushOneMemTable(TsFileProcessor.java:1082) > at > org.apache.iotdb.db.engine.flush.FlushManager$FlushThread.runMayThrow(FlushManager.java:108) > at 
> org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.c.e.HandleSys
[jira] [Created] (IOTDB-5285) TimePartition may be error when restarting with different time partition configuration
Jinrui Zhang created IOTDB-5285: --- Summary: TimePartition may be error when restarting with different time partition configuration Key: IOTDB-5285 URL: https://issues.apache.org/jira/browse/IOTDB-5285 Project: Apache IoTDB Issue Type: Bug Reporter: Jinrui Zhang Assignee: Haiming Zhu Reproduce steps: # generate data files using time partition configuration A (e.g. 1 week) # back up the data files # stop the system and change the time partition configuration to B (e.g. 1 day) # restart the system and inspect the time partitions in memory ## we can check the time partitions using Arthas, with a command such as `ognl "@org.apache.iotdb.db.engine.StorageEngine@getInstance().dataRegionMap.get(new org.apache.iotdb.commons.consensus.DataRegionId(6)).tsfileManager.unsequenceFiles"` -- This message was sent by Atlassian Jira (v8.20.10#820010)
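The symptom above follows from how a timestamp maps to a time-partition id. A minimal sketch (illustrative names, not IoTDB's actual code) of why files written under one partition interval land in the wrong partition after restarting with a different interval:

```java
// Hypothetical sketch: a time-partition id is typically the timestamp
// divided by the configured partition interval. Changing the interval
// between restarts remaps the same timestamp to a different partition id.
public class TimePartitionSketch {
    static long partitionId(long timestampMs, long partitionIntervalMs) {
        return timestampMs / partitionIntervalMs;
    }

    public static void main(String[] args) {
        long oneWeekMs = 7L * 24 * 60 * 60 * 1000; // config A: 1 week
        long oneDayMs = 24L * 60 * 60 * 1000;      // config B: 1 day
        long ts = 10L * oneDayMs;                  // a sample timestamp

        // The same timestamp maps to different partition ids under the
        // two configurations, so files written under config A no longer
        // match the partitions computed after restarting with config B.
        System.out.println(partitionId(ts, oneWeekMs)); // 1
        System.out.println(partitionId(ts, oneDayMs));  // 10
    }
}
```

Which division rule IoTDB actually uses is not shown in this issue; the sketch only illustrates why the in-memory partition view diverges from the on-disk layout.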
[jira] [Created] (IOTDB-5266) Seq file may be lost when selecting cross compaction task
Jinrui Zhang created IOTDB-5266: --- Summary: Seq file may be lost when selecting cross compaction task Key: IOTDB-5266 URL: https://issues.apache.org/jira/browse/IOTDB-5266 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang Currently, when selecting a cross compaction task, some seq files may be lost if the seq files use FileTimeIndex rather than DeviceTimeIndex. This is because FileTimeIndex cannot describe the start time/end time accurately for a specific device, which causes the selection to terminate early, so some seq files are lost. -- This message was sent by Atlassian Jira (v8.20.10#820010)
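The core of the issue can be sketched as follows (illustrative names, not IoTDB's real classes): a DeviceTimeIndex answers per-device time ranges exactly, while a resource degraded to FileTimeIndex can only return the file-wide range for every device:

```java
import java.util.Map;

// Sketch of the precision gap between the two index kinds. A file-level
// index keeps only one [min, max] for the whole file, so any per-device
// query gets the same (over-wide) answer.
public class TimeIndexSketch {
    // DeviceTimeIndex-style answer: exact range per device
    static long deviceStart(Map<String, long[]> deviceRanges, String device) {
        return deviceRanges.get(device)[0];
    }

    // FileTimeIndex-style answer: the file-wide minimum, regardless of
    // which device is asked about (fileMax would be returned for end time)
    static long fileStart(long fileMin, long fileMax, String device) {
        return fileMin;
    }

    public static void main(String[] args) {
        Map<String, long[]> ranges = Map.of(
            "d1", new long[] {0, 10},
            "d2", new long[] {100, 200});
        // Exact per-device start vs. the degraded file-level answer:
        System.out.println(deviceStart(ranges, "d2")); // 100
        System.out.println(fileStart(0, 200, "d2"));   // 0
    }
}
```

Because the degraded answer widens every device's apparent range, a selection loop that relies on per-device boundaries can stop at the wrong file and drop seq files from the task.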
[jira] [Commented] (IOTDB-5263) Optimize the cross compaction file selection and execution
[ https://issues.apache.org/jira/browse/IOTDB-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650798#comment-17650798 ] Jinrui Zhang commented on IOTDB-5263: - We will do this optimization after fixing the bugs in the current implementation > Optimize the cross compaction file selection and execution > -- > > Key: IOTDB-5263 > URL: https://issues.apache.org/jira/browse/IOTDB-5263 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Jinrui Zhang >Priority: Major > Attachments: image-2022-12-21-18-06-40-580.png > > > In the current implementation, when selecting the `overlapped` seq files for one > specific unseq file, one seq file will always be selected even though it > doesn't have any overlap with the unseq file. See the sample below. > !image-2022-12-21-18-06-40-580.png|width=718,height=220! > That is, when selecting seq files for `3`, file-1 will be selected even > though there is no overlap between 1 and 3. This is because we need to find a > target file for 3 in the current cross compaction implementation; otherwise, > overlapped seq files would be generated after cross compaction. > We need to do the optimization for it: > # Only select the seq files which overlap with the target unseq file. > # Change the implementation of cross compaction to find the target seq > file and reduce unnecessary file writes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
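The first optimization point above, selecting only the seq files that actually overlap the target unseq file, can be sketched with a plain interval-overlap check (illustrative code, assuming each file exposes a closed [start, end] time range):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed selection rule: keep a seq file only if its
// [start, end] interval overlaps the unseq file's interval.
public class OverlapSelection {
    static boolean overlaps(long s1, long e1, long s2, long e2) {
        // Two closed intervals overlap iff each starts before the other ends.
        return s1 <= e2 && s2 <= e1;
    }

    static List<Integer> selectOverlapped(long[][] seqRanges, long unseqStart, long unseqEnd) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < seqRanges.length; i++) {
            if (overlaps(seqRanges[i][0], seqRanges[i][1], unseqStart, unseqEnd)) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // seq file 0 covers [0, 10], seq file 1 covers [20, 30];
        // the unseq file covers [25, 28], matching the sample in the issue.
        long[][] seq = {{0, 10}, {20, 30}};
        // Only file 1 is selected; the non-overlapping file 0 is skipped.
        System.out.println(selectOverlapped(seq, 25, 28)); // [1]
    }
}
```

The remaining work described in the issue, choosing a correct target file so the output stays non-overlapping, is the part that needs the implementation change; the overlap filter alone is not sufficient.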
[jira] [Created] (IOTDB-5263) Optimize the cross compaction file selection and execution
Jinrui Zhang created IOTDB-5263: --- Summary: Optimize the cross compaction file selection and execution Key: IOTDB-5263 URL: https://issues.apache.org/jira/browse/IOTDB-5263 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-4674) Reimplement settle by compaction
[ https://issues.apache.org/jira/browse/IOTDB-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4674: --- Assignee: Wenwei Shu (was: 周沛辰) > Reimplement settle by compaction > > > Key: IOTDB-4674 > URL: https://issues.apache.org/jira/browse/IOTDB-4674 > Project: Apache IoTDB > Issue Type: New Feature >Affects Versions: 0.14.0-SNAPSHOT >Reporter: Haonan Hou >Assignee: Wenwei Shu >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-4986) Too many IoTDB-DataNodeInternalRPC-Processor threads are open
[ https://issues.apache.org/jira/browse/IOTDB-4986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4986: --- Assignee: Haiming Zhu (was: Jinrui Zhang) > Too many IoTDB-DataNodeInternalRPC-Processor threads are open > - > > Key: IOTDB-4986 > URL: https://issues.apache.org/jira/browse/IOTDB-4986 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Critical > > m_1118_3d5eeae > 1. Start a 3-replica 3C21D cluster > 2. Start 7 Benchmark instances sequentially > 3. On one node's datanode, the number of IoTDB-DataNodeInternalRPC-Processor threads grows very large, 2k+ > (it slowly drops back down), but an OOM occasionally occurs > 2022-11-18 14:26:48,320 > [pool-22-IoTDB-DataNodeInternalRPC-Processor-374$20221118_062422_29227_16.1.0] > ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally > failed. TSStatus: TSStatus(code:506, subStatus:[]), message: null > 2022-11-18 14:29:44,568 [DataNodeInternalRPC-Service]{color:red}* ERROR > o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > DataNodeInternalRPC-Service-40 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached*{color} > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:29:53,751 [ClientRPC-Service] ERROR > o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > ClientRPC-Service-42 > java.lang.OutOfMemoryError: unable to create native 
thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.d.e.s.TsFileProcessor:1095 - root.test.g_0-6: > /data/iotdb/m_1118_3d5eeae/sbin/../data/datanode/data/unsequence/root.test.g_0/6/2538/1668752675355-5-0-0.tsfile > meet error when flushing a memtable, change system mode to error > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118) > at > org.apache.iotdb.db.rescon.AbstractPoolManager.submit(AbstractPoolManager.java:56) > at > org.apache.iotdb.db.engine.flush.MemTableFlushTask.(MemTableFlushTask.java:88) > at > org.apache.iotdb.db.engine.storagegroup.TsFileProcessor.flushOneMemTable(TsFileProcessor.java:1082) > at > org.apache.iotdb.db.engine.flush.FlushManager$FlushThread.runMayThrow(FlushManager.java:108) > at > org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29) > at > 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.c.e.HandleSystemErrorStrategy:37 - Unrecoverable error occurs! Change > system status to read-only because handle_sys
[jira] [Commented] (IOTDB-5035) After the datanode is removed successfully, the snapshot can be deleted
[ https://issues.apache.org/jira/browse/IOTDB-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645745#comment-17645745 ] Jinrui Zhang commented on IOTDB-5035: - https://github.com/apache/iotdb/pull/8383 > After the datanode is removed successfully, the snapshot can be deleted > --- > > Key: IOTDB-5035 > URL: https://issues.apache.org/jira/browse/IOTDB-5035 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Minor > Attachments: image-2022-11-24-14-38-09-579.png > > > Test version: 1124_cd839a4 > Path on the machine: /data/liuzhen_test/master_1123_32e2f98 (lib from 1124_cd839a4) > Problem description: > Stopping a datanode normally triggers a snapshot. > After starting this node again and scaling in a datanode (ip76/ip62), the snapshot is not deleted once the scale-in succeeds: > !image-2022-11-24-14-38-09-579.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5165) [ compaction ]
[ https://issues.apache.org/jira/browse/IOTDB-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645744#comment-17645744 ] Jinrui Zhang commented on IOTDB-5165: - Need to take a look at this with priority > [ compaction ] > -- > > Key: IOTDB-5165 > URL: https://issues.apache.org/jira/browse/IOTDB-5165 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Compaction, mpp-cluster >Affects Versions: master branch, 1.0.0 >Reporter: 刘珍 >Assignee: 周沛辰 >Priority: Major > Attachments: 1.conf, 10.conf, 2.conf, 3.conf, 4.conf, 5.conf, 6.conf, > 7.conf, 8.conf, 9.conf, run.sh, run_conf.sh > > > master 2022-12-09_a31441c > Compaction failed with an error > 2022-12-09 14:21:46,728 [pool-43-IoTDB-Compaction-8] ERROR > o.a.i.d.e.c.CompactionUtils:281 - root.test.g2_0 Device > root.test.g2_0.d_82215 {color:#DE350B}*is overlapped between file*{color} is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670420064671-7-0-0.tsfile, > status: COMPACTING and file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670420083777-8-0-0.tsfile, > status: COMPACTING, end time in file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670420064671-7-0-0.tsfile, > status: COMPACTING is 153556841, start time in file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670420083777-8-0-0.tsfile, > status: COMPACTING is 153555842 > 2022-12-09 14:21:46,729 [pool-43-IoTDB-Compaction-8] ERROR > o.a.i.d.e.c.i.InnerSpaceCompactionTask:184 - {color:#DE350B}*Failed to pass > compaction validation*{color}, source files is: [file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670444897033-4581-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670444947849-4590-0-0.tsfile, > 
status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/167044454-4600-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445051784-4609-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445101595-4619-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445153290-4628-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445204996-4638-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445254210-4647-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445304094-4656-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445355765-4666-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445407476-4675-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445458633-4685-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445509050-4694-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445558911-4703-0-0.tsfile, > status: COMPACTING, file is > 
/data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445608483-4712-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445660518-4722-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445709301-4731-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445760226-4741-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445808744-4750-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445860031
[jira] [Commented] (IOTDB-5139) [benchmark]1device over 3w sensors, insert nothing
[ https://issues.apache.org/jira/browse/IOTDB-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645743#comment-17645743 ] Jinrui Zhang commented on IOTDB-5139: - It seems that this issue is easy to repro. Let's repro it and investigate whether it is caused by large requests. > [benchmark]1device over 3w sensors, insert nothing > --- > > Key: IOTDB-5139 > URL: https://issues.apache.org/jira/browse/IOTDB-5139 > Project: Apache IoTDB > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: changxue >Assignee: 张洪胤 >Priority: Minor > Attachments: allnodes-log-3w.tar.gz, benchmark-logs-3w.log, > benchmark-logs-50w.log, config.properties > > > [benchmark]1device over 3w sensors, insert nothing > environment: > benchmark: 1.0 commit: 25c1f742 > iotdb: 3C3D cluster, 1.0.0 release edition > create timeseries succeeded but "show regions" showed schema info only. > > configs and logs: see attachments. > Questions: > 1. Why is no data written with 30k sensors? With 3k sensors it works. Writing my own insert code with session.insertRecord succeeds. > 2. With loop=2, the second loop keeps printing the log below and never finishes. > 2022-12-07 14:56:44,540 INFO > cn.edu.tsinghua.iot.benchmark.client.DataClient:137 - pool-2-thread-1 50.00% > workload is done. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4400) [new stand-alone]enableMetric, the write performance does not meet expectations
[ https://issues.apache.org/jira/browse/IOTDB-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645740#comment-17645740 ] Jinrui Zhang commented on IOTDB-4400: - Need to run the test again with the latest code > [new stand-alone]enableMetric, the write performance does not meet > expectations > --- > > Key: IOTDB-4400 > URL: https://issues.apache.org/jira/browse/IOTDB-4400 > Project: Apache IoTDB > Issue Type: Improvement > Components: Others >Affects Versions: master branch, 0.14.0, 0.14.0-SNAPSHOT >Reporter: xiaozhihong >Assignee: 张洪胤 >Priority: Major > Attachments: config.properties, image-2022-09-14-10-27-26-819.png > > > commit 74fb350809b2f1488a90d6d7c420f27ec14b24e5 > With monitoring turned on, write performance tests were run on both frameworks > at different metric levels. The final result is confusing: neither different > levels nor different frameworks show a noticeable difference in write > performance. With monitoring turned off, writes likewise show no > obvious difference, so a root-cause investigation needs to be done. > Details: > https://apache-iotdb.feishu.cn/docx/QUQSdbRaaoWDjQxcEz9cQjFdnsz?from=create_suite > !image-2022-09-14-10-27-26-819.png|width=528,height=275! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5030) [Schema-Read-Performance] java.lang.IllegalArgumentException: all replicas for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these DataNodes
[ https://issues.apache.org/jira/browse/IOTDB-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5030: --- Sprint: 2022-12-Schema (was: 2022-11-Cluster) Assignee: Yukun Zhou (was: Jinrui Zhang) > [Schema-Read-Performance] java.lang.IllegalArgumentException: all replicas > for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in > these DataNodes > -- > > Key: IOTDB-5030 > URL: https://issues.apache.org/jira/browse/IOTDB-5030 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Yukun Zhou >Priority: Minor > Attachments: iotdb_4851.conf > > > master_1123_32e2f98 > 1. Start a 1-replica 3C5D cluster > 2. Write data with Benchmark; after 50 minutes, ip68 reports an error > {color:#DE350B}2022-11-23 15:32:46,820 > [pool-24-IoTDB-DataNodeInternalRPC-Processor-122] ERROR > o.a.t.ProcessFunction:47 - Internal error processing sendPlanNode > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:1)] are not available in these > DataNodes[[TDataNodeLocation(dataNodeId:4, > clientRpcEndPoint:TEndPoint(ip:192.168.10.66, port:6667), > internalEndPoint:TEndPoint(ip:192.168.10.66, port:9003), > mPPDataExchangeEndPoint:TEndPoint(ip:192.168.10.66, port:8777), > dataRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, port:40010), > schemaRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, > port:50010))]]{color} > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.selectTargetDataNode(SimpleFragmentParallelPlanner.java:146) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.produceFragmentInstance(SimpleFragmentParallelPlanner.java:115) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.prepare(SimpleFragmentParallelPlanner.java:87) > at > 
org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.parallelPlan(SimpleFragmentParallelPlanner.java:78) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragmentInstances(DistributionPlanner.java:94) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragments(DistributionPlanner.java:78) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.doDistributedPlan(QueryExecution.java:304) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:201) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.retry(QueryExecution.java:235) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.getStatus(QueryExecution.java:500) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:152) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.executeSchemaFetchQuery(ClusterSchemaFetcher.java:178) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:156) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:98) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchemaWithAutoCreate(ClusterSchemaFetcher.java:265) > at > org.apache.iotdb.db.mpp.plan.analyze.SchemaValidator.validate(SchemaValidator.java:56) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:193) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:165) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:119) > at > org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1086) > at > 
org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:85) > at > org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.sendPlanNode(DataNodeInternalRPCServiceImpl.java:283) > at > org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3607) > at > org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3587) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcesso
[jira] [Commented] (IOTDB-4971) dispatch write failed. status: TSStatus(code:506, subStatus:[]), message: null
[ https://issues.apache.org/jira/browse/IOTDB-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645737#comment-17645737 ] Jinrui Zhang commented on IOTDB-4971: - Need to confirm what the problem behind the logs is > dispatch write failed. status: TSStatus(code:506, subStatus:[]), message: null > -- > > Key: IOTDB-4971 > URL: https://issues.apache.org/jira/browse/IOTDB-4971 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Minor > Labels: pull-request-available > Attachments: del_ts.sh, down_delete_ts.conf, run_del_1.sh, > run_del_2.sh, run_iotdb_4563.sh > > > master_1117_d548214 > 1. start 3rep 3C 9D cluster > 2. delete timeseries root.** and create metadata, write data, concurrently > the datanode (IP18) has an ERROR: > 2022-11-17 14:52:38,172 > [pool-24-IoTDB-DataNodeInternalRPC-Processor-17$20221117_065237_15126_11.1.0] > {color:red}*ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:173 - dispatch > write failed. status: TSStatus(code:506, subStatus:[]), message: null*{color} > Reproduction steps > 1. Start the 3C 9D cluster > 3C : 172.20.70.19/172.20.70.21/172.20.70.32 > 9D : 172.20.70.2/3/4/5/13/14/15/16/18 > Configuration parameters > ConfigNode: > MAX_HEAP_SIZE="8G" > MAX_DIRECT_MEMORY_SIZE="6G" > cn_connection_timeout_ms=360 > Common : > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_factor=3 > connection_timeout_ms=360 > max_connection_for_internal_service=200 > query_timeout_threshold=360 > schema_region_ratis_request_timeout_ms=180 > Datanode: > MAX_HEAP_SIZE="20G" > MAX_DIRECT_MEMORY_SIZE="6G" > 2. 
Start the test scripts > put down_delete_ts.conf under ${bm_dir}/conf > put the four scripts del_ts.sh, run_del_1.sh, run_del_2.sh and run_iotdb_4563.sh under ${bm_dir} > the launch script is run_iotdb_4563.sh > After the run finishes, check the datanode log on ip18. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4805) [Performance] Compare performance of 1C1D with "start-server.sh” and ”start-new-server.sh”
[ https://issues.apache.org/jira/browse/IOTDB-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645736#comment-17645736 ] Jinrui Zhang commented on IOTDB-4805: - This issue is intended to compare the performance between 1C1D and the new standalone server. Let's confirm whether the test is still necessary > [Performance] Compare performance of 1C1D with "start-server.sh” and > ”start-new-server.sh” > -- > > Key: IOTDB-4805 > URL: https://issues.apache.org/jira/browse/IOTDB-4805 > Project: Apache IoTDB > Issue Type: Improvement >Affects Versions: 0.14.0-SNAPSHOT >Reporter: FengQingxin >Assignee: Jinrui Zhang >Priority: Major > Attachments: common, image-2022-10-31-12-08-22-497.png > > > commit_id:76b947f > Reproduce Steps: > 1.Git pull the latest master code, then build it with the command "mvn clean > package -pl distribution -am -DskipTests" > 2.Modify the config as below: > MAX_HEAP_SIZE="20G" > enable_partition=false > enable_seq_space_compaction=false > enable_unseq_space_compaction=false > enable_cross_space_compaction=false > enableMetric: true > 3.Start 1C1D > 4.Insert data with bm which using iotdb-0.13-0.0.1.jar > 5.After the test of 1C1D finished,start old server with start-server.sh > 6.Insert data with bm which using iotdb-0.13-0.0.1.jar > Result: > 1c1d/new-server=83.64% > 1c1d/old-server=72.98% > !image-2022-10-31-12-08-22-497.png|width=712,height=367! > Attachment: > benchmark config:common > B.R > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5103) LastFlushTime may be set incorrectly when DataRegion recovering
Jinrui Zhang created IOTDB-5103: --- Summary: LastFlushTime may be set incorrectly when DataRegion recovering Key: IOTDB-5103 URL: https://issues.apache.org/jira/browse/IOTDB-5103 Project: Apache IoTDB Issue Type: Bug Reporter: Jinrui Zhang Assignee: Jinrui Zhang During DataRegion recovery, the unsealed file may be read before the sealed TsFiles have been fully processed, which leads to an incorrect lastFlushTime for the current DataRegion. -- This message was sent by Atlassian Jira (v8.20.10#820010)
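One way to make the recovered value insensitive to the order in which sealed and unsealed files are read is to merge each device's end time with the maximum seen so far. A hedged sketch (illustrative names, not IoTDB's actual recovery code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: recover last flush times by merging with max, so reading the
// unsealed file before the sealed TsFiles cannot leave a stale value.
public class LastFlushTimeRecovery {
    final Map<String, Long> lastFlushTime = new HashMap<>();

    void applyEndTime(String device, long endTime) {
        // merge keeps the larger of the existing and incoming end times
        lastFlushTime.merge(device, endTime, Math::max);
    }

    public static void main(String[] args) {
        LastFlushTimeRecovery r = new LastFlushTimeRecovery();
        r.applyEndTime("device1", 5); // unsealed file happens to be read first
        r.applyEndTime("device1", 9); // sealed TsFile applied afterwards
        System.out.println(r.lastFlushTime.get("device1")); // 9
    }
}
```

Whether the actual fix reorders recovery or merges this way is not stated in the issue; the sketch only shows the order-independence property the recovery needs.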
[jira] [Assigned] (IOTDB-5035) After the datanode is removed successfully, the snapshot can be deleted
[ https://issues.apache.org/jira/browse/IOTDB-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5035: --- Assignee: Haiming Zhu (was: Jinrui Zhang) > After the datanode is removed successfully, the snapshot can be deleted > --- > > Key: IOTDB-5035 > URL: https://issues.apache.org/jira/browse/IOTDB-5035 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Minor > Attachments: image-2022-11-24-14-38-09-579.png > > > Test version: 1124_cd839a4 > Path on the machine: /data/liuzhen_test/master_1123_32e2f98 (lib from 1124_cd839a4) > Problem description: > Stopping a datanode normally triggers a snapshot. > After starting this node again and scaling in a datanode (ip76/ip62), the snapshot is not deleted once the scale-in succeeds: > !image-2022-11-24-14-38-09-579.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5045) [delete] After running "drop database root.**", wal and tsfile still left
[ https://issues.apache.org/jira/browse/IOTDB-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17639889#comment-17639889 ] Jinrui Zhang commented on IOTDB-5045: - According to the current investigation with [~Marcoss], we found that write operations may still happen even after the DataRegion has been marked as deleted. A `write operation` may be ongoing while the DataRegion is being deleted, and once the DataRegion's deletion is done, the ongoing write may trigger the DataRegion's write path again, so some WAL/TsFile data is generated. There is no lock/concurrency control between `write` and `DataRegion delete`, so this case may be triggered in many scenarios. If the delete is submitted immediately after an insertion, the SyncLog of IoTConsensus can trigger this case easily when the total number of insert operations is less than 5. What we should do next: * Add concurrency control between the DataRegion's write and delete. Ensure write operations are discarded/rejected once the DataRegion has been marked as deleted > [delete] After running "drop database root.**", wal and tsfile still left > --- > > Key: IOTDB-5045 > URL: https://issues.apache.org/jira/browse/IOTDB-5045 > Project: Apache IoTDB > Issue Type: Bug >Affects Versions: 0.14.0-SNAPSHOT >Reporter: changxue >Assignee: Yukun Zhou >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Attachments: allnodes-log.tar.gz, udf-privilege.run > > > [delete] After running "drop database root.**", wal and tsfiles are left rather > than cleaned up > 3C3D cluster, Nov.25 14:00 source codes > reproduction: > execute the statements of attachment udf-privilege.run in the start-cli.sh > window several times > actual result: > They are all empty: > show databases; > show timeseries root.**; > show regions; > I've run flush in the command window but it didn't help. > But tsfile and wal files are left and won't be removed. 
> find $IOTDB_HOME/data/datanode/data -type f | xargs ls -hl > {code} > -rw-r--r-- 1 atmos root5 Nov 25 14:30 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/.iotdb-lock > -rw-r--r-- 1 atmos root0 Nov 25 14:43 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.sg1/10/0/1669358591044-1-0-0.tsfile > -rw-r--r-- 1 atmos root0 Nov 25 14:59 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.sg1/19/0/1669359549944-1-0-0.tsfile > -rw-r--r-- 1 atmos root0 Nov 25 14:59 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.sg1/25/0/1669359576659-1-0-0.tsfile > -rw-r--r-- 1 atmos root0 Nov 25 15:13 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.sg1/32/0/1669360384185-1-0-0.tsfile > -rw-r--r-- 1 atmos root0 Nov 25 15:34 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.sg1/38/0/1669361656927-1-0-0.tsfile > -rw-r--r-- 1 atmos root0 Nov 25 15:41 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.sg1/50/0/1669362077038-1-0-0.tsfile > -rw-r--r-- 1 atmos root0 Nov 25 14:34 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.sg1/6/0/1669358088441-1-0-0.tsfile > {code} > find $IOTDB_HOME/data/datanode/data -type f | xargs ls -hl > {code} > -rw-r--r-- 1 atmos root 54 Nov 25 14:30 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/system/users/root.profile > -rw-r--r-- 1 atmos root 136 Nov 25 14:43 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-10/_0-0-1.wal > -rw-r--r-- 1 atmos root 155 Nov 25 14:43 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-10/_0.checkpoint > -rw-r--r-- 1 atmos root 68 Nov 25 14:59 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-19/_0-0-1.wal > -rw-r--r-- 1 atmos root 155 Nov 25 14:59 > 
/data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-19/_0.checkpoint > -rw-r--r-- 1 atmos root 68 Nov 25 14:59 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-25/_0-0-1.wal > -rw-r--r-- 1 atmos root 155 Nov 25 14:59 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-25/_0.checkpoint > -rw-r--r-- 1 atmos root 68 Nov 25 15:13 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-32/_0-0-1.wal > -rw-r--r-- 1 atmos root 155 Nov 25 15:13 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-32/_0.checkpoint > -rw-r--r-- 1 atmos root 68 Nov 25 15:34 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-38/_0-0-1.wal
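The proposed fix above — concurrency control between DataRegion write and delete — can be sketched with a read-write lock plus a deleted flag. This is a minimal illustration only; the names (`DataRegionSketch`, `insert`, `markDeleted`) are hypothetical and are not IoTDB's actual DataRegion API:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the proposed concurrency control between DataRegion write and delete.
// All names here are illustrative, not IoTDB's real classes or methods.
class DataRegionSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean deleted = false;

    // Writers take the read lock, so concurrent writes still proceed in parallel.
    boolean insert(String record) {
        lock.readLock().lock();
        try {
            if (deleted) {
                return false; // rejected: no wal/tsfile is generated after deletion
            }
            // ... append to WAL / memtable here ...
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Deletion takes the write lock: it waits for all in-flight writes to finish,
    // then marks the region so later writes are rejected.
    void markDeleted() {
        lock.writeLock().lock();
        try {
            deleted = true;
            // ... remove wal/tsfile directories here ...
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With this ordering, a write racing with `drop database root.**` either completes before the directories are removed or is rejected afterwards, so no orphan wal/tsfile should remain.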
[jira] [Assigned] (IOTDB-5030) java.lang.IllegalArgumentException: all replicas for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these DataNodes
[ https://issues.apache.org/jira/browse/IOTDB-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5030: --- Assignee: Jinrui Zhang (was: Yukun Zhou) > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these > DataNodes > - > > Key: IOTDB-5030 > URL: https://issues.apache.org/jira/browse/IOTDB-5030 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Minor > Attachments: iotdb_4851.conf > > > master_1123_32e2f98 > 1. Start a 1-replica 3C5D cluster > 2. BM writes data; after 50 minutes, ip68 reports an error > {color:#DE350B}2022-11-23 15:32:46,820 > [pool-24-IoTDB-DataNodeInternalRPC-Processor-122] ERROR > o.a.t.ProcessFunction:47 - Internal error processing sendPlanNode > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:1)] are not available in these > DataNodes[[TDataNodeLocation(dataNodeId:4, > clientRpcEndPoint:TEndPoint(ip:192.168.10.66, port:6667), > internalEndPoint:TEndPoint(ip:192.168.10.66, port:9003), > mPPDataExchangeEndPoint:TEndPoint(ip:192.168.10.66, port:8777), > dataRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, port:40010), > schemaRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, > port:50010))]]{color} > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.selectTargetDataNode(SimpleFragmentParallelPlanner.java:146) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.produceFragmentInstance(SimpleFragmentParallelPlanner.java:115) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.prepare(SimpleFragmentParallelPlanner.java:87) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.parallelPlan(SimpleFragmentParallelPlanner.java:78) > at > 
org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragmentInstances(DistributionPlanner.java:94) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragments(DistributionPlanner.java:78) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.doDistributedPlan(QueryExecution.java:304) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:201) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.retry(QueryExecution.java:235) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.getStatus(QueryExecution.java:500) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:152) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.executeSchemaFetchQuery(ClusterSchemaFetcher.java:178) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:156) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:98) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchemaWithAutoCreate(ClusterSchemaFetcher.java:265) > at > org.apache.iotdb.db.mpp.plan.analyze.SchemaValidator.validate(SchemaValidator.java:56) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:193) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:165) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:119) > at > org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1086) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:85) > at > 
org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.sendPlanNode(DataNodeInternalRPCServiceImpl.java:283) > at > org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3607) > at > org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3587) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) > at > org.apache.thrift.server.TThreadPoolServer$WorkerPr
[jira] [Assigned] (IOTDB-5030) java.lang.IllegalArgumentException: all replicas for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these DataNodes
[ https://issues.apache.org/jira/browse/IOTDB-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5030: --- Assignee: Yukun Zhou (was: Jinrui Zhang) > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these > DataNodes > - > > Key: IOTDB-5030 > URL: https://issues.apache.org/jira/browse/IOTDB-5030 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Yukun Zhou >Priority: Minor > Attachments: iotdb_4851.conf
[jira] [Commented] (IOTDB-5030) java.lang.IllegalArgumentException: all replicas for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these DataNodes
[ https://issues.apache.org/jira/browse/IOTDB-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638667#comment-17638667 ] Jinrui Zhang commented on IOTDB-5030: - [~HeimingZ] tried to reproduce this issue on fit16-20 with a 3C5D cluster and it didn't occur. We investigated the logs from when the issue occurred and found that it should be a timeout issue. At that time, fit68 was trying to dispatch a schema-read FI to fit66, but the response was not returned within the timeout. It actually executed successfully on fit66, because we didn't see any error log there; on the other hand, we found a `Read timeout` log on fit68 at that time. This indicates that the schema-read operation is not as fast as expected under this load, so it didn't return the response within the tolerated interval. According to the benchmark settings, there are 30 million series in the schema, which is huge. There are two ways to resolve this issue currently: # optimize the schema-read execution to avoid the timeout # let users increase `connection_timeout_ms` in the configuration to accommodate the huge load. But the optimization definitely cannot be completed in a very short time; given the release stage of 1.0, I will decrease the priority of this issue. The optimization needs [~Marcoss] to take a look > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these > DataNodes > - > > Key: IOTDB-5030 > URL: https://issues.apache.org/jira/browse/IOTDB-5030 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Minor > Attachments: iotdb_4851.conf
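For the second workaround, the timeout can be raised in the common configuration. This is a sketch only: `connection_timeout_ms` appears under the "Common" settings quoted in related reports in this thread, but the exact configuration file name and a suitable value depend on the version and deployment, so treat both as assumptions:

```properties
# Common configuration (e.g. iotdb-common.properties; the file name varies by version).
# Raise the internal RPC timeout so a slow schema-read FI under a heavy schema load
# is not reported as "all replicas ... are not available".
connection_timeout_ms=60000
```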
[jira] [Commented] (IOTDB-5030) java.lang.IllegalArgumentException: all replicas for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these DataNodes
[ https://issues.apache.org/jira/browse/IOTDB-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638525#comment-17638525 ] Jinrui Zhang commented on IOTDB-5030: - Please try to reproduce this issue with the latest code > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these > DataNodes > - > > Key: IOTDB-5030 > URL: https://issues.apache.org/jira/browse/IOTDB-5030 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Major > Attachments: iotdb_4851.conf
[jira] [Commented] (IOTDB-4702) [Remove-DataNode] snapshot is not deleted after delete storage group
[ https://issues.apache.org/jira/browse/IOTDB-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638474#comment-17638474 ] Jinrui Zhang commented on IOTDB-4702: - Please describe the issue in detail. Currently we do have some snapshots which cannot be removed in some situations. Please indicate the detailed scenario. [~刘珍] > [Remove-DataNode] snapshot is not deleted after delete storage group > > > Key: IOTDB-4702 > URL: https://issues.apache.org/jira/browse/IOTDB-4702 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: suchenglong >Priority: Minor > Fix For: 0.14.0 > > Attachments: image-2022-11-24-14-54-28-187.png, > image-2022-11-24-14-54-33-334.png > > > m_1019_f2ffb49 > 3rep,3C3D > schemaregion : ratis > dataregion : multiLeader > execute stop-datanode.sh , the data_region takes a snapshot . > {color:#DE350B}delete storage group , the snapshot is not deleted.{color} > How to reproduce : > IOTDB-4700 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4507) [SystemResourceIssue] Insert failed can't connect to node TEndPoint
[ https://issues.apache.org/jira/browse/IOTDB-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638467#comment-17638467 ] Jinrui Zhang commented on IOTDB-4507: - We made more optimizations and bug fixes to MultiLeaderConsensus, which should enhance the stability of writes using MultiLeader. See this PR [https://github.com/apache/iotdb/pull/8025]. The error "can't connect to node" is usually caused by errors/pressure on the server side. This fix decreased the pressure on the server side and optimized the memory usage of the DataNode, which should help with this issue. Let's test it again to see whether this issue can still be reproduced > [SystemResourceIssue] Insert failed can't connect to node TEndPoint > > > Key: IOTDB-4507 > URL: https://issues.apache.org/jira/browse/IOTDB-4507 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: FengQingxin >Assignee: Jinrui Zhang >Priority: Minor > Attachments: config.properties, confignode-env.sh, datanode-env.sh, > image-2022-09-23-08-44-57-017.png, iotdb-confignode.properties, > iotdb-datanode.properties, log.tar.gz > > > Reproduce steps: > # Setup a cluster with 3C3D({color:#de350b}MultiLeaderConsensus{color}) > # Using 3BMs to insert data(Loop=2000) > # Setup a cluster with 3C3D({color:#de350b}MultiLeaderConsensus{color}) > # Using 3BMs to insert data(Loop=4000 or Loop=6000) > BM -> IoTDB Node > 172.20.70.7 -> 172.20.70.22 > 172.20.70.8 -> 172.20.70.23 > 172.20.70.9 -> 172.20.70.24 > !image-2022-09-23-08-44-57-017.png|width=510,height=409! > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5036) Take snapshot in parallel when IoTDB shutdown
Jinrui Zhang created IOTDB-5036: --- Summary: Take snapshot in parallel when IoTDB shutdown Key: IOTDB-5036 URL: https://issues.apache.org/jira/browse/IOTDB-5036 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang Attachments: image-2022-11-24-16-24-08-844.png Currently, IoTDB takes a snapshot for each DataRegion at shutdown, and the snapshots are taken one by one across the DataRegions. Let's try to take the snapshots in parallel !image-2022-11-24-16-24-08-844.png|width=579,height=527! -- This message was sent by Atlassian Jira (v8.20.10#820010)
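The improvement can be sketched by submitting each DataRegion's snapshot task to a shared pool and blocking until all finish before shutdown proceeds. The names here (`ParallelSnapshot`, `snapshotAll`) are hypothetical, not IoTDB's actual shutdown code:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: take DataRegion snapshots in parallel instead of one by one.
class ParallelSnapshot {
    // Each Runnable stands in for one DataRegion's snapshot routine.
    static void snapshotAll(List<Runnable> snapshotTasks, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Runnable task : snapshotTasks) {
            pool.submit(task); // regions snapshot concurrently, bounded by the pool size
        }
        pool.shutdown();
        // Shutdown must still block until every snapshot has finished.
        if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
            throw new IllegalStateException("snapshotting did not finish in time");
        }
    }
}
```

Total shutdown time then approaches the slowest single region's snapshot rather than the sum over all regions, while the pool size bounds the peak I/O pressure.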
[jira] [Commented] (IOTDB-4971) dispatch write failed. status: TSStatus(code:506, subStatus:[]), message: null
[ https://issues.apache.org/jira/browse/IOTDB-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638102#comment-17638102 ] Jinrui Zhang commented on IOTDB-4971: - Consider printing the explicit content of the status code > dispatch write failed. status: TSStatus(code:506, subStatus:[]), message: null > -- > > Key: IOTDB-4971 > URL: https://issues.apache.org/jira/browse/IOTDB-4971 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Minor > Attachments: del_ts.sh, down_delete_ts.conf, run_del_1.sh, > run_del_2.sh, run_iotdb_4563.sh > > > master_1117_d548214 > 1. start a 3-replica 3C 9D cluster > 2. delete timeseries root.** and create metadata, with concurrent data writes > The datanode (IP18) has an ERROR: > 2022-11-17 14:52:38,172 > [pool-24-IoTDB-DataNodeInternalRPC-Processor-17$20221117_065237_15126_11.1.0] > {color:red}*ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:173 - dispatch > write failed. status: TSStatus(code:506, subStatus:[]), message: null*{color} > Reproduction steps > 1. Start the 3C9D cluster > 3C : 172.20.70.19/172.20.70.21/172.20.70.32 > 9D : 172.20.70.2/3/4/5/13/14/15/16/18 > Configuration parameters > ConfigNode: > MAX_HEAP_SIZE="8G" > MAX_DIRECT_MEMORY_SIZE="6G" > cn_connection_timeout_ms=360 > Common : > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_factor=3 > connection_timeout_ms=360 > max_connection_for_internal_service=200 > query_timeout_threshold=360 > schema_region_ratis_request_timeout_ms=180 > Datanode: > MAX_HEAP_SIZE="20G" > MAX_DIRECT_MEMORY_SIZE="6G" > 2. Start the test scripts > Put down_delete_ts.conf under ${bm_dir}/conf > Put the 4 scripts del_ts.sh, run_del_1.sh, run_del_2.sh and run_iotdb_4563.sh under ${bm_dir} > The launch script is run_iotdb_4563.sh > After the run completes, check the ip18 datanode log. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4986) Too many IoTDB-DataNodeInternalRPC-Processor threads are open
[ https://issues.apache.org/jira/browse/IOTDB-4986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638099#comment-17638099 ] Jinrui Zhang commented on IOTDB-4986: - It is not a functionality issue. Let's mark it as an enhancement because we need to process other bugs with higher priority. > Too many IoTDB-DataNodeInternalRPC-Processor threads are open > - > > Key: IOTDB-4986 > URL: https://issues.apache.org/jira/browse/IOTDB-4986 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Critical > > m_1118_3d5eeae > 1. Start a 3-replica 3C21D cluster > 2. Start 7 Benchmarks sequentially > 3. One node's datanode opens a very large number of IoTDB-DataNodeInternalRPC-Processor threads, 2k+ > (the count slowly drops back down), but OOM occasionally occurs > 2022-11-18 14:26:48,320 > [pool-22-IoTDB-DataNodeInternalRPC-Processor-374$20221118_062422_29227_16.1.0] > ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally > failed. TSStatus: TSStatus(code:506, subStatus:[]), message: null > 2022-11-18 14:29:44,568 [DataNodeInternalRPC-Service]{color:red}* ERROR > o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > DataNodeInternalRPC-Service-40 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached*{color} > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:29:53,751 [ClientRPC-Service] ERROR > 
o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > ClientRPC-Service-42 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.d.e.s.TsFileProcessor:1095 - root.test.g_0-6: > /data/iotdb/m_1118_3d5eeae/sbin/../data/datanode/data/unsequence/root.test.g_0/6/2538/1668752675355-5-0-0.tsfile > meet error when flushing a memtable, change system mode to error > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118) > at > org.apache.iotdb.db.rescon.AbstractPoolManager.submit(AbstractPoolManager.java:56) > at > org.apache.iotdb.db.engine.flush.MemTableFlushTask.(MemTableFlushTask.java:88) > at > org.apache.iotdb.db.engine.storagegroup.TsFileProcessor.flushOneMemTable(TsFileProcessor.java:1082) > at > org.apache.iotdb.db.engine.flush.FlushManager$FlushThread.runMayThrow(FlushManager.java:108) > at 
> org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.c.e.H
[jira] [Commented] (IOTDB-5015) [write]when writing for about 6 hours to only 1 sensor, the writing stopped with error: too many requests need to process
[ https://issues.apache.org/jira/browse/IOTDB-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638098#comment-17638098 ] Jinrui Zhang commented on IOTDB-5015: - This issue may be caused by the same root cause as https://issues.apache.org/jira/browse/IOTDB-5019. > [write]when writing for about 6 hours to only 1 sensor, the writing stopped > with error: too many requests need to process > - > > Key: IOTDB-5015 > URL: https://issues.apache.org/jira/browse/IOTDB-5015 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Server >Affects Versions: 0.14.0-SNAPSHOT >Reporter: changxue >Assignee: Jinrui Zhang >Priority: Major > Attachments: allnodes-logs.tar.gz, config.properties, screenshot-1.png > > > [write]when writing for about 6 hours to only 1 sensor, the writing stopped > with error: too many requests need to process > environment: > 3C3D cluster, 2 replicas > |RegionId|Type|Status|Database|SeriesSlotId|TimeSlotId|DataNodeId|Host|RpcPort|Role| > |10|SchemaRegion|Running|root.aggr.g_0|1|0|1|172.20.70.44|6667|Follower| > |10|SchemaRegion|Running|root.aggr.g_0|1|0|5|172.20.70.46|6667|Leader| > |11|DataRegion|Running|root.aggr.g_0|1|10|1|172.20.70.44|6667|Follower| > |11|DataRegion|Running|root.aggr.g_0|1|10|5|172.20.70.46|6667|Leader| > reproduction: > 1. start the cluster successfully > 2. start the 0.13 benchmark at about 20:23 Nov.21; for the benchmark > configuration see the attachment config.properties > 3. errors occurred at about 3:20 Nov.22 and writing couldn't be continued. > 4. at 6:30 Nov.22, I started the benchmark again, but couldn't write to iotdb successfully > 5. run stop-datanode.sh and start-datanode.sh on the bad node 46 > 6. 
start benchmark again, now it can write successfully > 172.20.70.46 datanode: > {code:sh} > 2022-11-22 03:20:23,586 [pool-8-IoTDB-WAL-Delete-1] INFO > o.a.i.d.w.n.WALNode$DeleteOutdatedFileTask:367 - WAL node-root.aggr.g_0-11 > flushes memTable-4510 to TsFile > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.aggr.g_0/11/52/1669036247165-4504-0-0.tsfile, > memTable size is 1531600. > 2022-11-22 03:20:42,915 > [pool-25-IoTDB-ClientRPC-Processor-2$20221121_192013_20413_5.1.0] ERROR > o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally failed. > TSStatus: TSStatus(code:606, message:Reject write because there are too many > requests need to process), message: Reject write because there are too many > requests need to process > 2022-11-22 03:20:42,978 > [pool-25-IoTDB-ClientRPC-Processor-2$20221121_192043_20414_5.1.0] INFO > o.a.i.c.m.MultiLeaderServerImpl:178 - [Throttle Down] index:380448, > safeIndex:380448 > 2022-11-22 03:20:43,594 [pool-8-IoTDB-WAL-Delete-1] INFO > o.a.i.d.w.n.WALNode$DeleteOutdatedFileTask:242 - Effective information ratio > 1.8484028968067935E-4 (active memTables cost is 13563200, flushed memTables > cost is 73364378500) of wal node-root.aggr.g_0-11 is below wal min effective > info ratio 0.1, some memTables will be snapshot or flushed. > {code} > 3:20 monitor: > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5030) java.lang.IllegalArgumentException: all replicas for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these DataNodes
[ https://issues.apache.org/jira/browse/IOTDB-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638092#comment-17638092 ] Jinrui Zhang commented on IOTDB-5030: - We need to confirm whether the cluster is OK or not > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these > DataNodes > - > > Key: IOTDB-5030 > URL: https://issues.apache.org/jira/browse/IOTDB-5030 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Major > Attachments: iotdb_4851.conf > > > master_1123_32e2f98 > 1. Start a 1-replica 3C5D cluster > 2. Write data with BM; after 50 minutes, ip68 reports an error > {color:#DE350B}2022-11-23 15:32:46,820 > [pool-24-IoTDB-DataNodeInternalRPC-Processor-122] ERROR > o.a.t.ProcessFunction:47 - Internal error processing sendPlanNode > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:1)] are not available in these > DataNodes[[TDataNodeLocation(dataNodeId:4, > clientRpcEndPoint:TEndPoint(ip:192.168.10.66, port:6667), > internalEndPoint:TEndPoint(ip:192.168.10.66, port:9003), > mPPDataExchangeEndPoint:TEndPoint(ip:192.168.10.66, port:8777), > dataRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, port:40010), > schemaRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, > port:50010))]]{color} > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.selectTargetDataNode(SimpleFragmentParallelPlanner.java:146) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.produceFragmentInstance(SimpleFragmentParallelPlanner.java:115) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.prepare(SimpleFragmentParallelPlanner.java:87) > at > 
org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.parallelPlan(SimpleFragmentParallelPlanner.java:78) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragmentInstances(DistributionPlanner.java:94) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragments(DistributionPlanner.java:78) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.doDistributedPlan(QueryExecution.java:304) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:201) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.retry(QueryExecution.java:235) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.getStatus(QueryExecution.java:500) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:152) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.executeSchemaFetchQuery(ClusterSchemaFetcher.java:178) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:156) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:98) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchemaWithAutoCreate(ClusterSchemaFetcher.java:265) > at > org.apache.iotdb.db.mpp.plan.analyze.SchemaValidator.validate(SchemaValidator.java:56) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:193) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:165) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:119) > at > org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1086) > at > 
org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:85) > at > org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.sendPlanNode(DataNodeInternalRPCServiceImpl.java:283) > at > org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3607) > at > org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3587) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) >
[jira] [Commented] (IOTDB-5030) java.lang.IllegalArgumentException: all replicas for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these DataNodes
[ https://issues.apache.org/jira/browse/IOTDB-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638091#comment-17638091 ] Jinrui Zhang commented on IOTDB-5030: - The issue is caused by a schema-fetching failure during writing. I have two questions here: # Why is the SchemaRegion's replica only distributed on 66? It should have 3 replicas, but only 1 is returned from the ConfigNode's partition info. # It seems that 66 cannot process the FI. We need to investigate the error log from 66 > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these > DataNodes > - > > Key: IOTDB-5030 > URL: https://issues.apache.org/jira/browse/IOTDB-5030 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Major > Attachments: iotdb_4851.conf > > > master_1123_32e2f98 > 1. Start a 1-replica 3C5D cluster > 2. Write data with BM; after 50 minutes, ip68 reports an error > {color:#DE350B}2022-11-23 15:32:46,820 > [pool-24-IoTDB-DataNodeInternalRPC-Processor-122] ERROR > o.a.t.ProcessFunction:47 - Internal error processing sendPlanNode > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:1)] are not available in these > DataNodes[[TDataNodeLocation(dataNodeId:4, > clientRpcEndPoint:TEndPoint(ip:192.168.10.66, port:6667), > internalEndPoint:TEndPoint(ip:192.168.10.66, port:9003), > mPPDataExchangeEndPoint:TEndPoint(ip:192.168.10.66, port:8777), > dataRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, port:40010), > schemaRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, > port:50010))]]{color} > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.selectTargetDataNode(SimpleFragmentParallelPlanner.java:146) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.produceFragmentInstance(SimpleFragmentParallelPlanner.java:115) > 
at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.prepare(SimpleFragmentParallelPlanner.java:87) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.parallelPlan(SimpleFragmentParallelPlanner.java:78) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragmentInstances(DistributionPlanner.java:94) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragments(DistributionPlanner.java:78) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.doDistributedPlan(QueryExecution.java:304) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:201) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.retry(QueryExecution.java:235) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.getStatus(QueryExecution.java:500) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:152) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.executeSchemaFetchQuery(ClusterSchemaFetcher.java:178) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:156) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:98) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchemaWithAutoCreate(ClusterSchemaFetcher.java:265) > at > org.apache.iotdb.db.mpp.plan.analyze.SchemaValidator.validate(SchemaValidator.java:56) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:193) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:165) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:119) > at > 
org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1086) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:85) > at > org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.sendPlanNode(DataNodeInternalRPCServiceImpl.java:283) > at > org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3607) > at > org.apache.i
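The IllegalArgumentException in the trace above is thrown when none of a region's replica DataNodes is usable while planning a fragment instance. A simplified sketch of that selection step; all names here are hypothetical, the real logic lives in SimpleFragmentParallelPlanner.selectTargetDataNode:

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of choosing a target DataNode for a region's replica set. */
public class ReplicaSelector {
  /**
   * Picks the first replica whose DataNode id is in the alive set, or throws
   * (as selectTargetDataNode does) when all listed replicas are unavailable.
   */
  public static int selectTarget(String regionId, List<Integer> replicaNodeIds, Set<Integer> aliveNodeIds) {
    for (int nodeId : replicaNodeIds) {
      if (aliveNodeIds.contains(nodeId)) {
        return nodeId;
      }
    }
    throw new IllegalArgumentException(
        "all replicas for region[" + regionId + "] are not available in these DataNodes" + replicaNodeIds);
  }

  public static void main(String[] args) {
    // With all 3 replicas in the partition info, one dead node is tolerated...
    System.out.println(selectTarget("SchemaRegion-1", List.of(4, 5, 6), Set.of(5, 6)));
    // ...but if the ConfigNode returns only [4] (the bug suspected in the
    // comment above) and node 4 is down, the exception in the trace is thrown.
  }
}
```

This illustrates why question 1 in the comment matters: with only one replica returned from the partition info, a single unreachable DataNode makes the whole region unplannable.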
[jira] [Commented] (IOTDB-4972) [DispatchFailed] NPE at org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertNode.selfCheckDataTypes(InsertNode.java:251)
[ https://issues.apache.org/jira/browse/IOTDB-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17637554#comment-17637554 ] Jinrui Zhang commented on IOTDB-4972: - SENSOR_NUM is 10, which is too large. Let's investigate why the NPE is thrown with a large number of series. > [DispatchFailed] NPE at > org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertNode.selfCheckDataTypes(InsertNode.java:251) > --- > > Key: IOTDB-4972 > URL: https://issues.apache.org/jira/browse/IOTDB-4972 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Major > Attachments: more_ts.conf > > > master_1117_92c6a57 > 1. Start a 3-replica 3C3D cluster > 2. Create schema and write data with benchmark > 3. (ip62) datanode ERROR; all data writes failed (expected 10 points per series, 50 million series in total): > {color:red}*2022-11-17 16:32:59,456 > [pool-26-IoTDB-ClientRPC-Processor-35$20221117_083256_00512_3] ERROR > o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:123 - [DispatchFailed] > java.lang.NullPointerException: null > at > org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertNode.selfCheckDataTypes(InsertNode.java:251)*{color} > at > org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.validateAndSetSchema(InsertTabletNode.java:201) > at > org.apache.iotdb.db.mpp.plan.analyze.SchemaValidator.validate(SchemaValidator.java:64) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:191) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:163) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:117) > at > org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1086) > at > 
org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:83) > at > org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchLocally(FragmentInstanceDispatcherImpl.java:232) > at > org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchOneInstance(FragmentInstanceDispatcherImpl.java:137) > at > org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchWriteSync(FragmentInstanceDispatcherImpl.java:119) > at > org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatch(FragmentInstanceDispatcherImpl.java:90) > at > org.apache.iotdb.db.mpp.plan.scheduler.ClusterScheduler.start(ClusterScheduler.java:106) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.schedule(QueryExecution.java:287) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:205) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:150) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:164) > at > org.apache.iotdb.db.service.thrift.impl.ClientRPCServiceImpl.insertTablet(ClientRPCServiceImpl.java:1234) > at > org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:4078) > at > org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:4058) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at > org.apache.iotdb.db.service.thrift.ProcessorWithMetrics.process(ProcessorWithMetrics.java:64) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Reproduction procedure > 1. 
192.168.10.62/66/68 72C256GB 3C3D > ConfigNode config file: > MAX_HEAP_SIZE="8G" > cn_connection_timeout_ms=360 > DataNode config file: > MAX_HEAP_SIZE="192G" > MAX_DIRECT_MEMORY_SIZE="32G" > Common config file: > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_factor=3 > connection_timeout_ms=360 > max_connection_for_internal_service=1100 > enable_timed_flush_seq_memtable=true > seq_memtable_flush_interval_i
[jira] [Commented] (IOTDB-4400) [new stand-alone]enableMetric, the write performance does not meet expectations
[ https://issues.apache.org/jira/browse/IOTDB-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635622#comment-17635622 ] Jinrui Zhang commented on IOTDB-4400: - Let's test this issue again with the latest MultiLeader code > [new stand-alone]enableMetric, the write performance does not meet > expectations > --- > > Key: IOTDB-4400 > URL: https://issues.apache.org/jira/browse/IOTDB-4400 > Project: Apache IoTDB > Issue Type: Bug > Components: Others >Affects Versions: master branch, 0.14.0, 0.14.0-SNAPSHOT >Reporter: xiaozhihong >Assignee: 张洪胤 >Priority: Major > Attachments: config.properties, image-2022-09-14-10-27-26-819.png > > > commit 74fb350809b2f1488a90d6d7c420f27ec14b24e5 > With monitoring turned on, write performance tests were run against the two > frameworks at different metric levels. The result is confusing: neither > different levels nor different frameworks show a noticeable difference in > write performance. With monitoring turned off, writing likewise shows no > obvious difference, so a root-cause investigation needs to be done. > Details: > https://apache-iotdb.feishu.cn/docx/QUQSdbRaaoWDjQxcEz9cQjFdnsz?from=create_suite > !image-2022-09-14-10-27-26-819.png|width=528,height=275! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4505) [SystemResourceIssue] Why is the 60 client test better than the 300 client test
[ https://issues.apache.org/jira/browse/IOTDB-4505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635620#comment-17635620 ] Jinrui Zhang commented on IOTDB-4505: - It should be related to memory usage. That is, 300 clients consume more memory, so the available memory for MultiLeader is less than with 60 clients. We have an optimization for memory control; see this PR https://github.com/apache/iotdb/pull/8025 > [SystemResourceIssue] Why is the 60 client test better than the 300 client > test > --- > > Key: IOTDB-4505 > URL: https://issues.apache.org/jira/browse/IOTDB-4505 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: FengQingxin >Assignee: Jinrui Zhang >Priority: Major > Attachments: config.properties, confignode-env.sh, datanode-env.sh, > image-2022-09-23-08-33-48-642.png, image-2022-09-26-08-10-41-274.png, > iotdb-confignode.properties, iotdb-datanode.properties > > > Reducing the number of clients reduces the number of threads and the number > of open files, and there is no write failure. The data file size difference > between the three nodes disappears > Reproduce steps: > # Setup a cluster with 3C3D({color:#de350b}MultiLeaderConsensus{color}) > # Using 3BMs to insert data(client=100*3) > # Setup a cluster with 3C3D({color:#de350b}MultiLeaderConsensus{color}) > # Using 3BMs to insert data(client=20*3) > BM -> IoTDB Node > 172.20.70.7 -> 172.20.70.22 > 172.20.70.8 -> 172.20.70.23 > 172.20.70.9 -> 172.20.70.24 > > !image-2022-09-23-08-33-48-642.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
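The memory contention described in the comment can be shown with back-of-envelope arithmetic: a fixed write-path budget shared by per-client buffers leaves less headroom for the MultiLeader queue as the client count grows. All numbers below are hypothetical illustrations, not IoTDB's actual memory accounting:

```java
/** Back-of-envelope sketch of the memory contention described above (all numbers hypothetical). */
public class MemoryBudget {
  /** Memory left for the consensus (MultiLeader) queue after each client gets a fixed write buffer. */
  public static long consensusBudgetBytes(long totalBytes, int clientCount, long perClientBufferBytes) {
    long usedByClients = (long) clientCount * perClientBufferBytes;
    return Math.max(0, totalBytes - usedByClients); // never negative: beyond this, writes stall or fail
  }

  public static void main(String[] args) {
    long total = 8L << 30;      // assume an 8 GiB write-path budget
    long perClient = 16L << 20; // assume 16 MiB buffered per client connection
    System.out.println("60 clients leave  " + consensusBudgetBytes(total, 60, perClient) + " bytes");
    System.out.println("300 clients leave " + consensusBudgetBytes(total, 300, perClient) + " bytes");
  }
}
```

Under these assumed numbers, 300 clients consume five times the buffer memory of 60 clients, which is the shape of the effect the comment attributes the slowdown to.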
[jira] [Commented] (IOTDB-4506) [cluster]The data amount of the three nodes is quite different
[ https://issues.apache.org/jira/browse/IOTDB-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635617#comment-17635617 ] Jinrui Zhang commented on IOTDB-4506: - We have made further optimizations to the MultiLeader module; let's keep tracking the metrics > [cluster]The data amount of the three nodes is quite different > -- > > Key: IOTDB-4506 > URL: https://issues.apache.org/jira/browse/IOTDB-4506 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: FengQingxin >Assignee: Jinrui Zhang >Priority: Major > Attachments: config.properties, confignode-env.sh, datanode-env.sh, > image-2022-09-23-08-42-14-584.png, iotdb-confignode.properties, > iotdb-datanode.properties > > > The data sizes of the three nodes are quite different, as are the maximum > open files and the maximum number of threads > Reproduce steps: > # Setup a cluster with 3C3D({color:#de350b}MultiLeaderConsensus{color}) > # Using 3BMs to insert data(client=100*3) > BM -> IoTDB Node > 172.20.70.7 -> 172.20.70.22 > 172.20.70.8 -> 172.20.70.23 > 172.20.70.9 -> 172.20.70.24 > > [http://111.202.73.147:13000/d/Qj_LC2G4z/atm-biao-zhun-da-qi-ya-huan-jing-ji-qun-xie-ru?orgId=1&from=1663288900985&to=1663643813559] > > !image-2022-09-23-08-42-14-584.png|width=681,height=299! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4825) [ multiLeader ] ERROR o.a.i.d.m.p.s.FixedRateFragInsStateTracker:114 - error happened while fetching query state
[ https://issues.apache.org/jira/browse/IOTDB-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635614#comment-17635614 ] Jinrui Zhang commented on IOTDB-4825: - According to the log and the write operation result, this seems to be an occasional issue caused by the network connection. Won't fix > [ multiLeader ] ERROR o.a.i.d.m.p.s.FixedRateFragInsStateTracker:114 - error > happened while fetching query state > > > Key: IOTDB-4825 > URL: https://issues.apache.org/jira/browse/IOTDB-4825 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Major > Attachments: iotdb_4825.conf, screenshot-1.png > > > master_1101_bc0e88b > 3rep , 3C3D > schema region : ratis > data region : multiLeader > ip62 datanode ERROR during writing (All nodes are RUNNING) : > 2022-11-01 17:09:25,158 [pool-23-IoTDB-MPPCoordinatorScheduled-1] ERROR > o.a.i.d.m.p.s.FixedRateFragInsStateTracker:114 -{color:#DE350B}* error > happened while fetching query state*{color} > java.io.IOException: Borrow client from pool for node > TEndPoint(ip:192.168.10.66, port:9003) failed. 
> at > org.apache.iotdb.commons.client.ClientManager.borrowClient(ClientManager.java:61) > at > org.apache.iotdb.db.mpp.plan.scheduler.AbstractFragInsStateTracker.fetchState(AbstractFragInsStateTracker.java:82) > at > org.apache.iotdb.db.mpp.plan.scheduler.FixedRateFragInsStateTracker.fetchStateAndUpdate(FixedRateFragInsStateTracker.java:98) > at > org.apache.iotdb.commons.concurrent.threadpool.ScheduledExecutorUtil.lambda$scheduleAtFixedRate$0(ScheduledExecutorUtil.java:153) > at > org.apache.iotdb.commons.concurrent.WrappedRunnable$1.runMayThrow(WrappedRunnable.java:44) > at > org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.InterruptedException: null > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088) > at > org.apache.commons.pool2.impl.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:937) > at > org.apache.commons.pool2.impl.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:956) > at > org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:449) > at > org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:350) > 
at > org.apache.iotdb.commons.client.ClientManager.borrowClient(ClientManager.java:50) > ... 12 common frames omitted > Test procedure: > 1. 192.168.10.62/66/68 72C256GB > ConfigNode > MAX_HEAP_SIZE="8G" > Common > query_timeout_threshold=3600 > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_factor=3 > partition_region_ratis_request_timeout_ms=120 > schema_region_ratis_request_timeout_ms=120 > data_region_ratis_request_timeout_ms=120 > partition_region_ratis_max_retry_attempts=1 > schema_region_ratis_max_retry_attempts=1 > data_region_ratis_max_retry_attempts=1 > DataNode > MAX_HEAP_SIZE="192G" > MAX_DIRECT_MEMORY_SIZE="32G" > 2. bm is on 192.168.10.64 > /data/liuzhen_test/weektest/benchmark_tool > see the attachment for the configuration > {color:#00875A}*All writes succeeded*{color} > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
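The `Borrow client from pool ... failed` IOException in the trace above wraps an InterruptedException raised while the borrower blocked on the pool's deque. A minimal sketch of that borrow pattern; `TinyClientPool` is hypothetical, not IoTDB's ClientManager, and it surfaces failures as UncheckedIOException where the real code throws a checked IOException:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

/** Hypothetical minimal client pool mirroring the borrow pattern in the trace above. */
public class TinyClientPool<T> {
  private final LinkedBlockingDeque<T> idle = new LinkedBlockingDeque<>();

  /** Returns a client to the pool. */
  public void release(T client) {
    idle.addFirst(client);
  }

  /** Timed borrow: blocks up to timeoutMs, translating timeout and interruption into failures. */
  public T borrow(long timeoutMs) {
    try {
      T client = idle.pollFirst(timeoutMs, TimeUnit.MILLISECONDS);
      if (client == null) {
        throw new UncheckedIOException(
            new IOException("Borrow client from pool failed: timeout after " + timeoutMs + "ms"));
      }
      return client;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // restore the flag instead of swallowing it
      throw new UncheckedIOException(new IOException("Borrow client from pool failed.", e));
    }
  }

  public static void main(String[] args) {
    TinyClientPool<String> pool = new TinyClientPool<>();
    pool.release("client-to-192.168.10.66:9003");
    System.out.println(pool.borrow(100));
  }
}
```

The trace's `Caused by: java.lang.InterruptedException` matches the second branch: a scheduled tracker thread was interrupted (e.g. at query teardown) while waiting for a client, which is why the comment treats it as occasional rather than a pool defect.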
[jira] [Created] (IOTDB-4969) Distribution plan is not correct for Aggregation query with AlignByDevice
Jinrui Zhang created IOTDB-4969: --- Summary: Distribution plan is not correct for Aggregation query with AlignByDevice Key: IOTDB-4969 URL: https://issues.apache.org/jira/browse/IOTDB-4969 Project: Apache IoTDB Issue Type: Bug Reporter: Jinrui Zhang Assignee: Jinrui Zhang Attachments: image-2022-11-17-14-41-37-947.png, image-2022-11-17-14-44-18-254.png If one device's data is distributed across more than one DataRegion, the aggregation query for this device is not correct due to a wrong distribution plan. See the wrong plan below. !image-2022-11-17-14-41-37-947.png|width=615,height=391! This plan will lead to a wrong result as follows: !image-2022-11-17-14-44-18-254.png|width=380,height=88! -- This message was sent by Atlassian Jira (v8.20.10#820010)
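The reason a multi-region plan needs fixing can be shown with a count aggregation: each DataRegion can only produce a partial aggregate for the device, and the coordinator must merge those partials; a plan that emits one region's partial value as the final answer is wrong. A minimal sketch with hypothetical names, not the actual planner code:

```java
import java.util.List;

/**
 * Sketch of why the distribution plan in IOTDB-4969 must merge partial
 * aggregates when one device's data spans several DataRegions (hypothetical).
 */
public class PartialCountMerge {
  /** Final count for a device: the sum of each region's partial count. */
  public static long mergeCounts(List<Long> partialCountsPerRegion) {
    return partialCountsPerRegion.stream().mapToLong(Long::longValue).sum();
  }

  public static void main(String[] args) {
    // Device d1 has rows in two DataRegions: 70 in one, 30 in the other.
    List<Long> partials = List.of(70L, 30L);
    System.out.println("correct count = " + mergeCounts(partials));
    // A plan missing the merge step would instead surface a single partial,
    // e.g. partials.get(0) == 70, which is the kind of wrong result reported above.
  }
}
```

Sum-of-counts works because count is decomposable; the same merge structure applies to sum, min, and max, while avg needs (sum, count) pairs carried as the partial state.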
[jira] [Commented] (IOTDB-4873) Multi-user concurrent write and query + [ select into ] : ERROR o.a.i.c.m.t.MultiLeaderConsensusIService$AsyncProcessor$syncLog$1:903 - Exception inside handler
[ https://issues.apache.org/jira/browse/IOTDB-4873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633427#comment-17633427 ] Jinrui Zhang commented on IOTDB-4873: - It seems that some requests are fetched from both the queue and the WAL when preparing a batch, which leads to the `merge` operation on the receiver side. See the snapshot below !image-2022-11-14-09-19-47-544.png|width=1117,height=351! > Multi-user concurrent write and query + [ select into ] : ERROR > o.a.i.c.m.t.MultiLeaderConsensusIService$AsyncProcessor$syncLog$1:903 - > Exception inside handler > > > Key: IOTDB-4873 > URL: https://issues.apache.org/jira/browse/IOTDB-4873 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Major > Attachments: 4873.conf, image-2022-11-14-09-17-48-992.png, > image-2022-11-14-09-18-10-120.png, image-2022-11-14-09-19-47-544.png, > screenshot-1.png, select_into.sh > > > master_1107_523e82a > 1. start a 3-replica 3C3D cluster > 2. Start benchmark concurrent writes and queries > 3. After 16 hours, ip62 executes " select into " > About 1000 SQL statements, single-user execution: > ”select s_0,s_1,s_2,s_3,s_4,s_5,s_6,s_7,s_8,s_9,s_10 into > root.test.g_1.::(::) from root.test.g_1.d_ip62_660” > !screenshot-1.png! 
> ip62 datanode displays the following error log : > 2022-11-08 09:27:31,366 [pool-20-IoTDB-MultiLeaderConsensusRPC-Processor-72] > ERROR o.a.i.c.m.t.MultiLeaderConsensusIService$AsyncProcessor$syncLog$1:903 - > Exception inside handler > java.lang.NullPointerException: null > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.mergeInsertNodes(DataRegionStateMachine.java:376) > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.grabInsertNode(DataRegionStateMachine.java:295) > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.deserializeAndWrap(DataRegionStateMachine.java:272) > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.write(DataRegionStateMachine.java:325) > at > org.apache.iotdb.consensus.multileader.service.MultiLeaderRPCServiceProcessor.syncLog(MultiLeaderRPCServiceProcessor.java:132) > at > org.apache.iotdb.consensus.multileader.thrift.MultiLeaderConsensusIService$AsyncProcessor$syncLog.start(MultiLeaderConsensusIService.java:922) > at > org.apache.iotdb.consensus.multileader.thrift.MultiLeaderConsensusIService$AsyncProcessor$syncLog.start(MultiLeaderConsensusIService.java:865) > at > org.apache.thrift.TBaseAsyncProcessor.process(TBaseAsyncProcessor.java:103) > at > org.apache.thrift.server.AbstractNonblockingServer$AsyncFrameBuffer.invoke(AbstractNonblockingServer.java:603) > at org.apache.thrift.server.Invocation.run(Invocation.java:18) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2022-11-08 09:27:50,962 [Query-Worker-Thread-48$20221108_012730_15774_3.1.0] > ERROR o.a.i.d.m.e.o.p.AbstractIntoOperator:123 - Error occurred while > inserting tablets in SELECT INTO: can't connect to node > {}TEndPoint(ip:192.168.10.68, port:9003) > 2022-11-08 09:27:50,962 
[Query-Worker-Thread-48$20221108_012730_15774_3.1.0] > ERROR o.a.i.d.m.e.s.AbstractDriverThread:80 - [ExecuteFailed] > org.apache.iotdb.db.exception.IntoProcessException: Error occurred while > inserting tablets in SELECT INTO: can't connect to node > {}TEndPoint(ip:192.168.10.68, port:9003) > at > org.apache.iotdb.db.mpp.execution.operator.process.AbstractIntoOperator.insertMultiTabletsInternally(AbstractIntoOperator.java:124) > at > org.apache.iotdb.db.mpp.execution.operator.process.IntoOperator.next(IntoOperator.java:73) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.processInternal(Driver.java:186) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.lambda$processFor$1(Driver.java:125) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.tryWithLock(Driver.java:270) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.processFor(Driver.java:118) > at > org.apache.iotdb.db.mpp.execution.schedule.DriverTaskThread.execute(DriverTaskThread.java:64) > at > org.apache.iotdb.db.mpp.execution.schedule.AbstractDriverThread.run(AbstractDriverThread.java:74) > 2022-11-08 09:27:50,966 [Query-Worker-Thread-48$20221108_012730_15774_3.1.0] > WARN o.a.i.d.m.e.s.Driv
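The comment on this issue suggests the NPE in mergeInsertNodes stems from the same request being picked up twice while a batch is prepared, once from the in-memory queue and once re-read from the WAL. A minimal sketch of deduplicating a batch by its monotonically increasing search index; the names are hypothetical, not the DataRegionStateMachine code:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the batching hazard described above: entries pulled
 * from the in-memory queue and re-read from the WAL can carry the same search
 * index, so the batch builder must skip indices it has already taken.
 */
public class BatchBuilder {
  public static class Entry {
    public final long searchIndex;

    public Entry(long searchIndex) {
      this.searchIndex = searchIndex;
    }
  }

  /** Builds a batch of search indices, taking only entries whose index advances past the last one taken. */
  public static List<Long> buildBatch(List<Entry> fromQueue, List<Entry> fromWal) {
    List<Entry> all = new ArrayList<>(fromWal); // WAL entries first: they are the older ones
    all.addAll(fromQueue);
    List<Long> batch = new ArrayList<>();
    long lastTaken = Long.MIN_VALUE;
    for (Entry e : all) {
      if (e.searchIndex > lastTaken) {
        batch.add(e.searchIndex);
        lastTaken = e.searchIndex;
      } // an entry fetched from both sources is dropped here instead of reaching the receiver twice
    }
    return batch;
  }

  public static void main(String[] args) {
    // Index 12 was already read from the WAL but is still sitting in the queue.
    List<Entry> wal = List.of(new Entry(11), new Entry(12));
    List<Entry> queue = List.of(new Entry(12), new Entry(13));
    System.out.println(buildBatch(queue, wal)); // duplicates removed: 11, 12, 13 each appear once
  }
}
```

Filtering on the sender avoids shipping a duplicate that the receiver-side `merge` would then have to reconcile, which is where the NullPointerException in the trace was raised.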
[jira] [Assigned] (IOTDB-4873) Multi-user concurrent write and query + [ select into ] : ERROR o.a.i.c.m.t.MultiLeaderConsensusIService$AsyncProcessor$syncLog$1:903 - Exception inside handler
[ https://issues.apache.org/jira/browse/IOTDB-4873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4873: --- Assignee: Haiming Zhu (was: Jinrui Zhang) Please track this issue according to our experiment > Multi-user concurrent write and query + [ select into ] : ERROR > o.a.i.c.m.t.MultiLeaderConsensusIService$AsyncProcessor$syncLog$1:903 - > Exception inside handler > > > Key: IOTDB-4873 > URL: https://issues.apache.org/jira/browse/IOTDB-4873 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Major > Attachments: 4873.conf, image-2022-11-14-09-17-48-992.png, > screenshot-1.png, select_into.sh > > > master_1107_523e82a > 1. start 3rep ,3C 3D cluster > 2. Start benchmark concurrent writes and queries > 3. After 16 hours, ip62 execute " select into " > About 1000 SQL, single user execution : > ”select s_0,s_1,s_2,s_3,s_4,s_5,s_6,s_7,s_8,s_9,s_10 into > root.test.g_1.::(::) from root.test.g_1.d_ip62_660” > !screenshot-1.png! 
> ip62 datanode displays the following error log : > 2022-11-08 09:27:31,366 [pool-20-IoTDB-MultiLeaderConsensusRPC-Processor-72] > ERROR o.a.i.c.m.t.MultiLeaderConsensusIService$AsyncProcessor$syncLog$1:903 - > Exception inside handler > java.lang.NullPointerException: null > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.mergeInsertNodes(DataRegionStateMachine.java:376) > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.grabInsertNode(DataRegionStateMachine.java:295) > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.deserializeAndWrap(DataRegionStateMachine.java:272) > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.write(DataRegionStateMachine.java:325) > at > org.apache.iotdb.consensus.multileader.service.MultiLeaderRPCServiceProcessor.syncLog(MultiLeaderRPCServiceProcessor.java:132) > at > org.apache.iotdb.consensus.multileader.thrift.MultiLeaderConsensusIService$AsyncProcessor$syncLog.start(MultiLeaderConsensusIService.java:922) > at > org.apache.iotdb.consensus.multileader.thrift.MultiLeaderConsensusIService$AsyncProcessor$syncLog.start(MultiLeaderConsensusIService.java:865) > at > org.apache.thrift.TBaseAsyncProcessor.process(TBaseAsyncProcessor.java:103) > at > org.apache.thrift.server.AbstractNonblockingServer$AsyncFrameBuffer.invoke(AbstractNonblockingServer.java:603) > at org.apache.thrift.server.Invocation.run(Invocation.java:18) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2022-11-08 09:27:50,962 [Query-Worker-Thread-48$20221108_012730_15774_3.1.0] > ERROR o.a.i.d.m.e.o.p.AbstractIntoOperator:123 - Error occurred while > inserting tablets in SELECT INTO: can't connect to node > {}TEndPoint(ip:192.168.10.68, port:9003) > 2022-11-08 09:27:50,962 
[Query-Worker-Thread-48$20221108_012730_15774_3.1.0] > ERROR o.a.i.d.m.e.s.AbstractDriverThread:80 - [ExecuteFailed] > org.apache.iotdb.db.exception.IntoProcessException: Error occurred while > inserting tablets in SELECT INTO: can't connect to node > {}TEndPoint(ip:192.168.10.68, port:9003) > at > org.apache.iotdb.db.mpp.execution.operator.process.AbstractIntoOperator.insertMultiTabletsInternally(AbstractIntoOperator.java:124) > at > org.apache.iotdb.db.mpp.execution.operator.process.IntoOperator.next(IntoOperator.java:73) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.processInternal(Driver.java:186) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.lambda$processFor$1(Driver.java:125) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.tryWithLock(Driver.java:270) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.processFor(Driver.java:118) > at > org.apache.iotdb.db.mpp.execution.schedule.DriverTaskThread.execute(DriverTaskThread.java:64) > at > org.apache.iotdb.db.mpp.execution.schedule.AbstractDriverThread.run(AbstractDriverThread.java:74) > 2022-11-08 09:27:50,966 [Query-Worker-Thread-48$20221108_012730_15774_3.1.0] > WARN o.a.i.d.m.e.s.DriverScheduler$Scheduler:387 - The task > 20221108_012730_15774_3.1.0 is aborted. All other tasks in the same query > will be cancelled > TEST ENV: > 1. 192.168.10.62 66 64 72CPU 256GB > ConfigNode : > MAX_HEAP_SIZE="12G" > MAX_DIRECT_MEMORY_SIZE="
[jira] [Commented] (IOTDB-4556) [Remove-DataNode] ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:287 - The consensus group DataRegion[24] doesn't exist
[ https://issues.apache.org/jira/browse/IOTDB-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17632232#comment-17632232 ] Jinrui Zhang commented on IOTDB-4556: - I talked with [~HeimingZ] about this issue; we found the behavior is not expected, especially the error log `{*}failed to flush sync index{*}`. We suspect some unexpected operation may have been triggered on ip73. Let's try to reproduce it with the latest code and investigate the issue online if it still exists. > [Remove-DataNode] ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:287 - > The consensus group DataRegion[24] doesn't exist > --- > > Key: IOTDB-4556 > URL: https://issues.apache.org/jira/browse/IOTDB-4556 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Major > Attachments: 73to74.png, 73to75.png, 73to76.png, > ip73_dataregion24.png, more_dev.conf > > > m_0929_71d5f65 > SchemaRegion : ratis > DataRegion : multiLeader > Both regions use 3 replicas; the cluster is 3C5D (3 ConfigNodes, 5 DataNodes). > Start the benchmark (bm) client writing; writing continues throughout the scale-in. > After bm had run for 40 minutes, remove node 1 (ip72); the removal completed after 1 hour 38 minutes. > Move node 1's data and logs directories away, then bring the node back online. > Remove node 2 (ip73, removal started at 09-29 14:10); this node does not contain DataRegion[24]. > *{color:#DE350B}DataRegion[24] is on ip74, ip75 and ip76{color}* > But ip73 logs this error: > 2022-09-29 14:23:39,273 > [pool-24-IoTDB-ClientRPC-Processor-2$20220929_062339_48081_4.1.0] > {color:#DE350B}*ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:287 - The > consensus group DataRegion[24] doesn't exist*{color} > 2022-09-29 14:23:39,275 [MultiLeaderConsensusClientPool-selector-98] ERROR > o.a.i.c.m.l.IndexController:111 - {color:#DE350B}*failed to flush sync index. > cannot find previous version file. 
previous: 93500*{color} > 2022-09-29 14:23:39,179 [pool-24-IoTDB-ClientRPC-Processor-45] WARN > o.a.i.d.u.ErrorHandlingUtils:62 - Status code: EXECUTE_STATEMENT_ERROR(400), > operation: insertTablet failed > java.lang.RuntimeException: > org.apache.iotdb.commons.exception.IoTDBException: There are no available > RegionGroups currently, please check the status of cluster DataNodes > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterPartitionFetcher.getOrCreateDataPartition(ClusterPartitionFetcher.java:280) > at > org.apache.iotdb.db.mpp.plan.analyze.AnalyzeVisitor.visitInsertTablet(AnalyzeVisitor.java:1236) > at > org.apache.iotdb.db.mpp.plan.analyze.AnalyzeVisitor.visitInsertTablet(AnalyzeVisitor.java:150) > at > org.apache.iotdb.db.mpp.plan.statement.crud.InsertTabletStatement.accept(InsertTabletStatement.java:121) > at > org.apache.iotdb.db.mpp.plan.statement.StatementVisitor.process(StatementVisitor.java:98) > at > org.apache.iotdb.db.mpp.plan.analyze.Analyzer.analyze(Analyzer.java:40) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.analyze(QueryExecution.java:236) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.(QueryExecution.java:138) > at > org.apache.iotdb.db.mpp.plan.Coordinator.createQueryExecution(Coordinator.java:100) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:133) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:160) > at > org.apache.iotdb.db.service.thrift.impl.ClientRPCServiceImpl.insertTablet(ClientRPCServiceImpl.java:996) > at > org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:3512) > at > org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:3492) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) > at > 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: org.apache.iotdb.commons.exception.IoTDBException: There are no > available RegionGroups currently, please check the status of cluster DataNodes > ... 20 common frames omitted > Test environment > 1. 192.168.10.72/73/74/75/76, 48 CPU, 384 GB > 3C : 72,73,74 > 5D : 72,73,74,75,76 > Cluster configuration > ConfigNode > MAX_HEAP_SIZE="8G" > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_fa
[jira] [Assigned] (IOTDB-4556) [Remove-DataNode] ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:287 - The consensus group DataRegion[24] doesn't exist
[ https://issues.apache.org/jira/browse/IOTDB-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4556: --- Assignee: Haiming Zhu (was: Jinrui Zhang) > [Remove-DataNode] ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:287 - > The consensus group DataRegion[24] doesn't exist > --- > > Key: IOTDB-4556 > URL: https://issues.apache.org/jira/browse/IOTDB-4556 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Major > Attachments: more_dev.conf > > > m_0929_71d5f65 > SchemaRegion : ratis > DataRegion : multiLeader > Both regions use 3 replicas; the cluster is 3C5D (3 ConfigNodes, 5 DataNodes). > Start the benchmark (bm) client writing; writing continues throughout the scale-in. > After bm had run for 40 minutes, remove node 1 (ip72); the removal completed after 1 hour 38 minutes. > Move node 1's data and logs directories away, then bring the node back online. > Remove node 2 (ip73, removal started at 09-29 14:10); this node does not contain DataRegion[24]. > *{color:#DE350B}DataRegion[24] is on ip74, ip75 and ip76{color}* > But ip73 logs this error: > 2022-09-29 14:23:39,273 > [pool-24-IoTDB-ClientRPC-Processor-2$20220929_062339_48081_4.1.0] > {color:#DE350B}*ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:287 - The > consensus group DataRegion[24] doesn't exist*{color} > 2022-09-29 14:23:39,275 [MultiLeaderConsensusClientPool-selector-98] ERROR > o.a.i.c.m.l.IndexController:111 - {color:#DE350B}*failed to flush sync index. > cannot find previous version file. 
previous: 93500*{color} > 2022-09-29 14:23:39,179 [pool-24-IoTDB-ClientRPC-Processor-45] WARN > o.a.i.d.u.ErrorHandlingUtils:62 - Status code: EXECUTE_STATEMENT_ERROR(400), > operation: insertTablet failed > java.lang.RuntimeException: > org.apache.iotdb.commons.exception.IoTDBException: There are no available > RegionGroups currently, please check the status of cluster DataNodes > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterPartitionFetcher.getOrCreateDataPartition(ClusterPartitionFetcher.java:280) > at > org.apache.iotdb.db.mpp.plan.analyze.AnalyzeVisitor.visitInsertTablet(AnalyzeVisitor.java:1236) > at > org.apache.iotdb.db.mpp.plan.analyze.AnalyzeVisitor.visitInsertTablet(AnalyzeVisitor.java:150) > at > org.apache.iotdb.db.mpp.plan.statement.crud.InsertTabletStatement.accept(InsertTabletStatement.java:121) > at > org.apache.iotdb.db.mpp.plan.statement.StatementVisitor.process(StatementVisitor.java:98) > at > org.apache.iotdb.db.mpp.plan.analyze.Analyzer.analyze(Analyzer.java:40) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.analyze(QueryExecution.java:236) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.(QueryExecution.java:138) > at > org.apache.iotdb.db.mpp.plan.Coordinator.createQueryExecution(Coordinator.java:100) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:133) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:160) > at > org.apache.iotdb.db.service.thrift.impl.ClientRPCServiceImpl.insertTablet(ClientRPCServiceImpl.java:996) > at > org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:3512) > at > org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:3492) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) > at > 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: org.apache.iotdb.commons.exception.IoTDBException: There are no > available RegionGroups currently, please check the status of cluster DataNodes > ... 20 common frames omitted > Test environment > 1. 192.168.10.72/73/74/75/76, 48 CPU, 384 GB > 3C : 72,73,74 > 5D : 72,73,74,75,76 > Cluster configuration > ConfigNode > MAX_HEAP_SIZE="8G" > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_factor=3 > connection_timeout_ms=12 > DataNode > MAX_HEAP_SIZE="256G" > MAX_DIRECT_MEMORY_SIZE="32G" > connection_timeout_ms=12 > max_connection_for_internal_service=200 > max_waiting_time_when_insert_blocked=60 > query_timeout_threshold=3600 > 2. See the attachment for the benchmark configuration. > 3. After bm had run for 40 minutes, remove ip72. > Wait for the ip72 removal to complete and the datanode process to exit. > mv data logs > Restart ip72. > 4. Remove ip73; the ERROR described above occurs.
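The reproduction steps above can be sketched as a command sequence. This is a sketch only: the script names (`sbin/remove-datanode.sh`, `sbin/start-datanode.sh`) and the `ip:port` argument form are assumptions based on a typical IoTDB 0.14/1.x distribution layout, not details taken from this report.

```
# Sketch of the scale-in reproduction, under the assumptions named above.

# 3. After the benchmark has run for ~40 minutes, remove the first DataNode (ip72).
sbin/remove-datanode.sh 192.168.10.72:6667

# Wait for the removal to finish and the datanode process to exit, then
# set aside its local state and bring the node back online.
mv data data.bak
mv logs logs.bak
sbin/start-datanode.sh -d

# 4. Remove the second DataNode (ip73). The reported ERROR appears on ip73
# even though ip73 hosts no replica of DataRegion[24].
sbin/remove-datanode.sh 192.168.10.73:6667
```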
[jira] [Assigned] (IOTDB-4752) some data is lost after the read-only node returns to normal
[ https://issues.apache.org/jira/browse/IOTDB-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4752: --- Assignee: Yongzao Dan (was: Haiming Zhu) > some data is lost after the read-only node returns to normal > > > Key: IOTDB-4752 > URL: https://issues.apache.org/jira/browse/IOTDB-4752 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Yongzao Dan >Priority: Major > Attachments: disk space sample.png, > image-2022-10-25-16-22-00-198.png, iotdb_4752.conf, read-only check.png, > read-only reason.png, state machine lock.png, tsfile-content.png, > tsfiles.png, wal-content.png, 同步完成_ip72.out, 同步完成_ip73_ip74.out > > > m_1025_cbc6225 > schema region : ratis > data region : multiLeader > 3 replicas, 3C3D > During benchmark writing, ip72 is set to read-only due to "no space left on > device". > Wait for the benchmark write to complete, then release disk space on ip72. > ip72 : SET SYSTEM TO RUNNING > Wait for synchronization to complete > ip72 : flush > Perform query comparison; ip72 {color:#DE350B}*lost some data*{color}. > "select count(s_0) ,count(s_9),count(s_99),count(s_999),count(s_) from > root.** align by device" > !image-2022-10-25-16-22-00-198.png! > Test environment : > 1. 
192.168.10.72 / 73 /74 48CPU 384GB > benchmark : ip75 /home/liuzhen/benchmark/bm_0620_7ec96c1 > iotdb_dir : /ssd_data/mpp_test/m_1025_cbc6225 > ConfigNode > MAX_HEAP_SIZE="8G" > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_factor=3 > connection_timeout_ms=12 > schema_region_ratis_request_timeout_ms=120 > data_region_ratis_request_timeout_ms=120 > schema_region_ratis_max_retry_attempts=1 > data_region_ratis_max_retry_attempts=2 > DataNode > MAX_HEAP_SIZE="256G" > MAX_DIRECT_MEMORY_SIZE="32G" > avg_series_point_number_threshold=1 > max_waiting_time_when_insert_blocked=360 > enable_seq_space_compaction=false > enable_unseq_space_compaction=false > enable_cross_space_compaction=false > query_timeout_threshold=3600 > 2. benchmark configuration > see attachment . > 3. During benchmark writing, ip72 is set to read-only due to "no space left > on device". > 4. Wait for the benchmark write to complete, ip72 release disk space. > 5. ip72 : SET SYSTEM TO RUNNING > 6. Wait for synchronization to complete > 7. ip72 : flush > 8. Perform query comparison > select count(s_0) ,count(s_9),count(s_99),count(s_999),count(s_) from > root.** align by device -- This message was sent by Atlassian Jira (v8.20.10#820010)
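The recovery check in steps 5-8 above amounts to the following SQL sequence, run against ip72 after disk space is freed. This is a sketch assembled from the statements quoted in the report; the comments are interpretation, not part of the original.

```sql
-- Step 5: bring the node out of read-only mode once disk space is available.
SET SYSTEM TO RUNNING;

-- Step 7: after replica synchronization completes, force memtables to disk
-- so the on-disk TsFiles reflect everything the node has received.
FLUSH;

-- Step 8: compare per-device counts against the other replicas; on ip72
-- some counts came back lower, i.e. data was lost.
select count(s_0), count(s_9), count(s_99), count(s_999), count(s_)
from root.** align by device;
```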
[jira] [Created] (IOTDB-4874) NPE error when migrating MultiLeader Peer with 1 replica
Jinrui Zhang created IOTDB-4874: --- Summary: NPE error when migrating MultiLeader Peer with 1 replica Key: IOTDB-4874 URL: https://issues.apache.org/jira/browse/IOTDB-4874 Project: Apache IoTDB Issue Type: Bug Reporter: Jinrui Zhang Assignee: Jinrui Zhang Attachments: image-2022-11-08-11-49-44-684.png There is a bug when migrating MultiLeader Peer with 1 replica. !image-2022-11-08-11-49-44-684.png|width=981,height=233! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-4731) [Remove-DataNode] Data is inconsistent ( remove datanode before the synchronization is complete )
[ https://issues.apache.org/jira/browse/IOTDB-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4731: --- Assignee: Haiming Zhu (was: Jinrui Zhang) > [Remove-DataNode] Data is inconsistent ( remove datanode before the > synchronization is complete ) > -- > > Key: IOTDB-4731 > URL: https://issues.apache.org/jira/browse/IOTDB-4731 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Major > Attachments: aft_set_readonlyip68_regions.out, > bef_set_ip68readonly_regions.out, image-2022-10-24-15-13-45-086.png, > image-2022-10-24-15-22-17-729.png, > ip64-is-newpeer_leader_g_4_q_after_remove.out, ip66_g_4_q_after_remove.out, > more_ts.conf > > > master_1023_2fea011 > 3 replicas, 3C3D; benchmark write done. > Start the fourth datanode (ip64). > ip68 : SET SYSTEM TO READONLY ON LOCAL > Remove datanode (ip68). > Before removal, ip68 is the leader of DataRegion[14] and there is unsynchronized > data: > !image-2022-10-24-15-13-45-086.png! > While ip68 is in the removing state, the datanode logs this error: > 2022-10-24 14:18:18,092 [pool-49-IoTDB-LogDispatcher-DataRegion[14]-3] ERROR > o.a.i.d.w.n.WALNode$PlanNodeIterator:590 - Fail to read wal from wal file > /data/liuzhen_test/master_1023_2fea011/sbin/../data/datanode/wal/root.test.g_4-14/_150-200-1.wal, > skip this file. 
> java.nio.channels.ClosedByInterruptException: null > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:315) > at > org.apache.iotdb.db.wal.io.WALByteBufReader.(WALByteBufReader.java:47) > at > org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.hasNext(WALNode.java:552) > at > org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.next(WALNode.java:683) > at > org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.constructBatchFromWAL(LogDispatcher.java:438) > at > org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.getBatch(LogDispatcher.java:348) > at > org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.run(LogDispatcher.java:274) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2022-10-24 14:18:18,093 [pool-49-IoTDB-LogDispatcher-DataRegion[14]-3] ERROR > o.a.i.d.w.n.WALNode$PlanNodeIterator:590 - Fail to read wal from wal file > /data/liuzhen_test/master_1023_2fea011/sbin/../data/datanode/wal/root.test.g_4-14/_151-204-1.wal, > skip this file. 
> java.nio.channels.ClosedByInterruptException: null > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:315) > at > org.apache.iotdb.db.wal.io.WALByteBufReader.(WALByteBufReader.java:47) > at > org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.hasNext(WALNode.java:552) > at > org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.hasNext(WALNode.java:597) > at > org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.next(WALNode.java:683) > at > org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.constructBatchFromWAL(LogDispatcher.java:438) > at > org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.getBatch(LogDispatcher.java:348) > at > org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.run(LogDispatcher.java:274) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2022-10-24 14:18:18,093 [pool-49-IoTDB-LogDispatcher-DataRegion[14]-3] ERROR > o.a.i.c.m.l.LogDispatcher$LogDispatcherThread:294 - Unexpected error in > logDispatcher for peer Peer{groupId=DataRegion[14], > endpoint=TEndPoint(ip:192.168.10.64, port:40010), nodeId=6} > java.lang.ArrayInd