[jira] [Created] (IOTDB-6095) Tsfiles in sequence space may overlap with each other due to LastFlushTime bug
Jinrui Zhang created IOTDB-6095: --- Summary: Tsfiles in sequence space may overlap with each other due to LastFlushTime bug Key: IOTDB-6095 URL: https://issues.apache.org/jira/browse/IOTDB-6095 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang Fix For: 1.2.1 This issue may lead to overlapped TsFiles in sequence space. For example, when recovering the last flush time map from two sequence TsFiles: * TsFile A only contains device1 with end time = 1 * TsFile B only contains device2 with end time = And the resources of these two TsFiles have been downgraded to FileTimeIndex. The previous code will use TsFile B with end time = to recover the last flush time of device1, which would cause sequence files to overlap. This is due to a bug in the recovery step of LastFlushTime maintenance -- This message was sent by Atlassian Jira (v8.20.10#820010)
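The recovery flaw described above can be sketched in a few lines. This is a minimal illustration, not IoTDB's actual code: the class and function names are made up, and since TsFile B's end time is elided in the report, it is assumed to be 0 here purely so the example runs.

```python
# Illustrative sketch of the recovery flaw; names are made up, not IoTDB's API.
# TsFile B's end time is elided in the report above; 0 is assumed for illustration.

class TsFileResource:
    """Stand-in for a sequence TsFile resource degraded to FileTimeIndex:
    the per-device list is lost, only a file-level end time remains."""
    def __init__(self, end_time):
        self.end_time = end_time

def recover_buggy(resources, devices):
    # Flawed recovery: walk files in order and let every degraded file
    # overwrite the entry for every device it *may* contain, so the last
    # file wins even when an earlier file holds a later timestamp.
    last_flush = {}
    for res in resources:
        for d in devices:  # a degraded index cannot exclude any device
            last_flush[d] = res.end_time
    return last_flush

def recover_fixed(resources, devices):
    # Fix: keep the maximum candidate end time per device instead.
    last_flush = {}
    for res in resources:
        for d in devices:
            last_flush[d] = max(last_flush.get(d, float("-inf")), res.end_time)
    return last_flush

# TsFile A holds device1 with end time 1; TsFile B holds only device2.
file_a, file_b = TsFileResource(1), TsFileResource(0)
devices = ["device1", "device2"]
print(recover_buggy([file_a, file_b], devices)["device1"])  # 0 (wrong)
print(recover_fixed([file_a, file_b], devices)["device1"])  # 1 (correct)
# With the wrong value 0, a new point for device1 at t = 1 is judged
# "sequence" and flushed into a new sequence file overlapping TsFile A.
```

Under the assumed numbers, the buggy map makes later in-range writes look like sequence data, which is exactly how the overlapping sequence files arise.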
[jira] [Assigned] (IOTDB-5964) Add correctness check for target file of compaction
[ https://issues.apache.org/jira/browse/IOTDB-5964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5964: --- Sprint: 2023-3-Storage Assignee: 周沛辰 Description: # Add correctness check for target file of compaction. We can use TsFileSequenceRead or some other ways to do the basic file check. # Make this feature controllable by configuration and disabled by default Summary: Add correctness check for target file of compaction (was: Add) > Add correctness check for target file of compaction > --- > > Key: IOTDB-5964 > URL: https://issues.apache.org/jira/browse/IOTDB-5964 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: 周沛辰 >Priority: Major > > # Add correctness check for target file of compaction. We can use > TsFileSequenceRead or some other ways to do the basic file check. > # Make this feature controllable by configuration and disabled by > default -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5964) Add
Jinrui Zhang created IOTDB-5964: --- Summary: Add Key: IOTDB-5964 URL: https://issues.apache.org/jira/browse/IOTDB-5964 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5928) DeadLock between TTL and Compaction
Jinrui Zhang created IOTDB-5928: --- Summary: DeadLock between TTL and Compaction Key: IOTDB-5928 URL: https://issues.apache.org/jira/browse/IOTDB-5928 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang Fix For: 1.1.1 h4. Version {panel} Enterprise version 1.1.1-SNAPSHOT (Build: a8387f1) {panel} h4. Reproduction Steps {panel} Problem description: concurrent TTL and compaction cause a deadlock, and data can no longer be written (with no error message). Test procedure: 1. Test version Enterprise version 1.1.1-SNAPSHOT (Build: a8387f1). Start a 3-replica 3C5D cluster; the configuration parameters below take ip74 as an example: liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/confignode-env.sh MAX_HEAP_SIZE="8G" liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/iotdb-confignode.properties cn_internal_address=192.168.10.74 cn_target_config_node_list=192.168.10.72:10710 cn_connection_timeout_ms=12 cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9081 liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/datanode-env.sh MAX_HEAP_SIZE="256G" MAX_DIRECT_MEMORY_SIZE="32G" liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/iotdb-datanode.properties dn_rpc_address=192.168.10.74 dn_internal_address=192.168.10.74 dn_target_config_node_list=192.168.10.72:10710,192.168.10.73:10710,192.168.10.74:10710 dn_connection_timeout_ms=12 dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/iotdb-common.properties schema_replication_factor=3 data_replication_factor=3 series_slot_num=1000 schema_region_group_extension_policy=CUSTOM default_schema_region_group_num_per_database=10 data_region_group_extension_policy=CUSTOM default_data_region_group_num_per_database=20 disk_space_warning_threshold=0.01 query_timeout_threshold=3600 iot_consensus_throttle_threshold_in_byte=536870912000 *2. Start benchmark reads and writes; see the attached configuration file 0517_rc4_lt.conf* *3. Start the TTL script; see the attached configuration file set_ttl.sh* {*}Every 48 hours{*}, first set the cluster to READONLY, then set a TTL to delete all TsFiles (unflushed, unsealed TsFiles are not deleted), unset the TTL, and set the cluster back to RUNNING. ({*}The benchmark clients keep reading and writing throughout.{*}) !image-2023-05-25-15-07-11-025.png! 
*4.{color:#de350b}After running for 4 days, a deadlock occurred and data could not be written.{color}* Monitoring shows write points per second is 0. !image-2023-05-25-15-02-55-676.png! {panel} h4. Bug Symptom {panel} Concurrent TTL and compaction cause a deadlock {panel} h4. Expected Result {panel} No deadlock {panel} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5859) Compaction error when using Version as first sort dimension
Jinrui Zhang created IOTDB-5859: --- Summary: Compaction error when using Version as first sort dimension Key: IOTDB-5859 URL: https://issues.apache.org/jira/browse/IOTDB-5859 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: 周沛辰 Fix For: 1.1.1 In the current implementation of compaction, the default sort dimension when selecting compaction tasks is the file version. This leads to compaction errors in the TsFile load scenario, because a TsFile with a higher version may not have greater timestamps when the TsFile is loaded by external tools. Solution: change the sort dimension from file version to timestamp. -- This message was sent by Atlassian Jira (v8.20.10#820010)
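A minimal sketch of why the sort dimension matters. The names (`CandidateFile`, the fields, and the concrete numbers) are assumptions for illustration, not IoTDB's actual API: a loaded TsFile receives a fresh, higher version number even though its data timestamps may be older.

```python
# Sketch: file version order vs. data timestamp order for compaction candidates.
from dataclasses import dataclass

@dataclass
class CandidateFile:
    version: int     # assigned at creation/load time, monotonically increasing
    start_time: int  # minimum data timestamp in the file

def sort_by_version(files):
    # Previous behavior: later version assumed to mean later data.
    return sorted(files, key=lambda f: f.version)

def sort_by_timestamp(files):
    # Proposed fix: order by the data itself.
    return sorted(files, key=lambda f: f.start_time)

# A natively written file (old version, recent data) and a file loaded by a
# tool (new version, historical data).
native = CandidateFile(version=10, start_time=5_000)
loaded = CandidateFile(version=42, start_time=100)

# Version order places the historical data *after* the recent data, so any
# compaction logic assuming "later in the list => later timestamps" breaks.
print([f.version for f in sort_by_version([native, loaded])])    # [10, 42]
print([f.version for f in sort_by_timestamp([native, loaded])])  # [42, 10]
```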
[jira] [Assigned] (IOTDB-5843) Write operation won't be rejected even if the NodeStatus is ReadOnly
[ https://issues.apache.org/jira/browse/IOTDB-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5843: --- Assignee: Song Ziyang > Write operation won't be rejected even if the NodeStatus is ReadOnly > > > Key: IOTDB-5843 > URL: https://issues.apache.org/jira/browse/IOTDB-5843 > Project: Apache IoTDB > Issue Type: Bug >Reporter: Jinrui Zhang >Assignee: Song Ziyang >Priority: Major > Attachments: image-2023-05-06-19-15-45-109.png > > > *Description:* > # Ctrl + C to stop a running DataNode > # `show cluster` shows that the DataNode is ReadOnly > # writes towards this DataNode won't be rejected > *Analysis:* > When using Ctrl+C to stop a DataNode, the shutdown hook is invoked, and > inside the shutdown hook the DataNode is set to status `Stopping` and > `Readonly`. > Because of the change in PR > [https://github.com/apache/iotdb/pull/9274/files,] isReadOnly() will > return false in DataRegionStatemachine, so write > operations won't be rejected. > !image-2023-05-06-19-15-45-109.png|width=550,height=182! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5843) Write operation won't be rejected even if the NodeStatus
Jinrui Zhang created IOTDB-5843: --- Summary: Write operation won't be rejected even if the NodeStatus Key: IOTDB-5843 URL: https://issues.apache.org/jira/browse/IOTDB-5843 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5827) Change default multi_dir_strategy to SequenceStrategy and fix original bug
Jinrui Zhang created IOTDB-5827: --- Summary: Change default multi_dir_strategy to SequenceStrategy and fix original bug Key: IOTDB-5827 URL: https://issues.apache.org/jira/browse/IOTDB-5827 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Fix For: 1.1.1 # change the default multi_dir_strategy to SequenceStrategy # fix the original bug in SequenceStrategy where one folder won't be used if the other folders' space is limited # use {{diskSpaceWarningThreshold}} to decide whether a folder is full or not -- This message was sent by Atlassian Jira (v8.20.10#820010)
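Items 2 and 3 above can be sketched roughly as follows. All function and variable names here are hypothetical, not IoTDB's actual SequenceStrategy implementation; the point is only that a "full" folder (free ratio below the warning threshold) is skipped rather than ending the rotation.

```python
# Hypothetical sketch of a round-robin folder strategy that skips full folders.

def next_folder(folders, free_ratio, start_index, threshold=0.05):
    """Return (folder, next_cursor), or (None, start_index) if every folder
    is below the warning threshold.

    folders     -- list of data dirs
    free_ratio  -- dict folder -> free-space ratio (0.0 - 1.0)
    start_index -- rotation cursor carried over from the previous call
    """
    n = len(folders)
    for step in range(n):
        i = (start_index + step) % n
        if free_ratio[folders[i]] >= threshold:  # not "full": usable
            return folders[i], (i + 1) % n
    return None, start_index  # all folders full => caller may go read-only

dirs = ["/data1", "/data2", "/data3"]
ratios = {"/data1": 0.50, "/data2": 0.01, "/data3": 0.30}  # /data2 is full
f1, cur = next_folder(dirs, ratios, 0)
f2, cur = next_folder(dirs, ratios, cur)
print(f1, f2)  # /data1 /data3 -- the full folder is skipped, rotation continues
```

In the buggy behavior being fixed, one constrained folder could stop the rotation entirely; here it is simply passed over.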
[jira] [Commented] (IOTDB-4593) [Remove-DataNode] Removing nodes writes data
[ https://issues.apache.org/jira/browse/IOTDB-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714495#comment-17714495 ] Jinrui Zhang commented on IOTDB-4593: - Not a blocking issue > [Remove-DataNode] Removing nodes writes data > > > Key: IOTDB-4593 > URL: https://issues.apache.org/jira/browse/IOTDB-4593 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 1.1.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Xinyu Tan >Priority: Major > Attachments: image-2022-10-10-13-36-14-475.png, > image-2023-03-08-11-29-52-352.png, image-2023-03-08-11-30-38-559.png, > image-2023-03-08-11-30-51-220.png, image-2023-03-08-11-33-49-278.png, > more_dev.conf, screenshot-1.png > > > m_0930_2a30316 > Problem description: > When removing a DataNode, {color:#DE350B}*the node is set to Removing status but keeps accepting writes*{color} (benchmark ran for 1 hour, then the removal was executed; it *took 3 hours* for the removal to complete): > 2022-10-08 13:23:54,686 [pool-20-IoTDB-DataNodeInternalRPC-Processor-148] > INFO o.a.i.c.conf.CommonConfig:305 - *Set system mode from Running to > Removing*. > After entering Removing status (207 TsFiles were created): > !image-2022-10-10-13-36-14-475.png! > Test environment: > 1. 192.168.10.71-76, 6 physical machines, 48 CPUs, 384 GB > 3C : 192.168.10.72 , 73,74 > 5D : 192.168.10.72 , 73,74 , 75 , 76 > benchmark: 192.168.10.71 > ConfigNode configuration: > MAX_HEAP_SIZE="8G" > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_factor=3 > connection_timeout_ms=12 > DataNode configuration: > MAX_HEAP_SIZE="256G" > MAX_DIRECT_MEMORY_SIZE="32G" > connection_timeout_ms=12 > max_connection_for_internal_service=200 > max_waiting_time_when_insert_blocked=60 > query_timeout_threshold=3600 > 2. See the attachment for the benchmark configuration file > GROUP_NUMBER=10 > DEVICE_NUMBER=5 > SENSOR_NUMBER=600 > IS_OUT_OF_ORDER=false > OPERATION_PROPORTION=1:0:0:0:0:0:0:0:0:0:0 > CLIENT_NUMBER=100 > LOOP=100 > BATCH_SIZE_PER_WRITE=100 > 3. 
After running for 1 hour, remove ip72 > liuzhen@fit-72:/data/mpp_test/m_0930_2a30316/datanode$ cat > 1008_test_remove_1h.sh > sleep 1h > /data/mpp_test/m_0930_2a30316/datanode/sbin/start-cli.sh -h 192.168.10.72 -e > "show cluster" > 1008_3c5d_bef_remove.out > /data/mpp_test/m_0930_2a30316/datanode/sbin/start-cli.sh -h 192.168.10.72 -e > "show regions" >> 1008_3c5d_bef_remove.out > /data/mpp_test/m_0930_2a30316/datanode/sbin/remove-datanode.sh > "192.168.10.72:6667" >> 1008_3c5d_1hour_remove_ip72.out > 4. See the backup on the machine for ip72's logs > /data/mpp_test/m_0930_2a30316/datanode/logs_bm_1h_remove_ip72 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5063) [ start datanode ] Failed to start Grpc server
[ https://issues.apache.org/jira/browse/IOTDB-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714494#comment-17714494 ] Jinrui Zhang commented on IOTDB-5063: - Suggest re-testing this scenario > [ start datanode ] Failed to start Grpc server > -- > > Key: IOTDB-5063 > URL: https://issues.apache.org/jira/browse/IOTDB-5063 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Blocker > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png, screenshot-5.png > > Original Estimate: 48h > Remaining Estimate: 48h > > master : 1127_4d7c15d > 1. Start 3 ConfigNodes > 2. Start 21 DataNodes; there is always 1 DataNode that fails to start ({color:#DE350B}reproduced in all 3 attempts{color}); there are 2 kinds of error messages: > Error 1 (occurred twice): > 2022-11-28 09:44:11,906 [main] ERROR o.a.ratis.util.ExitUtils:133 - > Terminating with exit status 1: Failed to start Grpc server > java.io.IOException: Failed to bind to address 0.0.0.0/0.0.0.0:50010 > at > org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:328) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:183) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:92) > at > org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:266) > at > org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270) > at > org.apache.ratis.server.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:72) > at > org.apache.ratis.server.impl.RaftServerProxy.startImpl(RaftServerProxy.java:394) > at > org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270) > at > org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:387) > at > org.apache.iotdb.consensus.ratis.RatisConsensus.start(RatisConsensus.java:156) > at org.apache.iotdb.db.service.DataNode.active(DataNode.java:319) > at 
org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:162) > at > org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:95) > at > org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:58) > at org.apache.iotdb.db.service.DataNode.main(DataNode.java:132) > Caused by: > org.apache.ratis.thirdparty.io.netty.channel.unix.Errors$NativeIoException: > bind(..) failed: Address already in use > 2022-11-28 09:44:11,910 [Thread-0] ERROR o.a.ratis.util.ExitUtils:133 - > Terminating with exit status -1: Thread[Thread-0,5,main] has thrown an > uncaught exception > java.lang.NullPointerException: null > at > org.apache.iotdb.db.service.IoTDBShutdownHook.run(IoTDBShutdownHook.java:60) > Port information of the DataNode process on this node: > !screenshot-2.png! > Error 2 (occurred once): > !screenshot-3.png! > Port information of the DataNode process on this node: > !screenshot-4.png! > Port information of a successfully started DataNode: > !screenshot-5.png! > Test environment: private cloud phase 1, 8C32GB, 24 machines > 1. ConfigNode configuration > MAX_HEAP_SIZE="20G" > MAX_DIRECT_MEMORY_SIZE="6G" > 2. DataNode configuration > MAX_HEAP_SIZE="20G" > MAX_DIRECT_MEMORY_SIZE="6G" > 3. Common configuration > schema_replication_factor=3 > data_replication_factor=3 > 4. Start 3 ConfigNodes (ip23, 24, 25) > 5. Start 21 DataNodes; startup script (start commands for 21 DataNodes, 1-second interval): > [root@i-66xazbht deploy_mpp_scripts]# cat 4_start_data_node.sh > #!/bin/bash > cluster_dir="/data/iotdb" > cur_cluster="m_1127_4d7c15d" > u_name="root" > exec 3 while read line <&3 > do > ssh ${u_name}@${line} "source > /etc/profile;${cluster_dir}/${cur_cluster}/sbin/start-datanode.sh > /dev/null > 2>&1 &" > sleep 1 > done > 6. Check the cluster information; there is always 1 DataNode in Unknown status; check the log on that node: > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5557) [ metadata ] The metadata query results are inconsistent
[ https://issues.apache.org/jira/browse/IOTDB-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714493#comment-17714493 ] Jinrui Zhang commented on IOTDB-5557: - The DataNode should not be ready (visible to clients) until all the replays of metadata operations finish > [ metadata ] The metadata query results are inconsistent > > > Key: IOTDB-5557 > URL: https://issues.apache.org/jira/browse/IOTDB-5557 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Schema Manager, mpp-cluster >Affects Versions: 1.1.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Song Ziyang >Priority: Blocker > Attachments: image-2023-02-20-14-04-32-611.png > > > master : 0219_0cd4461 > Start the cluster; after "enjoy" appears in log_datanode_all.log, query the metadata; the query results are inconsistent (they keep growing until all metadata is loaded into memory). > Expectation: once the cluster starts serving queries, query results must be consistent. > Test environment: > 1. 192.168.10.76, 48 CPUs, 384 GB memory > Metadata: 1 database, 10,000 devices, 600 series per device. > ConfigNode: > MAX_HEAP_SIZE="8G" > DataNode: > MAX_HEAP_SIZE="256G" > MAX_DIRECT_MEMORY_SIZE="32G" > COMMON configuration: > time_partition_interval=6048000 > query_timeout_threshold=3600 > enable_seq_space_compaction=false > enable_unseq_space_compaction=false > enable_cross_space_compaction=false > 2. Clear the OS cache, start the database, and after "enjoy" appears, run count devices and check the result > cat check_device_count.sh > while true > do > v_start=`grep enjoy logs/log_datanode_all.log|wc -l` > if [[ ${v_start} = "1" ]];then > for i in {1..100} > do >./sbin/start-cli.sh -h 192.168.10.76 -e "count devices;" > >> dev_count_during_start.out > done > break > fi > done > The results below show that the count devices result keeps increasing until 1, fully loaded into memory: > !image-2023-02-20-14-04-32-611.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5781) Change the default strategy to SequenceStrategy
[ https://issues.apache.org/jira/browse/IOTDB-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713918#comment-17713918 ] Jinrui Zhang commented on IOTDB-5781: - In other words, only when all the data_dirs' free space is less than 5% should the system change to read-only. > Change the default strategy to SequenceStrategy > --- > > Key: IOTDB-5781 > URL: https://issues.apache.org/jira/browse/IOTDB-5781 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jialin Qiao >Priority: Major > > Currently, we do not allow any strategy other than > MaxDiskUsableSpaceFirstStrategy, which prevents us from accelerating writes with > multiple data dirs. > > So, we need to refine the SequenceStrategy: if the remaining space of a disk > is less than > disk_space_warning_threshold, we stop allocating to it. Then, the default > strategy can be SequenceStrategy. -- This message was sent by Atlassian Jira (v8.20.10#820010)
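The rule in the comment above boils down to a single predicate over all data dirs. A hypothetical sketch (function and dir names are illustrative, not IoTDB's actual code):

```python
# Hypothetical sketch: the node turns read-only only when *every* data dir is
# below the warning threshold, not when the first one fills up.

def should_become_read_only(free_ratio_by_dir, threshold=0.05):
    """free_ratio_by_dir: dict data_dir -> free-space ratio (0.0 - 1.0)."""
    return all(ratio < threshold for ratio in free_ratio_by_dir.values())

# One full dir alone must not force read-only:
print(should_become_read_only({"/data1": 0.01, "/data2": 0.40}))  # False
# All dirs full => read-only:
print(should_become_read_only({"/data1": 0.01, "/data2": 0.02}))  # True
```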
[jira] [Commented] (IOTDB-5703) Memtable won't be flushed for a long while even if the time_partition is inactive
[ https://issues.apache.org/jira/browse/IOTDB-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702430#comment-17702430 ] Jinrui Zhang commented on IOTDB-5703: - Checked with [~HeimingZ] : the current timed_flush mechanism works fine, but the default interval is too long (3 hours). > Memtable won't be flushed for a long while even if the time_partition is > inactive > - > > Key: IOTDB-5703 > URL: https://issues.apache.org/jira/browse/IOTDB-5703 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Haiming Zhu >Priority: Major > > *Description* > During our tests, we found that the memtable of some time partitions won't be > flushed even if there is no data insertion towards the time partition. > *Impact* > If the memtable is not flushed, there will be an unclosed TsFile inside the > time partition, which blocks the inner compaction of this time partition. > *Solution* > We do have a timed flush strategy, but it seems it is not working now. We need > to fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5704) Optimize default parameters in iotdb-common for WAL part
Jinrui Zhang created IOTDB-5704: --- Summary: Optimize default parameters in iotdb-common for WAL part Key: IOTDB-5704 URL: https://issues.apache.org/jira/browse/IOTDB-5704 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang During customer tests, we found that some WAL-related parameters always end up being changed to more suitable values, so we'd better optimize their defaults. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5703) Memtable won't be flushed for a long while even if the time_partition is inactive
Jinrui Zhang created IOTDB-5703: --- Summary: Memtable won't be flushed for a long while even if the time_partition is inactive Key: IOTDB-5703 URL: https://issues.apache.org/jira/browse/IOTDB-5703 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Haiming Zhu *Description* During our tests, we found that the memtable of some time partitions won't be flushed even if there is no data insertion towards the time partition. *Impact* If the memtable is not flushed, there will be an unclosed TsFile inside the time partition, which blocks the inner compaction of this time partition. *Solution* We do have a timed flush strategy, but it seems it is not working now. We need to fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5667) Compaction scheduler strategy issue
[ https://issues.apache.org/jira/browse/IOTDB-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5667: --- Assignee: 周沛辰 (was: Jinrui Zhang) > Compaction scheduler strategy issue > --- > > Key: IOTDB-5667 > URL: https://issues.apache.org/jira/browse/IOTDB-5667 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: 周沛辰 >Priority: Major > > # compaction speed cannot keep up with the rate at which new files arrive > # high-level compaction is executed while there are still lots of 0-level files > # supply more ITs for inner compaction file selection > # the queue is occupied by out-of-date / low-priority tasks -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5667) Compaction scheduler strategy issue
Jinrui Zhang created IOTDB-5667: --- Summary: Compaction scheduler strategy issue Key: IOTDB-5667 URL: https://issues.apache.org/jira/browse/IOTDB-5667 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang # compaction speed cannot keep up with the rate at which new files arrive # high-level compaction is executed while there are still lots of 0-level files # supply more ITs for inner compaction file selection # the queue is occupied by out-of-date / low-priority tasks -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5628) SchemaRegion with 1 replica occurs error when restarting IoTDB server
Jinrui Zhang created IOTDB-5628: --- Summary: SchemaRegion with 1 replica occurs error when restarting IoTDB server Key: IOTDB-5628 URL: https://issues.apache.org/jira/browse/IOTDB-5628 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang *Background:* The IoTDB server has run for a very long time with lots of timeseries/data. *Operation* Shut down the server and restart it. Start a client to write data again. *Error* -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5626) SchemaRegion with 1 replica occurs error when restarting IoTDB server
Jinrui Zhang created IOTDB-5626: --- Summary: SchemaRegion with 1 replica occurs error when restarting IoTDB server Key: IOTDB-5626 URL: https://issues.apache.org/jira/browse/IOTDB-5626 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang Attachments: image-2023-03-06-15-43-48-809.png *Background:* The IoTDB server has run for a very long time with lots of timeseries/data. *Operation* Shut down the server and restart it. Start a client to write data again. *Error* *!image-2023-03-06-15-43-48-809.png|width=1278,height=248!* -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5625) SchemaRegion with 1 replica occurs error when restarting IoTDB server
Jinrui Zhang created IOTDB-5625: --- Summary: SchemaRegion with 1 replica occurs error when restarting IoTDB server Key: IOTDB-5625 URL: https://issues.apache.org/jira/browse/IOTDB-5625 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang Attachments: image-2023-03-06-15-41-51-456.png *Background:* The IoTDB server has run for a very long time with lots of timeseries/data. *Operation* Shut down the server and restart it. Start a client to write data again. *Error* *!image-2023-03-06-15-41-51-456.png|width=1278,height=248!* -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5614) Error when restarting cluster and write client in GeelyCarTest
Jinrui Zhang created IOTDB-5614: --- Summary: Error when restarting cluster and write client in GeelyCarTest Key: IOTDB-5614 URL: https://issues.apache.org/jira/browse/IOTDB-5614 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang Attachments: image-2023-03-03-11-41-12-903.png !image-2023-03-03-11-41-12-903.png|width=1236,height=454! This error occurred when restarting the write client, and it disappeared after the write client was restarted. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5472) [Atmos]The number of tsfiles went up between db6f17e 【02/01】and 5602d0e【02/05】
[ https://issues.apache.org/jira/browse/IOTDB-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5472: --- Assignee: Haiming Zhu > [Atmos]The number of tsfiles went up between db6f17e 【02/01】and 5602d0e【02/05】 > -- > > Key: IOTDB-5472 > URL: https://issues.apache.org/jira/browse/IOTDB-5472 > Project: Apache IoTDB > Issue Type: Improvement > Components: Core/Engine >Reporter: Qingxin Feng >Assignee: Haiming Zhu >Priority: Minor > Attachments: image-2023-02-06-09-18-27-596.png, > image-2023-02-06-09-22-50-304.png, image-2023-02-08-08-41-55-425.png, > image-2023-02-10-08-55-18-990.png, image-2023-02-10-17-31-06-102.png > > > The number of tsfiles went up between db6f17e 【02/01】and 5602d0e【02/05】 > Please refer to the picture below. > > !image-2023-02-06-09-22-50-304.png|width=661,height=380! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5511) Metrics of running compaction task is not accurate
Jinrui Zhang created IOTDB-5511: --- Summary: Metrics of running compaction task is not accurate Key: IOTDB-5511 URL: https://issues.apache.org/jira/browse/IOTDB-5511 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Liuxuxin Attachments: image-2023-02-09-15-25-28-982.png, image-2023-02-09-15-25-58-380.png Currently 10 compaction tasks are running, but the dashboard says 9. See the snapshots below. Logs: !image-2023-02-09-15-25-28-982.png|width=603,height=101! Dashboard: !image-2023-02-09-15-25-58-380.png|width=604,height=201! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5509) Limit the level of unseq file in cross compaction
Jinrui Zhang created IOTDB-5509: --- Summary: Limit the level of unseq file in cross compaction Key: IOTDB-5509 URL: https://issues.apache.org/jira/browse/IOTDB-5509 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5508) System backup Tools
Jinrui Zhang created IOTDB-5508: --- Summary: System backup Tools Key: IOTDB-5508 URL: https://issues.apache.org/jira/browse/IOTDB-5508 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: 马子坤 Investigate and design the implementation of System Backup -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5140) Add metrics for compaction deserializing pages or writing chunks
[ https://issues.apache.org/jira/browse/IOTDB-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681995#comment-17681995 ] Jinrui Zhang commented on IOTDB-5140: - The PR towards rel/1.0 is not merged > Add metrics for compaction deserializing pages or writing chunks > > > Key: IOTDB-5140 > URL: https://issues.apache.org/jira/browse/IOTDB-5140 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Liuxuxin >Assignee: Liuxuxin >Priority: Major > Labels: pull-request-available > Original Estimate: 24h > Remaining Estimate: 24h > > We want to trace the count of deserialized chunks or pages during compaction. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5206) Fix when target file is deleted in Compaction exception handler and recover
[ https://issues.apache.org/jira/browse/IOTDB-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681993#comment-17681993 ] Jinrui Zhang commented on IOTDB-5206: - The fix is approved but not merged because of IT failures > Fix when target file is deleted in Compaction exception handler and recover > --- > > Key: IOTDB-5206 > URL: https://issues.apache.org/jira/browse/IOTDB-5206 > Project: Apache IoTDB > Issue Type: Bug >Affects Versions: master branch, 1.0.0 >Reporter: 周沛辰 >Assignee: 周沛辰 >Priority: Major > Labels: pull-request-available > Original Estimate: 72h > Remaining Estimate: 72h > > *Description* > After compaction, if the target file is empty, its corresponding disk file > will be deleted. If an exception or system interruption occurs, restart recovery > will fail and set allowCompaction to false. > 2022-12-20 09:23:53,086 [pool-12-IoTDB-Recovery-Thread-Pool-1] ERROR > o.a.i.d.e.c.t.CompactionRecoverTask:300 - root.iot-0 > [Compaction][ExceptionHandler] target file > sequence/root.iot/0/0/1670572962795-1051-2-1.inner is not complete, and some > source files is lost, do nothing. Set allowCompaction to false > 2022-12-20 09:23:53,087 [pool-12-IoTDB-Recovery-Thread-Pool-1] ERROR > o.a.i.d.e.c.t.CompactionRecoverTask:133 - root.iot-0 [Compaction][Recover] > Failed to recover compaction, set allowCompaction to false > *Reason* > Empty target files are deleted in compaction. During recovery, the system > reports that source files are lost when the empty target file has simply been deleted. > *Solution* > Empty target files are not deleted until the end of the > compaction. However, after recovery, the empty target file will not be > deleted, but this does not affect the correctness of the system. -- This message was sent by Atlassian Jira (v8.20.10#820010)
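The *Solution* above amounts to a commit-then-delete ordering: the empty target file stays on disk until the task has committed, so crash recovery can distinguish "target incomplete" from "target legitimately removed". A rough sketch, with entirely hypothetical function names (not IoTDB's actual recovery code):

```python
# Hypothetical sketch of deferring empty-target deletion until after commit.
import os
import tempfile

def finish_compaction(target_path, target_is_empty, commit):
    """commit() atomically records that the task finished (e.g. removes the
    compaction log). Only afterwards may an empty target be deleted."""
    commit()                    # 1. mark the task as complete first
    if target_is_empty:
        os.remove(target_path)  # 2. only now is the empty target dropped

def recover(target_path, committed):
    # After a crash: a missing target is an error only if the task had NOT
    # committed; post-commit, the file may be legitimately absent.
    if not os.path.exists(target_path) and not committed:
        return "abort: target lost, set allowCompaction to false"
    return "ok"

with tempfile.TemporaryDirectory() as d:
    tgt = os.path.join(d, "target.inner")
    open(tgt, "w").close()
    finish_compaction(tgt, target_is_empty=True, commit=lambda: None)
    print(recover(tgt, committed=True))  # ok
```

With the old ordering (delete before commit), a crash between the two steps made recovery see a lost target and disable compaction, which is the failure the log lines above show.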
[jira] [Commented] (IOTDB-5140) Add metrics for compaction deserializing pages or writing chunks
[ https://issues.apache.org/jira/browse/IOTDB-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681841#comment-17681841 ] Jinrui Zhang commented on IOTDB-5140: - Need to open another PR against rel/1.0 > Add metrics for compaction deserializing pages or writing chunks > > > Key: IOTDB-5140 > URL: https://issues.apache.org/jira/browse/IOTDB-5140 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Liuxuxin >Assignee: Liuxuxin >Priority: Major > Labels: pull-request-available > Original Estimate: 24h > Remaining Estimate: 24h > > We want to trace the count of deserialized chunks or pages during compaction. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (IOTDB-5140) Add metrics for compaction deserializing pages or writing chunks
[ https://issues.apache.org/jira/browse/IOTDB-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reopened IOTDB-5140: - > Add metrics for compaction deserializing pages or writing chunks > > > Key: IOTDB-5140 > URL: https://issues.apache.org/jira/browse/IOTDB-5140 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Liuxuxin >Assignee: Liuxuxin >Priority: Major > Labels: pull-request-available > Original Estimate: 24h > Remaining Estimate: 24h > > We want to trace the count of deserialized chunks or pages during compaction. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5164) [disk]datanode takes too much disk space, should improve
[ https://issues.apache.org/jira/browse/IOTDB-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681832#comment-17681832 ] Jinrui Zhang commented on IOTDB-5164: - Can this issue be closed? > [disk]datanode takes too much disk space, should improve > > > Key: IOTDB-5164 > URL: https://issues.apache.org/jira/browse/IOTDB-5164 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: changxue >Assignee: Jinrui Zhang >Priority: Major > Attachments: iotdb-common.properties, iotdb-confignode.properties, > iotdb-datanode.properties > > > [disk]datanode takes too much disk space, should improve > Here is the disk usage of one node: it shows 124G of data would take 230G > on one node, and there are 3 nodes with 3 replicas, so 124G of data takes 6 > times its real size. This is too much. > {code} > 124G ./datanode/data/sequence > 51M ./datanode/data/unsequence > 104G ./datanode/data/snapshot > 228G ./datanode/data > 414M ./datanode/wal/root.test-0 > 401M ./datanode/wal/root.test-3 > 394M ./datanode/wal/root.test-1 > 410M ./datanode/wal/root.test-2 > 394M ./datanode/wal/root.test-4 > 2.0G ./datanode/wal > 4.0K ./datanode/system/compression_ratio > 16K ./datanode/system/schema > 4.0K ./datanode/system/roles > 8.0K ./datanode/system/users > 48K ./datanode/system/databases > 4.0K ./datanode/system/upgrade > 8.0K ./datanode/system/udf > 100K ./datanode/system > 5.2M ./datanode/consensus/schema_region > 356K ./datanode/consensus/data_region > 5.6M ./datanode/consensus > 230G ./datanode > 4.0K ./confignode/system/roles > 8.0K ./confignode/system/users > 4.0K ./confignode/system/procedure > 24K ./confignode/system > 4.1M ./confignode/consensus/47474747-4747-4747-4747- > 4.1M ./confignode/consensus > 4.1M ./confignode > 230G . > {code} > 124G of data takes 230G of space on a single node; this is a 3-node cluster configured with 3 replicas, so in total it takes 6 times the disk space. That is far too much, and I think it needs optimization. Is part of our snapshot design redundant? Can that space be reused? > Note: the read-only status may have been caused by insufficient disk space, followed by a snapshot. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-4001) Only when unseq files reach a certain number can they be selected in cross compaction
[ https://issues.apache.org/jira/browse/IOTDB-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4001: --- Sprint: 2023-1-Storage (was: StorageEngine-Backlog) Assignee: Wenwei Shu (was: 周沛辰) > Only when unseq files reach a certain number can they be selected in cross > compaction > > > Key: IOTDB-4001 > URL: https://issues.apache.org/jira/browse/IOTDB-4001 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: 周沛辰 >Assignee: Wenwei Shu >Priority: Major > > Unseq files should be selected to participate in cross-space compaction only > when their number reaches a certain threshold. This avoids an unseq file > being selected for compaction immediately after it is written, which would > result in write amplification. -- This message was sent by Atlassian Jira (v8.20.10#820010)
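The selection rule described above amounts to a simple threshold check before cross-space compaction task creation. The sketch below is a hypothetical Python model of that logic; the constant and function names are illustrative and are not IoTDB's actual configuration keys or APIs.

```python
# Illustrative threshold; not IoTDB's real configuration key.
MIN_UNSEQ_FILES_FOR_CROSS_COMPACTION = 10

def select_cross_compaction_candidates(unseq_files):
    """Return the unseq files to compact, or [] until enough accumulate.

    Selecting a lone unseq file immediately would rewrite the overlapping
    sequence files for every small batch of out-of-order data -- the write
    amplification the issue wants to avoid.
    """
    if len(unseq_files) < MIN_UNSEQ_FILES_FOR_CROSS_COMPACTION:
        return []  # wait for more unseq files before scheduling a task
    return list(unseq_files)
```

With three unseq files no task is scheduled; once a dozen have accumulated, all of them are compacted in one pass.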
[jira] [Commented] (IOTDB-5319) The write speed in atoms testing declined after merging commit 5126711d
[ https://issues.apache.org/jira/browse/IOTDB-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655937#comment-17655937 ] Jinrui Zhang commented on IOTDB-5319: - We have reverted this PR for release 1.0.1. Let's keep tracking this issue for the upcoming release > The write speed in atoms testing declined after merging commit 5126711d > --- > > Key: IOTDB-5319 > URL: https://issues.apache.org/jira/browse/IOTDB-5319 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Liuxuxin >Priority: Major > Attachments: image-2022-12-29-18-37-25-693.png, > image-2022-12-29-18-38-45-394.png > > Original Estimate: 72h > Remaining Estimate: 72h > > !image-2022-12-29-18-37-25-693.png|width=701,height=156! > > After merging this commit, the write speed in atoms testing declined. > > We inferred that this change leads compaction to grab more CPU/IO resources, > which decreases the resources available for writes and reads. > > After we changed the parameter `iops_per_min` from 50 to 30, the problem > still exists. > See this snapshot, > !image-2022-12-29-18-38-45-394.png|width=541,height=346! > > Let's investigate the details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5385) Optimize the DataRegion leader calculation policy when ConfigNode leader changes
Jinrui Zhang created IOTDB-5385: --- Summary: Optimize the DataRegion leader calculation policy when ConfigNode leader changes Key: IOTDB-5385 URL: https://issues.apache.org/jira/browse/IOTDB-5385 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Gaofei Cao We found a phenomenon in recent testing: the leaders of DataRegions are updated by the newly elected ConfigNode leader. This sometimes adds instability risks to our system, and it should be optimized. See this doc for details. https://apache-iotdb.feishu.cn/docx/ZTlkdiPwRoXGi0xs2cacYpSlnYb?from=space_persnoal_filelist&pre_pathname=%2Fdrive%2Ffolder%2Ffldcnf4szpemAst96rw3XajU5jb -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5363) Try to construct InsertTablet from InsertRowsNode to speed up write operation
Jinrui Zhang created IOTDB-5363: --- Summary: Try to construct InsertTablet from InsertRowsNode to speed up write operation Key: IOTDB-5363 URL: https://issues.apache.org/jira/browse/IOTDB-5363 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Haiming Zhu -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5206) Fix when target file is deleted in Compaction exception handler and recover
[ https://issues.apache.org/jira/browse/IOTDB-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654819#comment-17654819 ] Jinrui Zhang commented on IOTDB-5206: - Execution-time strategy: [https://apache-iotdb.feishu.cn/docs/doccnOxzotqCdP94MuO1HIDjv4c] Recovery-time strategy: https://apache-iotdb.feishu.cn/docx/I9yIdIoRgo5dBCxb1Svcoil7nDg#T8UmdA088oY2CwxyAfVcXHUanSh > Fix when target file is deleted in Compaction exception handler and recover > --- > > Key: IOTDB-5206 > URL: https://issues.apache.org/jira/browse/IOTDB-5206 > Project: Apache IoTDB > Issue Type: Bug >Affects Versions: master branch, 1.0.0 >Reporter: 周沛辰 >Assignee: 周沛辰 >Priority: Major > Labels: pull-request-available > Original Estimate: 72h > Remaining Estimate: 72h > > *Description* > After compaction, if the target file is empty, its corresponding disk file > is deleted. If an exception or system interruption occurs at that point, > restart recovery will fail and allowCompaction will be set to false. > 2022-12-20 09:23:53,086 [pool-12-IoTDB-Recovery-Thread-Pool-1] ERROR > o.a.i.d.e.c.t.CompactionRecoverTask:300 - root.iot-0 > [Compaction][ExceptionHandler] target file > sequence/root.iot/0/0/1670572962795-1051-2-1.inner is not complete, and some > source files is lost, do nothing. Set allowCompaction to false > 2022-12-20 09:23:53,087 [pool-12-IoTDB-Recovery-Thread-Pool-1] ERROR > o.a.i.d.e.c.t.CompactionRecoverTask:133 - root.iot-0 [Compaction][Recover] > Failed to recover compaction, set allowCompaction to false > *Reason* > Empty target files are deleted during compaction. In recovery, the system > therefore reports that source files are lost while the empty target file has > already been deleted. > *Solution* > Do not delete empty target files during compaction; delete them only at the > end of the compaction. After a recovery, an empty target file may be left > undeleted, but this does not affect the correctness of the system. -- This message was sent by Atlassian Jira (v8.20.10#820010)
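The recovery decision described above can be modeled as a small decision table. This is a hedged Python sketch of the reported behavior, not the actual CompactionRecoverTask code; the state names and return values are illustrative.

```python
def recovery_action(target_complete, sources_exist):
    """Hedged model of the compaction recovery decision described above.

    target_complete: the target TsFile on disk is complete (or legitimately
    empty but still present, per the fix); sources_exist: all source files
    are still on disk.
    """
    if sources_exist:
        # Crash before the compaction committed: discard the partial target
        # and keep serving from the source files.
        return "rollback"
    if target_complete:
        # Compaction had committed: keep the target, clean up leftovers.
        return "finish"
    # Pre-fix failure mode: an empty target was already deleted mid-compaction,
    # so this state looked like data loss and compaction got disabled
    # (allowCompaction = false).
    return "disable_compaction"
```

Keeping the empty target on disk until the compaction ends means recovery never lands in the ambiguous third branch for a legitimately empty result.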
[jira] [Created] (IOTDB-5338) WAL buffer flush threshold optimization
Jinrui Zhang created IOTDB-5338: --- Summary: WAL buffer flush threshold optimization Key: IOTDB-5338 URL: https://issues.apache.org/jira/browse/IOTDB-5338 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Haiming Zhu -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5337) Parallelization of write operation in FragmentInstanceDispatcher
[ https://issues.apache.org/jira/browse/IOTDB-5337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5337: --- Sprint: 2023-1-Storage Assignee: Haiming Zhu Remaining Estimate: 72h Original Estimate: 72h > Parallelization of write operation in FragmentInstanceDispatcher > > > Key: IOTDB-5337 > URL: https://issues.apache.org/jira/browse/IOTDB-5337 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Haiming Zhu >Priority: Major > Original Estimate: 72h > Remaining Estimate: 72h > > In the current implementation, the split write operations are dispatched one > by one. > > We can try to dispatch them in parallel to improve speed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5337) Parallelization of write operation in FragmentInstanceDispatcher
Jinrui Zhang created IOTDB-5337: --- Summary: Parallelization of write operation in FragmentInstanceDispatcher Key: IOTDB-5337 URL: https://issues.apache.org/jira/browse/IOTDB-5337 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang In the current implementation, the split write operations are dispatched one by one. We can try to dispatch them in parallel to improve speed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
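The proposed change can be sketched with a thread pool. This hypothetical Python model only illustrates the serial-versus-parallel dispatch idea; it is not IoTDB's actual FragmentInstanceDispatcher API, and the function names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch_serial(dispatch, instances):
    """Current behavior: each split write operation waits for the previous
    one to be acknowledged before the next is sent."""
    return [dispatch(inst) for inst in instances]

def dispatch_parallel(dispatch, instances, max_workers=8):
    """Proposed behavior: send all splits concurrently and gather results.
    Overall latency becomes roughly the slowest single dispatch instead of
    the sum of all of them."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with instances.
        return list(pool.map(dispatch, instances))
```

Both functions return the same results for the same inputs; only the wall-clock time differs when `dispatch` involves network round trips.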
[jira] [Created] (IOTDB-5336) Investigation regarding write interface used by TSBS in IoTDB
Jinrui Zhang created IOTDB-5336: --- Summary: Investigation regarding write interface used by TSBS in IoTDB Key: IOTDB-5336 URL: https://issues.apache.org/jira/browse/IOTDB-5336 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Haiming Zhu -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5335) InsertRecords performance optimization
Jinrui Zhang created IOTDB-5335: --- Summary: InsertRecords performance optimization Key: IOTDB-5335 URL: https://issues.apache.org/jira/browse/IOTDB-5335 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Haiming Zhu -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5273) [fast compaction] The performance is slow, and there are out-of-order tsfiles after compaction
[ https://issues.apache.org/jira/browse/IOTDB-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5273: --- Sprint: 2023-1-Storage (was: 2022-12-Storage) Assignee: Wenwei Shu (was: 周沛辰) > [fast compaction] The performance is slow, and there are out-of-order tsfiles > after compaction > - > > Key: IOTDB-5273 > URL: https://issues.apache.org/jira/browse/IOTDB-5273 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: master branch >Reporter: 刘珍 >Assignee: Wenwei Shu >Priority: Major > Attachments: 1_luanxu.conf, 2_luanxu.conf, 3_luanxu.conf, > 4_luanxu.conf, image-2022-12-23-18-10-25-061.png > > Original Estimate: 96h > Remaining Estimate: 96h > > master 1222_656d281 > Problem description: > On private cloud phase 1, with the weekly out-of-order test configuration, fast > compaction is slow, and out-of-order TsFiles remain after compaction finishes. > !image-2022-12-23-18-10-25-061.png|width=979,height=481! > Test environment: > 1. Private cloud phase 1. > Disable compaction and generate data. > The configuration files are attached. > 2. Compare compaction performance, with no other read/write operations. > ConfigNode configuration: > MAX_HEAP_SIZE="2G" > DataNode configuration: > MAX_HEAP_SIZE="18G" > MAX_DIRECT_MEMORY_SIZE="6G" > Common configuration: > time_partition_interval=6048000 > compaction_io_rate_per_sec=1000 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5156) The backup data is twice the size of the source data
[ https://issues.apache.org/jira/browse/IOTDB-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5156: --- Assignee: Liuxuxin > The backup data is twice the size of the source data > > > Key: IOTDB-5156 > URL: https://issues.apache.org/jira/browse/IOTDB-5156 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Reporter: 刘珍 >Assignee: Liuxuxin >Priority: Major > Attachments: image-2022-12-08-21-08-54-494.png > > > master > Problem description: > Backing up IoTDB's data directory produces a backup twice the size of the source data. > cp -rp m_1207_a0b2c8c_fast2/data m_1208_7f2218b_fast2/ > The backed-up data is twice the size of the source data. > The files in the source snapshot are hard links, but the corresponding files in the backed-up snapshot become regular files: > !image-2022-12-08-21-08-54-494.png! > Test procedure: > 1. Start a 1-replica 1C1D cluster (sbin/start-standalone.sh); > the config, schema, and data consensus protocols are Ratis, Ratis, and IoT, respectively. > 2. Write data. > 3. Stop the datanode normally (a snapshot will be taken). > 4. Back up the data. -- This message was sent by Atlassian Jira (v8.20.10#820010)
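The doubling happens because a plain `cp -rp` materializes each hard-linked snapshot entry as an independent copy of the TsFile. A hard-link-aware copy keeps snapshot entries pointing at the same inode as the data files (at the shell level, `rsync -H` or GNU `cp -a` preserves hard links within the copied tree). Below is a minimal Python sketch of the idea, assuming a POSIX filesystem; it is an illustration, not a proposed IoTDB backup tool.

```python
import os
import shutil

def copy_tree_preserving_hardlinks(src, dst):
    """Copy a directory tree, re-creating hard links instead of duplicating
    their contents, so snapshot entries that share an inode with data files
    do not double the size of the backup."""
    seen = {}  # (device, inode) -> path already created under dst
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            target = os.path.join(target_dir, name)
            stat = os.stat(source)
            key = (stat.st_dev, stat.st_ino)
            if key in seen:
                os.link(seen[key], target)  # re-create the hard link
            else:
                shutil.copy2(source, target)
                seen[key] = target
    return dst
```

With this approach the backup's snapshot directory costs only directory entries, matching the source layout instead of doubling it.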
[jira] [Assigned] (IOTDB-4665) "tsfile" (all data deleted) lacks a periodic deletion policy
[ https://issues.apache.org/jira/browse/IOTDB-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4665: --- Assignee: Jinrui Zhang > "tsfile" (all data deleted) lacks a periodic deletion policy > > > Key: IOTDB-4665 > URL: https://issues.apache.org/jira/browse/IOTDB-4665 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Major > Attachments: image-2022-10-17-14-37-00-495.png, > image-2022-10-17-14-38-08-095.png > > > delete timeseries root.** has been executed, > and all data and metadata have been deleted, > but there is no periodic cleanup (deletion) policy for the TsFiles. > !image-2022-10-17-14-37-00-495.png! > !image-2022-10-17-14-38-08-095.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5319) The write speed in atoms testing declined after merging commit 5126711d
[ https://issues.apache.org/jira/browse/IOTDB-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5319: --- Assignee: Liuxuxin (was: Jinrui Zhang) > The write speed in atoms testing declined after merging commit 5126711d > --- > > Key: IOTDB-5319 > URL: https://issues.apache.org/jira/browse/IOTDB-5319 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Liuxuxin >Priority: Major > Attachments: image-2022-12-29-18-37-25-693.png, > image-2022-12-29-18-38-45-394.png > > > !image-2022-12-29-18-37-25-693.png|width=701,height=156! > > After merging this commit, the write speed in atoms testing declined. > > We inferred that this change leads compaction to grab more CPU/IO resources, > which decreases the resources available for writes and reads. > > After we changed the parameter `iops_per_min` from 50 to 30, the problem > still exists. > See this snapshot, > !image-2022-12-29-18-38-45-394.png|width=541,height=346! > > Let's investigate the details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5289) [Metric]Only the leader confignode can show the number of datanode and confignode
[ https://issues.apache.org/jira/browse/IOTDB-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5289: --- Assignee: Hongyin Zhang (was: Gaofei Cao) > [Metric]Only the leader confignode can show the number of datanode and > confignode > - > > Key: IOTDB-5289 > URL: https://issues.apache.org/jira/browse/IOTDB-5289 > Project: Apache IoTDB > Issue Type: Improvement > Components: Core/Cluster >Affects Versions: 1.0.0 >Reporter: Qingxin Feng >Assignee: Hongyin Zhang >Priority: Minor > Attachments: image-2022-12-27-10-14-27-396.png, > image-2022-12-27-10-14-39-563.png > > > Only the leader confignode can show the number of datanodes and confignodes. > Please refer to the pictures below: > Can we change it so the numbers can be shown on both the leader and the followers? > !image-2022-12-27-10-14-39-563.png|width=638,height=388! > !image-2022-12-27-10-14-27-396.png|width=637,height=335! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5289) [Metric]Only the leader confignode can show the number of datanode and confignode
[ https://issues.apache.org/jira/browse/IOTDB-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5289: --- Assignee: Gaofei Cao (was: Hongyin Zhang) > [Metric]Only the leader confignode can show the number of datanode and > confignode > - > > Key: IOTDB-5289 > URL: https://issues.apache.org/jira/browse/IOTDB-5289 > Project: Apache IoTDB > Issue Type: Improvement > Components: Core/Cluster >Affects Versions: 1.0.0 >Reporter: Qingxin Feng >Assignee: Gaofei Cao >Priority: Minor > Attachments: image-2022-12-27-10-14-27-396.png, > image-2022-12-27-10-14-39-563.png > > > Only the leader confignode can show the number of datanodes and confignodes. > Please refer to the pictures below: > Can we change it so the numbers can be shown on both the leader and the followers? > !image-2022-12-27-10-14-39-563.png|width=638,height=388! > !image-2022-12-27-10-14-27-396.png|width=637,height=335! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4986) Too many IoTDB-DataNodeInternalRPC-Processor threads are open
[ https://issues.apache.org/jira/browse/IOTDB-4986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653755#comment-17653755 ] Jinrui Zhang commented on IOTDB-4986: - Let's confirm with [~tanxinyu] whether this issue is fixed or not > Too many IoTDB-DataNodeInternalRPC-Processor threads are open > - > > Key: IOTDB-4986 > URL: https://issues.apache.org/jira/browse/IOTDB-4986 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Critical > > m_1118_3d5eeae > 1. Start a 3-replica 3C21D cluster > 2. Start 7 Benchmark instances in sequence > 3. On one node, the datanode opens a very large number of IoTDB-DataNodeInternalRPC-Processor threads, 2k+ > (the number slowly drops back down), but OOM occasionally occurs > 2022-11-18 14:26:48,320 > [pool-22-IoTDB-DataNodeInternalRPC-Processor-374$20221118_062422_29227_16.1.0] > ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally > failed. TSStatus: TSStatus(code:506, subStatus:[]), message: null > 2022-11-18 14:29:44,568 [DataNodeInternalRPC-Service]{color:red}* ERROR > o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > DataNodeInternalRPC-Service-40 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached*{color} > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:29:53,751 [ClientRPC-Service] ERROR > o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > 
ClientRPC-Service-42 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.d.e.s.TsFileProcessor:1095 - root.test.g_0-6: > /data/iotdb/m_1118_3d5eeae/sbin/../data/datanode/data/unsequence/root.test.g_0/6/2538/1668752675355-5-0-0.tsfile > meet error when flushing a memtable, change system mode to error > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118) > at > org.apache.iotdb.db.rescon.AbstractPoolManager.submit(AbstractPoolManager.java:56) > at > org.apache.iotdb.db.engine.flush.MemTableFlushTask.(MemTableFlushTask.java:88) > at > org.apache.iotdb.db.engine.storagegroup.TsFileProcessor.flushOneMemTable(TsFileProcessor.java:1082) > at > org.apache.iotdb.db.engine.flush.FlushManager$FlushThread.runMayThrow(FlushManager.java:108) > at > 
org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.c.e.HandleSystemErrorStrategy:37 - Unrecovera
[jira] [Commented] (IOTDB-4164) [ wal_mode=SYNC ] Performance needs to be optimized
[ https://issues.apache.org/jira/browse/IOTDB-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653754#comment-17653754 ] Jinrui Zhang commented on IOTDB-4164: - The priority of this feature is not high. Let's move it to the Backlog > [ wal_mode=SYNC ] Performance needs to be optimized > --- > > Key: IOTDB-4164 > URL: https://issues.apache.org/jira/browse/IOTDB-4164 > Project: Apache IoTDB > Issue Type: Improvement > Components: Core/WAL, mpp-cluster >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Major > Attachments: image-2022-08-17-11-38-15-849.png, > image-2022-10-10-09-59-07-692.png > > > The performance of wal_mode=SYNC needs to be optimized: with the same (bm) configuration, SYNC elapsed time / ASYNC elapsed time = 2.77. > !image-2022-08-17-11-38-15-849.png! > See the reproduction steps at > https://issues.apache.org/jira/browse/IOTDB-4161?filter=-2 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5275) [compaction][aligned ts] compaction is slow
[ https://issues.apache.org/jira/browse/IOTDB-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653735#comment-17653735 ] Jinrui Zhang commented on IOTDB-5275: - What's the problem? > [compaction][aligned ts] compaction is slow > --- > > Key: IOTDB-5275 > URL: https://issues.apache.org/jira/browse/IOTDB-5275 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: master branch >Reporter: 刘珍 >Assignee: 周沛辰 >Priority: Major > Attachments: 1_shunxu.conf, 2_shunxu.conf, 3_shunxu.conf, > 4_shunxu.conf, image-2022-12-23-22-27-49-731.png > > > m_1222_656d281 > Problem description: > Aligned series, all sequential data, no other read/write operations, > with the default configuration (cross_performer=read_point > inner_seq_performer=read_chunk > inner_unseq_performer=read_point > ), > {color:#de350b}compaction is slow{color} > !image-2022-12-23-22-27-49-731.png|width=987,height=390! > Test environment: > 1. Private cloud phase 1. > Disable compaction and generate data. > The configuration files are attached. > 2. Compare compaction performance, with no other read/write operations. > ConfigNode configuration: > MAX_HEAP_SIZE="2G" > DataNode configuration: > MAX_HEAP_SIZE="18G" > MAX_DIRECT_MEMORY_SIZE="6G" > Common configuration: > {color:#de350b}*time_partition_interval=6048000 > compaction_io_rate_per_sec=1000*{color} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5319) The write speed in atoms testing declined after merging commit 5126711d
[ https://issues.apache.org/jira/browse/IOTDB-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653053#comment-17653053 ] Jinrui Zhang commented on IOTDB-5319: - We have found the commit that led to the decline. The commit changed the compaction task throttling from I/O size to IOPS, which leads compaction to grab more resources from reads and writes. [~marklau99] Please investigate the issue > The write speed in atoms testing declined after merging commit 5126711d > --- > > Key: IOTDB-5319 > URL: https://issues.apache.org/jira/browse/IOTDB-5319 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Jinrui Zhang >Priority: Major > Attachments: image-2022-12-29-18-37-25-693.png, > image-2022-12-29-18-38-45-394.png > > > !image-2022-12-29-18-37-25-693.png|width=701,height=156! > > After merging this commit, the write speed in atoms testing declined. > > We inferred that this change leads compaction to grab more CPU/IO resources, > which decreases the resources available for writes and reads. > > After we changed the parameter `iops_per_min` from 50 to 30, the problem > still exists. > See this snapshot, > !image-2022-12-29-18-38-45-394.png|width=541,height=346! > > Let's investigate the details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
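The difference between the two throttling styles the comment describes can be sketched with a token bucket. This hypothetical Python model charges either bytes or operations against the same budget; the class and function names are illustrative, not IoTDB's actual rate-limiter API.

```python
import time

class TokenBucket:
    """Minimal token-bucket throttle: acquire(n) charges n tokens and sleeps
    when the per-second budget is exhausted."""
    def __init__(self, tokens_per_sec):
        self.rate = float(tokens_per_sec)
        self.tokens = float(tokens_per_sec)
        self.last = time.monotonic()

    def acquire(self, n):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at one second's budget.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < n:
            time.sleep((n - self.tokens) / self.rate)  # wait for refill
            self.tokens = 0.0
        else:
            self.tokens -= n

def write_chunk(limiter, chunk, by_iops):
    """Throttling by I/O size charges a large chunk write its full byte count;
    throttling by IOPS charges the same write a single op-token. Large
    sequential compaction writes are therefore barely limited under an IOPS
    budget, which is consistent with compaction grabbing more disk bandwidth
    after the switch described above."""
    limiter.acquire(1 if by_iops else len(chunk))
```

Under a byte budget a 10 MB chunk costs ten million tokens; under an op budget it costs one, so the two limits shape compaction I/O very differently.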
[jira] [Assigned] (IOTDB-5319) The write speed in atoms testing declined after merging commit 5126711d
[ https://issues.apache.org/jira/browse/IOTDB-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5319: --- Assignee: Jinrui Zhang > The write speed in atoms testing declined after merging commit 5126711d > --- > > Key: IOTDB-5319 > URL: https://issues.apache.org/jira/browse/IOTDB-5319 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Jinrui Zhang >Priority: Major > Attachments: image-2022-12-29-18-37-25-693.png, > image-2022-12-29-18-38-45-394.png > > > !image-2022-12-29-18-37-25-693.png|width=701,height=156! > > After merging this commit, the write speed in atoms testing declined. > > We inferred that this change leads compaction to grab more CPU/IO resources, > which decreases the resources available for writes and reads. > > After we changed the parameter `iops_per_min` from 50 to 30, the problem > still exists. > See this snapshot, > !image-2022-12-29-18-38-45-394.png|width=541,height=346! > > Let's investigate the details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5319) The write speed in atoms testing declined after merging commit 5126711d
[ https://issues.apache.org/jira/browse/IOTDB-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5319: --- Assignee: (was: Jinrui Zhang) > The write speed in atoms testing declined after merging commit 5126711d > --- > > Key: IOTDB-5319 > URL: https://issues.apache.org/jira/browse/IOTDB-5319 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Priority: Major > Attachments: image-2022-12-29-18-37-25-693.png, > image-2022-12-29-18-38-45-394.png > > > !image-2022-12-29-18-37-25-693.png|width=701,height=156! > > After merging this commit, the write speed in atoms testing declined. > > We inferred that this change leads compaction to grab more CPU/IO resources, > which decreases the resources available for writes and reads. > > After we changed the parameter `iops_per_min` from 50 to 30, the problem > still exists. > See this snapshot, > !image-2022-12-29-18-38-45-394.png|width=541,height=346! > > Let's investigate the details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5279) [Metric] Got a wrong number after restart the cluster
[ https://issues.apache.org/jira/browse/IOTDB-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653047#comment-17653047 ] Jinrui Zhang commented on IOTDB-5279: - Let's paste the PR here once it is completed > [Metric] Got a wrong number after restart the cluster > - > > Key: IOTDB-5279 > URL: https://issues.apache.org/jira/browse/IOTDB-5279 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Cluster >Affects Versions: 1.0.0 >Reporter: Qingxin Feng >Assignee: Liuxuxin >Priority: Minor > Attachments: image-2022-12-26-11-30-04-470.png > > > Reproduce: > commit version: 1.0.1-SNAPSHOT (Build: a7908ab-dev) > Steps: > 1. Set up the cluster (3C3D, 3 replicas) > 2. Use BM to insert data > 3. After all tests finished, restart the cluster > 4. Check the result in iotdb-metric, as in the picture below > http://111.202.73.147:13000/d/TbEVYRw7A/apache-iotdb-datanode-dashboard?orgId=1&from=1672013871154&to=1672024516629&var-job=datanode&var-instance=172.20.70.22:9091 > !image-2022-12-26-11-30-04-470.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5319) The write speed in atoms testing declined after merging commit 5126711d
[ https://issues.apache.org/jira/browse/IOTDB-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5319: --- Assignee: Jinrui Zhang > The write speed in atoms testing declined after merging commit 5126711d > --- > > Key: IOTDB-5319 > URL: https://issues.apache.org/jira/browse/IOTDB-5319 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Jinrui Zhang >Priority: Major > Attachments: image-2022-12-29-18-37-25-693.png, > image-2022-12-29-18-38-45-394.png > > > !image-2022-12-29-18-37-25-693.png|width=701,height=156! > > After merging this commit, the write speed in atoms testing declined. > > We inferred that this change leads compaction to grab more CPU/IO resources, > which decreases the resources available for writes and reads. > > After we changed the parameter `iops_per_min` from 50 to 30, the problem > still exists. > See this snapshot, > !image-2022-12-29-18-38-45-394.png|width=541,height=346! > > Let's investigate the details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5319) The write speed in atoms testing declined after merging commit 5126711d
Jinrui Zhang created IOTDB-5319: --- Summary: The write speed in atoms testing declined after merging commit 5126711d Key: IOTDB-5319 URL: https://issues.apache.org/jira/browse/IOTDB-5319 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Attachments: image-2022-12-29-18-37-25-693.png !image-2022-12-29-18-37-25-693.png|width=701,height=156! After merging this commit, the write speed in atoms testing declined. We inferred that this change leads compaction to grab more CPU/IO resources, which decreases the resources available for writes and reads. After we changed the parameter `iops_per_min` from 50 to 30, the problem still exists. See this snapshot, !image-2022-12-29-18-35-42-134.png|width=521,height=333! Let's investigate the details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4684) Devices with the same name but different alignment properties are compacted into the wrong alignment property
[ https://issues.apache.org/jira/browse/IOTDB-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652719#comment-17652719 ] Jinrui Zhang commented on IOTDB-4684: - The schema processing in the compaction task execution stage is not accurate. > Devices with the same name but different alignment properties are compacted > into the wrong alignment property > - > > Key: IOTDB-4684 > URL: https://issues.apache.org/jira/browse/IOTDB-4684 > Project: Apache IoTDB > Issue Type: Bug >Affects Versions: 0.13.0 >Reporter: 周沛辰 >Assignee: 周沛辰 >Priority: Major > > *Description* > After a nonAligned device is deleted and an aligned device with the same name > is created, the new device will be compacted into a nonAligned device. > Similarly, after an aligned device is deleted and a nonAligned device with the > same name is created, the new device will be compacted into an aligned device. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5147) Optimize compaction schedule when priority is BALANCE
[ https://issues.apache.org/jira/browse/IOTDB-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652718#comment-17652718 ] Jinrui Zhang commented on IOTDB-5147: - Still in the discussion stage; it won't be completed in 1.0.1 > Optimize compaction schedule when priority is BALANCE > - > > Key: IOTDB-5147 > URL: https://issues.apache.org/jira/browse/IOTDB-5147 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: 周沛辰 >Assignee: 周沛辰 >Priority: Major > > When the priority is BALANCE, there is a problem with the compaction > schedule: when new inner space compaction tasks are continuously submitted to > the priority queue, cross space compaction tasks will be starved. -- This message was sent by Atlassian Jira (v8.20.10#820010)
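One common fix for this kind of starvation is to alternate fairly between the two task queues instead of draining a single shared priority queue. The following is a hedged Python sketch of that idea only, not IoTDB's actual compaction scheduler.

```python
from collections import deque

def balanced_schedule(inner_tasks, cross_tasks):
    """Alternate between the inner- and cross-space queues so a continuous
    stream of inner tasks cannot starve cross tasks; falls back to whichever
    queue still has work when the other is empty."""
    inner, cross = deque(inner_tasks), deque(cross_tasks)
    order, take_inner = [], True
    while inner or cross:
        # Prefer the queue whose turn it is, unless it is empty.
        queue = inner if (take_inner and inner) or not cross else cross
        order.append(queue.popleft())
        take_inner = not take_inner
    return order
```

Even with a long backlog of inner tasks, the single cross task runs second rather than waiting for the inner queue to empty.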
[jira] [Commented] (IOTDB-5189) Optimize the memory usage of fast compaction
[ https://issues.apache.org/jira/browse/IOTDB-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652717#comment-17652717 ] Jinrui Zhang commented on IOTDB-5189: - Still in progress. It won't be merged into 1.0.1 > Optimize the memory usage of fast compaction > > > Key: IOTDB-5189 > URL: https://issues.apache.org/jira/browse/IOTDB-5189 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: 周沛辰 >Assignee: 周沛辰 >Priority: Major > Fix For: 1.0.1 > > > Only read the chunks that need to be used into memory each time, instead of > reading all the overlapping chunks into memory at once. -- This message was sent by Atlassian Jira (v8.20.10#820010)
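The optimization described above can be sketched as a generator that deserializes one chunk at a time. This is a hypothetical Python model; the field names and the `read_chunk` callback are illustrative, not IoTDB's actual chunk-reading API.

```python
def read_chunks_lazily(chunk_metadata, read_chunk):
    """Yield chunks one at a time in start-time order instead of loading every
    overlapping chunk up front; peak memory is a single chunk rather than the
    whole overlap set."""
    for meta in sorted(chunk_metadata, key=lambda m: m["start_time"]):
        # Deserialized on demand; the previous chunk can be garbage-collected
        # as soon as the consumer moves on.
        yield read_chunk(meta)
```

The caller iterates the generator and merges point by point, so memory stays bounded by one chunk regardless of how many chunks overlap.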
[jira] [Commented] (IOTDB-4986) Too many IoTDB-DataNodeInternalRPC-Processor threads are open
[ https://issues.apache.org/jira/browse/IOTDB-4986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652409#comment-17652409 ] Jinrui Zhang commented on IOTDB-4986: - This issue is related to the ClientManager used in the IoTDB cluster. [~LebronAl] is fixing it > Too many IoTDB-DataNodeInternalRPC-Processor threads are open > - > > Key: IOTDB-4986 > URL: https://issues.apache.org/jira/browse/IOTDB-4986 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Critical > > m_1118_3d5eeae > 1. Start a 3-replica 3C21D cluster > 2. Start 7 Benchmark instances in sequence > 3. On one node, the datanode opens a very large number of IoTDB-DataNodeInternalRPC-Processor threads, 2k+ > (the number slowly drops back down), but OOM occasionally occurs > 2022-11-18 14:26:48,320 > [pool-22-IoTDB-DataNodeInternalRPC-Processor-374$20221118_062422_29227_16.1.0] > ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally > failed. TSStatus: TSStatus(code:506, subStatus:[]), message: null > 2022-11-18 14:29:44,568 [DataNodeInternalRPC-Service]{color:red}* ERROR > o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > DataNodeInternalRPC-Service-40 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached*{color} > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:29:53,751 [ClientRPC-Service] ERROR > 
o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > ClientRPC-Service-42 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.d.e.s.TsFileProcessor:1095 - root.test.g_0-6: > /data/iotdb/m_1118_3d5eeae/sbin/../data/datanode/data/unsequence/root.test.g_0/6/2538/1668752675355-5-0-0.tsfile > meet error when flushing a memtable, change system mode to error > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118) > at > org.apache.iotdb.db.rescon.AbstractPoolManager.submit(AbstractPoolManager.java:56) > at > org.apache.iotdb.db.engine.flush.MemTableFlushTask.(MemTableFlushTask.java:88) > at > org.apache.iotdb.db.engine.storagegroup.TsFileProcessor.flushOneMemTable(TsFileProcessor.java:1082) > at > org.apache.iotdb.db.engine.flush.FlushManager$FlushThread.runMayThrow(FlushManager.java:108) > at 
> org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.c.e.HandleSys
[jira] [Created] (IOTDB-5285) TimePartition may be error when restarting with different time partition configuration
Jinrui Zhang created IOTDB-5285: --- Summary: TimePartition may be error when restarting with different time partition configuration Key: IOTDB-5285 URL: https://issues.apache.org/jira/browse/IOTDB-5285 Project: Apache IoTDB Issue Type: Bug Reporter: Jinrui Zhang Assignee: Haiming Zhu Reproduce steps: # generate data files using time partition configuration A (e.g. 1 week) # back up the data files # stop the system and change the time partition configuration to B (e.g. 1 day) # restart the system and inspect the time partitions in memory ## we can check the time partitions using Arthas, with a command such as `ognl "@org.apache.iotdb.db.engine.StorageEngine@getInstance().dataRegionMap.get(new org.apache.iotdb.commons.consensus.DataRegionId(6)).tsfileManager.unsequenceFiles"` -- This message was sent by Atlassian Jira (v8.20.10#820010)
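The symptom above follows from how a timestamp maps to a time-partition id. A minimal sketch (illustrative names, not IoTDB's actual code) of why files written under one partition interval land in the wrong partition after restarting with a different interval:

```java
// Hypothetical sketch: a time-partition id is typically the timestamp
// divided by the configured partition interval. Changing the interval
// between restarts remaps the same timestamp to a different partition id.
public class TimePartitionSketch {
    static long partitionId(long timestampMs, long partitionIntervalMs) {
        return timestampMs / partitionIntervalMs;
    }

    public static void main(String[] args) {
        long oneWeekMs = 7L * 24 * 60 * 60 * 1000; // config A: 1 week
        long oneDayMs = 24L * 60 * 60 * 1000;      // config B: 1 day
        long ts = 10L * oneDayMs;                  // a sample timestamp

        // The same timestamp maps to different partition ids under the
        // two configurations, so files written under config A no longer
        // match the partitions computed after restarting with config B.
        System.out.println(partitionId(ts, oneWeekMs)); // 1
        System.out.println(partitionId(ts, oneDayMs));  // 10
    }
}
```

Which division rule IoTDB actually uses is not shown in this issue; the sketch only illustrates why the in-memory partition view diverges from the on-disk layout.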
[jira] [Created] (IOTDB-5266) Seq file may be lost when selecting cross compaction task
Jinrui Zhang created IOTDB-5266: --- Summary: Seq file may be lost when selecting cross compaction task Key: IOTDB-5266 URL: https://issues.apache.org/jira/browse/IOTDB-5266 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang Currently, when selecting a cross compaction task, some seq files may be lost if the seq files use FileTimeIndex rather than DeviceTimeIndex. This is because FileTimeIndex cannot describe the start time/end time accurately for a specific device, which causes the selection to terminate early, so some seq files are lost. -- This message was sent by Atlassian Jira (v8.20.10#820010)
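The core of the issue can be sketched as follows (illustrative names, not IoTDB's real classes): a DeviceTimeIndex answers per-device time ranges exactly, while a resource degraded to FileTimeIndex can only return the file-wide range for every device:

```java
import java.util.Map;

// Sketch of the precision gap between the two index kinds. A file-level
// index keeps only one [min, max] for the whole file, so any per-device
// query gets the same (over-wide) answer.
public class TimeIndexSketch {
    // DeviceTimeIndex-style answer: exact range per device
    static long deviceStart(Map<String, long[]> deviceRanges, String device) {
        return deviceRanges.get(device)[0];
    }

    // FileTimeIndex-style answer: the file-wide minimum, regardless of
    // which device is asked about (fileMax would be returned for end time)
    static long fileStart(long fileMin, long fileMax, String device) {
        return fileMin;
    }

    public static void main(String[] args) {
        Map<String, long[]> ranges = Map.of(
            "d1", new long[] {0, 10},
            "d2", new long[] {100, 200});
        // Exact per-device start vs. the degraded file-level answer:
        System.out.println(deviceStart(ranges, "d2")); // 100
        System.out.println(fileStart(0, 200, "d2"));   // 0
    }
}
```

Because the degraded answer widens every device's apparent range, a selection loop that relies on per-device boundaries can stop at the wrong file and drop seq files from the task.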
[jira] [Commented] (IOTDB-5263) Optimize the cross compaction file selection and execution
[ https://issues.apache.org/jira/browse/IOTDB-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650798#comment-17650798 ] Jinrui Zhang commented on IOTDB-5263: - We will do this optimization after fixing the bugs in the current implementation > Optimize the cross compaction file selection and execution > -- > > Key: IOTDB-5263 > URL: https://issues.apache.org/jira/browse/IOTDB-5263 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Jinrui Zhang >Assignee: Jinrui Zhang >Priority: Major > Attachments: image-2022-12-21-18-06-40-580.png > > > In the current implementation, when selecting the `overlapped` seq files for one > specific unseq file, one seq file will always be selected even though it > doesn't have any overlap with the unseq file. See the sample below. > !image-2022-12-21-18-06-40-580.png|width=718,height=220! > That is, when selecting seq files for `3`, file-1 will be selected even > though there is no overlap between 1 and 3. This is because we need to find a > target file for 3 in the current cross compaction implementation; otherwise, > overlapped seq files would be generated after cross compaction. > We need to do the optimization for it: > # Only select the seq files which overlap with the target unseq file. > # Change the implementation of cross compaction to find the target seq > file and reduce unnecessary file writes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
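The first optimization point above, selecting only the seq files that actually overlap the target unseq file, can be sketched with a plain interval-overlap check (illustrative code, assuming each file exposes a closed [start, end] time range):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed selection rule: keep a seq file only if its
// [start, end] interval overlaps the unseq file's interval.
public class OverlapSelection {
    static boolean overlaps(long s1, long e1, long s2, long e2) {
        // Two closed intervals overlap iff each starts before the other ends.
        return s1 <= e2 && s2 <= e1;
    }

    static List<Integer> selectOverlapped(long[][] seqRanges, long unseqStart, long unseqEnd) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < seqRanges.length; i++) {
            if (overlaps(seqRanges[i][0], seqRanges[i][1], unseqStart, unseqEnd)) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // seq file 0 covers [0, 10], seq file 1 covers [20, 30];
        // the unseq file covers [25, 28], matching the sample in the issue.
        long[][] seq = {{0, 10}, {20, 30}};
        // Only file 1 is selected; the non-overlapping file 0 is skipped.
        System.out.println(selectOverlapped(seq, 25, 28)); // [1]
    }
}
```

The remaining work described in the issue, choosing a correct target file so the output stays non-overlapping, is the part that needs the implementation change; the overlap filter alone is not sufficient.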
[jira] [Created] (IOTDB-5263) Optimize the cross compaction file selection and execution
Jinrui Zhang created IOTDB-5263: --- Summary: Optimize the cross compaction file selection and execution Key: IOTDB-5263 URL: https://issues.apache.org/jira/browse/IOTDB-5263 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-4674) Reimplement settle by compaction
[ https://issues.apache.org/jira/browse/IOTDB-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4674: --- Assignee: Wenwei Shu (was: 周沛辰) > Reimplement settle by compaction > > > Key: IOTDB-4674 > URL: https://issues.apache.org/jira/browse/IOTDB-4674 > Project: Apache IoTDB > Issue Type: New Feature >Affects Versions: 0.14.0-SNAPSHOT >Reporter: Haonan Hou >Assignee: Wenwei Shu >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-4986) Too many IoTDB-DataNodeInternalRPC-Processor threads are open
[ https://issues.apache.org/jira/browse/IOTDB-4986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4986: --- Assignee: Haiming Zhu (was: Jinrui Zhang) > Too many IoTDB-DataNodeInternalRPC-Processor threads are open > - > > Key: IOTDB-4986 > URL: https://issues.apache.org/jira/browse/IOTDB-4986 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Critical > > m_1118_3d5eeae > 1. Start a 3-replica 3C21D cluster > 2. Start 7 Benchmark instances sequentially > 3. On one node's datanode, the number of IoTDB-DataNodeInternalRPC-Processor threads grows very large, 2k+ > (it slowly drops back down), but an OOM occasionally occurs > 2022-11-18 14:26:48,320 > [pool-22-IoTDB-DataNodeInternalRPC-Processor-374$20221118_062422_29227_16.1.0] > ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally > failed. TSStatus: TSStatus(code:506, subStatus:[]), message: null > 2022-11-18 14:29:44,568 [DataNodeInternalRPC-Service]{color:red}* ERROR > o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > DataNodeInternalRPC-Service-40 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached*{color} > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:29:53,751 [ClientRPC-Service] ERROR > o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > ClientRPC-Service-42 > java.lang.OutOfMemoryError: unable to create native 
thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.d.e.s.TsFileProcessor:1095 - root.test.g_0-6: > /data/iotdb/m_1118_3d5eeae/sbin/../data/datanode/data/unsequence/root.test.g_0/6/2538/1668752675355-5-0-0.tsfile > meet error when flushing a memtable, change system mode to error > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118) > at > org.apache.iotdb.db.rescon.AbstractPoolManager.submit(AbstractPoolManager.java:56) > at > org.apache.iotdb.db.engine.flush.MemTableFlushTask.(MemTableFlushTask.java:88) > at > org.apache.iotdb.db.engine.storagegroup.TsFileProcessor.flushOneMemTable(TsFileProcessor.java:1082) > at > org.apache.iotdb.db.engine.flush.FlushManager$FlushThread.runMayThrow(FlushManager.java:108) > at > org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29) > at > 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.c.e.HandleSystemErrorStrategy:37 - Unrecoverable error occurs! Change > system status to read-only because handle_sys
[jira] [Commented] (IOTDB-5035) After the datanode is removed successfully, the snapshot can be deleted
[ https://issues.apache.org/jira/browse/IOTDB-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645745#comment-17645745 ] Jinrui Zhang commented on IOTDB-5035: - https://github.com/apache/iotdb/pull/8383 > After the datanode is removed successfully, the snapshot can be deleted > --- > > Key: IOTDB-5035 > URL: https://issues.apache.org/jira/browse/IOTDB-5035 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Minor > Attachments: image-2022-11-24-14-38-09-579.png > > > Test version: 1124_cd839a4 > Path on the machine: /data/liuzhen_test/master_1123_32e2f98 (lib from 1124_cd839a4) > Problem description: > Stopping a datanode normally triggers a snapshot. > After starting this node again and scaling in a datanode (ip76/ip62), the snapshot is not deleted once the scale-in succeeds: > !image-2022-11-24-14-38-09-579.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5165) [ compaction ]
[ https://issues.apache.org/jira/browse/IOTDB-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645744#comment-17645744 ] Jinrui Zhang commented on IOTDB-5165: - Need to take a look at this with priority > [ compaction ] > -- > > Key: IOTDB-5165 > URL: https://issues.apache.org/jira/browse/IOTDB-5165 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Compaction, mpp-cluster >Affects Versions: master branch, 1.0.0 >Reporter: 刘珍 >Assignee: 周沛辰 >Priority: Major > Attachments: 1.conf, 10.conf, 2.conf, 3.conf, 4.conf, 5.conf, 6.conf, > 7.conf, 8.conf, 9.conf, run.sh, run_conf.sh > > > master 2022-12-09_a31441c > Compaction failed with an error > 2022-12-09 14:21:46,728 [pool-43-IoTDB-Compaction-8] ERROR > o.a.i.d.e.c.CompactionUtils:281 - root.test.g2_0 Device > root.test.g2_0.d_82215 {color:#DE350B}*is overlapped between file*{color} is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670420064671-7-0-0.tsfile, > status: COMPACTING and file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670420083777-8-0-0.tsfile, > status: COMPACTING, end time in file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670420064671-7-0-0.tsfile, > status: COMPACTING is 153556841, start time in file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670420083777-8-0-0.tsfile, > status: COMPACTING is 153555842 > 2022-12-09 14:21:46,729 [pool-43-IoTDB-Compaction-8] ERROR > o.a.i.d.e.c.i.InnerSpaceCompactionTask:184 - {color:#DE350B}*Failed to pass > compaction validation*{color}, source files is: [file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670444897033-4581-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670444947849-4590-0-0.tsfile, > 
status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/167044454-4600-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445051784-4609-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445101595-4619-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445153290-4628-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445204996-4638-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445254210-4647-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445304094-4656-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445355765-4666-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445407476-4675-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445458633-4685-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445509050-4694-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445558911-4703-0-0.tsfile, > status: COMPACTING, file is > 
/data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445608483-4712-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445660518-4722-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445709301-4731-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445760226-4741-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445808744-4750-0-0.tsfile, > status: COMPACTING, file is > /data1/iotdb/m_1209_a31441c_fast1/./sbin/../data/datanode/data/sequence/root.test.g2_0/5/25/1670445860031
[jira] [Commented] (IOTDB-5139) [benchmark]1device over 3w sensors, insert nothing
[ https://issues.apache.org/jira/browse/IOTDB-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645743#comment-17645743 ] Jinrui Zhang commented on IOTDB-5139: - It seems that this issue is easy to repro. Let's repro it and investigate whether it is caused by large requests. > [benchmark]1device over 3w sensors, insert nothing > --- > > Key: IOTDB-5139 > URL: https://issues.apache.org/jira/browse/IOTDB-5139 > Project: Apache IoTDB > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: changxue >Assignee: 张洪胤 >Priority: Minor > Attachments: allnodes-log-3w.tar.gz, benchmark-logs-3w.log, > benchmark-logs-50w.log, config.properties > > > [benchmark]1device over 3w sensors, insert nothing > environment: > benchmark: 1.0 commit: 25c1f742 > iotdb: 3C3D cluster, 1.0.0 release edition > create timeseries succeeded but "show regions" showed schema info only. > > configs and logs: see attachments. > Questions: > 1. Why is no data written with 30k sensors? With 3k sensors it works. Writing my own insert code with session.insertRecord succeeds. > 2. With loop=2, the second loop keeps printing the log below and never finishes. > 2022-12-07 14:56:44,540 INFO > cn.edu.tsinghua.iot.benchmark.client.DataClient:137 - pool-2-thread-1 50.00% > workload is done. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4400) [new stand-alone]enableMetric, the write performance does not meet expectations
[ https://issues.apache.org/jira/browse/IOTDB-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645740#comment-17645740 ] Jinrui Zhang commented on IOTDB-4400: - Need to run the test again with the latest code > [new stand-alone]enableMetric, the write performance does not meet > expectations > --- > > Key: IOTDB-4400 > URL: https://issues.apache.org/jira/browse/IOTDB-4400 > Project: Apache IoTDB > Issue Type: Improvement > Components: Others >Affects Versions: master branch, 0.14.0, 0.14.0-SNAPSHOT >Reporter: xiaozhihong >Assignee: 张洪胤 >Priority: Major > Attachments: config.properties, image-2022-09-14-10-27-26-819.png > > > commit 74fb350809b2f1488a90d6d7c420f27ec14b24e5 > With monitoring turned on, write performance tests were run on both frameworks > at different metric levels. The final result is confusing: neither different > levels nor different frameworks show a noticeable difference in write > performance. With monitoring turned off, writes likewise show no > obvious difference, so a root-cause investigation needs to be done. > Details: > https://apache-iotdb.feishu.cn/docx/QUQSdbRaaoWDjQxcEz9cQjFdnsz?from=create_suite > !image-2022-09-14-10-27-26-819.png|width=528,height=275! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-5030) [Schema-Read-Performance] java.lang.IllegalArgumentException: all replicas for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these DataNodes
[ https://issues.apache.org/jira/browse/IOTDB-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5030: --- Sprint: 2022-12-Schema (was: 2022-11-Cluster) Assignee: Yukun Zhou (was: Jinrui Zhang) > [Schema-Read-Performance] java.lang.IllegalArgumentException: all replicas > for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in > these DataNodes > -- > > Key: IOTDB-5030 > URL: https://issues.apache.org/jira/browse/IOTDB-5030 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Yukun Zhou >Priority: Minor > Attachments: iotdb_4851.conf > > > master_1123_32e2f98 > 1. Start a 1-replica 3C5D cluster > 2. Write data with Benchmark; after 50 minutes, ip68 reports an error > {color:#DE350B}2022-11-23 15:32:46,820 > [pool-24-IoTDB-DataNodeInternalRPC-Processor-122] ERROR > o.a.t.ProcessFunction:47 - Internal error processing sendPlanNode > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:1)] are not available in these > DataNodes[[TDataNodeLocation(dataNodeId:4, > clientRpcEndPoint:TEndPoint(ip:192.168.10.66, port:6667), > internalEndPoint:TEndPoint(ip:192.168.10.66, port:9003), > mPPDataExchangeEndPoint:TEndPoint(ip:192.168.10.66, port:8777), > dataRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, port:40010), > schemaRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, > port:50010))]]{color} > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.selectTargetDataNode(SimpleFragmentParallelPlanner.java:146) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.produceFragmentInstance(SimpleFragmentParallelPlanner.java:115) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.prepare(SimpleFragmentParallelPlanner.java:87) > at > 
org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.parallelPlan(SimpleFragmentParallelPlanner.java:78) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragmentInstances(DistributionPlanner.java:94) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragments(DistributionPlanner.java:78) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.doDistributedPlan(QueryExecution.java:304) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:201) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.retry(QueryExecution.java:235) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.getStatus(QueryExecution.java:500) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:152) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.executeSchemaFetchQuery(ClusterSchemaFetcher.java:178) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:156) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:98) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchemaWithAutoCreate(ClusterSchemaFetcher.java:265) > at > org.apache.iotdb.db.mpp.plan.analyze.SchemaValidator.validate(SchemaValidator.java:56) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:193) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:165) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:119) > at > org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1086) > at > 
org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:85) > at > org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.sendPlanNode(DataNodeInternalRPCServiceImpl.java:283) > at > org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3607) > at > org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3587) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcesso
[jira] [Commented] (IOTDB-4971) dispatch write failed. status: TSStatus(code:506, subStatus:[]), message: null
[ https://issues.apache.org/jira/browse/IOTDB-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645737#comment-17645737 ] Jinrui Zhang commented on IOTDB-4971: - Need to confirm what the problem behind the logs is > dispatch write failed. status: TSStatus(code:506, subStatus:[]), message: null > -- > > Key: IOTDB-4971 > URL: https://issues.apache.org/jira/browse/IOTDB-4971 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Minor > Labels: pull-request-available > Attachments: del_ts.sh, down_delete_ts.conf, run_del_1.sh, > run_del_2.sh, run_iotdb_4563.sh > > > master_1117_d548214 > 1. start 3rep 3C 9D cluster > 2. delete timeseries root.** and create metadata, write data, concurrently > the datanode (IP18) has an ERROR: > 2022-11-17 14:52:38,172 > [pool-24-IoTDB-DataNodeInternalRPC-Processor-17$20221117_065237_15126_11.1.0] > {color:red}*ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:173 - dispatch > write failed. status: TSStatus(code:506, subStatus:[]), message: null*{color} > Reproduction steps > 1. Start the 3C 9D cluster > 3C : 172.20.70.19/172.20.70.21/172.20.70.32 > 9D : 172.20.70.2/3/4/5/13/14/15/16/18 > Configuration parameters > ConfigNode: > MAX_HEAP_SIZE="8G" > MAX_DIRECT_MEMORY_SIZE="6G" > cn_connection_timeout_ms=360 > Common : > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_factor=3 > connection_timeout_ms=360 > max_connection_for_internal_service=200 > query_timeout_threshold=360 > schema_region_ratis_request_timeout_ms=180 > Datanode: > MAX_HEAP_SIZE="20G" > MAX_DIRECT_MEMORY_SIZE="6G" > 2. 
Start the test scripts > put down_delete_ts.conf under ${bm_dir}/conf > put the four scripts del_ts.sh, run_del_1.sh, run_del_2.sh and run_iotdb_4563.sh under ${bm_dir} > the launch script is run_iotdb_4563.sh > After the run finishes, check the datanode log on ip18. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4805) [Performance] Compare performance of 1C1D with "start-server.sh” and ”start-new-server.sh”
[ https://issues.apache.org/jira/browse/IOTDB-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645736#comment-17645736 ] Jinrui Zhang commented on IOTDB-4805: - This issue is intended to compare the performance between 1C1D and the new standalone server. Let's confirm whether the test is still necessary > [Performance] Compare performance of 1C1D with "start-server.sh” and > ”start-new-server.sh” > -- > > Key: IOTDB-4805 > URL: https://issues.apache.org/jira/browse/IOTDB-4805 > Project: Apache IoTDB > Issue Type: Improvement >Affects Versions: 0.14.0-SNAPSHOT >Reporter: FengQingxin >Assignee: Jinrui Zhang >Priority: Major > Attachments: common, image-2022-10-31-12-08-22-497.png > > > commit_id:76b947f > Reproduce Steps: > 1.Git pull the latest master code, then build it with the command "mvn clean > package -pl distribution -am -DskipTests" > 2.Modify the config as below: > MAX_HEAP_SIZE="20G" > enable_partition=false > enable_seq_space_compaction=false > enable_unseq_space_compaction=false > enable_cross_space_compaction=false > enableMetric: true > 3.Start 1C1D > 4.Insert data with bm which using iotdb-0.13-0.0.1.jar > 5.After the test of 1C1D finished,start old server with start-server.sh > 6.Insert data with bm which using iotdb-0.13-0.0.1.jar > Result: > 1c1d/new-server=83.64% > 1c1d/old-server=72.98% > !image-2022-10-31-12-08-22-497.png|width=712,height=367! > Attachment: > benchmark config:common > B.R > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5103) LastFlushTime may be set incorrectly when DataRegion recovering
Jinrui Zhang created IOTDB-5103: --- Summary: LastFlushTime may be set incorrectly when DataRegion recovering Key: IOTDB-5103 URL: https://issues.apache.org/jira/browse/IOTDB-5103 Project: Apache IoTDB Issue Type: Bug Reporter: Jinrui Zhang Assignee: Jinrui Zhang During DataRegion recovery, the unsealed file may be read before the sealed TsFiles have been fully processed, which leads to an incorrect lastFlushTime for the current DataRegion. -- This message was sent by Atlassian Jira (v8.20.10#820010)
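One way to make the recovered value insensitive to the order in which sealed and unsealed files are read is to merge each device's end time with the maximum seen so far. A hedged sketch (illustrative names, not IoTDB's actual recovery code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: recover last flush times by merging with max, so reading the
// unsealed file before the sealed TsFiles cannot leave a stale value.
public class LastFlushTimeRecovery {
    final Map<String, Long> lastFlushTime = new HashMap<>();

    void applyEndTime(String device, long endTime) {
        // merge keeps the larger of the existing and incoming end times
        lastFlushTime.merge(device, endTime, Math::max);
    }

    public static void main(String[] args) {
        LastFlushTimeRecovery r = new LastFlushTimeRecovery();
        r.applyEndTime("device1", 5); // unsealed file happens to be read first
        r.applyEndTime("device1", 9); // sealed TsFile applied afterwards
        System.out.println(r.lastFlushTime.get("device1")); // 9
    }
}
```

Whether the actual fix reorders recovery or merges this way is not stated in the issue; the sketch only shows the order-independence property the recovery needs.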
[jira] [Assigned] (IOTDB-5035) After the datanode is removed successfully, the snapshot can be deleted
[ https://issues.apache.org/jira/browse/IOTDB-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5035: --- Assignee: Haiming Zhu (was: Jinrui Zhang) > After the datanode is removed successfully, the snapshot can be deleted > --- > > Key: IOTDB-5035 > URL: https://issues.apache.org/jira/browse/IOTDB-5035 > Project: Apache IoTDB > Issue Type: Improvement > Components: mpp-cluster >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Minor > Attachments: image-2022-11-24-14-38-09-579.png > > > Test version: 1124_cd839a4 > Path on the machine: /data/liuzhen_test/master_1123_32e2f98 (lib from 1124_cd839a4) > Problem description: > Stopping a datanode normally triggers a snapshot. > After starting this node again and scaling in a datanode (ip76/ip62), the snapshot is not deleted once the scale-in succeeds: > !image-2022-11-24-14-38-09-579.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5045) [delete] After running "drop database root.**", wal and tsfile still left
[ https://issues.apache.org/jira/browse/IOTDB-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17639889#comment-17639889 ] Jinrui Zhang commented on IOTDB-5045: - According to the current investigation with [~Marcoss], we found that write operations may still happen even after the DataRegion has been marked as deleted. A `write operation` may be ongoing while the DataRegion is being deleted, and once the DataRegion's deletion is done, the ongoing write may trigger the DataRegion's write path again, so some WAL/TsFile data is generated. There is no lock/concurrency control between `write` and `DataRegion delete`, so this case may be triggered in many scenarios. If the delete is submitted immediately after an insertion, the SyncLog of IoTConsensus can trigger this case easily when the total number of insert operations is less than 5. What we should do next: * Add concurrency control between the DataRegion's write and delete. Ensure write operations are discarded/rejected once the DataRegion has been marked as deleted > [delete] After running "drop database root.**", wal and tsfile still left > --- > > Key: IOTDB-5045 > URL: https://issues.apache.org/jira/browse/IOTDB-5045 > Project: Apache IoTDB > Issue Type: Bug >Affects Versions: 0.14.0-SNAPSHOT >Reporter: changxue >Assignee: Yukun Zhou >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Attachments: allnodes-log.tar.gz, udf-privilege.run > > > [delete] After running "drop database root.**", wal and tsfiles are left rather > than cleaned up > 3C3D cluster, Nov.25 14:00 source codes > reproduction: > execute the statements of attachment udf-privilege.run in the start-cli.sh > window several times > actual result: > They are all empty: > show databases; > show timeseries root.**; > show regions; > I've run flush in the command window but it didn't help. > But tsfile and wal files are left and won't be removed. 
> find $IOTDB_HOME/data/datanode/data -type f | xargs ls -hl > {code} > -rw-r--r-- 1 atmos root5 Nov 25 14:30 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/.iotdb-lock > -rw-r--r-- 1 atmos root0 Nov 25 14:43 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.sg1/10/0/1669358591044-1-0-0.tsfile > -rw-r--r-- 1 atmos root0 Nov 25 14:59 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.sg1/19/0/1669359549944-1-0-0.tsfile > -rw-r--r-- 1 atmos root0 Nov 25 14:59 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.sg1/25/0/1669359576659-1-0-0.tsfile > -rw-r--r-- 1 atmos root0 Nov 25 15:13 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.sg1/32/0/1669360384185-1-0-0.tsfile > -rw-r--r-- 1 atmos root0 Nov 25 15:34 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.sg1/38/0/1669361656927-1-0-0.tsfile > -rw-r--r-- 1 atmos root0 Nov 25 15:41 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.sg1/50/0/1669362077038-1-0-0.tsfile > -rw-r--r-- 1 atmos root0 Nov 25 14:34 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.sg1/6/0/1669358088441-1-0-0.tsfile > {code} > find $IOTDB_HOME/data/datanode/data -type f | xargs ls -hl > {code} > -rw-r--r-- 1 atmos root 54 Nov 25 14:30 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/system/users/root.profile > -rw-r--r-- 1 atmos root 136 Nov 25 14:43 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-10/_0-0-1.wal > -rw-r--r-- 1 atmos root 155 Nov 25 14:43 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-10/_0.checkpoint > -rw-r--r-- 1 atmos root 68 Nov 25 14:59 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-19/_0-0-1.wal > -rw-r--r-- 1 atmos root 155 Nov 25 14:59 > 
/data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-19/_0.checkpoint > -rw-r--r-- 1 atmos root 68 Nov 25 14:59 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-25/_0-0-1.wal > -rw-r--r-- 1 atmos root 155 Nov 25 14:59 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-25/_0.checkpoint > -rw-r--r-- 1 atmos root 68 Nov 25 15:13 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-32/_0-0-1.wal > -rw-r--r-- 1 atmos root 155 Nov 25 15:13 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-32/_0.checkpoint > -rw-r--r-- 1 atmos root 68 Nov 25 15:34 > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/wal/root.sg1-38/_0-0-1.wal
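The proposed fix above — concurrency control between DataRegion write and delete — can be sketched with a read-write lock plus a deleted flag. This is a minimal illustration only; the names (`DataRegionSketch`, `insert`, `markDeleted`) are hypothetical and are not IoTDB's actual DataRegion API:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the proposed concurrency control between DataRegion write and delete.
// All names here are illustrative, not IoTDB's real classes or methods.
class DataRegionSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean deleted = false;

    // Writers take the read lock, so concurrent writes still proceed in parallel.
    boolean insert(String record) {
        lock.readLock().lock();
        try {
            if (deleted) {
                return false; // rejected: no wal/tsfile is generated after deletion
            }
            // ... append to WAL / memtable here ...
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Deletion takes the write lock: it waits for all in-flight writes to finish,
    // then marks the region so later writes are rejected.
    void markDeleted() {
        lock.writeLock().lock();
        try {
            deleted = true;
            // ... remove wal/tsfile directories here ...
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With this ordering, a write racing with `drop database root.**` either completes before the directories are removed or is rejected afterwards, so no orphan wal/tsfile should remain.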
[jira] [Assigned] (IOTDB-5030) java.lang.IllegalArgumentException: all replicas for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these DataNodes
[ https://issues.apache.org/jira/browse/IOTDB-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5030: --- Assignee: Jinrui Zhang (was: Yukun Zhou) > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these > DataNodes > - > > Key: IOTDB-5030 > URL: https://issues.apache.org/jira/browse/IOTDB-5030 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Minor > Attachments: iotdb_4851.conf > > > master_1123_32e2f98 > 1. Start a 1-replica 3C5D cluster > 2. BM writes data; after 50 minutes, ip68 reports an error > {color:#DE350B}2022-11-23 15:32:46,820 > [pool-24-IoTDB-DataNodeInternalRPC-Processor-122] ERROR > o.a.t.ProcessFunction:47 - Internal error processing sendPlanNode > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:1)] are not available in these > DataNodes[[TDataNodeLocation(dataNodeId:4, > clientRpcEndPoint:TEndPoint(ip:192.168.10.66, port:6667), > internalEndPoint:TEndPoint(ip:192.168.10.66, port:9003), > mPPDataExchangeEndPoint:TEndPoint(ip:192.168.10.66, port:8777), > dataRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, port:40010), > schemaRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, > port:50010))]]{color} > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.selectTargetDataNode(SimpleFragmentParallelPlanner.java:146) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.produceFragmentInstance(SimpleFragmentParallelPlanner.java:115) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.prepare(SimpleFragmentParallelPlanner.java:87) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.parallelPlan(SimpleFragmentParallelPlanner.java:78) > at > 
org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragmentInstances(DistributionPlanner.java:94) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragments(DistributionPlanner.java:78) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.doDistributedPlan(QueryExecution.java:304) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:201) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.retry(QueryExecution.java:235) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.getStatus(QueryExecution.java:500) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:152) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.executeSchemaFetchQuery(ClusterSchemaFetcher.java:178) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:156) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:98) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchemaWithAutoCreate(ClusterSchemaFetcher.java:265) > at > org.apache.iotdb.db.mpp.plan.analyze.SchemaValidator.validate(SchemaValidator.java:56) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:193) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:165) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:119) > at > org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1086) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:85) > at > 
org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.sendPlanNode(DataNodeInternalRPCServiceImpl.java:283) > at > org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3607) > at > org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3587) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) > at > org.apache.thrift.server.TThreadPoolServer$WorkerPr
[jira] [Assigned] (IOTDB-5030) java.lang.IllegalArgumentException: all replicas for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these DataNodes
[ https://issues.apache.org/jira/browse/IOTDB-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-5030: --- Assignee: Yukun Zhou (was: Jinrui Zhang) > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these > DataNodes > - > > Key: IOTDB-5030 > URL: https://issues.apache.org/jira/browse/IOTDB-5030 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Yukun Zhou >Priority: Minor > Attachments: iotdb_4851.conf
[jira] [Commented] (IOTDB-5030) java.lang.IllegalArgumentException: all replicas for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these DataNodes
[ https://issues.apache.org/jira/browse/IOTDB-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638667#comment-17638667 ] Jinrui Zhang commented on IOTDB-5030: - [~HeimingZ] tried to reproduce this issue on fit16-20 with a 3C5D cluster and it didn't occur. We investigated the logs from when the issue occurred and found that it should be a timeout issue. At that time, fit68 was trying to dispatch a schema-read FI to fit66, but the response was not returned within the timeout. It actually executed successfully on fit66, because we didn't see any error log there; on the other hand, we found a `Read timeout` log on fit68 at that time. This indicates that the schema-read operation is not as fast as expected under this load, so it didn't return the response within the tolerated interval. According to the benchmark settings, there are 30 million series in the schema, which is huge. There are two ways to resolve this issue currently: # optimize the schema-read execution to avoid the timeout # let users increase `connection_timeout_ms` in the configuration to accommodate the huge load. But the optimization definitely cannot be completed in a very short time; given the release stage of 1.0, I will decrease the priority of this issue. The optimization needs [~Marcoss] to take a look > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these > DataNodes > - > > Key: IOTDB-5030 > URL: https://issues.apache.org/jira/browse/IOTDB-5030 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Minor > Attachments: iotdb_4851.conf
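For the second workaround, the timeout can be raised in the common configuration. This is a sketch only: `connection_timeout_ms` appears under the "Common" settings quoted in related reports in this thread, but the exact configuration file name and a suitable value depend on the version and deployment, so treat both as assumptions:

```properties
# Common configuration (e.g. iotdb-common.properties; the file name varies by version).
# Raise the internal RPC timeout so a slow schema-read FI under a heavy schema load
# is not reported as "all replicas ... are not available".
connection_timeout_ms=60000
```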
[jira] [Commented] (IOTDB-5030) java.lang.IllegalArgumentException: all replicas for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these DataNodes
[ https://issues.apache.org/jira/browse/IOTDB-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638525#comment-17638525 ] Jinrui Zhang commented on IOTDB-5030: - Please try to reproduce this issue with the latest code > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these > DataNodes > - > > Key: IOTDB-5030 > URL: https://issues.apache.org/jira/browse/IOTDB-5030 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Major > Attachments: iotdb_4851.conf
[jira] [Commented] (IOTDB-4702) [Remove-DataNode] snapshot is not deleted after delete storage group
[ https://issues.apache.org/jira/browse/IOTDB-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638474#comment-17638474 ] Jinrui Zhang commented on IOTDB-4702: - Please describe the issue in detail. Currently we do have some snapshots which cannot be removed in some situations. Please indicate the detailed scenario. [~刘珍] > [Remove-DataNode] snapshot is not deleted after delete storage group > > > Key: IOTDB-4702 > URL: https://issues.apache.org/jira/browse/IOTDB-4702 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: suchenglong >Priority: Minor > Fix For: 0.14.0 > > Attachments: image-2022-11-24-14-54-28-187.png, > image-2022-11-24-14-54-33-334.png > > > m_1019_f2ffb49 > 3rep,3C3D > schemaregion : ratis > dataregion : multiLeader > execute stop-datanode.sh , the data_region takes a snapshot . > {color:#DE350B}delete storage group , the snapshot is not deleted.{color} > How to reproduce : > IOTDB-4700 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4507) [SystemResourceIssue] Insert failed can't connect to node TEndPoint
[ https://issues.apache.org/jira/browse/IOTDB-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638467#comment-17638467 ] Jinrui Zhang commented on IOTDB-4507: - We made more optimizations and bug fixes to MultiLeaderConsensus, which should enhance the stability of writes using MultiLeader. See this PR [https://github.com/apache/iotdb/pull/8025]. The error "can't connect to node" is usually caused by errors/pressure on the server side. This fix decreased the pressure on the server side and optimized the memory usage of the DataNode, which should help with this issue. Let's test it again to see whether this issue can still be reproduced > [SystemResourceIssue] Insert failed can't connect to node TEndPoint > > > Key: IOTDB-4507 > URL: https://issues.apache.org/jira/browse/IOTDB-4507 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: FengQingxin >Assignee: Jinrui Zhang >Priority: Minor > Attachments: config.properties, confignode-env.sh, datanode-env.sh, > image-2022-09-23-08-44-57-017.png, iotdb-confignode.properties, > iotdb-datanode.properties, log.tar.gz > > > Reproduce steps: > # Setup a cluster with 3C3D({color:#de350b}MultiLeaderConsensus{color}) > # Using 3BMs to insert data(Loop=2000) > # Setup a cluster with 3C3D({color:#de350b}MultiLeaderConsensus{color}) > # Using 3BMs to insert data(Loop=4000 or Loop=6000) > BM -> IoTDB Node > 172.20.70.7 -> 172.20.70.22 > 172.20.70.8 -> 172.20.70.23 > 172.20.70.9 -> 172.20.70.24 > !image-2022-09-23-08-44-57-017.png|width=510,height=409! > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IOTDB-5036) Take snapshot in parallel when IoTDB shutdown
Jinrui Zhang created IOTDB-5036: --- Summary: Take snapshot in parallel when IoTDB shutdown Key: IOTDB-5036 URL: https://issues.apache.org/jira/browse/IOTDB-5036 Project: Apache IoTDB Issue Type: Improvement Reporter: Jinrui Zhang Assignee: Jinrui Zhang Attachments: image-2022-11-24-16-24-08-844.png Currently, IoTDB takes a snapshot for each DataRegion at shutdown, and the snapshots are taken one by one across the DataRegions. Let's try to take the snapshots in parallel !image-2022-11-24-16-24-08-844.png|width=579,height=527! -- This message was sent by Atlassian Jira (v8.20.10#820010)
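The improvement can be sketched by submitting each DataRegion's snapshot task to a shared pool and blocking until all finish before shutdown proceeds. The names here (`ParallelSnapshot`, `snapshotAll`) are hypothetical, not IoTDB's actual shutdown code:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: take DataRegion snapshots in parallel instead of one by one.
class ParallelSnapshot {
    // Each Runnable stands in for one DataRegion's snapshot routine.
    static void snapshotAll(List<Runnable> snapshotTasks, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Runnable task : snapshotTasks) {
            pool.submit(task); // regions snapshot concurrently, bounded by the pool size
        }
        pool.shutdown();
        // Shutdown must still block until every snapshot has finished.
        if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
            throw new IllegalStateException("snapshotting did not finish in time");
        }
    }
}
```

Total shutdown time then approaches the slowest single region's snapshot rather than the sum over all regions, while the pool size bounds the peak I/O pressure.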
[jira] [Commented] (IOTDB-4971) dispatch write failed. status: TSStatus(code:506, subStatus:[]), message: null
[ https://issues.apache.org/jira/browse/IOTDB-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638102#comment-17638102 ] Jinrui Zhang commented on IOTDB-4971: - Consider printing the explicit content of the status code > dispatch write failed. status: TSStatus(code:506, subStatus:[]), message: null > -- > > Key: IOTDB-4971 > URL: https://issues.apache.org/jira/browse/IOTDB-4971 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Minor > Attachments: del_ts.sh, down_delete_ts.conf, run_del_1.sh, > run_del_2.sh, run_iotdb_4563.sh > > > master_1117_d548214 > 1. start a 3-replica 3C 9D cluster > 2. delete timeseries root.** and create metadata, with concurrent data writes > The datanode (IP18) has an ERROR: > 2022-11-17 14:52:38,172 > [pool-24-IoTDB-DataNodeInternalRPC-Processor-17$20221117_065237_15126_11.1.0] > {color:red}*ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:173 - dispatch > write failed. status: TSStatus(code:506, subStatus:[]), message: null*{color} > Reproduction steps > 1. Start the 3C9D cluster > 3C : 172.20.70.19/172.20.70.21/172.20.70.32 > 9D : 172.20.70.2/3/4/5/13/14/15/16/18 > Configuration parameters > ConfigNode: > MAX_HEAP_SIZE="8G" > MAX_DIRECT_MEMORY_SIZE="6G" > cn_connection_timeout_ms=360 > Common : > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_factor=3 > connection_timeout_ms=360 > max_connection_for_internal_service=200 > query_timeout_threshold=360 > schema_region_ratis_request_timeout_ms=180 > Datanode: > MAX_HEAP_SIZE="20G" > MAX_DIRECT_MEMORY_SIZE="6G" > 2. Start the test scripts > Put down_delete_ts.conf under ${bm_dir}/conf > Put the 4 scripts del_ts.sh, run_del_1.sh, run_del_2.sh and run_iotdb_4563.sh under ${bm_dir} > The launch script is run_iotdb_4563.sh > After the run completes, check the ip18 datanode log. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4986) Too many IoTDB-DataNodeInternalRPC-Processor threads are open
[ https://issues.apache.org/jira/browse/IOTDB-4986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638099#comment-17638099 ] Jinrui Zhang commented on IOTDB-4986: - It is not a functionality issue. Let's mark it as an enhancement because we need to process other bugs with higher priority. > Too many IoTDB-DataNodeInternalRPC-Processor threads are open > - > > Key: IOTDB-4986 > URL: https://issues.apache.org/jira/browse/IOTDB-4986 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Critical > > m_1118_3d5eeae > 1. Start a 3-replica 3C21D cluster > 2. Start 7 Benchmarks sequentially > 3. One node's datanode opens a very large number of IoTDB-DataNodeInternalRPC-Processor threads, 2k+ > (the count slowly drops back down), but OOM occasionally occurs > 2022-11-18 14:26:48,320 > [pool-22-IoTDB-DataNodeInternalRPC-Processor-374$20221118_062422_29227_16.1.0] > ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally > failed. TSStatus: TSStatus(code:506, subStatus:[]), message: null > 2022-11-18 14:29:44,568 [DataNodeInternalRPC-Service]{color:red}* ERROR > o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > DataNodeInternalRPC-Service-40 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached*{color} > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:29:53,751 [ClientRPC-Service] ERROR > 
o.a.i.c.c.IoTDBDefaultThreadExceptionHandler:31 - Exception in thread > ClientRPC-Service-42 > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:155) > at > org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:139) > at > org.apache.iotdb.commons.service.AbstractThriftServiceThread.run(AbstractThriftServiceThread.java:258) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.d.e.s.TsFileProcessor:1095 - root.test.g_0-6: > /data/iotdb/m_1118_3d5eeae/sbin/../data/datanode/data/unsequence/root.test.g_0/6/2538/1668752675355-5-0-0.tsfile > meet error when flushing a memtable, change system mode to error > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:803) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354) > at > java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118) > at > org.apache.iotdb.db.rescon.AbstractPoolManager.submit(AbstractPoolManager.java:56) > at > org.apache.iotdb.db.engine.flush.MemTableFlushTask.(MemTableFlushTask.java:88) > at > org.apache.iotdb.db.engine.storagegroup.TsFileProcessor.flushOneMemTable(TsFileProcessor.java:1082) > at > org.apache.iotdb.db.engine.flush.FlushManager$FlushThread.runMayThrow(FlushManager.java:108) > at 
> org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2022-11-18 14:30:11,736 [pool-6-IoTDB-Flush-4] ERROR > o.a.i.c.e.H
[jira] [Commented] (IOTDB-5015) [write]when writing for about 6 hours to only 1 sensor, the writing stopped with error: too many requests need to process
[ https://issues.apache.org/jira/browse/IOTDB-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638098#comment-17638098 ] Jinrui Zhang commented on IOTDB-5015: - This issue may be caused by the same root cause as https://issues.apache.org/jira/browse/IOTDB-5019. > [write]when writing for about 6 hours to only 1 sensor, the writing stopped > with error: too many requests need to process > - > > Key: IOTDB-5015 > URL: https://issues.apache.org/jira/browse/IOTDB-5015 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Server >Affects Versions: 0.14.0-SNAPSHOT >Reporter: changxue >Assignee: Jinrui Zhang >Priority: Major > Attachments: allnodes-logs.tar.gz, config.properties, screenshot-1.png > > > [write]when writing for about 6 hours to only 1 sensor, the writing stopped > with error: too many requests need to process > environment: > 3C3D cluster, 2 replicas > |RegionId|Type|Status|Database|SeriesSlotId|TimeSlotId|DataNodeId|Host|RpcPort|Role| > |10|SchemaRegion|Running|root.aggr.g_0|1|0|1|172.20.70.44|6667|Follower| > |10|SchemaRegion|Running|root.aggr.g_0|1|0|5|172.20.70.46|6667|Leader| > |11|DataRegion|Running|root.aggr.g_0|1|10|1|172.20.70.44|6667|Follower| > |11|DataRegion|Running|root.aggr.g_0|1|10|5|172.20.70.46|6667|Leader| > reproduction: > 1. start the cluster successfully > 2. start the 0.13 benchmark at about 20:23 Nov.21; for the benchmark > configuration see the attachment config.properties > 3. errors occurred at about 3:20 Nov.22 and writing couldn't be continued. > 4. at 6:30 Nov.22, I started the benchmark again, but couldn't write to iotdb successfully > 5. run stop-datanode.sh and start-datanode.sh on the bad node 46 > 6. 
start benchmark again, now it can write successfully > 172.20.70.46 datanode: > {code:sh} > 2022-11-22 03:20:23,586 [pool-8-IoTDB-WAL-Delete-1] INFO > o.a.i.d.w.n.WALNode$DeleteOutdatedFileTask:367 - WAL node-root.aggr.g_0-11 > flushes memTable-4510 to TsFile > /data/iotdb/apache-iotdb-0.14.0-SNAPSHOT-all-bin/data/datanode/data/sequence/root.aggr.g_0/11/52/1669036247165-4504-0-0.tsfile, > memTable size is 1531600. > 2022-11-22 03:20:42,915 > [pool-25-IoTDB-ClientRPC-Processor-2$20221121_192013_20413_5.1.0] ERROR > o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally failed. > TSStatus: TSStatus(code:606, message:Reject write because there are too many > requests need to process), message: Reject write because there are too many > requests need to process > 2022-11-22 03:20:42,978 > [pool-25-IoTDB-ClientRPC-Processor-2$20221121_192043_20414_5.1.0] INFO > o.a.i.c.m.MultiLeaderServerImpl:178 - [Throttle Down] index:380448, > safeIndex:380448 > 2022-11-22 03:20:43,594 [pool-8-IoTDB-WAL-Delete-1] INFO > o.a.i.d.w.n.WALNode$DeleteOutdatedFileTask:242 - Effective information ratio > 1.8484028968067935E-4 (active memTables cost is 13563200, flushed memTables > cost is 73364378500) of wal node-root.aggr.g_0-11 is below wal min effective > info ratio 0.1, some memTables will be snapshot or flushed. > {code} > 3:20 monitor: > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-5030) java.lang.IllegalArgumentException: all replicas for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these DataNodes
[ https://issues.apache.org/jira/browse/IOTDB-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638092#comment-17638092 ] Jinrui Zhang commented on IOTDB-5030: - We need to confirm whether the cluster is OK or not > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these > DataNodes > - > > Key: IOTDB-5030 > URL: https://issues.apache.org/jira/browse/IOTDB-5030 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Major > Attachments: iotdb_4851.conf > > > master_1123_32e2f98 > 1. Start a 1-replica 3C5D cluster > 2. Write data with BM; after 50 minutes, ip68 reports an error > {color:#DE350B}2022-11-23 15:32:46,820 > [pool-24-IoTDB-DataNodeInternalRPC-Processor-122] ERROR > o.a.t.ProcessFunction:47 - Internal error processing sendPlanNode > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:1)] are not available in these > DataNodes[[TDataNodeLocation(dataNodeId:4, > clientRpcEndPoint:TEndPoint(ip:192.168.10.66, port:6667), > internalEndPoint:TEndPoint(ip:192.168.10.66, port:9003), > mPPDataExchangeEndPoint:TEndPoint(ip:192.168.10.66, port:8777), > dataRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, port:40010), > schemaRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, > port:50010))]]{color} > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.selectTargetDataNode(SimpleFragmentParallelPlanner.java:146) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.produceFragmentInstance(SimpleFragmentParallelPlanner.java:115) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.prepare(SimpleFragmentParallelPlanner.java:87) > at > 
org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.parallelPlan(SimpleFragmentParallelPlanner.java:78) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragmentInstances(DistributionPlanner.java:94) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragments(DistributionPlanner.java:78) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.doDistributedPlan(QueryExecution.java:304) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:201) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.retry(QueryExecution.java:235) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.getStatus(QueryExecution.java:500) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:152) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.executeSchemaFetchQuery(ClusterSchemaFetcher.java:178) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:156) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:98) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchemaWithAutoCreate(ClusterSchemaFetcher.java:265) > at > org.apache.iotdb.db.mpp.plan.analyze.SchemaValidator.validate(SchemaValidator.java:56) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:193) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:165) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:119) > at > org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1086) > at > 
org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:85) > at > org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.sendPlanNode(DataNodeInternalRPCServiceImpl.java:283) > at > org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3607) > at > org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3587) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) >
[jira] [Commented] (IOTDB-5030) java.lang.IllegalArgumentException: all replicas for region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these DataNodes
[ https://issues.apache.org/jira/browse/IOTDB-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638091#comment-17638091 ] Jinrui Zhang commented on IOTDB-5030: - The issue is caused by a schema-fetching failure during writing. I have two questions here: # Why is the SchemaRegion's replica only distributed on 66? It should have 3 replicas, but only 1 is returned from the ConfigNode's partition info. # It seems that 66 cannot process the FI. We need to investigate the error log from 66 > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these > DataNodes > - > > Key: IOTDB-5030 > URL: https://issues.apache.org/jira/browse/IOTDB-5030 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Major > Attachments: iotdb_4851.conf > > > master_1123_32e2f98 > 1. Start a 1-replica 3C5D cluster > 2. Write data with BM; after 50 minutes, ip68 reports an error > {color:#DE350B}2022-11-23 15:32:46,820 > [pool-24-IoTDB-DataNodeInternalRPC-Processor-122] ERROR > o.a.t.ProcessFunction:47 - Internal error processing sendPlanNode > java.lang.IllegalArgumentException: all replicas for > region[TConsensusGroupId(type:SchemaRegion, id:1)] are not available in these > DataNodes[[TDataNodeLocation(dataNodeId:4, > clientRpcEndPoint:TEndPoint(ip:192.168.10.66, port:6667), > internalEndPoint:TEndPoint(ip:192.168.10.66, port:9003), > mPPDataExchangeEndPoint:TEndPoint(ip:192.168.10.66, port:8777), > dataRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, port:40010), > schemaRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, > port:50010))]]{color} > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.selectTargetDataNode(SimpleFragmentParallelPlanner.java:146) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.produceFragmentInstance(SimpleFragmentParallelPlanner.java:115) > 
at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.prepare(SimpleFragmentParallelPlanner.java:87) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.parallelPlan(SimpleFragmentParallelPlanner.java:78) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragmentInstances(DistributionPlanner.java:94) > at > org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragments(DistributionPlanner.java:78) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.doDistributedPlan(QueryExecution.java:304) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:201) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.retry(QueryExecution.java:235) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.getStatus(QueryExecution.java:500) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:152) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.executeSchemaFetchQuery(ClusterSchemaFetcher.java:178) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:156) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:98) > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchemaWithAutoCreate(ClusterSchemaFetcher.java:265) > at > org.apache.iotdb.db.mpp.plan.analyze.SchemaValidator.validate(SchemaValidator.java:56) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:193) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:165) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:119) > at > 
org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1086) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:85) > at > org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.sendPlanNode(DataNodeInternalRPCServiceImpl.java:283) > at > org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3607) > at > org.apache.i
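The IllegalArgumentException in the trace above is thrown when none of a region's replica DataNodes is usable while planning a fragment instance. A simplified sketch of that selection step; all names here are hypothetical, the real logic lives in SimpleFragmentParallelPlanner.selectTargetDataNode:

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of choosing a target DataNode for a region's replica set. */
public class ReplicaSelector {
  /**
   * Picks the first replica whose DataNode id is in the alive set, or throws
   * (as selectTargetDataNode does) when all listed replicas are unavailable.
   */
  public static int selectTarget(String regionId, List<Integer> replicaNodeIds, Set<Integer> aliveNodeIds) {
    for (int nodeId : replicaNodeIds) {
      if (aliveNodeIds.contains(nodeId)) {
        return nodeId;
      }
    }
    throw new IllegalArgumentException(
        "all replicas for region[" + regionId + "] are not available in these DataNodes" + replicaNodeIds);
  }

  public static void main(String[] args) {
    // With all 3 replicas in the partition info, one dead node is tolerated...
    System.out.println(selectTarget("SchemaRegion-1", List.of(4, 5, 6), Set.of(5, 6)));
    // ...but if the ConfigNode returns only [4] (the bug suspected in the
    // comment above) and node 4 is down, the exception in the trace is thrown.
  }
}
```

This illustrates why question 1 in the comment matters: with only one replica returned from the partition info, a single unreachable DataNode makes the whole region unplannable.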
[jira] [Commented] (IOTDB-4972) [DispatchFailed] NPE at org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertNode.selfCheckDataTypes(InsertNode.java:251)
[ https://issues.apache.org/jira/browse/IOTDB-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17637554#comment-17637554 ] Jinrui Zhang commented on IOTDB-4972: - SENSOR_NUM is 10, which is too large. Let's investigate why the NPE is thrown with a large number of series. > [DispatchFailed] NPE at > org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertNode.selfCheckDataTypes(InsertNode.java:251) > --- > > Key: IOTDB-4972 > URL: https://issues.apache.org/jira/browse/IOTDB-4972 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Major > Attachments: more_ts.conf > > > master_1117_92c6a57 > 1. Start a 3-replica 3C3D cluster > 2. Create schema and write data with benchmark > 3. (ip62) datanode ERROR; all data writes failed (expected 10 points per series, 50 million series in total): > {color:red}*2022-11-17 16:32:59,456 > [pool-26-IoTDB-ClientRPC-Processor-35$20221117_083256_00512_3] ERROR > o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:123 - [DispatchFailed] > java.lang.NullPointerException: null > at > org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertNode.selfCheckDataTypes(InsertNode.java:251)*{color} > at > org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.validateAndSetSchema(InsertTabletNode.java:201) > at > org.apache.iotdb.db.mpp.plan.analyze.SchemaValidator.validate(SchemaValidator.java:64) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:191) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:163) > at > org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:117) > at > org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1086) > at > 
org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:83) > at > org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchLocally(FragmentInstanceDispatcherImpl.java:232) > at > org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchOneInstance(FragmentInstanceDispatcherImpl.java:137) > at > org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchWriteSync(FragmentInstanceDispatcherImpl.java:119) > at > org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatch(FragmentInstanceDispatcherImpl.java:90) > at > org.apache.iotdb.db.mpp.plan.scheduler.ClusterScheduler.start(ClusterScheduler.java:106) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.schedule(QueryExecution.java:287) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:205) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:150) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:164) > at > org.apache.iotdb.db.service.thrift.impl.ClientRPCServiceImpl.insertTablet(ClientRPCServiceImpl.java:1234) > at > org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:4078) > at > org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:4058) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at > org.apache.iotdb.db.service.thrift.ProcessorWithMetrics.process(ProcessorWithMetrics.java:64) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Reproduction procedure > 1. 
192.168.10.62/66/68 72C256GB 3C3D > ConfigNode config file: > MAX_HEAP_SIZE="8G" > cn_connection_timeout_ms=360 > DataNode config file: > MAX_HEAP_SIZE="192G" > MAX_DIRECT_MEMORY_SIZE="32G" > Common config file: > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_factor=3 > connection_timeout_ms=360 > max_connection_for_internal_service=1100 > enable_timed_flush_seq_memtable=true > seq_memtable_flush_interval_i
[jira] [Commented] (IOTDB-4400) [new stand-alone]enableMetric, the write performance does not meet expectations
[ https://issues.apache.org/jira/browse/IOTDB-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635622#comment-17635622 ] Jinrui Zhang commented on IOTDB-4400: - Let's test this issue again with the latest MultiLeader code > [new stand-alone]enableMetric, the write performance does not meet > expectations > --- > > Key: IOTDB-4400 > URL: https://issues.apache.org/jira/browse/IOTDB-4400 > Project: Apache IoTDB > Issue Type: Bug > Components: Others >Affects Versions: master branch, 0.14.0, 0.14.0-SNAPSHOT >Reporter: xiaozhihong >Assignee: 张洪胤 >Priority: Major > Attachments: config.properties, image-2022-09-14-10-27-26-819.png > > > commit 74fb350809b2f1488a90d6d7c420f27ec14b24e5 > With monitoring turned on, write performance tests were run against the two > frameworks at different metric levels. The result is confusing: neither > different levels nor different frameworks show a noticeable difference in > write performance. With monitoring turned off, writing likewise shows no > obvious difference, so a root-cause investigation needs to be done. > Details: > https://apache-iotdb.feishu.cn/docx/QUQSdbRaaoWDjQxcEz9cQjFdnsz?from=create_suite > !image-2022-09-14-10-27-26-819.png|width=528,height=275! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4505) [SystemResourceIssue] Why is the 60 client test better than the 300 client test
[ https://issues.apache.org/jira/browse/IOTDB-4505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635620#comment-17635620 ] Jinrui Zhang commented on IOTDB-4505: - It should be related to memory usage. That is, 300 clients consume more memory, so the available memory for MultiLeader is less than with 60 clients. We have an optimization for memory control; see this PR https://github.com/apache/iotdb/pull/8025 > [SystemResourceIssue] Why is the 60 client test better than the 300 client > test > --- > > Key: IOTDB-4505 > URL: https://issues.apache.org/jira/browse/IOTDB-4505 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: FengQingxin >Assignee: Jinrui Zhang >Priority: Major > Attachments: config.properties, confignode-env.sh, datanode-env.sh, > image-2022-09-23-08-33-48-642.png, image-2022-09-26-08-10-41-274.png, > iotdb-confignode.properties, iotdb-datanode.properties > > > Reducing the number of clients reduces the number of threads and the number > of open files, and there is no write failure. The data file size difference > between the three nodes disappears > Reproduce steps: > # Setup a cluster with 3C3D({color:#de350b}MultiLeaderConsensus{color}) > # Using 3BMs to insert data(client=100*3) > # Setup a cluster with 3C3D({color:#de350b}MultiLeaderConsensus{color}) > # Using 3BMs to insert data(client=20*3) > BM -> IoTDB Node > 172.20.70.7 -> 172.20.70.22 > 172.20.70.8 -> 172.20.70.23 > 172.20.70.9 -> 172.20.70.24 > > !image-2022-09-23-08-33-48-642.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
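The memory contention described in the comment can be shown with back-of-envelope arithmetic: a fixed write-path budget shared by per-client buffers leaves less headroom for the MultiLeader queue as the client count grows. All numbers below are hypothetical illustrations, not IoTDB's actual memory accounting:

```java
/** Back-of-envelope sketch of the memory contention described above (all numbers hypothetical). */
public class MemoryBudget {
  /** Memory left for the consensus (MultiLeader) queue after each client gets a fixed write buffer. */
  public static long consensusBudgetBytes(long totalBytes, int clientCount, long perClientBufferBytes) {
    long usedByClients = (long) clientCount * perClientBufferBytes;
    return Math.max(0, totalBytes - usedByClients); // never negative: beyond this, writes stall or fail
  }

  public static void main(String[] args) {
    long total = 8L << 30;      // assume an 8 GiB write-path budget
    long perClient = 16L << 20; // assume 16 MiB buffered per client connection
    System.out.println("60 clients leave  " + consensusBudgetBytes(total, 60, perClient) + " bytes");
    System.out.println("300 clients leave " + consensusBudgetBytes(total, 300, perClient) + " bytes");
  }
}
```

Under these assumed numbers, 300 clients consume five times the buffer memory of 60 clients, which is the shape of the effect the comment attributes the slowdown to.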
[jira] [Commented] (IOTDB-4506) [cluster]The data amount of the three nodes is quite different
[ https://issues.apache.org/jira/browse/IOTDB-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635617#comment-17635617 ] Jinrui Zhang commented on IOTDB-4506: - We have made further optimizations to the MultiLeader module; let's keep tracking the metrics > [cluster]The data amount of the three nodes is quite different > -- > > Key: IOTDB-4506 > URL: https://issues.apache.org/jira/browse/IOTDB-4506 > Project: Apache IoTDB > Issue Type: Bug > Components: Core/Cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: FengQingxin >Assignee: Jinrui Zhang >Priority: Major > Attachments: config.properties, confignode-env.sh, datanode-env.sh, > image-2022-09-23-08-42-14-584.png, iotdb-confignode.properties, > iotdb-datanode.properties > > > The data sizes of the three nodes are quite different, as are the maximum > open files and the maximum number of threads > Reproduce steps: > # Setup a cluster with 3C3D({color:#de350b}MultiLeaderConsensus{color}) > # Using 3BMs to insert data(client=100*3) > BM -> IoTDB Node > 172.20.70.7 -> 172.20.70.22 > 172.20.70.8 -> 172.20.70.23 > 172.20.70.9 -> 172.20.70.24 > > [http://111.202.73.147:13000/d/Qj_LC2G4z/atm-biao-zhun-da-qi-ya-huan-jing-ji-qun-xie-ru?orgId=1&from=1663288900985&to=1663643813559] > > !image-2022-09-23-08-42-14-584.png|width=681,height=299! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IOTDB-4825) [ multiLeader ] ERROR o.a.i.d.m.p.s.FixedRateFragInsStateTracker:114 - error happened while fetching query state
[ https://issues.apache.org/jira/browse/IOTDB-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635614#comment-17635614 ] Jinrui Zhang commented on IOTDB-4825: - According to the log and the write operation result, this seems to be an occasional issue caused by the network connection. Won't fix > [ multiLeader ] ERROR o.a.i.d.m.p.s.FixedRateFragInsStateTracker:114 - error > happened while fetching query state > > > Key: IOTDB-4825 > URL: https://issues.apache.org/jira/browse/IOTDB-4825 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Jinrui Zhang >Priority: Major > Attachments: iotdb_4825.conf, screenshot-1.png > > > master_1101_bc0e88b > 3rep , 3C3D > schema region : ratis > data region : multiLeader > ip62 datanode ERROR during writing (All nodes are RUNNING) : > 2022-11-01 17:09:25,158 [pool-23-IoTDB-MPPCoordinatorScheduled-1] ERROR > o.a.i.d.m.p.s.FixedRateFragInsStateTracker:114 -{color:#DE350B}* error > happened while fetching query state*{color} > java.io.IOException: Borrow client from pool for node > TEndPoint(ip:192.168.10.66, port:9003) failed. 
> at > org.apache.iotdb.commons.client.ClientManager.borrowClient(ClientManager.java:61) > at > org.apache.iotdb.db.mpp.plan.scheduler.AbstractFragInsStateTracker.fetchState(AbstractFragInsStateTracker.java:82) > at > org.apache.iotdb.db.mpp.plan.scheduler.FixedRateFragInsStateTracker.fetchStateAndUpdate(FixedRateFragInsStateTracker.java:98) > at > org.apache.iotdb.commons.concurrent.threadpool.ScheduledExecutorUtil.lambda$scheduleAtFixedRate$0(ScheduledExecutorUtil.java:153) > at > org.apache.iotdb.commons.concurrent.WrappedRunnable$1.runMayThrow(WrappedRunnable.java:44) > at > org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.InterruptedException: null > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088) > at > org.apache.commons.pool2.impl.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:937) > at > org.apache.commons.pool2.impl.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:956) > at > org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:449) > at > org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:350) > 
at > org.apache.iotdb.commons.client.ClientManager.borrowClient(ClientManager.java:50) > ... 12 common frames omitted > Test procedure: > 1. 192.168.10.62/66/68 72C256GB > ConfigNode > MAX_HEAP_SIZE="8G" > Common > query_timeout_threshold=3600 > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_factor=3 > partition_region_ratis_request_timeout_ms=120 > schema_region_ratis_request_timeout_ms=120 > data_region_ratis_request_timeout_ms=120 > partition_region_ratis_max_retry_attempts=1 > schema_region_ratis_max_retry_attempts=1 > data_region_ratis_max_retry_attempts=1 > DataNode > MAX_HEAP_SIZE="192G" > MAX_DIRECT_MEMORY_SIZE="32G" > 2. bm is on 192.168.10.64 > /data/liuzhen_test/weektest/benchmark_tool > see the attachment for the configuration > {color:#00875A}*All writes succeeded*{color} > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
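The `Borrow client from pool ... failed` IOException in the trace above wraps an InterruptedException raised while the borrower blocked on the pool's deque. A minimal sketch of that borrow pattern; `TinyClientPool` is hypothetical, not IoTDB's ClientManager, and it surfaces failures as UncheckedIOException where the real code throws a checked IOException:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

/** Hypothetical minimal client pool mirroring the borrow pattern in the trace above. */
public class TinyClientPool<T> {
  private final LinkedBlockingDeque<T> idle = new LinkedBlockingDeque<>();

  /** Returns a client to the pool. */
  public void release(T client) {
    idle.addFirst(client);
  }

  /** Timed borrow: blocks up to timeoutMs, translating timeout and interruption into failures. */
  public T borrow(long timeoutMs) {
    try {
      T client = idle.pollFirst(timeoutMs, TimeUnit.MILLISECONDS);
      if (client == null) {
        throw new UncheckedIOException(
            new IOException("Borrow client from pool failed: timeout after " + timeoutMs + "ms"));
      }
      return client;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // restore the flag instead of swallowing it
      throw new UncheckedIOException(new IOException("Borrow client from pool failed.", e));
    }
  }

  public static void main(String[] args) {
    TinyClientPool<String> pool = new TinyClientPool<>();
    pool.release("client-to-192.168.10.66:9003");
    System.out.println(pool.borrow(100));
  }
}
```

The trace's `Caused by: java.lang.InterruptedException` matches the second branch: a scheduled tracker thread was interrupted (e.g. at query teardown) while waiting for a client, which is why the comment treats it as occasional rather than a pool defect.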
[jira] [Created] (IOTDB-4969) Distribution plan is not correct for Aggregation query with AlignByDevice
Jinrui Zhang created IOTDB-4969: --- Summary: Distribution plan is not correct for Aggregation query with AlignByDevice Key: IOTDB-4969 URL: https://issues.apache.org/jira/browse/IOTDB-4969 Project: Apache IoTDB Issue Type: Bug Reporter: Jinrui Zhang Assignee: Jinrui Zhang Attachments: image-2022-11-17-14-41-37-947.png, image-2022-11-17-14-44-18-254.png If one device's data is distributed across more than one DataRegion, the aggregation query for this device is not correct due to a wrong distribution plan. See the wrong plan below. !image-2022-11-17-14-41-37-947.png|width=615,height=391! This plan will lead to a wrong result as follows: !image-2022-11-17-14-44-18-254.png|width=380,height=88! -- This message was sent by Atlassian Jira (v8.20.10#820010)
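The reason a multi-region plan needs fixing can be shown with a count aggregation: each DataRegion can only produce a partial aggregate for the device, and the coordinator must merge those partials; a plan that emits one region's partial value as the final answer is wrong. A minimal sketch with hypothetical names, not the actual planner code:

```java
import java.util.List;

/**
 * Sketch of why the distribution plan in IOTDB-4969 must merge partial
 * aggregates when one device's data spans several DataRegions (hypothetical).
 */
public class PartialCountMerge {
  /** Final count for a device: the sum of each region's partial count. */
  public static long mergeCounts(List<Long> partialCountsPerRegion) {
    return partialCountsPerRegion.stream().mapToLong(Long::longValue).sum();
  }

  public static void main(String[] args) {
    // Device d1 has rows in two DataRegions: 70 in one, 30 in the other.
    List<Long> partials = List.of(70L, 30L);
    System.out.println("correct count = " + mergeCounts(partials));
    // A plan missing the merge step would instead surface a single partial,
    // e.g. partials.get(0) == 70, which is the kind of wrong result reported above.
  }
}
```

Sum-of-counts works because count is decomposable; the same merge structure applies to sum, min, and max, while avg needs (sum, count) pairs carried as the partial state.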
[jira] [Commented] (IOTDB-4873) Multi-user concurrent write and query + [ select into ] : ERROR o.a.i.c.m.t.MultiLeaderConsensusIService$AsyncProcessor$syncLog$1:903 - Exception inside handler
[ https://issues.apache.org/jira/browse/IOTDB-4873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633427#comment-17633427 ] Jinrui Zhang commented on IOTDB-4873: - It seems that some requests are fetched from both the queue and the WAL when preparing a batch, which leads to the `merge` operation on the receiver side. See the snapshot below !image-2022-11-14-09-19-47-544.png|width=1117,height=351! > Multi-user concurrent write and query + [ select into ] : ERROR > o.a.i.c.m.t.MultiLeaderConsensusIService$AsyncProcessor$syncLog$1:903 - > Exception inside handler > > > Key: IOTDB-4873 > URL: https://issues.apache.org/jira/browse/IOTDB-4873 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Major > Attachments: 4873.conf, image-2022-11-14-09-17-48-992.png, > image-2022-11-14-09-18-10-120.png, image-2022-11-14-09-19-47-544.png, > screenshot-1.png, select_into.sh > > > master_1107_523e82a > 1. start a 3-replica 3C3D cluster > 2. Start benchmark concurrent writes and queries > 3. After 16 hours, ip62 executes " select into " > About 1000 SQL statements, single-user execution: > ”select s_0,s_1,s_2,s_3,s_4,s_5,s_6,s_7,s_8,s_9,s_10 into > root.test.g_1.::(::) from root.test.g_1.d_ip62_660” > !screenshot-1.png! 
> ip62 datanode displays the following error log : > 2022-11-08 09:27:31,366 [pool-20-IoTDB-MultiLeaderConsensusRPC-Processor-72] > ERROR o.a.i.c.m.t.MultiLeaderConsensusIService$AsyncProcessor$syncLog$1:903 - > Exception inside handler > java.lang.NullPointerException: null > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.mergeInsertNodes(DataRegionStateMachine.java:376) > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.grabInsertNode(DataRegionStateMachine.java:295) > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.deserializeAndWrap(DataRegionStateMachine.java:272) > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.write(DataRegionStateMachine.java:325) > at > org.apache.iotdb.consensus.multileader.service.MultiLeaderRPCServiceProcessor.syncLog(MultiLeaderRPCServiceProcessor.java:132) > at > org.apache.iotdb.consensus.multileader.thrift.MultiLeaderConsensusIService$AsyncProcessor$syncLog.start(MultiLeaderConsensusIService.java:922) > at > org.apache.iotdb.consensus.multileader.thrift.MultiLeaderConsensusIService$AsyncProcessor$syncLog.start(MultiLeaderConsensusIService.java:865) > at > org.apache.thrift.TBaseAsyncProcessor.process(TBaseAsyncProcessor.java:103) > at > org.apache.thrift.server.AbstractNonblockingServer$AsyncFrameBuffer.invoke(AbstractNonblockingServer.java:603) > at org.apache.thrift.server.Invocation.run(Invocation.java:18) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2022-11-08 09:27:50,962 [Query-Worker-Thread-48$20221108_012730_15774_3.1.0] > ERROR o.a.i.d.m.e.o.p.AbstractIntoOperator:123 - Error occurred while > inserting tablets in SELECT INTO: can't connect to node > {}TEndPoint(ip:192.168.10.68, port:9003) > 2022-11-08 09:27:50,962 
[Query-Worker-Thread-48$20221108_012730_15774_3.1.0] > ERROR o.a.i.d.m.e.s.AbstractDriverThread:80 - [ExecuteFailed] > org.apache.iotdb.db.exception.IntoProcessException: Error occurred while > inserting tablets in SELECT INTO: can't connect to node > {}TEndPoint(ip:192.168.10.68, port:9003) > at > org.apache.iotdb.db.mpp.execution.operator.process.AbstractIntoOperator.insertMultiTabletsInternally(AbstractIntoOperator.java:124) > at > org.apache.iotdb.db.mpp.execution.operator.process.IntoOperator.next(IntoOperator.java:73) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.processInternal(Driver.java:186) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.lambda$processFor$1(Driver.java:125) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.tryWithLock(Driver.java:270) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.processFor(Driver.java:118) > at > org.apache.iotdb.db.mpp.execution.schedule.DriverTaskThread.execute(DriverTaskThread.java:64) > at > org.apache.iotdb.db.mpp.execution.schedule.AbstractDriverThread.run(AbstractDriverThread.java:74) > 2022-11-08 09:27:50,966 [Query-Worker-Thread-48$20221108_012730_15774_3.1.0] > WARN o.a.i.d.m.e.s.Driv
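The comment on this issue suggests the NPE in mergeInsertNodes stems from the same request being picked up twice while a batch is prepared, once from the in-memory queue and once re-read from the WAL. A minimal sketch of deduplicating a batch by its monotonically increasing search index; the names are hypothetical, not the DataRegionStateMachine code:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the batching hazard described above: entries pulled
 * from the in-memory queue and re-read from the WAL can carry the same search
 * index, so the batch builder must skip indices it has already taken.
 */
public class BatchBuilder {
  public static class Entry {
    public final long searchIndex;

    public Entry(long searchIndex) {
      this.searchIndex = searchIndex;
    }
  }

  /** Builds a batch of search indices, taking only entries whose index advances past the last one taken. */
  public static List<Long> buildBatch(List<Entry> fromQueue, List<Entry> fromWal) {
    List<Entry> all = new ArrayList<>(fromWal); // WAL entries first: they are the older ones
    all.addAll(fromQueue);
    List<Long> batch = new ArrayList<>();
    long lastTaken = Long.MIN_VALUE;
    for (Entry e : all) {
      if (e.searchIndex > lastTaken) {
        batch.add(e.searchIndex);
        lastTaken = e.searchIndex;
      } // an entry fetched from both sources is dropped here instead of reaching the receiver twice
    }
    return batch;
  }

  public static void main(String[] args) {
    // Index 12 was already read from the WAL but is still sitting in the queue.
    List<Entry> wal = List.of(new Entry(11), new Entry(12));
    List<Entry> queue = List.of(new Entry(12), new Entry(13));
    System.out.println(buildBatch(queue, wal)); // duplicates removed: 11, 12, 13 each appear once
  }
}
```

Filtering on the sender avoids shipping a duplicate that the receiver-side `merge` would then have to reconcile, which is where the NullPointerException in the trace was raised.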
[jira] [Assigned] (IOTDB-4873) Multi-user concurrent write and query + [ select into ] : ERROR o.a.i.c.m.t.MultiLeaderConsensusIService$AsyncProcessor$syncLog$1:903 - Exception inside handler
[ https://issues.apache.org/jira/browse/IOTDB-4873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4873: --- Assignee: Haiming Zhu (was: Jinrui Zhang) Please track this issue according to our experiment > Multi-user concurrent write and query + [ select into ] : ERROR > o.a.i.c.m.t.MultiLeaderConsensusIService$AsyncProcessor$syncLog$1:903 - > Exception inside handler > > > Key: IOTDB-4873 > URL: https://issues.apache.org/jira/browse/IOTDB-4873 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Major > Attachments: 4873.conf, image-2022-11-14-09-17-48-992.png, > screenshot-1.png, select_into.sh > > > master_1107_523e82a > 1. start 3rep ,3C 3D cluster > 2. Start benchmark concurrent writes and queries > 3. After 16 hours, ip62 execute " select into " > About 1000 SQL, single user execution : > ”select s_0,s_1,s_2,s_3,s_4,s_5,s_6,s_7,s_8,s_9,s_10 into > root.test.g_1.::(::) from root.test.g_1.d_ip62_660” > !screenshot-1.png! 
> ip62 datanode displays the following error log : > 2022-11-08 09:27:31,366 [pool-20-IoTDB-MultiLeaderConsensusRPC-Processor-72] > ERROR o.a.i.c.m.t.MultiLeaderConsensusIService$AsyncProcessor$syncLog$1:903 - > Exception inside handler > java.lang.NullPointerException: null > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.mergeInsertNodes(DataRegionStateMachine.java:376) > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.grabInsertNode(DataRegionStateMachine.java:295) > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.deserializeAndWrap(DataRegionStateMachine.java:272) > at > org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.write(DataRegionStateMachine.java:325) > at > org.apache.iotdb.consensus.multileader.service.MultiLeaderRPCServiceProcessor.syncLog(MultiLeaderRPCServiceProcessor.java:132) > at > org.apache.iotdb.consensus.multileader.thrift.MultiLeaderConsensusIService$AsyncProcessor$syncLog.start(MultiLeaderConsensusIService.java:922) > at > org.apache.iotdb.consensus.multileader.thrift.MultiLeaderConsensusIService$AsyncProcessor$syncLog.start(MultiLeaderConsensusIService.java:865) > at > org.apache.thrift.TBaseAsyncProcessor.process(TBaseAsyncProcessor.java:103) > at > org.apache.thrift.server.AbstractNonblockingServer$AsyncFrameBuffer.invoke(AbstractNonblockingServer.java:603) > at org.apache.thrift.server.Invocation.run(Invocation.java:18) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2022-11-08 09:27:50,962 [Query-Worker-Thread-48$20221108_012730_15774_3.1.0] > ERROR o.a.i.d.m.e.o.p.AbstractIntoOperator:123 - Error occurred while > inserting tablets in SELECT INTO: can't connect to node > {}TEndPoint(ip:192.168.10.68, port:9003) > 2022-11-08 09:27:50,962 
[Query-Worker-Thread-48$20221108_012730_15774_3.1.0] > ERROR o.a.i.d.m.e.s.AbstractDriverThread:80 - [ExecuteFailed] > org.apache.iotdb.db.exception.IntoProcessException: Error occurred while > inserting tablets in SELECT INTO: can't connect to node > {}TEndPoint(ip:192.168.10.68, port:9003) > at > org.apache.iotdb.db.mpp.execution.operator.process.AbstractIntoOperator.insertMultiTabletsInternally(AbstractIntoOperator.java:124) > at > org.apache.iotdb.db.mpp.execution.operator.process.IntoOperator.next(IntoOperator.java:73) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.processInternal(Driver.java:186) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.lambda$processFor$1(Driver.java:125) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.tryWithLock(Driver.java:270) > at > org.apache.iotdb.db.mpp.execution.driver.Driver.processFor(Driver.java:118) > at > org.apache.iotdb.db.mpp.execution.schedule.DriverTaskThread.execute(DriverTaskThread.java:64) > at > org.apache.iotdb.db.mpp.execution.schedule.AbstractDriverThread.run(AbstractDriverThread.java:74) > 2022-11-08 09:27:50,966 [Query-Worker-Thread-48$20221108_012730_15774_3.1.0] > WARN o.a.i.d.m.e.s.DriverScheduler$Scheduler:387 - The task > 20221108_012730_15774_3.1.0 is aborted. All other tasks in the same query > will be cancelled > TEST ENV: > 1. 192.168.10.62 66 64 72CPU 256GB > ConfigNode : > MAX_HEAP_SIZE="12G" > MAX_DIRECT_MEMORY_SIZE="
[jira] [Commented] (IOTDB-4556) [Remove-DataNode] ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:287 - The consensus group DataRegion[24] doesn't exist
[ https://issues.apache.org/jira/browse/IOTDB-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17632232#comment-17632232 ] Jinrui Zhang commented on IOTDB-4556: - I talked with [~HeimingZ] about this issue; we found the behavior is not expected, especially the error log `{*}failed to flush sync index{*}`. We suspect some unexpected operation may have been triggered on ip73. Let's try to reproduce it with the latest code and investigate the issue online if it still exists. > [Remove-DataNode] ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:287 - > The consensus group DataRegion[24] doesn't exist > --- > > Key: IOTDB-4556 > URL: https://issues.apache.org/jira/browse/IOTDB-4556 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Major > Attachments: 73to74.png, 73to75.png, 73to76.png, > ip73_dataregion24.png, more_dev.conf > > > m_0929_71d5f65 > SchemaRegion : ratis > DataRegion : multiLeader > Both regions use 3 replicas; the cluster is 3C5D (3 ConfigNodes, 5 DataNodes). > Start the benchmark (bm) client writing; writing continues throughout the scale-in. > After bm had run for 40 minutes, remove node 1 (ip72); the removal completed after 1 hour 38 minutes. > Move node 1's data and logs directories away, then bring the node back online. > Remove node 2 (ip73, removal started at 09-29 14:10); this node does not contain DataRegion[24]. > *{color:#DE350B}DataRegion[24] is on ip74, ip75 and ip76{color}* > But ip73 logs this error: > 2022-09-29 14:23:39,273 > [pool-24-IoTDB-ClientRPC-Processor-2$20220929_062339_48081_4.1.0] > {color:#DE350B}*ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:287 - The > consensus group DataRegion[24] doesn't exist*{color} > 2022-09-29 14:23:39,275 [MultiLeaderConsensusClientPool-selector-98] ERROR > o.a.i.c.m.l.IndexController:111 - {color:#DE350B}*failed to flush sync index. > cannot find previous version file. 
previous: 93500*{color} > 2022-09-29 14:23:39,179 [pool-24-IoTDB-ClientRPC-Processor-45] WARN > o.a.i.d.u.ErrorHandlingUtils:62 - Status code: EXECUTE_STATEMENT_ERROR(400), > operation: insertTablet failed > java.lang.RuntimeException: > org.apache.iotdb.commons.exception.IoTDBException: There are no available > RegionGroups currently, please check the status of cluster DataNodes > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterPartitionFetcher.getOrCreateDataPartition(ClusterPartitionFetcher.java:280) > at > org.apache.iotdb.db.mpp.plan.analyze.AnalyzeVisitor.visitInsertTablet(AnalyzeVisitor.java:1236) > at > org.apache.iotdb.db.mpp.plan.analyze.AnalyzeVisitor.visitInsertTablet(AnalyzeVisitor.java:150) > at > org.apache.iotdb.db.mpp.plan.statement.crud.InsertTabletStatement.accept(InsertTabletStatement.java:121) > at > org.apache.iotdb.db.mpp.plan.statement.StatementVisitor.process(StatementVisitor.java:98) > at > org.apache.iotdb.db.mpp.plan.analyze.Analyzer.analyze(Analyzer.java:40) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.analyze(QueryExecution.java:236) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.(QueryExecution.java:138) > at > org.apache.iotdb.db.mpp.plan.Coordinator.createQueryExecution(Coordinator.java:100) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:133) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:160) > at > org.apache.iotdb.db.service.thrift.impl.ClientRPCServiceImpl.insertTablet(ClientRPCServiceImpl.java:996) > at > org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:3512) > at > org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:3492) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) > at > 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: org.apache.iotdb.commons.exception.IoTDBException: There are no > available RegionGroups currently, please check the status of cluster DataNodes > ... 20 common frames omitted > Test environment > 1. 192.168.10.72/73/74/75/76, 48 CPU, 384 GB > 3C : 72,73,74 > 5D : 72,73,74,75,76 > Cluster configuration > ConfigNode > MAX_HEAP_SIZE="8G" > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_fa
[jira] [Assigned] (IOTDB-4556) [Remove-DataNode] ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:287 - The consensus group DataRegion[24] doesn't exist
[ https://issues.apache.org/jira/browse/IOTDB-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4556: --- Assignee: Haiming Zhu (was: Jinrui Zhang) > [Remove-DataNode] ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:287 - > The consensus group DataRegion[24] doesn't exist > --- > > Key: IOTDB-4556 > URL: https://issues.apache.org/jira/browse/IOTDB-4556 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Major > Attachments: more_dev.conf > > > m_0929_71d5f65 > SchemaRegion : ratis > DataRegion : multiLeader > Both regions use 3 replicas; the cluster is 3C5D (3 ConfigNodes, 5 DataNodes). > Start the benchmark (bm) client writing; writing continues throughout the scale-in. > After bm had run for 40 minutes, remove node 1 (ip72); the removal completed after 1 hour 38 minutes. > Move node 1's data and logs directories away, then bring the node back online. > Remove node 2 (ip73, removal started at 09-29 14:10); this node does not contain DataRegion[24]. > *{color:#DE350B}DataRegion[24] is on ip74, ip75 and ip76{color}* > But ip73 logs this error: > 2022-09-29 14:23:39,273 > [pool-24-IoTDB-ClientRPC-Processor-2$20220929_062339_48081_4.1.0] > {color:#DE350B}*ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:287 - The > consensus group DataRegion[24] doesn't exist*{color} > 2022-09-29 14:23:39,275 [MultiLeaderConsensusClientPool-selector-98] ERROR > o.a.i.c.m.l.IndexController:111 - {color:#DE350B}*failed to flush sync index. > cannot find previous version file. 
previous: 93500*{color} > 2022-09-29 14:23:39,179 [pool-24-IoTDB-ClientRPC-Processor-45] WARN > o.a.i.d.u.ErrorHandlingUtils:62 - Status code: EXECUTE_STATEMENT_ERROR(400), > operation: insertTablet failed > java.lang.RuntimeException: > org.apache.iotdb.commons.exception.IoTDBException: There are no available > RegionGroups currently, please check the status of cluster DataNodes > at > org.apache.iotdb.db.mpp.plan.analyze.ClusterPartitionFetcher.getOrCreateDataPartition(ClusterPartitionFetcher.java:280) > at > org.apache.iotdb.db.mpp.plan.analyze.AnalyzeVisitor.visitInsertTablet(AnalyzeVisitor.java:1236) > at > org.apache.iotdb.db.mpp.plan.analyze.AnalyzeVisitor.visitInsertTablet(AnalyzeVisitor.java:150) > at > org.apache.iotdb.db.mpp.plan.statement.crud.InsertTabletStatement.accept(InsertTabletStatement.java:121) > at > org.apache.iotdb.db.mpp.plan.statement.StatementVisitor.process(StatementVisitor.java:98) > at > org.apache.iotdb.db.mpp.plan.analyze.Analyzer.analyze(Analyzer.java:40) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.analyze(QueryExecution.java:236) > at > org.apache.iotdb.db.mpp.plan.execution.QueryExecution.(QueryExecution.java:138) > at > org.apache.iotdb.db.mpp.plan.Coordinator.createQueryExecution(Coordinator.java:100) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:133) > at > org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:160) > at > org.apache.iotdb.db.service.thrift.impl.ClientRPCServiceImpl.insertTablet(ClientRPCServiceImpl.java:996) > at > org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:3512) > at > org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:3492) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) > at > 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: org.apache.iotdb.commons.exception.IoTDBException: There are no > available RegionGroups currently, please check the status of cluster DataNodes > ... 20 common frames omitted > Test environment > 1. 192.168.10.72/73/74/75/76, 48 CPU, 384 GB > 3C : 72,73,74 > 5D : 72,73,74,75,76 > Cluster configuration > ConfigNode > MAX_HEAP_SIZE="8G" > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_factor=3 > connection_timeout_ms=12 > DataNode > MAX_HEAP_SIZE="256G" > MAX_DIRECT_MEMORY_SIZE="32G" > connection_timeout_ms=12 > max_connection_for_internal_service=200 > max_waiting_time_when_insert_blocked=60 > query_timeout_threshold=3600 > 2. See the attachment for the benchmark configuration. > 3. After bm had run for 40 minutes, remove ip72. > Wait for the ip72 removal to complete and the datanode process to exit. > mv data logs > Restart ip72. > 4. Remove ip73; the ERROR described above occurs.
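The reproduction steps above can be sketched as a command sequence. This is a sketch only: the script names (`sbin/remove-datanode.sh`, `sbin/start-datanode.sh`) and the `ip:port` argument form are assumptions based on a typical IoTDB 0.14/1.x distribution layout, not details taken from this report.

```
# Sketch of the scale-in reproduction, under the assumptions named above.

# 3. After the benchmark has run for ~40 minutes, remove the first DataNode (ip72).
sbin/remove-datanode.sh 192.168.10.72:6667

# Wait for the removal to finish and the datanode process to exit, then
# set aside its local state and bring the node back online.
mv data data.bak
mv logs logs.bak
sbin/start-datanode.sh -d

# 4. Remove the second DataNode (ip73). The reported ERROR appears on ip73
# even though ip73 hosts no replica of DataRegion[24].
sbin/remove-datanode.sh 192.168.10.73:6667
```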
[jira] [Assigned] (IOTDB-4752) some data is lost after the read-only node returns to normal
[ https://issues.apache.org/jira/browse/IOTDB-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4752: --- Assignee: Yongzao Dan (was: Haiming Zhu) > some data is lost after the read-only node returns to normal > > > Key: IOTDB-4752 > URL: https://issues.apache.org/jira/browse/IOTDB-4752 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Yongzao Dan >Priority: Major > Attachments: disk space sample.png, > image-2022-10-25-16-22-00-198.png, iotdb_4752.conf, read-only check.png, > read-only reason.png, state machine lock.png, tsfile-content.png, > tsfiles.png, wal-content.png, 同步完成_ip72.out, 同步完成_ip73_ip74.out > > > m_1025_cbc6225 > schema region : ratis > data region : multiLeader > 3 replicas, 3C3D > During benchmark writing, ip72 is set to read-only due to "no space left on > device". > Wait for the benchmark write to complete, then release disk space on ip72. > ip72 : SET SYSTEM TO RUNNING > Wait for synchronization to complete > ip72 : flush > Perform query comparison; ip72 {color:#DE350B}*lost some data*{color}. > "select count(s_0) ,count(s_9),count(s_99),count(s_999),count(s_) from > root.** align by device" > !image-2022-10-25-16-22-00-198.png! > Test environment : > 1. 
192.168.10.72 / 73 /74 48CPU 384GB > benchmark : ip75 /home/liuzhen/benchmark/bm_0620_7ec96c1 > iotdb_dir : /ssd_data/mpp_test/m_1025_cbc6225 > ConfigNode > MAX_HEAP_SIZE="8G" > schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus > data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus > schema_replication_factor=3 > data_replication_factor=3 > connection_timeout_ms=12 > schema_region_ratis_request_timeout_ms=120 > data_region_ratis_request_timeout_ms=120 > schema_region_ratis_max_retry_attempts=1 > data_region_ratis_max_retry_attempts=2 > DataNode > MAX_HEAP_SIZE="256G" > MAX_DIRECT_MEMORY_SIZE="32G" > avg_series_point_number_threshold=1 > max_waiting_time_when_insert_blocked=360 > enable_seq_space_compaction=false > enable_unseq_space_compaction=false > enable_cross_space_compaction=false > query_timeout_threshold=3600 > 2. benchmark configuration > see attachment . > 3. During benchmark writing, ip72 is set to read-only due to "no space left > on device". > 4. Wait for the benchmark write to complete, ip72 release disk space. > 5. ip72 : SET SYSTEM TO RUNNING > 6. Wait for synchronization to complete > 7. ip72 : flush > 8. Perform query comparison > select count(s_0) ,count(s_9),count(s_99),count(s_999),count(s_) from > root.** align by device -- This message was sent by Atlassian Jira (v8.20.10#820010)
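The recovery check in steps 5-8 above amounts to the following SQL sequence, run against ip72 after disk space is freed. This is a sketch assembled from the statements quoted in the report; the comments are interpretation, not part of the original.

```sql
-- Step 5: bring the node out of read-only mode once disk space is available.
SET SYSTEM TO RUNNING;

-- Step 7: after replica synchronization completes, force memtables to disk
-- so the on-disk TsFiles reflect everything the node has received.
FLUSH;

-- Step 8: compare per-device counts against the other replicas; on ip72
-- some counts came back lower, i.e. data was lost.
select count(s_0), count(s_9), count(s_99), count(s_999), count(s_)
from root.** align by device;
```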
[jira] [Created] (IOTDB-4874) NPE error when migrating MultiLeader Peer with 1 replica
Jinrui Zhang created IOTDB-4874: --- Summary: NPE error when migrating MultiLeader Peer with 1 replica Key: IOTDB-4874 URL: https://issues.apache.org/jira/browse/IOTDB-4874 Project: Apache IoTDB Issue Type: Bug Reporter: Jinrui Zhang Assignee: Jinrui Zhang Attachments: image-2022-11-08-11-49-44-684.png There is a bug when migrating MultiLeader Peer with 1 replica. !image-2022-11-08-11-49-44-684.png|width=981,height=233! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IOTDB-4731) [Remove-DataNode] Data is inconsistent ( remove datanode before the synchronization is complete )
[ https://issues.apache.org/jira/browse/IOTDB-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinrui Zhang reassigned IOTDB-4731: --- Assignee: Haiming Zhu (was: Jinrui Zhang) > [Remove-DataNode] Data is inconsistent ( remove datanode before the > synchronization is complete ) > -- > > Key: IOTDB-4731 > URL: https://issues.apache.org/jira/browse/IOTDB-4731 > Project: Apache IoTDB > Issue Type: Bug > Components: mpp-cluster >Affects Versions: 0.14.0-SNAPSHOT >Reporter: 刘珍 >Assignee: Haiming Zhu >Priority: Major > Attachments: aft_set_readonlyip68_regions.out, > bef_set_ip68readonly_regions.out, image-2022-10-24-15-13-45-086.png, > image-2022-10-24-15-22-17-729.png, > ip64-is-newpeer_leader_g_4_q_after_remove.out, ip66_g_4_q_after_remove.out, > more_ts.conf > > > master_1023_2fea011 > 3 replicas, 3C3D; benchmark write done. > Start the fourth datanode (ip64). > ip68 : SET SYSTEM TO READONLY ON LOCAL > Remove datanode (ip68). > Before removal, ip68 is the leader of DataRegion[14] and there is unsynchronized > data: > !image-2022-10-24-15-13-45-086.png! > While ip68 is in the removing state, the datanode logs this error: > 2022-10-24 14:18:18,092 [pool-49-IoTDB-LogDispatcher-DataRegion[14]-3] ERROR > o.a.i.d.w.n.WALNode$PlanNodeIterator:590 - Fail to read wal from wal file > /data/liuzhen_test/master_1023_2fea011/sbin/../data/datanode/wal/root.test.g_4-14/_150-200-1.wal, > skip this file. 
> java.nio.channels.ClosedByInterruptException: null > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:315) > at > org.apache.iotdb.db.wal.io.WALByteBufReader.(WALByteBufReader.java:47) > at > org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.hasNext(WALNode.java:552) > at > org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.next(WALNode.java:683) > at > org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.constructBatchFromWAL(LogDispatcher.java:438) > at > org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.getBatch(LogDispatcher.java:348) > at > org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.run(LogDispatcher.java:274) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2022-10-24 14:18:18,093 [pool-49-IoTDB-LogDispatcher-DataRegion[14]-3] ERROR > o.a.i.d.w.n.WALNode$PlanNodeIterator:590 - Fail to read wal from wal file > /data/liuzhen_test/master_1023_2fea011/sbin/../data/datanode/wal/root.test.g_4-14/_151-204-1.wal, > skip this file. 
> java.nio.channels.ClosedByInterruptException: null > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:315) > at > org.apache.iotdb.db.wal.io.WALByteBufReader.(WALByteBufReader.java:47) > at > org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.hasNext(WALNode.java:552) > at > org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.hasNext(WALNode.java:597) > at > org.apache.iotdb.db.wal.node.WALNode$PlanNodeIterator.next(WALNode.java:683) > at > org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.constructBatchFromWAL(LogDispatcher.java:438) > at > org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.getBatch(LogDispatcher.java:348) > at > org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.run(LogDispatcher.java:274) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2022-10-24 14:18:18,093 [pool-49-IoTDB-LogDispatcher-DataRegion[14]-3] ERROR > o.a.i.c.m.l.LogDispatcher$LogDispatcherThread:294 - Unexpected error in > logDispatcher for peer Peer{groupId=DataRegion[14], > endpoint=TEndPoint(ip:192.168.10.64, port:40010), nodeId=6} > java.lang.ArrayInd