[ 
https://issues.apache.org/jira/browse/IOTDB-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656563#comment-17656563
 ] 

Yongzao Dan commented on IOTDB-4904:
------------------------------------

We can't fix this bug currently. 

*Reason*

I noticed that, in the benchmark configuration, each benchmark creates 3k 
devices, which means each benchmark creates 3k SchemaPartitions. However, the 
total number of SeriesPartitionSlots is only 10k. In this dynamic extension 
scenario, the first 3 DataNodes store all 3k SchemaPartitions created by the 
first benchmark. In the next step we have 6 DataNodes, but the 
SchemaRegionGroup won't be extended until the number of created 
SchemaPartitions reaches 5k (10000/(6/3)). Therefore, the earlier a DataNode 
is registered, the more SchemaPartitions it holds.
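
The 5k figure above can be reproduced with a small sketch. This is only a 
simplified reading of the arithmetic in this comment, not the actual 
ConfigNode code: the function name extension_threshold and its parameters are 
hypothetical, and it assumes one RegionGroup per 3 DataNodes (3 replicas, as 
in the reported test).

```python
def extension_threshold(total_slots: int, data_nodes: int, replication_factor: int) -> int:
    """Created SchemaPartitions needed before the next extension is triggered.

    Hypothetical helper mirroring 10000/(6/3) from the comment; it is not
    an IoTDB API.
    """
    # Assumption: each RegionGroup occupies `replication_factor` DataNodes.
    region_groups = data_nodes // replication_factor
    if region_groups < 1:
        raise ValueError("need at least replication_factor DataNodes")
    return total_slots // region_groups


TOTAL_SLOTS = 10_000      # total SeriesPartitionSlots, from the comment
REPLICATION_FACTOR = 3    # 3 replicas, from the issue description

# The test adds 3 DataNodes per step, from 3 up to 21.
for data_nodes in range(3, 22, 3):
    threshold = extension_threshold(TOTAL_SLOTS, data_nodes, REPLICATION_FACTOR)
    print(f"{data_nodes:2d} DataNodes -> threshold {threshold}")
```

With 6 DataNodes the threshold is 5000, so the 3k SchemaPartitions created by 
the first benchmark stay on the first 3 DataNodes.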

*Solution*

To resolve this imbalance, we should support load balancing at the Partition 
level, i.e., we should be able to migrate Partitions between RegionGroups. 
However, this is not our top priority at the moment.
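
As an illustration of what Partition-level balancing could look like, here is 
a minimal sketch that plans migrations so that every RegionGroup ends up with 
an equal share (within one Partition). It is purely hypothetical: 
plan_migrations, the group names, and the greedy strategy are assumptions made 
for this example, not an existing or planned IoTDB implementation.

```python
def plan_migrations(load: dict[str, int]) -> list[tuple[str, str, int]]:
    """Return (source_group, target_group, partition_count) moves.

    `load` maps a RegionGroup name to the number of Partitions it holds.
    Hypothetical sketch; the input is not modified.
    """
    if not load:
        return []
    base, extra = divmod(sum(load.values()), len(load))
    # Give the `extra` leftover Partitions to the most loaded groups,
    # which keeps the number of migrated Partitions small.
    groups = sorted(load, key=lambda g: -load[g])
    target = {g: base + (1 if i < extra else 0) for i, g in enumerate(groups)}

    donors = [[g, load[g] - target[g]] for g in groups if load[g] > target[g]]
    receivers = [[g, target[g] - load[g]] for g in groups if load[g] < target[g]]

    moves = []
    di = ri = 0
    # Total surplus equals total deficit, so both lists run out together.
    while di < len(donors) and ri < len(receivers):
        count = min(donors[di][1], receivers[ri][1])
        moves.append((donors[di][0], receivers[ri][0], count))
        donors[di][1] -= count
        receivers[ri][1] -= count
        if donors[di][1] == 0:
            di += 1
        if receivers[ri][1] == 0:
            ri += 1
    return moves


# One group holds the first benchmark's 3k Partitions; a new group is empty.
print(plan_migrations({"group-1": 3000, "group-2": 0}))
```

For the two-group example, the plan is a single move of 1500 Partitions from 
group-1 to group-2.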

> [ ConfigNode ] When dynamically extending DataNode resources online, you need 
> to optimize schemaregion allocation policies
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IOTDB-4904
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4904
>             Project: Apache IoTDB
>          Issue Type: Improvement
>          Components: mpp-cluster
>            Reporter: 刘珍
>            Assignee: Yongzao Dan
>            Priority: Major
>         Attachments: all_online.sh, image-2022-11-10-15-06-01-931.png, 
> image-2022-11-10-15-07-15-997.png, ip26.conf, ip27.conf, ip28.conf, 
> ip29.conf, ip30.conf, ip31.conf, ip32.conf, online_exp_datanode.sh
>
>
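> m_1109_87a416e, 3 replicas
> 1. Start 3C3D with 1 Benchmark and write data for 1 hour.
> 2. Extend the cluster online by 3 DataNodes, then start 1 more Benchmark and 
> write data for 1 hour.
> ..
> Repeat until the cluster has 21 DataNodes and 7 Benchmark clients; metadata 
> creation then fails. (Control: start 3C21D all at once, then start 7 
> Benchmarks sequentially at 2 s intervals; metadata creation succeeds.) The 
> cause is that the schema region allocation policy is unbalanced. Error log 
> from metadata creation: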
> 2022-11-10 11:24:43,533 [3@group-000200000000-StateMachineUpdater] ERROR 
> o.a.i.d.m.v.SchemaExecutionVisitor:184 - IoTDB: MetaData error:
> org.apache.iotdb.db.exception.metadata.SeriesOverflowException: There are too 
> many timeseries in memory, please increase MAX_HEAP_SIZE in 
> datanode-env.sh/bat, restart and create timeseries again.
>         at 
> org.apache.iotdb.db.metadata.schemaregion.SchemaRegionMemoryImpl.createTimeseries(SchemaRegionMemoryImpl.java:575)
>         at 
> org.apache.iotdb.db.metadata.visitor.SchemaExecutionVisitor.executeInternalCreateTimeseries(SchemaExecutionVisitor.java:176)
>         at 
> org.apache.iotdb.db.metadata.visitor.SchemaExecutionVisitor.visitInternalCreateTimeSeries(SchemaExecutionVisitor.java:150)
>         at 
> org.apache.iotdb.db.metadata.visitor.SchemaExecutionVisitor.visitInternalCreateTimeSeries(SchemaExecutionVisitor.java:64)
>         at 
> org.apache.iotdb.db.mpp.plan.planner.plan.node.metedata.write.InternalCreateTimeSeriesNode.accept(InternalCreateTimeSeriesNode.java:105)
>         at 
> org.apache.iotdb.db.consensus.statemachine.SchemaRegionStateMachine.write(SchemaRegionStateMachine.java:73)
>         at 
> org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.applyTransaction(ApplicationStateMachineProxy.java:137)
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1672)
>         at 
> org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:239)
>         at 
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:182)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> Final SchemaRegion state after extending DataNodes online:
>  !image-2022-11-10-15-06-01-931.png! 
> SchemaRegion state without extension (all nodes started at once, after the 7 
> Benchmarks finish running):
>  !image-2022-11-10-15-07-15-997.png! 
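> Test environment: private cloud, phase 1
> 172.16.2.2 ~ 32 
> For the benchmark configuration files, see the attached ip*.conf
> For the online extension script, see online_exp_datanode.sh
> For the no-extension script, see all_online.sh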



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
