[New Feature] PB Tree for Massive Schema Management
Hi,

We're going to introduce PB Tree (Prefix-B+ Tree), a new SchemaEngine mode, in V1.2.0. It supports evicting temporarily unused schema to disk and loading it back on demand at runtime. With PB Tree, users will no longer be constrained by memory for schema management, especially in scenarios with massive numbers of time series.

To enable PB Tree in IoTDB, set the parameter schema_engine_mode in iotdb-commons.properties to PB_Tree [1]. All the memory allocated for SchemaRegion will be used by PB Tree. The allocation can be modified by changing the parameters storage_query_schema_consensus_free_memory_proportion and schema_memory_proportion in iotdb-commons.properties.

PB Tree is a new implementation of MTree in SchemaRegion, consisting of a PB Tree file and a prefix-tree-structured cache. The PB Tree file manages the prefix tree structure on disk: each node record stores a pointer to the first disk page that holds the node's children. A B+ Tree is used to organize sibling nodes so that a single child node can be found quickly. The cache is similar to the existing MTree implementation in SchemaRegion. The main difference is that the children of a node may not be cached in memory. In that case they are read from disk by searching the B+ Tree located by the pointer, which is stored in the node record and cached in the in-memory node object.

Reference:
[1] https://iotdb.apache.org/zh/UserGuide/V1.2.x/Reference/Common-Config-Manual.html#%E5%85%83%E6%95%B0%E6%8D%AE%E5%BC%95%E6%93%8E%E9%85%8D%E7%BD%AE

Thanks.
Yunkun Zhou
Apache IoTDB Committer
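The lookup path described in the announcement (in-memory cache first, then the on-disk sibling index located by the node's pointer) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `DiskStore`, `CachedNode`, `lookup` and the page layout are hypothetical names invented for illustration, a sorted list with binary search stands in for the B+ Tree, and none of it reflects IoTDB's actual classes or file format.

```python
import bisect


class DiskStore:
    """Toy stand-in for the PB Tree file (hypothetical, not IoTDB's format)."""

    def __init__(self):
        self.pages = {}  # pointer -> (sorted child names, their child pointers)
        self.reads = 0   # number of simulated disk lookups

    def write(self, children):
        """Store {child name: child pointer or None} and return its pointer."""
        ptr = len(self.pages)
        names = sorted(children)
        self.pages[ptr] = (names, [children[n] for n in names])
        return ptr

    def find(self, ptr, name):
        """Binary search stands in for the B+ Tree lookup among siblings."""
        self.reads += 1
        names, ptrs = self.pages[ptr]
        i = bisect.bisect_left(names, name)
        if i < len(names) and names[i] == name:
            return True, ptrs[i]
        return False, None


class CachedNode:
    def __init__(self, name, child_ptr=None):
        self.name = name
        self.child_ptr = child_ptr  # pointer to this node's children on disk
        self.children = {}          # only the children currently in memory

    def get_child(self, name, disk):
        child = self.children.get(name)
        if child is None and self.child_ptr is not None:
            # Cache miss: fall back to the on-disk sibling index.
            found, ptr = disk.find(self.child_ptr, name)
            if found:
                child = self.children[name] = CachedNode(name, ptr)
        return child

    def evict_children(self):
        """Drop the cached subtree; it can be reloaded from disk later."""
        self.children.clear()


def lookup(root, path, disk):
    node = root
    for name in path.split(".")[1:]:  # skip the leading "root"
        node = node.get_child(name, disk)
        if node is None:
            return None
    return node


disk = DiskStore()
d1_ptr = disk.write({"s1": None, "s2": None})
sg_ptr = disk.write({"d1": d1_ptr})
root = CachedNode("root", disk.write({"sg": sg_ptr}))
```

In this toy model, the first `lookup(root, "root.sg.d1.s1", disk)` reads from the store at every level, a repeated lookup is served from memory, and calling `root.evict_children()` forces the next lookup back to disk.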
Re: Re: Add constraint to the length of database name
We support suffix paths, which are concatenated to the prefix path in the "from" clause, and full paths, which start with "root", in the "where" clause. "root" is the identifier that lets us recognize a full path.

Chao Wang wrote on Fri, Nov 18, 2022 at 18:24:

> +1, but why include "root"? I think the user could omit "root" once we
> change sg to database, and we could omit "root" in the file path.
> Both "select from root.dbname" and "select from dbname" would then be
> accepted, to stay compatible.
>
> Thanks!
>
> Chao Wang
> BONC ltd
> ccgow...@163.com
>
> On Nov 18, 2022 at 18:02, 冯 庆新 wrote:
> I agree with adding a constraint to the length of the database name, but can we
> choose a value greater than 64?
>
> From: Jialin Qiao<mailto:qiaojia...@apache.org>
> Sent: Nov 18, 2022 16:15
> To: dev@iotdb.apache.org<mailto:dev@iotdb.apache.org>
> Subject: Re: Add constraint to the length of database name
>
> +1
> ---
> Jialin Qiao
> Apache IoTDB PMC
>
> 周钰坤 wrote on Fri, Nov 18, 2022 at 15:44:
>
> Hi,
>
> We want to add a constraint to the length of the database name, as most
> popular database systems have such a constraint as well; for example, the
> length of a database name in MySQL shall not exceed 64. Currently, the
> maximum length of a database name, including "root.", is *64* and it is
> immutable. Such a constraint can help avoid some bugs in database and region
> management, since we use database names in directory names, which shall
> not exceed the max name length defined by the file system.
>
> best regards
>
> Yukun Zhou, Tsinghua University
Re: Add constraint to the length of database name
It's the constraint itself that matters, and a length of 64 is sufficient for most cases.

冯 庆新 wrote on Fri, Nov 18, 2022 at 18:02:

> I agree with adding a constraint to the length of the database name, but can we
> choose a value greater than 64?
>
> From: Jialin Qiao<mailto:qiaojia...@apache.org>
> Sent: Nov 18, 2022 16:15
> To: dev@iotdb.apache.org<mailto:dev@iotdb.apache.org>
> Subject: Re: Add constraint to the length of database name
>
> +1
> ---
> Jialin Qiao
> Apache IoTDB PMC
>
> 周钰坤 wrote on Fri, Nov 18, 2022 at 15:44:
>
> > Hi,
> >
> > We want to add a constraint to the length of the database name, as most
> > popular database systems have such a constraint as well; for example, the
> > length of a database name in MySQL shall not exceed 64. Currently, the
> > maximum length of a database name, including "root.", is *64* and it is
> > immutable. Such a constraint can help avoid some bugs in database and region
> > management, since we use database names in directory names, which shall
> > not exceed the max name length defined by the file system.
> >
> > best regards
> >
> > Yukun Zhou, Tsinghua University
Add constraint to the length of database name
Hi,

We want to add a constraint to the length of the database name, as most popular database systems have such a constraint as well; for example, the length of a database name in MySQL shall not exceed 64. Currently, the maximum length of a database name, including "root.", is *64* and it is immutable. Such a constraint can help avoid some bugs in database and region management, since we use database names in directory names, which shall not exceed the max name length defined by the file system.

best regards

Yukun Zhou, Tsinghua University
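As a rough illustration of the proposed rule, the check might look like the sketch below. The function name and error messages are hypothetical, and the sketch assumes the limit is counted in characters (the proposal does not say whether characters or bytes are meant); only the limit of 64, including "root.", comes from the message above.

```python
MAX_DATABASE_NAME_LENGTH = 64  # from the proposal; includes the "root." prefix


def validate_database_name(name: str) -> None:
    """Raise ValueError if `name` violates the proposed length constraint.

    Assumption: length is measured in characters, not bytes.
    """
    if not name.startswith("root."):
        raise ValueError("database name must start with 'root.'")
    if len(name) > MAX_DATABASE_NAME_LENGTH:
        raise ValueError(
            f"database name is {len(name)} characters long; "
            f"the maximum is {MAX_DATABASE_NAME_LENGTH}"
        )


validate_database_name("root.factory_a")  # accepted
```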
Re: [IOTDB-3800]Add Node Type Column to 'SHOW CHILD PATHS' query
Good idea! To stay consistent with our user guide, the leaf node type should be named measurement, since a timeseries stands for the full path from root to leaf in the metadata tree.

--
Yukun Zhou
Tsinghua University

Zhou Yifu wrote on Mon, Jul 11, 2022 at 21:57:

> Hi all,
>
> Currently our show child paths query looks a bit disordered, and it seems that
> only the path names can be displayed. Besides these path names, we cannot
> learn anything more about what these paths mean in our metadata model. So I want
> to add one more 'node type' column to it, and the result can also be sorted by
> these types.
> How about these node types:
> root -> sg internal -> storage group -> internal -> device -> timeseries
> Feel free to discuss here. Thanks a lot!
>
> Regards,
> Yifu Zhou
Re: [VOTE] Apache IoTDB 0.12.6 RC1 release
+1

---
Yukun Zhou
Tsinghua University

Yuan Tian wrote on Fri, Jul 8, 2022 at 18:08:

> Hi all,
>
> Apache IoTDB 0.12.6 is a bug-fix version from 0.12.5. You can get its
> main changes from [5].
>
> Apache IoTDB 0.12.6 has been staged under [2] and it's time to vote
> on accepting it for release. All Maven artifacts are available under [1].
> Voting will be open for 72hr.
> A minimum of 3 binding +1 votes and more binding +1 than binding -1
> are required to pass.
>
> Release tag: v0.12.6
> Hash for the release tag: eeb67fab595e090bfc67c62f45017d618acd24cf
>
> Before voting +1, PMC members are required to download
> the signed source code package, compile it as provided, and test
> the resulting executable on their own platform, along with also
> verifying that the package meets the requirements of the ASF policy
> on releases. [3]
>
> You can achieve the above by following [4].
>
> [ ] +1 accept (indicate what you validated - e.g. performed the
> non-RM items in [4])
> [ ] -1 reject (explanation required)
>
> [1] https://repository.apache.org/content/repositories/orgapacheiotdb-1079
> [2] https://dist.apache.org/repos/dist/dev/iotdb/0.12.6/rc1
> [3] https://www.apache.org/dev/release.html#approving-a-release
> [4] https://cwiki.apache.org/confluence/display/IOTDB/Validating+a+staged+Release
> [5] https://dist.apache.org/repos/dist/dev/iotdb/0.12.6/rc1/RELEASE_NOTES.md
> [6] https://dist.apache.org/repos/dist/dev/iotdb/KEYS
>
> Best,
> -
> Yuan Tian
Re: The structure of distribution
I think choice 2 is better, since configNode and dataNode will be deployed separately, and multi-replica means the dir will be copied.

SpriCoder wrote on Wed, Jul 6, 2022 at 12:04:

> To be more specific:
> In Choice 1, the folders in apache-iotdb-0.14.0-SNAPSHOT-all-bin will look like this:
> ├── sbin
> ├── conf
> ├── config
> │   ├── data
> │   ├── ext
> │   └── logs
> └── data
>     ├── data
>     └── logs
>
> In Choice 2, the folders in apache-iotdb-0.14.0-SNAPSHOT-all-bin will look like this:
> ├── confignode
> │   ├── conf
> │   ├── data
> │   ├── ext
> │   ├── logs
> │   └── sbin
> └── datanode
>     ├── conf
>     ├── data
>     ├── logs
>     └── sbin
>
> -- Original --
> From: "SpriCoder"
> Date: Tue, Jul 5, 2022 05:54 PM
> To: "dev"
> Subject: The structure of distribution
>
> Hi all,
>
> Currently, we have confignode and datanode folders in the distribution. Each
> has conf and sbin, and will hold the default data and system folders.
> There is a need to refactor the distribution structure.
>
> I think there are two choices:
>
> 1. Remove the confignode and datanode folders, and combine their scripts and
> configuration files into the conf and sbin folders under the root. In this way,
> all folders generated by confignode will be put into the config folder, and all
> folders generated by datanode will be put into the data folder.
>
> 2. Use the confignode and datanode folders to manage their scripts and
> configuration files, like: confignode/sbin, confignode/conf, datanode/sbin,
> datanode/conf, etc. In this way, all folders generated by confignode will be
> put into the confignode folder and all folders generated by datanode will be
> put into the datanode folder.
>
> What's your opinion? Looking forward to your reply.
>
> Best,
> ---
> Hongyin Zhang
Refactor the rule of auth check
Hi,

Currently, IoTDB's auth check uses prefix matching, which is inconsistent with the pattern matching used in DDL and DML. Therefore, we want to refactor the rule to pattern matching. For example, an old SQL statement, 'GRANT USER ln_write_user PRIVILEGES INSERT_TIMESERIES on root.ln', won't work any more. The replacement is 'GRANT USER ln_write_user PRIVILEGES INSERT_TIMESERIES on root.ln.**'.

Besides, we introduce the concept of a sub pattern: a pattern's result set contains all the elements of its sub pattern's result set. For example, 'root.sg.d.*' is a sub pattern of 'root.sg.*.*', while 'root.sg.**' is not a sub pattern of 'root.sg.*.*'. When a user is granted a privilege on a pattern, the pattern used in their DDL or DML must be a sub pattern of the privilege pattern, which guarantees that the user won't access timeseries beyond their privilege scope.

To guarantee the efficiency and performance of the auth check, we will perform it after the statement is generated and before it is executed.

Suggestions are welcome.

Best
----
Yukun Zhou
School of Software, Tsinghua University

周钰坤
清华大学 软件学院
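The sub pattern relation described above can be sketched as a containment check between two path patterns. This is an illustrative sketch, not IoTDB's implementation: `is_sub_pattern` is a hypothetical name, and the sketch assumes `*` matches exactly one level and `**` matches one or more levels, as proposed in the wildcard discussion. It walks the candidate sub pattern node by node while tracking every position the granted pattern could be in.

```python
from functools import lru_cache


def is_sub_pattern(sub: str, sup: str) -> bool:
    """True if every path matched by `sub` is also matched by `sup`."""
    sub_nodes, sup_nodes = sub.split("."), sup.split(".")
    n = len(sup_nodes)

    def step(states, name):
        """Advance `sup` by one level; name=None means an arbitrary node name."""
        out = set()
        for p in states:
            if p > 0 and sup_nodes[p - 1] == "**":
                out.add(p)  # '**' absorbs one more level
            if p < n and (sup_nodes[p] in ("*", "**") or sup_nodes[p] == name):
                out.add(p + 1)
        return frozenset(out)

    @lru_cache(maxsize=None)
    def covers(states, j):
        if not states:
            return False
        if j == len(sub_nodes):
            return n in states
        node = sub_nodes[j]
        if node == "**":
            # '**' in `sub` stands for 1, 2, 3, ... arbitrary levels. The state
            # set stops changing after n + 1 steps, so checking that many suffices.
            for _ in range(n + 1):
                states = step(states, None)
                if not covers(states, j + 1):
                    return False
            return True
        return covers(step(states, None if node == "*" else node), j + 1)

    return covers(frozenset({0}), 0)


# The two examples from the proposal:
print(is_sub_pattern("root.sg.d.*", "root.sg.*.*"))  # True
print(is_sub_pattern("root.sg.**", "root.sg.*.*"))   # False
```

Under this model, a statement on `root.ln.d1.s1` is covered by a grant on `root.ln.**`, while a grant on plain `root.ln` covers only that exact path, which is why the old GRANT statement has to be rewritten.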
Re: [DISCUSS] Recommend Select * OR Select ** In IOTDB-SQL
Hi,

For your first question, the answer is yes. A single star * will represent only one level of nodes, wherever it appears in the path.

"select s1, s2 from root.**.d1" will get results like root.sg.d1.s1 and root.sg.group.d1.s2. ** can represent one or more levels of nodes in the path.

Best

Yukun Zhou
School of Software, Tsinghua University

周钰坤
清华大学 软件学院

Xiangwei Wei wrote on Fri, Sep 17, 2021 at 9:27 AM:

> Hi,
>
> The first question is: will a single star * represent only a single path node
> from now on?
>
> And I'd suggest adding one new example, since we usually use sensors in the
> select clause, but that is not covered by the 2nd example.
>
> What about "select s1, s2 from root.**.d1"?
>
> 周钰坤 wrote on Thu, Sep 16, 2021 at 11:05 PM:
>
> > Hi
> >
> > We are developing a new feature on the master branch: support for the
> > wildcard ** in IoTDB-SQL. Here's the link.
> > https://github.com/apache/iotdb/pull/3918
> > We now support both * and **, and apply path patterns in SQL statements.
> > Here are two types of DDL SQL to get all data under one prefixPath, and
> > we want to choose one of them as the default recommended statement
> > presented in the UserGuide docs.
> > 1. select * from .**, e.g. select * from root.**
> > 2. select ** from , e.g. select ** from root
> > Obviously, the second one is simpler than the first one.
> > However, since IoTDB has some hidden bugs in data query and
> > presentation, defining entities clearly in the SQL from clause will make
> > IoTDB run more stably. That's why the second one prevails.
> >
> > Look forward to your suggestions.
> >
> > Best
> >
> > Yukun Zhou
> > School of Software, Tsinghua University
> >
> > 周钰坤
> > 清华大学 软件学院
>
> --
> Best,
> Xiangwei Wei
[DISCUSS] Recommend Select * OR Select ** In IOTDB-SQL
Hi

We are developing a new feature on the master branch: support for the wildcard ** in IoTDB-SQL. Here's the link. https://github.com/apache/iotdb/pull/3918

We now support both * and **, and apply path patterns in SQL statements. Here are two types of DDL SQL to get all data under one prefixPath, and we want to choose one of them as the default recommended statement presented in the UserGuide docs.

1. select * from .**, e.g. select * from root.**
2. select ** from , e.g. select ** from root

Obviously, the second one is simpler than the first one. However, since IoTDB has some hidden bugs in data query and presentation, defining entities clearly in the SQL from clause will make IoTDB run more stably. That's why the second one prevails.

Look forward to your suggestions.

Best

Yukun Zhou
School of Software, Tsinghua University

周钰坤
清华大学 软件学院
[Discuss] Wildcard Improvement In IoTDB-SQL
Hi

We want to introduce a new wildcard, **, to improve the DDL and DML of IoTDB-SQL.

As we all know, a time series is represented by a full path from the root node to the measurement node in the metadata tree. The existing wildcard *, when used in the path, represents one level of the metadata tree, and one or more levels when used at the tail of the path.

After introducing wildcard **, we want wildcard ** to represent one or more levels of the MTree wherever it is used in the path, and wildcard * to represent only one level even if it is used at the tail of the path.

Besides, we want to define the path given by a SQL statement as a pattern of the target paths rather than a prefix. The meaning of some SQL statements may differ from the old versions.

Here are some detailed examples and explanations of the wildcard usage. Please refer to https://shimo.im/docs/8c3Qrp88ph39QHdv/

Best
-
Yukun Zhou
School of Software, Tsinghua University

周钰坤
清华大学 软件学院
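The proposed semantics (`*` is exactly one level, `**` is one or more levels, and the pattern must match the full path rather than a prefix) can be sketched with a small recursive matcher. `matches` is a hypothetical helper written for illustration, not a function from IoTDB.

```python
def matches(pattern: str, path: str) -> bool:
    """Match a full path against a pattern under the proposed semantics."""
    p, nodes = pattern.split("."), path.split(".")

    def go(i, j):
        if i == len(p):
            return j == len(nodes)  # the whole path must be consumed
        if j == len(nodes):
            return False            # pattern left over, path exhausted
        if p[i] == "**":
            # Consume one level, then either keep matching '**' or move on.
            return go(i, j + 1) or go(i + 1, j + 1)
        if p[i] == "*" or p[i] == nodes[j]:
            return go(i + 1, j + 1)
        return False

    return go(0, 0)


print(matches("root.sg.*", "root.sg.d1.s1"))   # False: '*' is a single level
print(matches("root.sg.**", "root.sg.d1.s1"))  # True: '**' spans both levels
print(matches("root.sg", "root.sg.d1"))        # False: a pattern is not a prefix
```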
Re: discuss wildcard improvement in IoTDB-SQL
Hi

Precise path match means precise full path match. Users need to construct the full path pattern, with or without wildcards, when writing SQL. In your example, using "count timeseries root.sg.*.s.*" to count "root.sg.d.s.t" is what we want users to do.

"select *" is simply shorthand for the select clause of a SQL statement when the wildcard * is used.

Sorry about the content: it contained some styling characters, used for emphasis, that confuse readers when viewed as raw text. I had not noticed that before. You may check the e-mail in Gmail. Here's the raw text:

We want to introduce a new wildcard **, to improve the DDL and DML of IoTDB-SQL. Wildcard ** will represent one or more levels in the path, and the existing wildcard * will represent only one level in the path even if it's at the tail of the path.

In DDL, we want to replace the prefix path match with the precise path match, which means the given path will be the pattern of the result elements' full paths. The prefix path match can be implemented by leveraging wildcard **. This change can help users do operations more precisely. For example, the old-version query "count timeseries root.sg" will be implemented as "count timeseries root.sg.**". Besides, the old query "count timeseries root.sg.*.s" will count timeseries like "root.sg.d.s.t" because its prefix path matches "root.sg.*.s", but the new query won't do that any more because its full path doesn't match "root.sg.*.s". The same changes will be applied to the DDL of storage groups, entities and normal nodes.

In DML, after introducing wildcard **, "select *" won't query all timeseries in the subtree represented by the given path, but "select **" will do that. "select *" will only query the timeseries on the next level of the given path. For example, the new query "select * from root.sg.d" will only query timeseries like "root.sg.d.s" but not timeseries like "root.sg.d.a.s", while "select ** from root.sg.d" will query them all. Of course, we want to implement precise path match in DML too.

Best
-
Yukun zhou
School of software, Tsinghua University

周钰坤
清华大学 软件学院

Xiangdong Huang wrote on Fri, Aug 27, 2021 at 1:47 PM:

> Hi,
>
> I am lost when reading the content what is *" "* *precise etc.?
>
> > In DDL, we want to replace the prefix path match with the *precise path
> > match*
>
> What is *precise path match*?
>
> > For example the old versions query "count timeseries root.sg " will be
> > implemented as "count timeseries root.sg.** ".
>
> OK, it is clear.
>
> > Besides, the old query
> > "count timeseries root.sg.*.s " will count timeseries like "root.sg.d.s.t "
> > because its prefix path matches "root.sg.*.s " but the new query won't do
> > that any more because its full path doesn't match "root.sg.*.s ".
>
> So if we want to count timeseries "root.sg.d.s.t " (I mean,
> root.sg.d.s.t is a timeseries, not a prefix path),
> we need to use "count timeseries root.sg.*.s.*"?
>
> In your DML paragraph, I cannot understand what *"select * "* is.
>
> Best,
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
>
> 黄向东
> 清华大学 软件学院
>
> 周钰坤 wrote on Thu, Aug 26, 2021 at 12:47 PM:
>
> > Hi
> >
> > We want to introduce a new *wildcard ***, to improve the DDL and DML of
> > IoTDB-SQL.
> > Wildcard ** will represent one or more levels in the path and the existing
> > wildcard * will only represent one level in the path even if it's at the
> > tail of the path.
> >
> > In DDL, we want to replace the prefix path match with the *precise path
> > match*, which means the given path will be the pattern of result elements'
> > full path. The prefix path match could be implemented by leveraging
> > wildcard **. This change can help users do operations more precisely.
> > For example the old versions query "count timeseries root.sg " will be
> > implemented as "count timeseries root.sg.** ". Besides, the old query
> > "count timeseries root.sg.*.s " will count timeseries like "root.sg.d.s.t "
> > because its prefix path matches "root.sg.*.s " but the new query won't do
> > that any more because its full path doesn't match "root.sg.*.s ".
> > The same changes will be applied to the DDL of storage groups, entities and
> > normal nodes.
> >
> > In DML, after introducing wildcard **, *"select * "* won't query all
> > timeseries in the subtree represented by the given path, but *"select ** "*
> > will do that. *"select * "* will only query the timeseries on the next
> > level of the given path.
> > For example, the new query "select * from root.sg.d" will only query
> > timeseries like "root.sg.d.s" but not query timeseries like
> > "root.sg.d.a.s", while "select ** from root.sg.d" will query them all.
> > Of course, we want to implement *precise path match* in DML too.
> >
> > Best
> > -
> > Yukun zhou
> > School of software, Tsinghua University
> >
> > 周钰坤
> > 清华大学 软件学院
discuss wildcard improvement in IoTDB-SQL
Hi

We want to introduce a new *wildcard ***, to improve the DDL and DML of IoTDB-SQL. Wildcard ** will represent one or more levels in the path, and the existing wildcard * will represent only one level in the path even if it's at the tail of the path.

In DDL, we want to replace the prefix path match with the *precise path match*, which means the given path will be the pattern of the result elements' full paths. The prefix path match can be implemented by leveraging wildcard **. This change can help users do operations more precisely. For example, the old-version query "count timeseries root.sg" will be implemented as "count timeseries root.sg.**". Besides, the old query "count timeseries root.sg.*.s" will count timeseries like "root.sg.d.s.t" because its prefix path matches "root.sg.*.s", but the new query won't do that any more because its full path doesn't match "root.sg.*.s". The same changes will be applied to the DDL of storage groups, entities and normal nodes.

In DML, after introducing wildcard **, *"select * "* won't query all timeseries in the subtree represented by the given path, but *"select ** "* will do that. *"select * "* will only query the timeseries on the next level of the given path. For example, the new query "select * from root.sg.d" will only query timeseries like "root.sg.d.s" but not timeseries like "root.sg.d.a.s", while "select ** from root.sg.d" will query them all. Of course, we want to implement *precise path match* in DML too.

Best
-
Yukun zhou
School of software, Tsinghua University

周钰坤
清华大学 软件学院
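To make the difference between the old prefix match and the proposed precise match concrete, the sketch below translates a path pattern into a regular expression and counts matching series both ways. The helper names and the sample series list are made up for illustration, and `count_prefix` is only a simplified model of the old behaviour as described in this message, not the actual old implementation.

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Translate a path pattern into a regex source string."""
    parts = []
    for node in pattern.split("."):
        if node == "**":
            parts.append(r"[^.]+(?:\.[^.]+)*")  # one or more levels
        elif node == "*":
            parts.append(r"[^.]+")              # exactly one level
        else:
            parts.append(re.escape(node))
    return r"\.".join(parts)


def count_precise(pattern, series):
    """New behaviour: the pattern must match the full path."""
    rx = re.compile(pattern_to_regex(pattern))
    return sum(1 for s in series if rx.fullmatch(s))


def count_prefix(pattern, series):
    """Model of the old behaviour: the pattern may match just a prefix."""
    rx = re.compile(pattern_to_regex(pattern) + r"(?:\.[^.]+)*")
    return sum(1 for s in series if rx.fullmatch(s))


# Hypothetical set of time series paths.
series = ["root.sg.d.s", "root.sg.d.s.t", "root.sg.d.a.s", "root.other.d.s"]

print(count_prefix("root.sg.*.s", series))    # 2: also counts root.sg.d.s.t
print(count_precise("root.sg.*.s", series))   # 1: only root.sg.d.s
print(count_precise("root.sg.**", series))    # 3: replaces old "root.sg"
```

The same model covers the DML example: `root.sg.d.*` (what "select * from root.sg.d" expands to) matches only `root.sg.d.s`, while `root.sg.d.**` matches all three series under `root.sg.d`.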
Re: Measurement Template New Constraints
As far as I know, the user interface of templates has been discussed and defined in this doc, Univariate/Multivariate Time Series and Schema Template User Manual (shimo.im) <https://shimo.im/docs/eME9YgS50wIqhOiO>, but the current implementation only supports Session.

The UI of the "assign template" operation has already been defined. The UI of the "unassign template" operation needs further discussion and definition. The implementation of the whole feature needs more work on related modules.

The constraint mentioned in the mail above could be added to and implemented in the current metadata module first.

---
Yukun Zhou
School of Software, Tsinghua University

Xiangdong Huang wrote on Mon, Jul 26, 2021 at 4:37 PM:

> "How to assign and unassign a template to a node" should be defined clearly
> before implementation.
>
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
>
> 黄向东
> 清华大学 软件学院
>
> Jialin Qiao wrote on Sat, Jul 24, 2021 at 2:40 PM:
>
> > +1
> > ---
> > Jialin Qiao
> > School of Software, Tsinghua University
> >
> > 乔嘉林
> > 清华大学 软件学院
> >
> > 周钰坤 wrote on Sat, Jul 24, 2021 at 10:07 AM:
> >
> > > Hi,
> > >
> > > Apache IoTDB supports the Measurement Template feature from version 0.13.
> > > The current version allows users to set different templates on different
> > > nodes of a path, and the template nearest to the current node will be valid.
> > > This can result in schema and data loss when a user sets a new template
> > > that doesn't contain the schemas in the existing upper template.
> > > Therefore, a new constraint will be introduced. If a template has been
> > > set on a node, it will be forbidden to set any template on the ancestors or
> > > descendants of this node, like the constraint on storage groups.
> > >
> > > Thanks,
> > > Yukun Zhou
Measurement Template New Constraints
Hi,

Apache IoTDB supports the Measurement Template feature from version 0.13. The current version allows users to set different templates on different nodes of a path, and the template nearest to the current node will be valid. This can result in schema and data loss when a user sets a new template that doesn't contain the schemas in the existing upper template.

Therefore, a new constraint will be introduced. If a template has been set on a node, it will be forbidden to set any template on the ancestors or descendants of this node, like the constraint on storage groups.

Thanks,
Yukun Zhou
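The new constraint amounts to an ancestor/descendant check on node paths, which can be sketched as follows. `can_set_template` is a hypothetical helper, and rejecting a second template on the very same node is an assumption of this sketch; the message above only mentions ancestors and descendants.

```python
def can_set_template(node, templated):
    """Return True if a template may be set on `node`.

    `templated` is the set of node paths that already have a template.
    Assumption: setting a template on an already-templated node is rejected too.
    """
    for other in templated:
        if other == node:
            return False
        if other.startswith(node + "."):  # `other` is a descendant of `node`
            return False
        if node.startswith(other + "."):  # `other` is an ancestor of `node`
            return False
    return True


templated = {"root.sg.d1"}
print(can_set_template("root.sg", templated))        # False: ancestor of root.sg.d1
print(can_set_template("root.sg.d1.x", templated))   # False: descendant of root.sg.d1
print(can_set_template("root.sg.d2", templated))     # True: unrelated sibling
```

Comparing against `node + "."` rather than the bare prefix keeps siblings such as `root.sg.d10` from being mistaken for descendants of `root.sg.d1`.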
Metadata New Function and Restriction
Hi,

The following changes will be made to the metadata module in the current version (0.13):

1. It will be forbidden to set a MeasurementMNode as a device or to add a new measurement under an existing MeasurementMNode, which means a MeasurementMNode will always be a leaf of the MTree.

2. The implementation of the Device/Entity node will be enhanced. A new class, EntityMNode, will be introduced to implement relevant functions that Template doesn't support. As a result, it will not be allowed to set a storage group on an EntityMNode. In other words, adding a MeasurementMNode to a StorageGroupMNode as a child will be forbidden. However, setting a storage group on the root node will be allowed in the future.

3. To save memory, leveled MeasurementMNodes will be introduced. A SimpleMeasurementMNode will only support the lastCache and trigger functions. A CompletedMeasurement will support all functions, including alias and tag/attribute. If none of these functions are used, the MeasurementMNode won't be created if it is using a Template.

Thanks,
Yukun Zhou
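Restriction 1 (a measurement is always a leaf) can be illustrated with a toy path registry. `ToyMTree` and its method are hypothetical and track only path strings; the real MTree node classes named above are not modelled here.

```python
class ToyMTree:
    """Toy registry enforcing that measurement nodes are always leaves."""

    def __init__(self):
        self.measurements = set()  # full paths of measurement nodes
        self.internal = {"root"}   # paths of all non-leaf nodes

    def create_timeseries(self, path):
        nodes = path.split(".")
        prefixes = [".".join(nodes[:i]) for i in range(1, len(nodes))]
        for prefix in prefixes:
            if prefix in self.measurements:
                # Would add a child under an existing measurement.
                raise ValueError(f"{prefix} is a measurement and must stay a leaf")
        if path in self.internal:
            # Would turn a device/internal node into a measurement.
            raise ValueError(f"{path} already has children")
        self.internal.update(prefixes)
        self.measurements.add(path)


tree = ToyMTree()
tree.create_timeseries("root.sg.d1.s1")
tree.create_timeseries("root.sg.d1.s2")   # fine: sibling measurement
# tree.create_timeseries("root.sg.d1.s1.x") would raise ValueError
```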