[New Feature] PB Tree for Massive Schema Management

2023-06-19 Thread
Hi,

We're going to introduce PB Tree (Prefix-B+ Tree), a new SchemaEngine
mode, in V1.2.0. It supports evicting temporarily unused schema to
disk and loading it back on demand at runtime. With PB Tree, users will
no longer be limited by memory when managing schema, especially in
scenarios with massive numbers of time series.

To enable PB Tree in IoTDB, set the parameter schema_engine_mode in
iotdb-commons.properties to PB_Tree [1]. All the memory allocated to
SchemaRegion will then be used by PB Tree. The allocation can be
adjusted through the parameters
storage_query_schema_consensus_free_memory_proportion and
schema_memory_proportion in iotdb-commons.properties.

PB Tree is a new implementation of MTree in SchemaRegion, consisting
of a PB Tree file and a prefix-tree-structured cache. The PB Tree
file maintains the prefix tree structure on disk: each node record
stores a pointer to the first disk page that holds the node's
children. Sibling nodes are organized in a B+ Tree so that a single
child node can be found quickly. The cache is similar to the existing
MTree implementation in SchemaRegion. The main difference is that the
children of a node may not be cached in memory; in that case they are
read from disk by searching the B+ Tree located by the pointer, which
is stored in the node record and cached in the in-memory node object.
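
To make the cache behaviour above more concrete, here is a minimal sketch
in Python. It is only an illustration under stated assumptions, not the
actual implementation: PBTreeFile, CachedNode, find_child, get_child and
evict_children are hypothetical names, and a binary search over one sorted
page stands in for the real B+ Tree search.

```python
import bisect


class PBTreeFile:
    """Hypothetical stand-in for the on-disk file: page pointer -> sorted records."""

    def __init__(self, pages):
        # Each record is (child name, pointer to the page holding that child's children).
        self.pages = {ptr: sorted(recs, key=lambda r: r[0]) for ptr, recs in pages.items()}

    def find_child(self, page_ptr, name):
        # Assumption: a binary search over one sorted page replaces the B+ Tree descent.
        records = self.pages.get(page_ptr, [])
        names = [r[0] for r in records]
        i = bisect.bisect_left(names, name)
        if i < len(names) and names[i] == name:
            return records[i][1]
        raise KeyError(name)


class CachedNode:
    """In-memory node; its children may or may not currently be cached."""

    def __init__(self, name, child_page_ptr=None):
        self.name = name
        self.child_page_ptr = child_page_ptr  # pointer cached from the node record
        self.children = {}                    # only the children held in memory

    def get_child(self, name, disk):
        if name not in self.children:
            # Cache miss: locate the child on disk through the stored pointer.
            ptr = disk.find_child(self.child_page_ptr, name)
            self.children[name] = CachedNode(name, ptr)
        return self.children[name]

    def evict_children(self):
        # Dropping the cached subtree is safe because it can be reloaded from disk.
        self.children.clear()


disk = PBTreeFile({1: [("sg", 2)], 2: [("d2", None), ("d1", 3)], 3: [("s1", None)]})
root = CachedNode("root", child_page_ptr=1)
leaf = root.get_child("sg", disk).get_child("d1", disk).get_child("s1", disk)
print(leaf.name)
```

After evict_children() is called on a node, the next get_child() on it goes
back to the simulated disk file, which is the evict-and-reload cycle described
above.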

Reference:
[1] https://iotdb.apache.org/zh/UserGuide/V1.2.x/Reference/Common-Config-Manual.html#%E5%85%83%E6%95%B0%E6%8D%AE%E5%BC%95%E6%93%8E%E9%85%8D%E7%BD%AE

Thanks.

Yunkun Zhou
Apache IoTDB Committer


Re: Re: Add constraint to the length of database name

2022-11-18 Thread
We support both a suffix path, which is concatenated to the prefix path in
the "from" clause, and a full path, starting with "root", in the "where"
clause. The "root" is the identifier that lets us recognize a full path.
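
A minimal sketch of that rule, in Python, may help. The function name
resolve_where_path and the example paths are hypothetical and only
illustrate how a leading "root" distinguishes the two kinds of path; this
is not the actual parser code.

```python
def resolve_where_path(path, from_prefix):
    """Return the full path for a path written in the "where" clause."""
    # A path whose first node is "root" is already a full path.
    if path == "root" or path.startswith("root."):
        return path
    # Otherwise it is a suffix path and is concatenated to the "from" prefix.
    return from_prefix + "." + path


print(resolve_where_path("s1", "root.sg.d1"))
print(resolve_where_path("root.sg.d2.s1", "root.sg.d1"))
```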

Chao Wang wrote on Fri, Nov 18, 2022 at 18:24:

> +1, but why include root? I think the user could omit the "root" now that
> we change sg to database, and we could omit the root in the file path.
> To stay compatible, both "select from root.dbname" and "select from dbname"
> could be accepted.
>
>
>
>
> Thanks!
>
>
> Chao Wang
> BONC ltd
> ccgow...@163.com
> On Nov 18, 2022 at 18:02, 冯 庆新 wrote:
> Agree with 'add constraint to the length of database name', but can we
> choose a value greater than 64?
>
> From: Jialin Qiao<mailto:qiaojia...@apache.org>
> Sent: Nov 18, 2022 16:15
> To: dev@iotdb.apache.org<mailto:dev@iotdb.apache.org>
> Subject: Re: Add constraint to the length of database name
>
> +1
> —
> Jialin Qiao
> Apache IoTDB PMC
>
> 周钰坤 wrote on Fri, Nov 18, 2022 at 15:44:
>
> Hi,
>
> We want to add a constraint on the length of database names, as most
> popular database systems have such a constraint; for example, the
> length of a database name in MySQL shall not exceed 64. Currently, the
> maximum length of a database name, including "root.", is *64* and it is
> immutable. Such a constraint can help avoid some bugs in database and region
> management, since we use database names in directory names, which shall
> not exceed the maximum name length defined by the file system.
>
> best regards
> 
> Yukun Zhou, Tsinghua University
>
>


Re: Add constraint to the length of database name

2022-11-18 Thread
It's the constraint that matters and the length 64 is sufficient for most
cases.

冯 庆新 wrote on Fri, Nov 18, 2022 at 18:02:

> Agree with 'add constraint to the length of database name', but can we
> choose a value greater than 64?
>
> From: Jialin Qiao<mailto:qiaojia...@apache.org>
> Sent: Nov 18, 2022 16:15
> To: dev@iotdb.apache.org<mailto:dev@iotdb.apache.org>
> Subject: Re: Add constraint to the length of database name
>
> +1
> —————
> Jialin Qiao
> Apache IoTDB PMC
>
> 周钰坤 wrote on Fri, Nov 18, 2022 at 15:44:
> >
> > Hi,
> >
> > We want to add a constraint on the length of database names, as most
> > popular database systems have such a constraint; for example, the
> > length of a database name in MySQL shall not exceed 64. Currently, the
> > maximum length of a database name, including "root.", is *64* and it is
> > immutable. Such a constraint can help avoid some bugs in database and
> > region management, since we use database names in directory names, which
> > shall not exceed the maximum name length defined by the file system.
> >
> > best regards
> > 
> > Yukun Zhou, Tsinghua University
>
>


Add constraint to the length of database name

2022-11-17 Thread
Hi,

We want to add a constraint on the length of database names, as most
popular database systems have such a constraint; for example, the
length of a database name in MySQL shall not exceed 64. Currently, the
maximum length of a database name, including "root.", is *64* and it is
immutable. Such a constraint can help avoid some bugs in database and region
management, since we use database names in directory names, which shall
not exceed the maximum name length defined by the file system.
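
As a rough sketch of the check, in Python: the names
MAX_DATABASE_NAME_LENGTH and check_database_name are hypothetical, and
counting characters rather than bytes is an assumption, since the proposal
does not say which is meant.

```python
MAX_DATABASE_NAME_LENGTH = 64  # the limit proposed above, including "root."


def check_database_name(name):
    """Raise ValueError if the database name breaks the proposed constraint."""
    if not name.startswith("root."):
        raise ValueError("database name must start with 'root.': " + name)
    # Assumption: the length is counted in characters, "root." included.
    if len(name) > MAX_DATABASE_NAME_LENGTH:
        raise ValueError(
            "database name is %d characters long, limit is %d"
            % (len(name), MAX_DATABASE_NAME_LENGTH)
        )
    return name


check_database_name("root.ln")           # accepted
check_database_name("root." + "a" * 59)  # exactly 64 characters, accepted
```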

best regards

Yukun Zhou, Tsinghua University


Re: [IOTDB-3800]Add Node Type Column to 'SHOW CHILD PATHS' query

2022-07-11 Thread
Good idea!

To stay consistent with our user guide, the leaf node type should be named
measurement, since a timeseries stands for a full path from root to
leaf in the metadata tree.

--
Yukun Zhou
Tsinghua University

Zhou Yifu wrote on Mon, Jul 11, 2022 at 21:57:
>
> Hi all,
>
> Currently the result of our show child paths query looks a bit disordered,
> and only the path names are displayed. Beyond these path names, we cannot
> tell what each path means in our metadata model. So I want to add a
> 'node type' column to it, and to allow the result to be sorted by node
> type.
> How about the following node types:
> root -> sg internal -> storage group -> internal -> device -> timeseries
> Feel free to discuss here, Thanks a lot!
>
> Regards,
> Yifu Zhou


Re: [VOTE] Apache IoTDB 0.12.6 RC1 release

2022-07-11 Thread
+1

---
Yukun Zhou
Tsinghua University

Yuan Tian wrote on Fri, Jul 8, 2022 at 18:08:
>
> Hi all,
>
> Apache IoTDB 0.12.6 is a bug-fix version from 0.12.5. You can get its
> main changes from [5].
>
> Apache IoTDB 0.12.6 has been staged under [2] and it’s time to vote
> on accepting it for release.  All Maven artifacts are available under [1].
> Voting will be open for 72hr.
> A minimum of 3 binding +1 votes and more binding +1 than binding -1
> are required to pass.
>
> Release tag: v0.12.6
> Hash for the release tag: eeb67fab595e090bfc67c62f45017d618acd24cf
>
> Before voting +1, PMC members are required to download
> the signed source code package, compile it as provided, and test
> the resulting executable on their own platform, along with also
> verifying that the package meets the requirements of the ASF policy
> on releases. [3]
>
> You can achieve the above by following [4].
>
> [ ]  +1 accept (indicate what you validated - e.g. performed the
> non-RM items in [4])
> [ ]  -1 reject (explanation required)
>
>
> [1] https://repository.apache.org/content/repositories/orgapacheiotdb-1079
> [2] https://dist.apache.org/repos/dist/dev/iotdb/0.12.6/rc1
> [3] https://www.apache.org/dev/release.html#approving-a-release
> [4] https://cwiki.apache.org/confluence/display/IOTDB/Validating+a+staged+Release
> [5] https://dist.apache.org/repos/dist/dev/iotdb/0.12.6/rc1/RELEASE_NOTES.md
> [6] https://dist.apache.org/repos/dist/dev/iotdb/KEYS
>
>
> Best,
> -
> Yuan Tian


Re: The structure of distribution

2022-07-05 Thread
I think choice 2 is better, since configNode and dataNode will be
deployed separately, and multi-replica means the dir will be copied.

SpriCoder wrote on Wed, Jul 6, 2022 at 12:04:
>
> To be more specific:
> In Choice 1, the folders in apache-iotdb-0.14.0-SNAPSHOT-all-bin will look like this:
> ├── sbin
> ├── conf
> ├── config
> │   ├── data
> │   ├── ext
> │   └── logs
> └── data
>     ├── data
>     └── logs
>
> In Choice 2, the folders in apache-iotdb-0.14.0-SNAPSHOT-all-bin will look like this:
> ├── confignode
> │   ├── conf
> │   ├── data
> │   ├── ext
> │   ├── logs
> │   └── sbin
> └── datanode
>     ├── conf
>     ├── data
>     ├── logs
>     └── sbin
>
> -- Original --
> From: "SpriCoder"
> Date: Tue, Jul 5, 2022 05:54 PM
> To: "dev"
> Subject: The structure of distribution
>
>
>
> Hi all,
>
> Currently, we have confignode and datanode folders in the distribution. Each
> has conf and sbin, and will hold the default data and system folders. There
> is a need to refactor the distribution structure.
>
> I think there are two choices:
>
> 1. Remove the confignode and datanode folders, and combine their scripts and
> configuration files into the conf and sbin folders under the root. In this
> way, all folders generated by confignode will be put into the config folder,
> and all folders generated by datanode will be put into the data folder.
>
> 2. Use the confignode and datanode folders to manage their scripts and
> configuration files, like: confignode/sbin, confignode/conf, datanode/sbin,
> datanode/conf, etc. In this way, all folders generated by confignode will be
> put into the confignode folder and all folders generated by datanode will be
> put into the datanode folder.
>
>
>
>
> What's your opinion? Looking forward to your reply.
>
>
> Best,
>
> ---
>
> Hongyin Zhang


Refactor the rule of auth check

2022-05-06 Thread
Hi

Currently, the rule of IoTDB's auth check is prefix match, which is
inconsistent with the pattern match used in DDL and DML. Therefore, we
want to refactor the rule to pattern match.
For example, an old SQL statement, 'GRANT USER ln_write_user PRIVILEGES
INSERT_TIMESERIES on root.ln', won't work any more. The replacement is
'GRANT USER ln_write_user PRIVILEGES INSERT_TIMESERIES on root.ln.**'.

Besides, we introduce the concept of a sub pattern: a pattern's result
set contains all the elements of the result set of each of its sub
patterns. For example, 'root.sg.d.*' is a sub pattern of
'root.sg.*.*', while 'root.sg.**' is not a sub pattern of
'root.sg.*.*'.
When a user is granted a privilege on a pattern, the pattern used in
their DDL or DML must be a sub pattern of the privilege pattern, which
guarantees that the user won't access timeseries beyond their
privilege scope.
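
One possible way to decide the sub pattern relation is sketched below in
Python. It is an illustration only, not the planned implementation: the
names is_sub_pattern and _step are hypothetical, and it assumes that *
matches exactly one level and ** matches one or more levels. Each pattern is
treated as a small state machine over path nodes, and the search looks for a
path accepted by the candidate sub pattern but rejected by the privilege
pattern.

```python
def _step(tokens, states, name):
    """Advance a set of pattern positions by one path node called `name`."""
    nxt = set()
    for k in states:
        # Consume the next token if it accepts this node name.
        if k < len(tokens) and tokens[k] in ("*", "**", name):
            nxt.add(k + 1)
        # A '**' that already matched one level may absorb further levels.
        if k > 0 and tokens[k - 1] == "**":
            nxt.add(k)
    return frozenset(nxt)


def is_sub_pattern(sub, sup):
    """True if every path matched by `sub` is also matched by `sup`."""
    a, b = sub.split("."), sup.split(".")
    # Names that appear in neither pattern all behave alike, so the single
    # placeholder None stands in for every other node name.
    alphabet = {t for t in a + b if t not in ("*", "**")} | {None}
    start = (0, frozenset({0}))
    seen, todo = {start}, [start]
    while todo:
        ka, sb = todo.pop()
        for name in alphabet:
            nb = _step(b, sb, name)
            for na in _step(a, {ka}, name):
                if na == len(a) and len(b) not in nb:
                    return False  # found a path in `sub` that `sup` misses
                state = (na, nb)
                if state not in seen:
                    seen.add(state)
                    todo.append(state)
    return True


print(is_sub_pattern("root.sg.d.*", "root.sg.*.*"))
print(is_sub_pattern("root.sg.**", "root.sg.*.*"))
```

With the two examples from the paragraph above, the first call reports a sub
pattern and the second does not, because 'root.sg.**' also matches
three-level paths such as root.sg.d.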

To guarantee the efficiency and performance of auth check, we will
perform the auth check after the statement is generated and before
the statement is executed.

Suggestions are welcome.


Best
----
Yukun Zhou
School of Software, Tsinghua University

周钰坤
清华大学 软件学院


Re: [DISCUSS] Recommend Select * OR Select ** In IOTDB-SQL

2021-09-17 Thread
Hi

For your first question, the answer is yes. A single star * will
represent only one level of nodes, wherever it appears in the path.

"select s1, s2 from root.**.d1" will get results like root.sg.d1.s1 and
root.sg.group.d1.s2. ** can represent one or more levels of nodes in
the path.


Best

Yukun Zhou
School of Software, Tsinghua University

周钰坤
清华大学 软件学院

Xiangwei Wei wrote on Fri, Sep 17, 2021 at 9:27 AM:
>
> Hi,
>
> The first question is: will a single star * represent only a single path
> node from now on?
>
> And I suggest one new example, since we usually use sensors in the select
> clause, but they are not in the second example.
>
> What about "select s1, s2 from root.**.d1"?
>
> 周钰坤 wrote on Thu, Sep 16, 2021 at 11:05 PM:
>
> > Hi
> >
> > We are developing a new feature on the master branch: support for
> > wildcard ** in IoTDB-SQL. Here's the link.
> > https://github.com/apache/iotdb/pull/3918
> > Since we support both * and ** and apply path patterns in SQL statements,
> > there are two types of SQL statement for getting all data under one
> > prefixPath, and we want to choose one of them as the default recommended
> > statement presented in the UserGuide docs.
> > 1. select * from .**, e.g. select * from root.**
> > 2. select ** from , e.g. select ** from root
> > Obviously, the second one is simpler than the first one.
> > However, since IoTDB has some hidden bugs in data query and
> > presentation, defining entities clearly in the SQL from clause makes
> > IoTDB run more stably. That's why the second one prevails.
> >
> > Look forward to your suggestions.
> >
> > Best
> > 
> > Yukun Zhou
> > School of Software, Tsinghua University
> >
> > 周钰坤
> > 清华大学 软件学院
> >
>
>
> --
> Best,
> Xiangwei Wei


[DISCUSS] Recommend Select * OR Select ** In IOTDB-SQL

2021-09-16 Thread
Hi

We are developing a new feature on the master branch: support for
wildcard ** in IoTDB-SQL. Here's the link.
https://github.com/apache/iotdb/pull/3918
Since we support both * and ** and apply path patterns in SQL statements,
there are two types of SQL statement for getting all data under one
prefixPath, and we want to choose one of them as the default recommended
statement presented in the UserGuide docs.
1. select * from .**, e.g. select * from root.**
2. select ** from , e.g. select ** from root
Obviously, the second one is simpler than the first one.
However, since IoTDB has some hidden bugs in data query and
presentation, defining entities clearly in the SQL from clause makes
IoTDB run more stably. That's why the second one prevails.

Look forward to your suggestions.

Best

Yukun Zhou
School of Software, Tsinghua University

周钰坤
清华大学 软件学院


[Discuss] Wildcard Improvement In IoTDB-SQL

2021-08-27 Thread
Hi

We want to introduce a new wildcard **, to improve the DDL and DML of IoTDB-SQL.

As we all know, a time series is represented by a full path from the
root node to the measurement node in the metadata tree. The existing
wildcard *, when used in the path, represents one level of the
metadata tree and one or more levels when used at the tail of the
path.

After introducing wildcard **, we want wildcard ** to represent one or
more levels of the MTree wherever used in the path, and wildcard * to
only represent one level even if used at the tail of the path.
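
The proposed semantics can be sketched as a small matcher in Python. This
is an illustration under the rules stated above (* is exactly one level, **
is one or more levels, and the pattern must cover the full path); the
function name matches is hypothetical and this is not the MTree code.

```python
from functools import lru_cache


def matches(pattern, path):
    """True if the full `path` matches `pattern` under the proposed rules."""
    p, n = pattern.split("."), path.split(".")

    @lru_cache(maxsize=None)
    def go(i, j):
        if i == len(p):
            return j == len(n)  # the whole path must be consumed
        if j == len(n):
            return False        # every remaining token needs at least one level
        if p[i] == "**":
            # '**' takes one level, then either finishes or keeps absorbing.
            return go(i + 1, j + 1) or go(i, j + 1)
        if p[i] == "*" or p[i] == n[j]:
            return go(i + 1, j + 1)
        return False

    return go(0, 0)


print(matches("root.sg.*", "root.sg.d"))     # one level after root.sg
print(matches("root.sg.*", "root.sg.d.s"))   # two levels, so * at the tail no longer matches
print(matches("root.sg.**", "root.sg.d.s"))  # ** covers both levels
```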

Besides, we want to define the path given by a sql statement as a
pattern of the target paths rather than a prefix.

Some SQL statements' meaning may be different from the old versions.
Here are some detailed examples and explanations of the wildcard
usage. Please refer to https://shimo.im/docs/8c3Qrp88ph39QHdv/

Best
-
Yukun Zhou
School of Software, Tsinghua University

周钰坤
清华大学 软件学院


Re: discuss wildcard improvement in IoTDB-SQL

2021-08-27 Thread
Hi

Precise path match means precise full path match. Users need to
construct the full path pattern, with or without wildcards, when writing
SQL.
In your example, using "count timeseries root.sg.*.s.*" to count
"root.sg.d.s.t" is what we want users to do.

"select *" is a simple representation of the select component in a SQL
statement when using wildcard *.

Sorry about the content: it contained some style characters, used for
emphasis, which confuse readers when viewed as raw text.
I had not noticed that before. You may check out the e-mail in Gmail.
Here's the raw text.

We want to introduce a new wildcard **, to improve the DDL and DML of IoTDB-SQL.
Wildcard ** will represent one or more levels in the path and the
existing wildcard * will only represent one level in the path even if
it's at the tail of the path.

In DDL, we want to replace the prefix path match with the precise path
match, which means the given path will be the pattern of the result
elements' full paths. The prefix path match can still be expressed by
using wildcard **. This change helps users operate more precisely.
For example, the old-version query "count timeseries root.sg" will be
written as "count timeseries root.sg.**". Besides, the old query
"count timeseries root.sg.*.s" counts timeseries like
"root.sg.d.s.t" because its prefix path matches "root.sg.*.s", but
the new query won't do that any more because its full path doesn't
match "root.sg.*.s".
The same changes will be applied to the DDL of storage groups, entities and
normal nodes.
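
To compare the two behaviours side by side, here is a small Python sketch.
It is only an illustration: old_prefix_match and new_full_match are
hypothetical names, the sample timeseries are made up, and the regular
expression translation is just one convenient way to express the new rule.

```python
import re


def old_prefix_match(pattern, path):
    """Old rule: the pattern only has to match a prefix of the path."""
    p, n = pattern.split("."), path.split(".")
    return len(n) >= len(p) and all(a in ("*", b) for a, b in zip(p, n))


def new_full_match(pattern, path):
    """New rule: the pattern has to match the full path."""
    parts = []
    for tok in pattern.split("."):
        if tok == "**":
            parts.append(r"[^.]+(?:\.[^.]+)*")  # one or more levels
        elif tok == "*":
            parts.append(r"[^.]+")              # exactly one level
        else:
            parts.append(re.escape(tok))
    return re.fullmatch(r"\.".join(parts), path) is not None


# Made-up timeseries, used only for this comparison.
series = ["root.sg.d.s", "root.sg.d.s.t", "root.sg.d2.s"]


def count(match, pattern):
    return sum(1 for s in series if match(pattern, s))


print(count(old_prefix_match, "root.sg.*.s"))  # also counts root.sg.d.s.t
print(count(new_full_match, "root.sg.*.s"))    # root.sg.d.s.t is no longer counted
print(count(new_full_match, "root.sg.**"))     # replaces the old "count timeseries root.sg"
```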

In DML, after introducing wildcard **, "select *" won't query all
timeseries in the subtree represented by the given path, but "select
**" will. "select *" will only query the timeseries on the
next level of the given path.
For example, the new query "select * from root.sg.d" will only query
timeseries like "root.sg.d.s" but not timeseries like
"root.sg.d.a.s", while "select ** from root.sg.d" will query them all.
Of course, we want to implement precise path match in DML too.

Best
-
Yukun zhou
School of software, Tsinghua University

周钰坤
清华大学 软件学院


Xiangdong Huang wrote on Fri, Aug 27, 2021 at 1:47 PM:
>
> Hi,
>
> I am lost when reading the content what is *" "* *precise etc.?
>
> > In DDL, we want to replace the prefix path match with the *precise path
> > match*
>
> What is *precise path match*?
>
> > For example the old versions query "count timeseries root.sg " will be
> > implemented as "count timeseries root.sg.** ".
>
> OK, it is clear.
>
>
> > Besides,  the old query
> > "count timeseries root.sg.*.s " will count timeseries like "root.sg.d.s.t "
> > because its prefix path matches "root.sg.*.s " but the new query won't do
> > that any more because its full path doesn't match "root.sg.*.s ".
>
> So if we want to count timeseries "root.sg.d.s.t " (I mean,
> root.sg.d.s.t is a timeseries, not a prefix path),
> we need to use "count timeseries root.sg.*.s.*"?
>
> in your DML paragraph, I can not understand what  *"select * "* is.
>
> Best,
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
> 周钰坤 wrote on Thu, Aug 26, 2021 at 12:47 PM:
> >
> > Hi
> >
> > We want to introduce a new *wildcard ***, to improve the DDL and DML of
> > IoTDB-SQL.
> > Wildcard ** will represent one or more levels in the path and the existing
> > wildcard * will only represent one level in the path even if it's at the
> > tail of the path.
> >
> > In DDL, we want to replace the prefix path match with the *precise path
> > match*, which means the given path will be the pattern of result elements'
> > full path. The prefix path match could be implemented by leveraging
> > wildcard **. This change can help users do operations more precisely.
> > For example the old versions query "count timeseries root.sg " will be
> > implemented as "count timeseries root.sg.** ". Besides,  the old query
> > "count timeseries root.sg.*.s " will count timeseries like "root.sg.d.s.t "
> > because its prefix path matches "root.sg.*.s " but the new query won't do
> > that any more because its full path doesn't match "root.sg.*.s ".
> > The same changes will be applied to the DDL of storage groups, entities
> > and normal nodes.
> >
> > In DML, after introducing wildcard **, *"select * "* won't query all
> > timeseries in the subtree represented by the given path, but *"select ** "*
> > will do that. *"select * "* will only query the timeseries on the next
> > level of the given path.
> > For example, the new query "select * from root.sg.d" will only query
> > timeseries like "root.sg.d.s" but not query timeseries like
> > "root.sg.d.a.s", while "select ** from root.sg.d" will query them all.
> > Of course, we want to implement *precise path match* in DML too.
> >
> > Best
> > -
> > Yukun zhou
> > School of software, Tsinghua University
> >
> > 周钰坤
> > 清华大学 软件学院


discuss wildcard improvement in IoTDB-SQL

2021-08-25 Thread
Hi

We want to introduce a new *wildcard ***, to improve the DDL and DML of
IoTDB-SQL.
Wildcard ** will represent one or more levels in the path and the existing
wildcard * will only represent one level in the path even if it's at the
tail of the path.

In DDL, we want to replace the prefix path match with the *precise path
match*, which means the given path will be the pattern of result elements'
full path. The prefix path match could be implemented by leveraging
wildcard **. This change can help users do operations more precisely.
For example the old versions query "count timeseries root.sg " will be
implemented as "count timeseries root.sg.** ". Besides,  the old query
"count timeseries root.sg.*.s " will count timeseries like "root.sg.d.s.t "
because its prefix path matches "root.sg.*.s " but the new query won't do
that any more because its full path doesn't match "root.sg.*.s ".
The same changes will be applied to the DDL of storage groups, entities and
normal nodes.

In DML, after introducing wildcard **, *"select * "* won't query all
timeseries in the subtree represented by the given path, but *"select ** "*
will do that. *"select * "* will only query the timeseries on the next
level of the given path.
For example, the new query "select * from root.sg.d" will only query
timeseries like "root.sg.d.s" but not query timeseries like
"root.sg.d.a.s", while "select ** from root.sg.d" will query them all.
Of course, we want to implement *precise path match* in DML too.

Best
-
Yukun zhou
School of software, Tsinghua University

周钰坤
清华大学 软件学院


Re: Measurement Template New Constraints

2021-07-26 Thread
As far as I know, the user interface of templates has been discussed and
defined in this doc, User Manual for Univariate/Multivariate Time Series
and Schema Templates (shimo.im)
<https://shimo.im/docs/eME9YgS50wIqhOiO>, but the current implementation
only supports Session.
The UI of the "assign template" operation has already been defined. The UI
of the "unassign template" operation needs further discussion and definition.
The implementation of the whole feature needs more work on related modules.
The constraint mentioned in the mail above could be added to the current
metadata module first.

—
Yukun Zhou
School of Software, Tsinghua University

Xiangdong Huang wrote on Mon, Jul 26, 2021 at 4:37 PM:

> "How to assign and unassign a template to a node" should be defined clearly
> before implementation.
>
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
>
> Jialin Qiao wrote on Sat, Jul 24, 2021 at 2:40 PM:
>
> > +1
> > —
> > Jialin Qiao
> > School of Software, Tsinghua University
> >
> > 乔嘉林
> > 清华大学 软件学院
> >
> >
> > 周钰坤 wrote on Sat, Jul 24, 2021 at 10:07 AM:
> >
> > > Hi,
> > >
> > > Apache IoTDB supports the Measurement Template feature from version
> > > 0.13. The current version allows users to set different templates on
> > > different nodes of a path, and the template nearest to the current
> > > node takes effect. This will result in loss of schema and data when a
> > > user sets a new template that doesn't contain the schemas in the
> > > existing upper template.
> > > Therefore, a new constraint will be introduced: if a template has
> > > been set on a node, it will be forbidden to set any template on the
> > > ancestors or descendants of this node, like the constraint on storage
> > > groups.
> > >
> > > Thanks,
> > > Yukun Zhou
> > >
> >
>


Measurement Template New Constraints

2021-07-23 Thread
Hi,

Apache IoTDB supports the Measurement Template feature from version 0.13.
The current version allows users to set different templates on different
nodes of a path, and the template nearest to the current node takes effect.
This will result in loss of schema and data when a user sets a new template
that doesn't contain the schemas in the existing upper template.
Therefore, a new constraint will be introduced: if a template has been
set on a node, it will be forbidden to set any template on the ancestors or
descendants of this node, like the constraint on storage groups.
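
A minimal sketch of the new check, in Python, could look like this. The
names can_set_template and _is_ancestor_or_same are hypothetical, the
example paths are made up, and rejecting a node that already has a template
is an assumption of this sketch, since the proposal only mentions ancestors
and descendants.

```python
def _is_ancestor_or_same(a, b):
    """True if path `a` is path `b` itself or one of its ancestors."""
    a_nodes, b_nodes = a.split("."), b.split(".")
    # Compare node by node, so "root.sg.d1" is not an ancestor of "root.sg.d10".
    return b_nodes[: len(a_nodes)] == a_nodes


def can_set_template(path, templated_paths):
    """Check the constraint against the paths that already carry a template."""
    for other in templated_paths:
        # Reject if a template already sits on an ancestor or a descendant
        # (assumption: also on the node itself).
        if _is_ancestor_or_same(other, path) or _is_ancestor_or_same(path, other):
            return False
    return True


templated = {"root.sg.d1"}
print(can_set_template("root.sg", templated))       # ancestor of root.sg.d1
print(can_set_template("root.sg.d1.s", templated))  # descendant of root.sg.d1
print(can_set_template("root.sg.d2", templated))    # unrelated branch
```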

Thanks,
Yukun Zhou


Metadata New Function and Restriction

2021-07-15 Thread
Hi,

The following changes will be introduced into the metadata module of the
current version (0.13):

1. It will be forbidden to set a MeasurementMNode as a device or to add a new
measurement under an existing MeasurementMNode, which means a
MeasurementMNode will always be a leaf of the MTree.

2. The implementation of the Device/Entity node will be enhanced. A new class,
EntityMNode, will be introduced to implement relevant functions that
Template doesn't support. As a result, it will not be allowed to
set a storage group on an EntityMNode. In other words, adding a
MeasurementMNode to a StorageGroupMNode as a child will be forbidden.
However, it will be allowed to set a storage group on the root node in
the future.

3. To save memory, leveled MeasurementMNodes will be introduced. A
SimpleMeasurementMNode will only support the lastCache and trigger functions.
A CompletedMeasurement will support all functions, including alias and
tag/attribute. If none of these functions are used, the MeasurementMNode
won't be created if it is using Template.

Thanks,
Yukun Zhou