What about using some kind of cache that spills to disk? That way we would be up in no time and could just lazy-load devices when needed.
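To make the idea concrete, here is a minimal sketch in plain JDK Java, not Ehcache's API and not IoTDB code. It assumes entries are keyed by a device path string and are `Serializable`; the class name `SpillingCache` and all of its methods are hypothetical. Entries beyond the in-memory limit are written to a spill directory when evicted and are read back lazily on the next access.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an LRU map that spills evicted entries to disk
// and reloads them lazily. Not thread-safe.
class SpillingCache<V extends Serializable> {
  private final Path spillDir;
  private final int maxInMemory;
  private final LinkedHashMap<String, V> hot;

  SpillingCache(Path spillDir, int maxInMemory) {
    this.spillDir = spillDir;
    this.maxInMemory = maxInMemory;
    // accessOrder=true keeps the least recently used entry first.
    this.hot = new LinkedHashMap<String, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        if (size() <= SpillingCache.this.maxInMemory) {
          return false;
        }
        // Write the evicted entry to disk before it leaves memory.
        spill(eldest.getKey(), eldest.getValue());
        return true;
      }
    };
  }

  // Convenience factory that spills into a fresh temporary directory.
  static <V extends Serializable> SpillingCache<V> inTempDir(int maxInMemory) {
    try {
      return new SpillingCache<>(Files.createTempDirectory("spill-cache"), maxInMemory);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  void put(String key, V value) {
    hot.put(key, value);
  }

  // Returns the value from memory, or lazily loads it from disk; null if unknown.
  V get(String key) {
    V value = hot.get(key);
    if (value == null) {
      value = load(key);
      if (value != null) {
        hot.put(key, value); // may in turn spill another entry
      }
    }
    return value;
  }

  int inMemorySize() {
    return hot.size();
  }

  // Encode the key so that any string is a safe file name.
  private Path fileFor(String key) {
    String name = Base64.getUrlEncoder().withoutPadding()
        .encodeToString(key.getBytes(StandardCharsets.UTF_8));
    return spillDir.resolve(name);
  }

  private void spill(String key, V value) {
    try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(fileFor(key)))) {
      out.writeObject(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @SuppressWarnings("unchecked")
  private V load(String key) {
    Path file = fileFor(key);
    if (!Files.exists(file)) {
      return null;
    }
    try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
      return (V) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

With a limit of two entries, inserting a third device pushes the least recently used one to disk, and a later `get` for it reads it back. A real cache library would add concurrency control, size-based limits, and cleanup of the spill files.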
I remember that Ehcache has such features (https://www.baeldung.com/ehcache), but there are other implementations as well.

Julian

________________________________
From: 孙泽嵩 <sz...@mails.tsinghua.edu.cn>
Sent: Friday, June 19, 2020 7:57:51 AM
To: dev@iotdb.apache.org <dev@iotdb.apache.org>
Subject: Re: [IOTDB-726] CheckPoint of MTree

Hi Jialin,

I did an experiment with 1M timeseries, and the serialization process takes 971 ms.

Maybe we could consider creating a snapshot when the MTree has not changed for a long time (for example, one hour). In this way, the client will not be blocked and users may not even notice it.

Best,
-----------------------------------
Zesong Sun
School of Software, Tsinghua University

孙泽嵩
清华大学 软件学院

> On June 18, 2020, at 16:19, 孙泽嵩 <sz...@mails.tsinghua.edu.cn> wrote:
>
> Hi,
>
> Good points!
>
>> how about adding a "create snapshot for schema" sql to let users trigger
>> this manually
>
> I’ll add this SQL statement in a new PR.
>
>> how long it takes to recover from a 1M timeseries snapshot.
>
> Based on my previous experiment, it takes about 6s, as you said.
>
>> how long it takes to create a snapshot for 1M/10M timeseries?
>
> I didn’t time this … I’ll do an experiment after making the suggested changes
> in the current PR [1]
>
>
> [1] https://github.com/apache/incubator-iotdb/pull/1384
>
>
> Best,
> -----------------------------------
> Zesong Sun
> School of Software, Tsinghua University
>
> 孙泽嵩
> 清华大学 软件学院
>
>> On June 18, 2020, at 14:39, Jialin Qiao <qj...@mails.tsinghua.edu.cn> wrote:
>>
>> Hi,
>>
>> Currently, the snapshot is triggered every xxx lines in mlog.txt.
>> With 20M timeseries, the default of 10k lines will cause too many
>> snapshots, which will block timeseries creation.
>> However, if we raise the threshold to 1M, the last 1M lines will take
>> about 6s to recover, about 160K per second.
>>
>> So, my concern is: how long does it take to create a snapshot for 1M/10M
>> timeseries? And how long does it take to recover from a 1M timeseries
>> snapshot?
>>
>> Besides, how about adding a "create snapshot for schema" sql to let users
>> trigger this manually?
>>
>> Thanks,
>> --
>> Jialin Qiao
>> School of Software, Tsinghua University
>>
>> 乔嘉林
>> 清华大学 软件学院
>>
>>> -----Original Message-----
>>> From: "孙泽嵩" <sz...@mails.tsinghua.edu.cn>
>>> Sent: 2020-06-15 19:14:08 (Monday)
>>> To: dev@iotdb.apache.org
>>> Cc:
>>> Subject: Re: [IOTDB-726] CheckPoint of MTree
>>>
>>> Hi Julian,
>>>
>>> Currently I’m just using a plain text file.
>>>
>>> But I could consider and try RocksDB : )
>>> I also noticed that there is an issue related to RocksDB integration [1].
>>>
>>>
>>> [1] https://issues.apache.org/jira/browse/IOTDB-767
>>>
>>>
>>> Best,
>>> -----------------------------------
>>> Zesong Sun
>>> School of Software, Tsinghua University
>>>
>>> 孙泽嵩
>>> 清华大学 软件学院
>>>
>>>> On June 15, 2020, at 19:00, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
>>>>
>>>> Hi Zesong,
>>>>
>>>> This is an excellent idea!
>>>> Do you serialize the snapshot as a plain text file?
>>>> Or would it make sense to use something like RocksDB for that?
>>>>
>>>> Julian
>>>>
>>>> On 15.06.20, 12:12, "孙泽嵩" <sz...@mails.tsinghua.edu.cn> wrote:
>>>>
>>>> Greetings,
>>>>
>>>> I’m currently working on issue [IOTDB-726] CheckPoint of MTree [1]
>>>>
>>>> When there is a large number of timeseries, it takes a long time to
>>>> restart IoTDB, because mlog.txt is read and its commands are executed
>>>> line by line.
>>>> For example, it takes about 2 minutes to restart with 20M timeseries.
>>>>
>>>> To solve this problem, a “checkpoint” is designed and added to MTree to
>>>> reduce the time spent reading mlog when IoTDB restarts:
>>>> Generate a snapshot, which includes the serialization of MTree, every
>>>> time mlog reaches a certain number of lines.
>>>> When a new snapshot is generated, the old one is deleted. The snapshot
>>>> file and mlog.txt are in the same directory.
>>>>
>>>> Users can configure the threshold number of mlog lines. By default,
>>>> a snapshot is generated every 100k lines.
>>>>
>>>> I’ve already made a demo and shown that the method speeds up the
>>>> restart process.
>>>> For the part that reads mlog.txt and initializes the MTree, it reduces
>>>> the time by 28.3% (16.6s with the original method, 11.9s with the new
>>>> demo, both for 2M timeseries).
>>>>
>>>> I would like to make a PR afterwards. If you have any suggestions about
>>>> the design, feel free to discuss them with me.
>>>>
>>>>
>>>> [1] https://issues.apache.org/jira/browse/IOTDB-726
>>>>
>>>>
>>>> Best,
>>>> -----------------------------------
>>>> Zesong Sun
>>>> School of Software, Tsinghua University
>>>>
>>>> 孙泽嵩
>>>> 清华大学 软件学院
>>>>
>>>>
>>>
>
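
The two snapshot triggers discussed in this thread (a threshold on the number of new mlog lines, and creating a snapshot once the MTree has been unchanged for a while) could be combined roughly as in the sketch below. This is only an illustration under stated assumptions: the class `SnapshotPolicy` and its method names are hypothetical and are not taken from the IoTDB code base or from the PR, and timestamps are passed in explicitly so the logic is easy to test.

```java
// Hypothetical sketch of a snapshot trigger policy; not IoTDB code.
class SnapshotPolicy {
  private final long lineThreshold; // e.g. 100_000 mlog lines
  private final long idleMillis;    // e.g. one hour without MTree changes
  private long linesSinceSnapshot = 0;
  private long lastChangeMillis = 0;

  SnapshotPolicy(long lineThreshold, long idleMillis) {
    this.lineThreshold = lineThreshold;
    this.idleMillis = idleMillis;
  }

  // Call whenever a line is appended to mlog.txt (i.e. the MTree changed).
  void onMlogLineAppended(long nowMillis) {
    linesSinceSnapshot++;
    lastChangeMillis = nowMillis;
  }

  // True if a snapshot should be created now.
  boolean shouldSnapshot(long nowMillis) {
    if (linesSinceSnapshot == 0) {
      return false; // nothing new since the last snapshot
    }
    boolean tooManyLines = linesSinceSnapshot >= lineThreshold;
    boolean idleLongEnough = nowMillis - lastChangeMillis >= idleMillis;
    return tooManyLines || idleLongEnough;
  }

  // Call after a snapshot was written, including a manually triggered one.
  void onSnapshotCreated() {
    linesSinceSnapshot = 0;
  }
}
```

A background task could poll `shouldSnapshot(System.currentTimeMillis())` periodically, and a manual "create snapshot for schema" command would simply write the snapshot and call `onSnapshotCreated()`. With a large line threshold, the idle trigger is the one that fires in quiet periods, so clients creating timeseries are less likely to be blocked.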