> Another thing we could consider is to chunk them according to their namespaces in folders / files or any other struct.
according to the Storage group names, for example.

-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东

On Fri, Jun 19, 2020 at 4:54 PM, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:

> Another thing we could consider is to chunk them according to their namespaces in folders / files or any other struct. Then we could efficiently do lazy loading and only pick what we really need.
>
> WDYT?
>
> On 19.06.20, 10:36, "Xiangdong Huang" <saint...@gmail.com> wrote:
>
> > I did an experiment for 1M timeseries, and the serialization process costs 971ms.
>
> 971ms for serializing 1M timeseries, but 6 seconds for deserializing?
>
> > I didn’t time this … I’ll do an experiment after fixing the suggested changes in current PR [1]
>
> The problem with the current PR is that your snapshot grows larger and larger as the system keeps running.
> Any idea about this case?
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> On Fri, Jun 19, 2020 at 2:20 PM, 孙泽嵩 <sz...@mails.tsinghua.edu.cn> wrote:
>
> > Wow, thanks, Julian!
> >
> > Let me try and do experiments to get the best result : )
> >
> > Best,
> > -----------------------------------
> > Zesong Sun
> > School of Software, Tsinghua University
> >
> > 孙泽嵩
> >
> > > On Jun 19, 2020, at 14:14, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
> > >
> > > Oh and another note. By using a faster serialization lib than the Java default we could ideally speed up the process by up to 10x.
> > >
> > > See e.g. here: https://github.com/RuedigerMoeller/fast-serialization
> > >
> > > Julian
> > >
> > > ________________________________
> > > From: Julian Feinauer <j.feina...@pragmaticminds.de>
> > > Sent: Friday, June 19, 2020 8:11:56 AM
> > > To: dev@iotdb.apache.org <dev@iotdb.apache.org>
> > > Subject: Re: [IOTDB-726] CheckPoint of MTree
> > >
> > > What about using some kind of cache that spills to disk? That way we would be up in no time and just lazy load devices when needed.
> > >
> > > I remember that Ehcache has such features (https://www.baeldung.com/ehcache) but there are other implementations as well.
> > >
> > > Julian
> > >
> > > ________________________________
> > > From: 孙泽嵩 <sz...@mails.tsinghua.edu.cn>
> > > Sent: Friday, June 19, 2020 7:57:51 AM
> > > To: dev@iotdb.apache.org <dev@iotdb.apache.org>
> > > Subject: Re: [IOTDB-726] CheckPoint of MTree
> > >
> > > Hi Jialin,
> > >
> > > I did an experiment for 1M timeseries, and the serialization process costs 971ms.
> > >
> > > Maybe we could consider creating a snapshot when the MTree has not changed for a long time (for example, one hour).
> > >
> > > In this way, the client will not be stuck and users may not even notice it.
> > >
> > > Best,
> > > -----------------------------------
> > > Zesong Sun
> > >
> > >> On Jun 18, 2020, at 16:19, 孙泽嵩 <sz...@mails.tsinghua.edu.cn> wrote:
> > >>
> > >> Hi,
> > >>
> > >> Good opinions!
> > >>
> > >>> how about adding a "create snapshot for schema" sql to let users trigger this manually
> > >>
> > >> I’ll add this sql in a new PR.
> > >>
> > >>> how long it takes to recover from a 1M timeseries snapshot.
> > >>
> > >> Based on my previous experiment, it takes about 6s as you said.
> > >>
> > >>> how long it takes to create a snapshot for 1M/10M timeseries?
> > >>
> > >> I didn’t time this … I’ll do an experiment after fixing the suggested changes in current PR [1]
> > >>
> > >> [1] https://github.com/apache/incubator-iotdb/pull/1384
> > >>
> > >> Best,
> > >> -----------------------------------
> > >> Zesong Sun
> > >>
> > >>> On Jun 18, 2020, at 14:39, Jialin Qiao <qj...@mails.tsinghua.edu.cn> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> Currently, the snapshot is triggered every xxx lines in mlog.txt.
> > >>> With 20M timeseries, the default of 10k lines will cause too many snapshots, which will block timeseries creation.
> > >>> However, if we enlarge the threshold to 1M, the last 1M will take about 6s to recover, about 160K per second.
> > >>>
> > >>> So, my concern is: how long does it take to create a snapshot for 1M/10M timeseries? And how long does it take to recover from a 1M timeseries snapshot?
> > >>>
> > >>> Besides, how about adding a "create snapshot for schema" sql to let users trigger this manually?
> > >>>
> > >>> Thanks,
> > >>> --
> > >>> Jialin Qiao
> > >>> School of Software, Tsinghua University
> > >>>
> > >>> 乔嘉林
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: "孙泽嵩" <sz...@mails.tsinghua.edu.cn>
> > >>>> Sent: 2020-06-15 19:14:08 (Monday)
> > >>>> To: dev@iotdb.apache.org
> > >>>> Cc:
> > >>>> Subject: Re: [IOTDB-726] CheckPoint of MTree
> > >>>>
> > >>>> Hi Julian,
> > >>>>
> > >>>> Currently I’m just using a plain text file.
> > >>>>
> > >>>> But I could consider and try with RocksDB : )
> > >>>> I also noticed that there is an issue related to RocksDB integration [1].
> > >>>>
> > >>>> [1] https://issues.apache.org/jira/browse/IOTDB-767
> > >>>>
> > >>>> Best,
> > >>>> -----------------------------------
> > >>>> Zesong Sun
> > >>>>
> > >>>>> On Jun 15, 2020, at 19:00, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
> > >>>>>
> > >>>>> Hi Zesong,
> > >>>>>
> > >>>>> this is an excellent idea!
> > >>>>> Do you serialize the snapshot as a plain text file?
> > >>>>> Or would it make sense to use something like RocksDB for that?
> > >>>>>
> > >>>>> Julian
> > >>>>>
> > >>>>> On 15.06.20, 12:12, "孙泽嵩" <sz...@mails.tsinghua.edu.cn> wrote:
> > >>>>>
> > >>>>> Greetings,
> > >>>>>
> > >>>>> I’m currently working on issue [IOTDB-726] CheckPoint of MTree [1]
> > >>>>>
> > >>>>> When there is a large number of timeseries, it takes a long time to restart IoTDB by reading mlog.txt and executing the commands line by line.
> > >>>>> For example, it takes about 2 minutes to restart with 20M timeseries.
> > >>>>>
> > >>>>> To solve this problem, a “checkpoint” is designed and added to MTree to reduce the time spent reading mlog when IoTDB restarts:
> > >>>>> Generate a snapshot, which includes the serialization of MTree, every time mlog reaches a certain number of lines.
> > >>>>> When a new snapshot is generated, the old one is deleted. The snapshot file and mlog.txt are in the same directory.
> > >>>>>
> > >>>>> Users can configure the threshold number of mlog lines. By default, a snapshot is generated for every 100k lines.
> > >>>>>
> > >>>>> I’ve already made a demo and shown that the method speeds up the restart process.
> > >>>>> For the part that reads mlog.txt and initializes the MTree, it reduces the time by 28.3% (16.6s with the original method, 11.9s with the new demo, both for 2M timeseries).
> > >>>>>
> > >>>>> I would like to make a PR afterwards. If you have any suggestions about the design, feel free to discuss with me.
> > >>>>>
> > >>>>> [1] https://issues.apache.org/jira/browse/IOTDB-726
> > >>>>>
> > >>>>> Best,
> > >>>>> -----------------------------------
> > >>>>> Zesong Sun
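
The checkpoint scheme from the original proposal (snapshot the MTree every N mlog lines, keep only the newest snapshot, replay the rest of mlog on restart) can be sketched roughly as below. This is a minimal Python sketch and not IoTDB code: the class `SchemaStore`, the file name `mtree.snapshot`, the JSON snapshot format, the `create`/`delete` command syntax, and the idea of storing the number of covered mlog lines inside the snapshot are all assumptions made for illustration.

```python
import json
import os
import tempfile


class SchemaStore:
    """Toy stand-in for the MTree: a set of timeseries paths plus an append-only log.

    All names and file formats here are hypothetical, chosen only for illustration.
    """

    def __init__(self, directory, snapshot_threshold=100_000):
        self.log_path = os.path.join(directory, "mlog.txt")
        # Snapshot lives in the same directory as mlog.txt, as in the proposal.
        self.snapshot_path = os.path.join(directory, "mtree.snapshot")
        self.snapshot_threshold = snapshot_threshold
        self.paths = set()
        self.log_lines = 0   # total number of lines in mlog.txt
        self.replayed = 0    # lines replayed during the last recovery
        self._recover()

    def _recover(self):
        covered = 0
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                snap = json.load(f)
            self.paths = set(snap["paths"])
            covered = snap["log_lines"]  # assumption: snapshot records its log position
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for i, line in enumerate(f):
                    self.log_lines = i + 1
                    if i >= covered:     # only replay lines written after the snapshot
                        self._apply(line.strip())
                        self.replayed += 1

    def _apply(self, command):
        op, path = command.split(",", 1)
        if op == "create":
            self.paths.add(path)
        elif op == "delete":
            self.paths.discard(path)

    def execute(self, op, path):
        command = f"{op},{path}"
        with open(self.log_path, "a") as f:
            f.write(command + "\n")
        self._apply(command)
        self.log_lines += 1
        if self.log_lines % self.snapshot_threshold == 0:
            self.create_snapshot()

    def create_snapshot(self):
        # Write to a temp file, then atomically replace the old snapshot,
        # so only the newest snapshot is ever kept.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"log_lines": self.log_lines, "paths": sorted(self.paths)}, f)
        os.replace(tmp, self.snapshot_path)


def demo():
    with tempfile.TemporaryDirectory() as d:
        store = SchemaStore(d, snapshot_threshold=3)
        for i in range(7):
            store.execute("create", f"root.sg.d{i}.s0")
        restarted = SchemaStore(d, snapshot_threshold=3)
        return len(restarted.paths), restarted.replayed


if __name__ == "__main__":
    print(demo())  # (7, 1): 7 timeseries restored, only 1 mlog line replayed
```

`create_snapshot()` is public so that it could also be called from a manual "create snapshot for schema" command or from an idle timer, the two triggers suggested in the thread. Note that in this sketch the snapshot always contains the full schema, so it grows with the system, which is the concern raised above; chunking by storage group would need one snapshot file per group.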