Re: [IOTDB-726] CheckPoint of MTree

Julian Feinauer Thu, 18 Jun 2020 23:12:17 -0700

What about using some kind of cache that spills to disk? That way we would be 
up in no time and could lazily load devices when needed.


I remember that Ehcache has such features (https://www.baeldung.com/ehcache), 
but there are other implementations as well.
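A minimal sketch of the "cache that spills to disk" idea follows, under stated assumptions. It is not Ehcache's API: `SpillingCache`, `get`, `isSpilled` and the loader function are hypothetical names. A plain `Map` stands in for the on-disk store, and the loader stands in for however device metadata would be built on first access (for example, from mlog). The in-memory part is a bounded LRU built on `LinkedHashMap` in access order. Evicted entries are written to the backing store instead of being dropped, and a miss checks the backing store before calling the loader.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch (not Ehcache's API): a bounded in-memory LRU map of device
// metadata; evicted entries are "spilled" to a backing store instead of being
// dropped, and missing entries are loaded lazily on first access.
class SpillingCache<K, V> {
    private final Map<K, V> backingStore;   // stand-in for disk (file, RocksDB, ...)
    private final Function<K, V> loader;    // builds an entry never seen before (assumed non-null result)
    private final LinkedHashMap<K, V> hot;  // the in-memory part

    SpillingCache(int capacity, Function<K, V> loader) {
        this(capacity, new HashMap<>(), loader);
    }

    SpillingCache(int capacity, Map<K, V> backingStore, Function<K, V> loader) {
        this.backingStore = backingStore;
        this.loader = loader;
        // accessOrder = true: iteration order is least-recently-used first
        this.hot = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > capacity) {
                    // spill the LRU entry to the backing store rather than losing it
                    backingStore.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    // Return the entry for `key`, loading it lazily on a miss.
    V get(K key) {
        V value = hot.get(key);            // a hit also refreshes its LRU position
        if (value != null) {
            return value;
        }
        value = backingStore.remove(key);  // previously spilled copy, if any
        if (value == null) {
            value = loader.apply(key);     // first access ever
        }
        hot.put(key, value);               // may spill the current LRU entry
        return value;
    }

    int inMemorySize() {
        return hot.size();
    }

    boolean isSpilled(K key) {
        return backingStore.containsKey(key);
    }
}
```

With a capacity of 2, touching three devices keeps two in memory and moves the least recently used one to the backing store. Startup cost then no longer depends on the total number of devices, only on what is actually accessed.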

Julian


________________________________
From: 孙泽嵩 <[email protected]>
Sent: Friday, June 19, 2020 7:57:51 AM
To: [email protected] <[email protected]>
Subject: Re: [IOTDB-726] CheckPoint of MTree

Hi Jialin,

I ran an experiment with 1M timeseries, and the serialization process took 
971 ms.

Maybe we could consider creating a snapshot when the MTree has not changed for a 
long time (for example, one hour).

That way, clients will not be blocked and users may not even notice the snapshot.
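The idle-triggered snapshot proposed above can be sketched as a small policy object. This is a minimal sketch under stated assumptions, not code from the PR: `IdleSnapshotPolicy` and its methods are hypothetical. Times are passed in explicitly as epoch milliseconds so a scheduler can drive the policy and tests need no sleeping. The one-hour threshold mirrors the example in the message. A snapshot is due only if the MTree changed since the last snapshot and has then stayed unchanged for at least the threshold.

```java
import java.time.Duration;

// Hypothetical sketch: decide when to snapshot the MTree based on idleness.
class IdleSnapshotPolicy {
    private final long idleThresholdMillis;
    private long lastModifiedMillis;  // time of the most recent schema change
    private boolean dirty;            // MTree changed since the last snapshot

    IdleSnapshotPolicy(Duration idleThreshold) {
        this.idleThresholdMillis = idleThreshold.toMillis();
    }

    // Call on every schema change (e.g. after a timeseries is created).
    synchronized void onModified(long nowMillis) {
        lastModifiedMillis = nowMillis;
        dirty = true;
    }

    // Call periodically; true means "changed, then idle long enough: snapshot now".
    synchronized boolean shouldSnapshot(long nowMillis) {
        return dirty && nowMillis - lastModifiedMillis >= idleThresholdMillis;
    }

    // Call after a snapshot has been written successfully.
    synchronized void onSnapshotTaken() {
        dirty = false;
    }
}
```

A periodic task (for example, one scheduled with `java.util.concurrent.ScheduledExecutorService`) would call `shouldSnapshot(System.currentTimeMillis())` and serialize the MTree when it returns true. This simple flag does not handle a schema change that lands while a snapshot is being written. A fuller version would need to record which mlog position each snapshot covers.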


Best,
-----------------------------------
Zesong Sun
School of Software, Tsinghua University

孙泽嵩
清华大学 软件学院

> On June 18, 2020, at 16:19, 孙泽嵩 <[email protected]> wrote:
>
> Hi,
>
> Good opinions!
>
>> how about adding a "create snapshot for schema" SQL statement to let users 
>> trigger this manually
>
> I’ll add this SQL statement in a new PR.
>
>> how long it takes to recover from a 1M timeseries snapshot.
>
> Based on my previous experiment, it takes about 6s, as you said.
>
>> how long it takes to create a snapshot for 1M/10M timeseries?
>
> I haven’t measured this yet. I’ll run an experiment after addressing the 
> suggested changes in the current PR [1].
>
>
> [1] https://github.com/apache/incubator-iotdb/pull/1384
>
>
> Best,
> -----------------------------------
> Zesong Sun
> School of Software, Tsinghua University
>
> 孙泽嵩
> 清华大学 软件学院
>
>> On June 18, 2020, at 14:39, Jialin Qiao <[email protected]> wrote:
>>
>> Hi,
>>
>> Currently, the snapshot is triggered every xxx lines in mlog.txt.
>> With 20M timeseries, the default threshold of 10k lines will trigger too many 
>> snapshots, which will block timeseries creation.
>> However, if we raise the threshold to 1M lines, replaying the last (up to) 1M 
>> lines will take about 6s on recovery, at about 160K lines per second.
>>
>> So my questions are: how long does it take to create a snapshot for 1M/10M 
>> timeseries, and how long does it take to recover from a 1M-timeseries snapshot?
>>
>> Also, how about adding a "create snapshot for schema" SQL statement to let 
>> users trigger this manually?
>>
>> Thanks,
>> --
>> Jialin Qiao
>> School of Software, Tsinghua University
>>
>> 乔嘉林
>> 清华大学 软件学院
>>
>>> -----Original Message-----
>>> From: "孙泽嵩" <[email protected]>
>>> Sent: 2020-06-15 19:14:08 (Monday)
>>> To: [email protected]
>>> Cc:
>>> Subject: Re: [IOTDB-726] CheckPoint of MTree
>>>
>>> Hi Julian,
>>>
>>> Currently I’m just using a plain text file.
>>>
>>> But I could consider trying RocksDB : )
>>> I also noticed that there is an issue related to RocksDB integration [1].
>>>
>>>
>>> [1] https://issues.apache.org/jira/browse/IOTDB-767
>>>
>>>
>>> Best,
>>> -----------------------------------
>>> Zesong Sun
>>> School of Software, Tsinghua University
>>>
>>> 孙泽嵩
>>> 清华大学 软件学院
>>>
>>>> On June 15, 2020, at 19:00, Julian Feinauer <[email protected]> wrote:
>>>>
>>>> Hi Zesong,
>>>>
>>>> this is an excellent idea!
>>>> Do you serialize the snapshot as a plain text file?
>>>> Or would it make sense to use something like RocksDB for that?
>>>>
>>>> Julian
>>>>
>>>> On 15.06.20, 12:12, "孙泽嵩" <[email protected]> wrote:
>>>>
>>>>  Greetings,
>>>>
>>>>  I’m currently working on issue [IOTDB-726] CheckPoint of MTree [1].
>>>>
>>>>  When there is a large number of timeseries, restarting IoTDB takes a long 
>>>> time, because it reads mlog.txt and executes the commands line by line.
>>>>  For example, restarting with 20M timeseries takes about 2 minutes.
>>>>
>>>>  To solve this problem, a “checkpoint” is added to MTree to reduce the time 
>>>> spent reading mlog when IoTDB restarts:
>>>>  a snapshot containing the serialized MTree is generated every time mlog 
>>>> grows by a certain number of lines.
>>>>  When a new snapshot is generated, the old one is deleted. The snapshot file 
>>>> and mlog.txt are in the same directory.
>>>>
>>>>  Users can configure the threshold on the number of mlog lines. By default, 
>>>> a snapshot is generated every 100k lines.
>>>>
>>>>  I’ve already built a demo showing that the method speeds up the restarting 
>>>> process.
>>>>  For the part that reads mlog.txt and initializes MTree, it reduces the time 
>>>> by 28.3% (16.6s with the original method vs. 11.9s with the new demo, both 
>>>> for 2M timeseries).
>>>>
>>>>  I plan to open a PR afterwards. If you have any suggestions about the 
>>>> design, feel free to discuss them with me.
>>>>
>>>>
>>>>  [1] https://issues.apache.org/jira/browse/IOTDB-726
>>>>
>>>>
>>>>  Best,
>>>>  -----------------------------------
>>>>  Zesong Sun
>>>>  School of Software, Tsinghua University
>>>>
>>>>  孙泽嵩
>>>>  清华大学 软件学院
>>>>
>>>>
>>>
>
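The checkpoint design in the original proposal can be sketched as follows. A snapshot is taken every N mlog lines and replaces the previous one, and recovery loads the snapshot and replays only the mlog lines written after it. This is a minimal in-memory illustration under stated assumptions, not the implementation in the PR referenced in this thread. The "MTree" is reduced to a set of timeseries paths and mlog to a list of hypothetical `CREATE <path>` lines. `CheckpointedSchema`, `Snapshot` and `recover` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory sketch of the MTree checkpoint idea.
class CheckpointedSchema {
    // A snapshot: a copy of the tree plus how many mlog lines it already reflects.
    static final class Snapshot {
        final Set<String> paths;
        final int coveredLines;

        Snapshot(Set<String> paths, int coveredLines) {
            this.paths = paths;
            this.coveredLines = coveredLines;
        }
    }

    private final int threshold;                  // snapshot every `threshold` mlog lines
    private final List<String> mlog = new ArrayList<>();
    private final Set<String> tree = new TreeSet<>();
    private Snapshot snapshot = new Snapshot(new TreeSet<>(), 0);

    CheckpointedSchema(int threshold) {
        this.threshold = threshold;
    }

    void createTimeseries(String path) {
        mlog.add("CREATE " + path);               // append to the log first, then apply
        tree.add(path);
        if (mlog.size() - snapshot.coveredLines >= threshold) {
            // the new snapshot replaces (i.e. deletes) the old one
            snapshot = new Snapshot(new TreeSet<>(tree), mlog.size());
        }
    }

    // Restart: start from the snapshot and replay only the log tail.
    // Returns the number of mlog lines that had to be replayed.
    static int recover(List<String> mlog, Snapshot snapshot, Set<String> out) {
        out.clear();
        out.addAll(snapshot.paths);
        int replayed = 0;
        for (int i = snapshot.coveredLines; i < mlog.size(); i++) {
            String line = mlog.get(i);
            if (line.startsWith("CREATE ")) {
                out.add(line.substring("CREATE ".length()));
            }
            replayed++;
        }
        return replayed;
    }

    List<String> mlog() { return mlog; }
    Snapshot snapshot() { return snapshot; }
    Set<String> tree() { return tree; }
}
```

In this sketch, recovery never replays more than N-1 lines. That is the trade-off discussed above: a larger threshold means fewer snapshots during bulk creation, but a longer tail to replay on restart (at roughly 160K lines per second, a tail close to 1M lines costs about 6s).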
