Hi,

I think we can split tasks 1-3 into sub-tasks in JIRA.

I also recommend studying how Cassandra manages memory (in the package
org.apache.cassandra.utils.memory) before designing our own strategy.

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University



kangrong (JIRA) <j...@apache.org> wrote on Mon, Apr 22, 2019, at 12:42 PM:

> kangrong created IOTDB-84:
> -----------------------------
>
>              Summary: Out-of-Memory bug
>                  Key: IOTDB-84
>                  URL: https://issues.apache.org/jira/browse/IOTDB-84
>              Project: Apache IoTDB
>           Issue Type: Bug
>             Reporter: kangrong
>          Attachments: image-2019-04-22-12-38-04-903.png
>
> An out-of-memory problem occurred in the last long-term test of the branch
> "add_disabled_mem_control":
>
> !image-2019-04-22-12-38-04-903.png!
>
> We analyzed the causes and propose solutions as follows:
>  # *Flushing to disk may double the memory cost*: A storage group
> maintains a list of ChunkGroups in memory and flushes them to disk when
> their occupied memory exceeds the threshold (128MB by default).
>  ## In the current implementation, when a flush starts, each ChunkGroup is
> encoded in memory, producing a new byte array that is also kept in memory.
> These byte arrays can only be released together, after all ChunkGroups have
> been encoded. Since a byte array is comparable in size to the original data
> (0.5× to 1×), this strategy may double the memory usage in the worst case.
>  ## Solution: the flush strategy needs to be redesigned. In TsFile, a Page
> is the minimal flush unit: a ChunkGroup contains several Chunks, and a
> Chunk contains several Pages. Once a page is encoded into a byte array, we
> can flush that byte array to disk and release it immediately. In this case,
> the extra memory is at most one page size (64KB by default). This
> modification involves a cascade of changes, including the metadata format
> and the writing process.
>  # *Memory control strategy*: The memory control strategy needs to be
> redesigned, for example by assigning 60% of memory to the writing process
> and 30% to the querying process. The writing memory includes the memory
> table and the flush process. When an insert arrives, if its required memory
> exceeds TotalMem * 0.6 - MemTableUsage - FlushUsage, the insert is
> rejected.
>  # *Is the memory accounting accurate?* In the current code, the memory
> usage of a TSRecord Java object, which corresponds to one insert SQL
> statement, is calculated by summing its DataPoints. E.g., for "insert into
> root.a.b.c(timestamp, v1, v2) values(1L, true, 1.2f)", the usage is
> 8 + 1 + 4 = 13 bytes, which ignores object headers and other overhead. The
> memory accounting needs to be redesigned carefully.
>  # *Is there still a memory leak?* The log of the last crash, caused by an
> out-of-memory exception, shows that the actual JVM memory was 18GB, whereas
> our memory statistics module only counted 8GB. Besides the inaccuracy
> mentioned in item 3, we suspect there are still memory leaks or other
> potential problems. We will continue to debug this.
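
The page-level flush in item 1 could be sketched roughly as follows. This is
a minimal illustration only: the class name, the big-endian long encoding,
and the counting of "flushed" pages are assumptions for the sketch, not the
actual TsFile format or IoTDB code.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of page-level flushing: instead of keeping every
// encoded ChunkGroup in memory until all groups are done, each page is
// written out and released as soon as it is full, bounding the extra
// memory to a single page (64KB by default).
public class PageFlusher {
    static final int PAGE_SIZE = 64 * 1024; // 64KB default page size

    // Encodes long values big-endian into a page buffer; returns the number
    // of pages that would have been flushed to disk.
    public static long flush(long[] values) {
        ByteArrayOutputStream page = new ByteArrayOutputStream(PAGE_SIZE);
        long pagesFlushed = 0;
        for (long v : values) {
            for (int shift = 56; shift >= 0; shift -= 8) {
                page.write((int) (v >>> shift));
            }
            if (page.size() >= PAGE_SIZE) {
                // Here the page bytes would be written to the TsFile on
                // disk, then the buffer is released immediately.
                page.reset();
                pagesFlushed++;
            }
        }
        if (page.size() > 0) {
            pagesFlushed++; // flush the final partial page
        }
        return pagesFlushed;
    }
}
```

With this shape, peak extra memory stays near one page regardless of how
many ChunkGroups are being flushed.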
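
The admission check in item 2 might look like the sketch below. All names
here are illustrative; only the formula TotalMem * 0.6 - MemTableUsage -
FlushUsage comes from the issue.

```java
// Hypothetical sketch of the proposed write-path admission check: the
// writing process is assigned 60% of total memory, and an insert is
// rejected when its required memory exceeds
// TotalMem * 0.6 - MemTableUsage - FlushUsage.
public class WriteAdmission {
    private final long writeBudget; // 60% of total memory, in bytes

    public WriteAdmission(long totalMemBytes) {
        this.writeBudget = (long) (totalMemBytes * 0.6);
    }

    // Returns true if the insert fits within the remaining write budget.
    public boolean admit(long memTableUsage, long flushUsage, long requiredBytes) {
        return requiredBytes <= writeBudget - memTableUsage - flushUsage;
    }
}
```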
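
For item 3, the gap between raw-value accounting and the real footprint can
be sketched as below. The 16-byte object header, 8-byte reference, and
8-byte alignment are typical 64-bit HotSpot figures assumed for
illustration, not measured values from IoTDB.

```java
// Hypothetical sketch comparing the current accounting (sum of raw
// primitive sizes) with an estimate that also charges per-object JVM
// overhead for each DataPoint held by a TSRecord.
public class RecordSizeEstimate {
    static final int OBJECT_HEADER = 16; // assumed 64-bit HotSpot header
    static final int REFERENCE = 8;      // assumed reference size
    static final int TIMESTAMP = 8;      // the long timestamp of a TSRecord

    // Current behaviour: timestamp plus the raw size of each value, so
    // "insert ... values(1L, true, 1.2f)" counts as 8 + 1 + 4 = 13 bytes.
    public static long rawSize(int[] valueSizes) {
        long total = TIMESTAMP;
        for (int s : valueSizes) {
            total += s;
        }
        return total;
    }

    // Each DataPoint is a separate object: charge a header per point, align
    // each object to 8 bytes, and add the reference held by the record.
    public static long estimatedSize(int[] valueSizes) {
        long total = TIMESTAMP;
        for (int s : valueSizes) {
            long object = OBJECT_HEADER + s;
            object = (object + 7) & ~7L; // 8-byte alignment
            total += object + REFERENCE;
        }
        return total;
    }
}
```

Under these assumptions the example record accounts for roughly 72 bytes
rather than 13, which hints at how far the current statistics can drift.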
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>
