[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction

Lars Hofhansl (JIRA) Sun, 05 Apr 2015 10:46:48 -0700

    [ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396342#comment-14396342 ]


Lars Hofhansl commented on HBASE-13408:
---------------------------------------

Why not continue on HBASE-5311? In any case, good to pick this topic up again!

Some comments/questions:
* [~Apache9] by default, the memstore limits the time any edit can stay in it 
to 1h. So that should be OK.
* The in-memstore compaction has to be SLAB aware, or we'll get horrible 
fragmentation issues (maybe that's what you meant by MAB in the doc).
* A skiplist is actually a bad data structure when it comes to cache line 
locality. The HFile format is much better. So if the data is compacted anyway, 
we might as well write it in HFile format; that would also allow writing it to 
disk later.
* If the "compactions" simply remove expired KVs, they will likely make 
things worse (that was also my initial thought on HBASE-5311, but it will not 
work).
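
To make the skiplist vs. flat-layout point concrete, here is a minimal sketch 
of what an in-memory compaction step could look like. It is not HBase code and 
it does not produce the HFile format: Key, FlatSegment and compact() are 
hypothetical names, values are plain Strings, and it ignores SLAB allocation, 
deletes and TTLs. It only shows the idea of walking the sorted skiplist once, 
keeping the newest version of each row/column, and copying the survivors into 
sorted arrays that can be binary-searched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

class FlatSegmentSketch {
    /** Hypothetical, simplified cell key: row, column qualifier, timestamp. */
    static final class Key {
        final String row;
        final String qualifier;
        final long ts;
        Key(String row, String qualifier, long ts) {
            this.row = row;
            this.qualifier = qualifier;
            this.ts = ts;
        }
    }

    /** Order by row, then qualifier, then newest timestamp first. */
    static int compare(Key a, Key b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.ts, a.ts);
    }

    /** Immutable result of a compaction: parallel sorted arrays. */
    static final class FlatSegment {
        final Key[] keys;
        final String[] values;
        FlatSegment(Key[] keys, String[] values) {
            this.keys = keys;
            this.values = values;
        }

        /** Binary search; each row/column appears at most once after compaction. */
        String get(String row, String qualifier) {
            int lo = 0, hi = keys.length - 1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                int c = keys[mid].row.compareTo(row);
                if (c == 0) c = keys[mid].qualifier.compareTo(qualifier);
                if (c == 0) return values[mid];
                if (c < 0) lo = mid + 1; else hi = mid - 1;
            }
            return null;
        }
    }

    /** One pass over the skiplist, keeping only the newest version per row/column. */
    static FlatSegment compact(ConcurrentSkipListMap<Key, String> active) {
        List<Key> keys = new ArrayList<>();
        List<String> values = new ArrayList<>();
        Key prev = null;
        for (Map.Entry<Key, String> e : active.entrySet()) {
            Key k = e.getKey();
            boolean sameColumn = prev != null
                && prev.row.equals(k.row) && prev.qualifier.equals(k.qualifier);
            if (!sameColumn) { // first entry of a column is its newest version
                keys.add(k);
                values.add(e.getValue());
            }
            prev = k;
        }
        return new FlatSegment(keys.toArray(new Key[0]), values.toArray(new String[0]));
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Key, String> active =
            new ConcurrentSkipListMap<>(FlatSegmentSketch::compare);
        active.put(new Key("row1", "c1", 100L), "v1");
        active.put(new Key("row1", "c1", 200L), "v2"); // overwrites v1 logically
        active.put(new Key("row2", "c1", 150L), "x");
        FlatSegment seg = compact(active);
        System.out.println(seg.keys.length + " cells kept, row1/c1 = " + seg.get("row1", "c1"));
    }
}
```

Note that this sketch drops duplicate versions, not just expired KVs, which is 
the case where compacting in memory actually frees space. A real 
implementation would also have to decide how many versions to retain and how 
the flat segment is allocated.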


> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>         Attachments: HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster data accumulates in memory, the more flushes are 
> triggered and the more frequently data sinks to disk, slowing down retrieval 
> of data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.    The data is kept in memory for as long as possible
> 2.    Memstore data is either compacted or in the process of being compacted 
> 3.    Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
