[ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710978#comment-14710978
 ] 

Anoop Sam John commented on HBASE-13408:
----------------------------------------

In-memory memstore compaction helps us in many ways. I think the one in your 
use case is where data is constantly updated, so the in-memory compaction can 
remove old cells. One more thing is that it allows us to use a much bigger 
memstore size and hold more cells in memory. The CSLM (ConcurrentSkipListMap) 
has a perf impact as the number of entries in it increases. So if we have 
in-memory compaction in between, we can overcome this limitation.
bq. one other thing I was expecting was that the compacted version of the 
memstore was written as an in-memory hfile, so we can leverage stuff like 
compression and encoding. but from the code it looks like the compacted version 
(memstore segment?) is just another skiplist
This is a concern I also have, and I raised it in some older comments. If we 
can do this, we can help all kinds of use cases where cells are not being 
updated. Each CSLM entry has some heap size overhead, which we can 
avoid.
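
To make the skiplist-versus-flat point concrete, here is a minimal sketch of 
what compacting a CSLM segment into a flat structure could look like. All class 
and method names are hypothetical and are not taken from the patch or from 
HBase. It assumes a simplified cell (row, timestamp, value) with a single 
version kept per row, and it ignores column qualifiers, delete markers and 
scanner read points.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch; these are not HBase classes. */
final class SegmentCompactionSketch {

    /** Simplified cell key: rows ascending, newer timestamps first within a row. */
    static final class CellKey implements Comparable<CellKey> {
        final String row;
        final long timestamp;

        CellKey(String row, long timestamp) {
            this.row = row;
            this.timestamp = timestamp;
        }

        @Override
        public int compareTo(CellKey other) {
            int byRow = row.compareTo(other.row);
            return byRow != 0 ? byRow : Long.compare(other.timestamp, timestamp);
        }
    }

    /** Immutable, flat result of a compaction: two parallel sorted arrays. */
    static final class FlatSegment {
        final CellKey[] keys;
        final String[] values;

        FlatSegment(CellKey[] keys, String[] values) {
            this.keys = keys;
            this.values = values;
        }
    }

    /** Keeps only the newest version of each row and copies it into flat arrays. */
    static FlatSegment compact(ConcurrentSkipListMap<CellKey, String> active) {
        List<CellKey> keys = new ArrayList<>();
        List<String> values = new ArrayList<>();
        String previousRow = null;
        for (Map.Entry<CellKey, String> e : active.entrySet()) {
            // Entries arrive sorted, so the first entry seen for a row is its newest version.
            if (!e.getKey().row.equals(previousRow)) {
                keys.add(e.getKey());
                values.add(e.getValue());
                previousRow = e.getKey().row;
            }
        }
        return new FlatSegment(keys.toArray(new CellKey[0]), values.toArray(new String[0]));
    }
}
```

Calling compact() on a map holding three versions of one row and one version of 
another yields two entries. The arrays carry no per-entry skiplist node or 
index objects, which is where the heap saving would come from. A real flat 
segment would presumably also need binary search for reads and some form of 
encoding, which this sketch leaves out.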

Also, the in-memory compacting memstore is a kind of memstore impl. Now that 
the memstore is a pluggable interface, we can have any kind of impl. So I was 
expecting all the decisions about in-memory compaction (when it happens, how it 
happens, etc.) to be kept as an impl detail of the memstore. So from outside, 
no changes as such should be required for this. It will be ugly if we have to 
change outside code to support an in-memory compacting memstore.
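
A rough sketch of what I mean by keeping it an impl detail. The interface and 
class names below are hypothetical and much smaller than the real memstore 
interface. The compaction policy here (drop all but the newest version of each 
row once a cell-count threshold is crossed) is only an assumption for 
illustration, and the trigger is not meant to be correct under concurrent 
writers.

```java
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical, simplified stand-in for a pluggable memstore interface (not the HBase API). */
interface SketchMemStore {
    void put(String row, long timestamp, String value);
    String getLatest(String row);
    int cellCount();
}

/** Baseline impl: keeps every version of every row until it is flushed. */
class PlainSketchMemStore implements SketchMemStore {
    protected final ConcurrentSkipListMap<String, ConcurrentSkipListMap<Long, String>> rows =
        new ConcurrentSkipListMap<>();

    @Override
    public void put(String row, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new ConcurrentSkipListMap<>()).put(timestamp, value);
    }

    @Override
    public String getLatest(String row) {
        ConcurrentSkipListMap<Long, String> versions = rows.get(row);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    @Override
    public int cellCount() {
        // O(n) count, kept simple for the sketch.
        int n = 0;
        for (ConcurrentSkipListMap<Long, String> versions : rows.values()) {
            n += versions.size();
        }
        return n;
    }
}

/** Same interface; when and how to compact is decided entirely inside the impl. */
final class CompactingSketchMemStore extends PlainSketchMemStore {
    private final int compactionThreshold;

    CompactingSketchMemStore(int compactionThreshold) {
        this.compactionThreshold = compactionThreshold;
    }

    @Override
    public void put(String row, long timestamp, String value) {
        super.put(row, timestamp, value);
        if (cellCount() > compactionThreshold) {
            compactInMemory();
        }
    }

    private void compactInMemory() {
        for (ConcurrentSkipListMap<Long, String> versions : rows.values()) {
            // headMap(lastKey) is a live view of all but the newest version; clearing it drops them.
            versions.headMap(versions.lastKey()).clear();
        }
    }
}
```

Code written against SketchMemStore works unchanged with either impl. Only the 
line that constructs the memstore knows which one is in use, and that is the 
property I would like the patch to preserve.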


> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>         Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, the more flushes are 
> triggered and the more frequently the data sinks to disk, which slows down 
> retrieval of data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.    The data is kept in memory for as long as possible
> 2.    Memstore data is either compacted or in the process of being compacted
> 3.    Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
