[ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14436431#comment-14436431
 ] 

stack commented on HBASE-13408:
-------------------------------

In the doc it says the proposal is for in-memory column families only and may 
not be generally applicable unless there are lots of instances of Cells at the 
exact same coordinates. But as Lars says above, the memstore is a costly data 
structure for keeping all in-memory state sorted; a compacted version that was 
hfile sorted could make for better perf than the skiplist (as speculated over 
in HBASE-5311).
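
To make the skiplist-versus-sorted-layout point concrete, here is a minimal 
sketch, not HBase code. It assumes plain String keys and values, and the class 
name FlatSegmentSketch is hypothetical. It copies a skiplist once into sorted 
arrays and then serves reads by binary search:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical illustration only; none of these names exist in HBase.
class FlatSegmentSketch {
    private final String[] keys;   // ascending order, never modified after construction
    private final String[] values; // values[i] belongs to keys[i]

    // Copy the skiplist contents, already in sorted order, into flat arrays.
    FlatSegmentSketch(ConcurrentSkipListMap<String, String> source) {
        List<String> ks = new ArrayList<>();
        List<String> vs = new ArrayList<>();
        for (Map.Entry<String, String> e : source.entrySet()) {
            ks.add(e.getKey());
            vs.add(e.getValue());
        }
        keys = ks.toArray(new String[0]);
        values = vs.toArray(new String[0]);
    }

    // Binary search over the sorted key array; returns null if the key is absent.
    String get(String key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : null;
    }

    int size() {
        return keys.length;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> active = new ConcurrentSkipListMap<>();
        active.put("row-b", "2");
        active.put("row-a", "1");
        active.put("row-c", "3");

        FlatSegmentSketch flat = new FlatSegmentSketch(active);
        System.out.println(flat.get("row-b")); // prints "2"
        System.out.println(flat.get("row-z")); // prints "null"
    }
}
```

The flat copy is read-only, so it would only suit a segment that no longer 
takes writes; whether it actually beats the skiplist is exactly the open 
question.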

Other comments:

bq. The data is kept in memory for as long as possible

What Duo says above... We need to flush to free up WALs, so as to contain our 
WAL-burden of edits to replay on crash.

bq. pull the last component of the compaction pipeline and shift it to snapshot

What is involved in running the above step?

bq. CellSetMgr

What is one of these? Is it a skiplist?

What do you think of the attempt at lockless snapshotting suggested over in 
HBASE-5311?

Thanks for taking this up.



> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>         Attachments: HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, the more flushes 
> are triggered and the more frequently the data sinks to disk, slowing down 
> retrieval of data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.    The data is kept in memory for as long as possible
> 2.    Memstore data is either compacted or in the process of being compacted 
> 3.    Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.
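
As a rough illustration of the duplicate-entry problem described above, the 
following sketch compacts an in-memory sorted segment by keeping only the 
newest versions of each cell. It is not taken from the design document: the 
Key class, the compact method, and the maxVersions parameter are all 
hypothetical, and real HBase cells carry more coordinates (family, type, 
sequence id) than shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical illustration only; this is not HBase's Cell or memstore API.
class MemstoreCompactionSketch {

    // Simplified cell coordinates: row, column qualifier, timestamp.
    static final class Key implements Comparable<Key> {
        final String row;
        final String qualifier;
        final long timestamp;

        Key(String row, String qualifier, long timestamp) {
            this.row = row;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        // Order by row, then qualifier, then newest timestamp first.
        @Override
        public int compareTo(Key o) {
            int c = row.compareTo(o.row);
            if (c != 0) return c;
            c = qualifier.compareTo(o.qualifier);
            if (c != 0) return c;
            return Long.compare(o.timestamp, timestamp);
        }
    }

    // Build a new segment holding at most maxVersions newest entries per (row, qualifier).
    static ConcurrentSkipListMap<Key, String> compact(
            ConcurrentSkipListMap<Key, String> segment, int maxVersions) {
        ConcurrentSkipListMap<Key, String> out = new ConcurrentSkipListMap<>();
        String lastRow = null;
        String lastQualifier = null;
        int seen = 0;
        for (Map.Entry<Key, String> e : segment.entrySet()) {
            Key k = e.getKey();
            boolean sameColumn = k.row.equals(lastRow) && k.qualifier.equals(lastQualifier);
            seen = sameColumn ? seen + 1 : 1;
            if (seen <= maxVersions) {
                out.put(k, e.getValue()); // newest versions come first in sort order
            }
            lastRow = k.row;
            lastQualifier = k.qualifier;
        }
        return out;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Key, String> segment = new ConcurrentSkipListMap<>();
        segment.put(new Key("row1", "q", 1L), "v1");
        segment.put(new Key("row1", "q", 2L), "v2");
        segment.put(new Key("row1", "q", 3L), "v3"); // hot cell, updated three times
        segment.put(new Key("row2", "q", 1L), "a");

        ConcurrentSkipListMap<Key, String> compacted = compact(segment, 1);
        System.out.println(segment.size() + " -> " + compacted.size()); // prints "4 -> 2"
    }
}
```

In this toy case a hot cell updated three times shrinks to a single entry, 
which is the memory saving the description is after; workloads with few 
repeated coordinates would see little benefit.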



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
