[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482938#comment-14482938 ]

Eshcar Hillel commented on HBASE-13408:
---------------------------------------

Thank you [~zhangduo] for raising the important WAL truncation issue, and
[~lhofhansl] and [~stack] for raising the component format issue. Both of these
issues should definitely be addressed in our solution.
1. When the memstore compactor completes a compaction, it can query the
resulting component for its oldest record sequence id and use it to truncate
the WAL. This might not be sufficient in all scenarios, in which case the
memstore should enter a panic mode and perform a real flush. So there are
several triggers for entering panic mode: one relates to the memstore size and
another to the WAL size.
2. The CellSetMgr and CellSetScanner abstractions we suggested should make it
easy to support any cell storage format. Specifically, the active set can use a
skip list to absorb updates, while the compactor generates B-trees or any other
cache-friendly format. A Factory pattern can serve this purpose.
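Point 1 can be sketched as a small Java policy object. This is a minimal illustration, not HBase code. `Component`, `PanicPolicy`, and the byte thresholds are hypothetical names. The sketch assumes the WAL truncation point for this store is the smallest oldest sequence id still held by any memstore component.

```java
import java.util.List;

// Hypothetical view of one memstore component (active set or pipeline segment):
// its heap footprint and the smallest sequence id among the cells it still holds.
record Component(long sizeBytes, long oldestSeqId) {}

// Hypothetical policy object; thresholds are illustrative, not HBase configuration keys.
final class PanicPolicy {
    private final long memstoreLimitBytes;
    private final long walLimitBytes;

    PanicPolicy(long memstoreLimitBytes, long walLimitBytes) {
        this.memstoreLimitBytes = memstoreLimitBytes;
        this.walLimitBytes = walLimitBytes;
    }

    // As far as this store is concerned, WAL entries with a sequence id below the
    // returned value are no longer needed for recovery: their edits were either
    // flushed earlier or superseded by newer versions that compaction kept.
    // Long.MAX_VALUE means the memstore pins nothing.
    static long walTruncationPoint(List<Component> components) {
        long oldest = Long.MAX_VALUE;
        for (Component c : components) {
            oldest = Math.min(oldest, c.oldestSeqId());
        }
        return oldest;
    }

    // Either trigger forces a real flush instead of (or interrupting) compaction.
    boolean shouldPanic(List<Component> components, long walSizeBytes) {
        long memstoreBytes = 0;
        for (Component c : components) {
            memstoreBytes += c.sizeBytes();
        }
        return memstoreBytes >= memstoreLimitBytes || walSizeBytes >= walLimitBytes;
    }
}
```

Keeping the two triggers in one predicate reflects the comment's point that panic mode can be entered for either memory pressure or WAL growth.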

There is no technical challenge in making this feature available for all column
families. However, we believe in-memory columns have a better chance of
benefiting from it, while in the general case this memstore could put a burden
on the region server. If you believe it has the potential to improve
performance in other scenarios as well, there is no reason not to make it a
first-class column type.

A CellSetMgr, as explained above, is an abstraction of the cell set storage,
be it a skip list or a B-tree, with or without SLAB, compressed or not. It
encapsulates these and any other details and decouples them from the users of
these objects.
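Below is a minimal Java sketch of the CellSetMgr abstraction and the Factory pattern from point 2. It makes heavy simplifying assumptions:

* Cells are plain String key/value pairs.
* A Java `Iterator` stands in for CellSetScanner.
* A read-only pair of sorted, binary-searched arrays stands in for the cache-friendly B-tree the compactor might produce.

`SkipListCellSet`, `FlatArrayCellSet`, `CellSetFormat`, and `CellSetMgrFactory` are hypothetical names, not HBase classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical CellSetMgr: hides the storage format from its users.
interface CellSetMgr {
    void add(String key, String value);
    String get(String key);
    Iterator<Map.Entry<String, String>> scanner(); // ascending key order
    int size();
}

// Mutable format for the active set: a concurrent skip list absorbs updates.
final class SkipListCellSet implements CellSetMgr {
    private final ConcurrentSkipListMap<String, String> cells = new ConcurrentSkipListMap<>();
    public void add(String key, String value) { cells.put(key, value); }
    public String get(String key) { return cells.get(key); }
    public Iterator<Map.Entry<String, String>> scanner() { return cells.entrySet().iterator(); }
    public int size() { return cells.size(); }
}

// Read-only format a compactor might emit: sorted parallel arrays searched with
// binary search (a simple stand-in for a B-tree).
final class FlatArrayCellSet implements CellSetMgr {
    private final String[] keys;
    private final String[] values;

    FlatArrayCellSet(CellSetMgr source) { // source must be frozen (no concurrent adds)
        List<String> ks = new ArrayList<>();
        List<String> vs = new ArrayList<>();
        source.scanner().forEachRemaining(e -> { ks.add(e.getKey()); vs.add(e.getValue()); });
        keys = ks.toArray(new String[0]);
        values = vs.toArray(new String[0]);
    }

    public void add(String key, String value) {
        throw new UnsupportedOperationException("compacted components are read-only");
    }
    public String get(String key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : null;
    }
    public Iterator<Map.Entry<String, String>> scanner() {
        return new Iterator<>() {
            private int i = 0;
            public boolean hasNext() { return i < keys.length; }
            public Map.Entry<String, String> next() {
                if (!hasNext()) throw new NoSuchElementException();
                Map.Entry<String, String> e = Map.entry(keys[i], values[i]);
                i++;
                return e;
            }
        };
    }
    public int size() { return keys.length; }
}

enum CellSetFormat { SKIP_LIST, FLAT_ARRAY }

// Factory: the compactor picks a format; callers only ever see CellSetMgr.
final class CellSetMgrFactory {
    static CellSetMgr newActiveSet() { return new SkipListCellSet(); }

    static CellSetMgr copyOf(CellSetFormat format, CellSetMgr source) {
        return switch (format) {
            case SKIP_LIST -> {
                SkipListCellSet copy = new SkipListCellSet();
                source.scanner().forEachRemaining(e -> copy.add(e.getKey(), e.getValue()));
                yield copy;
            }
            case FLAT_ARRAY -> new FlatArrayCellSet(source);
        };
    }
}
```

Readers and scanners hold only a `CellSetMgr` reference. Replacing the compacted format, for example with a real B-tree or a compressed layout, would therefore change only the factory.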

HBASE-5311 suggested using an RCU-like mechanism to protect the components
(layers) of the memstore as they shift around, and also applied a freezing
phase. Our solution uses the existing sync mechanism to push a component into
the pipeline. Once a component is in the pipeline it is read-only, and can
therefore be accessed without locks. The only step we might need to protect is
swapping a subset of the pipeline components for the new single compacted
component. This should be as simple as changing a pointer, and can use RCU as
well. If no protection is applied, a concurrent reader may miss the swap and go
through the “old” components instead, which is more expensive but still
correct.
 
Shifting a component from the pipeline to the snapshot should work the same way
as shifting it from the active set to the snapshot does today.
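The lock-free pipeline, the pointer swap, and the move to the snapshot described above can be sketched in Java. In this sketch the pipeline is an immutable list published through an `AtomicReference`. Readers take a consistent view without locks, and the compactor's swap is a single compare-and-set. This is copy-on-write rather than a full RCU implementation: the JVM garbage collector plays the role of RCU's deferred reclamation. `Pipeline`, `push`, `readerView`, `swap`, and `pollOldest` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical pipeline of frozen (read-only) memstore components, newest first.
// The list itself is immutable and replaced wholesale, so readers never lock.
final class Pipeline<T> {
    private final AtomicReference<List<T>> components = new AtomicReference<>(List.of());

    // Called (under the existing sync mechanism) when the active set is frozen.
    void push(T frozen) {
        components.updateAndGet(old -> {
            List<T> next = new ArrayList<>(old.size() + 1);
            next.add(frozen);
            next.addAll(old);
            return List.copyOf(next);
        });
    }

    // Lock-free read: the returned list stays valid even if a swap happens later.
    List<T> readerView() {
        return components.get();
    }

    // Replace the compactor's inputs (the oldest components it read) with a single
    // compacted component; components pushed meanwhile are kept. Returns false if
    // the inputs are no longer the pipeline's tail, e.g. one moved to the snapshot.
    boolean swap(List<T> inputs, T compacted) {
        while (true) {
            List<T> cur = components.get();
            if (inputs.isEmpty() || !endsWith(cur, inputs)) return false;
            List<T> next = new ArrayList<>(cur.subList(0, cur.size() - inputs.size()));
            next.add(compacted);
            if (components.compareAndSet(cur, List.copyOf(next))) return true;
        }
    }

    // Move the oldest component out to become the flush snapshot.
    T pollOldest() {
        while (true) {
            List<T> cur = components.get();
            if (cur.isEmpty()) return null;
            List<T> next = List.copyOf(cur.subList(0, cur.size() - 1));
            if (components.compareAndSet(cur, next)) return cur.get(cur.size() - 1);
        }
    }

    // Identity comparison: components are distinct objects, not values.
    private static <E> boolean endsWith(List<E> list, List<E> tail) {
        int offset = list.size() - tail.size();
        if (offset < 0) return false;
        for (int i = 0; i < tail.size(); i++) {
            if (list.get(offset + i) != tail.get(i)) return false;
        }
        return true;
    }
}
```

A concurrent `pollOldest` may already have moved one of the compacted inputs to the snapshot. In that case `swap` returns false and the compacted result is discarded. Readers holding an older view keep scanning the uncompacted components. This matches the observation that missing the swap is slower but still correct.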

> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>         Attachments: HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to the block cache. This may result
> in underutilization of memory due to duplicate entries per row, for example
> when hot data is continuously updated.
> Generally, the faster data accumulates in memory, the more flushes are
> triggered and the more frequently data sinks to disk, slowing down retrieval
> of data, even very recent data.
> In high-churn workloads, compacting the memstore can help keep data in
> memory and thereby speed up data retrieval.
> We suggest a new compacted memstore with the following principles:
> 1. The data is kept in memory for as long as possible.
> 2. Memstore data is either compacted or in the process of being compacted.
> 3. Allow a panic mode, which may interrupt an in-progress compaction and
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.
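
A minimal Java sketch can show how in-memory compaction reclaims the memory wasted on duplicate entries for hot rows. It merges several memstore components and keeps only the newest version of each row. The sketch assumes the column family retains a single version and ignores deletes, TTLs, and timestamps. `Cell` and `InMemoryCompactor` are hypothetical names, and a `TreeMap` is used for brevity instead of a streaming k-way merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified cell: real HBase cells also carry family, qualifier,
// timestamp, and type.
record Cell(String row, long seqId, String value) {}

final class InMemoryCompactor {
    // Merge components into one row-sorted component that keeps only the newest
    // version (highest sequence id) of each row. Assumes a single retained version
    // and no deletes or TTLs.
    static List<Cell> compact(List<List<Cell>> components) {
        TreeMap<String, Cell> newest = new TreeMap<>();
        for (List<Cell> component : components) {
            for (Cell cell : component) {
                newest.merge(cell.row(), cell,
                        (kept, incoming) -> incoming.seqId() > kept.seqId() ? incoming : kept);
            }
        }
        return new ArrayList<>(newest.values());
    }
}
```

Consider a row updated at sequence ids 1 and 5. The compacted result holds one cell for that row instead of two. The smallest sequence id left in the result is the value the comment proposes using for WAL truncation.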



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
