[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction

Anastasia Braginsky (JIRA) Tue, 23 Feb 2016 06:56:14 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158983#comment-15158983
 ]


Anastasia Braginsky commented on HBASE-14918:
---------------------------------------------

Thank you for your immediate attention, [~stack]!

Of course, we looked at the CellBlock patch from HBASE-10713.
The code there is very well written and commented, so it can be understood 
from just reading the patch. Kudos, [~anoop.hbase] :) !
(At least I hope that I understand it :) [~anoop.hbase], please correct me 
if I am wrong.)

Along with some restructuring and refactoring (partially covered by 
HBASE-14919), the CellBlock patch proposes an ArrayList of PositionedByteRange 
as the underlying data structure.
PositionedByteRange and SimplePositionedByteRange are allocated directly on the 
JVM heap.
The code handles many details and also provides a very important 
CellBlockScanner for scanning the new data structure.
In light of the recent MemStore refactoring, the CellBlock patch clearly cannot 
be used as is.
However, the most important and deepest parts of the code are very valuable 
and can definitely be reused.

Thus we suggest CellBlocksSegment, which fits into the new Segments structure 
of MemStore and inherits from ImmutableSegment.
Underneath, CellBlocksSegment follows the same idea as CellBlock. 
It just strives to use an array of arrays instead of a list of arrays, in order 
to get binary search and lower memory overhead.
Taking into consideration [~anoop.hbase]'s earlier comments about MSLAB (and 
simple common sense), we suggest using MSLAB for allocating any sequence of 
bytes.
Please note that MSLAB is also very suitable because it does reference 
counting for chunk scans and thus handles the deallocation of the chunks per 
segment.
Since MSLAB does not currently support off-heap allocation, the 
PositionedByteRange can be replaced by the ByteRange/Chunk currently returned 
by MSLAB. A little more tuning is also required.
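
To make the array-of-arrays point concrete, here is a minimal sketch of a 
binary search over a flat, sorted array of keys. It is not taken from the 
CellBlock patch or from any CellBlocksSegment code; the class name 
CellArraySketch and its methods are hypothetical, and plain byte[] row keys 
stand in for real cells.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: an immutable segment flattened into a
// sorted array of row keys (each key is itself a byte array).
final class CellArraySketch {
    private final byte[][] sortedKeys; // must be sorted ascending by compare()

    CellArraySketch(byte[][] sortedKeys) {
        this.sortedKeys = sortedKeys;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Index of the first key >= target, or the array length if there is none.
    // Random access into the array is what makes the O(log n) search possible.
    int lowerBound(byte[] target) {
        int lo = 0;
        int hi = sortedKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(sortedKeys[mid], target) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        CellArraySketch seg = new CellArraySketch(
            new byte[][] { key("row1"), key("row3"), key("row5") });
        System.out.println(seg.lowerBound(key("row3"))); // prints 1
        System.out.println(seg.lowerBound(key("row4"))); // prints 2
    }
}
```

A scanner seeking to a given key could start from the returned index and walk 
forward; the array also avoids per-element list overhead.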

As a completely orthogonal but related issue, we also see a possibility of 
enhancing MSLAB by adding the ability to allocate its Chunks both on- and 
off-heap.
This is probably a matter for sub-task number 5 of HBASE-14918 :)
Obviously, this requires some redesign of MemStoreLAB, HeapMemStoreLAB.Chunk, 
and some other classes around memory allocation.
In particular, the "byte[]" field in the implementation of 
HeapMemStoreLAB.Chunk and the usage of ByteRange can be replaced with (for 
example) ByteBuffer.
(ByteBufferArray from hbase-common/org.apache.hadoop.hbase.util also looks very 
interesting :))
I agree that it is better to pre-allocate the off-heap Chunks; for that we can 
probably enhance the MemStoreChunkPool.
I took a look at the BoundedByteBufferPool, which I found only in the 
hbase-client code. It also looks very suitable, though it lives in a different 
component.
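
As a rough illustration of the ByteBuffer idea, the sketch below shows a 
pre-allocating chunk pool that can hand out either on-heap or off-heap 
chunks. It is an assumption-laden sketch, not the MemStoreChunkPool or 
BoundedByteBufferPool code: the class ChunkPoolSketch and its methods are 
hypothetical, and only java.nio.ByteBuffer and java.util.concurrent are used.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration only: a bounded pool of fixed-size chunks backed
// by ByteBuffer, so the same code path serves on-heap and off-heap memory.
final class ChunkPoolSketch {
    private final BlockingQueue<ByteBuffer> free;
    private final int chunkSize;
    private final boolean offHeap;

    ChunkPoolSketch(int chunkSize, int initialChunks, boolean offHeap) {
        this.chunkSize = chunkSize;
        this.offHeap = offHeap;
        this.free = new ArrayBlockingQueue<>(Math.max(1, initialChunks));
        // Pre-allocate up front; direct buffers are costly to allocate lazily.
        for (int i = 0; i < initialChunks; i++) {
            free.add(allocate());
        }
    }

    private ByteBuffer allocate() {
        return offHeap
            ? ByteBuffer.allocateDirect(chunkSize) // off-heap
            : ByteBuffer.allocate(chunkSize);      // on-heap, byte[]-backed
    }

    // Take a pooled chunk, or allocate a fresh one if the pool is empty.
    ByteBuffer take() {
        ByteBuffer chunk = free.poll();
        return chunk != null ? chunk : allocate();
    }

    // Return a chunk for reuse; it is silently dropped if the pool is full.
    void give(ByteBuffer chunk) {
        chunk.clear(); // reset position and limit, contents are not zeroed
        free.offer(chunk);
    }

    int pooled() {
        return free.size();
    }

    public static void main(String[] args) {
        ChunkPoolSketch pool = new ChunkPoolSketch(2 * 1024 * 1024, 4, true);
        ByteBuffer chunk = pool.take();
        System.out.println(chunk.isDirect()); // prints true
        pool.give(chunk);
    }
}
```

In a real design the point at which a chunk is returned would presumably be 
tied to the reference counting that MSLAB already does for scans.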

Sorry for this long monologue :)
[~anoop.hbase], [~stack], everybody, what do you think?
I look forward to hearing your insightful comments! :))))))
Thanks!

> In-Memory MemStore Flush and Compaction
> ---------------------------------------
>
>                 Key: HBASE-14918
>                 URL: https://issues.apache.org/jira/browse/HBASE-14918
>             Project: HBase
>          Issue Type: Umbrella
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>             Fix For: 0.98.18
>
>         Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch
>
>
> A memstore serves as the in-memory component of a store unit, absorbing all 
> updates to the store. From time to time these updates are flushed to a file 
> on disk, where they are compacted (by eliminating redundancies) and 
> compressed (i.e., written in a compressed format to reduce their storage 
> size).
> We aim to speed up data access, and therefore suggest applying an in-memory 
> memstore flush, that is, flushing the active in-memory segment into an 
> intermediate buffer where it can be accessed by the application. Data in the 
> buffer is subject to compaction and can be stored in any format that allows 
> it to take up smaller space in RAM. The less space the buffer consumes the 
> longer it can reside in memory before data is flushed to disk, resulting in 
> better performance.
> Specifically, the optimization is beneficial for workloads with 
> medium-to-high key churn which incur many redundant cells, like persistent 
> messaging. 
> We suggest structuring the solution as 4 subtasks (and, respectively, 
> patches): 
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing 
> segment (StoreSegment) as a first-class citizen, and decoupling the memstore 
> scanner from the memstore implementation;
> (2) Adding a StoreServices facility at the region level to allow memstores 
> to update region counters and access the region-level synchronization 
> mechanism;
> (3) Implementation of a new memstore (CompactingMemstore) with non-optimized 
> immutable segment representation, and 
> (4) Memory optimization including compressed format representation and off 
> heap allocations.
> This Jira continues the discussion in HBASE-13408.
> Design documents, evaluation results and previous patches can be found in 
> HBASE-13408. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
