[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988099#comment-14988099 ]

stack commented on HBASE-13408:
-------------------------------

Thanks for the update to the design doc. It is great.

"This may result in underutilization of the memory due to duplicate entries per 
row, for example, when hot data is continuously updated."

Other reasons for compactions in memory would be to move data from an 
expensive, fat-if-convenient data structure -- the skip list -- to a data 
structure that is more compact in memory and more efficient to access (it can 
exploit the fact that it is read-only, and we can trigger the in-memory flush 
when the memstore hits some 'small' size -- should improve perf). The latter is 
especially pertinent in the case where a flush is encoded and/or compressed 
such that the in-memory representation may be, say, 128MB, but the hfile on 
disk is much smaller than this. IIRC we have seen cases where it takes longer 
to pull an entry from a large skip list than it does from cache. We could also 
remove 'duplicates' by coalescing increments and by applying deletes to the 
data they delete, if present in memory.
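
To make the 'remove duplicates' point concrete, here is a rough sketch (plain 
Java, made-up Cell type, not the actual HBase classes) of what coalescing 
per-row duplicates during an in-memory compaction could look like:

{code}
// Hypothetical sketch only -- not the real HBase Cell/Segment API.
// Idea: "compact" a memstore segment in memory by keeping only the newest
// version per (row, qualifier) and dropping anything masked by a delete.
import java.util.*;

class InMemoryCompactSketch {
  static class Cell {
    final String row, qualifier;
    final long timestamp;
    final boolean isDelete;
    Cell(String row, String qualifier, long timestamp, boolean isDelete) {
      this.row = row; this.qualifier = qualifier;
      this.timestamp = timestamp; this.isDelete = isDelete;
    }
  }

  // Input: cells sorted by (row, qualifier, timestamp desc), as the memstore holds them.
  // Output: a read-only list with duplicates and deleted data coalesced away.
  static List<Cell> compact(List<Cell> sorted) {
    List<Cell> out = new ArrayList<>();
    String lastKey = null;
    for (Cell c : sorted) {
      String key = c.row + "/" + c.qualifier;
      if (key.equals(lastKey)) {
        continue;            // older version of the same cell -- drop it
      }
      lastKey = key;
      if (!c.isDelete) {
        out.add(c);          // newest version wins; a delete marker masks the rest
      }
    }
    return Collections.unmodifiableList(out);  // read-only, so a flatter layout is possible
  }
}
{code}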

A downside to in-memory compaction is that we carry a larger backlog of WALs, 
making MTTR take longer.

bq. Data is kept in memory for as long as possible, by periodically compacting 
the memstore content.

I think the principle should be more that you will afford yourself the luxury 
of being able to make smarter decisions on when to flush (keeping data in 
memory as long as possible could actually be detrimental over the long run).

bq. A flush call may interrupt an in-progress compaction and flush to the disk 
part of the data.

The part that will be flushed is the 'compacted' part? Just trying to 
understand better.

On the name of the config, I think it should be IN_MEMORY_COMPACTION rather 
than COMPACTED. We already have a compaction process; add the IN_MEMORY prefix 
to show where it applies. Also, this feature should be on by default. You might 
want to say that in the doc as being your intent. Let's prove it makes life 
better so we can indeed enable it by default.
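
On the naming, for example, something along these lines at the schema level 
(the key name is just my suggestion, not an existing constant):

{code}
// Sketch only: how the flag might read if it were named IN_MEMORY_COMPACTION.
import org.apache.hadoop.hbase.HColumnDescriptor;

class ConfigNamingSketch {
  static HColumnDescriptor example() {
    HColumnDescriptor cf = new HColumnDescriptor("f");
    cf.setValue("IN_MEMORY_COMPACTION", "true");  // reads better than "COMPACTED"
    return cf;
  }
}
{code}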

Can the in-memory flush use the same code as the flush-to-disk flush? Ditto on 
compaction?

bq. An additional flush-total-size is defined as the average of flush-size 
and blocking-flush-size.

Pardon me, what is the above for?
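
If it is just the arithmetic mean, i.e. something like the below (restating my 
reading of the doc, not actual code), then what role does it play in the flush 
decision?

{code}
// My reading of the definition in the doc:
class FlushTotalSizeSketch {
  static long flushTotalSize(long flushSize, long blockingFlushSize) {
    return (flushSize + blockingFlushSize) / 2;
  }
}
{code}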

bq.  In case of FlushLargeStoresPolicy, the decision is based on the size of 
the active segment.

I'm not clear on when a flush to disk will happen. I get that you need headroom 
for the in-memory flush and compaction to run, but can you be clearer on where 
the threshold for flush to disk is?  Thanks.
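
Is the intended shape roughly the following (guesswork sketch with made-up 
names, just to pin the question down)?

{code}
// Guess at the flush-to-disk decision I think the doc describes; the names
// and the threshold choice are mine, I'm asking whether this is the intent.
class FlushDecisionSketch {
  static boolean shouldFlushToDisk(long activeSegmentSize,
                                   long pipelineSize,
                                   long flushSize,
                                   long blockingFlushSize) {
    long flushTotalSize = (flushSize + blockingFlushSize) / 2;
    // In-memory flush and compaction need headroom, so flush to disk only
    // once the whole memstore (active + pipeline) crosses the larger threshold?
    return activeSegmentSize + pipelineSize > flushTotalSize;
  }
}
{code}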

What is a snapshot in this scheme? Is it just the last segment in the pipeline? 
Does each pipeline segment become an hfile? Or do we have to do a merge sort on 
flush to make the hfile?
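
For the merge-sort case, I imagine something like a k-way merge over the 
pipeline segments' scanners, along these lines (generic iterators, not the real 
StoreSegment/scanner classes):

{code}
// Sketch of "merge sort on flush": a k-way merge over already-sorted segment
// iterators using a priority queue, producing the sorted stream the hfile
// writer would consume. Hypothetical types.
import java.util.*;

class MergeOnFlushSketch {
  static List<String> mergeSegments(List<Iterator<String>> segments) {
    // heap entry: (current key, index of the segment it came from)
    PriorityQueue<Map.Entry<String, Integer>> heap =
        new PriorityQueue<>(Map.Entry.<String, Integer>comparingByKey());
    for (int i = 0; i < segments.size(); i++) {
      if (segments.get(i).hasNext()) {
        heap.add(new AbstractMap.SimpleEntry<>(segments.get(i).next(), i));
      }
    }
    List<String> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      Map.Entry<String, Integer> top = heap.poll();
      merged.add(top.getKey());                       // next smallest key overall
      Iterator<String> src = segments.get(top.getValue());
      if (src.hasNext()) {
        heap.add(new AbstractMap.SimpleEntry<>(src.next(), top.getValue()));
      }
    }
    return merged;                                    // what would be written to the hfile
  }
}
{code}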

{code}
Upon in-memory flush, we take advantage of the existing blocking mechanism -- the active
segment is pushed to the pipeline while holding the region updatesLock in exclusive mode.
Then, a compaction is applied at the background to all pipeline segments resulting in a
single immutable segment. This procedure is executed at the scope of the store level, and
specifically in the context of the compacted memstore.
{code}

Do we hold the region lock while we compact the in-memory segments on a column 
family? Every time a compaction runs, it compacts all segments in the pipeline?
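
What I'd hope is that the lock covers only the swap, roughly like this 
(pseudo-Java with made-up names, just to pin down the question):

{code}
// Sketch of the locking shape I'm asking about: hold the region updatesLock
// only for the segment swap, then compact the pipeline in the background
// without the lock. Names and types are illustrative only.
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class InMemoryFlushSketch {
  static class Segment {}

  private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
  private volatile Segment activeSegment = new Segment();
  private final Deque<Segment> pipeline = new ConcurrentLinkedDeque<>();

  void inMemoryFlush(Runnable backgroundCompaction) {
    updatesLock.writeLock().lock();       // block writers only for the swap
    try {
      pipeline.addFirst(activeSegment);   // active segment becomes immutable
      activeSegment = new Segment();      // fresh, empty active segment
    } finally {
      updatesLock.writeLock().unlock();
    }
    backgroundCompaction.run();           // compaction runs without the region lock?
  }
}
{code}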

I'm not sure I follow the approximation of oldest sequence id. Why does it have 
to be an approximation? Can't we keep a running oldest sequenceid seen? 
(Removing it too if compactions remove the cell that is oldest -- I suppose 
this could complicate accounting... how you find the next oldest.) So, you 
have a map of timestamp to oldest sequenceid in the segment? And when a segment 
is flushed to disk, the next oldest becomes the oldest edit in memory?
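
Concretely, what I have in mind is something like the below (hypothetical 
names): each segment tracks the minimum sequenceid it holds, and the memstore's 
oldest is the min over the active segment and the pipeline, advancing naturally 
when a segment is flushed to disk.

{code}
// Sketch of the "running oldest sequenceid" alternative I'm suggesting.
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;

class OldestSeqIdSketch {
  static class Segment {
    private long minSeqId = Long.MAX_VALUE;
    void add(long seqId) { minSeqId = Math.min(minSeqId, seqId); }
    long minSeqId()      { return minSeqId; }
  }

  private Segment active = new Segment();
  private final Deque<Segment> pipeline = new ConcurrentLinkedDeque<>();

  long oldestSeqIdInMemory() {
    long oldest = active.minSeqId();
    for (Segment s : pipeline) {
      oldest = Math.min(oldest, s.minSeqId());
    }
    // Exact, not an approximation -- modulo compaction removing the oldest cell,
    // which is where the accounting gets interesting.
    return oldest;
  }
}
{code}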

Do you have a rig where you can try out your implementation apart from running 
it inside a regionserver?

Should be CompactingMemStore rather than CompactedMemStore.

The class diagram looks great.

So, on design, are we talking about adding one more thread -- a compacting 
thread -- per Store? Do we have to do this? HBase has too many threads already. 
Minimally, these threads could be run by an executor so they don't live forever 
while the process is up? It could be triggered by an in-memory flush.
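
For example, a single shared pool that the in-memory flush hands work to 
(sketch only, names made up):

{code}
// Sketch of the "run by an executor" alternative: one small pool shared by
// all stores instead of a long-lived compacting thread per Store; the
// in-memory flush just submits a task.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SharedCompactionPoolSketch {
  // One bounded pool for the whole regionserver.
  private static final ExecutorService IN_MEMORY_COMPACTION_POOL =
      Executors.newFixedThreadPool(2);

  // Called at the end of an in-memory flush rather than waking a dedicated thread.
  static void triggerInMemoryCompaction(Runnable compactPipeline) {
    IN_MEMORY_COMPACTION_POOL.submit(compactPipeline);
  }
}
{code}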

On CompactionPipeline, add rather than push-in and remove rather than pull-out?

On MemStoreScanner, are we keeping the fact that the implementation crosses 
Segments an internal implementation detail? +1 on making the MemStoreScanner 
first-class, detached from DefaultMemStore.

MemStoreCompactor should not be first class, right? It should be internal to 
the CompactingMemStore implementation?

Yeah, it's fine if internally the kv sets go into a StoreSegment... but clients 
don't have to know this, right?

How will you implement low priority compacting memstore? (Not important, just 
interested)

We should add your seek and sequenceid to KeyValueScanner so you don't have to 
have them implement something else (we have too many tiers of classes already)? 
Why not have a set sequenceid rather than passing it to the scanner on 
construction? Same with the seek.
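
Roughly this shape (sketch, not the existing KeyValueScanner interface):

{code}
// Hypothetical interface sketch: let the scanner take its read point (and
// initial seek) via setters after construction instead of adding another
// class tier that only exists to pass them in the constructor.
interface SegmentScannerSketch {
  void setSequenceId(long readPoint);   // set after construction...
  boolean seek(byte[] key);             // ...same with the initial seek
  byte[] next();                        // placeholder for the scan API
}
{code}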

Do you have a harness so you can run this little engine outside of a 
regionserver? Might make testing, proofing, and debugging easier?

I suppose you'll deliver a skiplist version first and then move on to work on 
in-memory storefile, a more compact in-memory representation?

Design writeup is great. Thanks.  Let me review your last rb too... might help.

> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>             Fix For: 2.0.0
>
>         Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster data accumulates in memory, the more flushes are 
> triggered and the more frequently the data sinks to disk, slowing down 
> retrieval of data, even if it is very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.    The data is kept in memory for as long as possible
> 2.    Memstore data is either compacted or in the process of being compacted 
> 3.    Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
