[ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990234#comment-14990234
 ] 

Eshcar Hillel commented on HBASE-13408:
---------------------------------------

Great comments and questions, [~stack].
We will work on improving the document and the code along the lines suggested 
here and in the code review. Meanwhile, here are some answers and clarifications:

bq. The part that will be flushed is the 'compacted' part?

Yes. Specifically, it would be the tail of the compaction pipeline, which 
consists of a list of segments.

bq. On name of the config., I think it should be IN_MEMORY_COMPACTION rather 
than COMPACTED

We’ll change the name; however, we feel it is better to keep the feature off by 
default, at least until users/applications are fully aware of its implications.

bq. Can the in-memory flush use same code as the flush-to-disk flush? Ditto on 
compaction?

Flush - no, compaction - yes.
An in-memory flush changes in-memory data structures, while a disk flush 
writes to disk.
Once the compacted memstore fully supports the HFile format, it can share the 
same compaction code.

bq. what is the above (flush-total-size) for?
bq. can you be more clear on where the threshold for flush to disk is?

Currently a flush is triggered when the memstore size reaches 128MB; however, a 
region can tolerate an even larger memstore before blocking update operations. 
So there is a lower bound that triggers a flush and an upper bound that 
triggers a flush while blocking update operations.
With flush-total-size we attempt to further refine these boundaries and have a 
soft lower bound instead of a hard bound.
In the new solution a region can tolerate a memstore larger than 128MB (but 
smaller than flush-total-size) before calling a flush to disk, knowing that the 
size is not necessarily monotonically increasing between flushes. We 
distinguish between the data in active segments (which are still bounded by 
128MB) and the overflow segments being compacted. The size of all data in the 
memstore is bounded by flush-total-size, where 
flush-size < flush-total-size < flush-blocking-size.
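
To make the ordering of the three bounds concrete, here is a minimal sketch of 
the decision. It is not code from the patch: the class, enum, and parameter 
names are hypothetical, and the step that moves the active segment into the 
pipeline once it reaches flush-size is an assumption made for illustration.

```java
// Illustrative sketch only: all names here are hypothetical, not HBase APIs.
final class FlushDecisionSketch {
    enum Action { NONE, IN_MEMORY_FLUSH, FLUSH_TO_DISK, FLUSH_TO_DISK_AND_BLOCK_UPDATES }

    private final long flushSize;         // bound on the active segment (e.g. 128MB)
    private final long flushTotalSize;    // soft bound on all data in the memstore
    private final long flushBlockingSize; // hard bound: updates are blocked

    FlushDecisionSketch(long flushSize, long flushTotalSize, long flushBlockingSize) {
        if (flushSize >= flushTotalSize || flushTotalSize >= flushBlockingSize) {
            throw new IllegalArgumentException(
                "expected flush-size < flush-total-size < flush-blocking-size");
        }
        this.flushSize = flushSize;
        this.flushTotalSize = flushTotalSize;
        this.flushBlockingSize = flushBlockingSize;
    }

    // activeBytes: data in the active segment.
    // overflowBytes: data in the overflow segments being compacted.
    Action decide(long activeBytes, long overflowBytes) {
        long total = activeBytes + overflowBytes;
        if (total >= flushBlockingSize) {
            return Action.FLUSH_TO_DISK_AND_BLOCK_UPDATES;
        }
        if (total >= flushTotalSize) {
            return Action.FLUSH_TO_DISK;
        }
        if (activeBytes >= flushSize) {
            // Assumption: the active segment is handed to the compaction pipeline.
            return Action.IN_MEMORY_FLUSH;
        }
        return Action.NONE;
    }
}
```

Because compaction can shrink the overflow segments, the total passed to 
decide() may go down between calls, which is why the middle bound is soft.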

bq. What is a snapshot in this scheme? we have to do a merge sort on flush to 
make the hfile?

The snapshot is a single immutable segment that is *not* subject to compaction. 
There is no need to do a merge sort on flush to disk.

bq. Do we hold the region lock while we compact the in-memory segments on a 
column family? Every time a compaction runs, it compacts all segments in the 
pipeline?

No - the lock is held only while changing the in-memory data structures, i.e. 
removing the tail segment from the compaction pipeline and moving it to the 
snapshot.
Yes - currently a compaction compacts all segments in the pipeline.
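
A rough sketch of the locking pattern described above, under simplifying 
assumptions: a segment is modeled as an immutable sorted map from row to value, 
a ReentrantLock stands in for the region lock, and only the newest value per 
row is kept. The class and method names are hypothetical and do not come from 
the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only: names and data model are hypothetical.
final class CompactionPipelineSketch {
    private final ReentrantLock lock = new ReentrantLock(); // stands in for the region lock
    // Index 0 is the newest segment; the last index is the tail (oldest).
    private final List<NavigableMap<String, String>> pipeline = new ArrayList<>();
    private NavigableMap<String, String> snapshot; // null when no flush is pending

    void pushHead(NavigableMap<String, String> segment) {
        lock.lock();
        try {
            pipeline.add(0, Collections.unmodifiableNavigableMap(new TreeMap<>(segment)));
        } finally {
            lock.unlock();
        }
    }

    // Compacts all segments in the pipeline. The merge runs without the lock;
    // the lock is taken only to read the list and to swap in the result.
    boolean compact() {
        List<NavigableMap<String, String>> view;
        lock.lock();
        try {
            view = new ArrayList<>(pipeline);
        } finally {
            lock.unlock();
        }
        if (view.size() < 2) {
            return false; // nothing to merge
        }
        TreeMap<String, String> merged = new TreeMap<>();
        for (int i = view.size() - 1; i >= 0; i--) {
            merged.putAll(view.get(i)); // oldest first, so newer values overwrite
        }
        lock.lock();
        try {
            int offset = pipeline.size() - view.size();
            if (offset < 0) {
                return false; // the tail was moved to the snapshot meanwhile
            }
            for (int i = 0; i < view.size(); i++) {
                if (pipeline.get(offset + i) != view.get(i)) {
                    return false; // pipeline changed; give up on this result
                }
            }
            pipeline.subList(offset, pipeline.size()).clear();
            pipeline.add(Collections.unmodifiableNavigableMap(merged));
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Removes the tail segment from the pipeline and makes it the snapshot.
    boolean moveTailToSnapshot() {
        lock.lock();
        try {
            if (snapshot != null || pipeline.isEmpty()) {
                return false;
            }
            snapshot = pipeline.remove(pipeline.size() - 1);
            return true;
        } finally {
            lock.unlock();
        }
    }

    NavigableMap<String, String> snapshot() {
        lock.lock();
        try { return snapshot; } finally { lock.unlock(); }
    }

    int pipelineLength() {
        lock.lock();
        try { return pipeline.size(); } finally { lock.unlock(); }
    }
}
```

The identity check before the swap is one possible way to cope with the 
pipeline changing while the merge runs; the patch may handle this differently.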

bq. I'm not sure I follow the approximation of oldest sequence id.

This was explained in the posts between July 23 and July 30. We can explain it 
again if required.

bq. Do you have a rig where you can try out your implementation apart from 
running it inside a regionserver?

What do you mean by rig? If you mean a benchmark environment, then no. If you 
mean tests, they are included in the patch.

bq. we talking about adding one more thread – a compacting thread – per Store?

In the new design, the threads are run by the region server executor.

bq. On MemstoreScanner, we are keeping the fact that the implementation is 
crossing Segments an internal implementation detail?

Yes.
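
As an illustration of how a single scanner can hide the segment boundaries, 
here is a minimal k-way merge sketch. It is not the MemstoreScanner from the 
patch: the class name is hypothetical, segments are modeled as sorted maps from 
row to value, and only the newest value per row is returned, whereas a real 
scanner deals with cells and versions.

```java
import java.util.AbstractMap;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Illustrative sketch only: names and data model are hypothetical.
final class MemstoreScannerSketch implements Iterator<Map.Entry<String, String>> {
    // One cursor per segment; age 0 is the newest segment.
    private static final class Cursor {
        final int age;
        final Iterator<Map.Entry<String, String>> it;
        Map.Entry<String, String> current;

        Cursor(int age, Iterator<Map.Entry<String, String>> it) {
            this.age = age;
            this.it = it;
            advance();
        }

        boolean advance() {
            current = it.hasNext() ? it.next() : null;
            return current != null;
        }
    }

    // Ordered by key, then by age, so the newest segment wins on equal keys.
    private final PriorityQueue<Cursor> heap = new PriorityQueue<>(
        Comparator.comparing((Cursor c) -> c.current.getKey())
                  .thenComparingInt(c -> c.age));

    MemstoreScannerSketch(List<? extends NavigableMap<String, String>> segmentsNewestFirst) {
        for (int i = 0; i < segmentsNewestFirst.size(); i++) {
            Cursor c = new Cursor(i, segmentsNewestFirst.get(i).entrySet().iterator());
            if (c.current != null) {
                heap.add(c);
            }
        }
    }

    @Override
    public boolean hasNext() {
        return !heap.isEmpty();
    }

    @Override
    public Map.Entry<String, String> next() {
        if (heap.isEmpty()) {
            throw new NoSuchElementException();
        }
        Cursor top = heap.poll();
        Map.Entry<String, String> result =
            new AbstractMap.SimpleImmutableEntry<>(top.current);
        if (top.advance()) {
            heap.add(top);
        }
        // Skip older values of the same key held by other segments.
        while (!heap.isEmpty() && heap.peek().current.getKey().equals(result.getKey())) {
            Cursor dup = heap.poll();
            if (dup.advance()) {
                heap.add(dup);
            }
        }
        return result;
    }
}
```

Callers only see an ordinary iterator; how many segments sit behind it, and 
whether they have been compacted, stays internal.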

bq. I suppose you'll deliver a skiplist version first and then move on to work 
on in-memory storefile, a more compact in-memory representation?

This is a task that should definitely be completed; HBASE-10713 is a good 
starting point.

bq. Seems like the whole notion of snapshot should not be exposed to the 
client. It is an implementation detail of the original memstore, the 
defaultmemstore, something that we should try not expose.

Agreed; however, this seems out of scope for the current Jira, which focuses on 
in-memory compaction.


> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>             Fix For: 2.0.0
>
>         Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, the more flushes are 
> triggered and the more frequently the data sinks to disk, slowing down 
> retrieval of data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.    The data is kept in memory for as long as possible
> 2.    Memstore data is either compacted or in the process of being compacted 
> 3.    Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
