[ https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383748#comment-15383748 ]

Anastasia Braginsky commented on HBASE-14921:
---------------------------------------------

What is the default case in which we can be sure there are no duplicates to 
remove?
Below is a summary of how flattening uses scans.

When the size of the active segment exceeds a threshold in CompactingMemStore, 
the active segment is pushed to the pipeline (a MutableSegment wrapped as an 
ImmutableSegment). After that, a single dedicated thread does the following:
1. Scan *all* segments in the pipeline (with ScanQueryMatcher) to determine 
whether compaction is needed. For now, this is the only way to find out 
whether there are duplicates.
2. Decide whether to flatten or to compact.
3. If flattening, scan only the non-flat segment (without ScanQueryMatcher) 
in order to flatten it.
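To make the three steps concrete, here is a minimal Python model of them. It is not HBase code: cells are collapsed into hypothetical (key, seq_id, value) tuples, ScanQueryMatcher is replaced by a simple "keep only the newest version of each key" rule, and the names Segment, count_redundant_cells, compact, flatten and in_memory_flush are illustrative assumptions.

```python
import heapq
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical cell model: (key, seq_id, value). Real HBase cells carry row,
# family, qualifier, timestamp and type; here they collapse into "key".
Cell = Tuple[str, int, str]


def _order(cell: Cell) -> Tuple[str, int]:
    # Sort by key, then newest (highest seq_id) first.
    return (cell[0], -cell[1])


@dataclass
class Segment:
    cells: Sequence[Cell]  # already sorted by _order
    flat: bool = False     # True once stored as a flat array (cf. CellArrayMap)


def count_redundant_cells(pipeline: List[Segment]) -> int:
    """Step 1: merge-scan *all* segments and count older versions of a key,
    i.e. cells a keep-newest-version matcher would drop."""
    redundant, prev_key = 0, None
    for key, _seq, _val in heapq.merge(*(s.cells for s in pipeline), key=_order):
        if key == prev_key:
            redundant += 1
        prev_key = key
    return redundant


def compact(pipeline: List[Segment]) -> List[Segment]:
    """Merge all segments into one flat segment, keeping the newest version per key."""
    kept, prev_key = [], None
    for cell in heapq.merge(*(s.cells for s in pipeline), key=_order):
        if cell[0] != prev_key:
            kept.append(cell)
        prev_key = cell[0]
    return [Segment(tuple(kept), flat=True)]


def flatten(pipeline: List[Segment]) -> List[Segment]:
    """Step 3: scan only the non-flat segments (no matcher) and copy their
    already-sorted cells into an immutable array; flat segments are reused."""
    return [s if s.flat else Segment(tuple(s.cells), flat=True) for s in pipeline]


def in_memory_flush(pipeline: List[Segment]) -> Tuple[str, List[Segment]]:
    """Step 2: compact if step 1 found duplicates, otherwise flatten."""
    if count_redundant_cells(pipeline) > 0:
        return "compact", compact(pipeline)
    return "flatten", flatten(pipeline)
```

For example, an active segment holding ("r1", 7, ...) on top of a flat segment holding ("r1", 3, ...) is compacted down to the newer cell, while a pipeline with no repeated keys is only flattened. In this model, step 1 merge-scans every segment even when the outcome is to flatten; that full scan is the overhead whose cost is in question.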

Can we get real numbers showing the performance difference with and without 
the scan in step 1? Maybe you ([~anoop.hbase], [~ram_krish]) could run this 
experiment on your large setup, since we only have a simple configuration?
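One possible shape for such an experiment, sketched in Python under the same simplifying assumptions (synthetic cells, a newest-version-wins duplicate check standing in for ScanQueryMatcher): time flattening with and without the step-1 scan and compare the medians. All names and parameters here are hypothetical, and timings from a Python model say nothing about HBase itself; real numbers have to come from the JVM on a realistic setup.

```python
import heapq
import random
import statistics
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical cell model: (key, seq_id, value), sorted by key then newest first.
Cell = Tuple[int, int, bytes]


def make_segment(n: int, seq_base: int, key_space: int, rng: random.Random) -> List[Cell]:
    # Unique keys within a segment; duplicates can still occur across segments.
    keys = sorted(rng.sample(range(key_space), n))
    return [(k, seq_base + i, b"v") for i, k in enumerate(keys)]


def stage1_scan(segments: List[List[Cell]]) -> int:
    # Merge-scan every segment and count older versions of repeated keys.
    dups, prev = 0, None
    for key, _seq, _val in heapq.merge(*segments, key=lambda c: (c[0], -c[1])):
        dups += key == prev
        prev = key
    return dups


def flatten_active(segments: List[List[Cell]]) -> Tuple[Cell, ...]:
    # Step 3 only: copy the active (index 0, non-flat) segment into an array.
    return tuple(segments[0])


def median_seconds(fn: Callable[[], object], repeats: int = 5) -> float:
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(num_segments: int = 4, cells_per_segment: int = 10_000,
            key_space: int = 1_000_000, seed: int = 0) -> Dict[str, float]:
    rng = random.Random(seed)
    # Segment 0 is the active (newest) one, so it gets the highest seq_ids.
    segs = [make_segment(cells_per_segment, (num_segments - i) * cells_per_segment,
                         key_space, rng)
            for i in range(num_segments)]
    return {
        "with_stage1_scan_s": median_seconds(lambda: (stage1_scan(segs), flatten_active(segs))),
        "flatten_only_s": median_seconds(lambda: flatten_active(segs)),
    }
```

The gap between the two medians approximates the extra cost of step 1 in this model only; the same with/without comparison, run inside HBase, is what would answer the question.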

> Memory optimizations
> --------------------
>
>                 Key: HBASE-14921
>                 URL: https://issues.apache.org/jira/browse/HBASE-14921
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Anastasia Braginsky
>         Attachments: CellBlocksSegmentInMemStore.pdf, 
> CellBlocksSegmentinthecontextofMemStore(1).pdf, HBASE-14921-V01.patch, 
> HBASE-14921-V02.patch, HBASE-14921-V03.patch, HBASE-14921-V04-CA-V02.patch, 
> HBASE-14921-V04-CA.patch, HBASE-14921-V05-CAO.patch, 
> InitialCellArrayMapEvaluation.pdf, IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Memory optimizations including compressed format representation and offheap 
> allocations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
