[ https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383748#comment-15383748 ]
Anastasia Braginsky commented on HBASE-14921:
---------------------------------------------

What is the default case, in which we are sure that we don't need to remove any duplicates?

Here is a summary of how flattening uses scans. When the size of the active segment in CompactingMemStore exceeds a threshold, the active segment is pushed to the pipeline (the MutableSegment is wrapped as an ImmutableSegment). After that, a single dedicated thread does the following:
1. Scans *all* segments in the pipeline (with ScanQueryMatcher) in order to determine whether compaction is needed. For now, this is the only way to find out whether we have duplicates.
2. Decides whether to flatten or to compact.
3. If the decision is to flatten, scans only the non-flat segment (without ScanQueryMatcher) in order to flatten it.

Can we get real numbers showing the performance difference with and without the scan in stage 1? Maybe you ([~anoop.hbase], [~ram_krish]) could run this experiment on your large setup, since we only have a simple configuration?

> Memory optimizations
> --------------------
>
>                 Key: HBASE-14921
>                 URL: https://issues.apache.org/jira/browse/HBASE-14921
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Anastasia Braginsky
>         Attachments: CellBlocksSegmentInMemStore.pdf, 
> CellBlocksSegmentinthecontextofMemStore(1).pdf, HBASE-14921-V01.patch, 
> HBASE-14921-V02.patch, HBASE-14921-V03.patch, HBASE-14921-V04-CA-V02.patch, 
> HBASE-14921-V04-CA.patch, HBASE-14921-V05-CAO.patch, 
> InitialCellArrayMapEvaluation.pdf, IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Memory optimizations including compressed format representation and offheap 
> allocations

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
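
For readers following the thread, the three stages summarized in the comment could be sketched roughly as below. This is only an illustration and not the code in the attached patches: the class FlattenOrCompactSketch, the Segment type, the Action enum, and the methods decide and flatten are all hypothetical names, and "duplicate" is simplified here to mean the same cell key appearing more than once in the pipeline, whereas the real check goes through ScanQueryMatcher.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FlattenOrCompactSketch {
    enum Action { FLATTEN, COMPACT }

    /** Hypothetical stand-in for a segment: its cell keys plus a flag for its layout. */
    static final class Segment {
        final List<String> cellKeys;
        final boolean flat;

        Segment(List<String> cellKeys, boolean flat) {
            this.cellKeys = cellKeys;
            this.flat = flat;
        }
    }

    /**
     * Stages 1 and 2: scan every segment in the pipeline and decide.
     * Assumption: any repeated key counts as a duplicate worth compacting.
     */
    static Action decide(List<Segment> pipeline) {
        Set<String> seen = new HashSet<>();
        for (Segment segment : pipeline) {
            for (String key : segment.cellKeys) {
                if (!seen.add(key)) {
                    return Action.COMPACT; // duplicate found, stop scanning
                }
            }
        }
        return Action.FLATTEN;
    }

    /** Stage 3: rewrite only the non-flat segments; flat ones are reused unchanged. */
    static List<Segment> flatten(List<Segment> pipeline) {
        List<Segment> result = new ArrayList<>();
        for (Segment segment : pipeline) {
            result.add(segment.flat
                    ? segment
                    : new Segment(new ArrayList<>(segment.cellKeys), true));
        }
        return result;
    }
}
```

The sketch makes the cost question in the comment concrete: decide touches every cell in every segment, while flatten only touches the cells of the segment that is not yet flat.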