[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009577#comment-15009577 ]
Edward Bortnikov commented on HBASE-13408:
------------------------------------------

Community - please review.

> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>             Fix For: 2.0.0
>
>         Attachments: HBASE-13408-trunk-v01.patch,
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch,
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch,
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch,
> HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch,
> HBASE-13408-trunk-v10.patch,
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf,
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf,
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf,
> InMemoryMemstoreCompactionEvaluationResults.pdf,
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf,
> InMemoryMemstoreCompactionScansEvaluationResults.pdf,
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its
> in-memory component. The memstore absorbs all updates to the store; from time
> to time these updates are flushed to a file on disk, where they are
> compacted. Unlike disk components, the memstore is not compacted until it is
> written to the filesystem and optionally to block-cache. This may result in
> underutilization of the memory due to duplicate entries per row, for example,
> when hot data is continuously updated.
> Generally, the faster data accumulates in memory, the more flushes are
> triggered and the more frequently the data sinks to disk, which slows down
> retrieval of data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data
> in memory, and thereby speed up data retrieval.
> We suggest a new compacted memstore with the following principles:
> 1. The data is kept in memory for as long as possible.
> 2. Memstore data is either compacted or in the process of being compacted.
> 3. Allow a panic mode, which may interrupt an in-progress compaction and
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.
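
To illustrate the duplicate-entry problem that the description refers to, here is a minimal Java sketch. It is not taken from the attached patches or the design document: the class MemstoreCompactionSketch, its Version type, and every method name are hypothetical, and it assumes that only the newest version of each row has to be retained. Panic mode and flushing to disk are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; this is neither HBase API nor the HBASE-13408 patch. */
final class MemstoreCompactionSketch {

    /** One version of a row's value, stamped with a sequence number (hypothetical type). */
    static final class Version {
        final long seq;
        final String value;

        Version(long seq, String value) {
            this.seq = seq;
            this.value = value;
        }
    }

    /** Active segment: every update is appended, so hot rows accumulate versions. */
    private final TreeMap<String, List<Version>> active = new TreeMap<>();

    /** Compacted segment: at most one (the newest) version per row key. */
    private final TreeMap<String, Version> compacted = new TreeMap<>();

    private long nextSeq = 0;

    /** Absorb an update into the active segment. */
    public void put(String rowKey, String value) {
        active.computeIfAbsent(rowKey, k -> new ArrayList<>())
              .add(new Version(nextSeq++, value));
    }

    /** Total number of entries held in memory across both segments. */
    public int entryCount() {
        int n = compacted.size();
        for (List<Version> versions : active.values()) {
            n += versions.size();
        }
        return n;
    }

    /** Fold the active segment into the compacted one, keeping the newest version per row. */
    public void compactInMemory() {
        for (Map.Entry<String, List<Version>> e : active.entrySet()) {
            List<Version> versions = e.getValue();
            // Versions were appended in sequence order, so the last one is the newest.
            Version newest = versions.get(versions.size() - 1);
            compacted.merge(e.getKey(), newest,
                    (old, fresh) -> fresh.seq > old.seq ? fresh : old);
        }
        active.clear();
    }

    /** Read the newest value for a row, checking the active segment first. */
    public String get(String rowKey) {
        List<Version> versions = active.get(rowKey);
        if (versions != null) {
            return versions.get(versions.size() - 1).value;
        }
        Version v = compacted.get(rowKey);
        return v == null ? null : v.value;
    }
}
```

In this sketch, a row updated many times occupies many entries until compactInMemory() runs, after which it occupies one. That is the memory-utilization gain the description argues for in high-churn workloads, under the single-version assumption stated above.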