[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eshcar Hillel updated HBASE-13408:
----------------------------------
    Attachment: HBASE-13408-trunk-v02.patch
                InMemoryMemstoreCompactionScansEvaluationResults.pdf

We attach a new patch which covers WAL truncation. We also attach evaluation results for scans. The trend is very similar to the improvement we see for read operations.

Following the approach suggested in HBASE-10713, we now divide flushed stores into two groups: one does the traditional flush to disk, and the other does an in-memory flush into an inactive (read-only) memstore segment, which is subject to compaction.

By default, an in-memory column family has a compacted memstore, which does in-memory flushes, while all other column families have a default memstore, which flushes to disk. However, in some use cases, e.g. upon region split/merge/close, even in-memory columns flush their content to disk. Therefore, the flush policy selects *two* sets of stores: one to flush to disk, and one to do an in-memory flush. During the prepare phase, the first set invokes snapshot() and the second set invokes flushInMemory().

The main changes to support WAL truncation are threefold:
(1) Upon in-memory compaction, the WAL is updated with a sequence number which is a lower approximation of the lowest-unflushed-sequence-id.
(2) When the number of log files exceeds a certain threshold, the store is forced to flush to disk even if it is an in-memory column.
(3) Upon flush to disk, the lowest-unflushed-sequence-id is cleared (like it used to be). Stores with in-memory segments update it with a lower approximation of the lowest sequence id still in memory. Other stores update it with the sequence id of the first insert after the flush (like it used to be).

While (1) should help prolong the time an item can stay in memory, (2) and (3) are there to ensure the WAL size is maintainable and cannot explode.
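To make the two-set selection concrete, here is a minimal sketch in Java. It is not taken from the patch: the class FlushSelectionSketch, its Store and Selection types, the select() method, and the maxWalFiles parameter are all hypothetical names chosen for illustration. It only assumes what is described above, namely that in-memory column families normally flush in memory, and that a region split/merge/close or too many WAL files forces every store to disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the HBASE-13408 patch.
final class FlushSelectionSketch {

    /** Minimal stand-in for a store (one column family in a region). */
    static final class Store {
        final String name;
        final boolean inMemory; // true for an in-memory column family (compacted memstore)

        Store(String name, boolean inMemory) {
            this.name = name;
            this.inMemory = inMemory;
        }
    }

    /** The two sets a flush policy would hand to the prepare phase. */
    static final class Selection {
        final List<String> toDisk = new ArrayList<>();   // these would invoke snapshot()
        final List<String> inMemory = new ArrayList<>(); // these would invoke flushInMemory()
    }

    /**
     * Splits the stores into the two sets.
     *
     * @param regionClosing true on split/merge/close, when everything must reach disk
     * @param walFileCount  current number of WAL files
     * @param maxWalFiles   assumed threshold above which flushing to disk is forced
     */
    static Selection select(List<Store> stores, boolean regionClosing,
                            int walFileCount, int maxWalFiles) {
        boolean forceDisk = regionClosing || walFileCount > maxWalFiles;
        Selection selection = new Selection();
        for (Store store : stores) {
            if (store.inMemory && !forceDisk) {
                selection.inMemory.add(store.name);
            } else {
                selection.toDisk.add(store.name);
            }
        }
        return selection;
    }

    public static void main(String[] args) {
        List<Store> stores = Arrays.asList(new Store("hot", true), new Store("cold", false));
        Selection normal = select(stores, false, 3, 32);
        System.out.println("to disk: " + normal.toDisk + ", in memory: " + normal.inMemory);
        Selection forced = select(stores, false, 40, 32);
        System.out.println("to disk: " + forced.toDisk + ", in memory: " + forced.inMemory);
    }
}
```

The sketch deliberately leaves out the sequence-id bookkeeping from (1) and (3); it only shows how one decision point can yield both sets and how rule (2) overrides the default for in-memory columns.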

> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>         Attachments: HBASE-13408-trunk-v01.patch,
>                      HBASE-13408-trunk-v02.patch,
>                      HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf,
>                      HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf,
>                      InMemoryMemstoreCompactionEvaluationResults.pdf,
>                      InMemoryMemstoreCompactionScansEvaluationResults.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its
> in-memory component. The memstore absorbs all updates to the store; from time
> to time these updates are flushed to a file on disk, where they are
> compacted. Unlike disk components, the memstore is not compacted until it is
> written to the filesystem and optionally to block-cache. This may result in
> underutilization of the memory due to duplicate entries per row, for example,
> when hot data is continuously updated.
> Generally, the faster the data is accumulated in memory, more flushes are
> triggered, the data sinks to disk more frequently, slowing down retrieval of
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data
> in memory, and thereby speed up data retrieval.
> We suggest a new compacted memstore with the following principles:
> 1. The data is kept in memory for as long as possible
> 2. Memstore data is either compacted or in process of being compacted
> 3. Allow a panic mode, which may interrupt an in-progress compaction and
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)