[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction

stack (JIRA) Mon, 23 Nov 2015 23:13:02 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023894#comment-15023894
 ]


stack commented on HBASE-13408:
-------------------------------

Did the design doc get updated with justifications for this feature?  In 
particular, principles like 'The data is kept in memory for as long as possible' 
or statements like this: "...may help in some scenarios, however it might also 
add unnecessary overhead in other scenarios without any performance gains, like 
when there are no inmemory duplicate records most of the time." Do we still 
think this last statement is true? If this feature is only of use when there 
are in-memory duplicate records -- a relatively rare instance -- then there is 
a lot of code being added for this case. Can you go bigger? Can you come up 
with arguments that have it that this feature is advantageous 90% of the time? 
Above I talk of better perf because we'll be able to have the in-memory data in 
a more compact, performant (read-only) format than having it in 
ConcurrentSkipList. Flushes could be faster if the format in memory is an hfile 
(especially if the hfile were offheap as came up in a recent offlist chat w/ 
[~anoop.hbase]). Can we come up with other reasons why this is the bee's knees? 
([~anoop.hbase] you have input here boss?). Thanks. Let me look at the patch.
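
To make the "more compact, read-only format" argument in the comment above concrete, here is a minimal sketch, not HBase code. It assumes string keys and values, and the class name FlatSegment is hypothetical. It copies a snapshot of a ConcurrentSkipListMap into two parallel sorted arrays, which drops the per-entry node and index-level overhead of the skip list and allows lookups by binary search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical illustration only; this is not an HBase class.
final class FlatSegment {
    private final String[] keys;   // sorted ascending, never modified after construction
    private final String[] values; // values[i] belongs to keys[i]

    private FlatSegment(String[] keys, String[] values) {
        this.keys = keys;
        this.values = values;
    }

    // Copy the current contents of the skip list into flat arrays.
    // The map iterates in ascending key order, so the arrays come out sorted.
    static FlatSegment fromSkipList(ConcurrentSkipListMap<String, String> active) {
        List<String> k = new ArrayList<>();
        List<String> v = new ArrayList<>();
        for (Map.Entry<String, String> e : active.entrySet()) {
            k.add(e.getKey());
            v.add(e.getValue());
        }
        return new FlatSegment(k.toArray(new String[0]), v.toArray(new String[0]));
    }

    // Read-only lookup by binary search over the sorted key array.
    String get(String key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : null;
    }

    int size() {
        return keys.length;
    }
}
```

Because the flat segment is immutable, readers need no synchronization, and a flush could in principle stream the arrays out in order. Whether that translates into a measurable gain is exactly the question raised above.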

> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>             Fix For: 2.0.0
>
>         Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch, 
> HBASE-13408-trunk-v10.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, the more flushes are 
> triggered and the more frequently the data sinks to disk, slowing down 
> retrieval of data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.    The data is kept in memory for as long as possible
> 2.    Memstore data is either compacted or in the process of being compacted
> 3.    Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.
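
The duplicate-entry scenario in the description above can be sketched as follows. This is an illustration under simplifying assumptions, not the attached patch: cells are reduced to (row, timestamp, value), only the newest version of each row is retained (real retention depends on how the column family is configured), and the names MemstoreCompactionSketch, Cell and compact are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical illustration only; these are not HBase classes.
final class MemstoreCompactionSketch {

    // A simplified cell: one value for a row at a given timestamp.
    static final class Cell {
        final String row;
        final long timestamp;
        final String value;

        Cell(String row, long timestamp, String value) {
            this.row = row;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Order by row ascending, then by timestamp descending (newest first).
    static final Comparator<Cell> ORDER = (a, b) -> {
        int byRow = a.row.compareTo(b.row);
        return byRow != 0 ? byRow : Long.compare(b.timestamp, a.timestamp);
    };

    // Keep only the first (newest) cell seen for each row.
    // Assumption: older versions of a row are not needed.
    static List<Cell> compact(ConcurrentSkipListSet<Cell> active) {
        List<Cell> compacted = new ArrayList<>();
        String lastRow = null;
        for (Cell c : active) {
            if (!c.row.equals(lastRow)) {
                compacted.add(c);
                lastRow = c.row;
            }
        }
        return compacted;
    }
}
```

With a hot row updated three times and a cold row written once, the four cells in the active set shrink to two after compaction. With no duplicates the pass copies everything and frees nothing, which is the overhead case questioned in the comment.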



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
