[ https://issues.apache.org/jira/browse/HBASE-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Purtell resolved HBASE-3242.
-----------------------------------
    Resolution: Not a Problem

No activity for a long time

> HLog Compactions
> ----------------
>
>                 Key: HBASE-3242
>                 URL: https://issues.apache.org/jira/browse/HBASE-3242
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Nicolas Spiegelberg
>
> Currently, our memstore flush algorithm is pretty trivial. We let it grow to
> a flushsize and flush a region, or grow to a certain log count and then flush
> everything below a seqid. In certain situations, we can get big wins from
> being more intelligent with our memstore flush algorithm. I suggest we look
> into algorithms to intelligently handle HLog compactions. By compaction, I
> mean replacing existing HLogs with new HLogs created using the contents of a
> memstore snapshot. Situations where we can get huge wins:
>
> 1. In the incrementColumnValue case, N HLog entries often correspond to a
> single memstore entry. Although we may have large HLog files, our memstore
> could be relatively small.
> 2. If we have a hot region, the majority of the HLog consists of that one
> region and other region edits would be minuscule.
>
> In both cases, we are forced to flush a bunch of very small stores. It's
> really hard for a compaction algorithm to be efficient when it has no
> guarantees of the approximate size of a new StoreFile, so it currently does
> unconditional, inefficient compactions. Additionally, compactions & flushes
> suck because they invalidate cache entries: be it memstore or LRUcache. If
> we can limit flushes to cases where we will have significant HFile output on
> a per-Store basis, we can get improved performance, stability, and reduced
> failover time.
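
To make the two situations in the description concrete, here is a minimal Python sketch of the idea of rewriting a log from a memstore snapshot. It is an illustration only, not HBase code: `LogEntry`, `apply_to_memstore`, `compact_log`, and `region_share` are hypothetical names, the log is modelled as a plain list of increment records, and the memstore as a dict keyed by (region, row, column). It assumes every edit is an increment, so a compacted entry can carry the accumulated value as a single delta.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical stand-in for one HLog edit (not the real HBase type)."""
    seqid: int   # monotonically increasing sequence id
    region: str
    row: str
    column: str
    delta: int   # increment amount, as in incrementColumnValue


def apply_to_memstore(entries):
    """Replay entries into a memstore-like dict: cell -> (value, last seqid)."""
    memstore = {}
    for e in entries:
        key = (e.region, e.row, e.column)
        value, _ = memstore.get(key, (0, 0))
        memstore[key] = (value + e.delta, e.seqid)
    return memstore


def compact_log(entries):
    """Build a replacement log from a memstore snapshot: one entry per cell."""
    snapshot = apply_to_memstore(entries)
    compacted = [
        LogEntry(seqid, region, row, column, value)
        for (region, row, column), (value, seqid) in snapshot.items()
    ]
    return sorted(compacted, key=lambda e: e.seqid)


def region_share(entries):
    """Fraction of log entries that belong to each region."""
    counts = defaultdict(int)
    for e in entries:
        counts[e.region] += 1
    return {region: n / len(entries) for region, n in counts.items()}


# Situation 1: 1000 increments of a single counter in a hot region.
log = [LogEntry(i, "hot", "row1", "cf:count", 1) for i in range(1, 1001)]
# Situation 2: a couple of edits from an otherwise idle region.
log += [
    LogEntry(1001, "cold", "rowA", "cf:count", 5),
    LogEntry(1002, "cold", "rowB", "cf:count", 7),
]

compacted = compact_log(log)
print(len(log), "->", len(compacted))  # 1002 -> 3
```

Replaying `compacted` through `apply_to_memstore` yields the same memstore as replaying the original `log`, which is the property a compacted log would need for recovery. `region_share(log)` shows the hot region owning almost all of the entries, which is the case where a log-count-triggered flush would force the cold region to flush a tiny store.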