[ https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin resolved HBASE-7667.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 0.99.0
                   0.98.0

All the pertinent patches have been committed for some time (before 0.98 was branched).

> Support stripe compaction
> -------------------------
>
>                 Key: HBASE-7667
>                 URL: https://issues.apache.org/jira/browse/HBASE-7667
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>             Fix For: 0.98.0, 0.99.0
>
>         Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction
> perf evaluation.pdf, Stripe compaction perf evaluation.pdf, Stripe
> compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe
> compactions.pdf, Using stripe compactions.pdf, Using stripe compactions.pdf,
> Using stripe compactions.pdf, stripe-cdf.pdf
>
>
> So I was thinking about having many regions as the way to make compactions
> more manageable, and writing the LevelDB doc about how LevelDB range
> overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy,
> Matteo and Ted, and thinking about how to avoid the LevelDB I/O
> multiplication factor.
> And I suggest the following idea, let's call it stripe compactions. It's a
> mix between LevelDB ideas and having many small regions.
> It allows us to have a subset of the benefits of many regions (wrt reads and
> compactions) without many of the drawbacks (managing and current
> memstore/etc. limitation).
> It also doesn't break seqNum-based file sorting for any one key.
> It works like this.
> The region key space is separated into a configurable number of
> fixed-boundary stripes (determined the first time we stripe the data, see
> below).
> All the data from memstores is written to normal files with all keys present
> (not striped), similar to L0 in LevelDB, or current files.
> Compaction policy does 3 types of compactions.
> First is L0 compaction, which takes all L0 files and breaks them down by
> stripe.
> It may be optimized by adding more small files from different
> stripes, but the main logical outcome is that there are no more L0 files and
> all data is striped.
> Second is exactly like the current compaction, but compacting one single
> stripe. In the future, nothing prevents us from applying compaction rules and
> compacting part of the stripe (e.g. similar to current policy with ratios
> and stuff, tiers, whatever), but for the first cut I'd argue let it "major
> compact" the entire stripe. Or just have the ratio and no more complexity.
> Finally, the third addresses the concern of the fixed boundaries causing
> stripes to be very unbalanced.
> It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the
> results out with different boundaries.
> There's a tradeoff here - if we always take 2 adjacent stripes, compactions
> will be smaller but rebalancing will take a ridiculous amount of I/O.
> If we take many stripes we are essentially getting into the
> epic-major-compaction problem again. Some heuristics will have to be in place.
> In general, if we let L0 grow before determining the stripes, we will get
> better boundaries.
> Also, unless the imbalance is really large, we don't really need to rebalance.
> Obviously this scheme (as well as leveled compaction) is not applicable for
> all scenarios, e.g. if timestamp is your key it completely falls apart.
> The end result:
> - many small compactions that can be spread out in time.
> - reads still read from a small number of files (one stripe + L0).
> - region splits become marvelously simple (if we could move files between
> regions, no references would be needed).
> The main advantage over leveled compaction (for HBase) is that the default
> store can still open the files and get correct results - there are no range
> overlap shenanigans.
> It also needs no metadata, although we may record some for convenience.
> It also would appear to not cause as much I/O.
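
To make the quoted description concrete, here is a minimal sketch of two of the three compaction types: the L0 compaction that splits unstriped files by stripe, and the rebalancing compaction that rewrites adjacent stripes with new boundaries. It is an illustration under stated assumptions, not HBase code: the names `stripe_index`, `l0_compact` and `rebalance` are hypothetical, cells are modeled as `(key, seq_num, value)` tuples, and keeping only the highest seqNum per key is a simplification that ignores versions and delete markers.

```python
from bisect import bisect_right


def stripe_index(boundaries, key):
    """Return the stripe that owns `key`.

    `boundaries` is a sorted list of split keys. Stripe i holds keys in
    [boundaries[i-1], boundaries[i]); the first and last stripes are open-ended.
    """
    return bisect_right(boundaries, key)


def l0_compact(l0_files, boundaries):
    """Merge all L0 files and break the result down by stripe.

    Each file is a list of (key, seq_num, value) cells. As a simplification,
    only the cell with the highest seq_num survives for each key.
    """
    newest = {}
    for cells in l0_files:
        for key, seq_num, value in cells:
            if key not in newest or seq_num > newest[key][0]:
                newest[key] = (seq_num, value)

    # One output file per stripe, each sorted by key.
    stripes = [[] for _ in range(len(boundaries) + 1)]
    for key in sorted(newest):
        seq_num, value = newest[key]
        stripes[stripe_index(boundaries, key)].append((key, seq_num, value))
    return stripes


def rebalance(adjacent_stripes, parts):
    """Rewrite 2+ adjacent stripes into at most `parts` stripes of similar size.

    Assumes each input stripe is sorted by key and the stripes are given in
    key order. Returns (new_stripes, new_inner_boundaries).
    """
    cells = [cell for stripe in adjacent_stripes for cell in stripe]
    size = max(1, -(-len(cells) // parts))  # ceiling division, at least 1
    new_stripes = [cells[i:i + size] for i in range(0, len(cells), size)]
    # Each new inner boundary is the first key of the stripe to its right.
    new_boundaries = [stripe[0][0] for stripe in new_stripes[1:]]
    return new_stripes, new_boundaries
```

Because every stripe covers a disjoint key range, a read for one key only has to consult that key's stripe plus whatever L0 files remain, which is the "one stripe + L0" property listed above. Splitting by cell count in `rebalance` is just one possible heuristic; the description leaves the choice of heuristic open.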
-- This message was sent by Atlassian JIRA (v6.1#6144)