Thanks for all the great feedback! I opened a ticket here:
https://issues.apache.org/jira/browse/HBASE-19528

Let's continue the discussion there.

On Fri, Dec 15, 2017 at 11:34 PM, sahil aggarwal <sahil.ag...@gmail.com> wrote:

> Hi,
>
> We wrote something similar. It just triggers major compaction with a given
> parallelism and distributes it across the cluster.
>
> https://github.com/flipkart-incubator/hbase-compactor
>
>
> On Dec 16, 2017 10:01 AM, "Jean-Marc Spaggiari" <jean-m...@spaggiari.org>
> wrote:
>
> Rahul,
>
> I've had something like this in mind for months/years! It's a must-have!
> Thanks for taking on the task! I'd like to register on the JIRA and come
> back very soon with tons of ideas and recommendations. You can count on me
> to test it too!
>
> JMS
>
> 2017-12-15 17:44 GMT-05:00 rahul gidwani <rahul.gidw...@gmail.com>:
>
> > The tool creates a Map of servers to CompactionRequests needing to be
> > performed. You always select the server with the largest queue (*which is
> > not currently compacting*) to compact next.
> >
> > I created a JIRA: HBASE-19528 for this tool.
> >
> > On Fri, Dec 15, 2017 at 2:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > bq. with at most N distinct RegionServers compacting at a given time
> > >
> > > If per-table balancing is not on, the regions for the underlying table
> > > may not be evenly distributed across the cluster.
> > > In that case, how would the tool choose which servers perform compaction?
> > >
> > > I think you can log a JIRA for upstreaming this tool.
> > >
> > > Thanks
> > >
> > > On Fri, Dec 15, 2017 at 2:01 PM, rahul gidwani <chu...@apache.org>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I was wondering if anyone was interested in a manual major compactor
> > > > tool.
> > > >
> > > > The basic overview of how this tool works is:
> > > >
> > > > Parameters:
> > > >
> > > >    - Table
> > > >    - Stores
> > > >    - ClusterConcurrency
> > > >    - Timestamp
> > > >
> > > > So you input a table, the desired concurrency and the list of stores you
> > > > wish to major compact. The tool first checks the filesystem to see which
> > > > stores need compaction based on the timestamp you provide (the default
> > > > is the current time). It takes that list of stores that require
> > > > compaction and executes those requests concurrently, with at most N
> > > > distinct RegionServers compacting at a given time. Each thread waits for
> > > > the compaction to complete before moving to the next queue. If a region
> > > > split, merge or move happens, this tool ensures those regions get major
> > > > compacted as well.
> > > >
> > > > We have started using this tool in production but were wondering if
> > > > there is any interest from you guys in getting this upstream.
> > > >
> > > > This helps us in two ways: we can limit how much I/O bandwidth we are
> > > > using for major compaction cluster-wide, and we are guaranteed that,
> > > > after the tool completes, all requested compactions have completed
> > > > regardless of moves, merges and splits.
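
To make the scheduling policy described in this thread concrete, here is a minimal, HBase-free Java sketch. It is not the HBASE-19528 code. It assumes that a store "needs compaction" when its oldest file predates the cutoff timestamp, and every name in it (`CompactionSchedulerSketch`, `StoreInfo`, `Scheduler`, `Result`, `run`) is hypothetical. It builds a map of servers to pending requests, runs N worker threads, lets at most N distinct servers compact at once, and always hands the next slot to the idle server with the largest queue. The real compaction call is replaced by a short sleep.

```java
import java.util.*;
import java.util.concurrent.*;

public class CompactionSchedulerSketch {

    // Hypothetical stand-in for one store (region + column family) hosted on a RegionServer.
    record StoreInfo(String server, String region, String family, long oldestFileTs) {}

    record Result(List<StoreInfo> compacted, int maxConcurrentServers) {}

    // Assumption: a store needs a major compaction if any of its files predates the cutoff.
    static boolean needsCompaction(StoreInfo s, long cutoffTs) {
        return s.oldestFileTs() < cutoffTs;
    }

    static final class Scheduler {
        private final Map<String, Deque<StoreInfo>> queues = new HashMap<>();
        private final Set<String> busy = new HashSet<>();
        private final int concurrency;
        private int maxBusy = 0;

        Scheduler(List<StoreInfo> stores, long cutoffTs, int concurrency) {
            this.concurrency = concurrency;
            // Map of servers to pending requests, keeping only stores that need compaction.
            for (StoreInfo s : stores) {
                if (needsCompaction(s, cutoffTs)) {
                    queues.computeIfAbsent(s.server(), k -> new ArrayDeque<>()).add(s);
                }
            }
        }

        // Idle server with the largest non-empty queue (ties broken by name), or null.
        // Only called while holding the Scheduler's lock.
        private String pick() {
            String best = null;
            for (Map.Entry<String, Deque<StoreInfo>> e : queues.entrySet()) {
                String srv = e.getKey();
                Deque<StoreInfo> q = e.getValue();
                if (q.isEmpty() || busy.contains(srv)) continue;
                if (best == null) { best = srv; continue; }
                int bestSize = queues.get(best).size();
                if (q.size() > bestSize || (q.size() == bestSize && srv.compareTo(best) < 0)) {
                    best = srv;
                }
            }
            return best;
        }

        // Blocks until a store can be compacted; returns null once every queue is drained.
        synchronized StoreInfo acquire() throws InterruptedException {
            while (true) {
                // Guard: never let more than `concurrency` distinct servers compact at once.
                String srv = busy.size() < concurrency ? pick() : null;
                if (srv != null) {
                    busy.add(srv);
                    maxBusy = Math.max(maxBusy, busy.size());
                    return queues.get(srv).poll();
                }
                if (busy.isEmpty()) return null; // nothing running and nothing left: done
                wait(); // a busy server may still have queued work; wait for a release
            }
        }

        synchronized void release(StoreInfo s) {
            busy.remove(s.server());
            notifyAll();
        }

        synchronized int maxBusy() { return maxBusy; }
    }

    // Each worker compacts one store, then goes back to the scheduler for the next queue.
    static Result run(List<StoreInfo> stores, long cutoffTs, int concurrency) throws Exception {
        Scheduler sched = new Scheduler(stores, cutoffTs, concurrency);
        List<StoreInfo> done = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<?>> workers = new ArrayList<>();
            for (int i = 0; i < concurrency; i++) {
                workers.add(pool.submit(() -> {
                    StoreInfo s;
                    while ((s = sched.acquire()) != null) {
                        try {
                            // Placeholder: a real tool would ask HBase to major-compact this
                            // store and poll until it finishes before releasing the server.
                            Thread.sleep(5);
                            done.add(s);
                        } finally {
                            sched.release(s);
                        }
                    }
                    return null;
                }));
            }
            for (Future<?> f : workers) f.get(); // propagate any worker failure
        } finally {
            pool.shutdown();
        }
        return new Result(new ArrayList<>(done), sched.maxBusy());
    }

    public static void main(String[] args) throws Exception {
        List<StoreInfo> stores = List.of(
            new StoreInfo("rs1", "r1", "cf", 10), new StoreInfo("rs1", "r2", "cf", 20),
            new StoreInfo("rs1", "r3", "cf", 30), new StoreInfo("rs2", "r4", "cf", 40),
            new StoreInfo("rs2", "r5", "cf", 2_000), // newer than the cutoff, so skipped
            new StoreInfo("rs3", "r6", "cf", 50));
        Result r = run(stores, 1_000L, 2);
        System.out.println("compacted=" + r.compacted().size()
            + " maxConcurrentServers=" + r.maxConcurrentServers());
    }
}
```

With a concurrency of 1 this policy simply drains the largest queue first. The "not currently compacting" rule only matters once several workers run: it stops a second worker from piling onto a server that is already compacting, which is what spreads the I/O across servers even when a table's regions are unevenly distributed.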