Rahul, I have had something like this in mind for months, even years! It's a must-have! Thanks for taking on the task! I will register on the JIRA and come back very soon with tons of ideas and recommendations. You can count on me to test it too!
JMS

2017-12-15 17:44 GMT-05:00 rahul gidwani <rahul.gidw...@gmail.com>:
> The tool builds a Map of servers to the CompactionRequests each one still needs to perform. It always selects the server with the largest queue (that is not currently compacting) to compact next.
>
> I created a JIRA for this tool: HBASE-19528.
>
> On Fri, Dec 15, 2017 at 2:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > bq. with at most N distinct RegionServers compacting at a given time
> >
> > If per-table balancing is not on, the regions of the underlying table may not be evenly distributed across the cluster.
> > In that case, how would the tool choose which servers perform compaction?
> >
> > I think you can log a JIRA for upstreaming this tool.
> >
> > Thanks
> >
> > On Fri, Dec 15, 2017 at 2:01 PM, rahul gidwani <chu...@apache.org> wrote:
> >
> > > Hi,
> > >
> > > I was wondering if anyone would be interested in a manual major compactor tool.
> > >
> > > Here is a basic overview of how the tool works.
> > >
> > > Parameters:
> > >
> > > - Table
> > > - Stores
> > > - ClusterConcurrency
> > > - Timestamp
> > >
> > > So you input a table, the desired concurrency and the list of stores you wish to major compact. The tool first checks the filesystem to see which stores need compaction based on the timestamp you provide (the default is the current time). It then executes the compaction requests for those stores concurrently, with at most N distinct RegionServers compacting at any given time, where N is the requested concurrency. Each thread waits for its compaction to complete before moving on to the next queue. If a region split, merge or move happens, the tool ensures those regions get major compacted as well.
> > >
> > > We have started using this tool in production and were wondering if there is any interest from you guys in getting it upstream.
> > >
> > > This helps us in two ways: we can limit how much I/O bandwidth major compaction uses cluster-wide, and we are guaranteed that once the tool finishes, every requested compaction has completed regardless of moves, merges and splits.
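For anyone following the thread, the scheduling described above (a map of servers to queued requests, always serving the idle server with the largest queue, with at most N servers compacting at once) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the tool's actual code: `run_compactions`, `needs_major_compaction` and the blocking `compact(server, request)` callback are hypothetical names, the timestamp rule (a store qualifies if any of its files predates the cutoff) is an assumption, and the split/merge/move re-check is not modelled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def needs_major_compaction(file_mtimes: List[float], timestamp: float) -> bool:
    # Hypothetical rule (assumption): a store qualifies if any of its files predates the cutoff.
    return any(mtime < timestamp for mtime in file_mtimes)


def run_compactions(
    queues: Dict[str, List[str]],
    concurrency: int,
    compact: Callable[[str, str], None],
) -> None:
    # Map of server -> pending requests (copied so the caller's dict is untouched).
    pending = {server: list(reqs) for server, reqs in queues.items() if reqs}
    busy = set()  # servers with a compaction currently in flight
    cond = threading.Condition()

    def worker(server: str, request: str) -> None:
        try:
            compact(server, request)  # hypothetical callback; blocks until the compaction is done
        finally:
            # Exceptions are not re-raised; they stay inside the executor's future.
            with cond:
                busy.discard(server)
                cond.notify_all()

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        with cond:
            while pending or busy:
                idle = [s for s in pending if s not in busy]
                if len(busy) >= concurrency or not idle:
                    cond.wait()  # sleep until some running compaction finishes
                    continue
                # Serve the idle server with the largest queue; ties go to the first one seen.
                server = max(idle, key=lambda s: len(pending[s]))
                request = pending[server].pop(0)
                if not pending[server]:
                    del pending[server]
                busy.add(server)
                pool.submit(worker, server, request)
```

This is also how a largest-queue rule copes with the uneven region distribution Ted asks about: servers holding more of the table's regions start with longer queues, so they are served first and most often, while the concurrency cap still bounds cluster-wide I/O. For example, with a concurrency of 1 and queues rs1=[a, b, c], rs2=[d], rs3=[e, f], the requests run in the order a, b, e, c, d, f. A real tool would also need to report failed compactions, which this sketch silently drops.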