The tool creates a Map of servers to CompactionRequests needing to be
performed.  You always select the server with the largest queue (which is
not currently compacting) to compact next.
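
In code, the selection step is roughly the sketch below.  This is only an
illustration of the idea, not the actual patch; the String keys and the
request type stand in for whatever the tool really uses (e.g. ServerName
and its compaction request class):

  import java.util.Deque;
  import java.util.Map;
  import java.util.Set;

  final class LargestQueueFirst {
    // Sketch: choose the server with the biggest backlog of pending
    // compaction requests, skipping servers that are already compacting.
    static <R> String pickNextServer(Map<String, Deque<R>> queuesByServer,
                                     Set<String> currentlyCompacting) {
      String best = null;
      int bestSize = 0;
      for (Map.Entry<String, Deque<R>> e : queuesByServer.entrySet()) {
        if (currentlyCompacting.contains(e.getKey())) {
          continue;                   // server already busy compacting
        }
        int size = e.getValue().size();
        if (size > bestSize) {
          best = e.getKey();
          bestSize = size;
        }
      }
      return best;                    // null if no eligible server has work queued
    }
  }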

I created a JIRA: HBASE-19528 for this tool.

On Fri, Dec 15, 2017 at 2:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. with at most N distinct RegionServers compacting at a given time
>
> If per table balancing is not on, the regions for the underlying table may
> not be evenly distributed across the cluster.
> In that case, how would the tool decide which servers should perform
> compaction?
>
> I think you can log a JIRA for upstreaming this tool.
>
> Thanks
>
> On Fri, Dec 15, 2017 at 2:01 PM, rahul gidwani <chu...@apache.org> wrote:
>
> > Hi,
> >
> > I was wondering if anyone was interested in a manual major compactor
> > tool.
> >
> > The basic overview of how this tool works is:
> >
> > Parameters:
> >
> >    - Table
> >    - Stores
> >    - ClusterConcurrency
> >    - Timestamp
> >
> > So you input a table, desired concurrency, and the list of stores you
> > wish to major compact.  The tool first checks the filesystem to see
> > which stores need compaction based on the timestamp you provide
> > (default is current time).  It takes that list of stores that require
> > compaction and executes those requests concurrently, with at most N
> > distinct RegionServers compacting at a given time.  Each thread waits
> > for the compaction to complete before moving to the next queue.  If a
> > region split, merge or move happens, this tool ensures those regions
> > get major compacted as well.
> >
> > We have started using this tool in production but were wondering if
> > there is any interest from you guys in getting this upstream.
> >
> > This helps us in two ways: we can limit how much I/O bandwidth we are
> > using for major compaction cluster-wide, and we are guaranteed after
> > the tool completes that all requested compactions have completed
> > regardless of moves, merges and splits.
> >
>
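
For anyone curious, the concurrency model described above (at most
ClusterConcurrency RegionServers compacting at once, one worker per server
queue, each worker waiting for a major compaction to finish before taking
the next request) could be sketched roughly like this (placeholders, not
the actual patch; it also skips the largest-queue-first ordering for
brevity):

  import java.util.Deque;
  import java.util.Map;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.TimeUnit;

  final class CompactionDriver {
    interface Request {
      // issue a major compaction and block until it has completed
      void majorCompactAndWait() throws Exception;
    }

    static void run(Map<String, Deque<Request>> queuesByServer,
                    int clusterConcurrency) throws InterruptedException {
      ExecutorService pool = Executors.newFixedThreadPool(clusterConcurrency);
      for (Map.Entry<String, Deque<Request>> entry : queuesByServer.entrySet()) {
        Deque<Request> queue = entry.getValue();
        pool.submit(() -> {
          Request req;
          while ((req = queue.poll()) != null) {
            try {
              req.majorCompactAndWait();  // one compaction at a time per server
            } catch (Exception e) {
              // the real tool also makes sure regions affected by splits,
              // merges and moves still get major compacted
            }
          }
        });
      }
      pool.shutdown();
      pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
    }
  }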
