Re: Major Compaction Tool

Jean-Marc Spaggiari Fri, 15 Dec 2017 20:31:59 -0800

Rahul,

I have had something like this in mind for months/years! It's a must-have! Thanks for
taking on the task! I will register on the JIRA and come back very soon with
tons of ideas and recommendations. You can count on me to test it too!


JMS

2017-12-15 17:44 GMT-05:00 rahul gidwani <[email protected]>:

> The tool creates a Map of servers to the CompactionRequests that need to be
> performed. It always selects the server with the largest queue (that is
> not currently compacting) to compact next.
>
> I created a JIRA for this tool: HBASE-19528.
>
> On Fri, Dec 15, 2017 at 2:35 PM, Ted Yu <[email protected]> wrote:
>
> > bq. with at most N distinct RegionServers compacting at a given time
> >
> > If per-table balancing is not on, the regions for the underlying table may
> > not be evenly distributed across the cluster.
> > In that case, how would the tool decide which servers perform compaction?
> >
> > I think you can log a JIRA for upstreaming this tool.
> >
> > Thanks
> >
> > On Fri, Dec 15, 2017 at 2:01 PM, rahul gidwani <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I was wondering if anyone was interested in a manual major compactor tool.
> > >
> > > The basic overview of how this tool works is:
> > >
> > > Parameters:
> > >
> > >    -
> > >
> > >    Table
> > >    -
> > >
> > >    Stores
> > >    -
> > >
> > >    ClusterConcurrency
> > >    -
> > >
> > >    Timestamp
> > >
> > >
> > > So you input a table, the desired concurrency and the list of stores you
> > > wish to major compact. The tool first checks the filesystem to see which
> > > stores need compaction based on the timestamp you provide (the default is
> > > the current time). It takes the list of stores that require compaction and
> > > executes those requests concurrently, with at most N distinct RegionServers
> > > compacting at a given time. Each thread waits for its compaction to
> > > complete before moving to the next queue. If a region split, merge or move
> > > happens, this tool ensures those regions get major compacted as well.
> > >
> > > We have started using this tool in production and were wondering if there
> > > is any interest from you guys in getting it upstream.
> > >
> > > This helps us in two ways: we can limit how much I/O bandwidth we use for
> > > major compaction cluster-wide, and we are guaranteed that, once the tool
> > > completes, all requested compactions have completed regardless of moves,
> > > merges and splits.
> > >
> >
>
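For readers following the thread, the store-selection step Rahul describes (check the filesystem and compare store files against a timestamp that defaults to the current time) might look roughly like the Java sketch below. It is not the tool's code. The class `StoreSelection`, its methods, and the rule "a store needs compaction if any of its files is older than the cutoff" are assumptions for illustration; file modification times are passed in as plain `long` values rather than read from HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: picks the stores whose files predate a cutoff timestamp. */
final class StoreSelection {

    private StoreSelection() {}

    /**
     * Returns true if the store should be major compacted.
     * Assumed rule (not confirmed in the thread): any file whose modification
     * time is older than the cutoff means the store has not been fully
     * rewritten by a major compaction since the cutoff.
     */
    static boolean needsCompaction(List<Long> fileModificationTimes, long cutoffMillis) {
        if (fileModificationTimes.isEmpty()) {
            return false; // no files on disk, nothing to compact
        }
        for (long mtime : fileModificationTimes) {
            if (mtime < cutoffMillis) {
                return true;
            }
        }
        return false;
    }

    /** Filters a store -> file-mtimes map down to the stores that need compaction. */
    static List<String> selectStores(Map<String, List<Long>> storeFiles, long cutoffMillis) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, List<Long>> e : storeFiles.entrySet()) {
            if (needsCompaction(e.getValue(), cutoffMillis)) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }

    /** Mirrors the thread's default: the cutoff is the current time. */
    static List<String> selectStores(Map<String, List<Long>> storeFiles) {
        return selectStores(storeFiles, System.currentTimeMillis());
    }
}
```

With a cutoff of `200`, a store holding files with mtimes `100` and `300` is selected, while a store whose only file has mtime `250` is skipped.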

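The scheduling policy described in the thread (a Map of servers to queued requests, at most N servers compacting at once, and a free worker always taking the largest queue whose server is not already compacting) can be sketched in plain Java. This is a hypothetical illustration, not the HBASE-19528 code: `CompactionScheduler`, the use of `String` as the request type, and the `compact` callback (standing in for "trigger a major compaction and block until it finishes") are assumptions. Split, merge and move handling is not modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: largest-queue-first scheduling with at most N busy servers. */
final class CompactionScheduler {
    private final Map<String, Deque<String>> queues = new HashMap<>();
    private final Set<String> busy = new HashSet<>();

    CompactionScheduler(Map<String, List<String>> requestsByServer) {
        requestsByServer.forEach((server, reqs) -> queues.put(server, new ArrayDeque<>(reqs)));
    }

    /** Blocks until some server with pending work is free; returns null when all queues are empty. */
    private synchronized String acquireServer() throws InterruptedException {
        while (true) {
            String best = null;
            int bestSize = 0;
            boolean pending = false;
            for (Map.Entry<String, Deque<String>> e : queues.entrySet()) {
                int size = e.getValue().size();
                if (size > 0) {
                    pending = true;
                }
                if (size > bestSize && !busy.contains(e.getKey())) {
                    best = e.getKey();
                    bestSize = size;
                }
            }
            if (best != null) {
                busy.add(best); // this server is now compacting
                return best;
            }
            if (!pending) {
                return null; // every queue is drained
            }
            wait(); // work remains, but only on servers that are already busy
        }
    }

    private synchronized String takeRequest(String server) {
        return queues.get(server).poll();
    }

    private synchronized void releaseServer(String server) {
        busy.remove(server);
        notifyAll(); // wake workers waiting for a free server
    }

    /** Runs every request, with at most `concurrency` servers compacting at once. */
    void run(int concurrency, Consumer<String> compact) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        for (int i = 0; i < concurrency; i++) {
            pool.submit(() -> {
                try {
                    String server;
                    while ((server = acquireServer()) != null) {
                        try {
                            compact.accept(takeRequest(server)); // blocks until this compaction is done
                        } finally {
                            releaseServer(server); // then re-select the largest free queue
                        }
                    }
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

Using one worker thread per permitted server keeps the bound simple: each thread holds at most one server at a time, so no more than N servers are ever busy. A worker releases its server after each request and re-selects, which matches "waits for the compaction to complete before moving to the next queue". Ted's question about uneven region distribution shows up here as queues of different lengths; the largest-queue-first rule drains the most loaded servers first.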