Re: Major Compaction Tool

rahul gidwani Tue, 19 Dec 2017 09:06:52 -0800

Thanks for all the great feedback!

I opened a ticket here:


https://issues.apache.org/jira/browse/HBASE-19528

Let's continue the discussion there.


On Fri, Dec 15, 2017 at 11:34 PM, sahil aggarwal <sahil.ag...@gmail.com>
wrote:

> Hi,
>
> We wrote something similar. It just triggers major compactions with a given
> parallelism and distributes them across the cluster.
>
> https://github.com/flipkart-incubator/hbase-compactor
>
>
> On Dec 16, 2017 10:01 AM, "Jean-Marc Spaggiari" <jean-m...@spaggiari.org>
> wrote:
>
> Rahul,
>
> I have had something like this in mind for months/years! It's a must-have!
> Thanks for taking on the task! I'll register on the JIRA and come back very
> soon with tons of ideas and recommendations. You can count on me to test it
> too!
>
> JMS
>
> 2017-12-15 17:44 GMT-05:00 rahul gidwani <rahul.gidw...@gmail.com>:
>
> > The tool creates a Map of servers to CompactionRequests needing to be
> > performed.  You always select the server with the largest queue (*which is
> > not currently compacting*) to compact next.
> >
> > I created a JIRA: HBASE-19528 for this tool.
> >
> > On Fri, Dec 15, 2017 at 2:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > bq. with at most N distinct RegionServers compacting at a given time
> > >
> > > If per-table balancing is not on, the regions for the underlying table may
> > > not be evenly distributed across the cluster.
> > > In that case, how would the tool decide which servers should perform
> > > compaction?
> > >
> > > I think you can log a JIRA for upstreaming this tool.
> > >
> > > Thanks
> > >
> > > On Fri, Dec 15, 2017 at 2:01 PM, rahul gidwani <chu...@apache.org>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I was wondering if anyone is interested in a manual major compactor
> > > > tool.
> > > >
> > > > The basic overview of how this tool works is:
> > > >
> > > > Parameters:
> > > >
> > > >    - Table
> > > >    - Stores
> > > >    - ClusterConcurrency
> > > >    - Timestamp
> > > >
> > > >
> > > > So you input a table, the desired concurrency and the list of stores you
> > > > wish to major compact.  The tool first checks the filesystem to see which
> > > > stores need compaction based on the timestamp you provide (default is the
> > > > current time).  It takes the list of stores that require compaction and
> > > > executes those requests concurrently, with at most N distinct
> > > > RegionServers compacting at a given time.  Each thread waits for its
> > > > compaction to complete before moving to the next queue.  If a region
> > > > split, merge or move happens, the tool ensures those regions get major
> > > > compacted as well.
> > > >
> > > > We have started using this tool in production but were wondering if
> > > > there is any interest from you guys in getting this upstream.
> > > >
> > > > This helps us in two ways: we can limit how much I/O bandwidth we use
> > > > for major compaction cluster-wide, and we are guaranteed that once the
> > > > tool completes, all requested compactions have completed regardless of
> > > > moves, merges and splits.
> > > >
> > >
> >
>
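
For anyone skimming the thread, here is a rough sketch of the scheduling idea
described above: keep a map of servers to pending compaction requests, and
always hand the next request to the idle server with the largest queue, with
at most N distinct servers compacting at once. This is not the code attached
to HBASE-19528. It is a single-threaded simulation that makes no HBase calls,
and it assumes the stores have already been filtered by timestamp. The names
`pick_next_server` and `schedule`, the round-based model and the tie-break by
server name are all made up for illustration.

```python
from collections import deque


def pick_next_server(queues, busy):
    """Return the idle server with the most pending requests, or None.

    `queues` maps a server name to a deque of pending requests and `busy`
    is the set of servers that are compacting right now. Ties are broken
    by server name (an assumption, only to keep the result deterministic).
    """
    candidates = [s for s, q in queues.items() if q and s not in busy]
    if not candidates:
        return None
    return min(candidates, key=lambda s: (-len(queues[s]), s))


def schedule(pending, concurrency):
    """Simulate the compaction order in rounds.

    In each round at most `concurrency` distinct servers run one request
    each. A round ending stands in for "wait for the compaction to
    complete" (a simplification: real compactions finish at different
    times). Returns a list of rounds, each a list of (server, request).
    """
    queues = {server: deque(reqs) for server, reqs in pending.items()}
    rounds = []
    while any(queues.values()):
        busy = set()
        batch = []
        while len(busy) < concurrency:
            server = pick_next_server(queues, busy)
            if server is None:
                break  # every server with pending work is already busy
            busy.add(server)
            batch.append((server, queues[server].popleft()))
        rounds.append(batch)
    return rounds
```

With three servers holding 3, 1 and 2 requests and a concurrency of 2, the
server with 3 requests is picked first in every round until its queue drains,
and no server ever appears twice in the same round. Handling regions that
split, merge or move while the tool runs would mean re-reading region
locations and adding to the queues, which this sketch leaves out.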
