Please also see https://issues.apache.org/jira/browse/HBASE-11368. It looks to me like the multi-CF bulk load could use a finer-granularity, region-level lock, so that compaction would not block bulk load and subsequent reads/scans.
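To make the locking idea concrete, here is a minimal, hypothetical sketch in plain Python threading. It is not actual HBase code and the names are made up; it only illustrates the shape of the suggestion: one lock per store/column family instead of a single region-wide lock, so a compaction of one CF need not block a bulk load into another.

```python
import threading

class Region:
    """Illustrative region with per-store locks (hypothetical, not HBase)."""

    def __init__(self, families):
        # One lock per column family / store, rather than one region-wide lock.
        self.store_locks = {cf: threading.Lock() for cf in families}

    def compact(self, cf, work):
        # Only the store being compacted is blocked.
        with self.store_locks[cf]:
            work()

    def bulk_load(self, cf, hfile):
        # Can proceed while a *different* CF is being compacted.
        with self.store_locks[cf]:
            return (cf, hfile)

r = Region(["cf1", "cf2"])
# A compaction of cf1 would no longer serialize with a bulk load into cf2.
print(r.bulk_load("cf2", "part-0000"))
```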
On Mon, Oct 6, 2014 at 6:01 AM, lars hofhansl <la...@apache.org> wrote:

> >> - rack IO throttle. We should add that to accommodate for
> >> oversubscription at the ToR level.
>
> > Can you decipher that, Lars?
>
> ToR is "Top of Rack" switch. Oversubscription means that a ToR switch
> usually does not have enough bandwidth to serve traffic in and out of the
> rack at full speed. For example, if you had 40 machines in a rack with 1ge
> links each, and the ToR switch has a 10ge uplink, you'd say the ToR switch
> is 4 to 1 oversubscribed.
>
> Was just trying to say: "Yeah, we need that" :)
>
> >> - cluster wide compaction storms. Yeah, that's bad. Can be alleviated
> >> by spreading timed major compactions out. (In our clusters we set the
> >> interval to 1 week and the jitter to 1/2 week.)
>
> > I think we have some JIRAs for that?
>
> That you can already do:
> hbase.hregion.majorcompaction defaults to a day in 0.94 (86400000 ms),
> which means *all* data is rewritten *every single* day. We set it to
> 604800000 ms (1 week).
> hbase.hregion.majorcompaction.jitter defaults to 20% (0.2). We set this
> to 0.5 (so we spread the timed major compactions out over 1 week, to
> avoid storms).
>
> Just checked 0.98. Turns out these are exactly the defaults there (1 week
> +- 1/2 week). Cool, forgot about that (see HBASE-8450). So in 0.98+ the
> defaults should be pretty good on this end.
>
> Compaction storms can still happen during normal load when writes are
> spread equally over very many regions. In that case it's not unlikely
> that many regions decide to compact at the same time.
>
> -- Lars
>
> ________________________________
> From: Vladimir Rodionov <vladrodio...@gmail.com>
> To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <la...@apache.org>
> Sent: Sunday, October 5, 2014 2:11 PM
> Subject: Re: Compactions nice to have features
>
> >> A few comments:
> >> - bulkload - you mean not by loading pre-created HFiles?
> >> If you do that there would be no compaction during the import as the
> >> files are simply moved into place.
>
> Bulk load is not always convenient or feasible; we have batched-mutation
> support in the API, but compaction is still a serious issue. Cassandra
> allows disabling/enabling compactions (I think it's cluster-wide, not
> sure though), so why shouldn't we have that?
>
> >> - local compaction IO limit. Limiting the number of compaction threads
> >> (1 by default) is not good enough ... ? You can cause too much harm
> >> even with a single thread compacting per region server?
>
> This one I am not sure about myself. The idea is to make compaction
> nicer to other I/O. For example, read operations and memstore flushes
> should have higher priority than compaction I/O. One way is to limit
> (throttle) compaction bandwidth locally; there are some other approaches
> as well.
>
> >> - rack IO throttle. We should add that to accommodate for
> >> oversubscription at the ToR level.
>
> Can you decipher that, Lars?
>
> >> - cluster wide compaction storms. Yeah, that's bad. Can be alleviated
> >> by spreading timed major compactions out. (In our clusters we set the
> >> interval to 1 week and the jitter to 1/2 week.)
>
> I think we have some JIRAs for that?
>
> >> - what do you think about off-peak compaction? We have that in part,
> >> as the compaction ratio can be set differently for off-peak hours.
>
> Off-peak compactions can have higher limits or even different policies.
>
> >> Generally I like the idea of being able to pace compaction better.
> >> Do you want to file jiras for these?
>
> Yeah, will do that.
>
>
> On Sat, Oct 4, 2014 at 10:31 AM, lars hofhansl <la...@apache.org> wrote:
>
> > Hi Vladimir,
> >
> > these are very interesting.
> > A few comments:
> > - bulkload - you mean not by loading pre-created HFiles? If you do that
> > there would be no compaction during the import as the files are simply
> > moved into place.
> > - local compaction IO limit.
> > Limiting the number of compaction threads (1 by default) is not good
> > enough ... ? You can cause too much harm even with a single thread
> > compacting per region server?
> > - rack IO throttle. We should add that to accommodate for
> > oversubscription at the ToR level.
> > - cluster wide compaction storms. Yeah, that's bad. Can be alleviated
> > by spreading timed major compactions out. (In our clusters we set the
> > interval to 1 week and the jitter to 1/2 week.)
> > - what do you think about off-peak compaction? We have that in part,
> > as the compaction ratio can be set differently for off-peak hours.
> >
> > Generally I like the idea of being able to pace compaction better.
> > Do you want to file jiras for these? Doesn't mean you have to do all
> > the work :)
> >
> > -- Lars
> >
> > ________________________________
> > From: Vladimir Rodionov <vladrodio...@gmail.com>
> > To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> > Sent: Friday, October 3, 2014 10:34 PM
> > Subject: Compactions nice to have features
> >
> > I am thinking about the following:
> >
> > 1. Compaction On/Off per CF, Table, cluster. Both minor and major.
> >
> > Good during bulk load:
> >
> > - Disable compaction for table 'T'
> > - Load 1B rows
> > - Enable compaction for table 'T'
> >
> > 2. Local Compaction I/O throttle
> >
> > Set I/O limit per RS.
> >
> > 3. Rack Compaction I/O throttle
> >
> > Set I/O limit per server rack. Good to control uplink bandwidth.
> >
> > 4. Cluster Compaction I/O throttle. Good to avoid compaction storms.
> >
> > -Vladimir Rodionov
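For reference, the two back-of-the-envelope numbers Lars works out in the quoted thread (the 4-to-1 ToR oversubscription, and the 1 week +- 1/2 week major-compaction window) can be checked with a couple of lines of Python. The function names here are ours, purely for illustration:

```python
def oversubscription_ratio(machines, link_gbps, uplink_gbps):
    """Aggregate in-rack bandwidth divided by the ToR uplink bandwidth."""
    return machines * link_gbps / uplink_gbps

def major_compaction_window_ms(interval_ms, jitter):
    """Earliest and latest time a timed major compaction may fire, given
    hbase.hregion.majorcompaction and hbase.hregion.majorcompaction.jitter."""
    return (interval_ms * (1 - jitter), interval_ms * (1 + jitter))

# 40 machines with 1ge links behind a 10ge uplink: 4-to-1 oversubscribed.
print(oversubscription_ratio(40, 1, 10))          # 4.0

# 0.98 defaults: 1 week (604800000 ms) with 0.5 jitter, i.e. each region's
# timed major compaction fires somewhere between 3.5 and 10.5 days.
print(major_compaction_window_ms(604800000, 0.5)) # (302400000.0, 907200000.0)
```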
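The "local compaction IO limit" idea discussed above, throttling compaction bandwidth rather than only capping the thread count, could look roughly like this sketch. It is illustrative Python, not HBase code; all names are invented, and a real controller would need to handle dynamic rate changes and multiple concurrent compactions:

```python
import time

class CompactionThrottle:
    """Sketch of a bandwidth limiter: cap the bytes per second a compaction
    may write so reads and memstore flushes keep some I/O headroom."""

    def __init__(self, bytes_per_sec):
        self.rate = bytes_per_sec
        self.written = 0
        self.start = time.monotonic()

    def throttle(self, chunk_bytes):
        """Call after writing each chunk; sleeps just long enough that the
        cumulative write rate stays at or below the configured limit."""
        self.written += chunk_bytes
        expected = self.written / self.rate        # seconds the writes should take
        elapsed = time.monotonic() - self.start
        if expected > elapsed:
            time.sleep(expected - elapsed)

# A compaction loop would call throttle(len(block)) after each block write;
# at 100 bytes/s, writing 50 bytes takes at least half a second overall.
t = CompactionThrottle(100)
t.throttle(50)
```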
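Items 3 and 4 in Vladimir's list (rack and cluster I/O throttles) boil down to dividing a global compaction bandwidth budget among region servers while respecting per-rack uplink limits. A hedged sketch of one possible policy, an even split, with all names and numbers purely illustrative:

```python
def per_server_compaction_budget(cluster_limit, servers_per_rack, racks,
                                 rack_limit=None):
    """Evenly split a cluster-wide compaction I/O cap across region servers,
    additionally honoring an optional per-rack (ToR uplink) cap. Units are
    arbitrary (e.g. MB/s); an even split is only one possible policy."""
    servers = servers_per_rack * racks
    share = cluster_limit / servers
    if rack_limit is not None:
        share = min(share, rack_limit / servers_per_rack)
    return share

# 2000 MB/s cluster cap over 4 racks of 40 servers, each rack behind a
# 10 Gb/s (~1250 MB/s) uplink: the cluster cap is the binding constraint.
print(per_server_compaction_budget(2000, 40, 4, rack_limit=1250))   # 12.5

# With a looser cluster cap, the rack uplink becomes the constraint.
print(per_server_compaction_budget(10000, 40, 4, rack_limit=1250))  # 31.25
```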