>
> It's supposed to be controlling how big the region is?
>
Precisely. It may not make a big difference for compaction itself, but
larger-than-expected regions might have further implications for overall
RS resource usage. Given the feedback provided here, I guess we can
proceed with the current proposal from HBASE-24530 all the way to the
maintenance branches (it doesn't change IncreasingToUpperBoundRegionSplitPolicy
behaviour, but adds a new policy that actually respects the region max size
for the whole region). We can then fix IncreasingToUpperBoundRegionSplitPolicy
in the minor version branches, as suggested by Busbey.
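To make the difference between the per-store and whole-region checks concrete, here is a minimal, simplified sketch. The class and method names (SplitCheckSketch, anyStoreExceeds, regionExceeds) are hypothetical and are not the HBase API. The real policies also consider other conditions, and the threshold that IncreasingToUpperBoundRegionSplitPolicy derives dynamically is abstracted here into a single sizeToCheck value.

```java
import java.util.List;

/**
 * Hypothetical sketch contrasting the two split checks discussed in this
 * thread. NOT the HBase API: names are illustrative, and the split threshold
 * (computed dynamically by the real policy) is passed in as sizeToCheck.
 */
public class SplitCheckSketch {

  /** Per-store check: split if ANY single store is larger than the threshold. */
  public static boolean anyStoreExceeds(List<Long> storeSizes, long sizeToCheck) {
    for (long size : storeSizes) {
      if (size > sizeToCheck) {
        return true;
      }
    }
    return false;
  }

  /** Whole-region check: split if the SUM of all store sizes exceeds the threshold. */
  public static boolean regionExceeds(List<Long> storeSizes, long sizeToCheck) {
    long total = 0;
    for (long size : storeSizes) {
      total += size;
    }
    return total > sizeToCheck;
  }

  public static void main(String[] args) {
    long gb = 1024L * 1024 * 1024;
    long limit = 10 * gb; // example max size limit
    // One region with three column families, each store holding 6 GB (18 GB total).
    List<Long> stores = List.of(6 * gb, 6 * gb, 6 * gb);
    System.out.println("per-store check splits: " + anyStoreExceeds(stores, limit));
    System.out.println("whole-region check splits: " + regionExceeds(stores, limit));
  }
}
```

With three families at 6 GB each and a 10 GB limit, the per-store check prints false even though the region holds 18 GB in total, while the whole-region check prints true.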

On Wed, Jun 24, 2020 at 18:00, Andrew Purtell <apurt...@apache.org>
wrote:

> It's supposed to be controlling how big the region is?
>
> On Wed, Jun 24, 2020 at 8:42 AM 张铎(Duo Zhang) <palomino...@gmail.com>
> wrote:
>
> > I think one of the goals of limiting the store file size is for
> > compaction. As long as we just do compactions per family, what is the
> > actual problem if the whole region is too big?
> >
> > On Wed, Jun 24, 2020 at 10:56 PM Wellington Chevreuil <wellington.chevre...@gmail.com>
> > wrote:
> >
> > > The expected behaviour for the property is well documented, so renaming
> > > and deprecation would be a separate task. HBASE-24530 should be concerned
> > > with making IncreasingToUpperBoundRegionSplitPolicy respect what the
> > > hbase.hregion.max.filesize and MAX_FILESIZE table-level descriptor
> > > documentation mandates, as well as being consistent with other split
> > > policies' behaviour in relation to these properties.
> > >
> > > On Wed, Jun 24, 2020 at 08:00, Anoop John <anoop.hb...@gmail.com>
> > > wrote:
> > >
> > > > If we are also going to change (correct) hbase.hregion.max.filesize to
> > > > hbase.hregion.max.size (via a proper deprecation cycle) along with this
> > > > change, I am good.
> > > >
> > > > Anoop
> > > >
> > > > On Wed, Jun 24, 2020 at 1:29 AM Sean Busbey <bus...@apache.org>
> > > > wrote:
> > > >
> > > > > Let's fix via approach #3. Get it done for the next minor versions, and
> > > > > then if folks aren't sure about the principle of least surprise, we can
> > > > > talk about whether it goes into maintenance releases.
> > > > >
> > > > > On Tue, Jun 23, 2020, 13:07 Andrew Purtell <apurt...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > > Current IncreasingToUpperBoundRegionSplitPolicy implementation is
> > > > > > > violating those configs.
> > > > > >
> > > > > > Thank you for pointing this out. I feel even more strongly now that
> > > > > > this is a bug.
> > > > > > I vote for #3.
> > > > > >
> > > > > > On Tue, Jun 23, 2020 at 2:42 AM Wellington Chevreuil <
> > > > > > wellington.chevre...@gmail.com> wrote:
> > > > > >
> > > > > > > >
> > > > > > > > The config name was/is hbase.hregion.max.*filesize* and never
> > > > > > > > *hbase.hregion.max.size*.
> > > > > > > >
> > > > > > >
> > > > > > > The description for hbase.hregion.max.filesize clearly states that
> > > > > > > it's the sum of all hfiles in the region that should not exceed this
> > > > > > > property value. And we don't always use *hbase.hregion.max.filesize*
> > > > > > > to determine the limit; sometimes it comes from a MAX_FILESIZE
> > > > > > > table-level descriptor, whose description in the TableDescriptorBuilder
> > > > > > > javadoc reads as below:
> > > > > > >
> > > > > > >   /**
> > > > > > >    * Returns the maximum size upto which a region can grow to
> > after
> > > > > > which a
> > > > > > >    * region split is triggered. The region size is represented
> by
> > > the
> > > > > > size
> > > > > > > of
> > > > > > >    * the biggest store file in that region.
> > > > > > >    *
> > > > > > >    * @return max hregion size for table, -1 if not set.
> > > > > > >    */
> > > > > > >
> > > > > > > Current IncreasingToUpperBoundRegionSplitPolicy implementation is
> > > > > > > violating those configs.
> > > > > > >
> > > > > > > Do we have a consensus on applying #3 to all active branches? If so, I
> > > > > > > would instruct HBASE-24530 to proceed accordingly.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Jun 21, 2020 at 19:09, Andrew Purtell <
> > > > > > > andrew.purt...@gmail.com> wrote:
> > > > > > >
> > > > > > > > ‘Filesize’ and ‘size’ are ambiguous. They are open to interpretation,
> > > > > > > > and I don’t see one as clearer than the other, other than to imply
> > > > > > > > something about file-level measures being the determining factor. It
> > > > > > > > doesn’t convey more semantics beyond that, i.e. whether one file trips
> > > > > > > > the limit or the combined sizes of all files trip the limit. We can fix
> > > > > > > > that with clarifying documentation. While doing so, we also have an
> > > > > > > > opportunity to fix something if our consensus is that the current
> > > > > > > > policy is not the usual user expectation.
> > > > > > > >
> > > > > > > > So how suboptimal is it? Does a compatibility concern make sense if we
> > > > > > > > think this is just broken? Perhaps we can address all concerns by
> > > > > > > > making the change in the next minor releases and then doing those
> > > > > > > > minor releases soon.
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Jun 20, 2020, at 11:06 PM, Anoop John <anoop.hb...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I have a concern if we do #3 for all minor versions. That will be a
> > > > > > > > > > major split behaviour change and can greatly affect tables with many
> > > > > > > > > > CFs. If one adjusted the pre-splits so as to avoid further region
> > > > > > > > > > splits, that calculation might go wrong once they migrate to new minor
> > > > > > > > > > versions with this change, right?
> > > > > > > > > > The config name was/is hbase.hregion.max.*filesize* and never
> > > > > > > > > > *hbase.hregion.max.size*. We have HFiles at the CF level, so a max
> > > > > > > > > > filesize is applicable at the CF level. So even this config name will
> > > > > > > > > > create confusion once we change the calculation to consider size at
> > > > > > > > > > the region level (sum of sizes across CFs).
> > > > > > > > > Anoop
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >> On Fri, Jun 19, 2020 at 11:44 PM Viraj Jasani <vjas...@apache.org>
> > > > > > > > > >> wrote:
> > > > > > > > >>
> > > > > > > > > >> Given that SteppingSplitPolicy is the default region split policy,
> > > > > > > > > >> removing IncreasingToUpperBoundRegionSplitPolicy is going to make
> > > > > > > > > >> things more complex for the master branch if we follow #2.
> > > > > > > > > >> Hence, I believe we had better go with #3 for all.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > > >>> On 2020/06/19 17:52:27, Viraj Jasani <vjas...@apache.org>
> > > > > > > > > >>> wrote:
> > > > > > > > > >>> Can we do a mix of #2 and #3, i.e. remove
> > > > > > > > > >>> IncreasingToUpperBoundRegionSplitPolicy from master, and follow #3
> > > > > > > > > >>> for branch-2 and all active release branches? If that breaks any
> > > > > > > > > >>> compatibility rules, then we can go with #3 for all.
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > > >>> On 2020/06/19 17:33:14, Andrew Purtell <apurt...@apache.org>
> > > > > > > > > >>> wrote:
> > > > > > > > > >>>> I vote for #3, and it should be applied to all active code lines.
> > > > > > > > >>>>
> > > > > > > > >>>>
> > > > > > > > >>>> On Fri, Jun 19, 2020 at 3:35 AM Wellington Chevreuil <
> > > > > > > > >>>> wellington.chevre...@gmail.com> wrote:
> > > > > > > > >>>>
> > > > > > > > > >>>>> While going through the changes proposed on HBASE-24530, we
> > > > > > > > > >>>>> observed that IncreasingToUpperBoundRegionSplitPolicy compares
> > > > > > > > > >>>>> hbase.hregion.max.filesize against individual stores within a
> > > > > > > > > >>>>> region when deciding whether or not to split the region. For tables
> > > > > > > > > >>>>> with multiple families, this can lead to regions much larger than
> > > > > > > > > >>>>> what's defined by hbase.hregion.max.filesize.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> The current proposal on HBASE-24530 is to add an extra policy that
> > > > > > > > > >>>>> actually compares the overall region size (the combined size of all
> > > > > > > > > >>>>> the region's stores) against hbase.hregion.max.filesize, but I wonder
> > > > > > > > > >>>>> if it really makes sense to keep a policy with the current
> > > > > > > > > >>>>> IncreasingToUpperBoundRegionSplitPolicy behaviour. I would like to
> > > > > > > > > >>>>> hear folks' opinions: should we take any of the actions below?
> > > > > > > > > >>>>> 1) Leave IncreasingToUpperBoundRegionSplitPolicy as it is and just
> > > > > > > > > >>>>> add the new policy proposed on HBASE-24530;
> > > > > > > > > >>>>> 2) Deprecate IncreasingToUpperBoundRegionSplitPolicy and remove it
> > > > > > > > > >>>>> from the master branch;
> > > > > > > > > >>>>> 3) Change IncreasingToUpperBoundRegionSplitPolicy to actually
> > > > > > > > > >>>>> implement the logic of the new policy proposed on HBASE-24530.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> My view is that the current IncreasingToUpperBoundRegionSplitPolicy
> > > > > > > > > >>>>> behaviour is a bug, and I vote for #3.
> > > > > > > > >>>>>
> > > > > > > > >>>>
> > > > > > > > >>>>
> > > > > > > > >>>> --
> > > > > > > > >>>> Best regards,
> > > > > > > > >>>> Andrew
> > > > > > > > >>>>
> > > > > > > > >>>> Words like orphans lost among the crosstalk, meaning
> torn
> > > from
> > > > > > > truth's
> > > > > > > > >>>> decrepit hands
> > > > > > > > >>>>   - A23, Crosstalk
> > > > > > > > >>>>
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Andrew
> > > > > >
> > > > > > Words like orphans lost among the crosstalk, meaning torn from
> > > truth's
> > > > > > decrepit hands
> > > > > >    - A23, Crosstalk
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>    - A23, Crosstalk
>
