> Current IncreasingToUpperBoundRegionSplitPolicy implementation is
violating those configs.

Thank you for pointing this out. I now feel even more strongly that this is a
bug.
I vote for #3.
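
To make the difference concrete, here is a minimal sketch in plain Java. It is
not the actual HBase code: the class and method names are hypothetical, and it
ignores the region-count-based size stepping that the real policy applies. It
only contrasts checking each store against the limit with checking the sum of
all stores against the limit.

```java
import java.util.List;

// Hypothetical illustration only; none of these names exist in HBase.
public class SplitCheckSketch {

    // Per-store check: split only if some single store exceeds the limit.
    // This mirrors the behaviour described in the thread as the current one.
    static boolean shouldSplitPerStore(List<Long> storeSizes, long maxFileSize) {
        for (long size : storeSizes) {
            if (size > maxFileSize) {
                return true;
            }
        }
        return false;
    }

    // Region-total check: split if the combined size of all stores exceeds
    // the limit. This mirrors what option #3 would switch to.
    static boolean shouldSplitRegionTotal(List<Long> storeSizes, long maxFileSize) {
        long total = 0L;
        for (long size : storeSizes) {
            total += size;
        }
        return total > maxFileSize;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        long maxFileSize = 10 * gb;                          // assumed limit
        List<Long> stores = List.of(8 * gb, 8 * gb, 8 * gb); // three CFs

        System.out.println("per-store check splits: "
                + shouldSplitPerStore(stores, maxFileSize));
        System.out.println("region-total check splits: "
                + shouldSplitRegionTotal(stores, maxFileSize));
    }
}
```

With three 8 GB stores and a 10 GB limit, the per-store check does not fire even
though the region holds 24 GB, while the region-total check does.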

On Tue, Jun 23, 2020 at 2:42 AM Wellington Chevreuil <
wellington.chevre...@gmail.com> wrote:

> >
> > The config name was/is hbase.hregion.max.*filesize* and never
> > *hbase.hregion.max.size*.
> >
>
> The description of hbase.hregion.max.filesize very clearly states that the
> sum of all hfiles in the region should not exceed this property value. And
> we do not always use *hbase.hregion.max.filesize* to determine the limit; a
> MAX_FILESIZE table-level descriptor can be used instead, and its description
> reads as below in the TableDescriptorBuilder javadoc:
>
>   /**
>    * Returns the maximum size upto which a region can grow to after which a
>    * region split is triggered. The region size is represented by the size of
>    * the biggest store file in that region.
>    *
>    * @return max hregion size for table, -1 if not set.
>    */
>
> Current IncreasingToUpperBoundRegionSplitPolicy implementation is violating
> those configs.
>
> Do we have a consensus on applying #3 for all active branches? If so, I
> would instruct HBASE-24530 to proceed as such.
>
>
>
> On Sun, Jun 21, 2020 at 19:09, Andrew Purtell <
> andrew.purt...@gmail.com> wrote:
>
> > ‘Filesize’ and ‘size’ are ambiguous. They are open to interpretation and I
> > don’t see one as more clear than the other, other than to imply something
> > about file level measures being the determining factor. It doesn’t convey
> > more semantics beyond that, i.e. whether one file trips the limit or the
> > combined sizes of all files trip the limit. We can fix that with clarifying
> > documentation. While doing so we also have an opportunity to fix something
> > if our consensus is the current policy is not the usual user expectation.
> >
> > So how suboptimal is it? Does a compatibility concern make sense if we
> > think this is just broken? Perhaps we can address all concerns by making
> > the change in next minor releases and then do those minor releases soon.
> >
> >
> > > > On Jun 20, 2020, at 11:06 PM, Anoop John <anoop.hb...@gmail.com> wrote:
> > >
> > > > I have a concern if we do #3 for all minor versions. That will be a major
> > > > split behaviour change and can affect so much for tables with many CFs. If
> > > > one adjusted the pre splits so as to avoid further region splits, that calc
> > > > might go wrong once they migrate to new minor versions with this change,
> > > > right?
> > > > The config name was/is hbase.hregion.max.*filesize* and never
> > > > *hbase.hregion.max.size*. We will have HFiles at CF level and so a max
> > > > filesize is applicable at CF level. So even this config name will create
> > > > confusion once we change the calc to consider size at region level (sum of
> > > > sizes at CFs).
> > >
> > > Anoop
> > >
> > >
> > > >> On Fri, Jun 19, 2020 at 11:44 PM Viraj Jasani <vjas...@apache.org> wrote:
> > >>
> > > >> Given that SteppingSplitPolicy is the default region split policy, removal
> > > >> of IncreasingToUpperBoundRegionSplitPolicy is going to make things more
> > > >> complex for master branch if we follow #2.
> > > >> Hence, I believe we had better go with #3 for all.
> > >>
> > >>
> > >>> On 2020/06/19 17:52:27, Viraj Jasani <vjas...@apache.org> wrote:
> > > >>> Can we do a mix of #2 and #3, i.e. remove
> > > >>> IncreasingToUpperBoundRegionSplitPolicy from master, and follow #3 for
> > > >>> branch-2 and all active release branches? If it breaks any compatibility
> > > >>> rules, then we can go with #3 for all.
> > >>>
> > >>>
> > >>> On 2020/06/19 17:33:14, Andrew Purtell <apurt...@apache.org> wrote:
> > >>>> I vote for #3, and it should be applied to all active code lines.
> > >>>>
> > >>>>
> > >>>> On Fri, Jun 19, 2020 at 3:35 AM Wellington Chevreuil <
> > >>>> wellington.chevre...@gmail.com> wrote:
> > >>>>
> > > >>>>> While going through the changes proposed on HBASE-24530, we
> > > >>>>> observed IncreasingToUpperBoundRegionSplitPolicy
> > > >>>>> compares hbase.hregion.max.filesize against individual stores within a
> > > >>>>> region when deciding whether to split a region or not. For tables having
> > > >>>>> multiple families, this can lead to regions much larger than what's
> > > >>>>> defined by hbase.hregion.max.filesize.
> > >>>>>
> > > >>>>> Current proposal on HBASE-24530 is to add an extra policy that actually
> > > >>>>> compares the overall region size (combining all region stores' sizes)
> > > >>>>> against hbase.hregion.max.filesize, but I wonder if it really makes sense
> > > >>>>> to keep a policy with current IncreasingToUpperBoundRegionSplitPolicy
> > > >>>>> behaviour. Would like to hear folks' opinions on whether we should take
> > > >>>>> any of the below actions:
> > > >>>>> 1) Leave IncreasingToUpperBoundRegionSplitPolicy as it is and just add the
> > > >>>>> new policy proposed on HBASE-24530;
> > > >>>>> 2) Make IncreasingToUpperBoundRegionSplitPolicy deprecated and remove it
> > > >>>>> from master branch;
> > > >>>>> 3) Change IncreasingToUpperBoundRegionSplitPolicy to actually implement the
> > > >>>>> logic of the new policy proposed on HBASE-24530;
> > >>>>>
> > >>>>> My view is that the current IncreasingToUpperBoundRegionSplitPolicy
> > >>>>> behaviour is a bug, and I vote for #3.
> > >>>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Best regards,
> > >>>> Andrew
> > >>>>
> > > >>>> Words like orphans lost among the crosstalk, meaning torn from truth's
> > > >>>> decrepit hands
> > >>>>   - A23, Crosstalk
> > >>>>
> > >>>
> > >>
> >
>


-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk
