> Current IncreasingToUpperBoundRegionSplitPolicy implementation is violating those configs.
Thank you for pointing this out. I feel even more strongly now that this is a
bug. I vote for #3.

On Tue, Jun 23, 2020 at 2:42 AM Wellington Chevreuil <
wellington.chevre...@gmail.com> wrote:

> >
> > The config name was/is hbase.hregion.max.*filesize* and never
> > *hbase.hregion.max.size*.
> >
>
> The description for hbase.hregion.max.filesize clearly states that it's
> the sum of all hfiles in the region that should not exceed this property
> value. And we do not always use *hbase.hregion.max.filesize* to determine
> the limit, but a MAX_FILESIZE table level descriptor whose description in
> the TableDescriptorBuilder javadoc reads as below:
>
> /**
>  * Returns the maximum size upto which a region can grow to after which a
>  * region split is triggered. The region size is represented by the size of
>  * the biggest store file in that region.
>  *
>  * @return max hregion size for table, -1 if not set.
>  */
>
> Current IncreasingToUpperBoundRegionSplitPolicy implementation is violating
> those configs.
>
> Do we have a consensus on applying #3 for all active branches? If so, I
> would instruct HBASE-24530 to proceed as such.
>
>
> On Sun, Jun 21, 2020 at 19:09, Andrew Purtell <andrew.purt...@gmail.com>
> wrote:
>
> > ‘Filesize’ and ‘size’ are ambiguous. They are open to interpretation and I
> > don’t see one as more clear than the other, other than to imply something
> > about file level measures being the determining factor. It doesn’t convey
> > more semantics beyond that, i.e. whether one file trips the limit or the
> > combined sizes of all files trip the limit. We can fix that with
> > clarifying documentation. While doing so we also have an opportunity to
> > fix something if our consensus is that the current policy is not the usual
> > user expectation.
> >
> > So how suboptimal is it? Does a compatibility concern make sense if we
> > think this is just broken? Perhaps we can address all concerns by making
> > the change in the next minor releases and then doing those minor releases
> > soon.
> >
> >
> > > On Jun 20, 2020, at 11:06 PM, Anoop John <anoop.hb...@gmail.com> wrote:
> > >
> > > I have a concern if we do #3 for all minor versions. That will be a major
> > > split behaviour change and can significantly affect tables with many
> > > CFs. If one adjusted the pre-splits so as to avoid further region splits,
> > > that calculation might go wrong once they migrate to a new minor version
> > > with this change, right?
> > > The config name was/is hbase.hregion.max.*filesize* and never
> > > *hbase.hregion.max.size*. We will have HFiles at CF level and so a max
> > > filesize is applicable at CF level. So even this config name will create
> > > confusion once we change the calc to consider size at region level (sum
> > > of sizes at CFs).
> > >
> > > Anoop
> > >
> > >
> > >> On Fri, Jun 19, 2020 at 11:44 PM Viraj Jasani <vjas...@apache.org> wrote:
> > >>
> > >> Given that SteppingSplitPolicy is the default region split policy,
> > >> removal of IncreasingToUpperBoundRegionSplitPolicy is going to make
> > >> things more complex for master branch if we follow #2.
> > >> Hence, I believe we had better go with #3 for all.
> > >>
> > >>
> > >>> On 2020/06/19 17:52:27, Viraj Jasani <vjas...@apache.org> wrote:
> > >>> Can we do a mix of #2 and #3, i.e. remove
> > >>> IncreasingToUpperBoundRegionSplitPolicy from master, and follow #3 for
> > >>> branch-2 and all active release branches? If it breaks any
> > >>> compatibility rules, then we can go with #3 for all.
> > >>>
> > >>>
> > >>> On 2020/06/19 17:33:14, Andrew Purtell <apurt...@apache.org> wrote:
> > >>>> I vote for #3, and it should be applied to all active code lines.
> > >>>>
> > >>>>
> > >>>> On Fri, Jun 19, 2020 at 3:35 AM Wellington Chevreuil <
> > >>>> wellington.chevre...@gmail.com> wrote:
> > >>>>
> > >>>>> While going through the changes proposed on HBASE-24530, we
> > >>>>> observed that IncreasingToUpperBoundRegionSplitPolicy compares
> > >>>>> hbase.hregion.max.filesize against individual stores within a
> > >>>>> region when deciding whether to split a region or not. For tables
> > >>>>> having multiple families, this can lead to regions much larger than
> > >>>>> what's defined by hbase.hregion.max.filesize.
> > >>>>>
> > >>>>> The current proposal on HBASE-24530 is to add an extra policy that
> > >>>>> actually compares the overall region size (combining all region
> > >>>>> stores' sizes) against hbase.hregion.max.filesize, but I wonder if
> > >>>>> it really makes sense to keep a policy with the current
> > >>>>> IncreasingToUpperBoundRegionSplitPolicy behaviour. I would like to
> > >>>>> hear folks' opinions on whether we should take any of the actions
> > >>>>> below:
> > >>>>> 1) Leave IncreasingToUpperBoundRegionSplitPolicy as it is and just
> > >>>>> add the new policy proposed on HBASE-24530;
> > >>>>> 2) Deprecate IncreasingToUpperBoundRegionSplitPolicy and remove it
> > >>>>> from the master branch;
> > >>>>> 3) Change IncreasingToUpperBoundRegionSplitPolicy to actually
> > >>>>> implement the logic of the new policy proposed on HBASE-24530.
> > >>>>>
> > >>>>> My view is that the current IncreasingToUpperBoundRegionSplitPolicy
> > >>>>> behaviour is a bug, and I vote for #3.
> > >>>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Best regards,
> > >>>> Andrew
> > >>>>
> > >>>> Words like orphans lost among the crosstalk, meaning torn from truth's
> > >>>> decrepit hands
> > >>>>    - A23, Crosstalk
> > >>>>
> > >>>
> > >>
> > >

--
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk
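For readers following the thread, the two comparisons being debated can be contrasted in a minimal sketch. This is not HBase code: the class `SplitCheckSketch` and both methods are hypothetical, store sizes are passed in as plain byte counts, and a fixed 10 GiB limit stands in for hbase.hregion.max.filesize. It also leaves out the increasing threshold that gives IncreasingToUpperBoundRegionSplitPolicy its name. `shouldSplitPerStore` models the per-store comparison described in the thread, and `shouldSplitRegionTotal` models the summed region size proposed on HBASE-24530.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not HBase code: contrasts a per-store size check with
 * a region-total size check against a fixed limit standing in for
 * hbase.hregion.max.filesize. Sizes are plain byte counts.
 */
public class SplitCheckSketch {

    /** Per-store check: split only if some single store exceeds the limit. */
    static boolean shouldSplitPerStore(long[] storeSizes, long maxFileSize) {
        for (long size : storeSizes) {
            if (size > maxFileSize) {
                return true;
            }
        }
        return false;
    }

    /** Region-level check: split if the sum of all store sizes exceeds the limit. */
    static boolean shouldSplitRegionTotal(long[] storeSizes, long maxFileSize) {
        long total = Arrays.stream(storeSizes).sum();
        return total > maxFileSize;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        long maxFileSize = 10 * gib;                    // assumed limit: 10 GiB
        long[] stores = {8 * gib, 8 * gib, 8 * gib};    // three column families, 8 GiB each

        // prints "per-store split: false"
        System.out.println("per-store split: " + shouldSplitPerStore(stores, maxFileSize));
        // prints "region-total split: true"
        System.out.println("region-total split: " + shouldSplitRegionTotal(stores, maxFileSize));
        // prints "region size (GiB): 24"
        System.out.println("region size (GiB): " + Arrays.stream(stores).sum() / gib);
    }
}
```

With three column families of 8 GiB each, the per-store check does not fire even though the region holds 24 GiB, which is the "regions much larger than hbase.hregion.max.filesize" case raised at the start of the thread. The same arithmetic underlies Anoop's compatibility concern: under the summed check, a multi-CF table pre-split on per-store assumptions would start splitting earlier, and the more CFs it has, the earlier.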