‘Filesize’ and ‘size’ are both ambiguous. They are open to interpretation, and I
don’t see one as clearer than the other, except that ‘filesize’ implies
file-level measures are the determining factor. It doesn’t convey any
semantics beyond that, i.e. whether one file trips the limit or the combined
sizes of all files trip the limit. We can fix that with clarifying
documentation. While doing so, we also have an opportunity to fix the behavior
itself if our consensus is that the current policy is not what users usually
expect.
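
To make the two readings concrete, here is a minimal sketch in plain Java. It is not the HBase implementation: the class and method names are hypothetical, the 10 GB limit and the store sizes are assumed values for illustration, and it ignores the way IncreasingToUpperBoundRegionSplitPolicy steps the threshold up with region count.

```java
// Hypothetical sketch, not HBase code: contrasts the two readings of
// hbase.hregion.max.filesize discussed in this thread.
public class SplitCheckSketch {

    // Reading 1 (per-store): split only when a single store (column family)
    // is larger than the limit.
    public static boolean shouldSplitPerStore(long[] storeSizes, long maxFileSize) {
        for (long size : storeSizes) {
            if (size > maxFileSize) {
                return true;
            }
        }
        return false;
    }

    // Reading 2 (region total): split when the combined size of all stores
    // in the region is larger than the limit.
    public static boolean shouldSplitByRegionTotal(long[] storeSizes, long maxFileSize) {
        long total = 0;
        for (long size : storeSizes) {
            total += size;
        }
        return total > maxFileSize;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        long maxFileSize = 10 * gb;                   // assumed limit
        long[] storeSizes = {6 * gb, 6 * gb, 6 * gb}; // three CFs, 18 GB in total

        System.out.println("per-store check splits:    "
                + shouldSplitPerStore(storeSizes, maxFileSize));
        System.out.println("region-total check splits: "
                + shouldSplitByRegionTotal(storeSizes, maxFileSize));
    }
}
```

With three 6 GB families and a 10 GB limit, the per-store check never fires even though the region holds 18 GB, while the region-total check does. That gap is also why a table pre-split under the first reading could start splitting after a change to the second.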

So how suboptimal is it? Does a compatibility concern make sense if we think
this is just broken? Perhaps we can address all concerns by making the change
in the next minor releases and then doing those minor releases soon.


> On Jun 20, 2020, at 11:06 PM, Anoop John <anoop.hb...@gmail.com> wrote:
> 
> I have a concern if we do #3 for all minor versions.  That will be a major
> split behaviour change and can affect so much for tables with many CFs. If
> one adjusted the pre splits so as to avoid further region splits, that calc
> might go wrong once they migrate to new minor versions with this change
> right?
> The config name was/is hbase.hregion.max.*filesize* and never
> *hbase.hregion.max.size*.  We will have HFiles at CF level and so a max
> filesize is applicable at CF level.   So even this config name will create
> confusion once we change the calc to consider size at region level (Sum of
> sizes at CFs)
> 
> Anoop
> 
> 
>> On Fri, Jun 19, 2020 at 11:44 PM Viraj Jasani <vjas...@apache.org> wrote:
>> 
>> Given that SteppingSplitPolicy is the default region split policy, removal
>> of IncreasingToUpperBoundRegionSplitPolicy is going to make things more
>> complex for master branch if we follow #2.
>> Hence, I believe we should better go with #3 for all.
>> 
>> 
>>> On 2020/06/19 17:52:27, Viraj Jasani <vjas...@apache.org> wrote:
>>> Can we do a mix of #2 and #3, i.e. remove
>>> IncreasingToUpperBoundRegionSplitPolicy from master, and follow #3 for
>>> branch-2 and all active release branches? If it breaks any compatibility
>>> rules, then we can go with #3 for all.
>>> 
>>> 
>>> On 2020/06/19 17:33:14, Andrew Purtell <apurt...@apache.org> wrote:
>>>> I vote for #3, and it should be applied to all active code lines.
>>>> 
>>>> 
>>>> On Fri, Jun 19, 2020 at 3:35 AM Wellington Chevreuil <
>>>> wellington.chevre...@gmail.com> wrote:
>>>> 
>>>>> While going through the changes proposed on HBASE-24530, we
>>>>> observed IncreasingToUpperBoundRegionSplitPolicy
>>>>> compares hbase.hregion.max.filesize against individual stores within a
>>>>> region when deciding whether to split a region or not. For tables having
>>>>> multiple families, this can lead to regions much larger than what's
>>>>> defined by hbase.hregion.max.filesize.
>>>>> 
>>>>> Current proposal on HBASE-24530 is to add an extra policy that actually
>>>>> compares the overall region size (combining all region stores sizes)
>>>>> against hbase.hregion.max.filesize, but I wonder if it really makes sense
>>>>> to keep a policy with current IncreasingToUpperBoundRegionSplitPolicy
>>>>> behaviour. Would like to hear folks opinions if we should take any of the
>>>>> below actions?
>>>>> 1) Leave IncreasingToUpperBoundRegionSplitPolicy as it is and just add the
>>>>> new policy proposed on HBASE-24530;
>>>>> 2) Make IncreasingToUpperBoundRegionSplitPolicy deprecated and remove it
>>>>> from master branch;
>>>>> 3) Change IncreasingToUpperBoundRegionSplitPolicy to actually implement the
>>>>> logic of the new policy proposed on HBASE-24530;
>>>>> 
>>>>> My view is that the current IncreasingToUpperBoundRegionSplitPolicy
>>>>> behaviour is a bug, and I vote for #3.
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best regards,
>>>> Andrew
>>>> 
>>>> Words like orphans lost among the crosstalk, meaning torn from truth's
>>>> decrepit hands
>>>>   - A23, Crosstalk
>>>> 
>>> 
>> 
