The return value is in bytes. The problem is that we normalize the size to
MB and then multiply the MB value to get the size in bytes, so if a region is
smaller than 1 MB, the returned value will be zero.
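To make the truncation concrete, here is a minimal Java sketch. It assumes the MB value is produced by integer division on the server and multiplied back to bytes on the client, as described above. The class and method names (`RegionSizeTruncation`, `toWholeMegabytes`, `reportedSizeBytes`, `nonEmpty`) are hypothetical and do not come from HBase or Spark. `nonEmpty` is a simplified stand-in for an ignore-empty-splits filter, not Spark's actual code.

```java
import java.util.ArrayList;
import java.util.List;

class RegionSizeTruncation {
    // Bytes per megabyte (1024 * 1024).
    static final long MEGABYTE = 1024L * 1024L;

    // Hypothetical server-side step: only a whole number of megabytes is kept,
    // so integer division drops any remainder below 1 MB.
    static int toWholeMegabytes(long storeFileBytes) {
        return (int) (storeFileBytes / MEGABYTE);
    }

    // Hypothetical client-side step: the MB value is multiplied back to bytes,
    // but the precision lost above cannot be recovered.
    static long reportedSizeBytes(long storeFileBytes) {
        return toWholeMegabytes(storeFileBytes) * MEGABYTE;
    }

    // Simplified "ignore empty splits" filter: a region whose reported size
    // is 0 is treated as empty and dropped, even if it holds real data.
    static List<Long> nonEmpty(List<Long> actualSizesBytes) {
        List<Long> kept = new ArrayList<>();
        for (long actual : actualSizesBytes) {
            if (reportedSizeBytes(actual) > 0) {
                kept.add(actual);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long fewKilobytes = 8L * 1024;          // 8 KB of real data
        long justUnderTwoMb = 2 * MEGABYTE - 1; // 2 MB minus one byte
        System.out.println(reportedSizeBytes(fewKilobytes));
        System.out.println(reportedSizeBytes(justUnderTwoMb));
        System.out.println(nonEmpty(List.of(fewKilobytes, justUnderTwoMb)));
    }
}
```

Running `main` prints 0 for the 8 KB region, 1048576 for the region just under 2 MB, and a list containing only the second region, so the small but non-empty region is filtered out.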

I need to investigate this further.

Reading the issue, the scalability problem they wanted to solve was that we
would go to the master to get the region size; it was not about whether the
unit is MB or not.

Thanks.

Nick Dimiduk <ndimi...@apache.org> wrote on Wed, Oct 13, 2021 at 7:47 AM:

> Hi Norbert,
>
> To answer your question directly: the RegionSizeCalculator class is
> annotated with @InterfaceAudience.Private, which means there's a good
> chance that its implementation can be changed without need for a
> deprecation cycle and user participation.
>
> Curiously, I noticed that this `sizeMap` is accessed down in the method
> `long getRegionSize(byte[])`, and its javadoc mentions the returned unit
> explicitly as bytes.
>
> So with a little investigation using git blame, I see that the switch from
> returning values in bytes to values in megabytes came in through
> HBASE-16169 -- your proposed change was the old implementation. For
> whatever reason, it was determined not to be scalable. So we could revert,
> but we'd need a new solution to the problem HBASE-16169 aimed to solve.
>
> I hope this helps.
>
> Thanks,
> Nick
>
> On Tue, Oct 12, 2021 at 10:54 AM Norbert Kalmar <nkal...@apache.org>
> wrote:
>
> > Hi All,
> >
> > There is a new optimization in Spark (SPARK-34809) where ignoreEmptySplits
> > filters out all regions whose size is 0. It relies on a Hadoop library
> > call, getSize(), in TableInputFormat.
> >
> > Drilling down, this returns bytes, but it converts the value from
> > megabytes, meaning anything under 1 MB comes back as 0 bytes, i.e. empty.
> > I opened a quick PR that I thought would help:
> > https://github.com/apache/hbase/pull/3737
> > But it turns out it's not as easy as requesting the size in bytes instead
> > of MB from the Size class, because we set it in MB to begin with in
> > RegionMetricsBuilder:
> > -> setStoreFileSize(new Size(regionLoadPB.getStorefileSizeMB(),
> > Size.Unit.MEGABYTE))
> >
> > I did some testing: after inserting a few kilobytes of data, calling
> > list_regions does in fact return size 0.
> >
> > My question is: is it okay to store the region size in bytes instead?
> > I'm asking mainly for backward compatibility reasons.
> >
> > Regards,
> > Norbert
> >
>
