[ 
https://issues.apache.org/jira/browse/HBASE-27232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569904#comment-17569904
 ] 

Wellington Chevreuil commented on HBASE-27232:
----------------------------------------------

{quote}
Did you have any indication in the BucketCache itself that changing this value 
helped it in some way? Maybe more blocks fit in?
{quote}

It helped us with the L1 mapping, because it reduced the number of blocks. 
Without it, our dataset had too many blocks, and the L1 mapping objects 
pointing to the L2 were just filling up the RegionServers' heap. By reducing 
the number of blocks, heap consumption by the L1 mappings also decreased. An 
alternative workaround to this code change is to simply raise the block size, 
but that would have to be done on a per-table basis (see the sketch below).
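
For reference, that per-table workaround through the Java client API could 
look something like the sketch below (the table/family names and the 128KB 
value are placeholders, not anything from this issue):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class RaiseBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      TableName table = TableName.valueOf("my_table");
      // Copy the existing family descriptor so other settings are preserved.
      ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
        .newBuilder(admin.getDescriptor(table).getColumnFamily(Bytes.toBytes("cf")))
        .setBlocksize(128 * 1024) // up from the 64KB default
        .build();
      admin.modifyColumnFamily(table, cf);
    }
  }
}
{code}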

 

{quote}
If you're experimenting with BucketCache you may also be interested in these 2 
jiras I have open: https://issues.apache.org/jira/browse/HBASE-27229 and 
https://issues.apache.org/jira/browse/HBASE-27225. The latter specifically 
might relate to this block size skew.
{quote}

Thanks for sharing those. They look interesting for us, indeed.
 

> Fix checking for encoded block size when deciding if block should be closed
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-27232
>                 URL: https://issues.apache.org/jira/browse/HBASE-27232
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>
> On HFileWriterImpl.checkBlockBoundary, we used to consider the unencoded and 
> uncompressed data size when deciding whether to close a block and start a new 
> one. That could lead to varying "on-disk" block sizes, depending on the 
> encoding efficiency for the cells in each block.
> HBASE-17757 introduced the hbase.writer.unified.encoded.blocksize.ratio 
> property, as a ratio of the originally configured block size, to be compared 
> against the encoded size. This was an attempt to ensure homogeneous block 
> sizes. However, the check introduced by HBASE-17757 also considers the 
> unencoded size, so in cases where the encoding efficiency is higher than 
> what's configured in hbase.writer.unified.encoded.blocksize.ratio, it would 
> still lead to varying block sizes.
> This patch changes that logic to only consider the encoded size when the 
> hbase.writer.unified.encoded.blocksize.ratio property is set; otherwise, it 
> will consider the unencoded size, as before. This gives finer control over 
> the on-disk block sizes and the overall number of blocks when encoding is in 
> use. See the sketch below.
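>
> A rough sketch of the changed check in HFileWriterImpl.checkBlockBoundary 
> (field and helper names follow the existing writer code, but treat this as 
> an illustration of the logic rather than the exact committed patch):
> {code:java}
> private void checkBlockBoundary() throws IOException {
>   boolean shouldFinishBlock;
>   // encodedBlockSizeLimit would be the configured block size multiplied by
>   // hbase.writer.unified.encoded.blocksize.ratio, or 0 when that is unset.
>   if (encodedBlockSizeLimit > 0) {
>     // Ratio set: only the encoded size decides when to close the block.
>     shouldFinishBlock = blockWriter.encodedBlockSizeWritten() >= encodedBlockSizeLimit;
>   } else {
>     // Ratio unset: fall back to the unencoded size, as before.
>     shouldFinishBlock = blockWriter.blockSizeWritten() >= hFileContext.getBlocksize();
>   }
>   if (shouldFinishBlock) {
>     finishBlock();
>     writeInlineBlocks(false);
>     newBlock();
>   }
> }
> {code}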



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
