[jira] [Commented] (HBASE-25972) Dual File Compaction

Bryan Beaudreault (Jira) Fri, 17 May 2024 09:41:05 -0700


    [ https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847374#comment-17847374 ]


Bryan Beaudreault commented on HBASE-25972:
-------------------------------------------

As a general philosophy, I'd love to move towards bug fixes only in patch 
versions, so that we can push out more minor/major releases. But I feel like 
we're a ways away from having the bandwidth for that or knowing what that means 
for supporting older versions, etc.

Selfishly, I'd love to get this feature in 2.6.x because I plan to stay on this 
release line at my company for a while and we have an interest in that.

So I realize this doesn't sound very internally consistent, but since there are 
no compatibility issues I think it'd be nice to get this into branch-2.6.

> Dual File Compaction
> --------------------
>
>                 Key: HBASE-25972
>                 URL: https://issues.apache.org/jira/browse/HBASE-25972
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kadir Ozdemir
>            Assignee: Kadir Ozdemir
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2
>
>
> HBase stores tables row by row in its files, HFiles. An HFile is composed of 
> blocks. The number of rows stored in a block depends on the row sizes. The 
> number of rows per block gets lower when rows get larger on disk due to 
> multiple row versions since HBase stores all row versions sequentially in the 
> same HFile after compaction. However, applications (e.g., Phoenix) mostly 
> query the most recent row versions.
> The default compactor in HBase compacts HFiles into one file. This Jira 
> introduces a new store file writer, called DualFileWriter, which writes the 
> cells retained by compaction into two files. One of these files will include 
> the live cells and will be called a live-version file. The other file will 
> include the rest of the cells, that is, the historical versions, and will be 
> called a historical-version file. DualFileWriter will work with the default 
> compactor.
> Scans that read only the latest row versions will not read the historical 
> files. This avoids scanning unnecessary cell versions in compacted files and 
> is thus expected to improve the performance of these scans.
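
For readers skimming the thread, the live/historical split described in the
issue can be illustrated with a small sketch. This is not the HBase
DualFileWriter code: the Cell tuple, the split_cells function, and the
max_versions parameter are hypothetical names made up for this example, and
the sketch assumes there are no delete markers or TTLs, so a cell counts as
live simply when it is among the newest max_versions versions of its
(row, column).

```python
from typing import Iterable, List, NamedTuple, Tuple


class Cell(NamedTuple):
    # Hypothetical, simplified stand-in for an HBase cell.
    row: str
    column: str
    timestamp: int
    value: str


def split_cells(
    cells: Iterable[Cell], max_versions: int = 1
) -> Tuple[List[Cell], List[Cell]]:
    """Split cells into (live, historical) lists.

    Assumption for this sketch: no delete markers or TTLs, so the newest
    max_versions cells of each (row, column) are live and the rest are
    historical.
    """
    # Order cells by row, then column, then newest timestamp first.
    ordered = sorted(cells, key=lambda c: (c.row, c.column, -c.timestamp))
    live: List[Cell] = []
    historical: List[Cell] = []
    seen = {}  # (row, column) -> number of versions already emitted
    for cell in ordered:
        key = (cell.row, cell.column)
        count = seen.get(key, 0)
        if count < max_versions:
            live.append(cell)        # would go to the live-version file
        else:
            historical.append(cell)  # would go to the historical-version file
        seen[key] = count + 1
    return live, historical


cells = [
    Cell("row1", "cf:a", 1, "old"),
    Cell("row1", "cf:a", 3, "new"),
    Cell("row1", "cf:a", 2, "mid"),
    Cell("row2", "cf:a", 5, "only"),
]
live, historical = split_cells(cells)
```

In this toy model, a scan for the latest versions only needs to look at the
live list, which is the effect the issue description expects from skipping
historical-version files.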



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
