[ https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Kyle Purtell resolved HBASE-25972.
-----------------------------------------
    Fix Version/s: 2.6.1
                   2.5.9
     Hadoop Flags: Reviewed
     Release Note: The default compactor in HBase compacts HFiles into one
file. This change introduces a new store file writer, DualFileWriter, which
writes the cells retained by compaction into two files. One of these files
includes the live cells and is called a live-version file. The other file
includes the rest of the cells, that is, the historical versions, and is
called a historical-version file. DualFileWriter works with the default
compactor. Historical files are not read by scans that return only the latest
row versions. This eliminates scanning unnecessary cell versions in compacted
files and is thus expected to improve the performance of these scans.
       Resolution: Fixed
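
A minimal, hedged sketch of the write-side idea described in the release note,
using simplified stand-in types rather than HBase's actual Cell and
StoreFileWriter APIs: compaction output is routed to a live-version file or a
historical-version file depending on whether the cell is the newest version of
its column.

    // Conceptual sketch only: routes the cells retained by a compaction into
    // a live-version output and a historical-version output. Cell and the
    // in-memory "files" are simplified stand-ins, not real HBase classes.
    import java.util.ArrayList;
    import java.util.List;

    public class DualFileWriterSketch {
        // Minimal stand-in for an HBase cell: row, column, timestamp.
        record Cell(String row, String column, long timestamp) {}

        final List<Cell> liveFile = new ArrayList<>();        // live-version file
        final List<Cell> historicalFile = new ArrayList<>();  // historical-version file

        // Compaction emits cells in HFile sort order (row, column, newest
        // timestamp first), so the first cell seen for a column is assumed
        // here to be its live version; the real logic also has to handle
        // deletes, TTL, and max-versions settings.
        private String currentColumn = null;

        void append(Cell cell) {
            String columnKey = cell.row() + "/" + cell.column();
            boolean newestVersion = !columnKey.equals(currentColumn);
            currentColumn = columnKey;
            if (newestVersion) {
                liveFile.add(cell);        // latest version: live-version file
            } else {
                historicalFile.add(cell);  // older version: historical-version file
            }
        }
    }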

> Dual File Compaction
> --------------------
>
>                 Key: HBASE-25972
>                 URL: https://issues.apache.org/jira/browse/HBASE-25972
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kadir Ozdemir
>            Assignee: Kadir Ozdemir
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> HBase stores tables row by row in its files, HFiles. An HFile is composed of
> blocks, and the number of rows stored in a block depends on the row sizes.
> Because HBase stores all row versions sequentially in the same HFile after
> compaction, rows grow larger on disk as versions accumulate and fewer rows
> fit in each block. However, applications (e.g., Phoenix) mostly query the
> most recent row versions.
> The default compactor in HBase compacts HFiles into one file. This Jira
> introduces a new store file writer, DualFileWriter, which writes the cells
> retained by compaction into two files. One of these files includes the live
> cells and is called a live-version file. The other file includes the rest of
> the cells, that is, the historical versions, and is called a
> historical-version file. DualFileWriter works with the default compactor.
> Historical files are not read by scans that return only the latest row
> versions. This eliminates scanning unnecessary cell versions in compacted
> files and is thus expected to improve the performance of these scans.
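
As a rough illustration of the read-side effect (again with hypothetical
types, not the actual HBase scanner code): a scan that only returns the latest
row versions can drop historical files from the set of store files it opens,
while other scans still read every file.

    // Conceptual sketch only: latest-version scans skip historical files.
    // StoreFileInfo and its isHistorical flag are hypothetical stand-ins for
    // however the real implementation distinguishes the two file types.
    import java.util.List;
    import java.util.stream.Collectors;

    public class HistoricalFileFilterSketch {
        record StoreFileInfo(String path, boolean isHistorical) {}

        // Select the files a scan must read. Scans that only return the
        // latest row versions never need the historical-version files.
        static List<StoreFileInfo> filesForScan(List<StoreFileInfo> storeFiles,
                                                boolean latestVersionsOnly) {
            if (!latestVersionsOnly) {
                return storeFiles;  // other scans still read every file
            }
            return storeFiles.stream()
                .filter(f -> !f.isHistorical())
                .collect(Collectors.toList());
        }
    }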



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
