Xiaolin Ha created HBASE-25322:
----------------------------------

             Summary: Redundant Reference file in bottom region of split
                 Key: HBASE-25322
                 URL: https://issues.apache.org/jira/browse/HBASE-25322
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 3.0.0-alpha-1
            Reporter: Xiaolin Ha
            Assignee: Xiaolin Ha


When we split a region whose key range is (,), the bottom region should contain
the keys in (, split key), and the top region should contain the keys in
[split key, ).
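As a minimal sketch of these half-open ranges, the hypothetical `ChildRanges` class below decides which child a row belongs to. It assumes row keys are plain ASCII strings, so `String.compareTo` stands in for HBase's lexicographic byte comparison. Because the parent range is (,), there is no start or end key to check.

```java
public class ChildRanges {
    // Bottom child covers (, splitRow): every row strictly below the split row.
    static boolean inBottom(String row, String splitRow) {
        return row.compareTo(splitRow) < 0;
    }

    // Top child covers [splitRow, ): the split row itself belongs here.
    static boolean inTop(String row, String splitRow) {
        return row.compareTo(splitRow) >= 0;
    }

    public static void main(String[] args) {
        String splitRow = "200";
        for (String row : new String[] {"105", "199", "200", "205"}) {
            String child = inBottom(row, splitRow) ? "bottom" : "top";
            System.out.println(row + " -> " + child);
        }
    }
}
```

Row 200 itself is assigned to the top child, which is the property the rest of this issue relies on.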

Currently, if we do the following operations:
 # put rowkeys 100,101,102,103,104,105 into a table, and flush the memstore to
make an hfile with rowkeys 100,101,102,103,104,105;
 # put rowkeys 200,201,202,203,204,205 into the table, and flush the memstore to
make an hfile with rowkeys 200,201,202,203,204,205;
 # split the table region, using split key 200;
 # then the bottom region will have two Reference files, while the top region
only has one.

But we expect the bottom region to have only one Reference file, just like the
top region.

That's because, when generating Reference files in the child regions, the
bottom region uses the `PrivateCellUtil.createLastOnRow(splitRow)` cell to
compare against the first keys of the hfiles, while the top region uses the
`PrivateCellUtil.createFirstOnRow(splitRow)` cell to compare against the last
keys of the hfiles.
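The comparison above can be modelled in plain Java to reproduce the Reference counts from the steps earlier. This is a simplified sketch, not HBase code: the hypothetical `Key` record stands in for a cell key (a row plus a marker for `FirstOnRow`/`LastOnRow`), `HFile` keeps only a first and last key, and `String.compareTo` replaces HBase's byte comparator, which agrees for these ASCII row keys. The skip conditions paraphrase the behavior described in this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class CurrentSplitRefs {
    // Simplified cell key: a row plus a marker. FIRST sorts before every real
    // cell on the row, LAST sorts after every real cell on the row.
    enum Kind { FIRST, REAL, LAST }

    record Key(String row, Kind kind) implements Comparable<Key> {
        @Override
        public int compareTo(Key o) {
            int c = row.compareTo(o.row);
            return c != 0 ? c : kind.compareTo(o.kind);
        }
    }

    // An hfile reduced to its first and last cell keys.
    record HFile(String name, Key firstKey, Key lastKey) {}

    // Bottom child, as described: skip the file only if LastOnRow(splitRow)
    // sorts before the file's first key.
    static boolean bottomNeedsRef(HFile f, String splitRow) {
        return new Key(splitRow, Kind.LAST).compareTo(f.firstKey()) >= 0;
    }

    // Top child: skip the file only if FirstOnRow(splitRow) sorts after the
    // file's last key.
    static boolean topNeedsRef(HFile f, String splitRow) {
        return new Key(splitRow, Kind.FIRST).compareTo(f.lastKey()) <= 0;
    }

    // Names of the hfiles that receive a Reference in the given child.
    static List<String> refs(List<HFile> files, String splitRow, boolean top) {
        List<String> out = new ArrayList<>();
        for (HFile f : files) {
            boolean needed = top ? topNeedsRef(f, splitRow) : bottomNeedsRef(f, splitRow);
            if (needed) {
                out.add(f.name());
            }
        }
        return out;
    }

    // The two flushed hfiles from the reproduction steps.
    static List<HFile> sampleFiles() {
        return List.of(
            new HFile("hfile1", new Key("100", Kind.REAL), new Key("105", Kind.REAL)),
            new HFile("hfile2", new Key("200", Kind.REAL), new Key("205", Kind.REAL)));
    }

    public static void main(String[] args) {
        List<HFile> files = sampleFiles();
        System.out.println("bottom: " + refs(files, "200", false));
        System.out.println("top:    " + refs(files, "200", true));
    }
}
```

Running `main` lists `hfile1` and `hfile2` for the bottom region but only `hfile2` for the top region: the first cell of row 200 sorts before `LastOnRow(200)`, so the second hfile is not skipped for the bottom child.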

`LastOnRow(splitRow)` is the largest possible cell on the split row, while
`FirstOnRow(splitRow)` is the smallest. The split row belongs to the top region,
yet the first key of the second hfile (on row 200) sorts before
`LastOnRow(200)`, so the bottom region still gets a Reference to that file. We
should use `FirstOnRow(splitRow)` to compare against the hfile first and last
keys in both the bottom and top regions.
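A minimal sketch of the proposed rule, using the same simplified model as a self-contained class: the hypothetical `needsRef` helper builds a single `FirstOnRow(splitRow)` key and uses it for both children. The names and the string-based comparison are assumptions for illustration, not the actual HBase patch.

```java
import java.util.ArrayList;
import java.util.List;

public class FixedSplitRefs {
    // Simplified cell key: FIRST sorts before every real cell on the row.
    enum Kind { FIRST, REAL }

    record Key(String row, Kind kind) implements Comparable<Key> {
        @Override
        public int compareTo(Key o) {
            int c = row.compareTo(o.row);
            return c != 0 ? c : kind.compareTo(o.kind);
        }
    }

    // An hfile reduced to its first and last cell keys.
    record HFile(String name, Key firstKey, Key lastKey) {}

    // Proposed rule: both children compare against FirstOnRow(splitRow).
    static boolean needsRef(HFile f, String splitRow, boolean top) {
        Key splitKey = new Key(splitRow, Kind.FIRST);
        return top
            ? splitKey.compareTo(f.lastKey()) <= 0   // top: file has a row >= splitRow
            : splitKey.compareTo(f.firstKey()) > 0;  // bottom: file has a row < splitRow
    }

    // Names of the hfiles that receive a Reference in the given child.
    static List<String> refs(List<HFile> files, String splitRow, boolean top) {
        List<String> out = new ArrayList<>();
        for (HFile f : files) {
            if (needsRef(f, splitRow, top)) {
                out.add(f.name());
            }
        }
        return out;
    }

    // The two flushed hfiles from the reproduction steps.
    static List<HFile> sampleFiles() {
        return List.of(
            new HFile("hfile1", new Key("100", Kind.REAL), new Key("105", Kind.REAL)),
            new HFile("hfile2", new Key("200", Kind.REAL), new Key("205", Kind.REAL)));
    }

    public static void main(String[] args) {
        List<HFile> files = sampleFiles();
        System.out.println("bottom: " + refs(files, "200", false));
        System.out.println("top:    " + refs(files, "200", true));
    }
}
```

With one split key for both sides, each child references exactly one hfile, and an hfile whose rows straddle the split row would still be referenced by both children.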

Although the bottom region never reads any data through the redundant Reference
file, compacting it produces an empty file if it is the only file participating
in a compaction.
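To see why the compaction output is empty, consider a toy model in which an hfile is just a list of row keys and a bottom-half Reference exposes only the rows strictly below the split row. The hypothetical `readBottomHalf` helper is an illustration under that assumption, not an HBase API.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RedundantRefCompaction {
    // Rows visible through a bottom-half Reference: only those below the split row.
    static List<String> readBottomHalf(List<String> hfileRows, String splitRow) {
        return hfileRows.stream()
                .filter(row -> row.compareTo(splitRow) < 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The second hfile from the reproduction steps.
        List<String> hfile2 = List.of("200", "201", "202", "203", "204", "205");
        List<String> compacted = readBottomHalf(hfile2, "200");
        System.out.println("cells written by compaction: " + compacted.size());
    }
}
```

For the second hfile (rows 200 to 205) nothing survives the filter, so a compaction that reads only this Reference has no cells to write.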

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
