sharmaar12 commented on PR #7149:
URL: https://github.com/apache/hbase/pull/7149#issuecomment-3269427414

   > Tested with the following steps:
   > 
   > 1. Create new table 'andor' and added a row and flushed
   > 
   > ```
   > hbase:002:0> create 'andor', 'cf1'
   > Created table andor
   > Took 0.6623 seconds
   > => Hbase::Table - andor
   > hbase:005:0> put 'andor', 'r1', 'cf1', 'bela1'
   > Took 0.0099 seconds
   > hbase:007:0> flush 'andor'
   > Took 0.3376 seconds
   > 
   > $ ls -al cf1/
   > total 16
   > drwxr-xr-x  3 andor  staff    96 Sep  2 14:42 .
   > drwxr-xr-x  6 andor  staff   192 Sep  2 14:23 ..
   > -rw-r--r--  1 andor  staff  4959 Sep  2 14:23 56eb73a0801c4c9c91164e74dfaecebe
   > ```
   > 
   > 2. Added another row and flushed again
   > 
   > ```
   > hbase:020:0> scan 'andor'
   > ROW  COLUMN+CELL
   >  r1 column=cf1:, timestamp=2025-09-02T14:23:00.052, value=bela1
   >  r2 column=cf1:, timestamp=2025-09-02T14:42:42.472, value=bela2
   > 2 row(s)
   > Took 0.0029 seconds
   > 
   > $ ls -al cf1/
   > total 32
   > drwxr-xr-x  4 andor  staff   128 Sep  2 14:43 .
   > drwxr-xr-x  6 andor  staff   192 Sep  2 14:23 ..
   > -rw-r--r--  1 andor  staff  4959 Sep  2 14:23 56eb73a0801c4c9c91164e74dfaecebe
   > -rw-r--r--  1 andor  staff  4959 Sep  2 14:42 ecd71932ecb44b639ede20b42bc7b397
   > ```
   > 
   > 3. Moved away the second HFile (ecd71932ecb44b639ede20b42bc7b397) and ran **refresh_hfiles** successfully. I only see the first row in the table.
   > 
   > ```
   > hbase:024:0> scan 'andor'
   > ROW  COLUMN+CELL
   >  r1 column=cf1:, timestamp=2025-09-02T14:23:00.052, value=bela1
   > 1 row(s)
   > ```
   > 
   > 4. Moved the second HFile (ecd71932ecb44b639ede20b42bc7b397) back and ran **refresh_hfiles** again, but I can't see the second row no matter what I do.
   > 
   > ```
   > $ ls -al cf1/
   > total 32
   > drwxr-xr-x  4 andor  staff   128 Sep  2 14:43 .
   > drwxr-xr-x  6 andor  staff   192 Sep  2 14:23 ..
   > -rw-r--r--  1 andor  staff  4959 Sep  2 14:23 56eb73a0801c4c9c91164e74dfaecebe
   > -rw-r--r--  1 andor  staff  4959 Sep  2 14:42 ecd71932ecb44b639ede20b42bc7b397
   > 
   > hbase:025:0> refresh_hfiles
   > Took 0.0017 seconds
   > => 21
   > hbase:026:0> scan 'andor'
   > ROW  COLUMN+CELL
   >  r1 column=cf1:, timestamp=2025-09-02T14:23:00.052, value=bela1
   > 1 row(s)
   > Took 0.0023 seconds
   > ```
   > 
   > 5. Restarting HBase solves the problem; I can see the second row again.
   
   (After discussion and help from @wchevreuil, we arrived at the following conclusion.)
   
   **TL;DR:** The scenario above is an invalid test expectation and will never arise in real-world active/read-replica setups. The issue is caused by an HFile name conflict: the same file is removed (marked as compacted) and then added back to the store under the same name. With an active cluster and read replicas, HFile names are always unique, so the conflict can never arise.
   
   **Detailed Explanation:**
   We can break refreshHFiles into two parts:
   **Step 1:** Detecting and loading the newly added HFiles
   **Step 2:** Making the newly added HFiles available for reading
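   As a rough sketch (class and method names here are illustrative, not HBase's actual internals), Step 1 amounts to diffing the on-disk directory listing against the set of files the store already tracks:

   ```java
   import java.util.*;

   // Illustrative sketch of Step 1: detect newly added HFiles by diffing the
   // store directory listing against the files already tracked in memory.
   // Names are hypothetical, not HBase's real classes.
   public class RefreshSketch {
       static List<String> detectNewFiles(Set<String> onDisk, Set<String> tracked) {
           List<String> added = new ArrayList<>();
           for (String f : onDisk) {
               if (!tracked.contains(f)) {
                   added.add(f); // Step 2 (not shown) would open a reader for each
               }
           }
           return added;
       }

       public static void main(String[] args) {
           Set<String> onDisk = new HashSet<>(Arrays.asList("hf1", "hf2"));
           Set<String> tracked = new HashSet<>(Collections.singleton("hf2"));
           System.out.println(detectNewFiles(onDisk, tracked)); // [hf1]
       }
   }
   ```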
   
   **Our test scenario is:**
   
   1. Create a table with two rows, flushing after each insert so that we end up with two HFiles.
   2. Move one HFile to another directory.
   3. Run refresh_hfiles.
   4. Move the file from step 2 back into the column family (store) directory.
   5. Run refresh_hfiles again (expected behavior: we should see both rows added in step 1).
   
   In both steps 3 and 5 we correctly detect that one file was deleted and one file was newly added, respectively. The reason the data is not readable after step 5 lies in the behavior of step 3.
   
   Let's look at what happens in step 3 in detail. Say there are two HFiles, `hf1` and `hf2`, and you remove `hf1`. When refreshHFiles runs, it detects that `hf1` is gone and removes it from the SFT structure, but internally HBase does not delete it; it marks it as a compacted file. This is done so that transactions started before the refresh still have access to the file, while transactions started after it do not. These compacted files are eventually cleaned up by background chores. Note that the in-memory structure has marked the file name `hf1` as compacted, so anything coming after this point cannot access it.
   
   In step 5, **Step 1** of refreshHFiles correctly determines that `hf1` was newly added, but when we try to open it for reading (**Step 2**), this is refused because a file with the same name `hf1` is marked as compacted. So refreshing the HFiles is not the problem; reading the file is.
   
   If we rename the file before copying it back, everything works properly.
   Hence we can safely assume this scenario will not arise in the active/read-replica case, since HFile names there are always unique and no such conflict can occur.

