sharmaar12 commented on PR #7149:
URL: https://github.com/apache/hbase/pull/7149#issuecomment-3270669250

   > Let's try to investigate the cases with different store file tracker 
implementations.
   > 
   > **1. DefaultStoreFileTracker**
   > 
   > ```java
   > /**
   >  * The default implementation for store file tracker, where we do not 
persist the store file list,
   >  * and use listing when loading store files.
   >  */
   > @InterfaceAudience.Private
   > class DefaultStoreFileTracker extends StoreFileTrackerBase {
   > ```
   > 
   > So, in this case the refresh command should always get a list of all 
HFiles in the CF directory and should be able to detect new HFiles 
automatically. Is that correct?
   
   Yes. In this case we will be able to detect and load the newly added files.
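
   For reference, a minimal sketch of what listing-based detection boils down to, using the plain Hadoop FileSystem API. This is only an illustration of the idea, not the actual DefaultStoreFileTracker code; the class and method names are made up:

   ```java
   import java.io.IOException;
   import java.util.ArrayList;
   import java.util.List;

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileStatus;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   public class ListingBasedRefreshSketch {
     /**
      * List every file currently under the column family directory. Any HFile
      * copied into the directory, by whatever means, shows up in the listing,
      * which is why a listing-based tracker can pick up new files on refresh.
      */
     static List<Path> listStoreFiles(Configuration conf, Path cfDir) throws IOException {
       FileSystem fs = cfDir.getFileSystem(conf);
       List<Path> files = new ArrayList<>();
       for (FileStatus status : fs.listStatus(cfDir)) {
         if (status.isFile()) {
           files.add(status.getPath());
         }
       }
       return files;
     }
   }
   ```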
   
   > 
   > **2. File based tracker**
   > 
   > ```java
   > /**
   >  * A file based store file tracker.
   >  * <p/>
   >  * For this tracking way, the store file list will be persistent into a 
file, so we can write the
   >  * new store files directly to the final data directory, as we will not 
load the broken files. This
   >  * will greatly reduce the time for flush and compaction on some object 
storages as a rename is
   >  * actual a copy on them. And it also avoid listing when loading store 
file list, which could also
   >  * speed up the loading of store files as listing is also not a fast 
operation on most object
   >  * storages.
   >  */
   > @InterfaceAudience.Private
   > class FileBasedStoreFileTracker extends StoreFileTrackerBase {
   > ```
   > 
   > I think this is the case that you're talking about. In this case the SFT 
may or may not be able to detect new HFiles, depending on whether the SFT file 
has been updated.
   
   In this case, the copy has to be done via HBase so that the tracking file 
(IIRC `.filelist` is the file we use for tracking) gets updated properly.
   
   > So, basically if I just copy a new file to the CF directory, it won't be 
detected, because of the reasons you mentioned.
   
   In reality, with the file based tracker we will not be able to detect this, 
because simply copying the file (say, to S3) will not update our tracking file 
(`.filelist`). The testing issue we faced was about opening the file for read, 
not about detecting/loading the file.  
(https://github.com/apache/hbase/pull/7149#issuecomment-3269427414)
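
   To make the difference concrete, here is a rough sketch of what a tracking-file based load amounts to. This is only a hypothetical helper for illustration, not the actual FileBasedStoreFileTracker code (the real tracker persists the list in its own format, not as plain text):

   ```java
   import java.io.BufferedReader;
   import java.io.IOException;
   import java.io.InputStreamReader;
   import java.nio.charset.StandardCharsets;
   import java.util.ArrayList;
   import java.util.List;

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   public class TrackingFileRefreshSketch {
     /**
      * Load the store file list from the tracking file only. Files that exist
      * in the CF directory but are missing from the tracking file are never
      * looked at, which is why a manual copy to S3 stays invisible until HBase
      * itself records the file in the tracking file.
      */
     static List<Path> loadTrackedFiles(Configuration conf, Path cfDir,
         Path trackingFile) throws IOException {
       FileSystem fs = cfDir.getFileSystem(conf);
       List<Path> tracked = new ArrayList<>();
       try (BufferedReader reader = new BufferedReader(
           new InputStreamReader(fs.open(trackingFile), StandardCharsets.UTF_8))) {
         String name;
         while ((name = reader.readLine()) != null) {
           if (!name.isEmpty()) {
             // Only names recorded in the tracking file are considered.
             tracked.add(new Path(cfDir, name));
           }
         }
       }
       return tracked;
     }
   }
   ```

   The key point is that nothing here lists the directory, so a file copied in out of band is simply never seen.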
   
   > But if the new HFile was properly added by another cluster which is using 
the same SFT implementation, the file must have been updated properly, so our 
cluster will pick it up.
   
   Yes, that's correct. With DefaultStoreFileTracker (e.g. on HDFS), we rely on 
listing the directory directly, so there is no issue whether or not a manual 
copy happens.
   With FileBasedStoreFileTracker (e.g. on S3), we will be able to detect/load 
newly added HFiles as long as they are added by the active cluster, because 
that updates `.filelist`; and since the tracking file is shared between the 
read-replica and the active cluster, the read replica will be able to load 
them as well. The caveat is that if someone manually copies files to S3 
without the active cluster being aware of it, then neither the active nor the 
read-only cluster will be able to load them.
   
   > If all the above are true, do we need to add any additional logic to the 
command?
   
   I don't think we need to add any additional logic: if the active cluster 
creates/modifies HFiles, they will be picked up by both the active and the 
read-replica cluster.
   Only when a user deliberately changes the internal structure (e.g. copies 
files directly to S3 without using bulkload) is inconsistent behavior expected.
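
   If we ever wanted the refresh command to at least surface that situation rather than silently ignore it, a check along these lines could be added. This is a rough sketch under the assumption that we already have both the physical directory listing and the tracked list in hand; the names are hypothetical and this is not something the command does today:

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Set;

   import org.apache.hadoop.fs.Path;

   public class UntrackedFileCheckSketch {
     /**
      * Given the files physically present in the CF directory and the files
      * recorded in the tracking file, report the ones HBase does not know
      * about, i.e. files that were copied in without going through the active
      * cluster.
      */
     static List<Path> findUntrackedFiles(List<Path> listedInDirectory,
         Set<Path> recordedInTrackingFile) {
       List<Path> untracked = new ArrayList<>();
       for (Path p : listedInDirectory) {
         if (!recordedInTrackingFile.contains(p)) {
           untracked.add(p);
         }
       }
       return untracked;
     }
   }
   ```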
   

