Andrew Kyle Purtell created HBASE-27826:
-------------------------------------------

             Summary: Region split and merge time while offline is O(n) with 
respect to number of store files
                 Key: HBASE-27826
                 URL: https://issues.apache.org/jira/browse/HBASE-27826
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.5.4
            Reporter: Andrew Kyle Purtell


This is a significant availability issue when HFiles are on S3.

HBASE-26079 ("Use StoreFileTracker when splitting and merging") changed the 
split and merge table procedure implementations to indirect through the 
StoreFileTracker implementation when selecting HFiles to be merged or split, 
rather than directly listing those using file system APIs. It also changed the 
commit logic in HRegionFileSystem to add the link/ref files on resulting split 
or merged regions to the StoreFileTracker. However, the creation of a link file 
is still a filesystem operation and creating a “file” on S3 can take well over 
a second. If, for example, there are 20 store files in a region, which is not 
uncommon, after the region is taken offline for a split (or merge) it may 
require more than 20 seconds to create the link files before the results can be 
brought back online, creating a severe availability problem. Splits and merges 
are supposed to be fast, completing in less than a second, certainly less than 
a few seconds. That has held for HFiles stored on HDFS only because file 
creation operations there are nearly instantaneous.
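
To make the arithmetic above concrete, here is a rough sketch of the offline time spent creating link files. The class and method names are hypothetical and not part of HBase. The 1 second S3 latency is the lower bound mentioned above, and the 5 ms HDFS figure is only an assumed value for contrast.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not part of HBase.
final class OfflineTimeEstimate {
    // Link files are created one at a time, so the per-file latencies add up.
    static Duration perFileLinks(int storeFiles, Duration createLatency) {
        return createLatency.multipliedBy(storeFiles);
    }

    public static void main(String[] args) {
        Duration s3Create = Duration.ofSeconds(1);   // assumed S3 "file" creation latency
        Duration hdfsCreate = Duration.ofMillis(5);  // assumed HDFS latency, for contrast
        System.out.println("S3:   " + perFileLinks(20, s3Create));
        System.out.println("HDFS: " + perFileLinks(20, hdfsCreate));
    }
}
```

With 20 store files the S3 estimate is at least 20 seconds of unavailability, while the same loop on HDFS stays well under a second.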

There are two issues but both can be handled with modifications to the store 
file tracker interface and the file based store file tracker implementation. 

When the file based store file tracker is enabled, the HFile links should 
be virtual entities that exist only in the file manifest. We no longer require 
physical files in the filesystem to serve as links. That is the magic of 
this file tracker: the manifest file replaces the requirement to list the 
filesystem.
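
One way to picture a virtual link is as a manifest entry that records which region actually holds the HFile. The sketch below is only an illustration under that assumption. `ManifestEntry` and `ManifestSketch` are hypothetical names, and the path layout is simplified; none of this is the actual StoreFileTracker API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical manifest entry; not an HBase class.
final class ManifestEntry {
    final String fileName;
    final String linkedRegion; // null for a file physically owned by this region

    private ManifestEntry(String fileName, String linkedRegion) {
        this.fileName = fileName;
        this.linkedRegion = linkedRegion;
    }

    static ManifestEntry physical(String fileName) {
        return new ManifestEntry(fileName, null);
    }

    // A link that exists only as a manifest record; nothing is created on the filesystem.
    static ManifestEntry virtualLink(String parentRegion, String fileName) {
        return new ManifestEntry(fileName, parentRegion);
    }

    // Resolve the region directory that actually holds the HFile data.
    String resolve(String ownRegion) {
        return (linkedRegion != null ? linkedRegion : ownRegion) + "/" + fileName;
    }
}

// Hypothetical in-memory manifest for one region.
final class ManifestSketch {
    private final String region;
    private final List<ManifestEntry> entries = new ArrayList<>();

    ManifestSketch(String region) {
        this.region = region;
    }

    void add(ManifestEntry entry) {
        entries.add(entry);
    }

    // Store files come from the manifest alone; the filesystem is never listed.
    List<String> storeFilePaths() {
        List<String> paths = new ArrayList<>();
        for (ManifestEntry e : entries) {
            paths.add(e.resolve(region));
        }
        return paths;
    }
}
```

A daughter region's manifest could then hold `ManifestEntry.virtualLink("parent", "hfile-1")` next to its own physical files, and readers would resolve the link without any link file existing in S3.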

Then, when splitting or merging, the HFile links should be collected into a 
list and committed in one batch using a new FILE file tracker interface, 
requiring only one update of the manifest file in S3, bringing the time 
requirement for this operation down from O(n) to O(1).
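
The difference between per-file and batched commits can be sketched by counting manifest writes. Everything here is hypothetical: `CountingManifestStore` stands in for the manifest object in S3, and `commitBatch` stands in for the proposed new interface, whose real shape is not defined in this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the manifest file in S3; it only counts writes.
final class CountingManifestStore {
    private final List<String> entries = new ArrayList<>();
    private int writes = 0;

    // Each call models one rewrite of the manifest, however many entries it adds.
    void commit(List<String> newEntries) {
        entries.addAll(newEntries);
        writes++;
    }

    int writes() {
        return writes;
    }

    int size() {
        return entries.size();
    }
}

// Hypothetical commit strategies; not HBase code.
final class SplitCommitSketch {
    // One store round trip per parent file: n writes while the region is offline.
    static void commitOneByOne(CountingManifestStore store, List<String> parentFiles) {
        for (String f : parentFiles) {
            store.commit(List.of("link:" + f));
        }
    }

    // Collect every link first, then update the manifest once.
    static void commitBatch(CountingManifestStore store, List<String> parentFiles) {
        List<String> links = new ArrayList<>();
        for (String f : parentFiles) {
            links.add("link:" + f);
        }
        store.commit(links);
    }
}
```

Both strategies leave the same entries in the manifest. The batched form still builds a list of n links in memory, but it pays the slow S3 round trip only once.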



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
