[ 
https://issues.apache.org/jira/browse/HBASE-27650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690136#comment-17690136
 ] 

Bryan Beaudreault commented on HBASE-27650:
-------------------------------------------

One option here: after scanning meta, when we are about to cache the new 
locations, we can call MetaTableAccessor.getMergeRegions to find any merge 
regions in the meta result. If any exist, proactively clear them from the cache.

The problem with this is, the CatalogJanitor will eventually clear out these 
merge qualifiers. If no requests come for any of the merged regions before that 
happens, we'll be left in the same situation as before.
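
A rough sketch of that option and its limitation, using plain Java collections 
rather than the real client classes. Region, cacheLocation and staleCache are 
hypothetical stand-ins, and the mergeParents list only plays the role of 
whatever MetaTableAccessor.getMergeRegions would return for the scanned meta 
row. This is an illustration under those assumptions, not the actual MetaCache 
code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

class MergeParentEvictionSketch {
    // Hypothetical stand-in for RegionInfo; an empty endKey means "end of table".
    static final class Region {
        final String name, startKey, endKey;
        Region(String name, String startKey, String endKey) {
            this.name = name; this.startKey = startKey; this.endKey = endKey;
        }
    }

    // Cache a freshly scanned location, first evicting any cached entries that
    // match the merge parents reported alongside it in the meta row.
    static void cacheLocation(TreeMap<String, Region> cache, Region scanned,
                              List<Region> mergeParents) {
        for (Region parent : mergeParents) {
            Region cached = cache.get(parent.startKey);
            if (cached != null && cached.name.equals(parent.name)) {
                cache.remove(parent.startKey);
            }
        }
        cache.put(scanned.startKey, scanned);
    }

    // Cache state from the issue after region C was dropped on NotServingRegionException.
    static TreeMap<String, Region> staleCache() {
        TreeMap<String, Region> cache = new TreeMap<>();
        cache.put("A", new Region("old-A", "A", "B"));
        cache.put("B", new Region("old-B", "B", "C"));
        return cache;
    }

    public static void main(String[] args) {
        Region merged = new Region("merged-A", "A", "");
        List<Region> parents = Arrays.asList(
            new Region("old-A", "A", "B"),
            new Region("old-B", "B", "C"),
            new Region("old-C", "C", ""));

        // Merge qualifiers still present in meta: the stale B entry is evicted.
        TreeMap<String, Region> withQualifiers = staleCache();
        cacheLocation(withQualifiers, merged, parents);
        System.out.println(withQualifiers.keySet());

        // Qualifiers already cleaned by the CatalogJanitor: stale B survives.
        TreeMap<String, Region> afterJanitor = staleCache();
        cacheLocation(afterJanitor, merged, Collections.<Region>emptyList());
        System.out.println(afterJanitor.keySet());
    }
}
```

The second case is the gap described above: once the qualifiers are gone, the 
scan carries no hint about which cached entries are stale.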

> Merging empty regions corrupts meta cache
> -----------------------------------------
>
>                 Key: HBASE-27650
>                 URL: https://issues.apache.org/jira/browse/HBASE-27650
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bryan Beaudreault
>            Priority: Major
>
> Let's say you have three regions with start keys A, B, C and all are cached 
> in the meta cache. Region B is empty and not getting any requests, and all 3 
> regions are merged together. The new merged region has start key A.
> A user submits a request for row C1, which would previously have gone to 
> region C. That region no longer exists, but the MetaCache still returns the 
> cached region C, and the request goes out to the server, which throws 
> NotServingRegionException. Region C is then removed from the cache, and meta 
> is scanned. The meta scan returns the newly merged region A, which is cached 
> into the MetaCache.
> So now we have a MetaCache where A has been updated with the newly merged 
> RegionInfo, B still exists with the old/deleted RegionInfo, and C has been 
> removed.
> A user submits a request for row C1 again. This _should_ go to region A, but 
> we do cache.floorEntry(C1) which returns the old but still cached region B. 
> We have checks in MetaCache which validate the RegionInfo.getEndKey() against 
> the requested row, and that validation fails because C1 is beyond the endkey 
> of the old region. The cached region B result is ignored and cache returns 
> null. Meta is scanned, and returns the new region A, which is cached again.
> Requests to rows C1+ will still succeed... but they will always require a 
> meta scan because the meta cache will always return that old region B which 
> is invalid and doesn't contain the C1+ rows.
> Currently, the only way this ever resolves is if a request is sent to a row 
> in old region B: that triggers a NotServingRegionException, which finally 
> clears region B from the cache. At that point, requests for C1+ will properly 
> resolve to region A in the cache.
> I've created a reproducible test case here: 
> [https://gist.github.com/bbeaudreault/c82ff9f8ad0b9424eb987483ede35c12]
> This problem affects both AsyncTable and branch-2's Table.
>  
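
The sequence in the description can be modelled with a plain TreeMap to see 
why every C1 lookup ends in a meta scan. This is a simplified simulation, not 
the HBase client: Region, locate, and the "live" field are hypothetical names, 
and a NotServingRegionException is modelled as simply evicting the cached 
entry that was hit.

```java
import java.util.Map;
import java.util.TreeMap;

class StaleMetaCacheSketch {
    // Hypothetical stand-in for RegionInfo; an empty endKey means "end of table".
    static final class Region {
        final String name, startKey, endKey;
        Region(String name, String startKey, String endKey) {
            this.name = name; this.startKey = startKey; this.endKey = endKey;
        }
        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    final TreeMap<String, Region> cache = new TreeMap<>();
    Region live;        // the only region that exists in meta after the merge
    int metaScans = 0;

    Region locate(String row) {
        Map.Entry<String, Region> e = cache.floorEntry(row);
        if (e != null && e.getValue().contains(row)) {
            Region cached = e.getValue();
            if (cached == live) {
                return cached;                 // served from cache
            }
            cache.remove(cached.startKey);     // simulated NotServingRegionException
        }
        // Cache miss, end-key validation failure, or NSRE: fall back to meta.
        metaScans++;
        cache.put(live.startKey, live);
        return live;
    }

    // Cache as it was before the merge, with meta already holding the merged region.
    static StaleMetaCacheSketch afterMerge() {
        StaleMetaCacheSketch c = new StaleMetaCacheSketch();
        c.cache.put("A", new Region("old-A", "A", "B"));
        c.cache.put("B", new Region("old-B", "B", "C"));
        c.cache.put("C", new Region("old-C", "C", ""));
        c.live = new Region("merged-A", "A", "");
        return c;
    }

    public static void main(String[] args) {
        StaleMetaCacheSketch stuck = afterMerge();
        for (int i = 0; i < 5; i++) stuck.locate("C1");
        System.out.println("scans without touching B: " + stuck.metaScans);

        StaleMetaCacheSketch healed = afterMerge();
        healed.locate("C1");   // evicts old C, scans meta
        healed.locate("B1");   // evicts old B, scans meta
        for (int i = 0; i < 3; i++) healed.locate("C1");
        System.out.println("scans after touching B: " + healed.metaScans);
    }
}
```

In the first run every request for C1 falls through to meta, because 
floorEntry keeps landing on the old region B. In the second run, one request 
for a row in old region B evicts it, and later C1 lookups are served from the 
cached merged region.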



