[ https://issues.apache.org/jira/browse/HBASE-27650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694262#comment-17694262 ]

Duo Zhang commented on HBASE-27650:
-----------------------------------

I think branch-2.4 could also be fixed? As this is a bug...

> Merging empty regions corrupts meta cache
> -----------------------------------------
>
>                 Key: HBASE-27650
>                 URL: https://issues.apache.org/jira/browse/HBASE-27650
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>              Labels: patch-available
>             Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.4
>
>
> Let's say you have three regions with start keys A, B, C and all are cached
> in the meta cache. Region B is empty and not getting any requests, and all 3
> regions are merged together. The new merged region has start key A.
> A user submits a request for row C1, which would previously have gone to
> region C. That region no longer exists, so the MetaCache returns region C,
> the request goes out to the server which throws NotServingRegionException.
> That region C is now removed from the cache, and meta is scanned. The meta
> scan returns the newly merged region A, which is cached into the MetaCache.
> So now we have a MetaCache where A has been updated with the newly merged
> RegionInfo, B still exists with the old/deleted RegionInfo, and C has been
> removed.
> A user submits a request for row C1 again. This _should_ go to region A, but
> we do cache.floorEntry(C1) which returns the old but still cached region B.
> We have checks in MetaCache which validate the RegionInfo.getEndKey() against
> the requested row, and that validation fails because C1 is beyond the endkey
> of the old region. The cached region B result is ignored and cache returns
> null. Meta is scanned, and returns the new region A, which is cached again.
> Requests to rows C1+ will still succeed... but they will always require a
> meta scan because the meta cache will always return that old region B which
> is invalid and doesn't contain the C1+ rows.
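
For anyone following along, the lookup sequence described so far can be modelled with a plain TreeMap. This is only a rough sketch and not the actual MetaCache code: the class MetaCacheSketch, the Region type, and the locate/locateInCache helpers are hypothetical names, and String keys stand in for the byte[] row keys the real client uses. The only behaviour it relies on is that floorEntry returns the entry with the greatest key less than or equal to the requested row.

```java
import java.util.Map;
import java.util.TreeMap;

public class MetaCacheSketch {
    /** Hypothetical stand-in for RegionInfo: the key range [startKey, endKey). */
    public static final class Region {
        public final String name;
        public final String startKey;
        public final String endKey; // empty string means "no upper bound"

        public Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }

        public boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    // Cache keyed by region start key (assumption: mirrors the floorEntry lookup in the report).
    public final TreeMap<String, Region> cache = new TreeMap<>();
    public int metaScans = 0;

    /** Returns the cached region for the row, or null if the floor entry fails the end-key check. */
    public Region locateInCache(String row) {
        Map.Entry<String, Region> entry = cache.floorEntry(row);
        if (entry == null) {
            return null;
        }
        return entry.getValue().contains(row) ? entry.getValue() : null;
    }

    /** Cache lookup with a simulated meta scan on a miss; fromMeta is what the scan would return. */
    public Region locate(String row, Region fromMeta) {
        Region cached = locateInCache(row);
        if (cached != null) {
            return cached;
        }
        metaScans++;
        cache.put(fromMeta.startKey, fromMeta);
        return fromMeta;
    }

    public static void main(String[] args) {
        MetaCacheSketch sketch = new MetaCacheSketch();
        // State after the merge and after old region C was evicted by NotServingRegionException:
        // the stale entry for B is still cached.
        sketch.cache.put("A", new Region("old-A", "A", "B"));
        sketch.cache.put("B", new Region("old-B", "B", "C"));
        Region merged = new Region("merged-A", "A", "");

        for (int i = 0; i < 3; i++) {
            sketch.locate("C1", merged); // floorEntry("C1") keeps landing on stale B
        }
        System.out.println("meta scans with stale B cached: " + sketch.metaScans);

        sketch.cache.remove("B"); // what an NSRE on a request to B would eventually do
        Region hit = sketch.locate("C1", merged);
        System.out.println("after evicting B: " + hit.name + ", meta scans: " + sketch.metaScans);
    }
}
```

In this model every lookup for C1 falls through to a scan while the stale B entry sits between A and C1 in the map, and the scan count stops growing once B is removed. A real fix would presumably need to drop cached entries that overlap a newly cached region, but that is a guess at the approach, not a description of the patch.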
> Currently, the only way this will ever resolve is if a request is sent to
> region B, which will cause a NotServingRegionException which will finally
> clear region B from the cache. At that point, requests for C1+ will properly
> get resolved to region A in the cache.
> I've created a reproducible test case here:
> [https://gist.github.com/bbeaudreault/c82ff9f8ad0b9424eb987483ede35c12]
> This problem affects both AsyncTable and branch-2's Table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)