Thanks for the analysis. I left some comments on HBASE-8285.

On Sat, Apr 6, 2013 at 1:36 PM, Varun Sharma <va...@pinterest.com> wrote:

> Hi,
>
> We have been observing this bug for a while when we use the HTable.get()
> operation to do a single Get call using the "Result get(Get get)" API, and
> I thought it's best to bring it up.
>
> Steps to reproduce this bug:
> 1) Gracefully restart a region server, causing regions to get redistributed.
> 2) Client calls to such a region keep failing, since the META cache is never
> purged on the client for the region that moved.
>
> Reason behind the bug:
> 1) The client continues to hit the old region server.
> 2) The old region server throws NotServingRegionException, which is not
> handled correctly, and the META cache entries are never purged for that
> server, causing the client to keep hitting the old server.
>
> The reason lies in the ServerCallable code, since we only purge META cache
> entries when there is a RetriesExhaustedException, SocketTimeoutException
> or ConnectException. However, there is no check for
> NotServingRegionException(s).
>
> Why is this not a problem for Scan(s) and Put(s)?
>
> a) If a region server is not hosting a region/scanner, then an
> UnknownScannerException is thrown, which causes a relocateRegion() call,
> which in turn refreshes the META cache for that particular region.
> b) For Put(s), the processBatchCallback() interface in HConnectionManager
> is used, which clears out META cache entries for all kinds of exceptions
> except DoNotRetryException.
>
> Created HBASE-8285 for this.
>
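
For anyone following the thread, here is a minimal, self-contained sketch of the failure mode described above. It does not use the HBase client classes at all: MetaCacheSketch, NotServingRegion, locate(), callServer() and get() are hypothetical stand-ins, and the retry loop is only a rough model of what ServerCallable does, not its actual code. It just shows why a stale cached location keeps being retried unless the entry is dropped when the server reports it no longer hosts the region.

```java
import java.util.HashMap;
import java.util.Map;

public class MetaCacheSketch {
    /** Hypothetical stand-in for NotServingRegionException; not the real HBase class. */
    static class NotServingRegion extends RuntimeException {}

    /** Authoritative region -> server assignment, playing the role of META. */
    static final Map<String, String> meta = new HashMap<>();

    /** Client-side cache of region locations. */
    static final Map<String, String> cache = new HashMap<>();

    /** Returns the cached location, consulting "META" only on a cache miss. */
    static String locate(String region) {
        return cache.computeIfAbsent(region, meta::get);
    }

    /** Simulated RPC: fails if the contacted server no longer hosts the region. */
    static String callServer(String server, String region) {
        if (!server.equals(meta.get(region))) {
            throw new NotServingRegion();
        }
        return "row-from-" + server;
    }

    /**
     * Simplified retry loop for a single get. When purgeOnNsre is false the
     * stale cache entry survives every retry, which models the reported bug.
     * Returns null once the retries are used up.
     */
    static String get(String region, int retries, boolean purgeOnNsre) {
        for (int attempt = 0; attempt < retries; attempt++) {
            String server = locate(region);
            try {
                return callServer(server, region);
            } catch (NotServingRegion e) {
                if (purgeOnNsre) {
                    cache.remove(region); // next attempt re-reads "META"
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        meta.put("r1", "rs1");
        get("r1", 3, false);                      // warms the client cache
        meta.put("r1", "rs2");                    // region moves after a restart
        System.out.println(get("r1", 3, false));  // stale entry is never dropped
        System.out.println(get("r1", 3, true));   // entry dropped, lookup refreshed
    }
}
```

In this model the only difference between the failing and the working call is the cache.remove() in the catch block, which corresponds to the missing NotServingRegionException case described in the quoted analysis.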