krconv opened a new pull request, #7629:
URL: https://github.com/apache/hbase/pull/7629

   The RegionReplicaSinkWriter.append() method reads the table descriptor to 
determine whether a table has region replication enabled (to decide whether to 
bypass the location cache). When the table is dropped concurrently, 
tableDescriptors.get(tableName) returns null, and the subsequent call to 
getRegionReplication() on that null reference throws a NullPointerException.
   
   This race condition can occur in the following scenario:
   1. WAL entries for a table are queued for replication to region replicas
   2. The table is dropped (via disable + drop or other means)
   3. Before the dropped table is added to the disabledAndDroppedTables cache 
(which happens when TableNotFoundException is caught during location lookup), 
the code attempts to read the table descriptor
   4. tableDescriptors.get() returns null for the now-deleted table
   5. The resulting NullPointerException crashes the replication endpoint
   
   Since RegionReplicaReplicationEndpoint handles replica updates for all 
tables on a RegionServer, a single dropped table crashes the entire endpoint. 
This stops replica updates for all regions (including those from unrelated 
tables) hosted by that RegionServer until it is restarted.
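
   The failure mode described above can be reproduced in isolation. The 
following is a minimal sketch, not the code changed by this pull request: 
Descriptor, DESCRIPTORS, and both hasReplicas* methods are hypothetical 
stand-ins, and the only behavior carried over from the description is that the 
descriptor lookup returns null for a dropped table. The guarded variant shows 
one possible way to treat a missing descriptor as "no region replicas" instead 
of dereferencing it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class DroppedTableGuard {
    /** Hypothetical stand-in for a table descriptor; only the replica count matters here. */
    static final class Descriptor {
        private final int regionReplication;

        Descriptor(int regionReplication) {
            this.regionReplication = regionReplication;
        }

        int getRegionReplication() {
            return regionReplication;
        }
    }

    /** Hypothetical descriptor store: get() returns null once a table has been dropped. */
    static final Map<String, Descriptor> DESCRIPTORS = new ConcurrentHashMap<>();

    /** Unguarded lookup: throws NullPointerException if the table was dropped concurrently. */
    static boolean hasReplicasUnguarded(String tableName) {
        return DESCRIPTORS.get(tableName).getRegionReplication() > 1;
    }

    /** Guarded lookup (assumed approach): a missing descriptor means no replicas to update. */
    static boolean hasReplicasGuarded(String tableName) {
        Descriptor descriptor = DESCRIPTORS.get(tableName);
        if (descriptor == null) {
            return false; // table no longer exists; nothing to replicate
        }
        return descriptor.getRegionReplication() > 1;
    }

    public static void main(String[] args) {
        DESCRIPTORS.put("t1", new Descriptor(3));
        System.out.println("before drop: " + hasReplicasGuarded("t1"));

        DESCRIPTORS.remove("t1"); // simulate a concurrent drop of the table
        System.out.println("after drop (guarded): " + hasReplicasGuarded("t1"));

        try {
            hasReplicasUnguarded("t1");
        } catch (NullPointerException e) {
            System.out.println("after drop (unguarded): NullPointerException");
        }
    }
}
```

   With a guard of this kind, entries for a dropped table would be skipped 
rather than failing the endpoint that also serves unrelated tables.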
