krconv opened a new pull request, #7629:
URL: https://github.com/apache/hbase/pull/7629
The RegionReplicaSinkWriter.append() method checks table descriptors to determine whether a table has region replication enabled, which decides whether to bypass the location cache. When a table is dropped concurrently, tableDescriptors.get(tableName) returns null, and the subsequent call to getRegionReplication() throws a NullPointerException.

This race condition can occur in the following scenario:

1. WAL entries for a table are queued for replication to region replicas.
2. The table is dropped (via disable + drop or other means).
3. Before the dropped table is added to the disabledAndDroppedTables cache (which happens when a TableNotFoundException is caught during the location lookup), the code attempts to read the table descriptor.
4. tableDescriptors.get() returns null for the now-deleted table.
5. The resulting NPE crashes the replication endpoint.

Since RegionReplicaReplicationEndpoint handles replica updates for all tables on a RegionServer, a single dropped table crashes the entire endpoint. This stops replica updates for all regions hosted by that RegionServer, including regions of unrelated tables, until the RegionServer is restarted.
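A minimal, self-contained sketch of the kind of null guard this description implies: read the descriptor once, and if it is gone, record the table as dropped and skip it instead of calling getRegionReplication(). This is an illustration under stated assumptions, not necessarily the change made in #7629. SketchTableDescriptor, DescriptorCheck, ReplicaSinkSketch, createTable, dropTable, checkDescriptor and isKnownDroppedOrDisabled are hypothetical names, not HBase APIs; a plain map stands in for tableDescriptors and a concurrent set stands in for the disabledAndDroppedTables cache.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for HBase's TableDescriptor; only the field the check reads.
final class SketchTableDescriptor {
  private final int regionReplication;

  SketchTableDescriptor(int regionReplication) {
    this.regionReplication = regionReplication;
  }

  int getRegionReplication() {
    return regionReplication;
  }
}

// Hypothetical result of the descriptor check performed before the location lookup.
enum DescriptorCheck { TABLE_DROPPED, REPLICATION_ENABLED, REPLICATION_DISABLED }

final class ReplicaSinkSketch {
  // Assumed stand-in for tableDescriptors: get() returns null once a table is dropped.
  private final Map<String, SketchTableDescriptor> tableDescriptors = new ConcurrentHashMap<>();
  // Assumed stand-in for the disabledAndDroppedTables cache described in the PR.
  private final Set<String> disabledAndDroppedTables = ConcurrentHashMap.newKeySet();

  void createTable(String tableName, int regionReplication) {
    tableDescriptors.put(tableName, new SketchTableDescriptor(regionReplication));
  }

  void dropTable(String tableName) {
    tableDescriptors.remove(tableName);
  }

  boolean isKnownDroppedOrDisabled(String tableName) {
    return disabledAndDroppedTables.contains(tableName);
  }

  DescriptorCheck checkDescriptor(String tableName) {
    // Read the descriptor once into a local so a concurrent drop between
    // two reads cannot turn a passed null check into a null dereference.
    SketchTableDescriptor td = tableDescriptors.get(tableName);
    if (td == null) {
      // Table was dropped after its WAL entries were queued: remember it so later
      // batches can be skipped early, and return instead of dereferencing null.
      disabledAndDroppedTables.add(tableName);
      return DescriptorCheck.TABLE_DROPPED;
    }
    // Region replication is enabled when the table has more than one replica.
    return td.getRegionReplication() > 1
        ? DescriptorCheck.REPLICATION_ENABLED
        : DescriptorCheck.REPLICATION_DISABLED;
  }
}
```

Skipping and recording the dropped table, rather than letting an exception escape, keeps one dropped table from halting replica updates for the unrelated tables served by the same endpoint.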
