Dong0829 created HBASE-27614:
--------------------------------

             Summary: Region Reopen failure when the openNum has issue
                 Key: HBASE-27614
                 URL: https://issues.apache.org/jira/browse/HBASE-27614
             Project: HBase
          Issue Type: Bug
            Reporter: Dong0829
            Assignee: Dong0829

We hit this issue after changing the TTL of an HBase table: many regions kept reopening and a large number of TRSPs (TransitRegionStateProcedure) were created. After troubleshooting, we found a problem in the region reopen procedure logic.

During a reopen, the master checks the seqNum to confirm whether the region was reopened successfully. If the recorded seqNum accidentally becomes larger than the one derived from the current HFiles and WAL (for example because of data loss), the check never passes and the region goes through an unnecessary close/open loop.

We should be able to improve the logic. More details:

In regionOpenedWithoutPersistingToMeta, should we really update the OpenSeqNum only when the new one is bigger than the old one? Since the region has already opened, we should update the OpenSeqNum whether it is bigger or smaller. Otherwise, we should not just log a WARN but fail the open, right?

[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/OpenRegionProcedure.java#L81]

This matters because in checkReopened ([https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java#L312]), if the seqNum is smaller, the region is returned and keeps being reopened.

So we should update the logic in either regionOpenedWithoutPersistingToMeta or checkReopened to make sure the region reopen works properly when the seqNum is inconsistent.

Reproduce steps:

1. Create a test table named test and put some data, for example:

    create 'test', 'info'
    put 'test', 'fool', 'info:cat', 'test'

2. Manually update one region row of this test table in hbase:meta on the info:seqnumDuringOpen column, for example:

    put 'hbase:meta', 'test,,1673406566311.3eb4d3e0258bd06f4639a595920c7673.', 'info:seqnumDuringOpen', "\x00\x00\x00\x00\x00\x10\x00\x05"

3. Modify the table TTL:

    alter 'test', {NAME => 'info', TTL => '63244800'}

You will see the region keep reopening.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
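
The loop described in this report can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions, not HBase source code: the class ReopenLoopSketch, its methods, and the rule that each open reports a sequence number one larger than the previous one are all hypothetical simplifications. It models regionOpenedWithoutPersistingToMeta as "ignore a smaller openSeqNum" and checkReopened as "reopened only if the openSeqNum grew", and it decodes the 8-byte value written to info:seqnumDuringOpen in step 2.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified model of the reopen check. Not HBase source code.
class ReopenLoopSketch {

  // Decodes an 8-byte big-endian long, the encoding used for the value
  // written to info:seqnumDuringOpen in the reproduce steps.
  static long decodeSeqNum(byte[] raw) {
    return ByteBuffer.wrap(raw).getLong();
  }

  // Returns the number of reopen rounds until the region counts as reopened,
  // or -1 if it still does not after maxRounds.
  static int simulate(long seqNumInMeta, long realSeqNum,
                      boolean alwaysUpdate, int maxRounds) {
    long nodeOpenSeqNum = seqNumInMeta; // what the master has recorded
    long reportedSeqNum = realSeqNum;   // what HFiles and WAL actually yield
    for (int round = 1; round <= maxRounds; round++) {
      long before = nodeOpenSeqNum;
      // Assumption: every open reports a slightly larger sequence number.
      reportedSeqNum++;
      // Models regionOpenedWithoutPersistingToMeta: a smaller value is
      // ignored (only a warning) unless alwaysUpdate is set.
      if (alwaysUpdate || reportedSeqNum >= nodeOpenSeqNum) {
        nodeOpenSeqNum = reportedSeqNum;
      }
      // Models checkReopened: the region is done only if the number grew.
      if (nodeOpenSeqNum > before) {
        return round;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    long inflated = decodeSeqNum(new byte[] {0, 0, 0, 0, 0, 0x10, 0, 5});
    System.out.println("seqnumDuringOpen = " + inflated);
    System.out.println("ignore smaller:  " + simulate(inflated, 10L, false, 50));
    System.out.println("always update:   " + simulate(inflated, 10L, true, 50));
  }
}
```

In this model the manually written value decodes to 1048581. With the "ignore smaller" behaviour the region never counts as reopened within the round limit, while always updating the openSeqNum lets the check pass on the second round, since the first round resets the recorded number and the second one increases it.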