Dong0829 created HBASE-27614:
--------------------------------

             Summary: Region Reopen failure when the openNum has issue
                 Key: HBASE-27614
                 URL: https://issues.apache.org/jira/browse/HBASE-27614
             Project: HBase
          Issue Type: Bug
            Reporter: Dong0829
            Assignee: Dong0829


We hit this issue when changing the TTL of an HBase table: a lot of regions 
kept reopening and tons of TRSPs (TransitRegionStateProcedure) were created. 
After troubleshooting, we found an issue in the region reopen procedure logic.

During the reopen process, the master checks the region's seqNum to confirm 
whether the region was reopened successfully. If the recorded seqNum 
accidentally becomes bigger than what the current HFiles and WAL contain (for 
example, because of data loss), this check keeps failing and the region goes 
into an unnecessary close/open loop.
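A minimal Java sketch of that check follows. It rests on two assumptions: the seqNum reported on open is modeled as the highest sequence id in the HFiles and WAL plus one, and the reopen counts as confirmed only when that value is strictly bigger than the seqNum recorded before the reopen. The class and method names are hypothetical and do not come from HBase.

```java
/**
 * Toy model of the reopen confirmation described in this issue.
 * Class and method names are hypothetical, not HBase code.
 */
public class ReopenCheckSketch {

    /** Assumption: a fresh open reports the highest persisted sequence id plus one. */
    public static long openSeqNumFromData(long maxSeqIdInHFilesAndWal) {
        return maxSeqIdInHFilesAndWal + 1;
    }

    /** The reopen counts as confirmed only if the new seqNum is strictly bigger. */
    public static boolean isReopenConfirmed(long recordedSeqNum, long newOpenSeqNum) {
        return newOpenSeqNum > recordedSeqNum;
    }

    public static void main(String[] args) {
        long maxSeqId = 20L;                // what the HFiles and WAL actually contain
        long healthyRecorded = 12L;         // normal case: recorded value is behind the data
        long inflatedRecorded = 1_048_581L; // accidentally inflated value (e.g. after data loss)

        long reported = openSeqNumFromData(maxSeqId);
        System.out.println(isReopenConfirmed(healthyRecorded, reported));  // true
        System.out.println(isReopenConfirmed(inflatedRecorded, reported)); // false
    }
}
```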

 

We should be able to improve this logic. More details:

In regionOpenedWithoutPersistingToMeta, should we really update the 
OpenSeqNum only when the new value is bigger than the old one?

The region has already been opened, so we should update the OpenSeqNum 
whether the new value is bigger or smaller. Otherwise, instead of just logging 
a WARN, we should fail the open, right?
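The options discussed above can be compared in a small Java sketch. `RegionNodeModel` and its three methods are hypothetical stand-ins for the region state node and regionOpenedWithoutPersistingToMeta. The "warn and keep the old value" policy is modeled on the behavior described in this issue and is not copied from OpenRegionProcedure.

```java
/**
 * Toy comparison of update policies for a region's openSeqNum.
 * RegionNodeModel and all methods are hypothetical, not HBase code.
 */
public class OpenSeqNumPolicySketch {

    /** Minimal stand-in for the in-memory region state. */
    public static final class RegionNodeModel {
        public long openSeqNum;

        public RegionNodeModel(long openSeqNum) {
            this.openSeqNum = openSeqNum;
        }
    }

    /** Behavior described in the issue: ignore a smaller value and only warn. */
    public static boolean updateIfNotSmaller(RegionNodeModel node, long reported) {
        if (reported < node.openSeqNum) {
            System.err.println("WARN: reported openSeqNum " + reported
                    + " is less than current " + node.openSeqNum + ", ignoring");
            return false;
        }
        node.openSeqNum = reported;
        return true;
    }

    /** Option 1: the region did open, so always record the reported value. */
    public static void updateAlways(RegionNodeModel node, long reported) {
        node.openSeqNum = reported;
    }

    /** Option 2: treat a smaller value as a failed open instead of only warning. */
    public static void updateOrFail(RegionNodeModel node, long reported) {
        if (reported < node.openSeqNum) {
            throw new IllegalStateException("reported openSeqNum " + reported
                    + " is less than current " + node.openSeqNum);
        }
        node.openSeqNum = reported;
    }

    public static void main(String[] args) {
        RegionNodeModel a = new RegionNodeModel(1_048_581L);
        updateIfNotSmaller(a, 21L);
        System.out.println(a.openSeqNum); // 1048581: the inflated value survives

        RegionNodeModel b = new RegionNodeModel(1_048_581L);
        updateAlways(b, 21L);
        System.out.println(b.openSeqNum); // 21
    }
}
```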

[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/OpenRegionProcedure.java#L81]

 

This matters because of 
checkReopened ([https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java#L312]): 
if the seqNum is smaller, the region is returned as not yet reopened and keeps 
being reopened. So we should update the logic in either 
regionOpenedWithoutPersistingToMeta or checkReopened to make sure region 
reopen works properly even when the seqNum is wrong.
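The interaction between the two methods can be illustrated with a toy Java simulation. It is hypothetical and is not HBase code. It assumes that each reopen advances the real seqNum by exactly one. It also assumes that each confirmation round compares the node's openSeqNum after the reopen with the value the node held at the start of that round.

```java
import java.util.function.LongBinaryOperator;

/** Toy simulation of repeated reopen/confirm rounds; hypothetical, not HBase code. */
public class ReopenLoopSketch {

    /**
     * Counts close/open rounds until "openSeqNum after reopen > value before reopen" holds.
     * updatePolicy(current, reported) returns the node's openSeqNum after a report.
     * Returns -1 if the reopen is not confirmed within maxRounds.
     */
    public static int roundsUntilConfirmed(long storedSeqNum, long realSeqNum,
                                           LongBinaryOperator updatePolicy, int maxRounds) {
        long nodeSeqNum = storedSeqNum;
        long real = realSeqNum;
        for (int round = 1; round <= maxRounds; round++) {
            long baseline = nodeSeqNum;   // value captured before this reopen
            real = real + 1;              // assumption: each open advances the real seqNum by 1
            nodeSeqNum = updatePolicy.applyAsLong(nodeSeqNum, real);
            if (nodeSeqNum > baseline) {  // checkReopened-style comparison
                return round;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LongBinaryOperator keepLarger = Math::max;                       // ignore smaller reports
        LongBinaryOperator alwaysSet = (current, reported) -> reported;  // always accept the report

        System.out.println(roundsUntilConfirmed(1_048_581L, 20L, keepLarger, 1_000)); // -1
        System.out.println(roundsUntilConfirmed(1_048_581L, 20L, alwaysSet, 1_000));  // 2
        System.out.println(roundsUntilConfirmed(12L, 20L, keepLarger, 1_000));        // 1
    }
}
```

In this model, `keepLarger` only confirms once the real seqNum has climbed past the inflated value, which takes over a million rounds here. `alwaysSet` confirms on the second round.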

 

 

Steps to reproduce:

1. Create a test table named {{test}} and put some data, for example:
{{create 'test', 'info'}}
{{put 'test', 'fool', 'info:cat', 'test'}}

2. Manually update the {{info:seqnumDuringOpen}} column of one region's row for 
this test table in hbase:meta so that it holds a seqNum bigger than the 
region's real one, for example:

{{put 'hbase:meta', 'test,,1673406566311.3eb4d3e0258bd06f4639a595920c7673.', 'info:seqnumDuringOpen', "\x00\x00\x00\x00\x00\x10\x00\x05"}}
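The value written in step 2 is eight raw bytes. This sketch assumes the column holds a big-endian Java long, which is the layout HBase's `Bytes.toBytes(long)` produces. It decodes the bytes with `java.nio.ByteBuffer`, which is big-endian by default, to show which seqNum the reproduction plants in hbase:meta. `SeqNumBytesSketch` is a hypothetical helper.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: decodes the planted seqnumDuringOpen bytes. */
public class SeqNumBytesSketch {

    /** Decodes an 8-byte big-endian value, the layout assumed for info:seqnumDuringOpen. */
    public static long decode(byte[] bytes) {
        if (bytes.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + bytes.length);
        }
        return ByteBuffer.wrap(bytes).getLong(); // ByteBuffer is big-endian by default
    }

    public static void main(String[] args) {
        // Same bytes as "\x00\x00\x00\x00\x00\x10\x00\x05" in the shell command of step 2.
        byte[] planted = {0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x05};
        System.out.println(decode(planted)); // 1048581 (0x100005)
    }
}
```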

3. Modify the table TTL:

{{alter 'test', \{NAME => 'info', TTL => '63244800'}}}
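HBase column family TTLs are given in seconds. The value in step 3 therefore corresponds to 732 days, which is about two years. Here is a quick Java check of that conversion using `java.time.Duration`. `TtlSketch` is a hypothetical helper.

```java
import java.time.Duration;

/** Hypothetical helper: converts a column family TTL in seconds to whole days. */
public class TtlSketch {

    public static long ttlDays(long ttlSeconds) {
        return Duration.ofSeconds(ttlSeconds).toDays();
    }

    public static void main(String[] args) {
        System.out.println(ttlDays(63_244_800L)); // 732
    }
}
```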

You will see the region keep reopening.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
