[ https://issues.apache.org/jira/browse/HBASE-28271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani updated HBASE-28271:
---------------------------------
    Fix Version/s: 2.6.0
                   2.4.18
                   2.5.8
                   3.0.0-beta-2
           Status: Patch Available  (was: In Progress)

> Infinite waiting on lock acquisition by snapshot can result in unresponsive master
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-28271
>                 URL: https://issues.apache.org/jira/browse/HBASE-28271
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.5.7, 2.4.17, 3.0.0-alpha-4
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>             Fix For: 2.6.0, 2.4.18, 2.5.8, 3.0.0-beta-2
>
>         Attachments: image.png
>
>
> When a region is stuck in transition for a significant time, any attempt to
> take a snapshot of the table keeps a master handler thread waiting
> indefinitely. As part of creating a snapshot on an enabled or disabled table,
> LockProcedure is executed to obtain the table-level lock. If any region of
> the table is in transition, the snapshot handler cannot execute
> LockProcedure, so it waits until the region transition completes and the
> table-level lock can be acquired.
> In cases where a region stays in RIT (region in transition) for a
> considerable time, enough client attempts to create snapshots on the table
> can easily exhaust all handler threads, leaving the master potentially
> unresponsive. A sample thread dump is attached.
> Proposal: The snapshot handler should not stay stuck forever if it cannot
> take the table-level lock; it should fail fast.
> !image.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
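
The fail-fast behaviour proposed in the issue can be illustrated with a minimal, generic Java sketch. This is not the HBase patch and uses no HBase APIs: the class FailFastTableLock, the method withTableLock, the exception LockTimeoutException and the timeout values are all hypothetical names chosen for illustration, and a plain java.util.concurrent.Semaphore stands in for the table-level lock. The idea shown is only that a bounded wait lets the handler thread return with an error instead of blocking until the region leaves transition.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustrative sketch only: none of these names are HBase APIs.
final class FailFastTableLock {

    /** Hypothetical error raised when the lock is not obtained within the timeout. */
    static final class LockTimeoutException extends RuntimeException {
        LockTimeoutException(String message) {
            super(message);
        }
    }

    /**
     * Runs {@code work} while holding {@code tableLock}, waiting at most
     * {@code timeoutMs} milliseconds for the lock instead of waiting forever.
     */
    static <T> T withTableLock(Semaphore tableLock, long timeoutMs, Supplier<T> work) {
        boolean acquired;
        try {
            // Bounded wait: returns false once the timeout elapses.
            acquired = tableLock.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new LockTimeoutException("interrupted while waiting for table lock");
        }
        if (!acquired) {
            // Fail fast so the calling handler thread is freed for other requests.
            throw new LockTimeoutException(
                "table lock not acquired within " + timeoutMs + " ms");
        }
        try {
            return work.get();
        } finally {
            tableLock.release();
        }
    }

    public static void main(String[] args) {
        Semaphore tableLock = new Semaphore(1);
        tableLock.tryAcquire(); // stand-in for a region in transition holding the lock
        try {
            withTableLock(tableLock, 100, () -> "snapshot taken");
        } catch (LockTimeoutException e) {
            System.out.println("snapshot request failed fast: " + e.getMessage());
        }
        tableLock.release(); // stand-in for the region transition completing
        System.out.println(withTableLock(tableLock, 100, () -> "snapshot taken"));
    }
}
```

With an unbounded acquire in place of tryAcquire, the first call in main would never return, which mirrors the stuck handler threads described above. How the actual fix bounds the wait, and what timeout it uses, is not stated in this message.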