[jira] [Updated] (HBASE-28271) Infinite waiting on lock acquisition by snapshot can result in unresponsive master

Viraj Jasani (Jira) Wed, 03 Jan 2024 15:24:27 -0800


     [ 
https://issues.apache.org/jira/browse/HBASE-28271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Viraj Jasani updated HBASE-28271:
---------------------------------
    Fix Version/s: 2.6.0
                   2.4.18
                   2.5.8
                   3.0.0-beta-2
           Status: Patch Available  (was: In Progress)

> Infinite waiting on lock acquisition by snapshot can result in unresponsive 
> master
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-28271
>                 URL: https://issues.apache.org/jira/browse/HBASE-28271
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.5.7, 2.4.17, 3.0.0-alpha-4
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>             Fix For: 2.6.0, 2.4.18, 2.5.8, 3.0.0-beta-2
>
>         Attachments: image.png
>
>
> When a region is stuck in transition for a significant time, any attempt to
> take a snapshot of the table keeps a master handler thread waiting
> indefinitely. As part of creating a snapshot of an enabled or disabled table,
> a LockProcedure is executed to obtain the table-level lock. If any region of
> the table is in transition, however, that LockProcedure cannot proceed, and
> the snapshot handler waits with no upper bound until the region transition
> completes and the table-level lock can be acquired.
> If a region stays in RIT (region in transition) for a considerable time and
> the client makes enough snapshot attempts on the table, these requests can
> exhaust all master handler threads and potentially leave the master
> unresponsive. A sample thread dump is attached.
> Proposal: the snapshot handler should not stay stuck forever when it cannot
> take the table-level lock; it should fail fast instead.
> !image.png!
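The fail-fast proposal can be illustrated with a minimal Java sketch. It is hypothetical and is not HBase code: a single-permit java.util.concurrent.Semaphore stands in for the table-level lock that the LockProcedure would obtain, the caller holding that permit stands in for a region stuck in transition, and the names FailFastSnapshotSketch, snapshot, simulate and LockTimeoutException are made up for illustration. Each simulated snapshot request waits at most a bounded time for the lock and then fails, so the fixed-size handler pool is freed instead of being exhausted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not HBase code: bounded (fail-fast) lock wait for snapshot requests.
public class FailFastSnapshotSketch {

    // Hypothetical exception signalling that the table lock was not acquired in time.
    static class LockTimeoutException extends Exception {
        LockTimeoutException(String message) {
            super(message);
        }
    }

    // One permit stands in for the table-level lock (assumed stand-in for LockProcedure).
    private final Semaphore tableLock = new Semaphore(1);

    // Fail-fast variant: wait at most timeoutMs for the lock, then give up with an error.
    String snapshot(String table, long timeoutMs) throws InterruptedException, LockTimeoutException {
        if (!tableLock.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new LockTimeoutException(
                "Could not acquire table lock for " + table + " within " + timeoutMs + " ms");
        }
        try {
            return "snapshot of " + table + " taken";
        } finally {
            tableLock.release();
        }
    }

    // Submits `requests` snapshot attempts to a pool of `handlers` threads while the lock
    // is held for the whole run (simulating a region stuck in transition).
    // Returns {timedOut, succeeded, healthCheckOk}.
    static int[] simulate(int handlers, int requests, long timeoutMs) throws Exception {
        FailFastSnapshotSketch master = new FailFastSnapshotSketch();
        master.tableLock.acquire(); // the "stuck region" holds the lock
        ExecutorService handlerPool = Executors.newFixedThreadPool(handlers);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                results.add(handlerPool.submit(() -> master.snapshot("t1", timeoutMs)));
            }
            int timedOut = 0;
            int succeeded = 0;
            for (Future<String> f : results) {
                try {
                    f.get();
                    succeeded++;
                } catch (ExecutionException e) {
                    if (e.getCause() instanceof LockTimeoutException) {
                        timedOut++;
                    }
                }
            }
            // Every handler gave up, so the pool can still serve an unrelated request.
            Future<String> health = handlerPool.submit(() -> "ok");
            int healthOk = "ok".equals(health.get(1, TimeUnit.SECONDS)) ? 1 : 0;
            return new int[] {timedOut, succeeded, healthOk};
        } finally {
            handlerPool.shutdownNow();
            master.tableLock.release();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] r = simulate(2, 4, 100);
        System.out.println("timed out=" + r[0] + ", succeeded=" + r[1]
            + ", handler pool responsive=" + (r[2] == 1));
    }
}
```

With the lock held for the whole run, main should print "timed out=4, succeeded=0, handler pool responsive=true". Replacing tryAcquire with an unbounded acquire() in snapshot would instead block both handler threads indefinitely and the health check would never run, which is the failure mode described in the issue.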



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

