[ https://issues.apache.org/jira/browse/HBASE-21095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allan Yang updated HBASE-21095:
-------------------------------
    Attachment: HBASE-21095.branch-2.0.001.patch

> The timeout retry logic for several procedures are broken after master restarts
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-21095
>                 URL: https://issues.apache.org/jira/browse/HBASE-21095
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2, proc-v2
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Critical
>             Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.2
>
>         Attachments: HBASE-21095-branch-2.0.patch, HBASE-21095-v1.patch,
> HBASE-21095-v2.patch, HBASE-21095.branch-2.0.001.patch, HBASE-21095.patch
>
>
> For TRSP, and also RTP in branch-2.0 and branch-2.1, if we fail to assign or
> unassign a region, we will set the procedure to WAITING_TIMEOUT state, and
> rely on the ProcedureEvent in RegionStateNode to wake us up later. But after
> restarting, we do not suspend the ProcedureEvent in RSN, and also do not add
> the procedure to the ProcedureEvent's suspending queue, so we will hang there
> forever as no one will wake us up.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
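
The hang described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions: `WakeupSketch`, `ToyEvent`, `ToyProcedure`, `park`, `failAndPark` and `reloadAfterRestart` are hypothetical names invented for illustration and are not HBase's actual classes or methods. The model shows a procedure that is parked on an event before it waits, and a second procedure whose WAITING_TIMEOUT state is restored without being re-added to the event's queue, so a later wake-up never reaches it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: hypothetical stand-ins, not HBase's real ProcedureEvent,
// RegionStateNode, or procedure classes.
public class WakeupSketch {
    enum State { RUNNABLE, WAITING_TIMEOUT }

    static class ToyProcedure {
        State state = State.RUNNABLE;
    }

    static class ToyEvent {
        private boolean suspended = false;
        private final Deque<ToyProcedure> waiters = new ArrayDeque<>();

        void suspend() {
            suspended = true;
        }

        // Park the procedure on this event; only possible while suspended.
        boolean park(ToyProcedure proc) {
            if (!suspended) {
                return false;
            }
            waiters.add(proc);
            return true;
        }

        // Wake every parked procedure and return how many were woken.
        int wake() {
            suspended = false;
            int woken = 0;
            ToyProcedure proc;
            while ((proc = waiters.poll()) != null) {
                proc.state = State.RUNNABLE;
                woken++;
            }
            return woken;
        }
    }

    // Path before a restart: suspend the event, park the procedure, then wait.
    static ToyProcedure failAndPark(ToyEvent event) {
        ToyProcedure proc = new ToyProcedure();
        event.suspend();
        event.park(proc);
        proc.state = State.WAITING_TIMEOUT;
        return proc;
    }

    // Path after a restart, as the issue describes it: the state is restored,
    // but the event is neither suspended nor given the procedure to wake.
    static ToyProcedure reloadAfterRestart() {
        ToyProcedure proc = new ToyProcedure();
        proc.state = State.WAITING_TIMEOUT;
        return proc;
    }

    public static void main(String[] args) {
        ToyEvent before = new ToyEvent();
        ToyProcedure parked = failAndPark(before);
        before.wake();
        System.out.println("parked procedure: " + parked.state);

        ToyEvent after = new ToyEvent();
        ToyProcedure reloaded = reloadAfterRestart();
        after.wake();
        System.out.println("reloaded procedure: " + reloaded.state);
    }
}
```

In this model the reloaded procedure stays in WAITING_TIMEOUT because the fresh event has nothing in its queue to wake. A fix along the lines the issue implies would need to restore the event's suspended state and its waiter on reload; how the attached patches actually do this is not shown here.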