[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569515#comment-16569515 ] stack commented on HBASE-20846: --- bq. Duo Zhang, is there any problem for this fix before it can be back-ported branch-2.0? What [~allan163] says... Yeah, what test is failing. I can take a look. > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846-v6.patch, > HBASE-20846.branch-2.0.002.patch, HBASE-20846.branch-2.0.patch, > HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569422#comment-16569422 ] Hudson commented on HBASE-20846: Results for branch branch-2.0 [build #634 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/634/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/634//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/634//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/634//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846-v6.patch, > HBASE-20846.branch-2.0.002.patch, HBASE-20846.branch-2.0.patch, > HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561847#comment-16561847 ] Allan Yang commented on HBASE-20846: {quote} Seems the flaky tests report can reproduce the problem of getting MERGED state when disabling. Let me dig more. {quote} [~Apache9], is there any problem for this fix before it can be back-ported branch-2.0? > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846-v6.patch, > HBASE-20846.branch-2.0.002.patch, HBASE-20846.branch-2.0.patch, > HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558308#comment-16558308 ] Duo Zhang commented on HBASE-20846: --- Seems the flaky tests report can reproduce the problem of getting MERGED state when disabling. Let me dig more. > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846-v6.patch, > HBASE-20846.branch-2.0.002.patch, HBASE-20846.branch-2.0.patch, > HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556315#comment-16556315 ] Hudson commented on HBASE-20846: Results for branch branch-2.1 [build #103 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/103/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- Something went wrong running this stage, please [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/103//console]. (x) {color:red}-1 jdk8 hadoop2 checks{color} -- Something went wrong running this stage, please [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/103//console]. (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/103//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846-v6.patch, > HBASE-20846.branch-2.0.002.patch, HBASE-20846.branch-2.0.patch, > HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555963#comment-16555963 ] Hudson commented on HBASE-20846: Results for branch branch-2 [build #1025 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1025/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1025//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1025//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1025//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846-v6.patch, > HBASE-20846.branch-2.0.002.patch, HBASE-20846.branch-2.0.patch, > HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555616#comment-16555616 ] Hudson commented on HBASE-20846: Results for branch master [build #408 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/408/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/408//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/408//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- Something went wrong running this stage, please [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/master/408//console]. (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846-v6.patch, > HBASE-20846.branch-2.0.002.patch, HBASE-20846.branch-2.0.patch, > HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553175#comment-16553175 ] stack commented on HBASE-20846: --- Link to HBASE-20924... An issue to backport to branch-2.0. > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846-v6.patch, > HBASE-20846.branch-2.0.002.patch, HBASE-20846.branch-2.0.patch, > HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552366#comment-16552366 ] Duo Zhang commented on HBASE-20846: --- Let me commit to branch-2.1+. [~stack] For 2.0 I think we could open a backport issue for it? > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.0.2, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846-v6.patch, > HBASE-20846.branch-2.0.002.patch, HBASE-20846.branch-2.0.patch, > HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551736#comment-16551736 ] stack commented on HBASE-20846: --- We thinking this goes into 2.1 and 2.0? Seems risky for 2.0.x, even though it bug fix. Maybe make a backport JIRA so can try it in long-running test. The risk might be worth it given this fix clears out a whole range of possible bug types. > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.0.2, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846.branch-2.0.002.patch, > HBASE-20846.branch-2.0.patch, HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551688#comment-16551688 ] Duo Zhang commented on HBASE-20846: --- Ping [~stack] for reviewing. It is on the RB now. > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.0.2, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846.branch-2.0.002.patch, > HBASE-20846.branch-2.0.patch, HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550321#comment-16550321 ] Duo Zhang commented on HBASE-20846: --- {quote} Nice writeup. Include in lock section on top of the class? {quote} Sure. Any other concerns? Thanks. > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.0.2, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846.branch-2.0.002.patch, > HBASE-20846.branch-2.0.patch, HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549669#comment-16549669 ] Hadoop QA commented on HBASE-20846: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 8 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 52s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 58s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 39s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 38s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 5s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 3m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} The patch hbase-protocol-shaded passed checkstyle {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} hbase-procedure: The patch generated 0 new + 28 unchanged - 16 fixed = 28 total (was 44) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} hbase-server: The patch generated 0 new + 316 unchanged - 7 fixed = 316 total (was 323) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 49s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 5s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 33s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 41s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}132m 0s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 9s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} |
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549564#comment-16549564 ] stack commented on HBASE-20846: --- Nice writeup. Include in lock section on top of the class? > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.0.2, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846.branch-2.0.002.patch, > HBASE-20846.branch-2.0.patch, HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549364#comment-16549364 ] Duo Zhang commented on HBASE-20846: --- Review board link: https://reviews.apache.org/r/67973/ > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.0.2, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846-v4.patch, HBASE-20846-v5.patch, HBASE-20846.branch-2.0.002.patch, > HBASE-20846.branch-2.0.patch, HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549287#comment-16549287 ] Hadoop QA commented on HBASE-20846: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 8 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 43s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 58s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 40s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 30s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 38s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 3m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} The patch hbase-protocol-shaded passed checkstyle {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} hbase-procedure: The patch generated 0 new + 28 unchanged - 16 fixed = 28 total (was 44) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 16s{color} | {color:green} hbase-server: The patch generated 0 new + 316 unchanged - 7 fixed = 316 total (was 323) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 31s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 59s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 40s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}116m 41s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 1s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} |
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549096#comment-16549096 ] Duo Zhang commented on HBASE-20846: --- {quote} Most of patch content is swapping Procedure for Procedure. {quote} Just want to eliminate the warnings. Will upload the patch to RB when the pre commit result is OK, but the jenkins is in a bad state... The infra team said that they will move the jenkins server to a new machine at Saturday, will try later. The changes of this patch are: 1. Make hasLock method final, and add a locked field in Procedure to record whether we have the lock. We will set it to true in doAcquireLock and to false in doReleaseLock. The sub procedures do not need to manage it any more. 2. Also added a locked field in the proto message. When storing, the field will be set according to the return value of hasLock. And when loading, there is a new field in Procedure called lockedWhenLoading. We will set it to true if the locked field in proto message is true. 3. The reason why we can not set the locked field directly to true by calling doAcquireLock is that, during initialization, most procedures need to wait until master is initialized. So the solution here is that, we introduced a new method called waitInitialized in Procedure, and move the wait master initialized related code from acquireLock to this method. And we added a restoreLock method to Procedure, if lockedWhenLoading is true, we will call the acquireLock to get the lock, but do not set locked to true. And later when we call doAcquireLock and pass the waitInitialized check, we will test lockedWhenLoading, if it is true, when we just set the locked field to true and return, without actually calling the acquireLock method since we have already called it once. {quote} We release the lock even if we didn't have it? And maybe update store if though we didn't have lock. {quote} No, we will test hasLock before calling doReleaseLock. The only place where we do not test hasLock is in execProcedure, where we can make sure that we have the lock. And I think there are some space for optimization, for example, if holdLock is false when maybe we do not need to store the lock operation, but as I said above, later we want to make holdLock depend on the state, for example, for ModifyTableProcedure, all the states before the last state which we reopen all the regions, the holdLock should be true as we do not want other one to jump in the middle, but after the last state where we schedule a lot of sub procedures to reopen all the regions, we need to release the lock, so holdLock should be false. So I'm not sure if it is safe to use holdLock to determine whether we should store the log operations. May be OK, but need to think more. And also, for release lock, I think it could be merged with the updateStoreOnExec call, but the logic is a bit complicated as the updateStoreOnExec is in a loop, so in this patch I plan to implement it in a simple way. We can optimize it later. > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.0.2, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846.branch-2.0.002.patch, HBASE-20846.branch-2.0.patch, > HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548792#comment-16548792 ] stack commented on HBASE-20846: --- On the patch... Non-change in AbstractProcedureScheduler.java ? Most of patch content is swapping Procedure for Procedure. Trying to understand... now Procedure has a *locked* data member that can be used by subprocedures. Before subprocedures would supply their own boolean... sometimes it was named lock... other times locked. A subprocedure that wants a lock for the life of the procedure returns holdLock as per before? If a procedure wants to be locked for the procedure step, they do same as before too? They just call up to the super class to manage the setting an unsetting of locked? This lockedWhenLoading is not just for procedures that holdLock for life of the Procedure run it seems. We'll restore the lock even if we crashed mid-procedure step on load? That seems right/expected. 920 // persist that we have held the lock. This must be done before we actually execute the 921 // procedure, otherwise when restarting, we may consider the procedure does not have a lock, 922 // but it may have already done some changes as we have already executed it, and if another 923 // procedure gets the lock, then the semantic will be broken if the holdLock is true, as we do 924 // not expect that another procedure can be executed in the middle. 925 store.update(this); Now we do double-store to ProcedureWALStore? Once for the lock and then once for end-state of the Procedure step? Each time? Slows down pv2 throughput? Any chance of a reordering so we don't release lock until after we've persisted step and lock state? Is this right? 190* The {@link #doAcquireLock(Object, ProcedureStore)} will be split into two steps, first, it will 191* call us to determine whether we need to wait for initialization, second, it will call 192* {@link #doAcquireLock(Object, ProcedureStore)} to actually handle the lock for this procedure. Seems to say doAcquireLock calls itself. You remove @InterfaceAudience.Private on methods because class is @InterfaceAudience.Private? Could remove the Evolving from the class too... Evolving and Private don't make sense together. Nit. Nice translation: 94return subprocs.stream().mapToLong(Procedure::getProcId).toArray(); These will become annoying? LOG.debug("{} didn't hold the lock before restarting, skip acquiring lock.", this); You probably need them for the moment debugging though? Could leave them in and then strip them later when known working? We release the lock even if we didn't have it? And maybe update store if though we didn't have lock. 933 final void doReleaseLock(TEnvironment env, ProcedureStore store) { 934 locked = false; Will be back with more. Its taking time to digest. Thanks. This will change how we operate post-master crash but it should make stuff way better, easier to stand at least, and more predictable. > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.0.2, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846-v4.patch, > HBASE-20846.branch-2.0.002.patch, HBASE-20846.branch-2.0.patch, > HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548787#comment-16548787 ] Hadoop QA commented on HBASE-20846: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 8 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 59s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 58s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 39s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 36s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 9s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 3m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} The patch hbase-protocol-shaded passed checkstyle {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} hbase-procedure: The patch generated 0 new + 28 unchanged - 16 fixed = 28 total (was 44) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} hbase-server: The patch generated 0 new + 316 unchanged - 7 fixed = 316 total (was 323) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 18s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 4s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 36s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 53s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}174m 20s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}237m
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547284#comment-16547284 ] Duo Zhang commented on HBASE-20846: --- Tried the failed UT several times locally, all passed. The error on jenkins is that, the disable table is executed before we finish the merge table procedure, and we get a region in MERGED state. But this never happened for me locally, since merge table procedure will hold the shared lock of the table while disable table procedure needs the exclusive lock. And I added logs about acquire/release locks locally, it is fine, the disable table procedure does schedule before we finish the merge table procedure, but it can only be executed after the merge table procedure releases the shared lock on table. Let's try again. > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.0.2, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846-v4.patch, HBASE-20846.branch-2.0.002.patch, > HBASE-20846.branch-2.0.patch, HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547213#comment-16547213 ] Hadoop QA commented on HBASE-20846: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 3m 48s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 8 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 28s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 5s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 33s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 21s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 41s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 3m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} The patch hbase-protocol-shaded passed checkstyle {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s{color} | {color:red} hbase-procedure: The patch generated 1 new + 28 unchanged - 16 fixed = 29 total (was 44) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green} hbase-server: The patch generated 0 new + 316 unchanged - 7 fixed = 316 total (was 323) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 22s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 34s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 47s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}196m 11s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 9s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}256m
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547003#comment-16547003 ] Hadoop QA commented on HBASE-20846: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 8 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 31s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 43s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 26s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 4s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 25s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} The patch hbase-protocol-shaded passed checkstyle {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s{color} | {color:red} hbase-procedure: The patch generated 1 new + 28 unchanged - 16 fixed = 29 total (was 44) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s{color} | {color:green} hbase-server: The patch generated 0 new + 316 unchanged - 7 fixed = 316 total (was 323) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 1s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 4s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 33s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}192m 1s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 2s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}244m
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546646#comment-16546646 ] Duo Zhang commented on HBASE-20846: --- Just ignore TestProcedureReplayOrder. Now we will restore locks when restarting master, so we do not rely on the loading order any more. We should use lock to obtain the correct execution order. > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.0.2, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846-v3.patch, HBASE-20846.branch-2.0.002.patch, > HBASE-20846.branch-2.0.patch, HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545969#comment-16545969 ] Duo Zhang commented on HBASE-20846: --- Thanks [~stack]. I checked the code, for a procedure in ROLLEDBACK state, we will call store.delete to remove it so we should not update the lock operation any more. And we are getting closer. Let me check the failed UTs. > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.0.2, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846.branch-2.0.002.patch, HBASE-20846.branch-2.0.patch, > HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545861#comment-16545861 ] Hadoop QA commented on HBASE-20846: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 46s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 10s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 35s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 22s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 47s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 8s{color} | {color:green} The patch hbase-protocol-shaded passed checkstyle {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s{color} | {color:red} hbase-procedure: The patch generated 1 new + 38 unchanged - 14 fixed = 39 total (was 52) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 1s{color} | {color:green} hbase-server: The patch generated 0 new + 316 unchanged - 7 fixed = 316 total (was 323) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 7s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 14s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 46s{color} | {color:red} hbase-procedure in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}197m 24s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 7s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}255m 33s{color} |
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545309#comment-16545309 ] stack commented on HBASE-20846: --- bq. Now the biggest problem is that, the original code does not allow storing ROLLEDBACK procedure into the procedure store. You can't store ROLLBACK steps as we do forward steps; the framework does not currently support this. Shout if you want more detail. > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.0.2, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846-v2.patch, > HBASE-20846.branch-2.0.002.patch, HBASE-20846.branch-2.0.patch, > HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545192#comment-16545192 ] Hadoop QA commented on HBASE-20846: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 37s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 51s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 21s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 9s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 36s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 8s{color} | {color:green} The patch hbase-protocol-shaded passed checkstyle {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 12s{color} | {color:red} hbase-procedure: The patch generated 1 new + 38 unchanged - 14 fixed = 39 total (was 52) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 59s{color} | {color:green} hbase-server: The patch generated 0 new + 316 unchanged - 7 fixed = 316 total (was 323) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 5s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 27s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 42s{color} | {color:red} hbase-procedure in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}241m 18s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 50s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}294m 57s{color} |
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544914#comment-16544914 ] Hadoop QA commented on HBASE-20846: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 45s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 1s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 36s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 24s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 41s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 1m 47s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 45s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 0m 45s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 45s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} The patch hbase-protocol-shaded passed checkstyle {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s{color} | {color:red} hbase-procedure: The patch generated 1 new + 28 unchanged - 14 fixed = 29 total (was 42) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green} hbase-server: The patch generated 0 new + 316 unchanged - 7 fixed = 316 total (was 323) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} shadedjars {color} | {color:red} 3m 14s{color} | {color:red} patch has 14 errors when building our shaded downstream artifacts. {color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 1m 41s{color} | {color:red} The patch causes 15 errors with Hadoop v2.7.4. {color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 3m 26s{color} | {color:red} The patch causes 15 errors with Hadoop v3.0.0. {color} | | {color:red}-1{color} | {color:red} hbaseprotoc {color} | {color:red} 0m 25s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 25s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 33s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 1s{color} | {color:red} hbase-procedure in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 31s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green}
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544874#comment-16544874 ] Duo Zhang commented on HBASE-20846: --- Still have plenty of problems. Now the biggest problem is that, the original code does not allow storing ROLLEDBACK procedure into the procedure store. Need to understand more about the framework... > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.0.2, 2.1.1 > > Attachments: HBASE-20846-v1.patch, HBASE-20846.branch-2.0.002.patch, > HBASE-20846.branch-2.0.patch, HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544297#comment-16544297 ] Hadoop QA commented on HBASE-20846: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 3s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 2s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 36s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 30s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 35s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 3m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} The patch hbase-protocol-shaded passed checkstyle {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s{color} | {color:red} hbase-procedure: The patch generated 1 new + 34 unchanged - 4 fixed = 35 total (was 38) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 11s{color} | {color:green} hbase-server: The patch generated 0 new + 82 unchanged - 7 fixed = 82 total (was 89) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 30s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 14s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 55s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 13s{color} | {color:red} hbase-procedure generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 40m 15s{color} | {color:red} hbase-procedure in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}134m 31s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 5s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} |
[jira] [Commented] (HBASE-20846) Restore procedure locks when master restarts
[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544228#comment-16544228 ] Duo Zhang commented on HBASE-20846: --- A first try. TestMasterFailoverWithProcedures can not pass, I know. Just want to see if there are other problems. And also need to add some UTs to confirm that the restore locks works as expected. > Restore procedure locks when master restarts > > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.0.2, 2.1.1 > > Attachments: HBASE-20846.branch-2.0.002.patch, > HBASE-20846.branch-2.0.patch, HBASE-20846.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)