[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704316#comment-16704316 ]

Allan Yang commented on HBASE-21083:

And besides, bypassing a procedure midway without fixing the resulting inconsistency is dangerous. We don't want to expose this method to users themselves.

> Introduce a mechanism to bypass the execution of a stuck procedure
> --
>
> Key: HBASE-21083
> URL: https://issues.apache.org/jira/browse/HBASE-21083
> Project: HBase
> Issue Type: Sub-task
> Components: amv2
> Affects Versions: 2.1.0, 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Fix For: 3.0.0, 2.1.1, 2.0.2
>
> Attachments: HBASE-21083.branch-2.0.001.patch,
> HBASE-21083.branch-2.0.002.patch, HBASE-21083.branch-2.0.003.patch,
> HBASE-21083.branch-2.0.003.patch, HBASE-21083.branch-2.0.003.patch,
> HBASE-21083.branch-2.1.001.patch
>
>
> Discussed offline with [~stack] and [~Apache9]. We all agreed that we need to introduce a mechanism to 'force complete' a stuck procedure, so that AMv2 can continue running.
> We still have some undiscovered bugs hiding in our AMv2 and ProcedureV2 systems, so we need a way to intervene in stuck procedures before HBCK2 can work. This is crucial for a production-ready system.
> For now, we have few ways to intervene in running procedures. Aborting them is not a good choice, since some procedures are not abortable, and some procedures may have overridden the abort() method so that it ignores the abort request.
> So here I will introduce a mechanism to bypass the execution of a stuck procedure.
> Basically, I added a field called 'bypass' to the Procedure class. If we set this field to true, all the logic in execute/rollback will be skipped, letting this procedure and its ancestors complete normally and finally release their lock resources.
> Note that bypassing a procedure may leave the cluster in an intermediate state, e.g. a region not assigned, or some HDFS files left behind.
> Operators need to know the side effects of bypassing and recover the inconsistent state of the cluster themselves, for example by issuing new procedures to assign the regions.
> A patch will be uploaded and a review board will be opened. For now, only APIs in ProcedureExecutor are provided. If everything is fine, I will add it to the master service and add a shell command to bypass a procedure. Or maybe we can use dynamically compiled JSPs to execute those APIs, as mentioned in HBASE-20679.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
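The bypass behaviour described in the issue can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not HBase's actual Procedure or ProcedureExecutor code: the names BypassSketch, ToyProcedure, bypass() and runOnce() are hypothetical, the "lock" is a plain boolean, and a stuck procedure is simulated by a flag that prevents it from ever making progress. It shows the two points from the description: once the flag is set, the execute logic is skipped and the procedure completes, and the flag is propagated to the ancestors so they can complete and release their locks too.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; NOT HBase's Procedure or ProcedureExecutor. All names are hypothetical. */
public class BypassSketch {

  /** Hypothetical stand-in for a procedure that carries a 'bypass' flag. */
  public static class ToyProcedure {
    public final String name;
    public final ToyProcedure parent; // null for a root procedure
    public final boolean stuck;       // simulates a procedure that never makes progress
    public boolean bypass;            // the flag described in the issue
    public boolean finished;
    public boolean lockHeld;
    public int stepsExecuted;

    public ToyProcedure(String name, ToyProcedure parent, boolean stuck) {
      this.name = name;
      this.parent = parent;
      this.stuck = stuck;
    }
  }

  /** Sets the bypass flag on the procedure and every ancestor; returns the names marked. */
  public static List<String> bypass(ToyProcedure proc) {
    List<String> marked = new ArrayList<>();
    for (ToyProcedure p = proc; p != null; p = p.parent) {
      p.bypass = true;
      marked.add(p.name);
    }
    return marked;
  }

  /** One scheduling pass: take the lock, run or skip the work, release the lock on completion. */
  public static boolean runOnce(ToyProcedure proc) {
    proc.lockHeld = true;
    if (proc.bypass) {
      proc.finished = true;           // skip all execute/rollback logic
    } else if (!proc.stuck) {
      proc.stepsExecuted++;           // normal execution path
      proc.finished = true;
    }
    if (proc.finished) {
      proc.lockHeld = false;          // the lock is released only when the procedure completes
    }
    return proc.finished;
  }

  public static void main(String[] args) {
    ToyProcedure parent = new ToyProcedure("parent", null, false);
    ToyProcedure child = new ToyProcedure("child", parent, true);
    System.out.println("before bypass, child finished: " + runOnce(child));
    System.out.println("marked for bypass: " + bypass(child));
    System.out.println("after bypass, child finished: " + runOnce(child));
    System.out.println("parent finished: " + runOnce(parent));
    System.out.println("parent steps executed: " + parent.stepsExecuted);
  }
}
```

In this model the bypassed parent finishes without executing any step, which mirrors the warning in the description: whatever the skipped logic was supposed to do (assigning a region, cleaning up files) is left for the operator to repair.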
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704314#comment-16704314 ]

Allan Yang commented on HBASE-21083:

[~xucang], it should not be very hard to backport this patch, but this feature is only used by HBCK2 in 2.x. It does not provide an Admin API (over RPC) or anything similar, so it can't be called directly from the shell or the HBaseAdmin class.
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704195#comment-16704195 ]

Xu Cang commented on HBASE-21083:

[~allan163] thanks for the response.

The reason I am asking is that for HBase branch-1 there seems to be no reliable way to unblock or bypass stuck procedures. This feature could potentially be applied to branch-1 to alleviate engineering burdens such as manually operating on WAL files.

I briefly skimmed your patch and I see that most of the code change is related to ProcedureV2, not AMv2. Since branch-1 has ProcedureV2, I was asking how hard it would be to port this high-level logic to branch-1.
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704171#comment-16704171 ]

Allan Yang commented on HBASE-21083:

[~xucang] this feature is mostly used by HBCK2. How are you planning to use it in 1.x?
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16702299#comment-16702299 ]

Xu Cang commented on HBASE-21083:

[~allan163] Do you see value in backporting this to branch-1? We have seen a MasterProcedure stuck in a production cluster and have very few ways to resolve it safely.

(I am not very familiar with AMv2, but it seems AMv2 is available in branch-1 too? I can see the procedure2.Procedure class and so on.)

Thanks.
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622046#comment-16622046 ]

stack commented on HBASE-21083:

FYI [~allan163] HBASE-21213
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598608#comment-16598608 ]

Hudson commented on HBASE-21083:

Results for branch branch-2.1
[build #257 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/257/]: (x) -1 overall

details (if available):

(x) -1 general checks
-- Something went wrong running this stage, please [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/257//console].

(x) -1 jdk8 hadoop2 checks
-- Something went wrong running this stage, please [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/257//console].

(/) +1 jdk8 hadoop3 checks
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/257//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) +1 source release artifact
-- See build output for details.

(/) +1 client integration test
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598582#comment-16598582 ]

Hudson commented on HBASE-21083:

Results for branch branch-2
[build #1179 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1179/]: (x) -1 overall

details (if available):

(x) -1 general checks
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1179//General_Nightly_Build_Report/]

(x) -1 jdk8 hadoop2 checks
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1179//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) -1 jdk8 hadoop3 checks
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1179//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) +1 source release artifact
-- See build output for details.

(/) +1 client integration test
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598488#comment-16598488 ]

Hudson commented on HBASE-21083:

Results for branch master
[build #464 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/464/]: (x) -1 overall

details (if available):

(x) -1 general checks
-- Something went wrong running this stage, please [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/master/464//console].

(x) -1 jdk8 hadoop2 checks
-- Something went wrong running this stage, please [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/master/464//console].

(x) -1 jdk8 hadoop3 checks
-- Something went wrong running this stage, please [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/master/464//console].

(/) +1 source release artifact
-- See build output for details.

(/) +1 client integration test
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597818#comment-16597818 ]

stack commented on HBASE-21083:

[~uagashe] My fault. Was local only. I had not pushed it. Done.
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597814#comment-16597814 ]

Umesh Agashe commented on HBASE-21083:

@stack, can this be committed to master as well?
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595790#comment-16595790 ] Hadoop QA commented on HBASE-21083:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 39s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || branch-2.0 Compile Tests ||
| 0 | mvndep | 0m 26s | Maven dependency ordering for branch |
| +1 | mvninstall | 5m 38s | branch-2.0 passed |
| +1 | compile | 12m 47s | branch-2.0 passed |
| +1 | checkstyle | 1m 50s | branch-2.0 passed |
| +1 | shadedjars | 3m 49s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 5m 18s | branch-2.0 passed |
| +1 | javadoc | 1m 10s | branch-2.0 passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 46s | the patch passed |
| +1 | compile | 12m 35s | the patch passed |
| +1 | cc | 12m 35s | the patch passed |
| -1 | javac | 10m 1s | hbase-protocol-shaded generated 2 new + 98 unchanged - 2 fixed = 100 total (was 100) |
| +1 | checkstyle | 1m 49s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 3m 51s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 9m 22s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | hbaseprotoc | 1m 33s | the patch passed |
| +1 | findbugs | 5m 41s | the patch passed |
| +1 | javadoc | 1m 6s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 28s | hbase-protocol-shaded in the patch passed. |
| +1 | unit | 2m 31s | hbase-common in the patch passed. |
| +1 | unit | 3m 3s | hbase-procedure in the patch passed. |
| +1 | unit | 175m 8s | hbase-server in the patch passed. |
| +1 | asflicense | 1m 21s | The patch does not generate ASF License warnings. |
| | | 254m 20s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21083 |
| JIRA P
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595520#comment-16595520 ] Hadoop QA commented on HBASE-21083:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 41s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 1s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || branch-2.0 Compile Tests ||
| 0 | mvndep | 1m 5s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 53s | branch-2.0 passed |
| +1 | compile | 12m 20s | branch-2.0 passed |
| +1 | checkstyle | 1m 52s | branch-2.0 passed |
| +1 | shadedjars | 3m 39s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 5m 8s | branch-2.0 passed |
| +1 | javadoc | 1m 14s | branch-2.0 passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 16s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 36s | the patch passed |
| +1 | compile | 12m 11s | the patch passed |
| +1 | cc | 12m 11s | the patch passed |
| -1 | javac | 9m 44s | hbase-protocol-shaded generated 2 new + 98 unchanged - 2 fixed = 100 total (was 100) |
| +1 | checkstyle | 1m 48s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 3m 38s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 8m 21s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | hbaseprotoc | 1m 34s | the patch passed |
| +1 | findbugs | 5m 40s | the patch passed |
| +1 | javadoc | 1m 8s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 29s | hbase-protocol-shaded in the patch passed. |
| +1 | unit | 2m 35s | hbase-common in the patch passed. |
| +1 | unit | 2m 59s | hbase-procedure in the patch passed. |
| -1 | unit | 181m 3s | hbase-server in the patch failed. |
| +1 | asflicense | 1m 0s | The patch does not generate ASF License warnings. |
| | | 257m 31s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestSplitOrMergeStatus |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595503#comment-16595503 ] stack commented on HBASE-21083: --- Retry. > Introduce a mechanism to bypass the execution of a stuck procedure > -- > > Key: HBASE-21083 > URL: https://issues.apache.org/jira/browse/HBASE-21083 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21083.branch-2.0.001.patch, > HBASE-21083.branch-2.0.002.patch, HBASE-21083.branch-2.0.003.patch, > HBASE-21083.branch-2.0.003.patch, HBASE-21083.branch-2.1.001.patch
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595494#comment-16595494 ] Umesh Agashe commented on HBASE-21083: -- Thanks for addressing the review comments, [~stack]! Thanks [~allan163] for the changes! > Introduce a mechanism to bypass the execution of a stuck procedure > -- > > Key: HBASE-21083 > URL: https://issues.apache.org/jira/browse/HBASE-21083 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21083.branch-2.0.001.patch, > HBASE-21083.branch-2.0.002.patch, HBASE-21083.branch-2.0.003.patch, > HBASE-21083.branch-2.1.001.patch
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595483#comment-16595483 ] Hadoop QA commented on HBASE-21083:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || branch-2.1 Compile Tests ||
| 0 | mvndep | 1m 10s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 45s | branch-2.1 passed |
| +1 | compile | 12m 42s | branch-2.1 passed |
| +1 | checkstyle | 1m 54s | branch-2.1 passed |
| +1 | shadedjars | 3m 25s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 5m 13s | branch-2.1 passed |
| +1 | javadoc | 1m 6s | branch-2.1 passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 15s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 37s | the patch passed |
| +1 | compile | 12m 34s | the patch passed |
| +1 | cc | 12m 34s | the patch passed |
| -1 | javac | 10m 4s | hbase-protocol-shaded generated 2 new + 98 unchanged - 2 fixed = 100 total (was 100) |
| +1 | checkstyle | 1m 43s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 3m 22s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 6m 45s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. |
| +1 | hbaseprotoc | 1m 34s | the patch passed |
| +1 | findbugs | 5m 30s | the patch passed |
| +1 | javadoc | 1m 10s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 31s | hbase-protocol-shaded in the patch passed. |
| +1 | unit | 2m 39s | hbase-common in the patch passed. |
| +1 | unit | 3m 0s | hbase-procedure in the patch passed. |
| -1 | unit | 194m 37s | hbase-server in the patch failed. |
| +1 | asflicense | 1m 52s | The patch does not generate ASF License warnings. |
| | | 270m 9s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestAsyncTableGetMultiThreaded |
| | hadoop.hbase.regionserver.throttle.TestFlushWithThroug
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595173#comment-16595173 ] stack commented on HBASE-21083: --- Thanks [~allan163]. > Introduce a mechanism to bypass the execution of a stuck procedure > -- > > Key: HBASE-21083 > URL: https://issues.apache.org/jira/browse/HBASE-21083 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21083.branch-2.0.001.patch, > HBASE-21083.branch-2.0.002.patch, HBASE-21083.branch-2.0.003.patch, > HBASE-21083.branch-2.1.001.patch
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595166#comment-16595166 ] Allan Yang commented on HBASE-21083: Uploaded a HBASE-21083.branch-2.0.003 patch based on [~stack]'s .001 against branch-2.1, and uploaded it to review board too. Thanks all for reviewing. The most significant change in this patch is that, per [~Apache9]'s review comment, we need to track the wait time in tryLockEntry(long id, long time) ourselves, since the JVM may wake the waiting thread before the time is up. > Introduce a mechanism to bypass the execution of a stuck procedure > -- > > Key: HBASE-21083 > URL: https://issues.apache.org/jira/browse/HBASE-21083 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21083.branch-2.0.001.patch, > HBASE-21083.branch-2.0.002.patch, HBASE-21083.branch-2.0.003.patch, > HBASE-21083.branch-2.1.001.patch
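The point in the comment above about counting the wait time in tryLockEntry(long id, long time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the patch: `TimedLockSketch`, `tryLock`, and `unlock` are hypothetical names, and the lock here is a single non-reentrant flag rather than a per-id entry. The technique shown is the usual guard against early or spurious wakeups: compute a deadline once, and on every wakeup re-check the condition and recompute the remaining time.

```java
import java.util.concurrent.TimeUnit;

public class TimedLockSketch {
  private boolean locked = false; // guarded by this object's monitor

  /**
   * Tries to take the lock, waiting at most timeoutMs milliseconds.
   * Returns true if the lock was acquired, false if the timeout elapsed.
   */
  public synchronized boolean tryLock(long timeoutMs) throws InterruptedException {
    // Fix the deadline up front so that early wakeups do not restart the clock.
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (locked) {
      long remainingNs = deadline - System.nanoTime();
      if (remainingNs <= 0) {
        return false; // the full timeout has really elapsed
      }
      // May return before remainingNs has passed (notify or spurious wakeup);
      // the loop then re-checks the condition and the remaining time.
      TimeUnit.NANOSECONDS.timedWait(this, remainingNs);
    }
    locked = true;
    return true;
  }

  /** Releases the lock and wakes any waiting threads. */
  public synchronized void unlock() {
    locked = false;
    notifyAll();
  }
}
```

Without the loop and the recomputed remaining time, a single `wait(timeout)` call could return early and the caller would wrongly conclude either that the lock is free or that the timeout had expired.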
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595162#comment-16595162 ] stack commented on HBASE-21083: --- [~allan163] no worries. I figured you were busy. Nothing special about 2.1. What is in your .003 patch? > Introduce a mechanism to bypass the execution of a stuck procedure > -- > > Key: HBASE-21083 > URL: https://issues.apache.org/jira/browse/HBASE-21083 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21083.branch-2.0.001.patch, > HBASE-21083.branch-2.0.002.patch, HBASE-21083.branch-2.0.003.patch, > HBASE-21083.branch-2.1.001.patch
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595100#comment-16595100 ] Allan Yang commented on HBASE-21083: Does branch-2.1 have big differences so that we need another patch and another review? > Introduce a mechanism to bypass the execution of a stuck procedure > -- > > Key: HBASE-21083 > URL: https://issues.apache.org/jira/browse/HBASE-21083 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21083.branch-2.0.001.patch, > HBASE-21083.branch-2.0.002.patch, HBASE-21083.branch-2.1.001.patch
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595097#comment-16595097 ] Allan Yang commented on HBASE-21083: [~stack], sorry, boss, kind of busy these days, will catch up. > Introduce a mechanism to bypass the execution of a stuck procedure > -- > > Key: HBASE-21083 > URL: https://issues.apache.org/jira/browse/HBASE-21083 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21083.branch-2.0.001.patch, > HBASE-21083.branch-2.0.002.patch, HBASE-21083.branch-2.1.001.patch
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595080#comment-16595080 ] stack commented on HBASE-21083: --- .001 against branch-2.1 is [~allan163] 's patch with [~uagashe] review comments addressed and some checkstyle fixup. > Introduce a mechanism to bypass the execution of a stuck procedure > -- > > Key: HBASE-21083 > URL: https://issues.apache.org/jira/browse/HBASE-21083 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21083.branch-2.0.001.patch, > HBASE-21083.branch-2.0.002.patch, HBASE-21083.branch-2.1.001.patch
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590221#comment-16590221 ]

Hadoop QA commented on HBASE-21083:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || branch-2.0 Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for branch |
| +1 | mvninstall | 2m 59s | branch-2.0 passed |
| +1 | compile | 15m 9s | branch-2.0 passed |
| +1 | checkstyle | 1m 59s | branch-2.0 passed |
| +1 | shadedjars | 4m 4s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 5m 27s | branch-2.0 passed |
| +1 | javadoc | 1m 9s | branch-2.0 passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 41s | the patch passed |
| +1 | compile | 14m 58s | the patch passed |
| +1 | cc | 14m 58s | the patch passed |
| -1 | javac | 12m 24s | hbase-protocol-shaded generated 2 new + 98 unchanged - 2 fixed = 100 total (was 100) |
| -1 | checkstyle | 0m 22s | hbase-common: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| -1 | checkstyle | 0m 16s | hbase-procedure: The patch generated 8 new + 21 unchanged - 0 fixed = 29 total (was 21) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 3m 57s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 10m 33s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | hbaseprotoc | 2m 6s | the patch passed |
| +1 | findbugs | 6m 55s | the patch passed |
| +1 | javadoc | 1m 29s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 36s | hbase-protocol-shaded in the patch passed. |
| +1 | unit | 2m 42s | hbase-common in the patch passed. |
| +1 | unit | 3m 9s | hbase-procedure in the patch passed. |
| +1 | unit | 180m 29s | hbase-server in the patch passed. |
| +1 | asflicense | 1m 19s | The patch does not generate ASF License warnings. |
[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure
[ https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589916#comment-16589916 ] Allan Yang commented on HBASE-21083: [~stack], uploaded a new patch to review. > Introduce a mechanism to bypass the execution of a stuck procedure > -- > > Key: HBASE-21083 > URL: https://issues.apache.org/jira/browse/HBASE-21083 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21083.branch-2.0.001.patch, > HBASE-21083.branch-2.0.002.patch