[ https://issues.apache.org/jira/browse/HBASE-25212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226322#comment-17226322 ]
Hudson commented on HBASE-25212:
--------------------------------

Results for branch branch-2
	[build #93 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/93/]: (/) *{color:green}+1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/93/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/93/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/93/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/93/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}

> Optionally abort requests in progress after deciding a region should close
> --------------------------------------------------------------------------
>
>                 Key: HBASE-25212
>                 URL: https://issues.apache.org/jira/browse/HBASE-25212
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 1.7.0, 2.4.0
>
>
> After deciding a region should be closed, the regionserver will set the
> internal region state to closing and wait for all pending requests to
> complete, via a rendezvous on the region lock.
> In closing state the region will not accept any new requests, but requests
> in progress will be allowed to complete before the close action takes
> place. In our production we see outlier wait times on this lock in excess
> of several minutes.
> During close, when there are requests in flight, the regionserver is subject
> to any conceivable reason for delay, like full scans over large regions,
> expensive filtering hierarchies, bugs, or store level performance problems
> like slow HDFS. The regionserver should interrupt requests in progress to
> facilitate shorter close times on an opt-in basis.
> Optionally, via configuration parameter -- which would be a system wide
> default set in hbase-site.xml in common practice but could be overridden in
> table schema for per table settings -- interrupt requests in progress holding
> the region lock rather than wait for completion of all operations in flight.
> Send back NotServingRegionException("region is closing") to the clients of
> the interrupted operations, like we do after the write lock is acquired. The
> client will transparently relocate the region data and resubmit the aborted
> requests per normal retry policy. This can be less disruptive than waiting
> for very long times for a region to close in extreme outlier cases (e.g. 50
> minutes). In such extreme cases it is better to abort the regionserver if the
> close lock cannot be acquired in a reasonable amount of time, because the
> region cannot be made available again until it has closed.
> After waiting for all requests to complete, we flush the region's
> memstore and finish the close. The flush portion of the close process is out
> of scope of this proposal. Under normal conditions the flush portion of the
> close completes quickly. It is specifically the waits on the close lock that
> have been an occasional issue in our production, making it difficult to
> achieve 99.99% availability.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
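
For readers unfamiliar with the close rendezvous described in the quoted issue, the following is a minimal, self-contained Java sketch of the idea, not the HBase implementation. It assumes requests hold the read side of a `ReentrantReadWriteLock` while the closer tries for the write side with a bounded wait, interrupting registered holders between attempts. All names here (`RegionCloseSketch`, `runRequest`, `close`, `RegionClosingException`) are hypothetical; `RegionClosingException` only stands in for the `NotServingRegionException("region is closing")` mentioned in the description.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RegionCloseSketch {
    /** Hypothetical stand-in for NotServingRegionException. */
    static class RegionClosingException extends Exception {
        RegionClosingException(String msg) { super(msg); }
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Set<Thread> holders = new HashSet<>(); // guarded by synchronized (holders)
    private volatile boolean closing = false;

    /** Runs one request under the read lock; the sleep is a placeholder for slow work. */
    void runRequest(Runnable onStart, long workMillis) throws RegionClosingException {
        lock.readLock().lock();
        try {
            if (closing) throw new RegionClosingException("region is closing");
            synchronized (holders) { holders.add(Thread.currentThread()); }
            try {
                onStart.run();
                Thread.sleep(workMillis);
            } catch (InterruptedException e) {
                // Interrupted by the closer: report it like a closing region.
                throw new RegionClosingException("region is closing");
            } finally {
                synchronized (holders) {
                    holders.remove(Thread.currentThread());
                    Thread.interrupted(); // drop an interrupt that arrived after the work finished
                }
            }
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Returns true if the write lock was obtained in time; false means the caller should escalate. */
    boolean close(long maxWaitMillis, long intervalMillis) throws InterruptedException {
        closing = true; // new requests are rejected from here on
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        while (true) {
            if (lock.writeLock().tryLock(intervalMillis, TimeUnit.MILLISECONDS)) {
                try {
                    return true; // the flush and the rest of the close would happen here
                } finally {
                    lock.writeLock().unlock();
                }
            }
            if (System.nanoTime() - deadline >= 0) return false;
            synchronized (holders) {
                for (Thread t : holders) t.interrupt(); // abort requests still in progress
            }
        }
    }

    static String demo() throws Exception {
        RegionCloseSketch region = new RegionCloseSketch();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean aborted = new AtomicBoolean(false);
        Thread scan = new Thread(() -> {
            try {
                region.runRequest(started::countDown, 60_000); // simulated long scan
            } catch (RegionClosingException e) {
                aborted.set(true);
            }
        });
        scan.start();
        started.await();
        boolean closed = region.close(10_000, 50);
        scan.join();
        return "closed=" + closed + " aborted=" + aborted.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

In this sketch a `false` return from `close` corresponds to the case in the description where the close lock cannot be acquired in a reasonable amount of time; what to do then (for example, aborting the server) is left to the caller. The interrupt only takes effect because the simulated work blocks in an interruptible call; real request paths would need their own interruption checks.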