[jira] [Commented] (HBASE-20952) Re-visit the WAL API
[ https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661751#comment-16661751 ] Hudson commented on HBASE-20952: Results for branch HBASE-20952 [build #27 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Re-visit the WAL API > > > Key: HBASE-20952 > URL: https://issues.apache.org/jira/browse/HBASE-20952 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Josh Elser >Priority: Major > Attachments: 20952.v1.txt > > > Take a step back from the current WAL implementations and think about what an > HBase WAL API should look like. What are the primitive calls that we require > to guarantee durability of writes with a high degree of performance? > The API needs to take the current implementations into consideration. We > should also have a mind for what is happening in the Ratis LogService (but > the LogService should not dictate what HBase's WAL API looks like RATIS-272). > Other "systems" inside of HBase that use WALs are replication and > backup & restore. Replication has the use-case for "tail"'ing the WAL which we > should provide via our new API. B&R doesn't do anything fancy (IIRC). 
We > should make sure all consumers are generally going to be OK with the API we > create. > The API may be "OK" (or OK in a part). We need to also consider other methods > which were "bolted" on such as {{AbstractFSWAL}} and > {{WALFileLengthProvider}}. Other corners of "WAL use" (like the > {{WALSplitter}} should also be looked at to use WAL-APIs only). > We also need to make sure that adequate interface audience and stability > annotations are chosen. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661744#comment-16661744 ] Hudson commented on HBASE-21342: Results for branch branch-2.0 [build #1004 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > FileSystem in use may get closed by other bulk load call in secure bulkLoad > > > Key: HBASE-21342 > URL: https://issues.apache.org/jira/browse/HBASE-21342 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7 >Reporter: mazhenlin >Assignee: mazhenlin >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: 21342.v1.txt, HBASE-21342.002.patch, > HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, > HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch > > > As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If > Two secure bulkload calls from the same UGI into two different regions and > one region finishes earlier, it will close the bulk load fs, and the other > region will fail. > > Another case would be more serious. The FileSystem.close() function needs two > synchronized variables : CACHE and deleteOnExit. 
If one region calls > FileSystem.closeAllForUGI (in SecureBulkLoadManager.cleanupBulkLoad) while > another region is trying to close srcFS (in > SecureBulkLoadListener.closeSrcFs), this can cause a deadlock. > > I have written a UT for this and fixed it using a reference counter.
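The reference-counter idea mentioned above can be pictured roughly as follows. This is only a minimal sketch, not the code from the attached patches: the class name SharedResourceTracker, the String key standing in for a UGI, and the acquire/release methods are all hypothetical. The point it illustrates is that each bulk load call registers itself before using the shared FileSystem, and only the caller whose release brings the count to zero is allowed to close it.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not HBase code): counts in-flight users per key so that
 * only the last user to finish performs the cleanup of the shared resource.
 */
final class SharedResourceTracker {
  // Number of in-flight users per key; a key is absent when nobody uses it.
  private final Map<String, Integer> refCounts = new HashMap<>();

  /** Registers one more in-flight user for the given key. */
  synchronized void acquire(String key) {
    refCounts.merge(key, 1, Integer::sum);
  }

  /**
   * Drops one user. Returns true only when the caller was the last user,
   * which is the signal that closing the shared resource is now safe.
   */
  synchronized boolean release(String key) {
    Integer count = refCounts.get(key);
    if (count == null) {
      throw new IllegalStateException("release without acquire: " + key);
    }
    if (count == 1) {
      refCounts.remove(key);
      return true;
    }
    refCounts.put(key, count - 1);
    return false;
  }
}
```

With two concurrent bulk loads for the same key, the first release returns false and leaves the resource open; the second returns true. Because the decision to close is taken by a single caller, a region that finishes early no longer closes the FileSystem under the other one.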
[jira] [Commented] (HBASE-21338) [balancer] If balancer is an ill-fit for cluster size, it gives little indication
[ https://issues.apache.org/jira/browse/HBASE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661742#comment-16661742 ] Hudson commented on HBASE-21338: Results for branch branch-2.0 [build #1004 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > [balancer] If balancer is an ill-fit for cluster size, it gives little > indication > - > > Key: HBASE-21338 > URL: https://issues.apache.org/jira/browse/HBASE-21338 > Project: HBase > Issue Type: Sub-task > Components: Balancer, Operability >Reporter: stack >Assignee: Xu Cang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21338.master.001.patch > > > See parent issue. Running balancer on a cluster where the max steps was way > inadequate, the balancer gave little to no indication that it was > ill-configured. In fact, it only logged its starting and then that there was > nothing to do though the cluster was obviously out-of-whack. > Ideally the balancer would complain when say the maxSteps limit is a small > fraction of what the cluster's calculated max steps are, or it would notice > that the balancer is making little progress on an imbalanced cluster and > shout. 
Can we set balancer configs w/o having to restart Master?
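As an illustration of the first suggestion in the description (complain when the maxSteps limit is a small fraction of the calculated max steps), a check along these lines could work. This is a sketch under assumptions, not what the attached patch does: BalancerStepCheck, warnIfInadequate and the minRatio threshold are hypothetical names, and the caller is assumed to already know both the configured limit and the value calculated for the cluster.

```java
import java.util.Locale;

/** Hypothetical helper, not part of HBase. */
final class BalancerStepCheck {
  private BalancerStepCheck() {}

  /**
   * Returns a warning message when the configured step limit is below
   * minRatio of the steps calculated for the cluster, or null when the
   * limit looks adequate or there is nothing to compare against.
   */
  static String warnIfInadequate(long configuredMaxSteps, long calculatedMaxSteps,
      double minRatio) {
    if (calculatedMaxSteps <= 0) {
      return null; // no calculated value to compare against
    }
    double ratio = (double) configuredMaxSteps / calculatedMaxSteps;
    if (ratio >= minRatio) {
      return null; // the limit is a large enough fraction of the calculated steps
    }
    return String.format(Locale.ROOT,
        "maxSteps=%d is only %.1f%% of the calculated %d steps; "
            + "the balancer may not make progress",
        configuredMaxSteps, ratio * 100.0, calculatedMaxSteps);
  }
}
```

A caller would log the returned message at WARN level when it is non-null. The threshold is left as a parameter because the issue does not say what counts as "a small fraction".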
[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely
[ https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661743#comment-16661743 ] Hudson commented on HBASE-21349: Results for branch branch-2.0 [build #1004 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Cluster is going down but CatalogJanitor and Normalizer try to run and fail > noisely > --- > > Key: HBASE-21349 > URL: https://issues.apache.org/jira/browse/HBASE-21349 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: stack >Assignee: Xu Cang >Priority: Minor > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21349.master.002.patch, > HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, > HBASE-22349.master.001.patch > > > Shutting down can take a while. Meantime catalog janitor and or normalizer > (etc?) try to run and when they can't, they fail noisely. 
Looks bad: > {code} > 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: > Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; > onlineServers=51 > 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: > Failed scan of catalog table > java.io.IOException: connection is closed > at > org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137) > at > org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243) > at > org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116) > at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at 
java.lang.Thread.run(Thread.java:748) > 2018-10-19 21:25:54,507 ERROR > org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to > normalize regions. > java.io.IOException: connection is closed > at > org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690) > at > org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240) > at > org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189) > at > org.apache.hadoop.hbase.master.HMaster.normalizeRegions(HMaster.java:1718) > at >
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661729#comment-16661729 ] Hadoop QA commented on HBASE-21363: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 6s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 33s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 32s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 13s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 36s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 41m 33s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21363 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945335/HBASE-21363-v4.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux b207abd0da91 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 17 11:07:07 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh | | git revision | master / 1f437ac221 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14840/testReport/ | | Max. process+thread count | 273 (vs. ulimit of 1) | | modules | C: hbase-procedure U: hbase-procedure | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14840/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Rewrite the
[jira] [Commented] (HBASE-21325) Force to terminate regionserver when abort hang in somewhere
[ https://issues.apache.org/jira/browse/HBASE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661720#comment-16661720 ] Hadoop QA commented on HBASE-21325: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 16s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 26s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 5s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 32s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 32s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 12s{color} | {color:red} hbase-server: The patch generated 3 new + 76 unchanged - 0 fixed = 79 total (was 76) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 15s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 26s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}133m 55s{color} | {color:red} hbase-server in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}174m 13s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.replication.TestSyncReplicationRemoveRemoteWAL | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21325 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945315/HBASE-21325.master.003.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 3106f256c7e3 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 1f437ac221 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | compile | https://builds.apache.org/job/PreCommit-HBASE-Build/14836/artifact/patchprocess/patch-compile-hbase-server.txt | | javac |
[jira] [Commented] (HBASE-21338) [balancer] If balancer is an ill-fit for cluster size, it gives little indication
[ https://issues.apache.org/jira/browse/HBASE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661716#comment-16661716 ] Hudson commented on HBASE-21338: Results for branch branch-2.1 [build #522 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > [balancer] If balancer is an ill-fit for cluster size, it gives little > indication > - > > Key: HBASE-21338 > URL: https://issues.apache.org/jira/browse/HBASE-21338 > Project: HBase > Issue Type: Sub-task > Components: Balancer, Operability >Reporter: stack >Assignee: Xu Cang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21338.master.001.patch > > > See parent issue. Running balancer on a cluster where the max steps was way > inadequate, the balancer gave little to no indication that it was > ill-configured. In fact, it only logged its starting and then that there was > nothing to do though the cluster was obviously out-of-whack. 
> Ideally the balancer would complain when, say, the maxSteps limit is a small > fraction of the cluster's calculated max steps, or it would notice > that the balancer is making little progress on an imbalanced cluster and > shout. Can we set balancer configs w/o having to restart Master?
[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661718#comment-16661718 ] Hudson commented on HBASE-21342: Results for branch branch-2.1 [build #522 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > FileSystem in use may get closed by other bulk load call in secure bulkLoad > > > Key: HBASE-21342 > URL: https://issues.apache.org/jira/browse/HBASE-21342 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7 >Reporter: mazhenlin >Assignee: mazhenlin >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: 21342.v1.txt, HBASE-21342.002.patch, > HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, > HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch > > > As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If > Two secure bulkload calls from the same UGI into two different regions and > one region finishes earlier, it will close the bulk load fs, and the other > region will fail. > > Another case would be more serious. 
The FileSystem.close() function needs two > synchronized variables: CACHE and deleteOnExit. If one region calls > FileSystem.closeAllForUGI (in SecureBulkLoadManager.cleanupBulkLoad) while > another region is trying to close srcFS (in > SecureBulkLoadListener.closeSrcFs), this can cause a deadlock. > > I have written a UT for this and fixed it using a reference counter.
[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely
[ https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661717#comment-16661717 ] Hudson commented on HBASE-21349: Results for branch branch-2.1 [build #522 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Cluster is going down but CatalogJanitor and Normalizer try to run and fail > noisely > --- > > Key: HBASE-21349 > URL: https://issues.apache.org/jira/browse/HBASE-21349 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: stack >Assignee: Xu Cang >Priority: Minor > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21349.master.002.patch, > HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, > HBASE-22349.master.001.patch > > > Shutting down can take a while. Meantime catalog janitor and or normalizer > (etc?) try to run and when they can't, they fail noisely. 
Looks bad: > {code} > 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: > Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; > onlineServers=51 > 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: > Failed scan of catalog table > java.io.IOException: connection is closed > at > org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137) > at > org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243) > at > org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116) > at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at 
java.lang.Thread.run(Thread.java:748) > 2018-10-19 21:25:54,507 ERROR > org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to > normalize regions. > java.io.IOException: connection is closed > at > org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690) > at > org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240) > at > org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189) > at >
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661708#comment-16661708 ] Allan Yang commented on HBASE-21363: {code} +this.partial = resetDelete ? false : other.partial; {code} Please add a comment for this one. The v4 patch looks great, +1 for it. You can add the comment while committing. > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363.patch
[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX
[ https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661707#comment-16661707 ] Hadoop QA commented on HBASE-21372: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2.1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 38s{color} | {color:green} branch-2.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s{color} | {color:green} branch-2.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} branch-2.1 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 52s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s{color} | {color:green} branch-2.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} branch-2.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 35s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 12s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}176m 23s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}216m 59s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 | | JIRA Issue | HBASE-21372 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945308/HBASE-21372.branch-2.1.001.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux d021c3937513 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | branch-2.1 / d35f65f396 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14833/testReport/ | | Max. process+thread count | 4249 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server | | Console output |
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661695#comment-16661695 ] Duo Zhang commented on HBASE-21363: --- Add a UT to confirm that we will reset all the deleted flags when building holdingCleanupTracker even if the original tracker is partial. > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21363: -- Attachment: HBASE-21363-v4.patch > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal updated HBASE-21344: -- Attachment: HBASE-21344.branch-2.0.003.patch > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Fix For: 2.0.3 > > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch, > HBASE-21344.branch-2.0.003.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at >
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661668#comment-16661668 ] Duo Zhang commented on HBASE-21363: --- Talked with [~allan163] offline, there are still problems. I'm writing a UT now. Will be back soon. > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363-v3.patch, HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661667#comment-16661667 ] stack commented on HBASE-21364: --- Thanks boys. > Procedure holds the lock should put to front of the queue after restart > --- > > Key: HBASE-21364 > URL: https://issues.apache.org/jira/browse/HBASE-21364 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Blocker > Fix For: 2.1.1, 2.0.3 > > Attachments: HBASE-21364.branch-2.0.001.patch, > HBASE-21364.branch-2.0.002.patch > > > After restoring the procedures from the Procedure WALs, we put the runnable > procedures back into the queue to execute. The order was not a problem before > HBASE-20846, since the first procedure to execute would acquire the lock itself. But > the locks are restored after HBASE-20846, so if a procedure > without the lock executes before a procedure with the lock in the same queue, > there is a race condition in which we may never execute the procedures > in that queue. > The race condition is: > 1. A procedure that needs the table's exclusive lock is put into the > table's queue, but the table's shared lock is held by a Region Procedure. > Since no one holds the exclusive lock, the queue is put on the run queue to > execute. But soon the worker thread sees that the procedure can't execute because > it doesn't hold the lock, so it stops executing and removes the queue from the > run queue. > 2. At the same time, the Region Procedure which holds the table's shared lock > and the region's exclusive lock is put into the table's queue. But since the > queue has already been added to the run queue, it is not added again. > 3. Because of step 1, the table's queue is removed from the run queue. > 4. After that, no one puts the table's queue back, so no worker executes > the procedures inside it. > A test case in the patch shows how. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
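The issue title points at the direction of the fix: after a restart, a procedure that already holds a lock should sit ahead of the ones that still have to acquire it. Below is a minimal sketch of that ordering only. The names `RestoreOrderSketch`, `Proc` and `restoreOrder` are hypothetical and are not the scheduler's API; the real patch works on the scheduler's own queues.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RestoreOrderSketch {
  /** Hypothetical restored procedure: an id plus whether its lock was restored with it. */
  static final class Proc {
    final long id;
    final boolean holdsLock;

    Proc(long id, boolean holdsLock) {
      this.id = id;
      this.holdsLock = holdsLock;
    }
  }

  /** Returns procedure ids in queue order: lock holders first, the rest after them. */
  static List<Long> restoreOrder(List<Proc> restored) {
    List<Long> front = new ArrayList<>();
    List<Long> back = new ArrayList<>();
    for (Proc p : restored) {
      // A lock holder can always make progress, so it must not wait behind
      // a procedure that is blocked on the very lock it holds.
      (p.holdsLock ? front : back).add(p.id);
    }
    front.addAll(back);
    return front;
  }

  public static void main(String[] args) {
    List<Proc> restored =
        Arrays.asList(new Proc(1, false), new Proc(2, true), new Proc(3, false));
    System.out.println(restoreOrder(restored)); // [2, 1, 3]
  }
}
```

Relative order within each group is preserved, so only the lock holders move.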
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661666#comment-16661666 ] stack commented on HBASE-21344: --- You need to add .branch-2.0. into the name of your patch [~an...@apache.org] Also, this is an important patch because the left-over start will prevent us getting to the wait-on-meta holding pattern. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Fix For: 2.0.3 > > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at >
[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files
[ https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661663#comment-16661663 ] Duo Zhang commented on HBASE-21254: --- Can do this later. > Need to find a way to limit the number of proc wal files > > > Key: HBASE-21254 > URL: https://issues.apache.org/jira/browse/HBASE-21254 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Critical > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, > HBASE-21254-v3.patch, HBASE-21254.patch > > > For the regionserver, we have a limit on the maximum number of wal files; if we reach the > limit, we trigger a flush on specific regions so that we can delete > old wal files. But for proc wals we do not have this mechanism, and it will > be worse after HBASE-21233: if there is an old procedure which cannot > make progress and does not persist its state, we need to keep the old proc wal > file forever... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
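The regionserver mechanism described in the issue can be pictured roughly as follows: when there are more wal files than the limit allows, flush the regions that still pin the oldest files. This is an illustrative sketch only. `WalLimitSketch`, `regionsToFlush` and the map layout are assumptions, not the regionserver's actual code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class WalLimitSketch {
  /**
   * walToRegions maps each wal file, oldest first, to the regions that still
   * have unflushed edits in it. Returns the regions to flush so that the
   * oldest files above the limit can be deleted afterwards.
   */
  static Set<String> regionsToFlush(LinkedHashMap<String, Set<String>> walToRegions, int maxWals) {
    Set<String> toFlush = new LinkedHashSet<>();
    int excess = walToRegions.size() - maxWals;
    for (Map.Entry<String, Set<String>> e : walToRegions.entrySet()) {
      if (excess <= 0) {
        break;
      }
      toFlush.addAll(e.getValue()); // these regions keep this old file alive
      excess--;
    }
    return toFlush;
  }

  public static void main(String[] args) {
    LinkedHashMap<String, Set<String>> wals = new LinkedHashMap<>();
    wals.put("wal.1", new LinkedHashSet<>(Arrays.asList("regionA")));
    wals.put("wal.2", new LinkedHashSet<>(Arrays.asList("regionA", "regionB")));
    wals.put("wal.3", new LinkedHashSet<>(Arrays.asList("regionC")));
    System.out.println(regionsToFlush(wals, 2)); // [regionA]
    System.out.println(regionsToFlush(wals, 1)); // [regionA, regionB]
  }
}
```

As the issue notes, proc wals have no equivalent step, so a procedure that never persists new state keeps its old file pinned.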
[jira] [Updated] (HBASE-21357) RS should abort if OOM in Reader thread
[ https://issues.apache.org/jira/browse/HBASE-21357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yang updated HBASE-21357: --- Resolution: Fixed Fix Version/s: 1.4.9 Status: Resolved (was: Patch Available) > RS should abort if OOM in Reader thread > --- > > Key: HBASE-21357 > URL: https://issues.apache.org/jira/browse/HBASE-21357 > Project: HBase > Issue Type: Bug >Affects Versions: 1.4.8 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 1.4.9 > > Attachments: HBASE-21357.branch-1.001.patch, > HBASE-21357.branch-1.001.patch > > > It is a bit strange: we abort the RS on OOM in the Listener thread, the > Responder thread and the CallRunner thread, but not in the Reader thread... > We should abort the RS if OOM happens in the Reader thread, too. If we do not, the reader > thread exits because of the OOM, and its selector closes. Later connections > handed to this reader will be ignored > {code} > try { > if (key.isValid()) { > if (key.isAcceptable()) > doAccept(key); > } > } catch (IOException ignored) { > if (LOG.isTraceEnabled()) LOG.trace("ignored", ignored); > } > {code} > leaving the calls of the client (or of the Master and other RSs) waiting until SocketTimeout. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
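The requested behavior can be sketched as a reader loop that turns an OutOfMemoryError into an explicit abort instead of letting the thread die quietly. This is a minimal illustration under assumptions: `ReaderAbortSketch`, its `Abortable` callback and `runReader` are hypothetical names and do not reproduce the attached patch or the RPC server's classes.

```java
import java.util.concurrent.atomic.AtomicReference;

final class ReaderAbortSketch {
  /** Hypothetical callback standing in for whatever aborts the server. */
  interface Abortable {
    void abort(String why, Throwable cause);
  }

  /** Runs the reader loop; on OOM it requests an abort rather than exiting silently. */
  static void runReader(Runnable readLoop, Abortable server) {
    try {
      readLoop.run();
    } catch (OutOfMemoryError oom) {
      // Without this, the thread ends, its selector closes, and connections
      // later handed to this reader would hang until they time out.
      server.abort("OOM in Reader thread", oom);
    }
  }

  public static void main(String[] args) {
    AtomicReference<String> reason = new AtomicReference<>();
    runReader(
        () -> { throw new OutOfMemoryError("simulated"); },
        (why, cause) -> reason.set(why + ": " + cause.getMessage()));
    System.out.println(reason.get()); // OOM in Reader thread: simulated
  }
}
```

The simulated error only exercises the control flow; a real OOM leaves little room for work in the handler, which is one reason to abort rather than try to recover.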
[jira] [Commented] (HBASE-21357) RS should abort if OOM in Reader thread
[ https://issues.apache.org/jira/browse/HBASE-21357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661659#comment-16661659 ] Allan Yang commented on HBASE-21357: Pushed to branch-1, thanks [~stack] for reviewing. > RS should abort if OOM in Reader thread > --- > > Key: HBASE-21357 > URL: https://issues.apache.org/jira/browse/HBASE-21357 > Project: HBase > Issue Type: Bug >Affects Versions: 1.4.8 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21357.branch-1.001.patch, > HBASE-21357.branch-1.001.patch > > > It is a bit strange: we abort the RS on OOM in the Listener thread, the > Responder thread and the CallRunner thread, but not in the Reader thread... > We should abort the RS if OOM happens in the Reader thread, too. If we do not, the reader > thread exits because of the OOM, and its selector closes. Later connections > handed to this reader will be ignored > {code} > try { > if (key.isValid()) { > if (key.isAcceptable()) > doAccept(key); > } > } catch (IOException ignored) { > if (LOG.isTraceEnabled()) LOG.trace("ignored", ignored); > } > {code} > leaving the calls of the client (or of the Master and other RSs) waiting until SocketTimeout. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661658#comment-16661658 ] Hadoop QA commented on HBASE-21344: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HBASE-21344 does not apply to branch-2. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.8.0/precommit-patchnames for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HBASE-21344 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945321/HBASE-21344-branch-2.0_v3.patch | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14837/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Fix For: 2.0.3 > > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. 
We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. > 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at >
[jira] [Updated] (HBASE-21376) Add some verbose log to MasterProcedureScheduler
[ https://issues.apache.org/jira/browse/HBASE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yang updated HBASE-21376: --- Status: Patch Available (was: Open) > Add some verbose log to MasterProcedureScheduler > > > Key: HBASE-21376 > URL: https://issues.apache.org/jira/browse/HBASE-21376 > Project: HBase > Issue Type: Sub-task >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21376.branch-2.0.001.patch > > > As discussed in HBASE-21364, we split the patch in HBASE-21364 in two: the > critical part has already been submitted in HBASE-21364 for branch-2.0 and > branch-2.1, but I also added some useful logs which need to be committed to all > branches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21376) Add some verbose log to MasterProcedureScheduler
[ https://issues.apache.org/jira/browse/HBASE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yang updated HBASE-21376: --- Attachment: HBASE-21376.branch-2.0.001.patch > Add some verbose log to MasterProcedureScheduler > > > Key: HBASE-21376 > URL: https://issues.apache.org/jira/browse/HBASE-21376 > Project: HBase > Issue Type: Sub-task >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21376.branch-2.0.001.patch > > > As discussed in HBASE-21364, we split the patch in HBASE-21364 in two: the > critical part has already been submitted in HBASE-21364 for branch-2.0 and > branch-2.1, but I also added some useful logs which need to be committed to all > branches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21344: -- Fix Version/s: 2.0.3 > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Fix For: 2.0.3 > > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at >
[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21344: -- Status: Patch Available (was: Open) > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at >
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661655#comment-16661655 ]

stack commented on HBASE-21344:
---

The change in HBaseTestingUtility is just formatting? Otherwise, patch seems good.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
> Issue Type: Bug
> Components: proc-v2
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Fix For: 2.0.3
> Attachments: HBASE-21344-branch-2.0.patch, HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch
[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files
[ https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661653#comment-16661653 ] Allan Yang commented on HBASE-21254: But we can use public Entry tryLockEntry(long id, long time) instead, we can still block the forceUpdateExecutor thread if some procedure is stuck. > Need to find a way to limit the number of proc wal files > > > Key: HBASE-21254 > URL: https://issues.apache.org/jira/browse/HBASE-21254 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Critical > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, > HBASE-21254-v3.patch, HBASE-21254.patch > > > For regionserver, we have a max wal file limitation, if we reach the > limitation, we will trigger a flush on specific regions so that we can delete > old wal files. But for proc wals, we do not have this mechanism, and it will > be worse after HBASE-21233, as if there is an old procedure which can not > make progress and do not persist its state, we need to keep the old proc wal > file for ever... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661646#comment-16661646 ]

Allan Yang edited comment on HBASE-21364 at 10/24/18 2:58 AM:
--

The patch without the verbose log has already been committed to branch-2.0 and branch-2.1, thanks! Opened HBASE-21376 to commit the verbose log.

was (Author: allan163):
The patch without verbose log has already committed to branch-2.0 and branch-2.1, thanks!

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0, 2.0.2
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Blocker
> Fix For: 2.1.1, 2.0.3
> Attachments: HBASE-21364.branch-2.0.001.patch, HBASE-21364.branch-2.0.002.patch
>
> After restoring the procedures from the Procedure WALs, we put the runnable procedures back into the queue to execute. The order was not a problem before HBASE-20846, since the first procedure to execute would acquire the lock itself. But the locks are restored after HBASE-20846, so if we execute a procedure without the lock before a procedure that holds the lock in the same queue, there is a race condition in which we may never be able to execute any of the procedures in that queue.
> The race condition is:
> 1. A procedure that needs to take the table's exclusive lock is put into the table's queue, but the table's shared lock is held by a Region Procedure. Since no one holds the exclusive lock, the queue is put into the run queue to execute. But soon the worker thread sees that the procedure can't execute because it doesn't hold the lock, so it stops executing and removes the queue from the run queue.
> 2. At the same time, the Region Procedure, which holds the table's shared lock and the region's exclusive lock, is put into the table's queue. But since the queue has already been added to the run queue, it won't be added again.
> 3. Because of step 1, the table's queue is removed from the run queue.
> 4. Then no one puts the table's queue back, so no worker will execute the procedures inside.
> A test case in the patch shows how.
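The interleaving in steps 1 to 4 of the description can be replayed in a few lines. This is a minimal single-threaded sketch under heavy simplification, not the MasterProcedureScheduler code: `Proc`, `TableQueue`, and `strands` are hypothetical names, lock state is reduced to one boolean, and the `lockHoldersFirst` flag models the idea in the issue title (lock-holding procedures go to the front) rather than the actual patch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy replay of the interleaving; Proc and TableQueue are hypothetical, not HBase classes. */
public class RunQueueRace {
  static final class Proc {
    final String name;
    final boolean holdsLock; // true if the lock was restored for this procedure

    Proc(String name, boolean holdsLock) {
      this.name = name;
      this.holdsLock = holdsLock;
    }
  }

  static final class TableQueue {
    final Deque<Proc> procs = new ArrayDeque<>();
    boolean inRunQueue;

    void add(Proc p) {
      procs.addLast(p);
      // A queue that is already in the run queue is not added a second time.
      if (!inRunQueue) {
        inRunQueue = true;
      }
    }
  }

  /**
   * Replays the interleaving with two restored procedures.
   * Returns true if the queue ends up holding work while absent from the run queue.
   */
  public static boolean strands(boolean lockHoldersFirst) {
    List<Proc> restored = new ArrayList<>();
    restored.add(new Proc("needs-exclusive-lock", false));
    restored.add(new Proc("region-proc-with-lock", true));
    if (lockHoldersFirst) {
      // Assumed ordering: procedures that already hold a lock go to the front.
      restored.sort((a, b) -> Boolean.compare(b.holdsLock, a.holdsLock));
    }

    TableQueue queue = new TableQueue();
    queue.add(restored.get(0));                             // step 1: queue enters the run queue
    boolean headCanRun = queue.procs.peekFirst().holdsLock; // worker inspects the head
    queue.add(restored.get(1));                             // step 2: no second run-queue entry
    if (headCanRun) {
      queue.procs.pollFirst();                              // head executes; queue stays scheduled
    } else {
      queue.inRunQueue = false;                             // step 3: worker drops the queue
    }
    return !queue.procs.isEmpty() && !queue.inRunQueue;     // step 4: work nobody will pick up
  }

  public static void main(String[] args) {
    System.out.println("restore order as-is: stranded=" + strands(false));
    System.out.println("lock holders first:  stranded=" + strands(true));
  }
}
```

In this model the queue is stranded only when the procedure without the lock is seen first; with the lock holder at the head, the worker runs it and the queue stays in the run queue.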
[jira] [Updated] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allan Yang updated HBASE-21364:
---
Resolution: Fixed
Status: Resolved (was: Patch Available)

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0, 2.0.2
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Blocker
> Fix For: 2.1.1, 2.0.3
> Attachments: HBASE-21364.branch-2.0.001.patch, HBASE-21364.branch-2.0.002.patch
[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661646#comment-16661646 ]

Allan Yang commented on HBASE-21364:

The patch without the verbose log has already been committed to branch-2.0 and branch-2.1, thanks!

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0, 2.0.2
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Blocker
> Fix For: 2.1.1, 2.0.3
> Attachments: HBASE-21364.branch-2.0.001.patch, HBASE-21364.branch-2.0.002.patch
[jira] [Created] (HBASE-21376) Add some verbose log to MasterProcedureScheduler
Allan Yang created HBASE-21376:
--

Summary: Add some verbose log to MasterProcedureScheduler
Key: HBASE-21376
URL: https://issues.apache.org/jira/browse/HBASE-21376
Project: HBase
Issue Type: Sub-task
Reporter: Allan Yang
Assignee: Allan Yang

As discussed in HBASE-21364, we divided the patch in HBASE-21364 into two. The critical part has already been submitted in HBASE-21364 to branch-2.0 and branch-2.1, but I also added some useful logs which need to be committed to all branches.
[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files
[ https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661644#comment-16661644 ]

Duo Zhang commented on HBASE-21254:
---

But we do need to review the implementation again. I found that the actual deletion of a root procedure is done later in CompletedProcedureCleaner, so a successful procedure could also block us from removing a file...

> Need to find a way to limit the number of proc wal files
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, HBASE-21254-v3.patch, HBASE-21254.patch
[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files
[ https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661643#comment-16661643 ]

Duo Zhang commented on HBASE-21254:
---

No, it will be executed in a separate thread. In the log rolling thread we just schedule a task into the forceUpdateExecutor.

> Need to find a way to limit the number of proc wal files
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, HBASE-21254-v3.patch, HBASE-21254.patch
[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files
[ https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661642#comment-16661642 ]

Allan Yang commented on HBASE-21254:

In forceUpdateProcedure, we acquire the execution lock before the force update:
{code}
private void forceUpdateProcedure(long procId) throws IOException {
  IdLock.Entry lockEntry = procExecutionLock.getLockEntry(procId);
  try {
{code}
We will wait here forever if the procedure is stuck. And IIRC, forceUpdateProcedure is executed in the procedure log rolling thread, so that thread will be stuck too.

> Need to find a way to limit the number of proc wal files
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, HBASE-21254-v3.patch, HBASE-21254.patch
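The worry in this comment is that a blocking lock acquisition never returns while a stuck procedure holds the execution lock. A minimal sketch of a timed acquisition follows, built on `java.util.concurrent` rather than on HBase's IdLock. `TimedProcLock` and its methods are hypothetical names used only for illustration, and nothing here claims to match the behavior of the real `tryLockEntry`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical per-procedure lock table; a stand-in, not the HBase IdLock class. */
public class TimedProcLock {
  private final ConcurrentHashMap<Long, ReentrantLock> locks = new ConcurrentHashMap<>();

  /** Waits at most timeoutMs for the lock of procId; returns false instead of blocking forever. */
  public boolean tryLock(long procId, long timeoutMs) throws InterruptedException {
    ReentrantLock lock = locks.computeIfAbsent(procId, id -> new ReentrantLock());
    return lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
  }

  /** Releases the lock of procId; must be called by the thread that acquired it. */
  public void unlock(long procId) {
    locks.get(procId).unlock();
  }

  /** Runs update under the lock if it can be taken in time; returns false if it was skipped. */
  public boolean forceUpdateIfFree(long procId, long timeoutMs, Runnable update)
      throws InterruptedException {
    if (!tryLock(procId, timeoutMs)) {
      return false; // the procedure is busy or stuck; the caller can retry on a later roll
    }
    try {
      update.run();
      return true;
    } finally {
      unlock(procId);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    TimedProcLock table = new TimedProcLock();
    boolean ran = table.forceUpdateIfFree(7L, 100, () -> System.out.println("updated pid=7"));
    System.out.println("ran=" + ran);
  }
}
```

The design choice is to skip a busy procedure and report that to the caller, so the thread doing the update is never parked indefinitely behind one stuck procedure.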
[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files
[ https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661628#comment-16661628 ]

Duo Zhang commented on HBASE-21254:
---

Which lock? I do not get your point, about the 'wait for the lock when rolling log'...

> Need to find a way to limit the number of proc wal files
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, HBASE-21254-v3.patch, HBASE-21254.patch
[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files
[ https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661625#comment-16661625 ]

Allan Yang commented on HBASE-21254:

I have a question about this one: what if a procedure is stuck and we can't get its lock? Will the log roll get stuck waiting for the lock?

> Need to find a way to limit the number of proc wal files
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, HBASE-21254-v3.patch, HBASE-21254.patch
[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Singhal updated HBASE-21344:
--
Attachment: HBASE-21344-branch-2.0_v3.patch

> hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
> Issue Type: Bug
> Components: proc-v2
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch
[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX
[ https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661612#comment-16661612 ]

Duo Zhang commented on HBASE-21372:
---

TRSP uses the same config to decide whether to give up retrying.

> Set hbase.assignment.maximum.attempts to Long.MAX
> -
>
> Key: HBASE-21372
> URL: https://issues.apache.org/jira/browse/HBASE-21372
> Project: HBase
> Issue Type: Sub-task
> Components: amv2
> Reporter: stack
> Assignee: stack
> Priority: Major
> Attachments: HBASE-21372.branch-2.1.001.patch, HBASE-21372.branch-2.1.001.patch
>
> From parent issue, [~allan163] suggests that we not give up on assign unless there is a change -- an SCP triggers failure -- or at the extreme, an operator intervenes. This jibes w/ how we're thinking about assign (or to put it another way, we have no handling for the case where we exhaust retries).
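The effect of raising the attempt limit can be shown with a plain retry loop. This is only a sketch under assumptions: `AssignRetry` and `retry` are hypothetical names, the loop does not read `hbase.assignment.maximum.attempts`, and it is not how the assignment procedures are implemented. It only shows that a bounded limit can give up on an operation that would have succeeded later, while a limit of `Long.MAX_VALUE` in practice never gives up.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical bounded-retry helper; not HBase code. */
public class AssignRetry {
  /**
   * Calls attempt until it returns true or maxAttempts tries have been made.
   * Always makes at least one try. Returns the number of tries used, or -1 if exhausted.
   */
  public static long retry(BooleanSupplier attempt, long maxAttempts) {
    for (long tries = 1; ; tries++) {
      if (attempt.getAsBoolean()) {
        return tries;
      }
      if (tries >= maxAttempts) {
        return -1; // retries exhausted; the caller has to handle this case
      }
    }
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // An operation that only succeeds on its 12th try.
    long bounded = retry(() -> ++calls[0] >= 12, 10);
    calls[0] = 0;
    long unbounded = retry(() -> ++calls[0] >= 12, Long.MAX_VALUE);
    System.out.println("limit 10: " + bounded + ", limit Long.MAX_VALUE: " + unbounded);
  }
}
```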
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661613#comment-16661613 ] Duo Zhang commented on HBASE-21363: --- Any other concerns? [~allan163]. > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363-v3.patch, HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661611#comment-16661611 ]

Hadoop QA commented on HBASE-21363:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 17s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 15s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 30s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 8s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 10s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m 5s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21363 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945314/HBASE-21363-v3.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 1194f80ac5e4 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 1f437ac221 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14835/testReport/ |
| Max. process+thread count | 278 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14835/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> Rewrite the
[jira] [Comment Edited] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX
[ https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661606#comment-16661606 ] Allan Yang edited comment on HBASE-21372 at 10/24/18 2:06 AM: -- +1 for it. TransitRegionStateProcedure was introduced in branch-2+ by [~Apache9]; can [~Apache9] confirm that in branch-2+ we already retry forever for assignment? was (Author: allan163): +1 for it, TransitRegionStateProcedure was introduced in branch-2+ by [~Apache9]; can [~Apache9] confirm that in branch-2+ we already retry forever for assignment? > Set hbase.assignment.maximum.attempts to Long.MAX > - > > Key: HBASE-21372 > URL: https://issues.apache.org/jira/browse/HBASE-21372 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Reporter: stack >Assignee: stack >Priority: Major > Attachments: HBASE-21372.branch-2.1.001.patch, > HBASE-21372.branch-2.1.001.patch > > > From parent issue, [~allan163] suggests that we not give up on assign unless > there is a change -- an SCP triggers failure -- or at the extreme, an operator > intervenes. This jibes w/ how we're thinking about assign (or to put it > another way, we have no handling for the case where we exhaust retries). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX
[ https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661606#comment-16661606 ] Allan Yang commented on HBASE-21372: +1 for it, TransitRegionStateProcedure was introduced in branch-2+ by [~Apache9]; can [~Apache9] confirm that in branch-2+ we already retry forever for assignment? > Set hbase.assignment.maximum.attempts to Long.MAX > - > > Key: HBASE-21372 > URL: https://issues.apache.org/jira/browse/HBASE-21372 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Reporter: stack >Assignee: stack >Priority: Major > Attachments: HBASE-21372.branch-2.1.001.patch, > HBASE-21372.branch-2.1.001.patch > > > From parent issue, [~allan163] suggests that we not give up on assign unless > there is a change -- an SCP triggers failure -- or at the extreme, an operator > intervenes. This jibes w/ how we're thinking about assign (or to put it > another way, we have no handling for the case where we exhaust retries). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661593#comment-16661593 ] Allan Yang commented on HBASE-21364: The checkstyle error is not valid: we added so many comments to the method that it exceeds 150 lines. Will commit the actual fix to branch-2.1 and branch-2.0, and open another issue to commit the verbose code to all branches as [~Apache9] said. > Procedure holds the lock should put to front of the queue after restart > --- > > Key: HBASE-21364 > URL: https://issues.apache.org/jira/browse/HBASE-21364 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Blocker > Fix For: 2.1.1, 2.0.3 > > Attachments: HBASE-21364.branch-2.0.001.patch, > HBASE-21364.branch-2.0.002.patch > > > After restoring the procedures from the Procedure WALs, we put the runnable > procedures back into the queue to execute. The order was not a problem before > HBASE-20846, since the first one to execute would acquire the lock itself. But > since the locks are restored after HBASE-20846, if we execute a procedure > without the lock before a procedure with the lock in the same queue, > there is a race condition in which we may not be able to execute any of the procedures > in the same queue. > The race condition is: > 1. A procedure that needs to take the table's exclusive lock is put into the > table's queue, but the table's shared lock is held by a Region Procedure. > Since no one holds the exclusive lock, the queue is put on the run queue to > execute. But soon the worker thread sees that the procedure can't execute because > it doesn't hold the lock, so it stops executing and removes the queue from the > run queue. > 2. At the same time, the Region Procedure, which holds the table's shared lock > and the region's exclusive lock, is put into the table's queue. But since the > queue has already been added to the run queue, it won't be added again. > 3. Because of step 1, the table's queue is removed from the run queue. > 4. Then no one puts the table's queue back, so no worker will execute > the procedures inside. > A test case in the patch shows how. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
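The ordering idea in the HBASE-21364 title (lock-holding procedures go to the front of the queue after restart) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the attached patches: `RestoreOrderSketch`, `Proc`, `holdsLock` and `restore` are hypothetical names, and the real scheduler has per-table queues, shared and exclusive locks, and a run queue that are not modelled here.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Illustrative only: hypothetical types, not HBase's scheduler classes. */
class RestoreOrderSketch {

    /** Hypothetical stand-in for a procedure loaded from the procedure WALs. */
    static final class Proc {
        final String name;
        final boolean holdsLock; // true if a lock was restored together with the procedure

        Proc(String name, boolean holdsLock) {
            this.name = name;
            this.holdsLock = holdsLock;
        }
    }

    /**
     * Builds a queue in which every lock-holding procedure precedes every
     * procedure without a lock; the order inside each group is unchanged.
     */
    static Deque<Proc> restore(List<Proc> loaded) {
        Deque<Proc> queue = new ArrayDeque<>();
        // First pass: procedures that already hold a lock go to the front.
        for (Proc p : loaded) {
            if (p.holdsLock) {
                queue.addLast(p);
            }
        }
        // Second pass: everything else follows.
        for (Proc p : loaded) {
            if (!p.holdsLock) {
                queue.addLast(p);
            }
        }
        return queue;
    }

    public static void main(String[] args) {
        Deque<Proc> queue = restore(Arrays.asList(
            new Proc("table procedure (needs exclusive lock)", false),
            new Proc("region procedure (holds shared lock)", true)));
        for (Proc p : queue) {
            System.out.println(p.name);
        }
    }
}
```

With this ordering the worker polls the lock holder first, so the queue is not dropped from the run queue merely because its head cannot acquire a lock that a later entry already holds.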
[jira] [Commented] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"
[ https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661584#comment-16661584 ] Hadoop QA commented on HBASE-21373: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 1s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | || || || || {color:brown} branch-1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 0s{color} | {color:green} branch-1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} branch-1 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 2m 49s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 22s{color} | {color:red} hbase-server: The patch generated 2 new + 
37 unchanged - 0 fixed = 39 total (was 37) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 2m 45s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 1m 44s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}130m 40s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}150m 47s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.procedure.TestFailedProcCleanup | | | hadoop.hbase.mapreduce.TestLoadIncrementalHFiles | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:61288f8 | | JIRA Issue | HBASE-21373 | | JIRA Patch URL |
[jira] [Created] (HBASE-21375) Forward port "HBASE-21364 Procedure holds the lock should put to front of the queue after restart"
Duo Zhang created HBASE-21375: - Summary: Forward port "HBASE-21364 Procedure holds the lock should put to front of the queue after restart" Key: HBASE-21375 URL: https://issues.apache.org/jira/browse/HBASE-21375 Project: HBase Issue Type: Sub-task Components: proc-v2 Reporter: Duo Zhang Assignee: Duo Zhang Fix For: 3.0.0, 2.2.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661580#comment-16661580 ] stack commented on HBASE-21344: --- Make new patch [~an...@apache.org]? Thanks. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at >
[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
[ https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661576#comment-16661576 ] Hadoop QA commented on HBASE-21371: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 31s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 9s{color} | {color:green} hbase-resource-bundle in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 9s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 12m 0s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21371 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945311/HBASE-21371.master.001.patch | | Optional Tests | dupname asflicense javac javadoc unit xml | | uname | Linux 92319fb5b492 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 1f437ac221 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14834/testReport/ | | Max. process+thread count | 87 (vs. ulimit of 1) | | modules | C: hbase-resource-bundle U: hbase-resource-bundle | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14834/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. 
> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license > error > -- > > Key: HBASE-21371 > URL: https://issues.apache.org/jira/browse/HBASE-21371 > Project: HBase > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HBASE-21371.master.001.patch > > > Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional > licenses that break HBase's license check plugin. > CDDL/GPLv2+CE license > {quote}This product includes JavaBeans Activation Framework API jar licensed > under the CDDL/GPLv2+CE. > CDDL or GPL version 2 plus the Classpath Exception > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > javax.activation > javax.activation-api > 1.2.0 > maven central search > g:javax.activation AND a:javax.activation-api AND v:1.2.0 > project website > [http://java.net/all/javax.activation-api/] > project source > [https://github.com/javaee/activation/javax.activation-api] > {quote} > Bouncy
[jira] [Comment Edited] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661554#comment-16661554 ] Ankit Singhal edited comment on HBASE-21344 at 10/24/18 1:36 AM: - {quote}What you doing here from patch?{quote} I'm just removing the duplicate tableStateManager.start() (and keeping the tableStateManager.start() after checking that meta is actually online). The test in the patch checks that, even if meta is in the OPENING state, the SCP can still succeed, so we don't need to change the state of the meta znode to offline during FAILED_OPEN of the assign procedure (this meta znode state will in turn also help in avoiding IMP if meta is in transition due to a server crash). Anyway, a sub-task to increase the number of Assign max attempts will not let the call go down the FAILED_OPEN path (I think). {quote} 1096 Optional optProc = this.procedureExecutor.getProcedures().stream() 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o) 1098 .filter(s -> s.hasMetaTableRegion()).findAny(); You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches? {quote} Yes, my bad, we don't need this particular change. was (Author: an...@apache.org): {quote}What you doing here from patch?{quote} I'm just removing the duplicate tableStateManager.start() (and keeping the tableStateManager.start() after checking that meta is actually online). The test in the patch checks that, even if meta is in the OPENING state, the SCP can still succeed, so we don't need to change the state of the meta znode to offline during FAILED_OPEN of the assign procedure (this meta znode state will in turn also help in avoiding IMP if meta is in transition due to a server crash). Anyway, a sub-task to increase the no. of Assign procedure will not let the call go down the FAILED_OPEN path (I think). 
{quote} 1096 Optional optProc = this.procedureExecutor.getProcedures().stream() 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o) 1098 .filter(s -> s.hasMetaTableRegion()).findAny(); You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches? {quote} Yes, My bad, we don't need this particular change. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at >
[jira] [Updated] (HBASE-21325) Force to terminate regionserver when abort hang in somewhere
[ https://issues.apache.org/jira/browse/HBASE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-21325: --- Attachment: HBASE-21325.master.003.patch > Force to terminate regionserver when abort hang in somewhere > > > Key: HBASE-21325 > URL: https://issues.apache.org/jira/browse/HBASE-21325 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang >Assignee: Guanghao Zhang >Priority: Major > Attachments: HBASE-21325.master.001.patch, > HBASE-21325.master.001.patch, HBASE-21325.master.002.patch, > HBASE-21325.master.003.patch > > > When testing sync replication, I found that if I transition the remote cluster > to DA while the local cluster is still in A, the region server hangs > on shutdown. As the fsOk flag only tests the local cluster (which is > reasonable), we enter waitOnAllRegionsToClose, and since the WAL is > broken (the remote WAL directory is gone), we will never succeed. This > leads to an infinite wait inside waitOnAllRegionsToClose. > So I think we should have an upper bound on the wait time in the > waitOnAllRegionsToClose method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
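The upper bound proposed in HBASE-21325 amounts to replacing an unbounded polling loop with one that gives up at a deadline. A minimal sketch of that pattern follows; it is not taken from the attached patches, and `BoundedWaitSketch`, `waitUntil`, `timeoutMs` and `pollMs` are hypothetical names. The actual waitOnAllRegionsToClose logic in the region server is not reproduced here.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Illustrative only: a generic bounded wait, not the region server's code. */
class BoundedWaitSketch {

    /**
     * Polls {@code done} until it returns true or {@code timeoutMs} elapses.
     * Returns true if the condition was met, false on timeout or interrupt.
     */
    static boolean waitUntil(BooleanSupplier done, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!done.getAsBoolean()) {
            // Compare via subtraction so the check stays correct if nanoTime wraps.
            if (System.nanoTime() - deadline >= 0) {
                return false; // upper bound reached: the caller can force termination
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A condition that never becomes true stands in for regions that cannot close.
        boolean closed = waitUntil(() -> false, 200, 20);
        System.out.println(closed ? "all regions closed" : "timed out, forcing termination");
    }
}
```

A false return gives the caller a point at which to escalate, for example by terminating the process, instead of waiting forever on a WAL that can no longer be written.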
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661570#comment-16661570 ] Duo Zhang commented on HBASE-21363: --- Add simple comments for the ProcedureWALFormat.load method. > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363-v3.patch, HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661554#comment-16661554 ] Ankit Singhal edited comment on HBASE-21344 at 10/24/18 1:30 AM: - {quote}What you doing here from patch?{quote} I'm just removing the duplicate tableStateManager.start() (and keeping the tableStateManager.start() after checking that meta is actually online). The test in the patch checks that, even if meta is in the OPENING state, the SCP can still succeed, so we don't need to change the state of the meta znode to offline during FAILED_OPEN of the assign procedure (this meta znode state will in turn also help in avoiding IMP if meta is in transition due to a server crash). Anyway, a sub-task to increase the no. of Assign procedure will not let the call go down the FAILED_OPEN path (I think). {quote} 1096 Optional optProc = this.procedureExecutor.getProcedures().stream() 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o) 1098 .filter(s -> s.hasMetaTableRegion()).findAny(); You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches? {quote} Yes, my bad, we don't need this particular change. was (Author: an...@apache.org): {quote}What you doing here from patch?{quote} I'm just removing the duplicate tableStateManager.start() (and keeping the tableStateManager.start() after checking that meta is actually online). 
And the test in the patch is to check if we have OPENING state for meta also, still SCP can succeed , so we don't need to change state of meta znode to offline during FAILED_OPEN of assign procedure(this meta znode state will intern also help in avoiding IMP if meta is in transition due to Server crash) {quote} 1096 Optional optProc = this.procedureExecutor.getProcedures().stream() 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o) 1098 .filter(s -> s.hasMetaTableRegion()).findAny(); You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches? {quote} Yes, My bad, we don't need this particular change. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) >
[jira] [Updated] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21363: -- Attachment: HBASE-21363-v3.patch > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363-v3.patch, HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661554#comment-16661554 ] Ankit Singhal edited comment on HBASE-21344 at 10/24/18 1:24 AM: - {quote}What you doing here from patch?{quote} I'm just removing the duplicate tableStateManager.start() call (and keeping the tableStateManager.start() that runs after checking meta is actually online). The test in the patch checks that SCP can still succeed even when meta is in the OPENING state, so we don't need to change the state of the meta znode to offline during FAILED_OPEN of the assign procedure (this meta znode state will in turn also help in avoiding IMP if meta is in transition due to a server crash). {quote} 1096 Optional optProc = this.procedureExecutor.getProcedures().stream() 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o) 1098 .filter(s -> s.hasMetaTableRegion()).findAny(); You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches? {quote} Yes, my bad, we don't need this particular change. was (Author: an...@apache.org): {quote}1096 Optional optProc = this.procedureExecutor.getProcedures().stream() 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o) 1098 .filter(s -> s.hasMetaTableRegion()).findAny(); You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches? {quote} Yes, My bad, we don't need this particular change. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. 
hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. > 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) 
> at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at >
[jira] [Updated] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
[ https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HBASE-21371: Attachment: (was: HBASE-21371.001.patch) > Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license > error > -- > > Key: HBASE-21371 > URL: https://issues.apache.org/jira/browse/HBASE-21371 > Project: HBase > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HBASE-21371.master.001.patch > > > Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional > licenses that break HBase's license check plugin. > CDDL/GPLv2+CE license > {quote}This product includes JavaBeans Activation Framework API jar licensed > under the CDDL/GPLv2+CE. > CDDL or GPL version 2 plus the Classpath Exception > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > javax.activation > javax.activation-api > 1.2.0 > maven central search > g:javax.activation AND a:javax.activation-api AND v:1.2.0 > project website > [http://java.net/all/javax.activation-api/] > project source > [https://github.com/javaee/activation/javax.activation-api] > {quote} > Bouncy Castle License > {quote}– > This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, > and CRMF APIs licensed under the Bouncy Castle Licence. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. 
> More info on the dependency: > org.bouncycastle > bcpkix-jdk15on > 1.60 > maven central search > g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60 > project website > [http://www.bouncycastle.org/java.html] > project source > [https://github.com/bcgit/bc-java] > – > {quote} > > And a long list of "Apache Software License - Version 2.0" licensed Jetty > dependencies like this: > {quote} > This product includes Jetty :: Servlet Annotations licensed under the Apache > Software License - Version 2.0. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > org.eclipse.jetty > jetty-annotations > 9.3.19.v20170502 > maven central search > g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502 > project website > [http://www.eclipse.org/jetty] > project source > [https://github.com/eclipse/jetty.project/jetty-annotations] > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
[ https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HBASE-21371: Status: Patch Available (was: Open) > Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license > error > -- > > Key: HBASE-21371 > URL: https://issues.apache.org/jira/browse/HBASE-21371 > Project: HBase > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HBASE-21371.master.001.patch
[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
[ https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661560#comment-16661560 ] Wei-Chiu Chuang commented on HBASE-21371: - Yes, this will need to be done wherever we ship the relevant jar. If we already have bouncycastle as a dependency, shouldn't we have an entry for it though? I've checked: the so-called "Bouncy Castle License" is literally the same as the MIT License, character by character. Interestingly, I found that LICENSE.vm contains this hard-coded license text: {quote}Bouncycastle is released under the MIT license (available above), and is Copyright (c) 2000 - 2006 The Legion Of The Bouncy Castle. {quote} Maybe there's a bug in the license checker plugin and it didn't find Bouncycastle before, so you had to add this license text manually? > Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license > error > -- > > Key: HBASE-21371 > URL: https://issues.apache.org/jira/browse/HBASE-21371 > Project: HBase > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HBASE-21371.master.001.patch > > > Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional > licenses that break HBase's license check plugin. > CDDL/GPLv2+CE license > {quote}This product includes JavaBeans Activation Framework API jar licensed > under the CDDL/GPLv2+CE. > CDDL or GPL version 2 plus the Classpath Exception > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. 
> More info on the dependency: > javax.activation > javax.activation-api > 1.2.0 > maven central search > g:javax.activation AND a:javax.activation-api AND v:1.2.0 > project website > [http://java.net/all/javax.activation-api/] > project source > [https://github.com/javaee/activation/javax.activation-api] > {quote} > Bouncy Castle License > {quote}– > This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, > and CRMF APIs licensed under the Bouncy Castle Licence. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > org.bouncycastle > bcpkix-jdk15on > 1.60 > maven central search > g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60 > project website > [http://www.bouncycastle.org/java.html] > project source > [https://github.com/bcgit/bc-java] > – > {quote} > > And a long list of "Apache Software License - Version 2.0" licensed Jetty > dependencies like this: > {quote} > This product includes Jetty :: Servlet Annotations licensed under the Apache > Software License - Version 2.0. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > org.eclipse.jetty > jetty-annotations > 9.3.19.v20170502 > maven central search > g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502 > project website > [http://www.eclipse.org/jetty] > project source > [https://github.com/eclipse/jetty.project/jetty-annotations] > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
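The comparison described in the comment above (the "Bouncy Castle License" text versus the MIT License text) could be repeated with a small helper along these lines. This is only a sketch: the class and method names are hypothetical, the two strings are short placeholder fragments rather than the real license texts, and the check ignores whitespace differences, which is looser than the character-by-character comparison the comment reports.

```java
public class LicenseTextCompare {

    // Collapse every run of whitespace into a single space so that
    // line wrapping and indentation do not affect the comparison.
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    // True when both texts are identical apart from whitespace layout.
    static boolean sameLicenseBody(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        // Placeholder fragments (assumption): substitute the full license texts.
        String first = "Permission is hereby granted, free of charge,\n to any person obtaining a copy";
        String second = "Permission is hereby granted, free of charge, to any person   obtaining a copy";
        System.out.println(sameLicenseBody(first, second));
    }
}
```

If the two full texts compare equal this way, the remaining question from the thread is only how the license checker should classify the "Bouncy Castle Licence" name.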
[jira] [Updated] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
[ https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HBASE-21371: Attachment: HBASE-21371.master.001.patch > Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license > error > -- > > Key: HBASE-21371 > URL: https://issues.apache.org/jira/browse/HBASE-21371 > Project: HBase > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HBASE-21371.001.patch, HBASE-21371.master.001.patch
[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
[ https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661558#comment-16661558 ] Wei-Chiu Chuang commented on HBASE-21371: - {code:java} $ mvn dependency:tree -Dhadoop.profile=3.0 -Dhadoop-three.version=3.3.0-SNAPSHOT{code} javax.activation is included indirectly from hadoop-common: {quote}[INFO] +- org.apache.hadoop:hadoop-common:jar:3.3.0-SNAPSHOT:compile [INFO] | +- javax.activation:javax.activation-api:jar:1.2.0:runtime {quote} bouncycastle is included in test jars only: {quote}[INFO] +- org.apache.hadoop:hadoop-minicluster:jar:3.3.0-SNAPSHOT:test [INFO] | - org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:3.3.0-SNAPSHOT:test [INFO] | +- org.bouncycastle:bcprov-jdk15on:jar:1.60:test [INFO] | - org.bouncycastle:bcpkix-jdk15on:jar:1.60:test {quote} the new jetty dependencies are included in test jars only too: {quote}[INFO] +- org.apache.hadoop:hadoop-minicluster:jar:3.3.0-SNAPSHOT:test [INFO] | +- org.apache.hadoop:hadoop-yarn-server-tests:test-jar:tests:3.3.0-SNAPSHOT:test [INFO] | | +- org.apache.hadoop:hadoop-yarn-server-nodemanager:jar:3.3.0-SNAPSHOT:test [INFO] | | | +- org.eclipse.jetty.websocket:javax-websocket-server-impl:jar:9.3.19.v20170502:test {quote} > Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license > error > -- > > Key: HBASE-21371 > URL: https://issues.apache.org/jira/browse/HBASE-21371 > Project: HBase > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HBASE-21371.001.patch, HBASE-21371.master.001.patch > > > Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional > licenses that break HBase's license check plugin. > CDDL/GPLv2+CE license > {quote}This product includes JavaBeans Activation Framework API jar licensed > under the CDDL/GPLv2+CE. 
> CDDL or GPL version 2 plus the Classpath Exception > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > javax.activation > javax.activation-api > 1.2.0 > maven central search > g:javax.activation AND a:javax.activation-api AND v:1.2.0 > project website > [http://java.net/all/javax.activation-api/] > project source > [https://github.com/javaee/activation/javax.activation-api] > {quote} > Bouncy Castle License > {quote}– > This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, > and CRMF APIs licensed under the Bouncy Castle Licence. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > org.bouncycastle > bcpkix-jdk15on > 1.60 > maven central search > g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60 > project website > [http://www.bouncycastle.org/java.html] > project source > [https://github.com/bcgit/bc-java] > – > {quote} > > And a long list of "Apache Software License - Version 2.0" licensed Jetty > dependencies like this: > {quote} > This product includes Jetty :: Servlet Annotations licensed under the Apache > Software License - Version 2.0. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. 
> More info on the dependency: > org.eclipse.jetty > jetty-annotations > 9.3.19.v20170502 > maven central search > g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502 > project website > [http://www.eclipse.org/jetty] > project source > [https://github.com/eclipse/jetty.project/jetty-annotations] > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
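The scope analysis in the comment above (javax.activation arriving at runtime scope, Bouncy Castle and Jetty only at test scope) relies on the last colon-separated field of each `mvn dependency:tree` line. A minimal sketch of reading that field follows; the class and method names are hypothetical, and the sample lines are copied from the comment as it appears in this archive, where some tree-drawing characters may have been lost.

```java
public class DependencyScopes {

    // Returns the Maven coordinate on a dependency:tree line,
    // or null when the line does not contain one.
    static String coordinate(String line) {
        String s = line.trim()
            .replaceFirst("^\\[INFO\\]", "")      // drop the log-level prefix
            .replaceFirst("^[\\s|+\\\\-]+", "")   // drop tree-drawing characters
            .trim();
        return s.contains(":") ? s : null;
    }

    // The scope is the final field of groupId:artifactId:type[:classifier]:version:scope.
    static String scope(String coordinate) {
        return coordinate.substring(coordinate.lastIndexOf(':') + 1);
    }

    public static void main(String[] args) {
        String[] lines = {
            "[INFO] | +- javax.activation:javax.activation-api:jar:1.2.0:runtime",
            "[INFO] | - org.bouncycastle:bcpkix-jdk15on:jar:1.60:test",
            "[INFO] | | | +- org.eclipse.jetty.websocket:javax-websocket-server-impl:jar:9.3.19.v20170502:test"
        };
        for (String line : lines) {
            String c = coordinate(line);
            if (c != null) {
                System.out.println(scope(c) + "  " + c);
            }
        }
    }
}
```

Grouping the output by scope makes it easier to see which of the newly reported licenses belong to artifacts that would actually be shipped and which appear only in test classpaths.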
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661554#comment-16661554 ] Ankit Singhal commented on HBASE-21344: --- {quote}1096 Optional optProc = this.procedureExecutor.getProcedures().stream() 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o) 1098 .filter(s -> s.hasMetaTableRegion()).findAny(); You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches? {quote} Yes, My bad, we don't need this particular change. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at >
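The review question quoted in the comment above (whether the method should report every ServerCrashProcedure or only those carrying meta) can be illustrated with a self-contained sketch. The `Procedure` and `ServerCrashProcedure` classes below are hypothetical stand-ins, not the real HBase classes, and the method names `allScps` and `anyMetaScp` are assumptions made only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class ScpFilterSketch {

    // Hypothetical stand-ins for the HBase procedure types (assumption).
    static class Procedure { }

    static class ServerCrashProcedure extends Procedure {
        private final boolean carryingMeta;
        ServerCrashProcedure(boolean carryingMeta) { this.carryingMeta = carryingMeta; }
        boolean hasMetaTableRegion() { return carryingMeta; }
    }

    // Generic search: every ServerCrashProcedure, whether or not it carries meta.
    static List<ServerCrashProcedure> allScps(List<Procedure> procedures) {
        return procedures.stream()
            .filter(p -> p instanceof ServerCrashProcedure)
            .map(p -> (ServerCrashProcedure) p)
            .collect(Collectors.toList());
    }

    // Narrow search: same shape as the quoted patch lines, limited to meta.
    static Optional<ServerCrashProcedure> anyMetaScp(List<Procedure> procedures) {
        return allScps(procedures).stream()
            .filter(ServerCrashProcedure::hasMetaTableRegion)
            .findAny();
    }

    public static void main(String[] args) {
        List<Procedure> procedures = Arrays.asList(
            new Procedure(),
            new ServerCrashProcedure(false),
            new ServerCrashProcedure(true));
        System.out.println("SCPs found: " + allScps(procedures).size());
        System.out.println("meta SCP present: " + anyMetaScp(procedures).isPresent());
    }
}
```

Keeping the generic search separate from the meta-only filter matches the outcome of the exchange: the extra `hasMetaTableRegion()` filter was acknowledged as unnecessary for the generic method.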
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661555#comment-16661555 ] Duo Zhang commented on HBASE-21363: --- OK, it is in ProcedureWALFormat... Let me update the patch. > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661553#comment-16661553 ] Duo Zhang commented on HBASE-21363: --- [~allan163] Where is the code where we call resetModified on the tracker in ProcedureWALFormatReader? I can't find it. > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661548#comment-16661548 ] stack commented on HBASE-21344: --- bq. Actually, I'm working against branch-2.0 only, here you can see tableStateManager is started 2 times, My bad. Indeed, 2.0 has this. 2.1 does not. What you doing here from patch? 1096 Optional optProc = this.procedureExecutor.getProcedures().stream() 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o) 1098 .filter(s -> s.hasMetaTableRegion()).findAny(); You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches? > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at >
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661544#comment-16661544 ] Ankit Singhal commented on HBASE-21344: --- bq. You don't seem to be working against the tip of branch-2.0 or branch-2.1. You seem to be working in your own branch? Is that so? If so, startup has changed pretty radically since 2.0.0. Actually, I'm working against branch-2.0 only; there you can see that tableStateManager is started two times. At this instance, we only wait for IMP, which is OK during the first start after deploy but not when there are SCPs. https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L929 Here, TableStateManager is started after meta is actually online (which is correct). https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L958 > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. 
We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. > 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at >
[jira] [Commented] (HBASE-21224) Handle compaction queue duplication
[ https://issues.apache.org/jira/browse/HBASE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661536#comment-16661536 ] Hadoop QA commented on HBASE-21224: ---
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 57s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 1s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 3s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}285m 41s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}325m 6s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestFromClientSide |
| | hadoop.hbase.namespace.TestNamespaceAuditor |
| | hadoop.hbase.client.TestSnapshotTemporaryDirectoryWithRegionReplicas |
| | hadoop.hbase.replication.TestReplicationKillSlaveRS |
| | hadoop.hbase.client.TestSnapshotDFSTemporaryDirectory |
| | hadoop.hbase.quotas.TestSpaceQuotas |
| | hadoop.hbase.regionserver.TestRegionReplicaFailover |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21224 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945266/HBASE-21224-master.004.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 9607297f8ebf 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 3b68e5393e |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661533#comment-16661533 ] stack commented on HBASE-21364: --- Ok. Thanks. Yeah, want to cut an RC0 if I can. Thanks for working on this. > Procedure holds the lock should put to front of the queue after restart > --- > > Key: HBASE-21364 > URL: https://issues.apache.org/jira/browse/HBASE-21364 > Project: HBase > Issue Type: Sub-task > Affects Versions: 2.1.0, 2.0.2 > Reporter: Allan Yang > Assignee: Allan Yang > Priority: Blocker > Fix For: 2.1.1, 2.0.3 > > Attachments: HBASE-21364.branch-2.0.001.patch, > HBASE-21364.branch-2.0.002.patch > > > After restoring the procedures from the Procedure WALs, we put the runnable procedures back into the queue to execute. The order was not a problem before HBASE-20846, since the first procedure to execute would acquire the lock itself. But locks are restored after HBASE-20846, so if we execute a procedure without the lock before a procedure with the lock in the same queue, there is a race condition in which we may not be able to execute any of the procedures in that queue at all. > The race condition is: > 1. A procedure that needs to take the table's exclusive lock is put into the table's queue, but the table's shared lock is held by a Region Procedure. Since no one holds the exclusive lock, the queue is put into the run queue to execute. But soon the worker thread sees that the procedure can't execute because it doesn't hold the lock, so it stops executing and removes the queue from the run queue. > 2. At the same time, the Region Procedure, which holds the table's shared lock and the region's exclusive lock, is put into the table's queue. But since the queue has already been added to the run queue, it is not added again. > 3. Because of step 1, the table's queue is removed from the run queue. > 4. Then no one puts the table's queue back, so no worker will execute the procedures inside. > A test case in the patch shows how. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
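The ordering that the issue title asks for can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `RestoreOrderSketch`, `Proc`, and `enqueueAfterRestart` are hypothetical names, not HBase classes, and the real scheduler keeps per-table queues and lock state that are left out here. The sketch shows only the idea that a restored procedure which already holds a lock is placed ahead of procedures that do not.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; none of these names are HBase classes.
final class RestoreOrderSketch {

  /** Minimal stand-in for a procedure restored from the procedure WALs. */
  static final class Proc {
    final long pid;
    final boolean holdsLock; // true if the lock was restored along with the procedure

    Proc(long pid, boolean holdsLock) {
      this.pid = pid;
      this.holdsLock = holdsLock;
    }
  }

  /** Returns the pids in the order a worker would poll them after a restart. */
  static List<Long> enqueueAfterRestart(List<Proc> restored) {
    Deque<Proc> queue = new ArrayDeque<>();
    for (Proc p : restored) {
      if (p.holdsLock) {
        queue.addFirst(p); // lock holders go to the front so they run first
      } else {
        queue.addLast(p);  // everything else keeps its restore order at the back
      }
    }
    List<Long> order = new ArrayList<>();
    while (!queue.isEmpty()) {
      order.add(queue.pollFirst().pid);
    }
    return order;
  }

  public static void main(String[] args) {
    List<Proc> restored = Arrays.asList(
        new Proc(1, false), new Proc(2, true), new Proc(3, false));
    System.out.println(enqueueAfterRestart(restored)); // prints [2, 1, 3]
  }
}
```

With this ordering the worker never polls a procedure that is blocked by a lock held further back in the same queue, which is the situation steps 1 to 4 above describe. If several restored procedures hold locks, `addFirst` reverses their relative order; whether that matters is not something this sketch tries to answer.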
[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661528#comment-16661528 ] Duo Zhang commented on HBASE-21364: --- This is a critical problem, so I marked it as a blocker for 2.1.1. As for the patch, I suggest that we split it into two pieces. The verbose-related code can be done in a separate issue and committed to all branches. The code that fixes the actual problem should be committed to branch-2.1 and branch-2.0, and that should be done ASAP since we want to push out 2.1.1 now. Ping [~stack]. > Procedure holds the lock should put to front of the queue after restart > --- > > Key: HBASE-21364 > URL: https://issues.apache.org/jira/browse/HBASE-21364 > Project: HBase > Issue Type: Sub-task > Affects Versions: 2.1.0, 2.0.2 > Reporter: Allan Yang > Assignee: Allan Yang > Priority: Blocker > Fix For: 2.1.1, 2.0.3 > > Attachments: HBASE-21364.branch-2.0.001.patch, > HBASE-21364.branch-2.0.002.patch > > > After restoring the procedures from the Procedure WALs, we put the runnable procedures back into the queue to execute. The order was not a problem before HBASE-20846, since the first procedure to execute would acquire the lock itself. But locks are restored after HBASE-20846, so if we execute a procedure without the lock before a procedure with the lock in the same queue, there is a race condition in which we may not be able to execute any of the procedures in that queue at all. > The race condition is: > 1. A procedure that needs to take the table's exclusive lock is put into the table's queue, but the table's shared lock is held by a Region Procedure. Since no one holds the exclusive lock, the queue is put into the run queue to execute. But soon the worker thread sees that the procedure can't execute because it doesn't hold the lock, so it stops executing and removes the queue from the run queue. > 2. At the same time, the Region Procedure, which holds the table's shared lock and the region's exclusive lock, is put into the table's queue. But since the queue has already been added to the run queue, it is not added again. > 3. Because of step 1, the table's queue is removed from the run queue. > 4. Then no one puts the table's queue back, so no worker will execute the procedures inside. > A test case in the patch shows how.
[jira] [Updated] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21364: -- Priority: Blocker (was: Major) > Procedure holds the lock should put to front of the queue after restart > --- > > Key: HBASE-21364 > URL: https://issues.apache.org/jira/browse/HBASE-21364 > Project: HBase > Issue Type: Sub-task > Affects Versions: 2.1.0, 2.0.2 > Reporter: Allan Yang > Assignee: Allan Yang > Priority: Blocker > Fix For: 2.1.1, 2.0.3 > > Attachments: HBASE-21364.branch-2.0.001.patch, > HBASE-21364.branch-2.0.002.patch > > > After restoring the procedures from the Procedure WALs, we put the runnable procedures back into the queue to execute. The order was not a problem before HBASE-20846, since the first procedure to execute would acquire the lock itself. But locks are restored after HBASE-20846, so if we execute a procedure without the lock before a procedure with the lock in the same queue, there is a race condition in which we may not be able to execute any of the procedures in that queue at all. > The race condition is: > 1. A procedure that needs to take the table's exclusive lock is put into the table's queue, but the table's shared lock is held by a Region Procedure. Since no one holds the exclusive lock, the queue is put into the run queue to execute. But soon the worker thread sees that the procedure can't execute because it doesn't hold the lock, so it stops executing and removes the queue from the run queue. > 2. At the same time, the Region Procedure, which holds the table's shared lock and the region's exclusive lock, is put into the table's queue. But since the queue has already been added to the run queue, it is not added again. > 3. Because of step 1, the table's queue is removed from the run queue. > 4. Then no one puts the table's queue back, so no worker will execute the procedures inside. > A test case in the patch shows how.
[jira] [Updated] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21364: -- Fix Version/s: 2.0.3 2.1.1 > Procedure holds the lock should put to front of the queue after restart > --- > > Key: HBASE-21364 > URL: https://issues.apache.org/jira/browse/HBASE-21364 > Project: HBase > Issue Type: Sub-task > Affects Versions: 2.1.0, 2.0.2 > Reporter: Allan Yang > Assignee: Allan Yang > Priority: Major > Fix For: 2.1.1, 2.0.3 > > Attachments: HBASE-21364.branch-2.0.001.patch, > HBASE-21364.branch-2.0.002.patch > > > After restoring the procedures from the Procedure WALs, we put the runnable procedures back into the queue to execute. The order was not a problem before HBASE-20846, since the first procedure to execute would acquire the lock itself. But locks are restored after HBASE-20846, so if we execute a procedure without the lock before a procedure with the lock in the same queue, there is a race condition in which we may not be able to execute any of the procedures in that queue at all. > The race condition is: > 1. A procedure that needs to take the table's exclusive lock is put into the table's queue, but the table's shared lock is held by a Region Procedure. Since no one holds the exclusive lock, the queue is put into the run queue to execute. But soon the worker thread sees that the procedure can't execute because it doesn't hold the lock, so it stops executing and removes the queue from the run queue. > 2. At the same time, the Region Procedure, which holds the table's shared lock and the region's exclusive lock, is put into the table's queue. But since the queue has already been added to the run queue, it is not added again. > 3. Because of step 1, the table's queue is removed from the run queue. > 4. Then no one puts the table's queue back, so no worker will execute the procedures inside. > A test case in the patch shows how.
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661525#comment-16661525 ] Duo Zhang commented on HBASE-21363: --- Oh shit. Let me check the code. I think this should be in 2.1. The patch is almost there. Thanks. > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363.patch
[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation
[ https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661524#comment-16661524 ] stack commented on HBASE-20828: --- I need to write up what is in here. The subtasks have changed AMv2 for the better. Stuff like HBASE-21278, where we no longer try to roll back successful procedures and instead the parent needs to schedule new, compensatory Procedures, needs evangelizing. Ditto the background task that is trying to limit our backlog of master proc wals TODO. > Finish-up AMv2 Design/List of Tenets/Specification of operation > --- > > Key: HBASE-20828 > URL: https://issues.apache.org/jira/browse/HBASE-20828 > Project: HBase > Issue Type: Umbrella > Components: amv2 > Reporter: stack > Priority: Major > > AMv2 is missing a specification. There are still too many grey areas. Also missing is a concise listing of the tenets of AMv2 operation. Here are some examples: > * HBASE-19529 "Handle null states in AM": Asks how we should treat null state in hbase:meta. What does it 'mean'? We seem to treat it differently depending on context. Needs clarification. [~Apache9] recently asked something similar about the meaning of OFFLINE. > * Logging needs to have a particular form to help trace Procedure progress; needs a write-up. > Let's fill in items to address in this umbrella issue. We can address them in subissues and produce a specification doc too. We have the below, but these are mostly (incomplete) descriptions for devs on pv2 and amv2; the specification is missing: > http://hbase.apache.org/book.html#pv2 > http://hbase.apache.org/book.html#amv2 > (Other areas include addressing what is up w/ rollback -- when, how much, and when it is not appropriate -- as well as recommendations on Procedure coarseness, locking -- is it ok to lock the table in an alter table procedure for the life of the procedure? -- and so on).
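The HBASE-21278 behaviour mentioned in the comment above (successful procedures are not rolled back; the parent schedules compensating work instead) can be illustrated with a small Java sketch. Everything here is an assumption made for illustration: `CompensationSketch`, `Child`, and `runParent` are hypothetical names and do not correspond to Procedure v2 classes, and the child names in `main` are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; these are not Procedure v2 classes.
final class CompensationSketch {

  /** Minimal stand-in for a child procedure that either succeeds or fails. */
  static final class Child {
    final String name;
    final boolean succeeds;

    Child(String name, boolean succeeds) {
      this.name = name;
      this.succeeds = succeeds;
    }
  }

  /** Runs children in order; on the first failure, schedules new work instead of undoing finished children. */
  static List<String> runParent(List<Child> children) {
    List<String> log = new ArrayList<>();
    List<Child> finished = new ArrayList<>();
    for (Child c : children) {
      if (c.succeeds) {
        finished.add(c);
        log.add("finished " + c.name);
        continue;
      }
      log.add("failed " + c.name);
      // Finished children are left as they are; the parent queues new, compensating work for them.
      for (Child done : finished) {
        log.add("schedule compensation for " + done.name);
      }
      break;
    }
    return log;
  }

  public static void main(String[] args) {
    List<Child> children = Arrays.asList(
        new Child("close-region", true), new Child("open-region", false));
    runParent(children).forEach(System.out::println);
  }
}
```

The design point is that a finished child is treated as a fact to be compensated for rather than a step to be reversed, so no rollback code has to exist for states that already completed.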
[jira] [Updated] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX
[ https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21372: -- Attachment: HBASE-21372.branch-2.1.001.patch > Set hbase.assignment.maximum.attempts to Long.MAX > - > > Key: HBASE-21372 > URL: https://issues.apache.org/jira/browse/HBASE-21372 > Project: HBase > Issue Type: Sub-task > Components: amv2 > Reporter: stack > Assignee: stack > Priority: Major > Attachments: HBASE-21372.branch-2.1.001.patch, > HBASE-21372.branch-2.1.001.patch > > > From the parent issue, [~allan163] suggests that we not give up on assign unless there is a change -- an SCP triggers failure -- or, at the extreme, an operator intervenes. This jibes w/ how we're thinking about assign (or to put it another way, we have no handling for the case where we exhaust retries).
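The difference between a small retry limit and an effectively unbounded one can be sketched in plain Java. This is a hedged illustration, not the AssignmentManager code: `AssignRetrySketch`, `retry`, `succeedsOn`, and the `stop` callback are hypothetical, and the limit of 10 is used only as an example of a bounded setting. The `stop` callback stands in for the external change (an SCP or an operator) that the description says should be the only reason to give up.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch; not the AssignmentManager code.
final class AssignRetrySketch {

  /**
   * Calls attempt until it returns true, maxAttempts is used up, or stop returns true.
   * Returns the number of attempts made on success, or -1 if it gave up.
   */
  static long retry(BooleanSupplier attempt, BooleanSupplier stop, long maxAttempts) {
    // The n > 0 check guards against overflow when maxAttempts is Long.MAX_VALUE.
    for (long n = 1; n > 0 && n <= maxAttempts; n++) {
      if (stop.getAsBoolean()) {
        return -1; // an external change ended the retries
      }
      if (attempt.getAsBoolean()) {
        return n;
      }
    }
    return -1; // retries exhausted
  }

  /** Returns an attempt that fails until it has been called succeedOn times. */
  static BooleanSupplier succeedsOn(int succeedOn) {
    int[] calls = {0};
    return () -> ++calls[0] >= succeedOn;
  }

  public static void main(String[] args) {
    System.out.println(retry(succeedsOn(12), () -> false, 10));             // prints -1
    System.out.println(retry(succeedsOn(12), () -> false, Long.MAX_VALUE)); // prints 12
  }
}
```

With the bounded limit the caller has to handle the give-up case, which is exactly the handling the description says does not exist; with the maximum value the loop ends only on success or on the external stop signal.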
[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX
[ https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661515#comment-16661515 ] Hadoop QA commented on HBASE-21372: ---
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 15s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 13s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 58s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 19s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} branch-2.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 16s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 13m 20s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}210m 22s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 43s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}261m 12s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 |
| JIRA Issue | HBASE-21372 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945268/HBASE-21372.branch-2.1.001.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux cd3b360c9c27 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.1 / e29ce9f937 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14830/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14830/testReport/ |
| Max. process+thread count | 4497 (vs.
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661513#comment-16661513 ] stack commented on HBASE-21344: --- [~an...@apache.org] You don't seem to be working against the tip of branch-2.0 or branch-2.1. You seem to be working in your own branch? Is that so? If so, startup has changed pretty radically since 2.0.0. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at >
[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely
[ https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661486#comment-16661486 ] Hadoop QA commented on HBASE-21349: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 1s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 19s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 29s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 29s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}130m 40s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}173m 43s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.client.TestBlockEvictionFromClient | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21349 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945273/HBASE-21349.master.002.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux fc32ee6e94ae 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 1e9d998727 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14831/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results |
[jira] [Work stopped] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"
[ https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-21373 stopped by Xu Cang. --- > Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for > cluster size, it gives little indication" > - > > Key: HBASE-21373 > URL: https://issues.apache.org/jira/browse/HBASE-21373 > Project: HBase > Issue Type: Bug > Components: Operability >Reporter: stack >Assignee: Xu Cang >Priority: Major > Attachments: HBASE-21373.branch-1.001.patch > > > Issue to backport to branch-1. Hope you don't mind my assigning it to you Xu > Cang. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work started] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"
[ https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-21373 started by Xu Cang. --- > Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for > cluster size, it gives little indication" > - > > Key: HBASE-21373 > URL: https://issues.apache.org/jira/browse/HBASE-21373 > Project: HBase > Issue Type: Bug > Components: Operability >Reporter: stack >Assignee: Xu Cang >Priority: Major > Attachments: HBASE-21373.branch-1.001.patch > > > Issue to backport to branch-1. Hope you don't mind my assigning it to you Xu > Cang. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"
[ https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-21373: Attachment: HBASE-21373.branch-1.001.patch Status: Patch Available (was: Open) > Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for > cluster size, it gives little indication" > - > > Key: HBASE-21373 > URL: https://issues.apache.org/jira/browse/HBASE-21373 > Project: HBase > Issue Type: Bug > Components: Operability >Reporter: stack >Assignee: Xu Cang >Priority: Major > Attachments: HBASE-21373.branch-1.001.patch > > > Issue to backport to branch-1. Hope you don't mind my assigning it to you Xu > Cang. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21073) "Maintenance mode" master
[ https://issues.apache.org/jira/browse/HBASE-21073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661421#comment-16661421 ] Hudson commented on HBASE-21073: Results for branch branch-2.0 [build #1002 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > "Maintenance mode" master > - > > Key: HBASE-21073 > URL: https://issues.apache.org/jira/browse/HBASE-21073 > Project: HBase > Issue Type: Sub-task > Components: amv2, hbck2, master >Reporter: stack >Assignee: Mike Drob >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21073.branch-2.001.patch, > HBASE-21073.branch-2.1.001.patch, HBASE-21073.branch-2.1.002.patch, > HBASE-21073.master.001.patch, HBASE-21073.master.002.patch, > HBASE-21073.master.003.patch, HBASE-21073.master.004.patch, > HBASE-21073.master.005.patch, HBASE-21073.master.006.patch, > HBASE-21073.master.007.patch, HBASE-21073.master.008.patch, > HBASE-21073.master.009.patch, HBASE-21073.master.010.patch, > HBASE-21073.master.011.patch > > > Make it so we can bring up a Master in "maintenance mode". That is, the Master parses its > WAL procs but does not take on regionservers. It would be in a state > where "repair" Procedures could run; e.g. 
a Procedure that could recover meta > by looking for meta WALs, splitting them, dropping recovered.edits, and even > making it so meta is readable. See parent issue for why needed (disaster > recovery). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661395#comment-16661395 ] Ankit Singhal commented on HBASE-21344: --- bq. This should be happening already. We wait on meta assign. If SCPs, they'll run and recover meta if one of them was holding it. If no assign for meta in the procedure store, then something untoward and at least for now, operator needs to figure what happened until we fix the bug. Operator can schedule an assign with hbck2 bq. branch-2.0 will go into a holding pattern if hbase:meta is not assigned (ditto if hbase:namespace is not assigned) waiting on operator intervention to clear the lack-of-assign. Thanks [~stack] for the pointer. I didn't go down that path, as the problem started when we were starting tableStateManager without waiting for meta assignment by SCPs. I think we can just remove this call from here, as we already start it after waiting for meta to come online (patch attached for the same). {code} if (initMetaProc != null) { initMetaProc.await(); } -tableStateManager.start(); {code} bq. That said, I see some value in this patch. In particular the bit around resetting hbase:meta state if failure. We shouldn't offline meta if we are failing the assignment, as that will start the InitMetaProcedure (which we don't want, since the SCP needs to take care of recovering meta). > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. 
Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. > 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168
[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal updated HBASE-21344: -- Attachment: HBASE-21344-branch-2.0_v2.patch > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at >
[jira] [Resolved] (HBASE-21353) TestHBCKCommandLineParsing#testCommandWithOptions hangs on call to HBCK2#checkHBCKSupport
[ https://issues.apache.org/jira/browse/HBASE-21353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-21353. --- Resolution: Fixed Assignee: stack Fix Version/s: hbck2-1.0.0 Pushed fix over on hbase-operator-tools/hbase-hbck2. > TestHBCKCommandLineParsing#testCommandWithOptions hangs on call to > HBCK2#checkHBCKSupport > - > > Key: HBASE-21353 > URL: https://issues.apache.org/jira/browse/HBASE-21353 > Project: HBase > Issue Type: Test > Components: hbase-operator-tools, hbck2 >Reporter: Ted Yu >Assignee: stack >Priority: Major > Fix For: hbck2-1.0.0 > > > I noticed the following when running > TestHBCKCommandLineParsing#testCommandWithOptions : > {code} > "main" #1 prio=5 os_prio=31 tid=0x7f851c80 nid=0x1703 waiting on > condition [0x70216000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00076d3055d8> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:564) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:297) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > 
org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:229) > at > org.apache.hadoop.hbase.client.ConnectionFactory$$Lambda$11/502838712.run(Unknown > Source) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686) > at > org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:347) > at > org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:227) > at > org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:127) > at org.apache.hbase.HBCK2.checkHBCKSupport(HBCK2.java:93) > at org.apache.hbase.HBCK2.run(HBCK2.java:352) > at > org.apache.hbase.TestHBCKCommandLineParsing.testCommandWithOptions(TestHBCKCommandLineParsing.java:62) > {code} > The test doesn't spin up an HBase cluster, > so the call to check hbck support hangs. > In HBCK2#run, we can refactor the code so that argument parsing is done > before calling HBCK2#checkHBCKSupport. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
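The refactor suggested in HBASE-21353 (parse arguments before calling HBCK2#checkHBCKSupport) could look roughly like the sketch below. This is not the actual HBCK2 code: the class name ParseFirstCli, the exit codes, the command list, and the injected supportCheck are all hypothetical stand-ins, with supportCheck playing the role of the cluster round-trip that hung in the test.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.BooleanSupplier;

/** Hypothetical CLI skeleton (not the real HBCK2 class): validate locally, then connect. */
final class ParseFirstCli {
  // Illustrative exit codes; the real tool may use different values.
  static final int EXIT_OK = 0;
  static final int EXIT_USAGE = 2;
  static final int EXIT_UNSUPPORTED = 3;

  // Illustrative command names only.
  private static final Set<String> KNOWN_COMMANDS =
      new HashSet<>(Arrays.asList("assigns", "unassigns"));

  // Stand-in for the remote "is hbck supported?" check that needs a live cluster.
  private final BooleanSupplier supportCheck;

  ParseFirstCli(BooleanSupplier supportCheck) {
    this.supportCheck = supportCheck;
  }

  int run(String[] args) {
    // Step 1: purely local argument validation; no connection is created here.
    if (args.length == 0 || !KNOWN_COMMANDS.contains(args[0])) {
      return EXIT_USAGE;
    }
    // Step 2: only a well-formed command reaches the check that touches the cluster.
    if (!supportCheck.getAsBoolean()) {
      return EXIT_UNSUPPORTED;
    }
    return EXIT_OK;
  }
}
```

With this ordering, a test that only exercises command-line parsing with invalid input returns before the support check is ever invoked, so it does not need a running cluster.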
[jira] [Updated] (HBASE-21353) TestHBCKCommandLineParsing#testCommandWithOptions hangs on call to HBCK2#checkHBCKSupport
[ https://issues.apache.org/jira/browse/HBASE-21353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21353: -- Component/s: hbck2 hbase-operator-tools > TestHBCKCommandLineParsing#testCommandWithOptions hangs on call to > HBCK2#checkHBCKSupport > - > > Key: HBASE-21353 > URL: https://issues.apache.org/jira/browse/HBASE-21353 > Project: HBase > Issue Type: Test > Components: hbase-operator-tools, hbck2 >Reporter: Ted Yu >Priority: Major > > I noticed the following when running > TestHBCKCommandLineParsing#testCommandWithOptions : > {code} > "main" #1 prio=5 os_prio=31 tid=0x7f851c80 nid=0x1703 waiting on > condition [0x70216000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00076d3055d8> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:564) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:297) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:229) > at > 
org.apache.hadoop.hbase.client.ConnectionFactory$$Lambda$11/502838712.run(Unknown > Source) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686) > at > org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:347) > at > org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:227) > at > org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:127) > at org.apache.hbase.HBCK2.checkHBCKSupport(HBCK2.java:93) > at org.apache.hbase.HBCK2.run(HBCK2.java:352) > at > org.apache.hbase.TestHBCKCommandLineParsing.testCommandWithOptions(TestHBCKCommandLineParsing.java:62) > {code} > The test doesn't spin up an HBase cluster, > so the call to check hbck support hangs. > In HBCK2#run, we can refactor the code so that argument parsing is done > before calling HBCK2#checkHBCKSupport. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
[ https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661363#comment-16661363 ] Wei-Chiu Chuang commented on HBASE-21371: - {quote}yes, this will need to be done wherever we ship the relevant jar. if we already have bouncycastle as a dependency shouldn't we have an entry for it though? {quote} It looks like bouncycastle was under MIT License before (Hadoop used a very old version of bouncycastle in the past) and it's now Bouncy Castle License although it's essentially the same thing. > Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license > error > -- > > Key: HBASE-21371 > URL: https://issues.apache.org/jira/browse/HBASE-21371 > Project: HBase > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HBASE-21371.001.patch > > > Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional > licenses that break HBase's license check plugin. > CDDL/GPLv2+CE license > {quote}This product includes JavaBeans Activation Framework API jar licensed > under the CDDL/GPLv2+CE. > CDDL or GPL version 2 plus the Classpath Exception > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > javax.activation > javax.activation-api > 1.2.0 > maven central search > g:javax.activation AND a:javax.activation-api AND v:1.2.0 > project website > [http://java.net/all/javax.activation-api/] > project source > [https://github.com/javaee/activation/javax.activation-api] > {quote} > Bouncy Castle License > {quote}– > This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, > and CRMF APIs licensed under the Bouncy Castle Licence. 
> ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > org.bouncycastle > bcpkix-jdk15on > 1.60 > maven central search > g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60 > project website > [http://www.bouncycastle.org/java.html] > project source > [https://github.com/bcgit/bc-java] > – > {quote} > > And a long list of "Apache Software License - Version 2.0" licensed Jetty > dependencies like this: > {quote} > This product includes Jetty :: Servlet Annotations licensed under the Apache > Software License - Version 2.0. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > org.eclipse.jetty > jetty-annotations > 9.3.19.v20170502 > maven central search > g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502 > project website > [http://www.eclipse.org/jetty] > project source > [https://github.com/eclipse/jetty.project/jetty-annotations] > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
[ https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661361#comment-16661361 ] Wei-Chiu Chuang commented on HBASE-21371: - {quote}instead of adding yet another way of referring to ALv2 can we please update supplemental to correct the new phrasing? {quote} Yeah I was doing that until I realized there are like a dozen Jetty artifacts that require the update ... But sure I can do that if this is the preferred way. > Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license > error > -- > > Key: HBASE-21371 > URL: https://issues.apache.org/jira/browse/HBASE-21371 > Project: HBase > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HBASE-21371.001.patch > > > Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional > licenses that break HBase's license check plugin. > CDDL/GPLv2+CE license > {quote}This product includes JavaBeans Activation Framework API jar licensed > under the CDDL/GPLv2+CE. > CDDL or GPL version 2 plus the Classpath Exception > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > javax.activation > javax.activation-api > 1.2.0 > maven central search > g:javax.activation AND a:javax.activation-api AND v:1.2.0 > project website > [http://java.net/all/javax.activation-api/] > project source > [https://github.com/javaee/activation/javax.activation-api] > {quote} > Bouncy Castle License > {quote}– > This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, > and CRMF APIs licensed under the Bouncy Castle Licence. 
> ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > org.bouncycastle > bcpkix-jdk15on > 1.60 > maven central search > g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60 > project website > [http://www.bouncycastle.org/java.html] > project source > [https://github.com/bcgit/bc-java] > – > {quote} > > And a long list of "Apache Software License - Version 2.0" licensed Jetty > dependencies like this: > {quote} > This product includes Jetty :: Servlet Annotations licensed under the Apache > Software License - Version 2.0. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > org.eclipse.jetty > jetty-annotations > 9.3.19.v20170502 > maven central search > g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502 > project website > [http://www.eclipse.org/jetty] > project source > [https://github.com/eclipse/jetty.project/jetty-annotations] > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21374) Backport HBASE-21342 to branch-1
[ https://issues.apache.org/jira/browse/HBASE-21374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661360#comment-16661360 ] Mike Drob commented on HBASE-21374: --- There was a conflict in the cherry-pick that I haven't looked into yet, will take care of it tomorrow if nobody else gets to it before that. FYI [~apurtell] > Backport HBASE-21342 to branch-1 > > > Key: HBASE-21374 > URL: https://issues.apache.org/jira/browse/HBASE-21374 > Project: HBase > Issue Type: Task >Reporter: Mike Drob >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21374) Backport HBASE-21342 to branch-1
Mike Drob created HBASE-21374: - Summary: Backport HBASE-21342 to branch-1 Key: HBASE-21374 URL: https://issues.apache.org/jira/browse/HBASE-21374 Project: HBase Issue Type: Task Reporter: Mike Drob -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob updated HBASE-21342: -- Resolution: Fixed Status: Resolved (was: Patch Available) > FileSystem in use may get closed by other bulk load call in secure bulkLoad > > > Key: HBASE-21342 > URL: https://issues.apache.org/jira/browse/HBASE-21342 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7 >Reporter: mazhenlin >Assignee: mazhenlin >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: 21342.v1.txt, HBASE-21342.002.patch, > HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, > HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch > > > As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If > Two secure bulkload calls from the same UGI into two different regions and > one region finishes earlier, it will close the bulk load fs, and the other > region will fail. > > Another case would be more serious. The FileSystem.close() function needs two > synchronized variables : CACHE and deleteOnExit. If one region calls > FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while > another region is trying to close srcFS ( in > SecureBulkLoadListener.closeSrcFs) , can cause deadlock here. > > I have wrote a UT for this and fixed it using reference counter. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
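The issue description above says the race was fixed "using reference counter". What follows is a minimal sketch of that general technique, not the code from the attached patches. The class name RefCountedHandles, its methods, and the closer callback are hypothetical and are not HBase or Hadoop APIs. The idea is that each bulk load call registers itself against the shared handle (keyed here by a plain string standing in for the UGI), and the handle is closed only when the last user releases it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: a shared handle is closed only when its last user
 * releases it. Not taken from the HBASE-21342 patch.
 */
public class RefCountedHandles<K> {
  // Number of in-flight users per key (for example, per UGI).
  private final Map<K, Integer> counts = new HashMap<>();
  // Callback that actually closes the handle; supplied by the caller.
  private final Consumer<K> closer;

  public RefCountedHandles(Consumer<K> closer) {
    this.closer = closer;
  }

  /** Register one more in-flight user of the handle identified by key. */
  public synchronized void acquire(K key) {
    counts.merge(key, 1, Integer::sum);
  }

  /**
   * Drop one user of the handle. The handle is closed only if this was the
   * last user. Returns true if this call closed the handle.
   */
  public synchronized boolean release(K key) {
    Integer current = counts.get(key);
    if (current == null) {
      throw new IllegalStateException("release without acquire: " + key);
    }
    if (current > 1) {
      // Other calls still use the handle, so leave it open.
      counts.put(key, current - 1);
      return false;
    }
    counts.remove(key);
    closer.accept(key);
    return true;
  }

  /** Current number of users for key; 0 if none. */
  public synchronized int users(K key) {
    return counts.getOrDefault(key, 0);
  }
}
```

In this sketch the close callback runs while the registry lock is held, so a concurrent acquire cannot see a half-closed handle. A real implementation would also have to consider lock ordering against the locks taken inside FileSystem.close(), which is the deadlock concern the description raises.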
[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661356#comment-16661356 ] Mike Drob commented on HBASE-21342: --- I'm not trying to ignore any versions or leave anything unfixed. I'm not trying to make policy. The fix versions are the set of versions that I had already done backports and pushed to. I hadn't pushed code to branch-1 or branch-2.0 yet, so they weren't included. I was using fix version in Jira as descriptive, not prescriptive. I wanted to be able to close the issue so that the RM could generate release notes if needed, and then expected to continue work in a separate backport issue. I have pushed this to branch-2.0+. There is a conflict cherry-picking to branch-1 that I don't have time to resolve today. I will open a backport Jira to not conflict with the release notes for Stack if he cuts a 2.1.1 tonight. > FileSystem in use may get closed by other bulk load call in secure bulkLoad > > > Key: HBASE-21342 > URL: https://issues.apache.org/jira/browse/HBASE-21342 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7 >Reporter: mazhenlin >Assignee: mazhenlin >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: 21342.v1.txt, HBASE-21342.002.patch, > HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, > HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch > > > As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If > Two secure bulkload calls from the same UGI into two different regions and > one region finishes earlier, it will close the bulk load fs, and the other > region will fail. > > Another case would be more serious. The FileSystem.close() function needs two > synchronized variables : CACHE and deleteOnExit. 
If one region calls > FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while > another region is trying to close srcFS ( in > SecureBulkLoadListener.closeSrcFs) , can cause deadlock here. > > I have wrote a UT for this and fixed it using reference counter. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob updated HBASE-21342: -- Fix Version/s: 2.0.3 2.1.1 > FileSystem in use may get closed by other bulk load call in secure bulkLoad > > > Key: HBASE-21342 > URL: https://issues.apache.org/jira/browse/HBASE-21342 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7 >Reporter: mazhenlin >Assignee: mazhenlin >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: 21342.v1.txt, HBASE-21342.002.patch, > HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, > HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch > > > As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If > Two secure bulkload calls from the same UGI into two different regions and > one region finishes earlier, it will close the bulk load fs, and the other > region will fail. > > Another case would be more serious. The FileSystem.close() function needs two > synchronized variables : CACHE and deleteOnExit. If one region calls > FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while > another region is trying to close srcFS ( in > SecureBulkLoadListener.closeSrcFs) , can cause deadlock here. > > I have wrote a UT for this and fixed it using reference counter. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661350#comment-16661350 ] Andrew Purtell commented on HBASE-21342: The affects versions are set. Shouldn’t the fix versions be the same if that code is similarly affected? Are we leaving known bugs in branch-1 unfixed by policy now? Separate back port JIRA is better than nothing but not much > FileSystem in use may get closed by other bulk load call in secure bulkLoad > > > Key: HBASE-21342 > URL: https://issues.apache.org/jira/browse/HBASE-21342 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7 >Reporter: mazhenlin >Assignee: mazhenlin >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: 21342.v1.txt, HBASE-21342.002.patch, > HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, > HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch > > > As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If > Two secure bulkload calls from the same UGI into two different regions and > one region finishes earlier, it will close the bulk load fs, and the other > region will fail. > > Another case would be more serious. The FileSystem.close() function needs two > synchronized variables : CACHE and deleteOnExit. If one region calls > FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while > another region is trying to close srcFS ( in > SecureBulkLoadListener.closeSrcFs) , can cause deadlock here. > > I have wrote a UT for this and fixed it using reference counter. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely
[ https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21349: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: (was: 2.1.2) 2.1.1 Status: Resolved (was: Patch Available) Pushed to branch-2.0+. Thanks for the patch [~xucang] > Cluster is going down but CatalogJanitor and Normalizer try to run and fail > noisely > --- > > Key: HBASE-21349 > URL: https://issues.apache.org/jira/browse/HBASE-21349 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: stack >Assignee: Xu Cang >Priority: Minor > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21349.master.002.patch, > HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, > HBASE-22349.master.001.patch > > > Shutting down can take a while. Meantime catalog janitor and or normalizer > (etc?) try to run and when they can't, they fail noisely. Looks bad: > {code} > 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: > Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; > onlineServers=51 > 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: > Failed scan of catalog table > java.io.IOException: connection is closed > at > org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137) > at > 
org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243) > at > org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116) > at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-10-19 21:25:54,507 ERROR > org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to > normalize regions. 
> java.io.IOException: connection is closed > at > org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690) > at > org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240) > at > org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189) > at > org.apache.hadoop.hbase.master.HMaster.normalizeRegions(HMaster.java:1718) > at > org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore.chore(RegionNormalizerChore.java:48) > at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at >
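The stack traces above come from periodic chores that still fire after cluster shutdown has been set. One general way to keep such tasks quiet is to check a shutdown flag before doing any work. The sketch below shows that idea only and is not the change in HBASE-21349.master.002.patch. ShutdownAwareTask and its constructor arguments are hypothetical names, not HBase classes.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch: wraps a periodic task so that it becomes a no-op once
 * shutdown has been requested, instead of running and failing noisily.
 */
public class ShutdownAwareTask implements Runnable {
  // Returns true once the owning service has begun shutting down.
  private final BooleanSupplier shuttingDown;
  // The periodic work, for example a catalog scan.
  private final Runnable work;
  // Number of runs skipped because shutdown was in progress.
  private int skipped;

  public ShutdownAwareTask(BooleanSupplier shuttingDown, Runnable work) {
    this.shuttingDown = shuttingDown;
    this.work = work;
  }

  @Override
  public void run() {
    if (shuttingDown.getAsBoolean()) {
      // Connections may already be closed; skip rather than log an exception.
      skipped++;
      return;
    }
    work.run();
  }

  /** How many times the task was skipped because of shutdown. */
  public int skippedRuns() {
    return skipped;
  }
}
```

The check is best-effort: shutdown can still begin between the check and the work, so the wrapped task should continue to handle a closed connection itself.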
[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661336#comment-16661336 ] Mike Drob commented on HBASE-21342: --- I didn't drop the branch-1 versions, they were never in the fix version list. They're in affects version still. Paused my backports to consult with Stack offline about whether it's safe to commit to branch-2.1 right now, since I know he's prepping for a release. Will likely spin off separate backport issues. > FileSystem in use may get closed by other bulk load call in secure bulkLoad > > > Key: HBASE-21342 > URL: https://issues.apache.org/jira/browse/HBASE-21342 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7 >Reporter: mazhenlin >Assignee: mazhenlin >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: 21342.v1.txt, HBASE-21342.002.patch, > HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, > HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch > > > As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If > Two secure bulkload calls from the same UGI into two different regions and > one region finishes earlier, it will close the bulk load fs, and the other > region will fail. > > Another case would be more serious. The FileSystem.close() function needs two > synchronized variables : CACHE and deleteOnExit. If one region calls > FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while > another region is trying to close srcFS ( in > SecureBulkLoadListener.closeSrcFs) , can cause deadlock here. > > I have wrote a UT for this and fixed it using reference counter. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20623) Introduce the helper method "getCellBuilder()" to Mutation
[ https://issues.apache.org/jira/browse/HBASE-20623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661315#comment-16661315 ] Hadoop QA commented on HBASE-20623: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 59s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 8s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 12s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} refguide {color} | {color:blue} 5m 28s{color} | {color:blue} branch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 18s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 34s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:blue}0{color} | {color:blue} refguide {color} | {color:blue} 5m 11s{color} | {color:blue} patch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 33s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 52s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 35s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}303m 24s{color} | {color:green} root in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 42s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}378m 34s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-20623 | | JIRA Patch URL |
[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely
[ https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661277#comment-16661277 ] Hadoop QA commented on HBASE-21349: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 26s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 12s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 12s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 34s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 22s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}134m 41s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}178m 49s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21349 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945252/HBASE-21349.master.002.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux c8b0eb4d0b34 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 3b68e5393e | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14827/testReport/ | | Max. process+thread count | 5051 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server | | Console output |
[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661275#comment-16661275 ] Andrew Purtell commented on HBASE-21342: Why did we drop all of the branch-1 fix versions? Is this not an issue there? > FileSystem in use may get closed by other bulk load call in secure bulkLoad > > > Key: HBASE-21342 > URL: https://issues.apache.org/jira/browse/HBASE-21342 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7 >Reporter: mazhenlin >Assignee: mazhenlin >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: 21342.v1.txt, HBASE-21342.002.patch, > HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, > HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch > > > As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If > Two secure bulkload calls from the same UGI into two different regions and > one region finishes earlier, it will close the bulk load fs, and the other > region will fail. > > Another case would be more serious. The FileSystem.close() function needs two > synchronized variables : CACHE and deleteOnExit. If one region calls > FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while > another region is trying to close srcFS ( in > SecureBulkLoadListener.closeSrcFs) , can cause deadlock here. > > I have wrote a UT for this and fixed it using reference counter. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob updated HBASE-21342: -- Fix Version/s: 2.2.0 3.0.0 > FileSystem in use may get closed by other bulk load call in secure bulkLoad > > > Key: HBASE-21342 > URL: https://issues.apache.org/jira/browse/HBASE-21342 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7 >Reporter: mazhenlin >Assignee: mazhenlin >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: 21342.v1.txt, HBASE-21342.002.patch, > HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, > HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch > > > As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If > Two secure bulkload calls from the same UGI into two different regions and > one region finishes earlier, it will close the bulk load fs, and the other > region will fail. > > Another case would be more serious. The FileSystem.close() function needs two > synchronized variables : CACHE and deleteOnExit. If one region calls > FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while > another region is trying to close srcFS ( in > SecureBulkLoadListener.closeSrcFs) , can cause deadlock here. > > I have wrote a UT for this and fixed it using reference counter. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely
[ https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661272#comment-16661272 ] Xu Cang commented on HBASE-21349: - It compiles locally. I think the hadoop-qa failure was caused by this commit: 86f23128b0d66deb70790785e63d2f7e01d5ab8d and Duo Zhang has fixed it in a later commit. Let me re-trigger the Hadoop-QA. Thanks [~stack] > Cluster is going down but CatalogJanitor and Normalizer try to run and fail > noisely > --- > > Key: HBASE-21349 > URL: https://issues.apache.org/jira/browse/HBASE-21349 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: stack >Assignee: Xu Cang >Priority: Minor > Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2 > > Attachments: HBASE-21349.master.002.patch, > HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, > HBASE-22349.master.001.patch > > > Shutting down can take a while. Meantime catalog janitor and or normalizer > (etc?) try to run and when they can't, they fail noisely. 
Looks bad: > {code} > 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: > Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; > onlineServers=51 > 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: > Failed scan of catalog table > java.io.IOException: connection is closed > at > org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137) > at > org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243) > at > org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116) > at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at 
java.lang.Thread.run(Thread.java:748) > 2018-10-19 21:25:54,507 ERROR > org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to > normalize regions. > java.io.IOException: connection is closed > at > org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690) > at > org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240) > at > org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189) > at > org.apache.hadoop.hbase.master.HMaster.normalizeRegions(HMaster.java:1718) > at > org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore.chore(RegionNormalizerChore.java:48) > at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at >
[jira] [Updated] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely
[ https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xu Cang updated HBASE-21349:
    Attachment: HBASE-21349.master.002.patch

> Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely
> ---
>
> Key: HBASE-21349
> URL: https://issues.apache.org/jira/browse/HBASE-21349
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: stack
> Assignee: Xu Cang
> Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, HBASE-22349.master.001.patch
>
>
> Shutting down can take a while. In the meantime, the catalog janitor and/or normalizer (etc.?) try to run and, when they can't, they fail noisily. Looks bad:
> {code}
> 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; onlineServers=51
> 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
> java.io.IOException: connection is closed
>     at org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
>     at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
>     at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
>     at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684)
>     at org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679)
>     at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185)
>     at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137)
>     at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243)
>     at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116)
>     at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2018-10-19 21:25:54,507 ERROR org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to normalize regions.
> java.io.IOException: connection is closed
>     at org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
>     at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
>     at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
>     at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690)
>     at org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240)
>     at org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189)
>     at org.apache.hadoop.hbase.master.HMaster.normalizeRegions(HMaster.java:1718)
>     at org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore.chore(RegionNormalizerChore.java:48)
>     at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at
[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661256#comment-16661256 ]

Ted Yu commented on HBASE-21342:

Mike:
Please go ahead.
Thanks

> FileSystem in use may get closed by other bulk load call in secure bulkLoad
>
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
> Reporter: mazhenlin
> Assignee: mazhenlin
> Priority: Major
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If two secure bulkload calls from the same UGI go into two different regions and one region finishes earlier, it will close the bulk load fs, and the other region will fail.
>
> Another case would be more serious. The FileSystem.close() function needs two synchronized variables: CACHE and deleteOnExit. If one region calls FileSystem.closeAllForUGI (in SecureBulkLoadManager.cleanupBulkLoad) while another region is trying to close srcFS (in SecureBulkLoadListener.closeSrcFs), a deadlock can occur.
>
> I have written a UT for this and fixed it using a reference counter.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
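The reference-counter idea mentioned in the description above can be illustrated with a minimal sketch. This is not the HBASE-21342 patch and does not use any HBase or Hadoop API: `RefCountedCache`, `acquire`, and `release` are hypothetical names, and the opener/closer callbacks stand in for whatever creates and closes the per-UGI FileSystem. The point it shows is only that a shared resource is closed by the last caller to release it, not by whichever bulk load call finishes first.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Hypothetical reference-counted cache: one shared resource per key
 * (for example, one file system handle per user), closed only when
 * the last holder releases it.
 */
final class RefCountedCache<K, V> {

    /** A cached resource together with the number of current holders. */
    private static final class Entry<V> {
        final V value;
        int refs;

        Entry(V value) {
            this.value = value;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> opener; // creates the resource on first use
    private final Consumer<V> closer;    // closes it after the last release

    RefCountedCache(Function<K, V> opener, Consumer<V> closer) {
        this.opener = opener;
        this.closer = closer;
    }

    /** Returns the shared resource for the key and counts the caller as a holder. */
    synchronized V acquire(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            entry = new Entry<>(opener.apply(key));
            entries.put(key, entry);
        }
        entry.refs++;
        return entry.value;
    }

    /**
     * Drops one reference. The resource is closed only when no holder is left.
     *
     * @return true if this call closed the resource
     */
    synchronized boolean release(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            throw new IllegalStateException("release without acquire: " + key);
        }
        entry.refs--;
        if (entry.refs > 0) {
            return false; // another caller is still using the resource
        }
        entries.remove(key);
        closer.accept(entry.value);
        return true;
    }
}
```

With this shape, two bulk load calls for the same user would each call `acquire` before touching the shared handle and `release` when done; the call that finishes first gets `false` back and leaves the handle open for the other. The sketch guards the count with a single lock of its own, so it says nothing about the CACHE/deleteOnExit lock ordering described in the issue.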
[jira] [Commented] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"
[ https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661250#comment-16661250 ]

Andrew Purtell commented on HBASE-21373:

Thanks [~xucang] [~stack]

> Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"
> -
>
> Key: HBASE-21373
> URL: https://issues.apache.org/jira/browse/HBASE-21373
> Project: HBase
> Issue Type: Bug
> Components: Operability
> Reporter: stack
> Assignee: Xu Cang
> Priority: Major
>
>
> Issue to backport to branch-1. Hope you don't mind my assigning it to you Xu Cang.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)