[jira] [Created] (HBASE-19893) restore_snapshot is broken in master branch
Toshihiro Suzuki created HBASE-19893: Summary: restore_snapshot is broken in master branch Key: HBASE-19893 URL: https://issues.apache.org/jira/browse/HBASE-19893 Project: HBase Issue Type: Bug Reporter: Toshihiro Suzuki When I was investigating HBASE-19850, I found restore_snapshot didn't work in the master branch. Steps to reproduce are as follows: 1. Create a table {code} create "test", "cf" {code} 2. Load data (2000 rows) into the table {code} (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"} {code} 3. Split the table {code} split "test" {code} 4. Take a snapshot {code} snapshot "test", "snap" {code} 5. Load more data (2000 rows) into the table and split the table again {code} (2000...4000).each{|i| put "test", "row#{i}", "cf:col", "val"} split "test" {code} 6. Restore the table from the snapshot {code} disable "test" restore_snapshot "snap" enable "test" {code} 7. Scan the table {code} scan "test" {code} However, this scan returns only 244 rows (it should return 2000 rows), like the following: {code} hbase(main):038:0> scan "test" ROW COLUMN+CELL row78 column=cf:col, timestamp=1517298307049, value=val row999 column=cf:col, timestamp=1517298307608, value=val 244 row(s) Took 0.1500 seconds {code} Also, the restored table should have 2 online regions but it has 3 online regions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19892) Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0
[ https://issues.apache.org/jira/browse/HBASE-19892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344595#comment-16344595 ] Hadoop QA commented on HBASE-19892: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 26s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 9m 24s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 48s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 19m 48s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 21s{color} | {color:green} hbase-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 9s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 37m 21s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-19892 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908272/HBASE-19892.master.002.patch | | Optional Tests | asflicense javac javadoc unit shadedjars hadoopcheck xml compile | | uname | Linux 254110142b1a 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 9b8d7e0aef | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11243/testReport/ | | Max. process+thread count | 348 (vs. ulimit of 1000) | | modules | C: hbase-common U: hbase-common | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11243/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. > Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0 > --- > > Key: HBASE-19892 > URL: https://issues.apache.org/jira/browse/HBASE-19892 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 1.5.0, 2.0.0-beta-2, 1.4.2 > > Attachments: HBASE-19892.master
[jira] [Commented] (HBASE-19892) Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0
[ https://issues.apache.org/jira/browse/HBASE-19892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344597#comment-16344597 ] Hadoop QA commented on HBASE-19892: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 54s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 9m 52s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 4m 28s{color} | {color:red} root in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 9s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 23m 28s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 18s{color} | {color:green} hbase-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 8s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 41m 34s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-19892 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908271/HBASE-19892.master.001.patch | | Optional Tests | asflicense javac javadoc unit shadedjars hadoopcheck xml compile | | uname | Linux 3d63e4b9f0be 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 9b8d7e0aef | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 | | mvninstall | https://builds.apache.org/job/PreCommit-HBASE-Build/11242/artifact/patchprocess/patch-mvninstall-root.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11242/testReport/ | | Max. process+thread count | 262 (vs. ulimit of 1000) | | modules | C: hbase-common U: hbase-common | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11242/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. > Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0 > --- > > Key: HBASE-19892 > URL: https://issues.apache.org/jira/browse/HBASE-19892 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: s
[jira] [Comment Edited] (HBASE-19887) Do not overwrite the surefire junit listener property in the pom of sub modules
[ https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344590#comment-16344590 ] stack edited comment on HBASE-19887 at 1/30/18 7:01 AM: Patch looks beautiful. +1. It's the way it should have been was (Author: stack): Patch looks beautiful. +1. > Do not overwrite the surefire junit listener property in the pom of sub > modules > --- > > Key: HBASE-19887 > URL: https://issues.apache.org/jira/browse/HBASE-19887 > Project: HBase > Issue Type: Sub-task > Components: build >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Attachments: HBASE-19887.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19887) Do not overwrite the surefire junit listener property in the pom of sub modules
[ https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344590#comment-16344590 ] stack commented on HBASE-19887: --- Patch looks beautiful. +1. > Do not overwrite the surefire junit listener property in the pom of sub > modules > --- > > Key: HBASE-19887 > URL: https://issues.apache.org/jira/browse/HBASE-19887 > Project: HBase > Issue Type: Sub-task > Components: build >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Attachments: HBASE-19887.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19887) Do not overwrite the surefire junit listener property in the pom of sub modules
[ https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344588#comment-16344588 ] Duo Zhang commented on HBASE-19887: --- The ConnectionCountResourceAnalyzer was removed in HBASE-13252. So I think for now we can just remove it... Mind taking a look at the patch, sir? [~stack] Thanks. > Do not overwrite the surefire junit listener property in the pom of sub > modules > --- > > Key: HBASE-19887 > URL: https://issues.apache.org/jira/browse/HBASE-19887 > Project: HBase > Issue Type: Sub-task > Components: build >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Attachments: HBASE-19887.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344578#comment-16344578 ] Appy edited comment on HBASE-17852 at 1/30/18 6:53 AM: --- bq. My intent was not to squash your opinions, but to avoid being blocked if you were not interested/busy as seemed might be the case. That's reasonable. Sorry for the delay on my part. Leaving a comment saying the same, and that post-hoc reviews would be welcome, would have avoided the whole situation. The only thing that put me off was finding it out myself, followed by a certain tone in certain comments. [~elserj] i believe you. Everyone makes mistakes from time-to-time, i'm certain i must have done too. Always happy with "acknowledge, learn, and move past them" way. All's good (between us two). was (Author: appy): bq. My intent was not to squash your opinions, but to avoid being blocked if you were not interested/busy as seemed might be the case. That's reasonable. Sorry for the delay on my part. [~elserj] i believe you. Everyone makes mistakes from time-to-time, i'm certain i must have done too. Always happy with "acknowledge, learn, and move past them" way. All's good (between us two). > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-17852-v10.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts, we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because the meta > table is small and usually fits in a single region.
> # When an operation fails on the server side, we handle the failure by cleaning > up partial data in the backup destination, followed by restoring the backup > meta-table from the snapshot. > # When an operation fails on the client side (abnormal termination, for example), > the next time the user tries create/merge/delete he (she) will see an error message > that the system is in an inconsistent state and repair is required; he (she) will > need to run the backup repair tool. > # To avoid multiple writers to the backup system table (the backup client and > BackupObservers), we introduce a small table ONLY to keep the listing of bulk > loaded files. All backup observers will work only with this new table. The > reason: in case of a failure during backup create/delete/merge/restore, when the > system performs automatic rollback, some data written by backup observers > during the failed operation may be lost. This is what we try to avoid. > # The second table keeps only bulk load related references. We do not care about the > consistency of this table, because bulk load is an idempotent operation and can > be repeated after a failure. Partially written data in the second table does not > affect the BackupHFileCleaner plugin, because this data (the list of bulk loaded > files) corresponds to files which have not yet been loaded successfully and, > hence, are not visible to the system. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
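The rollback-via-snapshot design quoted above can be sketched as a toy simulation (plain Java, no HBase dependencies; the class and method names here are illustrative and are not the actual backup implementation): snapshot the small backup meta-table before the operation, and restore it if the operation fails server-side.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of rollback-via-snapshot: copy the (small) backup meta-table
// before a create/delete/merge operation, restore it if the operation fails.
public class RollbackViaSnapshotSketch {
  private Map<String, String> backupMeta = new HashMap<>();
  private Map<String, String> snapshot = null;

  // Lightweight because the meta table is small (usually one region).
  public void takeSnapshot() {
    snapshot = new HashMap<>(backupMeta);
  }

  // Run an operation; on server-side failure, roll the meta table back.
  public boolean runOperation(Runnable op) {
    takeSnapshot();
    try {
      op.run();
      return true;
    } catch (RuntimeException e) {
      backupMeta = new HashMap<>(snapshot); // discard partial meta updates
      return false;
    }
  }

  public Map<String, String> meta() {
    return backupMeta;
  }

  public static void main(String[] args) {
    RollbackViaSnapshotSketch s = new RollbackViaSnapshotSketch();
    s.meta().put("backup_1", "COMPLETE");
    boolean ok = s.runOperation(() -> {
      s.meta().put("backup_2", "RUNNING"); // partial write during the operation...
      throw new RuntimeException("server-side failure");
    });
    // ...is rolled back: only backup_1 remains in the meta table.
    System.out.println(ok + " " + s.meta());
  }
}
```

Note this models only the server-side path; the client-side path in the design (inconsistent state detected on the next operation, repaired by the backup repair tool) is not simulated here.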
[jira] [Commented] (HBASE-19892) Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0
[ https://issues.apache.org/jira/browse/HBASE-19892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344585#comment-16344585 ] Hudson commented on HBASE-19892: SUCCESS: Integrated in Jenkins build HBase-1.3-IT #345 (See [https://builds.apache.org/job/HBase-1.3-IT/345/]) HBASE-19892 Checking patch attach and yetus 0.7.0 and move to Yetus (stack: rev 85b4615a1f5c88e3442ee47644a7a163cbbd3dfa) * (edit) dev-support/Jenkinsfile > Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0 > --- > > Key: HBASE-19892 > URL: https://issues.apache.org/jira/browse/HBASE-19892 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 1.5.0, 2.0.0-beta-2, 1.4.2 > > Attachments: HBASE-19892.master.001.patch, > HBASE-19892.master.002.patch, HBASE-19892.master.003.patch > > > _Yetus-0.7.0 has a fix for the changed Jira behavior that made it so we > weren't picking up the latest attached patch. Check it works and if it does > move over to yetus 0.7.0_ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19887) Do not overwrite the surefire junit listener property in the pom of sub modules
[ https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344584#comment-16344584 ] Duo Zhang commented on HBASE-19887: --- Thanks sir. Let me check the code on branch-1.3. > Do not overwrite the surefire junit listener property in the pom of sub > modules > --- > > Key: HBASE-19887 > URL: https://issues.apache.org/jira/browse/HBASE-19887 > Project: HBase > Issue Type: Sub-task > Components: build >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Attachments: HBASE-19887.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19887) Do not overwrite the surefire junit listener property in the pom of sub modules
[ https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344582#comment-16344582 ] Duo Zhang commented on HBASE-19887: --- Remove ServerResourceCheckerJUnitListener since it is useless for now. I think we can add more checks in ResourceCheckerJUnitListener directly. Remove the surefire plugin definition in sub modules if not necessary so that they will not overwrite the default listener config. > Do not overwrite the surefire junit listener property in the pom of sub > modules > --- > > Key: HBASE-19887 > URL: https://issues.apache.org/jira/browse/HBASE-19887 > Project: HBase > Issue Type: Sub-task > Components: build >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Attachments: HBASE-19887.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
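For context on the overwrite being discussed: surefire wires custom JUnit run listeners through a `listener` entry under `properties` in the plugin configuration, and a sub-module that redeclares the plugin configuration replaces the parent pom's value rather than appending to it. An illustrative fragment (not the actual hbase pom; the listener class is an example):

```xml
<!-- Sub-module pom: redeclaring surefire's configuration here
     overwrites the listener the parent pom configured. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <properties>
      <property>
        <name>listener</name>
        <!-- replaces, rather than extends, the parent's value -->
        <value>org.apache.hadoop.hbase.ResourceCheckerJUnitListener</value>
      </property>
    </properties>
  </configuration>
</plugin>
```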
[jira] [Commented] (HBASE-19887) Do not overwrite the surefire junit listener property in the pom of sub modules
[ https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344580#comment-16344580 ] stack commented on HBASE-19887: --- Yes, go for it. These resource checker/server resource checker listeners are useful utilities added long ago. In branch-1.3 at least there is a difference:
{code:java}
static class ConnectionCountResourceAnalyzer extends ResourceChecker.ResourceAnalyzer {
  @Override
  public int getVal(Phase phase) {
    return HConnectionTestingUtility.getConnectionCount();
  }
}

@Override
protected void addResourceAnalyzer(ResourceChecker rc) {
  rc.addResourceAnalyzer(new ConnectionCountResourceAnalyzer());
}
{code}
Does our new ClassRule do what these listeners used to do? It doesn't seem to. Will I add it into ./hbase-common/src/test/java/org/apache/hadoop/hbase/HBaseClassTestRuleChecker.java? Thanks [~Apache9] > Do not overwrite the surefire junit listener property in the pom of sub > modules > --- > > Key: HBASE-19887 > URL: https://issues.apache.org/jira/browse/HBASE-19887 > Project: HBase > Issue Type: Sub-task > Components: build >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Attachments: HBASE-19887.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19887) Do not overwrite the surefire junit listener property in the pom of sub modules
[ https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-19887: -- Attachment: (was: HBASE-19887.patch) > Do not overwrite the surefire junit listener property in the pom of sub > modules > --- > > Key: HBASE-19887 > URL: https://issues.apache.org/jira/browse/HBASE-19887 > Project: HBase > Issue Type: Sub-task > Components: build >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Attachments: HBASE-19887.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19892) Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0
[ https://issues.apache.org/jira/browse/HBASE-19892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344579#comment-16344579 ] Chia-Ping Tsai commented on HBASE-19892: Yay! We got rid of the jira noise about patch deletions. > Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0 > --- > > Key: HBASE-19892 > URL: https://issues.apache.org/jira/browse/HBASE-19892 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 1.5.0, 2.0.0-beta-2, 1.4.2 > > Attachments: HBASE-19892.master.001.patch, > HBASE-19892.master.002.patch, HBASE-19892.master.003.patch > > > _Yetus-0.7.0 has a fix for the changed Jira behavior that made it so we > weren't picking up the latest attached patch. Check it works and if it does > move over to yetus 0.7.0_ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19887) Do not overwrite the surefire junit listener property in the pom of sub modules
[ https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-19887: -- Attachment: HBASE-19887.patch > Do not overwrite the surefire junit listener property in the pom of sub > modules > --- > > Key: HBASE-19887 > URL: https://issues.apache.org/jira/browse/HBASE-19887 > Project: HBase > Issue Type: Sub-task > Components: build >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Attachments: HBASE-19887.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19887) Do not overwrite the surefire junit listener property in the pom of sub modules
[ https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-19887: -- Summary: Do not overwrite the surefire junit listener property in the pom of sub modules (was: Find out why the HBaseClassTestRuleCheck does not work in pre commit) > Do not overwrite the surefire junit listener property in the pom of sub > modules > --- > > Key: HBASE-19887 > URL: https://issues.apache.org/jira/browse/HBASE-19887 > Project: HBase > Issue Type: Sub-task > Components: build >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Attachments: HBASE-19887.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344578#comment-16344578 ] Appy commented on HBASE-17852: -- bq. My intent was not to squash your opinions, but to avoid being blocked if you were not interested/busy as seemed might be the case. That's reasonable. Sorry for the delay on my part. [~elserj] i believe you. Everyone makes mistakes from time-to-time, i'm certain i must have done too. Always happy with "acknowledge, learn, and move past them" way. All's good (between us two). > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-17852-v10.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts, we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because the meta > table is small and usually fits in a single region. > # When an operation fails on the server side, we handle the failure by cleaning > up partial data in the backup destination, followed by restoring the backup > meta-table from the snapshot. > # When an operation fails on the client side (abnormal termination, for example), > the next time the user tries create/merge/delete he (she) will see an error message > that the system is in an inconsistent state and repair is required; he (she) will > need to run the backup repair tool. > # To avoid multiple writers to the backup system table (the backup client and > BackupObservers), we introduce a small table ONLY to keep the listing of bulk > loaded files. All backup observers will work only with this new table.
The > reason: in case of a failure during backup create/delete/merge/restore, when the > system performs automatic rollback, some data written by backup observers > during the failed operation may be lost. This is what we try to avoid. > # The second table keeps only bulk load related references. We do not care about the > consistency of this table, because bulk load is an idempotent operation and can > be repeated after a failure. Partially written data in the second table does not > affect the BackupHFileCleaner plugin, because this data (the list of bulk loaded > files) corresponds to files which have not yet been loaded successfully and, > hence, are not visible to the system. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19892) Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0
[ https://issues.apache.org/jira/browse/HBASE-19892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19892: -- Release Note: Moved our internal yetus reference from 0.6.0 to 0.7.0. Concurrently, I changed hadoopqa to run with 0.7.0 (by editing the config in jenkins). (was: Moved our internal yetus reference from 0.6.0 to 0.7.0.) > Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0 > --- > > Key: HBASE-19892 > URL: https://issues.apache.org/jira/browse/HBASE-19892 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 1.5.0, 2.0.0-beta-2, 1.4.2 > > Attachments: HBASE-19892.master.001.patch, > HBASE-19892.master.002.patch, HBASE-19892.master.003.patch > > > _Yetus-0.7.0 has a fix for the changed Jira behavior that made it so we > weren't picking up the latest attached patch. Check it works and if it does > move over to yetus 0.7.0_ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19892) Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0
[ https://issues.apache.org/jira/browse/HBASE-19892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19892: -- Summary: Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0 (was: Checking patch attach and yetus 0.7.0 and move to Yetus 0.7.0) > Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0 > --- > > Key: HBASE-19892 > URL: https://issues.apache.org/jira/browse/HBASE-19892 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 1.5.0, 2.0.0-beta-2, 1.4.2 > > Attachments: HBASE-19892.master.001.patch, > HBASE-19892.master.002.patch, HBASE-19892.master.003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19892) Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0
[ https://issues.apache.org/jira/browse/HBASE-19892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19892: -- Description: _Yetus-0.7.0 has a fix for the changed Jira behavior that made it so we weren't picking up the latest attached patch. Check it works and if it does move over to yetus 0.7.0_ > Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0 > --- > > Key: HBASE-19892 > URL: https://issues.apache.org/jira/browse/HBASE-19892 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 1.5.0, 2.0.0-beta-2, 1.4.2 > > Attachments: HBASE-19892.master.001.patch, > HBASE-19892.master.002.patch, HBASE-19892.master.003.patch > > > _Yetus-0.7.0 has a fix for the changed Jira behavior that made it so we > weren't picking up the latest attached patch. Check it works and if it does > move over to yetus 0.7.0_ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19892) Checking patch attach and yetus 0.7.0 and move to Yetus 0.7.0
[ https://issues.apache.org/jira/browse/HBASE-19892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19892: -- Resolution: Fixed Assignee: stack Fix Version/s: 1.4.2 2.0.0-beta-2 1.5.0 Release Note: Moved our internal yetus reference from 0.6.0 to 0.7.0. Status: Resolved (was: Patch Available) Did following on master, branch-2, branch-1, branch-1.4, and branch-1.3: commit 85b4615a1f5c88e3442ee47644a7a163cbbd3dfa Author: Michael Stack Date: Mon Jan 29 22:34:31 2018 -0800 HBASE-19892 Checking patch attach and yetus 0.7.0 and move to Yetus 0.7.0 One-liner that ups our yetus version from 0.6.0 to 0.7.0. diff --git a/dev-support/Jenkinsfile b/dev-support/Jenkinsfile index b0d6724bf0..d6da2b8637 100644 --- a/dev-support/Jenkinsfile +++ b/dev-support/Jenkinsfile @@ -33,7 +33,7 @@ pipeline { TOOLS = "${env.WORKSPACE}/tools" // where we check out to across stages BASEDIR = "${env.WORKSPACE}/component" - YETUS_RELEASE = '0.6.0' + YETUS_RELEASE = '0.7.0' PROJECT = 'hbase' PROJECT_PERSONALITY = 'https://raw.githubusercontent.com/apache/hbase/master/dev-support/hbase-personality.sh' // This section of the docs tells folks not to use the javadoc tag. older branches have our old version of the check for said tag. > Checking patch attach and yetus 0.7.0 and move to Yetus 0.7.0 > - > > Key: HBASE-19892 > URL: https://issues.apache.org/jira/browse/HBASE-19892 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 1.5.0, 2.0.0-beta-2, 1.4.2 > > Attachments: HBASE-19892.master.001.patch, > HBASE-19892.master.002.patch, HBASE-19892.master.003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19680) BufferedMutatorImpl#mutate should wait the result from AP in order to throw the failed mutations
[ https://issues.apache.org/jira/browse/HBASE-19680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344571#comment-16344571 ] Chia-Ping Tsai commented on HBASE-19680: {quote}+1 on commit then. {quote} Thanks for the reviews. Let me do the performance testing before committing, just in case a concurrency issue exists. > BufferedMutatorImpl#mutate should wait the result from AP in order to throw > the failed mutations > > > Key: HBASE-19680 > URL: https://issues.apache.org/jira/browse/HBASE-19680 > Project: HBase > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Assignee: Chia-Ping Tsai >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19680.v0.patch, HBASE-19680.v1.patch > > > Currently, BMI#mutate doesn't wait for the result from AP, so the errors are > stored in AP. The only way to return the errors to the user is to call > flush and catch the exception. That is non-intuitive. > I feel BMI#mutate should wait for the result. That is to say, the user can parse the > exception thrown by BM#mutate to get the failed mutations. Also, we can > remove the global error from AP. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
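The current-vs-proposed behavior described in the issue can be modeled with a toy buffered writer (plain Java, no HBase types; the class and method names are illustrative, not the actual BufferedMutatorImpl): today failed mutations are parked internally and only thrown from flush(), while the proposal has mutate() surface them as they happen.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a buffered writer that, like the current behavior under
// discussion, collects failures internally and only throws them on flush().
public class BufferedWriterSketch {
  private final List<String> failed = new ArrayList<>();
  private final boolean throwOnMutate; // proposed behavior when true

  public BufferedWriterSketch(boolean throwOnMutate) {
    this.throwOnMutate = throwOnMutate;
  }

  public void mutate(String row, boolean willFail) {
    if (willFail) {
      failed.add(row);
      if (throwOnMutate) { // proposal: surface failures immediately
        throw new IllegalStateException("failed mutations: " + failed);
      }
    }
  }

  public void flush() { // current behavior: errors only appear here
    if (!failed.isEmpty()) {
      throw new IllegalStateException("failed mutations: " + failed);
    }
  }

  public static void main(String[] args) {
    BufferedWriterSketch current = new BufferedWriterSketch(false);
    current.mutate("row1", true); // failure is parked, nothing thrown here...
    try {
      current.flush();            // ...it only surfaces now
    } catch (IllegalStateException e) {
      System.out.println("flush threw: " + e.getMessage());
    }
  }
}
```

The non-intuitive part the issue calls out is visible in main: the caller only learns about the failed mutation at flush(), far from the mutate() call that caused it.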
[jira] [Updated] (HBASE-19892) Checking patch attach and yetus 0.7.0 and move to Yetus 0.7.0
[ https://issues.apache.org/jira/browse/HBASE-19892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19892: -- Summary: Checking patch attach and yetus 0.7.0 and move to Yetus 0.7.0 (was: Checking patch attach and yetus 0.7.0; ignore!) > Checking patch attach and yetus 0.7.0 and move to Yetus 0.7.0 > - > > Key: HBASE-19892 > URL: https://issues.apache.org/jira/browse/HBASE-19892 > Project: HBase > Issue Type: Bug >Reporter: stack >Priority: Major > Attachments: HBASE-19892.master.001.patch, > HBASE-19892.master.002.patch, HBASE-19892.master.003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19892) Checking patch attach and yetus 0.7.0 and move to Yetus 0.7.0
[ https://issues.apache.org/jira/browse/HBASE-19892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344568#comment-16344568 ]

stack commented on HBASE-19892:
---
Seems to be doing the right thing. Let me change our yetus version in the Jenkinsfile to 0.7.0 as part of this issue.

> Checking patch attach and yetus 0.7.0 and move to Yetus 0.7.0
> -
>
> Key: HBASE-19892
> URL: https://issues.apache.org/jira/browse/HBASE-19892
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Priority: Major
> Attachments: HBASE-19892.master.001.patch, HBASE-19892.master.002.patch, HBASE-19892.master.003.patch
>

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19892) Checking patch attach and yetus 0.7.0; ignore!
[ https://issues.apache.org/jira/browse/HBASE-19892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19892: -- Attachment: HBASE-19892.master.003.patch > Checking patch attach and yetus 0.7.0; ignore! > -- > > Key: HBASE-19892 > URL: https://issues.apache.org/jira/browse/HBASE-19892 > Project: HBase > Issue Type: Bug >Reporter: stack >Priority: Major > Attachments: HBASE-19892.master.001.patch, > HBASE-19892.master.002.patch, HBASE-19892.master.003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19892) Checking patch attach and yetus 0.7.0; ignore!
[ https://issues.apache.org/jira/browse/HBASE-19892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19892: -- Attachment: HBASE-19892.master.002.patch > Checking patch attach and yetus 0.7.0; ignore! > -- > > Key: HBASE-19892 > URL: https://issues.apache.org/jira/browse/HBASE-19892 > Project: HBase > Issue Type: Bug >Reporter: stack >Priority: Major > Attachments: HBASE-19892.master.001.patch, > HBASE-19892.master.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19892) Checking patch attach and yetus 0.7.0; ignore!
[ https://issues.apache.org/jira/browse/HBASE-19892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19892: -- Attachment: HBASE-19892.master.001.patch > Checking patch attach and yetus 0.7.0; ignore! > -- > > Key: HBASE-19892 > URL: https://issues.apache.org/jira/browse/HBASE-19892 > Project: HBase > Issue Type: Bug >Reporter: stack >Priority: Major > Attachments: HBASE-19892.master.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19892) Checking patch attach and yetus 0.7.0; ignore!
[ https://issues.apache.org/jira/browse/HBASE-19892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19892: -- Status: Patch Available (was: Open) .001 non-change in the hbase-common pom.xml > Checking patch attach and yetus 0.7.0; ignore! > -- > > Key: HBASE-19892 > URL: https://issues.apache.org/jira/browse/HBASE-19892 > Project: HBase > Issue Type: Bug >Reporter: stack >Priority: Major > Attachments: HBASE-19892.master.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19892) Checking patch attach and yetus 0.7.0; ignore!
stack created HBASE-19892: - Summary: Checking patch attach and yetus 0.7.0; ignore! Key: HBASE-19892 URL: https://issues.apache.org/jira/browse/HBASE-19892 Project: HBase Issue Type: Bug Reporter: stack -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-19891) Up nightly test run timeout from 6 hours to 8
[ https://issues.apache.org/jira/browse/HBASE-19891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-19891. --- Resolution: Fixed Assignee: stack Fix Version/s: 2.0.0-beta-2 .001 is what I pushed to master and branch-2. > Up nightly test run timeout from 6 hours to 8 > - > > Key: HBASE-19891 > URL: https://issues.apache.org/jira/browse/HBASE-19891 > Project: HBase > Issue Type: Sub-task >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19891.master.001.patch > > > Yesterday, a nightly run for hbase2 passed all unit tests against hadoop2. > Hadoop3 tests got cut off at the 6 hour mark, our maximum total run time. > This is crazy but for now, just up the max time from 6 to 8 hours to see if > we can get a good build in. Can work on breaking this down in subsequent > issues. To be clear, the nightly 2.0 runs full test suite against hadoop2 and > then hadoop3... this is why it takes a while. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
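The patch itself isn't inlined in this thread. In a declarative Jenkins pipeline, an overall run cap is typically raised via the `timeout` option; the sketch below shows a hedged 6-to-8-hour bump. The surrounding pipeline structure here is illustrative only and is not HBase's actual dev-support/Jenkinsfile.

```groovy
// Illustrative only: raising an overall run timeout in a declarative
// Jenkinsfile. HBase's real Jenkinsfile differs in structure.
pipeline {
  agent any
  options {
    // was: timeout(time: 6, unit: 'HOURS')
    timeout(time: 8, unit: 'HOURS')   // up the nightly cap from 6 to 8 hours
  }
  stages {
    stage('unit tests') {
      steps {
        sh 'mvn test'   // placeholder; the real job runs the full suite twice
      }
    }
  }
}
```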
[jira] [Updated] (HBASE-19891) Up nightly test run timeout from 6 hours to 8
[ https://issues.apache.org/jira/browse/HBASE-19891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19891: -- Attachment: HBASE-19891.master.001.patch > Up nightly test run timeout from 6 hours to 8 > - > > Key: HBASE-19891 > URL: https://issues.apache.org/jira/browse/HBASE-19891 > Project: HBase > Issue Type: Sub-task >Reporter: stack >Priority: Major > Attachments: HBASE-19891.master.001.patch > > > Yesterday, a nightly run for hbase2 passed all unit tests against hadoop2. > Hadoop3 tests got cut off at the 6 hour mark, our maximum total run time. > This is crazy but for now, just up the max time from 6 to 8 hours to see if > we can get a good build in. Can work on breaking this down in subsequent > issues. To be clear, the nightly 2.0 runs full test suite against hadoop2 and > then hadoop3... this is why it takes a while. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19891) Up nightly test run timeout from 6 hours to 8
[ https://issues.apache.org/jira/browse/HBASE-19891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19891: -- Description: Yesterday, a nightly run for hbase2 passed all unit tests against hadoop2. Hadoop3 tests got cut off at the 6 hour mark, our maximum total run time. This is crazy but for now, just up the max time from 6 to 8 hours to see if we can get a good build in. Can work on breaking this down in subsequent issues. To be clear, the nightly 2.0 runs full test suite against hadoop2 and then hadoop3... this is why it takes a while. (was: Yesterday, a nightly run for hbase2 passed all unit tests against hadoop2. Hadoop3 tests got cut off at the 6 hour mark, our maximum total run time. This is crazy but for now, just up the max time from 6 to 8 hours to see if we can get a good build in. Can work on breaking this down in subsequent issues.) > Up nightly test run timeout from 6 hours to 8 > - > > Key: HBASE-19891 > URL: https://issues.apache.org/jira/browse/HBASE-19891 > Project: HBase > Issue Type: Sub-task >Reporter: stack >Priority: Major > > Yesterday, a nightly run for hbase2 passed all unit tests against hadoop2. > Hadoop3 tests got cut off at the 6 hour mark, our maximum total run time. > This is crazy but for now, just up the max time from 6 to 8 hours to see if > we can get a good build in. Can work on breaking this down in subsequent > issues. To be clear, the nightly 2.0 runs full test suite against hadoop2 and > then hadoop3... this is why it takes a while. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19887) Find out why the HBaseClassTestRuleCheck does not work in pre commit
[ https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344554#comment-16344554 ]

Duo Zhang commented on HBASE-19887:
---
OK, the problem is the same as with hbase-common in HBASE-19873. We define a listener in hbase-server/pom.xml and overwrite the default listener in the parent pom.xml.
{noformat}
listener
org.apache.hadoop.hbase.ServerResourceCheckerJUnitListener
{noformat}
Looking at the code, there is no difference between ServerResourceCheckerJUnitListener and ResourceCheckerJUnitListener:
{code}
/**
 * Monitor the resources. use by the tests All resources in {@link ResourceCheckerJUnitListener}
 * plus the number of connection.
 */
public class ServerResourceCheckerJUnitListener extends ResourceCheckerJUnitListener {
}
{code}
Although the comment says we have an extra connection-count check, it seems we do not. In general I prefer we set the listener in the parent pom.xml. What do you think, [~stack]? Thanks.

> Find out why the HBaseClassTestRuleCheck does not work in pre commit
>
>
> Key: HBASE-19887
> URL: https://issues.apache.org/jira/browse/HBASE-19887
> Project: HBase
> Issue Type: Sub-task
> Components: build
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
> Attachments: HBASE-19887.patch
>

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
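For context, a Surefire RunListener is declared through the `listener` property of maven-surefire-plugin, which is the mechanism the comment above refers to. The fragment below is a hedged sketch of that shape; whether HBase's poms match it exactly is an assumption, and a child pom redeclaring the same property shadows the parent's value, which is the overwrite problem described.

```xml
<!-- Hypothetical parent pom.xml fragment; a listener redeclared in
     hbase-server/pom.xml would override this default. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <properties>
      <property>
        <name>listener</name>
        <value>org.apache.hadoop.hbase.ResourceCheckerJUnitListener</value>
      </property>
    </properties>
  </configuration>
</plugin>
```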
[jira] [Created] (HBASE-19891) Up nightly test run timeout from 6 hours to 8
stack created HBASE-19891: - Summary: Up nightly test run timeout from 6 hours to 8 Key: HBASE-19891 URL: https://issues.apache.org/jira/browse/HBASE-19891 Project: HBase Issue Type: Sub-task Reporter: stack Yesterday, a nightly run for hbase2 passed all unit tests against hadoop2. Hadoop3 tests got cut off at the 6 hour mark, our maximum total run time. This is crazy but for now, just up the max time from 6 to 8 hours to see if we can get a good build in. Can work on breaking this down in subsequent issues. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19887) Find out why the HBaseClassTestRuleCheck does not work in pre commit
[ https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344550#comment-16344550 ] Duo Zhang commented on HBASE-19887: --- So it works... Let me check why. > Find out why the HBaseClassTestRuleCheck does not work in pre commit > > > Key: HBASE-19887 > URL: https://issues.apache.org/jira/browse/HBASE-19887 > Project: HBase > Issue Type: Sub-task > Components: build >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Attachments: HBASE-19887.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19887) Find out why the HBaseClassTestRuleCheck does not work in pre commit
[ https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344547#comment-16344547 ] Hadoop QA commented on HBASE-19887: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 26s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 5s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s{color} | {color:red} hbase-common: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 33s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 18m 14s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 5s{color} | {color:red} hbase-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 9s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 36m 7s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.TestRuleChecker | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-19887 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908264/HBASE-19887.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux b18a4f89a7cb 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 34c6c99041 | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/11241/artifact/patchprocess/diff-checkstyle-hbase-common.txt | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/11241/artifact/patchprocess/patch-unit-hbase-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11241/testReport/ | | modules | C: hbase-common U: hbase-common | | Console output | https://build
[jira] [Updated] (HBASE-19868) TestCoprocessorWhitelistMasterObserver is flakey
[ https://issues.apache.org/jira/browse/HBASE-19868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19868: -- Issue Type: Sub-task (was: Bug) Parent: HBASE-19147 > TestCoprocessorWhitelistMasterObserver is flakey > > > Key: HBASE-19868 > URL: https://issues.apache.org/jira/browse/HBASE-19868 > Project: HBase > Issue Type: Sub-task > Components: flakey, test >Affects Versions: 2.0.0-beta-1 >Reporter: Peter Somogyi >Assignee: Peter Somogyi >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19868.branch-2.001.patch > > > TestCoprocessorWhitelistMasterObserver is failing 33% of the time. In the > logs it looks like the failure is related to Master initialization. > Following log is from > [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/203] > {noformat} > 2018-01-26 02:36:36,686 WARN [M:0;1f0c4777c1ba:35049] > master.TableNamespaceManager(307): Caught exception in initializing namespace > table manager > org.apache.hadoop.hbase.DoNotRetryIOException: hconnection-0x18cd2ac8 closed > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:684) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:562) > at > 
org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getRegionLocation(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:73) > at > org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:223) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105) > at org.apache.hadoop.hbase.client.HTable.get(HTable.java:388) > at org.apache.hadoop.hbase.client.HTable.get(HTable.java:362) > at > org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:141) > at > org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:281) > at > org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:103) > at > org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62) > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226) > at > org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1059) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:921) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2034) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553) > at java.lang.Thread.run(Thread.java:748) > 2018-01-26 02:36:36,691 ERROR [M:0;1f0c4777c1ba:35049] > helpers.MarkerIgnoringBase(159): Failed to become active master > java.lang.IllegalStateException: Expected the service > ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345) > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291) > at > 
org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1061) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:921) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2034) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: > hconnection-0x18cd2ac8 closed > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714) > at > org.apache.hadoop.hbase.clien
[jira] [Commented] (HBASE-19868) TestCoprocessorWhitelistMasterObserver is flakey
[ https://issues.apache.org/jira/browse/HBASE-19868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344545#comment-16344545 ] stack commented on HBASE-19868: --- Oh, flakey 2.0 just came in. It has a few of these failures in the top-list. That's good because these have more logs: [https://builds.apache.org/job/HBASE-Flaky-Tests-branch2.0/983/]. I'm still a bit stuck though. Our thread-dumper seems to be missing in this test run. > TestCoprocessorWhitelistMasterObserver is flakey > > > Key: HBASE-19868 > URL: https://issues.apache.org/jira/browse/HBASE-19868 > Project: HBase > Issue Type: Bug > Components: flakey, test >Affects Versions: 2.0.0-beta-1 >Reporter: Peter Somogyi >Assignee: Peter Somogyi >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19868.branch-2.001.patch > > > TestCoprocessorWhitelistMasterObserver is failing 33% of the time. In the > logs it looks like the failure is related to Master initialization. > Following log is from > [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/203] > {noformat} > 2018-01-26 02:36:36,686 WARN [M:0;1f0c4777c1ba:35049] > master.TableNamespaceManager(307): Caught exception in initializing namespace > table manager > org.apache.hadoop.hbase.DoNotRetryIOException: hconnection-0x18cd2ac8 closed > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:684) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:562) > at > 
org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:562) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getRegionLocation(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:73) > at > org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:223) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105) > at org.apache.hadoop.hbase.client.HTable.get(HTable.java:388) > at org.apache.hadoop.hbase.client.HTable.get(HTable.java:362) > at > org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:141) > at > org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:281) > at > org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:103) > at > org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62) > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226) > at > org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1059) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:921) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2034) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553) > at java.lang.Thread.run(Thread.java:748) > 2018-01-26 02:36:36,691 ERROR [M:0;1f0c4777c1ba:35049] > helpers.MarkerIgnoringBase(159): Failed to become active master > java.lang.IllegalStateException: Expected the service > ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED > at > 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345) > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291) > at > org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1061) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:921) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2034) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: > hconnection-0x18cd2ac8 closed > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722) > a
[jira] [Commented] (HBASE-19147) All branch-2 unit tests pass
[ https://issues.apache.org/jira/browse/HBASE-19147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344529#comment-16344529 ] stack commented on HBASE-19147: --- We got a pass on a nightly for hadoop2. Failed in hadoop3: https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2/ > All branch-2 unit tests pass > > > Key: HBASE-19147 > URL: https://issues.apache.org/jira/browse/HBASE-19147 > Project: HBase > Issue Type: Bug > Components: test >Reporter: stack >Priority: Blocker > Fix For: 2.0.0-beta-2 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HBASE-19868) TestCoprocessorWhitelistMasterObserver is flakey
[ https://issues.apache.org/jira/browse/HBASE-19868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reassigned HBASE-19868: - Assignee: Peter Somogyi > TestCoprocessorWhitelistMasterObserver is flakey > > > Key: HBASE-19868 > URL: https://issues.apache.org/jira/browse/HBASE-19868 > Project: HBase > Issue Type: Bug > Components: flakey, test >Affects Versions: 2.0.0-beta-1 >Reporter: Peter Somogyi >Assignee: Peter Somogyi >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19868.branch-2.001.patch > > > TestCoprocessorWhitelistMasterObserver is failing 33% of the time. In the > logs it looks like the failure is related to Master initialization. > Following log is from > [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/203] > {noformat} > 2018-01-26 02:36:36,686 WARN [M:0;1f0c4777c1ba:35049] > master.TableNamespaceManager(307): Caught exception in initializing namespace > table manager > org.apache.hadoop.hbase.DoNotRetryIOException: hconnection-0x18cd2ac8 closed > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:684) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:562) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getRegionLocation(ConnectionUtils.java:131) > at > 
org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:73) > at > org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:223) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105) > at org.apache.hadoop.hbase.client.HTable.get(HTable.java:388) > at org.apache.hadoop.hbase.client.HTable.get(HTable.java:362) > at > org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:141) > at > org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:281) > at > org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:103) > at > org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62) > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226) > at > org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1059) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:921) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2034) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553) > at java.lang.Thread.run(Thread.java:748) > 2018-01-26 02:36:36,691 ERROR [M:0;1f0c4777c1ba:35049] > helpers.MarkerIgnoringBase(159): Failed to become active master > java.lang.IllegalStateException: Expected the service > ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345) > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291) > at > org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1061) > at > 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:921) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2034) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: > hconnection-0x18cd2ac8 closed > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingCl
[jira] [Commented] (HBASE-19868) TestCoprocessorWhitelistMasterObserver is flakey
[ https://issues.apache.org/jira/browse/HBASE-19868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344524#comment-16344524 ] stack commented on HBASE-19868: --- 2.001 just ups retries from 1 to 5. Test runs in about same time locally at least. Pushed to master and branch-2 to see. > TestCoprocessorWhitelistMasterObserver is flakey > > > Key: HBASE-19868 > URL: https://issues.apache.org/jira/browse/HBASE-19868 > Project: HBase > Issue Type: Bug > Components: flakey, test >Affects Versions: 2.0.0-beta-1 >Reporter: Peter Somogyi >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19868.branch-2.001.patch > > > TestCoprocessorWhitelistMasterObserver is failing 33% of the time. In the > logs it looks like the failure is related to Master initialization. > Following log is from > [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/203] > {noformat} > 2018-01-26 02:36:36,686 WARN [M:0;1f0c4777c1ba:35049] > master.TableNamespaceManager(307): Caught exception in initializing namespace > table manager > org.apache.hadoop.hbase.DoNotRetryIOException: hconnection-0x18cd2ac8 closed > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:684) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:562) > at > 
org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getRegionLocation(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:73) > at > org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:223) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105) > at org.apache.hadoop.hbase.client.HTable.get(HTable.java:388) > at org.apache.hadoop.hbase.client.HTable.get(HTable.java:362) > at > org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:141) > at > org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:281) > at > org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:103) > at > org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62) > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226) > at > org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1059) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:921) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2034) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553) > at java.lang.Thread.run(Thread.java:748) > 2018-01-26 02:36:36,691 ERROR [M:0;1f0c4777c1ba:35049] > helpers.MarkerIgnoringBase(159): Failed to become active master > java.lang.IllegalStateException: Expected the service > ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345) > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291) > at > 
org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1061) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:921) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2034) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: > hconnection-0x18cd2ac8 closed > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion
[jira] [Updated] (HBASE-19868) TestCoprocessorWhitelistMasterObserver is flakey
[ https://issues.apache.org/jira/browse/HBASE-19868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19868: -- Attachment: HBASE-19868.branch-2.001.patch
[jira] [Commented] (HBASE-13153) Bulk Loaded HFile Replication
[ https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344518#comment-16344518 ] Ashish Singhi commented on HBASE-13153: --- Hi [~anoop.hbase], I will try to post a patch for the document update this coming weekend. On weekdays I am a little busy with my paid job. bq. Many bugs were resolved in bulk load and all such fixes should be in 1.3+ versions. AFAIK all those bugs are fixed in 1.3+ versions. I usually keep an eye on it. bq. The mentioned potential issues also seem out of date. Which ones are you pointing at? > Bulk Loaded HFile Replication > - > > Key: HBASE-13153 > URL: https://issues.apache.org/jira/browse/HBASE-13153 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: sunhaitao >Assignee: Ashish Singhi >Priority: Major > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-13153-branch-1-v20.patch, > HBASE-13153-branch-1-v21.patch, HBASE-13153-v1.patch, HBASE-13153-v10.patch, > HBASE-13153-v11.patch, HBASE-13153-v12.patch, HBASE-13153-v13.patch, > HBASE-13153-v14.patch, HBASE-13153-v15.patch, HBASE-13153-v16.patch, > HBASE-13153-v17.patch, HBASE-13153-v18.patch, HBASE-13153-v19.patch, > HBASE-13153-v2.patch, HBASE-13153-v20.patch, HBASE-13153-v21.patch, > HBASE-13153-v3.patch, HBASE-13153-v4.patch, HBASE-13153-v5.patch, > HBASE-13153-v6.patch, HBASE-13153-v7.patch, HBASE-13153-v8.patch, > HBASE-13153-v9.patch, HBASE-13153.patch, HBase Bulk Load > Replication-v1-1.pdf, HBase Bulk Load Replication-v2.pdf, HBase Bulk Load > Replication-v3.pdf, HBase Bulk Load Replication.pdf, HDFS_HA_Solution.PNG > > > Currently we plan to use the HBase Replication feature to deal with a disaster > tolerance scenario. But we encounter an issue: we use bulkload very > frequently, and because bulkload bypasses the write path and will not generate WAL, > the data will not be replicated to the backup cluster.
It's inappropriate to > bulkload twice, on both the active cluster and the backup cluster. So I advise making some > modifications to the bulkload feature to enable bulkload to both the active cluster and the > backup cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19887) Find out why the HBaseClassTestRuleCheck does not work in pre commit
[ https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-19887: -- Assignee: Duo Zhang Status: Patch Available (was: Open) > Find out why the HBaseClassTestRuleCheck does not work in pre commit > > > Key: HBASE-19887 > URL: https://issues.apache.org/jira/browse/HBASE-19887 > Project: HBase > Issue Type: Sub-task > Components: build >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Attachments: HBASE-19887.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19887) Find out why the HBaseClassTestRuleCheck does not work in pre commit
[ https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-19887: -- Attachment: HBASE-19887.patch > Find out why the HBaseClassTestRuleCheck does not work in pre commit > > > Key: HBASE-19887 > URL: https://issues.apache.org/jira/browse/HBASE-19887 > Project: HBase > Issue Type: Sub-task > Components: build >Reporter: Duo Zhang >Priority: Major > Attachments: HBASE-19887.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-13153) Bulk Loaded HFile Replication
[ https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344508#comment-16344508 ] Anoop Sam John commented on HBASE-13153: This feature as such is not described in the book? If so, we have to add it. Yes, we have to update the section to remove the replication-related limitation. Many bugs were resolved in bulk load, and all such fixes should be in 1.3+ versions. The mentioned potential issues also seem out of date. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19868) TestCoprocessorWhitelistMasterObserver is flakey
[ https://issues.apache.org/jira/browse/HBASE-19868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344504#comment-16344504 ] stack commented on HBASE-19868: --- [~psomogyi] How'd you extract the exception? Going through the nightlies, I was unable to get a good log. The exception seems to come after the end of the logged run at 02:36:36,686 (the log here ends at [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/203/testReport/junit/org.apache.hadoop.hbase.security.access/TestCoprocessorWhitelistMasterObserver/org_apache_hadoop_hbase_security_access_TestCoprocessorWhitelistMasterObserver/] 2018-01-26 02:36:36,440). I can't make the test fail locally, which I'm guessing is what you are finding. (I was looking to see why we are seemingly not archiving the surefire test run output. It looks like the flags are in place, but I can't find the raw emissions. Need to dig more.) So, the odd thing about this test is conf.setInt("hbase.client.retries.number", 1) (I think). If stuff is slow around startup, we'll fail our one attempt. It's happening in ClusterSchemaServiceImpl, which is cast as a Guava Service and is being started async. Are we stuck in isTableAvailableAndInitialized, failing our one attempt over and over? I can't tell. Or is it that the Master fails to come up and then we are just stuck in mini cluster startup, trying to scan a meta that is not coming? Let me try some debug and up the retries to 5. The test should still fail fast.
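As a rough illustration of the retry behavior discussed above, here is a simplified Python simulation (not HBase code; all names are made up for illustration -- the real client logic lives in RpcRetryingCallerImpl): with the retry count at 1, a server that is merely slow to start fails the single attempt, while a count of 5 recovers.

```python
# Simplified simulation of why "hbase.client.retries.number" = 1 is flaky.
# Not HBase code: call_with_retries and slow_startup are illustrative only.

def call_with_retries(attempt_fn, retries):
    """Invoke attempt_fn until it succeeds or retries are exhausted."""
    last_error = None
    for _ in range(retries):
        try:
            return attempt_fn()
        except IOError as e:
            last_error = e  # transient failure, e.g. region not yet online
    raise last_error

# A server that is "slow to start": the first two attempts fail.
state = {"calls": 0}

def slow_startup():
    state["calls"] += 1
    if state["calls"] < 3:
        raise IOError("region not yet available")
    return "ok"

# With retries=1 the single attempt fails; with retries=5 it recovers.
try:
    call_with_retries(slow_startup, 1)
except IOError:
    print("retries=1: failed")  # what the flaky test would observe

state["calls"] = 0
print(call_with_retries(slow_startup, 5))  # prints "ok"
```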
[jira] [Commented] (HBASE-19824) SingleColumnValueFilter returns wrong result when used in shell command
[ https://issues.apache.org/jira/browse/HBASE-19824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344456#comment-16344456 ] Reid Chan commented on HBASE-19824: --- Got it! Thanks Chia-Ping, Ted. > SingleColumnValueFilter returns wrong result when used in shell command > --- > > Key: HBASE-19824 > URL: https://issues.apache.org/jira/browse/HBASE-19824 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-4 >Reporter: Ted Yu >Assignee: Reid Chan >Priority: Major > > There are two cells in table t1: > {code} > ROW COLUMN+CELL > r1 column=f1:a1, > timestamp=1516313683984, value=a2 > r1 column=f1:b1, > timestamp=1516313700744, value=b2 > {code} > When SingleColumnValueFilter is used in shell command, no filtering was done: > {code} > hbase(main):022:0> scan 't1', {FILTER => "SingleColumnValueFilter('f1', 'a1', > =, 'binary:a2')"} > ROW COLUMN+CELL > r1 column=f1:a1, > timestamp=1516313683984, value=a2 > r1 column=f1:b1, > timestamp=1516313700744, value=b2 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
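For context on the scan output quoted above, a simplified model of SingleColumnValueFilter's row-level semantics (illustrative Python, not HBase code): the filter decides per row, and a matching row is emitted with all of its cells, which is consistent with both f1:a1 and f1:b1 coming back for r1. By default (filterIfMissing=false) a row that lacks the tested column also passes.

```python
# Toy model of SingleColumnValueFilter row-level semantics; not HBase code.
# rows: {row_key: {(family, qualifier): value}}

def scv_filter(rows, family, qualifier, expected):
    """Keep rows whose tested column matches, or which lack the column
    entirely (the filterIfMissing=false default)."""
    out = {}
    for key, cells in rows.items():
        value = cells.get((family, qualifier))
        if value is None or value == expected:
            out[key] = cells  # the whole row survives, all cells included
    return out

t1 = {"r1": {("f1", "a1"): "a2", ("f1", "b1"): "b2"}}
# r1 matches on f1:a1, so both of its cells are returned.
print(scv_filter(t1, "f1", "a1", "a2"))
```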
[jira] [Commented] (HBASE-19811) Fix findbugs and error-prone warnings in hbase-server (branch-2)
[ https://issues.apache.org/jira/browse/HBASE-19811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1635#comment-1635 ] Hudson commented on HBASE-19811: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4493 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4493/]) HBASE-19811 Fix findbugs and error-prone warnings in hbase-server (stack: rev 34c6c99041d5f4a217363667b090fb1b5beb7abe) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/procedure/TestZKProcedure.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperACL.java > Fix findbugs and error-prone warnings in hbase-server (branch-2) > > > Key: HBASE-19811 > URL: https://issues.apache.org/jira/browse/HBASE-19811 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0-beta-1 >Reporter: Peter Somogyi >Assignee: Peter Somogyi >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: 0HBASE-19811.branch-2.ADDENDUM.2.patch, > 1-HBASE-19811.branch-2.002.patch, HBASE-19811.branch-2.001.patch, > HBASE-19811.branch-2.001.patch, HBASE-19811.branch-2.002.patch, > HBASE-19811.branch-2.ADDENDUM.patch, HBASE-19811.master.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1631#comment-1631 ] Josh Elser commented on HBASE-17852: bq. I'm not messing with you, Appy. Check the push logs/comments on the other JIRA issue. I swear to you that I did not push this until after I heard back from you. My guess is that this is due to me using git-am or cherry-picking a commit from a local branch. My apologies, Appy. I am wrong. I apparently got impatient and pushed this because there was silence after the Dec 6th mention and the Jan 12th re-ping. My intent was not to squash your opinions, but to avoid being blocked if you were not interested or were busy, as seemed might be the case. If you have since changed your mind about the reduced patch hitting master, my offer to revert stands. My apologies again for arguing with you while in the wrong. > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-17852-v10.patch, screenshot-1.png > > > The rollback-via-snapshot design approach implemented in this ticket: > # Before backup create/delete/merge starts, we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because the meta > table is small and usually fits in a single region. > # When an operation fails on the server side, we handle the failure by cleaning > up partial data in the backup destination, followed by restoring the backup > meta-table from the snapshot. > # When an operation fails on the client side (abnormal termination, for example), > the next time the user tries create/merge/delete, he or she will see an error message > that the system is in an inconsistent state and repair is required; he or she will > need to run the backup repair tool.
> # To avoid multiple writers to the backup system table (the backup client and > BackupObserver's), we introduce a small table ONLY to keep the listing of bulk > loaded files. All backup observers will work only with this new table. The > reason: in case of a failure during backup create/delete/merge/restore, when the > system performs automatic rollback, some data written by backup observers > during the failed operation may be lost. This is what we try to avoid. > # The second table keeps only bulk load related references. We do not care about > consistency of this table, because bulk load is an idempotent operation and can > be repeated after failure. Partially written data in the second table does not > affect the BackupHFileCleaner plugin, because this data (the list of bulk loaded > files) corresponds to files which have not yet been loaded successfully and, > hence, are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
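The rollback-via-snapshot flow described in the ticket can be sketched as a toy simulation (illustrative Python only; the class and function names are hypothetical, not the actual backup implementation): snapshot the small meta table before the operation, restore it if the operation fails.

```python
# Toy simulation of rollback-via-snapshot; not the real backup code.

class BackupMeta:
    """Stand-in for the backup system (meta) table."""
    def __init__(self):
        self.entries = {}
    def snapshot(self):
        return dict(self.entries)  # the meta table is small, cheap to copy
    def restore(self, snap):
        self.entries = dict(snap)

def run_backup_op(meta, op):
    snap = meta.snapshot()      # snapshot before the operation starts
    try:
        op(meta)
    except Exception:
        meta.restore(snap)      # roll back partial meta-table updates
        raise

meta = BackupMeta()
meta.entries["backup_1"] = "COMPLETE"

def failing_op(m):
    m.entries["backup_2"] = "RUNNING"   # partial write...
    raise RuntimeError("server-side failure")

try:
    run_backup_op(meta, failing_op)
except RuntimeError:
    pass
print(meta.entries)  # the partial "backup_2" entry was rolled back
```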
[jira] [Created] (HBASE-19890) Canary usage should document hbase.canary.sink.class config
Ted Yu created HBASE-19890: -- Summary: Canary usage should document hbase.canary.sink.class config Key: HBASE-19890 URL: https://issues.apache.org/jira/browse/HBASE-19890 Project: HBase Issue Type: Bug Reporter: Ted Yu Canary#main uses the config hbase.canary.sink.class to instantiate the Sink class. The Sink instance affects creation of the Monitor. In the refguide for Canary, hbase.canary.sink.class is not mentioned. We should document this config. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
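A sketch of the kind of example such documentation might include (hedged: the sink class name below is a hypothetical placeholder, not a class shipped with HBase, and passing -D assumes the Canary is run through the standard Hadoop Tool option parsing):

```shell
# Hypothetical invocation: point the Canary at a custom Sink implementation.
# org.example.MyCanarySink is a placeholder for a user-supplied class.
hbase org.apache.hadoop.hbase.tool.Canary \
  -D hbase.canary.sink.class=org.example.MyCanarySink
```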
[jira] [Issue Comment Deleted] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-17852: -- Comment: was deleted (was: Hmm, did I insult someone savagely, [~elserj]? ) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344426#comment-16344426 ] Hadoop QA commented on HBASE-19863: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 59s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 13s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 6s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 46s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 18m 25s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}105m 41s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}143m 49s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.TestZooKeeper | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-19863 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908234/HBASE-19863-branch1.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux a53b47ac1993 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 34c6c99041 | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/11240/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11240/testReport/ | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11240/console |
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344425#comment-16344425 ] Vladimir Rodionov commented on HBASE-17852:
---
Hmm, did I insult someone savagely, [~elserj]?

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
> Issue Type: Sub-task
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Major
> Fix For: 3.0.0
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
> Design approach: rollback-via-snapshot, implemented in this ticket:
> # Before a backup create/delete/merge starts, we take a snapshot of the backup meta-table (the backup system table). This procedure is lightweight because the meta table is small and usually fits in a single region.
> # When an operation fails on the server side, we handle the failure by cleaning up partial data in the backup destination, followed by restoring the backup meta-table from the snapshot.
> # When an operation fails on the client side (abnormal termination, for example), the next time the user tries create/merge/delete, they will see an error message saying the system is in an inconsistent state and repair is required; they will need to run the backup repair tool.
> # To avoid multiple writers to the backup system table (the backup client and the BackupObservers), we introduce a small table ONLY to keep the listing of bulk loaded files. All backup observers work only with this new table. The reason: in case of a failure during backup create/delete/merge/restore, when the system performs automatic rollback, some data written by backup observers during the failed operation could be lost. This is what we try to avoid.
> # The second table keeps only bulk-load-related references. We do not care about the consistency of this table, because bulk load is an idempotent operation and can be repeated after a failure. Partially written data in the second table does not affect the BackupHFileCleaner plugin, because this data (the list of bulk loaded files) corresponds to files which have not yet been loaded successfully and hence are not visible to the system.
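The rollback-via-snapshot steps quoted in the issue description above can be sketched as follows. This is an illustrative Python model, not HBase code: `BackupMetaTable`, `run_backup_operation`, and the dict-of-rows representation are all invented for the sketch.

```python
# Illustrative sketch (not HBase code) of the rollback-via-snapshot design:
# a plain dict stands in for the backup meta-table.

class BackupMetaTable:
    def __init__(self):
        self.rows = {}          # live backup metadata
        self._snapshot = None   # cheap to copy because the table is small

    def snapshot(self):
        # Taken before every create/delete/merge, per step 1 of the design.
        self._snapshot = dict(self.rows)

    def restore_from_snapshot(self):
        # Server-side failure handling, per step 2: roll the metadata back.
        self.rows = dict(self._snapshot)


def run_backup_operation(meta, mutate):
    """Snapshot first; on failure, restore the snapshot and re-raise."""
    meta.snapshot()
    try:
        mutate(meta.rows)
    except Exception:
        meta.restore_from_snapshot()  # partial writes are rolled back
        raise


meta = BackupMetaTable()
meta.rows["backup_1"] = "COMPLETE"

def failing_merge(rows):
    rows["backup_2"] = "MERGING"   # partial write...
    raise IOError("merge failed")  # ...then the operation dies

try:
    run_backup_operation(meta, failing_merge)
except IOError:
    pass

print(meta.rows)  # the partial "backup_2" entry is gone
```

This also shows why the design needs the separate bulk-load table: anything a BackupObserver had written into the rolled-back table during the failed operation would be lost with the restore.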
[jira] [Issue Comment Deleted] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-17852:
--
Comment: was deleted (was: {quote} that's precisely the reason why i can't trust you. {quote} You can start discussion about trust and respect in HBase community and I assure you I have a lot to say about. )
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344416#comment-16344416 ] Josh Elser commented on HBASE-17852:
[~vrodionov] that's also plenty shit-slinging from you too on the matter. Thanks.
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344407#comment-16344407 ] Josh Elser commented on HBASE-17852:
Also, again, if you want this reverted, please say so.
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344404#comment-16344404 ] Josh Elser commented on HBASE-17852:
bq. I did say it was okay to go in master, but that's like 4 days after the commit - 2018-01-16T12:46:19-0800

I'm not messing with you, [~appy]. Check the push logs/comments on the other JIRA issue.. I swear to you that I did not push this until after I heard back from you. My guess is that this is due to me using git-am or cherry picking a commit from a local branch.
[jira] [Commented] (HBASE-19803) False positive for the HBASE-Find-Flaky-Tests job
[ https://issues.apache.org/jira/browse/HBASE-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344384#comment-16344384 ] Appy commented on HBASE-19803:
--
Yeah, it's an infra issue. I'm not even able to access the site https://builds.apache.org/

> False positive for the HBASE-Find-Flaky-Tests job
> -
> Key: HBASE-19803
> URL: https://issues.apache.org/jira/browse/HBASE-19803
> Project: HBase
> Issue Type: Sub-task
> Reporter: Duo Zhang
> Priority: Major
> Attachments: 2018-01-24T17-45-37_000-jvmRun1.dumpstream, HBASE-19803.master.001.patch
>
> It reports two hangs for TestAsyncTableGetMultiThreaded, but I checked the surefire output:
> https://builds.apache.org/job/HBASE-Flaky-Tests/24830/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt
> This one was likely killed in the middle of the run, within 20 seconds.
> https://builds.apache.org/job/HBASE-Flaky-Tests/24852/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt
> This one was also killed, within about 1 minute.
> The test is declared as LargeTests, so the time limit should be 10 minutes. It seems that the JVM may crash during the mvn test run; we then kill all the running tests, and we may mark some of them as hung, which leads to the false positive.
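The false-positive mechanism described in the quoted report, where a JVM crash aborts the whole run, the other in-flight tests are killed, and a "no result means hung" rule then mislabels them, can be stated as a tiny classifier. This is a hypothetical Python sketch; the function name and labels are invented for illustration, not part of the flaky-tests job.

```python
# Hypothetical sketch of the false-positive mode described above: when a JVM
# crashes, the harness kills every other running test, and a naive
# "no result means hung" rule then mislabels the killed tests.

LARGE_TEST_LIMIT_SECS = 10 * 60  # LargeTests time limit from the report

def classify(test_runtime_secs, produced_result, run_aborted):
    """Label one test; run_aborted means the whole mvn run was killed."""
    if produced_result:
        return "completed"
    if run_aborted:
        # Killed mid-run (e.g. after 20s or ~1 min, as in the linked logs):
        # not evidence of a hang, the surrounding run just died.
        return "killed-by-aborted-run"
    if test_runtime_secs >= LARGE_TEST_LIMIT_SECS:
        return "hung"
    return "unknown"

print(classify(20, False, True))    # the 20-second kill from the report
print(classify(700, False, False))  # a genuine over-limit hang
```

The point of the sketch is the middle branch: without the `run_aborted` signal, both calls above would look identical to a timeout-only rule.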
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344380#comment-16344380 ] Vladimir Rodionov commented on HBASE-17852:
---
{quote} that's precisely the reason why i can't trust you. {quote}
You can start discussion about trust and respect in HBase community and I assure you I have a lot to say about.
[jira] [Commented] (HBASE-19803) False positive for the HBASE-Find-Flaky-Tests job
[ https://issues.apache.org/jira/browse/HBASE-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344372#comment-16344372 ] Duo Zhang commented on HBASE-19803:
---
Seems the infra sucks... I added label Hadoop but the two new builds still fail due to disconnect from the build machine...
[jira] [Commented] (HBASE-19876) The exception happening in converting pb mutation to hbase.mutation messes up the CellScanner
[ https://issues.apache.org/jira/browse/HBASE-19876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344363#comment-16344363 ] Anoop Sam John commented on HBASE-19876:
An Append or Increment Mutation may have N Cells in it (for the same row). Even if one cell is malformed, we have to fail the op. Ya, that makes sense. Need to verify once though. Thanks Chia.

> The exception happening in converting pb mutation to hbase.mutation messes up the CellScanner
> -
> Key: HBASE-19876
> URL: https://issues.apache.org/jira/browse/HBASE-19876
> Project: HBase
> Issue Type: Bug
> Reporter: Chia-Ping Tsai
> Assignee: Chia-Ping Tsai
> Priority: Critical
> Fix For: 1.3.2, 1.5.0, 1.2.7, 2.0.0-beta-2, 1.4.2
>
> {code:java}
> 2018-01-27 22:51:43,794 INFO [hconnection-0x3291b443-shared-pool11-t6] client.AsyncRequestFutureImpl(778): id=5, table=testQuotaStatusFromMaster3, attempt=6/16 failed=20ops, last exception=org.apache.hadoop.hbase.client.WrongRowIOException: org.apache.hadoop.hbase.client.WrongRowIOException: The row in xxx doesn't match the original one aaa
>   at org.apache.hadoop.hbase.client.Mutation.add(Mutation.java:776)
>   at org.apache.hadoop.hbase.client.Put.add(Put.java:282)
>   at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toPut(ProtobufUtil.java:642)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:952)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:896)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2591)
>   at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
> I noticed this bug when testing the table space quota.
> When a region server converts a pb mutation to an hbase.mutation, a quota exception or cell exception may be thrown (in RSRpcServices#doBatchOp):
> {code:java}
> for (ClientProtos.Action action: mutations) {
>   MutationProto m = action.getMutation();
>   Mutation mutation;
>   if (m.getMutateType() == MutationType.PUT) {
>     mutation = ProtobufUtil.toPut(m, cells);
>     batchContainsPuts = true;
>   } else {
>     mutation = ProtobufUtil.toDelete(m, cells);
>     batchContainsDelete = true;
>   }
>   mutationActionMap.put(mutation, action);
>   mArray[i++] = mutation;
>   checkCellSizeLimit(region, mutation);
>   // Check if a space quota disallows this mutation
>   spaceQuotaEnforcement.getPolicyEnforcement(region).check(mutation);
>   quota.addMutation(mutation);
> }
> {code}
> The region server catches the exception, but it doesn't have the CellScanner skip the failed cells:
> {code:java}
> } catch (IOException ie) {
>   if (atomic) {
>     throw ie;
>   }
>   for (Action mutation : mutations) {
>     builder.addResultOrException(getResultOrException(ie, mutation.getIndex()));
>   }
> }
> {code}
> The bug results in WrongRowIOException for the remaining mutations, since they now refer to the wrong cells.
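The failure mode described in the report above, where cells for the whole batch arrive in one flat CellScanner stream, so a mutation that fails mid-conversion leaves the stream mis-positioned for every later mutation, can be modelled in a few lines. This is a toy Python model under that reading of the report; `convert_batch` and the `(row, cell_count)` encoding are invented for illustration and are not HBase APIs.

```python
# Toy model (not HBase code) of the CellScanner problem: all cells for a
# batch travel in one flat stream, and each mutation consumes its own
# cell_count cells from it. If conversion fails partway through a mutation's
# cells without advancing past the rest, every later mutation reads the
# wrong cells.

def convert_batch(mutations, cells, skip_failed_cells):
    """mutations: list of (row, cell_count); cells: flat [(row, value), ...]."""
    it = iter(cells)  # stands in for the shared CellScanner
    results = []
    for row, cell_count in mutations:
        consumed = 0
        try:
            for _ in range(cell_count):
                cell = next(it)
                consumed += 1
                if cell[0] != row:
                    raise ValueError("WrongRow: %s != %s" % (cell[0], row))
            results.append(("ok", row))
        except ValueError:
            results.append(("failed", row))
            if skip_failed_cells:
                # The missing step: drain this mutation's remaining cells so
                # the scanner is positioned correctly for the next mutation.
                for _ in range(cell_count - consumed):
                    next(it)
    return results

mutations = [("a", 2), ("b", 1)]
cells = [("x", 1), ("a", 2), ("b", 3)]   # first cell of "a" is malformed

print(convert_batch(mutations, cells, skip_failed_cells=False))
print(convert_batch(mutations, cells, skip_failed_cells=True))
```

Without the drain, mutation "b" reads a leftover cell of "a" and fails with the same wrong-row error, which is the cascade the report describes.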
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344359#comment-16344359 ] Appy commented on HBASE-17852:
--
If your mental radar doesn't tick-off in loud red alarms between the time of choosing the second approach and getting someone to commit it, that's precisely the reason why i can't trust you.
[jira] [Commented] (HBASE-19803) False positive for the HBASE-Find-Flaky-Tests job
[ https://issues.apache.org/jira/browse/HBASE-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344357#comment-16344357 ] stack commented on HBASE-19803:
---
{quote}[~appy] The flakey test finder job is hang? {quote}
I was wondering why it wasn't moving today...
[jira] [Commented] (HBASE-19840) Flakey TestMetaWithReplicas
[ https://issues.apache.org/jira/browse/HBASE-19840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344342#comment-16344342 ] Hudson commented on HBASE-19840:
FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4492 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4492/])
HBASE-19840 Flakey TestMetaWithReplicas (stack: rev 4f547b3817e01a1f98c965a502775de481e6ca96)
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMetaWithReplicas.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/util/HasThread.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
HBASE-19840 Flakey TestMetaWithReplicas; ADDENDUM to fix Checkstyle (stack: rev 0b9a0dc9519d511908efd28caf2cf010e3a1ff79)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java

> Flakey TestMetaWithReplicas
> ---
> Key: HBASE-19840
> URL: https://issues.apache.org/jira/browse/HBASE-19840
> Project: HBase
> Issue Type: Sub-task
> Components: flakey, test
> Reporter: stack
> Assignee: stack
> Priority: Major
> Fix For: 2.0.0-beta-2
> Attachments: HBASE-19840.master.001.patch, HBASE-19840.master.001.patch
>
> Failing about 15% of the time, in testShutdownHandling.
> [https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html]
>
> Adding some debug. It's hard to follow what is going on in this test.
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344341#comment-16344341 ] Vladimir Rodionov commented on HBASE-17852:
---
{quote} I did say it was okay to go in master, but that's like 4 days after the commit - 2018-01-16T12:46:19-0800 {quote}
OK, there was an issue found during QA testing - HBASE-19568. It turned out that HBASE-17852 fixes the issue. Let us say I had two options:
# Find out which part of HBASE-17852 fixes the issue and create a smaller, HBASE-19568-specific patch
# Apply the HBASE-17852 patch directly (with some of the refactoring stripped out)
So I chose the latter. Reasons: time, time, time. We can revert HBASE-19568 if there are so many objections.
[jira] [Comment Edited] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344327#comment-16344327 ] Appy edited comment on HBASE-17852 at 1/30/18 1:13 AM: --- Commit date is 12th Jan {noformat} commit a5601c8eac6bfcac7d869574547f505d44e49065 Author: Vladimir Rodionov AuthorDate: Wed Jan 10 16:26:09 2018 -0800 Commit: Josh Elser CommitDate: Fri Jan 12 13:13:17 2018 -0500 HBASE-19568: Restore of HBase table using incremental backup doesn't restore rows from an earlier incremental backup Signed-off-by: Josh Elser {noformat} I did say it was okay to go into master, but that was like 4 days after the commit - 2018-01-16T12:46:19-0800 {color:red}Edit{color} Btw, for anyone wishing to cross-check the diffs: Diff on this JIRA that wasn't approved (until the 16th): https://reviews.apache.org/r/63155/diff/5/ Diff on HBASE-19568 which was committed on the 12th: https://issues.apache.org/jira/secure/attachment/12905579/HBASE-19568-v4.patch was (Author: appy): Commit date is 12th Jan {noformat} commit a5601c8eac6bfcac7d869574547f505d44e49065 Author: Vladimir Rodionov AuthorDate: Wed Jan 10 16:26:09 2018 -0800 Commit: Josh Elser CommitDate: Fri Jan 12 13:13:17 2018 -0500 HBASE-19568: Restore of HBase table using incremental backup doesn't restore rows from an earlier incremental backup Signed-off-by: Josh Elser {noformat} I did say it was okay to go into master, but that was like 4 days after the commit - 2018-01-16T12:46:19-0800 > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task > Reporter: Vladimir Rodionov > Assignee: Vladimir Rodionov > Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-17852-v10.patch, screenshot-1.png > > > Design approach: rollback-via-snapshot, implemented in this ticket: > # Before a backup create/delete/merge starts, we take a snapshot of the backup meta-table (the backup system table). This procedure is lightweight because the meta table is small and usually fits in a single region. > # When an operation fails on the server side, we handle the failure by cleaning up partial data in the backup destination, followed by restoring the backup meta-table from the snapshot. > # When an operation fails on the client side (abnormal termination, for example), the next time the user tries a create/merge/delete, they will see an error message that the system is in an inconsistent state and repair is required; they will need to run the backup repair tool. > # To avoid multiple writers to the backup system table (the backup client and BackupObservers), we introduce a small table ONLY to keep the listing of bulk-loaded files. All backup observers will work only with this new table. The reason: in case of a failure during backup create/delete/merge/restore, when the system performs an automatic rollback, some data written by backup observers during the failed operation may be lost. This is what we try to avoid. > # The second table keeps only bulk-load-related references. We do not care about the consistency of this table, because bulk load is an idempotent operation and can be repeated after a failure. Partially written data in the second table does not affect the BackupHFileCleaner plugin, because this data (the list of bulk-loaded files) corresponds to files which have not yet been loaded successfully and hence are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
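The rollback-via-snapshot design quoted above can be sketched in a few lines. This is an illustrative Python simulation, not HBase's actual implementation; the `BackupMetaTable` class and its methods are hypothetical stand-ins for the backup system table and its snapshot/restore operations.

```python
import copy

class BackupMetaTable:
    """Hypothetical stand-in for the backup system (meta) table."""
    def __init__(self):
        self.rows = {}
        self._snapshot = None

    def snapshot(self):
        # Lightweight: the meta table is small, usually a single region.
        self._snapshot = copy.deepcopy(self.rows)

    def restore_from_snapshot(self):
        self.rows = copy.deepcopy(self._snapshot)

def run_backup_operation(meta, operation):
    """Snapshot first; on a server-side failure, clean up partial data
    and roll the meta table back to the snapshot."""
    meta.snapshot()
    try:
        operation(meta)
        return True
    except Exception:
        # Cleaning up partial data in the backup destination is elided;
        # then restore the meta table from the snapshot.
        meta.restore_from_snapshot()
        return False

meta = BackupMetaTable()
meta.rows["backup:1"] = "COMPLETE"

def failing_merge(m):
    m.rows["backup:2"] = "MERGING"  # partial write before the failure
    raise RuntimeError("server-side failure")

ok = run_backup_operation(meta, failing_merge)
# The partial write is rolled back; only the pre-existing row remains.
```

This is why the design also needs the separate bulk-load listing table: writes made by observers during a failed operation would be silently undone by this rollback if they lived in the same table.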
[jira] [Reopened] (HBASE-19872) HBase 1.3.1 regionserver crash (bucketcache)
[ https://issues.apache.org/jira/browse/HBASE-19872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang reopened HBASE-19872: --- > HBase 1.3.1 regionserver crash (bucketcache) > -- > > Key: HBASE-19872 > URL: https://issues.apache.org/jira/browse/HBASE-19872 > Project: HBase > Issue Type: Bug > Components: BucketCache > Affects Versions: 1.3.1 > Reporter: gehaijiang > Priority: Major > > HBase 1.3.1 regionserver crash with bucketcache configured. > Error log: > FATAL [RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16020] regionserver.RSRpcServices: Run out of memory; RSRpcServices will abort itself immediately > > hbase-env.sh: > export HBASE_REGIONSERVER_OPTS="-XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -Xmx24g -Xms24g -XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=512M -XX:MaxGCPauseMillis=100 -XX:G1NewSizePercent=5 -XX:ConcGCThreads=4 -XX:ParallelGCThreads=16 -XX:-ResizePLAB -XX:+ParallelRefProcEnabled -XX:InitiatingHeapOccupancyPercent=65 -XX:G1HeapRegionSize=32M -XX:G1MixedGCCountTarget=64 -XX:G1OldCSetRegionThresholdPercent=5 -XX:MaxTenuringThreshold=1 -XX:MaxDirectMemorySize=28g -XX:ReservedCodeCacheSize=512M -XX:+DisableExplicitGC -Xloggc:${HBASE_LOG_DIR}/regionserver.gc.log" > > hbase-site.xml (property tags flattened; name = value): > hbase.bucketcache.combinedcache.enabled = true > hbase.bucketcache.ioengine = offheap > hbase.bucketcache.size = 25600 > hbase.bucketcache.writer.queuelength = 64 > hbase.bucketcache.writer.threads = 3 > hfile.block.cache.size = 0.3 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
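The reported configuration can be sanity-checked with simple arithmetic: the off-heap bucket cache (`hbase.bucketcache.size` = 25600 MB, i.e. 25 GB) must fit inside `-XX:MaxDirectMemorySize=28g` together with every other direct-memory consumer (RPC buffers, DFS client buffers, etc.). An illustrative check; the conclusion about what fills the remaining headroom is an inference, not something stated in the report:

```python
def direct_memory_headroom_gb(max_direct_gb, bucketcache_mb):
    """Direct memory left over after the off-heap bucket cache claims its share."""
    return max_direct_gb - bucketcache_mb / 1024

headroom = direct_memory_headroom_gb(max_direct_gb=28, bucketcache_mb=25600)
# 28 GB - 25 GB leaves only 3 GB for all other direct-memory users;
# if RPC and read buffers exceed that, the regionserver can exhaust
# direct memory and abort, matching the FATAL log above.
```

This suggests raising `MaxDirectMemorySize` or shrinking `hbase.bucketcache.size` so the gap covers the non-cache direct allocations.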
[jira] [Resolved] (HBASE-19872) HBase 1.3.1 regionserver crash (bucketcache)
[ https://issues.apache.org/jira/browse/HBASE-19872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-19872. --- Resolution: Not A Problem -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-19874) Regionserver handle is not deleted (bucketcache)
[ https://issues.apache.org/jira/browse/HBASE-19874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang reopened HBASE-19874: --- > Regionserver handle is not deleted (bucketcache) > - > > Key: HBASE-19874 > URL: https://issues.apache.org/jira/browse/HBASE-19874 > Project: HBase > Issue Type: Bug > Components: BucketCache > Affects Versions: 1.3.1 > Reporter: gehaijiang > Priority: Major > > With bucketcache configured in a production environment, file handles for deleted files are not released. The number of deleted-but-still-open files has reached several hundred and keeps growing, and memory usage is growing with it. > $ ll | grep delete > lr-x-- 1 data data 64 Jan 28 14:28 1048 -> /block4/hadoop/dfs/data/current/BP-1101579887-10.50.64.23-1497104043858/current/finalized/subdir247/subdir216/blk_3036141819 (deleted) > lr-x-- 1 data data 64 Jan 28 14:28 1050 -> /block4/hadoop/dfs/data/current/BP-1101579887-10.50.64.23-1497104043858/current/finalized/subdir247/subdir216/blk_3036141819_1962457009.meta (deleted) > lr-x-- 1 data data 64 Jan 28 14:28 1078 -> /block5/hadoop/dfs/data/current/BP-1101579887-10.50.64.23-1497104043858/current/finalized/subdir247/subdir62/blk_3036102314 (deleted) > lr-x-- 1 data data 64 Jan 28 14:28 1079 -> /block7/hadoop/dfs/data/current/BP-1101579887-10.50.64.23-1497104043858/current/finalized/subdir247/subdir95/blk_3036110832 (deleted) > lr-x-- 1 data data 64 Jan 28 14:28 1091 -> /block3/hadoop/dfs/data/current/BP-1101579887-10.50.64.23-1497104043858/current/finalized/subdir245/subdir53/blk_3035968976_1962284166.meta (deleted) > lr-x-- 1 data data 64 Jan 28 14:28 1092 -> /block9/hadoop/dfs/data/current/BP-1101579887-10.50.64.23-1497104043858/current/finalized/subdir247/subdir97/blk_3036111332 (deleted) > lr-x-- 1 data data 64 Jan 28 14:28 1093 -> /block9/hadoop/dfs/data/current/BP-1101579887-10.50.64.23-1497104043858/current/finalized/subdir247/subdir97/blk_3036111332_1962426522.meta (deleted) > lr-x-- 1 data data 64 Jan 28 14:28 1096 -> /block5/hadoop/dfs/data/current/BP-1101579887-10.50.64.23-1497104043858/current/finalized/subdir247/subdir62/blk_3036102314_1962417504.meta (deleted) > lr-x-- 1 data data 64 Jan 28 14:28 1100 -> /block7/hadoop/dfs/data/current/BP-1101579887-10.50.64.23-1497104043858/current/finalized/subdir247/subdir95/blk_3036110832_1962426022.meta (deleted) > lr-x-- 1 data data 64 Jan 28 14:28 1101 -> /block10/hadoop/dfs/data/current/BP-1101579887-10.50.64.23-1497104043858/current/finalized/subdir245/subdir13/blk_3035958550 (deleted) > lr-x-- 1 data data 64 Jan 28 14:28 1102 -> /block10/hadoop/dfs/data/current/BP-1101579887-10.50.64.23-1497104043858/current/finalized/subdir245/subdir13/blk_3035958550_1962273740.meta (deleted) > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
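Listings like the `ll | grep delete` output above (symlinks in a process's fd directory whose targets are marked `(deleted)`) can be summarized programmatically. A small sketch that counts deleted-but-still-open files per top-level mount; the sample lines are abbreviated versions of those in the report:

```python
import re
from collections import Counter

def count_deleted_handles(lines):
    """Count '(deleted)' symlink targets per top-level mount directory,
    as seen in 'ls -l /proc/<pid>/fd' style output."""
    counts = Counter()
    for line in lines:
        m = re.search(r'->\s+(/\S+)\s+\(deleted\)', line)
        if m:
            mount = "/" + m.group(1).split("/")[1]  # e.g. '/block4'
            counts[mount] += 1
    return counts

sample = [
    "lr-x-- 1 data data 64 Jan 28 14:28 1048 -> /block4/hadoop/dfs/data/blk_3036141819 (deleted)",
    "lr-x-- 1 data data 64 Jan 28 14:28 1050 -> /block4/hadoop/dfs/data/blk_3036141819_1962457009.meta (deleted)",
    "lr-x-- 1 data data 64 Jan 28 14:28 1078 -> /block5/hadoop/dfs/data/blk_3036102314 (deleted)",
]
counts = count_deleted_handles(sample)
```

A steadily growing count here means the process (the regionserver in this case) is holding fds open after HDFS deleted the blocks, so the disk space and page-cache memory cannot be reclaimed until the fds are closed.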
[jira] [Resolved] (HBASE-19874) Regionserver handle is not deleted (bucketcache)
[ https://issues.apache.org/jira/browse/HBASE-19874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-19874. --- Resolution: Not A Problem Hadoop Flags: (was: Reviewed) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-19874) Regionserver handle is not deleted (bucketcache)
[ https://issues.apache.org/jira/browse/HBASE-19874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gehaijiang resolved HBASE-19874. Resolution: Done Hadoop Flags: Reviewed Release Note: HDFS problem -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344327#comment-16344327 ] Appy commented on HBASE-17852: -- Commit date is 12th Jan {noformat} commit a5601c8eac6bfcac7d869574547f505d44e49065 Author: Vladimir Rodionov AuthorDate: Wed Jan 10 16:26:09 2018 -0800 Commit: Josh Elser CommitDate: Fri Jan 12 13:13:17 2018 -0500 HBASE-19568: Restore of HBase table using incremental backup doesn't restore rows from an earlier incremental backup Signed-off-by: Josh Elser {noformat} I did say it was okay to go into master, but that was like 4 days after the commit - 2018-01-16T12:46:19-0800 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-19872) HBase 1.3.1 regionserver crash (bucketcache)
[ https://issues.apache.org/jira/browse/HBASE-19872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gehaijiang resolved HBASE-19872. Resolution: Done -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19874) Regionserver handle is not deleted (bucketcache)
[ https://issues.apache.org/jira/browse/HBASE-19874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344324#comment-16344324 ] gehaijiang commented on HBASE-19874: It was a problem with HDFS short-circuit reads being enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19872) HBase 1.3.1 regionserver crash (bucketcache)
[ https://issues.apache.org/jira/browse/HBASE-19872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344319#comment-16344319 ] gehaijiang commented on HBASE-19872: Thanks, I know. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344314#comment-16344314 ] Appy commented on HBASE-17852: -- Okay, f**k it, I really don't want to waste any more of my time fighting some fight. It's obvious from events what happened here, and that it shouldn't have - makes me very sad and angry. I leave its further handling to the PMC. At the very least, someone lost my basic trust and respect. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19803) False positive for the HBASE-Find-Flaky-Tests job
[ https://issues.apache.org/jira/browse/HBASE-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344595#comment-16344595 ] Duo Zhang commented on HBASE-19803: --- I've changed the label from 'ubuntu' to 'ubuntu||Hadoop' so it can start a new build... > False positive for the HBASE-Find-Flaky-Tests job > - > > Key: HBASE-19803 > URL: https://issues.apache.org/jira/browse/HBASE-19803 > Project: HBase > Issue Type: Sub-task > Reporter: Duo Zhang > Priority: Major > Attachments: 2018-01-24T17-45-37_000-jvmRun1.dumpstream, > HBASE-19803.master.001.patch > > > It reports two hangs for TestAsyncTableGetMultiThreaded, but I checked the surefire output: > https://builds.apache.org/job/HBASE-Flaky-Tests/24830/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt > This one was likely killed in the middle of the run, within 20 seconds. > https://builds.apache.org/job/HBASE-Flaky-Tests/24852/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt > This one was also killed within about 1 minute. > The test is declared as LargeTests, so the time limit should be 10 minutes. It seems that the JVM may crash during the mvn test run; we then kill all the running tests and may mark some of them as hung, which leads to the false positive. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
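The false positive described in HBASE-19803 — a test flagged as a hang when it was actually killed early after an unrelated JVM crash — suggests a simple heuristic: a genuine timeout runs close to the category limit (10 minutes for LargeTests), while runs killed after 20 seconds or a minute clearly did not hang. This is an illustrative classifier, not the actual Find-Flaky-Tests code; the 90% slack factor is an assumption:

```python
def classify_unfinished_test(runtime_s, category_limit_s=600, slack=0.9):
    """Heuristic: an unfinished test is only a genuine hang if it ran
    close to its category timeout; otherwise it was likely killed early,
    e.g. because a sibling JVM crashed and the whole run was aborted."""
    if runtime_s >= category_limit_s * slack:
        return "hang"
    return "killed-early"  # likely a false positive, not a real hang

# The two surefire logs cited above show runs ending after ~20 s and
# ~1 minute -- far below the 10-minute LargeTests limit.
print(classify_unfinished_test(20))    # killed-early
print(classify_unfinished_test(610))   # hang
```

Filtering "killed-early" results out of the hang report would avoid blaming tests that were merely collateral damage of a crashed fork.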
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344310#comment-16344310 ] Josh Elser commented on HBASE-17852: bq. I'll can't believe that because I can't believe that.. [~appy], truly, boss, if you weren't giving your blessing on the fix going into master, say so and I'll revert it when I'm next at a computer. I was operating under the assumption that we had time to address the design and not look gift contributions (horses) in the mouth. The rest of this is a product of some heavy-handedness about the busted Yetus after the JIRA upgrade. I'm not trying to tell you that something different from what you think happened, did. I'm trying to express that I thought you were OK with this plan against master (not branch-2). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344307#comment-16344307 ] Vladimir Rodionov commented on HBASE-17852: --- {quote} Wasn't the said patch objected against committing by multiple members of the community? {quote} Calm down, [~appy]. We are not doing anything criminal here. The result of these two patches is what you agreed on personally: https://issues.apache.org/jira/browse/HBASE-17852?focusedCommentId=16327774&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16327774 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19803) False positive for the HBASE-Find-Flaky-Tests job
[ https://issues.apache.org/jira/browse/HBASE-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344305#comment-16344305 ] Duo Zhang commented on HBASE-19803: --- [~appy] Is the flakey test finder job hung? https://builds.apache.org/job/HBASE-Find-Flaky-Tests/ The last build cannot start... > False positive for the HBASE-Find-Flaky-Tests job > - > > Key: HBASE-19803 > URL: https://issues.apache.org/jira/browse/HBASE-19803 > Project: HBase > Issue Type: Sub-task >Reporter: Duo Zhang >Priority: Major > Attachments: 2018-01-24T17-45-37_000-jvmRun1.dumpstream, > HBASE-19803.master.001.patch > > > It reports two hangs for TestAsyncTableGetMultiThreaded, but I checked the > surefire output > https://builds.apache.org/job/HBASE-Flaky-Tests/24830/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt > This one was likely killed in the middle of the run, within 20 seconds. > https://builds.apache.org/job/HBASE-Flaky-Tests/24852/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt > This one was also killed within about 1 minute. > The test is declared as LargeTests so the time limit should be 10 minutes. It > seems that the JVM may crash during the mvn test run; we then kill > all the running tests and may mark some of them as hung, which leads > to the false positive. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
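The reasoning in the report above suggests a simple guard the flaky-test report generator could apply: a LargeTests run can only be a genuine hang if it actually exhausted its time budget. A hedged sketch of that check (the class and method names are mine, not the actual Find-Flaky-Tests code):

```java
// Hypothetical classifier for the situation described: a test killed after
// 20s or ~1m cannot be a real hang when the LargeTests limit is 10 minutes;
// it was most likely collateral damage from a JVM crash elsewhere in the run.
public class HangClassifier {
    static final long LARGE_TEST_TIMEOUT_MS = 10 * 60 * 1000;

    static boolean isLikelyRealHang(long elapsedMs) {
        // Only count a test as hung if it used up its whole category budget.
        return elapsedMs >= LARGE_TEST_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        System.out.println(isLikelyRealHang(20_000));  // killed at 20s -> not a hang
        System.out.println(isLikelyRealHang(60_000));  // killed at ~1m -> not a hang
        System.out.println(isLikelyRealHang(600_000)); // full 10m elapsed -> hang
    }
}
```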
[jira] [Commented] (HBASE-19873) Add a CategoryBasedTimeout ClassRule for all UTs
[ https://issues.apache.org/jira/browse/HBASE-19873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344304#comment-16344304 ] Duo Zhang commented on HBASE-19873: --- Thanks [~stack]. > Add a CategoryBasedTimeout ClassRule for all UTs > > > Key: HBASE-19873 > URL: https://issues.apache.org/jira/browse/HBASE-19873 > Project: HBase > Issue Type: Sub-task >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19873-branch-2-v2.patch > > > So that our test can timeout as expected without making the surefire plugin > kill other tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
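The idea behind a category-based timeout is to give each test class its own time budget so that a hung test fails on its own instead of the surefire plugin killing the whole fork (and its sibling tests with it). A rough stand-alone sketch of that mechanism using plain java.util.concurrent — not the actual CategoryBasedTimeout rule, and the per-category budgets here are illustrative, not HBase's exact values:

```java
import java.util.concurrent.*;

// Sketch of a per-category time budget: run a task and fail it with a
// timeout result instead of letting an outer harness kill the whole JVM.
public class CategoryTimeoutSketch {
    enum Category { SMALL, MEDIUM, LARGE }

    static long budgetSeconds(Category c) {
        switch (c) {
            case SMALL:  return 30;   // illustrative budgets only
            case MEDIUM: return 180;
            default:     return 600;  // LargeTests: 10 minutes
        }
    }

    static String runWithBudget(Callable<String> test, long budgetSec) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> f = pool.submit(test);
            try {
                return f.get(budgetSec, TimeUnit.SECONDS);
            } catch (TimeoutException e) {
                f.cancel(true); // only this test fails; siblings keep running
                return "TIMED_OUT";
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast "test" finishes inside its budget...
        System.out.println(runWithBudget(() -> "PASSED", budgetSeconds(Category.SMALL)));
        // ...while a hung one is timed out individually (1s budget to keep the demo quick).
        System.out.println(runWithBudget(() -> { Thread.sleep(10_000); return "PASSED"; }, 1));
    }
}
```

A JUnit ClassRule wraps this same idea around every test method in the class, which is why the timeout fires "as expected" without surefire's fork-level kill being involved.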
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344302#comment-16344302 ] stack commented on HBASE-17852: --- {quote}The majority of this code (but not all) went into master in HBASE-19568 btw. {quote} The majority of 'HBASE-17852 Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)', a contentious issue, went into another commit named 'HBASE-19568 Restore of HBase table using incremental backup doesn't restore rows from an earlier incremental backup' with no outline of what made it and what did not, and no changeset explanation. There is no release note. The two JIRAs are not even linked. {quote}Nope, it turned out that this patch (HBASE-17852) also fixes the issue raised in HBASE-19568, that is why it was committed (with refactoring code stripped down). No conspiracy here. {quote} But hang on, now the patch here on 'fault tolerance' fixes issues over in the 'restore rows' issue, -HBASE-19568?- I can see how [~appy] might arrive at his assessment. On the 'declarations', the first offers options free of context or explanation. This one I find interesting: # Use procedure framework: Short answer - no. I will wait until procv2 becomes more mature and robust. I do not want to build a new feature on a foundation of a new feature. Too risky in my opinion. NO when we are talking about an hbase3 (possibly) feature and when there is no alternative. Anyway, keeping it short. 
> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-17852-v10.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. > # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. > # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. > # To avoid multiple writers to the backup system table (backup client and > BackupObserver's) we introduce small table ONLY to keep listing of bulk > loaded files. All backup observers will work only with this new tables. The > reason: in case of a failure during backup create/delete/merge/restore, when > system performs automatic rollback, some data written by backup observers > during failed operation may be lost. This is what we try to avoid. > # Second table keeps only bulk load related references. We do not care about > consistency of this table, because bulk load is idempotent operation and can > be repeated after failure. 
Partially written data in second table does not > affect on BackupHFileCleaner plugin, because this data (list of bulk loaded > files) correspond to a files which have not been loaded yet successfully and, > hence - are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344287#comment-16344287 ] Vladimir Rodionov commented on HBASE-17852: --- {quote} he had random urge to delete all previous 9 patches from this jira {quote} No conspiracy here as well. I was not able to submit patch v10 due to some Apache Jira issues and had to remove all previous patches to be able to submit v10. > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-17852-v10.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. > # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. > # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. > # To avoid multiple writers to the backup system table (backup client and > BackupObserver's) we introduce small table ONLY to keep listing of bulk > loaded files. All backup observers will work only with this new tables. The > reason: in case of a failure during backup create/delete/merge/restore, when > system performs automatic rollback, some data written by backup observers > during failed operation may be lost. This is what we try to avoid. 
> # Second table keeps only bulk load related references. We do not care about > consistency of this table, because bulk load is idempotent operation and can > be repeated after failure. Partially written data in second table does not > affect on BackupHFileCleaner plugin, because this data (list of bulk loaded > files) correspond to a files which have not been loaded yet successfully and, > hence - are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344272#comment-16344272 ] Appy commented on HBASE-17852: -- bq. There was nothing malicious intending to happen here I can't believe that, because: - he started fixing the other jira from a clean slate and somehow mysteriously ended up with the exact same diff as was here, which we all were against. - he had a random urge to delete all previous 9 patches from this jira, but not from the phase1 jira HBASE-14030 or the phase2 jira HBASE-14123, which both have like 40 patches each > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-17852-v10.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. > # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. > # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. > # To avoid multiple writers to the backup system table (backup client and > BackupObserver's) we introduce small table ONLY to keep listing of bulk > loaded files. All backup observers will work only with this new tables. 
The > reason: in case of a failure during backup create/delete/merge/restore, when > system performs automatic rollback, some data written by backup observers > during failed operation may be lost. This is what we try to avoid. > # Second table keeps only bulk load related references. We do not care about > consistency of this table, because bulk load is idempotent operation and can > be repeated after failure. Partially written data in second table does not > affect on BackupHFileCleaner plugin, because this data (list of bulk loaded > files) correspond to a files which have not been loaded yet successfully and, > hence - are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344258#comment-16344258 ] Appy commented on HBASE-17852: -- bq. Nope, it turned out that this patch (HBASE-17852) also fixes the issue raised in HBASE-19568, that is why it was committed (with refactoring code stripped down). Not a justification! Did you not use the patch in this jira to fix HBASE-19568? Wasn't the said patch objected against committing by multiple members of the community? Did you bring to the attention of anyone who raised the objections (me/stack/andrew/[~mdrob]) the fact that you were committing these changes? bq. No conspiracy here. Besides this, I thought that we have agreed on pushing this to the master branch and continue working on a critical changes after that? You really think that'd work? People can match timestamps, you committed 4 days before I even replied back! > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-17852-v10.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. > # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. > # To avoid multiple writers to the backup system table (backup client and > BackupObserver's) we introduce small table ONLY to keep listing of bulk > loaded files. All backup observers will work only with this new tables. The > reason: in case of a failure during backup create/delete/merge/restore, when > system performs automatic rollback, some data written by backup observers > during failed operation may be lost. This is what we try to avoid. > # Second table keeps only bulk load related references. We do not care about > consistency of this table, because bulk load is idempotent operation and can > be repeated after failure. Partially written data in second table does not > affect on BackupHFileCleaner plugin, because this data (list of bulk loaded > files) correspond to a files which have not been loaded yet successfully and, > hence - are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19863: Status: Patch Available (was: Open) A WIP patch to deal with this issue. If, during an attempt to skip to the next column, we hit the case where the store scanner has changed and the new head is behind the cell we are looking for, we just return false to force a reseek in seekOrSkipToNextColumn. Not sure yet whether it covers all possible scenarios. > java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter > is used > - > > Key: HBASE-19863 > URL: https://issues.apache.org/jira/browse/HBASE-19863 > Project: HBase > Issue Type: Bug > Components: Filters >Affects Versions: 1.4.1 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Attachments: HBASE-19863-branch1.patch, HBASE-19863-test.patch > > > Under some circumstances a scan with SingleColumnValueFilter may fail with an > exception > {noformat} > java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, > qualifier=C2, timestamp=1516433595543, comparison result: 1 > at > org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149) > at > org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385) 
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167) > {noformat} > Conditions: > table T with a single column family 0 that uses a ROWCOL bloom filter > (important) and column qualifiers C1,C2,C3,C4,C5. > When we fill the table, for every row we put a deleted cell for C3. > The table has a single region with two HStores: > A: start row: 0, stop row: 99 > B: start row: 10, stop row: 99 > B has newer versions of rows 10-99. Store files have several blocks each > (important). > Store A is the result of a major compaction, so it doesn't have any deleted > cells (important). > So, we are running a scan like: > {noformat} > scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter > ('0','C5',=,'binary:whatever')"} > {noformat} > How the scan performs: > First, we iterate A for rows 0 and 1 without any problems. > Next, we start to iterate A for row 10, so we read the first cell and set the hfs > scanner to A : > 10:0/C1/0/Put/x but find that we have a newer version of the cell in B : > 10:0/C1/1/Put/x, > so we make B our current store scanner. Since we are looking for the > particular columns > C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn, > which > would run reseek for all store scanners. > For store A the following magic would happen in requestSeek: > 1. the bloom filter check (passesGeneralBloomFilter) would set haveToSeek to > false because row 10 doesn't have a C3 qualifier in store A. > 2. Since we don't have to seek, we just create a fake row > 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for > us, and it is commented with: > {noformat} > // Multi-column Bloom filter optimization. 
> // Create a fake key/value, so that this scanner only bubbles up to the > top > // of the KeyValueHeap in StoreScanner after we scanned this row/column in > // all other store files. The query matcher will then just skip this fake > // key/value and the store scanner will progress to the next column. This > // is obviously not a "real real" seek, but unlike the fake KV earlier in > // this method, we want this to be propagated to ScanQueryMatcher. > {noformat} > > For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum > to skip C3 entirely. > After that we start searching for qualifier C5 using seekOrSkipToNextColumn > which run first trySkipToNextColumn: > {n
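The WIP fix described in the status update above amounts to a guard inside the skip optimization: if the heap's current head has fallen behind the cell being skipped to, give up on skipping and force an explicit reseek. A deliberately simplified model with integer "cells" in place of KeyValues — not the actual StoreScanner code, and the names are illustrative:

```java
// Simplified model of the trySkipToNextColumn guard: 'cells' are ints ordered
// like KeyValues; 'head' is the current top of the KeyValueHeap. If the head
// is behind the target (head < target), skipping forward is unsafe -- return
// false so the caller falls back to an explicit reseek.
public class SkipGuardSketch {
    static boolean trySkip(int head, int target) {
        if (head < target) {
            return false; // head is behind: force a reseek in seekOrSkipToNextColumn
        }
        return true;      // safe to skip: head is already at or past the target column
    }

    public static void main(String[] args) {
        System.out.println(trySkip(5, 10));  // scanner head behind target -> reseek (false)
        System.out.println(trySkip(10, 10)); // head at target -> skip is safe (true)
    }
}
```

In the real code the comparison is a KeyValue comparator over row/family/qualifier/timestamp rather than an int compare, but the control flow — bail out rather than skip past state the fake-key optimization has invalidated — is the same.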
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19863: Attachment: HBASE-19863-branch1.patch > java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter > is used > - > > Key: HBASE-19863 > URL: https://issues.apache.org/jira/browse/HBASE-19863 > Project: HBase > Issue Type: Bug > Components: Filters >Affects Versions: 1.4.1 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Attachments: HBASE-19863-branch1.patch, HBASE-19863-test.patch > > > Under some circumstances scan with SingleColumnValueFilter may fail with an > exception > {noformat} > java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, > qualifier=C2, timestamp=1516433595543, comparison result: 1 > at > org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149) > at > org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187) > at > 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167) > {noformat} > Conditions: > table T with a single column family 0 that uses ROWCOL bloom filter > (important) and column qualifiers C1,C2,C3,C4,C5. > When we fill the table for every row we put deleted cell for C3. > The table has a single region with two HStore: > A: start row: 0, stop row: 99 > B: start row: 10 stop row: 99 > B has newer versions of rows 10-99. Store files have several blocks each > (important). > Store A is the result of major compaction, so it doesn't have any deleted > cells (important). > So, we are running a scan like: > {noformat} > scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter > ('0','C5',=,'binary:whatever')"} > {noformat} > How the scan performs: > First, we iterate A for rows 0 and 1 without any problems. > Next, we start to iterate A for row 10, so read the first cell and set hfs > scanner to A : > 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : > 10:0/C1/1/Put/x, > so we make B as our current store scanner. Since we are looking for > particular columns > C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn > which > would run reseek for all store scanners. > For store A the following magic would happen in requestSeek: > 1. bloom filter check passesGeneralBloomFilter would set haveToSeek to > false because row 10 doesn't have C3 qualifier in store A. > 2. Since we don't have to seek we just create a fake row > 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for > us and it commented with : > {noformat} > // Multi-column Bloom filter optimization. > // Create a fake key/value, so that this scanner only bubbles up to the > top > // of the KeyValueHeap in StoreScanner after we scanned this row/column in > // all other store files. The query matcher will then just skip this fake > // key/value and the store scanner will progress to the next column. 
This > // is obviously not a "real real" seek, but unlike the fake KV earlier in > // this method, we want this to be propagated to ScanQueryMatcher. > {noformat} > > For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum > to skip C3 entirely. > After that we start searching for qualifier C5 using seekOrSkipToNextColumn > which run first trySkipToNextColumn: > {noformat} > protected boolean trySkipToNextColumn(Cell cell) throws IOException { > Cell nextCell = null; > do { > Cell nextIndexedKey = getNextIndexedKey(); > if (nextIndexedKey != null && nextIndexedKey != > KeyValueScanner.NO_NEXT_INDEXED_KEY > && matcher.compareKeyFo
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344231#comment-16344231 ] Josh Elser commented on HBASE-17852: {quote} HBASE-19568 had basically everything that was objected in the reviews here, why wasn't it brought to the attention of people who raised objections? The title/reason of that jira reason doesn't matter. I see it as a really sly move - going behind community and committed changes which were heavily objected against, by using separate jira. {quote} [~appy], let's take a step back, please. I called this out to your attention -- I was under the impression, based on your earlier comment ([here|https://issues.apache.org/jira/browse/HBASE-17852?focusedCommentId=16327774&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16327774]), that you were OK with this implementation landing in master as-is. HBASE-19568 was used to commit to master (with what I thought was your blessing) while we continue to use this JIRA issue to flesh out the design, because of all of the discussion that has happened. If I misunderstood you or poorly asked you the question, let's take that over to HBASE-19568 and get a revert in place. There was nothing malicious intended here. > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-17852-v10.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. 
> # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. > # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. > # To avoid multiple writers to the backup system table (backup client and > BackupObserver's) we introduce small table ONLY to keep listing of bulk > loaded files. All backup observers will work only with this new tables. The > reason: in case of a failure during backup create/delete/merge/restore, when > system performs automatic rollback, some data written by backup observers > during failed operation may be lost. This is what we try to avoid. > # Second table keeps only bulk load related references. We do not care about > consistency of this table, because bulk load is idempotent operation and can > be repeated after failure. Partially written data in second table does not > affect on BackupHFileCleaner plugin, because this data (list of bulk loaded > files) correspond to a files which have not been loaded yet successfully and, > hence - are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344228#comment-16344228 ] Vladimir Rodionov edited comment on HBASE-17852 at 1/30/18 12:07 AM: - Nope, it turned out that this patch (HBASE-17852) also fixes the issue raised in HBASE-19568, that is why it was committed (with refactoring code stripped down). No conspiracy here. Besides this, I thought that we have agreed on pushing this to the master branch and continue working on a critical changes after that? was (Author: vrodionov): Nope, it turned out that this patch (HBASE-17852) also fixes the issue raised in HBASE-19568, that is why it was committed (with refactoring code stripped down). No conspiracy here. Besides this, I thought that we agreed on pusing this to the master branch and continue working on a critical changes after that? > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-17852-v10.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. > # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. > # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. 
> # To avoid multiple writers to the backup system table (backup client and > BackupObserver's) we introduce small table ONLY to keep listing of bulk > loaded files. All backup observers will work only with this new tables. The > reason: in case of a failure during backup create/delete/merge/restore, when > system performs automatic rollback, some data written by backup observers > during failed operation may be lost. This is what we try to avoid. > # Second table keeps only bulk load related references. We do not care about > consistency of this table, because bulk load is idempotent operation and can > be repeated after failure. Partially written data in second table does not > affect on BackupHFileCleaner plugin, because this data (list of bulk loaded > files) correspond to a files which have not been loaded yet successfully and, > hence - are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344228#comment-16344228 ] Vladimir Rodionov edited comment on HBASE-17852 at 1/30/18 12:05 AM: - Nope, it turned out that this patch (HBASE-17852) also fixes the issue raised in HBASE-19568, that is why it was committed (with refactoring code stripped down). No conspiracy here. Besides this, I thought that we agreed on pusing this to the master branch and continue working on a critical changes after that? was (Author: vrodionov): Nope, it turned out that this patch (HBASE-17852) also fixes the issue raised in HBASE-19568, that is why it was committed (with refactoring code stripped down). No conspiracy here. > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-17852-v10.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. > # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. > # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. 
> # To avoid multiple writers to the backup system table (the backup client and > the BackupObservers), we introduce a small table ONLY to keep the listing of bulk-loaded > files. All backup observers will work only with this new table. The > reason: in case of a failure during backup create/delete/merge/restore, when the > system performs automatic rollback, some data written by backup observers > during the failed operation may be lost. This is what we try to avoid. > # The second table keeps only bulk-load-related references. We do not care about the > consistency of this table, because bulk load is an idempotent operation and can > be repeated after failure. Partially written data in the second table does not > affect the BackupHFileCleaner plugin, because this data (the list of bulk-loaded > files) corresponds to files which have not yet been loaded successfully and, > hence, are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
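The rollback-via-snapshot steps quoted above can be sketched as a small, self-contained simulation (a sketch only: the dict stands in for the backup meta table, and the function and row names are hypothetical, not the HBase backup API):

```python
import copy

def run_with_snapshot_rollback(meta_table, operation):
    """Snapshot the meta table, run the backup operation, and restore
    the snapshot if the operation fails (rollback-via-snapshot)."""
    snapshot = copy.deepcopy(meta_table)  # cheap: the meta table is small
    try:
        operation(meta_table)
    except Exception:
        # Automatic rollback: discard partial writes, restore the snapshot.
        meta_table.clear()
        meta_table.update(snapshot)
    return meta_table

def failing_merge(table):
    table["session:merge"] = "IN_PROGRESS"  # partial write before the crash
    raise RuntimeError("merge failed mid-way")

meta = {"backup:1": "COMPLETE"}
run_with_snapshot_rollback(meta, failing_merge)
print(meta)  # {'backup:1': 'COMPLETE'} - the partial write was rolled back
```

This also shows why the bulk-load listing lives in a separate table: any rows an observer wrote to the meta table during the failed operation would be silently discarded by the rollback.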
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344228#comment-16344228 ] Vladimir Rodionov commented on HBASE-17852: --- Nope, it turned out that this patch (HBASE-17852) also fixes the issue raised in HBASE-19568, that is why it was committed (with refactoring code stripped down). No conspiracy here.
[jira] [Commented] (HBASE-19725) Build fails, unable to read hbase/checkstyle-suppressions.xml "invalid distance too far back"
[ https://issues.apache.org/jira/browse/HBASE-19725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344222#comment-16344222 ] stack commented on HBASE-19725: --- This is fixed by -HBASE-19780- Resolved. I'll not apply the below change, undoing a workaround which does not run checkstyle as part of the site build... it does no harm (checkstyle is in place when we do the initial build when we make a release candidate).
{code:java}
diff --git a/dev-support/make_rc.sh b/dev-support/make_rc.sh
index f067ee9..1424249 100755
--- a/dev-support/make_rc.sh
+++ b/dev-support/make_rc.sh
@@ -78,8 +78,7 @@ function build_bin {
   MAVEN_OPTS="${mvnopts}" ${mvn} clean install -DskipTests \
     -Papache-release -Prelease \
     -Dmaven.repo.local=${output_dir}/repository
-  MAVEN_OPTS="${mvnopts}" ${mvn} install -DskipTests \
-    -Dcheckstyle.skip=true site assembly:single \
+  MAVEN_OPTS="${mvnopts}" ${mvn} install -DskipTests assembly:single \
     -Papache-release -Prelease \
     -Dmaven.repo.local=${output_dir}/repository
   mv ./hbase-assembly/target/hbase-*.tar.gz "${output_dir}"
{code}
> Build fails, unable to read hbase/checkstyle-suppressions.xml "invalid > distance too far back" > - > > Key: HBASE-19725 > URL: https://issues.apache.org/jira/browse/HBASE-19725 > Project: HBase > Issue Type: Sub-task > Reporter: stack > Assignee: stack > Priority: Blocker > Fix For: 2.0.0-beta-2 > > > Build is failing on me (trying to cut the beta-1 RC on branch-2). It is the first > time we use the jars made by hbase-checkstyle in the hbase-error-prone > module under the 'build support' module when running the 'site' target. It is > trying to make the checkstyle report. > I see that we find the right jar to read: > [DEBUG] The resource 'hbase/checkstyle-suppressions.xml' was found as > jar:file:/home/stack/rc/hbase-2.0.0-beta-1.20180107T061305Z/repository/org/apache/hbase/hbase-checkstyle/2.0.0-beta-1/hbase-checkstyle-2.0.0-beta-1.jar!/hbase/checkstyle-suppressions.xml. > But then it thinks the jar is corrupt: 'ZipException: invalid distance too far > back'. > Here is the mvn output: > 12667058 [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (checkstyle) on > project hbase-error-prone: Failed during checkstyle execution: > Unable to process suppressions file location: > hbase/checkstyle-suppressions.xml: Cannot create file-based resource: invalid > distance too far back -> [Help 1] > 12667059 org.apache.maven.lifecycle.LifecycleExecutionException: Failed to > execute goal org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check > (checkstyle) on project hbase-error-prone: Failed during checkstyle > execution > I'm running this command: > mvn -X install -DskipTests site assembly:single -Papache-release -Prelease > -Dmaven.repo.local=//home/stack/rc/hbase-2.0.0-beta-1.20180107T061305Z/repository > Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; > 2015-11-10T08:41:47-08:00) > Java version: 1.8.0_151, vendor: Oracle Corporation
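"Invalid distance too far back" is zlib's complaint about a corrupt deflate stream, so one quick sanity check is to try decompressing every entry of the suspect jar. A minimal sketch using Python's stdlib zipfile (the in-memory archive below is an illustrative stand-in, not the real hbase-checkstyle jar):

```python
import io
import zipfile

def first_corrupt_entry(jar):
    """Decompress every entry of a zip/jar; return the name of the first
    entry whose deflate stream is bad (zlib reports such streams as
    'invalid distance too far back'), or None if the archive is intact."""
    with zipfile.ZipFile(jar) as zf:
        return zf.testzip()

# In-memory stand-in for hbase-checkstyle-*.jar (illustrative only):
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("hbase/checkstyle-suppressions.xml", "<suppressions/>")
print(first_corrupt_entry(buf))  # None: every entry decompresses cleanly
```

If the jar in the local repository reports a corrupt entry while a fresh copy from the build does not, the copy in the repository was damaged after the build, which matches the "Not A Problem" resolution below.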
[jira] [Resolved] (HBASE-19725) Build fails, unable to read hbase/checkstyle-suppressions.xml "invalid distance too far back"
[ https://issues.apache.org/jira/browse/HBASE-19725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-19725. --- Resolution: Not A Problem
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344219#comment-16344219 ] Appy commented on HBASE-17852: -- Forget all the design discussion, that's not important anymore. HBASE-19568 had basically everything that was objected to in the reviews here; why wasn't it brought to the attention of the people who raised objections? The title/reason of that jira doesn't matter. I see it as a really sly move - going behind the community and committing changes which were heavily objected to, by using a separate jira. Ping reviewers of the other jira: [~elserj] [~tedyu] Ping [~stack] [~apurtell]
[jira] [Comment Edited] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344160#comment-16344160 ] Appy edited comment on HBASE-17852 at 1/29/18 11:10 PM: I see only patch v10 in the attached files, and all it's doing is changing the name of BackupSystemTable to BackupMetaTable. It's far from what the title says - "Add Fault Tolerance". What am I missing? {color:red}Edit: {color} *Please never delete attachments which formed the basis of earlier discussions in a jira* was (Author: appy): I see only patch v10 in the attached files, and all it's doing is changing name of BackupSystemTable to BackupMetaTable. It's far from what the title says - "Add Fault Tolerance". What am i missing? *Please never delete attachments which formed the basis of earlier discussions in a jira*
[jira] [Comment Edited] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344160#comment-16344160 ] Appy edited comment on HBASE-17852 at 1/29/18 11:09 PM: I see only patch v10 in the attached files, and all it's doing is changing the name of BackupSystemTable to BackupMetaTable. It's far from what the title says - "Add Fault Tolerance". What am I missing? *Please never delete attachments which formed the basis of earlier discussions in a jira* was (Author: appy): I see only patch v10 in the attached files, and all it's doing is changing name of BackupSystemTable to BackupMetaTable. It's far from what the title says - "Add Fault Tolerance". What am i missing?
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344163#comment-16344163 ] Vladimir Rodionov commented on HBASE-17852: --- I will quote myself {quote} I will rebase patch to the current master. The majority of this code (but not all) went into master in HBASE-19568 btw. {quote}
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344160#comment-16344160 ] Appy commented on HBASE-17852: -- I see only patch v10 in the attached files, and all it's doing is changing the name of BackupSystemTable to BackupMetaTable. It's far from what the title says - "Add Fault Tolerance". What am I missing?
[jira] [Updated] (HBASE-19889) Revert Workaround: Purge User API building from branch-2 so can make a beta-1
[ https://issues.apache.org/jira/browse/HBASE-19889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19889: -- Fix Version/s: (was: 2.0.0-beta-2) > Revert Workaround: Purge User API building from branch-2 so can make a beta-1 > - > > Key: HBASE-19889 > URL: https://issues.apache.org/jira/browse/HBASE-19889 > Project: HBase > Issue Type: Sub-task > Components: site > Reporter: stack > Assignee: stack > Priority: Major > > Root fix looks to be -HBASE-19780- > Let me try it
[jira] [Updated] (HBASE-19663) site build fails complaining "javadoc: error - class file for javax.annotation.meta.TypeQualifierNickname not found"
[ https://issues.apache.org/jira/browse/HBASE-19663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19663: -- Fix Version/s: (was: 2.0.0) 2.0.0-beta-2 > site build fails complaining "javadoc: error - class file for > javax.annotation.meta.TypeQualifierNickname not found" > > > Key: HBASE-19663 > URL: https://issues.apache.org/jira/browse/HBASE-19663 > Project: HBase > Issue Type: Bug > Components: site >Reporter: stack >Assignee: stack >Priority: Blocker > Fix For: 2.0.0-beta-2 > > Attachments: script.sh > > > Cryptic failure trying to build beta-1 RC. Fails like this: > {code} > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 03:54 min > [INFO] Finished at: 2017-12-29T01:13:15-08:00 > [INFO] Final Memory: 381M/9165M > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-site-plugin:3.4:site (default-site) on project > hbase: Error generating maven-javadoc-plugin:2.10.3:aggregate: > [ERROR] Exit code: 1 - warning: unknown enum constant When.ALWAYS > [ERROR] reason: class file for javax.annotation.meta.When not found > [ERROR] warning: unknown enum constant When.UNKNOWN > [ERROR] warning: unknown enum constant When.MAYBE > [ERROR] > /home/stack/hbase.git/hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java:762: > warning - Tag @link: malformed: "#matchingRows(Cell, byte[]))" > [ERROR] > /home/stack/hbase.git/hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java:762: > warning - Tag @link: reference not found: #matchingRows(Cell, byte[])) > [ERROR] > /home/stack/hbase.git/hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java:762: > warning - Tag @link: reference not found: #matchingRows(Cell, byte[])) > [ERROR] javadoc: warning - Class javax.annotation.Nonnull not found. 
> [ERROR] javadoc: error - class file for > javax.annotation.meta.TypeQualifierNickname not found > [ERROR] > [ERROR] Command line was: /home/stack/bin/jdk1.8.0_151/jre/../bin/javadoc > -J-Xmx2G @options @packages > [ERROR] > [ERROR] Refer to the generated Javadoc files in > '/home/stack/hbase.git/target/site/apidocs' dir. > [ERROR] -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException > {code} > javax.annotation.meta.TypeQualifierNickname is out of jsr305 but we don't > include this anywhere according to mvn dependency. > Happens building the User API, both test and main. > Excluding these lines gets us passing again: > {code} > 3511 > 3512 > org.apache.yetus.audience.tools.IncludePublicAnnotationsStandardDoclet > 3513 > 3514 > 3515 org.apache.yetus > 3516 audience-annotations > 3517 ${audience-annotations.version} > 3518 > + 3519 true > {code} > Tried upgrading to a newer mvn site plugin (ours is three years old) but that brought a > different set of problems.