[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693107#comment-16693107 ] Hudson commented on HBASE-21490: Results for branch master [build #618 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/618/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/618//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/618//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/618//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color}

> WALProcedure may remove proc wal files still with active procedures
> ---
>
> Key: HBASE-21490
> URL: https://issues.apache.org/jira/browse/HBASE-21490
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21490-UT.patch, HBASE-21490-v1.patch, HBASE-21490.patch, HBASE-21490.patch
>
>
> It happens for me several times. After master restart, all the procedures are gone.
> And the proc wal files were deleted before restarting, I see this in the master's log
> {noformat}
> 2018-11-16,20:57:40,177 INFO [WALProcedureStoreSyncThread] org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Remove all state logs with ID less than 184, since all the active procedures are in the latest log
> 2018-11-16,20:57:40,177 INFO [WALProcedureStoreSyncThread] org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFile: Archiving hdfs://c4tst-xiaomi/hbase/c4tst-sync1/MasterProcWALs/pv2-0184.log to hdfs://c4tst-xiaomi/hbase/c4tst-sync1/oldWALs/pv2-0184.log
> {noformat}
> Let me dig...

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692630#comment-16692630 ] Hudson commented on HBASE-21490: Results for branch branch-2.1 [build #620 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/620/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/620//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/620//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/620//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692614#comment-16692614 ] Hudson commented on HBASE-21490: Results for branch branch-2.0 [build #1098 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1098/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1098//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1098//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1098//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details.
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692431#comment-16692431 ] Hudson commented on HBASE-21490: Results for branch branch-2 [build #1512 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1512/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1512//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1512//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1512//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691654#comment-16691654 ] Duo Zhang commented on HBASE-21490: --- Will commit tomorrow if no objections.
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691295#comment-16691295 ] Allan Yang commented on HBASE-21490: OK, +1 for the patch then
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691289#comment-16691289 ] Duo Zhang commented on HBASE-21490: --- {code} can we just use abort flag? {code} No, we can't. As said above, the sync thread will do a periodicalRoll if it is not in the loading state, and in that method we just call the close method with abort = false. So it could happen that we fail to load procedures and, before we actually call stop with abort = true, the sync thread has already deleted some inactive logs based on the broken store tracker. So generally speaking, we should store the 'failed loading' state in the class to prevent further damage, since damage could happen before we call stop with abort = true.
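The argument in this comment can be illustrated with a minimal Java sketch. It is not the HBASE-21490 patch: `GuardedProcStore`, `markLoadFailed`, `periodicCleanup` and the other names are hypothetical, and the real WALProcedureStore is considerably more involved. The sketch only shows why a 'failed loading' flag kept inside the store blocks log deletion earlier than an abort flag that is passed in later when stop is called.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a WAL-backed procedure store; not the HBase API.
class GuardedProcStore {
  // log id -> does the store tracker believe this log still holds active procedures?
  private final TreeMap<Long, Boolean> logs = new TreeMap<>();
  // Set once loading fails; from then on the tracker must not drive deletions.
  private boolean loadFailed = false;

  synchronized void addLog(long id, boolean trackerSaysActive) {
    logs.put(id, trackerSaysActive);
  }

  synchronized void markLoadFailed() {
    loadFailed = true;
  }

  synchronized int logCount() {
    return logs.size();
  }

  // Models the cleanup that a periodic roll could trigger in the sync thread.
  // Returns the ids of the logs it removed.
  synchronized List<Long> periodicCleanup() {
    List<Long> removed = new ArrayList<>();
    if (loadFailed || logs.isEmpty()) {
      return removed; // tracker is suspect (or nothing to do): delete nothing
    }
    long newest = logs.lastKey();
    Iterator<Map.Entry<Long, Boolean>> it = logs.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<Long, Boolean> e = it.next();
      // Keep the newest log and any log the tracker still considers active.
      if (e.getKey().longValue() != newest && !e.getValue()) {
        removed.add(e.getKey());
        it.remove();
      }
    }
    return removed;
  }
}
```

In this model, a store whose tracker wrongly marks older logs as inactive loses them on the first cleanup, whereas calling `markLoadFailed()` as soon as loading throws keeps every log in place no matter when the later stop call arrives.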
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691286#comment-16691286 ] Hadoop QA commented on HBASE-21490: ---
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 46s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 45s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 18s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 32s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}130m 49s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 51s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}174m 1s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21490 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12948652/HBASE-21490-v1.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux f48da578f574 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / b329e6e3f2 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs |
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691288#comment-16691288 ] Allan Yang commented on HBASE-21490: Why use loading to decide whether persistence is needed? Can we just use the abort flag? {quote} But in a real production I think we should do more, as we'd better not rely on the abort flag, we should know that the store tracker is in a broken state... {quote} What's your concern here? [~Apache9]
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691210#comment-16691210 ] Hadoop QA commented on HBASE-21490: ---
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 45s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 44s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 32s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 3m 54s{color} | {color:red} hbase-procedure in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}282m 24s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 49s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}336m 7s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.procedure2.TestForceUpdateProcedure |
| | hadoop.hbase.procedure2.store.wal.TestWALProcedureStore |
| | hadoop.hbase.client.TestMobRestoreSnapshotFromClientAfterSplittingRegions |
| | hadoop.hbase.client.TestCloneSnapshotFromClientAfterSplittingRegion |
| | hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions |
| | hadoop.hbase.master.procedure.TestServerCrashProcedureWithReplicas |
| | hadoop.hbase.client.TestRestoreSnapshotFromClientWithRegionReplicas |
| | hadoop.hbase.client.TestMobCloneSnapshotFromClientAfterSplittingRegion |
| | hadoop.hbase.client.TestAdmin1 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21490 |
| JIRA Patch URL |
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691209#comment-16691209 ] Duo Zhang commented on HBASE-21490: --- Review board link: https://reviews.apache.org/r/69387/
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691140#comment-16691140 ] Duo Zhang commented on HBASE-21490: --- Let me check the failed UTs, they should be related. The problem could also happen on branch-2.1 & branch-2.0, as the root cause is that we fail when loading, leave the storeTracker in an intermediate state, and then persist it with a proc wal file.
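The root cause described in this comment can be modelled in a few lines of Java. This is only an illustration under simplifying assumptions: `PartialTrackerDemo` and `load` are hypothetical names, the store tracker is reduced to a set of procedure ids, and the failure is injected by index rather than caused by a real bad entry in a proc wal file.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the failure mode; names are illustrative, not HBase classes.
class PartialTrackerDemo {
  // Rebuilds the tracker in place from the procedure ids found in the logs.
  // Throws when it reaches index failAt, mimicking a failure in the middle of loading.
  static void load(Set<Long> tracker, List<Long> procIdsInLogs, int failAt) {
    tracker.clear();
    for (int i = 0; i < procIdsInLogs.size(); i++) {
      if (i == failAt) {
        throw new IllegalStateException("simulated load failure");
      }
      tracker.add(procIdsInLogs.get(i));
    }
  }

  public static void main(String[] args) {
    List<Long> inLogs = Arrays.asList(1L, 2L, 3L, 4L);
    Set<Long> tracker = new LinkedHashSet<>();
    try {
      load(tracker, inLogs, 2);
    } catch (IllegalStateException e) {
      // Loading failed, but the tracker was already mutated and is left half-built.
    }
    // The tracker now knows only a subset of the procedures that really exist.
    System.out.println("tracked=" + tracker + " actual=" + inLogs);
  }
}
```

Anything that treats the half-built set as authoritative afterwards, for example writing it out as the state of a new log, would in this model regard procedures 3 and 4 as no longer active even though they are still present in the older logs.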
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691114#comment-16691114 ] stack commented on HBASE-21490: --- Just saw note above... As per Allan, nice find. You think this could happen in branch-2.1/branch-2.0?
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691103#comment-16691103 ] stack commented on HBASE-21490: --- Does this apply to branch-2.0/branch-2.1? There is no RecoverStandByProcedure in those branches. Looking at the patch though, it looks like good stuff that belongs on all branches? Thanks.
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691100#comment-16691100 ] stack commented on HBASE-21490: ---

Retry
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690947#comment-16690947 ] Hadoop QA commented on HBASE-21490: ---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 9s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 24s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 0s | master passed |
| +1 | compile | 2m 10s | master passed |
| +1 | checkstyle | 1m 18s | master passed |
| +1 | shadedjars | 3m 48s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 2m 18s | master passed |
| +1 | javadoc | 0m 44s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 16s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 3s | the patch passed |
| +1 | compile | 2m 10s | the patch passed |
| +1 | javac | 2m 10s | the patch passed |
| +1 | checkstyle | 1m 19s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 3m 49s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 8m 19s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. |
| +1 | findbugs | 2m 36s | the patch passed |
| +1 | javadoc | 0m 43s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 3m 41s | hbase-procedure in the patch failed. |
| +1 | unit | 131m 13s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 50s | The patch does not generate ASF License warnings. |
| | | 174m 36s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.procedure2.TestForceUpdateProcedure |
| | hadoop.hbase.procedure2.store.wal.TestWALProcedureStore |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21490 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12948631/HBASE-21490.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux db803b92b851 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master /
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690893#comment-16690893 ] Duo Zhang commented on HBASE-21490: ---

We do not set abort to true when aborting the master; this is why the UT fails. But in real production I think we should do more: we'd better not rely on the abort flag, and we should know that the store tracker is in a broken state...
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690891#comment-16690891 ] Allan Yang commented on HBASE-21490:

Good finding! I think we can move the setPartialFlag call to the finally block. Another point is that I think we shouldn't persist any storeTracker when aborting.
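For readers following the thread, the two suggestions in the comment above can be sketched roughly as follows. This is a toy model only, not the HBase code and not the committed fix: ToyTracker, load, and shouldPersistTracker are hypothetical names, and clearing the modified count in the finally block is an added assumption beyond what the comment proposes. Whether clearing the partial flag after a failed load is actually safe is left open here.

```java
public class AbortGuardSketch {
    /** Toy tracker (hypothetical): a partial flag plus a count of procedures marked modified. */
    static final class ToyTracker {
        boolean partial = true;
        int modified = 0;
    }

    /**
     * Replays `procs` entries into the tracker. The state reset sits in the
     * finally block, so it also runs when the simulated finish step throws.
     */
    static void load(ToyTracker tracker, int procs, boolean failOnFinish) {
        try {
            tracker.modified += procs;
            if (failOnFinish) {
                throw new IllegalStateException("simulated failure while finishing the load");
            }
        } finally {
            tracker.partial = false; // the suggestion: unset the partial flag here
            tracker.modified = 0;    // assumption: also clear the modified state here
        }
    }

    /** Second suggestion: do not write the tracker out when the process is aborting. */
    static boolean shouldPersistTracker(boolean aborting) {
        return !aborting;
    }

    /** Runs a failing load and reports how many procedures are still marked modified. */
    static int modifiedAfterFailedLoad() {
        ToyTracker tracker = new ToyTracker();
        try {
            load(tracker, 3, true);
        } catch (IllegalStateException expected) {
            // the simulated crash; the finally block above has already run
        }
        return tracker.modified;
    }

    public static void main(String[] args) {
        System.out.println("modified after failed load: " + modifiedAfterFailedLoad());
        System.out.println("persist when aborting: " + shouldPersistTracker(true));
    }
}
```

With the reset in the finally block, a failed load no longer leaves every replayed procedure marked as modified, and the abort guard keeps a possibly inconsistent tracker from being written at all.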
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690885#comment-16690885 ] Duo Zhang commented on HBASE-21490: ---

Attached a UT to reproduce the problem. I also found a typo in WALProcedureStore: we forgot to update the tracker variable in the loop at the end of buildHoldingCleanupTracker...
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690838#comment-16690838 ] Duo Zhang commented on HBASE-21490: ---

OK I think I found the problem...

In ProcedureExecutor.load, we will do this in the finally block
{code}
try {
  // try to cleanup inactive wals and complete the operation
  buildHoldingCleanupTracker();
  tryCleanupLogsOnLoad();
  loading.set(false);
} finally {
  lock.unlock();
}
{code}
And also, in ProcedureExecutor.stop, we will close the current log stream and persist the current storeTracker into the file.

And this is the code when loading procedures
{code}
public static void load(Iterator<ProcedureWALFile> logs, ProcedureStoreTracker tracker,
    Loader loader) throws IOException {
  ProcedureWALFormatReader reader = new ProcedureWALFormatReader(tracker, loader);
  tracker.setKeepDeletes(true);
  try {
    // Ignore the last log which is current active log.
    while (logs.hasNext()) {
      ProcedureWALFile log = logs.next();
      log.open();
      try {
        reader.read(log);
      } finally {
        log.close();
      }
    }
    reader.finish();

    // The tracker is now updated with all the procedures read from the logs
    if (tracker.isPartial()) {
      tracker.setPartialFlag(false);
    }
    tracker.resetModified();
  } finally {
    tracker.setKeepDeletes(false);
  }
}
{code}
In the case of HBASE-21494, we throw an exception at reader.finish, so we do not unset the partial flag and, more importantly, we do not call resetModified. This means that the current storeTracker will have all the active procedures marked as modified. So after the first crash, we persist the broken storeTracker into the file. When loading the second time, we load this storeTracker, and since we open another new file, this file is no longer the last one, which means we use its modified bits when building the holdingCleanupTracker. It contains all the active procedures, so we think it is OK to delete all the files before it...

And although we still crash the second time, buildHoldingCleanupTracker and removeInactiveLogs are in the finally block, so the above logic is still executed and we delete all the proc wal files...

Let me think how to fix.

[~stack] [~allan163] FYI.
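The failure sequence described in the comment above can be modelled in a few lines. The following is a deliberately simplified sketch under stated assumptions, not the WALProcedureStore implementation: ToyTracker, load, and olderLogsLookRemovable are hypothetical stand-ins, and the cleanup rule is reduced to "older logs look removable when this tracker's modified set covers every active procedure".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TrackerSketch {
    /** Toy stand-in for a store tracker (hypothetical): the set of procedure ids marked modified. */
    static final class ToyTracker {
        final Set<Long> modified = new HashSet<>();
        boolean partial = true;

        void resetModified() {
            modified.clear();
        }
    }

    /**
     * Replays procedure ids into the tracker, then optionally fails at the
     * point where the thread says reader.finish() throws. On failure the
     * partial flag stays set and resetModified() is never reached.
     */
    static void load(ToyTracker tracker, long[] procIds, boolean failOnFinish) {
        for (long id : procIds) {
            tracker.modified.add(id);
        }
        if (failOnFinish) {
            throw new IllegalStateException("simulated failure in finish()");
        }
        tracker.partial = false;
        tracker.resetModified();
    }

    /** Simplified cleanup rule (assumption): older logs look removable if this tracker covers all active procedures. */
    static boolean olderLogsLookRemovable(ToyTracker tracker, Set<Long> active) {
        return tracker.modified.containsAll(active);
    }

    /** Runs one load and reports what the cleanup rule would conclude from the resulting tracker. */
    static boolean demo(boolean failOnFinish) {
        ToyTracker tracker = new ToyTracker();
        Set<Long> active = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        try {
            load(tracker, new long[] {1L, 2L, 3L}, failOnFinish);
        } catch (IllegalStateException e) {
            // first crash: the tracker is left as-is and would be persisted on stop
        }
        return olderLogsLookRemovable(tracker, active);
    }

    public static void main(String[] args) {
        System.out.println("after failed load, older logs look removable: " + demo(true));
        System.out.println("after clean load, older logs look removable: " + demo(false));
    }
}
```

In this model the failed load leaves every active procedure marked as modified, so a tracker persisted in that state makes the older files look safe to delete, which matches the behaviour reported in the thread.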
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690535#comment-16690535 ] Duo Zhang commented on HBASE-21490: ---

OK I found this
{noformat}
2018-11-16,21:06:04,667 INFO [master/c4-hadoop-tst-ct05:19100:becomeActiveMaster] org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Remove the oldest log hdfs://c4tst-xiaomi/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log
2018-11-16,21:06:04,667 INFO [master/c4-hadoop-tst-ct05:19100:becomeActiveMaster] org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFile: Archiving hdfs://c4tst-xiaomi/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log to hdfs://c4tst-xiaomi/hbase/c4tst-sync1/oldWALs/pv2-0185.log
2018-11-16,21:06:04,672 DEBUG [master/c4-hadoop-tst-ct05:19100:becomeActiveMaster] org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Removed log=hdfs://c4tst-xiaomi/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log, activeLogs=[hdfs://c4tst-xiaomi/hbase/c4tst-sync1/MasterProcWALs/pv2-0186.log, hdfs://c4tst-xiaomi/hbase/c4tst-sync1/MasterProcWALs/pv2-0187.log]
{noformat}
I think there may be something wrong when building the holdingCleanupTracker in some special case. Let me dig.
[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures
[ https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690533#comment-16690533 ] Duo Zhang commented on HBASE-21490: ---

OK, the root cause is a bug in RecoverStandByProcedure: there is an NPE when loading it, which brings the master down. But after two restarts, the file that contains the procedures is deleted.
{noformat}
2018-11-16,20:43:37,454 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS) ip=/10.132.16.33 cmd=create src=/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log perm=hbase_tst:supergroup:rw-r- proto=rpc
2018-11-16,21:05:58,652 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS) ip=/10.132.16.34 cmd=open src=/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log proto=rpc
2018-11-16,21:05:58,747 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS) ip=/10.132.16.34 cmd=open src=/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log proto=rpc
2018-11-16,21:06:04,196 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS) ip=/10.132.16.34 cmd=open src=/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log proto=rpc
2018-11-16,21:06:04,305 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS) ip=/10.132.16.34 cmd=open src=/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log proto=rpc
2018-11-16,21:06:04,669 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS) ip=/10.132.16.34 cmd=rename src=/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log dst=/hbase/c4tst-sync1/oldWALs/pv2-0185.log perm=hbase_tst:supergroup:rw-r- proto=rpc
2018-11-16,21:07:12,776 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS) ip=/10.132.16.34 cmd=delete src=/hbase/c4tst-sync1/oldWALs/pv2-0185.log
{noformat}
Let me check what is going on here...