[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693107#comment-16693107
 ] 

Hudson commented on HBASE-21490:


Results for branch master
[build #618 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/618/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/618//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/618//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/618//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}


> WALProcedure may remove proc wal files still with active procedures
> ---
>
> Key: HBASE-21490
> URL: https://issues.apache.org/jira/browse/HBASE-21490
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21490-UT.patch, HBASE-21490-v1.patch, 
> HBASE-21490.patch, HBASE-21490.patch
>
>
> This has happened to me several times. After a master restart, all the
> procedures are gone.
> The proc wal files were deleted before the restart; I see this in the
> master's log:
> {noformat}
> 2018-11-16,20:57:40,177 INFO [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Remove all 
> state logs with ID less than 184, since all the active procedures are in the 
> latest log
> 2018-11-16,20:57:40,177 INFO [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFile: Archiving 
> hdfs://c4tst-xiaomi/hbase/c4tst-sync1/MasterProcWALs/pv2-0184.log
>  to hdfs://c4tst-xiaomi/hbase/c4tst-sync1/oldWALs/pv2-0184.log
> {noformat}
> Let me dig...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692630#comment-16692630
 ] 

Hudson commented on HBASE-21490:


Results for branch branch-2.1
[build #620 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/620/]: (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/620//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/620//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/620//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692614#comment-16692614
 ] 

Hudson commented on HBASE-21490:


Results for branch branch-2.0
[build #1098 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1098/]: (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1098//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1098//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1098//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.



[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692431#comment-16692431
 ] 

Hudson commented on HBASE-21490:


Results for branch branch-2
[build #1512 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1512/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1512//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1512//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1512//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-19 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691654#comment-16691654
 ] 

Duo Zhang commented on HBASE-21490:
---

Will commit tomorrow if no objections.



[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-18 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691295#comment-16691295
 ] 

Allan Yang commented on HBASE-21490:


OK, +1 for the patch then



[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-18 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691289#comment-16691289
 ] 

Duo Zhang commented on HBASE-21490:
---

{code}
can we just use abort flag?
{code}

No, we can't. As said above, the sync thread will do periodicalRoll if the 
store is not in the loading state, and in that method we just call the close 
method with abort = false. So it could happen that we fail to load procedures, 
and before we actually call stop with abort = true, the sync thread has already 
deleted some inactive logs based on the broken store tracker.

So generally speaking, we should store the 'failed loading' state in the class 
to prevent further damage, since damage could happen before we call stop with 
abort = true.
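
The point about keeping a 'failed loading' state inside the store can be 
sketched roughly as follows. This is only an illustration under simplifying 
assumptions, not the attached patch and not the HBase API: the class 
SketchProcedureStore, its State enum, and the rule "everything older than the 
newest log is inactive" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical, simplified stand-in for a WAL-backed procedure store; not HBase code. */
class SketchProcedureStore {
  enum State { LOADING, RUNNING, LOAD_FAILED }

  private final TreeSet<Long> logIds = new TreeSet<>();
  private State state = State.LOADING;

  SketchProcedureStore(long... ids) {
    for (long id : ids) {
      logIds.add(id);
    }
  }

  /** Replays the logs; a failure is recorded in the store so it outlives this call. */
  synchronized void load(boolean simulateFailure) {
    if (simulateFailure) {
      state = State.LOAD_FAILED; // the tracker may be half-built from here on
      throw new IllegalStateException("failed to load procedures");
    }
    state = State.RUNNING;
  }

  /** Called by the background sync thread; returns the log IDs it archived. */
  synchronized List<Long> periodicRoll() {
    List<Long> archived = new ArrayList<>();
    if (state != State.RUNNING || logIds.isEmpty()) {
      return archived; // loading or failed: never trust the tracker for cleanup
    }
    // Assumption for this sketch only: all active procedures are in the newest log.
    long newest = logIds.last();
    archived.addAll(logIds.headSet(newest));
    logIds.removeAll(archived);
    return archived;
  }

  synchronized int logCount() {
    return logIds.size();
  }
}
```

With this shape, a roll that runs between the failed load and the later stop 
with abort = true archives nothing, because the decision depends on state the 
store itself recorded rather than on a flag passed in afterwards.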



[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691286#comment-16691286
 ] 

Hadoop QA commented on HBASE-21490:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 46s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 45s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 18s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 32s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}130m 49s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 51s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}174m 1s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21490 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12948652/HBASE-21490-v1.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux f48da578f574 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / b329e6e3f2 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | 

[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-18 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691288#comment-16691288
 ] 

Allan Yang commented on HBASE-21490:


Why use the loading state to decide whether persistence is needed? Can we just 
use the abort flag?
{quote}
But in a real production I think we should do more, as we'd better not rely on 
the abort flag, we should know that the store tracker is in a broken state...
{quote}
What's your concern here? [~Apache9]



[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691210#comment-16691210
 ] 

Hadoop QA commented on HBASE-21490:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 45s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 44s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 32s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 3m 54s{color} | {color:red} hbase-procedure in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}282m 24s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 49s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}336m 7s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.procedure2.TestForceUpdateProcedure |
|   | hadoop.hbase.procedure2.store.wal.TestWALProcedureStore |
|   | hadoop.hbase.client.TestMobRestoreSnapshotFromClientAfterSplittingRegions |
|   | hadoop.hbase.client.TestCloneSnapshotFromClientAfterSplittingRegion |
|   | hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions |
|   | hadoop.hbase.master.procedure.TestServerCrashProcedureWithReplicas |
|   | hadoop.hbase.client.TestRestoreSnapshotFromClientWithRegionReplicas |
|   | hadoop.hbase.client.TestMobCloneSnapshotFromClientAfterSplittingRegion |
|   | hadoop.hbase.client.TestAdmin1 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21490 |
| JIRA Patch URL | 

[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-18 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691209#comment-16691209
 ] 

Duo Zhang commented on HBASE-21490:
---

Review board link:

https://reviews.apache.org/r/69387/



[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-18 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691140#comment-16691140
 ] 

Duo Zhang commented on HBASE-21490:
---

Let me check the failed UTs; they should be related.

The problem could also happen on branch-2.1 and branch-2.0, as the root cause 
is that we fail when loading, leave the storeTracker in an intermediate state, 
and then persist it with a proc wal file.
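
To make the root cause concrete, here is a small sketch of how a tracker left 
half-built by a failed load can make logs look safe to delete. It is an 
illustration under assumptions, not HBase code: SketchTracker, SketchLoader, 
the ascending replay order, and the "no known active procedure means 
deletable" rule are all hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical tracker (not the HBase store tracker): log ID -> active procedure IDs. */
class SketchTracker {
  private final Map<Long, Set<Long>> activeByLog = new HashMap<>();

  void record(long logId, long procId) {
    activeByLog.computeIfAbsent(logId, k -> new HashSet<>()).add(procId);
  }

  /** A log looks deletable when the tracker knows of no active procedure in it. */
  boolean looksDeletable(long logId) {
    return !activeByLog.containsKey(logId);
  }
}

class SketchLoader {
  /**
   * Replays logs in ascending ID order into the tracker. Stops before failAtLog
   * (pass -1 for no failure) and returns false, leaving the tracker half-built.
   */
  static boolean replay(Map<Long, Set<Long>> logs, SketchTracker tracker, long failAtLog) {
    for (Map.Entry<Long, Set<Long>> entry : new TreeMap<Long, Set<Long>>(logs).entrySet()) {
      if (entry.getKey() == failAtLog) {
        return false; // simulated load failure
      }
      for (long procId : entry.getValue()) {
        tracker.record(entry.getKey(), procId);
      }
    }
    return true;
  }
}
```

In this sketch, a replay that stops at log 183 leaves logs 183 and 184 looking 
deletable even though procedures are still active in them; any cleanup or 
persistence that trusts the tracker at that point would act on wrong data.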



[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-18 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691114#comment-16691114
 ] 

stack commented on HBASE-21490:
---

Just saw the note above... As per Allan, nice find. Do you think this could 
happen in branch-2.1/branch-2.0?





[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-18 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691103#comment-16691103
 ] 

stack commented on HBASE-21490:
---

Does this apply to branch-2.0/branch-2.1? There is no RecoverStandByProcedure 
in those branches.

Looking at the patch though, it looks like good stuff that belongs on all 
branches?

Thanks.



[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-18 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691100#comment-16691100
 ] 

stack commented on HBASE-21490:
---

Retry



[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690947#comment-16690947
 ] 

Hadoop QA commented on HBASE-21490:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
48s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
49s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 19s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m 41s{color} 
| {color:red} hbase-procedure in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}131m 
13s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
50s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}174m 36s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.procedure2.TestForceUpdateProcedure |
|   | hadoop.hbase.procedure2.store.wal.TestWALProcedureStore |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21490 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12948631/HBASE-21490.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux db803b92b851 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 

[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-18 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690893#comment-16690893
 ] 

Duo Zhang commented on HBASE-21490:
---

We do not set abort to true when aborting the master, which is why the UT 
fails. But in real production I think we should do more: we'd better not rely 
on the abort flag, and we should know that the store tracker is in a broken 
state...



[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-18 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690891#comment-16690891
 ] 

Allan Yang commented on HBASE-21490:


Good find! I think we can move the code that sets the partial flag into the 
finally block. Another point is that I think we shouldn't persist any 
storeTracker when aborting.
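
One way to read the two suggestions above is sketched below. This is a toy model under stated assumptions, not the HBase code: `SketchTracker`, `SketchStore`, `load` and `stop` are hypothetical stand-ins, and clearing the modified set in the finally block goes beyond what the comment says (it only mentions the partial flag). Whether resetting the flag after a failed load is actually safe is not something the sketch settles.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the store tracker; not the HBase class. */
final class SketchTracker {
  final Set<Long> modified = new HashSet<>();
  boolean partial = true;
}

/** Hypothetical stand-in for the procedure store; not the HBase class. */
final class SketchStore {
  final SketchTracker tracker = new SketchTracker();
  // What a later restart would read back; stays null if nothing was written.
  SketchTracker persisted;

  /** Load with the tracker reset moved into finally, so a failure cannot skip it. */
  void load(long[] procIds, boolean failOnFinish) {
    try {
      for (long id : procIds) {
        tracker.modified.add(id); // reading a log marks the procedures it contains
      }
      if (failOnFinish) {
        throw new IllegalStateException("simulated failure while finishing the load");
      }
    } finally {
      tracker.partial = false;
      tracker.modified.clear(); // assumption: also reset the modified bits here
    }
  }

  /** Stop the store; when aborting, skip persisting the tracker entirely. */
  void stop(boolean abort) {
    if (abort) {
      return;
    }
    SketchTracker copy = new SketchTracker();
    copy.partial = tracker.partial;
    copy.modified.addAll(tracker.modified);
    persisted = copy;
  }
}
```

With this shape, a load that throws still leaves the in-memory tracker without stale modified bits, and an aborting stop leaves nothing behind for the next startup to misread.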



[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-18 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690885#comment-16690885
 ] 

Duo Zhang commented on HBASE-21490:
---

Attached a UT to reproduce the problem.

I also found a typo in WALProcedureStore: we forgot to update the tracker 
variable in the loop at the end of buildHoldingCleanupTracker...
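
The comment does not show the loop itself, so the following is only a generic illustration of that class of bug, not the code in WALProcedureStore. All names here (`StaleLoopVariableSketch`, `unionBuggy`, `unionFixed`, `perLogModified`) are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StaleLoopVariableSketch {
  /** Buggy shape: `current` is assigned once and never updated inside the loop. */
  static Set<Long> unionBuggy(List<Set<Long>> perLogModified) {
    Set<Long> result = new HashSet<>();
    if (perLogModified.isEmpty()) {
      return result;
    }
    Set<Long> current = perLogModified.get(0);
    for (int i = 0; i < perLogModified.size(); i++) {
      result.addAll(current); // always the first log's set, whatever i is
    }
    return result;
  }

  /** Fixed shape: `current` is refreshed on every iteration. */
  static Set<Long> unionFixed(List<Set<Long>> perLogModified) {
    Set<Long> result = new HashSet<>();
    for (int i = 0; i < perLogModified.size(); i++) {
      Set<Long> current = perLogModified.get(i);
      result.addAll(current);
    }
    return result;
  }
}
```

The buggy version compiles and runs without complaint, which is why this kind of slip tends to surface only through a test that feeds in more than one log.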



[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-18 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690838#comment-16690838
 ] 

Duo Zhang commented on HBASE-21490:
---

OK I think I found the problem...

In ProcedureExecutor.load, we will do this in the finally block

{code}
  try {
    // try to cleanup inactive wals and complete the operation
    buildHoldingCleanupTracker();
    tryCleanupLogsOnLoad();
    loading.set(false);
  } finally {
    lock.unlock();
  }
{code}

And also, in ProcedureExecutor.stop, we will close the current log stream, and 
persist the current storeTracker into the file.

And this is the code when loading procedures
{code}
  public static void load(Iterator<ProcedureWALFile> logs, ProcedureStoreTracker tracker,
      Loader loader) throws IOException {
    ProcedureWALFormatReader reader = new ProcedureWALFormatReader(tracker, loader);
    tracker.setKeepDeletes(true);
    try {
      // Ignore the last log which is current active log.
      while (logs.hasNext()) {
        ProcedureWALFile log = logs.next();
        log.open();
        try {
          reader.read(log);
        } finally {
          log.close();
        }
      }
      reader.finish();

      // The tracker is now updated with all the procedures read from the logs
      if (tracker.isPartial()) {
        tracker.setPartialFlag(false);
      }
      tracker.resetModified();
    } finally {
      tracker.setKeepDeletes(false);
    }
  }
{code}

In the HBASE-21494 case, we throw an exception at reader.finish, so we do not 
unset the partial flag and, more importantly, we do not call resetModified. 
This means that the current storeTracker has all the active procedures marked 
as modified.

So after the first crash, we persist the broken storeTracker into the file. 
When loading the second time, we load this storeTracker, and since we open 
another new file, this file is no longer the last one, which means we use its 
modified bits when building holdingCleanupTracker. It contains all the active 
procedures, so we conclude that it is OK to delete all the files before it...

And although we still crash the second time, buildHoldingCleanupTracker and 
removeInactiveLogs are in the finally block, so the above logic is still 
executed and we delete all the proc wal files...
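
The chain of events above can be modelled with a small toy, shown here as a sketch under stated assumptions rather than as the HBase implementation. `ToyTracker`, `ToyLoadDemo` and `olderLogsLookDeletable` are hypothetical names, and the cleanup rule is reduced to a single check: older logs look deletable when the tracker claims to have modified every active procedure.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy tracker; not an HBase class. */
final class ToyTracker {
  private final Set<Long> modified = new HashSet<>();
  private boolean partial = true;

  void markModified(long procId) { modified.add(procId); }
  void resetModified() { modified.clear(); }
  void setPartialFlag(boolean value) { partial = value; }
  boolean isPartial() { return partial; }
  Set<Long> modifiedIds() { return new HashSet<>(modified); }
}

final class ToyLoadDemo {
  /** Same shape as the load() quoted above: the resets only run if finish succeeds. */
  static void load(ToyTracker tracker, List<Long> procIds, boolean failOnFinish) {
    for (long id : procIds) {
      tracker.markModified(id); // reading a log marks each procedure it contains
    }
    if (failOnFinish) {
      throw new IllegalStateException("simulated failure in reader.finish()");
    }
    tracker.setPartialFlag(false);
    tracker.resetModified();
  }

  /** Simplified cleanup rule: trust the tracker's modified bits. */
  static boolean olderLogsLookDeletable(ToyTracker tracker, Set<Long> active) {
    return tracker.modifiedIds().containsAll(active);
  }

  public static void main(String[] args) {
    List<Long> procIds = Arrays.asList(1L, 2L, 3L);
    Set<Long> active = new HashSet<>(procIds);
    ToyTracker tracker = new ToyTracker();
    try {
      load(tracker, procIds, true);
    } catch (IllegalStateException e) {
      System.out.println("load failed: " + e.getMessage());
    }
    // The stale modified bits make the older logs look safe to remove.
    System.out.println("older logs look deletable: "
        + olderLogsLookDeletable(tracker, active));
  }
}
```

After the failed load the tracker still carries every procedure as modified, so the check returns true; after a successful load the bits are reset and the same check returns false for a non-empty active set.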

Let me think how to fix.

[~stack] [~allan163] FYI.



[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-17 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690535#comment-16690535
 ] 

Duo Zhang commented on HBASE-21490:
---

OK I found this

{noformat}
2018-11-16,21:06:04,667 INFO 
[master/c4-hadoop-tst-ct05:19100:becomeActiveMaster] 
org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Remove the 
oldest log 
hdfs://c4tst-xiaomi/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log
2018-11-16,21:06:04,667 INFO 
[master/c4-hadoop-tst-ct05:19100:becomeActiveMaster] 
org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFile: Archiving 
hdfs://c4tst-xiaomi/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log
 to hdfs://c4tst-xiaomi/hbase/c4tst-sync1/oldWALs/pv2-0185.log
2018-11-16,21:06:04,672 DEBUG 
[master/c4-hadoop-tst-ct05:19100:becomeActiveMaster] 
org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Removed 
log=hdfs://c4tst-xiaomi/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log,
 
activeLogs=[hdfs://c4tst-xiaomi/hbase/c4tst-sync1/MasterProcWALs/pv2-0186.log,
 
hdfs://c4tst-xiaomi/hbase/c4tst-sync1/MasterProcWALs/pv2-0187.log]
{noformat}

I think there may be something wrong when building the holdingCleanupTracker 
in some special case. Let me dig.



[jira] [Commented] (HBASE-21490) WALProcedure may remove proc wal files still with active procedures

2018-11-17 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690533#comment-16690533
 ] 

Duo Zhang commented on HBASE-21490:
---

OK, the root cause is a bug in RecoverStandByProcedure: there is an NPE when 
loading it, which brings the master down. But after two restarts, the file 
that contains the procedures is deleted.

{noformat}
2018-11-16,20:43:37,454 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS)  ip=/10.132.16.33
cmd=create  
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log   
perm=hbase_tst:supergroup:rw-r-proto=rpc
2018-11-16,21:05:58,652 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS)  ip=/10.132.16.34
cmd=open
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log   proto=rpc
2018-11-16,21:05:58,747 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS)  ip=/10.132.16.34
cmd=open
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log   proto=rpc
2018-11-16,21:06:04,196 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS)  ip=/10.132.16.34
cmd=open
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log   proto=rpc
2018-11-16,21:06:04,305 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS)  ip=/10.132.16.34
cmd=open
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log   proto=rpc
2018-11-16,21:06:04,669 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS)  ip=/10.132.16.34
cmd=rename  
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-0185.log   
dst=/hbase/c4tst-sync1/oldWALs/pv2-0185.log
perm=hbase_tst:supergroup:rw-r- proto=rpc
2018-11-16,21:07:12,776 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/hadoop@XIAOMI.HADOOP (auth:KERBEROS)  ip=/10.132.16.34
cmd=delete  src=/hbase/c4tst-sync1/oldWALs/pv2-0185.log 
{noformat}

Let me check what is going on here...
