[jira] [Commented] (HBASE-22081) master shutdown: close RpcServer and procWAL first thing

2019-05-01 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831324#comment-16831324
 ] 

Duo Zhang commented on HBASE-22081:
---

Sorry, it is a holiday in China, so I do not have enough cycles to review the patch...

> master shutdown: close RpcServer and procWAL first thing
> 
>
> Key: HBASE-22081
> URL: https://issues.apache.org/jira/browse/HBASE-22081
> Project: HBase
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HBASE-22081.01.patch, HBASE-22081.02.patch, 
> HBASE-22081.03.patch, HBASE-22081.patch
>
>
> I had a master get stuck due to HBASE-22079 and noticed it was logging RS 
> abort messages during shutdown.
> [~bahramch] found some issues where messages are processed by old master 
> during shutdown due to a race condition in RS cache (or it could also happen 
> due to a network race).
> Previously I found some bug where SCP was created during master shutdown that 
> had incorrect state (because some structures already got cleaned).
> I think before master fencing is implemented we can at least make these 
> issues much less likely by thinking about shutdown order.
> 1) First kill RPC server so we don't receive any more messages. There's no 
> need to receive messages when we are shutting down. Server heartbeats could 
> be impacted I guess, but I don't think they will be, because we currently only 
> kill RS on ZK timeout.
> 2) Then do whatever cleanup we think is needed that requires proc wal.
> 3) Then close proc WAL so no errant threads can create more procs.
> 4) Then do whatever other cleanup.
> 5) Finally delete znode.
> Right now znode is deleted somewhat early I think, and RpcServer is closed 
> very late.
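
The proposed order (steps 1 to 5 above) can be sketched as a fixed sequence of named steps. This is only an illustration under assumptions: `ShutdownSequence`, `Step`, and the step names are hypothetical and are neither HBase classes nor the contents of the attached patches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an ordered shutdown; none of these names come from HBase. */
final class ShutdownSequence {
  /** A single shutdown step that may fail. */
  interface Step {
    void run() throws Exception;
  }

  // LinkedHashMap keeps registration order, which is the order the steps run in.
  private final Map<String, Step> steps = new LinkedHashMap<>();

  ShutdownSequence then(String name, Step step) {
    steps.put(name, step);
    return this;
  }

  /**
   * Runs every step in registration order. A failing step is reported and the
   * remaining steps still run, so a late step such as znode deletion is not skipped.
   * Returns the names of the steps in the order they were attempted.
   */
  List<String> run() {
    List<String> attempted = new ArrayList<>();
    for (Map.Entry<String, Step> entry : steps.entrySet()) {
      attempted.add(entry.getKey());
      try {
        entry.getValue().run();
      } catch (Exception e) {
        System.err.println("Shutdown step failed: " + entry.getKey() + ": " + e);
      }
    }
    return attempted;
  }

  /** Builds the order proposed in the issue description (steps 1 to 5). */
  static ShutdownSequence proposedOrder(Step stopRpcServer, Step cleanupNeedingProcWal,
      Step closeProcWal, Step otherCleanup, Step deleteZnode) {
    return new ShutdownSequence()
        .then("stopRpcServer", stopRpcServer)
        .then("cleanupNeedingProcWal", cleanupNeedingProcWal)
        .then("closeProcWal", closeProcWal)
        .then("otherCleanup", otherCleanup)
        .then("deleteZnode", deleteZnode);
  }
}
```

Continuing after a failed step is a design choice of this sketch, not something the issue prescribes; a real master might prefer to abort instead.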



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22081) master shutdown: close RpcServer and procWAL first thing

2019-04-30 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830810#comment-16830810
 ] 

Sergey Shelukhin commented on HBASE-22081:
--

[~Apache9] Does this patch make sense to you? It moves RPC server and proc WAL 
closing to the beginning of the shutdown to limit potential race conditions 
with incorrect state or new requests.



[jira] [Commented] (HBASE-22081) master shutdown: close RpcServer and procWAL first thing

2019-04-30 Thread HBase QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830749#comment-16830749
 ] 

HBase QA commented on HBASE-22081:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 57s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 26s{color} | {color:red} hbase-server: The patch generated 1 new + 320 unchanged - 0 fixed = 321 total (was 320) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 49s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 18s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}137m 18s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}182m 53s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/PreCommit-HBASE-Build/221/artifact/patchprocess/Dockerfile |
| JIRA Issue | HBASE-22081 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967522/HBASE-22081.03.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 64c2b92bd90d 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 70296a2e78 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.11 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/221/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/221/testReport/ |
| Max. process+thread count | 4915 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Commented] (HBASE-22081) master shutdown: close RpcServer and procWAL first thing

2019-04-30 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830659#comment-16830659
 ] 

Sergey Shelukhin commented on HBASE-22081:
--

Interesting... tests pass in the JIRA and locally, but not in the PR.



[jira] [Commented] (HBASE-22081) master shutdown: close RpcServer and procWAL first thing

2019-04-29 Thread HBase QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829938#comment-16829938
 ] 

HBase QA commented on HBASE-22081:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 50s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 31s{color} | {color:red} hbase-server: The patch generated 1 new + 320 unchanged - 0 fixed = 321 total (was 320) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 59s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 39s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}139m 24s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}185m 33s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/PreCommit-HBASE-Build/216/artifact/patchprocess/Dockerfile |
| JIRA Issue | HBASE-22081 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967421/HBASE-22081.02.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux f91b46245a79 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 9743b3c70d |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.11 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/216/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/216/testReport/ |
| Max. process+thread count | 4824 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Commented] (HBASE-22081) master shutdown: close RpcServer and procWAL first thing

2019-04-29 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829874#comment-16829874
 ] 

Sergey Shelukhin commented on HBASE-22081:
--

This patch is getting more and more interesting.
Looks like some procedures do not handle InterruptedIOException correctly and 
retry forever, which, in the case of the minicluster, prevents it from 
shutting down. I am not sure how the order of termination affected this; 
probably the proc WAL terminating early just catches the procedure in the test 
in a different state than it did before.
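
A minimal sketch of a retry loop that treats interruption as a signal to stop rather than as a retryable failure. The class and method names (`InterruptAwareRetry`, `IoCall`, `retry`) are hypothetical, and this is not the procedure framework's actual retry code; it only illustrates the handling that seems to be missing.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

/** Hypothetical helper; not part of HBase. */
final class InterruptAwareRetry {
  /** An I/O operation that may fail and be retried. */
  interface IoCall<T> {
    T call() throws IOException;
  }

  /**
   * Retries ordinary IOExceptions up to maxAttempts times, but propagates
   * InterruptedIOException immediately so that shutdown is not blocked.
   */
  static <T> T retry(IoCall<T> call, int maxAttempts, long backoffMillis) throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (InterruptedIOException e) {
        // Someone asked this thread to stop: restore the flag and give up.
        Thread.currentThread().interrupt();
        throw e;
      } catch (IOException e) {
        last = e; // retryable failure
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(backoffMillis);
        } catch (InterruptedException e) {
          // Interrupted while backing off: same treatment as above.
          Thread.currentThread().interrupt();
          InterruptedIOException wrapped =
              new InterruptedIOException("interrupted while backing off");
          wrapped.initCause(e);
          throw wrapped;
        }
      }
    }
    throw last;
  }
}
```

One caveat: `java.net.SocketTimeoutException` is a subclass of `InterruptedIOException`, so real code may need to catch it first if socket timeouts should remain retryable.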



[jira] [Commented] (HBASE-22081) master shutdown: close RpcServer and procWAL first thing

2019-04-26 Thread HBase QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827439#comment-16827439
 ] 

HBase QA commented on HBASE-22081:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 20s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 14s{color} | {color:red} hbase-server: The patch generated 1 new + 283 unchanged - 0 fixed = 284 total (was 283) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 20s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 4s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}135m 8s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}173m 43s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.coprocessor.TestMasterCoprocessorExceptionWithAbort |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/PreCommit-HBASE-Build/205/artifact/patchprocess/Dockerfile |
| JIRA Issue | HBASE-22081 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967205/HBASE-22081.01.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 839bd0029510 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 962585d376 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.11 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/205/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/205/artifact/patchprocess/patch-unit-hbase-server.txt |
|  Test 

[jira] [Commented] (HBASE-22081) master shutdown: close RpcServer and procWAL first thing

2019-04-26 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827407#comment-16827407
 ] 

Sergey Shelukhin commented on HBASE-22081:
--


Before the patch, it is mere coincidence that, while all the other stuff is 
shutting down, the RPC that caused the shutdown has a chance to return. 
A caller could get unlucky, and the stop RPC would fail because the RPC server 
was already closed. Now that we shut down the RPC server first thing, this 
happens almost all the time.
I added a small sleep before starting shutdown if it was triggered by an RPC 
request 0_o. Unfortunately, it does not seem to be possible to wait externally 
for RPC responses to finish.
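
The idea of giving the triggering RPC time to return can be sketched as follows. This is an assumption-laden illustration, not the code in the patch: instead of sleeping inside the handler, it schedules the shutdown after a fixed grace period so the handler returns right away. `DeferredShutdown`, `requestFromRpc`, and the grace period are hypothetical names and parameters.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; not part of HBase. */
final class DeferredShutdown {
  // Daemon thread so a pending timer never keeps the JVM alive on its own.
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "deferred-shutdown");
        t.setDaemon(true);
        return t;
      });
  private final AtomicBoolean requested = new AtomicBoolean(false);
  private final Runnable shutdown;
  private final long graceMillis;

  DeferredShutdown(Runnable shutdown, long graceMillis) {
    this.shutdown = shutdown;
    this.graceMillis = graceMillis;
  }

  /**
   * Called from the RPC handler. Returns immediately so the response can be
   * sent; the real shutdown starts after the grace period. Only the first
   * request schedules anything; later requests return false.
   */
  boolean requestFromRpc() {
    if (!requested.compareAndSet(false, true)) {
      return false;
    }
    scheduler.schedule(() -> {
      try {
        shutdown.run();
      } finally {
        scheduler.shutdown(); // release the timer thread once done
      }
    }, graceMillis, TimeUnit.MILLISECONDS);
    return true;
  }
}
```

A fixed delay remains a heuristic: it does not guarantee that the response was actually written before the RPC server closes.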
