[jira] [Updated] (HBASE-22081) master shutdown: close RpcServer and procWAL first thing

2019-07-23 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-22081:
--
Status: Open  (was: Patch Available)

> master shutdown: close RpcServer and procWAL first thing
> 
>
> Key: HBASE-22081
> URL: https://issues.apache.org/jira/browse/HBASE-22081
> Project: HBase
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: HBASE-22081.01.patch, HBASE-22081.02.patch, 
> HBASE-22081.03.patch, HBASE-22081.patch
>
>
> I had a master get stuck due to HBASE-22079 and noticed it was logging RS 
> abort messages during shutdown.
> [~bahramch] found some issues where messages are processed by the old master 
> during shutdown due to a race condition in the RS cache (or it could also 
> happen due to a network race).
> Previously I found a bug where an SCP created during master shutdown had 
> incorrect state (because some structures had already been cleaned up).
> I think before master fencing is implemented we can at least make these 
> issues much less likely by thinking about shutdown order.
> 1) First kill the RPC server so we don't receive any more messages. There's 
> no need to receive messages when we are shutting down. Server heartbeats 
> could be impacted, I guess, but I don't think they will be, because we 
> currently only kill an RS on ZK timeout.
> 2) Then do whatever cleanup we think is needed that requires the proc WAL.
> 3) Then close the proc WAL so no errant threads can create more procs.
> 4) Then do whatever other cleanup.
> 5) Finally delete the znode.
> Right now the znode is deleted somewhat early, I think, and the RpcServer is 
> closed very late.
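
The proposed ordering can be illustrated with a minimal Java sketch. This is not the attached patch and does not use HBase APIs: the class MasterShutdownSketch and all of its methods and flags are hypothetical stand-ins, assumed only to show that once the RPC server and the proc WAL are closed, late messages and new procedures are refused, and that the znode goes last.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are real HBase classes or methods.
class MasterShutdownSketch {
    private boolean rpcOpen = true;       // stands in for the RPC server
    private boolean procWalOpen = true;   // stands in for the procedure WAL
    private boolean znodePresent = true;  // stands in for the master znode

    // Stands in for an incoming RS message; refused once the RPC server is down.
    boolean receiveMessage(String msg) {
        return rpcOpen;
    }

    // Stands in for creating a procedure (for example an SCP);
    // refused once the proc WAL is closed.
    boolean submitProcedure(String name) {
        return procWalOpen;
    }

    boolean znodePresent() {
        return znodePresent;
    }

    // Runs the five steps from the description in order and returns their names.
    List<String> shutdown() {
        List<String> steps = new ArrayList<>();
        rpcOpen = false;
        steps.add("stop RPC server");
        steps.add("cleanup needing proc WAL");
        procWalOpen = false;
        steps.add("close proc WAL");
        steps.add("other cleanup");
        znodePresent = false;
        steps.add("delete znode");
        return steps;
    }

    public static void main(String[] args) {
        MasterShutdownSketch master = new MasterShutdownSketch();
        System.out.println(master.shutdown());
        // After shutdown, late messages and new procedures are both refused.
        System.out.println(master.receiveMessage("RS abort report"));
        System.out.println(master.submitProcedure("SCP"));
    }
}
```

In this sketch the guards are plain booleans. A real master would presumably need thread-safe state and would have to decide what happens to requests already in flight when the RPC server stops.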



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (HBASE-22081) master shutdown: close RpcServer and procWAL first thing

2019-04-30 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-22081:
-
Attachment: HBASE-22081.03.patch

> master shutdown: close RpcServer and procWAL first thing
> 
>
> Key: HBASE-22081
> URL: https://issues.apache.org/jira/browse/HBASE-22081
> Project: HBase
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: HBASE-22081.01.patch, HBASE-22081.02.patch, 
> HBASE-22081.03.patch, HBASE-22081.patch
>
>


[jira] [Updated] (HBASE-22081) master shutdown: close RpcServer and procWAL first thing

2019-04-29 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-22081:
-
Attachment: HBASE-22081.02.patch

> master shutdown: close RpcServer and procWAL first thing
> 
>
> Key: HBASE-22081
> URL: https://issues.apache.org/jira/browse/HBASE-22081
> Project: HBase
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: HBASE-22081.01.patch, HBASE-22081.02.patch, 
> HBASE-22081.patch
>
>


[jira] [Updated] (HBASE-22081) master shutdown: close RpcServer and procWAL first thing

2019-04-26 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-22081:
-
Attachment: HBASE-22081.01.patch

> master shutdown: close RpcServer and procWAL first thing
> 
>
> Key: HBASE-22081
> URL: https://issues.apache.org/jira/browse/HBASE-22081
> Project: HBase
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: HBASE-22081.01.patch, HBASE-22081.patch
>
>


[jira] [Updated] (HBASE-22081) master shutdown: close RpcServer and procWAL first thing

2019-04-19 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-22081:
-
Summary: master shutdown: close RpcServer and procWAL first thing  (was: 
master shutdown: close RpcServer first thing, close procWAL as soon as viable, 
and delete znode the last thing)

> master shutdown: close RpcServer and procWAL first thing
> 
>
> Key: HBASE-22081
> URL: https://issues.apache.org/jira/browse/HBASE-22081
> Project: HBase
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HBASE-22081.patch
>
>