[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe

2021-09-13 Thread Gary Lee via Opensaf-tickets
- **status**: review --> unassigned
- **Milestone**: 5.21.09 --> future



---

** [tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** unassigned
**Milestone:** future
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Tue Sep 14, 2021 05:53 AM UTC
**Owner:** Hans Nordebäck


The cluster reset admin operation that was implemented in ticket [#2053] is not 
safe: if a node reboots very fast it can come up again and join the old cluster 
before other nodes have rebooted. See mail discussion:

https://sourceforge.net/p/opensaf/mailman/message/35398725/

This can be solved by implementing a two-phase cluster reset or by introducing 
a cluster generation number which is increased at each cluster reset (maybe 
for both ordered and spontaneous cluster resets). A node will not be allowed to 
join a cluster with a different cluster generation without first rebooting.
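
A minimal sketch of the generation-number idea, assuming each node persists the generation it should adopt at its next boot and that a join is only accepted between nodes running in the same generation. The names `Node`, `order_reset`, `reboot` and `may_join` are hypothetical and are not taken from the OpenSAF code base:

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy model of a cluster node (hypothetical, not OpenSAF code)."""
    name: str
    running_generation: int = 1    # generation the node is running in now
    persisted_generation: int = 1  # generation it adopts at its next boot

    def order_reset(self) -> None:
        """Record the next generation before the node is told to reboot."""
        self.persisted_generation = self.running_generation + 1

    def reboot(self) -> None:
        """Rebooting is the only way to move to the persisted generation."""
        self.running_generation = self.persisted_generation


def may_join(joiner: Node, member: Node) -> bool:
    """Allow a join only between nodes running in the same generation."""
    return joiner.running_generation == member.running_generation


fast, slow = Node("SC-1"), Node("SC-2")
for node in (fast, slow):
    node.order_reset()

fast.reboot()                           # fast node is back before slow has rebooted
rejected = not may_join(fast, slow)     # True: slow still runs the old generation
slow.reboot()
accepted = may_join(fast, slow)         # True: both run the new generation
```

In this model the fast node cannot rejoin the old cluster, because every member that has not yet rebooted still runs the previous generation.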


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe

2019-07-22 Thread Gary Lee via Opensaf-tickets
- **Milestone**: 5.19.07 --> 5.19.10



---

** [tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** review
**Milestone:** 5.19.10
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Wed Jul 03, 2019 06:28 AM UTC
**Owner:** Hans Nordebäck




[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe

2019-01-09 Thread Gary Lee via Opensaf-tickets
- **Milestone**: 5.19.01 --> 5.19.03



---

** [tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** review
**Milestone:** 5.19.03
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Wed Jan 09, 2019 09:23 PM UTC
**Owner:** Hans Nordebäck




[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe

2018-09-29 Thread Gary Lee via Opensaf-tickets
- **Milestone**: 5.18.09 --> 5.18.12



---

** [tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** review
**Milestone:** 5.18.12
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Thu Aug 30, 2018 12:45 PM UTC
**Owner:** Hans Nordebäck




[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe

2017-09-26 Thread Hans Nordebäck
As an alternative to the patches already sent out, I'll send out another patch 
that "emulates pxe" for review during this week.


---

** [tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** review
**Milestone:** 5.17.10
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Fri Sep 22, 2017 01:13 PM UTC
**Owner:** Hans Nordebäck




[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe

2017-09-22 Thread Zoran Milinkovic via Opensaf-tickets
I have attached one idea (a prototype) for a safe cluster restart.
The attached file contains small changes in IMM and CLM.

The idea is that when a cluster restart is invoked by the CLM admin operation, 
CLM first disables sync in IMM (a change in IMM) and then continues with 
rebooting the nodes.

If a rebooted node comes up too fast, before the last IMM veteran node goes 
down, IMM sync will not be possible, and the node will hang in the NID 
phase waiting for the sync.
When the last IMM veteran node goes down, IMMD will start electing a new 
coordinator. Since there is no veteran node left in the cluster, the new IMM 
coordinator will start loading data from PBE or an XML file.
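
The sequence above can be sketched as a toy state model. This is only an illustration of the described behaviour during the restart window, not the code in the attached diff; `ImmModel` and its methods are hypothetical names, and the choice of the first waiting node as coordinator is an assumption made for simplicity:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ImmModel:
    """Toy model of IMM sync during a cluster restart (hypothetical)."""
    veterans: Set[str] = field(default_factory=set)   # nodes holding IMM data
    waiting: List[str] = field(default_factory=list)  # nodes waiting for sync
    sync_enabled: bool = True
    coordinator: Optional[str] = None
    loaded_from_store: bool = False  # True once data is loaded from PBE/XML

    def disable_sync(self) -> None:
        """Called first when the cluster restart is ordered."""
        self.sync_enabled = False

    def node_up(self, name: str) -> str:
        """A starting node either syncs from a veteran or waits."""
        if self.sync_enabled and self.veterans:
            self.veterans.add(name)
            return "synced"
        self.waiting.append(name)
        return "waiting"

    def veteran_down(self, name: str) -> None:
        """When the last veteran leaves, a waiting node becomes coordinator."""
        self.veterans.discard(name)
        if not self.veterans and self.waiting:
            # Assumption: the first waiting node is elected.
            self.coordinator = self.waiting.pop(0)
            self.loaded_from_store = True  # no veteran left to sync from


imm = ImmModel(veterans={"SC-1", "SC-2"})
imm.disable_sync()
imm.veteran_down("SC-1")        # SC-1 reboots quickly ...
state = imm.node_up("SC-1")     # ... and has to wait: SC-2 is still a veteran
imm.veteran_down("SC-2")        # last veteran gone: SC-1 loads from PBE/XML
```

The point of the model is that a fast node can never obtain the old cluster's in-memory state once sync has been disabled.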

The side effect of the attached file is that some nodes that joined before the 
last veteran goes down may be rebooted again, mostly due to the QUIESCED role 
in RDE, or if they are payloads running without SC absence allowed.
There is nothing wrong with rebooting those nodes again. They are still in the 
OpenSAF startup phase, and no application is up and running yet. So 
rebooting those nodes is safe.

The attached file is only a proposal and needs to be split into two tickets, 
one for IMM (the disable sync feature) and this ticket for CLM.

For the IMM part, I would like to make the disable sync function a one-way 
function: once sync is disabled, it cannot be enabled again until the 
cluster restart is done.
In the attached file, the disable sync feature can be switched on and off.
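
A one-way switch of this kind could look roughly like the following sketch. `SyncSwitch` is a hypothetical name, and the assumption is that a fresh instance (with sync enabled) is created when the service starts again after the cluster restart:

```python
class SyncSwitch:
    """One-way latch: sync can be disabled but not re-enabled (hypothetical)."""

    def __init__(self) -> None:
        self._enabled = True  # a freshly started service allows sync

    @property
    def enabled(self) -> bool:
        return self._enabled

    def disable(self) -> None:
        """Disable sync; calling this more than once is harmless."""
        self._enabled = False

    def enable(self) -> None:
        """Refuse to re-enable sync once it has been disabled."""
        if not self._enabled:
            raise RuntimeError(
                "sync stays disabled until the cluster restart is done")
```

Rejecting `enable()` after `disable()` closes the window in which an operator or a racing request could switch sync back on while nodes are still rebooting.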



Attachments:

- [clmrestart.diff](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/d666d71b/3ab4/attachment/clmrestart.diff) (6.4 kB; application/octet-stream)


---

** [tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** review
**Milestone:** 5.17.10
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Fri Sep 15, 2017 06:01 AM UTC
**Owner:** Hans Nordebäck




[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe

2017-09-14 Thread Hans Nordebäck
- **status**: unassigned --> review
- **assigned_to**: Hans Nordebäck



---

** [tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** review
**Milestone:** 5.17.10
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Wed Jul 19, 2017 10:06 AM UTC
**Owner:** Hans Nordebäck




[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe

2017-07-19 Thread Rafael Odzakow via Opensaf-tickets
For rolling upgrades only

commit 653edb5d9b217f1a3280b5aed8597fb53ffa5f61 (HEAD -> develop, 
origin/develop, ticket-2521)
Author: Rafael Odzakow 
Date:   Wed Jul 19 11:52:57 2017 +0200

smf: no node locking when procedures are empty [#2521]



---

** [tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** unassigned
**Milestone:** 5.17.10
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Sat Jul 01, 2017 04:15 PM UTC
**Owner:** nobody




[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe

2017-06-29 Thread Anders Widell via Opensaf-tickets
Ideally yes, though then we are talking about a full clustering solution.


---

** [tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** unassigned
**Milestone:** 5.17.08
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Thu Jun 29, 2017 01:41 PM UTC
**Owner:** nobody




[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe

2017-06-29 Thread Rafael Odzakow via Opensaf-tickets
For a node that is not allowed to join the CLM cluster, will this solution 
also block IMM (and other services) from starting up?


---

** [tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** unassigned
**Milestone:** 5.17.08
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Wed May 03, 2017 10:51 AM UTC
**Owner:** nobody




[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe

2017-05-03 Thread Anders Widell



---

** [tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** unassigned
**Milestone:** 5.17.08
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Wed May 03, 2017 10:51 AM UTC
**Owner:** nobody

