[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe
- **status**: review --> unassigned
- **Milestone**: 5.21.09 --> future

---

**[tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** unassigned
**Milestone:** future
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Tue Sep 14, 2021 05:53 AM UTC
**Owner:** Hans Nordebäck

The cluster reset admin operation that was implemented in ticket [#2053] is not safe: if a node reboots very fast, it can come up again and join the old cluster before the other nodes have rebooted. See mail discussion: https://sourceforge.net/p/opensaf/mailman/message/35398725/

This can be solved by implementing a two-phase cluster reset or by introducing a cluster generation number that is increased at each cluster reset (maybe for both ordered and spontaneous cluster resets). A node will not be allowed to join a cluster with a different cluster generation without first rebooting.

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
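The cluster generation number option from the ticket description can be sketched as follows. This is a minimal C++ model, not OpenSAF code: `Node`, `Cluster`, `PrepareForReset` and `JoinResult` are hypothetical names, and it assumes each node persists its generation across reboots. It also assumes one rule the ticket does not state: a rejected node keeps the higher of its own and the cluster's generation before rebooting, so a stale node catches up after one reboot while a fast-rebooting node stays ahead of the old cluster until that cluster has fully gone down.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical sketch of the "cluster generation number" idea;
// this is not the OpenSAF CLM API.

enum class JoinResult { kAccepted, kRebootRequired };

// Assumption: a node's generation is persisted so that it survives reboots.
struct Node {
  std::string name;
  std::uint64_t generation;
};

// Before rebooting for a cluster reset (ordered, and maybe spontaneous),
// each node bumps its persisted generation.
void PrepareForReset(Node& node) { ++node.generation; }

class Cluster {
 public:
  std::uint64_t generation() const { return generation_; }
  std::size_t size() const { return members_.size(); }

  JoinResult Join(Node& node) {
    if (members_.empty()) {
      // The first node up forms a new cluster with its own generation.
      generation_ = node.generation;
    } else if (node.generation != generation_) {
      // Generation mismatch: the node must reboot before it may join.
      // Assumption: it keeps the higher of the two numbers.
      node.generation = std::max(node.generation, generation_);
      return JoinResult::kRebootRequired;
    }
    members_.insert(node.name);
    return JoinResult::kAccepted;
  }

  void Leave(const Node& node) { members_.erase(node.name); }

 private:
  std::uint64_t generation_ = 0;
  std::set<std::string> members_;
};
```

In the race described in the ticket, SC-1, SC-2 and PL-3 run at generation 1 and all move to generation 2 before rebooting. If PL-3 comes back while SC-1 and SC-2 are still up, it presents generation 2 to a cluster still at generation 1 and is told to reboot instead of joining the old cluster. Only after the old members have left does a new cluster form at generation 2.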
[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe
- **Milestone**: 5.19.07 --> 5.19.10

---

**[tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** review
**Milestone:** 5.19.10
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Wed Jul 03, 2019 06:28 AM UTC
**Owner:** Hans Nordebäck
[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe
- **Milestone**: 5.19.01 --> 5.19.03

---

**[tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** review
**Milestone:** 5.19.03
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Wed Jan 09, 2019 09:23 PM UTC
**Owner:** Hans Nordebäck
[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe
- **Milestone**: 5.18.09 --> 5.18.12

---

**[tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** review
**Milestone:** 5.18.12
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Thu Aug 30, 2018 12:45 PM UTC
**Owner:** Hans Nordebäck
[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe
As an alternative to the patches already sent out, I'll send out another patch that "emulates pxe" for review during this week.

---

**[tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** review
**Milestone:** 5.17.10
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Fri Sep 22, 2017 01:13 PM UTC
**Owner:** Hans Nordebäck
[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe
I attached one idea (a prototype) for a safe cluster restart. The attached file contains small changes in IMM and CLM. The idea is that when a cluster restart is invoked through the CLM admin operation, CLM first disables sync in IMM (this is the IMM change) and then continues with rebooting the nodes. If a rebooted node comes up too fast, before the last IMM veteran node goes down, IMM sync will not be possible, and the node will hang in the NID phase waiting for the sync. When the last IMM veteran node goes down, IMMD will start electing a new coordinator. Since there are no veteran nodes left in the cluster, the new IMM coordinator will load data from PBE or an XML file.

The side effect of the attached file is that some nodes that joined before the last veteran went down may be rebooted again, mostly due to the QUIESCED role in RDE, or because they are payloads running without SC absence allowed. There is nothing wrong with rebooting those nodes again: they are still in the OpenSAF start-up phase, and no applications are up and running, so rebooting them is safe.

The attached file is only a proposal and needs to be split into two tickets: one for IMM (the disable sync feature) and this ticket for CLM. For the IMM part, I would like to make the disable sync function one-way: once sync is disabled, it cannot be enabled again until the cluster restart is done. In the attached file, the disable sync feature can be switched on and off.
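The disable sync proposal above boils down to a small state machine. The C++ sketch below models only the ordering described in the comment, not the actual IMMD/IMMND messaging in clmrestart.diff; `ImmClusterModel`, `DisableSync`, `RequestSync` and `ElectCoordinator` are hypothetical names. It implements the one-way variant the author prefers (there is no way to re-enable sync) rather than the on/off switch in the attached file, and it assumes the latch is cleared when a new coordinator loads from PBE or XML, since that marks the end of the cluster restart.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of the one-way "disable IMM sync" latch;
// this is not IMMD/IMMND code or any IMM API.

enum class SyncReply { kSyncStarted, kSyncRefused };
enum class Election { kVeteranCoordinator, kLoadFromPbeOrXml };

class ImmClusterModel {
 public:
  // Nodes that are fully synced members of the running IMM cluster.
  void AddVeteran(const std::string& node) { veterans_.insert(node); }
  void VeteranDown(const std::string& node) { veterans_.erase(node); }

  // Invoked by CLM at the start of the cluster reset admin op, before any
  // node is rebooted. One-way: there is deliberately no EnableSync().
  void DisableSync() { sync_disabled_ = true; }
  bool sync_disabled() const { return sync_disabled_; }

  // A freshly booted node asks for a sync. While sync is disabled the
  // request is refused, so the node stays waiting in the NID phase.
  SyncReply RequestSync() const {
    return sync_disabled_ ? SyncReply::kSyncRefused : SyncReply::kSyncStarted;
  }

  // Coordinator election in IMMD. With no veteran left, the new coordinator
  // loads from PBE or XML. Assumption: this starts a new cluster, so the
  // latch is cleared here.
  Election ElectCoordinator() {
    if (!veterans_.empty()) return Election::kVeteranCoordinator;
    sync_disabled_ = false;
    return Election::kLoadFromPbeOrXml;
  }

 private:
  bool sync_disabled_ = false;
  std::set<std::string> veterans_;
};
```

With SC-1 and SC-2 as veterans, a node that reboots quickly after `DisableSync()` is refused a sync for as long as either SC is still up. Once both are down, the election loads from PBE or XML and joining nodes can sync normally again. The sketch does not model the side effect mentioned above, where nodes that came up early may be rebooted once more.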
Attachments:
- [clmrestart.diff](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/d666d71b/3ab4/attachment/clmrestart.diff) (6.4 kB; application/octet-stream)

---

**[tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** review
**Milestone:** 5.17.10
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Fri Sep 15, 2017 06:01 AM UTC
**Owner:** Hans Nordebäck
[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe
- **status**: unassigned --> review
- **assigned_to**: Hans Nordebäck

---

**[tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** review
**Milestone:** 5.17.10
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Wed Jul 19, 2017 10:06 AM UTC
**Owner:** Hans Nordebäck
[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe
For rolling upgrades only

commit 653edb5d9b217f1a3280b5aed8597fb53ffa5f61 (HEAD -> develop, origin/develop, ticket-2521)
Author: Rafael Odzakow
Date: Wed Jul 19 11:52:57 2017 +0200

    smf: no node locking when procedures are empty [#2521]

---

**[tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** unassigned
**Milestone:** 5.17.10
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Sat Jul 01, 2017 04:15 PM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe
Ideally yes, though then we are talking about a full clustering solution.

---

**[tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** unassigned
**Milestone:** 5.17.08
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Thu Jun 29, 2017 01:41 PM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe
For the node that is not allowed to join the CLM cluster, will this solution also block IMM (and other services) from starting up?

---

**[tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** unassigned
**Milestone:** 5.17.08
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Wed May 03, 2017 10:51 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2451 clm: Make the cluster reset admin op safe
---

**[tickets:#2451] clm: Make the cluster reset admin op safe**

**Status:** unassigned
**Milestone:** 5.17.08
**Created:** Wed May 03, 2017 10:51 AM UTC by Anders Widell
**Last Updated:** Wed May 03, 2017 10:51 AM UTC
**Owner:** nobody