[tickets] [opensaf:tickets] #1732 OUT_OF_SYNC (failed over) new active controller should go for immediate reboot
- **status**: review --> fixed
- **Comment**:

    changeset:   7485:c6ebe597634d
    branch:      opensaf-5.0.x
    user:        Anders Widell
    date:        Tue Apr 12 14:12:30 2016 +0200
    summary:     amfd: Reboot standby when failover fails due to out of sync [#1732]

    changeset:   7486:fd5caf343318
    parent:      7483:0d1bf5efac9a
    user:        Anders Widell
    date:        Tue Apr 12 14:12:30 2016 +0200
    summary:     amfd: Reboot standby when failover fails due to out of sync [#1732]

[staging:c6ebe5]
[staging:fd5caf]

---

** [tickets:#1732] OUT_OF_SYNC (failed over) new active controller should go for immediate reboot**

**Status:** fixed
**Milestone:** 5.0.RC2
**Created:** Wed Apr 06, 2016 11:02 AM UTC by Srikanth R
**Last Updated:** Tue Apr 12, 2016 11:08 AM UTC
**Owner:** nobody

Changeset : 7436
Version : 5.0 FC
Setup : Two controllers

Issue : Out of sync (failed over) new active controller should go for immediate reboot.

During failover, if the standby controller is OUT OF SYNC and could not get promoted to active, the node should be rebooted immediately.

    Apr 6 16:03:53 CONTROLLER-2 osafamfd[431]: ER FAILOVER StandBy --> Active FAILED, Standby OUT OF SYNC
    Apr 6 16:03:53 CONTROLLER-2 osafamfd[431]: ER avd_role_change role change failure
    Apr 6 16:03:53 CONTROLLER-2 osafimmd[380]: WA IMMND DOWN on active controller 1 detected at standby immd!! 2. Possible failover
    ..
    Apr 6 16:06:53 CONTROLLER-2 osafamfnd[441]: WA AMF director unexpectedly crashed
    Apr 6 16:06:53 CONTROLLER-2 osafamfnd[441]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60
    Apr 6 16:06:53 CONTROLLER-2 opensaf_reboot: Rebooting local node; timeout=60

This issue is fixed as part of #1334, but might be observed because of #79

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #1732 OUT_OF_SYNC (failed over) new active controller should go for immediate reboot
Hi Anders,

I tested it and it works fine. Please push the patch.

Thanks
-Nagu

---

**Status:** review
**Milestone:** 5.0.RC2
**Last Updated:** Mon Apr 11, 2016 08:54 PM UTC
[tickets] [opensaf:tickets] #1732 OUT_OF_SYNC (failed over) new active controller should go for immediate reboot
- **Milestone**: 5.0.RC1 --> 5.0.RC2

---

**Status:** review
**Milestone:** 5.0.RC2
**Last Updated:** Thu Apr 07, 2016 10:46 AM UTC
[tickets] [opensaf:tickets] #1732 OUT_OF_SYNC (failed over) new active controller should go for immediate reboot
- **status**: unassigned --> review

---

**Status:** review
**Milestone:** 5.0.RC1
**Last Updated:** Wed Apr 06, 2016 01:53 PM UTC
[tickets] [opensaf:tickets] #1732 OUT_OF_SYNC (failed over) new active controller should go for immediate reboot
Please re-test with the attached patch.

Attachments:

- [1732.diff](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/042c4c4d/e701/attachment/1732.diff) (1.2 kB; text/x-patch)

---

**Status:** unassigned
**Milestone:** 5.0.RC1
**Last Updated:** Wed Apr 06, 2016 12:49 PM UTC
[tickets] [opensaf:tickets] #1732 OUT_OF_SYNC (failed over) new active controller should go for immediate reboot
AMFD reboots the standby node when a failover STANDBY->ACTIVE is unsuccessful, but for some reason the "out of sync" case is handled differently, and in this case AMFD does not order a reboot.

    uint32_t status = NCSCC_RC_FAILURE;

    [...]

    if (AVD_STBY_OUT_OF_SYNC == cb->stby_sync_state) {
        LOG_ER("FAILOVER StandBy --> Active FAILED, Standby OUT OF SYNC");
        /* returns directly, so the reboot under "done" is skipped */
        return NCSCC_RC_FAILURE;
    }

    if (nullptr == (my_node = avd_node_find_nodeid(cb->node_id_avd))) {
        LOG_ER("FAILOVER StandBy --> Active FAILED, node %x not found", cb->node_id_avd);
        goto done;
    }

    if (nullptr == (failed_node = avd_node_find_nodeid(cb->node_id_avd_other))) {
        LOG_ER("FAILOVER StandBy --> Active FAILED, node %x not found", cb->node_id_avd_other);
        goto done;
    }

    /* check the node state */
    if (my_node->node_state != AVD_AVND_STATE_PRESENT) {
        LOG_ER("FAILOVER StandBy --> Active FAILED, stdby not in good state");
        goto done;
    }

    [...]

    done:
        if (NCSCC_RC_SUCCESS != status)
            opensaf_reboot(my_node != nullptr ? my_node->node_info.nodeId : 0,
                           my_node != nullptr ? (char *)my_node->node_info.executionEnvironment.value : nullptr,
                           "FAILOVER failed");

---

**Status:** unassigned
**Milestone:** 5.0.RC1
**Last Updated:** Wed Apr 06, 2016 11:02 AM UTC
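The control-flow problem described in this message can be illustrated in isolation. The attached 1732.diff is not reproduced in this thread, so the following is only a sketch of one plausible shape for such a fix, assuming the out-of-sync branch is simply routed through the same `done` label as the other failure paths. Every name here (`ControlBlock`, `fake_reboot`, `failover_sketch`, `RC_SUCCESS`, `RC_FAILURE`, `reboot_requests`) is a hypothetical stand-in, not OpenSAF code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-ins for the AMFD state and return codes quoted above.
enum SyncState { IN_SYNC, OUT_OF_SYNC };
constexpr uint32_t RC_SUCCESS = 1;
constexpr uint32_t RC_FAILURE = 2;

struct ControlBlock {
  SyncState stby_sync_state;
  bool local_node_known;  // stands in for the node lookup succeeding
};

int reboot_requests = 0;  // counts calls to the stubbed reboot

// Stub: a real reboot call would restart the node; this only records the request.
void fake_reboot(const char *reason) {
  assert(reason != nullptr);
  std::printf("reboot requested: %s\n", reason);
  ++reboot_requests;
}

// Every failure path, including "out of sync", leaves through `done`,
// so a failed promotion always results in a reboot request.
uint32_t failover_sketch(const ControlBlock &cb) {
  uint32_t status = RC_FAILURE;

  if (cb.stby_sync_state == OUT_OF_SYNC) {
    std::puts("FAILOVER StandBy --> Active FAILED, Standby OUT OF SYNC");
    goto done;  // the quoted code returned here and skipped the reboot
  }
  if (!cb.local_node_known) {
    std::puts("FAILOVER StandBy --> Active FAILED, node not found");
    goto done;
  }
  status = RC_SUCCESS;

done:
  if (status != RC_SUCCESS) fake_reboot("FAILOVER failed");
  return status;
}
```

Calling `failover_sketch(ControlBlock{OUT_OF_SYNC, true})` in this sketch returns `RC_FAILURE` and increments `reboot_requests`, whereas an early `return` in that branch would leave the counter untouched. That matches the behaviour reported in the ticket, where the node was only rebooted three minutes later by osafamfnd.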
[tickets] [opensaf:tickets] #1732 OUT_OF_SYNC (failed over) new active controller should go for immediate reboot
---

**Status:** unassigned
**Milestone:** 5.0.RC1
**Created:** Wed Apr 06, 2016 11:02 AM UTC by Srikanth R
**Last Updated:** Wed Apr 06, 2016 11:02 AM UTC
**Owner:** nobody