[tickets] [opensaf:tickets] #1732 OUT_OF_SYNC (failed over) new active controller should go for immediate reboot

2016-04-12 Thread Anders Widell
- **status**: review --> fixed
- **Comment**:

changeset:   7485:c6ebe597634d
branch:      opensaf-5.0.x
user:        Anders Widell
date:        Tue Apr 12 14:12:30 2016 +0200
summary:     amfd: Reboot standby when failover fails due to out of sync [#1732]

changeset:   7486:fd5caf343318
parent:      7483:0d1bf5efac9a
user:        Anders Widell
date:        Tue Apr 12 14:12:30 2016 +0200
summary:     amfd: Reboot standby when failover fails due to out of sync [#1732]

[staging:c6ebe5]
[staging:fd5caf]




---

**[tickets:#1732] OUT_OF_SYNC (failed over) new active controller should go for immediate reboot**

**Status:** fixed
**Milestone:** 5.0.RC2
**Created:** Wed Apr 06, 2016 11:02 AM UTC by Srikanth R
**Last Updated:** Tue Apr 12, 2016 11:08 AM UTC
**Owner:** nobody


Changeset : 7436 
Version : 5.0 FC
Setup : Two controllers


Issue :
  The new active controller should reboot immediately when it is out of sync after a failover.
  
  During failover, if the standby controller is OUT OF SYNC and cannot be promoted to active, the node should be rebooted immediately.
 
Apr  6 16:03:53 CONTROLLER-2 osafamfd[431]: ER FAILOVER StandBy --> Active FAILED, Standby OUT OF SYNC
Apr  6 16:03:53 CONTROLLER-2 osafamfd[431]: ER avd_role_change role change failure
Apr  6 16:03:53 CONTROLLER-2 osafimmd[380]: WA IMMND DOWN on active controller 1 detected at standby immd!! 2. Possible failover
..
Apr  6 16:06:53 CONTROLLER-2 osafamfnd[441]: WA AMF director unexpectedly crashed
Apr  6 16:06:53 CONTROLLER-2 osafamfnd[441]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60
Apr  6 16:06:53 CONTROLLER-2 opensaf_reboot: Rebooting local node; timeout=60

This issue is fixed as part of #1334, but might be observed because of #79.


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #1732 OUT_OF_SYNC (failed over) new active controller should go for immediate reboot

2016-04-12 Thread Nagendra Kumar
Hi Anders,
I tested it and it works fine. Please push the patch.

Thanks
-Nagu


---

**[tickets:#1732] OUT_OF_SYNC (failed over) new active controller should go for immediate reboot**

**Status:** review
**Milestone:** 5.0.RC2
**Created:** Wed Apr 06, 2016 11:02 AM UTC by Srikanth R
**Last Updated:** Mon Apr 11, 2016 08:54 PM UTC
**Owner:** nobody




[tickets] [opensaf:tickets] #1732 OUT_OF_SYNC (failed over) new active controller should go for immediate reboot

2016-04-11 Thread Mathi Naickan
- **Milestone**: 5.0.RC1 --> 5.0.RC2



---

**[tickets:#1732] OUT_OF_SYNC (failed over) new active controller should go for immediate reboot**

**Status:** review
**Milestone:** 5.0.RC2
**Created:** Wed Apr 06, 2016 11:02 AM UTC by Srikanth R
**Last Updated:** Thu Apr 07, 2016 10:46 AM UTC
**Owner:** nobody




[tickets] [opensaf:tickets] #1732 OUT_OF_SYNC (failed over) new active controller should go for immediate reboot

2016-04-07 Thread Anders Widell
- **status**: unassigned --> review



---

**[tickets:#1732] OUT_OF_SYNC (failed over) new active controller should go for immediate reboot**

**Status:** review
**Milestone:** 5.0.RC1
**Created:** Wed Apr 06, 2016 11:02 AM UTC by Srikanth R
**Last Updated:** Wed Apr 06, 2016 01:53 PM UTC
**Owner:** nobody




[tickets] [opensaf:tickets] #1732 OUT_OF_SYNC (failed over) new active controller should go for immediate reboot

2016-04-06 Thread Anders Widell
Please re-test with the attached patch.


Attachments:

- [1732.diff](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/042c4c4d/e701/attachment/1732.diff) (1.2 kB; text/x-patch)


---

**[tickets:#1732] OUT_OF_SYNC (failed over) new active controller should go for immediate reboot**

**Status:** unassigned
**Milestone:** 5.0.RC1
**Created:** Wed Apr 06, 2016 11:02 AM UTC by Srikanth R
**Last Updated:** Wed Apr 06, 2016 12:49 PM UTC
**Owner:** nobody




[tickets] [opensaf:tickets] #1732 OUT_OF_SYNC (failed over) new active controller should go for immediate reboot

2016-04-06 Thread Anders Widell
AMFD reboots the standby node when a STANDBY->ACTIVE failover is unsuccessful, 
but for some reason the "out of sync" case is handled differently: in that 
case AMFD returns early and does not order a reboot.



    uint32_t status = NCSCC_RC_FAILURE;

    [...]

    if (AVD_STBY_OUT_OF_SYNC == cb->stby_sync_state) {
        LOG_ER("FAILOVER StandBy --> Active FAILED, Standby OUT OF SYNC");
        return NCSCC_RC_FAILURE;
    }

    if (nullptr == (my_node = avd_node_find_nodeid(cb->node_id_avd))) {
        LOG_ER("FAILOVER StandBy --> Active FAILED, node %x not found", cb->node_id_avd);
        goto done;
    }

    if (nullptr == (failed_node = avd_node_find_nodeid(cb->node_id_avd_other))) {
        LOG_ER("FAILOVER StandBy --> Active FAILED, node %x not found", cb->node_id_avd_other);
        goto done;
    }

    /* check the node state */
    if (my_node->node_state != AVD_AVND_STATE_PRESENT) {
        LOG_ER("FAILOVER StandBy --> Active FAILED, stdby not in good state");
        goto done;
    }

    [...]

    done:
        if (NCSCC_RC_SUCCESS != status)
            opensaf_reboot(my_node != nullptr ? my_node->node_info.nodeId : 0,
                           my_node != nullptr ? (char *)my_node->node_info.executionEnvironment.value : nullptr,
                           "FAILOVER failed");
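
The difference between the two exit paths can be shown with a small standalone model. This is only a sketch and not the contents of 1732.diff, which is not reproduced in this thread. The names `Cb`, `StbySync`, `fake_reboot`, `failover_to_active`, `RC_SUCCESS` and `RC_FAILURE` are hypothetical stand-ins for the amfd types and calls quoted above, and the reboot is replaced by a counter. It assumes the fix is to send the "out of sync" case through the same `done` label as the other failures.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical stand-ins for the OpenSAF return codes quoted above.
constexpr uint32_t RC_SUCCESS = 1;
constexpr uint32_t RC_FAILURE = 2;

enum class StbySync { InSync, OutOfSync };

// Hypothetical, much reduced control block.
struct Cb {
  StbySync stby_sync_state;
  bool node_present;    // models the node lookup and state checks
  int reboots_ordered;  // counts calls to the fake reboot
};

// Fake reboot: records the request instead of rebooting anything.
static void fake_reboot(Cb& cb, const char* reason) {
  ++cb.reboots_ordered;
  std::printf("reboot ordered: %s\n", reason);
}

// Every failure path, including "out of sync", leaves through the single
// exit point that orders the reboot.
uint32_t failover_to_active(Cb& cb) {
  uint32_t status = RC_FAILURE;

  if (cb.stby_sync_state == StbySync::OutOfSync) {
    std::printf("FAILOVER StandBy --> Active FAILED, Standby OUT OF SYNC\n");
    goto done;  // the quoted code returns here, skipping the reboot
  }

  if (!cb.node_present) {
    std::printf("FAILOVER StandBy --> Active FAILED, node not found\n");
    goto done;
  }

  status = RC_SUCCESS;

done:
  if (status != RC_SUCCESS) fake_reboot(cb, "FAILOVER failed");
  return status;
}
```

Calling `failover_to_active` on a `Cb` whose `stby_sync_state` is `StbySync::OutOfSync` returns `RC_FAILURE` and increments `reboots_ordered`, whereas the early `return` in the quoted code would leave the node running until amfnd notices that the director is gone.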



---

**[tickets:#1732] OUT_OF_SYNC (failed over) new active controller should go for immediate reboot**

**Status:** unassigned
**Milestone:** 5.0.RC1
**Created:** Wed Apr 06, 2016 11:02 AM UTC by Srikanth R
**Last Updated:** Wed Apr 06, 2016 11:02 AM UTC
**Owner:** nobody




[tickets] [opensaf:tickets] #1732 OUT_OF_SYNC (failed over) new active controller should go for immediate reboot

2016-04-06 Thread Srikanth R



---

**[tickets:#1732] OUT_OF_SYNC (failed over) new active controller should go for immediate reboot**

**Status:** unassigned
**Milestone:** 5.0.RC1
**Created:** Wed Apr 06, 2016 11:02 AM UTC by Srikanth R
**Last Updated:** Wed Apr 06, 2016 11:02 AM UTC
**Owner:** nobody

