- **status**: unassigned --> wontfix
- **Comment**:

The problem described and the logs shared do not match. Please share the correct logs.


**[tickets:#1868] Headless: IMM: Cluster reset happened due to 'avaDown' while killing immd**

**Status:** wontfix
**Milestone:** future
**Created:** Wed Jun 08, 2016 12:45 PM UTC by Chani Srivastava
**Last Updated:** Tue Nov 08, 2016 09:04 AM UTC
**Owner:** nobody

 (153.5 kB; application/octet-stream)
 (173.4 kB; application/octet-stream)
 (147.6 kB; application/octet-stream)
 (124.9 kB; application/octet-stream)

Version - opensaf 5.0.GA
6-node cluster (SC-1: Active, SC-2: Standby, SC-3: Spare, PL-4, PL-5 & PL-6: Payloads)

Steps to reproduce:
1. Install and bring up opensaf on 6 nodes in a cluster with Active, 
Standby, Spare and 3 Payloads.
2. Take the cluster into headless state by killing immd on the Active controller first, 
followed by the Standby and Spare controllers.
3. IMMD crashed due to avaDown and a cluster reset happened.

> Jun  8 15:35:53 SCALE_SLOT-81 osafimmnd[1806]: NO SERVER STATE: 
Jun  8 15:35:53 SCALE_SLOT-81 osafimmd[1756]: NO ACT: New Epoch for IMMND 
process at node 2060f old epoch: 0  new epoch:104
Jun  8 15:35:54 SCALE_SLOT-81 osafamfd[1852]: NO Received node_up from 2060f: 
msg_id 1
Jun  8 15:35:54 SCALE_SLOT-81 osafamfd[1852]: NO Node 'PL-6' joined the cluster
Jun  8 15:35:56 SCALE_SLOT-81 osafimmnd[1806]: NO Implementer connected: 748 
(MsgQueueService132623) <0, 2060f>
Jun  8 15:43:50 SCALE_SLOT-81 osafimmnd[1806]: NO ERR_BAD_OPERATION: parent 
object not owned by 'SetUp_Ccb'
Jun  8 15:43:50 SCALE_SLOT-81 osafimmnd[1806]: NO ERR_BAD_OPERATION: parent 
object not owned by 'SetUp_Ccb'
Jun  8 15:43:52 SCALE_SLOT-81 osafimmnd[1806]: NO Implementer connected: 749 
(RUNTIMEIMPL) <0, 2050f>
Jun  8 15:44:06 SCALE_SLOT-81 sshd[3213]: Accepted keyboard-interactive/pam for 
root from port 37187 ssh2
Jun  8 15:44:07 SCALE_SLOT-81 root: killing osafimmd from run_headless.sh on 
spare controller
Jun  8 15:44:07 SCALE_SLOT-81 osafamfnd[1863]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Jun  8 15:44:07 SCALE_SLOT-81 osafamfnd[1863]: **ER 
safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
Jun  8 15:44:07 SCALE_SLOT-81 osafamfnd[1863]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131343, SupervisionTime = 60**
Jun  8 15:44:07 SCALE_SLOT-81 osafimmnd[1806]: WA DISCARD DUPLICATE FEVS 
Jun  8 15:44:07 SCALE_SLOT-81 osafimmnd[1806]: WA Error code 2 returned for 
message type 82 - ignoring
Jun  8 15:44:07 SCALE_SLOT-81 osafimmnd[1806]: WA DISCARD DUPLICATE FEVS 
Jun  8 15:44:07 SCALE_SLOT-81 osafimmnd[1806]: WA Error code 2 returned for 
message type 82 - ignoring
Jun  8 15:44:07 SCALE_SLOT-81 opensaf_reboot: Rebooting local node; timeout=60
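A note for correlating the entries above: osafamfnd prints the rebooting node as a decimal NodeId (131343), while the osafimmd/osafimmnd/osafamfd lines print node ids in hex (2060f, 2050f). The sketch below is only an illustration, not part of OpenSAF. It re-joins two of the mail-wrapped lines, pulls out the faulted component, cause and recovery, and converts the NodeId to hex so it can be compared with the other entries. The regular expressions and the helper names `parse_fault` and `node_id_hex` are assumptions made for this example and are matched only against the lines quoted here.

```python
import re

# Two lines copied from the syslog excerpt above, re-joined after mail wrapping.
FAULT_LINE = (
    "Jun  8 15:44:07 SCALE_SLOT-81 osafamfnd[1863]: NO "
    "'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to "
    "'avaDown' : Recovery is 'nodeFailfast'"
)
REBOOT_LINE = (
    "Jun  8 15:44:07 SCALE_SLOT-81 osafamfnd[1863]: Rebooting OpenSAF "
    "NodeId = 131343 EE Name = , Reason: Component faulted: recovery "
    "is node failfast, OwnNodeId = 131343, SupervisionTime = 60"
)

# Hypothetical patterns, written against the two lines above only.
FAULT_RE = re.compile(
    r"'(?P<comp>[^']+)' faulted due to '(?P<cause>[^']+)' : "
    r"Recovery is '(?P<recovery>[^']+)'"
)
NODE_ID_RE = re.compile(r"NodeId = (?P<node_id>\d+)")


def parse_fault(line):
    """Return component DN, fault cause and recovery, or None if no match."""
    match = FAULT_RE.search(line)
    return match.groupdict() if match else None


def node_id_hex(line):
    """Return the first decimal NodeId on the line as lowercase hex, or None."""
    match = NODE_ID_RE.search(line)
    return format(int(match.group("node_id")), "x") if match else None


if __name__ == "__main__":
    print(parse_fault(FAULT_LINE))
    print(node_id_hex(REBOOT_LINE))
```

With the hex form, the node that was rebooted can be lined up against the `2060f` and `2050f` ids that appear in the IMM and AMF director entries.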

Attaching syslogs for the controllers and the payload involved.
Traces are huge in size and will be shared separately.

Note: The clocks on the machines are not synchronized. The relevant logs are the ones after June 8.

