That should have been "MDS broadcast using TIPC broadcast has higher
throughput than the older implementation".

And this is not a test; it is a sync client seeing dropped messages.
That IMMND restarts and will sync again.
The CLM restart due to TRY_AGAIN on imm-handle-initialize is a bug in CLM,
unless it has retried for 60 seconds (max allowed sync time). 
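
To illustrate the retry behaviour expected of a client here, below is a minimal sketch in Python, not the CLM code. It assumes a hypothetical callable `init_fn` standing in for the real initialize call, and the names `initialize_with_retry`, `max_wait_s` and `interval_s` are made up for illustration. The numeric values are the standard SA Forum `SaAisErrorT` codes (the "failed 6" in the log below is TRY_AGAIN).

```python
import time

# SaAisErrorT values from the SA Forum AIS specification.
SA_AIS_OK = 1
SA_AIS_ERR_TRY_AGAIN = 6


def initialize_with_retry(init_fn, max_wait_s=60.0, interval_s=0.5,
                          clock=time.monotonic, sleep=time.sleep):
    """Call init_fn() until it stops returning TRY_AGAIN or max_wait_s elapses.

    init_fn is a hypothetical stand-in for the real initialize call and must
    return an SaAisErrorT-style integer. The last return code is handed back
    to the caller, which decides whether to give up.
    """
    deadline = clock() + max_wait_s
    while True:
        rc = init_fn()
        if rc != SA_AIS_ERR_TRY_AGAIN:
            return rc          # success or a non-retryable error
        if clock() >= deadline:
            return rc          # still TRY_AGAIN after the allowed sync time
        sleep(interval_s)
```

The point of the sketch is only that a TRY_AGAIN seen while IMMND is syncing should be retried for up to the maximum sync time before the client exits; the interval of 0.5 seconds is an arbitrary choice.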




---

** [tickets:#1036] mds:  IMMND restarts because of out of order messages**

**Status:** unassigned
**Milestone:** 4.5.0
**Created:** Tue Sep 02, 2014 01:06 PM UTC by Neelakanta Reddy
**Last Updated:** Tue Sep 02, 2014 01:24 PM UTC
**Owner:** nobody

Sep  2 05:16:57 SLES-SLOT-2 osafimmnd[21492]: WA MESSAGE:81414 OUT OF ORDER my 
highest processed:81412, exiting

Recreation steps:

1. The problem is reproduced when 100 switchovers are done.

2. Immediately after a failover is done, the out-of-order message is observed.

3. Because of the out-of-order message, the new active IMMND restarted.

From there on:
Sep  2 05:17:04 SLES-SLOT-2 osafimmnd[10230]: WA Sync MESSAGE:81639 OUT OF 
ORDER my highest processed:81637
Sep  2 05:17:09 SLES-SLOT-2 osafimmnd[10254]: WA Sync MESSAGE:81893 OUT OF 
ORDER my highest processed:81891
Sep  2 05:17:16 SLES-SLOT-2 osafimmnd[10275]: WA Sync MESSAGE:82114 OUT OF 
ORDER my highest processed:82112
Sep  2 05:17:20 SLES-SLOT-2 osafimmnd[10295]: WA Sync MESSAGE:82335 OUT OF 
ORDER my highest processed:82333 

4. Because of constant IMMND restarts at the time of sync, CLM got TRY_AGAIN and
the node went for reboot:
Sep  2 05:17:16 SLES-SLOT-2 osafimmd[10478]: NO Node 2020f request sync 
sync-pid:10295 epoch:0
Sep  2 05:17:17 SLES-SLOT-2 osafclmd[10521]: ER saImmOiInitialize_2 failed 6, 
exiting
Sep  2 05:17:17 SLES-SLOT-2 osafamfnd[10550]: NO 
'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep  2 05:17:17 SLES-SLOT-2 osafamfnd[10550]: ER 
safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep  2 05:17:17 SLES-SLOT-2 osafamfnd[10550]: Rebooting OpenSAF NodeId = 131599 
EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
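
For readers unfamiliar with the check behind the "OUT OF ORDER" lines above, here is a minimal sketch of a strict sequence-number check. It is an assumption based only on the log text, not the IMMND implementation, and the class and method names are hypothetical. In every log line above the message number is the highest processed number plus 2, which is consistent with exactly one message being missed each time.

```python
class SequenceChecker:
    """Tracks the highest processed message number and rejects gaps.

    Hypothetical illustration: a message is accepted only if its number is
    exactly highest_processed + 1; anything else counts as out of order.
    """

    def __init__(self, highest_processed=0):
        self.highest_processed = highest_processed

    def accept(self, msg_no):
        expected = self.highest_processed + 1
        if msg_no != expected:
            # Gap or duplicate: leave the state untouched and report it.
            return False
        self.highest_processed = msg_no
        return True


checker = SequenceChecker(highest_processed=81412)
if not checker.accept(81414):
    print("MESSAGE:81414 OUT OF ORDER my highest processed:%d"
          % checker.highest_processed)
```

With the numbers from the first log line, message 81413 never arrived, so 81414 is rejected and the highest processed number stays at 81412.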



