The main question for the current review of this ticket is whether it is
possible to reproduce the repeated IMMND restarts reported when this ticket
was created.

Another question is whether that test has been run on older releases
without any problems.

I am pretty sure that it has always been possible to overload the system by 
generating enough traffic.

The IMMND has a flow control mechanism to reduce the risk of overloading the
central bottleneck of the IMMD.
But after the MDS-TIPC multicast enhancement, that bottleneck in the IMMD has
been reduced a lot.
The effect of that is increased pressure on the fevs receivers, the IMMNDs
receiving the fevs messages from the IMMD.

If I remember correctly, there was also some optimization of the sync to
increase the sync-batch size etc.
Perhaps we went too far?
A burst of large messages is more likely than normal to overload the tipc
receiver, resulting in tipc discarding some messages, which in turn results
in this MESSAGE OUT OF ORDER symptom.
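
As a rough illustration of that symptom (a minimal sketch, not the actual
IMMND code): assume the receiver expects strictly consecutive message
numbers and gives up on the first gap. The names FevsReceiver and deliver
are hypothetical, and so is the assumption that every non-consecutive number
is treated as an error.

```python
class FevsReceiver:
    """Hypothetical receiver that expects consecutive message numbers."""

    def __init__(self, highest_processed=0):
        self.highest_processed = highest_processed

    def receive(self, msg_no):
        # Assumption: anything other than highest_processed + 1 is an error.
        if msg_no != self.highest_processed + 1:
            raise RuntimeError(
                f"MESSAGE:{msg_no} OUT OF ORDER my highest processed:"
                f"{self.highest_processed}"
            )
        self.highest_processed = msg_no


def deliver(receiver, sent, dropped):
    """Deliver the numbers in `sent`, skipping those in `dropped`.

    Returns the error text of the first out-of-order message, or None
    if everything was accepted.
    """
    for msg_no in sent:
        if msg_no in dropped:
            continue  # models the transport discarding this message
        try:
            receiver.receive(msg_no)
        except RuntimeError as err:
            return str(err)
    return None
```

With FevsReceiver(81410) and messages 81411 onwards where only 81413 is
dropped, deliver() returns "MESSAGE:81414 OUT OF ORDER my highest
processed:81412", the same pattern as the first log line in the ticket: a
single discarded message is enough to trigger it.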

The problem with this ticket is that it has no single or absolute solution.
There is no one problem to solve.

The problem is also not the fact of getting an out of order message.
If this ticket points to a problem, it would be that this message out of
order can now happen too often and even cyclically.

If that is still the case, then we perhaps need to partly rewind/undo some
of the imm-sync optimizations that were done in 4.5.

But the first thing to test (as part of the on-going review) should be to
redo the test that generated this ticket.

Is it still "too easy" to get repeated IMMND restarts in sync?

---

**[tickets:#1036] mds: IMMND restarts because of out of order messages**

**Status:** review
**Milestone:** 4.5.0
**Created:** Tue Sep 02, 2014 01:06 PM UTC by Neelakanta Reddy
**Last Updated:** Wed Oct 01, 2014 01:32 PM UTC
**Owner:** A V Mahesh (AVM)

Sep  2 05:16:57 SLES-SLOT-2 osafimmnd[21492]: WA MESSAGE:81414 OUT OF ORDER my highest processed:81412, exiting

Recreation steps:

1. The problem is reproduced when 100 switchovers are done.

2. Immediately after a failover is done, the out of order message is
observed.

3. Because of the out of order message, the new active IMMND restarted.

From there on:
Sep  2 05:17:04 SLES-SLOT-2 osafimmnd[10230]: WA Sync MESSAGE:81639 OUT OF ORDER my highest processed:81637
Sep  2 05:17:09 SLES-SLOT-2 osafimmnd[10254]: WA Sync MESSAGE:81893 OUT OF ORDER my highest processed:81891
Sep  2 05:17:16 SLES-SLOT-2 osafimmnd[10275]: WA Sync MESSAGE:82114 OUT OF ORDER my highest processed:82112
Sep  2 05:17:20 SLES-SLOT-2 osafimmnd[10295]: WA Sync MESSAGE:82335 OUT OF ORDER my highest processed:82333

4. Because of constant IMMND restarts at the time of sync, CLM got TRY_AGAIN
and the node went for reboot:
Sep  2 05:17:16 SLES-SLOT-2 osafimmd[10478]: NO Node 2020f request sync sync-pid:10295 epoch:0
Sep  2 05:17:17 SLES-SLOT-2 osafclmd[10521]: ER saImmOiInitialize_2 failed 6, exiting
Sep  2 05:17:17 SLES-SLOT-2 osafamfnd[10550]: NO 'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Sep  2 05:17:17 SLES-SLOT-2 osafamfnd[10550]: ER safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Sep  2 05:17:17 SLES-SLOT-2 osafamfnd[10550]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60



