I didn't intend to suggest that OpenSAF should attempt to alter either of them, 
especially when there is already support for configuring importance at build 
time.

My point was to consider whether, even after tuning all the attributes exposed 
by TIPC for congestion control, we would still hit the EAGAIN case.
That said, it is interesting to note how TIPC will evolve in this area.
Either way, there would still be some tunable attributes for congestion control.
Still, if we assume that we would hit EAGAIN, then I am not sure how returning 
TRY AGAIN would help an AVA that is busy (*time*) reporting healthchecks to 
AMF, because the outcome might depend on the *time* TIPC/the system takes to 
recover and become ready to transmit (upon retry) the subsequent/backlogged 
message.
In terms of criticality, yes, having a retry loop in AMFD and AMFND could help 
(to start with).


---

**[tickets:#641] MDS resend**

**Status:** unassigned
**Created:** Thu Nov 28, 2013 09:40 AM UTC by Hans Feldt
**Last Updated:** Tue Dec 03, 2013 06:49 AM UTC
**Owner:** nobody

Occasionally we see TIPC link congestion at the sending node. This can be seen 
with "tipc-config -ls" as in:

Link <1.1.1:eth0-1.1.2:eth0>
  ACTIVE  MTU:1500  Priority:10  Tolerance:1500 ms  Window:100 packets
  RX packets:1877 fragments:0/0 bundles:0/0
  TX packets:17511 fragments:0/0 bundles:0/0
  TX profile sample:489 packets  average:151 octets
  0-64:0% -256:97% -1024:0% -4096:3% -16354:0% -32768:0% -66000:0%
  RX states:12974 probes:5918 naks:0 defs:0 dups:0
  TX states:12148 probes:6230 naks:0 acks:0 dups:0
  Congestion bearer:0 link:12  Send queue max:81 avg:0

Above is from testing with tipc-pipe. With the default link window at 50 I get 
one send EAGAIN when sending 100 msgs. When I increase link window to 100 I can 
burst 100 msgs without link congestion. If I then burst 1000 msgs I see lots of 
link congestion.


If we transfer this to OpenSAF, it means a failed MDS send. MDS does not do any 
resends, and neither does any service (IMM's FEVS not considered).

AMFd, for example, could lose a message to a node director, which is very bad.

What I suggest is that messages should be resent, typically a loop with 3 
retries with a 100ms sleep in between.
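
As a rough illustration of the loop suggested above, here is a minimal sketch in C. It is not MDS code: `send_fn`, `send_with_retry`, `sleep_ms` and `flaky_send` are hypothetical names made up for this example, and it assumes the underlying send returns -1 with `errno` set to `EAGAIN` on link congestion.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define SEND_MAX_RETRIES 3      /* retries after the first attempt */
#define SEND_RETRY_DELAY_MS 100 /* sleep between attempts */

/* Hypothetical send callback: 0 on success, -1 with errno set on failure. */
typedef int (*send_fn)(void *ctx, const void *buf, size_t len);

/* Sleep for ms milliseconds, resuming if interrupted by a signal. */
static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Retry only on EAGAIN; any other error is returned immediately. */
int send_with_retry(send_fn send, void *ctx, const void *buf, size_t len)
{
    for (int attempt = 0; ; attempt++) {
        if (send(ctx, buf, len) == 0)
            return 0;
        if (errno != EAGAIN || attempt == SEND_MAX_RETRIES)
            return -1; /* errno still holds the error from the last send */
        sleep_ms(SEND_RETRY_DELAY_MS);
    }
}

/* Stand-in sender for illustration: fails with EAGAIN until the counter
 * pointed to by ctx reaches zero, then succeeds. */
int flaky_send(void *ctx, const void *buf, size_t len)
{
    int *failures_left = ctx;
    (void)buf;
    (void)len;
    if (*failures_left > 0) {
        (*failures_left)--;
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```

The sketch reads "3 retries" as one initial attempt plus up to three more, so a caller blocks for at most about 300 ms before seeing the failure. Whether such a wrapper belongs inside MDS or in each service is exactly the open question listed below.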

There are some concerns, like:
* Can we put this in MDS or should it be done per service?
* What about MDS/TCP, same problem there?

