On Mon, 17 Dec 2007, David G Lawrence wrote:

>> While trying to diagnose a packet loss problem in a RELENG_6 snapshot
>> dated November 8, 2007, it looks like I've stumbled across a broken
>> driver or kernel routine which stops interrupt processing long enough
>> to severely degrade network performance every 30.99 seconds.

I see the same behaviour under a heavily modified version of FreeBSD-5.2
(except the period was 2 ms longer and the latency was 7 ms instead
of 11 ms when numvnodes was at a certain value).  Now with numvnodes =
17500, the latency is 3 ms.
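
(For anyone following along, numvnodes is the kernel's count of
allocated vnodes; from userland it is visible as the vfs.numvnodes
sysctl, e.g. "sysctl vfs.numvnodes", or programmatically as in the
sketch below.  The long type is an assumption that matches the
versions I have at hand.)

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

/*
 * Sketch: read vfs.numvnodes via sysctlbyname().  Assumes the counter
 * is exported as a long, which may differ between versions.
 */
int
main(void)
{
	long numvnodes;
	size_t len = sizeof(numvnodes);

	if (sysctlbyname("vfs.numvnodes", &numvnodes, &len, NULL, 0) == -1) {
		perror("sysctlbyname");
		return (1);
	}
	printf("vfs.numvnodes: %ld\n", numvnodes);
	return (0);
}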

>   I noticed this as well some time ago. The problem has to do with the
> processing (syncing) of vnodes. When the total number of allocated vnodes
> in the system grows to tens of thousands, the ~31 second periodic sync
> process takes a long time to run. Try this patch and let people know if
> it helps your problem. It will periodically wait for one tick (1ms) every
> 500 vnodes of processing, which will allow other things to run.
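
(I haven't reproduced the actual patch here, but from that description
the throttling presumably looks something like the following sketch.
The function name, the list walking and the lack of locking are mine,
not from the patch.)

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/proc.h>
#include <sys/queue.h>
#include <sys/mount.h>
#include <sys/vnode.h>

/*
 * Sketch only: while walking a mount's vnode list during the periodic
 * sync, sleep for one tick every 500 vnodes so that interrupt threads
 * and other processes get a chance to run.  Locking is omitted.
 */
static void
sync_mount_vnodes(struct mount *mp)
{
	struct vnode *vp;
	int processed = 0;

	TAILQ_FOREACH(vp, &mp->mnt_nvnodelist, v_nmntvnodes) {
		/* ... flush vp's dirty buffers here ... */

		if (++processed % 500 == 0) {
			/* Give up the CPU for one tick (1 ms at hz = 1000). */
			tsleep(&processed, PPAUSE, "syncpause", 1);
		}
	}
}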

However, the syncer should be running at a relatively low priority and not
cause packet loss.  I don't see any packet loss even in ~5.2 where the
network stack (but not drivers) is still Giant-locked.

Other too-high latencies showed up:
- syscons LED setting and vt switching give a latency of 5.5 msec because
  syscons still uses busy-waiting for setting LEDs :-(.  Oops, I do see
  packet loss -- this causes it under ~5.2 but not under -current.  For
  the bge and/or em drivers, the packet loss shows up in netstat output
  as a few hundred errors for every LED setting on the receiving machine,
  while receiving tiny packets at the maximum possible rate of 640 kpps.
  sysctl is completely Giant-locked and so are upper layers of the
  network stack.  The bge hardware rx ring size is 256 in -current and
  512 in ~5.2.  At 640 kpps, 512 packets take 800 us (the arithmetic is
  spelled out after this list), so bge wants to call the upper layers
  with a latency of far below 800 us.  I don't know exactly where the
  upper layers block on Giant.
- a user CPU hog process gives a latency of over 200 ms every half a
  second or so when the hog starts up, and 300-400 ms after the
  hog has been running for some time.  Two user CPU hog processes
  double the latency.  Reducing kern.sched.quantum from 100 ms to 10
  ms and/or renicing the hogs doesn't seem to affect this.  Running the
  hogs at idle priority fixes this.  This won't affect packet loss,
  but it might affect user network processes -- they might need to
  run at real time priority to get low enough latency (see the
  priority-setting sketch after this list).  They might need to do
  this anyway -- a scheduling quantum of 100 ms should give a
  latency of 100 ms per CPU hog quite often, though not usually, since
  the hogs should never be preferred to a higher-priority process.
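
To make the 800 us figure in the first item explicit, here is the
back-of-envelope calculation as a trivial program (the numbers are
just the ones quoted above: a 512-entry rx ring filling at 640 kpps):

#include <stdio.h>

int
main(void)
{
	double pps = 640000.0;	/* tiny packets at 640 kpps */
	double ring = 512.0;	/* bge rx ring size under ~5.2 */

	/* 512 / 640000 s = 0.0008 s = 800 us to fill the whole ring. */
	printf("ring fill time: %.0f us\n", ring / pps * 1e6);
	return (0);
}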
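The priority games in the second item can be done with idprio(1) and
rtprio(1) from the shell ("idprio 31 hog", "rtprio 0 netproc"), or from
inside the program with rtprio(2).  A minimal sketch; the choice of
priority 0 and the error handling are mine:

#include <sys/types.h>
#include <sys/rtprio.h>
#include <err.h>

/*
 * Sketch: put the calling process into the realtime scheduling class
 * so that CPU hogs at normal (or idle) priority cannot delay it.
 * Use RTP_PRIO_IDLE instead to demote a hog.  Setting realtime
 * priority requires root.
 */
static void
become_realtime(void)
{
	struct rtprio rtp;

	rtp.type = RTP_PRIO_REALTIME;
	rtp.prio = 0;				/* highest realtime priority */
	if (rtprio(RTP_SET, 0, &rtp) == -1)	/* pid 0 = this process */
		err(1, "rtprio");
}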

Previously I've used a less specialized clock-watching program to
determine the syscall latency.  It showed similar problems for CPU
hogs.  I just remembered that I found the fix for these under ~5.2 --
remove a local hack that sacrifices latency for reduced context
switches between user threads.  -current with SCHED_4BSD does this
non-hackishly, but seems to have a bug somewhere that gives a latency
that is large enough to be noticeable in interactive programs.
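
For anyone who wants to reproduce the latency numbers, the
clock-watching method is nothing fancy.  The following is a sketch
along the lines of what I use (not the actual program): spin reading
the clock and report any gap between consecutive readings larger than
a threshold, i.e. any interval in which the process was not run.

#include <stdio.h>
#include <time.h>

int
main(void)
{
	struct timespec prev, now;
	double gap;

	clock_gettime(CLOCK_MONOTONIC, &prev);
	for (;;) {
		clock_gettime(CLOCK_MONOTONIC, &now);
		/* Gap between consecutive readings, in microseconds. */
		gap = (now.tv_sec - prev.tv_sec) * 1e6 +
		    (now.tv_nsec - prev.tv_nsec) / 1e3;
		if (gap > 1000.0)	/* report gaps longer than 1 ms */
			printf("gap of %.0f us\n", gap);
		prev = now;
	}
}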

Bruce