In my previous message I described my attempts to try vr(4) from HEAD.
Now I'd like to explain why I tried it.

The problem is that the stock vr(4) in 8.3-STABLE/i386 has serious issues on
my system.
I have a home router with two vr interfaces: vr0 is for the LAN (IPoE) and
vr1 is for the WAN (PPPoE/mpd).

At present, my WAN vr interface stops working correctly every day.
Sometimes it stops receiving packets altogether: tcpdump shows none.
Sometimes it receives some packets, but with a great delay of up to
10 seconds (not milliseconds) or even more; tcpdump shows that the delay
occurs on the receive path.
Sometimes it even reorders packets: tcpdump shows some incoming ICMP echo
requests with lower sequence numbers arriving later than higher-numbered
requests that have already been answered.
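
To make such reordering easier to spot in a long capture, a filter along
these lines could be used. It is only a sketch: it assumes tcpdump's usual
one-line text format ("... ICMP echo request, id N, seq N, length N"), the
function name find_reordered is made up for illustration, and it ignores
sequence number wraparound.

```shell
#!/bin/sh
# find_reordered (hypothetical helper): read tcpdump text output on stdin
# and report every ICMP echo request whose sequence number is lower than
# the highest one already seen. Sequence wraparound is not handled.
find_reordered() {
    awk '
        /ICMP echo request/ {
            for (i = 1; i < NF; i++) {
                if ($i == "seq") {
                    seq = $(i + 1) + 0      # "3," converts to 3
                    if (seen && seq < max)
                        print "late seq " seq " after " max
                    if (!seen || seq > max)
                        max = seq
                    seen = 1
                }
            }
        }
    '
}
```

It could be fed from a live capture, for example
"tcpdump -n -l -i vr1 icmp | find_reordered", with the interface name
adjusted to wherever the echo requests are visible.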

ifconfig vr1 down/up revives the interface completely until the next morning.
sysctl net.inet.ip.fw.enable=0 does not solve the problem.
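
Until the cause is found, the manual revival could be scripted so that the
time of each stall gets logged for later comparison with CPU load. This is a
minimal sketch under stated assumptions: 192.0.2.1 is a documentation
placeholder for some WAN-side address, and the function name
check_and_bounce and the PING/IFCONFIG override variables are invented here
to allow a dry run.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; nothing runs until check_and_bounce is called.
IFACE=${IFACE:-vr1}            # interface to revive
PEER=${PEER:-192.0.2.1}        # placeholder address, replace with a real peer
PING=${PING:-ping}             # overridable for dry runs
IFCONFIG=${IFCONFIG:-ifconfig} # overridable for dry runs

check_and_bounce() {
    # Three echo requests; any reply counts as "interface alive".
    if $PING -c 3 "$PEER" >/dev/null 2>&1; then
        echo "ok"
        return 0
    fi
    # Log the moment of the stall, then do the same down/up as by hand.
    echo "stalled at $(date '+%Y-%m-%d %H:%M:%S'), bouncing $IFACE"
    $IFCONFIG "$IFACE" down
    $IFCONFIG "$IFACE" up
}
```

Run from cron every few minutes as "check_and_bounce >> /var/log/vr-watchdog.log"
(the log path is also just an example), it would leave a record of when each
stall was detected.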

I control the WAN switching/routing network and can confirm that it runs
just fine.
However, I can't guarantee that it has no "soft" anomalies such as short
storms or stray broadcasts.

I've tried to reproduce the problem artificially with an incoming UDP flood
generated by ng_source(4) at a 100M rate for 60 seconds, and failed.

I've tried moving the WAN from vr1 to vr0, and the problem moved to vr0 too.
My LAN has very little traffic, and the corresponding vr interface exhibits
no problems.

This router also routinely runs transmission (a torrent client from ports)
serving torrents from a USB-attached HDD, which puts a severe load on the
CPU, so I suspect the problem may be related to CPU load.

I've also checked mbuf and mbuf cluster usage, and both look fine:

# netstat -m
1539/2076/3615 mbufs in use (current/cache/total)
1200/1278/2478/65536 mbuf clusters in use (current/cache/total/max)
1200/306 mbuf+clusters out of packet secondary zone in use (current/cache)
318/181/499/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
4056K/3799K/7855K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/4/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

# vmstat -z | egrep -i 'ITEM|mbuf'
ITEM                     SIZE     LIMIT      USED      FREE  REQUESTS  FAILURES
mbuf_packet:              256,        0,     1429,       77, 112854470,        0
mbuf:                     256,        0,      489,     1620, 369073316,        0
mbuf_cluster:            2048,    65536,     1506,      604,  5401864,        0
mbuf_jumbo_page:         4096,    12800,      469,      158,  8306777,        0
mbuf_jumbo_9k:           9216,     6400,        0,        0,        0,        0
mbuf_jumbo_16k:         16384,     3200,        0,        0,        0,        0
mbuf_ext_refcnt:            4,        0,        0,        0,        0,        0
NetGraph items:            36,     4130,        1,      117,   263123,        0
NetGraph data items:       36,      531,        0,      295, 106663377,        0
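
To keep an eye on these counters over time instead of reading them by hand,
the FAILURES column could be filtered automatically. A small sketch,
assuming the seven-column vmstat -z layout shown above (newer releases may
print more columns); zone_failures is a made-up name.

```shell
#!/bin/sh
# zone_failures (hypothetical helper): read "vmstat -z" output on stdin and
# print only the zones whose FAILURES counter (7th field once the line is
# split on ':' and ',') is non-zero. The header line has no separators,
# so it never reaches 7 fields and is skipped.
zone_failures() {
    awk -F'[:,]' 'NF >= 7 && $7 + 0 > 0 {
        print $1 ": " ($7 + 0) " failures"
    }'
}
```

For example, "vmstat -z | zone_failures" prints nothing for the output
above, since every FAILURES value there is 0.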

While ifconfig vr1 down/up solves the problem completely (for a long time),
taking the link down and up at the switch solves it only "in half": the huge
packet delays disappear and turn into 25% packet loss occurring at regular
short intervals, about once a second.

ifconfig down/up clears this mess too.

Please help me debug this; it's pretty annoying.
I had hoped the new vr(4) driver would help, but it takes my system down
under average load and is unusable.

Here is the start of dmesg.boot:

Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.3-STABLE #1: Wed Aug 29 22:49:45 NOVT 2012
    r...@grosbein.pp.ru:/usr/local/obj/nanobsd.gw/i386/usr/local/src/sys/GW i386
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Geode(TM) Integrated Processor by AMD PCS (499.91-MHz 586-class CPU)
  Origin = "AuthenticAMD"  Id = 0x5a2  Family = 5  Model = a  Stepping = 2
  Features=0x88a93d<FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CLFLUSH,MMX>
  AMD Features=0xc0400000<MMX+,3DNow!+,3DNow!>
real memory  = 1065025536 (1015 MB)
avail memory = 1032929280 (985 MB)
K6-family MTRR support enabled (2 registers)

I must also note that this system runs with ACPI disabled in /boot/loader.conf:
hint.acpi.0.disabled=1

Otherwise, its timekeeping becomes broken.

Eugene Grosbein
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net