Have you tried fixed speed/duplex?

alexpalias-bsd...@yahoo.com wrote:
Good day

I'm running a FreeBSD 7.2 router and I am seeing a lot of input errors on one 
of the em interfaces (em0), coupled with (at approximately the same times) much 
fewer errors on em1 and em2.  Monitoring is done with SNMP from another 
machine, and the CPU load as reported via SNMP is mostly below 30%, with a 
couple of spikes up to 35%.

Software description:

- FreeBSD 7.2-RELEASE-p2, amd64
- bsnmpd with modules: hostres and (from ports) snmp_ucd
- quagga 0.99.12 (running only zebra and bgpd)
- netgraph (ng_ether and ng_netflow)

Hardware description:

- Dell machine, dual Xeon 3.20 GHz, 4 GB RAM
- 2 x built-in gigabit interfaces (em0, em1)
- 1 x dual-port gigabit interface, PCI-X (em2, em3) [see pciconf near the end]


The machine receives the global routing table ("netstat -nr | wc -l" gives 
289115 currently).

All of the em interfaces are just configured "up", with various vlan interfaces on them.  Note that 
I use "kpps" to mean "thousands of packets per second", sorry if that's the wrong 
shorthand.

- em0 sees traffic of 10...22 kpps in, and 15...35 kpps out.  In bits, it's 
30...120Mbits/s in, and 100...210Mbits/s out.  Vlans configured are vlan100 and 
vlan200, and most of the traffic is on vlan100 (vlan200 sees 4kpps in / 0.5kpps 
out maximum, with the average at about one third of this).  em0 is the external 
interface, and its traffic corresponds to the sum of traffic through em1 and em2.

- em1 has 5 vlans, and sees about 22kpps in / 11kpps out (maximum)

- em2 has a single VLAN, and sees about 4...13kpps both in and out (almost 
equal in/out during most of the day)

- em3 is a backup interface, with 2 VLANS, and is the only one which has seen 
no errors.

Only the vlans on em0 are analyzed by ng_netflow, and the errors I'm seeing 
have started appearing days before netgraph was even loaded in the kernel.

Tuning done:

/boot/loader.conf:
hw.em.rxd=4096
hw.em.txd=4096

Without the above we were seeing far more errors; now they are reduced, but 
still come in bursts of over 1000 errors on em0.

/etc/sysctl.conf:
net.inet.ip.fastforwarding=1
dev.em.0.rx_processing_limit=300
dev.em.1.rx_processing_limit=300
dev.em.2.rx_processing_limit=300
dev.em.3.rx_processing_limit=300

Still seeing errors, so after some searching of the mailing lists we also added:

# the four lines below are repeated for em1, em2, em3
dev.em.0.rx_int_delay=0
dev.em.0.rx_abs_int_delay=0
dev.em.0.tx_int_delay=0
dev.em.0.tx_abs_int_delay=0

Still getting errors, so I also added:

net.inet.ip.intr_queue_maxlen=4096
net.route.netisr_maxqlen=1024

and

kern.ipc.nmbclusters=655360


Also tried with rx_processing_limit set to -1 on all em interfaces, still 
getting errors.

Looking at the shape of the error and packet graphs, there seems to be a correlation 
between the number of packets per second on em0 and the height of the error 
"spikes" on the error graph.  These spikes are spread throughout the day, 
separated by error-free gaps of various lengths (from 10 minutes to 2 hours 
within the last 24 hours), but sometimes there are errors even at the 
lowest-kpps times of the day.

em0 and em1 error times are correlated, with all errors on the graph for em0 
having a smaller corresponding error spike on em1 at the same time, and 
sometimes an error spike on em2.

The old router was seeing about the same traffic, and had em0, em1, re0 and re1 
network cards, and was only seeing errors on the em cards.  It was running 
7.2-PRERELEASE/i386.


Any suggestions would be greatly appreciated.  Please note that this is a live 
router, and I can't reboot it (unless absolutely necessary).  Tuning that can 
be applied without rebooting will be tried first.

Here are some more details:

Trimmed output of netstat -ni:
Name    Mtu Network       Address                 Ipkts    Ierrs       Opkts Oerrs  Coll
em0    1500 <Link#1>      00:14:22:xx:xx:xx 19744458839 15494721 24284439443     0     0
em1    1500 <Link#2>      00:14:22:xx:xx:xx 12832245469   123181 10105031790     0     0
em2    1500 <Link#3>      00:04:23:xx:xx:xx 12082552403    10964 10339416865     0     0
em3    1500 <Link#4>      00:04:23:xx:xx:xx    79912337        0    48178737     0     0
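
To put these counters in proportion, the input error ratio per interface can be worked out from the Ipkts and Ierrs columns. A minimal Python sketch, with the numbers copied from the netstat -ni output above; the dictionary and function names are only illustrative, and the ratio is simply Ierrs divided by Ipkts:

```python
# (Ipkts, Ierrs) per interface, copied from the netstat -ni output above.
counters = {
    "em0": (19744458839, 15494721),
    "em1": (12832245469, 123181),
    "em2": (12082552403, 10964),
    "em3": (79912337, 0),
}


def input_error_ratio(ipkts, ierrs):
    """Return Ierrs / Ipkts, or 0.0 when no packets were counted."""
    return ierrs / ipkts if ipkts else 0.0


for name, (ipkts, ierrs) in counters.items():
    print(f"{name}: {input_error_ratio(ipkts, ierrs):.6%}")
```

Averaged over the whole uptime, em0 comes out a little under 0.08%. Since the errors arrive in bursts, the ratio during a burst will be higher than this average.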

Relevant part of pciconf -vl:

e...@pci0:6:7:0: class=0x020000 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82541EI Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
e...@pci0:7:8:0: class=0x020000 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82541EI Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
e...@pci0:9:4:0: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
    class      = network
    subclass   = ethernet
e...@pci0:9:4:1: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
    class      = network
    subclass   = ethernet

Kernel messages after sysctl dev.em.0.stats=1:
(note that I've removed the lines which only showed zeros in the second and 
third outputs)

em0: Excessive collisions = 0
em0: Sequence errors = 0
em0: Defer count = 0
em0: Missed Packets = 15435312
em0: Receive No Buffers = 16446113
em0: Receive Length Errors = 0
em0: Receive errors = 1
em0: Crc errors = 2
em0: Alignment errors = 0
em0: Collision/Carrier extension errors = 0
em0: RX overruns = 96826
em0: watchdog timeouts = 0
em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
em0: XON Rcvd = 0
em0: XON Xmtd = 0
em0: XOFF Rcvd = 0
em0: XOFF Xmtd = 0
em0: Good Packets Rcvd = 19002068797
em0: Good Packets Xmtd = 23168462599
em0: TSO Contexts Xmtd = 0
em0: TSO Contexts Failed = 0

[later]
em0: Excessive collisions = 0
em0: Missed Packets = 15459111
em0: Receive No Buffers = 16447082
em0: Receive errors = 1
em0: Crc errors = 2
em0: RX overruns = 96835
em0: Good Packets Rcvd = 19165047284
em0: Good Packets Xmtd = 23386976960

[later]
em0: Excessive collisions = 0
em0: Missed Packets = 15470583
em0: Receive No Buffers = 16447686
em0: Receive errors = 1
em0: Crc errors = 2
em0: RX overruns = 96840
em0: Good Packets Rcvd = 19255466068
em0: Good Packets Xmtd = 23519004546
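
Because these counters are cumulative, the differences between the three dumps say more than the totals. A small Python sketch of that arithmetic, assuming the dumps were taken in the order shown. The time between them is not stated, so only per-interval counts and ratios are computed, and all names are illustrative:

```python
# Cumulative em0 counters from the three stats dumps above, in order.
snapshots = [
    {"missed": 15435312, "no_buffers": 16446113,
     "rx_overruns": 96826, "good_rcvd": 19002068797},
    {"missed": 15459111, "no_buffers": 16447082,
     "rx_overruns": 96835, "good_rcvd": 19165047284},
    {"missed": 15470583, "no_buffers": 16447686,
     "rx_overruns": 96840, "good_rcvd": 19255466068},
]


def deltas(older, newer):
    """Per-counter increase between two cumulative snapshots."""
    return {key: newer[key] - older[key] for key in older}


# One delta dictionary per pair of consecutive snapshots.
intervals = [deltas(a, b) for a, b in zip(snapshots, snapshots[1:])]

for d in intervals:
    ratio = d["missed"] / d["good_rcvd"]
    print(d, f"missed per good packet received: {ratio:.6f}")
```

Missed Packets grows by 23799 and then by 11472 across the two intervals, while Receive errors and Crc errors stay at 1 and 2. That suggests packets are being dropped for lack of receive resources rather than damaged on the wire.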


Thank you for your time.

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


--

Best regards.
Hooman Fazaeli



