Re: network bandwidth with em(4)
On Tue, 22 Feb 2011 18:09:32 +0100, Patrick Lamaiziere patf...@davenulle.org wrote:

> (4.8/amd64) I'm using two Intel PRO/1000 quad-port (gigabit) Ethernet
> cards on a firewall (one fiber, one copper). The problem is that we don't
> get more than ~320 Mbit/s of bandwidth between the internal networks and
> the internet (gigabit). As far as I can see, under load there is a number
> of Ierrs on the interface connected to the internet (between 1% and 5%).
>
> -- dmesg (on 4.8):
> em0 at pci5 dev 0 function 0 Intel PRO/1000 QP (82571EB) rev 0x06: apic 1 int 13 (irq 14), address 00:15:17:ed:98:9d
> em4 at pci9 dev 0 function 0 Intel PRO/1000 QP (82575GB) rev 0x02: apic 1 int 23 (irq 11), address 00:1b:21:38:e0:80

Hello,

This issue (Ierrs on em) looks to be fixed in 5.0. With 4.8 and 4.9 there
were Ierrs with traffic above 150 Mbit/s. With 5.0 there are only a few
Ierrs from time to time, even under high load (above 400 Mbit/s, 40K
packets/s in, 30K packets/s out).

I guess the fixes to em(4) help. Maybe the use of MSI interrupts does too,
because I see a significant improvement in CPU interrupt load (from around
60% under load down to 50% with 5.0). (The measurements are averaged over
5 minutes.) That's cool!

There are still some PF congestion events from time to time, but I have to
investigate. They happen even when the box is idle, but maybe there are
bursts of traffic. The box has 6 interfaces and I don't believe it can
handle 6 Gbit/s at once.

To finish this too-long thread: since February we (a university) are very
happy with the reliability of our two PF firewalls; they just work.

Thanks a lot, regards.
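Patrick's 1-5% Ierr figure comes straight from the interface counters
(netstat -i shows Ipkts and Ierrs per interface). For what it's worth, the
arithmetic can be sketched as a small shell helper; the counter values
below are invented for illustration, not taken from his firewall:

```shell
#!/bin/sh
# Estimate the input-error rate of an interface from two samples of
# netstat -i style counters (Ipkts and Ierrs).  All numbers below are
# made-up sample values, not from the original poster's machine.

# err_rate BEFORE_IPKTS BEFORE_IERRS AFTER_IPKTS AFTER_IERRS
# prints the percentage of input packets in the interval that were errors
err_rate() {
    awk -v ip0="$1" -v ie0="$2" -v ip1="$3" -v ie1="$4" 'BEGIN {
        pkts = ip1 - ip0; errs = ie1 - ie0
        if (pkts <= 0) { print "n/a"; exit }
        printf "%.1f%%\n", 100 * errs / (pkts + errs)
    }'
}

# Example: 2.4M good packets plus 48K new Ierrs over the interval
err_rate 1000000 5000 3400000 53000
```

On a live box the two samples would come from `netstat -i` taken a known
interval apart; anything persistently in the 1-5% range matches the
behaviour described above.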
Re: network bandwidth with em(4)
On 2011-02-28, Manuel Guesdon ml+openbsd.m...@oxymium.net wrote:

> OK. Anyway the NIC buffers restrict the number of buffered packets. But
> the problem remains: why can't a (for example) dual Xeon E5520 @ 2.27GHz
> with Intel PRO/1000 (82576) route 150 kpps without Ierrs? :-)
> http://www.oxymium.net/tmp/core3-dmesg

Looking at this dmesg, you have ppb and em sharing interrupts; it wouldn't
be a total surprise if the "Performance degradation after upgrade" thread
was relevant: http://comments.gmane.org/gmane.os.openbsd.misc/184121
Re: network bandwidth with em(4)
2011/3/23 Kapetanakis Giannis bil...@edu.physics.uoc.gr:

> I'm testing a 2-port 82571EB myself on a new fw. How are you doing the
> pps test?

I'm actually reporting the values found on the first systat page. I have a
suspicion these counters act weird on cloning interfaces (I saw IPKTS
being twice as much as OPKTS on a router without much locally
originated/consumed traffic, with fifty carps and vlans on one side and
bgp on the other), but in all of these tests the values were more or less
the same - around 200k each. The bandwidth was distributed 113 MB/s
inbound and 70 MB/s outbound (depending on the direction, of course), and
I watched it in systat ifs.

2011/3/23 Theo de Raadt dera...@cvs.openbsd.org:

> -current kernels contain an option called POOL_DEBUG which has a pretty
> high impact on network traffic. Unfortunately POOL_DEBUG is useful..

Thank you! I've only played with DEBUG once, but after failing to explain
some of the behaviour I consider myself not educated enough to play with
kernel options... Unfortunately I probably won't be able to repeat the
tests for some time now, as the machine is already in production.

--
Martin Pelikan
Re: network bandwidth with em(4)
Hi,

we just bought a new firewall, so I did some tests. It has 2 integrated
i82574Ls and we use a 2-port i82571EB. I tested routing through this box
with a simple "match out on em1 nat-to (em1)" rule, using 4.8-stable and
tcpbench on all five end computers, and here's what I got:

- maximum throughput 183 MB/s total, according to systat ifs; almost
  exactly 200 kpps in each direction.
- the difference between amd64-SP and amd64-MP is insignificant (a few
  percent of CPU load, SP better)
- the difference between amd64-SP and i386-SP is noticeable (the
  throughput stays the same, the load decreases a bit more; i386 better)
- I couldn't boot the i386-MP -stable version; the system kept rebooting
  after the fs checks...
- the difference between the 82574L and the 82571EB is quite big (the 574L
  at 183 MB/s on i386-SP had a CPU load of about 70-80% (intr), whereas
  the 571EB performed the same with about 45-55% interrupt CPU load!)
- tuning the ITR or the number of Tx/Rx descriptors per card is useless
  (at least here; different kinds of traffic might behave differently) -
  even if you gain a few megabits, you are still risking latency problems
  (probably system usability?)
- at the end of the day I tried 4.9-current amd64 from 18th March and it
  actually performed worse - around 175 MB/s max and 70% CPU with the
  571EBs.
- it's a brilliant motherboard, compared to our other 6 Intels

Is there anything I should have tested or mentioned and didn't? Still,
hope this helps someone... dmesg below:

OpenBSD 4.8-stable (GENERIC.MP) #0: Tue Mar 22 17:42:14 CET 2011 peli...@koza.steadynet.cz:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 2137653248 (2038MB) avail mem = 2066927616 (1971MB) mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.6 @ 0x9f000 (68 entries) bios0: vendor American Megatrends Inc.
version 1.1 date 05/27/2010 bios0: Supermicro X8SIL acpi0 at bios0: rev 2 acpi0: sleep states S0 S1 S4 S5 acpi0: tables DSDT FACP APIC MCFG OEMB HPET GSCI SSDT acpi0: wakeup devices P0P1(S4) P0P3(S4) P0P4(S4) P0P5(S4) P0P6(S4) BR1E(S4) PS2K(S4) PS2M(S4) USB0(S4) USB1(S4) USB2(S4) USB3(S4) USB4(S4) USB5(S4) USB6(S4) GBE_(S4) BR20(S4 ) BR21(S4) BR22(S4) BR23(S4) BR24(S4) BR25(S4) BR26(S4) BR27(S4) EUSB(S4) USBE(S4) SLPB(S4) acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM) i3 CPU 540 @ 3.07GHz, 3067.11 MHz cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PD CM,SSE4.1,SSE4.2,POPCNT,NXE,LONG cpu0: 256KB 64b/line 8-way L2 cache cpu0: apic clock running at 133MHz cpu1 at mainbus0: apid 4 (application processor) cpu1: Intel(R) Core(TM) i3 CPU 540 @ 3.07GHz, 3066.67 MHz cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PD CM,SSE4.1,SSE4.2,POPCNT,NXE,LONG cpu1: 256KB 64b/line 8-way L2 cache ioapic0 at mainbus0: apid 5 pa 0xfec0, version 20, 24 pins ioapic0: misconfigured as apic 1, remapped to apid 5 acpihpet0 at acpi0: 14318179 Hz acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus -1 (P0P1) acpiprt2 at acpi0: bus -1 (P0P3) acpiprt3 at acpi0: bus -1 (P0P5) acpiprt4 at acpi0: bus -1 (P0P6) acpiprt5 at acpi0: bus 4 (BR1E) acpiprt6 at acpi0: bus 1 (BR20) acpiprt7 at acpi0: bus 2 (BR24) acpiprt8 at acpi0: bus 3 (BR25) acpicpu0 at acpi0: C3, C2, C1, PSS acpicpu1 at acpi0: C3, C2, C1, PSS acpibtn0 at acpi0: SLPB acpibtn1 at acpi0: PWRB ipmi at mainbus0 not configured cpu0: Enhanced SpeedStep 3066 MHz: speeds: 3067, 2933, 2800, 2667, 2533, 2400, 2267, 2133, 2000, 1867, 1733, 1600, 1467, 1333, 1200 MHz pci0 at mainbus0 bus 0 pchb0 at pci0 
dev 0 function 0 vendor Intel, unknown product 0x0048 rev 0x18 ppb0 at pci0 dev 28 function 0 Intel 3400 PCIE rev 0x05: apic 5 int 17 (irq 10) pci1 at ppb0 bus 1 em0 at pci1 dev 0 function 0 Intel PRO/1000 PT (82571EB) rev 0x06: apic 5 int 16 (irq 11), address 00:1b:21:82:67:0a em1 at pci1 dev 0 function 1 Intel PRO/1000 PT (82571EB) rev 0x06: apic 5 int 17 (irq 10), address 00:1b:21:82:67:0b ppb1 at pci0 dev 28 function 4 Intel 3400 PCIE rev 0x05: apic 5 int 17 (irq 10) pci2 at ppb1 bus 2 em2 at pci2 dev 0 function 0 Intel PRO/1000 MT (82574L) rev 0x00: apic 5 int 16 (irq 11), address 00:25:90:0e:77:7a ppb2 at pci0 dev 28 function 5 Intel 3400 PCIE rev 0x05: apic 5 int 16 (irq 11) pci3 at ppb2 bus 3 em3 at pci3 dev 0 function 0 Intel PRO/1000 MT (82574L) rev 0x00: apic 5 int 17 (irq 10), address 00:25:90:0e:77:7b ehci0 at pci0 dev 29 function 0 Intel 3400 USB rev 0x05: apic 5 int 23 (irq 15) usb0 at ehci0: USB revision 2.0 uhub0 at usb0 Intel EHCI root hub rev 2.00/1.00 addr 1 ppb3 at pci0 dev 30 function 0 Intel 82801BA Hub-to-PCI rev 0xa5 pci4 at ppb3 bus 4 vga1 at pci4 dev 3 function 0 Matrox MGA G200eW rev 0x0a wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation) wsdisplay0: screen 1-5 added
Re: network bandwidth with em(4)
> - at the end of the day I tried 4.9-current amd64 from 18th March and it
>   actually performed worse - around 175 MB/s max and 70% CPU with the
>   571EBs.

-current kernels contain an option called POOL_DEBUG which has a pretty
high impact on network traffic. Unfortunately POOL_DEBUG is useful..
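Theo's POOL_DEBUG remark explains the 4.9-current regression: a fair
benchmark of -current needs the option removed. A minimal sketch of how
one would do that, assuming the 2011-era layout where the option sits in
the shared GENERIC config and the usual custom-kernel build procedure:

```
# In /usr/src/sys/conf/GENERIC, comment out the debug option:
#option POOL_DEBUG        # pool corruption detection (costly on hot paths)

# Then rebuild and install the kernel in the usual way, e.g. on amd64:
#   cd /usr/src/sys/arch/amd64/conf && config GENERIC
#   cd ../compile/GENERIC && make clean && make && make install
```

Only do this for benchmarking; as Theo notes, the option catches real bugs.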
Re: network bandwidth with em(4)
On 23/03/11 16:59, Martin Pelikan wrote:

> Hi, we just bought a new firewall, so I did some tests. It has 2
> integrated i82574Ls and we use a 2-port i82571EB. I tested routing
> through this box with a simple "match out on em1 nat-to (em1)" rule,
> using 4.8-stable and tcpbench on all five end computers, and here's
> what I got:
> - maximum throughput 183 MB/s total, according to systat ifs; almost
>   exactly 200 kpps in each direction.

I'm testing a 2-port 82571EB myself on a new fw. How are you doing the
pps test?

Giannis
Re: network bandwidth with em(4)
Hello,

I have a couple of old ProLiants with bxp/em interfaces on 4.8-stable. If
you give me more info on what exactly to test and what output to send, I'd
gladly help.

BR,
Peter

On 13 Mar 2011 03:56, Ryan McBride mcbr...@openbsd.org wrote:

> On Sat, Mar 12, 2011 at 06:29:42PM -0800, Chris Cappuccio wrote:
> > > Are you suggesting that because you have a quad-port gig nic, your
> > > box should be able to do 6 *million* packets per second? By that
> > > logic my 5-port Soekris net4801 should be able to handle 740kpps.
> > > (for reference, the net4801 does about 3kpps with 4.9)
> >
> > are you sure? that seems low, the 4501 used to do 4kpps with openbsd
> > 3.3!
>
> Quite sure, though I certainly welcome someone else doing independent
> testing to prove me wrong. (FWIW: I tested 3.3 last month and got a
> maximum of 2400 pps before packet loss exceeded 1%.)
>
> The numbers above are for IP forwarding (not bridging), no PF, and TCP
> SYN packets with random ports, ISNs, and source addresses, but a fixed
> destination address. Measurements are taken on either side of the device
> using SNMP on the switch, and they match very closely what I'm seeing
> from the endpoints on either side of the firewall. The results are also
> stable across the more than 30,000 individual tests I've run to date
> against a variety of hardware and versions (automated, of course!)
>
> Note that if you measure on the box itself (i.e. the IPKTS/OPKTS) you
> will get lies when the system is livelocking. If you push harder you can
> get more packets through the Soekris, but it's meaningless, as most of
> the packets are being dropped and the box is completely livelocked.
Re: network bandwidth with em(4)
On 2011-03-12 01:26, Stuart Henderson wrote:

> On 2011-03-11, RLW seran...@o2.pl wrote:
> > Because some people have lately written to the group about network
> > bandwidth problems with em(4), I have run some tests myself.
>
> Most of the recent posts about this have been about packet forwarding
> performance; sourcing/sinking packets on the box itself is also
> interesting of course, but it's a totally separate measurement.
>
> > bandwidth test by IPERF and NETPERF:
> > iperf -c 10.0.0.X -t 60 -i 5
> > netperf -H 10.0.0.X -p 9192 -n1 -l 10
>
> Not sure about netperf, but from what I remember iperf isn't a great
> performer on OpenBSD.

I will happily run some network tests. Could someone suggest the best
programs, methods, etc. for the job? I have a bnx(4) dual-port PCIe x4
card, and em(4) integrated and PCIe x1 cards.

best regards,
RLW
Re: network bandwidth with em(4)
Ryan McBride [mcbr...@openbsd.org] wrote:

> Are you suggesting that because you have a quad-port gig nic, your box
> should be able to do 6 *million* packets per second? By that logic my
> 5-port Soekris net4801 should be able to handle 740kpps. (for reference,
> the net4801 does about 3kpps with 4.9)

are you sure? that seems low, the 4501 used to do 4kpps with openbsd 3.3!
Re: network bandwidth with em(4)
On Sat, Mar 12, 2011 at 06:29:42PM -0800, Chris Cappuccio wrote:

> > Are you suggesting that because you have a quad-port gig nic, your box
> > should be able to do 6 *million* packets per second? By that logic my
> > 5-port Soekris net4801 should be able to handle 740kpps. (for
> > reference, the net4801 does about 3kpps with 4.9)
>
> are you sure? that seems low, the 4501 used to do 4kpps with openbsd 3.3!

Quite sure, though I certainly welcome someone else doing independent
testing to prove me wrong. (FWIW: I tested 3.3 last month and got a
maximum of 2400 pps before packet loss exceeded 1%.)

The numbers above are for IP forwarding (not bridging), no PF, and TCP SYN
packets with random ports, ISNs, and source addresses, but a fixed
destination address. Measurements are taken on either side of the device
using SNMP on the switch, and they match very closely what I'm seeing from
the endpoints on either side of the firewall. The results are also stable
across the more than 30,000 individual tests I've run to date against a
variety of hardware and versions (automated, of course!)

Note that if you measure on the box itself (i.e. the IPKTS/OPKTS) you will
get lies when the system is livelocking. If you push harder you can get
more packets through the Soekris, but it's meaningless, as most of the
packets are being dropped and the box is completely livelocked.
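Ryan's method - counting packets on the switch ports on either side of the
firewall rather than trusting the box's own IPKTS/OPKTS - boils down to
simple counter deltas. A sketch with invented counter values (nothing here
queries a real switch; the resulting 3000-in/2400-out figures just echo
the net4801 numbers above):

```shell
#!/bin/sh
# Sketch of measuring forwarding pps from external counters (e.g. SNMP
# ifInUcastPkts readings on the switch), instead of the box's own
# IPKTS/OPKTS, which overstate throughput during livelock.
# All counter values below are invented for illustration.

# pps COUNT0 COUNT1 SECONDS -> average packets per second over the interval
pps() {
    awk -v c0="$1" -v c1="$2" -v t="$3" 'BEGIN { printf "%d\n", (c1 - c0) / t }'
}

in_pps=$(pps 10000000 10180000 60)     # switch port facing the sender
out_pps=$(pps 20000000 20144000 60)    # switch port facing the receiver

echo "offered: ${in_pps} pps, forwarded: ${out_pps} pps"
awk -v i="$in_pps" -v o="$out_pps" 'BEGIN { printf "loss: %.1f%%\n", 100 * (i - o) / i }'
```

Loss above the 1% threshold Ryan uses marks the box's limit for that
packet mix; a box whose own counters claim more than the switch sees is
livelocking.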
Re: network bandwidth with em(4)
I fixed my issue. I demoted the OpenBSD 4.4 machine so the 4.8 one took
over as CARP master, downed pfsync0 on both machines, and now the 4.8 box
is happily passing tons of packets. It was pfsync0 that was messing up
4.8; even with defer: off it was struggling. I'm going to test it for
about a week, then upgrade the remaining 4.4 box to 4.8. Thank goodness it
wasn't a hardware issue.

Tom
Re: network bandwidth with em(4)
On 2011-03-05 21:24, Manuel Guesdon wrote:

> On Sat, 5 Mar 2011 22:09:51 +0900 Ryan McBride mcbr...@openbsd.org wrote:
> | On Fri, Feb 25, 2011 at 08:40:10PM +0100, Manuel Guesdon wrote:
> | > systat -s 2 vmstat:
> | > 3.2%Int 0.1%Sys 0.0%Usr 0.0%Nic 96.8%Idle
> |
> | The numbers presented here are calculated against the sum of your
> | CPUs. Since you are running bsd.mp with hyperthreading turned on, your
> | machine has 16 CPUs; each CPU accounts for about 6% of the total
> | available, so the 3.2%Int value in your systat vmstat means that you
> | have one CPU (the only one that is actually working in the kernel)
> | about 50% in interrupt context.
> |
> | The exact behaviour varies from hardware to hardware, but it's not
> | surprising that you start losing packets at this level of load.
>
> OK. Understood. Thank you. I'll try an SP kernel with multithreading
> disabled as soon as I can and run some tests.
>
> Manuel
> --
> Manuel Guesdon - OXYMIUM

Hello,

Because some people have lately written to the group about network
bandwidth problems with em(4), I have run some tests myself. On the same
hardware I ran tests on Debian and OpenBSD. It seems there might be
something in OpenBSD that slows bandwidth on gigabit NICs. Detailed info
below. I can run some more tests if someone pushes me in the right
direction ;)

--
TEST BOX:
Mainboard: Intel D955XBK
CPU: Pentium 4 3GHz, HT disabled
LAN 1 (integrated): Gigabit (10/100/1000 Mbit/s) LAN subsystem using the Intel
82573E/82573V/82574V Gigabit Ethernet Controller LAN 2 (pcie x4): HP NC380T PCI-E x4 Dual Port Multifunction Gigabit Server NIC HDD1: OpenBSD 4.8 i386, pf disabled HDD2: Debian 6.0 i386 bandwidth test by IPERF and NETPERF: iperf -c 10.0.0.X -t 60 -i 5 netperf -H 10.0.0.X -p 9192 -n1 -l 10 -- DMESG OpenBSD 4.8 CD INSTALL: cpu0: Intel(R) Pentium(R) 4 CPU 3.00GHz (GenuineIntel 686-class) 3.01 GHz cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT, PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT, DS-CPL,EST,CNXT-ID,CX16,xTPR cpu0 at mainbus0: apid 0 (boot processor) cpu0: apic clock running at 200MHz cpu at mainbus0: not configured acpicpu0 at acpi0: FVS, 3000, 2800 MHz bnx0 at pci3 dev 4 function 0 Broadcom BCM5706 rev 0x02: apic 2 int 16 (irq 11) brgphy0 at bnx0 phy 1: BCM5706 10/100/1000baseT/SX PHY, rev. 2 em0 at pci6 dev 0 function 0 Intel PRO/1000MT (82573E) rev 0x03: apic 2 int 17 (irq 10) -- DMESG Debian 6.0: Linux version 2.6.32-5-686 (Debian 2.6.32-30) CPU0: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 03 e1000e :04:00.0: irq 28 for MSI/MSI-X e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.2 (Aug 21, 2009) eth0: Broadcom NetXtreme II BCM5706 1000Base-T (A2) PCI-X 64-bit 100MHz found at mem 2400, IRQ 16 eth1: Broadcom NetXtreme II BCM5706 1000Base-T (A2) PCI-X 64-bit 100MHz found at mem 2200, IRQ 17 lspci: Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02) Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02) Ethernet controller: Intel Corporation 82573V Gigabit Ethernet Controller (Copper) (rev 03) -- test 1a @ bnx0 (pcie x4): iperf from OpenBSD 4.8 - Debian 6.0 [ 3] 45.0-50.0 sec236 MBytes395 Mbits/sec [ ID] Interval Transfer Bandwidth [ 3] 50.0-55.0 sec236 MBytes396 Mbits/sec [ ID] Interval Transfer Bandwidth [ 3] 55.0-60.0 sec236 MBytes396 Mbits/sec [ ID] Interval Transfer 
Bandwidth
[ 3]  0.0-60.0 sec  2.76 GBytes  396 Mbits/sec

load averages: 0.42, 0.27, 0.18    rlw.local.kig 12:07:09
22 processes: 1 running, 20 idle, 1 on processor
CPU states:  0.0% user, 0.0% nice, 33.1% system, 22.2% interrupt, 44.7% idle
Memory: Real: 9060K/46M act/tot Free: 445M Swap: 0K/764M used/tot

-- test 1b @ bnx0 (pcie x4): netperf from OpenBSD 4.8 - OpenBSD 4.8

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

16384  16384   16384    10.25    453.33

load averages: 0.51, 0.19, 0.07    rlw.local.kig 14:55:05
24 processes: 23 idle, 1 on processor
CPU states:  0.2% user, 0.0% nice, 45.1% system, 33.7% interrupt, 21.0% idle
Memory: Real: 8672K/37M act/tot Free: 454M Swap: 0K/764M used/tot

  PID USERNAME PRI NICE  SIZE   RES STATE  WAIT   TIME    CPU COMMAND
18327 root       2    0  396K  812K sleep  netio  0:04 15.48% netperf

-- test 2a @ em0 (integrated): iperf from OpenBSD 4.8 - Debian 6.0

[ ID] Interval       Transfer
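As a side note, the per-interval iperf lines in results like test 1a can
be averaged mechanically; the heredoc below just reuses the 395-396
Mbit/s interval lines reported above:

```shell
#!/bin/sh
# Average the per-interval bandwidth figures from iperf output with awk.
# On each matching line the bandwidth value is the second-to-last field.

avg_bw() {
    awk '/Mbits\/sec/ { sum += $(NF-1); n++ }
         END { printf "%.0f Mbits/sec\n", sum / n }'
}

avg=$(avg_bw <<'EOF'
[  3] 45.0-50.0 sec  236 MBytes  395 Mbits/sec
[  3] 50.0-55.0 sec  236 MBytes  396 Mbits/sec
[  3] 55.0-60.0 sec  236 MBytes  396 Mbits/sec
EOF
)
echo "average: $avg"
```

In practice you would pipe `iperf -c host -t 60 -i 5` straight into
`avg_bw` instead of the sample heredoc.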
Re: network bandwidth with em(4)
RLW wrote:

> Hello,
>
> Because some people have lately written to the group about network
> bandwidth problems with em(4), I have run some tests myself. On the same
> hardware I ran tests on Debian and OpenBSD. It seems there might be
> something in OpenBSD that slows bandwidth on gigabit NICs. Detailed info
> below.

snip

How about the MTU? Did you have jumbo frames enabled on Debian?

Alexey
Re: network bandwidth with em(4)
On 2011-03-11, RLW seran...@o2.pl wrote:

> Because some people have lately written to the group about network
> bandwidth problems with em(4), I have run some tests myself.

Most of the recent posts about this have been about packet forwarding
performance; sourcing/sinking packets on the box itself is also
interesting of course, but it's a totally separate measurement.

> bandwidth test by IPERF and NETPERF:
> iperf -c 10.0.0.X -t 60 -i 5
> netperf -H 10.0.0.X -p 9192 -n1 -l 10

Not sure about netperf, but from what I remember iperf isn't a great
performer on OpenBSD.
Re: network bandwidth with em(4)
Hi,

I had a pair of Dell PowerEdge R200s that have both em(4)s and bge(4)s in
them; however, it's the em(4)s doing the heavy lifting: roughly 30-40
megabit/s sustained, at anywhere between 3000-4000 packets/s. On OpenBSD
4.4, the box happily forwards packets along. I upgraded one of the
firewalls to 4.8 and switched CARP over to it (yes, I know the redundancy
is broken now anyway) and it couldn't seem to handle the traffic. Any
inbound connections would stall and I have no idea why. There were no
net.inet.ip.ifq.drops, but I noticed 10 livelocks when running systat
mbufs (on em0). Could MCLGETI be hindering performance? Is there anything
I can try?

Tom

OpenBSD 4.8 (GENERIC) #136: Mon Aug 16 09:06:23 MDT 2010 dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC cpu0: Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz (GenuineIntel 686-class) 2.21 GHz cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM real mem = 1071947776 (1022MB) avail mem = 1044451328 (996MB) mainbus0 at root bios0 at mainbus0: AT/286+ BIOS, date 10/24/07, BIOS32 rev. 0 @ 0xfadd0, SMBIOS rev. 2.5 @ 0x3ff9c000 (46 entries) bios0: vendor Dell Inc. version 1.0.0 date 10/24/2007 bios0: Dell Inc.
PowerEdge R200 acpi0 at bios0: rev 2 acpi0: sleep states S0 S4 S5 acpi0: tables DSDT FACP APIC SPCR HPET MCFG WD__ SLIC ERST HEST BERT EINJ SSDT SSDT SSDT acpi0: wakeup devices PCI0(S5) acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: apic clock running at 200MHz cpu at mainbus0: not configured ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins ioapic0: misconfigured as apic 0, remapped to apid 2 acpihpet0 at acpi0: 14318179 Hz acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus 1 (PEX1) acpiprt2 at acpi0: bus 2 (SBE0) acpiprt3 at acpi0: bus 3 (SBE4) acpiprt4 at acpi0: bus 4 (SBE5) acpiprt5 at acpi0: bus 5 (COMP) acpicpu0 at acpi0: PSS bios0: ROM list: 0xc/0x9000 0xec000/0x4000! ipmi at mainbus0 not configured cpu0: Enhanced SpeedStep 2201 MHz: speeds: 2200, 2000, 1800, 1600, 1400, 1200 MHz pci0 at mainbus0 bus 0: configuration mode 1 (no bios) pchb0 at pci0 dev 0 function 0 Intel 3200/3210 Host rev 0x01 ppb0 at pci0 dev 1 function 0 Intel 3200/3210 PCIE rev 0x01: apic 2 int 16 (irq 15) pci1 at ppb0 bus 1 ppb1 at pci0 dev 28 function 0 Intel 82801I PCIE rev 0x02: apic 2 int 16 (irq 15) pci2 at ppb1 bus 2 em0 at pci2 dev 0 function 0 Intel PRO/1000 PT (82571EB) rev 0x06: apic 2 int 16 (irq 15), address 00:15:17:6c:c7:a2 em1 at pci2 dev 0 function 1 Intel PRO/1000 PT (82571EB) rev 0x06: apic 2 int 17 (irq 14), address 00:15:17:6c:c7:a3 ppb2 at pci0 dev 28 function 4 Intel 82801I PCIE rev 0x02 pci3 at ppb2 bus 3 bge0 at pci3 dev 0 function 0 Broadcom BCM5721 rev 0x21, BCM5750 C1 (0x4201): apic 2 int 16 (irq 15), address 00:19:b9:fa:59:20 brgphy0 at bge0 phy 1: BCM5750 10/100/1000baseT PHY, rev. 0 ppb3 at pci0 dev 28 function 5 Intel 82801I PCIE rev 0x02 pci4 at ppb3 bus 4 bge1 at pci4 dev 0 function 0 Broadcom BCM5721 rev 0x21, BCM5750 C1 (0x4201): apic 2 int 17 (irq 14), address 00:19:b9:fa:59:21 brgphy1 at bge1 phy 1: BCM5750 10/100/1000baseT PHY, rev. 
0 uhci0 at pci0 dev 29 function 0 Intel 82801I USB rev 0x02: apic 2 int 21 (irq 11) uhci1 at pci0 dev 29 function 1 Intel 82801I USB rev 0x02: apic 2 int 20 (irq 10) uhci2 at pci0 dev 29 function 2 Intel 82801I USB rev 0x02: apic 2 int 21 (irq 11) ehci0 at pci0 dev 29 function 7 Intel 82801I USB rev 0x02: apic 2 int 21 (irq 11) usb0 at ehci0: USB revision 2.0 uhub0 at usb0 Intel EHCI root hub rev 2.00/1.00 addr 1 ppb4 at pci0 dev 30 function 0 Intel 82801BA Hub-to-PCI rev 0x92 pci5 at ppb4 bus 5 vga1 at pci5 dev 5 function 0 ATI ES1000 rev 0x02 wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation) wsdisplay0: screen 1-5 added (80x25, vt100 emulation) radeondrm0 at vga1: apic 2 int 19 (irq 5) drm0 at radeondrm0 ichpcib0 at pci0 dev 31 function 0 Intel 82801IR LPC rev 0x02: PM disabled pciide0 at pci0 dev 31 function 2 Intel 82801I SATA rev 0x02: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI pciide0: using apic 2 int 23 (irq 6) for native-PCI interrupt wd0 at pciide0 channel 0 drive 0: WDC WD1601ABYS-18C0A0 wd0: 16-sector PIO, LBA48, 152587MB, 31250 sectors atapiscsi0 at pciide0 channel 0 drive 1 scsibus0 at atapiscsi0: 2 targets cd0 at scsibus0 targ 0 lun 0: HL-DT-ST, CDRW/DVD GCCT10N, A102 ATAPI 5/cdrom removable wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 6 cd0(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2 usb1 at uhci0: USB revision 1.0 uhub1 at usb1 Intel UHCI root hub rev 1.00/1.00 addr 1 usb2 at uhci1: USB revision 1.0 uhub2 at usb2 Intel UHCI root hub rev 1.00/1.00 addr 1 usb3 at uhci2: USB revision 1.0 uhub3 at usb3 Intel UHCI root hub rev 1.00/1.00 addr 1 isa0 at ichpcib0
Re: network bandwidth with em(4)
On Thu, Mar 10, 2011 at 12:18:32PM, Tom Murphy wrote:

> I had a pair of Dell PowerEdge R200s that have both em(4)s and bge(4)s
> in them; however, it's the em(4)s doing the heavy lifting. Roughly 30-40
> megabit/s sustained, doing anywhere between 3000-4000 packets/s. On
> OpenBSD 4.4, it happily forwards packets along. I upgraded one of the
> firewalls to 4.8 and switched CARP over to it (yes, I know the
> redundancy is broken now anyway) and it couldn't seem to handle the
> traffic. Any inbound connections would stall and I have no idea why.

I assume that you don't have the 'defer' option set on your pfsync
interface (it would be broken until you upgrade both firewalls).

> There were no net.inet.ip.ifq.drops, but I noticed 10 livelocks when
> running systat mbufs (on em0).

I think in 4.8 systat mbufs is showing the total number of livelocks
ever, and 10 is a tiny number. On a system nearing its limit you could
expect the livelocks counter to get hit a few times a second, but if it's
getting hit 50 times per second you're way over capacity.

Note you can also look at 'sysctl kern.netlivelocks', which is a little
less ambiguous and shows the total number of livelocks since boot.

> Could MCLGETI be hindering performance?

I'm doing a lot of testing in this area these days on a broad range of
hardware, and I have yet to find a case where MCLGETI does not improve a
system's ability to handle load. If anything MCLGETI needs to be more
aggressive, and we're looking at ways to do that.

-Ryan
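Since kern.netlivelocks counts livelocks since boot, Ryan's rule of thumb
(a few hits per second near the limit, ~50/s way over capacity) is about
the counter's growth rate, not its absolute value. A small sketch of the
arithmetic; the sample values are invented, and on a real box the two
samples would come from `sysctl -n kern.netlivelocks` separated by a
sleep:

```shell
#!/bin/sh
# Classify a box by the growth rate of its livelock counter, following
# the rule of thumb quoted above.  Sample counter values are invented.

# livelock_rate SAMPLE0 SAMPLE1 SECONDS
livelock_rate() {
    awk -v s0="$1" -v s1="$2" -v t="$3" 'BEGIN {
        r = (s1 - s0) / t
        if (r >= 50)     v = "way over capacity"
        else if (r >= 1) v = "near its limit"
        else             v = "fine"
        printf "%.1f livelocks/s (%s)\n", r, v
    }'
}

# 30 new livelocks in 10 seconds -> a few per second
livelock_rate 1200 1230 10
```

The 1/s and 50/s thresholds are taken from Ryan's description above; treat
them as rough guidance, not exact limits.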
Re: network bandwidth with em(4)
Ryan McBride wrote:

> I assume that you don't have the 'defer' option set on your pfsync
> interface (it would be broken until you upgrade both firewalls).

Correct. The defer option is off by default, and when I looked at pfsync0
on the 4.8 box it said:

pfsync: syncdev: bge1 maxupd: 128 defer: off

> I think in 4.8 systat mbufs is showing the total number of livelocks
> ever, and 10 is a tiny number. On a system nearing its limit you could
> expect the livelocks counter to get hit a few times a second, but if
> it's getting hit 50 times per second you're way over capacity.

Yeah, I only had 10 after about 3-4 hours and the number did not increase.

> Note you can also look at 'sysctl kern.netlivelocks', which is a little
> less ambiguous and shows the total number of livelocks since boot.

Thanks! I will bear that in mind.

> I'm doing a lot of testing in this area these days on a broad range of
> hardware, and I have yet to find a case where MCLGETI does not improve a
> system's ability to handle load. If anything MCLGETI needs to be more
> aggressive, and we're looking at ways to do that.

I notice the machines are mostly idle - between 90-95%. They also use
very little memory (top reports 15-18M of memory used). The 4.8 box only
has 1 gig of RAM, whereas the 4.4 box has 2 gig. It doesn't seem to make
much of a difference in this case. Whichever firewall is active can
handle up to about 62000 states during peak times.

Would it be worth just shutting down pfsync(4) on both machines to test
performance? I wouldn't want pfsync getting in the way, since pfsync is
broken anyway; it would be one more variable to remove from the equation.

Tom
kernel leaks (was: Re: network bandwidth with em(4))
On 03/10/2011 03:45 PM, Tom Murphy wrote:

> Ryan McBride wrote:
> > On Thu, Mar 10, 2011, Tom Murphy wrote:
> > > I had a pair of Dell PowerEdge R200s that have both em(4)s and
> > > bge(4)s in them ... Any inbound connections would stall and I have
> > > no idea why.

Hi folks,

Sorry for hijacking this thread. I also have a Dell machine with em(4)s.
When I upgraded the machine from 4.3 or 4.4 to 4.7, the kernel started
leaking memory; I've been looking at it ever since. This was just before
4.8 came out, so it didn't get 4.8.

I disabled everything I could find to figure out whether I did something
wrong, ranging from openvpn with a bridging setup to the new setup I made
with relayd - anything I could think of in userspace. I also reverted
back to the stock kernel instead of one with errata patches applied, set
the interfaces to full duplex instead of automatic, and disabled the use
of IPv6 (which wasn't used before the upgrade). Nothing has worked so far.

It isn't a big machine and it doesn't need to handle a lot of traffic,
but at the current rate it is leaking memory all day long, and I have to
reboot the machine every 1 or 2 weeks or it will stop working. Which
obviously is very sad. When it gets to about 8000+ mbufs the machine
starts to exhibit really weird behaviour, but does not lock up: it can
set up client TCP connections, but TCP or Unix server sockets cannot
receive any new connections.

I keep a log of the output of netstat -m. Part of that log and the dmesg
are at the end of this email. The one part I haven't tried disabling is
the dynamic routing; the machine does get frequent route updates.

I have another machine which runs exactly the same binaries, but the
hardware is a bit different. I looked at a lot of changes in CVS and I
didn't see anything special in the related drivers that warranted an
upgrade to 4.8. Doing an upgrade would take quite a bit of time I don't
have right now, and I also didn't want to make the problem worse. ;-)

If you have any tips on how to further investigate or fix the problem I
would really appreciate it. If you need any extra information let me
know. I keep wondering what changed between 4.3/4.4 and 4.7/4.8 with
respect to Dell and em(4). At this point I'm thinking: wasn't there a big
update to how ACPI works on OpenBSD, or something like that, which might
affect how interrupts and drivers work?

Anyway, have a nice day,
Leen.

___
OpenBSD 4.7 (GENERIC) #558: Wed Mar 17 20:46:15 MDT 2010 dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC cpu0: Intel(R) Pentium(R) 4 CPU 3.06GHz (GenuineIntel 686-class) 3.07 GHz cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR real mem = 1073184768 (1023MB) avail mem = 1031110656 (983MB) mainbus0 at root bios0 at mainbus0: AT/286+ BIOS, date 10/08/03, BIOS32 rev. 0 @ 0xffe90, SMBIOS rev.
2.3 @ 0xfae10 (77 entries)
bios0: vendor Dell Computer Corporation version A04 date 10/08/2003
bios0: Dell Computer Corporation PowerEdge 650
acpi0 at bios0: rev 0
acpi0: tables DSDT FACP APIC SPCR
acpi0: wakeup devices PCI0(S5) PCI1(S5) PCI2(S5)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: apic clock running at 133MHz
cpu at mainbus0: not configured
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 11, 16 pins
ioapic0: misconfigured as apic 0, remapped to apid 2
ioapic1 at mainbus0: apid 3 pa 0xfec01000, version 11, 16 pins
ioapic1: misconfigured as apic 0, remapped to apid 3
ioapic2 at mainbus0: apid 4 pa 0xfec02000, version 11, 16 pins
ioapic2: misconfigured as apic 0, remapped to apid 4
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (PCI1)
acpiprt2 at acpi0: bus 2 (PCI2)
acpicpu0 at acpi0
bios0: ROM list: 0xc/0x8000 0xc8000/0x4800 0xcc800/0x1800 0xce000/0x1800 0xec000/0x4000!
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 ServerWorks GCNB-LE Host rev 0x32
pchb1 at pci0 dev 0 function 1 ServerWorks GCNB-LE Host rev 0x00
pci1 at pchb1 bus 1
em0 at pci1 dev 3 function 0 Intel PRO/1000MT (82546EB) rev 0x01: apic 3 int 3 (irq 5), address 00:04:23:9f:24:56
em1 at pci1 dev 3 function 1 Intel PRO/1000MT (82546EB) rev 0x01: apic 3 int 4 (irq 3), address 00:04:23:9f:24:57
em2 at pci0 dev 3 function 0 Intel PRO/1000MT (82546EB) rev 0x01: apic 3 int 1 (irq 15), address 00:04:23:5f:1c:b2
em3 at pci0 dev 3 function 1 Intel PRO/1000MT (82546EB) rev 0x01: apic 3 int 2 (irq 11),
Re: kernel leaks (was: Re: network bandwith with em(4))
On Fri, Mar 11, 2011 at 12:22 AM, Leen Besselink open...@consolejunkie.net wrote:

| Hi folks, Sorry for hijacking this thread. I also have a Dell machine
| with em(4)'s. When I upgraded a machine from 4.3 or 4.4 to 4.7 the kernel
| is leaking memory I've been looking at it ever since. This was just
| before 4.8 came out so it didn't get 4.8.

There have been a number of mbuf leak fixes between 4.8 and 4.9. Reinstall with 4.9/-current and repeat your tests.
Re: network bandwith with em(4)
On Fri, Feb 25, 2011 at 08:40:10PM +0100, Manuel Guesdon wrote:

| systat -s 2 vmstat:
| 3.2%Int 0.1%Sys 0.0%Usr 0.0%Nic 96.8%Idle

The numbers presented here are calculated against the sum of your CPUs. Since you are running bsd.mp with hyperthreading turned on, your machine has 16 CPUs; each CPU accounts for about 6% of the total available, so the 3.2%Int value in your systat vmstat means that you have one CPU (the only one that is actually working in the kernel) about 50% in interrupt context.

The exact behaviour varies from hardware to hardware, but it's not surprising that you start losing packets at this level of load.
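The arithmetic Ryan describes can be checked directly. This is just a sketch of the calculation, not OpenBSD code: systat's percentages are measured against all CPUs combined, so on a 16-way box a "3.2%Int" reading corresponds to roughly half of one CPU spent in interrupt context.

```shell
# systat's %Int is relative to the sum of all CPUs, so scale it by the
# CPU count to see the load on the single CPU doing the kernel work:
awk 'BEGIN { printf "%.1f\n", 3.2 * 16 }'   # prints 51.2 (percent of one CPU)
```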
Re: network bandwith with em(4)
On Sat, 5 Mar 2011 22:09:51 +0900 Ryan McBride mcbr...@openbsd.org wrote:

| On Fri, Feb 25, 2011 at 08:40:10PM +0100, Manuel Guesdon wrote:
| | systat -s 2 vmstat:
| | 3.2%Int 0.1%Sys 0.0%Usr 0.0%Nic 96.8%Idle
|
| The numbers presented here are calculated against the sum of your CPUs.
| Since you are running bsd.mp with hyperthreading turned on, your machine
| has 16 CPUs; each CPU accounts for about 6% of the total available so
| the 3.2%Int value in your systat vmstat means that you have one cpu
| (the only one that is actually working in the kernel) about 50% in
| interrupt context.
|
| The exact behaviour varies from hardware to hardware, but it's not
| surprising that you start losing packets at this level of load.

OK. Understood, thank you. I'll try an SP kernel with multithreading disabled as soon as I can and make some tests.

Manuel
--
Manuel Guesdon - OXYMIUM
Re: network bandwith with em(4)
On Thu, Mar 03, 2011 at 03:52:54PM +0100, Manuel Guesdon wrote:

| Of course, and s/OpenBSD/FreeBSD/ may help too, but none of these
| proposals seems very constructive.

If you think that you'd be better served by FreeBSD, please go ahead and use that instead.

| | I think we already mentioned it that you will always see Ierr. The
| | question is if the box is able to forward more than 150kpps.
|
| Yes, that's one of the questions. We can divide it into 3 questions:
| 1) does the limitation come from the hardware?
| 2) does it come from OpenBSD?
| 3) does it come from the way OpenBSD exploits the hardware?
|
| 1) Unless someone explains step by step why the hardware can't forward
| this rate, I keep thinking it can do it (otherwise I don't see a reason
| to sell quad 1Gbps NICs).

Are you suggesting that because you have a quad-port gig NIC, your box should be able to do 6 *million* packets per second? By that logic my 5-port Soekris net4801 should be able to handle 740kpps. (For reference, the net4801 does about 3kpps with 4.9.)

| I'm ok to hear that I've purchased a crappy motherboard or NIC
| (but I'd like to understand why they are crappy).

It has nothing to do with hardware crappiness; it has to do with your expectations. Your box should certainly be able to fill a few of your gig ports with 1500-byte packets, but there is no way it'll handle a full 4 gigabits per second of TCP SYN packets.

| I've spent days and days making tests, searching, reading kernel source
| code and so on, because I think it's interesting for the community to
| find where the problem comes from and how to solve it (if possible). If
| finally the answer is that OpenBSD (or maybe any other OS) can't forward
| more than 150kpps without losing 1 to 20 pps with this hardware, I'll
| live with it.

Are you actually complaining about 1 to 20 errors per second? That's 0.01% packet loss; welcome to Ethernet. You will not see this change by switching to different hardware or OS.
It /is/ possible that something is wrong with your box and you could be getting a slightly higher throughput. But don't expect that we'll make it handle 2 million PPS any time soon.

| But as we've already seen that increasing int/s improves performance
| (for good or bad reasons), I keep thinking there's something to improve
| or fix, but I may be wrong.

There are MANY more performance considerations than just pps: latency, interactive/userland performance under load, how the system responds once it is overloaded, etc. We're not going to sacrifice all these just to get a higher pps number.

However, don't bother just telling us there's something to improve. We've been working on this for years, we've already made huge improvements, and we're always looking for more. Perhaps the biggest limitation on modern hardware is that we can't split the packet handling across multiple CPUs, but your input provides exactly ZERO help with changing that.
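The packet-rate figures being traded here follow from simple Ethernet framing arithmetic. A sketch, assuming the standard 20 bytes of preamble plus inter-frame gap per frame:

```shell
# Theoretical line rate of one gigabit port in packets per second.
# Each frame occupies (frame size + 20 bytes preamble/IFG) * 8 bit times.
echo $(( 1000000000 / ((64   + 20) * 8) ))   # 64-byte frames: 1488095 pps
echo $(( 1000000000 / ((1518 + 20) * 8) ))   # 1518-byte frames: 81274 pps
```

Four minimum-size streams come to nearly the 6 Mpps Ryan mentions, while filling the same ports with 1500-byte payloads needs only a few hundred kpps, which is why the frame size matters so much in this exchange.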
Re: network bandwith with em(4)
On Fri, 4 Mar 2011 22:53:30 +0900 Ryan McBride mcbr...@openbsd.org wrote:

| On Thu, Mar 03, 2011 at 03:52:54PM +0100, Manuel Guesdon wrote:
| | | I think we already mentioned it that you will always see Ierr. The
| | | question is if the box is able to forward more than 150kpps.
| |
| | Yes, that's one of the questions. We can divide it into 3 questions:
| | 1) does the limitation come from the hardware?
| | 2) does it come from OpenBSD?
| | 3) does it come from the way OpenBSD exploits the hardware?
| |
| | 1) Unless someone explains step by step why the hardware can't forward
| | this rate, I keep thinking it can do it (otherwise I don't see a reason
| | to sell quad 1Gbps NICs).
|
| Are you suggesting that because you have a quad-port gig NIC, your box
| should be able to do 6 *million* packets per second? By that logic my
| 5-port Soekris net4801 should be able to handle 740kpps. (For reference,
| the net4801 does about 3kpps with 4.9.)

No, I don't suggest that. I simply find it strange to have this kind of hardware specification (bus width and speed, gigabit NICs) and yet be unable to handle something like 160kpps when the 'only' job of the server (i.e. no userland application) is to forward packets, and the server seems to be 90% idle.

| | I'm ok to hear that I've purchased a crappy motherboard or NIC
| | (but I'd like to understand why they are crappy).
|
| It has nothing to do with hardware crappiness, it has to do with your
| expectations. Your box should certainly be able to fill a few of your
| gig ports with 1500-byte packets, but there is no way it'll handle a
| full 4 gigabits / second of TCP SYN packets.

I don't expect those numbers.

| | I've spent days and days making tests, searches, reading kernel source
| | code and so on because I think it's interesting for the community to
| | find where the problem comes from and how to solve it (if possible). If
| | finally the answer is that OpenBSD (or maybe any other OS) can't
| | forward more than 150kpps without losing 1 to 20 pps with this
| | hardware, I'll live with it.
|
| Are you actually complaining about 1 to 20 errors per second? That's
| 0.01% packet loss, welcome to Ethernet. You will not see this change by
| switching to different hardware or OS.

I'm not complaining; I'm just trying to see whether it's 'normal' to have these losses when the server seems not very loaded, or whether they hide a problem.

| It /is/ possible that something is wrong with your box and you could be
| getting a slightly higher throughput. But don't expect that we'll make
| it handle 2 million PPS any time soon.

Once again, I don't expect to forward 2Mpps or 4Gbps.

| However, don't bother just telling us there's something to improve.
| We've been working on this for years, we've already made huge
| improvements, and we're always looking for more. Perhaps the biggest
| limitation on modern hardware is that we can't split the packet handling
| across multiple CPUs, but your input provides exactly ZERO help with
| changing that.

Please see my previous messages: I never said "I see Ierrs, please fix it." Claudio suggested a possible mbuf leak problem and I asked how I can try to confirm (or rule out) that. You also pointed out high livelock values, so I understood that there may be something wrong somewhere. I've provided the requested information to help us see whether there's a problem or not. I'm not a hardware expert, not a driver expert, and not even an OpenBSD expert; I just try to understand and maybe help improve things. All my apologies if my previous messages didn't reflect that.

Manuel
--
Manuel Guesdon - OXYMIUM
Re: network bandwith with em(4)
| On Thu, Mar 03, 2011 at 03:52:54PM +0100, Manuel Guesdon wrote: | | I think we already mentioned it that you will always see Ierr. The | | question is if the box is able to forward more then 150kpps. | | Yes that's one a the questions. We can divide it into 3 questions: | 1) is the limitation comes from hardware ? | 2) is the limitation comes from OpenBSD ? | 3) is the limitation comes from the way OpenBSD exploit hardware. | | 1) Except if someone explain by a+b why the hardware can't forward this | rate, I'm keep thinking it can do it (otherwise I don't see reason to sell | quad 1Gbps nic). | | Are you suggesting that because you have a quad-port gig nic, your box | should be able to do 6 *million* packets per second? By that logic my | 5-port Soekris net4801 should be able to handle 740kpps. (for reference, | the net4801 does about 3kpps with 4.9) No, I don't suggest that, I simply think it strange to have these kind of hardware specification (bus length and speed and bgps nic) [...] It is strange that the vendors of these hardware products lie with statistics. You are astoundingly naive. We simply don't need the grief of entertaining users like you.
Re: network bandwith with em(4)
On Thu, 3 Mar 2011 00:51:46 + (UTC) Stuart Henderson s...@spacehopper.org wrote:

| On 2011-02-28, Manuel Guesdon ml+openbsd.m...@oxymium.net wrote:
| | http://www.oxymium.net/tmp/core3-dmesg
| |
| | ipmi0 at mainbus0: version 2.0 interface KCS iobase 0xca2/2 spacing 1
|
| ipmi is disabled in GENERIC. have you tried without it?

Not on this server (I can't reboot it often), but on another one with the same hardware: it doesn't seem to make a difference (it still has Ierrs).

Manuel
--
Manuel Guesdon - OXYMIUM
Re: network bandwith with em(4)
On Thu, Mar 03, 2011 at 09:11:13AM +0100, Manuel Guesdon wrote:

| On Thu, 3 Mar 2011 00:51:46 + (UTC) Stuart Henderson s...@spacehopper.org wrote:
| | On 2011-02-28, Manuel Guesdon ml+openbsd.m...@oxymium.net wrote:
| | | http://www.oxymium.net/tmp/core3-dmesg
| | |
| | | ipmi0 at mainbus0: version 2.0 interface KCS iobase 0xca2/2 spacing 1
| |
| | ipmi is disabled in GENERIC. have you tried without it?
|
| Not on this server (I can't reboot it often) but on another one with same
| hardware: it doesn't seem to make a difference (it still has Ierr).

This diff will help. /sarcasm

I think we already mentioned it that you will always see Ierr. The question is if the box is able to forward more than 150kpps.

--
:wq Claudio

Index: if_em.c
===
RCS file: /cvs/src/sys/dev/pci/if_em.c,v
retrieving revision 1.249
diff -u -p -r1.249 if_em.c
--- if_em.c	13 Feb 2011 19:45:54 -	1.249
+++ if_em.c	3 Mar 2011 10:01:39 -
@@ -3194,14 +3194,7 @@ em_update_stats_counters(struct em_softc
 	ifp->if_collisions = sc->stats.colc;
 
 	/* Rx Errors */
-	ifp->if_ierrors =
-	    sc->dropped_pkts +
-	    sc->stats.rxerrc +
-	    sc->stats.crcerrs +
-	    sc->stats.algnerrc +
-	    sc->stats.ruc + sc->stats.roc +
-	    sc->stats.mpc + sc->stats.cexterr +
-	    sc->rx_overruns;
+	ifp->if_ierrors = 0;
 
 	/* Tx Errors */
 	ifp->if_oerrors = sc->stats.ecol + sc->stats.latecol +
Re: network bandwith with em(4)
On Thu, 3 Mar 2011 11:12:09 +0100 Claudio Jeker cje...@diehard.n-r-g.com wrote:

| On Thu, Mar 03, 2011 at 09:11:13AM +0100, Manuel Guesdon wrote:
| | On Thu, 3 Mar 2011 00:51:46 + (UTC) Stuart Henderson s...@spacehopper.org wrote:
| | | On 2011-02-28, Manuel Guesdon ml+openbsd.m...@oxymium.net wrote:
| | | | http://www.oxymium.net/tmp/core3-dmesg
| | | |
| | | | ipmi0 at mainbus0: version 2.0 interface KCS iobase 0xca2/2 spacing 1
| | |
| | | ipmi is disabled in GENERIC. have you tried without it?
| |
| | Not on this server (I can't reboot it often) but on another one with
| | same hardware: it doesn't seem to make a difference (it still has Ierr).
|
| This diff will help. /sarcasm

Of course, and s/OpenBSD/FreeBSD/ may help too, but none of these proposals seems very constructive.

| I think we already mentioned it that you will always see Ierr. The
| question is if the box is able to forward more than 150kpps.

Yes, that's one of the questions. We can divide it into 3 questions:
1) does the limitation come from the hardware?
2) does it come from OpenBSD?
3) does it come from the way OpenBSD exploits the hardware?

1) Unless someone explains step by step why the hardware can't forward this rate, I keep thinking it can do it (otherwise I don't see a reason to sell quad 1Gbps NICs). I'm ok to hear that I've purchased a crappy motherboard or NIC (but I'd like to understand why they are crappy). The last 2 questions are still open in my mind.

I've spent days and days making tests, searching, reading kernel source code and so on, because I think it's interesting for the community to find where the problem comes from and how to solve it (if possible). If finally the answer is that OpenBSD (or maybe any other OS) can't forward more than 150kpps without losing 1 to 20 pps with this hardware, I'll live with it. But as we've already seen that increasing int/s improves performance (for good or bad reasons), I keep thinking there's something to improve or fix, though I may be wrong.

Anyway, thank you for your work and help.

Manuel
--
Manuel Guesdon - OXYMIUM
Re: network bandwith with em(4)
- Original Message - | On Thu, Mar 03, 2011 at 09:11:13AM +0100, Manuel Guesdon wrote: | On Thu, 3 Mar 2011 00:51:46 + (UTC) | Stuart Henderson s...@spacehopper.org wrote: | | | On 2011-02-28, Manuel Guesdon ml+openbsd.m...@oxymium.net | | wrote: | | http://www.oxymium.net/tmp/core3-dmesg | | | | ipmi0 at mainbus0: version 2.0 interface KCS iobase 0xca2/2 | | spacing 1 | | | | ipmi is disabled in GENERIC. have you tried without it? | | Not on this server (I can't reboot it often) but on another one with | same | hardware: it doesn't seems to make difference (it still have Ierr). | | | This diff will help./sarcasm | I think we already mentioned it that you will always see Ierr. The | question is if the box is able to forward more then 150kpps. | | -- | :wq Claudio | | Index: if_em.c | === | RCS file: /cvs/src/sys/dev/pci/if_em.c,v | retrieving revision 1.249 | diff -u -p -r1.249 if_em.c | --- if_em.c 13 Feb 2011 19:45:54 - 1.249 | +++ if_em.c 3 Mar 2011 10:01:39 - | @@ -3194,14 +3194,7 @@ em_update_stats_counters(struct em_softc | ifp-if_collisions = sc-stats.colc; | | /* Rx Errors */ | - ifp-if_ierrors = | - sc-dropped_pkts + | - sc-stats.rxerrc + | - sc-stats.crcerrs + | - sc-stats.algnerrc + | - sc-stats.ruc + sc-stats.roc + | - sc-stats.mpc + sc-stats.cexterr + | - sc-rx_overruns; | + ifp-if_ierrors = 0; | | /* Tx Errors */ | ifp-if_oerrors = sc-stats.ecol + sc-stats.latecol + Hey Claudio, Thanks! This diff helped and now my errors have gone to zero! LOL! That was funny. -- James A. Peltier IT Services - Research Computing Group Simon Fraser University - Burnaby Campus Phone : 778-782-6573 Fax : 778-782-3045 E-Mail : jpelt...@sfu.ca Website : http://www.sfu.ca/itservices http://blogs.sfu.ca/people/jpeltier
Re: network bandwith with em(4)
On 2011-03-02 13:52, Ryan McBride wrote:

| On Mon, Feb 28, 2011 at 12:49:01PM +0100, Manuel Guesdon wrote:
| | OK. Anyway NIC buffers restrict the number of buffered packets. But the
| | problem remains: why can a (for example) dual Xeon E5520@2.27GHz with
| | Intel PRO/1000 (82576) not route 150kpps without Ierr :-)
| | http://www.oxymium.net/tmp/core3-dmesg
|
| I've done some more comprehensive testing and talked to some other
| developers, and it seems that 150kpps is in the range of what is
| expected for such hardware with an unoptimized install.
|
| One thing that seems to have a big performance impact is
| net.inet.ip.ifq.maxlen. If and only if your network cards are all
| supported by MCLGETI (i.e., they show LWM/CWM/HWM values in 'systat
| mbufs'), you can try increasing ifq.maxlen until you don't see
| net.inet.ip.ifq.drops incrementing anymore under constant load.
|
| On my test box here - Intel(R) Xeon(R) CPU 5140 @ 2.33GHz with em(4),
| pf disabled - increasing net.inet.ip.ifq.maxlen to 8192 gets more than
| double the performance compared with the default of 256.
|
| We're looking at making the ifq.maxlen tune itself so you don't have to
| twiddle this knob anymore; not sure if and when that will happen though.

I also have problems with bandwidth on em(4). On a default clean 4.8 install I get 430Mbit/s (with pf and altq enabled it's only 275Mbit/s).
systat shows: 31.7%Int 62.1%Sys 0.0%Usr 0.0%Nic 6.2%Idle ||||||||||| === Interrupts 8025 total 100 clock 7921 em0 4 ichiic0 http://erydium.pl/upload/vmstat.gif http://erydium.pl/upload/systat.gif http://erydium.pl/upload/kern_profiling.txt my hardware: box: Lenovo ThinkCentre A51P nic: Intel PRO/1000 PT Desktop Adapter (PCIe, model: EXPI9300PTBLK) DMESG: OpenBSD 4.8 (KERN_PROF.PROF) #0: Thu Dec 30 13:25:40 CET 2010 r...@router-test.local.kig:/usr/src/sys/arch/i386/compile/KERN_PROF.PROF cpu0: Intel(R) Celeron(R) CPU 2.80GHz (GenuineIntel 686-class) 2.80 GHz cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,TM2,CNXT-ID,xTPR real mem = 526938112 (502MB) avail mem = 508166144 (484MB) mainbus0 at root bios0 at mainbus0: AT/286+ BIOS, date 05/10/07, BIOS32 rev. 0 @ 0xfd6dc, SMBIOS rev. 2.34 @ 0xefc60 (52 entries) bios0: vendor IBM version 2BKT52AUS date 05/10/2007 bios0: IBM 8422W4P acpi0 at bios0: rev 0 acpi0: sleep states S0 S1 S3 S4 S5 acpi0: tables DSDT FACP TCPA APIC BOOT MCFG acpi0: wakeup devices EXP0(S5) EXP1(S5) EXP2(S5) EXP3(S5) USB1(S3) USB2(S3) USB3(S3) USB4(S3) USBE(S3) SLOT(S5) KBC_(S3) PSM_(S3) COMA(S5) COMB(S5) acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: apic clock running at 133MHz ioapic0 at mainbus0: apid 1 pa 0xfec0, version 20, 24 pins acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus -1 (PEG_) acpiprt2 at acpi0: bus 2 (EXP0) acpiprt3 at acpi0: bus -1 (EXP1) acpiprt4 at acpi0: bus -1 (EXP2) acpiprt5 at acpi0: bus -1 (EXP3) acpiprt6 at acpi0: bus 10 (SLOT) acpicpu0 at acpi0 acpitz0 at acpi0: critical temperature 105 degC acpibtn0 at acpi0: PWRB bios0: ROM list: 0xc/0xae00! 0xcb000/0x1000 0xcc000/0x2000 0xce000/0x800 0xce800/0x800 0xe/0x1! 
pci0 at mainbus0 bus 0: configuration mode 1 (bios) pchb0 at pci0 dev 0 function 0 Intel 82915G Host rev 0x04 vga1 at pci0 dev 2 function 0 Intel 82915G Video rev 0x04 wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation) wsdisplay0: screen 1-5 added (80x25, vt100 emulation) intagp0 at vga1 agp0 at intagp0: aperture at 0xc000, size 0x1000 inteldrm0 at vga1: apic 1 int 16 (irq 5) drm0 at inteldrm0 ppb0 at pci0 dev 28 function 0 Intel 82801FB PCIE rev 0x03: apic 1 int 17 (irq 5) pci1 at ppb0 bus 2 em0 at pci1 dev 0 function 0 Intel PRO/1000 PT (82572EI) rev 0x06: apic 1 int 16 (irq 5), address 00:1b:21:05:1f:39 uhci0 at pci0 dev 29 function 0 Intel 82801FB USB rev 0x03: apic 1 int 23 (irq 11) uhci1 at pci0 dev 29 function 1 Intel 82801FB USB rev 0x03: apic 1 int 19 (irq 9) uhci2 at pci0 dev 29 function 2 Intel 82801FB USB rev 0x03: apic 1 int 18 (irq 10) uhci3 at pci0 dev 29 function 3 Intel 82801FB USB rev 0x03: apic 1 int 16 (irq 5) ehci0 at pci0 dev 29 function 7 Intel 82801FB USB rev 0x03: apic 1 int 23 (irq 11) usb0 at ehci0: USB revision 2.0 uhub0 at usb0 Intel EHCI root hub rev 2.00/1.00 addr 1 ppb1 at pci0 dev 30 function 0 Intel 82801BA Hub-to-PCI rev 0xd3 pci2 at ppb1 bus 10 xl0 at pci2 dev 10 function 0 3Com 3c905C 100Base-TX rev 0x74: apic 1 int 22 (irq 3), address 00:04:76:0b:90:9f bmtphy0 at xl0 phy 24: 3C905C internal PHY, rev. 6 bge0 at pci2 dev 11 function 0 Broadcom BCM5705K rev 0x03, BCM5705 A3 (0x3003): apic 1 int 16 (irq 5), address 00:11:25:4f:9a:f4 brgphy0 at bge0 phy 1: BCM5705 10/100/1000baseT PHY, rev. 2 xl1 at pci2 dev 12 function 0 3Com 3c905C 100Base-TX rev 0x74: apic 1 int
Re: network bandwith with em(4)
On Mon, Feb 28, 2011 at 12:49:01PM +0100, Manuel Guesdon wrote:

| OK. Anyway NIC buffers restrict the number of buffered packets. But the
| problem remains: why can a (for example) dual Xeon E5520@2.27GHz with
| Intel PRO/1000 (82576) not route 150kpps without Ierr :-)
| http://www.oxymium.net/tmp/core3-dmesg

I've done some more comprehensive testing and talked to some other developers, and it seems that 150kpps is in the range of what is expected for such hardware with an unoptimized install.

One thing that seems to have a big performance impact is net.inet.ip.ifq.maxlen. If and only if your network cards are all supported by MCLGETI (i.e., they show LWM/CWM/HWM values in 'systat mbufs'), you can try increasing ifq.maxlen until you don't see net.inet.ip.ifq.drops incrementing anymore under constant load.

On my test box here - Intel(R) Xeon(R) CPU 5140 @ 2.33GHz with em(4), pf disabled - increasing net.inet.ip.ifq.maxlen to 8192 gets more than double the performance compared with the default of 256.

We're looking at making the ifq.maxlen tune itself so you don't have to twiddle this knob anymore; not sure if and when that will happen though.
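The tuning Ryan describes boils down to a couple of sysctl(8) invocations. A sketch only: the 8192 value is just the figure from his test box, not a general recommendation, and this is only worth trying when every NIC shows LWM/CWM/HWM in 'systat mbufs' (i.e. its driver uses MCLGETI).

```shell
# Check the current IP input queue length (default 256) and drop counter.
sysctl net.inet.ip.ifq.maxlen
sysctl net.inet.ip.ifq.drops

# Raise the queue length, then watch whether ifq.drops keeps incrementing
# under sustained load; back off if it buys nothing.
sysctl net.inet.ip.ifq.maxlen=8192
```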
Re: network bandwith with em(4)
On Wed, 2 Mar 2011 21:52:03 +0900 Ryan McBride mcbr...@openbsd.org wrote:

| On Mon, Feb 28, 2011 at 12:49:01PM +0100, Manuel Guesdon wrote:
| | OK. Anyway NIC buffers restrict buffered packets number. But the problem
| | remain: why a (for exemple) dual Xeon E5520@2.27GHz with Intel PRO/1000
| | (82576) can't route 150kpps without Ierr :-)
| | http://www.oxymium.net/tmp/core3-dmesg
|
| I've done some more comprehensive testing and talked to some other
| developers, and it seems that 150kpps is in the range of what is
| expected for such hardware with an unoptimized install.

Thank you for the help!

| One thing that seems to have a big performance impact is
| net.inet.ip.ifq.maxlen. If and only if your network cards are all
| supported by MCLGETI (ie, they show LWM/CWM/HWM values in 'systat
| mbufs'), you can try increasing ifq.maxlen until you don't see
| net.inet.ip.ifq.drops incrementing anymore under constant load.

Yes, all my NIC interfaces have LWM/CWM/HWM values:

IFACE   LIVELOCKS  SIZE  ALIVE  LWM  HWM  CWM
System 256 83771 5502 2k 1601252
em0          37    2k      4    4  256    4
em1         258    2k      4    4  256    4
em2      372751    2k      7    4  256    7
em3        8258    2k      4    4  256    4
em4       25072    2k     63    4  256   63
em5        3658    2k      8    4  256    8
em6      501288    2k     24    4  256   24
em7          22    2k      4    4  256    4
em8       36551    2k     23    4  256   23
em9       52053    2k      5    4  256    4

I already increased it to 2048 some time ago with a good effect on ifq.drops, but even when ifq.drops doesn't increase, I still have Ierrs on the interfaces (I've just verified this right now) :-)

I made some changes to em(4) some time ago to dump the card stats with a -debug option, and it gives me stuff like this:

---
em4: Dropped PKTS = 0
em4: Excessive collisions = 0
em4: Symbol errors = 0
em4: Sequence errors = 0
em4: Defer count = 3938
em4: Missed Packets = 17728103
em4: Receive No Buffers = 21687370
em4: Receive Length Errors = 0
em4: Receive errors = 0
em4: Crc errors = 0
em4: Alignment errors = 0
em4: Carrier extension errors = 0
em4: RX overruns = 1456725
em4: watchdog timeouts = 0
em4: XON Rcvd = 31813
em4: XON Xmtd = 2304158
em4: XOFF Rcvd = 935928
em4: XOFF Xmtd = 20031226
em4: Good Packets Rcvd = 33772245185
em4: Good Packets Xmtd = 20662758161
---
em4: Dropped PKTS = 0
em4: Excessive collisions = 0
em4: Symbol errors = 0
em4: Sequence errors = 0
em4: Defer count = 3938
em4: Missed Packets = 17728457
em4: Receive No Buffers = 21687421
em4: Receive Length Errors = 0
em4: Receive errors = 0
em4: Crc errors = 0
em4: Alignment errors = 0
em4: Carrier extension errors = 0
em4: RX overruns = 1456730
em4: watchdog timeouts = 0
em4: XON Rcvd = 31813
em4: XON Xmtd = 2304166
em4: XOFF Rcvd = 935928
em4: XOFF Xmtd = 20031588
em4: Good Packets Rcvd = 33772265127
em4: Good Packets Xmtd = 20662759039
---

So if I understand this correctly, the card indicates that there are Missed Packets because the NIC sometimes has not enough buffer space to store them, which seems strange with 8000 int/s and a 40K buffer (40K for Rx, 24K for Tx, as seen in if_em.c).

One of my questions is how to know whether the system is heavily loaded. systat -s 2 vmstat gives me this information:

Proc:r d s w   Csw   Trp   Sys   Int   Sof   Flt
 14      149     2   509  2011898  31
3.5%Int 0.5%Sys 0.0%Usr 0.0%Nic 96.0%Idle

which makes me think that the system is really not very loaded, but I may be missing a point.

Manuel
--
Manuel Guesdon - OXYMIUM
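Manuel's puzzlement about the 40K FIFO can be put in numbers. A rough sketch, assuming gigabit line rate (the 40K figure is the Rx FIFO size he quotes from if_em.c):

```shell
# Time for a 40 KB RX FIFO to fill at 1 Gbps if nothing drains it:
# 40 * 1024 bytes * 8 bits per byte / 1000 bits per microsecond.
awk 'BEGIN { printf "%.1f\n", 40 * 1024 * 8 / 1000 }'   # prints 327.7 (microseconds)
```

At 8000 int/s (one interrupt every 125 microseconds) the FIFO alone should indeed rarely overflow at line rate, which is consistent with Claudio's point later in the thread that the DMA ring, not the FIFO, is the real limit.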
Re: network bandwith with em(4)
On Wed, Mar 02, 2011 at 08:34:02PM +0100, Manuel Guesdon wrote:

| On Wed, 2 Mar 2011 21:52:03 +0900 Ryan McBride mcbr...@openbsd.org wrote:
| | On Mon, Feb 28, 2011 at 12:49:01PM +0100, Manuel Guesdon wrote:
| | | OK. Anyway NIC buffers restrict buffered packets number. But the
| | | problem remain: why a (for exemple) dual Xeon E5520@2.27GHz with Intel
| | | PRO/1000 (82576) can't route 150kpps without Ierr :-)
| | | http://www.oxymium.net/tmp/core3-dmesg
| |
| | I've done some more comprehensive testing and talked to some other
| | developers, and it seems that 150kpps is in the range of what is
| | expected for such hardware with an unoptimized install.
|
| Thank you for the help !

Hmpf. My last tests were done with ix(4) and it performed way better. Not sure if something got back into em(4) that makes the driver slow or if it is something different.

| | One thing that seems to have a big performance impact is
| | net.inet.ip.ifq.maxlen. If and only if your network cards are all
| | supported by MCLGETI (ie, they show LWM/CWM/HWM values in 'systat
| | mbufs'), you can try increasing ifq.maxlen until you don't see
| | net.inet.ip.ifq.drops incrementing anymore under constant load.
|
| Yes all my nic interfaces have LWM/CWM/HWM values:
| IFACE   LIVELOCKS  SIZE  ALIVE  LWM  HWM  CWM
| System 256 83771 5502 2k 1601252
| em0          37    2k      4    4  256    4
| em1         258    2k      4    4  256    4
| em2      372751    2k      7    4  256    7
| em3        8258    2k      4    4  256    4
| em4       25072    2k     63    4  256   63
| em5        3658    2k      8    4  256    8
| em6      501288    2k     24    4  256   24
| em7          22    2k      4    4  256    4
| em8       36551    2k     23    4  256   23
| em9       52053    2k      5    4  256    4

Woohoo. That is a lot of livelocks you hit. In other words you are losing ticks by something spinning too long in the kernel. Interfaces with a very low CWM but a high pps rate are the ones you need to investigate.

Additionally I would like to see your netstat -m and vmstat -m output. If I see it right you have 83771 mbufs allocated in your system. This sounds like a serious mbuf leak and could actually be the reason for your bad performance.
It is very well possible that most of your buffer allocations fail causing the tiny rings and suboptimal performance. I've already increased to 2048 some time ago with good effect on ifq.drops but even when ifq.drops doesn't increase, I still have Ierrs on interfaces (I've just verified this right now) :-) Having some Ierrs is not a big issue always put them in perspective with the number of packets received. e.g. em6 1500 Link 00:30:48:9c:3a:80 72007980648 143035 62166589667 0 0 This interface had 143035 Ierrs but it also passed 72 billion packets so this is far less then 1% and not a problem. I've made some change to em some time ago to dump card stats with -debug option and it give me this stuff like this: --- em4: Dropped PKTS = 0 em4: Excessive collisions = 0 em4: Symbol errors = 0 em4: Sequence errors = 0 em4: Defer count = 3938 em4: Missed Packets = 17728103 em4: Receive No Buffers = 21687370 em4: Receive Length Errors = 0 em4: Receive errors = 0 em4: Crc errors = 0 em4: Alignment errors = 0 em4: Carrier extension errors = 0 em4: RX overruns = 1456725 em4: watchdog timeouts = 0 em4: XON Rcvd = 31813 em4: XON Xmtd = 2304158 em4: XOFF Rcvd = 935928 em4: XOFF Xmtd = 20031226 em4: Good Packets Rcvd = 33772245185 em4: Good Packets Xmtd = 20662758161 --- em4: Dropped PKTS = 0 em4: Excessive collisions = 0 em4: Symbol errors = 0 em4: Sequence errors = 0 em4: Defer count = 3938 em4: Missed Packets = 17728457 em4: Receive No Buffers = 21687421 em4: Receive Length Errors = 0 em4: Receive errors = 0 em4: Crc errors = 0 em4: Alignment errors = 0 em4: Carrier extension errors = 0 em4: RX overruns = 1456730 em4: watchdog timeouts = 0 em4: XON Rcvd = 31813 em4: XON Xmtd = 2304166 em4: XOFF Rcvd = 935928 em4: XOFF Xmtd = 20031588 em4: Good Packets Rcvd = 33772265127 em4: Good Packets Xmtd = 20662759039 So If I well understand this, the card indicate that there are Missed Packets because the nic have sometime not enough buffer space to store them which seems stange with 8000 
int/s and a 40K buffer (40K for Rx, 24K for Tx, as seen in if_em.c).

The FIFO on the card doesn't matter that much. The problem is the DMA ring and the number of slots on the ring that are actually usable; this is the CWM in the systat mbufs output. MCLGETI() reduces the buffers on the ring to limit the work getting into the system over a specific network card.

| One of my questions is how to know that the system is
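Claudio's "put them in perspective" rule of thumb is easy to apply. A sketch using the em6 numbers he quotes (143035 Ierrs against roughly 72 billion input packets):

```shell
# Express interface input errors as a percentage of input packets:
awk 'BEGIN { printf "%.5f%%\n", 143035 / 72007980648 * 100 }'   # prints 0.00020%
```

Anything orders of magnitude below 1% is background noise rather than a driver problem, which is the point being made here.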
Re: network bandwith with em(4)
On Wed, 2 Mar 2011 21:12:24 +0100 Claudio Jeker cje...@diehard.n-r-g.com wrote:

| | | One thing that seems to have a big performance impact is
| | | net.inet.ip.ifq.maxlen. If and only if your network cards are all
| | | supported by MCLGETI (ie, they show LWM/CWM/HWM values in 'systat
| | | mbufs'), you can try increasing ifq.maxlen until you don't see
| | | net.inet.ip.ifq.drops incrementing anymore under constant load.
| |
| | Yes all my nic interfaces have LWM/CWM/HWM values:
| | IFACE   LIVELOCKS  SIZE  ALIVE  LWM  HWM  CWM
| | System 256 83771 5502 2k 1601252
| | em0          37    2k      4    4  256    4
| | em1         258    2k      4    4  256    4
| | em2      372751    2k      7    4  256    7
| | em3        8258    2k      4    4  256    4
| | em4       25072    2k     63    4  256   63
| | em5        3658    2k      8    4  256    8
| | em6      501288    2k     24    4  256   24
| | em7          22    2k      4    4  256    4
| | em8       36551    2k     23    4  256   23
| | em9       52053    2k      5    4  256    4
|
| Woohoo. That is a lot of livelocks you hit. In other words you are losing
| ticks by something spinning too long in the kernel. Interfaces with a very
| low CWM but a high pps rate are the ones you need to investigate.

Hum, OK. A strange thing about the livelocks is the big difference between, for example, em2 and em4:

Name  Mtu   Network      Ipkts Ierrs       Opkts        Oerrs  Colls
em2   1500  Link         886803460042899  6562765482   0      0
em2   1500  fe80::%em2/  886803460042899  6562765482   0      0
em4   1500  Link         33934108692 19371393  20672882997  0  0
em4   1500  fe80::%em4/  33934108692 19371393  20672882997  0  0

There are more livelocks on em2 but fewer packets (or maybe the counters were reset to 0 after reaching their maximum value).

| Additionally I would like to see your netstat -m and vmstat -m output.
netstat -m:
18472 mbufs in use:
	18449 mbufs allocated to data
	16 mbufs allocated to packet headers
	7 mbufs allocated to socket names and addresses
331/4188/6144 mbuf 2048 byte clusters in use (current/peak/max)
0/8/6144 mbuf 4096 byte clusters in use (current/peak/max)
0/8/6144 mbuf 8192 byte clusters in use (current/peak/max)
0/8/6144 mbuf 9216 byte clusters in use (current/peak/max)
0/8/6144 mbuf 12288 byte clusters in use (current/peak/max)
0/8/6144 mbuf 16384 byte clusters in use (current/peak/max)
0/8/6144 mbuf 65536 byte clusters in use (current/peak/max)
30704 Kbytes allocated to network (70% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

vmstat -m:
Memory statistics by bucket size
Size In Use Free Requests HighWater Couldfree
16 113578 195414 324140581280 6712
32 378705687 74930489 640 6824
64 7707869 11878746 320 27074
128 11411 45 36424677 160 78
256 7875973 328666338 80 60487950
512 1951 656017929 40 413368
1024 3311771947159 20 880831
2048 57 3 496398 10 0
4096 5164 15 260948 5 166561
8192 36 5 226431 5 18240
16384 12 0 8279177 5 0
32768 5 0 11 5 0
65536 2 0 2 5 0

Memory usage type by bucket size
Size Type(s)
16 devbuf, pcb, routetbl, sysctl, UFS mount, dirhash, ACPI, exec, xform_data, VM swap, UVM amap, UVM aobj, USB, USB device, temp
32 devbuf, pcb, routetbl, ifaddr, UFS mount, sem, dirhash, ACPI, ip_moptions, in_multi, exec, pfkey data, xform_data, UVM amap, USB, temp
64 devbuf, pcb, routetbl, fragtbl, ifaddr, vnodes, UFS mount, dirhash, ACPI, proc, VFS cluster, in_multi, ether_multi, VM swap, UVM amap, USB, USB device, NDP, temp
128 devbuf, pcb, routetbl, fragtbl, ifaddr, mount, sem, dirhash, ACPI, VFS cluster, MFS node, NFS srvsock, ip_moptions, ttys, pfkey data, UVM amap, USB, USB device, NDP, temp
256 devbuf, routetbl, ifaddr, ioctlops, iov, vnodes, shm, VM map, dirhash, ACPI, ip_moptions, exec, UVM amap, USB, USB device, ip6_options, temp
512 devbuf, ifaddr, sysctl, ioctlops, iov, vnodes, dirhash, file desc, NFS daemon, ttys, newblk, UVM amap, USB, USB device, temp
1024 devbuf, pcb, sysctl, ioctlops, iov, mount, UFS mount, shm, ACPI, proc, ttys, exec, UVM amap, USB HC, crypto data,
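Claudio's advice earlier in the thread — raise net.inet.ip.ifq.maxlen only while ifq.drops keeps climbing under load — can be sketched as a small decision step. The counter values below are hypothetical; on a real OpenBSD box the two samples would come from `sysctl -n net.inet.ip.ifq.drops` taken some time apart:

```shell
# Sketch of the ifq.maxlen tuning loop described above. The sample values
# are made up for illustration; on OpenBSD they would come from:
#   sysctl -n net.inet.ip.ifq.drops   (sampled twice under constant load)
drops_before=41960        # first sample (hypothetical)
drops_after=42110         # second sample, taken later (hypothetical)
maxlen=256                # current net.inet.ip.ifq.maxlen (OpenBSD default)
if [ "$drops_after" -gt "$drops_before" ]; then
    # still dropping: try a larger queue, e.g. double it
    # (on the real box: sysctl net.inet.ip.ifq.maxlen=$((maxlen * 2)))
    maxlen=$((maxlen * 2))
fi
echo "new ifq.maxlen candidate: $maxlen"
```

Repeat until drops stop incrementing; per the thread, this is only safe when every NIC is MCLGETI-capable (shows LWM/CWM/HWM in `systat mbufs`).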
Re: network bandwith with em(4)
Claudio Jeker wrote: On Wed, Mar 02, 2011 at 08:34:02PM +0100, Manuel Guesdon wrote: On Wed, 2 Mar 2011 21:52:03 +0900 Ryan McBride mcbr...@openbsd.org wrote: | On Mon, Feb 28, 2011 at 12:49:01PM +0100, Manuel Guesdon wrote: | OK. Anyway, NIC buffers restrict the number of buffered packets. But the problem | remains: why can't a (for example) dual Xeon E5520@2.27GHz with Intel PRO/1000 | (82576) route 150kpps without Ierr :-) | http://www.oxymium.net/tmp/core3-dmesg | | I've done some more comprehensive testing and talked to some other | developers, and it seems that 150kpps is in the range of what is | expected for such hardware with an unoptimized install. Thank you for the help! Hmpf. My last tests were done with ix(4) and it performed way better. Not sure if something got back into em(4) that makes the driver slow, or if it is something different. According to http://www.oxymium.net/tmp/core3-dmesg, interrupts are shared heavily (see the dmesg parts below). Most problematic (wrt livelocks), em6 uses apic 9 int 15, which is shared with other devices including PCIe bridges. Is there any possibility for a PCIe bridge to conflict with a slave device if the interrupt is shared, with excessive livelocks as a result? How are bridge interrupts handled inside the kernel?
Alexey

ppb6 at pci6 dev 1 function 0 PLX PEX 8533 rev 0xaa: apic 9 int 13 (irq 11)
pci7 at ppb6 bus 7
ppb0 at pci0 dev 1 function 0 Intel X58 PCIE rev 0x13
pci1 at ppb0 bus 1
em0 at pci1 dev 0 function 0 Intel PRO/1000 (82576) rev 0x01: apic 9 int 4 (irq 10), address 00:30:48:9f:17:52
em1 at pci1 dev 0 function 1 Intel PRO/1000 (82576) rev 0x01: apic 9 int 16 (irq 11), address 00:30:48:9f:17:53
ppb7 at pci6 dev 8 function 0 PLX PEX 8533 rev 0xaa: apic 9 int 6 (irq 10)
pci8 at ppb7 bus 8
ppb9 at pci9 dev 1 function 0 PLX PEX 8518 rev 0xac: apic 9 int 13 (irq 11)
pci10 at ppb9 bus 10
em2 at pci10 dev 0 function 0 Intel PRO/1000 (82576) rev 0x01: apic 9 int 13 (irq 11), address 00:25:90:05:53:3c
em3 at pci10 dev 0 function 1 Intel PRO/1000 (82576) rev 0x01: apic 9 int 15 (irq 15), address 00:25:90:05:53:3d
ppb10 at pci9 dev 2 function 0 PLX PEX 8518 rev 0xac: apic 9 int 15 (irq 15)
pci11 at ppb10 bus 11
em4 at pci11 dev 0 function 0 Intel PRO/1000 (82576) rev 0x01: apic 9 int 15 (irq 15), address 00:25:90:05:53:3e
em5 at pci11 dev 0 function 1 Intel PRO/1000 (82576) rev 0x01: apic 9 int 14 (irq 14), address 00:25:90:05:53:3f
ppb11 at pci6 dev 9 function 0 PLX PEX 8533 rev 0xaa: apic 9 int 13 (irq 11)
pci12 at ppb11 bus 12
ppb13 at pci13 dev 1 function 0 PLX PEX 8518 rev 0xac: apic 9 int 15 (irq 15)
pci14 at ppb13 bus 14
em6 at pci14 dev 0 function 0 Intel PRO/1000 (82576) rev 0x01: apic 9 int 15 (irq 15), address 00:25:90:05:51:d8
em7 at pci14 dev 0 function 1 Intel PRO/1000 (82576) rev 0x01: apic 9 int 14 (irq 14), address 00:25:90:05:51:d9
ppb14 at pci13 dev 2 function 0 PLX PEX 8518 rev 0xac: apic 9 int 14 (irq 14)
pci15 at ppb14 bus 15
em8 at pci15 dev 0 function 0 Intel PRO/1000 (82576) rev 0x01: apic 9 int 14 (irq 14), address 00:25:90:05:51:da
em9 at pci15 dev 0 function 1 Intel PRO/1000 (82576) rev 0x01: apic 9 int 6 (irq 10), address 00:25:90:05:51:db
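The sharing Alexey points out can be quantified by counting how many devices sit on one interrupt line. A small sketch — the sample lines are abbreviated copies of the dmesg above; on a live box you would grep /var/run/dmesg.boot instead:

```shell
# Count devices attached to the contested line (apic 9 int 15) from a few
# abbreviated dmesg lines; feed /var/run/dmesg.boot on a real system.
dmesg_sample='em3 at pci10 dev 0 function 1: apic 9 int 15 (irq 15)
ppb10 at pci9 dev 2 function 0: apic 9 int 15 (irq 15)
em4 at pci11 dev 0 function 0: apic 9 int 15 (irq 15)
em6 at pci14 dev 0 function 0: apic 9 int 15 (irq 15)'
shared=$(printf '%s\n' "$dmesg_sample" | grep -c 'apic 9 int 15')
echo "devices on apic 9 int 15: $shared"
```

Two em ports plus a PCIe bridge on the same line means every interrupt on that line costs every sharer a handler invocation.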
Re: network bandwith with em(4)
On 2011-02-28, Manuel Guesdon ml+openbsd.m...@oxymium.net wrote: http://www.oxymium.net/tmp/core3-dmesg ipmi0 at mainbus0: version 2.0 interface KCS iobase 0xca2/2 spacing 1 ipmi is disabled in GENERIC. have you tried without it?
Re: network bandwith with em(4)
On Thu, 24 Feb 2011 22:03:22 -0700 (MST) Theo de Raadt dera...@cvs.openbsd.org wrote: | We've got the same problems (on a router, not a firewall). Increasing | MAX_INTS_PER_SEC to 24000 increased bandwidth and lowered packet loss. | Our cards are Intel PRO/1000 (82576) and Intel PRO/1000 FP | (82576). | | Did you try to increase the number of descriptors? | #define EM_MAX_TXD 256 | #define EM_MAX_RXD 256 | | I've tried up to 2048 (and with MAX_INTS_PER_SEC = 16000) but it looks | worse. | | Say you increase this. | | That means on a single interrupt, the handler could be forced to handle | around 2000 packets. | | Nothing else will happen on the machine during that period. | | Can you say 'interrupt latency increase', boys and girls? OK. Anyway, NIC buffers restrict the number of buffered packets. But the problem remains: why can't a (for example) dual Xeon E5520@2.27GHz with Intel PRO/1000 (82576) route 150kpps without Ierr :-) http://www.oxymium.net/tmp/core3-dmesg Manuel
Re: network bandwith with em(4)
On Mon, Feb 28, 2011 at 12:49:01PM +0100, Manuel Guesdon wrote: OK. Anyway, NIC buffers restrict the number of buffered packets. But the problem remains: why can't a (for example) dual Xeon E5520@2.27GHz with Intel PRO/1000 (82576) route 150kpps without Ierr :-) http://www.oxymium.net/tmp/core3-dmesg Turn off hyperthreading, run a uniprocessor kernel rather than bsd.mp. I can't immediately tell if you're running i386 or amd64, but i386 will probably be better. There may be something else going on here, because 150kpps should be trivial for a box like this, but the advice above will certainly improve your situation. (Yes, it will hurt to know that 7 of your cores are doing nothing. Too bad, they're just slowing you down now.)
Re: network bandwith with em(4)
On Mon, 28 Feb 2011 21:29:01 +0900 Ryan McBride mcbr...@openbsd.org wrote: | On Mon, Feb 28, 2011 at 12:49:01PM +0100, Manuel Guesdon wrote: | OK. Anyway, NIC buffers restrict the number of buffered packets. But the problem | remains: why can't a (for example) dual Xeon E5520@2.27GHz with Intel PRO/1000 | (82576) route 150kpps without Ierr :-) | http://www.oxymium.net/tmp/core3-dmesg | | Turn off hyperthreading, run a uniprocessor kernel rather than bsd.mp. | I can't immediately tell if you're running i386 or amd64, but i386 will | probably be better. amd64 currently. | There may be something else going on here, because 150kpps should be | trivial for a box like this, but the advice above will certainly improve | your situation. Thank you! I'll plan to test that! | (Yes, it will hurt to know that 7 of your cores are doing nothing. Too | bad, they're just slowing you down now.) Hum, I prefer to see it working well with only 1 core instead of working badly using 8 cores :-) Manuel -- __ Manuel Guesdon - OXYMIUM
Re: network bandwith with em(4)
On Sat, 26 Feb 2011 00:23:36 +0900, Ryan McBride mcbr...@openbsd.org wrote: How about a _full_ dmesg, so someone can take a wild guess at what your machine is capable of? full dmesg: http://user.lamaiziere.net/patrick/dmesg-open48.txt The box is a Dell R610 server. This box should be able to fill a gigabit of regular TCP traffic (1500 MTU) without any problem. Double-check your testing procedures. I will test this. I have some additional comments/questions though: 1) you probably don't want to run bsd.mp on a firewall, it'll hurt you more than it helps, unless you have significant CPU-bound userland stuff going on, for example antivirus scanning of email. I've tried with an sp kernel (amd64); it does not seem to change anything. 2) You may get better performance running i386. I will try, but I do not expect a lot of difference in the Ierr rate. 3) Besides the em driver changes you've mentioned, is the source code you're building the kernel from clean OPENBSD_4_8 -stable, or something else (4.8-current from after the 4.8 release, for example)? It's a clean release 4.8/amd64, with the 4.8 errata applied. Thanks, regards.
Re: network bandwith with em(4)
On 28/02/2011 16:51, Patrick Lamaiziere wrote: On Sat, 26 Feb 2011 00:23:36 +0900, Ryan McBride mcbr...@openbsd.org wrote: How about a _full_ dmesg, so someone can take a wild guess at what your machine is capable of? full dmesg: http://user.lamaiziere.net/patrick/dmesg-open48.txt The box is a Dell R610 server. This box should be able to fill a gigabit of regular TCP traffic (1500 MTU) without any problem. Double-check your testing procedures. I will test this. As I said earlier, with almost the same setup but bnx(4) instead of em(4) (Dell R510 with a single Intel X5660), we can send at gigabit full duplex with only around 25% interrupt CPU. It looks like the R610 has 4x bnx(4) interfaces (Broadcom BCM 5709); maybe you can try to use them, just for testing purposes. I have some additional comments/questions though: 1) you probably don't want to run bsd.mp on a firewall, it'll hurt you more than it helps, unless you have significant CPU-bound userland stuff going on, for example antivirus scanning of email. I've tried with an sp kernel (amd64); it does not seem to change anything. 2) You may get better performance running i386. I will try, but I do not expect a lot of difference in the Ierr rate. 3) Besides the em driver changes you've mentioned, is the source code you're building the kernel from clean OPENBSD_4_8 -stable, or something else (4.8-current from after the 4.8 release, for example)? It's a clean release 4.8/amd64, with the 4.8 errata applied. Thanks, regards.
Re: network bandwith with em(4)
OK. Anyway, NIC buffers restrict the number of buffered packets. But the problem remains: why can't a (for example) dual Xeon E5520@2.27GHz with Intel PRO/1000 (82576) route 150kpps without Ierr :-) http://www.oxymium.net/tmp/core3-dmesg Just an idea, but it may very well have something to do with this: http://www.openbsd.org/want.html Specifically this part: # Intel 82576 SFP and 82580 based Gigabit Ethernet devices for improving hardware support in em(4). Needed in Hannover, Germany. Contact j...@openbsd.org. I assume that if the want is still listed for that chipset, and your card uses the same chipset, then the support for it is not as good as it could be? Just a thought, and I could well be way off. Food for thought. Best, Daniel
Re: network bandwith with em(4)
On Monday, 28 February 2011 23:00:10, Daniel Ouellet wrote: OK. Anyway, NIC buffers restrict the number of buffered packets. But the problem remains: why can't a (for example) dual Xeon E5520@2.27GHz with Intel PRO/1000 (82576) route 150kpps without Ierr :-) http://www.oxymium.net/tmp/core3-dmesg Just an idea, but it may very well have something to do with this: http://www.openbsd.org/want.html Specifically this part: # Intel 82576 SFP and 82580 based Gigabit Ethernet devices for improving hardware support in em(4). Needed in Hannover, Germany. Contact j...@openbsd.org. I've sent a 4-port network card (Intel NIC I340-T4, 82580 ethernet chipset) to jsg, which he received on 2011-01-18. -- //fredan
Re: network bandwith with em(4)
On 02/24/11 19:28, RLW wrote: On 2011-02-24 12:11, Patrick Lamaiziere wrote: On Wed, 23 Feb 2011 22:09:18 +0100, Manuel Guesdon ml+openbsd.m...@oxymium.net wrote: | Did you try to increase the number of descriptors? | #define EM_MAX_TXD 256 | #define EM_MAX_RXD 256 | | I've tried up to 2048 (and with MAX_INTS_PER_SEC = 16000) but it looks | worse. Thank you! I'll investigate this! As I said, it is worse here. The load is increased and I lose around 50 Mbits of bandwidth. I was curious if you've made some tests on this. ok, so the conclusion might be that if one wants to have transfers bigger than 300mbit/s on em(4), one should tune the em(4) driver source code? I have firewalls with more than 300Mbit/s and standard GENERIC.MP.
Re: network bandwith with em(4)
On Thu, Feb 24 2011 at 28:19, RLW wrote: [...] ok, so the conclusion might be that if one wants to have transfers bigger than 300mbit/s on em(4), one should tune the em(4) driver source code? False. Here are the tests I've done with a packet generator: http://marc.info/?l=openbsd-misc&m=129534605406967&w=2 Claer
Re: network bandwith with em(4)
On Fri, 25 Feb 2011 08:41:20 +0900, Ryan McBride mcbr...@openbsd.org wrote: On Wed, Feb 23, 2011 at 06:07:16PM +0100, Patrick Lamaiziere wrote: I log the congestion counter (every 10s) and there are at most 3 or 4 congestions per day. I don't think the bottleneck is pf. The congestion counter doesn't directly mean you have a bottleneck in PF; it's triggered by the IP input queue being full, and could indicate a bottleneck in other places as well, which PF tries to help out with by dropping packets earlier. Interface errors? Quite a lot. The output of `systat mbufs` is worth looking at, in particular the figure for LIVELOCKS, and the LWM/CWM figures for the interface(s) in question. If the livelocks value is very high, and the LWM/CWM numbers are very small, it is likely that the MCLGETI interface is protecting your system from being completely flattened by forcing the em card to drop packets (supported by your statement that the error rate is high). If it's bad enough, MCLGETI will be so effective that the pf congestion counter will not get incremented.

systat mbufs:
IFACE LIVELOCKS SIZE ALIVE LWM HWM CWM
System 256 375 149
2k 240 1125
em0 1772 2k 80 4 256 80
em1 11 2k 5 4 256 5
em2 293 2k 110 4 256 110
em3
em4 18 2k 11 4 256 11
em5 10 2k 12 4 256 12
em6 14 2k 5 4 256 5
bnx0 3 2k 4 2 510 4
bnx1 1 2k 4 2 510 4
bnx3 1 2k 2 2 510 2

You mentioned the following in your initial email: #define MAX_INTS_PER_SEC 8000 Do you think I can increase this value? The interrupt rate of the machine is at max ~60% (top). Increasing this value will likely hurt you. 60% interrupt rate sounds about right to me for a firewall system that is running at full tilt; 100% interrupt is very bad: if your system spends all its cycles servicing interrupts it will not do very much of anything useful.

dmesg: em0 at pci5 dev 0 function 0 Intel PRO/1000 QP (82571EB) rev 0x06: apic 1 int 13 (irq 14), address 00:15:17:ed:98:9d em4 at pci9 dev 0 function 0 Intel PRO/1000 QP (82575GB) rev 0x02: apic 1 int 23 (irq 11), address 00:1b:21:38:e0:80 How about a _full_ dmesg, so someone can take a wild guess at what your machine is capable of? -Ryan -- Patrick Lamaizière, CRI Université de Rennes 1, Tél: 02 23 23 71 45
Re: network bandwith with em(4)
On Fri, 25 Feb 2011 13:51:32 +0100, Patrick Lamaiziere patf...@davenulle.org wrote: (oops, pushed the wrong button) How about a _full_ dmesg, so someone can take a wild guess at what your machine is capable of? full dmesg: http://user.lamaiziere.net/patrick/dmesg-open48.txt The box is a Dell R610 server. Thanks, regards.
Re: network bandwith with em(4)
On Fri, 25 Feb 2011 13:51:32 +0100, Patrick Lamaiziere patf...@davenulle.org wrote: systat mbufs: IFACE LIVELOCKS SIZE ALIVE LWM HWM CWM What do these counters mean? Thanks.
Re: network bandwith with em(4)
On Tue, 22 Feb 2011 18:09:32 +0100, Patrick Lamaiziere patf...@davenulle.org wrote: (4.8/amd64) Hello, I'm using two Intel PRO/1000 quad-port (gigabit) ethernet cards on a firewall (one fiber and one copper). The problem is that we don't get more than ~320 Mbit/s of bandwidth between the internal networks and the internet (gigabit). As far as I can see, under load there are a number of Ierr on the interface connected to the internet (between 1% and 5%). Also, the interrupt rate on this card is around ~7500 (using systat). In the em(4) driver, there is a limitation of the interrupt rate at 8000/s. ... Well, I've made some tests, and increasing the number of interrupts or the number of RX descriptors does not help to reduce the Ierr count or to increase the bandwidth. So I don't know where the problem is... Do you think the hardware used is not powerful enough? (dmesg: http://user.lamaiziere.net/patrick/dmesg-openbsd4.8.txt). The box is a router/firewall; there are 6 interfaces on the box, one connected to the internet (the busiest interface). One is connected to the LAN (very busy too). The others are far less busy. To give an idea, this box replaces an old Cisco 7204, which tops out at 200 Mbit/s, no more. I would be happy to know which kind of hardware you are using to build a gigabit router with good performance. Thanks to all. Regards.
Re: network bandwith with em(4)
On Fri, Feb 25, 2011 at 02:05:30PM +0100, Patrick Lamaiziere wrote: On Fri, 25 Feb 2011 13:51:32 +0100, Patrick Lamaiziere patf...@davenulle.org wrote: (oops, pushed the wrong button) How about a _full_ dmesg, so someone can take a wild guess at what your machine is capable of? full dmesg: http://user.lamaiziere.net/patrick/dmesg-open48.txt The box is a Dell R610 server. This box should be able to fill a gigabit of regular TCP traffic (1500 MTU) without any problem. Double-check your testing procedures. I have some additional comments/questions though: 1) you probably don't want to run bsd.mp on a firewall, it'll hurt you more than it helps, unless you have significant CPU-bound userland stuff going on, for example antivirus scanning of email. 2) You may get better performance running i386. 3) Besides the em driver changes you've mentioned, is the source code you're building the kernel from clean OPENBSD_4_8 -stable, or something else (4.8-current from after the 4.8 release, for example)?
Re: network bandwith with em(4)
Hi, On Fri, 25 Feb 2011 08:41:20 +0900 Ryan McBride mcbr...@openbsd.org wrote: .. | The output of `systat mbufs` is worth looking at, in particular the | figure for LIVELOCKS, and the LWM/CWM figures for the interface(s) in | question. | | If the livelocks value is very high, and the LWM/CWM numbers are very | small, Thank you for your help, Ryan. It seems I'm in this situation:

5 users, Load 0.17 0.15 0.10, (1-48 of 58), Fri Feb 25 20:27:44 2011
IFACE LIVELOCKS SIZE ALIVE LWM HWM CWM
System 256 820505446
2k 2571252
lo0
em0 34 2k 4 4 256 4
em1 257 2k 4 4 256 4
em2 338382 2k 7 4 256 7
em3 8258 2k 4 4 256 4
em4 22635 2k 48 4 256 48
em5 3470 2k 6 4 256 6
em6 458241 2k 28 4 256 28
em7 8 2k 4 4 256 4
em8 33232 2k 50 4 256 50
em9 46878 2k 4 4 256 4

systat -s 2 vmstat (full-screen dump; the legible figures):
5 users, Load 0.22 0.17 0.10, Fri Feb 25 20:28:18 2011
Interrupts: 25589 total (1600 clock, 11 ipi, 1 em0, 2691 em2, 6778 em4, 382 em5, 7328 em6, 6724 em8, 74 em9)
CPU: 3.2% Int, 0.1% Sys, 0.0% Usr, 0.0% Nic, 96.8% Idle
81542 IPKTS, 78860 OPKTS

(it's on a device with MAX_INTS_PER_SEC=8000)

| it is likely that the MCLGETI interface is protecting your system | from being completely flattened by forcing the em card to drop packets | (supported by your statement that the error rate is high). | | How about a _full_ dmesg, so someone can take a wild guess at what | your machine is capable of? http://www.oxymium.net/tmp/core3-dmesg This device is not overloaded but it drops packets :-( Manuel
Re: network bandwith with em(4)
On Wed, 23 Feb 2011 22:09:18 +0100, Manuel Guesdon ml+openbsd.m...@oxymium.net wrote: | Did you try to increase the number of descriptors? | #define EM_MAX_TXD 256 | #define EM_MAX_RXD 256 | | I've tried up to 2048 (and with MAX_INTS_PER_SEC = 16000) but it looks | worse. Thank you! I'll investigate this! As I said, it is worse here. The load is increased and I lose around 50 Mbits of bandwidth. I was curious if you've made some tests on this.
Re: network bandwith with em(4)
On 2011-02-24 12:11, Patrick Lamaiziere wrote: On Wed, 23 Feb 2011 22:09:18 +0100, Manuel Guesdon ml+openbsd.m...@oxymium.net wrote: | Did you try to increase the number of descriptors? | #define EM_MAX_TXD 256 | #define EM_MAX_RXD 256 | | I've tried up to 2048 (and with MAX_INTS_PER_SEC = 16000) but it looks | worse. Thank you! I'll investigate this! As I said, it is worse here. The load is increased and I lose around 50 Mbits of bandwidth. I was curious if you've made some tests on this. ok, so the conclusion might be that if one wants to have transfers bigger than 300mbit/s on em(4), one should tune the em(4) driver source code? best regards, RLW
Re: network bandwith with em(4)
On Wed, Feb 23, 2011 at 06:07:16PM +0100, Patrick Lamaiziere wrote: I log the congestion counter (every 10s) and there are at most 3 or 4 congestions per day. I don't think the bottleneck is pf. The congestion counter doesn't directly mean you have a bottleneck in PF; it's triggered by the IP input queue being full, and could indicate a bottleneck in other places as well, which PF tries to help out with by dropping packets earlier. Interface errors? Quite a lot. The output of `systat mbufs` is worth looking at, in particular the figure for LIVELOCKS, and the LWM/CWM figures for the interface(s) in question. If the livelocks value is very high, and the LWM/CWM numbers are very small, it is likely that the MCLGETI interface is protecting your system from being completely flattened by forcing the em card to drop packets (supported by your statement that the error rate is high). If it's bad enough, MCLGETI will be so effective that the pf congestion counter will not get incremented. You mentioned the following in your initial email: #define MAX_INTS_PER_SEC 8000 Do you think I can increase this value? The interrupt rate of the machine is at max ~60% (top). Increasing this value will likely hurt you. 60% interrupt rate sounds about right to me for a firewall system that is running at full tilt; 100% interrupt is very bad: if your system spends all its cycles servicing interrupts it will not do very much of anything useful. dmesg: em0 at pci5 dev 0 function 0 Intel PRO/1000 QP (82571EB) rev 0x06: apic 1 int 13 (irq 14), address 00:15:17:ed:98:9d em4 at pci9 dev 0 function 0 Intel PRO/1000 QP (82575GB) rev 0x02: apic 1 int 23 (irq 11), address 00:1b:21:38:e0:80 How about a _full_ dmesg, so someone can take a wild guess at what your machine is capable of? -Ryan
Re: network bandwith with em(4)
id like to reiterate ryans advice to have a look at the systat mbuf output. as he said, mclgeti will try to protect the host by restricting the number of packets placed on the rx rings. it turns out you dont need (or cant use) a lot of packets on the ring, so bumping the ring size is a useless tweak. mclgeti simply wont let you fill all those descriptors. if you were allowed to fill all 2048 entries on your modified rings, that would just mean you spend more time in the interrupt handler pulling packets off these rings and freeing them immediately because you have no time to process them. ie, increasing the ring size would actually slow down your forwarding rate if mclgeti was disabled. cheers, dlg On 25/02/2011, at 9:41 AM, Ryan McBride wrote: On Wed, Feb 23, 2011 at 06:07:16PM +0100, Patrick Lamaiziere wrote: I log the congestion counter (every 10s) and there are at most 3 or 4 congestions per day. I don't think the bottleneck is pf. The congestion counter doesn't directly mean you have a bottleneck in PF; it's triggered by the IP input queue being full, and could indicate a bottleneck in other places as well, which PF tries to help out with by dropping packets earlier. Interface errors? Quite a lot. The output of `systat mbufs` is worth looking at, in particular the figure for LIVELOCKS, and the LWM/CWM figures for the interface(s) in question. If the livelocks value is very high, and the LWM/CWM numbers are very small, it is likely that the MCLGETI interface is protecting your system from being completely flattened by forcing the em card to drop packets (supported by your statement that the error rate is high). If it's bad enough, MCLGETI will be so effective that the pf congestion counter will not get incremented. You mentioned the following in your initial email: #define MAX_INTS_PER_SEC 8000 Do you think I can increase this value? The interrupt rate of the machine is at max ~60% (top). Increasing this value will likely hurt you. 60% interrupt rate sounds about right to me for a firewall system that is running at full tilt; 100% interrupt is very bad: if your system spends all its cycles servicing interrupts it will not do very much of anything useful. dmesg: em0 at pci5 dev 0 function 0 Intel PRO/1000 QP (82571EB) rev 0x06: apic 1 int 13 (irq 14), address 00:15:17:ed:98:9d em4 at pci9 dev 0 function 0 Intel PRO/1000 QP (82575GB) rev 0x02: apic 1 int 23 (irq 11), address 00:1b:21:38:e0:80 How about a _full_ dmesg, so someone can take a wild guess at what your machine is capable of? -Ryan
Re: network bandwith with em(4)
We've got the same problems (on a router, not a firewall). Increasing MAX_INTS_PER_SEC to 24000 increased bandwidth and lowered packet loss. Our cards are Intel PRO/1000 (82576) and Intel PRO/1000 FP (82576). Did you try to increase the number of descriptors? #define EM_MAX_TXD 256 #define EM_MAX_RXD 256 I've tried up to 2048 (and with MAX_INTS_PER_SEC = 16000) but it looks worse. Say you increase this. That means on a single interrupt, the handler could be forced to handle around 2000 packets. Nothing else will happen on the machine during that period. Can you say 'interrupt latency increase', boys and girls?
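The interrupt-latency warning can be put into rough numbers. The per-packet handler cost below is an assumed figure for illustration, not a measurement:

```shell
# Rough arithmetic behind the interrupt-latency warning: with 2048 RX
# descriptors, one interrupt may have to drain a full ring. Assuming a
# hypothetical 2 us of handler work per packet, that single interrupt
# keeps the CPU busy for about 4 ms, during which nothing else runs.
ring=2048
us_per_pkt=2                      # assumed per-packet cost (illustration only)
busy_us=$((ring * us_per_pkt))    # worst-case microseconds in one interrupt
echo "worst-case handler time: ${busy_us} us"
```

With the default 256-descriptor ring, the same worst case is an order of magnitude shorter, which is the point being made.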
Re: network bandwith with em(4)
On Tue, 22 Feb 2011 19:13:48 +0100, Manuel Guesdon ml+openbsd.m...@oxymium.net wrote: Hello, We've got the same problems (on a router, not a firewall). Increasing MAX_INTS_PER_SEC to 24000 increased bandwidth and lowered packet loss. Our cards are Intel PRO/1000 (82576) and Intel PRO/1000 FP (82576). Did you try to increase the number of descriptors? #define EM_MAX_TXD 256 #define EM_MAX_RXD 256 I've tried up to 2048 (and with MAX_INTS_PER_SEC = 16000) but it looks worse. My configuration is two firewalls in master/backup mode. On the first one, the two busiest links are on the first card (fiber). On the second, these two links are not on the same card: one is on the fiber card and the other on the copper card. I've noticed today that the input Ierr rate is far lower on the second firewall than on the first. Is it possible to have a bottleneck on the ethernet card or on the bus? I will make more tests tomorrow... Thanks, regards.
Re: network bandwith with em(4)
On Tue, 22 Feb 2011 10:22:16 -0800 (PST), James A. Peltier jpelt...@sfu.ca wrote: Those documents do not necessarily apply any more. Don't go tweaking knobs until you know what they do. We have machines here that transfer nearly a gigabit of traffic/s without tuning, in bridge mode nonetheless. Are you seeing any packet congestion markers (counter congestion) in systat pf? If so you might not have sufficient states available. I log the congestion counter (every 10s) and there are at most 3 or 4 congestions per day. I don't think the bottleneck is pf. What about fragmentation? None. Interface errors? Quite a lot. There are many other non-tweakable issues that could cause this. Sure, it's hard to know. Thanks, regards.
Re: network bandwith with em(4)
On Wed, 23 Feb 2011 17:52:21 +0100 Patrick Lamaiziere patf...@davenulle.org wrote: | On Tue, 22 Feb 2011 19:13:48 +0100, | Manuel Guesdon ml+openbsd.m...@oxymium.net wrote: | | Hello, | | We've got the same problems (on a router, not a firewall). Increasing | MAX_INTS_PER_SEC to 24000 increased bandwidth and lowered packet loss. | Our cards are Intel PRO/1000 (82576) and Intel PRO/1000 FP | (82576). | | Did you try to increase the number of descriptors? | #define EM_MAX_TXD 256 | #define EM_MAX_RXD 256 | | I've tried up to 2048 (and with MAX_INTS_PER_SEC = 16000) but it looks | worse. Thank you! I'll investigate this! | My configuration is two firewalls in master/backup mode. On the first | one, the two busiest links are on the first card (fiber). On the | second, these two links are not on the same card: one is on the fiber | card and the other on the copper card. I've noticed today that the | input Ierr rate is far lower on the second firewall than on the first. | | Is it possible to have a bottleneck on the ethernet card or on the bus? Maybe (but I'm not an expert :-). In my case, the bus doesn't seem to be the problem (cards are on PCI #1, 64-bit PCI Express, on a X8DTU: http://www.supermicro.com/products/motherboard/QPI/5500/X8DTU.cfm). Manuel -- __ Manuel Guesdon - OXYMIUM
network bandwith with em(4)
(4.8/amd64) Hello, I'm using two Intel PRO/1000 quad-port (gigabit) ethernet cards on a firewall (one fiber and one copper). The problem is that we don't get more than ~320 Mbit/s of bandwidth between the internal networks and the internet (gigabit). As far as I can see, under load there are a number of Ierr on the interface connected to the internet (between 1% and 5%). Also, the interrupt rate on this card is around ~7500 (using systat). In the em(4) driver, there is a limitation of the interrupt rate at 8000/s.

if_em.h:
/*
 * MAX_INTS_PER_SEC (ITR - Interrupt Throttle Register)
 * The Interrupt Throttle Register (ITR) limits the delivery of interrupts
 * to a reasonable rate by providing a guaranteed inter-interrupt delay
 * between interrupts asserted by the Ethernet controller.
 */
#define MAX_INTS_PER_SEC 8000

Do you think I can increase this value? The interrupt rate of the machine is at max ~60% (top). Other ideas to increase the bandwidth would be welcome too. I don't think the limitation comes from PF because I don't see any congestion. thanks, regards. -- dmesg: em0 at pci5 dev 0 function 0 Intel PRO/1000 QP (82571EB) rev 0x06: apic 1 int 13 (irq 14), address 00:15:17:ed:98:9d em4 at pci9 dev 0 function 0 Intel PRO/1000 QP (82575GB) rev 0x02: apic 1 int 23 (irq 11), address 00:1b:21:38:e0:80
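What the 8000 ints/s throttle means in practice can be sketched with the figures from this message (assuming full-size 1500-byte frames, which understates the real packet rate):

```shell
# ITR at 8000 ints/s guarantees a gap of 1s/8000 = 125 us between
# interrupts. At the ~320 Mbit/s observed, with 1500-byte frames, only a
# handful of packets arrive per interrupt window.
max_ints=8000
gap_us=$((1000000 / max_ints))     # 125 us between interrupts
pps=$((320000000 / 8 / 1500))      # ~26666 packets per second at 320 Mbit/s
per_window=$((pps / max_ints))     # packets arriving per 125 us window
echo "gap=${gap_us}us pps=${pps} pkts/interrupt=${per_window}"
```

So the observed ~7500 ints/s sitting just under the 8000 cap is consistent with the throttle being the pacing factor, even though each interrupt only has a few packets to collect.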
Re: network bandwith with em(4)
On 22 Feb 2011, Patrick Lamaiziere wrote: The problem is that we don't get more than ~320 Mbits/s of bandwith beetween the internal networks and internet (gigabit). Have you already looked at: --- https://calomel.org/network_performance.html -- Mark Nipper ni...@bitgnome.net (XMPP) +1 979 575 3193
Re: network bandwith with em(4)
Hello, We kinda have the same setup, but with bnx(4) devices, and there is no problem. I'm used to downloading big files over FTP from all over the world, and we have gigabit connectivity without any pf-related tuning. We are planning to use em(4) 82876 on another path to another ISP, so if you find anything else, I'm very interested. Good evening ;) On 22/02/2011 18:19, Mark Nipper wrote: On 22 Feb 2011, Patrick Lamaiziere wrote: The problem is that we don't get more than ~320 Mbit/s of bandwidth between the internal networks and the internet (gigabit). Have you already looked at: --- https://calomel.org/network_performance.html
Re: network bandwith with em(4)
On Tue, 22 Feb 2011 11:19:26 -0600, Mark Nipper ni...@bitgnome.net wrote:

> > The problem is that we don't get more than ~320 Mbit/s of bandwidth
> > between the internal networks and the internet (gigabit).
>
> Have you already looked at:
> https://calomel.org/network_performance.html

Yes, thanks. I've already increased the size of net.inet.ip.ifq.maxlen.

But I don't see the point of these tunings for a firewall. IMHO, they could help for a host handling TCP/UDP connections itself. Anyway, I've tried them; that does not change anything, and I don't think it should.

I'm not a network expert, I could be wrong. Let's see:

## Calomel.org OpenBSD /etc/sysctl.conf ##
kern.maxclusters=128000       # Cluster allocation limit
=> netstat -m reports a peak of *only* 2500 mbufs used.

net.inet.ip.mtudisc=0         # TCP MTU (Maximum Transmission Unit)
=> still at 1. I don't use scrub in pf or mss clamping.

net.inet.tcp.ackonpush=1      # acks for packets with the push bit
=> only one TCP connection on the firewall (ssh).

net.inet.tcp.ecn=1            # Explicit Congestion Notification enabled

net.inet.tcp.mssdflt=1472     # maximum segment size (1472 from scrub pf.conf)
=> same here, I guess the default mss is for connections from the machine itself. tcpdump shows that the mss is negotiated around 1450. Looks good.

net.inet.tcp.recvspace=262144 # Increase TCP receive window size to increase performance
=> same, no tcp nor udp... Am I wrong?

Thanks, regards.
Re: network bandwith with em(4)
Hi,

On Tue, 22 Feb 2011 18:09:32 +0100, Patrick Lamaiziere patf...@davenulle.org wrote:

| I'm using two Intel PRO/1000 quad-port ethernet cards (gigabit) on a
| firewall (one fiber and one copper).
|
| The problem is that we don't get more than ~320 Mbit/s of bandwidth
| between the internal networks and the internet (gigabit).
|
| As far as I can see, under load there are a number of Ierrs on the
| interface connected to the internet (between 1% and 5%).
|
| Also, the interrupt rate on this card is around ~7500 (using systat).
| In the em(4) driver, the interrupt rate is limited to 8000/s.
|
| if_em.h
| /*
|  * MAX_INTS_PER_SEC (ITR - Interrupt Throttle Register)
|  * The Interrupt Throttle Register (ITR) limits the delivery of interrupts
|  * to a reasonable rate by providing a guaranteed inter-interrupt delay
|  * between interrupts asserted by the Ethernet controller.
|  */
| #define MAX_INTS_PER_SEC	8000
|
| Do you think I can increase this value? The interrupt load of the
| machine is at most ~60% (top).

We've got the same problems (on a router, not a firewall). Increasing MAX_INTS_PER_SEC to 24000 increased bandwidth and lowered packet loss. Our cards are Intel PRO/1000 (82576) and Intel PRO/1000 FP (82576). We still have Ierrs (but a lower count).

I don't understand why we still get errors with a 90+% idle system. I've done some calculations: for a 1 Gbps link with 600-byte packets, we have to process 208,334 pps. With a 40 KB RX buffer on the NIC (40000/600 = 66 packets max in the buffer) we only need 208334/66 = 3157 interrupts/s, so 24000 and even 8000 interrupts/s should be enough :-(

If someone has an explanation...

Manuel
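[Editor's note: Manuel's arithmetic can be checked directly. A quick sketch, using only the figures from the message above (a saturated 1 Gbit/s link, 600-byte packets, a 40 KB receive buffer); shell integer division floors, so the last figure comes out 3156 rather than the rounded-up 3157 in the message:]

```shell
# worst-case packet rate on a saturated gigabit link with 600-byte frames
pps=$(( 1000000000 / (600 * 8) ))     # 208333 packets/s
# packets that fit in the NIC's 40 KB receive buffer
pkts_per_buf=$(( 40000 / 600 ))       # 66 packets
# minimum interrupt rate needed to drain the buffer before it overflows
echo $(( pps / pkts_per_buf ))        # prints 3156 (interrupts/s)
```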
Re: network bandwith with em(4)
On 2011-02-22 18:31, Frédéric URBAN wrote:

> We have kind of the same setup, but with bnx(4) devices, and there is
> no problem. [...]

Hello, I wrote to this list about the same problem in November 2010: http://marc.info/?l=openbsd-misc&m=128990880310013&w=2

After some discussion, Claudio Jeker suggested that there might be a problem with the TBR (token bucket regulator). When I tried to set tbrsize in pf.conf as the man page says, I got an error:

altq on em0 cbq bandwidth 1Gb tbrsize 4K queue { q_lan }
queue q_lan bandwidth 950Mb cbq (default)

root@router-test (/root)# pfctl -f /etc/pf.conf
/etc/pf.conf:9: syntax error
/etc/pf.conf:10: queue q_lan has no parent
/etc/pf.conf:10: errors in queue definition
pfctl: Syntax error in config file: pf rules not loaded

Without tbrsize the altq definition is OK.

The problem also exists for Broadcom (bge) cards, but unfortunately the developers don't have enough time to look into it more deeply.

best regards, RLW
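[Editor's note: one possible cause of that syntax error is the "4K" suffix — the tbrsize keyword may expect a plain byte count rather than a scaled value. A hypothetical variant to try, untested; 4096 is simply 4K spelled out:]

```
altq on em0 cbq bandwidth 1Gb tbrsize 4096 queue { q_lan }
queue q_lan bandwidth 950Mb cbq (default)
```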
Re: network bandwith with em(4)
On 02/22/11 11:19, Mark Nipper wrote:

> > The problem is that we don't get more than ~320 Mbit/s of bandwidth
> > between the internal networks and the internet (gigabit).
>
> Have you already looked at:
> https://calomel.org/network_performance.html

Henning Brauer has some very interesting thoughts about the content of that particular page. Recent changes to the network stack have made those sysctl settings useless.

-luis
Re: network bandwith with em(4)
Those documents do not necessarily apply any more. Don't go tweaking knobs until you know what they do. We have machines here that transfer nearly a gigabit of traffic/s without tuning, in bridge mode no less.

Are you seeing any packet congestion markers (counter congestion) in systat pf? If so you might not have sufficient states available. What about fragmentation? Interface errors? There are many other non-tweakable issues that could cause this.

----- Original Message -----
| On Tue, 22 Feb 2011 11:19:26 -0600, Mark Nipper ni...@bitgnome.net wrote:
|
| > Have you already looked at:
| > https://calomel.org/network_performance.html
|
| Yes, thanks. I've already increased the size of net.inet.ip.ifq.maxlen.
|
| But I don't see the point of these tunings for a firewall. IMHO, they
| could help for a host handling TCP/UDP connections itself. [...]
|
| Thanks, regards.

--
James A. Peltier
IT Services - Research Computing Group
Simon Fraser University - Burnaby Campus
Phone: 778-782-6573
Fax: 778-782-3045
E-Mail: jpelt...@sfu.ca
Website: http://www.sfu.ca/itservices
http://blogs.sfu.ca/people/jpeltier
Re: network bandwith with em(4)
On Tue, Feb 22, 2011 at 1:06 PM, Patrick Lamaiziere patf...@davenulle.org wrote:

> > https://calomel.org/network_performance.html
>
> Yes, thanks. I've already increased the size of net.inet.ip.ifq.maxlen.
>
> But I don't see the point of these tunings for a firewall. IMHO, they
> could help for a host handling TCP/UDP connections itself.

Wow, you're like the first person ever to realize that. I'm serious. I wish more people would at least try to think about what they're doing before they go twisting every dial they can find because the internet said so.

Sorry I can't give you much useful help, but ignoring the calomel crap is a great start.
Re: network bandwith with em(4)
On 22 February 2011 14:09, Patrick Lamaiziere patf...@davenulle.org wrote:

> (4.8/amd64) Hello, I'm using two Intel PRO/1000 quad-port ethernet
> cards (gigabit) on a firewall (one fiber and one copper).
>
> The problem is that we don't get more than ~320 Mbit/s of bandwidth
> between the internal networks and the internet (gigabit).
>
> As far as I can see, under load there are a number of Ierrs on the
> interface connected to the internet (between 1% and 5%). [...]

How exactly are you measuring the bandwidth? What does tcpbench tell you?