Re: LACP trunk load balancing hash algorithm
On Wed, 19 Jan 2011 06:40:59 +0700, David Gwynne <l...@animata.net> wrote:

> On 18/01/2011, at 11:25 PM, Insan Praja SW wrote:
>> My November 21st i386.MP -current handles 1.3Mpps inbound and 1.3Mpps
>> outbound during a rootkit attack on one of our colocated customers, on
>> 80Mbps of traffic, via a vlan interface. CPU is 1% idle and the system
>> is still responsive (I could ssh into the machine and watch systat).
>
> where were you reading this 1.3Mpps value from?
> dlg

systat vmstat.

Insan Praja
--
Using Opera's revolutionary email client: http://www.opera.com/mail/
Re: LACP trunk load balancing hash algorithm
On Wed, 19 Jan 2011 07:10:33 +0700, Ted Unangst <ted.unan...@gmail.com> wrote:

> On Tue, Jan 18, 2011 at 6:40 PM, David Gwynne <l...@animata.net> wrote:
>> On 18/01/2011, at 11:25 PM, Insan Praja SW wrote:
>>> My November 21st i386.MP -current handles 1.3Mpps inbound and 1.3Mpps
>>> outbound during a rootkit attack on one of our colocated customers, on
>>> 80Mbps of traffic, via a vlan interface. CPU is 1% idle and the system
>>> is still responsive (I could ssh into the machine and watch systat).
>>
>> where were you reading this 1.3Mpps value from?
>
> I think David is asking because 1.3Mpps and 80Mbps implies your traffic
> consists of 8 byte packets, which may be enough for source and
> destination IP addresses, but doesn't leave room for the port numbers. :)

It's the total IPKTS and OPKTS in systat vmstat. These are the captured packets:

00:15:17:49:03:b4 00:15:17:49:02:31 0800 92: 202.43.64.61.49334 > 168.144.196.66.53: [udp sum ok] 29556 updateM [b23=0x6400] [0q] [83au] (50) (ttl 62, id 14151, len 78)
00:15:17:49:03:b4 00:15:17:49:02:31 0800 92: 202.43.64.61.49334 > 168.144.196.66.53: [udp sum ok] 29556 updateM [b23=0x6400] [0q] [83au] (50) (ttl 62, id 14154, len 78)
00:15:17:49:03:b4 00:15:17:49:02:31 0800 92: 202.43.64.61.49334 > 168.144.196.66.53: [udp sum ok] 29556 updateM [b23=0x6400] [0q] [83au] (50) (ttl 62, id 14157, len 78)
00:15:17:49:03:b4 00:15:17:49:02:31 0800 92: 202.43.64.61.49334 > 168.144.196.66.53: [udp sum ok] 29556 updateM [b23=0x6400] [0q] [83au] (50) (ttl 62, id 14160, len 78)

Thanks,
Insan Praja
--
Using Opera's revolutionary email client: http://www.opera.com/mail/
Re: LACP trunk load balancing hash algorithm
On Mon, Jan 17, 2011 at 23:35, Jason Healy wrote:

> I had a few hours to play with a hardware traffic generator today, and I
> wanted to try beating up my OpenBSD setup to see what kind of throughput
> I could get. For the curious, I was able to pulverize it with 64 byte
> packets and it topped out at about 165kpps. Throughput was less than
> physical interface speed (about 800Mbps).
>
> For fun, I cranked the payload size up to 1500 bytes, but I couldn't get
> the box to exceed 1Gbps, even though I had several gigabit interfaces
> trunked together. At first, it was a switch problem (the switch was
> sending all the traffic over a single link). However, after I found out
> my switch's LACP hash algorithm I was able to spread the traffic out by
> randomizing the port numbers.

I also played with a traffic generator recently. Here are some numbers from the tests we've done:

Hardware: Dell R310, 2 GB of RAM. The CPU was a quad core; I haven't noted the exact model yet, but I can get it in a few days.
NIC: Intel gigabit quad-port ET2 (82576 chipset)
Software: 4.8-stable, no patches, no sysctl customisation except ip.forwarding=1

Firewall ruleset:

echo "pass all" > /etc/pf.conf
pfctl -f /etc/pf.conf

Max pps =~ 330-340K
IMIX test =~ 1.1 Gb/s

We were able to fully route 2 Gb/s of traffic with 1024-byte packets (1 Gb/s in each direction) without losing a single packet. Theoretically, the box should be able to route 4 Gb/s (33*10^4 * 1518 * 8). Largely enough for our use :)

> I then confirmed that 4Gbps of traffic was leaving the switch to the
> OpenBSD box, but only 1Gbps was coming back. Therefore, I'm guessing
> that the load-balancing algorithm for OpenBSD does not behave the same
> way as my Juniper switching gear. Does anybody know the LACP hash that
> the trunk interface in OpenBSD uses to load-balance the outgoing
> traffic?
> I didn't have time to do more than a cursory test with different port
> numbers and IP addresses, so I'm not sure what I might be doing wrong,
> or if it's even possible to use layer 3/4 info in OpenBSD to hash the
> traffic. Since I'm using the box as a router, layer 2 hashing doesn't
> help me very much since the source MAC is always the same.
>
> I took a peek at the source, but I'm definitely not a C hacker, so
> nothing jumped out at me for computing the hash...
>
> Thanks,
> Jason

Claer
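Claer's theoretical 4 Gb/s figure is just the measured packet rate multiplied by a full-size Ethernet frame. A quick sanity check of that arithmetic (using 330 kpps, the low end of the range quoted above):

```python
# Back-of-the-envelope check of the 33*10^4 * 1518 * 8 figure above.
pps = 330_000        # low end of the measured 330-340 kpps range
frame_bytes = 1518   # maximum untagged Ethernet frame size
bits_per_sec = pps * frame_bytes * 8
print(f"{bits_per_sec / 1e9:.2f} Gb/s")  # ~4.01 Gb/s
```

So at full-size frames the measured packet rate would indeed carry about 4 Gb/s, comfortably above the 2 Gb/s actually routed in the test.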
Re: LACP trunk load balancing hash algorithm
On Mon, Jan 17, 2011 at 11:35:02PM -0500, Jason Healy wrote:

> I had a few hours to play with a hardware traffic generator today, and I
> wanted to try beating up my OpenBSD setup to see what kind of throughput
> I could get. For the curious, I was able to pulverize it with 64 byte
> packets and it topped out at about 165kpps. Throughput was less than
> physical interface speed (about 800Mbps).
>
> For fun, I cranked the payload size up to 1500 bytes, but I couldn't get
> the box to exceed 1Gbps, even though I had several gigabit interfaces
> trunked together. At first, it was a switch problem (the switch was
> sending all the traffic over a single link). However, after I found out
> my switch's LACP hash algorithm I was able to spread the traffic out by
> randomizing the port numbers.
>
> I then confirmed that 4Gbps of traffic was leaving the switch to the
> OpenBSD box, but only 1Gbps was coming back. Therefore, I'm guessing
> that the load-balancing algorithm for OpenBSD does not behave the same
> way as my Juniper switching gear. Does anybody know the LACP hash that
> the trunk interface in OpenBSD uses to load-balance the outgoing
> traffic?
>
> I didn't have time to do more than a cursory test with different port
> numbers and IP addresses, so I'm not sure what I might be doing wrong,
> or if it's even possible to use layer 3/4 info in OpenBSD to hash the
> traffic. Since I'm using the box as a router, layer 2 hashing doesn't
> help me very much since the source MAC is always the same.
>
> I took a peek at the source, but I'm definitely not a C hacker, so
> nothing jumped out at me for computing the hash...

165kpps is fairly low. Please add a dmesg so there is a chance to see what is causing this low rate. Modern HW with good NICs should handle around 500kpps.

Btw. trunk is using the src and dst MAC addresses, a possible vlan tag, and the IP/IPv6 src and dst addresses to build the hash. It does not use port numbers. The function used for this is trunk_hashmbuf().
--
:wq Claudio
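For readers who don't want to dig into the kernel source, here is a toy illustration of the behaviour Claudio describes. This is not trunk_hashmbuf() itself (the real code is C and uses a seeded kernel hash); it merely hashes the same fields Claudio lists (MACs, an optional vlan tag, and the IP addresses) so that flows differing only in their layer 4 ports always pick the same trunk member:

```python
import struct
import zlib

def toy_trunk_hash(src_mac: bytes, dst_mac: bytes, vlan: int,
                   src_ip: bytes, dst_ip: bytes) -> int:
    # Hash only the fields trunk(4) reportedly uses; layer 4
    # ports are deliberately absent from the key.
    key = src_mac + dst_mac + struct.pack(">H", vlan) + src_ip + dst_ip
    return zlib.crc32(key)

def pick_member(hashval: int, nmembers: int) -> int:
    # Reduce the hash to a trunk member index.
    return hashval % nmembers

# Example addresses taken from the tcpdump output elsewhere in the thread.
src_mac = bytes.fromhex("001517490231")
dst_mac = bytes.fromhex("0015174903b4")
src_ip = bytes([202, 43, 64, 61])
dst_ip = bytes([168, 144, 196, 66])

h = toy_trunk_hash(src_mac, dst_mac, 0, src_ip, dst_ip)
# Every flow between this address pair lands on the same member of a
# four-port trunk, no matter which TCP/UDP ports it uses.
print(pick_member(h, 4))
```

The key point is structural: since ports never enter the key, randomizing them (as Jason did to satisfy his switch's hash) cannot spread traffic on the OpenBSD side.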
Re: LACP trunk load balancing hash algorithm
On Jan 18, 2011, at 6:51 AM, Claudio Jeker wrote:

> 165kpps is fairly low. Please add a dmesg so there is a chance to see
> what is causing this low rate. Modern HW with good nics should handle
> around 500kpps.

Good to know. Right now we're only on a 45Mbps connection at about 5kpps, so that seemed fine to me. =)

I've attached the dmesg below. It's nothing fancy, as our needs aren't too huge. I was just pushing this particular machine to its limits for fun, since I had the traffic generator on-site for a little extra time. I didn't have a lot of time, so proper repeated testing wasn't really possible. Even though we don't have the generator anymore, I may set up some iperf machines later to push the box again, just to verify the config (or to test any suggestions I receive from the list).

When the packet rate leveled off, we were at about 90% interrupt / 7% idle on one CPU. The other CPU was, as expected, idle. Booting to SP didn't make a noticeable difference. There were a significant number of ifq drops, though I figured that was because the box was getting twice as much traffic in as it was passing back out (and increasing maxlen substantially didn't do much).

The ruleset is simple:

===
table <ixiablue> { 192.168.100.101 } counters
table <ixiared> { 192.168.101.101 } counters

altq on trunk1 hfsc bandwidth 2Gb queue { other, toblue, tored }
queue other hfsc(realtime 10Mb linkshare 10Mb upperlimit 10Mb default)
queue toblue hfsc(realtime 100Mb linkshare 100Mb upperlimit 2Gb) qlimit 50
queue tored hfsc(realtime 100Mb linkshare 100Mb upperlimit 2Gb) qlimit 50

block log (all)
pass flags any
pass out on trunk1 from <ixiablue> to <ixiared> queue tored
pass out on trunk1 from <ixiared> to <ixiablue> queue toblue
===

Turning off queueing improved performance somewhat, but that isn't helpful as a real-world test since we rely on queueing in our production environment.

> Btw. trunk is using src dest MAC addrs, a possible vlan tag and the
> IP / IPv6 src dst addrs to build the hash. It does not use port
> numbers. The function used for this is trunk_hashmbuf().

Interesting. We definitely were seeing one of the interfaces in the trunk totally unused for outbound traffic from the OpenBSD box. Since the hash doesn't use port numbers, I suppose it's possible that we were just unlucky with the src/dst IPs (we were only testing with two endpoint addresses and relying on the multiple layer 4 ports to hash differently).

The trunk interface had a single IP (no vlans), and we were routing traffic to it from our core switch using a policy routing scheme (we normally have a box like this deployed in an off-path routing setup, so our core switch passes traffic to it to be shaped). I'm not sure whether a routed setup like that changes the hash (it doesn't sound like it should), but I thought I'd mention it. We can try again with more IP addresses next time to see if that spreads the traffic a little more.

Thanks,
Jason

# dmesg
OpenBSD 4.8 (GENERIC.MP) #335: Mon Aug 16 09:09:20 MDT 2010
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3486973952 (3325MB)
avail mem = 3380334592 (3223MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev.
2.5 @ 0xcfedf000 (39 entries)
bios0: vendor Phoenix Technologies LTD version 1.3a date 11/03/2009
bios0: Supermicro X7SBi
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP _MAR MCFG HPET APIC BOOT SPCR ERST HEST BERT EINJ SLIC SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT
acpi0: wakeup devices PXHA(S5) PEX_(S5) LAN_(S5) USB4(S5) USB5(S5) USB7(S5) ESB2(S5) EXP1(S5) EXP5(S5) EXP6(S5) USB1(S5) USB2(S5) USB3(S5) USB6(S5) ESB1(S5) PCIB(S5) KBC0(S1) MSE0(S1) COM1(S5) COM2(S5) PWRB(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E3110 @ 3.00GHz, 3000.62 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,XSAVE,NXE,LONG
cpu0: 6MB 64b/line 16-way L2 cache
cpu0: apic clock running at 333MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Xeon(R) CPU E3110 @ 3.00GHz, 3000.21 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,XSAVE,NXE,LONG
cpu1: 6MB 64b/line 16-way L2 cache
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
ioapic1 at mainbus0: apid 3 pa 0xfecc, version 20, 24 pins
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 2 (PXHA)
acpiprt2 at acpi0: bus 3 (PEX_)
acpiprt3 at acpi0: bus 5 (EXP1)
acpiprt4 at acpi0: bus 13 (EXP5)
acpiprt5 at acpi0: bus 15 (EXP6)
acpiprt6 at acpi0: bus 17 (PCIB)
acpicpu0 at acpi0: C3, PSS
acpicpu1 at acpi0: C3,
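Jason's "unlucky with the src/dst IPs" hunch is easy to demonstrate. The quick simulation below uses a hypothetical CRC-based stand-in for a port-agnostic layer 3 hash (not OpenBSD's actual trunk_hashmbuf()): with a single src/dst IP pair, every flow is pinned to one link of a four-port trunk regardless of layer 4 ports, while varying the source IPs spreads flows across members:

```python
import zlib

def member(src_ip: str, dst_ip: str, nlinks: int = 4) -> int:
    # Hypothetical port-agnostic hash over the IP pair only, standing
    # in for a trunk-style layer 3 hash: nothing a traffic generator
    # does with layer 4 ports can change the chosen link.
    return zlib.crc32(f"{src_ip}>{dst_ip}".encode()) % nlinks

# One src/dst pair (as in the Ixia test above): always one member.
pair_links = {member("192.168.100.101", "192.168.101.101")
              for _ in range(1000)}
print(len(pair_links))  # 1

# Many source IPs: flows spread over the trunk members.
spread = {member(f"10.0.0.{i}", "192.168.101.101") for i in range(256)}
print(sorted(spread))
```

This is exactly why retesting with more endpoint addresses, as Jason proposes, should distribute the outbound traffic better.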
Re: LACP trunk load balancing hash algorithm
On Tue, 18 Jan 2011 18:51:32 +0700, Claudio Jeker <cje...@diehard.n-r-g.com> wrote:

> On Mon, Jan 17, 2011 at 11:35:02PM -0500, Jason Healy wrote:
>> [...] For the curious, I was able to pulverize it with 64 byte packets
>> and it topped out at about 165kpps. Throughput was less than physical
>> interface speed (about 800Mbps). [...]
>
> 165kpps is fairly low. Please add a dmesg so there is a chance to see
> what is causing this low rate. Modern HW with good nics should handle
> around 500kpps.
My November 21st i386.MP -current handles 1.3Mpps inbound and 1.3Mpps outbound during a rootkit attack on one of our colocated customers, on 80Mbps of traffic, via a vlan interface. CPU is 1% idle and the system is still responsive (I could ssh into the machine and watch systat).

> Btw. trunk is using src dest MAC addrs, a possible vlan tag and the
> IP / IPv6 src dst addrs to build the hash. It does not use port
> numbers. The function used for this is trunk_hashmbuf().

dmesg:
OpenBSD 4.8-current (GENERIC.MP) #21: Sun Nov 21 03:46:30 WIT 2010
r...@greenrouter-jkt01.mygreenlinks.net:/usr/src/sys/arch/i386/compile/GENERIC.MP
RTC BIOS diagnostic error ffixed_disk,invalid_time
cpu0: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz (GenuineIntel 686-class) 2.41 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM
real mem = 2142744576 (2043MB)
avail mem = 2097610752 (2000MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 03/26/07, SMBIOS rev. 2.4 @ 0x7fbe4000 (43 entries)
bios0: vendor Intel Corporation version S3000.86B.02.00.0054.061120091710 date 06/11/2009
bios0: Intel S3000AH
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT SLIC FACP APIC WDDT HPET MCFG ASF!
SSDT SSDT SSDT SSDT SSDT HEST BERT ERST EINJ
acpi0: wakeup devices SLPB(S4) P32_(S4) UAR1(S1) PEX4(S4) PEX5(S4) UHC1(S1) UHC2(S1) UHC3(S1) UHC4(S1) EHCI(S1) AC9M(S4) AZAL(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: apic clock running at 266MHz
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz (GenuineIntel 686-class) 2.41 GHz
cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz (GenuineIntel 686-class) 2.41 GHz
cpu2: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz (GenuineIntel 686-class) 2.41 GHz
cpu3: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM
ioapic0 at mainbus0: apid 5 pa 0xfec0, version 20, 24 pins
ioapic0: misconfigured as apic 0, remapped to apid 5
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 4 (P32_)
acpiprt2 at acpi0: bus 1 (PEX0)
acpiprt3 at acpi0: bus -1 (PEX1)
acpiprt4 at acpi0: bus -1 (PEX2)
acpiprt5 at acpi0: bus -1 (PEX3)
acpiprt6 at acpi0: bus 2 (PEX4)
acpiprt7 at acpi0: bus 3 (PEX5)
acpicpu0 at acpi0: PSS
acpicpu1 at acpi0: PSS
acpicpu2 at acpi0: PSS
acpicpu3 at acpi0: PSS
acpibtn0
Re: LACP trunk load balancing hash algorithm
On 18/01/2011, at 11:25 PM, Insan Praja SW wrote:

> My November 21st i386.MP -current handles 1.3Mpps inbound and 1.3Mpps
> outbound during a rootkit attack on one of our colocated customers, on
> 80Mbps of traffic, via a vlan interface. CPU is 1% idle and the system
> is still responsive (I could ssh into the machine and watch systat).

where were you reading this 1.3Mpps value from?

dlg
Re: LACP trunk load balancing hash algorithm
On Tue, Jan 18, 2011 at 6:40 PM, David Gwynne <l...@animata.net> wrote:

> On 18/01/2011, at 11:25 PM, Insan Praja SW wrote:
>> My November 21st i386.MP -current handles 1.3Mpps inbound and 1.3Mpps
>> outbound during a rootkit attack on one of our colocated customers, on
>> 80Mbps of traffic, via a vlan interface. CPU is 1% idle and the system
>> is still responsive (I could ssh into the machine and watch systat).
>
> where were you reading this 1.3Mpps value from?

I think David is asking because 1.3Mpps and 80Mbps implies your traffic consists of 8 byte packets, which may be enough for source and destination IP addresses, but doesn't leave room for the port numbers. :)
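Ted's arithmetic, spelled out: dividing 80 Mb/s across 1.3 Mpps leaves under 8 bytes per packet, far below even the 64-byte Ethernet minimum, which is why the combination of numbers looks implausible:

```python
# How big would each packet have to be for both claims to hold?
rate_bps = 80e6   # 80 Mb/s of traffic
pps = 1.3e6       # 1.3 million packets per second
bytes_per_packet = rate_bps / pps / 8
print(f"{bytes_per_packet:.1f} bytes/packet")  # ~7.7, below any real frame
```

The likely explanation is that the systat counters and the 80 Mbps figure were not measuring the same traffic at the same moment.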