Re: LACP trunk load balancing hash algorithm

2011-01-19 Thread Insan Praja SW

On Wed, 19 Jan 2011 06:40:59 +0700, David Gwynne <l...@animata.net> wrote:



On 18/01/2011, at 11:25 PM, Insan Praja SW wrote:


My November 21st i386 GENERIC.MP -current handles 1.3Mpps inbound and
1.3Mpps outbound during rootkit attacks on one of our colocated customers,
at 80Mbps of traffic, via a vlan interface. CPU is 1% idle and the system
is still responsive (I could still ssh into the machine and watch systat).


where were you reading this 1.3Mpps value from?


Systat vmstat


dlg




Insan Praja
--
Using Opera's revolutionary email client: http://www.opera.com/mail/



Re: LACP trunk load balancing hash algorithm

2011-01-19 Thread Insan Praja SW
On Wed, 19 Jan 2011 07:10:33 +0700, Ted Unangst <ted.unan...@gmail.com> wrote:



On Tue, Jan 18, 2011 at 6:40 PM, David Gwynne <l...@animata.net> wrote:

On 18/01/2011, at 11:25 PM, Insan Praja SW wrote:


My November 21st i386 GENERIC.MP -current handles 1.3Mpps inbound and
1.3Mpps outbound during rootkit attacks on one of our colocated customers,
at 80Mbps of traffic, via a vlan interface. CPU is 1% idle and the system
is still responsive (I could still ssh into the machine and watch systat).

where were you reading this 1.3Mpps value from?


I think David is asking because 1.3Mpps at 80Mbps implies your
traffic consists of 8-byte packets, which may be enough for source and
destination IP addresses, but doesn't leave room for the port numbers.
:)


It's from the total IPKTS and OPKTS counters in systat vmstat. Here are the
captured packets:



00:15:17:49:03:b4 00:15:17:49:02:31 0800 92: 202.43.64.61.49334 >
168.144.196.66.53: [udp sum ok] 29556 updateM [b23=0x6400] [0q] [83au]
(50) (ttl 62, id 14151, len 78)
00:15:17:49:03:b4 00:15:17:49:02:31 0800 92: 202.43.64.61.49334 >
168.144.196.66.53: [udp sum ok] 29556 updateM [b23=0x6400] [0q] [83au]
(50) (ttl 62, id 14154, len 78)
00:15:17:49:03:b4 00:15:17:49:02:31 0800 92: 202.43.64.61.49334 >
168.144.196.66.53: [udp sum ok] 29556 updateM [b23=0x6400] [0q] [83au]
(50) (ttl 62, id 14157, len 78)
00:15:17:49:03:b4 00:15:17:49:02:31 0800 92: 202.43.64.61.49334 >
168.144.196.66.53: [udp sum ok] 29556 updateM [b23=0x6400] [0q] [83au]
(50) (ttl 62, id 14160, len 78)
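
A capture in this format can be reproduced with something like the
following (the interface name is only an example):

===
# -e prints link-level headers, -n skips name resolution,
# -vvv decodes the DNS payload in detail
tcpdump -n -e -vvv -i vlan100 udp and port 53
===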

Thanks,


Insan Praja

--
Using Opera's revolutionary email client: http://www.opera.com/mail/



Re: LACP trunk load balancing hash algorithm

2011-01-18 Thread Claer
On Mon, Jan 17 2011 at 23:35, Jason Healy wrote:
 I had a few hours to play with a hardware traffic generator today, I wanted to
 try beating up my OpenBSD setup to see what kind of throughput I could get.
 
 For the curious, I was able to pulverize it with 64 byte packets and it topped
 out at about 165kpps.  Throughput was less than physical interface speed
 (about 800Mbps).  For fun, I cranked the payload size up to 1500 bytes, but I
 couldn't get the box to exceed 1Gbps, even though I had several gigabit
 interfaces trunked together.  At first, it was a switch problem (the switch
 was sending all the traffic over a single link).  However, once I worked out
 my switch's LACP hash algorithm I was able to spread the traffic out by
 randomizing the port numbers.
I also played with a traffic generator recently. Here are some numbers from
the tests we've done:

Hardware: Dell R310, 2 GB of RAM; the CPU was a quad core, but I didn't note
the exact model. I can dig it up in a few days.
NIC: Intel Gigabit Quad Port ET2 (82576 chipset)
Software: 4.8-stable, no patches, no sysctl customisation except ip.forwarding=1
Firewall ruleset: echo "pass all" > /etc/pf.conf && pfctl -f /etc/pf.conf

Max pps =~ 330-340K
Test IMIX =~ 1.1 Gb/s

We were able to fully route 2 Gb/s of traffic with 1024-byte packets
(1 Gb/s in each direction) without losing a single packet.

Theoretically, the box should be able to route 4 Gb/s (33*10^4 * 1518 * 8).
More than enough for our use :)
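
Spelling that estimate out:

===
33*10^4 pkt/s * 1518 B/pkt * 8 b/B ~= 4.0*10^9 b/s = 4 Gb/s
===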

 I then confirmed that 4Gbps of traffic was leaving the switch to the OpenBSD
 box, but only 1Gbps was coming back.  Therefore, I'm guessing that the
 load-balancing algorithm for OpenBSD does not behave the same way as my
 Juniper switching gear.  Does anybody know the LACP hash that the trunk
 interface in OpenBSD uses to load-balance the outgoing traffic?  I didn't have
 time to do more than a cursory test with different port numbers and IP
 addresses, so I'm not sure what I might be doing wrong, or if it's even
 possible to use layer 3/4 info in OpenBSD to hash the traffic.  Since I'm
 using the box as a router, layer 2 hashing doesn't help me very much since the
 source MAC is always the same.
 
 I took a peek at the source, but I'm definitely not a C hacker, so nothing
 jumped out at me for computing the hash...
 
 Thanks,
 
 Jason

Claer



Re: LACP trunk load balancing hash algorithm

2011-01-18 Thread Claudio Jeker
On Mon, Jan 17, 2011 at 11:35:02PM -0500, Jason Healy wrote:
 I had a few hours to play with a hardware traffic generator today, I wanted to
 try beating up my OpenBSD setup to see what kind of throughput I could get.
 
 For the curious, I was able to pulverize it with 64 byte packets and it topped
 out at about 165kpps.  Throughput was less than physical interface speed
 (about 800Mbps).  For fun, I cranked the payload size up to 1500 bytes, but I
 couldn't get the box to exceed 1Gbps, even though I had several gigabit
 interfaces trunked together.  At first, it was a switch problem (the switch
 was sending all the traffic over a single link).  However, once I worked out
 my switch's LACP hash algorithm I was able to spread the traffic out by
 randomizing the port numbers.
 
 I then confirmed that 4Gbps of traffic was leaving the switch to the OpenBSD
 box, but only 1Gbps was coming back.  Therefore, I'm guessing that the
 load-balancing algorithm for OpenBSD does not behave the same way as my
 Juniper switching gear.  Does anybody know the LACP hash that the trunk
 interface in OpenBSD uses to load-balance the outgoing traffic?  I didn't have
 time to do more than a cursory test with different port numbers and IP
 addresses, so I'm not sure what I might be doing wrong, or if it's even
 possible to use layer 3/4 info in OpenBSD to hash the traffic.  Since I'm
 using the box as a router, layer 2 hashing doesn't help me very much since the
 source MAC is always the same.
 
 I took a peek at the source, but I'm definitely not a C hacker, so nothing
 jumped out at me for computing the hash...
 

165kpps is fairly low. Please attach a dmesg so there is a chance to see what
is causing this low rate. Modern hardware with good NICs should handle around
500kpps.

Btw. trunk is using src & dest MAC addrs, a possible vlan tag and the IP /
IPv6 src & dst addrs to build the hash. It does not use port numbers.
The function used for this is trunk_hashmbuf().
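
As a rough sketch of what that means in practice -- this is NOT the kernel
code (the real thing is trunk_hashmbuf() in sys/net/if_trunk.c), just a
userland toy that folds the same fields, and notably no layer-4 ports, into
a 32-bit value; hash32_step() stands in for the kernel's hash32 routines:

===
/*
 * Toy userland sketch of the idea behind trunk_hashmbuf(); NOT the
 * actual kernel code.  It mixes src/dst MAC, an optional vlan tag
 * and the IP src/dst addresses into a 32-bit hash, then picks an
 * egress port with a modulo.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t
hash32_step(const void *buf, size_t len, uint32_t hash)
{
	const uint8_t *p = buf;

	while (len--)
		hash = (hash * 33) ^ *p++;	/* Bernstein-style mix */
	return (hash);
}

struct flow {
	uint8_t		smac[6], dmac[6];	/* ethernet addresses */
	uint16_t	vtag;			/* 0 if untagged */
	uint32_t	saddr, daddr;		/* IPv4 addresses */
};

static unsigned int
pick_port(const struct flow *f, unsigned int nports)
{
	uint32_t h = 0xdeadbeef;

	h = hash32_step(f->smac, sizeof(f->smac), h);
	h = hash32_step(f->dmac, sizeof(f->dmac), h);
	if (f->vtag != 0)
		h = hash32_step(&f->vtag, sizeof(f->vtag), h);
	/* layer 3 only: no TCP/UDP port numbers go into the mix */
	h = hash32_step(&f->saddr, sizeof(f->saddr), h);
	h = hash32_step(&f->daddr, sizeof(f->daddr), h);
	return (h % nports);
}

int
main(void)
{
	struct flow f;

	memset(&f, 0, sizeof(f));
	f.saddr = 0xc0a86465;	/* 192.168.100.101, made up */
	f.daddr = 0xc0a86565;	/* 192.168.101.101, made up */
	printf("egress port %u of 4\n", pick_port(&f, 4));
	return (0);
}
===

Every packet of a given MAC/vlan/IP tuple always lands on the same port,
which is why cycling through layer-4 ports alone won't spread the load.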

-- 
:wq Claudio



Re: LACP trunk load balancing hash algorithm

2011-01-18 Thread Jason Healy
On Jan 18, 2011, at 6:51 AM, Claudio Jeker wrote:

 165kpps is fairly low. Please attach a dmesg so there is a chance to see what
 is causing this low rate. Modern hardware with good NICs should handle around
 500kpps.

Good to know.  Right now we're only on a 45Mbps connection at about 5kpps, so
that seemed fine to me.  =)  I've attached the dmesg below.  It's nothing
fancy, as our needs aren't too huge.  I was just pushing this particular
machine to its limits for fun to see what it could do since I had the traffic
generator on-site for a little extra time.

I didn't have a lot of time, so proper repeated testing wasn't really
possible.  Even though we don't have the box anymore, I may set up some iperf
machines later to push it again just to verify the config (or test any
suggestions I receive from the list).

When the packet rate leveled off, we were at about 90% interrupt / 7% idle on
one CPU.  The other CPU was, as expected, idle.  Booting the SP kernel didn't
make a noticeable difference.  There were a significant number of ifq drops,
though I figured that was because the box was getting twice as much traffic in
as it was passing back out (and increasing maxlen substantially didn't help).
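
For anyone who wants to poke at the same thing, the queue in question can be
inspected and tuned via sysctl; a sketch (the maxlen value is only an
example):

===
# watch the drop counter under load
sysctl net.inet.ip.ifq.drops
# raise the input queue length if drops keep climbing
sysctl net.inet.ip.ifq.maxlen=512
===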

The ruleset is simple:

===
table <ixiablue> { 192.168.100.101 } counters
table <ixiared> { 192.168.101.101 } counters

altq on trunk1 hfsc bandwidth 2Gb queue { other , toblue , tored }
queue other hfsc(realtime 10Mb linkshare 10Mb upperlimit 10Mb default)
queue toblue hfsc(realtime 100Mb linkshare 100Mb upperlimit 2Gb) qlimit 50
queue tored hfsc(realtime 100Mb linkshare 100Mb upperlimit 2Gb) qlimit 50

block log (all)

pass flags any

pass out on trunk1 from <ixiablue> to <ixiared> queue tored
pass out on trunk1 from <ixiared> to <ixiablue> queue toblue
===
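
Whether the hfsc queues themselves are dropping can be watched with the
usual counters, e.g.:

===
# one-shot, verbose queue statistics
pfctl -v -s queue
# or a live view
systat queues
===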

Turning off queuing improved performance somewhat, but isn't helpful as a
real-world test since we rely on queuing in our production environment.

 Btw. trunk is using src & dest MAC addrs, a possible vlan tag and the IP /
 IPv6 src & dst addrs to build the hash. It does not use port numbers.
 The function used for this is trunk_hashmbuf().

Interesting.  We were definitely seeing one of the interfaces in the trunk
totally unused for traffic outbound from the OpenBSD box.  Since the hash
doesn't use port numbers, I suppose it's possible that we were just unlucky
with the src/dst IPs (we were only testing with two endpoint addresses and
relying on the multiple layer-4 ports to hash differently).

The trunk interface had a single IP (no vlans), and we were routing traffic to
it from our core switch using a policy routing scheme (we normally have a box
like this deployed in an off-path routing setup, so our core switch passes
traffic to it to be shaped).  Not sure if a routed setup like that changes the
hash (it doesn't sound like it should), but I thought I'd mention it.

We can try again with more IP addresses next time to see if that spreads the
traffic a little more.
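
If it helps anyone plan a similar test, here is a toy simulation (the hash
and the addresses are made up, not OpenBSD's) of how many links a
layer-3-only hash can touch as the number of endpoint IPs grows:

===
/*
 * Illustrative only: count how many ports of a 4-port trunk a
 * layer-3-only hash uses for n x n endpoint pairs.  bucket() is a
 * stand-in hash, not the kernel's.
 */
#include <stdint.h>
#include <stdio.h>

static unsigned int
bucket(uint32_t saddr, uint32_t daddr, unsigned int nports)
{
	uint32_t h = 5381;

	h = h * 33 ^ saddr;
	h = h * 33 ^ daddr;
	return (h % nports);
}

int
main(void)
{
	uint32_t base = 0xc0a86400;	/* 192.168.100.0, made up */
	unsigned int n, i, j, used, hit[4];

	for (n = 1; n <= 8; n++) {
		used = 0;
		for (i = 0; i < 4; i++)
			hit[i] = 0;
		/* n source addrs talking to n destination addrs */
		for (i = 0; i < n; i++)
			for (j = 0; j < n; j++)
				hit[bucket(base + i, base + 0x100 + j, 4)] = 1;
		for (i = 0; i < 4; i++)
			used += hit[i];
		printf("%ux%u endpoints -> %u of 4 links used\n", n, n, used);
	}
	return (0);
}
===

With a single src/dst pair the hash is one constant, so only one link ever
carries the traffic in each direction.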

Thanks,

Jason


# dmesg
OpenBSD 4.8 (GENERIC.MP) #335: Mon Aug 16 09:09:20 MDT 2010
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3486973952 (3325MB)
avail mem = 3380334592 (3223MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xcfedf000 (39 entries)
bios0: vendor Phoenix Technologies LTD version 1.3a date 11/03/2009
bios0: Supermicro X7SBi
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP _MAR MCFG HPET APIC BOOT SPCR ERST HEST BERT EINJ SLIC
SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT
acpi0: wakeup devices PXHA(S5) PEX_(S5) LAN_(S5) USB4(S5) USB5(S5) USB7(S5)
ESB2(S5) EXP1(S5) EXP5(S5) EXP6(S5) USB1(S5) USB2(S5) USB3(S5) USB6(S5)
ESB1(S5) PCIB(S5) KBC0(S1) MSE0(S1) COM1(S5) COM2(S5) PWRB(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E3110 @ 3.00GHz, 3000.62 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,XSAVE,NXE,LONG
cpu0: 6MB 64b/line 16-way L2 cache
cpu0: apic clock running at 333MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Xeon(R) CPU E3110 @ 3.00GHz, 3000.21 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,XSAVE,NXE,LONG
cpu1: 6MB 64b/line 16-way L2 cache
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
ioapic1 at mainbus0: apid 3 pa 0xfecc, version 20, 24 pins
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 2 (PXHA)
acpiprt2 at acpi0: bus 3 (PEX_)
acpiprt3 at acpi0: bus 5 (EXP1)
acpiprt4 at acpi0: bus 13 (EXP5)
acpiprt5 at acpi0: bus 15 (EXP6)
acpiprt6 at acpi0: bus 17 (PCIB)
acpicpu0 at acpi0: C3, PSS
acpicpu1 at acpi0: C3, 

Re: LACP trunk load balancing hash algorithm

2011-01-18 Thread Insan Praja SW
On Tue, 18 Jan 2011 18:51:32 +0700, Claudio Jeker <cje...@diehard.n-r-g.com> wrote:



On Mon, Jan 17, 2011 at 11:35:02PM -0500, Jason Healy wrote:
I had a few hours to play with a hardware traffic generator today, I wanted to
try beating up my OpenBSD setup to see what kind of throughput I could get.

For the curious, I was able to pulverize it with 64 byte packets and it topped
out at about 165kpps.  Throughput was less than physical interface speed
(about 800Mbps).  For fun, I cranked the payload size up to 1500 bytes, but I
couldn't get the box to exceed 1Gbps, even though I had several gigabit
interfaces trunked together.  At first, it was a switch problem (the switch
was sending all the traffic over a single link).  However, once I worked out
my switch's LACP hash algorithm I was able to spread the traffic out by
randomizing the port numbers.

I then confirmed that 4Gbps of traffic was leaving the switch to the OpenBSD
box, but only 1Gbps was coming back.  Therefore, I'm guessing that the
load-balancing algorithm for OpenBSD does not behave the same way as my
Juniper switching gear.  Does anybody know the LACP hash that the trunk
interface in OpenBSD uses to load-balance the outgoing traffic?  I didn't have
time to do more than a cursory test with different port numbers and IP
addresses, so I'm not sure what I might be doing wrong, or if it's even
possible to use layer 3/4 info in OpenBSD to hash the traffic.  Since I'm
using the box as a router, layer 2 hashing doesn't help me very much since the
source MAC is always the same.

I took a peek at the source, but I'm definitely not a C hacker, so nothing
jumped out at me for computing the hash...



165kpps is fairly low. Please attach a dmesg so there is a chance to see what
is causing this low rate. Modern hardware with good NICs should handle around
500kpps.



My November 21st i386 GENERIC.MP -current handles 1.3Mpps inbound and
1.3Mpps outbound during rootkit attacks on one of our colocated customers,
at 80Mbps of traffic, via a vlan interface. CPU is 1% idle and the system
is still responsive (I could still ssh into the machine and watch systat).


Btw. trunk is using src & dest MAC addrs, a possible vlan tag and the IP /
IPv6 src & dst addrs to build the hash. It does not use port numbers.
The function used for this is trunk_hashmbuf().



dmesg:

OpenBSD 4.8-current (GENERIC.MP) #21: Sun Nov 21 03:46:30 WIT 2010

r...@greenrouter-jkt01.mygreenlinks.net:/usr/src/sys/arch/i386/compile/GENERIC.MP
RTC BIOS diagnostic error ff<fixed_disk,invalid_time>
cpu0: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz (GenuineIntel 686-class) 2.41 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM
real mem  = 2142744576 (2043MB)
avail mem = 2097610752 (2000MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 03/26/07, SMBIOS rev. 2.4 @ 0x7fbe4000 (43 entries)
bios0: vendor Intel Corporation version S3000.86B.02.00.0054.061120091710 date 06/11/2009

bios0: Intel S3000AH
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT SLIC FACP APIC WDDT HPET MCFG ASF! SSDT SSDT SSDT SSDT  
SSDT HEST BERT ERST EINJ
acpi0: wakeup devices SLPB(S4) P32_(S4) UAR1(S1) PEX4(S4) PEX5(S4)  
UHC1(S1) UHC2(S1) UHC3(S1) UHC4(S1) EHCI(S1) AC9M(S4) AZAL(S4)

acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: apic clock running at 266MHz
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz (GenuineIntel 686-class) 2.41 GHz
cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz (GenuineIntel 686-class) 2.41 GHz
cpu2: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz (GenuineIntel 686-class) 2.41 GHz
cpu3: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM
ioapic0 at mainbus0: apid 5 pa 0xfec0, version 20, 24 pins
ioapic0: misconfigured as apic 0, remapped to apid 5
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 4 (P32_)
acpiprt2 at acpi0: bus 1 (PEX0)
acpiprt3 at acpi0: bus -1 (PEX1)
acpiprt4 at acpi0: bus -1 (PEX2)
acpiprt5 at acpi0: bus -1 (PEX3)
acpiprt6 at acpi0: bus 2 (PEX4)
acpiprt7 at acpi0: bus 3 (PEX5)
acpicpu0 at acpi0: PSS
acpicpu1 at acpi0: PSS
acpicpu2 at acpi0: PSS
acpicpu3 at acpi0: PSS
acpibtn0 

Re: LACP trunk load balancing hash algorithm

2011-01-18 Thread David Gwynne
On 18/01/2011, at 11:25 PM, Insan Praja SW wrote:

 My November 21st i386 GENERIC.MP -current handles 1.3Mpps inbound and
1.3Mpps outbound during rootkit attacks on one of our colocated customers,
at 80Mbps of traffic, via a vlan interface. CPU is 1% idle and the system
is still responsive (I could still ssh into the machine and watch systat).

where were you reading this 1.3Mpps value from?

dlg



Re: LACP trunk load balancing hash algorithm

2011-01-18 Thread Ted Unangst
On Tue, Jan 18, 2011 at 6:40 PM, David Gwynne <l...@animata.net> wrote:
 On 18/01/2011, at 11:25 PM, Insan Praja SW wrote:

 My November 21st i386 GENERIC.MP -current handles 1.3Mpps inbound and
 1.3Mpps outbound during rootkit attacks on one of our colocated customers,
 at 80Mbps of traffic, via a vlan interface. CPU is 1% idle and the system
 is still responsive (I could still ssh into the machine and watch systat).

 where were you reading this 1.3Mpps value from?

I think David is asking because 1.3Mpps at 80Mbps implies your
traffic consists of 8-byte packets, which may be enough for source and
destination IP addresses, but doesn't leave room for the port numbers.
:)
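
For the curious, the arithmetic behind the joke:

===
80e6 b/s / 8 b/B / 1.3e6 pkt/s ~= 7.7 bytes per packet
===

which is barely the 8 bytes a pair of IPv4 addresses needs, with nothing
left over for port numbers, let alone the rest of the headers.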