Re: Network Slowness Proliant DL380 G4

2008-02-07 Thread rezidue
I believe I had this same problem when I was building iperf from the source
offered on its website.  It was resolved by building the version offered in
ports instead.
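
A minimal sketch of that approach on OpenBSD, assuming the port lives at
net/iperf (the path is an assumption; check your ports tree) and a second
test host at 192.0.2.10 (hypothetical):

# on the gateway, build and install iperf from the ports tree
cd /usr/ports/net/iperf && make install clean

# on the far-end test host, start an iperf server
iperf -s

# from the gateway, run a 30-second TCP throughput test toward it
iperf -c 192.0.2.10 -t 30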

On Feb 7, 2008 5:08 AM, Joe Warren-Meeks <[EMAIL PROTECTED]> wrote:

> On Wed, Feb 06, 2008 at 07:19:03PM +0100, Pete Vickers wrote:
>
> Hey there,
>
> > OpenBSD's bge driver sucks big time; typical symptoms are very slow
> > transfers and incrementing errors (netstat -i).
> > You can confirm this by booting $other_os_boot_cd and retesting.
>
> Ah, I was unaware of this. I've got a pair of OpenBSD firewalls running
> pf and carp using bge interfaces.
>
> What is the best mitigation strategy to deal with this? I've upped the
> tcp recvspace and sendspace. Any idea if/when the driver will be
> improved?
>
> Thanks.
>
>  -- joe.



Re: Speed Problems Part 2

2007-10-01 Thread rezidue
I decided to pump maxlen up to 8192 to see what would happen, and at first I
thought it had actually stopped the drops.  Unfortunately the drops had not
really stopped; I believe this warning was what kept the count from increasing:

WARNING: mclpool limit reached; increase kern.maxclusters

I've pumped kern.maxclusters up to about 2.5x its original value, and my drops
have begun to increment again, along with pf congestion, which seems to go
hand in hand.

net.inet.ip.ifq.len=0
net.inet.ip.ifq.maxlen=8192
net.inet.ip.ifq.drops=1566435

I'm going to double it again and I'll report my findings shortly.  Again
though, this seems excessive.
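
For reference, a rough sketch of the knobs being adjusted above; the numbers
are just the ones under test here, not recommendations:

# raise the IP input queue length and the mbuf cluster limit at runtime
sysctl net.inet.ip.ifq.maxlen=8192
sysctl kern.maxclusters=16384       # example value only

# persist the settings across reboots
echo net.inet.ip.ifq.maxlen=8192 >> /etc/sysctl.conf
echo kern.maxclusters=16384 >> /etc/sysctl.conf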



On 10/1/07, rezidue <[EMAIL PROTECTED]> wrote:
>
>
>
> I've now got both of my edge routers running 4.1 and there is definitely a
> speed improvement overall.  Unfortunately I can't seem to get drops to stop
> occurring, and after hitting a traffic peak for the past three hours it looks
> like this:
>
> net.inet.ip.ifq.len=0
> net.inet.ip.ifq.maxlen=4096
> net.inet.ip.ifq.drops=1289636



Re: Speed Problems Part 2

2007-10-01 Thread rezidue
On 9/26/07, Stuart Henderson <[EMAIL PROTECTED]> wrote:
>
> On 2007/09/26 13:50, rezidue wrote:
> >
> > >Order a 4.2 CD and install it as soon as you get it. 4.2 removed many
> > >bottlenecks in the network stack. In the meantime, check the IP
> > >ifq len:
> > ># sysctl net.inet.ip.ifq
> > >net.inet.ip.ifq.len=0
> > >net.inet.ip.ifq.maxlen=256
> > >net.inet.ip.ifq.drops=0
> >
> > >I bet your drops are non-zero and the maxlen is too small (256 is a
> > >better value for gigabit firewalls/routers).
> > >--
> > >:wq Claudio
> >
> > I've gone through the 4.1 and 4.2 changes in hopes I would find some clear
> > reason as to why I'm having these issues, but I've not seen anything.
>
> At the last hackathon, there was a lot of work done on profiling and
> optimizing the path through the network stack/PF; you'll see more about
> this at http://www.openbsd.org/papers/cuug2007/mgp00012.html (and the
> following pages).
>
> > What exactly is this queue?  The odd thing is that it reports a negative
> > value for drops and it's counting down.
>
> The -ve is because it's a signed integer and has, on your system,
> exceeded the maximum value since bootup..
>
> > net.inet.ip.ifq.drops=-1381027346
> > I've put maxlen=256 and it seems to have slowed the count down.
>
> You might like to try bumping it up until it stops increasing (uh,
> decreasing. :-) And re-investigate when you get 4.2 (or make any other
> changes to the system).
>
>
I've now got both of my edge routers running 4.1 and there is definitely a
speed improvement overall.  Unfortunately I can't seem to get drops to stop
occurring, and after hitting a traffic peak for the past three hours it looks
like this:

net.inet.ip.ifq.len=0
net.inet.ip.ifq.maxlen=4096
net.inet.ip.ifq.drops=1289636

About 100k of those existed before I managed to get the drops to stop over the
weekend with maxlen=2048, but that's only a small portion of the total count
now.  I'm hesitant to raise maxlen further, but I'm tempted to see what value
it would take to make this stop.  The box is peaking at about 180Mb/s and
30-40k pps, and I still have plenty of resources available.  In top I've only
seen interrupts on cpu0, where interrupt time bounces between 0% and 30-35%.
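
A crude way to judge whether a given maxlen is large enough (a sketch, not
something from the thread) is to sample the counters during peak traffic and
watch whether drops keep climbing:

# print the queue depth and the drop counter once a minute
while :; do
        date
        sysctl net.inet.ip.ifq.len net.inet.ip.ifq.drops
        sleep 60
done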

Here is a dmesg:

 revision 1.0
uhub0 at usb0
uhub0: AMD OHCI root hub, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1 at pci1 dev 0 function 1 "AMD 8111 USB" rev 0x0b: irq 9, version 1.0,
legacy support
usb1 at ohci1: USB revision 1.0
uhub1 at usb1
uhub1: AMD OHCI root hub, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
pciide0 at pci1 dev 5 function 0 "CMD Technology SiI3114 SATA" rev 0x02: DMA
pciide0: using irq 10 for native-PCI interrupt
pciide0: port 0: device present, speed: 1.5Gb/s
wd0 at pciide0 channel 0 drive 0: 
wd0: 16-sector PIO, LBA48, 238475MB, 488397168 sectors
wd0(pciide0:0:0): using BIOS timings, Ultra-DMA mode 6
pciide0: port 1: device present, speed: 1.5Gb/s
wd1 at pciide0 channel 1 drive 0: 
wd1: 16-sector PIO, LBA48, 238475MB, 488397168 sectors
wd1(pciide0:1:0): using BIOS timings, Ultra-DMA mode 6
vga1 at pci1 dev 6 function 0 "ATI Rage XL" rev 0x27
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
"AMD 8111 LPC" rev 0x05 at pci0 dev 7 function 0 not configured
pciide1 at pci0 dev 7 function 1 "AMD 8111 IDE" rev 0x03: DMA, channel 0
configured to compatibility, channel 1 configured to compatibility
atapiscsi0 at pciide1 channel 0 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0:  SCSI0 5/cdrom
removable
cd0(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 2
pciide1: channel 1 disabled (no drives)
"AMD 8111 SMBus" rev 0x02 at pci0 dev 7 function 2 not configured
"AMD 8111 Power" rev 0x05 at pci0 dev 7 function 3 not configured
ppb1 at pci0 dev 10 function 0 "AMD 8131 PCIX" rev 0x12
pci2 at ppb1 bus 2
bge0 at pci2 dev 9 function 0 "Broadcom BCM5704C" rev 0x03, BCM5704 A3
(0x2003): irq 5, address 00:e0:81:40:bd:8e
brgphy0 at bge0 phy 1: BCM5704 10/100/1000baseT PHY, rev. 0
bge1 at pci2 dev 9 function 1 "Broadcom BCM5704C" rev 0x03, BCM5704 A3
(0x2003): irq 10, address 00:e0:81:40:bd:8f
brgphy1 at bge1 phy 1: BCM5704 10/100/1000baseT PHY, rev. 0
"AMD 8131 PCIX IOAPIC" rev 0x01 at pci0 dev 10 function 1 not configured
ppb2 at pci0 dev 11 function 0 "AMD 8131 PCIX" rev 0x12
pci3 at ppb2 bus 1
"AMD 8131 PCIX IOAPIC" rev 0x01 at pci0 dev 11 function 1 not configured
pchb0 at pci0 dev 24 function 0 "AMD AMD64 HyperTransport" rev 0x00
pchb1 at pci0 dev 24 function 1 "AMD AMD

Speed Problems Part 2

2007-09-26 Thread rezidue
For some reason I can't seem to reply to the earlier responses.  Hopefully
this gets through.

On 9/26/07, Bryan Irvine < [EMAIL PROTECTED]> wrote:

>What have you looked at? Are you running pf? What kind of ruleset?
>   Have you tried simplifying it?
>
>--Bryan

I wasn't running pf originally when I noticed this problem, but I am now, just
to block ssh from the outside.  I've disabled and re-enabled pf to see if it
affects throughput, and it doesn't, or at least not noticeably.  As for what I
have done, I have performed a number of bandwidth tests.  I've come from the
outside, traversing the gateway while downloading from an internal host.  I've
come from the outside to the gateway, downloading from it, and I've come from
the local subnet on a machine running the exact same hardware and installation
while transferring a file in each direction.  Under high load, all forms of
this testing are affected by poor speeds.  Even when not under high load I
never see the speeds I should.  I've checked interface stats on the switch and
have found no errors.  I have run iperf and can only seem to get 5-16Mb/s.  I
even bumped up sendspace and recvspace to help with edge host-to-host
transfers, but I've not seen any improvement.  I'm going to be tinkering with
netperf more because I'm not sure if I ran into an issue with it on BSD; on
two Linux boxes on the inside it reports line speed between them.
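
The socket-buffer bump mentioned above would look roughly like this; 65536 is
only an illustrative value, not necessarily the one used:

# enlarge the default TCP socket buffers used for host-to-host tests
sysctl net.inet.tcp.sendspace=65536
sysctl net.inet.tcp.recvspace=65536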

On 9/26/07, Maxim Belooussov < [EMAIL PROTECTED]> wrote:

>Hi,
>   The first thing to do is to check the cable :)
>
>   And the second thing to do is to check the entire chain. Maybe it's
>   not you, but the other end who cannot handle the load.
>
>   Max

The cables don't show any problems, and I have the problem internally as well,
not just with external hosts.  I wish it were that simple.

On 9/26/07, Claudio Jeker <[EMAIL PROTECTED]> wrote:

>Order a 4.2 CD and install it as soon as you get it. 4.2 removed many
>bottlenecks in the network stack. In the meantime, check the IP
>ifq len:
># sysctl net.inet.ip.ifq
>net.inet.ip.ifq.len=0
>net.inet.ip.ifq.maxlen=256
>net.inet.ip.ifq.drops=0

>I bet your drops are non-zero and the maxlen is too small (256 is a better
>value for gigabit firewalls/routers).
>--
>:wq Claudio

I've gone through the 4.1 and 4.2 changes in hopes I would find some clear
reason as to why I'm having these issues, but I've not seen anything.  What
exactly is this queue?  The odd thing is that it reports a negative value for
drops and it's counting down.

net.inet.ip.ifq.drops=-1381027346

I've put maxlen=256 and it seems to have slowed the count down.

On 9/26/07, Stuart Henderson <[EMAIL PROTECTED]> wrote:

>dmesg and vmstat -i might give clues. Also try bsd.mp if you use
>bsd (or vice-versa), and Claudio's suggestion of 4.2 is a good one.

Dmesg has not shown any issues.  I've been a bit confused about how to
interpret the output of vmstat and "systat vmstat".  I was told to look for
interrupts in "systat vmstat" but I haven't seen any being thrown while
under heavy load.  As for "vmstat -i", I'm not exactly sure what would
signify a problem, but I get the following output:

Gateway1 (about 3-4 times the load of gateway2)
interrupt   total rate
irq0/clock 6455328221  399
irq0/ipi   2543041813  157
irq19/ohci0  91660
irq17/pciide0 76302290
irq0/bge0 25346022947 1570
irq1/bge1 21123330824 1308
Total 55475363200 3437

Gateway2:
interrupt   total rate
irq0/clock 6455272059  400
irq0/ipi   1819715207  112
irq19/ohci0 125740
irq17/pciide0 62321130
irq0/bge0  8118898045  503
irq1/bge1 12291117020  761
Total 28691247018 1777
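
Since the rate column in vmstat -i is averaged over the whole uptime, a rough
way to see the current interrupt load (a sketch) is to diff two snapshots
taken a known interval apart, or to watch it live:

# two samples ten seconds apart; subtract the bge totals by hand
vmstat -i | grep bge
sleep 10
vmstat -i | grep bge

# live view of interrupts and CPU time, refreshed every second
systat vmstat 1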

Here is my sysctl -a output:

kern.ostype=OpenBSD
kern.osrelease=4.0
kern.osrevision=200611
kern.version=OpenBSD 4.0-stable (GENERIC.MP) #0: Thu Mar 15 07:28:19 CST
2007
[EMAIL PROTECTED]
:/usr/src/sys/arch/amd64/compile/GENERIC.MP

kern.maxvnodes=1310
kern.maxproc=532
kern.maxfiles=1772
kern.argmax=262144
kern.securelevel=1
kern.hostname=dyno1.nothingtoseehere.com
kern.hostid=0
kern.clockrate=tick = 1, tickadj = 40, hz = 100, profhz = 100, stathz =
100
kern.posix1version=199009
kern.ngroups=16
kern.job_control=1
kern.saved_ids=1
kern.boottime=Fri Mar 23 06:44:05 2007
kern.domainname=
kern.maxpartitions=16
kern.rawpartition=2
kern.osversion=GENERIC.MP#0
kern.somaxconn=128
kern.sominconn=80
kern.usermount=0
kern.random=160901082016 47373568 0 502891828 23135 5922320 0 0 0 0 0 0
22063035075 935474146 14935755619 48820374348 1984945954 2097660952
3949423372 384190080 606887773 1054912573 2101170714 1709697072 1531324571
891699911 1726356236 407933168 707207288 1237035834 37928905 5295362
1570

Re: Speed Problems

2007-09-26 Thread rezidue
Hopefully this makes it through; I've been trying to post comments all day
but they don't seem to make it here.

To Bryan: I wasn't running pf originally when I noticed this problem, but I am
now, just to block ssh from the outside.  I've disabled and re-enabled pf to
see if it affects throughput, and it doesn't, or at least not noticeably.  As
for what I have done, I have performed a number of bandwidth tests.  I've come
from the outside, traversing the gateway while downloading from an internal
host.  I've come from the outside to the gateway, downloading from it, and
I've come from the local subnet on a machine running the exact same hardware
and installation while transferring a file in each direction.  Under high
load, all forms of this testing are affected by poor speeds.  Even when not
under high load I never see the speeds I should.  I've checked interface stats
on the switch and have found no errors.  I have run iperf and can only seem to
get 5-16Mb/s.  I even bumped up sendspace and recvspace to help with edge
host-to-host transfers, but I've not seen any improvement.  I'm going to be
tinkering with netperf more because I'm not sure if I ran into an issue with
it on BSD; on two Linux boxes on the inside it reports line speed between
them.

To Max: The cables don't show any problems, and I have the problem internally
as well, not just with external hosts.  I wish it were that simple.

To Claudio: I've gone through the 4.1 and 4.2 changes in hopes I would find
some clear reason as to why I'm having these issues, but I've not seen
anything.  The odd thing is that it reports a negative value for drops and
it's counting down.

net.inet.ip.ifq.drops=-1381027346

I've put maxlen=256 and it seems to have slowed the count down.


To Stuart: Dmesg has not shown any issues.  I've been a bit confused about how
to interpret the output of vmstat and "systat vmstat".  I was told to look for
interrupts in "systat vmstat" but I haven't seen any being thrown while under
heavy load.  I also don't think I fully understand how interrupts work.  As
for "vmstat -i", I'm not exactly sure what would signify a problem, but I get
the following output:

Gateway1 (about 3-4 times the load of gateway2)
interrupt   total rate
irq0/clock 6455328221  399
irq0/ipi   2543041813  157
irq19/ohci0  91660
irq17/pciide0 76302290
irq0/bge0 25346022947 1570
irq1/bge1 21123330824 1308
Total 55475363200 3437

Gateway2:
interrupt   total rate
irq0/clock 6455272059  400
irq0/ipi   1819715207  112
irq19/ohci0 125740
irq17/pciide0 62321130
irq0/bge0  8118898045  503
irq1/bge1 12291117020  761
Total 28691247018 1777



On 9/26/07, Tom Bombadil <[EMAIL PROTECTED]> wrote:
>
> > net.inet.ip.ifq.maxlen defines how many packets can be queued in the IP
> > input queue before further packets are dropped. Packets coming from the
> > network card are first put into this queue and the actual IP packet
> > processing is done later. Gigabit cards with interrupt mitigation may
> > spit out many packets per interrupt, plus heavy use of pf can slow down
> > packet forwarding. So it is possible that a heavy burst of packets is
> > overflowing this queue. On the other hand, you do not want to use too big
> > a number because this has negative effects on the system (livelock etc).
> > 256 seems to be a better default than the 50, but additional tweaking may
> > allow you to process a few more packets.
>
> Thanks Claudio...
>
> In the link that Stuart posted here, Henning mentions 256 times the
> number of interfaces:
> http://archive.openbsd.nu/?ml=openbsd-tech&a=2006-10&t=2474666
>
> I'll try both and see.
>
> Thank you and Stuart for the hints.
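
Applied to the two-port bge boxes in this thread, Henning's rule of thumb
would work out to something like the following (a sketch; the right value
still has to be confirmed by watching the drop counter):

# 256 * 2 physical interfaces
sysctl net.inet.ip.ifq.maxlen=512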



Speed Problems

2007-09-25 Thread rezidue
I've been having problems with throughput on a box I'm using as an edge
gateway.  I can't seem to get it to push out more than 150Mb/sec at about
20k pps.  It's a Tyan Thunder K8SR (S2881) board that has two gig broadcom
interfaces on a shared pci-x bus.  It's on the bcm5704c chipset and I'm
running OpenBSD 4.0.  The machine has two dual core amd opteron chips and
two gigs of RAM.  Barely any resources are being used when we peak during the
day.  When we hit around 140+Mb/sec I start seeing packet loss, and when I
copy a file from this machine via scp to another host over the gig LAN I can
see that it directly affects throughput.  I've spent all day trying to find
the problem but I've had no luck.  Any ideas?  Any info I can
provide?
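
Before tuning anything, it is worth ruling out interface-level errors and
drops; a quick sketch of the usual first checks:

# per-interface packet and error counters
netstat -i

# protocol statistics; look for anything mentioning drops
netstat -s | grep -i drop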



IBGP Problems

2007-03-28 Thread rezidue
I guess I should start from the beginning.  When I originally started this
project my goal was to have two machines running carp between them and have
the master connect to two different ISPs, each sending full routes.  This was
working fine and failover didn't cause any issues.  At least I thought
everything was fine until I rebooted the machine.  For a reason I could not
figure out, I would end up with a kernel panic if I didn't kill bgpd at
startup.  What would happen is the machine would boot, become master of the
carp interfaces, and bgpd would connect to its ebgp peers, fill the rib, and
then process it and add it to the fib.  After this occurred the machine would
look to itself and totally ignore the fib.  While this went on, the host had
no problem receiving updates from its peers, right up until the kernel
panicked.  I even built 4.1 just to test and had the same problem.  I didn't
have enough time to keep investigating, so I plan on getting dumps to post
and hopefully aid in tracking down this problem, if it truly is one.

With that out of the way, I can explain the problem I'm having now.  Since I
was unable to get the previous solution working, I decided to split the
peering between the two servers, not have bgpd rely on carp, and just use
carp for my gateway.  With this in place I started working on ibgp between
the boxes to make sure that no matter where my default route went, the host
would send me through the best provider, be it directly attached or off of
the other host.  Right off the bat I started noticing a problem that I
thought was either a configuration error or a misunderstanding of IBGP.
When the hosts connect to each other a full prefix table is sent, but then
almost immediately the neighbor starts withdrawing prefixes.  On one host I
jump from 210k prefixes after the initial connection to only 59k after all of
the withdrawals.  On the other host I go from 210k to 197k, which isn't as
bad, but I'm still unsure of why it does this.  I started to think that if
one host advertises a prefix over IBGP that the other host is already
announcing, the neighbor simply withdraws it, but the numbers just don't add
up.  I came to this conclusion because if I have one of the hosts announce
none, the IBGP neighbor doesn't start withdrawing after it sends the entire
prefix table.  Hopefully someone has some ideas.  I also tried to convert
ibgp to route reflection, but I don't think removing 'announce all' and
adding 'route-reflector' did anything, because I saw the same behavior.
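
For what it's worth, the route-reflector variant described above would look
roughly like this in bgpd.conf (a sketch of what was reportedly tried, not a
recommendation):

neighbor 172.16.2.2 {
        remote-as       111
        descr           dyno1
        local-address   172.16.2.1
        route-reflector
        set nexthop self
}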

Below is my config:

Host1
AS 111
router-id  172.16.1.1
listen on 172.16.1.1
listen on 172.16.2.1
network 192.168.10.0/24
network 192.168.11.0/24
neighbor 172.16.1.2 {
        remote-as       6517
        descr           yipes-1
        local-address   172.16.1.1
        holdtime        180
        announce        self
        depend on       trunk0
}

neighbor 172.16.1.3 {
        remote-as       6517
        descr           yipes-2
        local-address   172.16.1.1
        holdtime        180
        announce        self
        depend on       trunk0
}

neighbor 172.16.2.2 {
        remote-as       111
        descr           dyno1
        local-address   172.16.2.1
        holdtime        180
        announce        all
        depend on       trunk0
        set nexthop self
}


Host2
AS 111
router-id 172.16.3.1
listen on 172.16.3.1
listen on 172.16.2.2
network 192.168.10.0/24
network 192.168.11.0/24

neighbor 172.16.3.2 {
        remote-as       174
        descr           cogent-1
        local-address   172.16.3.1
        holdtime        180
        announce        self
        depend on       trunk0
}

neighbor 172.16.2.1 {
        remote-as       111
        descr           dyno2
        local-address   172.16.2.2
        holdtime        180
        announce        all
        depend on       trunk0
        set nexthop self
}



Anyone have any ideas?  Thanks for taking the time to read this.
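
To narrow down which prefixes are actually being withdrawn and why, bgpctl is
the usual tool; a sketch, assuming bgpd is running with its default control
socket:

# session overview with prefix counts per neighbor
bgpctl show summary

# detailed state and message counters for the iBGP peer
bgpctl show neighbor 172.16.2.2

# what remains in the RIB from that peer after the withdrawals
bgpctl show rib neighbor 172.16.2.2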