High CPU usage on high-bandwidth long distance connections.

2003-03-18 Thread Borje Josefsson

Hello,

Scenario:

Two hosts:

*** Host a:
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2790.96-MHz 686-class CPU)
Hyperthreading: 2 logical CPUs
real memory  = 1073676288 (1048512K bytes)
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 4470
        options=3<RXCSUM,TXCSUM>
        media: Ethernet autoselect (1000baseSX <full-duplex>)

*** Host b:
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2790.96-MHz 686-class CPU)
Hyperthreading: 2 logical CPUs
real memory  = 536301568 (523732K bytes)

bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 4470
        options=3<RXCSUM,TXCSUM>
        media: Ethernet autoselect (1000baseSX <full-duplex>)

Both Ethernet cards are PCI-X.
 
Parameters (for both hosts):

kern.ipc.maxsockbuf=8388608
net.inet.tcp.rfc1323=1
kern.ipc.nmbclusters="8192"
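
Aside: kern.ipc.maxsockbuf only caps what a socket may ask for. An 
application can also request large buffers per connection instead of 
raising the global defaults. A minimal sketch in C, assuming the standard 
sockets API (the size is the figure derived further down):

#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>

/* Sketch only: ask the kernel for large per-socket buffers instead of
 * raising net.inet.tcp.sendspace/recvspace globally. The request is
 * still capped by kern.ipc.maxsockbuf. */
int
make_fat_socket(void)
{
	int s = socket(AF_INET, SOCK_STREAM, 0);
	int sz = 4836562;	/* bandwidth-delay product + 25%, see below */

	if (s < 0)
		return (-1);
	if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &sz, sizeof(sz)) < 0)
		perror("SO_SNDBUF");
	if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &sz, sizeof(sz)) < 0)
		perror("SO_RCVBUF");
	return (s);
}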

The hosts are connected directly (no LAN equipment in between) to 
high-capacity backbone routers (10 Gbit/sec backbone), and are approx. 
1000 km / 625 miles(!) apart. Measuring RTT gives:
RTTmax = 20.64 ms. Buffer size needed = 3.69 Mbytes, so I add 25% and set:

sysctl net.inet.tcp.sendspace=4836562 
sysctl net.inet.tcp.recvspace=4836562
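
For reference, the arithmetic behind those two values as a small C sketch 
(assuming a 1 Gbit/sec target rate; the raw GigE bandwidth-delay product 
comes out a bit lower than the 3.69 MBytes above, which includes some 
extra headroom):

#include <stdio.h>

/* Sketch: socket buffer >= bandwidth * RTT, plus some padding. */
int
main(void)
{
	double rate_bps = 1e9;		/* assumed target: GigE line rate */
	double rtt      = 0.02064;	/* measured RTTmax = 20.64 ms */
	double bdp      = rate_bps / 8.0 * rtt;	/* bytes in flight */

	printf("BDP    = %.0f bytes\n", bdp);
	printf("buffer = %.0f bytes (+25%%)\n", bdp * 1.25);
	return (0);
}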

MTU=4470 all the way.

OS = FreeBSD 4-STABLE (as of today).

Now the problem:

The receiver works fine, but on the *sender* I run out of CPU (it doesn't 
matter whether host a or host b is the sender). Measuring bandwidth with 
ttcp gives:

ttcp-t: buflen=61440, nbuf=30517, align=16384/0, port=5001  tcp
ttcp-t: 1874964480 bytes in 22.39 real seconds = 638.82 Mbit/sec +++
ttcp-t: 30517 I/O calls, msec/call = 0.75, calls/sec = 1362.82
ttcp-t: 0.0user 20.8sys 0:22real 93% 16i+382d 326maxrss 0+15pf 9+280csw

This is very repeatable (within a few %), and is the same regardless of 
which direction I use.

During that period, the sender shows:

0.0% user,  0.0% nice, 94.6% system,  5.4% interrupt,  0.0% idle

I have read about DEVICE_POLLING, but that doesn't seem to be supported on 
any GigE PCI-X cards?!?

Does anybody have an idea which knob to tune next to be able to fill my 
(long-distance) GigE link? I am mostly interested in what to do to avoid 
eating all my CPU, but also in any other TCP parameters that I haven't 
thought about.

I have configured my kernel for SMP (Xeon CPU with hyperthreading); I don't 
know whether that is good or bad in this case.

With kind regards,

--Borje





Re: High CPU usage on high-bandwidth long distance connections.

2003-03-18 Thread Ed Mooring
On Tue, Mar 18, 2003 at 08:51:29PM +0100, Borje Josefsson wrote:
[snip scenario]
> 
> The hosts are connected directly (no LAN equipment in between) to 
> high-capacity backbone routers (10 Gbit/sec backbone), and are approx. 
> 1000 km / 625 miles(!) apart. Measuring RTT gives:
> RTTmax = 20.64 ms. Buffer size needed = 3.69 Mbytes, so I add 25% and set:
> 
> sysctl net.inet.tcp.sendspace=4836562 
> sysctl net.inet.tcp.recvspace=4836562
> 
> MTU=4470 all the way.
> 
> OS = FreeBSD 4-STABLE (as of today).
> 
> Now the problem:
> 
> The receiver works fine, but on the *sender* I run out of CPU (it doesn't 
> matter whether host a or host b is the sender). Measuring bandwidth with 
> ttcp gives:
> 
> ttcp-t: buflen=61440, nbuf=30517, align=16384/0, port=5001  tcp
> ttcp-t: 1874964480 bytes in 22.39 real seconds = 638.82 Mbit/sec +++
> ttcp-t: 30517 I/O calls, msec/call = 0.75, calls/sec = 1362.82
> ttcp-t: 0.0user 20.8sys 0:22real 93% 16i+382d 326maxrss 0+15pf 9+280csw
> 
> This is very repeatable (within a few %), and is the same regardless of 
> which direction I use.
> 
> During that period, the sender shows:
> 
> 0.0% user,  0.0% nice, 94.6% system,  5.4% interrupt,  0.0% idle

I had something vaguely similar happen while I was porting the FreeBSD
4.2 networking stack to LynxOS. It turned out the culprit was sbappend().
It does a linear pointer chase down the mbuf chain each time you do
a write() or send(). With a high bandwidth-delay product, that chain
can get very long.

This topic came up on freebsd-net last July, and Luigi Rizzo provided
the following URL for a patch to cache the end of the mbuf chain, so
sbappend() stays O(1) instead of O(n).

http://docs.freebsd.org/cgi/getmsg.cgi?fetch=366972+0+archive/2001/freebsd-net/20010211.freebsd-net

The subject of the July thread was 'the incredible shrinking socket', if
you want to hunt through the archives.
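
To illustrate the idea (just a sketch of the technique, not Luigi's 
actual patch; the real sockbuf code also handles byte counters, record 
boundaries and so on):

#include <stddef.h>

/* Toy mbuf/sockbuf, for illustration only. */
struct mbuf {
	struct mbuf *m_next;
	/* ... data fields omitted ... */
};

struct sockbuf {
	struct mbuf *sb_mb;	/* head of the mbuf chain */
	struct mbuf *sb_mbtail;	/* cached tail: the point of the patch */
};

/* O(n): roughly what the stock sbappend() does, walking the whole
 * chain on every write() to find the last mbuf. */
static void
append_slow(struct sockbuf *sb, struct mbuf *m)
{
	struct mbuf *n = sb->sb_mb;

	if (n == NULL) {
		sb->sb_mb = m;
		return;
	}
	while (n->m_next != NULL)	/* linear pointer chase */
		n = n->m_next;
	n->m_next = m;
}

/* O(1): append through the cached tail (assumes m is a single mbuf). */
static void
append_fast(struct sockbuf *sb, struct mbuf *m)
{
	if (sb->sb_mbtail != NULL)
		sb->sb_mbtail->m_next = m;
	else
		sb->sb_mb = m;
	sb->sb_mbtail = m;		/* keep the cache current */
}

With a multi-megabyte send buffer the chain holds thousands of mbufs, so 
the difference between the two versions is exactly the kind of system 
time you are seeing.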

Hope this helps.

-- 
Ed Mooring ([EMAIL PROTECTED])



Re: High CPU usage on high-bandwidth long distance connections.

2003-03-19 Thread Luigi Rizzo
On Tue, Mar 18, 2003 at 01:28:31PM -0800, Ed Mooring wrote:
...
> I had something vaguely similar happen while I was porting the FreeBSD
> 4.2 networking stack to LynxOS. It turned out the culprit was sbappend().
> It does a linear pointer chase down the mbuf chain each time you do
> a write() or send(). With a high bandwidth-delay product, that chain
> can get very long.
> 
> This topic came up on freebsd-net last July, and Luigi Rizzo provided
> the following URL for a patch to cache the end of the mbuf chain, so
> sbappend() stays O(1) instead of O(n).

The patch was only for UDP, though. I think the poster was seeing the problem
with TCP (which is affected by the same issue).

cheers
luigi

> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=366972+0+archive/2001/freebsd-net/20010211.freebsd-net
> 
> The subject of the July thread was 'the incredible shrinking socket', if
> you want to hunt through the archives.
> 
> Hope this helps.
> 
> -- 
> Ed Mooring ([EMAIL PROTECTED])



Re: High CPU usage on high-bandwidth long distance connections.

2003-03-19 Thread Borje Josefsson
On Wed, 19 Mar 2003 02:30:58 PST Sean Chittenden wrote:

> Ooooh!  Opportune timing!  I was going to bring this up on the
> performance@ list (core@, ::hint hint::), but now's as good a time
> as any. 

Great!

> Luigi, I've updated the patch mentioned in this email.  Could you
> review this and possibly commit it or give it a green light for being
> committed?  What's the value of conditionalizing the O(1) behavior
> anyway?  It seems like a tail append would always be the preferred
> case. 

If Luigi "blesses" this patch, I am willing to use my two boxes as 
guinea pigs for it, as they currently aren't used for any production 
traffic.

--Börje

