Re: Problem in High Speed and Long Delay with FreeBSD

2002-11-11 Thread Mark Allman

> sockbuf  datalen  snd_time  rcv_time
> --------------------------------------
> 16384    15000       0.000     0.617
> 150000   140000      0.003     4.021
> 500000   495000      0.015    14.083
> 1000000  995000      0.042    28.577
> 1500000  1490000     0.079    47.986
> 1600000  1590000     0.088    44.055
> 1800000  1790000     0.108    50.810
> 1900000  1890000     0.117    55.010
> 2000000  1990000     1.011    57.666
> 2100000  2090000     3.845    60.233
> 3000000  2990000    39.594   122.308

Folks-

Thanks for all the suggestions.  This is what I get for comparing
apples and oranges.  The results from above were hacked out late one
night on my home machine (4.7) which is certainly not tuned as well
as the lab machines Cindy and Fran have been using.  It looks like I
was limited by the number of mbuf clusters.  The 4.1 machines at the
lab are not.  The kernel easily swallows any of the above amounts of
data.

However...  I will say that I think it is bogus that FreeBSD blocked
my (asynchronous) write() call when it was out of mbuf clusters.  I
think it would be nice if that were fixed.

Back to the drawing board...  (Joe is digging through the TCP code
and chasing a couple of things.)

Thanks again!

allman


--
Mark Allman -- BBN/NASA GRC -- http://roland.grc.nasa.gov/~mallman/




Re: Problem in High Speed and Long Delay with FreeBSD

2002-11-08 Thread Andre Oppermann
Mark Allman wrote:
> 
> Folks-
> 
> Lots of interesting thoughts on this thread already.  But, we have
> not yet figured it out.  So, a further data point...
> 
> I have been playing this evening on my machine at home -- a way old
> p5 running freebsd 4.7.  I am seeing the same problem as we see at
> GRC on the freebsd 4.1 boxes.  As I see it we can have one of a few
> possible problems:
> 
>   * For some reason the application is not able to fill the socket
> buffer.
> 
>   * For some reason TCP is not grabbing everything from the socket
> buffer.
> 
>   * TCP's congestion control is getting hosed somehow into thinking
> we need to slow down.

Try to enable the TCP inflight code from Matt Dillon. Have a look at
man tuning and turn the sysctls on. Also turn on the debug sysctl and
watch the bandwidth-delay product calculations. That might give you
some more hints on what is going on.

-- 
Andre


> So, my idea goes something like this: Say I have a connection with an
> X byte socket buffer.  I start a timer, open a TCP connection, write
> Y bytes (Y < X), close the connection and stop the timer. The time
> should be very small in this case, because I am really timing how
> long it takes the app to send data to the OS.  (I am not enabling
> SO_LINGER.)  So, I ran some tests, as follows (sockbuf == X, datalen
> == Y):
> 
> sockbuf  datalen  snd_time  rcv_time
> --------------------------------------
> 16384    15000       0.000     0.617
> 150000   140000      0.003     4.021
> 500000   495000      0.015    14.083
> 1000000  995000      0.042    28.577
> 1500000  1490000     0.079    47.986
> 1600000  1590000     0.088    44.055
> 1800000  1790000     0.108    50.810
> 1900000  1890000     0.117    55.010
> 2000000  1990000     1.011    57.666
> 2100000  2090000     3.845    60.233
> 3000000  2990000    39.594   122.308
> 
> So, except for the last 3 lines all the tests easily filled up the
> socket buffers and exited.  But, in the third line from the bottom
> we started to see something else.  We "filled" the buffers until we
> could fill no more and had to wait for some ACKs to allow the buffer
> to drain before dumping everything in it.  But, the socket buffer is
> bigger than the data stream so this should not be an issue, right?
> The situation only gets worse as we increase the transfer sizes (and
> socket buffer sizes).  The break point seems to be somewhere near
> 2MB.  For instance, when sending almost 3MB the sender thought the
> transfer took ~40 seconds, while the receiver accurately shows ~122
> seconds.  That shows that the sender had to wait for a third of the
> data to drain from the network to get all its data into the socket
> buffers.  But, that should not have been.
> 
> One more piece of information...  The socket descriptor is placed in
> non-blocking mode.  I write in 8KB chunks.  So, I do a select(), and
> write accordingly.  However, even though I cannot write into the
> socket buffer at times, select() never fails to say that I can write
> the descriptor *and* I never see a short write().  But, clearly I am
> blocking or I'd shoot everything into the socket buffer.
> 
> Does any of this make any sense to anyone?  Any ideas on what might
> be wrong here?  Any suggestions on places to start looking?
> 
> Thanks!
> 
> allman
> 
> --
> Mark Allman -- NASA GRC/BBN -- http://roland.grc.nasa.gov/~mallman/
> 




Re: Problem in High Speed and Long Delay with FreeBSD

2002-11-07 Thread Mark Allman

Folks- 

Lots of interesting thoughts on this thread already.  But, we have
not yet figured it out.  So, a further data point...

I have been playing this evening on my machine at home -- a way old
p5 running freebsd 4.7.  I am seeing the same problem as we see at
GRC on the freebsd 4.1 boxes.  As I see it we can have one of a few
possible problems:

  * For some reason the application is not able to fill the socket
buffer. 

  * For some reason TCP is not grabbing everything from the socket
buffer.

  * TCP's congestion control is getting hosed somehow into thinking
we need to slow down.

So, my idea goes something like this: Say I have a connection with an
X byte socket buffer.  I start a timer, open a TCP connection, write
Y bytes (Y < X), close the connection and stop the timer.  The time
should be very small in this case, because I am really timing how
long it takes the app to send data to the OS.  (I am not enabling
SO_LINGER.)  So, I ran some tests, as follows (sockbuf == X, datalen
== Y):

sockbuf  datalen  snd_time  rcv_time
--------------------------------------
16384    15000       0.000     0.617
150000   140000      0.003     4.021
500000   495000      0.015    14.083
1000000  995000      0.042    28.577
1500000  1490000     0.079    47.986
1600000  1590000     0.088    44.055
1800000  1790000     0.108    50.810
1900000  1890000     0.117    55.010
2000000  1990000     1.011    57.666
2100000  2090000     3.845    60.233
3000000  2990000    39.594   122.308

So, except for the last 3 lines all the tests easily filled up the
socket buffers and exited.  But, in the third line from the bottom
we started to see something else.  We "filled" the buffers until we
could fill no more and had to wait for some ACKs to allow the buffer
to drain before dumping everything in it.  But, the socket buffer is
bigger than the data stream so this should not be an issue, right?
The situation only gets worse as we increase the transfer sizes (and
socket buffer sizes).  The break point seems to be somewhere near
2MB.  For instance, when sending almost 3MB the sender thought the
transfer took ~40 seconds, while the receiver accurately shows ~122
seconds.  That shows that the sender had to wait for a third of the
data to drain from the network to get all its data into the socket
buffers.  But, that should not have been.

One more piece of information...  The socket descriptor is placed in
non-blocking mode.  I write in 8KB chunks.  So, I do a select(), and
write accordingly.  However, even though I cannot write into the
socket buffer at times, select() never fails to say that I can write
the descriptor *and* I never see a short write().  But, clearly I am
blocking or I'd shoot everything into the socket buffer.
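
(For concreteness, the send side amounts to something like the sketch
below -- not the actual test code; socket setup, addresses and buffer
sizes are assumed, and only the timing/select/write loop is shown:)

    /*
     * Sketch of the timed send described above: push "datalen" bytes
     * into a non-blocking socket in 8KB chunks, using select() to wait
     * for writability, and time it.  The timer also covers the close(),
     * which should return immediately since SO_LINGER is not set.
     */
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/socket.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    double timed_send(int s, size_t datalen)
    {
        char chunk[8192];
        size_t sent = 0;
        struct timeval start, end;
        fd_set wfds;

        memset(chunk, 'a', sizeof(chunk));
        fcntl(s, F_SETFL, fcntl(s, F_GETFL, 0) | O_NONBLOCK);

        gettimeofday(&start, NULL);
        while (sent < datalen) {
            size_t want;
            ssize_t n;

            FD_ZERO(&wfds);
            FD_SET(s, &wfds);
            if (select(s + 1, NULL, &wfds, NULL, NULL) <= 0)
                break;                  /* error: give up */
            want = datalen - sent;
            if (want > sizeof(chunk))
                want = sizeof(chunk);
            n = write(s, chunk, want);
            if (n > 0)
                sent += (size_t)n;      /* a short write would be legal here */
            else if (n < 0)
                break;                  /* EAGAIN "should" not happen after select() */
        }
        close(s);
        gettimeofday(&end, NULL);

        return (end.tv_sec - start.tv_sec) +
               (end.tv_usec - start.tv_usec) / 1e6;
    }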

Does any of this make any sense to anyone?  Any ideas on what might
be wrong here?  Any suggestions on places to start looking?

Thanks!

allman


--
Mark Allman -- NASA GRC/BBN -- http://roland.grc.nasa.gov/~mallman/





Re: Problem in High Speed and Long Delay with FreeBSD

2002-11-04 Thread Bakul Shah
> Your suggestion of increasing the -l seems to have made a positive
> impact -- tests this morning with a higher buffer length size of 8192
> gave us a better throughput of 44Mbps.  Now the time sequence plot
> shows a window usage of 1.5MB as opposed to the previous 1MB usage.
>
> We still don't understand why we are not getting a larger
> window usage when we are requesting a 3MB socket buffer.  (BTW,
> there was a typo in my example testing commands; I left out a "0"
> in the "-b".)

Since *something* is making a difference, you may wish to try
changing one independent parameter at a time.  For instance,
do you get different throughput numbers with -l = 16k, 32k,
and so on?  What is the limit?  You will want to decrease the
-n parameter correspondingly so as to not keep waiting longer
and longer!

Similarly try changing other limits one at a time.

If you can, try to use the *same* non-FreeBSD machine at one
end and characterize send and recv performance of FreeBSD
separately.  This ought to help you figure out where the
bottleneck may be.




Re: Problem in High Speed and Long Delay with FreeBSD

2002-11-01 Thread Barney Wolff
Try retrieving a very large file via ftp.  The sendfile() code seems
more efficient than ttcp, and if performance improves that may be
a clue that the problem lies in the user/kernel interface.  If not,
it probably lies in the stack.  Could it conceivably be a resonance effect
between the actual rtt and the stack timing granularity?
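
(The ftp daemon goes through sendfile(2); for reference, a minimal
sender using that path looks roughly like the sketch below.  This is
only an illustration -- "bigfile" and the already-connected socket are
assumptions:)

    /*
     * Rough sketch: push a large file out an already-connected TCP
     * socket with FreeBSD's sendfile(2), avoiding the user-space copy
     * that ttcp's write() loop performs.
     */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int push_file(int sock)
    {
        off_t sent = 0;
        int fd = open("bigfile", O_RDONLY);

        if (fd == -1)
            return -1;
        /* offset 0 and nbytes 0 mean "send from the start until EOF";
           the number of bytes actually sent comes back in "sent". */
        if (sendfile(fd, sock, 0, 0, NULL, &sent, 0) == -1)
            perror("sendfile");
        printf("sendfile pushed %lld bytes\n", (long long)sent);
        close(fd);
        return 0;
    }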

I would also try setting ttcp's block size to a multiple of the exact
transmitted seg size rather than a power of 2.
Barney

-- 
Barney Wolff http://www.databus.com/bwresume.pdf
I'm available by contract or FT, in the NYC metro area or via the 'Net.




Re: Problem in High Speed and Long Delay with FreeBSD

2002-11-01 Thread Mark Allman

> Are you sure you're not hitting the top of the pipe and bouncing
> around in congestion avoidance?  Unless your window size limits
> your bw at exactly the correct amount, you'll never get the steady
> state bw you want.

We're not bouncing around.  We see no loss, which would indicate
that either we should continue slow start (exponential cwnd growth)
or stop growing cwnd when we hit the advertised window.  But, we do
neither of these.  At some point the cwnd just stops growing -- even
though we have plenty of advertised window left and no loss.
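
(Just to spell out the expectation: with no loss, textbook slow start
should keep doubling cwnd every RTT until the advertised window caps
it.  A toy model only -- not FreeBSD's code, and the mss/awnd values
below are illustrative:)

    #include <stdio.h>

    int main(void)
    {
        /* Expected no-loss behaviour per standard slow start: cwnd
         * roughly doubles each RTT until clamped by the advertised
         * window.  What we actually see is cwnd stalling well below
         * awnd with no loss at all. */
        unsigned long mss = 1460, awnd = 3000000, cwnd = mss, rtt = 0;

        while (cwnd < awnd) {
            cwnd = (cwnd * 2 < awnd) ? cwnd * 2 : awnd;
            rtt++;
            printf("after RTT %2lu: cwnd = %lu\n", rtt, cwnd);
        }
        return 0;
    }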

Hm.

allman


--
Mark Allman -- BBN/NASA GRC -- http://roland.grc.nasa.gov/~mallman/




Re: Problem in High Speed and Long Delay with FreeBSD

2002-11-01 Thread Mark Allman

> > From: Mark Allman [mailto:mallman@grc.nasa.gov]
> > Thanks!  Other ideas?
> 
> What MSS is advertised on each end?

1500 byte packets (from looking at the trace file).

allman


--
Mark Allman -- BBN/NASA GRC -- http://roland.grc.nasa.gov/~mallman/




RE: Problem in High Speed and Long Delay with FreeBSD

2002-11-01 Thread Don Bowman
> From: Mark Allman [mailto:mallman@grc.nasa.gov]
> Thanks!  Other ideas?

What MSS is advertised on each end?

--don ([EMAIL PROTECTED] www.sandvine.com)




Re: Problem in High Speed and Long Delay with FreeBSD

2002-11-01 Thread rick norman
Are you sure you're not hitting the top of the pipe and bouncing
around in congestion avoidance?  Unless your window size limits
your bw at exactly the correct amount, you'll never get the steady
state bw you want.


Mark Allman wrote:

> > Have you checked that both sides are negotiating SACK?
>
> No SACK in 4.1.  But, there is no loss in the connection.
>
> > And both sides are negotiating a window scale option sufficiently
> > large? (sounds like you need a window scale option of at least 5
> > bits?)
>
> We're seeing a shift of 6.
>
> > And the socket-buffer to ttcp is actually being set as large
> > as you think? (perhaps run 'ktrace' or 'truss' on ttcp and look
> > for an error on the setsockopt).
>
> We hacked ttcp to run getsockopt() to tell us if the kernel did not
> honor our setsockopt() request.  All looks fine.
>
> Thanks!  Other ideas?
>
> allman
>
> --
> Mark Allman -- BBN/NASA GRC -- http://roland.grc.nasa.gov/~mallman/
>

--
"I'm a-goin' to stay where you sleep all day
Where they hung the jerk that invented work
In the Big Rock Candy Mountains"

wk: 408 742 1619
[EMAIL PROTECTED]

hm: 650 726 0677
[EMAIL PROTECTED]
cell: 650 303 3877






Re: Problem in High Speed and Long Delay with FreeBSD

2002-11-01 Thread Mark Allman

> Have you checked that both sides are negotiating SACK?

No SACK in 4.1.  But, there is no loss in the connection.

> And both sides are negotiating a window scale option sufficiently
> large? (sounds like you need a window scale option of at least 5
> bits?)

We're seeing a shift of 6.

> And the socket-buffer to ttcp is actually being set as large
> as you think? (perhaps run 'ktrace' or 'truss' on ttcp and look
> for an error on the setsockopt).

We hacked ttcp to run getsockopt() to tell us if the kernel did not
honor our setsockopt() request.  All looks fine.
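
(For the record, the check amounts to something like this -- a sketch,
not the actual ttcp patch:)

    /*
     * Sketch of the verification: request a send-buffer size and read
     * back what the kernel actually granted.
     */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <stdio.h>

    int set_and_check_sndbuf(int s, int want)
    {
        int got = 0;
        socklen_t len = sizeof(got);

        if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &want, sizeof(want)) == -1) {
            perror("setsockopt(SO_SNDBUF)");
            return -1;
        }
        if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, &got, &len) == -1) {
            perror("getsockopt(SO_SNDBUF)");
            return -1;
        }
        printf("asked for %d bytes, kernel granted %d\n", want, got);
        return (got >= want) ? 0 : -1;
    }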

Thanks!  Other ideas?

allman


--
Mark Allman -- BBN/NASA GRC -- http://roland.grc.nasa.gov/~mallman/




RE: Problem in High Speed and Long Delay with FreeBSD

2002-11-01 Thread Don Bowman
> From: Fran Lawas-Grodek [mailto:Fran.Lawas-Grodek@grc.nasa.gov]
> Well... the development code that we ultimately need to test was
> developed on 4.1, thus we really need to try to stick with 4.1.
> It does not look like either of the above parameters is available
> until 4.7.

No worries.
Have you checked that both sides are negotiating SACK?
And both sides are negotiating a window scale option sufficiently
large? (sounds like you need a window scale option of at least 5
bits?)
And the socket-buffer to ttcp is actually being set as large
as you think? (perhaps run 'ktrace' or 'truss' on ttcp and look
for an error on the setsockopt).
http://www.rfc-editor.org/rfc/rfc1323.txt has some other
suggestions I think, but I'm guessing you've already gone
over it.

--don ([EMAIL PROTECTED] www.sandvine.com)




Re: Problem in High Speed and Long Delay with FreeBSD

2002-11-01 Thread Fran Lawas-Grodek
On Fri, Nov 01, 2002 at 11:18:50AM -0500, Don Bowman wrote:
> 
> Perhaps 
> sysctl net.inet.tcp.inflight_enable=1
> will help?
> 
> you may wish to also change tcp.inflight_max.
> See tcp(4) as of 4.7.

Hello Don,

Well... the development code that we ultimately need to test was
developed on 4.1, thus we really need to try to stick with 4.1.
It does not look like either of the above parameters is available
until 4.7.

Thank you very much for your suggestion,


Fran Lawas-Grodek
-- 


Frances J. Lawas-Grodek   | 
NASA Glenn Research Center| phone: (216) 433-5052
21000 Brookpark Rd, MS 142-4  | fax  : (216) 433-8000
Cleveland, Ohio  44135| email: [EMAIL PROTECTED]







Re: Problem in High Speed and Long Delay with FreeBSD

2002-11-01 Thread Fran Lawas-Grodek
Hello Bakul,

Your suggestion of increasing the -l seems to have made a positive
impact -- tests this morning with a higher buffer length size of 8192
gave us a better throughput of 44Mbps.  Now the time sequence plot
shows a window usage of 1.5MB as opposed to the previous 1MB usage.
We still don't understand why we are not getting a larger
window usage when we are requesting a 3MB socket buffer.  (BTW,
there was a typo in my example testing commands; I left out a "0"
in the "-b".)

Any other thoughts would be greatly appreciated.

Fran Lawas-Grodek
Cindy Tran
NASA Glenn Research Center


Frances J. Lawas-Grodek   | 
NASA Glenn Research Center| phone: (216) 433-5052
21000 Brookpark Rd, MS 142-4  | fax  : (216) 433-8000
Cleveland, Ohio  44135| email: [EMAIL PROTECTED]


On Thu, Oct 31, 2002 at 07:01:30PM -0800, Bakul Shah wrote:
> > Testing commands used:
> >   receiver> ttcp -b 312500 -l 1024 -r -s
> >   sender>   ttcp -b 312500 -l 1024 -n 10 -s -t receiverhost
> 
> Pure speculation:
> 
> Since you are writing 1K at a time, could it be an N^2 effect
> while appending mbufs?  Since you can have many MBs in the
> pipe due to large delay*BW, you can potentially have many
> many mbufs.  The Nth write will have to traverse O(N) mbuf
> chains to append the new data.  One way to test is to
> increase the -l parameter value to something large and see if
> the throughput improves.  If this is the case, FreeBSD will
> have to optimize this common case for stream protocols.
> 
> Note that I haven't even looked at the relevant FreeBSD code!
> For all I know it may already be doing this.
> 
> -- bakul




RE: Problem in High Speed and Long Delay with FreeBSD

2002-11-01 Thread Don Bowman
> From: Fran Lawas-Grodek [mailto:Fran.Lawas-Grodek@grc.nasa.gov]

Perhaps 
sysctl net.inet.tcp.inflight_enable=1
will help?

you may wish to also change tcp.inflight_max.
See tcp(4) as of 4.7.

--don ([EMAIL PROTECTED] www.sandvine.com)




Re: Problem in High Speed and Long Delay with FreeBSD

2002-11-01 Thread Fran Lawas-Grodek
These are our sysctl settings:

 kern.ipc.maxsockbuf=4194304
 net.inet.tcp.sendspace=3125000
 net.inet.tcp.recvspace=3125000
 net.inet.ip.intr_queue_maxlen=500
 nmbclusters=32768

After reading your suggestion, we were able to achieve a
slightly better throughput -- from 32Mbps on the 250ms delayed
network to 46Mbps overall -- by increasing the -l
buffer length from 1024 to 8192 bytes.  Increasing
the above intr_queue_maxlen from the default 50 to 500 also
helped a bit.  Our time sequence plot now shows a sender buffer
window of 1.5MB being used, up from the 1MB of the earlier tests.
(BTW, a typo in my posted ttcp example -- "-b" should
be "-b 3125000", not "-b 312500".)

We still do not understand why we cannot get better usage of our
requested 3MB socket buffer, or a better throughput of 60+Mbps.

Any other thoughts?


Fran Lawas-Grodek
Cindy Tran
NASA Glenn Research Center


Frances J. Lawas-Grodek   | 
NASA Glenn Research Center| phone: (216) 433-5052
21000 Brookpark Rd, MS 142-4  | fax  : (216) 433-8000
Cleveland, Ohio  44135| email: [EMAIL PROTECTED]



On Thu, Oct 31, 2002 at 03:02:17PM -0800, Luigi Rizzo wrote:
> you might want to have a look at the sysctl variable
> kern.ipc.sockbuf_waste_factor too.
> 
> Remember that memory is charged to socket buffers depending on how
> many clusters are allocated, even if they are not fully used.
> E.g. in your example you are probably doing 1KB writes, each of
> which consumes a 2KB cluster plus a 256-byte mbuf, so no
> matter what you will never manage to reach (roughly) a window
> larger than kern.ipc.maxsockbuf/2.
> 
> The max raw amount of memory allocated in a socket buffer is
> typically
> 
>   min( tcp_{send|recv}buf * kern.ipc.sockbuf_waste_factor,
>   kern.ipc.maxsockbuf)
> 
> and probably you are hitting the roof on the second one.
> 
>   cheers
>   luigi




Re: Problem in High Speed and Long Delay with FreeBSD

2002-10-31 Thread Luigi Rizzo
you might want to have a look at the sysctl variable
kern.ipc.sockbuf_waste_factor too.

Remember that memory is charged to socket buffers depending on how
many clusters are allocated, even if they are not fully used.
E.g. in your example you are probably doing 1KB writes, each of
which consumes a 2KB cluster plus a 256-byte mbuf, so no
matter what you will never manage to reach (roughly) a window
larger than kern.ipc.maxsockbuf/2.

The max raw amount of memory allocated in a socket buffer is
typically

min( tcp_{send|recv}buf * kern.ipc.sockbuf_waste_factor,
kern.ipc.maxsockbuf)

and probably you are hitting the roof on the second one.
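
(A rough back-of-the-envelope with the numbers from this thread -- 1KB
writes and the kern.ipc.maxsockbuf=4194304 posted elsewhere in the
thread; the 2KB-cluster-plus-256-byte-mbuf charge is as described
above:)

    charged per 1KB write   ~ 2048 (cluster) + 256 (mbuf) = 2304 bytes
    usable fraction         ~ 1024 / 2304                 ~ 0.44
    data that actually fits ~ 4194304 * 0.44              ~ 1.9 MB

which lands right around the 1.5-2MB ceiling being reported, well
short of the requested 3MB.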

cheers
luigi

On Thu, Oct 31, 2002 at 01:56:01PM -0500, Fran Lawas-Grodek wrote:
> Hello,
> 
> Hopefully someone might have some advice on our problem.
> 
> We are setting up a testbed consisting of FreeBSD 4.1 on the sender
> and receiver machines.  (This older version of FreeBSD is necessary
> due to subsequent TCP development patches that are to be tested.)
> The problem that we are seeing is that on a 100Mbps link with a
> 250 millisecond delay, our overall throughput does not exceed 32Mbps.
> We expect to see at least 60Mbps with the 250ms delay.  Without a
> delay, throughput is acceptable at 87Mbps.  Tcptrace output shows no
> retransmissions and no out-of-order packets under the delay.
> 
> With the delay, we are setting up a 3MByte sender and receiver socket
> buffer size, but through the time sequence plot, we only see about 1MB
> used of the buffer on the sender side.  Additional tests were run with an
> 80ms delay and requesting a 1MB buffer, and yet it appears we are only
> allowed to use up to 500KB of the window.  Raising the requested
> socket buffer size has no effect -- we keep running into some
> invisible threshold that is much smaller than the requested socket
> buffer size.  A hacked version of the ttcp application prints out that
> what we request for the -b buffer size via setsockopt() is what we get
> back via getsockopt().  Unfortunately, the xplots show that our transfers
> reach a lower threshold (1MB instead of 3MB, 500KB instead of 1MB).
> 
> Testing commands used:
>   receiver> ttcp -b 312500 -l 1024 -r -s
>   sender>   ttcp -b 312500 -l 1024 -n 10 -s -t receiverhost
> 
> The -b above is set to the value of the Bandwidth Delay Product
> for a 100Mbps link and 250ms delay.
> 
> Per the Pittsburgh Supercomputing site, we have already increased the
> maxsockbuf, nmbclusters, memory, tcp_sendspace, tcp_recvspace sysctl
> values, but we are still seeing this low throughput.
> 
> The network cards that we are using are 3Com95-TX 100BaseT cards.  The
> machines with the FreeBSD installations are Pentium II 400MHz with
> Xeon chips, 400MB RAM each.
> 
> Using two other non-FreeBSD machines, we have verified that it is not
> the intermediary network routers or delay emulator equipment, as the 
> non-FreeBSD machines give the expected throughput under delayed and 
> no delay conditions. (60+Mbps under 250ms delay)
> 
> Does anyone else have any experience with this type of testing?
> Perhaps there might be something wrong with our network card driver?  
> Any other suggested network cards to try?  
> Does anyone know of a limit that needs to be tweaked in the TCP 
> stack or FreeBSD operating system that would allow us to actually 
> use more than this invisible socket buffer limit?  One thought is 
> that there is some sort of calculated limit that won't allow us to 
> send more packets than our requested socket buffer will allow, but 
> not having any kernel expertise, we are not sure where to look.
> 
> 
> Thanks in advance for any suggestions,
> 
> Fran Lawas-Grodek
> Cindy Tran
> NASA Glenn Research Center
> Cleveland OH USA
> 
> (ps: We have already consulted with Mark Allman at our lab, and he
> is just as stumped.  The feeling is that the cause of the problem
> is buried somewhere in the kernel, not due to the network cards.)
> 
> -- 
> 
> 
> Frances J. Lawas-Grodek   | 
> NASA Glenn Research Center| phone: (216) 433-5052
> 21000 Brookpark Rd, MS 142-4  | fax  : (216) 433-8000
> Cleveland, Ohio  44135| email: [EMAIL PROTECTED]
> 
> 
> 
> 
