read() returns ETIMEDOUT on steady TCP connection

2008-04-19 Thread Mark Hills

Hello,

I'm having trouble with TCP connections being dropped with "read: 
Operation timed out". What is unusual is that this is happening right in 
the middle of sending a steady stream of data with no network congestion.


The system is FreeBSD 7 and a bespoke streaming server with 1Gbit 
connection. The server receives a 192kbps inbound stream over TCP, and 
broadcasts it over a large number of TCP streams.


With no visible or obvious pattern, the inbound read() fails with 
ETIMEDOUT. The likelihood of this happening seems to increase as the 
number of audience connections increases. It happens every few minutes 
even with a small audience (e.g. 300 outbound connections and about 
60mbit).


It doesn't cough and splutter -- steady data is coming in, then it just 
drops the connection.


systat doesn't show problems inbound; all packets received are delivered 
to the upper layer. But on outbound, there are consistent 'output drops':


IP Output
7028 total packets sent
7028 - generated locally
 314 - output drops

As the number of outbound connections increases, the 'output drops' 
count increases to around 10% of the total packets sent and maintains that 
ratio. There are no problems with network capacity.


I've tried different servers, different network interfaces (bge, em) and 
different kernels (7-RELEASE, 7-STABLE). I have also checked dev.bge.0.stats 
and dev.em.0.stats for CRC errors etc., which show no problems. 'netstat 
-m' doesn't show mbuf or sbuf limits being reached. The problem is seen 
in a dedicated, uncontended test environment.


Can anyone explain why the packets are being dropped outbound, and how 
this could affect inbound TCP data in such an abrupt way? What can I do to 
solve this?


Thanks,

Mark
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: read() returns ETIMEDOUT on steady TCP connection

2008-04-19 Thread Peter Jeremy
On Sat, Apr 19, 2008 at 03:27:28PM +0100, Mark Hills wrote:
>I'm having trouble with TCP connections being dropped with "read: 
>Operation timed out". What is unusual is that this is happening right in 
>the middle of sending a steady stream of data with no network congestion.

Can you give some more detail about your hardware (speed, CPU,
available RAM, UP or SMP) and the application (roughly what does the
core of the code look like and is it single-threaded/multi-threaded
and/or multi-process).

>systat doesn't show problems inbound; all packets received are delivered to 
>the upper layer. But on outbound, there are consistent 'output drops':
>
>IP Output
>7028 total packets sent
>7028 - generated locally
> 314 - output drops
>
>As the number of outbound connections increases, the 'output drops' 
>increases to around 10% of the total packets sent and maintains that ratio. 
>There are no problems with network capacity.

'output drops' (ips_odropped) means that the kernel is unable to
buffer the write (no mbufs or send queue full).  Userland should see
ENOBUFS unless the error was triggered by a fragmentation request.

I can't explain the problem but it definitely looks like a resource
starvation issue within the kernel.

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.




Re: read() returns ETIMEDOUT on steady TCP connection

2008-04-20 Thread Mark Hills

On Sun, 20 Apr 2008, Peter Jeremy wrote:


Can you give some more detail about your hardware (speed, CPU,
available RAM, UP or SMP) and the application (roughly what does the
core of the code look like and is it single-threaded/multi-threaded
and/or multi-process).


The current test is a Dell 2650, 2Gb, Quad Xeon with onboard bge.

The application is single threaded, non-blocking multiplexed I/O based on 
poll(). It's relatively simple at its core -- read() from an inbound 
connection and write() to outbound sockets.



As the number of outbound connections increases, the 'output drops'
increases to around 10% of the total packets sent and maintains that ratio.
There are no problems with network capacity.


'output drops' (ips_odropped) means that the kernel is unable to
buffer the write (no mbufs or send queue full).  Userland should see
ENOBUFS unless the error was triggered by a fragmentation request.


The app definitely isn't seeing ENOBUFS; this would be treated as a fatal 
condition and reported.



I can't explain the problem but it definitely looks like a resource
starvation issue within the kernel.


I've traced the source of the ETIMEDOUT within the kernel to 
tcp_timer_rexmt() in tcp_timer.c:


  if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
  tp->t_rxtshift = TCP_MAXRXTSHIFT;
  tcpstat.tcps_timeoutdrop++;
  tp = tcp_drop(tp, tp->t_softerror ?
tp->t_softerror : ETIMEDOUT);
  goto out;
  }

I'm new to FreeBSD, but this seems to imply that it's reaching a limit on 
the number of retransmits (of ACKs?) on the TCP connection receiving 
the inbound data? But I checked this using tcpdump on the server and could 
see no retransmissions.


As a test, I ran a simulation with the necessary changes to increase 
TCP_MAXRXTSHIFT (including adding appropriate entries to 
tcp_syn_backoff[] and tcp_backoff[]) and it appeared I was able to reduce 
the frequency of the problem occurring, but not to a usable level.
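For context on what that limit means in wall-clock terms: each retransmit interval is the base RTO scaled by the next entry in tcp_backoff[] and clamped to TCPTV_REXMTMAX (64 s). A sketch of the arithmetic, assuming the stock backoff table from the BSD sources (illustrative user-space code, not the kernel's):

```c
/* Illustrative arithmetic only -- mirrors the stock tcp_backoff[]
 * table from tcp_timer.c (values assumed from the standard BSD
 * sources), with each interval clamped to TCPTV_REXMTMAX (64 s). */
#define TCP_MAXRXTSHIFT 12

static const int tcp_backoff[TCP_MAXRXTSHIFT + 1] =
    { 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 512, 512, 512 };

/* Total milliseconds spent retransmitting before the connection is
 * dropped with ETIMEDOUT, for a given base RTO and per-interval cap. */
long rexmt_total_ms(long base_rto_ms, long cap_ms)
{
    long total = 0;
    int shift;

    for (shift = 0; shift < TCP_MAXRXTSHIFT; shift++) {
        long rto = base_rto_ms * tcp_backoff[shift];
        total += (rto < cap_ms) ? rto : cap_ms;
    }
    return total;
}
```

With a 1-second base RTO this sums to 447 s, i.e. roughly 7.5 minutes of retransmitting before tcp_drop() fires with ETIMEDOUT, which makes an abrupt mid-flow drop with no visible stall beforehand all the more suspicious.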


With ACKs in mind, I took the test back to a stock kernel and configuration, 
and went ahead with disabling sack on both the server and the client which 
supplies the data (FreeBSD 6.1, not 7). This greatly reduced the 
'duplicate acks' metric, but didn't fix the problem. The next step was to 
switch off delayed_ack as well, after which I didn't see the problem for 
some hours on the test system at 850mbit output. But this hasn't 
eliminated it, as it happened again.


Perhaps someone with a greater knowledge can help to join the dots of all 
these symptoms?


Mark


Re: read() returns ETIMEDOUT on steady TCP connection

2008-04-20 Thread Andre Oppermann

Mark Hills wrote:

On Sun, 20 Apr 2008, Peter Jeremy wrote:


Can you give some more detail about your hardware (speed, CPU,
available RAM, UP or SMP) and the application (roughly what does the
core of the code look like and is it single-threaded/multi-threaded
and/or multi-process).


The current test is a Dell 2650, 2Gb, Quad Xeon with onboard bge.

The application is single threaded, non-blocking multiplexed I/O based 
on poll(). It's relatively simple at its core -- read() from an inbound 
connection and write() to outbound sockets.



As the number of outbound connections increases, the 'output drops'
increases to around 10% of the total packets sent and maintains that 
ratio.

There are no problems with network capacity.


'output drops' (ips_odropped) means that the kernel is unable to
buffer the write (no mbufs or send queue full).  Userland should see
ENOBUFS unless the error was triggered by a fragmentation request.


The app definitely isn't seeing ENOBUFS; this would be treated as a 
fatal condition and reported.


A TCP application will never see ENOBUFS.  TCP tries to reliably deliver
all data even through temporary memory shortages that prevent it from
sending a segment right now.  Only after all those retries have failed
will it report ETIMEDOUT and abort the connection.


I can't explain the problem but it definitely looks like a resource
starvation issue within the kernel.


I've traced the source of the ETIMEDOUT within the kernel to 
tcp_timer_rexmt() in tcp_timer.c:


  if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
  tp->t_rxtshift = TCP_MAXRXTSHIFT;
  tcpstat.tcps_timeoutdrop++;
  tp = tcp_drop(tp, tp->t_softerror ?
tp->t_softerror : ETIMEDOUT);
  goto out;
  }


Yes, this is related to either lack of mbufs to create a segment
or a problem in sending it.  That may be a full interface queue, a
bandwidth manager (dummynet) or some firewall internally rejecting
the segment (ipfw, pf).  Do you run any firewall in stateful mode?
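A quick way to check whether any of those subsystems are even in play (standard FreeBSD commands; names will differ if a firewall is compiled into the kernel rather than loaded as a module):

```shell
kldstat | grep -E 'ipfw|pf|dummynet'   # loaded as modules?
sysctl net.inet.ip.fw.enable           # present only when ipfw is active
pfctl -s info                          # errors out unless pf is enabled
```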

I'm new to FreeBSD, but this seems to imply that it's reaching a limit 
on the number of retransmits (of ACKs?) on the TCP connection 
receiving the inbound data? But I checked this using tcpdump on the 
server and could see no retransmissions.


When you have internal problems the segment never makes it to the
wire and thus you won't see it in tcpdump.

Please report the output of 'netstat -s -p tcp' and 'netstat -m'.

As a test, I ran a simulation with the necessary changes to increase 
TCP_MAXRXTSHIFT (including adding appropriate entries to 
tcp_syn_backoff[] and tcp_backoff[]) and it appeared I was able to 
reduce the frequency of the problem occurring, but not to a usable level.


Possible causes are timers that fire too early, resource starvation
(you are doing a lot of traffic), or of course some bug in the code.

With ACKs in mind, I took the test back to stock kernel and 
configuration, and went ahead with disabling sack on the server and the 
client which supplies the data (FreeBSD 6.1, not 7). This greatly 
reduced the 'duplicate acks' metric, but didn't fix the problem. The 
next step was to switch off delayed_ack as well, and I didn't see the 
problem for some hours on the test system at 850mbit output. But this 
hasn't eliminated it, as it happened again.


Perhaps someone with a greater knowledge can help to join the dots of 
all these symptoms?


--
Andre



Re: read() returns ETIMEDOUT on steady TCP connection

2008-04-20 Thread Sten Daniel Soersdal

Mark Hills wrote:

On Sun, 20 Apr 2008, Peter Jeremy wrote:


Can you give some more detail about your hardware (speed, CPU,
available RAM, UP or SMP) and the application (roughly what does the
core of the code look like and is it single-threaded/multi-threaded
and/or multi-process).


The current test is a Dell 2650, 2Gb, Quad Xeon with onboard bge.

The application is single threaded, non-blocking multiplexed I/O based 
on poll(). It's relatively simple at its core -- read() from an inbound 
connection and write() to outbound sockets.



As the number of outbound connections increases, the 'output drops'
increases to around 10% of the total packets sent and maintains that 
ratio.

There are no problems with network capacity.


'output drops' (ips_odropped) means that the kernel is unable to
buffer the write (no mbufs or send queue full).  Userland should see
ENOBUFS unless the error was triggered by a fragmentation request.


The app definitely isn't seeing ENOBUFS; this would be treated as a 
fatal condition and reported.



I can't explain the problem but it definitely looks like a resource
starvation issue within the kernel.


I've traced the source of the ETIMEDOUT within the kernel to 
tcp_timer_rexmt() in tcp_timer.c:


  if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
  tp->t_rxtshift = TCP_MAXRXTSHIFT;
  tcpstat.tcps_timeoutdrop++;
  tp = tcp_drop(tp, tp->t_softerror ?
tp->t_softerror : ETIMEDOUT);
  goto out;
  }

I'm new to FreeBSD, but this seems to imply that it's reaching a limit 
on the number of retransmits (of ACKs?) on the TCP connection 
receiving the inbound data? But I checked this using tcpdump on the 
server and could see no retransmissions.


As a test, I ran a simulation with the necessary changes to increase 
TCP_MAXRXTSHIFT (including adding appropriate entries to 
tcp_syn_backoff[] and tcp_backoff[]) and it appeared I was able to 
reduce the frequency of the problem occurring, but not to a usable level.


With ACKs in mind, I took the test back to stock kernel and 
configuration, and went ahead with disabling sack on the server and the 
client which supplies the data (FreeBSD 6.1, not 7). This greatly 
reduced the 'duplicate acks' metric, but didn't fix the problem. The 
next step was to switch off delayed_ack as well, and I didn't see the 
problem for some hours on the test system at 850mbit output. But this 
hasn't eliminated it, as it happened again.


Perhaps someone with a greater knowledge can help to join the dots of 
all these symptoms?


Verify that you are not experiencing connection loss due to mtu related 
issues. What is path mtu? is mss adjusted along the way?

Try turning off txcsum and rxcsum using ifconfig.
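That would presumably be along these lines (bge0 as an example interface name):

```shell
ifconfig bge0 -txcsum -rxcsum   # disable TX/RX checksum offload
ifconfig bge0 | grep options    # confirm the flags are gone
```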

Just my $0.02

--
Sten Daniel Soersdal


Re: read() returns ETIMEDOUT on steady TCP connection

2008-04-20 Thread Mark Hills

On Mon, 21 Apr 2008, Andre Oppermann wrote:


Mark Hills wrote:

On Sun, 20 Apr 2008, Peter Jeremy wrote:



I can't explain the problem but it definitely looks like a resource
starvation issue within the kernel.


I've traced the source of the ETIMEDOUT within the kernel to 
tcp_timer_rexmt() in tcp_timer.c:


  if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
  tp->t_rxtshift = TCP_MAXRXTSHIFT;
  tcpstat.tcps_timeoutdrop++;
  tp = tcp_drop(tp, tp->t_softerror ?
tp->t_softerror : ETIMEDOUT);
  goto out;
  }


Yes, this is related to either lack of mbufs to create a segment
or a problem in sending it.  That may be a full interface queue, a
bandwidth manager (dummynet) or some firewall internally rejecting
the segment (ipfw, pf).  Do you run any firewall in stateful mode?


There's no firewall running.

I'm new to FreeBSD, but this seems to imply that it's reaching a limit 
on the number of retransmits (of ACKs?) on the TCP connection 
receiving the inbound data? But I checked this using tcpdump on the 
server and could see no retransmissions.


When you have internal problems the segment never makes it to the
wire and thus you won't see it in tcpdump.

Please report the output of 'netstat -s -p tcp' and 'netstat -m'.


Posted below. You can see it in there: "131 connections dropped by 
rexmit timeout"


As a test, I ran a simulation with the necessary changes to increase 
TCP_MAXRXTSHIFT (including adding appropriate entries to tcp_syn_backoff[] 
and tcp_backoff[]) and it appeared I was able to reduce the frequency of 
the problem occurring, but not to a usable level.


Possible causes are timers that fire too early, resource starvation
(you are doing a lot of traffic), or of course some bug in the code.


As I said in my original email, the data transfer doesn't stop or 
splutter, it's simply cut mid-flow. Sounds like something happening 
prematurely.


Thanks for the help,

Mark



$ netstat -m
14632/8543/23175 mbufs in use (current/cache/total)
504/4036/4540/25600 mbuf clusters in use (current/cache/total/max)
504/3976 mbuf+clusters out of packet secondary zone in use (current/cache)
12550/250/12800/12800 4k (page size) jumbo clusters in use
(current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
54866K/11207K/66073K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/6/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

$ netstat -s -p tcp
tcp:
   3408601864 packets sent
   3382078274 data packets (1431587515 bytes)
   454189057 data packets (1209708476 bytes) retransmitted
   14969051 data packets unnecessarily retransmitted
   0 resends initiated by MTU discovery
   2216740 ack-only packets (9863 delayed)
   0 URG only packets
   0 window probe packets
   273815 window update packets
   35946 control packets
   2372591976 packets received
   1991632669 acks (for 2122913190 bytes)
   16032443 duplicate acks
   0 acks for unsent data
   1719033 packets (1781984933 bytes) received in-sequence
   1404 completely duplicate packets (197042 bytes)
   1 old duplicate packet
   54 packets with some dup. data (6403 bytes duped)
   9858 out-of-order packets (9314285 bytes)
   0 packets (0 bytes) of data after window
   0 window probes
   363132176 window update packets
   3 packets received after close
   0 discarded for bad checksums
   0 discarded for bad header offset fields
   0 discarded because packet too short
   635 discarded due to memory problems
   39 connection requests
   86333 connection accepts
   0 bad connection attempts
   2256 listen queue overflows
   8557 ignored RSTs in the windows
   86369 connections established (including accepts)
   83380 connections closed (including 31174 drops)
   74004 connections updated cached RTT on close
   74612 connections updated cached RTT variance on close
   74591 connections updated cached ssthresh on close
   3 embryonic connections dropped
   1979184038 segments updated rtt (of 1729113221 attempts)
   110108313 retransmit timeouts
   131 connections dropped by rexmit timeout
   1 persist timeout
   0 connections dropped by persist timeout
   0 Connections (fin_wait_2) dropped because of timeout
   23 keepalive timeouts
   22 keepalive probes sent
 

Re: read() returns ETIMEDOUT on steady TCP connection

2008-04-21 Thread Andre Oppermann

Mark Hills wrote:

On Mon, 21 Apr 2008, Andre Oppermann wrote:


Mark Hills wrote:

On Sun, 20 Apr 2008, Peter Jeremy wrote:



I can't explain the problem but it definitely looks like a resource
starvation issue within the kernel.


I've traced the source of the ETIMEDOUT within the kernel to 
tcp_timer_rexmt() in tcp_timer.c:


  if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
  tp->t_rxtshift = TCP_MAXRXTSHIFT;
  tcpstat.tcps_timeoutdrop++;
  tp = tcp_drop(tp, tp->t_softerror ?
tp->t_softerror : ETIMEDOUT);
  goto out;
  }


Yes, this is related to either lack of mbufs to create a segment
or a problem in sending it.  That may be a full interface queue, a
bandwidth manager (dummynet) or some firewall internally rejecting
the segment (ipfw, pf).  Do you run any firewall in stateful mode?


There's no firewall running.

I'm new to FreeBSD, but this seems to imply that it's reaching a 
limit on the number of retransmits (of ACKs?) on the TCP 
connection receiving the inbound data? But I checked this using 
tcpdump on the server and could see no retransmissions.


When you have internal problems the segment never makes it to the
wire and thus you won't see it in tcpdump.

Please report the output of 'netstat -s -p tcp' and 'netstat -m'.


Posted below. You can see it in there: "131 connections dropped by 
rexmit timeout"


As a test, I ran a simulation with the necessary changes to increase 
TCP_MAXRXTSHIFT (including adding appropriate entries to 
tcp_syn_backoff[] and tcp_backoff[]) and it appeared I was able to 
reduce the frequency of the problem occurring, but not to a usable 
level.


Possible causes are timers that fire too early, resource starvation
(you are doing a lot of traffic), or of course some bug in the code.


As I said in my original email, the data transfer doesn't stop or 
splutter, it's simply cut mid-flow. Sounds like something happening 
prematurely.


Thanks for the help,


The output doesn't show any obvious problems.  I have to write some
debug code to run on your system.  I'll do that later today if time
permits.  Otherwise tomorrow.

--
Andre


Mark



$ netstat -m
14632/8543/23175 mbufs in use (current/cache/total)
504/4036/4540/25600 mbuf clusters in use (current/cache/total/max)
504/3976 mbuf+clusters out of packet secondary zone in use (current/cache)
12550/250/12800/12800 4k (page size) jumbo clusters in use
(current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
54866K/11207K/66073K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/6/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

$ netstat -s -p tcp
tcp:
   3408601864 packets sent
   3382078274 data packets (1431587515 bytes)
   454189057 data packets (1209708476 bytes) retransmitted
   14969051 data packets unnecessarily retransmitted
   0 resends initiated by MTU discovery
   2216740 ack-only packets (9863 delayed)
   0 URG only packets
   0 window probe packets
   273815 window update packets
   35946 control packets
   2372591976 packets received
   1991632669 acks (for 2122913190 bytes)
   16032443 duplicate acks
   0 acks for unsent data
   1719033 packets (1781984933 bytes) received in-sequence
   1404 completely duplicate packets (197042 bytes)
   1 old duplicate packet
   54 packets with some dup. data (6403 bytes duped)
   9858 out-of-order packets (9314285 bytes)
   0 packets (0 bytes) of data after window
   0 window probes
   363132176 window update packets
   3 packets received after close
   0 discarded for bad checksums
   0 discarded for bad header offset fields
   0 discarded because packet too short
   635 discarded due to memory problems
   39 connection requests
   86333 connection accepts
   0 bad connection attempts
   2256 listen queue overflows
   8557 ignored RSTs in the windows
   86369 connections established (including accepts)
   83380 connections closed (including 31174 drops)
   74004 connections updated cached RTT on close
   74612 connections updated cached RTT variance on close
   74591 connections updated cached ssthresh on close
   3 embryonic connections dropped
   1979184038 segments updated rtt (of 1729113221 attempts)
   110108313 retransmit timeouts
   131 connections dropped by rexmit timeout
   1 pers

Re: read() returns ETIMEDOUT on steady TCP connection

2008-04-22 Thread Andre Oppermann

Andre Oppermann wrote:

Mark Hills wrote:

On Mon, 21 Apr 2008, Andre Oppermann wrote:


Mark Hills wrote:

On Sun, 20 Apr 2008, Peter Jeremy wrote:



I can't explain the problem but it definitely looks like a resource
starvation issue within the kernel.


I've traced the source of the ETIMEDOUT within the kernel to 
tcp_timer_rexmt() in tcp_timer.c:


  if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
  tp->t_rxtshift = TCP_MAXRXTSHIFT;
  tcpstat.tcps_timeoutdrop++;
  tp = tcp_drop(tp, tp->t_softerror ?
tp->t_softerror : ETIMEDOUT);
  goto out;
  }


Yes, this is related to either lack of mbufs to create a segment
or a problem in sending it.  That may be a full interface queue, a
bandwidth manager (dummynet) or some firewall internally rejecting
the segment (ipfw, pf).  Do you run any firewall in stateful mode?


There's no firewall running.

I'm new to FreeBSD, but this seems to imply that it's reaching a 
limit on the number of retransmits (of ACKs?) on the TCP 
connection receiving the inbound data? But I checked this using 
tcpdump on the server and could see no retransmissions.


When you have internal problems the segment never makes it to the
wire and thus you won't see it in tcpdump.

Please report the output of 'netstat -s -p tcp' and 'netstat -m'.


Posted below. You can see it in there: "131 connections dropped by 
rexmit timeout"


As a test, I ran a simulation with the necessary changes to increase 
TCP_MAXRXTSHIFT (including adding appropriate entries to 
tcp_syn_backoff[] and tcp_backoff[]) and it appeared I was able to 
reduce the frequency of the problem occurring, but not to a usable 
level.


Possible causes are timers that fire too early, resource starvation
(you are doing a lot of traffic), or of course some bug in the code.


As I said in my original email, the data transfer doesn't stop or 
splutter, it's simply cut mid-flow. Sounds like something happening 
prematurely.


Thanks for the help,


The output doesn't show any obvious problems.  I have to write some
debug code to run on your system.  I'll do that later today if time
permits.  Otherwise tomorrow.


 http://people.freebsd.org/~andre/tcp_output-error-log.diff

Please apply this patch and enable the sysctl net.inet.tcp.log_debug=1
and report any output.  You likely get some (normal) noise from syncache.
What we are looking for is reports from tcp_output.

--
Andre


Re: read() returns ETIMEDOUT on steady TCP connection

2008-04-23 Thread Mark Hills

On Wed, 23 Apr 2008, Andre Oppermann wrote:


http://people.freebsd.org/~andre/tcp_output-error-log.diff

Please apply this patch and enable the sysctl net.inet.tcp.log_debug=1
and report any output.  You likely get some (normal) noise from syncache.
What we are looking for is reports from tcp_output.


Hi Andre, I've applied the patch and tested.

Aside from syncache noise, I get a constant stream of 'error 55' 
(ENOBUFS?) once the number of connections reaches around 150 at 192kbps.


TCP: [192.168.5.43]:52153 to [192.168.5.40]:8080; tcp_output: error 55 while 
sending

192.168.5.40 is the IP address of this host, running the server.

I tried to correlate the point of the application receiving ETIMEDOUT with 
these messages, but that is tricky as it seems to be outputting a lot of 
messages, with multiple messages printed over each other (see below).


Because of the mention of no buffer space available, I checked the values 
of net.inet.tcp.sendbuf* and recvbuf*, and increased the max values with 
no effect.


When I get time I will modify the kernel to print errors which aren't 
ENOBUFS to see if there are any others. But in the meantime, this sounds 
like a problem to me. Is that correct?


Mark


:8080; tcp_output: error 55 while sending
TCP: [192.168.5.42]:57384T CtPo:  [[119922..116688..55..4402]]::85048400;1  
ttoc p[_1o9u2t.p1u6t8:. 5e.r4r0o]r: 8080;5 5t cwp_hoiultep uste:n deirnrgor 55 
while sending
TCP: [192.168.5.42]:57382 to [192.168.5.40]:8080; tcp_output: error 55 while 
sending
TCP: [192.168.5.42]:57381 to [192.168.5.40]:8080; tcp_output: error 55 while 
sending
TCP: [192.168.5.42]:57380 to [192.168.5.40]:8080; tcp_output: error 55 while 
sending



Re: read() returns ETIMEDOUT on steady TCP connection

2008-04-24 Thread Andre Oppermann

Mark Hills wrote:

On Wed, 23 Apr 2008, Andre Oppermann wrote:


http://people.freebsd.org/~andre/tcp_output-error-log.diff

Please apply this patch and enable the sysctl net.inet.tcp.log_debug=1
and report any output.  You likely get some (normal) noise from syncache.
What we are looking for is reports from tcp_output.


Hi Andre, I've applied the patch and tested.

Aside from syncache noise, I get a constant stream of 'error 55' 
(ENOBUFS?) once the number of connections reaches around 150 at 192kbps.


TCP: [192.168.5.43]:52153 to [192.168.5.40]:8080; tcp_output: error 55 
while sending


192.168.5.40 is the IP address of this host, running the server.

I tried to correlate the point of the application receiving ETIMEDOUT 
with these messages, but that is tricky as it seems to be outputting a 
lot of messages, with multiple messages printed over each other (see below).


Because of the mention of no buffer space available, I checked the 
values of net.inet.tcp.sendbuf* and recvbuf*, and increased the max 
values with no effect.


When I get time I will modify the kernel to print errors which aren't 
ENOBUFS to see if there are any others. But in the meantime, this sounds 
like a problem to me. Is that correct?


Yes.  I'll investigate why you get ENOBUFS here despite your netstat -m
output not showing any requests denied.

--
Andre


Mark


:8080; tcp_output: error 55 while sending
TCP: [192.168.5.42]:57384T CtPo:  
[[119922..116688..55..4402]]::85048400;1  ttoc p[_1o9u2t.p1u6t8:. 
5e.r4r0o]r: 8080;5 5t cwp_hoiultep uste:n deirnrgor 55 while sending
TCP: [192.168.5.42]:57382 to [192.168.5.40]:8080; tcp_output: error 55 
while sending
TCP: [192.168.5.42]:57381 to [192.168.5.40]:8080; tcp_output: error 55 
while sending
TCP: [192.168.5.42]:57380 to [192.168.5.40]:8080; tcp_output: error 55 
while sending








Re: read() returns ETIMEDOUT on steady TCP connection

2008-05-02 Thread Andre Oppermann

Mark Hills wrote:

On Wed, 23 Apr 2008, Andre Oppermann wrote:


http://people.freebsd.org/~andre/tcp_output-error-log.diff

Please apply this patch and enable the sysctl net.inet.tcp.log_debug=1
and report any output.  You likely get some (normal) noise from syncache.
What we are looking for is reports from tcp_output.


Hi Andre, I've applied the patch and tested.

Aside from syncache noise, I get a constant stream of 'error 55' 
(ENOBUFS?) once the number of connections reaches around 150 at 192kbps.


TCP: [192.168.5.43]:52153 to [192.168.5.40]:8080; tcp_output: error 55 
while sending


192.168.5.40 is the IP address of this host, running the server.

I tried to correlate the point of the application receiving ETIMEDOUT 
with these messages, but that is tricky as it seems to be outputting a 
lot of messages, with multiple messages printed over each other (see below).


Because of the mention of no buffer space available, I checked the 
values of net.inet.tcp.sendbuf* and recvbuf*, and increased the max 
values with no effect.


When I get time I will modify the kernel to print errors which aren't 
ENOBUFS to see if there are any others. But in the meantime, this sounds 
like a problem to me. Is that correct?


Mark


:8080; tcp_output: error 55 while sending
TCP: [192.168.5.42]:57384T CtPo:  
[[119922..116688..55..4402]]::85048400;1  ttoc p[_1o9u2t.p1u6t8:. 
5e.r4r0o]r: 8080;5 5t cwp_hoiultep uste:n deirnrgor 55 while sending
TCP: [192.168.5.42]:57382 to [192.168.5.40]:8080; tcp_output: error 55 
while sending
TCP: [192.168.5.42]:57381 to [192.168.5.40]:8080; tcp_output: error 55 
while sending
TCP: [192.168.5.42]:57380 to [192.168.5.40]:8080; tcp_output: error 55 
while sending


After tracing through the code it seems you are indeed memory limited.
Looking back at the netstat -m output:

 12550/250/12800/12800 4k (page size) jumbo clusters in use
 (current/cache/total/max)
 0/0/0 requests for jumbo clusters denied (4k/9k/16k)

This shows that the supply of 4k jumbo clusters is pretty much exhausted.
The cache may be allocated to different CPUs and the one making the request
at a given point may be depleted and can't get any from the global pool.
The big question is why the denied counter doesn't report anything.  I've
looked at the code paths and don't see any obvious reason why it doesn't
get counted.  Maybe Robert can give some insight here.

Try doubling the amount of 4k page size jumbo mbufs.  They are the primary
workhorse in the kernel right now:

 sysctl kern.ipc.nmbjumbop=25600

This should get you further.  Still more may be necessary depending on workload.
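To make the change survive a reboot and to keep an eye on the pool under load, something like the following should work (standard FreeBSD locations assumed):

```shell
sysctl kern.ipc.nmbjumbop=25600                      # raise the limit now
echo 'kern.ipc.nmbjumbop=25600' >> /etc/sysctl.conf  # persist across reboots
netstat -m | grep 'jumbo clusters'                   # watch current/cache/total/max
```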

--
Andre



Re: read() returns ETIMEDOUT on steady TCP connection

2008-05-03 Thread Tim Gebbett

Hi Andre,

Just to introduce myself, I am now helping Mark Hills with testing. 
Thank you for your suggestion; here are the results from a similar 
system (RELENG_7) after increasing kern.ipc.nmbjumbop to 25600.

At 1600 streams using approx 340mbit, netstat -m was reporting:

12550/250/12800/12800 4k (page size) jumbo clusters in use

After the read() returns ETIMEDOUT,

3857/10551/14408/25600 4k (page size) jumbo clusters in use

Then raised further: sysctl kern.ipc.nmbjumbop=51200

After the read() returns ETIMEDOUT,

200/25400/25600/51200 4k (page size) jumbo clusters in use 
(current/cache/total/max)


netstat -m:

4140/26205/30345 mbufs in use (current/cache/total)
256/3482/3738/25600 mbuf clusters in use (current/cache/total/max)
256/3328 mbuf+clusters out of packet secondary zone in use (current/cache)
3882/21718/25600/51200 4k (page size) jumbo clusters in use 
(current/cache/total/max)

0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
17075K/100387K/117462K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/7/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Do you think we need to try further sysctls, and should I apply the 
patch to see if 'tcp_output: error 55' is still occurring?


Thanks again, Tim


Andre Oppermann wrote:

Mark Hills wrote:

On Wed, 23 Apr 2008, Andre Oppermann wrote:


http://people.freebsd.org/~andre/tcp_output-error-log.diff

Please apply this patch and enable the sysctl net.inet.tcp.log_debug=1
and report any output.  You likely get some (normal) noise from 
syncache.

What we are looking for is reports from tcp_output.


Hi Andre, I've applied the patch and tested.

Aside from syncache noise, I get a constant stream of 'error 55' 
(ENOBUFS?) once the number of connections gets to around 150 at 192kbps.


TCP: [192.168.5.43]:52153 to [192.168.5.40]:8080; tcp_output: error 
55 while sending


192.168.5.40 is the IP address of this host, running the server.

I tried to correlate the point of the application receiving ETIMEDOUT 
with these messages, but that is tricky as it is outputting a lot of 
messages, with multiple messages printed over each other (see below).


Because of the mention of no buffer space available, I checked the 
values of net.inet.tcp.sendbuf* and recvbuf*, and increased the max 
values with no effect.


When I get time I will modify the kernel to print errors which aren't 
ENOBUFS to see if there are any others. But in the meantime, this 
sounds like a problem to me. Is that correct?


Mark


:8080; tcp_output: error 55 while sending
TCP: [192.168.5.42]:57384T CtPo:  
[[119922..116688..55..4402]]::85048400;1  ttoc p[_1o9u2t.p1u6t8:. 
5e.r4r0o]r: 8080;5 5t cwp_hoiultep uste:n deirnrgor 55 while sending
TCP: [192.168.5.42]:57382 to [192.168.5.40]:8080; tcp_output: error 
55 while sending
TCP: [192.168.5.42]:57381 to [192.168.5.40]:8080; tcp_output: error 
55 while sending
TCP: [192.168.5.42]:57380 to [192.168.5.40]:8080; tcp_output: error 
55 while sending


After tracing through the code it seems you are indeed memory limited.
Looking back at the netstat -m output:

 12550/250/12800/12800 4k (page size) jumbo clusters in use
 (current/cache/total/max)
 0/0/0 requests for jumbo clusters denied (4k/9k/16k)

This shows that the supply of 4k jumbo clusters is pretty much exhausted.
The cache may be allocated to different CPUs and the one making the
request at a given point may be depleted and can't get any from the
global pool.
The big question is why the denied counter doesn't report anything.  I've
looked at the code paths and don't see any obvious reason why it doesn't
get counted.  Maybe Robert can give some insight here.

Try doubling the amount of 4k page size jumbo mbufs.  They are the
primary workhorse in the kernel right now:

 sysctl kern.ipc.nmbjumbop=25600

This should get further.  Still more may be necessary depending on 
workloads.






Re: read() returns ETIMEDOUT on steady TCP connection

2008-05-05 Thread Deng XueFeng
hi
I am also seeing this problem on my MSS server (Missey streaming server).
One encoder pushes a stream to MSS, and then 100 client players play the
stream; as the number of clients increases, MSS hits this error sooner or
later.
I am using kqueue, and I get an event with EV_EOF set and fflags =
ETIMEDOUT; if I ignore the EV_EOF flag, then ETIMEDOUT is returned by read(2).

tcpdump also shows that the server sends an RST packet to the encoder.


> Hello,
> 
> I'm are having a trouble with TCP connections being dropped with "read: 
> Operation timed out". What is unusual is that this is happening right in 
> the middle of sending a steady stream of data with no network congestion.
> 
> The system is FreeBSD 7 and a bespoke streaming server with 1Gbit 
> connection. The server receives a 192kbps inbound stream over TCP, and 
> broadcasts it over a large number of TCP streams.
> 
> With no visible or obvious pattern, the inbound read() fails with 
> ETIMEDOUT. The likelihood of this happening seems to increase as the 
> number of audience connections increases. It's happens every few minutes 
> even with a small audience (eg. 300 outbound connections and about 
> 60mbit).
> 
> It doesn't cough and splutter -- steady data is coming in, then it just 
> drops the connection.
> 
> systat doesn't show problems inbound; all packets received are delivered 
> to the upper layer. But on outbound, there is consistent 'output drops':
> 
>  IP Output
> 7028 total packets sent
> 7028 - generated locally
>   314 - output drops
> 
> As the number of outbound connections increases, the 'output drops' 
> increases to around 10% of the total packets sent and maintains that 
> ratio. There's no problems with network capacity.
> 
> I've tried different servers, different network interfaces (bge, em), 
> different kernel (7-RELEASE, 7-STABLE). Have also checked dev.bge.0.stats 
> and dev.em.0.stats for CRC errors etc. which show no problems. 'netstat 
> -m' doesn't show any reaching of mbuf and sbuf limits. The problem is seen 
> in a dedicated, uncontended test environment.
> 
> Can anyone explain why the packets are being dropped outbound, and how 
> this could affect inbound TCP data in such an abrupt way? What can I do to 
> solve this?
> 
> Thanks,
> 
> Mark

-- 
Deng XueFeng <[EMAIL PROTECTED]>



Re: read() returns ETIMEDOUT on steady TCP connection

2008-05-07 Thread Andre Oppermann

I've looked at the code paths again.  There are two possibilities:

 a) the mbuf allocator has some anomaly where it rejects memory requests
but doesn't update the statistics (the code is there however).

 b) the error doesn't come from the mbuf allocation but from ip_output()
and further down the chain.

To differentiate please try this updated patch and report the log output
again (don't forget to set net.inet.tcp.log_debug=1):

 http://people.freebsd.org/~andre/tcp_output-error-log.diff

--
Andre

Deng XueFeng wrote:

hi
I am also seeing this problem on my MSS server (Missey streaming server).
One encoder pushes a stream to MSS, and then 100 client players play the
stream; as the number of clients increases, MSS hits this error sooner or
later.
I am using kqueue, and I get an event with EV_EOF set and fflags =
ETIMEDOUT; if I ignore the EV_EOF flag, then ETIMEDOUT is returned by read(2).

tcpdump also shows that the server sends an RST packet to the encoder.









Re: read() returns ETIMEDOUT on steady TCP connection

2008-05-07 Thread Deng XueFeng
hi,
the patch does not apply to 6.2; could you make a new patch for 6.2 or 6.3?

> I've looked at the code paths again.  There are two possibilities:
> 
>   a) the mbuf allocator has some anomaly where it rejects memory requests
>  but doesn't update the statistics (the code is there however).
> 
>   b) the error doesn't come from the mbuf allocation but from ip_output()
>  and further down the chain.
> 
> To differentiate please try this updated patch and report the log output
> again (don't forget to set net.inet.tcp.log_debug=1):
> 
>   http://people.freebsd.org/~andre/tcp_output-error-log.diff
> 
> -- 
> Andre
> 
> Deng XueFeng wrote:
> > hi
> > I'am also meet this problem in my mss server(missey streaming server).
> > one encoder push stream to mss, then run 100 client player playing the
> > sream, as the client number increase,  mss  will occur this error sooner or 
> > later
> > like this:
> > I'am using kqueue, and will got a event with  EV_EOF and fflags =
> > ETIMEDOUT,
> > if i ignore EV_EOF  flag, then ETIMEDOUT will be return by read(2),
> > 
> > and the tcpdump also show that server  will send RST packet to encoder.
> > 
> > 

-- 
Deng XueFeng <[EMAIL PROTECTED]>



Re: read() returns ETIMEDOUT on steady TCP connection

2008-05-08 Thread Andre Oppermann

Deng XueFeng wrote:

hi,
the patch does not apply to 6.2; could you make a new patch for 6.2 or 6.3?


The logging function is not (yet) present in RELENG_6.  I'll post the
patch when I've backported the functionality.

However, it's an important piece of information that it happens on 6.2 too.
That means the source of the trouble wasn't introduced with 7.0.

--
Andre


I've looked at the code paths again.  There are two possibilities:

  a) the mbuf allocator has some anomaly where it rejects memory requests
 but doesn't update the statistics (the code is there however).

  b) the error doesn't come from the mbuf allocation but from ip_output()
 and further down the chain.

To differentiate please try this updated patch and report the log output
again (don't forget to set net.inet.tcp.log_debug=1):

  http://people.freebsd.org/~andre/tcp_output-error-log.diff

--
Andre

Deng XueFeng wrote:

hi
I am also seeing this problem on my MSS server (Missey streaming server).
One encoder pushes a stream to MSS, and then 100 client players play the
stream; as the number of clients increases, MSS hits this error sooner or
later.
I am using kqueue, and I get an event with EV_EOF set and fflags =
ETIMEDOUT; if I ignore the EV_EOF flag, then ETIMEDOUT is returned by read(2).

tcpdump also shows that the server sends an RST packet to the encoder.









Re: read() returns ETIMEDOUT on steady TCP connection

2008-05-08 Thread Mark Hills

On Thu, 8 May 2008, Andre Oppermann wrote:


Deng XueFeng wrote:

hi,
the patch does not apply to 6.2; could you make a new patch for 6.2 or 6.3?


The logging function is not (yet) present in RELENG_6.  I'll post the
patch when I've backported the functionality.

However it's an important information that it happens on 6.2 too.  That
means the source of the trouble wasn't introduced with 7.0.


I did earlier tests with the same software on FreeBSD 6.3 and never saw 
ETIMEDOUT -- only on FreeBSD 7.0.


But I did have a different issue with 6.3 (lockups under very heavy 
load), although I didn't do any specific tuning to try to stop it. 
Instead I made the jump to 7.0, which stopped that problem but introduced 
the ETIMEDOUT one.


Mark


Re: read() returns ETIMEDOUT on steady TCP connection

2008-05-08 Thread Tim Gebbett
Hi all,

Applied the patch.

Well before an ETIMEDOUT error occurred (around 60 secs in), the tcp debug 
started venting massive quantities of 'tcp_output: error 55 while sending', 
mixed with syncache noise:


y  8 12:14:26 timtest kernel: :63859 to [192.168.5.40]:80; tcp_output: error 55 
whilTeC Ps:e n[d1i9n2g. 1(6i8p._5o.u4t3p]u:t64 0371 )t
May  8 12:14:26 timtest kernel: o
May  8 12:14:26 timtest kernel: [192.168.5.40]:80; tcp_output: error 55 while 
sendingT
May  8 12:14:26 timtest kernel: C
May  8 12:14:26 timtest kernel: P: [192.168.5.43]:63859 to [192.168.5.40]:80; 
tcp_output: error 55 while sending
May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:64037 to [192.168.5.40]:80; 
tcp_output: error 55 while sending (ip_output 1)
May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:64037 to [192.168.5.40]:80; 
tcp_output: error 55 while sending
May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:63857 to [192.168.5.40]:80; 
tcp_output: error 55 while sending (ip_output 1)
May  8 12:14:26 timtest kernel: TCP: [192.168.5.42]:56421 toT C[P1:9 
2[.119628..51.6480.]5:.8403;] :6t3c8p57_ otuot p[u1t9:2 .e1r68r.o5r. 40]:8505;  
whticlpe_ osuetnpduitn:g  e(rirpo_ro utpu5t5  w1h)i

interspersed with clean blocks of 20 entries or so of:

May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:64037 to [192.168.5.40]:80; 
tcp_output: error 55 while sending (ip_output 1)
May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:64037 to [192.168.5.40]:80; 
tcp_output: error 55 while sending
May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:63857 to [192.168.5.40]:80; 
tcp_output: error 55 while sending (ip_output 1)


The output did not look appreciably different when the ETIMEDOUT occurred.

On stopping the client test program:

May  8 12:14:46 timtest kernel: TCP: [192.168.5.43]:63978 to [192.168.5.40]:80 
tcpflags 0x4; syncache_chkrst: Spurious RST without matching syncache 
entry (possibly syncookie only), segment ignored
May  8 12:14:46 timtest kernel: TCP: [192.168.5.43]:63978 to [192.168.5.40]:80 
tcpflags 0x4; syncache_chkrst: Spurious RST without matching syncache 
entry (possibly syncookie only), segment ignored
May  8 12:14:46 timtest kernel: TCP: [192.168.5.43]:63978 to [192.168.5.40]:80 
tcpflags 0x4; syncache_chkrst: Spurious RST without matching syncache 
entry (possibly syncookie only), segment ignored

netstat -m

258/11007/11265 mbufs in use (current/cache/total)
256/1596/1852/25600 mbuf clusters in use (current/cache/total/max)
256/1536 mbuf+clusters out of packet secondary zone in use (current/cache)
0/7585/7585/51200 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
576K/36283K/36860K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/4/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Thanks again for your help - Tim









Re: read() returns ETIMEDOUT on steady TCP connection

2008-05-08 Thread Andre Oppermann

Hi Tim,

looking at the ip_output() path there are some places that can
return ENOBUFS:

 a) interface queue length check

 b) packet filter

 c) destination address rewrite through NAT

 d) if_output() call

 e) IP fragmentation if DF was not set

The first one of those is the most likely to be the source of the
error.  The output interface queue length is read unlocked and may
be a stale value on an SMP machine.  Further down in ether_output()
there are some further possibilities for ENOBUFS errors.  But let's
concentrate on a) first.

For testing purposes please apply the following patch to ip_output():

---
cvs diff -up ip_output.c
Index: ip_output.c
===
RCS file: /home/ncvs/src/sys/netinet/ip_output.c,v
retrieving revision 1.276.2.1
diff -u -p -r1.276.2.1 ip_output.c
--- ip_output.c 9 Mar 2008 21:04:54 -   1.276.2.1
+++ ip_output.c 8 May 2008 16:02:32 -
@@ -370,7 +370,7 @@ again:
ip->ip_src = IA_SIN(ia)->sin_addr;
}
}
-
+#if 0
/*
 * Verify that we have any chance at all of being able to queue the
 * packet or packet fragments, unless ALTQ is enabled on the given
@@ -390,7 +390,7 @@ again:
ifp->if_snd.ifq_drops += (ip->ip_len / ifp->if_mtu + 1);
goto bad;
}
-
+#endif
/*
 * Look for broadcast address and
 * verify user is allowed to send
---

If there is a real interface output queue full event the IFQ_HANDOFF()
and IFQ_ENQUEUE() macros will report it too.  Then we can focus on the
interface queues.

--
Andre

Tim Gebbett wrote:

Hi all,

applied the patch,

Well before a ETIMEDOUT error occurred (around 60secs), the tcp debug started 
venting massive quantities of tcp_output error 55 while sending with syncache 
noise:


y  8 12:14:26 timtest kernel: :63859 to [192.168.5.40]:80; tcp_output: error 55 
whilTeC Ps:e n[d1i9n2g. 1(6i8p._5o.u4t3p]u:t64 0371 )t
May  8 12:14:26 timtest kernel: o
May  8 12:14:26 timtest kernel: [192.168.5.40]:80; tcp_output: error 55 while 
sendingT
May  8 12:14:26 timtest kernel: C
May  8 12:14:26 timtest kernel: P: [192.168.5.43]:63859 to [192.168.5.40]:80; 
tcp_output: error 55 while sending
May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:64037 to [192.168.5.40]:80; 
tcp_output: error 55 while sending (ip_output 1)
May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:64037 to [192.168.5.40]:80; 
tcp_output: error 55 while sending
May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:63857 to [192.168.5.40]:80; 
tcp_output: error 55 while sending (ip_output 1)
May  8 12:14:26 timtest kernel: TCP: [192.168.5.42]:56421 toT C[P1:9 
2[.119628..51.6480.]5:.8403;] :6t3c8p57_ otuot p[u1t9:2 .e1r68r.o5r. 40]:8505;  
whticlpe_ osuetnpduitn:g  e(rirpo_ro utpu5t5  w1h)i

interspersed with clean blocks of 20 entries or so of:

May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:64037 to [192.168.5.40]:80; 
tcp_output: error 55 while sending (ip_output 1)
May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:64037 to [192.168.5.40]:80; 
tcp_output: error 55 while sending
May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:63857 to [192.168.5.40]:80; 
tcp_output: error 55 while sending (ip_output 1)


The output did not look appreciably different when the ETIMEDOUT occurred.

On stopping the client test program:

May  8 12:14:46 timtest kernel: TCP: [192.168.5.43]:63978 to [192.168.5.40]:80 
tcpflags 0x4; syncache_chkrst: Spurious RST without matching syncache 
entry (possibly syncookie only), segment ignored
May  8 12:14:46 timtest kernel: TCP: [192.168.5.43]:63978 to [192.168.5.40]:80 
tcpflags 0x4; syncache_chkrst: Spurious RST without matching syncache 
entry (possibly syncookie only), segment ignored
May  8 12:14:46 timtest kernel: TCP: [192.168.5.43]:63978 to [192.168.5.40]:80 
tcpflags 0x4; syncache_chkrst: Spurious RST without matching syncache 
entry (possibly syncookie only), segment ignored

netstat -m

258/11007/11265 mbufs in use (current/cache/total)
256/1596/1852/25600 mbuf clusters in use (current/cache/total/max)
256/1536 mbuf+clusters out of packet secondary zone in use (current/cache)
0/7585/7585/51200 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
576K/36283K/36860K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/4/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Thanks again for your help - Tim












Re: read() returns ETIMEDOUT on steady TCP connection

2008-05-08 Thread Tim Gebbett
Hi Andre,

Applied the patch; I could not see anything different from the last test. No
packet filtering or NAT is enabled, and the test is running over a switch.

Many thanks - Tim

258/6657/6915 mbufs in use (current/cache/total)
256/1084/1340/25600 mbuf clusters in use (current/cache/total/max)
256/1024 mbuf+clusters out of packet secondary zone in use (current/cache)
0/3565/3565/51200 4k (page size) jumbo clusters in use
(current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
576K/18092K/18668K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/5/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines



tcp:
2535986 packets sent
2331504 data packets (3266105457 bytes)
258369 data packets (371873800 bytes) retransmitted
46685 data packets unnecessarily retransmitted
0 resends initiated by MTU discovery
27972 ack-only packets (3905 delayed)
0 URG only packets
0 window probe packets
10636 window update packets
4 control packets
1888423 packets received
1172093 acks (for 3256670917 bytes)
425512 duplicate acks
0 acks for unsent data
46363 packets (52992771 bytes) received in-sequence
19 completely duplicate packets (17508 bytes)
0 old duplicate packets
0 packets with some dup. data (0 bytes duped)
293 out-of-order packets (376674 bytes)
0 packets (0 bytes) of data after window
0 window probes
242695 window update packets
0 packets received after close
0 discarded for bad checksums
0 discarded for bad header offset fields
0 discarded because packet too short
0 discarded due to memory problems
2 connection requests
2054 connection accepts
0 bad connection attempts
0 listen queue overflows
597 ignored RSTs in the windows
2056 connections established (including accepts)
2052 connections closed (including 2049 drops)
2048 connections updated cached RTT on close
2048 connections updated cached RTT variance on close
2048 connections updated cached ssthresh on close
0 embryonic connections dropped
1172093 segments updated rtt (of 1057825 attempts)
72399 retransmit timeouts
1 connection dropped by rexmit timeout
0 persist timeouts
0 connections dropped by persist timeout
0 Connections (fin_wait_2) dropped because of timeout
0 keepalive timeouts
0 keepalive probes sent
0 connections dropped by keepalive
4806 correct ACK header predictions
43111 correct data packet header predictions
2054 syncache entries added
0 retransmitted
0 dupsyn
0 dropped
2054 completed
0 bucket overflow
0 cache overflow
0 reset
0 stale
0 aborted
0 badack
0 unreach
0 zone failures
2054 cookies sent
0 cookies received
83801 SACK recovery episodes
116632 segment rexmits in SACK recovery episodes
168710294 byte rexmits in SACK recovery episodes
544885 SACK options (SACK blocks) received
14 SACK options (SACK blocks) sent
0 SACK scoreboard overflow
udp:
31 datagrams received
0 with incomplete header
0 with bad data length field
0 with bad checksum
0 with no checksum
0 dropped due to no socket
10 broadcast/multicast datagrams undelivered
0 dropped due to full socket buffers
0 not for hashed pcb
21 delivered
22 datagrams output
0 times multicast source filter matched
sctp:
0 input packets
0 datagrams
0 packets that had data
0 input SACK chunks
0 input DATA chunks
0 duplicate DATA chunks
0 input HB chunks
0 HB-ACK chunks
0 input ECNE chunks
0 input AUTH chunks
0 chunks missing AUTH
0 invalid HMAC ids received
0 invalid secret ids received
0 auth failed
0 fast path receives all one chunk
0 fast path mu

Re: read() returns ETIMEDOUT on steady TCP connection

2008-05-08 Thread Deng XueFeng
hi,
I applied the patch for 6.2 and rebuilt and installed the kernel,
but nothing changed; ETIMEDOUT still occurs.

> Hi Tim,
> 
> looking at the ip_output() path there are some places that can
> return ENOBUFS:
> 
>   a) interface queue length check
> 
>   b) packet filter
> 
>   c) destination address rewrite through NAT
> 
>   d) if_output() call
> 
>   e) IP fragmentation if DF was not set
> 
> The first one of those is the most likely to be the source of the
> error.  The output interface queue length is read unlocked and may
> be a stale value on an SMP machine.  Further down in ether_output()
> there are some further possibilities for ENOBUFS errors.  But let's
> concentrate on a) first.
> 
> For testing purposes please apply the following patch to ip_output():
> 
> ---
> cvs diff -up ip_output.c
> Index: ip_output.c
> ===
> RCS file: /home/ncvs/src/sys/netinet/ip_output.c,v
> retrieving revision 1.276.2.1
> diff -u -p -r1.276.2.1 ip_output.c
> --- ip_output.c 9 Mar 2008 21:04:54 -   1.276.2.1
> +++ ip_output.c 8 May 2008 16:02:32 -
> @@ -370,7 +370,7 @@ again:
>  ip->ip_src = IA_SIN(ia)->sin_addr;
>  }
>  }
> -
> +#if 0
>  /*
>   * Verify that we have any chance at all of being able to queue the
>   * packet or packet fragments, unless ALTQ is enabled on the given
> @@ -390,7 +390,7 @@ again:
>  ifp->if_snd.ifq_drops += (ip->ip_len / ifp->if_mtu + 1);
>  goto bad;
>  }
> -
> +#endif
>  /*
>   * Look for broadcast address and
>   * verify user is allowed to send
> ---
> 
> If there is a real interface output queue full event the IFQ_HANDOFF()
> and IFQ_ENQUEUE() macros will report it too.  Then we can focus on the
> interface queues.
> 
> -- 
> Andre
> 
> Tim Gebbett wrote:
> > Hi all,
> > 
> > applied the patch,
> > 
> > Well before a ETIMEDOUT error occurred (around 60secs), the tcp debug 
> > started venting massive quantities of tcp_output error 55 while sending 
> > with syncache noise:
> > 
> > 
> > y  8 12:14:26 timtest kernel: :63859 to [192.168.5.40]:80; tcp_output: 
> > error 55 whilTeC Ps:e n[d1i9n2g. 1(6i8p._5o.u4t3p]u:t64 0371 )t
> > May  8 12:14:26 timtest kernel: o
> > May  8 12:14:26 timtest kernel: [192.168.5.40]:80; tcp_output: error 55 
> > while sendingT
> > May  8 12:14:26 timtest kernel: C
> > May  8 12:14:26 timtest kernel: P: [192.168.5.43]:63859 to 
> > [192.168.5.40]:80; tcp_output: error 55 while sending
> > May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:64037 to 
> > [192.168.5.40]:80; tcp_output: error 55 while sending (ip_output 1)
> > May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:64037 to 
> > [192.168.5.40]:80; tcp_output: error 55 while sending
> > May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:63857 to 
> > [192.168.5.40]:80; tcp_output: error 55 while sending (ip_output 1)
> > May  8 12:14:26 timtest kernel: TCP: [192.168.5.42]:56421 toT C[P1:9 
> > 2[.119628..51.6480.]5:.8403;] :6t3c8p57_ otuot p[u1t9:2 .e1r68r.o5r. 
> > 40]:8505;  whticlpe_ osuetnpduitn:g  e(rirpo_ro utpu5t5  w1h)i
> > 
> > interspersed with clean blocks of 20 entries or so of:
> > 
> > May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:64037 to 
> > [192.168.5.40]:80; tcp_output: error 55 while sending (ip_output 1)
> > May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:64037 to 
> > [192.168.5.40]:80; tcp_output: error 55 while sending
> > May  8 12:14:26 timtest kernel: TCP: [192.168.5.43]:63857 to 
> > [192.168.5.40]:80; tcp_output: error 55 while sending (ip_output 1)
> > 
> > 
> > The output did not look appreciably different when the ETIMEDOUT occurred.
> > 
> > On stopping the client test program:
> > 
> > May  8 12:14:46 timtest kernel: TCP: [192.168.5.43]:63978 to 
> > [192.168.5.40]:80 tcpflags 0x4; syncache_chkrst: Spurious RST without 
> > matching syncache entry (possibly syncookie only), segment ignored
> > May  8 12:14:46 timtest kernel: TCP: [192.168.5.43]:63978 to 
> > [192.168.5.40]:80 tcpflags 0x4; syncache_chkrst: Spurious RST without 
> > matching syncache entry (possibly syncookie only), segment ignored
> > May  8 12:14:46 timtest kernel: TCP: [192.168.5.43]:63978 to 
> > [192.168.5.40]:80 tcpflags 0x4; syncache_chkrst: Spurious RST without 
> > matching syncache entry (possibly syncookie only), segment ignored
> > 
> > netstat -m
> > 
> > 258/11007/11265 mbufs in use (current/cache/total)
> > 256/1596/1852/25600 mbuf clusters in use (current/cache/total/max)
> > 256/1536 mbuf+clusters out of packet secondary zone in use (current/cache)
> > 0/7585/7585/51200 4k (page size) jumbo clusters in use 
> > (current/cache/total/max)
> > 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> > 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)

Re: read() returns ETIMEDOUT on steady TCP connection

2008-05-09 Thread Andre Oppermann

Tim Gebbett wrote:

Hi all,

applied the patch,

Well before an ETIMEDOUT error occurred (around 60 secs), the tcp debug
started venting massive quantities of "tcp_output: error 55 while sending"
with syncache noise:


The error seems to be coming from the interface send queue, which is hitting
its limit.  If you are using an em(4) network interface please add this line
to loader.conf(5):

 hw.em.txd=1024

Or even more if problems persist.  The maximum is 4096.
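For reference, a minimal sketch of applying that tunable. hw.em.txd is a
loader tunable, so it only takes effect at boot; the read-back via
dev.em.0.debug=1 is the same debug sysctl used later in this thread, which
prints to the kernel console:

```shell
# Sketch: raise the em(4) TX descriptor ring size (loader tunable,
# takes effect only after a reboot).
echo 'hw.em.txd=1024' >> /boot/loader.conf

# After rebooting, trigger the driver's debug dump and look for
# "Num Tx descriptors avail" in the console buffer.
sysctl dev.em.0.debug=1
dmesg | tail
```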

--
Andre
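To make the mechanism concrete, here is a small user-space model of the
queue-space check that Andre's patch disables. The struct and function names
are illustrative, not the kernel's, but the fragment estimate and drop
accounting follow the code shown in the patch context; ENOBUFS is errno 55
on FreeBSD, hence "error 55" in the logs:

```c
#include <errno.h>

/* Hedged model of the ip_output() queue-space check: if the packet (or
 * its fragments) cannot fit in the interface send queue, the drop
 * counter is bumped (visible as 'output drops' in systat) and ENOBUFS
 * is returned, which tcp_output() logs as "error 55 while sending". */
struct ifq_model { int len, maxlen, drops; };

static int ifq_room_check(struct ifq_model *q, int ip_len, int mtu)
{
    int frags = ip_len / mtu + 1;   /* same estimate as the kernel check */
    if (q->len + frags > q->maxlen) {
        q->drops += frags;          /* accounted like if_snd.ifq_drops */
        return ENOBUFS;
    }
    q->len += frags;                /* packet fits; queue it */
    return 0;
}
```

Raising hw.em.txd enlarges the real queue behind this check, which is why it
lowers the drop rate rather than eliminating the underlying race.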



netstat -m

258/11007/11265 mbufs in use (current/cache/total)
256/1596/1852/25600 mbuf clusters in use (current/cache/total/max)
256/1536 mbuf+clusters out of packet secondary zone in use (current/cache)
0/7585/7585/51200 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
576K/36283K/36860K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/4/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Thanks again for your help - Tim











___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: read() returns ETIMEDOUT on steady TCP connection

2008-05-10 Thread Tim Gebbett
Hi Andre, did some careful testing yesterday and last night. I still seem
to be hitting an unknown buffer, although the problem is much alleviated.
The system achieved a 7-hour run at 500mbit before an ETIMEDOUT occurred. I
was feeding 11 other streams to the server, whose counters show an
uninterrupted eleven hours. The feeder streams are from the same source,
so it is unlikely that the one feeding the test could have had a problem
without affecting the counters of the others.

sysctls are:

(loader.conf) hw.em.txd=4096
net.inet.tcp.sendspace=78840
net.inet.tcp.recvspace=78840

kern.ipc.nmbjumbop=51200
kern.ipc.nmbclusters=78840
kern.maxfiles=5

IP stats are miraculously improved, going from a 10% packet loss within
the stack ('output drops') to a consistent zero at peaks of 8 pps. I
believe the problem is now being shunted to the NIC, judging from the
following output:


dev.em.0.debug=1

< em0: Adapter hardware address = 0xc520b224

< em0: CTRL = 0x48f00249 RCTL = 0x8002 
< em0: Packet buffer = Tx=16k Rx=48k 
< em0: Flow control watermarks high = 47104 low = 45604

< em0: tx_int_delay = 66, tx_abs_int_delay = 66
< em0: rx_int_delay = 0, rx_abs_int_delay = 66
< em0: fifo workaround = 0, fifo_reset_count = 0
< em0: hw tdh = 3285, hw tdt = 3285
< em0: hw rdh = 201, hw rdt = 200
< em0: Num Tx descriptors avail = 4096
< em0: Tx Descriptors not avail1 = 4591225
< em0: Tx Descriptors not avail2 = 0
< em0: Std mbuf failed = 0
< em0: Std mbuf cluster failed = 0
< em0: Driver dropped packets = 0
< em0: Driver tx dma failure in encap = 0

dev.em.0.stats=1

< em0: Excessive collisions = 0

< em0: Sequence errors = 0
< em0: Defer count = 0
< em0: Missed Packets = 16581181
< em0: Receive No Buffers = 7460
< em0: Receive Length Errors = 0
< em0: Receive errors = 0
< em0: Crc errors = 0
< em0: Alignment errors = 0
< em0: Collision/Carrier extension errors = 0
< em0: RX overruns = 289717
< em0: watchdog timeouts = 0
< em0: XON Rcvd = 0
< em0: XON Xmtd = 0
< em0: XOFF Rcvd = 0
< em0: XOFF Xmtd = 0
< em0: Good Packets Rcvd = 848158221
< em0: Good Packets Xmtd = 1080368640
< em0: TSO Contexts Xmtd = 0
< em0: TSO Contexts Failed = 0


Does the counter 'Tx Descriptors not avail1' indicate a lack of
descriptors available at the time, and would this be symptomatic of
something Mark suggested:
"(the stack) needs to handle local buffer fills not as a failed attempt
on transmission that increments the retry counter; a possibly better
backoff strategy is required when the hardware buffer is full?"
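The policy choice behind that question can be sketched in a few lines.
This is purely illustrative (names and policy are hypothetical, not
FreeBSD's actual tcp_output() or retransmit-timer code): a local ENOBUFS
drop is backpressure and could be retried at the base interval, instead of
driving the exponential backoff reserved for presumed network loss:

```c
#include <errno.h>

/* Illustrative sketch only: distinguish a local queue-full (ENOBUFS)
 * from a presumed network loss when scheduling the next retransmit.
 * A local drop is backpressure, so retry at the base interval; a real
 * loss doubles the timeout as usual. */
struct rtx_state { int shift; };            /* backoff exponent */

static int next_timeout_ms(struct rtx_state *s, int send_err, int base_ms)
{
    if (send_err == ENOBUFS)
        return base_ms;                     /* local drop: no extra backoff */
    s->shift++;                             /* presumed loss: back off */
    return base_ms << s->shift;
}
```

Under the current behavior Tim describes, both cases would take the second
branch, so a burst of local drops can exhaust the retry budget and surface
as ETIMEDOUT on an otherwise healthy connection.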

Thanks for your continued time and effort - Tim


Andre Oppermann wrote:

> The error seems to be coming from the interface send queue, which is
> hitting its limit.  If you are using an em(4) network interface please
> add this line to loader.conf(5):
>
>  hw.em.txd=1024
>
> Or even more if problems persist.  The maximum is 4096.



Re: read() returns ETIMEDOUT on steady TCP connection

2008-05-12 Thread Andre Oppermann

Tim Gebbett wrote:
Does the counter 'Tx Descriptors not avail1' indicate a lack of
descriptors available at the time, and would this be symptomatic of
something Mark suggested:
"(the stack) needs to handle local buffer fills not as a failed attempt
on transmission that increments the retry counter; a possibly better
backoff strategy is required when the hardware buffer is full?"


Indeed.  We have to rethink a couple of assumptions the code currently
makes and has made for the longest time.  Additionally, the defaults
for the network hardware need to be better tuned for workloads like
yours.  I'm on my way to BSDCan'08 soon and will discuss these
issues at the Developer Summit.

--
Andre