Re: TCP connection stops after high load.
> Yes, it fixes it. Thanks, I will submit it to the -stable branch. David and John, thank you for your care and attention. -- Sincerely, Robert Iakobashvili, coroberti %x40 gmail %x2e com ... Navigare necesse est, vivere non est necesse ... http://curl-loader.sourceforge.net An open-source HTTP/S, FTP/S traffic generating, and web testing tool. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TCP connection stops after high load.
From: John Heffner <[EMAIL PROTECTED]> Date: Tue, 17 Apr 2007 15:47:58 -0400 > My only reservation in submitting this to -stable is that it will in > many cases increase the default tcp_mem values, which in turn can > increase the default tcp_rmem values, and therefore the window scale. > There will be some set of people with broken firewalls who trigger that > problem for the first time by upgrading along the stable branch. While > it's not our fault, it could cause some complaints... It is a very valid concern. However this is fixing a problem where we are in the wrong, whereas the firewall issues are external and should not block us from being able to fix our own bugs :-)
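John's worry can be made concrete: the advertised window scale is derived from the maximum receive buffer, so raising tcp_rmem[2] raises the wscale option that broken firewalls mishandle. Below is a minimal Python sketch of the shift-until-it-fits logic (simplified from the kernel's tcp_select_initial_window(); the real code also honors the tcp_window_scaling sysctl and rounds the window):

```python
def window_scale(max_rcvbuf: int) -> int:
    """Smallest wscale such that max_rcvbuf >> wscale fits in the
    16-bit TCP window field (simplified kernel logic)."""
    space = max_rcvbuf
    wscale = 0
    while space > 65535 and wscale < 14:  # RFC 1323 caps wscale at 14
        space >>= 1
        wscale += 1
    return wscale

# Old default tcp_rmem[2] = 87380 * 2 advertises a small scale factor
print(window_scale(87380 * 2))       # -> 2
# A 4 MB maximum receive buffer needs a much larger shift
print(window_scale(4 * 1024 * 1024)) # -> 7
```

A firewall that strips or mishandles the wscale option breaks worse the larger this value gets, which is why a -stable upgrade that bumps tcp_rmem[2] can surface the problem for the first time.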
Re: TCP connection stops after high load.
David Miller wrote: From: "Robert Iakobashvili" <[EMAIL PROTECTED]> Date: Tue, 17 Apr 2007 10:58:04 +0300 David, On 4/16/07, David Miller <[EMAIL PROTECTED]> wrote: Commit: 53cdcc04c1e85d4e423b2822b66149b6f2e52c2c Author: John Heffner <[EMAIL PROTECTED]> Fri, 16 Mar 2007 15:04:03 -0700 [TCP]: Fix tcp_mem[] initialization. Change tcp_mem initialization function. The fraction of total memory is now a continuous function of memory size, and independent of page size. Kernels 2.6.19 and 2.6.20 series are effectively broken right now. Don't you wish to patch them? Can you verify that this patch actually fixes your problem? Yes, it fixes. Thanks, I will submit it to -stable branch. My only reservation in submitting this to -stable is that it will in many cases increase the default tcp_mem values, which in turn can increase the default tcp_rmem values, and therefore the window scale. There will be some set of people with broken firewalls who trigger that problem for the first time by upgrading along the stable branch. While it's not our fault, it could cause some complaints... Thanks, -John
Re: TCP connection stops after high load.
From: "Robert Iakobashvili" <[EMAIL PROTECTED]> Date: Tue, 17 Apr 2007 10:58:04 +0300 > David, > > On 4/16/07, David Miller <[EMAIL PROTECTED]> wrote: > > > > Commit: 53cdcc04c1e85d4e423b2822b66149b6f2e52c2c > > > > Author: John Heffner <[EMAIL PROTECTED]> Fri, 16 Mar 2007 15:04:03 -0700 > > > > > > > > [TCP]: Fix tcp_mem[] initialization. > > > > Change tcp_mem initialization function. The fraction of total > > > > memory > > > > is now a continuous function of memory size, and independent of > > > > page > > > > size. > > > > > > > > > Kernels 2.6.19 and 2.6.20 series are effectively broken right now. > > > Don't you wish to patch them? > > > > Can you verify that this patch actually fixes your problem? > > Yes, it fixes it. Thanks, I will submit it to the -stable branch.
Re: TCP connection stops after high load.
David, On 4/16/07, David Miller <[EMAIL PROTECTED]> wrote: > > Commit: 53cdcc04c1e85d4e423b2822b66149b6f2e52c2c > > Author: John Heffner <[EMAIL PROTECTED]> Fri, 16 Mar 2007 15:04:03 -0700 > > > > [TCP]: Fix tcp_mem[] initialization. > > Change tcp_mem initialization function. The fraction of total memory > > is now a continuous function of memory size, and independent of page > > size. > > > Kernels 2.6.19 and 2.6.20 series are effectively broken right now. > Don't you wish to patch them? Can you verify that this patch actually fixes your problem? Yes, it fixes it. After the patch, curl-loader works smoothly with patched 2.6.19.7 and patched 2.6.20.7 using 3000 simultaneous local connections, even better than with 2.6.18.3, which I had referred to as "good". Besides that, the tcp_mem status for my machine:

kernel            tcp_mem
----------------  -------------------
2.6.19.7          3072  4096  6144
2.6.19.7-patched  45696 60928 91392
2.6.20.7          3072  4096  6144
2.6.20.7-patched  45696 60928 91392

The patch applied cleanly, with only line offsets. -- Sincerely, Robert Iakobashvili
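For scale, tcp_mem is counted in pages (4 KB on this i386 box), so the before/after numbers above can be converted directly. A quick back-of-the-envelope conversion (hypothetical helper, not kernel code):

```python
PAGE_SIZE = 4096  # i386 page size, per the reporter's configuration

def tcp_mem_mib(pages: int) -> float:
    """Convert one tcp_mem entry (a page count) to MiB."""
    return pages * PAGE_SIZE / (1024 * 1024)

# Broken 2.6.19.7 / 2.6.20.7 default: pressure threshold at 4096 pages
print(tcp_mem_mib(4096))   # -> 16.0 MiB, tiny for 3000 concurrent connections
# Patched default: pressure threshold at 60928 pages
print(tcp_mem_mib(60928))  # -> 238.0 MiB on this ~473 MiB (484368 kB) machine
```

A 16 MiB global TCP memory pressure threshold explains why connections stalled once a few thousand sockets were buffering data.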
Re: TCP connection stops after high load.
From: John Heffner <[EMAIL PROTECTED]> Date: Mon, 16 Apr 2007 15:11:07 -0400 > I don't know if this qualifies as an unconditional bug. The commit > above was actually a bugfix so that the limits were not higher than > total memory on some systems, but had the side effect that it made them > even smaller on your particular configuration. Also, having initial > sysctl values that are conservatively small probably doesn't qualify as > a bug (for patching stable trees). You might ask the -stable > maintainers if they have a different opinion. > > For most people, 2.6.19 and 2.6.20 work fine. For those who really care > about the tcp_mem values (are using a substantial fraction of physical > memory for TCP connections), the best bet is to set the tcp_mem sysctl > values in the startup scripts, or use the new initialization function in > 2.6.21. What's most important is determining if that tcp_mem[] patch actually fixes his problem, so it is his responsibility to see whether this is the case. If it does fix the problem, I'm happy to submit the backport to -stable. But until such tests are made, it's just speculation whether the patch fixes the problem or not, and therefore there is zero justification to submit it to -stable.
Re: TCP connection stops after high load.
From: "Robert Iakobashvili" <[EMAIL PROTECTED]> Date: Mon, 16 Apr 2007 20:51:54 +0200 > > Commit: 53cdcc04c1e85d4e423b2822b66149b6f2e52c2c > > Author: John Heffner <[EMAIL PROTECTED]> Fri, 16 Mar 2007 15:04:03 -0700 > > > > [TCP]: Fix tcp_mem[] initialization. > > Change tcp_mem initialization function. The fraction of total memory > > is now a continuous function of memory size, and independent of page > > size. > > > Kernels 2.6.19 and 2.6.20 series are effectively broken right now. > Don't you wish to patch them? Can you verify that this patch actually fixes your problem?
Re: TCP connection stops after high load.
Robert Iakobashvili wrote: Kernels 2.6.19 and 2.6.20 series are effectively broken right now. Don't you wish to patch them? I don't know if this qualifies as an unconditional bug. The commit above was actually a bugfix so that the limits were not higher than total memory on some systems, but had the side effect that it made them even smaller on your particular configuration. Also, having initial sysctl values that are conservatively small probably doesn't qualify as a bug (for patching stable trees). You might ask the -stable maintainers if they have a different opinion. For most people, 2.6.19 and 2.6.20 work fine. For those who really care about the tcp_mem values (are using a substantial fraction of physical memory for TCP connections), the best bet is to set the tcp_mem sysctl values in the startup scripts, or use the new initialization function in 2.6.21. Thanks, -John
Re: TCP connection stops after high load.
>> Robert Iakobashvili wrote: >> > Vanilla 2.6.18.3 works for me perfectly, whereas 2.6.19.5 and >> > 2.6.20.6 do not. >> > >> > Looking into the tcp /proc entries of 2.6.18.3 versus 2.6.19.5 >> > tcp_rmem and tcp_wmem are the same, whereas tcp_mem are >> > much different: >> >
>> > kernel      tcp_mem
>> > ----------  -------------------
>> > 2.6.18.3    12288 16384 24576
>> > 2.6.19.5    3072  4096  6144
> >> Another patch that went in right around that time: >> >> commit 52bf376c63eebe72e862a1a6e713976b038c3f50 >> Author: John Heffner <[EMAIL PROTECTED]> >> Date: Tue Nov 14 20:25:17 2006 -0800 >> >> [TCP]: Fix up sysctl_tcp_mem initialization. >> (This has been changed again for 2.6.21.) >> Yes, this difference is caused by the commit above. The current net-2.6 (2.6.21) has a redesigned tcp_mem initialization that should give you more appropriate values, something like 45408 60546 90816. For reference: Commit: 53cdcc04c1e85d4e423b2822b66149b6f2e52c2c Author: John Heffner <[EMAIL PROTECTED]> Fri, 16 Mar 2007 15:04:03 -0700 [TCP]: Fix tcp_mem[] initialization. Change tcp_mem initialization function. The fraction of total memory is now a continuous function of memory size, and independent of page size. Kernels 2.6.19 and 2.6.20 series are effectively broken right now. Don't you wish to patch them? -- Sincerely, Robert Iakobashvili
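The three values John quotes (45408 60546 90816) are internally consistent with a simple rule: the low watermark at three quarters of the pressure threshold, and the hard limit at twice the low watermark. This is reverse-engineered from the quoted numbers, not taken from the commit itself, but it checks out:

```python
pressure = 60546            # tcp_mem[1] quoted for a ~473 MiB machine
low = pressure // 4 * 3     # apparent rule for tcp_mem[0] (integer math)
high = low * 2              # apparent rule for tcp_mem[2]

# The reconstructed triple matches the values quoted in the thread
assert (low, pressure, high) == (45408, 60546, 90816)
print(low, pressure, high)
```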
Re: TCP connection stops after high load.
Robert Iakobashvili wrote: Hi John, On 4/15/07, John Heffner <[EMAIL PROTECTED]> wrote: Robert Iakobashvili wrote: > Vanilla 2.6.18.3 works for me perfectly, whereas 2.6.19.5 and > 2.6.20.6 do not. > > Looking into the tcp /proc entries of 2.6.18.3 versus 2.6.19.5 > tcp_rmem and tcp_wmem are the same, whereas tcp_mem are > much different: >
> kernel      tcp_mem
> ----------  -------------------
> 2.6.18.3    12288 16384 24576
> 2.6.19.5    3072  4096  6144
Another patch that went in right around that time: commit 52bf376c63eebe72e862a1a6e713976b038c3f50 Author: John Heffner <[EMAIL PROTECTED]> Date: Tue Nov 14 20:25:17 2006 -0800 [TCP]: Fix up sysctl_tcp_mem initialization. (This has been changed again for 2.6.21.) In the dmesg, there should be some messages like this: IP route cache hash table entries: 32768 (order: 5, 131072 bytes) TCP established hash table entries: 131072 (order: 8, 1048576 bytes) TCP bind hash table entries: 65536 (order: 6, 262144 bytes) TCP: Hash tables configured (established 131072 bind 65536) What do yours say? For the 2.6.19.5, where we have this problem: From dmesg: IP route cache hash table entries: 4096 (order: 2, 16384 bytes) TCP established hash table entries: 16384 (order: 5, 131072 bytes) TCP bind hash table entries: 8192 (order: 4, 65536 bytes) # cat /proc/sys/net/ipv4/tcp_mem 3072 4096 6144 MemTotal: 484368 kB CONFIG_HIGHMEM4G=y Yes, this difference is caused by the commit above. The old way didn't really make a lot of sense, since it was different based on smp/non-smp and page size, and had large discontinuities at 512MB and every power of two. It was hard to make the limit never larger than the memory pool but never too small either, when based on the hash table size. The current net-2.6 (2.6.21) has a redesigned tcp_mem initialization that should give you more appropriate values, something like 45408 60546 90816.
For reference: Commit: 53cdcc04c1e85d4e423b2822b66149b6f2e52c2c Author: John Heffner <[EMAIL PROTECTED]> Fri, 16 Mar 2007 15:04:03 -0700 [TCP]: Fix tcp_mem[] initialization. Change tcp_mem initialization function. The fraction of total memory is now a continuous function of memory size, and independent of page size. Signed-off-by: John Heffner <[EMAIL PROTECTED]> Signed-off-by: David S. Miller <[EMAIL PROTECTED]> Thanks, -John
Re: TCP connection stops after high load.
Hi John, On 4/15/07, John Heffner <[EMAIL PROTECTED]> wrote: Robert Iakobashvili wrote: > Vanilla 2.6.18.3 works for me perfectly, whereas 2.6.19.5 and > 2.6.20.6 do not. > > Looking into the tcp /proc entries of 2.6.18.3 versus 2.6.19.5 > tcp_rmem and tcp_wmem are the same, whereas tcp_mem are > much different: >
> kernel      tcp_mem
> ----------  -------------------
> 2.6.18.3    12288 16384 24576
> 2.6.19.5    3072  4096  6144
Another patch that went in right around that time: commit 52bf376c63eebe72e862a1a6e713976b038c3f50 Author: John Heffner <[EMAIL PROTECTED]> Date: Tue Nov 14 20:25:17 2006 -0800 [TCP]: Fix up sysctl_tcp_mem initialization. (This has been changed again for 2.6.21.) In the dmesg, there should be some messages like this: IP route cache hash table entries: 32768 (order: 5, 131072 bytes) TCP established hash table entries: 131072 (order: 8, 1048576 bytes) TCP bind hash table entries: 65536 (order: 6, 262144 bytes) TCP: Hash tables configured (established 131072 bind 65536) What do yours say? For the 2.6.19.5, where we have this problem: From dmesg: IP route cache hash table entries: 4096 (order: 2, 16384 bytes) TCP established hash table entries: 16384 (order: 5, 131072 bytes) TCP bind hash table entries: 8192 (order: 4, 65536 bytes) # cat /proc/sys/net/ipv4/tcp_mem 3072 4096 6144 MemTotal: 484368 kB CONFIG_HIGHMEM4G=y Thanks, -- Sincerely, Robert Iakobashvili
Re: TCP connection stops after high load.
Robert Iakobashvili wrote: Vanilla 2.6.18.3 works for me perfectly, whereas 2.6.19.5 and 2.6.20.6 do not. Looking into the tcp /proc entries of 2.6.18.3 versus 2.6.19.5 tcp_rmem and tcp_wmem are the same, whereas tcp_mem are much different:

kernel      tcp_mem
----------  -------------------
2.6.18.3    12288 16384 24576
2.6.19.5    3072  4096  6144

Isn't this done deliberately by the patch below: commit 9e950efa20dc8037c27509666cba6999da9368e8 Author: John Heffner <[EMAIL PROTECTED]> Date: Mon Nov 6 23:10:51 2006 -0800 [TCP]: Don't use highmem in tcp hash size calculation. This patch removes consideration of high memory when determining TCP hash table sizes. Taking into account high memory results in tcp_mem values that are too large. Is it a feature? My machine has: MemTotal: 484368 kB and all kernel configurations are actually the same, with CONFIG_HIGHMEM4G=y Thanks, Another patch that went in right around that time: commit 52bf376c63eebe72e862a1a6e713976b038c3f50 Author: John Heffner <[EMAIL PROTECTED]> Date: Tue Nov 14 20:25:17 2006 -0800 [TCP]: Fix up sysctl_tcp_mem initialization. Fix up tcp_mem initial settings to take into account the size of the hash entries (different on SMP and non-SMP systems). Signed-off-by: John Heffner <[EMAIL PROTECTED]> Signed-off-by: David S. Miller <[EMAIL PROTECTED]> (This has been changed again for 2.6.21.) In the dmesg, there should be some messages like this: IP route cache hash table entries: 32768 (order: 5, 131072 bytes) TCP established hash table entries: 131072 (order: 8, 1048576 bytes) TCP bind hash table entries: 65536 (order: 6, 262144 bytes) TCP: Hash tables configured (established 131072 bind 65536) What do yours say? Thanks, -John
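The dmesg lines above encode everything needed to cross-check the allocations: an order-n table occupies 2^n pages. A small sanity check against the quoted output (assuming 4 KB pages):

```python
PAGE_SIZE = 4096

def table_bytes(order: int) -> int:
    """Size in bytes of an order-`order` page allocation (2**order pages)."""
    return (1 << order) * PAGE_SIZE

# "TCP established hash table entries: 131072 (order: 8, 1048576 bytes)"
assert table_bytes(8) == 1048576       # matches the dmesg byte count
assert table_bytes(8) // 131072 == 8   # i.e. 8 bytes per bucket head
# "IP route cache hash table entries: 32768 (order: 5, 131072 bytes)"
assert table_bytes(5) == 131072
print("dmesg sizes are self-consistent")
```

Since the old tcp_mem defaults were derived from these hash-table orders, the same arithmetic shows why a box with a small established-hash order ended up with tiny tcp_mem limits.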
Re: TCP connection stops after high load.
On 4/13/07, David Miller <[EMAIL PROTECTED]> wrote: From: "Robert Iakobashvili" <[EMAIL PROTECTED]> Date: Thu, 12 Apr 2007 23:11:14 +0200 > It works well with 2.6.11.8 and the debian 2.6.18.3-i686 image. > > At the same Intel Pentium-4 PC with about the same kernel configuration > (make oldconfig using Debian config-2.6.18.3-i686) the setup fails, with the > tcp-connections stalled after 1000 established connections, when the kernel > is 2.6.20.6 or 2.6.19.5. > > It stalls even earlier, after 500 connections, when lighttpd is used with the default (poll()) > demultiplexing, or after 100 connections when the apache2 web server is used (memory?). > > I am currently going to try vanilla 2.6.18.3 and, if it also > fails, to look through > Debian patches, trying to figure out what the delta is. Vanilla 2.6.18.3 works for me perfectly, whereas 2.6.19.5 and 2.6.20.6 do not. Looking into the tcp /proc entries of 2.6.18.3 versus 2.6.19.5 tcp_rmem and tcp_wmem are the same, whereas tcp_mem are much different:

kernel      tcp_mem
----------  -------------------
2.6.18.3    12288 16384 24576
2.6.19.5    3072  4096  6144

Isn't this done deliberately by the patch below: commit 9e950efa20dc8037c27509666cba6999da9368e8 Author: John Heffner <[EMAIL PROTECTED]> Date: Mon Nov 6 23:10:51 2006 -0800 Sorry, the commit is innocent. Something else has been broken in tcp_mem initialization logic. My machine has: MemTotal: 484368 kB and all kernel configurations are actually the same, with CONFIG_HIGHMEM4G=y Sincerely, Robert Iakobashvili
Re: TCP connection stops after high load.
On 4/13/07, David Miller <[EMAIL PROTECTED]> wrote: From: "Robert Iakobashvili" <[EMAIL PROTECTED]> Date: Thu, 12 Apr 2007 23:11:14 +0200 > It works well with 2.6.11.8 and the debian 2.6.18.3-i686 image. > > At the same Intel Pentium-4 PC with about the same kernel configuration > (make oldconfig using Debian config-2.6.18.3-i686) the setup fails, with the > tcp-connections stalled after 1000 established connections, when the kernel > is 2.6.20.6 or 2.6.19.5. > > It stalls even earlier, after 500 connections, when lighttpd is used with the default (poll()) > demultiplexing, or after 100 connections when the apache2 web server is used (memory?). > > I am currently going to try vanilla 2.6.18.3 and, if it also > fails, to look through > Debian patches, trying to figure out what the delta is. Vanilla 2.6.18.3 works for me perfectly, whereas 2.6.19.5 and 2.6.20.6 do not. Looking into the tcp /proc entries of 2.6.18.3 versus 2.6.19.5 tcp_rmem and tcp_wmem are the same, whereas tcp_mem are much different:

kernel      tcp_mem
----------  -------------------
2.6.18.3    12288 16384 24576
2.6.19.5    3072  4096  6144

Isn't this done deliberately by the patch below: commit 9e950efa20dc8037c27509666cba6999da9368e8 Author: John Heffner <[EMAIL PROTECTED]> Date: Mon Nov 6 23:10:51 2006 -0800 [TCP]: Don't use highmem in tcp hash size calculation. This patch removes consideration of high memory when determining TCP hash table sizes. Taking into account high memory results in tcp_mem values that are too large. Is it a feature? My machine has: MemTotal: 484368 kB and all kernel configurations are actually the same, with CONFIG_HIGHMEM4G=y Thanks, -- Sincerely, Robert Iakobashvili
Re: TCP connection stops after high load.
From: Eric Dumazet <[EMAIL PROTECTED]> Date: Sat, 14 Apr 2007 07:31:35 +0200 > When did tg3 model changed exactly ? June of 2006: commit 00b7050426da8e7e58c889c5c80a19920d2d41b3 Author: Michael Chan <[EMAIL PROTECTED]> Date: Sat Jun 17 21:58:45 2006 -0700 [TG3]: Convert to non-LLTX Herbert Xu pointed out that it is unsafe to call netif_tx_disable() from LLTX drivers because it uses dev->xmit_lock to synchronize whereas LLTX drivers use private locks. Convert tg3 to non-LLTX to fix this issue. tg3 is a lockless driver where hard_start_xmit and tx completion handling can run concurrently under normal conditions. A tx_lock is only needed to prevent netif_stop_queue and netif_wake_queue race conditions when the queue is full. So whether we use LLTX or non-LLTX, it makes practically no difference. Signed-off-by: Michael Chan <[EMAIL PROTECTED]> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Re: TCP connection stops after high load.
Herbert Xu wrote: Eric Dumazet <[EMAIL PROTECTED]> wrote: dev_queue_xmit_nit() is called before attempting to send the packet to the device. If the device could not accept the packet (hard_start_xmit() returns an error), the packet is requeued and retried later. Each retry means calling dev_queue_xmit_nit() again, so tcpdump/sniffers can 'see' the packet transmitted several times. This should only happen with LLTX drivers. In fact, LLTX drivers are really more trouble than they're worth. They should all be rewritten to follow the model used in tg3. When did the tg3 model change exactly? Because I remember having this 'problem' with tg3 devices not long ago...
Re: TCP connection stops after high load.
From: Herbert Xu <[EMAIL PROTECTED]> Date: Sat, 14 Apr 2007 14:21:44 +1000 > Eric Dumazet <[EMAIL PROTECTED]> wrote: > > > > dev_queue_xmit_nit() is called before attempting to send the packet to the device. > > > > If the device could not accept the packet (hard_start_xmit() returns an error), > > the packet is requeued and retried later. > > Each retry means calling dev_queue_xmit_nit() again, so tcpdump/sniffers can > > 'see' the packet transmitted several times. > > This should only happen with LLTX drivers. In fact, LLTX drivers are > really more trouble than they're worth. They should all be rewritten > to follow the model used in tg3. Agreed.
Re: TCP connection stops after high load.
Eric Dumazet <[EMAIL PROTECTED]> wrote: > > dev_queue_xmit_nit() is called before attempting to send the packet to the device. > > If the device could not accept the packet (hard_start_xmit() returns an error), > the packet is requeued and retried later. > Each retry means calling dev_queue_xmit_nit() again, so tcpdump/sniffers can > 'see' the packet transmitted several times. This should only happen with LLTX drivers. In fact, LLTX drivers are really more trouble than they're worth. They should all be rewritten to follow the model used in tg3. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: TCP connection stops after high load.
Evgeniy Polyakov wrote: On Thu, Apr 12, 2007 at 02:36:34PM -0700, Ben Greear ([EMAIL PROTECTED]) wrote: I am not sure if the problem is fixed or just harder to hit, but for now it looks good. Wasn't the default congestion control algo changed between those kernel releases? With such small rtt like in your setup there could be some obscure bug, try to set a different one and check if it still works good/bad. I had earlier tried changing between bic and reno (the only two I had compiled in that kernel), and it did not affect anything. I also realized that I had been reproducing the bug (and the traces I sent to this list earlier) on a 2.6.17.4 kernel, not 2.6.18 as I had supposed. So, it's possible that the problem was fixed between 2.6.17.4 and 2.6.18.2 as well. I also figured out yesterday that rebooting to go to a new kernel makes it slower to reproduce, even on kernels known to have the problem. This is probably because lots of memory is available after a reboot. I am going to set up some long term tests on 2.6.18, 2.6.19 and 2.6.20 and let them cook for several days to make sure the problem is truly fixed in the later kernels. Thanks, Ben -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com
Re: TCP connection stops after high load.
On Fri, 13 Apr 2007 18:10:12 +0200 Daniel Schaffrath <[EMAIL PROTECTED]> wrote: > > On 2007/04/12 , at 20:19, Eric Dumazet wrote: > > > > Warning: tcpdump can lie, telling you packets are being transmitted > > several times. > Maybe you have further pointers on how it comes that tcpdump lies about > duplicated packets? > dev_queue_xmit_nit() is called before attempting to send the packet to the device. If the device could not accept the packet (hard_start_xmit() returns an error), the packet is requeued and retried later. Each retry means calling dev_queue_xmit_nit() again, so tcpdump/sniffers can 'see' the packet transmitted several times. This is why I asked for "tc -s -d qdisc" results: to check the requeue counter (not its absolute value, but relative to the number of packets sent). See dev_hard_start_xmit() in net/core/dev.c
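Eric's suggestion, comparing the requeue counter against the packet count, can be scripted. A rough sketch that pulls both numbers out of `tc -s qdisc` text; the sample line below is illustrative (counters borrowed from elsewhere in the thread, requeue count invented), and exact formatting varies between iproute2 versions:

```python
import re

SAMPLE = (
    "qdisc pfifo_fast 0: dev eth2 bands 3\n"
    " Sent 2274982574 bytes 2196560123 pkt "
    "(dropped 0, overlimits 0 requeues 37)\n"
)

def requeue_ratio(tc_output: str) -> float:
    """Requeues relative to packets sent, parsed from `tc -s qdisc` text."""
    pkts = int(re.search(r"(\d+) pkt", tc_output).group(1))
    requeues = int(re.search(r"requeues (\d+)", tc_output).group(1))
    return requeues / pkts

# A tiny ratio means duplicate packets in tcpdump should be rare
print(requeue_ratio(SAMPLE))
```

If the ratio is substantial, a large share of what tcpdump shows as "retransmissions" may just be the same skb handed to dev_queue_xmit_nit() more than once.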
Re: TCP connection stops after high load.
On 2007/04/12, at 20:19, Eric Dumazet wrote: On Thu, 12 Apr 2007 10:59:19 -0700 Ben Greear <[EMAIL PROTECTED]> wrote: Here is a tcpdump of the connection in the stalled state. As you can see by the 'time' output, it's running at around 100,000 packets per second. tcpdump dropped the vast majority of these. Based on the network interface stats, I believe both sides of the connection are sending acks at about the same rate (about 160kpps when tcpdump is not running, it seems). Warning: tcpdump can lie, telling you packets are being transmitted several times. Maybe you have further pointers on how it comes that tcpdump lies about duplicated packets? Thanks a lot, Daniel And yes, tcpdump slows things down because of enabling accurate timestamping of packets.

10:46:46.541490 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132
10:46:46.541494 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.541567 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.541653 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.541886 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.541891 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132
10:46:46.541895 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.541988 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
Re: TCP connection stops after high load.
On Thu, Apr 12, 2007 at 02:36:34PM -0700, Ben Greear ([EMAIL PROTECTED]) wrote: > I am not sure if the problem is fixed or just harder to hit, > but for now it looks good. Wasn't the default congestion control algo changed between those kernel releases? With such a small rtt as in your setup there could be some obscure bug; try to set a different one and check whether it still works well or badly. -- Evgeniy Polyakov
Re: TCP connection stops after high load.
Eric Dumazet wrote: Hum, could you try to bind nic irqs on separate cpus ? I just started a run on 2.6.20.4, and so far (~20 minutes), it is behaving perfectly, running at around 925Mbps in both directions. CWND averages about 600, bouncing from a low of 300 up to 800, but that could very well be perfectly normal. I'm quite pleased with the faster performance in this kernel as well; it seems like the old one would rarely get above 800Mbps even when it was passing traffic! I am not sure if the problem is fixed or just harder to hit, but for now it looks good. I'm going to also try a 2.6.19 kernel and see if the problem hits there, in an attempt to figure out what patch changed the behaviour. Thanks, Ben -- Ben Greear
Re: TCP connection stops after high load.
From: "Robert Iakobashvili" <[EMAIL PROTECTED]> Date: Thu, 12 Apr 2007 23:11:14 +0200 > It works well with 2.6.11.8 and the debian 2.6.18.3-i686 image. > > At the same Intel Pentium-4 PC with about the same kernel configuration > (make oldconfig using Debian config-2.6.18.3-i686) the setup fails, with the > tcp-connections stalled after 1000 established connections, when the kernel > is 2.6.20.6 or 2.6.19.5. > > It stalls even earlier, after 500 connections, when lighttpd is used with the default (poll()) > demultiplexing, or after 100 connections when the apache2 web server is used (memory?). > > I am currently going to try vanilla 2.6.18.3 and, if it also > fails, to look through > Debian patches, trying to figure out what the delta is. > > strace-ing and logs have revealed actually 2 scenarios of failure. > Connections are established successfully and then: > - the request is sent and there is no response; > - a partial response is received and the connection stalls. The following patch is not the cause, but it likely exacerbates the problem; can you revert the following patch from your kernel and see if it changes the behavior? commit 7b4f4b5ebceab67ce440a61081a69f0265e17c2a Author: John Heffner <[EMAIL PROTECTED]> Date: Sat Mar 25 01:34:07 2006 -0800 [TCP]: Set default max buffers from memory pool size This patch sets the maximum TCP buffer sizes (available to automatic buffer tuning, not to setsockopt) based on the TCP memory pool size. The maximum sndbuf and rcvbuf each will be up to 4 MB, but no more than 1/128 of the memory pressure threshold. Signed-off-by: John Heffner <[EMAIL PROTECTED]> Signed-off-by: David S.
Miller <[EMAIL PROTECTED]>

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4b0272c..591e96d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -276,8 +276,8 @@ atomic_t tcp_orphan_count = ATOMIC_INIT(0);
 EXPORT_SYMBOL_GPL(tcp_orphan_count);
 int sysctl_tcp_mem[3];
-int sysctl_tcp_wmem[3] = { 4 * 1024, 16 * 1024, 128 * 1024 };
-int sysctl_tcp_rmem[3] = { 4 * 1024, 87380, 87380 * 2 };
+int sysctl_tcp_wmem[3];
+int sysctl_tcp_rmem[3];
 EXPORT_SYMBOL(sysctl_tcp_mem);
 EXPORT_SYMBOL(sysctl_tcp_rmem);
@@ -2081,7 +2081,8 @@ __setup("thash_entries=", set_thash_entries);
 void __init tcp_init(void)
 {
 	struct sk_buff *skb = NULL;
-	int order, i;
+	unsigned long limit;
+	int order, i, max_share;
 	if (sizeof(struct tcp_skb_cb) > sizeof(skb->cb))
 		__skb_cb_too_small_for_tcp(sizeof(struct tcp_skb_cb),
@@ -2155,12 +2156,16 @@ void __init tcp_init(void)
 	sysctl_tcp_mem[1] = 1024 << order;
 	sysctl_tcp_mem[2] = 1536 << order;
-	if (order < 3) {
-		sysctl_tcp_wmem[2] = 64 * 1024;
-		sysctl_tcp_rmem[0] = PAGE_SIZE;
-		sysctl_tcp_rmem[1] = 43689;
-		sysctl_tcp_rmem[2] = 2 * 43689;
-	}
+	limit = ((unsigned long)sysctl_tcp_mem[1]) << (PAGE_SHIFT - 7);
+	max_share = min(4UL*1024*1024, limit);
+
+	sysctl_tcp_wmem[0] = SK_STREAM_MEM_QUANTUM;
+	sysctl_tcp_wmem[1] = 16*1024;
+	sysctl_tcp_wmem[2] = max(64*1024, max_share);
+
+	sysctl_tcp_rmem[0] = SK_STREAM_MEM_QUANTUM;
+	sysctl_tcp_rmem[1] = 87380;
+	sysctl_tcp_rmem[2] = max(87380, max_share);
 	printk(KERN_INFO "TCP: Hash tables configured "
 	       "(established %d bind %d)\n",
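The effect of this patch on the reporter's box can be worked through directly from the hunk above: with tcp_mem[1] = 4096 pages and PAGE_SHIFT = 12, the auto-tuning ceiling collapses to 128 KB. A Python transcription of just that arithmetic (a sketch of the quoted kernel code, not the code itself):

```python
PAGE_SHIFT = 12  # 4 KB pages, as on the reporter's i386 machine

def buffer_caps(tcp_mem_pressure: int):
    """Mirror the patch: cap auto-tuned buffers at min(4 MB, pressure/128).

    limit = tcp_mem[1] << (PAGE_SHIFT - 7) converts pages to bytes and
    divides by 128 in one shift (pages * 4096 / 128 == pages * 32).
    """
    limit = tcp_mem_pressure << (PAGE_SHIFT - 7)
    max_share = min(4 * 1024 * 1024, limit)
    wmem_max = max(64 * 1024, max_share)   # sysctl_tcp_wmem[2]
    rmem_max = max(87380, max_share)       # sysctl_tcp_rmem[2]
    return wmem_max, rmem_max

# Broken 2.6.19.x default tcp_mem[1] = 4096 pages:
print(buffer_caps(4096))   # -> (131072, 131072): buffers capped at 128 KB
# Patched default tcp_mem[1] = 60928 pages:
print(buffer_caps(60928))  # -> (1949696, 1949696): roughly 1.9 MB caps
```

So on a kernel with the tiny tcp_mem defaults, this patch also shrinks the maximum auto-tuned socket buffers, which is why David suspects it exacerbates, without causing, the stalls.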
Re: TCP connection stops after high load.
Ben Greear a écrit :
> Eric Dumazet wrote:
>> What "tc -s -d qdisc" "ifconfig -a" "cat /proc/interrupts"
>> "cat /proc/net/sockstat" and "cat /proc/net/softnet_stat" are telling ?
>
> In this test, eth2 is talking to eth3, using something similar to this
> send-to-self patch: http://www.candelatech.com/oss/sts.patch
>
> [ifconfig/tc/interrupts/sockstat/softnet_stat output quoted in full in
> Ben's message below]

Hum, could you try to bind nic irqs on separate cpus ?
eth2 -> CPU0 and eth3 -> CPU1

# echo 1 >/proc/irq/20/smp_affinity
# echo 2 >/proc/irq/21/smp_affinity
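smp_affinity takes a hexadecimal CPU bitmask, so the two echoes above pin IRQ 20 to CPU0 and IRQ 21 to CPU1. A tiny helper for computing such masks (illustrative only, not part of the thread):

```python
def smp_affinity_mask(*cpus):
    """Hex bitmask for /proc/irq/N/smp_affinity: bit i selects CPU i."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

# echo $(smp_affinity_mask(0)) > /proc/irq/20/smp_affinity pins to CPU0
```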
Re: TCP connection stops after high load.
Eric Dumazet wrote:
> What "tc -s -d qdisc" "ifconfig -a" "cat /proc/interrupts"
> "cat /proc/net/sockstat" and "cat /proc/net/softnet_stat" are telling ?

In this test, eth2 is talking to eth3, using something similar to this send-to-self patch: http://www.candelatech.com/oss/sts.patch

[EMAIL PROTECTED] ipv4]# ifconfig -a
eth0  Link encap:Ethernet  HWaddr 00:30:48:89:74:60
      inet addr:192.168.100.187  Bcast:192.168.100.255  Mask:255.255.255.0
      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
      RX packets:1672220 errors:0 dropped:0 overruns:0 frame:0
      TX packets:1560305 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:151896589 (144.8 MiB)  TX bytes:1375280163 (1.2 GiB)
      Interrupt:17

eth1  Link encap:Ethernet  HWaddr 00:30:48:89:74:61
      UP BROADCAST MULTICAST  MTU:1500  Metric:1
      RX packets:0 errors:0 dropped:0 overruns:0 frame:0
      TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
      Interrupt:18

eth2  Link encap:Ethernet  HWaddr 00:07:E9:1F:CE:02
      inet addr:20.20.20.20  Bcast:20.20.20.255  Mask:255.255.255.0
      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
      RX packets:2175144684 errors:0 dropped:2 overruns:0 frame:0
      TX packets:2196560123 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:1321380186 (1.2 GiB)  TX bytes:2274982574 (2.1 GiB)
      Base address:0xd000 Memory:d000-d002

eth3  Link encap:Ethernet  HWaddr 00:07:E9:1F:CE:03
      inet addr:20.20.20.30  Bcast:20.20.20.255  Mask:255.255.255.0
      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
      RX packets:2196315966 errors:0 dropped:0 overruns:0 frame:0
      TX packets:2174900538 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:2257901986 (2.1 GiB)  TX bytes:1304493504 (1.2 GiB)
      Base address:0xd100 Memory:d002-d004

lo    Link encap:Local Loopback
      inet addr:127.0.0.1  Mask:255.0.0.0
      UP LOOPBACK RUNNING  MTU:16436  Metric:1
      RX packets:1159378 errors:0 dropped:0 overruns:0 frame:0
      TX packets:1159378 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:0
      RX bytes:1133646590 (1.0 GiB)  TX bytes:1133646590 (1.0 GiB)

[EMAIL PROTECTED] ipv4]# tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1367521025 bytes 1324808 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc pfifo_fast 0: dev eth1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc pfifo_fast 0: dev eth2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1815657070136 bytes 1536674488 pkt (dropped 0, overlimits 0 requeues 1448094)
 rate 0bit 0pps backlog 0b 0p requeues 1448094
qdisc pfifo_fast 0: dev eth3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1752033393324 bytes 1536906566 pkt (dropped 0, overlimits 0 requeues 1063672)
 rate 0bit 0pps backlog 0b 0p requeues 1063672

[EMAIL PROTECTED] ipv4]# cat /proc/interrupts
           CPU0       CPU1
  0:   46020594   44501954   IO-APIC-edge   timer
  1:          9          0   IO-APIC-edge   i8042
  7:          0          0   IO-APIC-edge   parport0
  8:          1          0   IO-APIC-edge   rtc
  9:          0          0   IO-APIC-level  acpi
 12:         96          0   IO-APIC-edge   i8042
 14:     394023     407282   IO-APIC-edge   ide0
 16:          0          0   IO-APIC-level  uhci_hcd:usb4
 17:    1134346    1034006   IO-APIC-level  uhci_hcd:usb3, eth0
 18:      81605      83739   IO-APIC-level  libata, uhci_hcd:usb2, eth1
 19:          0          0   IO-APIC-level  uhci_hcd:usb1, ehci_hcd:usb5
 20:   53056128   46598235   IO-APIC-level  eth2
 21:   47534577   52189674   IO-APIC-level  eth3
NMI:          0          0
LOC:   90485383   90485382
ERR:          0
MIS:          0

[EMAIL PROTECTED] ipv4]# cat /proc/net/sockstat
sockets: used 334
TCP: inuse 27 orphan 0 tw 0 alloc 27 mem 360
UDP: inuse 12
RAW: inuse 0
FRAG: inuse 0 memory 0

[EMAIL PROTECTED] ipv4]# cat /proc/net/softnet_stat
d58236f1 023badc3 0004ef01 3a4354a1 01b57b4b 0005445f

Thanks,
Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com
Re: TCP connection stops after high load.
On Thu, 12 Apr 2007 10:59:19 -0700 Ben Greear <[EMAIL PROTECTED]> wrote:

> Here is a tcpdump of the connection in the stalled state. As you can see by
> the 'time' output, it's running at around 100,000 packets per second. tcpdump
> dropped the vast majority of these. Based on the network interface stats, I
> believe both sides of the connection are sending acks at about the same
> rate (about 160kpps when tcpdump is not running, it seems).

Warning: tcpdump can lie, reporting the same packet as transmitted several times. And yes, tcpdump slows things down, because it enables accurate timestamping of packets.

> 10:46:46.541490 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132
> 10:46:46.541494 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
> 10:46:46.541567 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
> 10:46:46.541653 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
> 10:46:46.541886 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
> 10:46:46.541891 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132
> 10:46:46.541895 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
> 10:46:46.541988 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114

What are "tc -s -d qdisc", "ifconfig -a", "cat /proc/interrupts", "cat /proc/net/sockstat" and "cat /proc/net/softnet_stat" telling you?
Re: TCP connection stops after high load.
Andi Kleen wrote:
> Ben Greear <[EMAIL PROTECTED]> writes:
>> I don't mind adding printks...and I've started reading through the code,
>> but there is a lot of it, and indiscriminate printks will likely just
>> hide the problem because it will slow down performance so much.
>
> You could add /proc/net/snmp counters for interesting events
> (e.g. GFP_ATOMIC allocations failing). Perhaps netstat -s already
> shows something interesting.

I will look for more interesting events to add counters for, thanks for the suggestion. Thanks for the rest of the suggestions and patches from others as well; I will be trying those out today and will let you know how it goes. I can also try this on the 2.6.20 kernel.

This is on the machine connected to itself, which is by far the easiest way to reproduce the problem, and the output below is from the stalled state. About 3-5 minutes later (I wasn't watching too closely), the connection briefly started up again and then stalled again. While it is stalled and sending ACKs, the netstat -an counters remain the same. It appears this run/stall behaviour happens repeatedly, as the over-all bits-per-second average overnight was around 90Mbps, and it runs at ~800Mbps when running full speed.

From netstat -an:

tcp        0 759744 20.20.20.30:33012   20.20.20.20:33011   ESTABLISHED
tcp        0 722984 20.20.20.20:33011   20.20.20.30:33012   ESTABLISHED

I'm not sure if netstat -s shows interesting things or not...it does show a very large number of packets in and out. I ran it twice, about 5 seconds apart, and pasted some values from the second run on the right-hand side where the numbers looked interesting. This info is at the bottom of this email.

For GFP_ATOMIC allocations failing, doesn't that show up as order-X allocation failure messages in the kernel (I see no messages of this type)?

Here is a tcpdump of the connection in the stalled state. As you can see by the 'time' output, it's running at around 100,000 packets per second; tcpdump dropped the vast majority of these.
Based on the network interface stats, I believe both sides of the connection are sending acks at about the same rate (about 160kpps when tcpdump is not running, it seems).

10:46:46.541490 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132
10:46:46.541494 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.541567 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.541653 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.541886 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.541891 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132
10:46:46.541895 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.541988 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.542077 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.542307 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.542312 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132
10:46:46.542321 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132
10:46:46.542410 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.542494 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.542708 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.542718 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.542735 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.542818 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114
10:46:46.542899 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132

4214 packets captured
253889 packets received by filter
244719 packets dropped by kernel

real    0m2.640s
user    0m0.067s
sys     0m0.079s

Two netstat -s outputs, about 5 seconds apart.
[EMAIL PROTECTED] ipv4]# netstat -s
Ip:
    2823452436 total packets received       | 2840939253 total packets received
    1 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    2823452435 incoming packets delivered   | 2840939252 incoming packets delivered
    1549687963 requests sent out            | 1565951477 requests sent out
Icmp:
    0 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
    0 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
Tcp:
    77 active connections openings
    74 passive connection openings
    0 failed connection attempts
    122 connection resets received
    10 connections established
    2823426197 segments received            | 2840914122 segments received
    1549683727 segments send out            | 1565948373 segments send out
    2171 segments retransmited              | 2187 segments retransmited
    0 bad segments received.
    2203 resets sent
Udp:
    21739 packets received
    0 packets to unknown port received.
    0 packet receive errors
    4236 packets sent
TcpExt:
    1164 invalid SY
Re: TCP connection stops after high load.
Ben Greear <[EMAIL PROTECTED]> writes:
> I don't mind adding printks...and I've started reading through the code,
> but there is a lot of it, and indiscriminate printks will likely just
> hide the problem because it will slow down performance so much.

You could add /proc/net/snmp counters for interesting events (e.g. GFP_ATOMIC allocations failing). Perhaps netstat -s already shows something interesting.

-Andi
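One quick way to act on this advice is to snapshot netstat -s before and after a stall and diff the counters, as Ben later does by hand. A rough sketch of such a diff, assuming the simple "value description" line format of netstat -s output (helper names are mine):

```python
def parse_counters(text):
    """Parse lines shaped like '123 segments received' into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            counters[parts[1]] = int(parts[0])
    return counters

def diff_counters(before, after):
    """Return only the counters whose values changed between snapshots."""
    return {k: after[k] - before[k]
            for k in after if k in before and after[k] != before[k]}
```

A counter that keeps climbing while the connection is stalled (retransmits, resets, pruned packets) is exactly the kind of "interesting event" worth chasing.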
Re: TCP connection stops after high load.
On Wed, 11 Apr 2007, Ben Greear wrote:

> The problem is that I set up a TCP connection with bi-directional traffic
> of around 800Mbps, doing large (20k - 64k) writes and reads between two
> ports on the same machine (this 2.6.18.2 kernel is tainted with my full
> patch set, but I also reproduced it with only the non-tainted send-to-self
> patch applied last May on the 2.6.16 kernel, so I assume the bug is not
> particular to my patch set).
>
> At first, all is well, but within 5-10 minutes the TCP connection will stall
> and I only see a massive amount of duplicate ACKs on the link. Before, I
> sometimes saw OOM messages, but this time there are no OOM messages. The
> system has a two-port pro/1000 fibre NIC, 1GB RAM, kernel 2.6.18.2 + hacks,
> etc. Stopping and starting the connection allows traffic to flow again (if
> briefly). Starting a new connection works fine even if the old one is still
> stalled, so it's not a global memory exhaustion problem.
>
> So, I would like to dig into this problem myself since no one else
> is reporting this type of problem, but I am quite ignorant of the TCP
> stack implementation. Based on the dup-acks I see on the wire, I assume
> the TCP state machine is messed up somehow. Could anyone point me to
> likely places in the TCP stack to start looking for this bug?

Since you're doing bidirectional transfers, try the patch below (you'll probably have to apply it manually to the 2.6.18 series due to whitespace changes made after it in the net/ hierarchy). I suspect it's part of the problem, but there could be other things as well, because this should only hinder TCP before an RTO occurs:

[PATCH] [TCP]: Fix ratehalving with bidirectional flows

Actually, the ratehalving seems to work too well, as cwnd is reduced on every second ACK even though the number of packets in flight remains unchanged. Recoveries in bidirectional flows suffer quite badly because of this; both NewReno and SACK are affected.
After this patch, rate halving is performed per ACK only if the number of packets in flight has supposedly changed too.

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 net/ipv4/tcp_input.c |   23 +++++++++++++----------
 1 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 322e43c..bf0f74c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1823,19 +1823,22 @@ static inline u32 tcp_cwnd_min(const str
 }

 /* Decrease cwnd each second ack. */
-static void tcp_cwnd_down(struct sock *sk)
+static void tcp_cwnd_down(struct sock *sk, int flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	int decr = tp->snd_cwnd_cnt + 1;

-	tp->snd_cwnd_cnt = decr&1;
-	decr >>= 1;
+	if ((flag&FLAG_FORWARD_PROGRESS) ||
+	    (IsReno(tp) && !(flag&FLAG_NOT_DUP))) {
+		tp->snd_cwnd_cnt = decr&1;
+		decr >>= 1;

-	if (decr && tp->snd_cwnd > tcp_cwnd_min(sk))
-		tp->snd_cwnd -= decr;
+		if (decr && tp->snd_cwnd > tcp_cwnd_min(sk))
+			tp->snd_cwnd -= decr;

-	tp->snd_cwnd = min(tp->snd_cwnd, tcp_packets_in_flight(tp)+1);
-	tp->snd_cwnd_stamp = tcp_time_stamp;
+		tp->snd_cwnd = min(tp->snd_cwnd, tcp_packets_in_flight(tp)+1);
+		tp->snd_cwnd_stamp = tcp_time_stamp;
+	}
 }

 /* Nothing was retransmitted or returned timestamp is less
@@ -2020,7 +2023,7 @@ static void tcp_try_to_open(struct sock
 		}
 		tcp_moderate_cwnd(tp);
 	} else {
-		tcp_cwnd_down(sk);
+		tcp_cwnd_down(sk, flag);
 	}
 }
@@ -2220,7 +2223,7 @@ tcp_fastretrans_alert(struct sock *sk, u
 		if (is_dupack || tcp_head_timedout(sk, tp))
 			tcp_update_scoreboard(sk, tp);
-		tcp_cwnd_down(sk);
+		tcp_cwnd_down(sk, flag);
 		tcp_xmit_retransmit_queue(sk);
 	}
--
1.4.2
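The control flow of the patched tcp_cwnd_down() can be mimicked in a few lines for experimentation. A simplified model (the kernel's flag word is reduced to two booleans, and tcp_cwnd_min()/packets-in-flight become plain parameters; this is a sketch, not the kernel function):

```python
def cwnd_down(snd_cwnd, snd_cwnd_cnt, in_flight, cwnd_min,
              forward_progress, reno_pure_dupack):
    """Model the patched tcp_cwnd_down(): halve the window only on
    ACKs that plausibly changed the number of packets in flight."""
    if forward_progress or reno_pure_dupack:
        decr = snd_cwnd_cnt + 1
        snd_cwnd_cnt = decr & 1      # carry the odd half-step forward
        decr >>= 1
        if decr and snd_cwnd > cwnd_min:
            snd_cwnd -= decr
        snd_cwnd = min(snd_cwnd, in_flight + 1)
    return snd_cwnd, snd_cwnd_cnt
```

With both flags false (an ACK that moved nothing), cwnd is left alone, which is the whole point of the fix for bidirectional flows full of data-less ACKs.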
Re: TCP connection stops after high load.
I also noticed this happening with the 2.6.18 kernel, but it was not as severe with Linux 2.6.20.3, so the short-term solution would be upgrading to the latest FC-6 kernel.

A long blackout is mostly observed when a lot of packet losses happen during slow start. You can prevent this by applying a limited-slow-start patch. Did you have the same problems with CUBIC, which employs a less aggressive slow start?

I am leaving this debugging for some later kernel version, but you are welcome to dig into it. I recommend you install tcp_probe and recreate the problem: whenever an ack arrives from the receiver, the probe will print the current congestion information, and you can easily add other information you want to that module. You can also get some information from the statistics in /proc/net/tcp and /proc/net/netstat. See http://netsrv.csc.ncsu.edu/wiki/index.php/Efficiency_of_SACK_processing

Thanks,
Sangtae

On 4/11/07, Ben Greear <[EMAIL PROTECTED]> wrote:
> David Miller wrote:
>> From: Ben Greear <[EMAIL PROTECTED]>
>> Date: Wed, 11 Apr 2007 14:06:31 -0700
>>
>>> Does the CWND == 1 count as solid? Any idea how/why this would go
>>> to 1 in conjunction with the dup acks?
>>>
>>> For the dup acks, I see nothing *but* dup acks on the wire...going in
>>> both directions interestingly, at greater than 100,000 packets per second.
>>>
>>> I don't mind adding printks...and I've started reading through the code,
>>> but there is a lot of it, and indiscriminate printks will likely just
>>> hide the problem because it will slow down performance so much.
>>
>> If you know that it doesn't take Einstein to figure out that maybe you
>> should add logging when CWND is one and we're sending out an ACK?
>>
>> This is why I think you're very lazy Ben and I get very agitated with
>> all of your reports, you put zero effort into thinking about how to
>> debug the problem even though you know full well how to do it.
>
> I've spent solid weeks tracking down obscure races. I'm hoping that
> someone who knows the tcp stack will have some idea of places to look
> based on the reported symptoms so that I don't have to spend another
> solid week chasing this one. If not, so be it...I'm still working on
> this between sending emails.
>
> For what it's worth, the problem (or something similar) is reproducible
> on a stock FC5 .18-ish kernel as well, running between two machines,
> 2 ports each.
>
> Ben
Re: TCP connection stops after high load.
On Wed, Apr 11, 2007 at 02:06:31PM -0700, Ben Greear wrote:
> For the dup acks, I see nothing *but* dup acks on the wire...going in
> both directions interestingly, at greater than 100,000 packets per second.
>
> I don't mind adding printks...and I've started reading through the code,
> but there is a lot of it, and indiscriminate printks will likely just
> hide the problem because it will slow down performance so much.

What do the timestamps look like? PAWS contains logic which will drop packets if the timestamps are too old compared to what the receiver expects.

-ben

--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[EMAIL PROTECTED]>.
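The core of the PAWS test is a wrapping 32-bit comparison of a segment's TSval against the last timestamp accepted in-window: if the new value is older, the segment is presumed an old duplicate and discarded. A sketch of just that comparison (RFC 1323's 24-day staleness escape hatch is deliberately omitted, so this is a simplification, not the full check):

```python
def tsval_older(ts_val, ts_recent):
    """True when ts_val is behind ts_recent in 32-bit wrapping
    timestamp space -- the condition under which PAWS discards
    an otherwise acceptable segment."""
    diff = (ts_recent - ts_val) & 0xFFFFFFFF
    return 0 < diff < 0x80000000
```

If one side's timestamp clock misbehaves (or stale segments are replayed at high rate), every data segment can fail this test while pure ACKs keep flowing, which would look a lot like the stall being described.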
Re: TCP connection stops after high load.
From: Ben Greear <[EMAIL PROTECTED]>
Date: Wed, 11 Apr 2007 14:31:00 -0700

> I've spent solid weeks tracking down obscure races.

I've spent solid weeks tracking down kernel stack corruption and scsi problems on sparc64, as well as attending to my network maintainer duties, what is your point?

> I'm hoping that someone who knows the tcp stack will have some idea
> of places to look based on the reported symptoms so that I don't
> have to spend another solid week chasing this one.

If you can reproduce the bug and others cannot, you are the one in the best possible situation to add diagnostics and figure out what's wrong. Please do this.

You get a lot from Linux in your work, but you sure grumble a lot when you might need to give even a smidgen back. You just dump random pieces of information at this list and expect other people to just fix it for you. It's this part of your attitude that I absolutely do not like. Other people are able to report bugs in a pleasant and non-selfish way that makes me want to go and fix the bug for them proactively, you do not.
Re: TCP connection stops after high load.
David Miller wrote:
> From: Ben Greear <[EMAIL PROTECTED]>
> Date: Wed, 11 Apr 2007 14:06:31 -0700
>
>> Does the CWND == 1 count as solid? Any idea how/why this would go
>> to 1 in conjunction with the dup acks?
>>
>> For the dup acks, I see nothing *but* dup acks on the wire...going in
>> both directions interestingly, at greater than 100,000 packets per second.
>>
>> I don't mind adding printks...and I've started reading through the code,
>> but there is a lot of it, and indiscriminate printks will likely just
>> hide the problem because it will slow down performance so much.
>
> If you know that it doesn't take Einstein to figure out that maybe you
> should add logging when CWND is one and we're sending out an ACK?
>
> This is why I think you're very lazy Ben and I get very agitated with
> all of your reports, you put zero effort into thinking about how to
> debug the problem even though you know full well how to do it.

I've spent solid weeks tracking down obscure races. I'm hoping that someone who knows the tcp stack will have some idea of places to look based on the reported symptoms so that I don't have to spend another solid week chasing this one. If not, so be it...I'm still working on this between sending emails.

For what it's worth, the problem (or something similar) is reproducible on a stock FC5 .18-ish kernel as well, running between two machines, 2 ports each.

Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com
Re: TCP connection stops after high load.
From: Ben Greear <[EMAIL PROTECTED]>
Date: Wed, 11 Apr 2007 14:06:31 -0700

> Does the CWND == 1 count as solid? Any idea how/why this would go
> to 1 in conjunction with the dup acks?
>
> For the dup acks, I see nothing *but* dup acks on the wire...going in
> both directions interestingly, at greater than 100,000 packets per second.
>
> I don't mind adding printks...and I've started reading through the code,
> but there is a lot of it, and indiscriminate printks will likely just
> hide the problem because it will slow down performance so much.

If you know that it doesn't take Einstein to figure out that maybe you should add logging when CWND is one and we're sending out an ACK?

This is why I think you're very lazy Ben and I get very agitated with all of your reports, you put zero effort into thinking about how to debug the problem even though you know full well how to do it.
Re: TCP connection stops after high load.
David Miller wrote:
> From: Ben Greear <[EMAIL PROTECTED]>
> Date: Wed, 11 Apr 2007 13:26:36 -0700
>
>> Interestingly, I found this page mentioning a SACK problem in Linux:
>> http://www-didc.lbl.gov/TCP-tuning/linux.html
>
> Don't read that page, it is the last place in the world you should take
> hints and advice from; most of the problems they speak of there have
> been fixed years ago.

Much of their memory and buffer settings are similar to what I've seen elsewhere, and what I use, but it could be we're all getting the same info from the same faulty source. Suggestions of a proper site for tuning TCP for high speed/high latency links are welcome.

> Please start instrumenting the TCP code instead of "poking around"
> hoping you'll hit the grand jackpot by manipulating some sysctl
> setting. It doesn't help us and it won't help you, start reading and
> understanding the TCP code, add debugging printk's, anything to get
> more information about this.
>
> And please don't report anything here until you have some solid piece
> of debugging information, else I'll just sit here replying and prodding
> you along ever so slowly. :(

Does the CWND == 1 count as solid? Any idea how/why this would go to 1 in conjunction with the dup acks?

For the dup acks, I see nothing *but* dup acks on the wire...going in both directions interestingly, at greater than 100,000 packets per second.

I don't mind adding printks...and I've started reading through the code, but there is a lot of it, and indiscriminate printks will likely just hide the problem because it will slow down performance so much.

Thanks,
Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com
Re: TCP connection stops after high load.
From: Ben Greear <[EMAIL PROTECTED]>
Date: Wed, 11 Apr 2007 13:26:36 -0700

> Interestingly, I found this page mentioning a SACK problem in Linux:
> http://www-didc.lbl.gov/TCP-tuning/linux.html

Don't read that page, it is the last place in the world you should take hints and advice from; most of the problems they speak of there have been fixed years ago.

Please start instrumenting the TCP code instead of "poking around" hoping you'll hit the grand jackpot by manipulating some sysctl setting. It doesn't help us and it won't help you, start reading and understanding the TCP code, add debugging printk's, anything to get more information about this.

And please don't report anything here until you have some solid piece of debugging information, else I'll just sit here replying and prodding you along ever so slowly. :(
Re: TCP connection stops after high load.
From: Ben Greear <[EMAIL PROTECTED]>
Date: Wed, 11 Apr 2007 11:50:18 -0700

> So, I would like to dig into this problem myself since no one else
> is reporting this type of problem, but I am quite ignorant of the TCP
> stack implementation. Based on the dup-acks I see on the wire, I assume
> the TCP state machine is messed up somehow. Could anyone point me to
> likely places in the TCP stack to start looking for this bug?

Dup acks mean that packets are being dropped and there are thus holes in the sequence seen at the receiver.

Likely what happens is that we hit the global memory pressure limit, start dropping packets, and never recover even after the memory pressure is back within its limits.
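The hypothesis here involves the three tcp_mem[] thresholds (counted in pages): below the low mark nothing is constrained, above the pressure mark the stack starts economizing, and at the high mark allocations fail and incoming data is dropped. A conceptual classifier of that state machine (the labels are mine, not kernel terminology):

```python
def tcp_mem_state(pages_allocated, tcp_mem):
    """Classify TCP's global page consumption against the
    tcp_mem[] (low, pressure, high) thresholds."""
    low, pressure, high = tcp_mem
    if pages_allocated >= high:
        return "over-limit"   # allocations refused: incoming data dropped
    if pages_allocated > pressure:
        return "pressure"     # stack trims buffers, collapses queues
    if pages_allocated > low:
        return "moderate"
    return "normal"
```

The bug being discussed fits this picture: the stack crosses into "over-limit", drops segments (hence the dup-ack storm), and then fails to resume even once the page count falls back below the thresholds.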
Re: TCP connection stops after high load.
Ben Greear wrote:

Back in May of last year I reported this problem, but worked around it at the time by changing the kernel memory settings in the networking stack. I reproduced the problem again today with the previously working kernel memory settings, which is not surprising, since I just papered over the bug last time.

So, I have been poking around. Disabling TSO makes the problem happen sooner (< 1 minute). Changing the tcp_congestion_control does not help.

Interestingly, I found this page mentioning a SACK problem in Linux:
http://www-didc.lbl.gov/TCP-tuning/linux.html

I tried disabling SACK, but the problem still happens. However, I do see the CWND go to 1 as soon as the connection stalls (I'm not sure exactly which happens first). Before the stall, I see CWND reported in the ~40 range. Maybe something similar to the SACK bug can happen on very fast, very low latency links with large send/receive buffers configured?

Thanks,
Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com