Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))
"David S. Miller" wrote:
>
> Jordan Mendelson writes:
>  > Now, if it didn't have the side effect of dropping packets left and
>  > right after ~4000 open connections (simultaneously), I could finally
>  > move our production system to 2.4.x.
>
> The change I posted as-is, is unacceptable because it adds unnecessary
> cost to a fast path.  The final change I actually use will likely
> involve using the TCP sequence numbers to calculate an "always
> changing" ID number in the IPv4 headers to placate these broken
> windows machines.

Just for kicks I modified the fast path to use a globally incremented
count to see if it would fix both the Win9x problem and my 4K connection
problem, and it appears to be working just fine.

What probably happened was that the sheer number of packets at 4K
connections without the fast path just slowed everything down to a
crawl.

Thanks Dave,

Jordan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))
"David S. Miller" wrote:
>
> Jordan Mendelson writes:
>  > Now, if it didn't have the side effect of dropping packets left and
>  > right after ~4000 open connections (simultaneously), I could finally
>  > move our production system to 2.4.x.
>
> There is no reason my patch should have this effect.

My guess is that the fast path prevented the need for looking up the
destination in some structure which is limited to ~4K entries (route
table?).

Jordan
Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))
"David S. Miller" wrote:
>
> Ookhoi writes:
>  > We have exactly the same problem but in our case it depends on the
>  > following three conditions: 1, kernel 2.4 (2.2 is fine), 2, windows ip
>  > header compression turned on, 3, a free internet access provider in
>  > Holland called 'Wish' (which seems to stand for 'I Wish I had a faster
>  > connection').
>  > If we remove one of the three conditions, the connection is ok. It is
>  > only tcp which is affected.
>  > A packet on its way from linux server to windows client seems to get
>  > dropped once and retransmitted. This makes the connection _very_ slow.
>  > :-( I hate these buggy systems.
>
> Does this patch below fix the performance problem, and are the windows
> clients win2000 or win95?

Just a note however... this patch did fix the problem we were seeing
with retransmits and Win95 compressed PPP dialup over Earthlink in the
bay area.

Now, if it didn't have the side effect of dropping packets left and
right after ~4000 open connections (simultaneously), I could finally
move our production system to 2.4.x.

Jordan
Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))
"David S. Miller" wrote:
>
> Ookhoi writes:
>  > We have exactly the same problem but in our case it depends on the
>  > following three conditions: 1, kernel 2.4 (2.2 is fine), 2, windows ip
>  > header compression turned on, 3, a free internet access provider in
>  > Holland called 'Wish' (which seems to stand for 'I Wish I had a faster
>  > connection').
>  > If we remove one of the three conditions, the connection is ok. It is
>  > only tcp which is affected.
>  > A packet on its way from linux server to windows client seems to get
>  > dropped once and retransmitted. This makes the connection _very_ slow.
>  > :-( I hate these buggy systems.
>
> Does this patch below fix the performance problem, and are the windows
> clients win2000 or win95?

I wanted to see if this would fix the problem I was seeing with Win9x
users on PPP w/ compression dialing up to Earthlink in the bay area
(there are others, but it's the only one I can reproduce).

I compiled 2.4.1 with this change and for some odd reason, the kernel
started dropping packets and became unusable (couldn't ssh in) after
around 4050 connections were opened. I tested it also with 2.4.1-ac20
and had the same problem right around 4050 connections.

This is on a VA Linux box with dual eepro100's (one used) connected to
a Cisco 6509.
> --- include/net/ip.h.~1~	Mon Feb 19 00:12:31 2001
> +++ include/net/ip.h	Wed Feb 21 02:56:15 2001
> @@ -190,9 +190,11 @@
>
>  static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst)
>  {
> +#if 0
>  	if (iph->frag_off&__constant_htons(IP_DF))
>  		iph->id = 0;
>  	else
> +#endif
>  		__ip_select_ident(iph, dst);
>  }
Re: MTU and 2.4.x kernel
Rick Jones wrote:
>
> > Default of 536 is sadistic (and apparently will be changed eventually
> > to stop tears of poor people whose providers not only supply them
> > with bogus mtu values sort of 552 or even 296, but also jailed them
> > to some proxy or masquerading domain), but it is still right: IP
> > with mtu lower than 576 is not fully functional.
>
> I thought that the specs said that 576 was the "minimum maximum"
> reassemblable IP datagram size and not a minimum MTU.

RFC 1191 (Path MTU Discovery, as it happens):

   Plateau    MTU    Comments                      Reference
   ------     ---    --------                      ---------
              65535  Official maximum MTU          RFC 791
              65535  Hyperchannel                  RFC 1044
   65535
   32000             Just in case
              17914  16Mb IBM Token Ring           ref. [6]
   17914
              8166   IEEE 802.4                    RFC 1042
   8166
              4464   IEEE 802.5 (4Mb max)          RFC 1042
              4352   FDDI (Revised)                RFC 1188
   4352 (1%)
              2048   Wideband Network              RFC 907
              2002   IEEE 802.5 (4Mb recommended)  RFC 1042
   2002 (2%)
              1536   Exp. Ethernet Nets            RFC 895
              1500   Ethernet Networks             RFC 894
              1500   Point-to-Point (default)      RFC 1134
              1492   IEEE 802.3                    RFC 1042
   1492 (3%)
              1006   SLIP                          RFC 1055
              1006   ARPANET                       BBN 1822
   1006
              576    X.25 Networks                 RFC 877
              544    DEC IP Portal                 ref. [10]
              512    NETBIOS                       RFC 1088
              508    IEEE 802/Source-Rt Bridge     RFC 1042
              508    ARCNET                        RFC 1051
   508 (13%)
              296    Point-to-Point (low delay)    RFC 1144
   296
              68     Official minimum MTU          RFC 791

Jordan
Re: 2.4.0 Networking oddity
Daniel Walton wrote:
>
> The server in question is running the tulip driver.  dmesg reports:
>
> Linux Tulip driver version 0.9.13 (January 2, 2001)
>
> I have seen this same behavior on a couple of my servers running 3com
> 3c905c adaptors as well.
>
> The last time I was experiencing it I rebooted the system and it didn't
> solve the problem.  When it came up it was still lagging.  This would lead
> me to believe that it is caused by some sort of network condition, but what
> I don't know.
>
> If anyone has ideas, I'd be more than happy to run tests/provide more info..

If you are running an intelligent switch, double-check to make sure your
duplex and speed match what the switch sees on its port. The biggest
problem I've had with any of my machines is autonegotiation of port
speed and duplex. Typically all that is required is that I force speed
and duplex on the Linux end.

Jordan
Re: USB problems with 2.4.0: USBDEVFS_BULK failed
Greg KH wrote:
>
> On Thu, Jan 04, 2001 at 07:52:15PM -0800, Jordan Mendelson wrote:
> >
> > Alright, this is driving me nuts. I have a Canon S20 digital camera
> > hooked up to a Sony XG series laptop via the USB port and am using s10sh
> > to access it. s10sh uses libusb 0.1.1, but I've also tried it using
> > libusb 0.1.2 without any luck. libusb uses usbfs to access the device
> > from userspace.
> >
> > The last time it worked was around 2.4.0test10, but might have been
> > test9. test12, prerelease and 2.4.0 final all fail.
>
> Could you try to verify exactly which version things died on?  As you
> know USB has had a number of changes to the code recently :)
>
> That would help us try to determine what broke.

I just rebooted a few times... 2.4.0-test10 is the last kernel that it
worked correctly with. 2.4.0-test11 shows the same signs as
2.4.0-test12, prerelease and 2.4.0 proper.

Jordan
USB problems with 2.4.0: USBDEVFS_BULK failed
Alright, this is driving me nuts. I have a Canon S20 digital camera
hooked up to a Sony XG series laptop via the USB port and am using s10sh
to access it. s10sh uses libusb 0.1.1, but I've also tried it using
libusb 0.1.2 without any luck. libusb uses usbfs to access the device
from userspace.

The last time it worked was around 2.4.0test10, but might have been
test9. test12, prerelease and 2.4.0 final all fail.

I've compiled the uhci driver with debugging. The log starts before I
send the file transfer request to the camera and ends after the camera
blows up and disconnects itself. This was done using 2.4.0 final. I have
also included a protocol dump from s10sh, recorded during a second
attempt. It looks like s10sh might strip header bytes from the log, but
it should help somewhat.

Now, as far as I can tell, we submit a bulk transfer request and start
reading. We want to read 2872 bytes (44 @ 64 bytes, 1 @ 56 bytes). We
read off 44 @ 64 bytes, but for some reason don't read off the last 56
bytes, and a babble is detected.

Jordan

Jan  4 18:06:29 u2 kernel: klogd 1.3-3#33.1, log source = /proc/kmsg started.
Jan  4 18:06:29 u2 kernel: Inspecting /boot/System.map-2.4.0
Jan  4 18:06:29 u2 kernel: Loaded 13430 symbols from /boot/System.map-2.4.0.
Jan  4 18:06:29 u2 kernel: Symbols match kernel version 2.4.0.
Jan  4 18:06:29 u2 kernel: Loaded 145 symbols from 6 modules.
Jan  4 18:06:32 u2 kernel: usb-uhci.c: search_dev_ep:
Jan  4 18:06:32 u2 kernel: usb-uhci.c: submit_urb: scheduling cf2cfba0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_submit_control start
Jan  4 18:06:32 u2 kernel: usb-uhci.c: Allocated qh @ c43809e0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_submit_control end
Jan  4 18:06:32 u2 kernel: usb-uhci.c: submit_urb: scheduled with ret: 0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: interrupt
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: len:8 status:3807 mapped:0 toggle:0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: len:32 status:381f mapped:0 toggle:1
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: len:32 status:381f mapped:0 toggle:0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: len:32 status:381f mapped:0 toggle:1
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: len:22 status:3815 mapped:0 toggle:0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: len:0 status:190007ff mapped:0 toggle:1
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_clean_transfer: No more bulks for urb cf2cfba0, qh c43809e0, bqh , nqh
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink qh c43809e0, pqh c4380720, nxqh c43806e0, to 043806e0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: (end) urb cf2cfba0, wanted len 118, len 118 status 0 err 0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: dequeued urb: cf2cfba0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380e20
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380de0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380ee0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: search_dev_ep:
Jan  4 18:06:32 u2 kernel: usb-uhci.c: submit_urb: scheduling cf2cfba0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_submit_bulk_urb: urb cf2cfba0, old , pipe c0008280, len 64
Jan  4 18:06:32 u2 kernel: usb-uhci.c: Allocated qh @ c4380aa0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_submit_bulk: qh c4380aa0 bqh 0001 nqh c02595f2
Jan  4 18:06:32 u2 kernel: usb-uhci.c: submit_urb: scheduled with ret: 0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: interrupt
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: len:64 status:393f mapped:0 toggle:0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_clean_transfer: No more bulks for urb cf2cfba0, qh c4380aa0, bqh , nqh
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink qh c4380aa0, pqh c43806e0, nxqh c4380660, to 04380660
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: (end) urb cf2cfba0, wanted len 64, len 64 status 0 err 0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: dequeued urb: cf2cfba0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380e60
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380d60
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380ea0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380d20
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380f20
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380da0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: search_dev_ep:
Jan  4 18:06:32 u2 kernel: usb-uhci.c: submit_urb: scheduling cf2cfba0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_submit_bulk_urb: urb cf2cfba0, old , pipe c0008280, len 2872
Jan  4 18:06:32 u2 kernel: usb-uhci.c: Allocated qh @ c43809e0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_submit_bulk: qh c43809e0 bqh 0001 nqh c02595f2
Jan  4 18:06:32 u2 kernel: usb-uhci.c: submit_urb: scheduled with ret: 0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: interrupt
Jan  4 18:06:32 u2 kernel: usb-uhci.c: interrupt, status 3, frame#
Re: And oh, btw..
dep wrote:
>
> On Thursday 04 January 2001 07:36 pm, Jordan Mendelson wrote:
>
> | Go home, get out the Epsom salts, fill up the tub with hot water
> | and just relax.
>
> right after getting the source posted on kernel.org!

Sigh, try:

http://www.kernel.org/pub/linux/kernel/testing/prerelease-diff

Please don't flood kernel.org though... use a mirror.

Jordan
Re: And oh, btw..
Linus Torvalds wrote:
>
> In a move unanimously hailed by the trade press and industry analysts as
> being a sure sign of incipient braindamage, Linus Torvalds (also known as
> the "father of Linux" or, more commonly, as "mush-for-brains") decided
> that enough is enough, and that things don't get better from having the
> same people test it over and over again. In short, 2.4.0 is out there.

Everyone who has ever been in the press spotlight knows that most of it
is inaccurate, rushed, and written to bring in readers rather than to
report well-thought-out stories.

Go home, get out the Epsom salts, fill up the tub with hot water and
just relax.

Jordan
2.4.0-pre: usbdevfs: USBDEVFS_BULK failed ...
I've been having some problems with the recent 2.4.x kernels and my
digital camera. The s10sh program accesses the Canon S20 digital camera
using libusb in conjunction with usbfs to download images. Apparently,
incorrect data about the size of images is being sent down the line
after the first image transfer.

Here are some messages printed to syslog:

hub.c: USB new device connect on bus1/1, assigned device number 4
usbserial.c: none matched
usb.c: USB device 4 (vend/prod 0x4a9/0x3043) is not claimed by any active driver.
usb-uhci.c: interrupt, status 3, frame# 496
usbdevfs: USBDEVFS_BULK failed dev 4 ep 0x81 len 2872 ret -32
usbdevfs: USBDEVFS_BULK failed dev 4 ep 0x81 len 84 ret -32
usbdevfs: USBDEVFS_BULK failed dev 4 ep 0x81 len 64 ret -32
usb.c: USB disconnect on device 4

Now, the USB disconnect never actually happened physically. The camera
looks like it simply stopped responding on its USB port.

Jordan
2.4.0-pre able to mount SHM twice
This is probably due to the source being 'none', but the shm mount point
can be mounted twice at the same mount point. Shouldn't mount(2) return
-EBUSY in this case?

# cat /etc/mtab
/dev/hda4 / ext2 rw,errors=remount-ro,errors=remount-ro 0 0
proc /proc proc rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/hda1 /boot ext2 rw 0 0
/dev/hda3 /mnt/win vfat rw 0 0
none /proc/bus/usb usbdevfs rw 0 0
# mount /dev/shm
# cat /etc/mtab
/dev/hda4 / ext2 rw,errors=remount-ro,errors=remount-ro 0 0
proc /proc proc rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/hda1 /boot ext2 rw 0 0
/dev/hda3 /mnt/win vfat rw 0 0
none /proc/bus/usb usbdevfs rw 0 0
none /dev/shm shm rw 0 0
# mount /dev/shm
# cat /etc/mtab
/dev/hda4 / ext2 rw,errors=remount-ro,errors=remount-ro 0 0
proc /proc proc rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/hda1 /boot ext2 rw 0 0
/dev/hda3 /mnt/win vfat rw 0 0
none /proc/bus/usb usbdevfs rw 0 0
none /dev/shm shm rw 0 0
none /dev/shm shm rw 0 0
# umount /dev/shm
# cat /etc/mtab
/dev/hda4 / ext2 rw,errors=remount-ro,errors=remount-ro 0 0
proc /proc proc rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/hda1 /boot ext2 rw 0 0
/dev/hda3 /mnt/win vfat rw 0 0
none /proc/bus/usb usbdevfs rw 0 0

Jordan
Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP
Andi Kleen wrote:
>
> On Mon, Nov 06, 2000 at 11:16:21PM -0800, Jordan Mendelson wrote:
> > > It is clear though, that something is messing with or corrupting the
> > > packets.  One thing you might try is turning off TCP header
> > > compression for the PPP link, does this make a difference?
> >
> > Actually, there have been several reports that turning header
> > compression does help.
>
> What does help?  Turning it on or turning it off?

We had a good number of reports that turning PPP header compression off
helped. The Windows 98 connection I was testing with did have header
compression turned on. Unfortunately, I can't just ask the entire
windows world to turn off header compression in order to use our
software. :)

I believe we've reverted all of our machines to 2.2, so testing this any
further is going to be a problem.

Jordan
Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP
"David S. Miller" wrote: > >Date: Mon, 06 Nov 2000 23:16:21 -0800 >From: Jordan Mendelson <[EMAIL PROTECTED]> > >"David S. Miller" wrote: >> It is clear though, that something is messing with or corrupting the >> packets. One thing you might try is turning off TCP header >> compression for the PPP link, does this make a difference? > >Actually, there has been several reports that turning header >compression does help. > > If this is what is causing the TCP sequence numbers to change > then either Win98's or Earthlink terminal server's implementation > of TCP header compression is buggy. > > Assuming this is true, it explains why Win98's TCP does not "see" the > data sent by Linux, because such a bug would make the TCP checksum of > these packets incorrect and thus dropped by Win98's TCP. Ok, but why doesn't 2.2.16 exhibit this behavior? We've had reports from quite a number of people complaining about this and I'm fairly certain not all of them are from Earthlink. Jordan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
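Dave's checksum argument can be made concrete: the TCP checksum is a one's-complement sum over the header and payload, so if anything in the path rewrites a sequence number without also fixing up the checksum, the receiver's verification fails and the segment is silently discarded — which looks exactly like the data never being "seen". A small sketch of that arithmetic (the test vector is the worked example from RFC 1071; the mangling step is hypothetical):

```python
def inet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:            # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:           # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

segment = bytes.fromhex("0001f203f4f5f6f7")   # RFC 1071 example data
assert inet_checksum(segment) == 0x220D

# Mangle a single byte (as a buggy decompressor rewriting a sequence
# number would) and the checksum no longer matches:
mangled = bytes([segment[0] ^ 0x02]) + segment[1:]
assert inet_checksum(mangled) != inet_checksum(segment)
```

This is why the corruption is invisible at the application layer: the bad segments are dropped inside Win98's TCP before any data is delivered.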
Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP
"David S. Miller" wrote: > >Date: Mon, 06 Nov 2000 22:44:00 -0800 >From: Jordan Mendelson <[EMAIL PROTECTED]> > >Attached to this message are dumps from the windows 98 machine using >windump and the linux 2.4.0-test10. Sorry the time stamps don't match >up. > > (ie. Linux sends bytes 1:21 both the first time, and when it > retransmits that data. However win98 "sees" this as 1:19 the first > time and 1:21 during the retransmit by Linux) > > That is bogus. Something is mangling the packets between the Linux > machine and the win98 machine. You mentioned something about > bandwidth limiting at your upstream provider, any chance you can have > them turn this bandwidth limiting device off? It actually turns out that the problem with bandwidth was fixed yesterday, so this cannot be the problem here and yes, 64.124.41.179 is a Linux box. :) > Or maybe earthlink is using some packet mangling device? > > It is clear though, that something is messing with or corrupting the > packets. One thing you might try is turning off TCP header > compression for the PPP link, does this make a difference? Actually, there have been several reports that turning header compression does help. Jordan
Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP
"David S. Miller" wrote: > >Date: Mon, 06 Nov 2000 22:13:23 -0800 >From: Jordan Mendelson <[EMAIL PROTECTED]> > >There is a possibility that we are hitting an upper level bandwidth >limit between us an our upstream provider due to a misconfiguration >on the other end, but this should only happen during peak time >(which it is not right now). It just bugs me that 2.2.16 doesn't >appear to have this problem. > > The only thing I can do now is beg for a tcpdump from the windows95 > machine side. Do you have the facilities necessary to obtain this? > This would prove that it is packet drop between the two systems, for > whatever reason, that is causing this. Attached to this message are dumps from the windows 98 machine using windump and the linux 2.4.0-test10. Sorry the time stamps don't match up. Jordan 23:36:15.252817 209.179.194.175.1084 > 64.124.41.179.: S 370996:370996(0) win 8192 (DF) 23:36:15.252891 64.124.41.179. > 209.179.194.175.1084: S 3050526223:3050526223(0) ack 370997 win 5840 (DF) 23:36:16.159685 209.179.194.175.1084 > 64.124.41.179.: . ack 1 win 8576 (DF) 23:36:16.160461 209.179.194.175.1084 > 64.124.41.179.: . ack 1 win 65280 (DF) 23:36:16.160488 209.179.194.175.1084 > 64.124.41.179.: P 1:44(43) ack 1 win 65280 (DF) 23:36:16.160506 64.124.41.179. > 209.179.194.175.1084: . ack 44 win 5840 (DF) 23:36:16.261533 64.124.41.179. > 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 (DF) 23:36:16.261669 64.124.41.179. > 209.179.194.175.1084: P 21:557(536) ack 44 win 5840 (DF) 23:36:19.261055 64.124.41.179. > 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 (DF) 23:36:19.450762 209.179.194.175.1084 > 64.124.41.179.: P 44:56(12) ack 21 win 65260 (DF) 23:36:19.450788 64.124.41.179. > 209.179.194.175.1084: P 21:557(536) ack 44 win 5840 (DF) 23:36:19.450820 64.124.41.179. > 209.179.194.175.1084: P 557:1093(536) ack 56 win 5840 (DF) 23:36:22.281248 209.179.194.175.1084 > 64.124.41.179.: P 44:456(412) ack 21 win 65260 (DF) 23:36:22.281308 64.124.41.179. 
> 209.179.194.175.1084: . ack 456 win 6432 (DF) 23:36:25.441061 64.124.41.179. > 209.179.194.175.1084: P 21:557(536) ack 456 win 6432 (DF) 23:36:25.701796 209.179.194.175.1084 > 64.124.41.179.: . ack 557 win 65280 (DF) 23:36:25.701841 64.124.41.179. > 209.179.194.175.1084: P 557:1093(536) ack 456 win 6432 (DF) 23:36:25.701859 64.124.41.179. > 209.179.194.175.1084: P 1093:1629(536) ack 456 win 6432 (DF) 23:36:37.701091 64.124.41.179. > 209.179.194.175.1084: P 557:1093(536) ack 456 win 6432 (DF) 23:36:38.026766 209.179.194.175.1084 > 64.124.41.179.: . ack 1093 win 65280 (DF) 23:36:38.026826 64.124.41.179. > 209.179.194.175.1084: P 1093:1629(536) ack 456 win 6432 (DF) 23:36:38.026839 64.124.41.179. > 209.179.194.175.1084: P 1629:1847(218) ack 456 win 6432 (DF) 23:37:02.021068 64.124.41.179. > 209.179.194.175.1084: P 1093:1629(536) ack 456 win 6432 (DF) 23:37:02.328163 209.179.194.175.1084 > 64.124.41.179.: . ack 1629 win 65280 (DF) 23:37:02.328189 64.124.41.179. > 209.179.194.175.1084: P 1629:1847(218) ack 456 win 6432 (DF) 23:37:50.321057 64.124.41.179. > 209.179.194.175.1084: P 1629:1847(218) ack 456 win 6432 (DF) 23:37:50.673000 209.179.194.175.1084 > 64.124.41.179.: . ack 1847 win 65062 (DF) 23:37:50.673068 64.124.41.179. > 209.179.194.175.1084: P 1847:1868(21) ack 456 win 6432 (DF) 23:38:00.162380 209.179.194.175.1084 > 64.124.41.179.: F 456:456(0) ack 1847 win 65062 (DF) 23:38:00.181055 64.124.41.179. > 209.179.194.175.1084: . ack 457 win 6432 (DF) 23:38:00.187291 64.124.41.179. > 209.179.194.175.1084: F 1868:1868(0) ack 457 win 6432 (DF) 23:38:00.363357 209.179.194.175.1084 > 64.124.41.179.: . ack 1847 win 65062 (DF) 23:39:26.671050 64.124.41.179. 
> 209.179.194.175.1084: P 1847:1868(21) ack 457 win 6432 (DF) 23:39:26.886417 209.179.194.175.1084 > 64.124.41.179.: R 371453:371453(0) win 0 (DF) 22:34:34.884487 arp who-has 64.124.41.179 tell 209.179.194.175 22:34:34.889477 209.179.194.175.1084 > 64.124.41.179.: S 370996:370996(0) win 8192 (DF) 22:34:35.669892 64.124.41.179. > 209.179.194.175.1084: S 3050526223:3050526223(0) ack 370997 win 5840 (DF) 22:34:35.670624 209.179.194.175.1084 > 64.124.41.179.: . ack 1 win 8576 (DF) 22:34:35.670653 209.179.194.175.1084 > 64.124.41.179.: . ack 1 win 65280 (DF) 22:34:35.674484 209.179.194.175.1084 > 64.124.41.179.: P 1:44(43) ack 1 win 65280 (DF) 22:34:36.049808 64.124.41.179. > 209.179.194.175.1084: . ack 44 win 5840 (DF) 22:34:36.069773 64.124.41.179. > 209.179.194.175.1084: P 1:19(18) ack 44 win 5840 (DF) 22:34:36.069837 64.124.41.179. > 209.179.194.175.1084: P 19:553(534) ack 44 win 5840 (DF)
Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP
"David S. Miller" wrote: > >Date: Mon, 06 Nov 2000 21:20:39 -0800 >From: Jordan Mendelson <[EMAIL PROTECTED]> > >It looks to me like there is an artificial delay in 2.4.0 which is >slowing down the traffic to unbearable levels. > > No, I think I see what's wrong, it's nothing more than packet drop. > > Looking at the equivalent 2.2 traces, the only difference appears to > be that the packets are not getting dropped. I would like to note that these two machines that the Windows client is connecting to are sitting on the exact same switch connected to the same provider handling identical user loads. > Alexey, do you have any other similar reports wrt. the new MSS > advertisement scheme in 2.4.x? > > Jordan, you mentioned something about possibly being "bandwidth > limited"? Please, elaborate... There is a possibility that we are hitting an upper level bandwidth limit between us and our upstream provider due to a misconfiguration on the other end, but this should only happen during peak time (which it is not right now). It just bugs me that 2.2.16 doesn't appear to have this problem. Jordan
Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP
"David S. Miller" wrote: > >Date:Mon, 06 Nov 2000 18:17:19 -0800 >From: Jordan Mendelson <[EMAIL PROTECTED]> > >18:54:57.394894 eth0 > 64.124.41.177. > 209.179.248.69.1238: . >2429:2429(0) ack 506 win 6432 (DF) > > And this is it? The connection dies right here and says no > more? Surely, there was more said on this connection after > this point. > > Otherwise I see nothing obviously wrong in these dumps. I've provided two new dumps of the complete connection lifetime between 2.4.0 and 2.2.16. Both logs show the same client connecting to identical machines, receiving the same data and then disconnecting. 2.2.16 handles the entire process in under 5 seconds while 2.4.0 takes over 2 minutes. Also note that the 2.4.0 connection did not get shut down correctly and had to send an RST... though this is probably due to the client side closing down the connection while there was still data on it. Both machines were under approximately the same load. It looks to me like there is an artificial delay in 2.4.0 which is slowing down the traffic to unbearable levels. Jordan 22:00:39.625351 209.179.245.186.1092 > 64.124.41.179.: S 4155530:4155530(0) win 8192 (DF) 22:00:39.625437 64.124.41.179. > 209.179.245.186.1092: S 1301092473:1301092473(0) ack 4155531 win 5840 (DF) 22:00:39.887133 209.179.245.186.1092 > 64.124.41.179.: . ack 1 win 8576 (DF) 22:00:39.887969 209.179.245.186.1092 > 64.124.41.179.: . ack 1 win 65280 (DF) 22:00:39.888951 209.179.245.186.1092 > 64.124.41.179.: P 1:44(43) ack 1 win 65280 (DF) 22:00:39.888964 64.124.41.179. > 209.179.245.186.1092: . ack 44 win 5840 (DF) 22:00:39.991515 64.124.41.179. > 209.179.245.186.1092: P 1:21(20) ack 44 win 5840 (DF) 22:00:39.991660 64.124.41.179. > 209.179.245.186.1092: P 21:557(536) ack 44 win 5840 (DF) 22:00:42.991490 64.124.41.179. > 209.179.245.186.1092: P 1:21(20) ack 44 win 5840 (DF) 22:00:43.180946 209.179.245.186.1092 > 64.124.41.179.: P 44:56(12) ack 21 win 65260 (DF) 22:00:43.180997 64.124.41.179. 
> 209.179.245.186.1092: P 21:557(536) ack 44 win 5840 (DF) 22:00:43.181025 64.124.41.179. > 209.179.245.186.1092: P 557:1093(536) ack 56 win 5840 (DF) 22:00:45.685143 209.179.245.186.1092 > 64.124.41.179.: P 44:456(412) ack 21 win 65260 (DF) 22:00:45.685204 64.124.41.179. > 209.179.245.186.1092: . ack 456 win 6432 (DF) 22:00:49.171046 64.124.41.179. > 209.179.245.186.1092: P 21:557(536) ack 456 win 6432 (DF) 22:00:49.470193 209.179.245.186.1092 > 64.124.41.179.: . ack 557 win 65280 (DF) 22:00:49.470233 64.124.41.179. > 209.179.245.186.1092: P 557:1093(536) ack 456 win 6432 (DF) 22:00:49.470248 64.124.41.179. > 209.179.245.186.1092: P 1093:1629(536) ack 456 win 6432 (DF) 22:01:01.461056 64.124.41.179. > 209.179.245.186.1092: P 557:1093(536) ack 456 win 6432 (DF) 22:01:01.755362 209.179.245.186.1092 > 64.124.41.179.: . ack 1093 win 65280 (DF) 22:01:01.755428 64.124.41.179. > 209.179.245.186.1092: P 1093:1629(536) ack 456 win 6432 (DF) 22:01:01.755451 64.124.41.179. > 209.179.245.186.1092: P 1629:1825(196) ack 456 win 6432 (DF) 22:01:25.751048 64.124.41.179. > 209.179.245.186.1092: P 1093:1629(536) ack 456 win 6432 (DF) 22:01:26.171932 209.179.245.186.1092 > 64.124.41.179.: . ack 1629 win 65280 (DF) 22:01:26.171979 64.124.41.179. > 209.179.245.186.1092: P 1629:1825(196) ack 456 win 6432 (DF) 22:02:14.171052 64.124.41.179. > 209.179.245.186.1092: P 1629:1825(196) ack 456 win 6432 (DF) 22:02:14.499920 209.179.245.186.1092 > 64.124.41.179.: . ack 1825 win 65084 (DF) 22:02:14.499944 64.124.41.179. > 209.179.245.186.1092: P 1825:1847(22) ack 456 win 6432 (DF) 22:02:16.168708 209.179.245.186.1092 > 64.124.41.179.: F 456:456(0) ack 1825 win 65084 (DF) 22:02:16.181061 64.124.41.179. > 209.179.245.186.1092: . ack 457 win 6432 (DF) 22:02:16.281724 64.124.41.179. > 209.179.245.186.1092: F 1847:1847(0) ack 457 win 6432 (DF) 22:02:16.477943 209.179.245.186.1092 > 64.124.41.179.: . ack 1825 win 65084 (DF) 22:03:50.491063 64.124.41.179. 
> 209.179.245.186.1092: P 1825:1847(22) ack 457 win 6432 (DF) 22:03:50.680141 209.179.245.186.1092 > 64.124.41.179.: R 4155987:4155987(0) win 0 (DF) 22:00:01.684927 209.179.245.186.1091 > 64.124.41.136.: S 4033171:4033171(0) win 8192 (DF) 22:00:01.685021 64.124.41.136. > 209.179.245.186.1091: S 1261602556:1261602556(0) ack 4033172 win 32696 (DF) 22:00:01.916120 209.179.245.186.1091 > 64.124.41.136.: . ack 1 win 8576 (DF) 22:00:01.916191 209.179.245.186.1091 > 64.124.41.136.: . ack 1 win 65280 (DF) 22:00:01.916981 209.179.245.186.1091 > 64.124.41.136.: P 1:44(43) ack 1 win 65280 (DF) 22:00:01.917032 64.124.41.136. > 209.179.245.186.1091: .
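One way to read the 2.4.0 trace above: the server's retransmissions of the same segments are spaced roughly 3 s, 12 s, 24 s, 48 s, then 96 s apart — the doubling of a backed-off retransmission timer, which is exactly what repeated loss of either the data or the ACKs looks like, consistent with the packet-drop theory. A sketch of that doubling schedule (illustrative of RFC 2988-style backoff, not the kernel's actual RTO code):

```python
def rto_schedule(initial_rto: float, retries: int, rto_max: float = 120.0):
    """Doubling retransmission timeouts, capped at rto_max seconds.
    Illustrates the backoff shape, not the kernel's implementation."""
    rto, out = initial_rto, []
    for _ in range(retries):
        out.append(rto)
        rto = min(rto * 2, rto_max)
    return out

# The gaps visible in the trace fall on this doubling curve:
assert rto_schedule(3.0, 6) == [3.0, 6.0, 12.0, 24.0, 48.0, 96.0]
```

Since the 2.2.16 machine on the same switch sees no such backoff, whatever is dropping the packets appears to be specific to what 2.4.0 sends.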
Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP
Jordan Mendelson wrote: > > We are seeing a performance slowdown between Windows PPP users and > servers running 2.4.0-test10. Attached is a tcpdump log of the > connection. The machines is without TCP ECN support. The Windows machine > is running Windows 98 SE 4.10. A dialed up over PPP w/ TCP header > compression. The Linux machine is connected directly to the Internet via > a 6509. There is a possibility that we are hitting a bandwidth cap on > outgoing traffic. Just some updates. This problem does not appear to happen under 2.2.16. The dump for 2.2.16 is almost the same except we send an mss back of 536 and not 1460 (remote mtu vs local mtu). Here is the head of a tcpdump with the same client, but this time with a 2.2.16 machine instead of a 2.4.0-test10 machine: 19:26:23.593114 eth0 < 209.179.248.69.1260 > 64.124.41.136.: S 5061245:5061245(0) win 8192 (DF) 19:26:23.593237 eth0 > 64.124.41.136. > 209.179.248.69.1260: S 119520695:119520695(0) ack 5061246 win 32696 (DF) 19:26:23.824394 eth0 < 209.179.248.69.1260 > 64.124.41.136.: . 1:1(0) ack 1 win 65280 (DF) 19:26:23.824398 eth0 < 209.179.248.69.1260 > 64.124.41.136.: . 1:1(0) ack 1 win 8576 (DF) 19:26:23.825249 eth0 < 209.179.248.69.1260 > 64.124.41.136.: P 1:44(43) ack 1 win 65280 (DF) 19:26:23.825283 eth0 > 64.124.41.136. > 209.179.248.69.1260: . 1:1(0) ack 44 win 32696 (DF) 19:26:25.245845 eth0 > 64.124.41.136. > 209.179.248.69.1260: P 1:21(20) ack 44 win 32696 (DF) 19:26:25.245956 eth0 > 64.124.41.136. > 209.179.248.69.1260: P 21:342(321) ack 44 win 32696 (DF) 19:26:25.466759 eth0 < 209.179.248.69.1260 > 64.124.41.136.: . 44:44(0) ack 342 win 64939 (DF) 19:26:25.466792 eth0 > 64.124.41.136. > 209.179.248.69.1260: P 342:878(536) ack 44 win 32696 (DF) 19:26:25.466800 eth0 > 64.124.41.136. > 209.179.248.69.1260: P 878:1401(523) ack 44 win 32696 (DF) 19:26:25.467562 eth0 < 209.179.248.69.1260 > 64.124.41.136.: P 44:56(12) ack 342 win 64939 (DF) 19:26:25.480104 eth0 > 64.124.41.136. > 209.179.248.69.1260: . 
1401:1401(0) ack 56 win 32696 (DF) 19:26:25.763509 eth0 < 209.179.248.69.1260 > 64.124.41.136.: P 56:456(400) ack 878 win 65280 (DF) 19:26:25.766253 eth0 < 209.179.248.69.1260 > 64.124.41.136.: . 456:456(0) ack 1401 win 64757 (DF) 19:26:26.070115 eth0 > 64.124.41.136. > 209.179.248.69.1260: . 1401:1401(0) ack 456 win 32296 (DF) 19:26:26.431515 eth0 > 64.124.41.136. > 209.179.248.69.1260: P 1401:1413(12) ack 456 win 32696 (DF) 19:26:26.432141 eth0 > 64.124.41.136. > 209.179.248.69.1260: P 1413:1684(271) ack 456 win 32696 (DF) 19:26:26.657631 eth0 < 209.179.248.69.1260 > 64.124.41.136.: . 456:456(0) ack 1684 win 65280 (DF) 19:26:26.657663 eth0 > 64.124.41.136. > 209.179.248.69.1260: P 1684:1817(133) ack 456 win 32696 (DF) 19:26:26.952825 eth0 < 209.179.248.69.1260 > 64.124.41.136.: . 456:456(0) ack 1817 win 65147 (DF) 19:26:31.086138 eth0 < 209.179.248.69.1260 > 64.124.41.136.: P 456:506(50) ack 1817 win 65147 (DF) > 18:51:33.282286 eth0 < 209.179.248.69.1238 > 64.124.41.177.: S > 3013389:3013389(0) win 8192 (DF) > 18:51:33.282395 eth0 > 64.124.41.177. > 209.179.248.69.1238: S > 2198113890:2198113890(0) ack 3013390 win 5840 > (DF) > 18:51:33.509532 eth0 < 209.179.248.69.1238 > 64.124.41.177.: . > 1:1(0) ack 1 win 8576 (DF) > 18:51:33.510360 eth0 < 209.179.248.69.1238 > 64.124.41.177.: . > 1:1(0) ack 1 win 65280 (DF) > 18:51:33.510416 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P > 1:44(43) ack 1 win 65280 (DF) > 18:51:33.510457 eth0 > 64.124.41.177. > 209.179.248.69.1238: . > 1:1(0) ack 44 win 5840 (DF) > 18:51:33.988330 eth0 > 64.124.41.177. > 209.179.248.69.1238: P > 1:21(20) ack 44 win 5840 (DF) > 18:51:33.988474 eth0 > 64.124.41.177. > 209.179.248.69.1238: P > 21:557(536) ack 44 win 5840 (DF) > 18:51:36.987336 eth0 > 64.124.41.177. > 209.179.248.69.1238: P > 1:21(20) ack 44 win 5840 (DF) > 18:51:37.12 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P > 44:56(12) ack 21 win 65260 (DF) > 18:51:37.177794 eth0 > 64.124.41.177. 
> 209.179.248.69.1238: P > 21:557(536) ack 44 win 5840 (DF) > 18:51:37.177806 eth0 > 64.124.41.177. > 209.179.248.69.1238: P > 557:1093(536) ack 56 win 5840 (DF) > 18:51:39.845046 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P > 44:456(412) ack 21 win 65260 (DF) > 18:51:39.845071 eth0 > 64.124.41.177. > 209.179.248.69.1238: . > 1093:1093(0) ack 456 win 6432 (DF) > 18:51:43.177329 eth0 > 64.124.41.177. > 209.179.248.69.1238: P > 21:557(536) ack 456 win 6432 (DF) > 18:51:43.538219 eth0 < 209.179.248.69.1238 > 64.124.41.177.888
Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP
We are seeing a performance slowdown between Windows PPP users and servers running 2.4.0-test10. Attached is a tcpdump log of the connection. The machine is without TCP ECN support. The Windows machine is running Windows 98 SE 4.10, dialed up over PPP with TCP header compression. The Linux machine is connected directly to the Internet via a 6509. There is a possibility that we are hitting a bandwidth cap on outgoing traffic. 18:51:33.282286 eth0 < 209.179.248.69.1238 > 64.124.41.177.: S 3013389:3013389(0) win 8192 (DF) 18:51:33.282395 eth0 > 64.124.41.177. > 209.179.248.69.1238: S 2198113890:2198113890(0) ack 3013390 win 5840 (DF) 18:51:33.509532 eth0 < 209.179.248.69.1238 > 64.124.41.177.: . 1:1(0) ack 1 win 8576 (DF) 18:51:33.510360 eth0 < 209.179.248.69.1238 > 64.124.41.177.: . 1:1(0) ack 1 win 65280 (DF) 18:51:33.510416 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P 1:44(43) ack 1 win 65280 (DF) 18:51:33.510457 eth0 > 64.124.41.177. > 209.179.248.69.1238: . 1:1(0) ack 44 win 5840 (DF) 18:51:33.988330 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 1:21(20) ack 44 win 5840 (DF) 18:51:33.988474 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 21:557(536) ack 44 win 5840 (DF) 18:51:36.987336 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 1:21(20) ack 44 win 5840 (DF) 18:51:37.12 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P 44:56(12) ack 21 win 65260 (DF) 18:51:37.177794 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 21:557(536) ack 44 win 5840 (DF) 18:51:37.177806 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 557:1093(536) ack 56 win 5840 (DF) 18:51:39.845046 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P 44:456(412) ack 21 win 65260 (DF) 18:51:39.845071 eth0 > 64.124.41.177. > 209.179.248.69.1238: . 1093:1093(0) ack 456 win 6432 (DF) 18:51:43.177329 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 21:557(536) ack 456 win 6432 (DF) 18:51:43.538219 eth0 < 209.179.248.69.1238 > 64.124.41.177.: . 456:456(0) ack 557 win 65280 (DF) 18:51:43.538275 eth0 > 64.124.41.177. 
> 209.179.248.69.1238: P 557:1093(536) ack 456 win 6432 (DF) 18:51:43.538292 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 1093:1629(536) ack 456 win 6432 (DF) 18:51:55.537346 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 557:1093(536) ack 456 win 6432 (DF) 18:51:55.841360 eth0 < 209.179.248.69.1238 > 64.124.41.177.: . 456:456(0) ack 1093 win 65280 (DF) 18:51:55.841384 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 1093:1629(536) ack 456 win 6432 (DF) 18:51:55.841393 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 1629:1849(220) ack 456 win 6432 (DF) 18:52:19.837335 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 1093:1629(536) ack 456 win 6432 (DF) 18:52:20.153776 eth0 < 209.179.248.69.1238 > 64.124.41.177.: . 456:456(0) ack 1629 win 65280 (DF) 18:52:20.153803 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 1629:1849(220) ack 456 win 6432 (DF) 18:53:08.147334 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 1629:1849(220) ack 456 win 6432 (DF) 18:53:08.475911 eth0 < 209.179.248.69.1238 > 64.124.41.177.: . 456:456(0) ack 1849 win 65060 (DF) 18:53:08.475947 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 1849:1871(22) ack 456 win 6432 (DF) 18:54:44.467332 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 1849:1871(22) ack 456 win 6432 (DF) 18:54:44.824187 eth0 < 209.179.248.69.1238 > 64.124.41.177.: . 456:456(0) ack 1871 win 65038 (DF) 18:54:44.824256 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 1871:1893(22) ack 456 win 6432 (DF) 18:54:55.212750 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P 456:506(50) ack 1871 win 65038 (DF) 18:54:55.212767 eth0 > 64.124.41.177. > 209.179.248.69.1238: . 1893:1893(0) ack 506 win 6432 (DF) 18:54:55.571337 eth0 > 64.124.41.177. > 209.179.248.69.1238: P 1893:2429(536) ack 506 win 6432 (DF) 18:54:57.394879 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P 456:506(50) ack 1871 win 65038 (DF) 18:54:57.394894 eth0 > 64.124.41.177. > 209.179.248.69.1238: . 
2429:2429(0) ack 506 win 6432 (DF)

Here are some numbers from /proc/sys/net/ipv4:

$ cat /proc/sys/net/ipv4/tcp_rmem
4096 87380 174760
$ cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 131072
$ cat /proc/sys/net/ipv4/tcp_sack
1
$ cat /proc/sys/net/ipv4/tcp_fack
1
$ cat /proc/sys/net/ipv4/tcp_dsack
1
$ cat /proc/sys/net/ipv4/tcp_window_scaling
1
$ cat /proc/sys/net/ipv4/tcp_syncookies
0
$ cat /proc/sys/net/ipv4/tcp_timestamps
1

Jordan
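For context on the sysctls above: tcp_rmem and tcp_wmem each hold three space-separated byte counts — the minimum, default, and maximum socket buffer size — and the 2.4-era defaults were 4096 87380 174760 and 4096 16384 131072 respectively (per the kernel's ip-sysctl documentation). A trivial parsing sketch, with hypothetical helper names:

```python
# tcp_rmem / tcp_wmem are "min default max" buffer sizes in bytes.
def parse_mem_triple(text: str) -> dict:
    """Split a three-value sysctl line into named fields."""
    lo, default, hi = (int(v) for v in text.split())
    return {"min": lo, "default": default, "max": hi}

# The 2.4-era defaults quoted in this message:
assert parse_mem_triple("4096 87380 174760") == {
    "min": 4096, "default": 87380, "max": 174760}
assert parse_mem_triple("4096 16384 131072")["max"] == 131072
```

These are stock values, which supports the point that the slowdown is not a local tuning problem.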
Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP
Jordan Mendelson wrote: We are seeing a performance slowdown between Windows PPP users and servers running 2.4.0-test10. Attached is a tcpdump log of the connection. The machines is without TCP ECN support. The Windows machine is running Windows 98 SE 4.10. A dialed up over PPP w/ TCP header compression. The Linux machine is connected directly to the Internet via a 6509. There is a possibility that we are hitting a bandwidth cap on outgoing traffic. Just some updates. This problem does not appear to happen under 2.2.16. The dump for 2.2.16 is almost the same except we send an mss back of 536 and not 1460 (remote mtu vs local mtu). Here is the head of a tcpdump with the same client, but this time with a 2.2.16 machine instead of a 2.4.0-test10 machine: 19:26:23.593114 eth0 209.179.248.69.1260 64.124.41.136.: S 5061245:5061245(0) win 8192 mss 536,nop,nop,sackOK (DF) 19:26:23.593237 eth0 64.124.41.136. 209.179.248.69.1260: S 119520695:119520695(0) ack 5061246 win 32696 mss 536,nop,nop,sackOK (DF) 19:26:23.824394 eth0 209.179.248.69.1260 64.124.41.136.: . 1:1(0) ack 1 win 65280 (DF) 19:26:23.824398 eth0 209.179.248.69.1260 64.124.41.136.: . 1:1(0) ack 1 win 8576 (DF) 19:26:23.825249 eth0 209.179.248.69.1260 64.124.41.136.: P 1:44(43) ack 1 win 65280 (DF) 19:26:23.825283 eth0 64.124.41.136. 209.179.248.69.1260: . 1:1(0) ack 44 win 32696 (DF) 19:26:25.245845 eth0 64.124.41.136. 209.179.248.69.1260: P 1:21(20) ack 44 win 32696 (DF) 19:26:25.245956 eth0 64.124.41.136. 209.179.248.69.1260: P 21:342(321) ack 44 win 32696 (DF) 19:26:25.466759 eth0 209.179.248.69.1260 64.124.41.136.: . 44:44(0) ack 342 win 64939 (DF) 19:26:25.466792 eth0 64.124.41.136. 209.179.248.69.1260: P 342:878(536) ack 44 win 32696 (DF) 19:26:25.466800 eth0 64.124.41.136. 209.179.248.69.1260: P 878:1401(523) ack 44 win 32696 (DF) 19:26:25.467562 eth0 209.179.248.69.1260 64.124.41.136.: P 44:56(12) ack 342 win 64939 (DF) 19:26:25.480104 eth0 64.124.41.136. 209.179.248.69.1260: . 
1401:1401(0) ack 56 win 32696 (DF) 19:26:25.763509 eth0 209.179.248.69.1260 64.124.41.136.: P 56:456(400) ack 878 win 65280 (DF) 19:26:25.766253 eth0 209.179.248.69.1260 64.124.41.136.: . 456:456(0) ack 1401 win 64757 (DF) 19:26:26.070115 eth0 64.124.41.136. 209.179.248.69.1260: . 1401:1401(0) ack 456 win 32296 (DF) 19:26:26.431515 eth0 64.124.41.136. 209.179.248.69.1260: P 1401:1413(12) ack 456 win 32696 (DF) 19:26:26.432141 eth0 64.124.41.136. 209.179.248.69.1260: P 1413:1684(271) ack 456 win 32696 (DF) 19:26:26.657631 eth0 209.179.248.69.1260 64.124.41.136.: . 456:456(0) ack 1684 win 65280 (DF) 19:26:26.657663 eth0 64.124.41.136. 209.179.248.69.1260: P 1684:1817(133) ack 456 win 32696 (DF) 19:26:26.952825 eth0 209.179.248.69.1260 64.124.41.136.: . 456:456(0) ack 1817 win 65147 (DF) 19:26:31.086138 eth0 209.179.248.69.1260 64.124.41.136.: P 456:506(50) ack 1817 win 65147 (DF) 18:51:33.282286 eth0 209.179.248.69.1238 64.124.41.177.: S 3013389:3013389(0) win 8192 mss 536,nop,nop,sackOK (DF) 18:51:33.282395 eth0 64.124.41.177. 209.179.248.69.1238: S 2198113890:2198113890(0) ack 3013390 win 5840 mss 1460,nop,nop,sackOK (DF) 18:51:33.509532 eth0 209.179.248.69.1238 64.124.41.177.: . 1:1(0) ack 1 win 8576 (DF) 18:51:33.510360 eth0 209.179.248.69.1238 64.124.41.177.: . 1:1(0) ack 1 win 65280 (DF) 18:51:33.510416 eth0 209.179.248.69.1238 64.124.41.177.: P 1:44(43) ack 1 win 65280 (DF) 18:51:33.510457 eth0 64.124.41.177. 209.179.248.69.1238: . 1:1(0) ack 44 win 5840 (DF) 18:51:33.988330 eth0 64.124.41.177. 209.179.248.69.1238: P 1:21(20) ack 44 win 5840 (DF) 18:51:33.988474 eth0 64.124.41.177. 209.179.248.69.1238: P 21:557(536) ack 44 win 5840 (DF) 18:51:36.987336 eth0 64.124.41.177. 209.179.248.69.1238: P 1:21(20) ack 44 win 5840 (DF) 18:51:37.12 eth0 209.179.248.69.1238 64.124.41.177.: P 44:56(12) ack 21 win 65260 (DF) 18:51:37.177794 eth0 64.124.41.177. 209.179.248.69.1238: P 21:557(536) ack 44 win 5840 (DF) 18:51:37.177806 eth0 64.124.41.177. 
209.179.248.69.1238: P 557:1093(536) ack 56 win 5840 (DF) 18:51:39.845046 eth0 209.179.248.69.1238 64.124.41.177.: P 44:456(412) ack 21 win 65260 (DF) 18:51:39.845071 eth0 64.124.41.177. 209.179.248.69.1238: . 1093:1093(0) ack 456 win 6432 nop,nop, sack 1 {44:56} (DF) 18:51:43.177329 eth0 64.124.41.177. 209.179.248.69.1238: P 21:557(536) ack 456 win 6432 (DF) 18:51:43.538219 eth0 209.179.248.69.1238 64.124.41.177.: . 456:456(0) ack 557 win 65280 (DF) 18:51:43.538275 eth0 64.124.41.177. 209.179.248.69.1238: P 557:1093(536) ack 456 win 6432 (DF) 18:51:43.538292 eth0 64.124.41.177. 209.179.248.69.1238: P 1093:1629(536) ack 456 win 6432 (DF) 18:51:55.537346 eth0 64.124.41.177. 209.179.248.69.1238: P 557:1093(536) ack 456 win 6432 (DF
Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP
"David S. Miller" wrote: Date:Mon, 06 Nov 2000 18:17:19 -0800 From: Jordan Mendelson [EMAIL PROTECTED] 18:54:57.394894 eth0 64.124.41.177. 209.179.248.69.1238: . 2429:2429(0) ack 506 win 6432 nop,nop, sack 1 {456:506} (DF) And this is it? The connection dies right here and says no more? Surely, there was more said on this connection after this point. Otherwise I see nothing obviously wrong in these dumps. I've provided two new dumps of the complete connection lifetime between 2.4.0 and 2.2.16. Both logs show the same client connecting to identical machines, receiving the same data and then disconnecting. 2.2.16 handles the entire process in under 5 seconds while 2.4.0 takes over 2 minutes. Also note that the 2.4.0 connection did not get shut down correctly and had to send an RST... though this is probably due to the client side closing down the connection while there was still data on it. Both machines were under approximately the same load. It looks to me like there is an artificial delay in 2.4.0 which is slowing down the traffic to unbearable levels. Jordan 22:00:39.625351 209.179.245.186.1092 64.124.41.179.: S 4155530:4155530(0) win 8192 mss 536,nop,nop,sackOK (DF) 22:00:39.625437 64.124.41.179. 209.179.245.186.1092: S 1301092473:1301092473(0) ack 4155531 win 5840 mss 1460,nop,nop,sackOK (DF) 22:00:39.887133 209.179.245.186.1092 64.124.41.179.: . ack 1 win 8576 (DF) 22:00:39.887969 209.179.245.186.1092 64.124.41.179.: . ack 1 win 65280 (DF) 22:00:39.888951 209.179.245.186.1092 64.124.41.179.: P 1:44(43) ack 1 win 65280 (DF) 22:00:39.888964 64.124.41.179. 209.179.245.186.1092: . ack 44 win 5840 (DF) 22:00:39.991515 64.124.41.179. 209.179.245.186.1092: P 1:21(20) ack 44 win 5840 (DF) 22:00:39.991660 64.124.41.179. 209.179.245.186.1092: P 21:557(536) ack 44 win 5840 (DF) 22:00:42.991490 64.124.41.179. 
209.179.245.186.1092: P 1:21(20) ack 44 win 5840 (DF) 22:00:43.180946 209.179.245.186.1092 64.124.41.179.: P 44:56(12) ack 21 win 65260 (DF) 22:00:43.180997 64.124.41.179. 209.179.245.186.1092: P 21:557(536) ack 44 win 5840 (DF) 22:00:43.181025 64.124.41.179. 209.179.245.186.1092: P 557:1093(536) ack 56 win 5840 (DF) 22:00:45.685143 209.179.245.186.1092 64.124.41.179.: P 44:456(412) ack 21 win 65260 (DF) 22:00:45.685204 64.124.41.179. 209.179.245.186.1092: . ack 456 win 6432 nop,nop, sack 1 {44:56} (DF) 22:00:49.171046 64.124.41.179. 209.179.245.186.1092: P 21:557(536) ack 456 win 6432 (DF) 22:00:49.470193 209.179.245.186.1092 64.124.41.179.: . ack 557 win 65280 (DF) 22:00:49.470233 64.124.41.179. 209.179.245.186.1092: P 557:1093(536) ack 456 win 6432 (DF) 22:00:49.470248 64.124.41.179. 209.179.245.186.1092: P 1093:1629(536) ack 456 win 6432 (DF) 22:01:01.461056 64.124.41.179. 209.179.245.186.1092: P 557:1093(536) ack 456 win 6432 (DF) 22:01:01.755362 209.179.245.186.1092 64.124.41.179.: . ack 1093 win 65280 (DF) 22:01:01.755428 64.124.41.179. 209.179.245.186.1092: P 1093:1629(536) ack 456 win 6432 (DF) 22:01:01.755451 64.124.41.179. 209.179.245.186.1092: P 1629:1825(196) ack 456 win 6432 (DF) 22:01:25.751048 64.124.41.179. 209.179.245.186.1092: P 1093:1629(536) ack 456 win 6432 (DF) 22:01:26.171932 209.179.245.186.1092 64.124.41.179.: . ack 1629 win 65280 (DF) 22:01:26.171979 64.124.41.179. 209.179.245.186.1092: P 1629:1825(196) ack 456 win 6432 (DF) 22:02:14.171052 64.124.41.179. 209.179.245.186.1092: P 1629:1825(196) ack 456 win 6432 (DF) 22:02:14.499920 209.179.245.186.1092 64.124.41.179.: . ack 1825 win 65084 (DF) 22:02:14.499944 64.124.41.179. 209.179.245.186.1092: P 1825:1847(22) ack 456 win 6432 (DF) 22:02:16.168708 209.179.245.186.1092 64.124.41.179.: F 456:456(0) ack 1825 win 65084 (DF) 22:02:16.181061 64.124.41.179. 209.179.245.186.1092: . ack 457 win 6432 (DF) 22:02:16.281724 64.124.41.179. 
209.179.245.186.1092: F 1847:1847(0) ack 457 win 6432 (DF) 22:02:16.477943 209.179.245.186.1092 64.124.41.179.: . ack 1825 win 65084 nop,nop, sack 1 {1847:1848} (DF) 22:03:50.491063 64.124.41.179. 209.179.245.186.1092: P 1825:1847(22) ack 457 win 6432 (DF) 22:03:50.680141 209.179.245.186.1092 64.124.41.179.: R 4155987:4155987(0) win 0 (DF) 22:00:01.684927 209.179.245.186.1091 64.124.41.136.: S 4033171:4033171(0) win 8192 mss 536,nop,nop,sackOK (DF) 22:00:01.685021 64.124.41.136. 209.179.245.186.1091: S 1261602556:1261602556(0) ack 4033172 win 32696 mss 536,nop,nop,sackOK (DF) 22:00:01.916120 209.179.245.186.1091 64.124.41.136.: . ack 1 win 8576 (DF) 22:00:01.916191 209.179.245.186.1091 64.124.41.136.: . ack 1 win 65280 (DF) 22:00:01.916981 209.179.245.186.1091 64.124.41.136.: P 1:44(43) ack 1 win 65280 (DF) 22:00:01.917032 64.124.41.136. 209.179.245.186.1091: . ack 44 win 32696 (DF) 22:00:02.121143 64.124.4
Re: Poor TCP Performance 2.4.0-10 - Win98 SE PPP
"David S. Miller" wrote: Date: Mon, 06 Nov 2000 22:13:23 -0800 From: Jordan Mendelson [EMAIL PROTECTED] There is a possibility that we are hitting an upper level bandwidth limit between us an our upstream provider due to a misconfiguration on the other end, but this should only happen during peak time (which it is not right now). It just bugs me that 2.2.16 doesn't appear to have this problem. The only thing I can do now is beg for a tcpdump from the windows95 machine side. Do you have the facilities necessary to obtain this? This would prove that it is packet drop between the two systems, for whatever reason, that is causing this. Attached to this message are dumps from the windows 98 machine using windump and the linux 2.4.0-test10. Sorry the time stamps don't match up. Jordan 23:36:15.252817 209.179.194.175.1084 64.124.41.179.: S 370996:370996(0) win 8192 mss 536,nop,nop,sackOK (DF) 23:36:15.252891 64.124.41.179. 209.179.194.175.1084: S 3050526223:3050526223(0) ack 370997 win 5840 mss 1460,nop,nop,sackOK (DF) 23:36:16.159685 209.179.194.175.1084 64.124.41.179.: . ack 1 win 8576 (DF) 23:36:16.160461 209.179.194.175.1084 64.124.41.179.: . ack 1 win 65280 (DF) 23:36:16.160488 209.179.194.175.1084 64.124.41.179.: P 1:44(43) ack 1 win 65280 (DF) 23:36:16.160506 64.124.41.179. 209.179.194.175.1084: . ack 44 win 5840 (DF) 23:36:16.261533 64.124.41.179. 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 (DF) 23:36:16.261669 64.124.41.179. 209.179.194.175.1084: P 21:557(536) ack 44 win 5840 (DF) 23:36:19.261055 64.124.41.179. 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 (DF) 23:36:19.450762 209.179.194.175.1084 64.124.41.179.: P 44:56(12) ack 21 win 65260 (DF) 23:36:19.450788 64.124.41.179. 209.179.194.175.1084: P 21:557(536) ack 44 win 5840 (DF) 23:36:19.450820 64.124.41.179. 209.179.194.175.1084: P 557:1093(536) ack 56 win 5840 (DF) 23:36:22.281248 209.179.194.175.1084 64.124.41.179.: P 44:456(412) ack 21 win 65260 (DF) 23:36:22.281308 64.124.41.179. 
209.179.194.175.1084: . ack 456 win 6432 nop,nop, sack 1 {44:56} (DF) 23:36:25.441061 64.124.41.179. 209.179.194.175.1084: P 21:557(536) ack 456 win 6432 (DF) 23:36:25.701796 209.179.194.175.1084 64.124.41.179.: . ack 557 win 65280 (DF) 23:36:25.701841 64.124.41.179. 209.179.194.175.1084: P 557:1093(536) ack 456 win 6432 (DF) 23:36:25.701859 64.124.41.179. 209.179.194.175.1084: P 1093:1629(536) ack 456 win 6432 (DF) 23:36:37.701091 64.124.41.179. 209.179.194.175.1084: P 557:1093(536) ack 456 win 6432 (DF) 23:36:38.026766 209.179.194.175.1084 64.124.41.179.: . ack 1093 win 65280 (DF) 23:36:38.026826 64.124.41.179. 209.179.194.175.1084: P 1093:1629(536) ack 456 win 6432 (DF) 23:36:38.026839 64.124.41.179. 209.179.194.175.1084: P 1629:1847(218) ack 456 win 6432 (DF) 23:37:02.021068 64.124.41.179. 209.179.194.175.1084: P 1093:1629(536) ack 456 win 6432 (DF) 23:37:02.328163 209.179.194.175.1084 64.124.41.179.: . ack 1629 win 65280 (DF) 23:37:02.328189 64.124.41.179. 209.179.194.175.1084: P 1629:1847(218) ack 456 win 6432 (DF) 23:37:50.321057 64.124.41.179. 209.179.194.175.1084: P 1629:1847(218) ack 456 win 6432 (DF) 23:37:50.673000 209.179.194.175.1084 64.124.41.179.: . ack 1847 win 65062 (DF) 23:37:50.673068 64.124.41.179. 209.179.194.175.1084: P 1847:1868(21) ack 456 win 6432 (DF) 23:38:00.162380 209.179.194.175.1084 64.124.41.179.: F 456:456(0) ack 1847 win 65062 (DF) 23:38:00.181055 64.124.41.179. 209.179.194.175.1084: . ack 457 win 6432 (DF) 23:38:00.187291 64.124.41.179. 209.179.194.175.1084: F 1868:1868(0) ack 457 win 6432 (DF) 23:38:00.363357 209.179.194.175.1084 64.124.41.179.: . ack 1847 win 65062 nop,nop, sack 1 {1868:1869} (DF) 23:39:26.671050 64.124.41.179. 
209.179.194.175.1084: P 1847:1868(21) ack 457 win 6432 (DF) 23:39:26.886417 209.179.194.175.1084 64.124.41.179.: R 371453:371453(0) win 0 (DF) 22:34:34.884487 arp who-has 64.124.41.179 tell 209.179.194.175 22:34:34.889477 209.179.194.175.1084 64.124.41.179.: S 370996:370996(0) win 8192 mss 536,nop,nop,sackOK (DF) 22:34:35.669892 64.124.41.179. 209.179.194.175.1084: S 3050526223:3050526223(0) ack 370997 win 5840 mss 1460,nop,nop,sackOK (DF) 22:34:35.670624 209.179.194.175.1084 64.124.41.179.: . ack 1 win 8576 (DF) 22:34:35.670653 209.179.194.175.1084 64.124.41.179.: . ack 1 win 65280 (DF) 22:34:35.674484 209.179.194.175.1084 64.124.41.179.: P 1:44(43) ack 1 win 65280 (DF) 22:34:36.049808 64.124.41.179. 209.179.194.175.1084: . ack 44 win 5840 (DF) 22:34:36.069773 64.124.41.179. 209.179.194.175.1084: P 1:19(18) ack 44 win 5840 (DF) 22:34:36.069837 64.124.41.179. 209.179.194.175.1084: P 19:553(534) ack 44 win 5840 (DF) 22:34:39.049788 64.124.41.179. 209.179.194.175.1084: P 1:21(20) ack 44 win 5840
Re: Poor TCP Performance 2.4.0-10 - Win98 SE PPP
"David S. Miller" wrote: Date: Mon, 06 Nov 2000 22:44:00 -0800 From: Jordan Mendelson [EMAIL PROTECTED] Attached to this message are dumps from the windows 98 machine using windump and the linux 2.4.0-test10. Sorry the time stamps don't match up. (ie. Linux sends bytes 1:21 both the first time, and when it retransmits that data. However win98 "sees" this as 1:19 the first time and 1:21 during the retransmit by Linux) That is bogus. Something is mangling the packets between the Linux machine and the win98 machine. You mentioned something about bandwidth limiting at your upstream provider, any chance you can have them turn this bandwidth limiting device off? It actually turns out that that problem with bandwidth was fixed yesterday, so this can not be the problem here and yes, 64.124.41.179 is a linux box. :) Or maybe earthlink is using some packet mangling device? It is clear though, that something is messing with or corrupting the packets. One thing you might try is turning off TCP header compression for the PPP link, does this make a difference? Actually, there has been several reports that turning header compression does help. Jordan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Poor TCP Performance 2.4.0-10 - Win98 SE PPP
"David S. Miller" wrote: Date: Mon, 06 Nov 2000 23:16:21 -0800 From: Jordan Mendelson [EMAIL PROTECTED] "David S. Miller" wrote: It is clear though, that something is messing with or corrupting the packets. One thing you might try is turning off TCP header compression for the PPP link, does this make a difference? Actually, there has been several reports that turning header compression does help. If this is what is causing the TCP sequence numbers to change then either Win98's or Earthlink terminal server's implementation of TCP header compression is buggy. Assuming this is true, it explains why Win98's TCP does not "see" the data sent by Linux, because such a bug would make the TCP checksum of these packets incorrect and thus dropped by Win98's TCP. Ok, but why doesn't 2.2.16 exhibit this behavior? We've had reports from quite a number of people complaining about this and I'm fairly certain not all of them are from Earthlink. Jordan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
0-order allocation failed / Fragmentation Bug? (2.4.0-test10)
I've been receiving these error messages during times of near complete memory depletion. This particular machine runs a bare minimum of processes plus our own application, a long-running (1 day, 5:39) threaded process which consumes most of the resources on the machine. Oddly enough, the mallinfo() for this process shows a discrepancy of 650 megs with ps and top. This process handles a large number of TCP connections and does a lot of dynamic memory allocation, so I assumed the difference was due to memory fragmentation on our part; however, I thought that kswapd would reclaim memory once it started swapping it out. Another oddity is that the bogomips reported by each CPU are somewhat different from each other. The message being received is:

Nov 3 10:04:30 n175 kernel: __alloc_pages: 0-order allocation failed.
Nov 3 10:04:30 n175 last message repeated 363 times

The kernel version:

Linux version 2.4.0-test10 (root@stp) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 SMP Tue Oct 31 13:13:05 PST 2000

What free reports:

             total       used       free     shared    buffers     cached
Mem:       1028256    1024172       4084          0        148      59296
-/+ buffers/cache:     964728      63528
Swap:       136512      75588      60924

ps and top report that the process taking up all this memory has an RSS of 967656 KB and a VSIZE of 1005892, but mallinfo() on the process shows a completely different number:

Memory statistics from mallinfo:
  Total space allocated from system: 361406208
  Number of non-inuse chunks: 1079273
  Number of mmapped regions: 0
  Total space in mmapped regions: 0
  Total allocated space: 235536032
  Total non-inuse space: 125870176
  Top-most, releasable (via malloc_trim) space: 68776

All memory in this process is allocated via new or malloc (new calls malloc though) and the numbers mallinfo() and ps report are 99.5% accurate up until a sort of "slide" period where they diverge fairly quickly.
$ cat /proc/slabinfo slabinfo - version: 1.1 (SMP) kmem_cache68 68232441 : 252 126 nfs_read_data 0 0352001 : 124 62 nfs_write_data 0 0384001 : 124 62 nfs_page 0 0 96001 : 252 126 nfs_fh80 80 96221 : 252 126 tcp_tw_bucket 39 80 96221 : 252 126 tcp_bind_bucket 33452 32441 : 252 126 tcp_open_request 403413 64771 : 252 126 inet_peer_cache1 59 64111 : 252 126 ip_fib_hash 11113 32111 : 252 126 ip_dst_cache 15813 23856160 994 9941 : 252 126 arp_cache 46 90128331 : 252 126 blkdev_requests 768800 96 20 201 : 252 126 dnotify cache 0 0 20001 : 252 126 file lock cache0 0 92001 : 252 126 fasync cache 0 0 16001 : 252 126 uid_cache 3226 32221 : 252 126 skbuff_head_cache 2840 12984160 541 5411 : 252 126 sock7757 10310800 2062 20621 : 124 62 inode_cache 7616 11000384 1100 11001 : 124 62 bdev_cache 7118 64221 : 252 126 sigqueue 58 58132221 : 252 126 kiobuf 0 0128001 : 252 126 dentry_cache7730 12420128 414 4141 : 252 126 filp 11758 11800 96 295 2951 : 252 126 names_cache2 2 4096221 : 60 30 buffer_head15200 25520 96 637 6381 : 252 126 mm_struct 72 72160331 : 252 126 vm_area_struct 1048 1062 64 18 181 : 252 126 fs_cache 118118 64221 : 252 126 files_cache 27 27416331 : 124 62 signal_act24 24 1312881 : 60 30 size-131072(DMA) 0 0 13107200 32 :00 size-1310720 0 13107200 32 :00 size-65536(DMA)0 0 6553600 16 :00 size-65536 0 0 6553600 16 :00 size-32768(DMA)0 0 32768008 :00 size-32768 0 0 32768008 :00 size-16384(DMA)0 0 16384004 :00 size-16384 0 0 16384004 :00 size-8192(DMA) 0 0 8192002 :00 size-8192 3 3 8192332 :00 size-4096(DMA) 0 0 4096001 : 60 30 size-4096 12 12 4096 12 121 : 60 30 size-2048(DMA) 0 0 2048001 :
Re: Linux's implementation of poll() not scalable?
Linus Torvalds wrote: > > On Tue, 24 Oct 2000, Andi Kleen wrote: > > > > I don't see the problem. You have the poll table allocated in the kernel, > > the drivers directly change it and the user mmaps it (I was not proposing > > to let poll make a kiobuf out of the passed array) > The problem with poll() as-is is that the user doesn't really tell the > kernel explicitly when it is changing the table.. What you describe is exactly what the /dev/poll interface patch from the Linux scalability project does. It creates a special device which you can open up and write add/remove/modify entries for the descriptors you wish to be notified about, using the standard struct pollfd. Removing entries is done by setting the events field in a struct written to the device to POLLREMOVE. You can optionally mmap() memory which the notifications are written to. Two ioctl() calls are provided for the initial allocation and also to force it to check all items in your poll() list. Solaris has this same interface minus the mmap()'ed memory. Jordan
Re: Linux's implementation of poll() not scalable?
Dan Kegel wrote: > > Jordan Mendelson ([EMAIL PROTECTED]) wrote: > > An implementation of /dev/poll for Linux already exists and has been shown to > > be more scalable than using RT signals under my tests. A patch for 2.2.x > > and 2.4.x should be available at the Linux Scalability Project @ > > http://www.citi.umich.edu/projects/linux-scalability/ in the patches > > section. > > If you'll look at the page I linked to in my original post, > http://www.kegel.com/dkftpbench/Poller_bench.html > you'll see that I also benchmarked /dev/poll. The Linux /dev/poll implementation has a few "non-standard" features, such as the ability to mmap() the poll structure memory to eliminate a memory copy:

int dpoll_fd;
struct pollfd *dpoll;

dpoll_fd = open("/dev/poll", O_RDWR, 0);
ioctl(dpoll_fd, DP_ALLOC, 1);
dpoll = (struct pollfd *) mmap(0, DP_MMAP_SIZE(1),
                               PROT_WRITE|PROT_READ, MAP_SHARED, dpoll_fd, 0);

Use this memory when reading, use write() to add/remove entries, and see if you get any boost in performance. Also, I believe there is a hash table associated with /dev/poll in the kernel patch which might slow down your performance tests while it's first growing to resize itself. Jordan
Re: Linux's implementation of poll() not scalable?
Dan Kegel wrote: > > Linus Torvalds wrote: > > Dan Kegel wrote: > >> [ http://www.kegel.com/dkftpbench/Poller_bench.html ] > >> [ With only one active fd and N idle ones, poll's execution time scales > >> [ as 6N on Solaris, but as 300N on Linux. ] > > > > Basically, poll() is _fundamentally_ an O(n) interface. There is no way > > to avoid it - you have an array, and there simply is _no_ known > > algorithm to scan an array in faster than O(n) time. Sorry. > > ... > > Under Linux, I'm personally more worried about the performance of X etc, > > and small poll()'s are actually common. So I would argue that the > > Solaris scalability is going the wrong way. But as performance really > > depends on the load, and maybe that 1 entry load is what you > > consider "real life", you are of course free to disagree (and you'd be > > equally right ;) > The way I'm implementing RT signal support is by writing a userspace > wrapper to make it look like an OO version of poll(), more or less, > with an 'add(int fd)' method so the wrapper manages the arrays of pollfd's. > When and if I get that working, I may move it into the kernel as an > implementation of /dev/poll -- and then I won't need to worry about > the RT signal queue overflowing anymore, and I won't care how scalable > poll() is. An implementation of /dev/poll for Linux already exists and has been shown to be more scalable than using RT signals under my tests. A patch for 2.2.x and 2.4.x should be available at the Linux Scalability Project @ http://www.citi.umich.edu/projects/linux-scalability/ in the patches section. It works fairly well, but I was actually somewhat disappointed to find that it wasn't the primary cause for the system CPU suckage for my particular system. Granted, when you only have to poll a few times per second, the overhead of standard poll() just isn't that bad.
Jordan
Re: TCP: peer x.x.x.x:y/z shrinks window a:b:c...
[EMAIL PROTECTED] wrote: > > Hello! > > > I'll keep looking. > > Is it easy to reproduce? If so, try to make a tcpdump which > covers one of these messages. It's extremely rare. We keep persistent connections open for long periods of time, and even when a user who triggered it is online, it only triggers the message a maximum of 26 times (typically ~4); given the traffic volume we handle, it is not practical for me to log all traffic. Of the IPs which triggered the response and were online at the time, every single one of them has either not had any ports open, been firewalled, or had nmap unable to guess correctly, with the single exception of a machine which nmap said was "Windows NT4 / Win95 / Win98, Windows NT 4 SP3, Windows NT 4.0 Server SP5 + 2047 Hotfixes." that had port 1500/tcp (vlsi-lm) open. However, during the scan, nmap reported that the remote server was sending RSTs from port 1500. One thing I did notice is that most of the machines I could ping that triggered this message were extremely lagged (ping times of 800+ ms). I'll keep trying though. Jordan
Re: TCP: peer x.x.x.x:y/z shrinks window a:b:c...
"David S. Miller" wrote: > > The IP addresses are important because we can use them to find out > what TCP implementations shrink their offered windows. > > Actually, you don't need to tell me or anyone else what these IP > addresses are, you can instead run one of the "remote OS identifier" > programs out there to those sites and just let me know what OS those > systems are running :-) All of the IPs which were reported appeared to be firewalled making a direct scan impossible. The hop above them were almost always reported as Cisco terminal servers running IOS 11.2 and in one case it was reported as a Cisco router/switch running IOS 11.2. I'll keep looking. Jordan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
TCP: peer x.x.x.x:y/z shrinks window a:b:c...
I've begun to test 2.4.0 kernels on some high traffic machines to see what kind of difference it makes. I have seen a lot of these error messages in dmesg and although they don't seem to happen very often and seem harmless, I figured I'd report it anyway. They show up in groups (mostly) from the same IP and only in very small numbers (3-15). IPs have been changed to protect the innocent: TCP: peer x.x.x.x:1268/ shrinks window 2604660027:635:2604661487. Bad, what else can I say? TCP: peer x.x.x.x:1268/ shrinks window 2604662947:121:2604664407. Bad, what else can I say? TCP: peer x.x.x.x:1268/ shrinks window 2604665867:635:2604667327. Bad, what else can I say? and another: TCP: peer y.y.y.y:1125/ shrinks window 548103043:635:548104503. Bad, what else can I say? TCP: peer y.y.y.y:1125/ shrinks window 548119685:620:548121145. Bad, what else can I say? TCP: peer y.y.y.y:1125/ shrinks window 548122605:635:548124065. Bad, what else can I say? TCP: peer y.y.y.y:1125/ shrinks window 548125525:635:548126985. Bad, what else can I say? TCP: peer y.y.y.y:1125/ shrinks window 548128445:635:548129905. Bad, what else can I say? TCP: peer y.y.y.y:1125/ shrinks window 548143150:635:548144610. Bad, what else can I say? TCP: peer y.y.y.y:1125/ shrinks window 548146070:635:548146913. Bad, what else can I say? TCP: peer y.y.y.y:1125/ shrinks window 548156080:635:548157540. Bad, what else can I say? TCP: peer y.y.y.y:1125/ shrinks window 548163715:635:548165175. Bad, what else can I say? TCP: peer y.y.y.y:1125/ shrinks window 548175395:635:548176311. Bad, what else can I say? Jordan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Odd Xircom Realport Tulip Behavior
[this message was previously cc'ed to tulip-bug] It seems that my Xircom RealPort refuses to work correctly when first initialized. I'm running Linux 2.4.0-test7 with the standard xircom_tulip_cb driver. I can get the Xircom to work just fine, but I seem to always need to go through a song and dance to do so. When the driver is first initialized, running ifconfig eth0 with an IP and adding the default route will not work correctly. To get the RealPort to work, I have to:

# ifconfig eth0 a.b.c.d netmask x.x.x.x
# ifconfig eth0 down
# ifconfig eth0 up
# route add default gw l.m.n.o

As soon as I bring it back up for the second time, it mysteriously starts working correctly. Here are the relevant kernel messages on boot:

cs: IO port probe 0x0c00-0x0cff: clean.
cs: IO port probe 0x0800-0x08ff: clean.
cs: IO port probe 0x0100-0x04ff: excluding 0x220-0x22f 0x330-0x337 0x388-0x38f 0x398-0x39f 0x4d0-0x4d7
cs: IO port probe 0x0a00-0x0aff: clean.
tulip_attach(04:00.0)
PCI: Setting latency timer of device 04:00.0 to 64
xircom_tulip_cb.c:v0.91 4/14/99 [EMAIL PROTECTED] (modified by [EMAIL PROTECTED] for XIRCOM CBE, fixed by Doug Ledford)
eth0: Xircom Cardbus Adapter (DEC 21143 compatible mode) rev 3 at 0x1c00, 00:10:A4:EB:58:4C, IRQ 9.
eth0: MII transceiver #0 config 3100 status 7809 advertising 01e1.

Jordy