Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))

2001-02-21 Thread Jordan Mendelson

"David S. Miller" wrote:
> 
> Jordan Mendelson writes:
>  > Now, if it didn't have the side effect of dropping packets left and
>  > right after ~4000 open connections (simultaneously), I could finally
>  > move our production system to 2.4.x.
> 
> The change I posted as-is, is unacceptable because it adds unnecessary
> cost to a fast path.  The final change I actually use will likely
> involve using the TCP sequence numbers to calculate an "always
> changing" ID number in the IPv4 headers to placate these broken
> windows machines.

Just for kicks, I modified the fast path to use a globally incremented
count to see if it would fix both the Win9x problem and my 4K connection
problem, and it appears to be working just fine.

What probably happened is that, without the fast path, the sheer number
of packets at 4K connections slowed everything down to a crawl.
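For illustration, here is a minimal user-space sketch of that globally
incremented count (ip_id_count and select_ident() are hypothetical names
for this sketch, not the actual 2.4 kernel code): every outgoing header
takes the next value of one shared counter, so the IP ID field keeps
changing even on DF packets.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical sketch of the "globally incremented count" approach:
 * every outgoing IPv4 header gets the next value of one shared
 * counter, so the ID field always changes even when DF is set.
 * In the kernel this would need atomic_inc() or a lock; a plain
 * increment is enough to show the idea. */
static uint16_t ip_id_count;

uint16_t select_ident(void)
{
	return htons(ip_id_count++);
}
```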


Thanks Dave,

Jordan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))

2001-02-21 Thread Jordan Mendelson

"David S. Miller" wrote:
> 
> Jordan Mendelson writes:
>  > Now, if it didn't have the side effect of dropping packets left and
>  > right after ~4000 open connections (simultaneously), I could finally
>  > move our production system to 2.4.x.
> 
> There is no reason my patch should have this effect.

My guess is that the fast path avoided looking up the destination in
some structure which is limited to ~4K entries (the route table?).


Jordan



Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))

2001-02-21 Thread Jordan Mendelson

"David S. Miller" wrote:
> 
> Ookhoi writes:
>  > We have exactly the same problem but in our case it depends on the
>  > following three conditions: 1, kernel 2.4 (2.2 is fine), 2, windows ip
>  > header compression turned on, 3, a free internet access provider in
>  > Holland called 'Wish' (which seemes to stand for 'I Wish I had a faster
>  > connection').
>  > If we remove one of the three conditions, the connection is oke. It is
>  > only tcp which is affected.
>  > A packet on its way from linux server to windows client seems to get
>  > dropped once and retransmitted. This makes the connection _very_ slow.
> 
> :-( I hate these buggy systems.
> 
> Does this patch below fix the performance problem and are the windows
> clients win2000 or win95?

Just a note, however... this patch did fix the retransmit problem we
were seeing with Win95 clients on compressed PPP dialup over Earthlink
in the Bay Area.

Now, if it didn't have the side effect of dropping packets left and
right once ~4000 connections are open simultaneously, I could finally
move our production system to 2.4.x.



Jordan



Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))

2001-02-21 Thread Jordan Mendelson

"David S. Miller" wrote:
> 
> Ookhoi writes:
>  > We have exactly the same problem but in our case it depends on the
>  > following three conditions: 1, kernel 2.4 (2.2 is fine), 2, windows ip
>  > header compression turned on, 3, a free internet access provider in
>  > Holland called 'Wish' (which seemes to stand for 'I Wish I had a faster
>  > connection').
>  > If we remove one of the three conditions, the connection is oke. It is
>  > only tcp which is affected.
>  > A packet on its way from linux server to windows client seems to get
>  > dropped once and retransmitted. This makes the connection _very_ slow.
> 
> :-( I hate these buggy systems.
> 
> Does this patch below fix the performance problem and are the windows
> clients win2000 or win95?

I wanted to see if this would fix the problem I was seeing with Win9x
users on PPP w/ compression dialing up to Earthlink in the Bay Area
(there are other affected providers, but this is the only one I can
reproduce).

I compiled 2.4.1 with this change and, for some odd reason, the kernel
started dropping packets and became unusable (I couldn't even ssh in)
after around 4050 connections were opened. I also tested 2.4.1-ac20 and
hit the same problem right around 4050 connections.

This is on a VA Linux box with dual eepro100's (one used) connected to a
Cisco 6509.



> --- include/net/ip.h.~1~	Mon Feb 19 00:12:31 2001
> +++ include/net/ip.h	Wed Feb 21 02:56:15 2001
> @@ -190,9 +190,11 @@
> 
>  static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst)
>  {
> +#if 0
>  	if (iph->frag_off&__constant_htons(IP_DF))
>  		iph->id = 0;
>  	else
> +#endif
>  	__ip_select_ident(iph, dst);
>  }
> 



Re: MTU and 2.4.x kernel

2001-02-15 Thread Jordan Mendelson

Rick Jones wrote:
> 
> > Default of 536 is sadistic (and apaprently will be changed eventually
> > to stop tears of poor people whose providers not only supply them
> > with bogus mtu values sort of 552 or even 296, but also jailed them
> > to some proxy or masquearding domain), but it is still right: IP
> > with mtu lower 576 is not full functional.
> 
> I thought that the specs said that 576 was the "minimum maximum"
> reassemblable IP datagram size and not a minimum MTU.

RFC 1191 (Path MTU Discovery as it happens):

   Plateau   MTU    Comments                      Reference
   ------    ---    --------                      ---------
             65535  Official maximum MTU          RFC 791
             65535  Hyperchannel                  RFC 1044
   65535
       32000        Just in case
             17914  16Mb IBM Token Ring           ref. [6]
   17914
             8166   IEEE 802.4                    RFC 1042
   8166
             4464   IEEE 802.5 (4Mb max)          RFC 1042
             4352   FDDI (Revised)                RFC 1188
   4352 (1%)
             2048   Wideband Network              RFC 907
             2002   IEEE 802.5 (4Mb recommended)  RFC 1042
   2002 (2%)
             1536   Exp. Ethernet Nets            RFC 895
             1500   Ethernet Networks             RFC 894
             1500   Point-to-Point (default)      RFC 1134
             1492   IEEE 802.3                    RFC 1042
   1492 (3%)
             1006   SLIP                          RFC 1055
             1006   ARPANET                       BBN 1822
   1006
             576    X.25 Networks                 RFC 877
             544    DEC IP Portal                 ref. [10]
             512    NETBIOS                       RFC 1088
             508    IEEE 802/Source-Rt Bridge     RFC 1042
             508    ARCNET                        RFC 1051
   508 (13%)
             296    Point-to-Point (low delay)    RFC 1144
   296
             68     Official minimum MTU          RFC 791
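The plateau column is what RFC 1191's suggested fallback strategy walks:
when a Datagram Too Big message arrives without a next-hop MTU, the
sender drops its estimate to the next lower plateau rather than probing
byte by byte. A hedged sketch of that search (next_lower_plateau() is my
own name for illustration, not an API from the RFC or the kernel):

```c
#include <stddef.h>

/* Common MTU plateaus from the RFC 1191 table above. When a
 * "Datagram Too Big" ICMP arrives without the next-hop MTU,
 * RFC 1191 suggests falling back to the next lower plateau
 * instead of probing one byte at a time. */
static const int mtu_plateaus[] =
	{ 65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68 };

int next_lower_plateau(int mtu)
{
	size_t i;

	for (i = 0; i < sizeof(mtu_plateaus) / sizeof(mtu_plateaus[0]); i++)
		if (mtu_plateaus[i] < mtu)
			return mtu_plateaus[i];
	return 68;	/* official minimum MTU, RFC 791 */
}
```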

Jordan



Re: 2.4.0 Networking oddity

2001-01-29 Thread Jordan Mendelson

Daniel Walton wrote:
> 
> The server in question is running the tulip driver.  dmesg reports:
> 
> Linux Tulip driver version 0.9.13 (January 2, 2001)
> 
> I have seen this same behavior on a couple of my servers running 3com
> 3c905c adaptors as well.
> 
> The last time I was experiencing it I rebooted the system and it didn't
> solve the problem.  When it came up it was still lagging.  This would lead
> me to believe that it is caused by some sort of network condition, but what
> I don't know.
> 
> If anyone has ideas, I'd be more than happy to run tests/provide more info..
> 

If you are running an intelligent switch, double-check to make sure your
duplex and speed match what the switch sees on its port. The biggest
problem I've had with any of my machines is autonegotiation of port
speed and duplex. Typically all that is required is that I force the
speed and duplex on the Linux end.


Jordan



Re: USB problems with 2.4.0: USBDEVFS_BULK failed

2001-01-05 Thread Jordan Mendelson

Greg KH wrote:
> 
> On Thu, Jan 04, 2001 at 07:52:15PM -0800, Jordan Mendelson wrote:
> >
> > Alright, this is driving me nuts. I have a Canon S20 digital camera
> > hooked up to a Sony XG series laptop via the USB port and am using s10sh
> > to access it. s10sh uses libusb 0.1.1, but I've also tried it using
> > libusb 0.1.2 without any luck. libusb uses usbfs to access to the device
> > from userspace.
> >
> > The last time it worked was around 2.4.0test10, but might have been
> > test9. test12, prerelease and 2.4.0 final all fail.
> 
> Could you try to verify exactly which version things died on?  As you
> know USB has had a number of changes to the code recently :)
> 
> That would help us try to determine what broke.

I just rebooted a few times... 2.4.0-test10 is the last kernel that it
worked correctly with. 2.4.0-test11 shows the same signs as
2.4.0-test12, prerelease and 2.4.0 proper.


Jordan



USB problems with 2.4.0: USBDEVFS_BULK failed

2001-01-04 Thread Jordan Mendelson


Alright, this is driving me nuts. I have a Canon S20 digital camera
hooked up to a Sony XG series laptop via the USB port and am using s10sh
to access it. s10sh uses libusb 0.1.1, but I've also tried it using
libusb 0.1.2 without any luck. libusb uses usbfs to access the device
from userspace.

The last time it worked was around 2.4.0test10, but might have been
test9. test12, prerelease and 2.4.0 final all fail.

I've compiled the uhci driver with debugging. The log starts before I
send the file transfer request to the camera and ends after the camera
blows up and disconnects itself. This was done using 2.4.0 final.

I have also included a protocol dump from s10sh, recorded during a
second attempt. It looks like s10sh might strip header bytes from the
log, but it should help somewhat.

Now as far as I can tell, we submit a bulk transfer request and start
reading. We want to read 2872 bytes (44 @ 64 bytes, 1 @ 56 bytes). We
read off 44 @ 64 bytes, but for some reason don't read off the last 56
bytes and a babble is detected.
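That arithmetic can be sketched directly (split_bulk() is a hypothetical
helper, just illustrating the split): a 2872-byte bulk transfer on a
64-byte endpoint breaks into 44 full packets plus one 56-byte short
packet that should terminate the transfer.

```c
/* Split a bulk transfer length into full max-packet-size packets
 * plus a trailing short packet (illustrative only, not kernel code). */
typedef struct {
	int full_packets;	/* packets of exactly maxpacket bytes */
	int short_packet;	/* size of the final short packet, 0 if none */
} bulk_split;

bulk_split split_bulk(int len, int maxpacket)
{
	bulk_split s;

	s.full_packets = len / maxpacket;
	s.short_packet = len % maxpacket;
	return s;
}
```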


Jordan

Jan  4 18:06:29 u2 kernel: klogd 1.3-3#33.1, log source = /proc/kmsg started.
Jan  4 18:06:29 u2 kernel: Inspecting /boot/System.map-2.4.0
Jan  4 18:06:29 u2 kernel: Loaded 13430 symbols from /boot/System.map-2.4.0.
Jan  4 18:06:29 u2 kernel: Symbols match kernel version 2.4.0.
Jan  4 18:06:29 u2 kernel: Loaded 145 symbols from 6 modules.
Jan  4 18:06:32 u2 kernel: usb-uhci.c: search_dev_ep:
Jan  4 18:06:32 u2 kernel: usb-uhci.c: submit_urb: scheduling cf2cfba0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_submit_control start
Jan  4 18:06:32 u2 kernel: usb-uhci.c: Allocated qh @ c43809e0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_submit_control end
Jan  4 18:06:32 u2 kernel: usb-uhci.c: submit_urb: scheduled with ret: 0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: interrupt
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: len:8 status:3807 
mapped:0 toggle:0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: len:32 status:381f 
mapped:0 toggle:1
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: len:32 status:381f 
mapped:0 toggle:0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: len:32 status:381f 
mapped:0 toggle:1
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: len:22 status:3815 
mapped:0 toggle:0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: len:0 status:190007ff 
mapped:0 toggle:1
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_clean_transfer: No more bulks for urb 
cf2cfba0, qh c43809e0, bqh , nqh 
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink qh c43809e0, pqh c4380720, nxqh 
c43806e0, to 043806e0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: (end) urb cf2cfba0, wanted 
len 118, len 118 status 0 err 0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: dequeued urb: cf2cfba0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380e20
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380de0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380ee0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: search_dev_ep:
Jan  4 18:06:32 u2 kernel: usb-uhci.c: submit_urb: scheduling cf2cfba0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_submit_bulk_urb: urb cf2cfba0, old 
, pipe c0008280, len 64
Jan  4 18:06:32 u2 kernel: usb-uhci.c: Allocated qh @ c4380aa0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_submit_bulk: qh c4380aa0 bqh 0001 nqh 
c02595f2
Jan  4 18:06:32 u2 kernel: usb-uhci.c: submit_urb: scheduled with ret: 0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: interrupt
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: len:64 status:393f 
mapped:0 toggle:0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_clean_transfer: No more bulks for urb 
cf2cfba0, qh c4380aa0, bqh , nqh 
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink qh c4380aa0, pqh c43806e0, nxqh 
c4380660, to 04380660
Jan  4 18:06:32 u2 kernel: usb-uhci.c: process_transfer: (end) urb cf2cfba0, wanted 
len 64, len 64 status 0 err 0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: dequeued urb: cf2cfba0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380e60
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380d60
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380ea0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380d20
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380f20
Jan  4 18:06:32 u2 kernel: usb-uhci.c: unlink td @ c4380da0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: search_dev_ep:
Jan  4 18:06:32 u2 kernel: usb-uhci.c: submit_urb: scheduling cf2cfba0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_submit_bulk_urb: urb cf2cfba0, old 
, pipe c0008280, len 2872
Jan  4 18:06:32 u2 kernel: usb-uhci.c: Allocated qh @ c43809e0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: uhci_submit_bulk: qh c43809e0 bqh 0001 nqh 
c02595f2
Jan  4 18:06:32 u2 kernel: usb-uhci.c: submit_urb: scheduled with ret: 0
Jan  4 18:06:32 u2 kernel: usb-uhci.c: interrupt
Jan  4 18:06:32 u2 kernel: usb-uhci.c: interrupt, status 3, frame# 

Re: And oh, btw..

2001-01-04 Thread Jordan Mendelson

dep wrote:
> 
> On Thursday 04 January 2001 07:36 pm, Jordan Mendelson wrote:
> 
> | Go home, get out the epson salts, fill up the tub with hot water
> | and just relax.
> 
> right after getting the source posted on kernel.org!


Sigh, try:

http://www.kernel.org/pub/linux/kernel/testing/prerelease-diff


Please don't flood kernel.org though... use a mirror.


Jordan



Re: And oh, btw..

2001-01-04 Thread Jordan Mendelson

Linus Torvalds wrote:
> 
> In a move unanimously hailed by the trade press and industry analysts as
> being a sure sign of incipient braindamage, Linus Torvalds (also known as
> the "father of Linux" or, more commonly, as "mush-for-brains") decided
> that enough is enough, and that things don't get better from having the
> same people test it over and over again. In short, 2.4.0 is out there.

Everyone who has ever been in the press spotlight knows that most of it
is inaccurate, rushed, and written to bring in readers rather than to
report well-thought-out stories.

Go home, get out the Epsom salts, fill up the tub with hot water and
just relax.


Jordan



2.4.0-pre: usbdevfs: USBDEVFS_BULK failed ...

2001-01-04 Thread Jordan Mendelson


I've been having some problems with the recent 2.4.x kernels with my
digital camera. The s10sh program accesses the Canon S20 digital camera
using libusb in conjunction with usbfs to download images. Apparently,
incorrect data about the size of images is being sent down the line
after the first image transfer.

Here are some messages printed to syslog:

hub.c: USB new device connect on bus1/1, assigned device number 4
usbserial.c: none matched
usb.c: USB device 4 (vend/prod 0x4a9/0x3043) is not claimed by any
active driver.
usb-uhci.c: interrupt, status 3, frame# 496
usbdevfs: USBDEVFS_BULK failed dev 4 ep 0x81 len 2872 ret -32
usbdevfs: USBDEVFS_BULK failed dev 4 ep 0x81 len 84 ret -32
usbdevfs: USBDEVFS_BULK failed dev 4 ep 0x81 len 64 ret -32
usb.c: USB disconnect on device 4

Now, the USB disconnect never actually happened physically. The camera
just looks like it stopped responding on its USB port.


Jordan



2.4.0-pre able to mount SHM twice

2001-01-03 Thread Jordan Mendelson


This is probably due to the source being 'none', but shm can be
mounted twice on the same mount point.

Shouldn't mount(2) return -EBUSY in this case?


# cat /etc/mtab
/dev/hda4 / ext2 rw,errors=remount-ro,errors=remount-ro 0 0
proc /proc proc rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/hda1 /boot ext2 rw 0 0
/dev/hda3 /mnt/win vfat rw 0 0
none /proc/bus/usb usbdevfs rw 0 0

# mount /dev/shm
# cat /etc/mtab
/dev/hda4 / ext2 rw,errors=remount-ro,errors=remount-ro 0 0
proc /proc proc rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/hda1 /boot ext2 rw 0 0
/dev/hda3 /mnt/win vfat rw 0 0
none /proc/bus/usb usbdevfs rw 0 0
none /dev/shm shm rw 0 0

# mount /dev/shm
# cat /etc/mtab
/dev/hda4 / ext2 rw,errors=remount-ro,errors=remount-ro 0 0
proc /proc proc rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/hda1 /boot ext2 rw 0 0
/dev/hda3 /mnt/win vfat rw 0 0
none /proc/bus/usb usbdevfs rw 0 0
none /dev/shm shm rw 0 0
none /dev/shm shm rw 0 0

# umount /dev/shm
# cat /etc/mtab
/dev/hda4 / ext2 rw,errors=remount-ro,errors=remount-ro 0 0
proc /proc proc rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/hda1 /boot ext2 rw 0 0
/dev/hda3 /mnt/win vfat rw 0 0
none /proc/bus/usb usbdevfs rw 0 0

Jordan



Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP

2000-11-07 Thread Jordan Mendelson

Andi Kleen wrote:
> 
> On Mon, Nov 06, 2000 at 11:16:21PM -0800, Jordan Mendelson wrote:
> > > It is clear though, that something is messing with or corrupting the
> > > packets.  One thing you might try is turning off TCP header
> > > compression for the PPP link, does this make a difference?
> >
> > Actually, there has been several reports that turning header compression
> > does help.
> 
> What does help ? Turning it on or turning it off ?

We had a good number of reports that turning PPP header compression off
helped. The Windows 98 connection I was testing with did have header
compression turned on. Unfortunately, I can't just ask the entire Windows
world to turn off header compression in order to use our software. :)

I believe we've reverted all of our machines to 2.2, so testing this any
further is going to be a problem.


Jordan





Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP

2000-11-06 Thread Jordan Mendelson

"David S. Miller" wrote:
> 
>Date: Mon, 06 Nov 2000 23:16:21 -0800
>From: Jordan Mendelson <[EMAIL PROTECTED]>
> 
>"David S. Miller" wrote:
>> It is clear though, that something is messing with or corrupting the
>> packets.  One thing you might try is turning off TCP header
>> compression for the PPP link, does this make a difference?
> 
>Actually, there has been several reports that turning header
>compression does help.
> 
> If this is what is causing the TCP sequence numbers to change
> then either Win98's or Earthlink terminal server's implementation
> of TCP header compression is buggy.
> 
> Assuming this is true, it explains why Win98's TCP does not "see" the
> data sent by Linux, because such a bug would make the TCP checksum of
> these packets incorrect and thus dropped by Win98's TCP.

Ok, but why doesn't 2.2.16 exhibit this behavior?

We've had reports from quite a number of people complaining about this
and I'm fairly certain not all of them are from Earthlink.
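
David's explanation can be made concrete with a toy checksum calculation (illustrative only, not from the thread: a real TCP checksum also covers a pseudo-header and the payload, and the server port below is made up). If a buggy VJ header-compression decompressor reconstructs the wrong sequence number, the checksum the sender computed no longer verifies, so the receiver silently drops the segment:

```python
import struct

def inet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement sum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                  # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def tcp_header(seq: int) -> bytes:
    """Minimal 20-byte TCP header: ports, seq, ack, offset/flags, win, csum, urg."""
    return struct.pack("!HHIIHHHH", 1238, 8888, seq, 0, 0x5018, 5840, 0, 0)

on_wire = inet_checksum(tcp_header(3013390))   # checksum the sender computed
rebuilt = inet_checksum(tcp_header(3013388))   # buggy decompressor's reconstruction
print(on_wire != rebuilt)   # True: verification fails, segment is dropped
```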


Jordan



Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP

2000-11-06 Thread Jordan Mendelson

"David S. Miller" wrote:
> 
>Date: Mon, 06 Nov 2000 22:44:00 -0800
>From: Jordan Mendelson <[EMAIL PROTECTED]>
> 
>Attached to this message are dumps from the windows 98 machine using
>windump and the linux 2.4.0-test10. Sorry the time stamps don't match
>up.
> 
> (ie. Linux sends bytes 1:21 both the first time, and when it
>  retransmits that data.  However win98 "sees" this as 1:19 the first
>  time and 1:21 during the retransmit by Linux)
> 
> That is bogus.  Something is mangling the packets between the Linux
> machine and the win98 machine.  You mentioned something about
> bandwidth limiting at your upstream provider, any chance you can have
> them turn this bandwidth limiting device off?

It turns out that the bandwidth problem was fixed yesterday, so that
cannot be the problem here, and yes, 64.124.41.179 is a Linux box. :)

> Or maybe earthlink is using some packet mangling device?
> 
> It is clear though, that something is messing with or corrupting the
> packets.  One thing you might try is turning off TCP header
> compression for the PPP link, does this make a difference?

Actually, there have been several reports that turning header compression
does help.


Jordan



Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP

2000-11-06 Thread Jordan Mendelson

"David S. Miller" wrote:
> 
>Date: Mon, 06 Nov 2000 22:13:23 -0800
>From: Jordan Mendelson <[EMAIL PROTECTED]>
> 
>There is a possibility that we are hitting an upper level bandwidth
>limit between us and our upstream provider due to a misconfiguration
>on the other end, but this should only happen during peak time
>(which it is not right now). It just bugs me that 2.2.16 doesn't
>appear to have this problem.
> 
> The only thing I can do now is beg for a tcpdump from the windows95
> machine side.  Do you have the facilities necessary to obtain this?
> This would prove that it is packet drop between the two systems, for
> whatever reason, that is causing this.

Attached to this message are dumps from the windows 98 machine using
windump and the linux 2.4.0-test10. Sorry the time stamps don't match
up.


Jordan

23:36:15.252817 209.179.194.175.1084 > 64.124.41.179.: S 370996:370996(0) win 8192 
 (DF)
23:36:15.252891 64.124.41.179. > 209.179.194.175.1084: S 3050526223:3050526223(0) 
ack 370997 win 5840  (DF)
23:36:16.159685 209.179.194.175.1084 > 64.124.41.179.: . ack 1 win 8576 (DF)
23:36:16.160461 209.179.194.175.1084 > 64.124.41.179.: . ack 1 win 65280 (DF)
23:36:16.160488 209.179.194.175.1084 > 64.124.41.179.: P 1:44(43) ack 1 win 65280 
(DF)
23:36:16.160506 64.124.41.179. > 209.179.194.175.1084: . ack 44 win 5840 (DF)
23:36:16.261533 64.124.41.179. > 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 
(DF)
23:36:16.261669 64.124.41.179. > 209.179.194.175.1084: P 21:557(536) ack 44 win 
5840 (DF)
23:36:19.261055 64.124.41.179. > 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 
(DF)
23:36:19.450762 209.179.194.175.1084 > 64.124.41.179.: P 44:56(12) ack 21 win 
65260 (DF)
23:36:19.450788 64.124.41.179. > 209.179.194.175.1084: P 21:557(536) ack 44 win 
5840 (DF)
23:36:19.450820 64.124.41.179. > 209.179.194.175.1084: P 557:1093(536) ack 56 win 
5840 (DF)
23:36:22.281248 209.179.194.175.1084 > 64.124.41.179.: P 44:456(412) ack 21 win 
65260 (DF)
23:36:22.281308 64.124.41.179. > 209.179.194.175.1084: . ack 456 win 6432 
 (DF)
23:36:25.441061 64.124.41.179. > 209.179.194.175.1084: P 21:557(536) ack 456 win 
6432 (DF)
23:36:25.701796 209.179.194.175.1084 > 64.124.41.179.: . ack 557 win 65280 (DF)
23:36:25.701841 64.124.41.179. > 209.179.194.175.1084: P 557:1093(536) ack 456 win 
6432 (DF)
23:36:25.701859 64.124.41.179. > 209.179.194.175.1084: P 1093:1629(536) ack 456 
win 6432 (DF)
23:36:37.701091 64.124.41.179. > 209.179.194.175.1084: P 557:1093(536) ack 456 win 
6432 (DF)
23:36:38.026766 209.179.194.175.1084 > 64.124.41.179.: . ack 1093 win 65280 (DF)
23:36:38.026826 64.124.41.179. > 209.179.194.175.1084: P 1093:1629(536) ack 456 
win 6432 (DF)
23:36:38.026839 64.124.41.179. > 209.179.194.175.1084: P 1629:1847(218) ack 456 
win 6432 (DF)
23:37:02.021068 64.124.41.179. > 209.179.194.175.1084: P 1093:1629(536) ack 456 
win 6432 (DF)
23:37:02.328163 209.179.194.175.1084 > 64.124.41.179.: . ack 1629 win 65280 (DF)
23:37:02.328189 64.124.41.179. > 209.179.194.175.1084: P 1629:1847(218) ack 456 
win 6432 (DF)
23:37:50.321057 64.124.41.179. > 209.179.194.175.1084: P 1629:1847(218) ack 456 
win 6432 (DF)
23:37:50.673000 209.179.194.175.1084 > 64.124.41.179.: . ack 1847 win 65062 (DF)
23:37:50.673068 64.124.41.179. > 209.179.194.175.1084: P 1847:1868(21) ack 456 win 
6432 (DF)
23:38:00.162380 209.179.194.175.1084 > 64.124.41.179.: F 456:456(0) ack 1847 win 
65062 (DF)
23:38:00.181055 64.124.41.179. > 209.179.194.175.1084: . ack 457 win 6432 (DF)
23:38:00.187291 64.124.41.179. > 209.179.194.175.1084: F 1868:1868(0) ack 457 win 
6432 (DF)
23:38:00.363357 209.179.194.175.1084 > 64.124.41.179.: . ack 1847 win 65062 
 (DF)
23:39:26.671050 64.124.41.179. > 209.179.194.175.1084: P 1847:1868(21) ack 457 win 
6432 (DF)
23:39:26.886417 209.179.194.175.1084 > 64.124.41.179.: R 371453:371453(0) win 0 
(DF)


22:34:34.884487 arp who-has 64.124.41.179 tell 209.179.194.175
22:34:34.889477 209.179.194.175.1084 > 64.124.41.179.: S 370996:370996(0) win 8192 
 (DF)
22:34:35.669892 64.124.41.179. > 209.179.194.175.1084: S 3050526223:3050526223(0) 
ack 370997 win 5840  (DF)
22:34:35.670624 209.179.194.175.1084 > 64.124.41.179.: . ack 1 win 8576 (DF)
22:34:35.670653 209.179.194.175.1084 > 64.124.41.179.: . ack 1 win 65280 (DF)
22:34:35.674484 209.179.194.175.1084 > 64.124.41.179.: P 1:44(43) ack 1 win 65280 
(DF)
22:34:36.049808 64.124.41.179. > 209.179.194.175.1084: . ack 44 win 5840 (DF)
22:34:36.069773 64.124.41.179. > 209.179.194.175.1084: P 1:19(18) ack 44 win 5840 
(DF)
22:34:36.069837 64.124.41.179. > 209.179.194.175.1084: P 19:553(534) ack 44 win 
5840 (DF)

Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP

2000-11-06 Thread Jordan Mendelson

"David S. Miller" wrote:
> 
>Date: Mon, 06 Nov 2000 21:20:39 -0800
>From: Jordan Mendelson <[EMAIL PROTECTED]>
> 
>It looks to me like there is an artificial delay in 2.4.0 which is
>slowing down the traffic to unbearable levels.
> 
> No, I think I see whats wrong, it's nothing more than packet drop.
>
> Looking at the equivalent 220 traces, the only difference appears to
> be that the packets are not getting dropped.

I would like to note that the two machines the Windows client is
connecting to are sitting on the exact same switch, connected to the same
provider, and handling identical user loads.

> Alexey, do you have any other similar reports wrt. the new MSS
> advertisement scheme in 2.4.x?
> 
> Jordan, you mentioned something about possibly being "bandwidth
> limited"?  Please, elaborate...

There is a possibility that we are hitting an upper level bandwidth
limit between us and our upstream provider due to a misconfiguration on
the other end, but this should only happen during peak time (which it is
not right now). It just bugs me that 2.2.16 doesn't appear to have this
problem.


Jordan



Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP

2000-11-06 Thread Jordan Mendelson

"David S. Miller" wrote:
> 
>Date: Mon, 06 Nov 2000 18:17:19 -0800
>From: Jordan Mendelson <[EMAIL PROTECTED]>
> 
>18:54:57.394894 eth0 > 64.124.41.177. > 209.179.248.69.1238: .
>2429:2429(0) ack 506 win 6432 <nop,nop, sack 1 {456:506} > (DF)
> 
> And this is it?  The connection dies right here and says no
> more?  Surely, there was more said on this connection after
> this point.
> 
> Otherwise I see nothing obviously wrong in these dumps.

I've provided two new dumps of the complete connection lifetime between
2.4.0 and 2.2.16. Both logs show the same client connecting to identical
machines, receiving the same data and then disconnecting.

2.2.16 handles the entire process in under 5 seconds while 2.4.0 takes
over 2 minutes.

Also note that the 2.4.0 connection did not get shut down correctly and
had to send an RST... though this is probably due to the client side
closing down the connection while there was still data on it. Both
machines were under approximately the same load.
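
The RST-on-close behavior is standard: closing a socket that still has unread data in its receive buffer aborts the connection with an RST instead of a normal FIN handshake. A minimal loopback sketch (not from the thread) that reproduces it:

```python
import socket
import time

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.socket()
cli.connect(srv.getsockname())
conn, _ = srv.accept()

conn.sendall(b"data the client never reads")
time.sleep(0.2)         # let it land in the client's receive buffer
cli.close()             # unread data pending -> kernel sends RST, not FIN

err = None
for _ in range(50):     # keep writing until the reset is reported back
    try:
        conn.sendall(b"x" * 1024)
        time.sleep(0.05)
    except OSError as exc:
        err = exc
        break
print(type(err).__name__)   # typically ConnectionResetError or BrokenPipeError
conn.close()
srv.close()
```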

It looks to me like there is an artificial delay in 2.4.0 which is
slowing down the traffic to unbearable levels. 
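
An alternative reading of the 2.4.0 dump below: the repeated segments are retransmissions whose gaps double each time (3 s for 21:557, then roughly 12 s, 24 s, 48 s, and 96 s for the later segments), which is classic exponential RTO backoff. That pattern is consistent with the first transmissions being lost rather than with a fixed artificial delay. A small sketch, with send times copied from the 1629:1825 segment in the dump:

```python
from datetime import datetime

def backoff_gaps(times):
    """Whole-second gaps between successive sends of one segment."""
    ts = [datetime.strptime(t, "%H:%M:%S.%f") for t in times]
    return [round((b - a).total_seconds()) for a, b in zip(ts, ts[1:])]

# (Re)transmission times of the 1629:1825 segment from the 2.4.0 trace.
print(backoff_gaps(["22:01:01.755451", "22:01:26.171979", "22:02:14.171052"]))
# -> [24, 48]
```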


Jordan

22:00:39.625351 209.179.245.186.1092 > 64.124.41.179.: S 4155530:4155530(0) win 
8192 <mss 536,nop,nop,sackOK> (DF)
22:00:39.625437 64.124.41.179. > 209.179.245.186.1092: S 1301092473:1301092473(0) 
ack 4155531 win 5840 <mss 1460,nop,nop,sackOK> (DF)
22:00:39.887133 209.179.245.186.1092 > 64.124.41.179.: . ack 1 win 8576 (DF)
22:00:39.887969 209.179.245.186.1092 > 64.124.41.179.: . ack 1 win 65280 (DF)
22:00:39.888951 209.179.245.186.1092 > 64.124.41.179.: P 1:44(43) ack 1 win 65280 
(DF)
22:00:39.888964 64.124.41.179. > 209.179.245.186.1092: . ack 44 win 5840 (DF)
22:00:39.991515 64.124.41.179. > 209.179.245.186.1092: P 1:21(20) ack 44 win 5840 
(DF)
22:00:39.991660 64.124.41.179. > 209.179.245.186.1092: P 21:557(536) ack 44 win 
5840 (DF)
22:00:42.991490 64.124.41.179. > 209.179.245.186.1092: P 1:21(20) ack 44 win 5840 
(DF)
22:00:43.180946 209.179.245.186.1092 > 64.124.41.179.: P 44:56(12) ack 21 win 
65260 (DF)
22:00:43.180997 64.124.41.179. > 209.179.245.186.1092: P 21:557(536) ack 44 win 
5840 (DF)
22:00:43.181025 64.124.41.179. > 209.179.245.186.1092: P 557:1093(536) ack 56 win 
5840 (DF)
22:00:45.685143 209.179.245.186.1092 > 64.124.41.179.: P 44:456(412) ack 21 win 
65260 (DF)
22:00:45.685204 64.124.41.179. > 209.179.245.186.1092: . ack 456 win 6432 
<nop,nop, sack 1 {44:56} > (DF)
22:00:49.171046 64.124.41.179. > 209.179.245.186.1092: P 21:557(536) ack 456 win 
6432 (DF)
22:00:49.470193 209.179.245.186.1092 > 64.124.41.179.: . ack 557 win 65280 (DF)
22:00:49.470233 64.124.41.179. > 209.179.245.186.1092: P 557:1093(536) ack 456 win 
6432 (DF)
22:00:49.470248 64.124.41.179. > 209.179.245.186.1092: P 1093:1629(536) ack 456 
win 6432 (DF)
22:01:01.461056 64.124.41.179. > 209.179.245.186.1092: P 557:1093(536) ack 456 win 
6432 (DF)
22:01:01.755362 209.179.245.186.1092 > 64.124.41.179.: . ack 1093 win 65280 (DF)
22:01:01.755428 64.124.41.179. > 209.179.245.186.1092: P 1093:1629(536) ack 456 
win 6432 (DF)
22:01:01.755451 64.124.41.179. > 209.179.245.186.1092: P 1629:1825(196) ack 456 
win 6432 (DF)
22:01:25.751048 64.124.41.179. > 209.179.245.186.1092: P 1093:1629(536) ack 456 
win 6432 (DF)
22:01:26.171932 209.179.245.186.1092 > 64.124.41.179.: . ack 1629 win 65280 (DF)
22:01:26.171979 64.124.41.179. > 209.179.245.186.1092: P 1629:1825(196) ack 456 
win 6432 (DF)
22:02:14.171052 64.124.41.179. > 209.179.245.186.1092: P 1629:1825(196) ack 456 
win 6432 (DF)
22:02:14.499920 209.179.245.186.1092 > 64.124.41.179.: . ack 1825 win 65084 (DF)
22:02:14.499944 64.124.41.179. > 209.179.245.186.1092: P 1825:1847(22) ack 456 win 
6432 (DF)
22:02:16.168708 209.179.245.186.1092 > 64.124.41.179.: F 456:456(0) ack 1825 win 
65084 (DF)
22:02:16.181061 64.124.41.179. > 209.179.245.186.1092: . ack 457 win 6432 (DF)
22:02:16.281724 64.124.41.179. > 209.179.245.186.1092: F 1847:1847(0) ack 457 win 
6432 (DF)
22:02:16.477943 209.179.245.186.1092 > 64.124.41.179.: . ack 1825 win 65084 
<nop,nop, sack 1 {1847:1848} > (DF)
22:03:50.491063 64.124.41.179. > 209.179.245.186.1092: P 1825:1847(22) ack 457 win 
6432 (DF)
22:03:50.680141 209.179.245.186.1092 > 64.124.41.179.: R 4155987:4155987(0) win 0 
(DF)


22:00:01.684927 209.179.245.186.1091 > 64.124.41.136.: S 4033171:4033171(0) win 
8192  (DF)
22:00:01.685021 64.124.41.136. > 209.179.245.186.1091: S 1261602556:1261602556(0) 
ack 4033172 win 32696  (DF)
22:00:01.916120 209.179.245.186.1091 > 64.124.41.136.: . ack 1 win 8576 (DF)
22:00:01.916191 209.179.245.186.1091 > 64.124.41.136.: . ack 1 win 65280 (DF)
22:00:01.916981 209.179.245.186.1091 > 64.124.41.136.: P 1:44(43) ack 1 win 65280 
(DF)
22:00:01.917032 64.124.41.136. > 209.179.245.186.1091: . 

Re: Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP

2000-11-06 Thread Jordan Mendelson

Jordan Mendelson wrote:
> 
> We are seeing a performance slowdown between Windows PPP users and
> servers running 2.4.0-test10. Attached is a tcpdump log of the
> connection. The machine is without TCP ECN support. The Windows machine
> is running Windows 98 SE 4.10, dialed up over PPP w/ TCP header
> compression. The Linux machine is connected directly to the Internet via
> a 6509. There is a possibility that we are hitting a bandwidth cap on
> outgoing traffic.


Just some updates. This problem does not appear to happen under 2.2.16.
The dump for 2.2.16 is almost the same, except we send back an MSS of 536
rather than 1460 (remote MTU vs. local MTU).
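
The 536 vs. 1460 difference is just header arithmetic over different MTU assumptions: as the thread notes, 2.2 derived the advertised MSS from the remote route (the 576-byte default MTU), while 2.4's new MSS advertisement scheme uses the local interface MTU (1500 for Ethernet). A sketch of the arithmetic:

```python
def mss_for_mtu(mtu, ip_hdr=20, tcp_hdr=20):
    """Advertised MSS is the MTU minus minimal IP and TCP header sizes."""
    return mtu - ip_hdr - tcp_hdr

print(mss_for_mtu(1500))  # 1460: based on the local Ethernet MTU (2.4)
print(mss_for_mtu(576))   # 536: based on the default route MTU (2.2)
```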

Here is the head of a tcpdump with the same client, but this time with a
2.2.16 machine instead of a 2.4.0-test10 machine:

19:26:23.593114 eth0 < 209.179.248.69.1260 > 64.124.41.136.: S
5061245:5061245(0) win 8192 <mss 536,nop,nop,sackOK> (DF)
19:26:23.593237 eth0 > 64.124.41.136. > 209.179.248.69.1260: S
119520695:119520695(0) ack 5061246 win 32696 <mss 536,nop,nop,sackOK>
(DF)
19:26:23.824394 eth0 < 209.179.248.69.1260 > 64.124.41.136.: .
1:1(0) ack 1 win 65280 (DF)
19:26:23.824398 eth0 < 209.179.248.69.1260 > 64.124.41.136.: .
1:1(0) ack 1 win 8576 (DF)
19:26:23.825249 eth0 < 209.179.248.69.1260 > 64.124.41.136.: P
1:44(43) ack 1 win 65280 (DF)
19:26:23.825283 eth0 > 64.124.41.136. > 209.179.248.69.1260: .
1:1(0) ack 44 win 32696 (DF)
19:26:25.245845 eth0 > 64.124.41.136. > 209.179.248.69.1260: P
1:21(20) ack 44 win 32696 (DF)
19:26:25.245956 eth0 > 64.124.41.136. > 209.179.248.69.1260: P
21:342(321) ack 44 win 32696 (DF)
19:26:25.466759 eth0 < 209.179.248.69.1260 > 64.124.41.136.: .
44:44(0) ack 342 win 64939 (DF)
19:26:25.466792 eth0 > 64.124.41.136. > 209.179.248.69.1260: P
342:878(536) ack 44 win 32696 (DF)
19:26:25.466800 eth0 > 64.124.41.136. > 209.179.248.69.1260: P
878:1401(523) ack 44 win 32696 (DF)
19:26:25.467562 eth0 < 209.179.248.69.1260 > 64.124.41.136.: P
44:56(12) ack 342 win 64939 (DF)
19:26:25.480104 eth0 > 64.124.41.136. > 209.179.248.69.1260: .
1401:1401(0) ack 56 win 32696 (DF)
19:26:25.763509 eth0 < 209.179.248.69.1260 > 64.124.41.136.: P
56:456(400) ack 878 win 65280 (DF)
19:26:25.766253 eth0 < 209.179.248.69.1260 > 64.124.41.136.: .
456:456(0) ack 1401 win 64757 (DF)
19:26:26.070115 eth0 > 64.124.41.136. > 209.179.248.69.1260: .
1401:1401(0) ack 456 win 32296 (DF)
19:26:26.431515 eth0 > 64.124.41.136. > 209.179.248.69.1260: P
1401:1413(12) ack 456 win 32696 (DF)
19:26:26.432141 eth0 > 64.124.41.136. > 209.179.248.69.1260: P
1413:1684(271) ack 456 win 32696 (DF)
19:26:26.657631 eth0 < 209.179.248.69.1260 > 64.124.41.136.: .
456:456(0) ack 1684 win 65280 (DF)
19:26:26.657663 eth0 > 64.124.41.136. > 209.179.248.69.1260: P
1684:1817(133) ack 456 win 32696 (DF)
19:26:26.952825 eth0 < 209.179.248.69.1260 > 64.124.41.136.: .
456:456(0) ack 1817 win 65147 (DF)
19:26:31.086138 eth0 < 209.179.248.69.1260 > 64.124.41.136.: P
456:506(50) ack 1817 win 65147 (DF)

> 18:51:33.282286 eth0 < 209.179.248.69.1238 > 64.124.41.177.: S
> 3013389:3013389(0) win 8192 <mss 536,nop,nop,sackOK> (DF)
> 18:51:33.282395 eth0 > 64.124.41.177. > 209.179.248.69.1238: S
> 2198113890:2198113890(0) ack 3013390 win 5840 <mss 1460,nop,nop,sackOK>
> (DF)
> 18:51:33.509532 eth0 < 209.179.248.69.1238 > 64.124.41.177.: .
> 1:1(0) ack 1 win 8576 (DF)
> 18:51:33.510360 eth0 < 209.179.248.69.1238 > 64.124.41.177.: .
> 1:1(0) ack 1 win 65280 (DF)
> 18:51:33.510416 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P
> 1:44(43) ack 1 win 65280 (DF)
> 18:51:33.510457 eth0 > 64.124.41.177. > 209.179.248.69.1238: .
> 1:1(0) ack 44 win 5840 (DF)
> 18:51:33.988330 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
> 1:21(20) ack 44 win 5840 (DF)
> 18:51:33.988474 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
> 21:557(536) ack 44 win 5840 (DF)
> 18:51:36.987336 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
> 1:21(20) ack 44 win 5840 (DF)
> 18:51:37.12 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P
> 44:56(12) ack 21 win 65260 (DF)
> 18:51:37.177794 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
> 21:557(536) ack 44 win 5840 (DF)
> 18:51:37.177806 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
> 557:1093(536) ack 56 win 5840 (DF)
> 18:51:39.845046 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P
> 44:456(412) ack 21 win 65260 (DF)
> 18:51:39.845071 eth0 > 64.124.41.177. > 209.179.248.69.1238: .
> 1093:1093(0) ack 456 win 6432 <nop,nop, sack 1 {44:56} > (DF)
> 18:51:43.177329 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
> 21:557(536) ack 456 win 6432 (DF)
> 18:51:43.538219 eth0 < 209.179.248.69.1238 > 64.124.41.177.888

Poor TCP Performance 2.4.0-10 <-> Win98 SE PPP

2000-11-06 Thread Jordan Mendelson


We are seeing a performance slowdown between Windows PPP users and
servers running 2.4.0-test10. Attached is a tcpdump log of the
connection. The machine is without TCP ECN support. The Windows machine
is running Windows 98 SE 4.10, dialed up over PPP w/ TCP header
compression. The Linux machine is connected directly to the Internet via
a 6509. There is a possibility that we are hitting a bandwidth cap on
outgoing traffic.


18:51:33.282286 eth0 < 209.179.248.69.1238 > 64.124.41.177.: S
3013389:3013389(0) win 8192 <mss 536,nop,nop,sackOK> (DF)
18:51:33.282395 eth0 > 64.124.41.177. > 209.179.248.69.1238: S
2198113890:2198113890(0) ack 3013390 win 5840 <mss 1460,nop,nop,sackOK>
(DF)
18:51:33.509532 eth0 < 209.179.248.69.1238 > 64.124.41.177.: .
1:1(0) ack 1 win 8576 (DF)
18:51:33.510360 eth0 < 209.179.248.69.1238 > 64.124.41.177.: .
1:1(0) ack 1 win 65280 (DF)
18:51:33.510416 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P
1:44(43) ack 1 win 65280 (DF)
18:51:33.510457 eth0 > 64.124.41.177. > 209.179.248.69.1238: .
1:1(0) ack 44 win 5840 (DF)
18:51:33.988330 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
1:21(20) ack 44 win 5840 (DF)
18:51:33.988474 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
21:557(536) ack 44 win 5840 (DF)
18:51:36.987336 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
1:21(20) ack 44 win 5840 (DF)
18:51:37.12 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P
44:56(12) ack 21 win 65260 (DF)
18:51:37.177794 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
21:557(536) ack 44 win 5840 (DF)
18:51:37.177806 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
557:1093(536) ack 56 win 5840 (DF)
18:51:39.845046 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P
44:456(412) ack 21 win 65260 (DF)
18:51:39.845071 eth0 > 64.124.41.177. > 209.179.248.69.1238: .
1093:1093(0) ack 456 win 6432 <nop,nop, sack 1 {44:56} > (DF)
18:51:43.177329 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
21:557(536) ack 456 win 6432 (DF)
18:51:43.538219 eth0 < 209.179.248.69.1238 > 64.124.41.177.: .
456:456(0) ack 557 win 65280 (DF)
18:51:43.538275 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
557:1093(536) ack 456 win 6432 (DF)
18:51:43.538292 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
1093:1629(536) ack 456 win 6432 (DF)
18:51:55.537346 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
557:1093(536) ack 456 win 6432 (DF)
18:51:55.841360 eth0 < 209.179.248.69.1238 > 64.124.41.177.: .
456:456(0) ack 1093 win 65280 (DF)
18:51:55.841384 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
1093:1629(536) ack 456 win 6432 (DF)
18:51:55.841393 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
1629:1849(220) ack 456 win 6432 (DF)
18:52:19.837335 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
1093:1629(536) ack 456 win 6432 (DF)
18:52:20.153776 eth0 < 209.179.248.69.1238 > 64.124.41.177.: .
456:456(0) ack 1629 win 65280 (DF)
18:52:20.153803 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
1629:1849(220) ack 456 win 6432 (DF)
18:53:08.147334 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
1629:1849(220) ack 456 win 6432 (DF)
18:53:08.475911 eth0 < 209.179.248.69.1238 > 64.124.41.177.: .
456:456(0) ack 1849 win 65060 (DF)
18:53:08.475947 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
1849:1871(22) ack 456 win 6432 (DF)
18:54:44.467332 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
1849:1871(22) ack 456 win 6432 (DF)
18:54:44.824187 eth0 < 209.179.248.69.1238 > 64.124.41.177.: .
456:456(0) ack 1871 win 65038 (DF)
18:54:44.824256 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
1871:1893(22) ack 456 win 6432 (DF)
18:54:55.212750 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P
456:506(50) ack 1871 win 65038 (DF)
18:54:55.212767 eth0 > 64.124.41.177. > 209.179.248.69.1238: .
1893:1893(0) ack 506 win 6432 (DF)
18:54:55.571337 eth0 > 64.124.41.177. > 209.179.248.69.1238: P
1893:2429(536) ack 506 win 6432 (DF)
18:54:57.394879 eth0 < 209.179.248.69.1238 > 64.124.41.177.: P
456:506(50) ack 1871 win 65038 (DF)
18:54:57.394894 eth0 > 64.124.41.177. > 209.179.248.69.1238: .
2429:2429(0) ack 506 win 6432 <nop,nop, sack 1 {456:506} > (DF)


Here are some numbers from /proc/sys/net/ipv4:

$ cat /proc/sys/net/ipv4/tcp_rmem
4096	87380	174760

$ cat /proc/sys/net/ipv4/tcp_wmem 
4096	16384	131072

$ cat /proc/sys/net/ipv4/tcp_sack
1

$ cat /proc/sys/net/ipv4/tcp_fack
1

$ cat /proc/sys/net/ipv4/tcp_dsack
1

$ cat /proc/sys/net/ipv4/tcp_window_scaling 
1

$ cat /proc/sys/net/ipv4/tcp_syncookies 
0

$ cat /proc/sys/net/ipv4/tcp_timestamps 
1
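
For reference, tcp_rmem and tcp_wmem are min/default/max byte triples for the per-socket receive and send buffers, while the remaining entries are boolean feature flags. A trivial parser sketch (hypothetical helper, not part of the report):

```python
def parse_mem_triple(text):
    """Split a tcp_rmem/tcp_wmem style line into min/default/max fields."""
    lo, default, hi = (int(x) for x in text.split())
    return {"min": lo, "default": default, "max": hi}

print(parse_mem_triple("4096 87380 174760"))
# {'min': 4096, 'default': 87380, 'max': 174760}
```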



Jordan




Re: Poor TCP Performance 2.4.0-10 - Win98 SE PPP

2000-11-06 Thread Jordan Mendelson

"David S. Miller" wrote:
 
Date:Mon, 06 Nov 2000 18:17:19 -0800
From: Jordan Mendelson [EMAIL PROTECTED]
 
18:54:57.394894 eth0  64.124.41.177.  209.179.248.69.1238: .
2429:2429(0) ack 506 win 6432 nop,nop, sack 1 {456:506}  (DF)
 
 And this is it?  The connection dies right here and says no
 more?  Surely, there was more said on this connection after
 this point.
 
 Otherwise I see nothing obviously wrong in these dumps.

I've provided two new dumps of the complete connection lifetime between
2.4.0 and 2.2.16. Both logs show the same client connecting to identical
machines, receiving the same data and then disconnecting.

2.2.16 handles the entire process in under 5 seconds while 2.4.0 takes
over 2 minutes.

Also note that the 2.4.0 connection did not get shut down correctly and
had to send an RST... though this is probably due to the client side
closing down the connection while there was still data on it. Both
machines were under approximately the same load.

It looks to me like there is an artificial delay in 2.4.0 which is
slowing down the traffic to unbearable levels. 


Jordan

22:00:39.625351 209.179.245.186.1092 > 64.124.41.179.: S 4155530:4155530(0) win 8192 <mss 536,nop,nop,sackOK> (DF)
22:00:39.625437 64.124.41.179. > 209.179.245.186.1092: S 1301092473:1301092473(0) ack 4155531 win 5840 <mss 1460,nop,nop,sackOK> (DF)
22:00:39.887133 209.179.245.186.1092 > 64.124.41.179.: . ack 1 win 8576 (DF)
22:00:39.887969 209.179.245.186.1092 > 64.124.41.179.: . ack 1 win 65280 (DF)
22:00:39.888951 209.179.245.186.1092 > 64.124.41.179.: P 1:44(43) ack 1 win 65280 (DF)
22:00:39.888964 64.124.41.179. > 209.179.245.186.1092: . ack 44 win 5840 (DF)
22:00:39.991515 64.124.41.179. > 209.179.245.186.1092: P 1:21(20) ack 44 win 5840 (DF)
22:00:39.991660 64.124.41.179. > 209.179.245.186.1092: P 21:557(536) ack 44 win 5840 (DF)
22:00:42.991490 64.124.41.179. > 209.179.245.186.1092: P 1:21(20) ack 44 win 5840 (DF)
22:00:43.180946 209.179.245.186.1092 > 64.124.41.179.: P 44:56(12) ack 21 win 65260 (DF)
22:00:43.180997 64.124.41.179. > 209.179.245.186.1092: P 21:557(536) ack 44 win 5840 (DF)
22:00:43.181025 64.124.41.179. > 209.179.245.186.1092: P 557:1093(536) ack 56 win 5840 (DF)
22:00:45.685143 209.179.245.186.1092 > 64.124.41.179.: P 44:456(412) ack 21 win 65260 (DF)
22:00:45.685204 64.124.41.179. > 209.179.245.186.1092: . ack 456 win 6432 <nop,nop,sack 1 {44:56}> (DF)
22:00:49.171046 64.124.41.179. > 209.179.245.186.1092: P 21:557(536) ack 456 win 6432 (DF)
22:00:49.470193 209.179.245.186.1092 > 64.124.41.179.: . ack 557 win 65280 (DF)
22:00:49.470233 64.124.41.179. > 209.179.245.186.1092: P 557:1093(536) ack 456 win 6432 (DF)
22:00:49.470248 64.124.41.179. > 209.179.245.186.1092: P 1093:1629(536) ack 456 win 6432 (DF)
22:01:01.461056 64.124.41.179. > 209.179.245.186.1092: P 557:1093(536) ack 456 win 6432 (DF)
22:01:01.755362 209.179.245.186.1092 > 64.124.41.179.: . ack 1093 win 65280 (DF)
22:01:01.755428 64.124.41.179. > 209.179.245.186.1092: P 1093:1629(536) ack 456 win 6432 (DF)
22:01:01.755451 64.124.41.179. > 209.179.245.186.1092: P 1629:1825(196) ack 456 win 6432 (DF)
22:01:25.751048 64.124.41.179. > 209.179.245.186.1092: P 1093:1629(536) ack 456 win 6432 (DF)
22:01:26.171932 209.179.245.186.1092 > 64.124.41.179.: . ack 1629 win 65280 (DF)
22:01:26.171979 64.124.41.179. > 209.179.245.186.1092: P 1629:1825(196) ack 456 win 6432 (DF)
22:02:14.171052 64.124.41.179. > 209.179.245.186.1092: P 1629:1825(196) ack 456 win 6432 (DF)
22:02:14.499920 209.179.245.186.1092 > 64.124.41.179.: . ack 1825 win 65084 (DF)
22:02:14.499944 64.124.41.179. > 209.179.245.186.1092: P 1825:1847(22) ack 456 win 6432 (DF)
22:02:16.168708 209.179.245.186.1092 > 64.124.41.179.: F 456:456(0) ack 1825 win 65084 (DF)
22:02:16.181061 64.124.41.179. > 209.179.245.186.1092: . ack 457 win 6432 (DF)
22:02:16.281724 64.124.41.179. > 209.179.245.186.1092: F 1847:1847(0) ack 457 win 6432 (DF)
22:02:16.477943 209.179.245.186.1092 > 64.124.41.179.: . ack 1825 win 65084 <nop,nop,sack 1 {1847:1848}> (DF)
22:03:50.491063 64.124.41.179. > 209.179.245.186.1092: P 1825:1847(22) ack 457 win 6432 (DF)
22:03:50.680141 209.179.245.186.1092 > 64.124.41.179.: R 4155987:4155987(0) win 0 (DF)


22:00:01.684927 209.179.245.186.1091 > 64.124.41.136.: S 4033171:4033171(0) win 8192 <mss 536,nop,nop,sackOK> (DF)
22:00:01.685021 64.124.41.136. > 209.179.245.186.1091: S 1261602556:1261602556(0) ack 4033172 win 32696 <mss 536,nop,nop,sackOK> (DF)
22:00:01.916120 209.179.245.186.1091 > 64.124.41.136.: . ack 1 win 8576 (DF)
22:00:01.916191 209.179.245.186.1091 > 64.124.41.136.: . ack 1 win 65280 (DF)
22:00:01.916981 209.179.245.186.1091 > 64.124.41.136.: P 1:44(43) ack 1 win 65280 (DF)
22:00:01.917032 64.124.41.136. > 209.179.245.186.1091: . ack 44 win 32696 (DF)
22:00:02.121143 64.124.4

Re: Poor TCP Performance 2.4.0-10 - Win98 SE PPP

2000-11-06 Thread Jordan Mendelson

"David S. Miller" wrote:
 
>   Date: Mon, 06 Nov 2000 21:20:39 -0800
>   From: Jordan Mendelson [EMAIL PROTECTED]
> 
>   It looks to me like there is an artificial delay in 2.4.0 which is
>   slowing down the traffic to unbearable levels.
> 
>  No, I think I see whats wrong, it's nothing more than packet drop.
> 
>  Looking at the equivalent 220 traces, the only difference appears to
>  be that the packets are not getting dropped.

I would like to note that the two machines the Windows client is
connecting to are sitting on the exact same switch, connected to the
same provider, and handling identical user loads.

 Alexey, do you have any other similar reports wrt. the new MSS
 advertisement scheme in 2.4.x?
 
 Jordan, you mentioned something about possibly being "bandwidth
 limited"?  Please, elaborate...

There is a possibility that we are hitting an upper-level bandwidth
limit between us and our upstream provider due to a misconfiguration on
the other end, but this should only happen during peak time (which it is
not right now). It just bugs me that 2.2.16 doesn't appear to have this
problem.


Jordan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Poor TCP Performance 2.4.0-10 - Win98 SE PPP

2000-11-06 Thread Jordan Mendelson

"David S. Miller" wrote:
 
>   Date: Mon, 06 Nov 2000 22:13:23 -0800
>   From: Jordan Mendelson [EMAIL PROTECTED]
> 
>   There is a possibility that we are hitting an upper level bandwidth
>   limit between us an our upstream provider due to a misconfiguration
>   on the other end, but this should only happen during peak time
>   (which it is not right now). It just bugs me that 2.2.16 doesn't
>   appear to have this problem.
> 
>  The only thing I can do now is beg for a tcpdump from the windows95
>  machine side.  Do you have the facilities necessary to obtain this?
>  This would prove that it is packet drop between the two systems, for
>  whatever reason, that is causing this.

Attached to this message are dumps from the Windows 98 machine (using
windump) and from the Linux 2.4.0-test10 machine. Sorry the timestamps
don't match up.


Jordan

23:36:15.252817 209.179.194.175.1084 > 64.124.41.179.: S 370996:370996(0) win 8192 <mss 536,nop,nop,sackOK> (DF)
23:36:15.252891 64.124.41.179. > 209.179.194.175.1084: S 3050526223:3050526223(0) ack 370997 win 5840 <mss 1460,nop,nop,sackOK> (DF)
23:36:16.159685 209.179.194.175.1084 > 64.124.41.179.: . ack 1 win 8576 (DF)
23:36:16.160461 209.179.194.175.1084 > 64.124.41.179.: . ack 1 win 65280 (DF)
23:36:16.160488 209.179.194.175.1084 > 64.124.41.179.: P 1:44(43) ack 1 win 65280 (DF)
23:36:16.160506 64.124.41.179. > 209.179.194.175.1084: . ack 44 win 5840 (DF)
23:36:16.261533 64.124.41.179. > 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 (DF)
23:36:16.261669 64.124.41.179. > 209.179.194.175.1084: P 21:557(536) ack 44 win 5840 (DF)
23:36:19.261055 64.124.41.179. > 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 (DF)
23:36:19.450762 209.179.194.175.1084 > 64.124.41.179.: P 44:56(12) ack 21 win 65260 (DF)
23:36:19.450788 64.124.41.179. > 209.179.194.175.1084: P 21:557(536) ack 44 win 5840 (DF)
23:36:19.450820 64.124.41.179. > 209.179.194.175.1084: P 557:1093(536) ack 56 win 5840 (DF)
23:36:22.281248 209.179.194.175.1084 > 64.124.41.179.: P 44:456(412) ack 21 win 65260 (DF)
23:36:22.281308 64.124.41.179. > 209.179.194.175.1084: . ack 456 win 6432 <nop,nop,sack 1 {44:56}> (DF)
23:36:25.441061 64.124.41.179. > 209.179.194.175.1084: P 21:557(536) ack 456 win 6432 (DF)
23:36:25.701796 209.179.194.175.1084 > 64.124.41.179.: . ack 557 win 65280 (DF)
23:36:25.701841 64.124.41.179. > 209.179.194.175.1084: P 557:1093(536) ack 456 win 6432 (DF)
23:36:25.701859 64.124.41.179. > 209.179.194.175.1084: P 1093:1629(536) ack 456 win 6432 (DF)
23:36:37.701091 64.124.41.179. > 209.179.194.175.1084: P 557:1093(536) ack 456 win 6432 (DF)
23:36:38.026766 209.179.194.175.1084 > 64.124.41.179.: . ack 1093 win 65280 (DF)
23:36:38.026826 64.124.41.179. > 209.179.194.175.1084: P 1093:1629(536) ack 456 win 6432 (DF)
23:36:38.026839 64.124.41.179. > 209.179.194.175.1084: P 1629:1847(218) ack 456 win 6432 (DF)
23:37:02.021068 64.124.41.179. > 209.179.194.175.1084: P 1093:1629(536) ack 456 win 6432 (DF)
23:37:02.328163 209.179.194.175.1084 > 64.124.41.179.: . ack 1629 win 65280 (DF)
23:37:02.328189 64.124.41.179. > 209.179.194.175.1084: P 1629:1847(218) ack 456 win 6432 (DF)
23:37:50.321057 64.124.41.179. > 209.179.194.175.1084: P 1629:1847(218) ack 456 win 6432 (DF)
23:37:50.673000 209.179.194.175.1084 > 64.124.41.179.: . ack 1847 win 65062 (DF)
23:37:50.673068 64.124.41.179. > 209.179.194.175.1084: P 1847:1868(21) ack 456 win 6432 (DF)
23:38:00.162380 209.179.194.175.1084 > 64.124.41.179.: F 456:456(0) ack 1847 win 65062 (DF)
23:38:00.181055 64.124.41.179. > 209.179.194.175.1084: . ack 457 win 6432 (DF)
23:38:00.187291 64.124.41.179. > 209.179.194.175.1084: F 1868:1868(0) ack 457 win 6432 (DF)
23:38:00.363357 209.179.194.175.1084 > 64.124.41.179.: . ack 1847 win 65062 <nop,nop,sack 1 {1868:1869}> (DF)
23:39:26.671050 64.124.41.179. > 209.179.194.175.1084: P 1847:1868(21) ack 457 win 6432 (DF)
23:39:26.886417 209.179.194.175.1084 > 64.124.41.179.: R 371453:371453(0) win 0 (DF)


22:34:34.884487 arp who-has 64.124.41.179 tell 209.179.194.175
22:34:34.889477 209.179.194.175.1084 > 64.124.41.179.: S 370996:370996(0) win 8192 <mss 536,nop,nop,sackOK> (DF)
22:34:35.669892 64.124.41.179. > 209.179.194.175.1084: S 3050526223:3050526223(0) ack 370997 win 5840 <mss 1460,nop,nop,sackOK> (DF)
22:34:35.670624 209.179.194.175.1084 > 64.124.41.179.: . ack 1 win 8576 (DF)
22:34:35.670653 209.179.194.175.1084 > 64.124.41.179.: . ack 1 win 65280 (DF)
22:34:35.674484 209.179.194.175.1084 > 64.124.41.179.: P 1:44(43) ack 1 win 65280 (DF)
22:34:36.049808 64.124.41.179. > 209.179.194.175.1084: . ack 44 win 5840 (DF)
22:34:36.069773 64.124.41.179. > 209.179.194.175.1084: P 1:19(18) ack 44 win 5840 (DF)
22:34:36.069837 64.124.41.179. > 209.179.194.175.1084: P 19:553(534) ack 44 win 5840 (DF)
22:34:39.049788 64.124.41.179. > 209.179.194.175.1084: P 1:21(20) ack 44 win 5840

Re: Poor TCP Performance 2.4.0-10 - Win98 SE PPP

2000-11-06 Thread Jordan Mendelson

"David S. Miller" wrote:
 
>   Date: Mon, 06 Nov 2000 22:44:00 -0800
>   From: Jordan Mendelson [EMAIL PROTECTED]
> 
>   Attached to this message are dumps from the windows 98 machine using
>   windump and the linux 2.4.0-test10. Sorry the time stamps don't match
>   up.
> 
>  (ie. Linux sends bytes 1:21 both the first time, and when it
>   retransmits that data.  However win98 "sees" this as 1:19 the first
>   time and 1:21 during the retransmit by Linux)
> 
>  That is bogus.  Something is mangling the packets between the Linux
>  machine and the win98 machine.  You mentioned something about
>  bandwidth limiting at your upstream provider, any chance you can have
>  them turn this bandwidth limiting device off?

It turns out that the bandwidth problem was actually fixed yesterday, so
that cannot be the problem here, and yes, 64.124.41.179 is a Linux box.
:)

>  Or maybe earthlink is using some packet mangling device?
> 
>  It is clear though, that something is messing with or corrupting the
>  packets.  One thing you might try is turning off TCP header
>  compression for the PPP link, does this make a difference?

Actually, there have been several reports that turning off header
compression does help.


Jordan



Re: Poor TCP Performance 2.4.0-10 - Win98 SE PPP

2000-11-06 Thread Jordan Mendelson

"David S. Miller" wrote:
 
>   Date: Mon, 06 Nov 2000 23:16:21 -0800
>   From: Jordan Mendelson [EMAIL PROTECTED]
> 
>   "David S. Miller" wrote:
>    It is clear though, that something is messing with or corrupting the
>    packets.  One thing you might try is turning off TCP header
>    compression for the PPP link, does this make a difference?
> 
>   Actually, there has been several reports that turning header
>   compression does help.
> 
>  If this is what is causing the TCP sequence numbers to change
>  then either Win98's or Earthlink terminal server's implementation
>  of TCP header compression is buggy.
> 
>  Assuming this is true, it explains why Win98's TCP does not "see" the
>  data sent by Linux, because such a bug would make the TCP checksum of
>  these packets incorrect and thus dropped by Win98's TCP.

Ok, but why doesn't 2.2.16 exhibit this behavior?

We've had reports from quite a number of people complaining about this
and I'm fairly certain not all of them are from Earthlink.


Jordan



0-order allocation failed / Fragmentation Bug? (2.4.0-test10)

2000-11-03 Thread Jordan Mendelson


I've been receiving these error messages during times of near-complete
memory depletion. This particular machine runs a bare minimum of
processes plus our own application, a threaded, long-running process
(uptime 1 day, 5:39) which consumes most of the resources on the
machine. Oddly enough, however, mallinfo() for this process shows a
discrepancy of 650 megs compared with ps and top.

This process handles a large number of TCP connections and does a lot of
dynamic memory allocation, so I assumed the difference was due to memory
fragmentation on our part; however, I thought that kswapd would reclaim
memory once it started swapping it out.

Another oddity is that the bogomips reported by each CPU are somewhat
different from each other.

The message being received is:

Nov  3 10:04:30 n175 kernel: __alloc_pages: 0-order allocation failed. 
Nov  3 10:04:30 n175 last message repeated 363 times

The kernel version:

Linux version 2.4.0-test10 (root@stp) (gcc version egcs-2.91.66
19990314/Linux (egcs-1.1.2 release)) #1 SMP Tue Oct 31 13:13:05 PST 2000


What free reports:

             total       used       free     shared    buffers     cached
Mem:       1028256    1024172       4084          0        148      59296
-/+ buffers/cache:                964728      63528
Swap:       136512      75588      60924


ps and top report that the process taking up all this memory has an RSS
of 967656 KB and a VSIZE of 1005892, but mallinfo() on the process shows
a completely different number:

Memory statistics from mallinfo:
Total space allocated from system: 361406208
Number of non-inuse chunks: 1079273
Number of mmapped regions: 0
Total space in mmapped regions: 0
Total allocated space: 235536032
Total non-inuse space: 125870176
Top-most, releasable (via malloc_trim) space: 68776

All memory in this process is allocated via new or malloc (new calls
malloc though) and the numbers mallinfo() and ps report are 99.5%
accurate up until a sort of "slide" period where they diverge fairly
quickly.

$ cat /proc/slabinfo
slabinfo - version: 1.1 (SMP)
kmem_cache            68     68    232    4    4    1 :  252  126
nfs_read_data          0      0    352    0    0    1 :  124   62
nfs_write_data         0      0    384    0    0    1 :  124   62
nfs_page               0      0     96    0    0    1 :  252  126
nfs_fh                80     80     96    2    2    1 :  252  126
tcp_tw_bucket         39     80     96    2    2    1 :  252  126
tcp_bind_bucket       33    452     32    4    4    1 :  252  126
tcp_open_request     403    413     64    7    7    1 :  252  126
inet_peer_cache        1     59     64    1    1    1 :  252  126
ip_fib_hash           11    113     32    1    1    1 :  252  126
ip_dst_cache       15813  23856    160  994  994    1 :  252  126
arp_cache             46     90    128    3    3    1 :  252  126
blkdev_requests      768    800     96   20   20    1 :  252  126
dnotify cache          0      0     20    0    0    1 :  252  126
file lock cache        0      0     92    0    0    1 :  252  126
fasync cache           0      0     16    0    0    1 :  252  126
uid_cache              3    226     32    2    2    1 :  252  126
skbuff_head_cache   2840  12984    160  541  541    1 :  252  126
sock                7757  10310    800 2062 2062    1 :  124   62
inode_cache         7616  11000    384 1100 1100    1 :  124   62
bdev_cache             7    118     64    2    2    1 :  252  126
sigqueue              58     58    132    2    2    1 :  252  126
kiobuf                 0      0    128    0    0    1 :  252  126
dentry_cache        7730  12420    128  414  414    1 :  252  126
filp               11758  11800     96  295  295    1 :  252  126
names_cache            2      2   4096    2    2    1 :   60   30
buffer_head        15200  25520     96  637  638    1 :  252  126
mm_struct             72     72    160    3    3    1 :  252  126
vm_area_struct      1048   1062     64   18   18    1 :  252  126
fs_cache             118    118     64    2    2    1 :  252  126
files_cache           27     27    416    3    3    1 :  124   62
signal_act            24     24   1312    8    8    1 :   60   30
size-131072(DMA)       0      0 131072    0    0   32 :    0    0
size-131072            0      0 131072    0    0   32 :    0    0
size-65536(DMA)        0      0  65536    0    0   16 :    0    0
size-65536             0      0  65536    0    0   16 :    0    0
size-32768(DMA)        0      0  32768    0    0    8 :    0    0
size-32768             0      0  32768    0    0    8 :    0    0
size-16384(DMA)        0      0  16384    0    0    4 :    0    0
size-16384             0      0  16384    0    0    4 :    0    0
size-8192(DMA)         0      0   8192    0    0    2 :    0    0
size-8192              3      3   8192    3    3    2 :    0    0
size-4096(DMA)         0      0   4096    0    0    1 :   60   30
size-4096             12     12   4096   12   12    1 :   60   30
size-2048(DMA)         0      0   2048    0    0    1 :


Re: Linux's implementation of poll() not scalable?

2000-10-23 Thread Jordan Mendelson

Linus Torvalds wrote:
> 
> On Tue, 24 Oct 2000, Andi Kleen wrote:
> >
> > I don't see the problem. You have the poll table allocated in the kernel,
> > the drivers directly change it and the user mmaps it (I was not proposing
> > to let poll make a kiobuf out of the passed array)

> Th eproblem with poll() as-is is that the user doesn't really tell the
> kernel explictly when it is changing the table..

What you describe is exactly what the /dev/poll interface patch from the
Linux scalability project does.

It creates a special device which you can open up and write
add/remove/modify entries you wish to be notified of using the standard
struct pollfd. Removing entries is done by setting the events in a
struct written to the device to POLLREMOVE.

You can optionally mmap() memory which the notifications are written to.
Two ioctl() calls are provided: one for the initial allocation and one
to force it to check all items in your poll() list.

Solaris has this same interface minus the mmap()'ed memory.


Jordan



Re: Linux's implementation of poll() not scalable?

2000-10-23 Thread Jordan Mendelson

Dan Kegel wrote:
> 
> Jordan Mendelson ([EMAIL PROTECTED]) wrote:
> > An implementation of /dev/poll for Linux already exists and has shown to
> > be more scalable than using RT signals under my tests. A patch for 2.2.x
> > and 2.4.x should be available at the Linux Scalability Project @
> > http://www.citi.umich.edu/projects/linux-scalability/ in the patches
> > section.
> 
> If you'll look at the page I linked to in my original post,
>   http://www.kegel.com/dkftpbench/Poller_bench.html
> you'll see that I also benchmarked /dev/poll.

The Linux /dev/poll implementation has a few "non-standard" features
such as the ability to mmap() the poll structure memory to eliminate a
memory copy.

int dpoll_fd;
void *mmap_dpoll;
struct pollfd *dpoll;

dpoll_fd = open("/dev/poll", O_RDWR, 0);
ioctl(dpoll_fd, DP_ALLOC, 1);
mmap_dpoll = mmap(0, DP_MMAP_SIZE(1), PROT_WRITE|PROT_READ,
  MAP_SHARED, dpoll_fd, 0);

dpoll = (struct pollfd *)mmap_dpoll;

Read notifications from this memory and use write() to add/remove
entries, and see if you get any boost in performance. Also, I believe
there is a hash table associated with /dev/poll in the kernel patch
which might slow down your performance tests while it is first growing
to resize itself.


Jordan



Re: Linux's implementation of poll() not scalable?

2000-10-23 Thread Jordan Mendelson

Dan Kegel wrote:
> 
> Linus Torvalds wrote:
> > Dan Kegel wrote:
> >> [ http://www.kegel.com/dkftpbench/Poller_bench.html ]
> >> [ With only one active fd and N idle ones, poll's execution time scales
> >> [ as 6N on Solaris, but as 300N on Linux. ]
> >
> > Basically, poll() is _fundamentally_ a O(n) interface. There is no way
> > to avoid it - you have an array, and there simply is _no_ known
> > algorithm to scan an array in faster than O(n) time. Sorry.
> > ...
> > Under Linux, I'm personally more worried about the performance of X etc,
> > and small poll()'s are actually common. So I would argue that the
> > Solaris scalability is going the wrong way. But as performance really
> > depends on the load, and maybe that 1 entry load is what you
> > consider "real life", you are of course free to disagree (and you'd be
> > equally right ;)

> The way I'm implementing RT signal support is by writing a userspace
> wrapper to make it look like an OO version of poll(), more or less,
> with an 'add(int fd)' method so the wrapper manages the arrays of pollfd's.
> When and if I get that working, I may move it into the kernel as an
> implementation of /dev/poll -- and then I won't need to worry about
> the RT signal queue overflowing anymore, and I won't care how scalable
> poll() is.

An implementation of /dev/poll for Linux already exists and has been
shown to be more scalable than using RT signals in my tests. A patch for
2.2.x and 2.4.x should be available at the Linux Scalability Project @
http://www.citi.umich.edu/projects/linux-scalability/ in the patches
section.

It works fairly well, but I was actually somewhat disappointed to find
that it wasn't the primary cause for the system CPU suckage for my
particular system. Granted, when you only have to poll a few times per
second, the overhead of standard poll() just isn't that bad.


Jordan



Re: TCP: peer x.x.x.x:y/z shrinks window a:b:c...

2000-10-19 Thread Jordan Mendelson

[EMAIL PROTECTED] wrote:
> 
> Hello!
> 
> > I'll keep looking.
> 
> Is it easy to reproduce? If so, try to make tcpdump, which
> covers one of these messages.

It's extremely rare. We maintain persistent connections open for long
periods of time, and even when a user who triggered it is online, it
only triggers the message a maximum of 26 times (typically ~4); given
the traffic volume we handle, it is not practical for me to log all
traffic.

Of the IPs which triggered the response and were online at the time,
every single one of them has either not had any ports open, been
firewalled, or had nmap unable to guess correctly, with the single
exception of a machine which nmap said was "Windows NT4 / Win95 / Win98,
Windows NT 4 SP3, Windows NT 4.0 Server SP5 + 2047 Hotfixes." and which
had port 1500/tcp (vlsi-lm) open.

However, during the scan, nmap reported that the remote server was
sending RSTs from port 1500. One thing I did notice is that most of the
machines I could ping that triggered this message were extremely lagged
(ping times of 800+ ms).

I'll keep trying though.


Jordan



Re: TCP: peer x.x.x.x:y/z shrinks window a:b:c...

2000-10-18 Thread Jordan Mendelson

"David S. Miller" wrote:
> 
> The IP addresses are important because we can use them to find out
> what TCP implementations shrink their offered windows.
> 
> Actually, you don't need to tell me or anyone else what these IP
> addresses are, you can instead run one of the "remote OS identifier"
> programs out there to those sites and just let me know what OS those
> systems are running :-)


All of the IPs which were reported appeared to be firewalled, making a
direct scan impossible. The hops above them were almost always reported
as Cisco terminal servers running IOS 11.2, and in one case as a Cisco
router/switch running IOS 11.2.

I'll keep looking.


Jordan



TCP: peer x.x.x.x:y/z shrinks window a:b:c...

2000-10-17 Thread Jordan Mendelson


I've begun testing 2.4.0 kernels on some high-traffic machines to see
what kind of difference they make. I have seen a number of these error
messages in dmesg; although they don't seem to happen very often and
appear harmless, I figured I'd report them anyway. They show up in
groups, mostly from the same IP, and only in very small numbers (3-15).

IPs have been changed to protect the innocent:

TCP: peer x.x.x.x:1268/ shrinks window 2604660027:635:2604661487.
Bad, what else can I say?
TCP: peer x.x.x.x:1268/ shrinks window 2604662947:121:2604664407.
Bad, what else can I say?
TCP: peer x.x.x.x:1268/ shrinks window 2604665867:635:2604667327.
Bad, what else can I say?

and another:

TCP: peer y.y.y.y:1125/ shrinks window 548103043:635:548104503. Bad,
what else can I say?
TCP: peer y.y.y.y:1125/ shrinks window 548119685:620:548121145. Bad,
what else can I say?
TCP: peer y.y.y.y:1125/ shrinks window 548122605:635:548124065. Bad,
what else can I say?
TCP: peer y.y.y.y:1125/ shrinks window 548125525:635:548126985. Bad,
what else can I say?
TCP: peer y.y.y.y:1125/ shrinks window 548128445:635:548129905. Bad,
what else can I say?
TCP: peer y.y.y.y:1125/ shrinks window 548143150:635:548144610. Bad,
what else can I say?
TCP: peer y.y.y.y:1125/ shrinks window 548146070:635:548146913. Bad,
what else can I say?
TCP: peer y.y.y.y:1125/ shrinks window 548156080:635:548157540. Bad,
what else can I say?
TCP: peer y.y.y.y:1125/ shrinks window 548163715:635:548165175. Bad,
what else can I say?
TCP: peer y.y.y.y:1125/ shrinks window 548175395:635:548176311. Bad,
what else can I say?
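For what it's worth, the three numbers in each message can be read as sequence-space values. The sketch below (plain Python, not kernel code) shows the arithmetic under the assumption, which the log itself does not confirm, that they are snd_una (oldest unacknowledged sequence), the window the peer just advertised, and snd_nxt (next sequence to send):

```python
# Assumed log format: "shrinks window a:b:c" where a = snd_una,
# b = newly advertised window, c = snd_nxt. The kernel complains when
# data already in flight (up to snd_nxt) falls beyond the peer's new
# right edge (snd_una + window), i.e. the peer took back window space
# it had previously granted.

def window_shrunk(snd_una, new_wnd, snd_nxt):
    """True if the advertised window no longer covers in-flight data."""
    return snd_una + new_wnd < snd_nxt

# First triple from the report above:
print(window_shrunk(2604660027, 635, 2604661487))  # True: 635 bytes of
                                                   # window vs. more in flight
# The span c - a is exactly one MSS-sized segment:
print(2604661487 - 2604660027)  # 1460
```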



Jordan




Odd Xircom Realport Tulip Behavior

2000-08-31 Thread Jordan Mendelson

[this message was previously cc'ed to tulip-bug]

It seems that my Xircom RealPort refuses to work correctly when first
initialized. I'm running Linux 2.4.0-test7 with the standard
xircom_tulip_cb driver. I can get the Xircom to work just fine, but I
always seem to need to go through a song and dance to do so.

When the driver is first initialized, configuring eth0 with an IP via
ifconfig and adding the default route does not work. To get the
RealPort working, I have to:

# ifconfig eth0 a.b.c.d netmask x.x.x.x
# ifconfig eth0 down
# ifconfig eth0 up
# route add default gw l.m.n.o

As soon as I bring it back up for the second time, it mysteriously
starts working correctly.
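The four-command dance above can be wrapped in a small script. This is a hypothetical helper (the `bounce_iface` name is mine, and the addresses stay as the placeholders from the report); it needs root and the net-tools ifconfig/route binaries:

```shell
#!/bin/sh
# Hypothetical wrapper for the workaround above: configure the
# interface, bounce it once, then add the default route.
bounce_iface() {
    iface="$1"; addr="$2"; mask="$3"; gw="$4"
    ifconfig "$iface" "$addr" netmask "$mask" &&
    ifconfig "$iface" down &&
    ifconfig "$iface" up &&
    route add default gw "$gw"
}
# Example invocation (placeholder values, as in the original report):
# bounce_iface eth0 a.b.c.d 255.255.255.0 l.m.n.o
```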

Here are the relevant kernel messages on boot:

cs: IO port probe 0x0c00-0x0cff: clean.
cs: IO port probe 0x0800-0x08ff: clean.
cs: IO port probe 0x0100-0x04ff: excluding 0x220-0x22f 0x330-0x337
0x388-0x38f 0x398-0x39f 0x4d0-0x4d7
cs: IO port probe 0x0a00-0x0aff: clean.
tulip_attach(04:00.0)
PCI: Setting latency timer of device 04:00.0 to 64
xircom_tulip_cb.c:v0.91 4/14/99 [EMAIL PROTECTED] (modified by
[EMAIL PROTECTED] for XIRCOM CBE, fixed by Doug Ledford)
eth0: Xircom Cardbus Adapter (DEC 21143 compatible mode) rev 3 at
0x1c00, 00:10:A4:EB:58:4C, IRQ 9.
eth0:  MII transceiver #0 config 3100 status 7809 advertising 01e1.

Jordy


