Re: Abysmal RECV network performance

2001-05-31 Thread Stephen Degler

Hi,

I'm guessing that the tulip driver is not setting the chip up correctly.
I've seen this happen with other tulip variants (21143) when it tries to
autonegotiate.  If you do an ifconfig eth1 you will see numerous carrier
and CRC errors.

Set the tulip_debug flag to 2 or 3 in /etc/modules.conf and see what
gets said.
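
For example, something like this in /etc/modules.conf (assuming the driver
is loaded as the "tulip" module; the exact value is only illustrative):

    options tulip tulip_debug=3

Then reload the module and watch dmesg or /var/log/messages while the link
comes up.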

A newer version of the driver may help you.  You might try the one on
SourceForge.

Also, I've only ever seen full 100BaseT speeds with decent adapters,
like 21143-based tulips, Intel EEPro cards, and Vortex/Boomerang 3Com cards.
A lot of the cheaper controllers just won't get there.

skd

On Mon, May 28, 2001 at 03:47:22AM +0000, John William wrote:
> Can someone please help me troubleshoot this problem - I am getting abysmal 
> (see numbers below) network performance on my system, but the poor 
> performance seems limited to receiving data. Transmission is OK.
> 
> The computer in question is a dual Pentium 90 machine. The machine has 
> RedHat 7.0 (kernel 2.2.16-22 from RedHat). I have compiled 2.2.19 (stock) 
> and 2.4.3 (stock) for the machine and used those for testing. I had a 
> NetGear FA310TX card that I used with the "tulip" driver and a 3Com 3CSOHO 
> card (Hurricane chipset) that I used with the "3c59x" driver. I used the 
> netperf package to test performance (latest version, but I don't have the 
> version number off-hand). The numbers netperf is giving me seem to correlate 
> well to FTP statistics I see to the box.
> 
> I have a second machine (P2-350) with a NetGear FA311 (running 2.4.3 and the 
> "natsemi" driver) that I used to talk with the Pentium 90 machine. The two 
> machines are connected through a NetGear FS105 10/100 switch. I also tried 
> using a 10BT hub (see below).
> 
> When connected, the switch indicated 100 Mbps, full duplex connections to 
> both cards. This matches the speed indicator lights on both cards. I have 
> run the mii-diag program in the past to verify that the cards are actually 
> set to full duplex, but I didn't run it again this time (this isn't the 
> first time I have tried to chase this problem down).
> 
> For the purposes of this message, call the P2-350 machine "A" and the dual 
> P-90 machine "B". I ran the following tests:
> 
> Machine "A" to localhost  754.74  Mbps
> 
> Kernel 2.2.19SMP
> Machine "B" to localhost  80.63   Mbps
> Machine "B" to "A" (tulip)55.38   Mbps
> Machine "A" to "B" (tulip)10.60   Mbps
> Machine "A" to "B" (3c95x)12.10   Mbps
> 
> Kernel 2.4.3 SMP
> Machine "B" to localhost  83.87   Mbps
> Machine "B" to "A" (tulip)68.07   Mbps
> Machine "A" to "B" (tulip)1.62Mbps
> Machine "A" to "B" (3c95x)2.37Mbps
> 
> Kernel 2.2.16-22 (RedHat kernel)
> Machine "B" to localhost  92.29   Mbps
> Machine "B" to "A" (tulip)57.34   Mbps
> Machine "A" to "B" (tulip)9.98Mbps
> Machine "A" to "B" (3c95x)9.05Mbps
> 
> Now, with both "A" and "B" plugged into a 10BT hub:
> 
> Kernel 2.2.19SMP
> Machine "B" to "A" (tulip)6.96Mbps
> Machine "A" to "B" (tulip)6.89Mbps
> 
> At the end of the runs, I do not see any messages in syslog that would 
> indicate a problem. Using the switch, there were no collisions but looking 
> at /sbin/ifconfig there were a lot of "Frame:" errors on receive. "A lot" 
> means ~30% of the total packets received. This happened with both cards and 
> all kernels.
> 
> The conclusions I draw from this data are:
> 
> 1) Both machines connecting to localhost (data not going out over the wire) 
> give reasonable numbers and are considerably above what I actually see going 
> over the network (as would be expected).
> 2) The P-90 machine seems to have good transmit speed over both cards and 
> all kernels. Transmit performance is close to the localhost numbers, so I 
> can believe them. In the past, I have compared the performance of the FA310 
> to the 3ComSOHO card and there did not seem to be any measurable performance 
> difference between the two.
> 3) Both the FA310 and the 3ComSOHO card have similar receive speeds, leading 
> me to believe that the problem lies with either the machine or the kernel 
> and not the individual cards or drivers.
> 4) Booting the machine as a uni-processor machine (with a non-SMP 2.2.16 
> kernel) did not change anything, so it does not appear to be a problem with 
> SMP.
> 5) Kernel 2.4.3 receive performance is significantly lower than either 2.2.x 
> kernel, so that tends to point to some fundamental problem in the kernel.
> 6) As I understand it, the 3Com card has some hardware acceleration for 
> checksumming, and this is a slow machine, so why is the performance almost 
> identical to the FA310?
> 
> So, my questions are:
> 
> What kind of performance should I be seeing with a P-90 on a 100Mbps 
> connection? I was expecting something in the range of 40-70 Mbps - certainly 
> not 1-2 Mbps.
> 
> What can I do to track this problem down? Has anyone else had problems like 
> this?
> 
> Thanks in advance for any help you can offer.
> 
> - John

Re: Abysmal RECV network performance

2001-05-31 Thread Ben Greear

John William wrote:
> 
> >Depends on what is driving it...  An application I built can only push
> >about
> >80 Mbps bi-directional on PII 550MHz machines.  It is not the most
> >efficient program in
> >the world, but it isn't too bad either...
> >
> >I missed the rest of this thread, so maybe you already mentioned it, but
> >what is the bottleneck?  Is your CPU running at 100%?
> >
> >Greatly increasing the buffers both in the drivers and in the sockets
> >does wonders for higher-speed connections, btw.
> >
> >Ben
> 
> I don't know what the bottleneck is. What I'm seeing is ~60Mbps transmit
> speed and anywhere from 1 to 12Mbps receive speed on a couple of 10/100 cards
> using the 2.2.16, 2.2.19 and 2.4.3 kernels.
> 
> I have tried increasing the size of the RX ring buffer and it did not seem
> to make any difference. It appears that there is some sort of overrun or
> other problem. There is a significant slowdown between the 2.2.x and 2.4.x
> kernels.
> 
> However, just tonight, while really hammering on the system, I started to
> get some messages like "eth1: Oversized Ethernet frame spanned multiple
> buffers, status 7fff8301!". Any ideas what could be causing that?

Nope, I'd take it up with the driver developers.  For what it's worth,
the Intel EtherExpress Pro cards are the only ones I've found yet that
really work right at high speeds.  Intel's e100 driver seems to work really
well for me, but the eepro driver also works well with most versions of
the eepro cards I've used...

I have had definite problems with the natsemi (locked up), tulip (won't
autonegotiate multi-port cards correctly, or something), and rtl8139 (would
lock up, haven't tried recent drivers though).

I used to assume that Linux had the best/fastest networking support around,
but the reality is that I've had a really hard time finding hardware/drivers
that work at high speeds (60Mbps+, bi-directional).

-- 
Ben Greear <[EMAIL PROTECTED]>  <[EMAIL PROTECTED]>
President of Candela Technologies Inc  http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com http://scry.wanfear.com/~greear



Re: Abysmal RECV network performance

2001-05-31 Thread John William

>Depends on what is driving it...  An application I built can only push 
>about
>80 Mbps bi-directional on PII 550MHz machines.  It is not the most 
>efficient program in
>the world, but it isn't too bad either...
>
>I missed the rest of this thread, so maybe you already mentioned it, but
>what is the bottleneck?  Is your CPU running at 100%?
>
>Greatly increasing the buffers both in the drivers and in the sockets
>does wonders for higher-speed connections, btw.
>
>Ben

I don't know what the bottleneck is. What I'm seeing is ~60Mbps transmit 
speed and anywhere from 1 to 12Mbps receive speed on a couple of 10/100 cards 
using the 2.2.16, 2.2.19 and 2.4.3 kernels.

I have tried increasing the size of the RX ring buffer and it did not seem 
to make any difference. It appears that there is some sort of overrun or 
other problem. There is a significant slowdown between the 2.2.x and 2.4.x 
kernels.

However, just tonight, while really hammering on the system, I started to 
get some messages like "eth1: Oversized Ethernet frame spanned multiple 
buffers, status 7fff8301!". Any ideas what could be causing that?

- John




Re: Abysmal RECV network performance

2001-05-31 Thread Ben Greear

John William wrote:
> 
> >I've seen many reports like this where the NIC is invalidly in
> >full-duplex mode while the router is in half-duplex mode.
> 
> [root@copper diag]# ./tulip-diag eth1 -m
> tulip-diag.c:v2.08 5/15/2001 Donald Becker ([EMAIL PROTECTED])
> http://www.scyld.com/diag/index.html
> Index #1: Found a Lite-On 82c168 PNIC adapter at 0xfe00.
> Port selection is MII, full-duplex.
> Transmit started, Receive started, full-duplex.
>   The Rx process state is 'Waiting for packets'.
>   The Tx process state is 'Idle'.
>   The transmit threshold is 512.
> MII PHY found at address 1, status 0x782d.
> MII PHY #1 transceiver registers:
>1000 782d 7810  01e1 41e1 0001 
>       
>  4000  38c8 0010  0002
>0001       .
> [root@copper diag]# ./mii-diag eth1
> Basic registers of MII PHY #1:  1000 782d 7810  01e1 41e1 0001 .
> The autonegotiated capability is 01e0.
> The autonegotiated media type is 100baseTx-FD.
> Basic mode control register 0x1000: Auto-negotiation enabled.
> You have link beat, and everything is working OK.
> Your link partner advertised 41e1: 100baseTx-FD 100baseTx 10baseT-FD
> 10baseT.
>  End of basic transceiver information.
> 
> On the NetGear switch, I have indicator lights for 100baseT-FD on both
> connections used for testing. So it appears to me that everything is working
> correctly (hardware).
> 
> I keep coming back to a problem with the kernel, or that somehow I have two
> cards (FA310 and 3CSOHO) defective in almost exactly the same way, but only
> on receive. If it were a hardware problem, why would I only get poor
> performance in one direction and not both?
> 
> Does anyone have network performance numbers for a comparable machine (P-90
> class)? I'm thinking I should expect 50-70Mbps on a PCI 10/100 ethernet card
> from a P-90 class machine, right?

Depends on what is driving it...  An application I built can only push about
80 Mbps bi-directional on PII 550MHz machines.  It is not the most efficient program in
the world, but it isn't too bad either...

I missed the rest of this thread, so maybe you already mentioned it, but
what is the bottleneck?  Is your CPU running at 100%?

Greatly increasing the buffers both in the drivers and in the sockets
does wonders for higher-speed connections, btw.
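
(For reference: on 2.2/2.4 kernels the global socket-buffer ceilings can be
raised through the net.core sysctls; the values below are only illustrative:

    echo 262144 > /proc/sys/net/core/rmem_max
    echo 262144 > /proc/sys/net/core/wmem_max

The driver-side ring sizes are compile-time constants in most of these
drivers, e.g. RX_RING_SIZE in tulip.c, so changing those means rebuilding
the module.)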

Ben

-- 
Ben Greear <[EMAIL PROTECTED]>  <[EMAIL PROTECTED]>
President of Candela Technologies Inc  http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com http://scry.wanfear.com/~greear



Re: Abysmal RECV network performance

2001-05-31 Thread John William

>I've seen many reports like this where the NIC is invalidly in
>full-duplex mode while the router is in half-duplex mode.

[root@copper diag]# ./tulip-diag eth1 -m
tulip-diag.c:v2.08 5/15/2001 Donald Becker ([EMAIL PROTECTED])
http://www.scyld.com/diag/index.html
Index #1: Found a Lite-On 82c168 PNIC adapter at 0xfe00.
Port selection is MII, full-duplex.
Transmit started, Receive started, full-duplex.
  The Rx process state is 'Waiting for packets'.
  The Tx process state is 'Idle'.
  The transmit threshold is 512.
MII PHY found at address 1, status 0x782d.
MII PHY #1 transceiver registers:
   1000 782d 7810  01e1 41e1 0001 
          
     4000  38c8 0010  0002
   0001       .
[root@copper diag]# ./mii-diag eth1
Basic registers of MII PHY #1:  1000 782d 7810  01e1 41e1 0001 .
The autonegotiated capability is 01e0.
The autonegotiated media type is 100baseTx-FD.
Basic mode control register 0x1000: Auto-negotiation enabled.
You have link beat, and everything is working OK.
Your link partner advertised 41e1: 100baseTx-FD 100baseTx 10baseT-FD 
10baseT.
   End of basic transceiver information.

On the NetGear switch, I have indicator lights for 100baseT-FD on both 
connections used for testing. So it appears to me that everything is working 
correctly (hardware).

I keep coming back to a problem with the kernel, or that somehow I have two 
cards (FA310 and 3CSOHO) defective in almost exactly the same way, but only 
on receive. If it were a hardware problem, why would I only get poor 
performance in one direction and not both?

Does anyone have network performance numbers for a comparable machine (P-90 
class)? I'm thinking I should expect 50-70Mbps on a PCI 10/100 ethernet card 
from a P-90 class machine, right?

- John




Re: Abysmal RECV network performance

2001-05-31 Thread Nivedita Singhvi

> >the Netgear FA311/2 (tulip). Found that the link lost
> >connectivity because of card lockups and transmit timeout
> >failures - and some of these were silent. However, I moved
> >to the 3C905C (3c59x driver) which behaved like a champ, and

> I'm a little confused here - do you mean the FA310TX ("tulip" driver) or the 
> FA311/2 ("natsemi" driver)? I have not had any connection problems with 
> either the FA310 or the FA311 cards. I haven't noticed any speed problems 
> with the FA311 card, but I haven't benchmarked it, either. The FA310 is so 
> horribly slow, I couldn't help but notice. Unfortunately, the same is true 
> of the 3cSOHO.

Sorry, I meant to describe both (natsemi and tulip, but the latter
on an older DEC chip). 

> I looked at tcpdump to try and figure it out, and it appeared that the P-90 
> was taking a very long time to ACK some packets. I am not a TCP/IP guru by 
> any stretch, but my guess at the time was that the packets that were taking 
> forever to get ACK'ed were the ones causing a framing error on the P-90, but 
> again, I'm not an expert.

> The only unusual stat is the framing errors. There are a lot of them under 
> heavy receive load. The machine will go for weeks without a single framing 
> error, but if I blast some netperf action at it (or FTP send to it, etc.) 
> then I get about 1/3 of the incoming packets (to the P-90) with framing 
> errors. I see no other errors at all except a TX overrun error (maybe 1 in 
> 10 packets).

I tried to reproduce this problem last night on my machines
at home (kernel 2.4.4, 500MHz K7/400MHz K6). Just doing FTP
and netperf tests, I didn't see any significant variation between
the rcv and tx sides. Admittedly different machines, and between
a 3C905C and a FA310TX (tulip). However, if the problem
were purely kernel protocol under load, it should have shown up.

Also, I am not seeing significant frame errors - 1 in 10K,
definitely not seeing anything remotely like 30%. If 1/3
of your packets are being dropped with frame errors, you'll see
lots of retransmissions and horrible performance, no question. But
I would expect frame errors to be due to things like the
speed not being negotiated correctly(?), or the board not
sitting quite right (true - that's the only experience
I remember of the recv code path being error prone compared
to tx), but that should affect all the kernel versions you
ran on that host...

I am pretty clueless about media level issues, but it would
help to identify what's causing the framing errors.

Not much help, I know...

thanks,
Nivedita

---
Nivedita Singhvi        (503) 578-4580
Linux Technology Center [EMAIL PROTECTED]
IBM Beaverton, OR   [EMAIL PROTECTED]



Re: Abysmal RECV network performance

2001-05-29 Thread John William

>From: Nivedita Singhvi <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]
>CC: [EMAIL PROTECTED]
>Subject: Re: Abysmal RECV network performance
>Date: Mon, 28 May 2001 23:45:28 -0700 (PDT)

>While we didn't use 2.2 kernels at all, we did similar tests
>on 2.4.0 through 2.4.4 kernels, on UP and SMP. I've used
>a similar machine (PII 333MHz) as well as faster (866MHz)
>machines, and got pretty nifty (> 90Mbps) throughput on
>netperf tests (tcp stream, no disk I/O) over a 100Mb full
>duplex link.  (not sure if there are any P-90 issues).
>
>Throughput does drop with small MTU, very small packet sizes,
>small socket buffer sizes, but only at extremes; for the most
>part throughput was well over 70Mbps. (this is true for single
>connections, you don't mention how many connections you were
>scaling to, if any).

Sorry, yes - I was doing single connection tests with no changes to the 
default Netperf settings. Each machine was running a copy of the server and 
then I ran the test with "netperf -H x.x.x.x" to each machine (or just 
"netperf" for a localhost speed check).

>However, we did run into serious performance problems with
>the Netgear FA311/2 (tulip). Found that the link lost
>connectivity because of card lockups and transmit timeout
>failures - and some of these were silent. However, I moved
>to the 3C905C (3c59x driver) which behaved like a champ, and
>we didn't see the problems any more, so have stuck to that card.
>This was back in the 2.4.0 time frame, and there have been many
>patches since then to various drivers, so I'm not sure if the
>problem(s) have been resolved or not (likely to have been,
>extensively reported). Both your cards might actually be
>underperforming...

I'm a little confused here - do you mean the FA310TX ("tulip" driver) or the 
FA311/2 ("natsemi" driver)? I have not had any connection problems with 
either the FA310 or the FA311 cards. I haven't noticed any speed problems 
with the FA311 card, but I haven't benchmarked it, either. The FA310 is so 
horribly slow, I couldn't help but notice. Unfortunately, the same is true 
of the 3cSOHO.

While I am willing to accept that both the FA310 and FA311 cards are 
underperforming, I think it is more than a little strange that the 3cSOHO 
card would turn in the same performance numbers. Also, keep in mind that I 
was only seeing horrible receive performance, TX performance seemed to be 
ok.

I didn't post FTP numbers (both machines are running FTP servers). While the 
FTP performance numbers are probably not as "scientific" as Netperf, they do 
seem to agree with what I have observed. I.e. retrieving files from the 
P-90 machine is OK (~3MB/sec) but sending files to it is very slow 
(~100K/sec). This roughly agrees with the Netperf numbers I saw.

FTP transfers to the FA311 machine (P2-350) are OK in both directions.

>Are you seeing any errors reported in /var/log/messages?
>Are you monitoring your connection via tcpdump, for example?
>You might sometimes see long gaps in transmission... Are
>there any abnormal numbers in /proc/net/ stats? I don't remember
>seeing frame errors that high, although there were a few.

No, I don't see anything in /var/log/messages.

I looked at tcpdump to try and figure it out, and it appeared that the P-90 
was taking a very long time to ACK some packets. I am not a TCP/IP guru by 
any stretch, but my guess at the time was that the packets that were taking 
forever to get ACK'ed were the ones causing a framing error on the P-90, but 
again, I'm not an expert.

The only unusual stat is the framing errors. There are a lot of them under 
heavy receive load. The machine will go for weeks without a single framing 
error, but if I blast some netperf action at it (or FTP send to it, etc.) 
then I get about 1/3 of the incoming packets (to the P-90) with framing 
errors. I see no other errors at all except a TX overrun error (maybe 1 in 
10 packets).

>HW checksumming for the kind of test you are doing (tcp, mostly
>fast path) will not buy you any real performance gain; the
>checksum is actually consumed by the user-kernel copy routine.

Ok, I'll take your word for it. The P-90 isn't a very fast machine to begin 
with, so I thought it could use all the HW assistance it could get (that and 
the 3cSOHO card was really cheap :-).

I am very disappointed that TCP/IP performance on this machine is so lousy, 
but the problem is clearly with the kernel - just look at the performance 
numbers for 2.4.3 vs 2.2.19 (or 2.2.16). The 2.2.x numbers aren't exactly 
great, but they are a lot better than 2.4.3's.

>You can also run the tests on a profiling kernel and compare
>results...
>
>Nivedita
>
>---
>Nivedita Singhvi        (503) 578-4580
>Linux Technology Center [EMAIL PROTECTED]
>IBM Beaverton, OR       [EMAIL PROTECTED]

Thanks for the assistance. Based on the benchmark information I have, I 
would say that there is a problem with the kernel and would like to pursue 
getting that fixed. I just can't justify why 2.4.3 should be 600% slower 
than the 2.2.x kernels.

- John

Re: Abysmal RECV network performance

2001-05-29 Thread Nivedita Singhvi

> Can someone please help me troubleshoot this problem - 
> I am getting abysmal (see numbers below) network performance 
> on my system, but the poor performance seems limited to receiving 
> data. Transmission is OK. 

[ snip ]

> What kind of performance should I be seeing with a P-90 
> on a 100Mbps connection? I was expecting something in the 
> range of 40-70 Mbps - certainly not 1-2 Mbps. 
> 
> What can I do to track this problem down? Has anyone else 
> had problems like this? 

While we didn't use 2.2 kernels at all, we did similar tests
on 2.4.0 through 2.4.4 kernels, on UP and SMP. I've used
a similar machine (PII 333MHz) as well as faster (866MHz)
machines, and got pretty nifty (> 90Mbps) throughput on
netperf tests (tcp stream, no disk I/O) over a 100Mb full
duplex link.  (not sure if there are any P-90 issues).

Throughput does drop with small MTU, very small packet sizes,
small socket buffer sizes, but only at extremes; for the most
part throughput was well over 70Mbps. (this is true for single
connections, you don't mention how many connections you were
scaling to, if any).

However, we did run into serious performance problems with
the Netgear FA311/2 (tulip). Found that the link lost
connectivity because of card lockups and transmit timeout
failures - and some of these were silent. However, I moved
to the 3C905C (3c59x driver) which behaved like a champ, and
we didn't see the problems any more, so have stuck to that card.
This was back in the 2.4.0 time frame, and there have been many
patches since then to various drivers, so I'm not sure if the
problem(s) have been resolved or not (likely to have been,
extensively reported). Both your cards might actually be
underperforming...

Are you seeing any errors reported in /var/log/messages?
Are you monitoring your connection via tcpdump, for example?
You might sometimes see long gaps in transmission... Are
there any abnormal numbers in /proc/net/ stats? I don't remember
seeing frame errors that high, although there were a few.

HW checksumming for the kind of test you are doing (tcp, mostly
fast path) will not buy you any real performance gain; the
checksum is actually consumed by the user-kernel copy routine.

You can also run the tests on a profiling kernel and compare
results... 
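
(One way, assuming the kernel was booted with the profile=2 option, is
readprofile: reset the counters, run the test, then dump the hot spots:

    readprofile -r
    netperf -H <peer>
    readprofile -m /boot/System.map | sort -nr | head -20

The System.map path may differ per distribution.)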

Nivedita

---
Nivedita Singhvi        (503) 578-4580
Linux Technology Center [EMAIL PROTECTED]
IBM Beaverton, OR   [EMAIL PROTECTED]


Abysmal RECV network performance

2001-05-27 Thread John William

Can someone please help me troubleshoot this problem - I am getting abysmal 
(see numbers below) network performance on my system, but the poor 
performance seems limited to receiving data. Transmission is OK.

The computer in question is a dual Pentium 90 machine. The machine has 
RedHat 7.0 (kernel 2.2.16-22 from RedHat). I have compiled 2.2.19 (stock) 
and 2.4.3 (stock) for the machine and used those for testing. I had a 
NetGear FA310TX card that I used with the "tulip" driver and a 3Com 3CSOHO 
card (Hurricane chipset) that I used with the "3c59x" driver. I used the 
netperf package to test performance (latest version, but I don't have the 
version number off-hand). The numbers netperf is giving me seem to correlate 
well to FTP statistics I see to the box.

I have a second machine (P2-350) with a NetGear FA311 (running 2.4.3 and the 
"natsemi" driver) that I used to talk with the Pentium 90 machine. The two 
machines are connected through a NetGear FS105 10/100 switch. I also tried 
using a 10BT hub (see below).

When connected, the switch indicated 100 Mbps, full duplex connections to 
both cards. This matches the speed indicator lights on both cards. I have 
run the mii-diag program in the past to verify that the cards are actually 
set to full duplex, but I didn't run it again this time (this isn't the 
first time I have tried to chase this problem down).
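
(For reference, the duplex check uses Donald Becker's mii-diag tool, e.g.:

    ./mii-diag eth1

The interface name here is just an example; full output from this tool
appears elsewhere in this thread.)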

For the purposes of this message, call the P2-350 machine "A" and the dual 
P-90 machine "B". I ran the following tests:

Machine "A" to localhost754.74  Mbps

Kernel 2.2.19SMP
Machine "B" to localhost80.63   Mbps
Machine "B" to "A" (tulip)  55.38   Mbps
Machine "A" to "B" (tulip)  10.60   Mbps
Machine "A" to "B" (3c95x)  12.10   Mbps

Kernel 2.4.3 SMP
Machine "B" to localhost83.87   Mbps
Machine "B" to "A" (tulip)  68.07   Mbps
Machine "A" to "B" (tulip)  1.62Mbps
Machine "A" to "B" (3c95x)  2.37Mbps

Kernel 2.2.16-22 (RedHat kernel)
Machine "B" to localhost92.29   Mbps
Machine "B" to "A" (tulip)  57.34   Mbps
Machine "A" to "B" (tulip)  9.98Mbps
Machine "A" to "B" (3c95x)  9.05Mbps

Now, with both "A" and "B" plugged into a 10BT hub:

Kernel 2.2.19SMP
Machine "B" to "A" (tulip)  6.96Mbps
Machine "A" to "B" (tulip)  6.89Mbps

At the end of the runs, I do not see any messages in syslog that would 
indicate a problem. Using the switch, there were no collisions but looking 
at /sbin/ifconfig there were a lot of "Frame:" errors on receive. "A lot" 
means ~30% of the total packets received. This happened with both cards and 
all kernels.

The conclusions I draw from this data are:

1) Both machines connecting to localhost (data not going out over the wire) 
give reasonable numbers and are considerably above what I actually see going 
over the network (as would be expected).
2) The P-90 machine seems to have good transmit speed over both cards and 
all kernels. Transmit performance is close to the localhost numbers, so I 
can believe them. In the past, I have compared the performance of the FA310 
to the 3ComSOHO card and there did not seem to be any measurable performance 
difference between the two.
3) Both the FA310 and the 3ComSOHO card have similar receive speeds, leading 
me to believe that the problem lies with either the machine or the kernel 
and not the individual cards or drivers.
4) Booting the machine as a uni-processor machine (with a non-SMP 2.2.16 
kernel) did not change anything, so it does not appear to be a problem with 
SMP.
5) Kernel 2.4.3 receive performance is significantly lower than either 2.2.x 
kernel, so that tends to point to some fundamental problem in the kernel.
6) As I understand it, the 3Com card has some hardware acceleration for 
checksumming, and this is a slow machine, so why is the performance almost 
identical to the FA310?

So, my questions are:

What kind of performance should I be seeing with a P-90 on a 100Mbps 
connection? I was expecting something in the range of 40-70 Mbps - certainly 
not 1-2 Mbps.

What can I do to track this problem down? Has anyone else had problems like 
this?

Thanks in advance for any help you can offer.

- John
