Re: Abysmal RECV network performance

2001-05-31 Thread Stephen Degler

Hi,

I'm guessing that the tulip driver is not setting the chip up correctly.
I've seen this happen with other tulip variants (21143) when the driver tries
to autonegotiate.  If you run ifconfig eth1, you will see numerous carrier
and CRC errors.

Set the tulip_debug flag to 2 or 3 in /etc/modules.conf and see what
gets said.
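The counters that ifconfig prints are also exported in /proc/net/dev, so it is easy to track the receive-side error rate across test runs. What follows is a minimal sketch, assuming the 2.2/2.4-style /proc/net/dev layout (receive columns: bytes, packets, errs, drop, fifo, frame, ...). The function names are hypothetical and not part of any existing tool. It reports frame errors as a percentage of the receive packet counter, in the same spirit as the "~30% of the total packets received" figure in the report below. Whether a given driver counts errored frames in its packet counter is not assumed here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Receive-side counters from one /proc/net/dev line. Assumes the
 * 2.2/2.4 column order: bytes packets errs drop fifo frame ... */
struct rx_counters {
    unsigned long bytes, packets, errs, drop, fifo, frame;
};

/* Parse a line such as "  eth1: 1000 200 60 0 0 60 0 0 ...".
 * Returns 0 and fills *out if the interface name equals ifname,
 * -1 for header lines, other interfaces, or short lines. */
int parse_rx_line(const char *line, const char *ifname,
                  struct rx_counters *out)
{
    assert(line != NULL && ifname != NULL && out != NULL);
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1;                      /* header lines have no ':' */
    while (*line == ' ')                /* interface names are right-aligned */
        line++;
    size_t len = (size_t)(colon - line);
    if (len != strlen(ifname) || strncmp(line, ifname, len) != 0)
        return -1;
    /* Counters may follow the colon with or without a space. */
    int n = sscanf(colon + 1, "%lu %lu %lu %lu %lu %lu",
                   &out->bytes, &out->packets, &out->errs,
                   &out->drop, &out->fifo, &out->frame);
    return n == 6 ? 0 : -1;
}

/* Frame errors as a percentage of the receive packet counter
 * (0 when nothing has been received yet). */
double frame_error_pct(const struct rx_counters *c)
{
    if (c->packets == 0)
        return 0.0;
    return 100.0 * (double)c->frame / (double)c->packets;
}

/* Look up ifname in a file with /proc/net/dev layout. Returns 0 on success. */
int read_rx_counters(const char *path, const char *ifname,
                     struct rx_counters *out)
{
    char line[512];
    int rc = -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (rc != 0 && fgets(line, sizeof line, f) != NULL)
        rc = parse_rx_line(line, ifname, out);
    fclose(f);
    return rc;
}
```

Calling read_rx_counters("/proc/net/dev", "eth1", &c) before and after a netperf run and subtracting the two snapshots gives the error rate for that run alone, rather than since boot.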

A newer version of the driver may help.  You might try the one on
SourceForge.

Also, I've only ever seen full 100BaseT speeds with decent adapters,
such as 21143-based Tulips, Intel eepros, and Vortex/Boomerang 3Com cards.
A lot of the cheaper controllers just won't get there.

skd

On Mon, May 28, 2001 at 03:47:22AM +, John William wrote:
> Can someone please help me troubleshoot this problem - I am getting abysmal 
> (see numbers below) network performance on my system, but the poor 
> performance seems limited to receiving data. Transmission is OK.
> 
> The computer in question is a dual Pentium 90 machine. The machine has 
> RedHat 7.0 (kernel 2.2.16-22 from RedHat). I have compiled 2.2.19 (stock) 
> and 2.4.3 (stock) for the machine and used those for testing. I had a 
> NetGear FA310TX card that I used with the "tulip" driver and a 3Com 3CSOHO 
> card (Hurricane chipset) that I used with the "3c59x" driver. I used the 
> netperf package to test performance (latest version, but I don't have the 
> version number off-hand). The numbers netperf is giving me seem to correlate 
> well with the FTP transfer rates I see to the box.
> 
> I have a second machine (P2-350) with a NetGear FA311 (running 2.4.3 and the 
> "natsemi" driver) that I used to talk with the Pentium 90 machine. The two 
> machines are connected through a NetGear FS105 10/100 switch. I also tried 
> using a 10BT hub (see below).
> 
> When connected, the switch indicated 100 Mbps, full duplex connections to 
> both cards. This matches the speed indicator lights on both cards. I have 
> run the miidiag program in the past to verify that the cards are actually 
> set to full duplex, but I didn't run it again this time (this isn't the 
> first time I have tried to chase this problem down).
> 
> For the purposes of this message, call the P2-350 machine "A" and the dual 
> P-90 machine "B". I ran the following tests:
> 
> Machine "A" to localhost    754.74 Mbps
> 
> Kernel 2.2.19 SMP
> Machine "B" to localhost     80.63 Mbps
> Machine "B" to "A" (tulip)   55.38 Mbps
> Machine "A" to "B" (tulip)   10.60 Mbps
> Machine "A" to "B" (3c59x)   12.10 Mbps
> 
> Kernel 2.4.3 SMP
> Machine "B" to localhost     83.87 Mbps
> Machine "B" to "A" (tulip)   68.07 Mbps
> Machine "A" to "B" (tulip)    1.62 Mbps
> Machine "A" to "B" (3c59x)    2.37 Mbps
> 
> Kernel 2.2.16-22 (RedHat kernel)
> Machine "B" to localhost     92.29 Mbps
> Machine "B" to "A" (tulip)   57.34 Mbps
> Machine "A" to "B" (tulip)    9.98 Mbps
> Machine "A" to "B" (3c59x)    9.05 Mbps
> 
> Now, with both "A" and "B" plugged into a 10BT hub:
> 
> Kernel 2.2.19 SMP
> Machine "B" to "A" (tulip)    6.96 Mbps
> Machine "A" to "B" (tulip)    6.89 Mbps
> 
> At the end of the runs, I do not see any messages in syslog that would 
> indicate a problem. Using the switch, there were no collisions but looking 
> at /sbin/ifconfig there were a lot of "Frame:" errors on receive. "A lot" 
> means ~30% of the total packets received. This happened with both cards and 
> all kernels.
> 
> The conclusions I draw from this data are:
> 
> 1) Both machines connecting to localhost (data not going out over the wire) 
> give reasonable numbers and are considerably above what I actually see going 
> over the network (as would be expected).
> 2) The P-90 machine seems to have good transmit speed over both cards and 
> all kernels. Transmit performance is close to the localhost numbers, so I 
> can believe them. In the past, I have compared the performance of the FA310 
> to the 3ComSOHO card and there did not seem to be any measurable performance 
> difference between the two.
> 3) Both the FA310 and the 3ComSOHO card have similar receive speeds, leading 
> me to believe that the problem lies with either the machine or the kernel 
> and not the individual cards or drivers.
> 4) Booting the machine as a uni-processor machine (with a non-SMP 2.2.16 
> kernel) did not change anything, so it does not appear to be a problem with 
> SMP.
> 5) Kernel 2.4.3 receive performance is significantly lower than either 2.2.x 
> kernel, so that tends to point to some fundamental problem in the kernel.
> 6) As I understand it, the 3Com card has some hardware acceleration for 
> checksumming, and this is a slow machine, so why is the performance almost 
> identical to the FA310?
> 
> So, my questions are:
> 
> What kind of performance should I be seeing with a P-90 on a 100Mbps 
> connection? I was expecting something in the range of 40-70 Mbps - certainly 
> not 1-2 Mbps.
> 
> What can I do to track this problem down? Has anyone else had problems like 
> this?
> 
> Thanks in advance for any help you can offer.
> 
> - John

Re: magic device renumbering was -- Re: Linux 2.4.2ac20

2001-03-14 Thread Stephen Degler

Hi,

The solution is not to go down the path_to_inst road; that is full of
its own traps.  You want volume labels via a volume manager (do LVM and RAID
already do this?) and/or filesystem labels (see e2label).  This won't solve
all of the ills associated with device instance changes, but it will certainly
address the biggest one.
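For ext2, the filesystem label is stored in the superblock, so it stays attached to the data no matter which device name the kernel assigns after a probe-order change. Here is a minimal sketch of reading it, assuming the standard ext2 on-disk layout (superblock at byte 1024 of the device, 16-bit little-endian magic 0xEF53 at offset 56 within it, 16-byte volume name at offset 120). The function names are hypothetical; the e2fsprogs tools remain the supported way to read or set a label.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EXT2_SB_OFFSET  1024    /* superblock starts 1 KiB into the device */
#define EXT2_MAGIC_OFF  56      /* s_magic: 16-bit little-endian */
#define EXT2_LABEL_OFF  120     /* s_volume_name: 16 bytes */
#define EXT2_LABEL_LEN  16
#define EXT2_MAGIC      0xEF53

/* Extract the label from a raw superblock buffer. label must hold
 * EXT2_LABEL_LEN + 1 bytes. Returns 0 on success, -1 if the buffer is
 * too short or does not carry the ext2 magic number. */
int ext2_label_from_sb(const unsigned char *sb, size_t len, char *label)
{
    assert(sb != NULL && label != NULL);
    if (len < EXT2_LABEL_OFF + EXT2_LABEL_LEN)
        return -1;
    unsigned magic = (unsigned)sb[EXT2_MAGIC_OFF] |
                     ((unsigned)sb[EXT2_MAGIC_OFF + 1] << 8);
    if (magic != EXT2_MAGIC)
        return -1;
    /* A full 16-character label is not NUL-terminated on disk. */
    memcpy(label, sb + EXT2_LABEL_OFF, EXT2_LABEL_LEN);
    label[EXT2_LABEL_LEN] = '\0';
    return 0;
}

/* Read the superblock from a device node or image file. */
int ext2_read_label(const char *path, char *label)
{
    unsigned char sb[EXT2_LABEL_OFF + EXT2_LABEL_LEN];
    size_t got = 0;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    if (fseek(f, EXT2_SB_OFFSET, SEEK_SET) == 0)
        got = fread(sb, 1, sizeof sb, f);
    fclose(f);
    return ext2_label_from_sb(sb, got, label);  /* short read -> -1 */
}
```

Mounting by label rather than by device name (for example a LABEL= entry in /etc/fstab, where the installed mount supports it) is what lets a system survive controllers being detected in a different order.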

skd

On Wed, Mar 14, 2001 at 10:36:40AM -0500, John Jasen wrote:
> 
> The problem:
> 
> drivers change their detection schemes; and changes in the kernel can
> change the order in which devices are assigned names.
> 
> For example, the DAC960(?) drivers changed their order of
> detecting controllers, and I did _not_ have fun, given that the machine in
> question had about 40 disks to deal with, spread across two controllers.
> 
> This can create a lot of problems for people upgrading large, production
> quality systems -- as, in the worst case, the system won't complete the
> boot cycle; or in middle cases, the user/sysadmin is stuck rewriting X
> amount of files and trying again; or in small cases, you find out that
> your SMC and Intel ethernet cards are reversed, and have to go fix things
> ...
> 
> Possible solutions(?):
> 
> Solaris uses an /etc/path_to_inst file, to keep track of device ordering,
> et al.
> 
> Maybe we should consider something similar, where a physical device to
> logical device map is kept and used to keep things consistent on
> kernel/driver changes; device addition/removal, and so forth ...
> 
> I am, of course, open to better solutions.
> 
> --
> -- John E. Jasen ([EMAIL PROTECTED])
> -- In theory, theory and practise are the same. In practise, they aren't.
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

Re: ARP out the wrong interface

2001-02-08 Thread Stephen Degler

Hi,

What you describe below has the client misconfigured with the same IP
address as the server's eth1.  Is this what you meant?

skd


On Thu, Feb 08, 2001 at 09:09:49PM -0800, dean gaudet wrote:
> this appears to occur with both 2.2.16 and 2.4.1.
> 
> server:
> 
> eth0 is 192.168.250.11 netmask 255.255.255.0
> eth1 is 192.168.251.11 netmask 255.255.255.0
> 
> they're both connected to the same switch.
> 
> client:
> 
> eth0 is 192.168.251.11 netmask 255.255.255.0
> 
> connected to the same switch as both of the server's interfaces.
> 
> on client i try "ping 192.168.251.11".
> 
> responses come back from both eth0 and eth1, listing each of their
> respective MAC addresses...  it's essentially a race condition at this
> point as to whether i'll get the right MAC address.  ("right" means the
> MAC for server:eth1).
> 
> client# tcpdump -n arp
> Kernel filter, protocol ALL, datagram packet socket
> tcpdump: listening on all devices
> 21:03:05.695089 eth0 > arp who-has 192.168.251.11 tell 192.168.251.25 (0:3:47:0:25:80)
> 21:03:05.695405 eth0 < arp reply 192.168.251.11 is-at 0:d0:b7:be:3e:aa (0:3:47:0:25:80)
> 21:03:05.695523 eth0 < arp reply 192.168.251.11 is-at 0:d0:b7:1f:ea:35 (0:3:47:0:25:80)
> 
> 
> server# cat /proc/sys/net/ipv4/ip_forward
> 0
> server# cat /proc/sys/net/ipv4/conf/*/proxy_arp
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 
> is this expected?  it seems broken.
> 
> thanks
> -dean
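The race in the capture above can be checked mechanically: collect the ARP replies and count how many distinct MAC addresses answered for the same IP. The sketch below assumes the tcpdump output format shown in the capture ("arp reply IP is-at MAC"); the struct and function names are hypothetical. Two distinct MACs for 192.168.251.11 is exactly the symptom described, with the server's eth0 and eth1 both answering for the address.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_REPLIES 64

/* One ARP reply: IPv4 dotted quad (max 15 chars) and MAC (max 17 chars). */
struct arp_reply {
    char ip[16];
    char mac[18];
};

/* Parse a tcpdump line containing "arp reply <ip> is-at <mac>".
 * Returns 0 on success, -1 for any other line (e.g. who-has requests). */
int parse_arp_reply(const char *line, struct arp_reply *out)
{
    assert(line != NULL && out != NULL);
    const char *p = strstr(line, "arp reply ");
    if (p == NULL)
        return -1;
    return sscanf(p, "arp reply %15s is-at %17s", out->ip, out->mac) == 2
               ? 0 : -1;
}

/* Count the distinct MACs that answered for ip. More than one means
 * several hosts or interfaces claim the address, so which MAC the
 * client's ARP cache ends up with depends on reply timing. */
int distinct_macs_for_ip(const struct arp_reply *r, int n, const char *ip)
{
    const char *seen[MAX_REPLIES];
    int nseen = 0;
    for (int i = 0; i < n && nseen < MAX_REPLIES; i++) {
        if (strcmp(r[i].ip, ip) != 0)
            continue;
        int dup = 0;
        for (int j = 0; j < nseen; j++)
            if (strcmp(seen[j], r[i].mac) == 0)
                dup = 1;
        if (!dup)
            seen[nseen++] = r[i].mac;
    }
    return nseen;
}
```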

tulip autonegotiation patch

2001-01-28 Thread Stephen Degler

Hi,

This one-liner fixes a subtle 21143 autonegotiation problem for me on a Znyx
quad card.  The driver would claim to negotiate 100-FD, but would then report
late collisions and poor transmit throughput.

The driver still allows packets to be transmitted during autonegotiation,
but that only drops a few packets.

skd

--- 21142.c.bad Sun Jan 28 15:26:25 2001
+++ 21142.c Sun Jan 28 11:51:59 2001
@@ -171,7 +171,7 @@
for (i = 0; i < tp->mtable->leafcount; i++)
if (tp->mtable->mleaf[i].media == dev->if_port) {
tp->cur_index = i;
-   tulip_select_media(dev, 0);
+   tulip_select_media(dev, 1);
setup_done = 1;
break;
}