Re: NFS client code slow in 2.4.3

2001-04-03 Thread Caleb Epstein

On Tue, Apr 03, 2001 at 02:56:15PM -0400, Caleb Epstein wrote:

>   I am having problems with timeouts and general throughput in
> the 2.4.3 NFS client side code which are not present in the 2.4.2
> kernel running in the same configuration on the same hardware.  The
> machines are on a 100 Mbit switched local network with essentially
> no other traffic.

On second thought, it looks like 2.4.2 may also exhibit the
same behavior after a little while.  Now that the machine has
been up for a half hour or so, NFS traffic has become slow on
my 2.4.2 client again.  I am seeing messages like this in my
kernel log:

Apr  3 15:01:54 hagrid kernel: nfs: server tela not responding, still trying
Apr  3 15:01:54 hagrid kernel: nfs: server tela OK

The machines are *not* having any connectivity problems, at
least judging from TCP sessions I have open between them.
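
To put a number on how often the client reports these timeouts, the
kernel log can be scanned for the two message forms shown above.  The
following is only a minimal sketch: the function name and the regular
expression are illustrative assumptions, and the pattern is based
solely on the two log lines quoted in this message.

```python
import re

# Matches the two NFS client messages quoted in the log excerpt.
# The pattern is an assumption derived from those two lines only.
NFS_MSG = re.compile(
    r"kernel: nfs: server (?P<server>\S+) "
    r"(?P<state>not responding, still trying|OK)$"
)

def count_nfs_timeouts(lines):
    """Return {server: count of 'not responding' messages} (hypothetical helper)."""
    counts = {}
    for line in lines:
        match = NFS_MSG.search(line.strip())
        if match and match.group("state").startswith("not responding"):
            server = match.group("server")
            counts[server] = counts.get(server, 0) + 1
    return counts

sample = [
    "Apr  3 15:01:54 hagrid kernel: nfs: server tela not responding, still trying",
    "Apr  3 15:01:54 hagrid kernel: nfs: server tela OK",
]
print(count_nfs_timeouts(sample))
```

Counting per server rather than per line makes it easier to see whether
one server or all of them are affected.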

So it would seem that NFS performance degrades over a very
short window in 2.4.2+.  It seems to fairly fly when the
machine is freshly booted, but after 30 minutes or less, the
performance is severely degraded.

Is anyone using 2.4.2+ as an NFS server/client with success?
Am I missing something?

-- 
cae at bklyn dot org | Caleb Epstein | bklyn . org | Brooklyn Dust Bunny Mfg.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



NFS client code slow in 2.4.3

2001-04-03 Thread Caleb Epstein


I am having problems with timeouts and general throughput in
the 2.4.3 NFS client side code which are not present in the
2.4.2 kernel running in the same configuration on the same
hardware.  The machines are on a 100 Mbit switched local
network with essentially no other traffic.

In both cases I was testing against a 2.4.3 NFS server (using
knfsd).  My tests involved using "dd" to read a large file on
an NFS mounted directory and running the "connectathon" NFS
test suite.

When I boot my client machine with 2.4.3, reading a 327 Mbyte
file over NFS takes on the order of 5-6 minutes to complete.
If I run the same command with the client running kernel
2.4.2, the command completes in about 1 minute.
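
For scale, the timings above can be turned into rough throughput
figures.  This is a back-of-the-envelope sketch, assuming "Mbyte" and
"Mbit" are both decimal units and ignoring all protocol overhead; the
helper name is made up for illustration.

```python
def throughput_mbyte_per_s(size_mbyte, seconds):
    """Average throughput for a transfer of size_mbyte taking `seconds`."""
    return size_mbyte / seconds

FILE_MBYTE = 327   # file size reported in the message
LINK_MBIT = 100    # nominal link speed reported in the message

# 2.4.3 client: "on the order of 5-6 minutes"
slow_min = throughput_mbyte_per_s(FILE_MBYTE, 6 * 60)
slow_max = throughput_mbyte_per_s(FILE_MBYTE, 5 * 60)
# 2.4.2 client: "about 1 minute"
fast = throughput_mbyte_per_s(FILE_MBYTE, 60)
# Theoretical ceiling of the link, assuming 8 bits per byte and no overhead
link_ceiling = LINK_MBIT / 8

print(f"2.4.3 client: {slow_min:.2f} to {slow_max:.2f} Mbyte/s")
print(f"2.4.2 client: {fast:.2f} Mbyte/s")
print(f"link ceiling: {link_ceiling:.2f} Mbyte/s")
```

Under those assumptions the 2.4.3 client moves roughly 0.9 to 1.1
Mbyte/s, against about 5.5 Mbyte/s for 2.4.2 and a nominal link ceiling
of 12.5 Mbyte/s.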

Running the "cthon01" test suite, the 2.4.3 client machine
basically hangs in the "read + write" test section and I
didn't bother waiting for it to finish.  Again, when switching
back to 2.4.2, the client runs through the tests quite
quickly.

From my tests I'm pretty convinced that something in either
the NFS client code or the networking layer has changed which
has drastically reduced NFS client speeds in 2.4.3.

Is this a known problem?  Can I provide any additional
information to help debug it?

-- 
cae at bklyn dot org | Caleb Epstein | bklyn . org | Brooklyn Dust Bunny Mfg.



Re: ld-error on 2.4.2 and on 2.4.1

2001-03-01 Thread Caleb Epstein

On Thu, Mar 01, 2001 at 07:35:58PM +0100, Florian Nykrin wrote:

> Hello,
> Because I don't know which maintainer to write I write to this list.
> When I try to compile 2.4.2 or 2.4.1 I get the error:
> ld -m elf_i386 -Ttext 0x0 -s -oformat binary bbootsect.o -o bbootsect
> ld: cannot open binary: No such file or directory
> make[1]: *** [bbootsect] Error 1
> make[1]: Leaving directory `/usr/src/linux-2.4.2/arch/i386/boot'

perl -pi -e 's/-oformat/--oformat/' arch/i386/boot/Makefile
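
Note that the one-liner above rewrites the first "-oformat" it finds on
each line, so running it a second time would turn "--oformat" into
"---oformat".  A re-runnable variant can be sketched as follows; the
function name is hypothetical and this is an illustration of the same
substitution, not a replacement for fixing the Makefile properly.

```python
import re

def fix_oformat(makefile_text):
    """Turn single-dash -oformat into --oformat, leaving --oformat untouched."""
    # The negative lookbehind skips a dash that already follows another dash.
    return re.sub(r"(?<!-)-oformat\b", "--oformat", makefile_text)

# The failing command line quoted in the original report.
line = "ld -m elf_i386 -Ttext 0x0 -s -oformat binary bbootsect.o -o bbootsect"
fixed = fix_oformat(line)
print(fixed)

# Applying the fix again leaves the text unchanged.
print(fix_oformat(fixed) == fixed)
```

To apply it, one would read arch/i386/boot/Makefile, pass its contents
through the function, and write the result back.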

-- 
cae at bklyn dot org | Caleb Epstein | bklyn . org | Brooklyn Dust Bunny Mfg.



NETDEV WATCHDOG: eth0: transmit timed out

2001-03-01 Thread Caleb Epstein


I am seeing the following error after my machine has been up
for a while.  My eth0 is connected to a switched, local
subnet.  There is not a lot of traffic on the interface, maybe
a few hundred Mbytes or so.  Taking the interface down and then up
again fixes the problem (until it happens again :)

Here is the relevant section from my kernel log:

Mar  1 10:48:44 tela kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar  1 10:48:44 tela kernel: eth0: transmit timed out, tx_status 00 status e000.
Mar  1 10:48:44 tela kernel:   diagnostics: net 0ec0 media 4810 dma 0021.
Mar  1 10:48:44 tela kernel:   Flags; bus-master 1, full 1; dirty 87959(7) current 87975(7).
Mar  1 10:48:44 tela kernel:   Transmit list 01252270 vs. c1252270.
Mar  1 10:48:44 tela kernel:   0: @c1252200  length 80f7 status 00f7
Mar  1 10:48:44 tela kernel:   1: @c1252210  length 810c status 010c
Mar  1 10:48:44 tela kernel:   2: @c1252220  length 80f7 status 00f7
Mar  1 10:48:44 tela kernel:   3: @c1252230  length 810c status 010c
Mar  1 10:48:44 tela kernel:   4: @c1252240  length 80f7 status 00f7
Mar  1 10:48:44 tela kernel:   5: @c1252250  length 802a status 802a
Mar  1 10:48:44 tela kernel:   6: @c1252260  length 802a status 802a
Mar  1 10:48:44 tela kernel:   7: @c1252270  length 810c status 010c
Mar  1 10:48:44 tela kernel:   8: @c1252280  length 80f7 status 00f7
Mar  1 10:48:44 tela kernel:   9: @c1252290  length 810c status 010c
Mar  1 10:48:44 tela kernel:   10: @c12522a0  length 80f7 status 00f7
Mar  1 10:48:44 tela kernel:   11: @c12522b0  length 810c status 010c
Mar  1 10:48:44 tela kernel:   12: @c12522c0  length 80f7 status 00f7
Mar  1 10:48:44 tela kernel:   13: @c12522d0  length 810c status 010c
Mar  1 10:48:44 tela kernel:   14: @c12522e0  length 80f7 status 00f7
Mar  1 10:48:44 tela kernel:   15: @c12522f0  length 810c status 010c
Mar  1 10:48:44 tela kernel: eth0: Resetting the Tx ring pointer.

Then a similar dump repeats until the interface is recycled.
It appears that the interface was not functioning for some
hours before the message was generated, and it was my attempt
to ping a host on the local subnet that caused the NETDEV
WATCHDOG error to be generated (i.e. the card locked up, but
the kernel didn't notice until I tried to send something on
the wire).
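
The "Flags" line of the dump can be checked mechanically.  The sketch
below rests on assumptions that the log itself does not confirm: that
"dirty" and "current" are running counters of reclaimed and queued Tx
descriptors, that the number in parentheses is the counter modulo the
ring size, and that the ring has 16 entries (inferred from the
descriptors numbered 0 to 15 in the dump).  The function name is made
up for illustration.

```python
import re

# Extracts "dirty N(i) current M(j)" from the Flags line of the dump.
FLAGS = re.compile(
    r"dirty (?P<dirty>\d+)\((?P<dirty_idx>\d+)\) "
    r"current (?P<cur>\d+)\((?P<cur_idx>\d+)\)"
)

def tx_ring_state(line, ring_size=16):
    """Summarise Tx ring occupancy; ring_size=16 is an assumption."""
    match = FLAGS.search(line)
    if match is None:
        return None
    dirty = int(match.group("dirty"))
    cur = int(match.group("cur"))
    outstanding = cur - dirty  # descriptors queued but not yet reclaimed
    return {
        "outstanding": outstanding,
        "ring_full": outstanding >= ring_size,
        # Checks the assumed meaning of the parenthesised indices.
        "indices_consistent": (
            dirty % ring_size == int(match.group("dirty_idx"))
            and cur % ring_size == int(match.group("cur_idx"))
        ),
    }

flags_line = ("Mar  1 10:48:44 tela kernel:   Flags; bus-master 1, full 1; "
              "dirty 87959(7) current 87975(7).")
print(tx_ring_state(flags_line))
```

If those assumptions hold, 87975 - 87959 = 16 outstanding descriptors
means every slot in the ring was waiting on the card, which fits the
"full 1" flag and a transmitter that had stopped completing packets.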

The card is:

eth0: 3Com PCI 3c900 Boomerang 10Mbps Combo at 0x1400,
00:60:08:bd:ab:0e, IRQ 9

I am running kernel 2.4.2, and have seen this error in 2.4.1
as well; not sure about 2.4.0.  I do not ever recall
encountering this error with the 2.2.x kernels, though my
network topology has changed, but not my hardware.  I know of
at least one other person who gets this same error with a
3Com PCI 3c905B Cyclone 100baseTx card.  The system is a
P2-300 with 128 MB RAM, and it has been running various versions
of Linux very happily for 3 years.

FWIW, IRQ 9 is shared with the bttv module, though the network
lockup doesn't seem to be related to my use of that module.  I
was using xawtv last night while the interface was still active
and functioning.  The lockup happened this morning.

Sorry for the long-winded post.  Is this a known bug?
Anything I can do to help track it down and squash it if so?

-- 
cae at bklyn dot org | Caleb Epstein | bklyn . org | Brooklyn Dust Bunny Mfg.