Re: Still getting NFS client locking up

2003-11-12 Thread Robert Watson

On Wed, 12 Nov 2003, Tim Middleton wrote:

> On November 11, 2003 11:36 pm, Janet Sullivan wrote:
> > So far I only have problems in a mixed -STABLE/-CURRENT environment.
> > When the client & server are both -CURRENT I haven't had any problems.
> 
> I just installed another -STABLE box to see if keeping them both -STABLE
> helps. I haven't really tested the NFS yet as I didn't want to risk
> locking the box up in the middle of a buildworld.

If we can demonstrate the problem with both systems as -STABLE, that rules
out a lot of things, and might also raise some questions about the
hardware.

> So i just mounted the NFS drive on the new test box and left it 
> Within an hour the NFS server box doing the build world was locked up
> solid. I can't say if it was NFS mount related or not; nfsd wasn't
> really doing anything. Doesn't seem like it would have been. Beginning
> to wonder if it is some strange hardware problem on this box; which
> coincidentally only shows up when there's an nfs mount! But that doesn't
> explain why my normally rock solid desktop system tanked when being
> tested as an NFS client to that STABLE box. Hmmm...

One of the problems that can occur in -STABLE is a cascading failure when
one file system is jammed up (e.g., an NFS mount from another system).
Processes hang holding locks in NFS because the NFS session is stalled;
other processes try to acquire the held locks while holding additional
locks, and before you know it a lot of very useful locks are held and
can't be released, because the locks at the root of the chain can't be
freed.  Many aspects of this problem are believed to be resolved in
-CURRENT, but it's a tough cookie to crack without redoing VFS locking.
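
To make the cascade concrete, here is a toy sketch in Python.  It is only
an illustration of the idea, not how the kernel tracks locks: the process
names, lock names, and the blocked_by() helper are all made up.  Given who
holds which lock and who is waiting for which lock, it walks outward from
the one process stalled on NFS and collects everything stuck behind it.

```python
from collections import deque

def blocked_by(root, holders, waiters):
    """Return every process stuck, directly or transitively, behind `root`.

    holders: lock name -> process currently holding it
    waiters: process -> lock name it is waiting to acquire
    (Both mappings are hypothetical inputs for this sketch.)
    """
    # For each lock owner, list the processes waiting on a lock it holds.
    waits_on = {}
    for proc, lock in waiters.items():
        owner = holders.get(lock)
        if owner is not None:
            waits_on.setdefault(owner, []).append(proc)

    # Breadth-first walk outward from the stalled process.
    stuck, queue = set(), deque([root])
    while queue:
        current = queue.popleft()
        for waiter in waits_on.get(current, []):
            if waiter not in stuck:
                stuck.add(waiter)
                queue.append(waiter)
    return stuck

# Invented scenario: "cp" holds a lock on an NFS vnode and is stalled on the
# server.  "ls" holds a directory lock and wants the vnode, and so on.
holders = {"nfs_vnode": "cp", "usr_dir": "ls", "root_dir": "sh"}
waiters = {"ls": "nfs_vnode", "sh": "usr_dir", "cron": "root_dir"}

print(sorted(blocked_by("cp", holders, waiters)))
```

One stalled NFS operation ends up wedging processes that never touched the
NFS mount at all, which is why the symptoms can look unrelated to NFS.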

If you have a spare system, it might be really interesting to install
-STABLE on it, replicate data from your file server, point the client at
that, and see if the problem still occurs there with the same load.  You
might also try swapping network cards: perhaps we're looking at a network
device driver problem where loss of key packets, or packets over a certain
size, is causing an unrecoverable failure.

> Back to testing.  I'm doing heavy disk I/O tests without any NFS mounts
> now.  If they go okay, back to the NFS mounting and testing... 
> 
> It seems to me there is something desperately wrong with NFS if mixing 
> -CURRENT and -STABLE NFS server/clients causes either side (in my case both 
> sides) to lock up solid. I mean, problems are problems... but solid lockups 
> with no crash messages or anything is ... nasty.

Clearly there's a substantial problem, but it sounds like we're still
having a lot of trouble identifying the circumstances that trigger the
problem, and attempting to narrow things down.  One of the problems with
distributed system debugging is that it's often hard to track the problem
down to a particular source when you catch it partway through a cascading
failure.  For example, it could well be that a server problem is
triggering client symptoms, or it could be that a serious client problem
might consume resources on the server such that other clients couldn't
operate.  Under these circumstances, it can be very difficult to track it
down to a particular cause (a missing unlock on the server, for example). 

> > Are the folks seeing hangs getting any kind of console error messages?
> 
> I see nothing. My server completely locks up. Nothing responds. The
> drive light (the times i've noticed) is frozen "on". On my desktop box
> the mouse is dead as well.

I can't help but wonder if the server isn't suffering an under-reported
hardware failure.  It might be interesting to see how quickly the problem
vanishes when exchanging various elements.

> > I don't see anything - performance just tanks to the point of being
> > unusable.
> 
> When testing with my desktop box as client, i noticed just before or
> just when the NFS locked up the mouse and keyboard response would be
> very erratic ... slow and jerky.

This might suggest a high RPC load, deep queues in processing, or key
locks held for extended periods of time.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Still getting NFS client locking up

2003-11-12 Thread Tim Middleton
On November 11, 2003 11:36 pm, Janet Sullivan wrote:
> So far I only have problems in a mixed -STABLE/-CURRENT environment.
> When the client & server are both -CURRENT I haven't had any problems.

I just installed another -STABLE box to see if keeping them both -STABLE 
helps. I haven't really tested the NFS yet as I didn't want to risk locking 
the box up in the middle of a buildworld. 

So i just mounted the NFS drive on the new test box and left it 
Within an hour the NFS server box doing the build world was locked up solid. I 
can't say if it was NFS mount related or not; nfsd wasn't really doing 
anything. Doesn't seem like it would have been. Beginning to wonder if it is 
some strange hardware problem on this box; which coincidentally only shows up 
when there's an nfs mount! But that doesn't explain why my normally rock 
solid desktop system tanked when being tested as an NFS client to that STABLE 
box. Hmmm... 

Back to testing.  I'm doing heavy disk I/O tests without any NFS mounts now. 
If they go okay, back to the NFS mounting and testing...

It seems to me there is something desperately wrong with NFS if mixing 
-CURRENT and -STABLE NFS server/clients causes either side (in my case both 
sides) to lock up solid. I mean, problems are problems... but solid lockups 
with no crash messages or anything is ... nasty.

> Are the folks seeing hangs getting any kind of console error messages?

I see nothing. My server completely locks up. Nothing responds. The drive 
light (the times i've noticed) is frozen "on". On my desktop box the mouse is 
dead as well. 

> I don't see anything - performance just tanks to the point of being
> unusable.

When testing with my desktop box as client, i noticed just before or just when 
the NFS locked up the mouse and keyboard response would be very erratic ... 
slow and jerky. 

-- 
Tim Middleton | Cain Gang Ltd | "Who is Ungit?" said he, still holding
[EMAIL PROTECTED] | www.Vex.Net   | my hands. --C.S.Lewis (TWHF)




Re: Still getting NFS client locking up

2003-11-12 Thread Marc Olzheim
On Mon, Nov 10, 2003 at 11:28:40AM -0500, Robert Watson wrote:
> How fast are your systems, speaking of which?  I live in the world of
> 300-500 mhz machines at work, and 300-800 mhz boxes at home.  If you're
> using multi-ghz boxes, that could well be the distinguishing factor
> between our configurations...

I collected some information from my client and server, just before my
server crashed (probably because of this), including a tcpdump of the
last seconds before it stops...

http://www.stack.nl/~marcolz/FreeBSD/NFS/

Zlo


Re: Still getting NFS client locking up

2003-11-11 Thread Janet Sullivan

> I'm not having any problems with my -CURRENT client.  My server is
> running 4.9-STABLE, so I can't comment on the state of the NFS server
> code in -CURRENT.  For what it's worth, my NFS usage is not very heavy,
> and is mostly reading, with very little writing.
So far I only have problems in a mixed -STABLE/-CURRENT environment. 
When the client & server are both -CURRENT I haven't had any problems. 
If the client is -CURRENT and the server is -STABLE, I occasionally get 
very, very slow response times (like 40 seconds to get an ls response). 
 I can't blame the response times on my LAN, because everything else 
continues to function properly.

In fact, I just had to reboot my laptop (running -CURRENT from 
2003.10.30.07.10.00) to get my -STABLE nfs mounts back to normal.

Are the folks seeing hangs getting any kind of console error messages? 
I don't see anything - performance just tanks to the point of being 
unusable.



Re: Still getting NFS client locking up

2003-11-11 Thread Doug White
On Tue, 11 Nov 2003, Tim Middleton wrote:

> The server is a P3-1ghz Intel STL2 box, with 1 gig of ram. Using the onboard
> fxp ethernet at 100baseTX. It is not using dhcp. Nothing much else is running
> on this server box as I'm just testing it. When the server locks the box can
> not even be pinged.

Can you set up a serial console on this system?  If so, enable these
kernel options:
options DDB
options WITNESS
options INVARIANTS
options INVARIANTS_SUPPORT
options BREAK_TO_DEBUGGER

Boot through the serial console then trigger the bug and send a break from
serial.  If you drop into ddb, then it's a Giant deadlock.

If you can get that, then do 'show locks' from ddb to get a list of
potential culprits, and 'tr' for what it's stuck doing.

The kernel handbook section on kernel debugging will be a useful read.

-- 
Doug White|  FreeBSD: The Power to Serve
[EMAIL PROTECTED]  |  www.FreeBSD.org


Re: Still getting NFS client locking up

2003-11-11 Thread Don Lewis
On 31 Oct, Kelley Reynolds wrote:
> --- Original Message ---
> From: Matt Smith <[EMAIL PROTECTED]>
> Sent: Fri, 31 Oct 2003 08:55:49 +
> To: Robert Watson <[EMAIL PROTECTED]>
> Subject: Re: Still gettnig NFS client locking up
> 
>> Robert Watson wrote:
>> > On Tue, 28 Oct 2003, Soren Schmidt wrote:
>> > 
>> > 
>> >>>I'm now running a kernel/world of October 26th on both NFS client
>> >>>and server machines. I am still seeing NFS lockups as reported by
>> >>>several people in these threads:
>> >>
>> >>Me too!!
>> > 
>> > 
>> > Hmm.  I'm unable to reproduce this so far, and I'm pounding several
>> > 5.x NFS clients and servers.  I've been checking out using CVS over
>> > NFS, performing dd's of big files, etc.  There must be something
>> > more I'm missing in reproducing this.  What network interface cards
>> > are you using (client, server)? Are you using DHCP on the client or
>> > server?  What commands trigger it -- what part of the NFS
>> > namespace, etc?  Are you running the commands as root, or another
>> > user?
>> > 
>> > Robert N M Watson FreeBSD Core Team, TrustedBSD
>> > Projects [EMAIL PROTECTED]  Network Associates
>> > Laboratories
>> > 
> 
> I'm also experiencing lockups with NFS, but it's the server that locks
> up on mine. Both client and server are -CURRENT. Server was fresh as
> of two days ago, and the client is a week or two old. They are
> connected via bfe (server) and vr (client). The server, I've found,
> will last much longer if the mount options on the client include 'tcp'
> and 'nfsv3' (supposed to be default, but I'm just calling it like it
> is). Reading files seems to be okay, and I've managed to get as far as
> compiling a kernel on an NFS-mounted /usr, but a buildworld will hang
> in < 30 minutes. The server is running dhcp and pf. All commands are
> being run as root.

I'm not having any problems with my -CURRENT client.  My server is
running 4.9-STABLE, so I can't comment on the state of the NFS server
code in -CURRENT.  For what it's worth, my NFS usage is not very heavy,
and is mostly reading, with very little writing.


Re: Still getting NFS client locking up

2003-11-11 Thread Claus Guttesen
Hi.

> I can lock the NFS server up every time simply by
> mounting the nfs partition 
> (i'm using -t for tcp nfs and exporting with
> -maproot=0:0), and then running 
> "iozone -a" on the nfs client box. It takes a while,
> but the 4.9-RELEASE box 
> will always lock up solid eventually. Not good. )-:
> 

Could you show /etc/exports on the server and
/etc/fstab on the client?
Have you tried udp instead of tcp?

regards
Claus




Re: Still getting NFS client locking up

2003-11-11 Thread Tim Middleton
On November 10, 2003 12:44 pm, Soren Schmidt wrote:
> For me it's just the server end that fails, I've not seen the client hang.

I'm having a bad NFS day... not sure if it is the same lockups described in 
this thread. In fact perhaps I'm posting to the wrong group since the server 
in question is 4.9-STABLE. The client I am testing with is 5.1-CURRENT 
(though a few months old) however. 

I'm trying to make sure this server is stable, as it took a long time to get 
my company to let me switch one of the servers to FreeBSD... things were 
going great until i started testing NFS... now they are going very badly.

I can lock the NFS server up every time simply by mounting the nfs partition 
(i'm using -t for tcp nfs and exporting with -maproot=0:0), and then running 
"iozone -a" on the nfs client box. It takes a while, but the 4.9-RELEASE box 
will always lock up solid eventually. Not good. )-:

I've done tests as root and non-root. Sometimes i can rescue the nfs client 
box with a "mount -f", but sometimes the client box has locked up solid as 
well when I've tried that. 

The server is a P3-1ghz Intel STL2 box, with 1 gig of ram. Using the onboard 
fxp ethernet at 100baseTX. It is not using dhcp. Nothing much else is running 
on this server box as I'm just testing it. When the server locks the box can 
not even be pinged.

Since the box locks up solid it's hard to see what may be going on. I have left 
top running to see what it says when it freezes, but it may not be accurate 
depending on when it last refreshed. But for the record it has nfsd in 
"biorw" state.

I tried dumping ps -lax to a file every few seconds while testing... that 
didn't work very well as again the refresh is too slow, and the drive loses 
the last few files when it goes down. Perhaps I can turn off caching or soft 
updates or something to help with this (as you may tell, i'm not a file 
system expert --- any suggestions welcome). 
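
One way to lose fewer of those last snapshots, sketched in Python: flush
and fsync each snapshot before taking the next one, so it does not sit in
the buffer cache when the box freezes.  This is a minimal sketch, not a
tested recipe for this hang.  The function names and the log path are made
up, and how much actually survives still depends on the file system and
the drive's own write cache.

```python
import os
import subprocess
import time

def append_durably(path, text):
    """Append text to path and ask the OS to push it to disk before returning."""
    with open(path, "a") as f:
        f.write(text)
        f.flush()             # move data from Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to write the file data out

def snapshot_loop(path, interval=2.0, command=("ps", "-lax")):
    """Run `command` every `interval` seconds, logging each result durably.

    Runs until interrupted.  The default command and interval are only
    assumptions taken from the description above.
    """
    while True:
        result = subprocess.run(command, capture_output=True, text=True)
        header = "=== %s ===\n" % time.ctime()
        append_durably(path, header + result.stdout + "\n")
        time.sleep(interval)
```

Something like snapshot_loop("/var/tmp/ps-snapshots.log") would be started
before kicking off the test (the path is hypothetical).  Writing the log to
a different disk than the one under test, or to a serial console, avoids
depending on the local disk at all.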

Maybe I should set up another box and test 4.9-RELEASE to 4.9-RELEASE... 
and/or update my 5.1-CURRENT box for further testing... 

-- 
Tim Middleton | Cain Gang Ltd | One afternoon, disgusted, bravo, you fall 
[EMAIL PROTECTED] | www.Vex.Net   | asleep. --T.Lilburn (MS)




Re: Still getting NFS client locking up

2003-11-10 Thread Robert Watson

On Mon, 10 Nov 2003, Matt Smith wrote:

> I can certainly spend some time trying to get some proper debug based on
> what you have said in your email. I shall look into setting up a serial
> console etc. 
> 
> In the meantime another piece of information which might be helpful is
> this. Looking at the wtmp to see when I rebuilt my world/kernel I can
> see this: 
> 
> reboot   ~ Tue Oct 21 20:44
> reboot   ~ Wed Oct 15 19:36
> 
> (These times are in BST which is +5 hours from east coast US). 
> 
> On the Oct 15th kernel NFS was working perfectly (and before that). From
> the Oct 21st kernel it has always locked up in this way. So something
> between those two dates was committed which broke this for us. Another
> way of me debugging this I guess is to backtrack my world to each date
> in between systematically and find the exact date it breaks and look at
> the commits. 

Hmm.  The one other thing that might be worth trying, and this is pretty
time-consuming, is attempting to narrow down the threshold kernel change
that caused the failures to start.  Typically, this is done using a binary
search (i.e., find two dates -- one that the kernel works, the other that
it doesn't -- split the difference, repeat until narrowed down to a range
of commits that can be individually inspected).  This way we could try to
identify some suspect changes that could be backed out locally
individually to narrow it down.  The likely categories of commits that
might be worth looking at probably include:

(1) Changes specifically to the network drivers that you're using.
(2) Changes to the network stack, especially relating to locking and
timeouts.
(3) Changes to the NFS client and server code.
(4) Changes in general to VFS and buffer cache locking.

We've had a lot of commits in all of these categories, so narrowing it
down would be a useful way to help figure it out...
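
The binary search over dates could be sketched like this in Python.  It is
only a sketch: is_broken() stands in for the real work of checking out
sources as of a given day, building, and running the NFS test, and the
"first bad day" used in the demo is invented so the example runs, not a
finding.  The Oct 15 / Oct 21 endpoints are the ones from the wtmp output.

```python
from datetime import date, timedelta

def bisect_dates(good, bad, is_broken):
    """Narrow a known-good and a known-bad day down to two adjacent days.

    is_broken(day) must return True if a kernel built from that day's
    sources shows the hang.  Returns (last_good_day, first_bad_day).
    """
    assert good < bad
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_broken(mid):
            bad = mid    # the breakage is at or before mid
        else:
            good = mid   # the breakage came in after mid
    return good, bad

# Hypothetical stand-in for "build and test a kernel from this day".
first_bad = date(2003, 10, 18)   # invented for the demo
tested = []

def fake_test(day):
    tested.append(day)
    return day >= first_bad

result = bisect_dates(date(2003, 10, 15), date(2003, 10, 21), fake_test)
print(result, "after", len(tested), "builds")
```

With six days between the two kernels this takes three builds instead of
five, and the saving grows quickly for wider ranges.  The same loop works
on a list of individual commits once the range is down to a single day.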

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories




Re: Still getting NFS client locking up

2003-11-10 Thread Soren Schmidt
It seems Robert Watson wrote:
> How fast are your systems, speaking of which?  I live in the world of
> 300-500 mhz machines at work, and 300-800 mhz boxes at home.  If you're
> using multi-ghz boxes, that could well be the distinguishing factor
> between our configurations...

Server is a 533MHz VIA C3; clients are everything from a 300MHz PII to a 2.6GHz P4.

> Ok, here's the strategy I was planning to take once I could reproduce it:
> 
> (1) Attempt to further narrow down responsibility to client/server.  In
> particular, see if an apparent hang on one client affects the other
> clients. 

For me it's just the server end that fails, I've not seen the client hang.

> (2) Investigate Soren's report that killing and restarting nfsd on the
> server would clear the hang.

Yups, that works, in fact I have that in my crontab now every minute
to keep NFS from hosing my setup here.
NOTE: I also still need to ifconfig down/up my interfaces on some
boxes or the netstack will freeze (again done every minute in crontab).
However when NFS locks up it seems totally unrelated, i.e. all other
network traffic works...

> (3) Look at stack traces of involved processes on both the client and
> server: in particular, look at traces for any client blocked in NFS,
> any nfsiod processes on the client, and the nfsd processes on the
> server.  Also look at the wait channels on clients and servers for
> these processes.  Particularly interested in whether nfsd processes
> are blocked trying to grab locks.

Ok, will do..

> (4) Look at netstat information for NFS sockets, in particular, if the
> buffers are full, or not being drained.  In particular, on the server,
> is the input queue not being drained by nfsd worker threads? 

Netstat doesn't seem to give any hints or even useful info here,
any special cmds you want the output from?

> (5) Try backing out src/sys/nfsserver/nfs_serv.c:1.137, which removed
> another deadlock problem, but did change locking behavior in the NFS
> server.

No change; already tried.

> (6) Look at packet traces between the client and server with ethereal,
> which has pretty good NFS decoding.  Is the client retransmitting an
> RPC to the server and the server just isn't responding, or is the
> client failing to transmit?  At the point of the hang, what sorts of
> RPCs are outstanding to the server?  In the past, we've seen "apparent
> hangs" when some or another more obscure unusual error case on the NFS
> server fails to respond to an RPC, which causes the client to "wait
> forever".

I can try that easily, I'll get a trace to you later tonight...
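
For reading such a trace, the question in (6) boils down to matching RPC
calls to replies by transaction ID (XID); a retransmitted call reuses the
XID of the original.  A rough Python sketch of that matching follows.  The
(timestamp, kind, xid) tuple format is made up for illustration and is not
what ethereal or tcpdump emit, so the decoded trace would have to be
converted into it first; the sample packets are invented too.

```python
def unanswered_calls(packets):
    """Count sends per XID for RPC calls that never received a reply.

    packets: iterable of (timestamp, kind, xid), where kind is "call" or
    "reply".  This layout is an assumption for the sketch.
    Returns {xid: number_of_times_the_call_was_sent}.
    """
    sent = {}
    replied = set()
    for _timestamp, kind, xid in packets:
        if kind == "call":
            sent[xid] = sent.get(xid, 0) + 1
        elif kind == "reply":
            replied.add(xid)
    return {xid: count for xid, count in sent.items() if xid not in replied}

# Invented sample: XID 0x1a is answered, XID 0x1b is sent three times
# with growing gaps and never answered.
trace = [
    (0.00, "call", 0x1A),
    (0.01, "reply", 0x1A),
    (0.02, "call", 0x1B),
    (1.02, "call", 0x1B),
    (3.02, "call", 0x1B),
]

print(unanswered_calls(trace))
```

A call that shows up several times with no reply points at the server not
answering; no outstanding calls at all at the time of the hang would point
back at the client failing to transmit.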

> Things to look for: normally, idle nfsd and nfsiod processes have a WCHAN
> of "-" (ps -lax), which indicates they're blocked waiting for some event
> to kick them off.  If you see nfsd processes "hung" in another state, it's
> a good sign we've identified a server problem.  In the nfs client
> processes, "nfsrcvlk" typically indicates a process has sent out an RPC
> and is now waiting on a response.

I see the idle '-' wchan here when things go bad IIRC...
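
To check that more reliably than by eyeballing it, something like the
following Python sketch could filter "ps -lax" output for nfsd/nfsiod
processes whose wait channel is not "-".  It assumes the wait-channel
column is headed WCHAN or MWCHAN and that the command is the last column;
the sample output below, including the "biowr" wait channel, is invented
for the example and not taken from any of the affected machines.

```python
def busy_nfs_procs(ps_output, names=("nfsd", "nfsiod")):
    """Return [(pid, wchan)] for NFS daemons not idle on the "-" wait channel."""
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    # Locate columns by header name rather than by fixed position.
    wchan_col = next(i for i, h in enumerate(header) if h in ("WCHAN", "MWCHAN"))
    pid_col = header.index("PID")

    hits = []
    for line in lines[1:]:
        # The command may contain spaces, so keep it as one final field.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue
        command = fields[-1].lstrip("([")
        if not any(command.startswith(name) for name in names):
            continue
        if fields[wchan_col] != "-":
            hits.append((int(fields[pid_col]), fields[wchan_col]))
    return hits

# Invented sample in the general shape of "ps -lax" output.
SAMPLE = """\
  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
    0   312     1   0   4  0  1204  760 -      Is    ??    0:00.01 nfsd: master (nfsd)
    0   314   312   0  -8  0  1196  608 biowr  D     ??    0:12.40 nfsd: server (nfsd)
    0   315   312   0   4  0  1196  608 -      I     ??    0:09.73 nfsd: server (nfsd)
 1001   640   600   0  96  0  1420  980 nfsrcv D+    p0    0:00.02 cp big.iso /mnt
"""

print(busy_nfs_procs(SAMPLE))
```

An empty list at the moment of the hang would match the idle "-" state;
anything else names the nfsd to get a stack trace from.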

-Søren


Re: Still getting NFS client locking up

2003-11-10 Thread Matt Smith
Robert Watson wrote:

> I'm fairly baffled.  I tried for many hours to reproduce the problem in
> two separate sets of systems here, and completely failed.  I left
> buildworlds, cvs updates, blah blah blah, running for 96 hours across
> pools of clients and servers and no hint of the problem.  I also use NFS
> daily on my primary workstation at work, as well as in my normal
> development setup with diskless crashboxes.  So indeed, there must be some
> very specific piece of the picture that I'm not reproducing, such as a
> specific network card, or there's a race condition that requires very
> specific timing, etc. 
> 
> How fast are your systems, speaking of which?  I live in the world of
> 300-500 mhz machines at work, and 300-800 mhz boxes at home.  If you're
> using multi-ghz boxes, that could well be the distinguishing factor
> between our configurations...


client is an intel pentium II 300mhz with 256meg ram and 1gig of swap.
server is an athlon XP 2200 with 512meg ram and 1gig of swap.
I can certainly spend some time trying to get some proper debug based on 
what you have said in your email. I shall look into setting up a serial 
console etc.

In the meantime another piece of information which might be helpful is 
this. Looking at the wtmp to see when I rebuilt my world/kernel I can 
see this:

reboot   ~ Tue Oct 21 20:44
reboot   ~ Wed Oct 15 19:36
(These times are in BST which is +5 hours from east coast US).

On the Oct 15th kernel NFS was working perfectly (and before that). From 
the Oct 21st kernel it has always locked up in this way. So something 
between those two dates was committed which broke this for us. Another 
way of me debugging this I guess is to backtrack my world to each date 
in between systematically and find the exact date it breaks and look at 
the commits.

Matt.



Re: Still getting NFS client locking up

2003-11-10 Thread Robert Watson

On Mon, 10 Nov 2003, Matt Smith wrote:

> With a current build from november the 9th I am still getting exactly
> the same NFS lockups. I assume soren is as well. NFS has basically been
> pretty unusable now for over a month. 
> 
> As only a couple of people have complained about this from what I can
> see I assume it is something related to something specific such as a
> network card? 

I'm fairly baffled.  I tried for many hours to reproduce the problem in
two separate sets of systems here, and completely failed.  I left
buildworlds, cvs updates, blah blah blah, running for 96 hours across
pools of clients and servers and no hint of the problem.  I also use NFS
daily on my primary workstation at work, as well as in my normal
development setup with diskless crashboxes.  So indeed, there must be some
very specific piece of the picture that I'm not reproducing, such as a
specific network card, or there's a race condition that requires very
specific timing, etc. 

How fast are your systems, speaking of which?  I live in the world of
300-500 mhz machines at work, and 300-800 mhz boxes at home.  If you're
using multi-ghz boxes, that could well be the distinguishing factor
between our configurations...

>  From my testing I only get this lockup when writing to the server. 
> Reading from the server works perfectly all the time. So luckily I can
> still manage an NFS mounted installworld/kernel. 
> 
> I just got the lockup again now whilst it downloaded p5-Net-DNS to
> portupgrade into /usr/ports/distfiles. This is a very small file but it
> was enough to trigger it off. So it doesn't look like a size related
> issue either as I can download around 4% of mysql before it locks up.
> 
> Obviously we should really try and find the cause of this before 5.2. I
> am willing to try any patches/debug on my systems. But I just have zero
> clue about what to look for myself. 
> 
> As a start here are the relevant parts of my dmesg to show the NICs I'm
> using. I wonder if this corresponds to Soren's?
> 
> NFS CLIENT (xl1 would be the card it's using to talk to the server):
> xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xe400-0xe47f mem 
> 0xea00-0xea7f irq 12 at device 15.0 on pci0
> xl0: Ethernet address: 00:a0:24:ac:e1:b4
> miibus0:  on xl0
> xlphy0: <3Com internal media interface> on miibus0
> xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> xl1: <3Com 3c905-TX Fast Etherlink XL> port 0xe800-0xe83f irq 11 at 
> device 17.0 on pci0
> xl1: Ethernet address: 00:60:08:6d:1e:3b
> miibus1:  on xl1
> nsphy0:  on miibus1
> nsphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> 
> NFS SERVER:
> xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0x1000-0x107f mem 
> 0xfc304800-0xfc30487f irq 10 at device 7.0 on pci5
> xl0: Ethernet address: 00:04:76:8d:c5:fd
> miibus0:  on xl0
> xlphy0: <3c905C 10/100 internal PHY> on miibus0
> xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

My server:

xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xd880-0xd8ff mem
0xff202000-0xff20207f irq 11 at device 17.0 on pci0
xl0: Ethernet address: 00:b0:d0:29:ec:ce
miibus2:  on xl0
xlphy0: <3Com internal media interface> on miibus2
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

My client1:

xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xdc00-0xdc7f mem
0xff00-0xff7f irq 11 at device 17.0 on pci0
xl0: Ethernet address: 00:c0:4f:0d:6b:bc
miibus0:  on xl0
xlphy0: <3Com internal media interface> on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

My client2:

xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xd880-0xd8ff mem
0xff202000-0xff20207f irq 11 at device 17.0 on pci0
xl0: Ethernet address: 00:b0:d0:2b:76:d5
miibus2:  on xl0
xlphy0: <3Com internal media interface> on miibus2
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

> Both connected to a 100meg full duplex switch.

Ditto.

> Any ideas? As I have said I'm happy to enable some major debugging etc. 
> But I just need somebody to give me a step by step guide for what to do
> and look for. 
> In case this thread is too old now and nobody remembers anything about
> it the previous email regarding it is at
> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=1183410+0+archive/2003/freebsd-current/20031102.freebsd-current

Ok, here's the strategy I was planning to take once I could reproduce it:

(1) Attempt to further narrow down responsibility to client/server.  In
particular, see if an apparent hang on one client affects the other
clients. 

(2) Investigate Soren's report that killing and restarting nfsd on the
server would clear the hang.

(3) Look at stack traces of involved processes on both the client and
server: in particular, look at traces for any client blocked in NFS,
any nfsiod processes on the client, and the nfsd processes on the
server.  Also look at the wait channels on clients and servers for
these processes.  Particularly interested in whether nfsd processes
ar

Re: Still getting NFS client locking up

2003-11-10 Thread Kelley Reynolds
--- Original Message ---
From: Soren Schmidt <[EMAIL PROTECTED]>
Sent: Mon, 10 Nov 2003 16:03:47 +0100 (CET)
To: Matt Smith <[EMAIL PROTECTED]>
Subject: Re: Still getting NFS client locking up

> It seems Matt Smith wrote:
> > With a current build from november the 9th I am still getting exactly 
> > the same NFS lockups. I assume soren is as well. NFS has basically been 
> > pretty unusable now for over a month.
> 
> Yes I do, NFS is virtually useless...
> 
> > As only a couple of people have complained about this from what I can 
> > see I assume it is something related to something specific such as a 
> > network card?
> 
> Could be, but it's more than one type of card which suggests to me
> it's more "generic" in origin..
> 
> >  From my testing I only get this lockup when writing to the server. 
> > Reading from the server works perfectly all the time. So luckily I can 
> > still manage an NFS mounted installworld/kernel.
> 
> I can also lock it up with just reading, but it takes longer.
> 
> > Obviously we should really try and find the cause of this before 5.2. I 
> > am willing to try any patches/debug on my systems. But I just have zero 
> > clue about what to look for myself.
> 
> I think it's a definite showstopper for 5.2 actually..
> 

Just to add some more evidence to the mix, I have two 5.1 current boxes using bfe, vr, 
and both have ath, and I am experiencing all of the lockups on the server end... client 
has yet to lock up.

Kelley


Re: Still getting NFS client locking up

2003-11-10 Thread Soren Schmidt
It seems Matt Smith wrote:
> With a current build from november the 9th I am still getting exactly 
> the same NFS lockups. I assume soren is as well. NFS has basically been 
> pretty unusable now for over a month.

Yes I do, NFS is virtually useless...

> As only a couple of people have complained about this from what I can 
> see I assume it is something related to something specific such as a 
> network card?

Could be, but it's more than one type of card which suggests to me
it's more "generic" in origin..

>  From my testing I only get this lockup when writing to the server. 
> Reading from the server works perfectly all the time. So luckily I can 
> still manage an NFS mounted installworld/kernel.

I can also lock it up with just reading, but it takes longer.

> Obviously we should really try and find the cause of this before 5.2. I 
> am willing to try any patches/debug on my systems. But I just have zero 
> clue about what to look for myself.

I think it's a definite showstopper for 5.2 actually..

> NFS SERVER:
> xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0x1000-0x107f mem 
> 0xfc304800-0xfc30487f irq 10 at device 7.0 on pci5
> xl0: Ethernet address: 00:04:76:8d:c5:fd
> miibus0:  on xl0
> xlphy0: <3c905C 10/100 internal PHY> on miibus0
> xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

OK the worst server I've got has:
re0:  port 0xdc00-0xdcff mem 0xe400-0xe4ff 
irq 12 at device 9.0 on pci0
rlphy0:  on miibus0
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re1:  port 0xe000-0xe0ff mem 0xe4001000-0xe40010ff 
irq 10 at device 10.0 on pci0
rlphy1:  on miibus1
rlphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re2:  port 0xe400-0xe4ff mem 0xe4002000-0xe40020ff 
irq 11 at device 11.0 on pci0
rlphy2:  on miibus2
rlphy2:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

The clients use fxp/xl/sis cards and can all make this server hang in seconds..

-Søren


Re: Still getting NFS client locking up

2003-11-10 Thread Matt Smith
With a current build from november the 9th I am still getting exactly 
the same NFS lockups. I assume soren is as well. NFS has basically been 
pretty unusable now for over a month.

As only a couple of people have complained about this from what I can 
see I assume it is something related to something specific such as a 
network card?

From my testing I only get this lockup when writing to the server. 
Reading from the server works perfectly all the time. So luckily I can 
still manage an NFS mounted installworld/kernel.

I just got the lockup again while portupgrade was downloading 
p5-Net-DNS into /usr/ports/distfiles. This is a very small file, but it 
was enough to trigger it. So it doesn't look like a size-related issue 
either, as I can download around 4% of mysql before it locks up.

Obviously we should really try and find the cause of this before 5.2. I 
am willing to try any patches/debug on my systems. But I just have zero 
clue about what to look for myself.

As a start, here are the relevant parts of my dmesg to show the NICs 
I'm using. I wonder if this corresponds to Søren's?

NFS CLIENT (xl1 would be the card it's using to talk to the server):
xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xe400-0xe47f mem 
0xea00-0xea7f irq 12 at device 15.0 on pci0
xl0: Ethernet address: 00:a0:24:ac:e1:b4
miibus0:  on xl0
xlphy0: <3Com internal media interface> on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl1: <3Com 3c905-TX Fast Etherlink XL> port 0xe800-0xe83f irq 11 at 
device 17.0 on pci0
xl1: Ethernet address: 00:60:08:6d:1e:3b
miibus1:  on xl1
nsphy0:  on miibus1
nsphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

NFS SERVER:
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0x1000-0x107f mem 
0xfc304800-0xfc30487f irq 10 at device 7.0 on pci5
xl0: Ethernet address: 00:04:76:8d:c5:fd
miibus0:  on xl0
xlphy0: <3c905C 10/100 internal PHY> on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

Both are connected to a 100 Mbit full-duplex switch.

Any ideas? As I have said I'm happy to enable some major debugging etc. 
But I just need somebody to give me a step by step guide for what to do 
and look for.

In case this thread is too old now and nobody remembers anything about 
it, the previous email regarding it is at 
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=1183410+0+archive/2003/freebsd-current/20031102.freebsd-current

Regards, Matt.



Re: Still getting NFS client locking up

2003-10-31 Thread Kelley Reynolds
--- Original Message ---
From: Matt Smith <[EMAIL PROTECTED]>
Sent: Fri, 31 Oct 2003 08:55:49 +
To: Robert Watson <[EMAIL PROTECTED]>
Subject: Re: Still getting NFS client locking up

> Robert Watson wrote:
> > On Tue, 28 Oct 2003, Soren Schmidt wrote:
> > 
> > 
> >>>I'm now running a kernel/world of October 26th on both NFS client and 
> >>>server machines. I am still seeing NFS lockups as reported by several 
> >>>people in these threads:
> >>
> >>Me too!!
> > 
> > 
> > Hmm.  I'm unable to reproduce this so far, and I'm pounding several 5.x
> > NFS clients and servers.  I've been checking out using CVS over NFS,
> > performing dd's of big files, etc.  There must be something more I'm
> > missing in reproducing this.  What network interface cards are you using
> > (client, server)? Are you using DHCP on the client or server?  What
> > commands trigger it -- what part of the NFS namespace, etc?  Are you
> > running the commands as root, or another user?
> > 
> > Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
> > [EMAIL PROTECTED]  Network Associates Laboratories
> > 

I'm also experiencing lockups with NFS, but it's the server that locks up on mine. 
Both client and server are -CURRENT. Server was fresh as of two days ago, and the 
client is a week or two old. They are connected via bfe (server) and vr (client). The 
server, I've found, will last much longer if the mount options on the client include 
'tcp' and 'nfsv3' (supposed to be default, but I'm just calling it like it is). 
Reading files seems to be okay, and I've managed to get as far as compiling a kernel 
on an NFS-mounted /usr, but a buildworld will hang in < 30 minutes. The server is 
running dhcp and pf. All commands are being run as root.
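
A minimal sketch of how the 'tcp' and 'nfsv3' client mount options 
mentioned above might appear in the client's /etc/fstab. The server name 
"fileserver" and the exported path are hypothetical placeholders, not 
taken from this report, and the exact option spelling may vary between 
releases, so check mount_nfs(8) on the system in question:

```
# /etc/fstab on the NFS client (hypothetical host and export path)
# Device           Mountpoint  FStype  Options        Dump  Pass#
fileserver:/usr    /usr        nfs     rw,tcp,nfsv3   0     0
```

Forcing both options explicitly takes any default-selection behaviour 
out of the picture when comparing lockup behaviour between runs.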

Kelley