Re: Still getting NFS client locking up
On Wed, 12 Nov 2003, Tim Middleton wrote:

> On November 11, 2003 11:36 pm, Janet Sullivan wrote:
> > So far I only have problems in a mixed -STABLE/-CURRENT environment.
> > When the client & server are both -CURRENT I haven't had any problems.
>
> I just installed another -STABLE box to see if keeping them both -STABLE
> helps. I haven't really tested the NFS yet as I didn't want to risk
> locking the box up in the middle of a buildworld.

If we can demonstrate the problem with both systems as -STABLE, that rules
out a lot of things, and might also raise some questions about the
hardware.

> So i just mounted the NFS drive on the new test box and left it.
> Within an hour the NFS server box doing the build world was locked up
> solid. I can't say if it was NFS mount related or not; nfsd wasn't
> really doing anything. Doesn't seem like it would have been. Beginning
> to wonder if it is some strange hardware problem on this box; which
> coincidentally only shows up when there's an nfs mount! But that doesn't
> explain why my normally rock solid desktop system tanked when being
> tested as an NFS client to that STABLE box. Hmmm...

One of the problems that can occur in -STABLE is a cascading failure when
one file system is jammed up (i.e., an NFS mount from another system).
Processes hang holding locks in NFS because the NFS session is stalled;
other processes try to acquire the held locks while holding additional
locks, and before you know it a lot of very useful locks are held and
can't be released, because the locks at the root of the jam can't be
freed. Many aspects of this problem are believed to be resolved in
-CURRENT, but it's a tough cookie to crack without redoing VFS locking.

If you have a spare system, it might be really interesting to install
-STABLE on it, replicate data from your file server, point the client at
that, and see if the problem still occurs there with the same load. You
might also try swapping network cards: perhaps we're looking at a network
device driver problem where loss of key packets, or packets over a certain
size, is causing an unrecoverable failure.

> Back to testing. I'm doing heavy disk I/O tests without any NFS mounts
> now. If they go okay, back to the NFS mounting and testing...
>
> It seems to me there is something desperately wrong with NFS if mixing
> -CURRENT and -STABLE NFS server/clients causes either side (in my case both
> sides) to lock up solid. I mean, problems are problems... but solid lockups
> with no crash messages or anything is ... nasty.

Clearly there's a substantial problem, but it sounds like we're still
having a lot of trouble identifying the circumstances that trigger it and
narrowing things down. One of the problems with distributed system
debugging is that it's often hard to track the problem down to a
particular source when you catch it partway through a cascading failure.
For example, it could well be that a server problem is triggering client
symptoms, or it could be that a serious client problem is consuming
resources on the server such that other clients can't operate. Under
these circumstances, it can be very difficult to track it down to a
particular cause (a missing unlock on the server, for example).

> > Are the folks seeing hangs getting any kind of console error messages?
>
> I see nothing. My server completely locks up. Nothing responds. The
> drive light (the times i've noticed) is frozen "on". On my desktop box
> the mouse is dead as well.

I can't help but wonder if the server isn't suffering an under-reported
hardware failure. It might be interesting to see how quickly the problem
vanishes when exchanging various elements.

> > I don't see anything - performance just tanks to the point of being
> > unusable.
>
> When testing with my desktop box as client, i noticed just before or
> just when the NFS locked up the mouse and keyboard response would be
> very erratic ... slow and jerky.

This might suggest a high RPC load, deep queues in processing, or key
locks held for extended periods of time.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]      Network Associates Laboratories

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
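[Archive note] The cascading failure described in the message above (one process stalls in NFS while holding a lock, others queue up behind it while holding locks of their own) can be pictured as a wait-for chain. The following is only a toy model in Python, not kernel code: the process names, lock names, and the `blocked_behind` helper are all made up for illustration.

```python
from collections import deque

def blocked_behind(stalled, holds, waits_for):
    """Return every process transitively stuck behind `stalled`.

    holds:     process -> set of locks it currently holds
    waits_for: process -> the one lock it is sleeping on (absent if none)
    """
    stuck = set()
    queue = deque([stalled])
    while queue:
        holder = queue.popleft()
        for lock in holds.get(holder, ()):
            for proc, wanted in waits_for.items():
                # Anyone sleeping on a lock held by a stuck process is stuck too.
                if wanted == lock and proc != stalled and proc not in stuck:
                    stuck.add(proc)
                    queue.append(proc)
    return stuck

# Hypothetical snapshot: "cp" is stalled in an NFS RPC while holding a vnode lock.
holds = {
    "cp": {"nfs_vnode"},
    "ls": {"dir_vnode"},     # holds a directory lock while waiting on NFS
    "sh": {"root_vnode"},    # holds the root vnode while waiting on the directory
    "cron": set(),
}
waits_for = {"ls": "nfs_vnode", "sh": "dir_vnode", "cron": "root_vnode"}

print(sorted(blocked_behind("cp", holds, waits_for)))
```

In this made-up snapshot a single stalled NFS operation ends up blocking three unrelated processes, which is why catching the system partway through such a chain says little about where the jam started.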
Re: Still getting NFS client locking up
On November 11, 2003 11:36 pm, Janet Sullivan wrote:
> So far I only have problems in a mixed -STABLE/-CURRENT environment.
> When the client & server are both -CURRENT I haven't had any problems.

I just installed another -STABLE box to see if keeping them both -STABLE
helps. I haven't really tested the NFS yet as I didn't want to risk
locking the box up in the middle of a buildworld.

So I just mounted the NFS drive on the new test box and left it. Within
an hour the NFS server box doing the build world was locked up solid. I
can't say if it was NFS mount related or not; nfsd wasn't really doing
anything. Doesn't seem like it would have been. Beginning to wonder if it
is some strange hardware problem on this box; which coincidentally only
shows up when there's an nfs mount! But that doesn't explain why my
normally rock solid desktop system tanked when being tested as an NFS
client to that STABLE box. Hmmm...

Back to testing. I'm doing heavy disk I/O tests without any NFS mounts
now. If they go okay, back to the NFS mounting and testing...

It seems to me there is something desperately wrong with NFS if mixing
-CURRENT and -STABLE NFS server/clients causes either side (in my case
both sides) to lock up solid. I mean, problems are problems... but solid
lockups with no crash messages or anything are ... nasty.

> Are the folks seeing hangs getting any kind of console error messages?

I see nothing. My server completely locks up. Nothing responds. The drive
light (the times I've noticed) is frozen "on". On my desktop box the
mouse is dead as well.

> I don't see anything - performance just tanks to the point of being
> unusable.

When testing with my desktop box as client, I noticed just before or just
when the NFS locked up the mouse and keyboard response would be very
erratic ... slow and jerky.

-- 
Tim Middleton | Cain Gang Ltd | "Who is Ungit?" said he, still holding
[EMAIL PROTECTED] | www.Vex.Net | my hands. --C.S.Lewis (TWHF)
Re: Still getting NFS client locking up
On Mon, Nov 10, 2003 at 11:28:40AM -0500, Robert Watson wrote:
> How fast are your systems, speaking of which? I live in the world of
> 300-500 mhz machines at work, and 300-800 mhz boxes at home. If you're
> using multi-ghz boxes, that could well be the distinguishing factor
> between our configurations...

I collected some information from my client and server, just before my
server crashed (probably because of this), including a tcpdump of the
last seconds before it stops...

http://www.stack.nl/~marcolz/FreeBSD/NFS/

Zlo
Re: Still getting NFS client locking up
I'm not having any problems with my -CURRENT client. My server is running
4.9-STABLE, so I can't comment on the state of the NFS server code in
-CURRENT. For what it's worth, my NFS usage is not very heavy, and is
mostly reading, with very little writing.

So far I only have problems in a mixed -STABLE/-CURRENT environment. When
the client & server are both -CURRENT I haven't had any problems. If the
client is -CURRENT and the server is -STABLE, I occasionally get very,
very slow response times (like 40 seconds to get an ls response). I can't
blame the response times on my LAN, because everything else continues to
function properly. In fact, I just had to reboot my laptop (running
-CURRENT from 2003.10.30.07.10.00) to get my -STABLE nfs mounts back to
normal.

Are the folks seeing hangs getting any kind of console error messages? I
don't see anything - performance just tanks to the point of being
unusable.
Re: Still getting NFS client locking up
On Tue, 11 Nov 2003, Tim Middleton wrote:

> The server is a P3-1ghz Intel STL2 box, with 1 gig of ram. Using the onboard
> fxp ethernet at 100baseTX. It is not using dhcp. Nothing much else is running
> on this server box as I'm just testing it. When the server locks the box can
> not even be pinged.

Can you set up a serial console on this system? If so, enable these
kernel options:

options DDB
options WITNESS
options INVARIANTS
options INVARIANTS_SUPPORT
options BREAK_TO_DEBUGGER

Boot through the serial console, then trigger the bug and send a break
from serial. If you drop into ddb, then it's a Giant deadlock. If you can
get that, then do 'show locks' from ddb to get a list of potential
culprits, and 'tr' for what it's stuck doing.

The kernel handbook section on kernel debugging will be a useful read.

-- 
Doug White                    |  FreeBSD: The Power to Serve
[EMAIL PROTECTED]      |  www.FreeBSD.org
Re: Still getting NFS client locking up
On 31 Oct, Kelley Reynolds wrote: > --- Original Message --- > From: Matt Smith <[EMAIL PROTECTED]> > Sent: Fri, 31 Oct 2003 08:55:49 + > To: Robert Watson <[EMAIL PROTECTED]> > Subject: Re: Still gettnig NFS client locking up > >> Robert Watson wrote: >> > On Tue, 28 Oct 2003, Soren Schmidt wrote: >> > >> > >> >>>I'm now running a kernel/world of October 26th on both NFS client >> >>>and server machines. I am still seeing NFS lockups as reported by >> >>>several people in these threads: >> >> >> >>Me too!! >> > >> > >> > Hmm. I'm unable to reproduce this so far, and I'm pounding several >> > 5.x NFS clients and servers. I've been checking out using CVS over >> > NFS, performing dd's of big files, etc. There must be something >> > more I'm missing in reproducing this. What network interface cards >> > are you using (client, server)? Are you using DHCP on the client or >> > server? What commands trigger it -- what part of the NFS >> > namespace, etc? Are you running the commands as root, or another >> > user? >> > >> > Robert N M Watson FreeBSD Core Team, TrustedBSD >> > Projects [EMAIL PROTECTED] Network Associates >> > Laboratories >> > > > I'm also experiencing lockups with NFS, but it's the server that locks > up on mine. Both client and server are -CURRENT. Server was fresh as > of two days ago, and the client is a week or two old. They are > connected via bfe (server) and vr (client). The server, I've found, > will last much longer if the mount options on the client include 'tcp' > and 'nfsv3' (supposed to be default, but I'm just calling it like it > is). Reading files seems to be okay, and I've managed to get as far as > compiling a kernel on an NFS-mounted /usr, but a buildworld will hang > in < 30 minutes. The server is running dhcp and pf. All commands are > being run as root. I'm not having any problems with my -CURRENT client. My server is running 4.9-STABLE, so I can't comment on the state of the NFS server code in -CURRENT. 
For what it's worth, my NFS usage is not very heavy, and is mostly reading, with very little writing.
Re: Still getting NFS client locking up
Hi.

> I can lock the NFS server up every time simply by mounting the nfs
> partition (i'm using -t for tcp nfs and exporting with -maproot=0:0),
> and then running "iozone -a" on the nfs client box. It takes a while,
> but the 4.9-RELEASE box will always lock up solid eventually. Not good. )-:

Could you show /etc/exports on the server and /etc/fstab on the client?

Have you tried udp instead of tcp?

regards
Claus
Re: Still getting NFS client locking up
On November 10, 2003 12:44 pm, Soren Schmidt wrote:
> For me its just the server end that fails, I've not seen the client hang.

I'm having a bad NFS day... not sure if it is the same lockups described
in this thread. In fact perhaps I'm posting to the wrong group, since the
server in question is 4.9-STABLE. The client I am testing with is
5.1-CURRENT (though a few months old), however.

I'm trying to make sure this server is stable, as it took a long time to
get my company to let me switch one of the servers to FreeBSD... things
were going great until I started testing NFS... now they are going very
badly.

I can lock the NFS server up every time simply by mounting the nfs
partition (I'm using -t for tcp nfs and exporting with -maproot=0:0), and
then running "iozone -a" on the nfs client box. It takes a while, but the
4.9-RELEASE box will always lock up solid eventually. Not good. )-:

I've done tests as root and non-root. Sometimes I can rescue the nfs
client box with a "mount -f", but sometimes the client box has locked up
solid as well when I've tried that.

The server is a P3-1ghz Intel STL2 box, with 1 gig of ram. Using the
onboard fxp ethernet at 100baseTX. It is not using dhcp. Nothing much
else is running on this server box as I'm just testing it. When the
server locks, the box can not even be pinged.

Since the box locks up solid it's hard to see what may be going on. I
have left top running to see what it says when it freezes, but it may not
be accurate depending on when it last refreshed. But for the record it
has nfsd in "biorw" state. I tried dumping ps -lax to a file every few
seconds while testing... that didn't work very well as again the refresh
is too slow, and the drive loses the last few files when it goes down.
Perhaps I can turn off caching or soft updates or something to help with
this (as you may tell, I'm not a file system expert --- any suggestions
welcome).

Maybe I should set up another box and test 4.9-RELEASE to 4.9-RELEASE...
and/or update my 5.1-CURRENT box for further testing...

-- 
Tim Middleton | Cain Gang Ltd | One afternoon, disgusted, bravo, you fall
[EMAIL PROTECTED] | www.Vex.Net | asleep. --T.Lilburn (MS)
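[Archive note] On the problem raised above of periodic `ps -lax` dumps being lost when the box goes down: one possible approach, sketched here in Python, is to append each snapshot to a single log and call fsync after every write, so that the data is pushed to disk before the next sample is taken. This is a minimal sketch under assumptions, not something anyone in the thread reported using; the function names, the 2-second interval, and the log path in the usage note are all illustrative.

```python
import os
import subprocess
import time

def capture(cmd):
    """Run cmd and return its stdout as text (empty string on failure)."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True,
                              timeout=10).stdout
    except (OSError, subprocess.SubprocessError):
        return ""

def append_snapshot(path, text):
    """Append one timestamped snapshot and force it out to the disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a") as log:
        log.write(f"=== {stamp} ===\n{text}\n")
        log.flush()              # push Python's buffer to the kernel
        os.fsync(log.fileno())   # ask the kernel to write it to disk now

def watch(path, cmd=("ps", "-lax"), interval=2.0, count=None):
    """Take `count` snapshots (forever if None), `interval` seconds apart."""
    taken = 0
    while count is None or taken < count:
        append_snapshot(path, capture(list(cmd)))
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)
```

A call such as `watch("/var/tmp/nfs-hang.log")` (hypothetical path) would be left running on the server while the iozone test runs on the client. Whether the final snapshot really survives a hard lockup still depends on the disk and file system, so treat the last entry with some suspicion.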
Re: Still getting NFS client locking up
On Mon, 10 Nov 2003, Matt Smith wrote:

> I can certainly spend some time trying to get some proper debug based on
> what you have said in your email. I shall look into setting up a serial
> console etc.
>
> In the meantime another piece of information which might be helpful is
> this. Looking at the wtmp to see when I rebuilt my world/kernel I can
> see this:
>
> reboot ~ Tue Oct 21 20:44
> reboot ~ Wed Oct 15 19:36
>
> (These times are in BST which is +5 hours from east coast US).
>
> On the Oct 15th kernel NFS was working perfectly (and before that). From
> the Oct 21st kernel it has always locked up in this way. So something
> between those two dates was committed which broke this for us. Another
> way of me debugging this I guess is to backtrack my world to each date
> in between systematically and find the exact date it breaks and look at
> the commits.

Hmm. The one other thing that might be worth trying, and this is pretty
time-consuming, is attempting to narrow down the threshold kernel change
that caused the failures to start. Typically, this is done using a binary
search (i.e., find two dates -- one that the kernel works, the other that
it doesn't -- split the difference, repeat until narrowed down to a range
of commits that can be individually inspected). This way we could try to
identify some suspect changes that could be backed out locally
individually to narrow it down. The likely categories of commits that
might be worth looking at probably include:

(1) Changes specifically to the network drivers that you're using.

(2) Changes to the network stack, especially relating to locking and
    timeouts.

(3) Changes to the NFS client and server code.

(4) Changes in general to VFS and buffer cache locking.

We've had a lot of commits in all of these categories, so narrowing it
down would be a useful way to help figure it out...

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]      Network Associates Laboratories
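[Archive note] The date-based binary search suggested in the message above can be sketched in a few lines of Python. This is only an illustration of the search itself: `first_bad_date` and `pretend_test` are hypothetical names, and the breaking date inside `pretend_test` is invented so the example can run. In practice the test callable would stand for "check out sources as of this date, build, boot, run the NFS workload, report whether it hung".

```python
from datetime import date, timedelta

def first_bad_date(good, bad, is_bad):
    """Binary-search the earliest date whose build shows the hang.

    good:   a date known to work
    bad:    a later date known to hang
    is_bad: callable(date) -> bool, True if that day's build hangs
    """
    if not good < bad:
        raise ValueError("good date must precede bad date")
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_bad(mid):
            bad = mid      # breakage is at mid or earlier
        else:
            good = mid     # breakage is after mid
    return bad

tested = []

def pretend_test(day):
    """Stand-in for a real build-and-test cycle (hypothetical result)."""
    tested.append(day)
    return day >= date(2003, 10, 19)

culprit = first_bad_date(date(2003, 10, 15), date(2003, 10, 21), pretend_test)
print(culprit, "found after", len(tested), "builds")
```

With the two wtmp dates quoted above as endpoints, the search needs at most three builds instead of one per day, after which the commits from the day before the returned date up to that date are the ones to inspect.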
Re: Still getting NFS client locking up
It seems Robert Watson wrote:
> How fast are your systems, speaking of which? I live in the world of
> 300-500 mhz machines at work, and 300-800 mhz boxes at home. If you're
> using multi-ghz boxes, that could well be the distinguishing factor
> between our configurations...

Server is a 533Mhz VIA C3, clients everything from 300Mhz PII to 2.6G P4.

> Ok, here's the strategy I was planning to take once I could reproduce it:
>
> (1) Attempt to further narrow down responsibility to client/server. In
> particular, see if an apparent hang on one client affects the other
> clients.

For me it's just the server end that fails, I've not seen the client hang.

> (2) Investigate Soren's report that killing and restarting nfsd on the
> server would clear the hang.

Yups, that works, in fact I have that in my crontab now every minute to
keep NFS from hosing my setup here.

NOTE: I also still need to ifconfig down/up my interfaces on some boxes
or the netstack will freeze (again done every minute in crontab). However
when NFS locks up it seems totally unrelated, ie all other network
traffic works...

> (3) Look at stack traces of involved processes on both the client and
> server: in particular, look at traces for any client blocked in NFS,
> any nfsiod processes on the client, and the nfsd processes on the
> server. Also look at the wait channels on clients and servers for
> these processes. Particularly interested in whether nfsd processes
> are blocked trying to grab locks.

Ok, will do..

> (4) Look at netstat information for NFS sockets, in particular, if the
> buffers are full, or not being drained. In particular, on the server,
> is the input queue not being drained by nfsd worker threads?

Netstat doesn't seem to give any hints or even useful info here. Any
special cmds you want the output from?

> (5) Try backing out src/sys/nfsserver/nfs_serv.c:1.137, which removed
> another deadlock problem, but did change locking behavior in the NFS
> server.

No change, already tried.

> (6) Look at packet traces between the client and server with ethereal,
> which has pretty good NFS decoding. Is the client retransmitting an
> RPC to the server and the server just isn't responding, or is the
> client failing to transmit? At the point of the hang, what sorts of
> RPCs are outstanding to the server? In the past, we've seen "apparent
> hangs" when some or another more obscure unusual error case on the NFS
> server fails to respond to an RPC, which causes the client to "wait
> forever".

I can try that easily, I'll get a trace to you later tonight...

> Things to look for: normally, idle nfsd and nfsiod processes have a WCHAN
> of "-" (ps -lax), which indicates they're blocked waiting for some event
> to kick them off. If you see nfsd processes "hung" in another state, it's
> a good sign we've identified a server problem. In the nfs client
> processes, "nfsrcvlk" typically indicates a process has sent out an RPC
> and is now waiting on a response.

I see the idle '-' wchan here when things go bad IIRC...

-Søren
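[Archive note] The wait-channel check quoted above (idle nfsd and nfsiod show "-" in `ps -lax`, anything else is worth a closer look) can be automated when many snapshots have been collected. The Python sketch below makes several assumptions that should be checked against real output: that the wait-channel column is headed either WCHAN or MWCHAN, that no column before COMMAND is ever blank, and that the sample text, which is invented for illustration, resembles what the affected machines print. `busy_nfs_daemons` is a hypothetical helper name.

```python
def busy_nfs_daemons(ps_output, names=("nfsd", "nfsiod")):
    """Return (pid, wchan, command) for NFS daemons whose WCHAN is not "-"."""
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    # Assumption: the column is called WCHAN or MWCHAN depending on the ps version.
    wchan_col = next(i for i, h in enumerate(header) if h in ("WCHAN", "MWCHAN"))
    pid_col = header.index("PID")
    cmd_col = len(header) - 1          # COMMAND is last and may contain spaces
    flagged = []
    for line in lines[1:]:
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue                   # short or malformed line, skip it
        command = fields[cmd_col].strip()
        wchan = fields[wchan_col]
        if any(name in command for name in names) and wchan != "-":
            flagged.append((int(fields[pid_col]), wchan, command))
    return flagged

# Invented sample, not captured from any machine in this thread.
SAMPLE = """\
  UID   PID  PPID CPU PRI NI   VSZ  RSS MWCHAN STAT  TT       TIME COMMAND
    0   312     1   0   4  0  1180  812 -      Is    ??    0:00.01 nfsd: master (nfsd)
    0   314   312   0   4  0  1140  760 -      S     ??    0:12.40 nfsd: server (nfsd)
    0   315   312   0  -4  0  1140  760 biowr  D     ??    0:09.87 nfsd: server (nfsd)
    0   502   480   0   8  0  1320  900 wait   Is    p0    0:00.03 -sh (sh)
"""

for pid, wchan, command in busy_nfs_daemons(SAMPLE):
    print(pid, wchan, command)
```

Note that a client process sitting in "nfsrcvlk" would not be caught by this filter, since it matches on daemon names only; extending `names` or matching on the wait channel instead would be the obvious variation.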
Re: Still getting NFS client locking up
Robert Watson wrote:
> I'm fairly baffled. I tried for many hours to reproduce the problem in
> two separate sets of systems here, and completely failed. I left
> buildworlds, cvs updates, blah blah blah, running for 96 hours across
> pools of clients and servers and no hint of the problem. I also use NFS
> daily on my primary workstation at work, as well as in my normal
> development setup with diskless crashboxes. So indeed, there must be
> some very specific piece of the picture that I'm not reproducing, such
> as a specific network card, or there's a race condition that requires
> very specific timing, etc. How fast are your systems, speaking of which?
> I live in the world of 300-500 mhz machines at work, and 300-800 mhz
> boxes at home. If you're using multi-ghz boxes, that could well be the
> distinguishing factor between our configurations...

Client is an Intel Pentium II 300mhz with 256meg ram and 1gig of swap.
Server is an Athlon XP 2200 with 512meg ram and 1gig of swap.

I can certainly spend some time trying to get some proper debug based on
what you have said in your email. I shall look into setting up a serial
console etc.

In the meantime another piece of information which might be helpful is
this. Looking at the wtmp to see when I rebuilt my world/kernel I can see
this:

reboot ~ Tue Oct 21 20:44
reboot ~ Wed Oct 15 19:36

(These times are in BST which is +5 hours from east coast US).

On the Oct 15th kernel NFS was working perfectly (and before that). From
the Oct 21st kernel it has always locked up in this way. So something
between those two dates was committed which broke this for us. Another
way of me debugging this I guess is to backtrack my world to each date in
between systematically and find the exact date it breaks and look at the
commits.

Matt.
Re: Still getting NFS client locking up
On Mon, 10 Nov 2003, Matt Smith wrote: > With a current build from november the 9th I am still getting exactly > the same NFS lockups. I assume soren is as well. NFS has basically been > pretty unusable now for over a month. > > As only a couple of people have complained about this from what I can > see I assume it is something related to something specific such as a > network card? I'm fairly baffled. I tried for many hours to reproduce the problem in two seperate sets of systems here, and completely failed. I left buildworlds, cvs updates, blah blah blah, running for 96 hours across pools of clients and servers and no hint of the problem. I also use NFS daily on my primary workstation at work, as well as in my normal development setup with diskless crashboxes. So indeed, there must be some very specific piece of the picture that I'm not reproducing, such as a specific network card, or there's a race condition that requires very specific timing, etc. How fast are your systems, speaking of which? I live in the world of 300-500 mhz machines at work, and 300-800 mhz boxes at home. If you're using multi-ghz boxes, that could well be the distinguishing factor between our configurations... > From my testing I only get this lockup when writing to the server. > Reading from the server works perfectly all the time. So luckily I can > still manage an NFS mounted installworld/kernel. > > I just got the lockup again now whilst it downloaded p5-Net-DNS to > portupgrade into /usr/ports/distfiles. This is a very small file but it > was enough to trigger it off. So it doesn't look like a size related > issue either as I can download around 4% of mysql before it locks up. > > Obviously we should really try and find the cause of this before 5.2. I > am willing to try any patches/debug on my systems. But I just have zero > clue about what to look for myself. > > As a start here is the relevent parts of my dmesg to show the NIC's I'm > using. I wonder if this corresponds to sorens? 
> > NFS CLIENT (xl1 would be the card it's using to talk to the server): > xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xe400-0xe47f mem > 0xea00-0xea7f irq 12 at device 15.0 on pci0 > xl0: Ethernet address: 00:a0:24:ac:e1:b4 > miibus0: on xl0 > xlphy0: <3Com internal media interface> on miibus0 > xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto > xl1: <3Com 3c905-TX Fast Etherlink XL> port 0xe800-0xe83f irq 11 at > device 17.0 on pci0 > xl1: Ethernet address: 00:60:08:6d:1e:3b > miibus1: on xl1 > nsphy0: on miibus1 > nsphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto > > NFS SERVER: > xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0x1000-0x107f mem > 0xfc304800-0xfc30487f irq 10 at device 7.0 on pci5 > xl0: Ethernet address: 00:04:76:8d:c5:fd > miibus0: on xl0 > xlphy0: <3c905C 10/100 internal PHY> on miibus0 > xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto My server: xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xd880-0xd8ff mem 0xff202000-0xff20207f irq 11 at device 17.0 on pci0 xl0: Ethernet address: 00:b0:d0:29:ec:ce miibus2: on xl0 xlphy0: <3Com internal media interface> on miibus2 xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto My client1: xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xdc00-0xdc7f mem 0xff00-0xff7f irq 11 at device 17.0 on pci0 xl0: Ethernet address: 00:c0:4f:0d:6b:bc miibus0: on xl0 xlphy0: <3Com internal media interface> on miibus0 xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto My client2: xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xd880-0xd8ff mem 0xff202000-0xff20207f irq 11 at device 17.0 on pci0 xl0: Ethernet address: 00:b0:d0:2b:76:d5 miibus2: on xl0 xlphy0: <3Com internal media interface> on miibus2 xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto > Both connected to a 100meg full duplex switch. Ditto. > Any ideas? As I have said I'm happy to enable some major debugging etc. 
> But I just need somebody to give me a step by step guide for what to do > and look for. > In case this thread is too old now and nobody remembers anything about > it the previous email regarding it is at > http://docs.freebsd.org/cgi/getmsg.cgi?fetch=1183410+0+archive/2003/freebsd-current/20031102.freebsd-current Ok, here's the strategy I was planning to take once I could reproduce it: (1) Attempt to further narrow down responsibility to client/server. In particular, see if an apparent hang on one client affects the other clients. (2) Investigate Soren's report that killing and restarting nfsd on the server would clear the hang. (3) Look at stack traces of involved processes on both the client and server: in particular, look at traces for any client blocked in NFS, any nfsiod processes on the client, and the nfsd processes on the server. Also look at the wait channels on clients and servers for these processes. Particularly interested in whether nfsd processes ar
Re: Still getting NFS client locking up
--- Original Message ---
From: Soren Schmidt <[EMAIL PROTECTED]>
Sent: Mon, 10 Nov 2003 16:03:47 +0100 (CET)
To: Matt Smith <[EMAIL PROTECTED]>
Subject: Re: Still getting NFS client locking up

> It seems Matt Smith wrote:
> > With a current build from november the 9th I am still getting exactly
> > the same NFS lockups. I assume soren is as well. NFS has basically been
> > pretty unusable now for over a month.
>
> Yes I do, NFS is virtually useless...
>
> > As only a couple of people have complained about this from what I can
> > see I assume it is something related to something specific such as a
> > network card?
>
> Could be, but its more than one type of card which suggests to me
> its more "generic" in origin..
>
> > From my testing I only get this lockup when writing to the server.
> > Reading from the server works perfectly all the time. So luckily I can
> > still manage an NFS mounted installworld/kernel.
>
> I can also lock it up with just reading, but it takes longer.
>
> > Obviously we should really try and find the cause of this before 5.2. I
> > am willing to try any patches/debug on my systems. But I just have zero
> > clue about what to look for myself.
>
> I think its a definite showstopper for 5.2 actually..

Just to add some more evidence to the mix, I have two 5.1 current boxes
using bfe, vr, and both have ath, and I am experiencing all of the
lockups on the server end... client has yet to lock up.

Kelley
Re: Still getting NFS client locking up
It seems Matt Smith wrote: > With a current build from november the 9th I am still getting exactly > the same NFS lockups. I assume soren is as well. NFS has basically been > pretty unusable now for over a month. Yes I do, NFS is virtually useless... > As only a couple of people have complained about this from what I can > see I assume it is something related to something specific such as a > network card? Could be, but its more than one type of card which suggests to me its more "generic" in origin.. > From my testing I only get this lockup when writing to the server. > Reading from the server works perfectly all the time. So luckily I can > still manage an NFS mounted installworld/kernel. I can also lock it up with just reading, but it takes longer. > Obviously we should really try and find the cause of this before 5.2. I > am willing to try any patches/debug on my systems. But I just have zero > clue about what to look for myself. I think its a definite showstopper for 5.2 actually.. > NFS SERVER: > xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0x1000-0x107f mem > 0xfc304800-0xfc30487f irq 10 at device 7.0 on pci5 > xl0: Ethernet address: 00:04:76:8d:c5:fd > miibus0: on xl0 > xlphy0: <3c905C 10/100 internal PHY> on miibus0 > xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto OK the worst server I've got has: re0: port 0xdc00-0xdcff mem 0xe400-0xe4ff irq 12 at device 9.0 on pci0 rlphy0: on miibus0 rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto re1: port 0xe000-0xe0ff mem 0xe4001000-0xe40010ff irq 10 at device 10.0 on pci0 rlphy1: on miibus1 rlphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto re2: port 0xe400-0xe4ff mem 0xe4002000-0xe40020ff irq 11 at device 11.0 on pci0 rlphy2: on miibus2 rlphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto The clients use fxp/xl/sis cards and can all make this server hang in seconds.. 
-Søren
Re: Still getting NFS client locking up
With a current build from November the 9th I am still getting exactly the
same NFS lockups. I assume soren is as well. NFS has basically been
pretty unusable now for over a month.

As only a couple of people have complained about this from what I can
see, I assume it is related to something specific such as a network card?

From my testing I only get this lockup when writing to the server.
Reading from the server works perfectly all the time. So luckily I can
still manage an NFS mounted installworld/kernel.

I just got the lockup again now whilst it downloaded p5-Net-DNS to
portupgrade into /usr/ports/distfiles. This is a very small file but it
was enough to trigger it off. So it doesn't look like a size related
issue either, as I can download around 4% of mysql before it locks up.

Obviously we should really try and find the cause of this before 5.2. I
am willing to try any patches/debug on my systems. But I just have zero
clue about what to look for myself.

As a start here are the relevant parts of my dmesg to show the NICs I'm
using. I wonder if this corresponds to soren's?

NFS CLIENT (xl1 would be the card it's using to talk to the server):
xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xe400-0xe47f mem 0xea00-0xea7f irq 12 at device 15.0 on pci0
xl0: Ethernet address: 00:a0:24:ac:e1:b4
miibus0: on xl0
xlphy0: <3Com internal media interface> on miibus0
xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl1: <3Com 3c905-TX Fast Etherlink XL> port 0xe800-0xe83f irq 11 at device 17.0 on pci0
xl1: Ethernet address: 00:60:08:6d:1e:3b
miibus1: on xl1
nsphy0: on miibus1
nsphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

NFS SERVER:
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0x1000-0x107f mem 0xfc304800-0xfc30487f irq 10 at device 7.0 on pci5
xl0: Ethernet address: 00:04:76:8d:c5:fd
miibus0: on xl0
xlphy0: <3c905C 10/100 internal PHY> on miibus0
xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

Both connected to a 100meg full duplex switch.

Any ideas? As I have said I'm happy to enable some major debugging etc.
But I just need somebody to give me a step by step guide for what to do
and look for.

In case this thread is too old now and nobody remembers anything about
it, the previous email regarding it is at
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=1183410+0+archive/2003/freebsd-current/20031102.freebsd-current

Regards, Matt.
Re: Still getting NFS client locking up
--- Original Message ---
From: Matt Smith <[EMAIL PROTECTED]>
Sent: Fri, 31 Oct 2003 08:55:49 +
To: Robert Watson <[EMAIL PROTECTED]>
Subject: Re: Still gettnig NFS client locking up

> Robert Watson wrote:
> > On Tue, 28 Oct 2003, Soren Schmidt wrote:
> >
> >>>I'm now running a kernel/world of October 26th on both NFS client and
> >>>server machines. I am still seeing NFS lockups as reported by several
> >>>people in these threads:
> >>
> >>Me too!!
> >
> > Hmm. I'm unable to reproduce this so far, and I'm pounding several 5.x
> > NFS clients and servers. I've been checking out using CVS over NFS,
> > performing dd's of big files, etc. There must be something more I'm
> > missing in reproducing this. What network interface cards are you using
> > (client, server)? Are you using DHCP on the client or server? What
> > commands trigger it -- what part of the NFS namespace, etc? Are you
> > running the commands as root, or another user?
> >
> > Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
> > [EMAIL PROTECTED] Network Associates Laboratories

I'm also experiencing lockups with NFS, but it's the server that locks up
on mine. Both client and server are -CURRENT. Server was fresh as of two
days ago, and the client is a week or two old. They are connected via bfe
(server) and vr (client). The server, I've found, will last much longer
if the mount options on the client include 'tcp' and 'nfsv3' (supposed to
be default, but I'm just calling it like it is). Reading files seems to
be okay, and I've managed to get as far as compiling a kernel on an
NFS-mounted /usr, but a buildworld will hang in < 30 minutes. The server
is running dhcp and pf. All commands are being run as root.

Kelley