Re: vm_map.c lock up (Was: Re: NFS Locking Issue)
On Saturday 15 July 2006 00:08, User Freebsd wrote:
> On Sat, 15 Jul 2006, Kostik Belousov wrote:
> > On Sat, Jul 15, 2006 at 12:10:29AM -0300, User Freebsd wrote:
> >> On Wed, 5 Jul 2006, Robert Watson wrote:
> >>> If you can get into DDB when the hang has occurred, output via
> >>> serial console for the following commands would be very
> >>> helpful:
> >>>
> >>> show pcpu
> >>> show allpcpu
> >>> ps
> >>> trace
> >>> traceall
> >>> show locks
> >>> show alllocks
> >>> show uma
> >>> show malloc
> >>> show lockedvnods
> >>
> >> 'k, after 16 days uptime, the server that I got all the
> >> debugging turned on for finally hung up solid ... I was able to
> >> break into DDB over the serial link, and have run all of the
> >> above on it ... and the output is attached ...
> >>
> >> One thing to note is that the ps listing is not complete ...
> >> there are >6k processes running at the time, and I don't know
> >> how to get rid of the '--more--' prompt :( After 1k processes,
> >> I just hit 'q' and went onto the other commands ...
> >
> > set lines=0
> >
> >> Also, traceall gave me a 'No such command' error ... now that I
> >> think about it, my luck, it was supposed to be 'trace all'?
> >
> > It is alltrace.
> >
> >> If this doesn't provide enough information, please let me know
> >> what else I should do the next time through, besides the above
> >> commands ...
> >
> > Missing alltrace output seems to be critical. If this is not
> > feasible, please, provide at least the output of the bt for
> > each pid shown in the "show lockedvnods" and "show alllocks". In
> > your case, bt 64880 was the most interesting. It is a pity that you
> > had reset the machine.
>
> Was down for too long as it was ... it, of course, happened while I
> was out with the family :(
>
> Will keep all of this in mind next time I get a chance to run
> through things ...
>
> Any idea why 'panic' doesn't produce core like it used to?

call doadump

Should force a core dump.

--
Anish Mistry
Re: vm_map.c lock up (Was: Re: NFS Locking Issue)
On 14/07/2006 6:08 PM, User Freebsd wrote:
>> Just in case, do you use mlocked mappings ? Also, why so huge number of
>> crons exist in the system ? They are all forking now. It may be (can not
>> say definitely without further investigation) just a fork bomb.
>
> re: crons ... this, I'm not sure of, but my suspicion was that the crons
> weren't able to complete, since the file system was locked up, but the
> next one was being attempted to run ... *shrug*

This seems consistent with behaviour I've seen on several 6.0-RELEASE
machines.. from the limited information I've been able to get from the
machines, there appear to have been multiple tasks from cron all piled up
upon one another. In particular, the daily periodic tasks that run the
various 'find' commands were one of the things I noticed (although we run
numerous tasks out of cron)...

If something is blocking the filesystem and causing find (and possibly other
processes) to become stuck, these would just keep mounting up until it all
falls over (with numerous maxproc exceeded etc errors).

These are on machines without NFS, but the symptoms are very very similar..
NWFS and SMBFS are commonly used on a number of the machines I've seen the
problem on, which may be relevant -- perhaps it affects more than just NFS?

I may experiment with building up a test server locally and trying to
reproduce similar loads to see if I can trigger the problem in-house.. at
least that way I can hook up a serial console and get some more detailed
information...

Regards
Antony
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: vm_map.c lock up (Was: Re: NFS Locking Issue)
On Sat, 15 Jul 2006, Kostik Belousov wrote: On Sat, Jul 15, 2006 at 12:10:29AM -0300, User Freebsd wrote: On Wed, 5 Jul 2006, Robert Watson wrote: If you can get into DDB when the hang has occurred, output via serial console for the following commands would be very helpful: show pcpu show allpcpu ps trace traceall show locks show alllocks show uma show malloc show lockedvnods 'k, after 16 days uptime, the server that I got all the debugging turned on for finally hung up solid ... I was able to break into DDB over the serial link, and have run all of the above on it ... and the output is attached ... One thing to note is that the ps listing is not complete ... there are >6k processes running at the time, and I don't know how to get rid of the '--more--' prompt :( After 1k processes, I just hit 'q' and went onto the other commands ... set lines=0 Also, traceall gave me a 'No such command' error ... now that I think about it, my luck, it was supposed to be 'trace all'? It is alltrace. If this doesn't provide enough information, please let me know what else I should do the next time through, besides the above commands ... Missing alltrace output seems to be critical. If this is not feasible, please, provide at least the output of the bt for each pid shown in the "show lockedvnods" and "show alllocks". In you case, bt 64880 was the most interesting. It is pity that you had reset the machine. Was down for too long as it was ... it, of course, happened while I was out with the family :( Will keep all of this in mind next time I get a chance to run through things ... Any idea why 'panic' doesn't produce core like it used to? Just in case, do you use mlocked mappings ? Also, why so huge number of crons exist in the system ? The are all forking now. It may be (can not say definitely without further investigation) just a fork bomb. mlocked mappings? What are they? :) re: crons ... 
this, I'm not sure of, but my suspicion was that the crons weren't able to complete, since the file system was locked up, but the next one was being attempted to run ... *shrug* Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.orgICQ . 7615664 ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: vm_map.c lock up (Was: Re: NFS Locking Issue)
On Sat, Jul 15, 2006 at 12:10:29AM -0300, User Freebsd wrote:
>
> On Wed, 5 Jul 2006, Robert Watson wrote:
>
> >If you can get into DDB when the hang has occurred, output via serial
> >console for the following commands would be very helpful:
> >
> >show pcpu
> >show allpcpu
> >ps
> >trace
> >traceall
> >show locks
> >show alllocks
> >show uma
> >show malloc
> >show lockedvnods
>
> 'k, after 16 days uptime, the server that I got all the debugging turned
> on for finally hung up solid ... I was able to break into DDB over the
> serial link, and have run all of the above on it ... and the output is
> attached ...
>
> One thing to note is that the ps listing is not complete ... there are >6k
> processes running at the time, and I don't know how to get rid of the
> '--more--' prompt :( After 1k processes, I just hit 'q' and went onto the
> other commands ...

set lines=0

> Also, traceall gave me a 'No such command' error ... now that I think
> about it, my luck, it was supposed to be 'trace all'?

It is alltrace.

> If this doesn't provide enough information, please let me know what else I
> should do the next time through, besides the above commands ...

Missing alltrace output seems to be critical. If this is not feasible,
please, provide at least the output of the bt for each pid shown in the
"show lockedvnods" and "show alllocks". In your case, bt 64880 was the most
interesting. It is a pity that you had reset the machine.

Just in case, do you use mlocked mappings? Also, why does such a huge number
of crons exist in the system? They are all forking now. It may be (cannot
say definitely without further investigation) just a fork bomb.
Re: vm_map.c lock up (Was: Re: NFS Locking Issue)
On Sat, 15 Jul 2006, User Freebsd wrote: On Wed, 5 Jul 2006, Robert Watson wrote: If you can get into DDB when the hang has occurred, output via serial console for the following commands would be very helpful: show pcpu show allpcpu ps trace traceall show locks show alllocks show uma show malloc show lockedvnods 'k, after 16 days uptime, the server that I got all the debugging turned on for finally hung up solid ... I was able to break into DDB over the serial link, and have run all of the above on it ... and the output is attached ... One thing to note is that the ps listing is not complete ... there are >6k processes running at the time, and I don't know how to get rid of the '--more--' prompt :( After 1k processes, I just hit 'q' and went onto the other commands ... Also, traceall gave me a 'No such command' error ... now that I think about it, my luck, it was supposed to be 'trace all'? If this doesn't provide enough information, please let me know what else I should do the next time through, besides the above commands ... Oh, and how do you get DDB to 'dump core' in 6.x? Back in 4.x days, I'd just do 'panic' (maybe twice) at the DDB prompt, but that didn't work with 6.x ... it just gave me a stacktrace and then the DDB> prompt both times ... Quick appendum ... the kernel on this server is from June 28th of this year ... Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.orgICQ . 7615664 ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
vm_map.c lock up (Was: Re: NFS Locking Issue)
On Wed, 5 Jul 2006, Robert Watson wrote:

> If you can get into DDB when the hang has occurred, output via serial
> console for the following commands would be very helpful:
>
> show pcpu
> show allpcpu
> ps
> trace
> traceall
> show locks
> show alllocks
> show uma
> show malloc
> show lockedvnods

'k, after 16 days uptime, the server that I got all the debugging turned on
for finally hung up solid ... I was able to break into DDB over the serial
link, and have run all of the above on it ... and the output is attached ...

One thing to note is that the ps listing is not complete ... there are >6k
processes running at the time, and I don't know how to get rid of the
'--more--' prompt :( After 1k processes, I just hit 'q' and went onto the
other commands ...

Also, traceall gave me a 'No such command' error ... now that I think about
it, my luck, it was supposed to be 'trace all'?

If this doesn't provide enough information, please let me know what else I
should do the next time through, besides the above commands ...

Oh, and how do you get DDB to 'dump core' in 6.x? Back in 4.x days, I'd just
do 'panic' (maybe twice) at the DDB prompt, but that didn't work with 6.x ...
it just gave me a stacktrace and then the DDB> prompt both times ...

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED]
Yahoo . yscrappy Skype: hub.org ICQ . 7615664

Attachment: typescript.gz (binary data)
Re: NFS Locking Issue
On Wed, 05 Jul 2006 02:49:26 +0200, Scott Long <[EMAIL PROTECTED]> wrote: Michel Talon wrote: BTW, I noticed yesterday that that IPv6 support committ to rpc.lockd was never backed out. An immediate question for people experiencing new rpc.lockd problems with 6.x should be whether or not backing out that change helps. So it may be relevant to say that i have kernels without IPV6 support. Recall that i have absolutely no problem with the client in FreeBSD-6.1. Tomorrow i will test one of the 6.1 machines as a NFS server and the other as a client, and will make you know if i see something. As to the problems you mention about NFS Linux, yes i have seen a lot since years. But to my surprise FC5 seems to work well. By the way it is kernel 2.6.16 so sufficiently recent for the problems to have been ironed out, presumably. 2.6.16 should be OK. I've heard of problems with cookie and handle sizes with it, but only under highly unusual circumstances. Scott Just for the record. I'm running a 6.1-STABLE client with a Debian 3.1 server with kernel 2.6.12 and that works ok with nfs locking. Locking didn't work in the past (6.0-STABLE). Ronald. -- Ronald Klop Amsterdam, The Netherlands ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: NFS Locking Issue
On Wed, 5 Jul 2006, Francisco Reyes wrote:

> Scott Long writes:
>
>> For what it's worth, I recently spent a lot of time putting FreeBSD 6.1
>> to the test as both an NFS client and server in a mixed OS environment.
>
> I have a few debugging settings/suggestions that have been sent my way and
> I plan to try them tonight, but this is just another report.. FreeBSD only
> environment.
>
> Today after hours going crazy with horrible performance I brought down
> nfsd and brought it back up.. that simple process got vmstat 'b' column
> down and everything was back to normal.
>
> Again this will not help anyone troubleshoot, but just to mention that it
> happens even with a FreeBSD only environment.

'k, to those out there that know what is useful, and what isn't ... If
Francisco had DDB enabled, did a CTL-ALT-ESC when the above happens, and
does a 'panic' to crash the server and dump a core ... can anything useful
be gleaned from that core dump?

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED]
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
Re: NFS Locking Issue
User Freebsd writes: What are others using for ethernet? Of our two machines having the problem 1 has BGE and the other one has EM (Intel). Doesn't seem to make much of a difference. Except for the network cards, these two machines are identical. Same motherboard, same RAID controller, same amount of RAM, same RAID configuration... ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: NFS Locking Issue
On Wed, 5 Jul 2006, Francisco Reyes wrote:

>> can you trigger it using work on just one client against a server,
>> without client<->client interactions? This makes tracking and
>> reproduction a lot easier
>
> Personally I am experiencing two problems.
> 1- NFS clients freeze/hang if the server goes away. We have clients with
> several mounts so if one of the servers dies then the entire operation of
> the client is put in jeopardy. This I can reproduce every single time with
> a 6.X client.. with both a 5.X and a 6.X server. "umount -f" hangs too.

The problems you are experiencing are almost certainly not related to
rpc.lockd, rather, bugs in the NFS client. Let's just look at the normal use
hang for now, and revisit umount -f after that.

>> as multi-client test cases are really tricky!
>
> The second case only happens under heavy load and restarting nfsd makes it
> go away. Basically 'b' column in vmstat goes high and the performance of
> the machine falls to the floor.
>
> Going to try
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
>
> And reading up on how to debug with DDB.
>
> Have another user who volunteered to give me some pointers.. so will try
> that.. so I am able to actually produce more helpful info.

If you can get into DDB when the hang has occurred, output via serial
console for the following commands would be very helpful:

show pcpu
show allpcpu
ps
trace
traceall
show locks
show alllocks
show uma
show malloc
show lockedvnods

Note that the last two will only work if you compile WITNESS in -- WITNESS
significantly changes kernel timing, so you may find it closes whatever race
you're running into. If you can reproduce the problem with WITNESS and
INVARIANTS, that would be very useful. The above output will hopefully tell
us the basic state of the system with respect to processes, threads,
locking, and so on, and may help us track things down. For the above, you
definitely want a serial console as it will be quite a bit of output.

Also, can you send the output of the 'mount' command from the un-hung state?
I notice a lot of threads stuck in 'ufs'.

Finally, during the above, if you could disable background file system
checking by placing the following in /etc/rc.conf:

background_fsck="NO"

And boot to single user mode, doing a full fsck -p before booting up, in
order to make sure the file system is in a good state before beginning.

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: NFS Locking Issue
User Freebsd writes:

> I believe, in Francisco's case, they are willing to pay someone to fix the
> NFS issues they are having, which, i'd assume, means easy access to the
> problematic server(s) to do proper testing in a "real life scenario" ...

Correct. As long as the person is someone "trusted in the community" we
could do that. And yes we are willing to come to some agreement for
compensation for the help.

Needless to say our introduction of new machines will go through a more
rigorous test in the future.. especially when jumping to a new Release
number in FreeBSD. We lost 1 big customer and after today we likely will
lose 2 or 3 more.. of the big ones.. when it's all said and done we are
likely to lose several thousand dollars/month due to these 6.X incidents.

We are fairly new to NFS and that's why we were hoping to get someone to
help us.. or at least point us in the right direction. I plan to go over the
link you sent me and try to prepare at least one machine.

As for paying someone, yes we have been actively looking for someone to help
us since we are relatively new to NFS.. and much newer to troubleshooting
this type of problem.
Re: NFS Locking Issue
Robert Watson writes:

> It's not impossible. It would be interesting to see if ps axl reports
> that rpc.lockd is in the kqread state

Found my post in another thread.

0 354 1 0 96 0 1412 1032 select Ss ?? 0:07.06 /usr/sbin/rpcbind

It was not in kqread state.. and that was from a point where the machine was
totally locked up.. had to do a physical reset.. could not even kill nfsd
that time.

I had also more output from several different ps. You need to do "view more"
to see them all.
http://tinyurl.com/kpejr
Re: NFS Locking Issue
Robert Watson writes:

> It's not impossible. It would be interesting to see if ps axl reports
> that rpc.lockd is in the kqread state, which would suggest it was blocked
> in the resolver.

Just tried "ps axl | grep rpc" on the machine giving us the most grief..
Only got one line back:

root 367 0.0 0.0 1368 960 ?? Ss 25Jun06 0:05.52 /usr/sbin/rpcbin 0 1 0 4 0 select

Is that one of the lines I should keep an eye on the next time the machine
is locked up?
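To make the check Robert asks for less error-prone than eyeballing grep output, here is a minimal sketch that pulls the wait channel out of "ps axl" text. It assumes the output starts with a header row naming the PID, MWCHAN and COMMAND columns and that every column is non-empty; the function name wait_channels and the header row in the sample are my own assumptions, only the rpcbind row comes from this thread.

```python
def wait_channels(ps_output, needle):
    """Map PID -> wait channel for processes whose command contains `needle`.

    Assumes the first non-empty line is a header with PID, MWCHAN and
    COMMAND columns, and that no column in a row is empty.
    """
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    pid_col = header.index("PID")
    wchan_col = header.index("MWCHAN")
    cmd_col = header.index("COMMAND")
    result = {}
    for line in lines[1:]:
        # Limit the split so a command with arguments stays in one field.
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue
        if needle in fields[cmd_col]:
            result[int(fields[pid_col])] = fields[wchan_col]
    return result


# Header row is an assumption; the data row is the rpcbind line from the thread.
SAMPLE = """\
UID PID PPID CPU PRI NI  VSZ  RSS MWCHAN STAT TT    TIME COMMAND
  0 354    1   0  96  0 1412 1032 select Ss   ?? 0:07.06 /usr/sbin/rpcbind
"""

print(wait_channels(SAMPLE, "rpc"))
```

Calling it with "rpc.lockd" or "rpc.statd" as the needle during a hang would show directly whether either daemon is sitting in kqread, provided those daemons appear in the listing at all.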
Re: NFS Locking Issue
Robert Watson writes:

> can you trigger it using work on just one client against a server, without
> client<->client interactions? This makes tracking and reproduction a lot
> easier

Personally I am experiencing two problems.
1- NFS clients freeze/hang if the server goes away. We have clients with
several mounts so if one of the servers dies then the entire operation of
the client is put in jeopardy. This I can reproduce every single time with a
6.X client.. with both a 5.X and a 6.X server. "umount -f" hangs too.

> as multi-client test cases are really tricky!

The second case only happens under heavy load and restarting nfsd makes it
go away. Basically 'b' column in vmstat goes high and the performance of the
machine falls to the floor.

Going to try
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

And reading up on how to debug with DDB.

Have another user who volunteered to give me some pointers.. so will try
that.. so I am able to actually produce more helpful info.
Re: NFS Locking Issue
Scott Long writes: For what it's worth, I recently spent a lot of time putting FreeBSD 6.1 to the test as both an NFS client and server in a mixed OS environment. I have a few debugging settings/suggestions that have been sent my way and I plan to try them tonight, but this is just another report.. FreeBSD only environment. Today after hours going crazy with horrible performance I brought down nfsd and brought it back up.. that simple process got vmstat 'b' column down and everything was back to normal. Again this will not help anyone troubleshoot, but just to mention that it happens even with a FreeBSD only environment. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: NFS Locking Issue
> with the bge driver ... could we be possibly talking internet vs nfs
> issues?

Pursuing investigations, i have discovered that for people having
workstations whose home directories are on a NFS server, and who run Gnome
or KDE, there is a program which has horrible NFS behavior, it is gam_server
from gamin, which detects alterations on your .kde for example. On my
machine running nfsstat -c -w 1 i see 4000 requests/s due to that. If i
displace it (*) and kill it, this drops to 80 requests/s and KDE works
exactly as well, including discovering new files. I think it is not
necessary to comment on the performance penalty if a number of stations send
4000r/s to a server, it will soon be killed.

(*) it restarts itself automatically so it is necessary to displace or
rename it before killing.

--
Michel TALON
Re: NFS Locking Issue
On Wed, 5 Jul 2006, Michel Talon wrote:

>> So it may be relevant to say that i have kernels without IPV6 support.
>> Recall that i have absolutely no problem with the client in FreeBSD-6.1.
>> Tomorrow i will test one of the 6.1 machines as a NFS server and the
>> other as a client, and will make you know if i see something.
>
> Well, i have checked between 2 FreeBSD-6.1-RELEASE machines on the
> network, both have fxp ethernet driver running at 100 Mb/s, one is NFS
> server other NFS client. Both run lockd and statd. I have absolutely no
> problem exchanging files, for example if i begin to copy /usr/src through
> NFS from one machine to the other, which makes a lot of transactions of
> all sorts, i get:
>
> niobe# mount asmodee:/usr/src /mnt
> cp -R /mnt/src .
> ... after some time i interrupt the transfer
> niobe% du -sh .
> 131M    .
>
> and during this time i observe the following type of statistics
>
> asmodee% netstat -w 1 -I fxp0
>             input         (fxp0)           output
>    packets  errs      bytes    packets  errs      bytes colls
>        542     0      84116       1330     0    1219388     0
>        515     0      72806       1290     0    1196330     0
>        501     0      95722       1081     0     741048     0
>        539     0      90704       1090     0    1228052     0
>        645     0      67888        902     0    1451098     0
>        405     0      81264       1609     0     604278     0
>        503     0      74218        709     0     924422     0
>        500     0      98904        973     0     619350     0
>        550     0     100122        855     0     836328     0
>        615     0      79336       1081     0     862772     0
>        577     0      82862        901     0    1005024     0
>
> which looks decent to me.
>
> Doing the same with just one big file no problem either, and i get a
> transfer speed of 6.60 MB/s which is perhaps a little less than with
> linux, but nothing catastrophic. I get 8.20 MB/s for FreeBSD client
> interacting with the Linux server. Now netstat gives
>
>    packets  errs      bytes    packets  errs      bytes colls
>        785     0     123266       4716     0    6825600     0
>        759     0     139898       4530     0    7747276     0
>        852     0     124652       5106     0    6902566     0
>        863     0     128040       5170     0    7081738     0
>        811     0     123760       4862     0    6851498     0
>        789     0     123540       4720     0    6834310     0
>        840     0     115378       5024     0    6382114     0
>
> So up to what i can see NFS works OK for me on FreeBSD-6.1. So the main
> difference with other people cases may be that i have removed IPV6 support
> from kernel.

What are others using for ethernet? In your case, you say you are running
between fxp cards ... I've heard some report, in another thread, problems
with the bge driver ... could we be possibly talking internet vs nfs issues?

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED]
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
Re: NFS Locking Issue
On Mon, 3 Jul 2006, Michael Collette wrote:

> - Let's start with the simplest. The scenario here involves 2 machines,
> mach01 and mach02. Both are running 6-STABLE, and both are running
> rpcbind, rpc.statd, and rpc.lockd. mach01 has exported /documents and
> mach02 is mounting that export under /mnt. Simple enough?
>
> The /documents directory has multiple subdirectories and files of various
> sizes. The actual amount of data doesn't really matter to produce a
> failure. All you need to do at this point is to try to copy files from
> that mount point to somewhere else on the hard drive.
>
> cp -Rp /mnt/* /tmp/documents/
>
> You may, or not, see that a couple of subdirectories were created, but no
> files actually moved over. The cp command is now locked up, and no traffic
> moves. This usually takes a second or two to show up as a problem.
>
> I can repeat this with multiple 6-STABLE boxes. Turn off rpc.lockd on
> either the server or client before the cp command, and things work.

I've tried several times to reproduce this, and have not succeeded in doing
so. In principle, cp should not be using advisory locks. Could you try
running cp under ktrace, and saving the ktrace file somewhere outside of
NFS? Something like the following:

ktrace -f /usr/tmp/localfile cp -Rp /mnt/* /tmp/documents/

If you are able to reproduce the problem with tracing turned on, a copy of
the tracefile would be very helpful. Also, when it locks up, are you able to
kill cp using Ctrl-C, and if you hit Ctrl-T while it appears locked, what
output do you get?

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge
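For anyone trying to turn the cp lock-up into a repeatable test case, a minimal sketch of a watchdog follows. It starts a command, waits a fixed time, and reports whether the command finished or appears hung. The function name run_with_timeout and the timeout values are my own assumptions, not anything from the thread. A hung process is deliberately left running so it can still be inspected (Ctrl-T, ktrace, DDB), since killing a process blocked inside the kernel may not work anyway.

```python
import subprocess


def run_with_timeout(argv, timeout_s):
    """Start argv and wait up to timeout_s seconds.

    Returns (status, proc) where status is 'ok', 'failed' or 'hung'.
    A hung process is NOT killed, so it stays available for inspection.
    """
    proc = subprocess.Popen(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        returncode = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hung", proc
    return ("ok" if returncode == 0 else "failed"), proc


# Example with the copy from the thread (60 s is an arbitrary threshold;
# the shell is needed so that /mnt/* is expanded):
# status, proc = run_with_timeout(
#     ["sh", "-c", "cp -Rp /mnt/* /tmp/documents/"], 60)
# print(status, proc.pid)
```

Wrapping the ktrace invocation suggested above in the same call would record the trace file and the hang verdict in one run.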
Re: NFS Locking Issue
On Wed, 5 Jul 2006, Robert Watson wrote:

> On Wed, 5 Jul 2006, Danny Braniss wrote:
>
>> In my case our main servers are NetApp, and the problems are more related
>> to am-utils running into some race condition (need more time to debug
>> this :-) the other problem is related to throughput, freebsd is slower
>> than linux, and while freebsd/nfs/tcp is faster on Freebsd than udp, on
>> linux it's the same. So it seems some tuning is needed.
>>
>> our main problem now is samba/rpc.lockd, we are stuck with a server
>> running FreeBSD 5.4 which crashes, and we can't upgrade to 6.1 because
>> lockd doesn't work.
>>
>> So, if someone is willing to look into the lockd issue, we would like to
>> help.
>
> The most significant problem working with rpc.lockd is creating easy to
> reproduce test cases. Not least because they can potentially involve
> multiple clients. If you can help to produce simple test cases to
> reproduce the bugs you're seeing, that would be invaluable.
>
> I'm aware of two general classes of problems with rpc.lockd. First,
> architectural issues, some derived from architectural problems in the NLM
> protocol: for example, assumptions that there can be a clean mapping of
> process lock owners to locks, which fall down as locks are properties of
> file descriptors that can be inherited. Second, implementation
> bugs/misfeatures, such as the kernel not knowing how to cancel lock
> requests, so being unable to implement interruptible waits on locks in the
> distributed case.
>
> Reducing complex failure modes to easily reproduced test cases is tricky
> also, though. It requires careful analysis, often with ktrace and
> tcpdump/ethereal to work out what's going on, and not a little luck to
> perform the reduction of a large trace down to a simple test scenario. The
> first step is to try and figure out what, if any, specific workload
> results in a problem. For example, can you trigger it using work on just
> one client against a server, without client<->client interactions? This
> makes tracking and reproduction a lot easier, as multi-client test cases
> are really tricky! Once you've established whether it can be reproduced
> with a single client, you have to track down the behavior that triggers it
> -- normally, this is done by attempting to narrow down the specific
> program or sequence of events that causes the bug to trigger, removing
> things one at a time to see what causes the problem to disappear. This is
> made more difficult as lock managers are sensitive to timing, so removing
> a high load item from the list, even if it isn't the source of the
> problem, might cause it to trigger less frequently.

I'm not sure if this is an option for anyone, either developer or user, but
in the past, on particularly tricky bugs where I seemed to be the only one
to be able to produce it, I've given access to a 'trusted developer' to the
machine itself, to minimize the time lag that emails create ... but, also,
to let the developer at a machine that has the load required to easily
reproduce it ...

Not sure if there is anyone out there, on either side of the proverbial
fence, that feels comfortable doing this, but figured I'd throw the idea
out ...

I believe, in Francisco's case, they are willing to pay someone to fix the
NFS issues they are having, which, i'd assume, means easy access to the
problematic server(s) to do proper testing in a "real life scenario" ...

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED]
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
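Robert's point that locks are properties of file descriptors that can be inherited, so there is no clean mapping from a lock to one owning process, can be seen on a local file. The following is a minimal sketch, assuming flock(2)-style advisory locks on a local filesystem and a Unix system with fork(); the function name child_can_release_parent_lock is hypothetical, and the thread itself does not say which locking call is involved in the rpc.lockd cases.

```python
import fcntl
import os
import tempfile


def child_can_release_parent_lock():
    """Parent takes a flock(), a forked child unlocks it via the inherited fd.

    Returns True if, afterwards, an unrelated descriptor can take the lock
    even though the parent never released it itself.
    """
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDWR)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)      # lock acquired by the parent
            pid = os.fork()
            if pid == 0:
                # Child: same open file, inherited across fork().
                fcntl.flock(fd, fcntl.LOCK_UN)  # released by another process
                os._exit(0)
            os.waitpid(pid, 0)
            # Probe with an independent open of the same file.
            probe = os.open(tmp.name, os.O_RDWR)
            try:
                fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
                return True
            except BlockingIOError:
                return False
            finally:
                os.close(probe)
        finally:
            os.close(fd)


print(child_can_release_parent_lock())
```

A lock manager that records "process X holds this lock" has no good answer when a different process releases it, which is the kind of acquire/release mismatch the NLM mapping struggles with.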
Re: NFS Locking Issue
> So it may be relevant to say that i have kernels without IPV6 support. > Recall that i have absolutely no problem with the client in FreeBSD-6.1. > Tomorrow i will test one of the 6.1 machines as a NFS server and the other as > a client, and will make you know if i see something. Well, i have checked between 2 FreeBSD-6.1-RELEASE machines on the network, both have fxp ethernet driver running at 100 Mb/s, one is NFS server other NFS client. Both run lockd and statd. I have absolutely no problem exchanging files, for example if i begin to copy /usr/src through NFS from one machine to the other, which makes a lot of transactions of all sorts, i get: niobe# mount asmodee:/usr/src /mnt cp -R /mnt/src . ... after some time i interrupt the transfer niobe% du -sh . 131M. and during this time i observe the following type of statistics asmodee% netstat -w 1 -I fxp0 input (fxp0) output packets errs bytespackets errs bytes colls 542 0 84116 1330 01219388 0 515 0 72806 1290 01196330 0 501 0 95722 1081 0 741048 0 539 0 90704 1090 01228052 0 645 0 67888902 01451098 0 405 0 81264 1609 0 604278 0 503 0 74218709 0 924422 0 500 0 98904973 0 619350 0 550 0 100122855 0 836328 0 615 0 79336 1081 0 862772 0 577 0 82862901 01005024 0 which looks decent to me. Doing the same with just one big file no problem either, and i get a transfer speed of 6.60 MB/s which is perhaps a little less than with linux, but nothing catastrophic. I get 8.20 MB/s for FreeBSD client interacting with the Linux server. Now netstat gives packets errs bytespackets errs bytes colls 785 0 123266 4716 06825600 0 759 0 139898 4530 07747276 0 852 0 124652 5106 06902566 0 863 0 128040 5170 07081738 0 811 0 123760 4862 06851498 0 789 0 123540 4720 06834310 0 840 0 115378 5024 06382114 0 So up to what i can see NFS works OK for me on FreeBSD-6.1. So the main difference with other people cases may be that i have removed IPV6 support from kernel. 
-- Michel TALON
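As a rough cross-check of the 6.60 MB/s figure quoted above, the output-side byte counts from the second netstat listing can be averaged. This is only a sketch: the sample values assume that the fused fields in the listing split as "errs 0" followed by the byte count, and the name mean_rate_mib is made up for illustration. Under those assumptions the average comes out near 6.6 MiB/s.

```python
# Output-side bytes per one-second sample, read from the second netstat
# listing above (single large file). ASSUMPTION: the column split is a
# reconstruction, taking the fused leading "0" to be the errs column.
OUTPUT_BYTES_PER_SEC = [
    6825600, 7747276, 6902566, 7081738, 6851498, 6834310, 6382114,
]


def mean_rate_mib(samples):
    """Average a list of bytes-per-second samples and convert to MiB/s."""
    return sum(samples) / len(samples) / (1024 * 1024)


if __name__ == "__main__":
    print(f"{mean_rate_mib(OUTPUT_BYTES_PER_SEC):.2f} MiB/s")
```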
Re: NFS Locking Issue
On Wed, Jul 05, 2006 at 02:04:59PM +0100, Robert Watson wrote:
>
> On Wed, 5 Jul 2006, Kostik Belousov wrote:
>
> >>Also, the both lockd processes now put identification information in the
> >>proctitle (srv and kern). SIGUSR1 shall be sent to srv process.
> >
> >Hmm, after looking at the dump there and some code reading, I have noted
> >the following:
> >
> >1. NLM lock request contains the field caller_name. It is filled by (let
> >call it) kernel rpc.lockd by the results of hostname(3).
> >
> >2. This caller_name is used by server rpc.lockd to send request for host
> >monitoring to rpc.statd (see send_granted). Request is made by clnt_call,
> >that is blocking rpc call.
> >
> >3. rpc.statd does getaddrinfo on caller_name to determine address of the
> >host to monitor.
> >
> >If the getaddrinfo in step 3 waits for resolver, then your client machine
> >will get locking process in "lockd" state.
> >
> >Could people experiencing rpc.lockd mistery at least report whether
> >_server_ machine successfully resolve hostname of clients as reported by
> >hostname? And, if yes, to what family of IP protocols ?
>
> It's not impossible. It would be interesting to see if ps axl reports that
> rpc.lockd is in the kqread state, which would suggest it was blocked in the

rpc.statd :).

> resolver. We probably ought to review rpc.statd and make sure it's
> generally sensible. I've noticed that its notification process on start is
> a bit poorly structured in terms of how it notifies hosts of its state
> change -- if one host is down, it may take a very long time to notify other
> hosts.
Re: NFS Locking Issue
On Wed, 5 Jul 2006, Kostik Belousov wrote:

> > Also, the both lockd processes now put identification information in the
> > proctitle (srv and kern). SIGUSR1 shall be sent to srv process.
>
> Hmm, after looking at the dump there and some code reading, I have noted
> the following:
>
> 1. NLM lock request contains the field caller_name. It is filled by (let
> call it) kernel rpc.lockd by the results of hostname(3).
>
> 2. This caller_name is used by server rpc.lockd to send request for host
> monitoring to rpc.statd (see send_granted). Request is made by clnt_call,
> that is blocking rpc call.
>
> 3. rpc.statd does getaddrinfo on caller_name to determine address of the
> host to monitor.
>
> If the getaddrinfo in step 3 waits for resolver, then your client machine
> will get locking process in "lockd" state.
>
> Could people experiencing rpc.lockd mistery at least report whether
> _server_ machine successfully resolve hostname of clients as reported by
> hostname? And, if yes, to what family of IP protocols ?

It's not impossible. It would be interesting to see if ps axl reports that rpc.lockd is in the kqread state, which would suggest it was blocked in the resolver. We probably ought to review rpc.statd and make sure it's generally sensible. I've noticed that its notification process on start is a bit poorly structured in terms of how it notifies hosts of its state change -- if one host is down, it may take a very long time to notify other hosts.

There are a number of other dubious things about the NLM protocol design (at least, from my reading last night). I've also noticed that our rpc.lockd is particularly sensitive, on the client side, to locks being released by a different process than the process that acquired the lock, which is triggered excessively by our new libpidfile in RELENG_6.

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: NFS Locking Issue
On Wed, Jul 05, 2006 at 02:38:22PM +0300, Kostik Belousov wrote: > On Wed, Jul 05, 2006 at 10:09:24AM +0100, Robert Watson wrote: > > The most significant problem working with rpc.lockd is creating easy to > > reproduce test cases. Not least because they can potentially involve > > multiple clients. If you can help to produce simple test cases to > > reproduce the bugs you're seeing, that would be invaluable. > > > > > > > Reducing complex failure modes to easily reproduced test cases is tricky > > also, though. It requires careful analysis, often with ktrace and > > tcpdump/ethereal to work out what's going on, and not a little luck to > > perform the reduction of a large trace down to a simple test scenario. The > > first step is to try and figure out what, if any, specific workload results > > in a problem. For example, can you trigger it using work on just one > > client against a server, without client<->client interactions? This makes > > tracking and reproduction a lot easier, as multi-client test cases are > > really tricky! Once you've established whether it can be reproduced with a > > single client, you have to track down the behavior that triggers it -- > > normally, this is done by attempting to narrow down the specific program or > > sequence of events that causes the bug to trigger, removing things one at a > > time to see what causes the problem to disappear. This is made more > > difficult as lock managers are sensitive to timing, so removing a high load > > item from the list, even if it isn't the source of the problem, might cause > > it to trigger less frequently. > > I made the patch for rpc.lockd that could somewhat ease obtaining > debug information. Patch is available at > http://people.freebsd.org/~kib/rpc.lockd-debug.patch > > No functional changes. Patch only adds dumping of currently held locks > (as perceived by lockd) on receiving of SIGUSR1. You need to specify > debug level 2 or 3 to obtain the dump. 
>
> Also, the both lockd processes now put identification information
> in the proctitle (srv and kern). SIGUSR1 shall be sent to srv process.

Hmm, after looking at the dump there and some code reading, I have noted the following:

1. The NLM lock request contains the field caller_name. It is filled in by (let's call it) the kernel rpc.lockd from the result of hostname(3).

2. This caller_name is used by the server rpc.lockd to send a host monitoring request to rpc.statd (see send_granted). The request is made with clnt_call, which is a blocking RPC call.

3. rpc.statd does getaddrinfo on caller_name to determine the address of the host to monitor.

If the getaddrinfo in step 3 waits for the resolver, then your client machine will get the locking process stuck in the "lockd" state.

Could people experiencing the rpc.lockd mystery at least report whether the _server_ machine successfully resolves the hostnames of the clients as reported by hostname? And, if yes, to what family of IP protocols?
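The resolution check asked for above can be approximated from the NFS server with a few lines of Python. This is a minimal sketch, not what rpc.statd itself runs: it only calls the same getaddrinfo resolver interface on the names you pass in, reports which address families come back, and shows how long the lookup took. The function name resolved_families is hypothetical.

```python
import socket
import sys
import time


def resolved_families(hostname):
    """Return the sorted address-family names hostname resolves to ([] on failure)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); keep the family.
    return sorted({socket.AddressFamily(info[0]).name for info in infos})


if __name__ == "__main__":
    # Usage: python resolve_check.py client1 client2 ...
    for name in sys.argv[1:]:
        start = time.monotonic()
        families = resolved_families(name)
        elapsed = time.monotonic() - start
        result = ", ".join(families) if families else "does not resolve"
        print(f"{name}: {result} ({elapsed:.2f}s)")
```

Run it on the server with the exact string each client prints for hostname. A lookup that takes many seconds, or that returns only AF_INET6 on an IPv4-only network, would be worth reporting back to the list.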
Re: NFS Locking Issue
On Wed, Jul 05, 2006 at 10:09:24AM +0100, Robert Watson wrote: > The most significant problem working with rpc.lockd is creating easy to > reproduce test cases. Not least because they can potentially involve > multiple clients. If you can help to produce simple test cases to > reproduce the bugs you're seeing, that would be invaluable. > > > Reducing complex failure modes to easily reproduced test cases is tricky > also, though. It requires careful analysis, often with ktrace and > tcpdump/ethereal to work out what's going on, and not a little luck to > perform the reduction of a large trace down to a simple test scenario. The > first step is to try and figure out what, if any, specific workload results > in a problem. For example, can you trigger it using work on just one > client against a server, without client<->client interactions? This makes > tracking and reproduction a lot easier, as multi-client test cases are > really tricky! Once you've established whether it can be reproduced with a > single client, you have to track down the behavior that triggers it -- > normally, this is done by attempting to narrow down the specific program or > sequence of events that causes the bug to trigger, removing things one at a > time to see what causes the problem to disappear. This is made more > difficult as lock managers are sensitive to timing, so removing a high load > item from the list, even if it isn't the source of the problem, might cause > it to trigger less frequently. I made the patch for rpc.lockd that could somewhat ease obtaining debug information. Patch is available at http://people.freebsd.org/~kib/rpc.lockd-debug.patch No functional changes. Patch only adds dumping of currently held locks (as perceived by lockd) on receiving of SIGUSR1. You need to specify debug level 2 or 3 to obtain the dump. Also, the both lockd processes now put identification information in the proctitle (srv and kern). SIGUSR1 shall be sent to srv process. 
Re: NFS Locking Issue
Quoting Michel Talon <[EMAIL PROTECTED]>: So it would appear that you cured the NFS problems inherent with FBSD-6 by replacing FBSD with Fedora Linux. Nice to know that NFSd works in Linux. But won't help those on the FBSD list fix their FBSD-6 boxen. :/ First NFS is designed to make machines of different OSs interact properly. Yes, this is it's purpose. If a FreeBSD server interacts properly with a FreeBSD client, but not other clients, you cannot say that the situation is fine. Indeed. Second i am not the one to chose the NFS server, there are people working in social groups, in the real world. And third, the most important, the OP message seemed to imply that the FreeBSD-6 NFS client was at fault, i pointed out that in my experience my FreeBSD-6.1 client works OK, while the 6.0 doesn't, when interacting with a FC5 server. This is in itself a relevant piece of information for the problem at hand. It may be that the server side is at fault, or some complex interaction between client and server. Of course. I quite agree. Horrible oversight on my part. Anyways some people claimed here that they had no problem with FreeBSD-5 clients and servers. My experience is that i had constant problems between FreeBSD-5 clients and Fedora Core 3 servers. I cannot provide any other data point. I am not particularly sure of the quality of the FC3 or FC5 NFS server implementation, except that the ~ 100 workstations running the similar Fedora distribution work like a charm with their homes NFS mounted on the server. On the other hand a Debian client machine also has severe NFS problems. My only conclusion is that these NFS stories are very tricky. The only moment everything worked fine was when we were running Solaris on the server. Useful knowledge, to be sure. Sorry for my oversight. I should probably refrain from responding when I have too many other things purculating in my mind while at work. This has gotten me in trouble once before on this _same_ list. 
:) Thank you for your thoughtful response. -- Michel TALON ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]" -- panic: kernel trap (ignored) - FreeBSD 5.4-RELEASE-p12 (SMP - 900x2) Tue Mar 7 19:37:23 PST 2006 / pgpHofOVV3K34.pgp Description: PGP Digital Signature
Re: NFS Locking Issue
On Wed, 5 Jul 2006, Danny Braniss wrote: In my case our main servers are NetApp, and the problems are more related to am-utils running into some race condition (need more time to debug this :-) the other problem is related to throughput, freebsd is slower than linux, and while freebsd/nfs/tcp is faster on Freebsd than udp, on linux it's the same. So it seems some tunning is needed. our main problem now is samba/rpc.lockd, we are stuck with a server running FreeBSD 5.4 which crashes, and we can't upgrade to 6.1 because lockd doesn't work. So, if someone is willing to look into the lockd issue, we would like to help. The most significant problem working with rpc.lockd is creating easy to reproduce test cases. Not least because they can potentially involve multiple clients. If you can help to produce simple test cases to reproduce the bugs you're seeing, that would be invaluable. I'm aware of two general classes of problems with rpc.lockd. First, architectural issues, some derived from architectural problems in the NLM protocol: for example, assumptions that there can be a clean mapping of process lock owners to locks, which fall down as locks are properties of file descriptors that can be inheritted. Second, implementation bugs/misfeatures, such as the kernel not knowing how to cancel lock requests, so being unable to implement interruptible waits on locks in the distributed case. Reducing complex failure modes to easily reproduced test cases is tricky also, though. It requires careful analysis, often with ktrace and tcpdump/ethereal to work out what's going on, and not a little luck to perform the reduction of a large trace down to a simple test scenario. The first step is to try and figure out what, if any, specific workload results in a problem. For example, can you trigger it using work on just one client against a server, without client<->client interactions? This makes tracking and reproduction a lot easier, as multi-client test cases are really tricky! 
Once you've established whether it can be reproduced with a single client, you have to track down the behavior that triggers it -- normally, this is done by attempting to narrow down the specific program or sequence of events that causes the bug to trigger, removing things one at a time to see what causes the problem to disappear. This is made more difficult as lock managers are sensitive to timing, so removing a high load item from the list, even if it isn't the source of the problem, might cause it to trigger less frequently.

Robert N M Watson
Computer Laboratory
University of Cambridge
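One possible starting point for the single-client test case described above is a small probe that tries to take and release an exclusive advisory lock on a file under the NFS mount. This is a sketch under assumptions: the function name try_lock and the default path are made up, and it uses only Python's standard fcntl.lockf call. It has not been checked against the failures reported in this thread.

```python
import errno
import fcntl
import os
import sys
import time


def try_lock(path, timeout=10.0, poll=0.2):
    """Try to take and release an exclusive lock on path.

    Returns True if the lock was acquired within timeout seconds,
    False if another holder kept it busy for the whole period.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        deadline = time.monotonic() + timeout
        while True:
            try:
                # Non-blocking request, so a busy lock is reported at once.
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as exc:
                if exc.errno not in (errno.EACCES, errno.EAGAIN):
                    raise
                if time.monotonic() >= deadline:
                    return False
                time.sleep(poll)
                continue
            fcntl.lockf(fd, fcntl.LOCK_UN)
            return True
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical default path; pass a file on the NFS mount instead.
    target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/lockprobe"
    print("lock acquired" if try_lock(target) else "lock busy until timeout")
```

If the process never returns at all and ps shows it in the "lockd" state, that is probably the hang itself rather than a busy lock, since the timeout only covers the case where the lock request comes back as busy. Running the probe alongside tcpdump and rpc.lockd debug logging would give developers a much smaller trace to read than a full cp -Rp run.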
Re: NFS Locking Issue
Mornin' On Tue, Jul 04, 2006 at 09:47:21PM +0100, Robert Watson wrote: > BTW, I noticed yesterday that that IPv6 support committ to rpc.lockd was > never backed out. An immediate question for people experiencing new > rpc.lockd problems with 6.x should be whether or not backing out that > change helps. That could be a good pointer. I also started experiencing some problems at home (I did not investigate further though, but started using local locking and all was fine), while in our prod setup, where lots of machines are running, and many of them use 6-STABLE of not too long ago, I never experienced any problems with NFS. The main difference between both these networks is, that at home I have an IPv6 environment, while at work it's IPv4 only. I barely find time before the weekend to do tests, but if I don't read any postings telling, that this made a difference, I will then start testing at home. Thanx, Oliver -- | Oliver Brandmueller | Offenbacher Str. 1 | Germany D-14197 Berlin | | Fon +49-172-3130856 | Fax +49-172-3145027 | WWW: http://the.addict.de/ | | Ich bin das Internet. Sowahr ich Gott helfe. | | Eine gewerbliche Nutzung aller enthaltenen Adressen ist nicht gestattet! | pgp9BUYZloqfB.pgp Description: PGP signature
Re: NFS Locking Issue
> Michel Talon wrote: > > >>Using Ubuntu as the server I connected a FreeBSD 5.4 and 6-stable box as > >>clients on a 100Mb/s network. The time trial used a dummy 100Meg file > >>transfered from the server to the client. > >> > > > > > > I have similar experiences here. With FreeBSD-6.1 as client (using an Intel > > etherexpress card at 100 Mb/s) and FC5 server i see full wire speed for file > > transfers via NFS. > > > > > >>After the 4th of July I intend to test Ubuntu as a client to a FreeBSD > >>6-STABLE server on a gigabit lan to run similar time trials. I'm > >>looking to confirm what I can only suspect at this point, which is that > >>the NFS server on FreeBSD is mucked up, but the client is okay. > > > > > > I have the same impression. The 6.1-RELEASE client seems to work well. > > Yesterday i have upgraded my 6.0 (*) box to 6.1 and i have not seen a single > > NFS problem after that. Moreover i am using rpc.statd, and rpc.lockd > > and they work OK and are really functional. > > I have the following sysctl which may have an effect on the problem: > > vfs.nfs.access_cache_timeout=5 > > > > So it may well be that it is the FreeBSD NFS server code which has problems. > > > > (*) 6.0-RELEASE client definitively does not work OK for me. > > > > > > For what it's worth, I recently spent a lot of time putting FreeBSD 6.1 > to the test as both an NFS client and server in a mixed OS environment. > By far and away, the biggest problems that I encountered with it were > due to linux NFS bugs. CentOS, FC, and SuSE all created huge problems > under load, and it was impossible to get stable results until I started > using 2.6.12 and higher kernels. > > I have a variety of theories that I wish I had had time to test. I've > seen hints of problems with READDIRPLUS, with FreeBSD's habit of mapping > GETATTR to ACCESS, and with handle sizes. 
But in any case, it's been no > secret that Linux has had very severe NFS problems in the past, and that > the NetApp folks have worked very hard over the last year to fix them in > the most recent Linux kernel releases. The only real fault I give > FreeBSD is rpc.lockd. It's pretty much useless in all but trivial > circumstances. Beyond that, make sure you're using a linux kernel that > is relatively recent. > In my case our main servers are NetApp, and the problems are more related to am-utils running into some race condition (need more time to debug this :-) the other problem is related to throughput, freebsd is slower than linux, and while freebsd/nfs/tcp is faster on Freebsd than udp, on linux it's the same. So it seems some tunning is needed. our main problem now is samba/rpc.lockd, we are stuck with a server running FreeBSD 5.4 which crashes, and we can't upgrade to 6.1 because lockd doesn't work. So, if someone is willing to look into the lockd issue, we would like to help. danny > Scott > > ___ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "[EMAIL PROTECTED]" > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: NFS Locking Issue
Michel Talon wrote: BTW, I noticed yesterday that that IPv6 support committ to rpc.lockd was never backed out. An immediate question for people experiencing new rpc.lockd problems with 6.x should be whether or not backing out that change helps. So it may be relevant to say that i have kernels without IPV6 support. Recall that i have absolutely no problem with the client in FreeBSD-6.1. Tomorrow i will test one of the 6.1 machines as a NFS server and the other as a client, and will make you know if i see something. As to the problems you mention about NFS Linux, yes i have seen a lot since years. But to my surprise FC5 seems to work well. By the way it is kernel 2.6.16 so sufficiently recent for the problems to have been ironed out, presumably. 2.6.16 should be OK. I've heard of problems with cookie and handle sizes with it, but only under highly unusual circumstances. Scott ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: NFS Locking Issue
> BTW, I noticed yesterday that that IPv6 support committ to rpc.lockd was > never > backed out. An immediate question for people experiencing new rpc.lockd > problems with 6.x should be whether or not backing out that change helps. So it may be relevant to say that i have kernels without IPV6 support. Recall that i have absolutely no problem with the client in FreeBSD-6.1. Tomorrow i will test one of the 6.1 machines as a NFS server and the other as a client, and will make you know if i see something. As to the problems you mention about NFS Linux, yes i have seen a lot since years. But to my surprise FC5 seems to work well. By the way it is kernel 2.6.16 so sufficiently recent for the problems to have been ironed out, presumably. -- Michel TALON ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: NFS Locking Issue
On Tue, 4 Jul 2006, Scott Long wrote: For what it's worth, I recently spent a lot of time putting FreeBSD 6.1 to the test as both an NFS client and server in a mixed OS environment. By far and away, the biggest problems that I encountered with it were due to linux NFS bugs. CentOS, FC, and SuSE all created huge problems under load, and it was impossible to get stable results until I started using 2.6.12 and higher kernels. I have a variety of theories that I wish I had had time to test. I've seen hints of problems with READDIRPLUS, with FreeBSD's habit of mapping GETATTR to ACCESS, and with handle sizes. But in any case, it's been no secret that Linux has had very severe NFS problems in the past, and that the NetApp folks have worked very hard over the last year to fix them in the most recent Linux kernel releases. The only real fault I give FreeBSD is rpc.lockd. It's pretty much useless in all but trivial circumstances. Beyond that, make sure you're using a linux kernel that is relatively recent. BTW, I noticed yesterday that that IPv6 support committ to rpc.lockd was never backed out. An immediate question for people experiencing new rpc.lockd problems with 6.x should be whether or not backing out that change helps. I set up a simple local testbed for rpc.lockd this morning and have started running some basic tests. I wasn't able to trivially reproduce rpc.lockd problems reported for cp -r, although I did bump into another bump in the memory mapping of zero-length files following creation in the NFS client, which I've passed on to Mohan. I think what's needed is a wire-level regression suite, though, in order to avoid mixing up our rpc.lockd client code with the tests for rpc.lockd's server. This is something I may be able to start looking at this week, although it's the usual time trade-off: work on getting audit ready for MFC, network stack locking and protocol cleanup/bug fixing, or throw rpc.lockd into the mix as well? 
If we can demonstrate that backing out the IPv6 change clearly helps, we need to figure out why it's causing the problem. A casual read of the change doesn't suggest anything obvious, unfortunately, suggesting something non-obvious :-(. Robert N M Watson Computer Laboratory University of Cambridge ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: NFS Locking Issue
Michel Talon wrote: Using Ubuntu as the server I connected a FreeBSD 5.4 and 6-stable box as clients on a 100Mb/s network. The time trial used a dummy 100Meg file transfered from the server to the client. I have similar experiences here. With FreeBSD-6.1 as client (using an Intel etherexpress card at 100 Mb/s) and FC5 server i see full wire speed for file transfers via NFS. After the 4th of July I intend to test Ubuntu as a client to a FreeBSD 6-STABLE server on a gigabit lan to run similar time trials. I'm looking to confirm what I can only suspect at this point, which is that the NFS server on FreeBSD is mucked up, but the client is okay. I have the same impression. The 6.1-RELEASE client seems to work well. Yesterday i have upgraded my 6.0 (*) box to 6.1 and i have not seen a single NFS problem after that. Moreover i am using rpc.statd, and rpc.lockd and they work OK and are really functional. I have the following sysctl which may have an effect on the problem: vfs.nfs.access_cache_timeout=5 So it may well be that it is the FreeBSD NFS server code which has problems. (*) 6.0-RELEASE client definitively does not work OK for me. For what it's worth, I recently spent a lot of time putting FreeBSD 6.1 to the test as both an NFS client and server in a mixed OS environment. By far and away, the biggest problems that I encountered with it were due to linux NFS bugs. CentOS, FC, and SuSE all created huge problems under load, and it was impossible to get stable results until I started using 2.6.12 and higher kernels. I have a variety of theories that I wish I had had time to test. I've seen hints of problems with READDIRPLUS, with FreeBSD's habit of mapping GETATTR to ACCESS, and with handle sizes. But in any case, it's been no secret that Linux has had very severe NFS problems in the past, and that the NetApp folks have worked very hard over the last year to fix them in the most recent Linux kernel releases. The only real fault I give FreeBSD is rpc.lockd. 
It's pretty much useless in all but trivial circumstances. Beyond that, make sure you're using a linux kernel that is relatively recent. Scott ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: NFS Locking Issue
On Mon, Jul 03, 2006 at 03:40:01PM -0700, Michael Collette wrote: > User Freebsd wrote: > >On Sat, 1 Jul 2006, Francisco Reyes wrote: > > > >>John Hay writes: > >> > >>>I only started to see the lockd problems when upgrading the server side > >>>to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x > >>>and 7-current and the lockd problem only showed up when upgrading the > >>>server from 5.x to 6.x. > >> > >>It confirms the same we are experiencing.. constant freezing/locking > >>issues. > >>I guess no more 6.X for us.. for the foreseable future.. > > > >Since there are several of us experiencing what looks to be the same > >sort of deadlock issue, I beseech you not to give up > > Honestly trying not to. To tell ya the truth, I've been giving a real > hard look at Ubuntu for my serving needs. This NFS thing has got me > seriously questioning FreeBSD right at the moment. > > >... right now, all > >we've been able to get to the developers is virtually useless > >information (vmstat and such shows the problem, but it doesn't allow > >developers to identify the problem) ... > > > >Is this a problem that you can easily recreate, even on a non-production > >machine? > > Oh yeah. I've got a couple of ways I'm able to get this to fail. > > Method #1: > - > Let's start with the simplest. The scenario here involves 2 machines, > mach01 and mach02. Both are running 6-STABLE, and both are running > rpcbind, rpc.statd, and rpc.lockd. mach01 has exported /documents and > mach02 is mounting that export under /mnt. Simple enough? > > The /documents directory has multiple subdirectories and files of > various sizes. The actual amount of data doesn't really matter to > produce a failure. All you need to do at this point is to try to copy > files from that mount point to somewhere else on the hard drive. > > cp -Rp /mnt/* /tmp/documents/ > > You may, or not, see that a couple of subdirectories were created, but > no files actually moved over. 
> The cp command is now locked up, and no
> traffic moves. This usually takes a second or two to show up as a
> problem. I can repeat this with multiple 6-STABLE boxes.
>
> Turn off rpc.lockd on either the server or client before the cp command,
> and things work.

Either way you specified is too vague to reproduce the problem. As was said, you should supply a tcpdump of the failed NFS session.

Personally, I tried to do what you described as method 1, and got no hangs; everything copied as it should. I did it between amd64 6.1-STABLE as of yesterday (client) and the same STABLE on i386 as server. Monitoring the lockd interaction with ethereal also did not reveal anything.

So, what you need to provide to help debug the issue:
1. as detailed information on the problem machines' configuration as possible
2. the exact version of the software you are using
3. a tcpdump of the NFS sessions (for me, it is preferable to get a raw tcpdump that can be loaded into ethereal)
4. the log of rpc.lockd on both client and server (see the -d option in the man page).

The issue seems to be highly specific to some configuration details. And, for instance, I am unable to reproduce it on my debug testbench. Without help from the users experiencing the trouble, it could take forever to kill that bug.
Re: NFS Locking Issue
On Mon, 3 Jul 2006, Michael Collette wrote: http://www.freebsd.org/cgi/query-pr.cgi?pr=80389 If you locally back out the referenced change lock_proc.c:1.18 in rpc.lockd on the server, do things improve? Robert N M Watson Computer Laboratory University of Cambridge ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: NFS Locking Issue
> Using Ubuntu as the server I connected a FreeBSD 5.4 and 6-stable box as > clients on a 100Mb/s network. The time trial used a dummy 100Meg file > transfered from the server to the client. > I have similar experiences here. With FreeBSD-6.1 as client (using an Intel etherexpress card at 100 Mb/s) and FC5 server i see full wire speed for file transfers via NFS. > After the 4th of July I intend to test Ubuntu as a client to a FreeBSD > 6-STABLE server on a gigabit lan to run similar time trials. I'm > looking to confirm what I can only suspect at this point, which is that > the NFS server on FreeBSD is mucked up, but the client is okay. I have the same impression. The 6.1-RELEASE client seems to work well. Yesterday i have upgraded my 6.0 (*) box to 6.1 and i have not seen a single NFS problem after that. Moreover i am using rpc.statd, and rpc.lockd and they work OK and are really functional. I have the following sysctl which may have an effect on the problem: vfs.nfs.access_cache_timeout=5 So it may well be that it is the FreeBSD NFS server code which has problems. (*) 6.0-RELEASE client definitively does not work OK for me. -- Michel TALON ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: NFS Locking Issue
User Freebsd wrote:
> On Sat, 1 Jul 2006, Francisco Reyes wrote:
>> John Hay writes:
>>> I only started to see the lockd problems when upgrading the server side to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x and 7-current and the lockd problem only showed up when upgrading the server from 5.x to 6.x.
>>
>> It confirms the same we are experiencing.. constant freezing/locking issues. I guess no more 6.X for us.. for the foreseeable future..
>
> Since there are several of us experiencing what looks to be the same sort of deadlock issue, I beseech you not to give up

Honestly trying not to. To tell ya the truth, I've been giving a real hard look at Ubuntu for my serving needs. This NFS thing has got me seriously questioning FreeBSD right at the moment.

> ... right now, all we've been able to get to the developers is virtually useless information (vmstat and such shows the problem, but it doesn't allow developers to identify the problem) ... Is this a problem that you can easily recreate, even on a non-production machine?

Oh yeah. I've got a couple of ways I'm able to get this to fail.

Method #1:
----------
Let's start with the simplest. The scenario here involves 2 machines, mach01 and mach02. Both are running 6-STABLE, and both are running rpcbind, rpc.statd, and rpc.lockd. mach01 has exported /documents and mach02 is mounting that export under /mnt. Simple enough?

The /documents directory has multiple subdirectories and files of various sizes. The actual amount of data doesn't really matter to produce a failure. All you need to do at this point is to try to copy files from that mount point to somewhere else on the hard drive.

cp -Rp /mnt/* /tmp/documents/

You may, or may not, see that a couple of subdirectories were created, but no files actually moved over. The cp command is now locked up, and no traffic moves. This usually takes a second or two to show up as a problem.

I can repeat this with multiple 6-STABLE boxes. Turn off rpc.lockd on either the server or client before the cp command, and things work.

Method #2:
----------
Booting to a diskless workstation. The server (mach01) has exported /usr, /usr/local, /usr/X11R6 and enough other stuff to get a diskless workstation up and running. Not going to get into all the details here other than to say that I have a fully functioning setup like this on 5.4 boxes now.

I've knocked the boot up of the diskless client (mach02) down to console only. Once at the console I startx with a regular user, taking me into twm. From there I try to launch a KDE application, which in my test case is kwrite. The same situation is true with launching a GTK app, such as Gimp.

X and twm start up. I've got all the rest of the system reasonably functional. When I try to run kwrite, none of the KDE subsystems start up. kwrite just sits there in a lockd state. Same is true of Gimp.

If I shut down rpc.lockd on either machine I'm able to bring up a full KDE desktop, with all applications able to run.

Other Testing:
--------------
At one point we had in our test network a 6.1 NFS server providing files to 5.4 diskless clients without any problems. We first got to noticing the bulk of the glitches when I moved the diskless setup to use a 6.1 kernel.

As I said, I've been looking at Linux alternatives. Especially after reading about Michel Talon's experiences with Fedora. I initially tried CentOS, but wasn't able to get NFS working properly on that thing. I had an Ubuntu CD handy, so I installed it on a test box. Wow, does that NFS server boogie!

Using Ubuntu as the server I connected a FreeBSD 5.4 and 6-stable box as clients on a 100Mb/s network. The time trial used a dummy 100Meg file transferred from the server to the client. We measured 90Mb/s transfer, which was FAR faster than I had ever been able to get 2 FreeBSD boxes to perform doing similar tests.

I then used Ubuntu to connect to a 5.4 server we have in production. I don't recall the exact stats, but it was close to 10x slower. No lockups here though.

After the 4th of July I intend to test Ubuntu as a client to a FreeBSD 6-STABLE server on a gigabit lan to run similar time trials. I'm looking to confirm what I can only suspect at this point, which is that the NFS server on FreeBSD is mucked up, but the client is okay.

As time allows I hope to run similar tests between two Ubuntu boxes, then run it all again with Fedora. Seriously debating whether to move some or all of our infrastructure to Linux after all this. A 3-4 month old known bug like this gives me a great deal of concern about FreeBSD. That, and Ubuntu's NFS server speed just about knocked me over!

> In my case, I have one machine fully configured for debugging, but, of course,
Re: NFS Locking Issue
Garance A Drosihn wrote:
> At 9:13 PM -0400 7/1/06, Francisco Reyes wrote:
>> John Hay writes:
>>> I only started to see the lockd problems when upgrading the server side to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x and 7-current and the lockd problem only showed up when upgrading the server from 5.x to 6.x.
>>
>> It confirms the same we are experiencing.. constant freezing/locking issues. I guess no more 6.X for us.. for the foreseeable future..
>
> I don't know if this will be of any help to anyone, but...
>
> I recently moved a network-based service from a 4.x machine to a 6.x machine. Despite some testing in advance of the switch, many people had problems with the service. I booted to a somewhat out-of-date snapshot of 5.x on the same box. I still had problems, but it didn't seem as bad, so I stuck with the 5.x system.
>
> Some problems turned out to be bugs in the service itself, and were eventually found and fixed. However, one set of problems on that out-of-date snapshot of 5.x was solved by adding:
>
> net.inet.tcp.rfc1323=0
>
> to /etc/sysctl.conf. The guy who suggested that said it avoided a bug which was fixed in later versions of either 5.x or 6.x, I forget which. Of interest is that the bug was such that some people connecting to the service were never bothered by the bug, while other people could not use the service at all until I turned off tcp.rfc1323.
>
> I have a test version of the same service running on a different FreeBSD/i386 box, and that box is now updated to freebsd-stable as of June 10th. Lo and behold, someone connecting to that test box reported some problems. So I typed in 'sysctl net.inet.tcp.rfc1323=0', and his problem immediately disappeared.
>
> So, it might be that there is still some problem with the rfc1323 processing, or that the bug which had been fixed has somehow been re-introduced. In any case, people who are experiencing problems with NFS might want to try that, and see if it makes any difference.
>
> It does strike me as odd that some people are having a *lot* of trouble with NFS under 6.x, while others seem to be okay with it. Perhaps the difference is the network topology between the NFS server and the NFS clients.
>
> Obviously, this is nothing but a guess on my part. I am not a networking guru!

Thanks for the try Garance, but in my setup it didn't make any difference. I'll get into a bit more detail about my setup in another post.

Later on,
--
Michael Collette
IT Manager
TestEquity Inc
[EMAIL PROTECTED]
Re: NFS Locking Issue
At 9:13 PM -0400 7/1/06, Francisco Reyes wrote:
> John Hay writes:
>> I only started to see the lockd problems when upgrading the server side to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x and 7-current and the lockd problem only showed up when upgrading the server from 5.x to 6.x.
>
> It confirms the same we are experiencing.. constant freezing/locking issues. I guess no more 6.X for us.. for the foreseeable future..

I don't know if this will be of any help to anyone, but...

I recently moved a network-based service from a 4.x machine to a 6.x machine. Despite some testing in advance of the switch, many people had problems with the service. I booted to a somewhat out-of-date snapshot of 5.x on the same box. I still had problems, but it didn't seem as bad, so I stuck with the 5.x system.

Some problems turned out to be bugs in the service itself, and were eventually found and fixed. However, one set of problems on that out-of-date snapshot of 5.x was solved by adding:

net.inet.tcp.rfc1323=0

to /etc/sysctl.conf. The guy who suggested that said it avoided a bug which was fixed in later versions of either 5.x or 6.x, I forget which. Of interest is that the bug was such that some people connecting to the service were never bothered by the bug, while other people could not use the service at all until I turned off tcp.rfc1323.

I have a test version of the same service running on a different FreeBSD/i386 box, and that box is now updated to freebsd-stable as of June 10th. Lo and behold, someone connecting to that test box reported some problems. So I typed in 'sysctl net.inet.tcp.rfc1323=0', and his problem immediately disappeared.

So, it might be that there is still some problem with the rfc1323 processing, or that the bug which had been fixed has somehow been re-introduced. In any case, people who are experiencing problems with NFS might want to try that, and see if it makes any difference.

It does strike me as odd that some people are having a *lot* of trouble with NFS under 6.x, while others seem to be okay with it. Perhaps the difference is the network topology between the NFS server and the NFS clients.

Obviously, this is nothing but a guess on my part. I am not a networking guru!

--
Garance Alistair Drosehn            =   [EMAIL PROTECTED]
Senior Systems Programmer           or  [EMAIL PROTECTED]
Rensselaer Polytechnic Institute    or  [EMAIL PROTECTED]
Re: NFS Locking Issue
Michel Talon wrote:
[ ...a long email snipped... ]
> My only conclusion is that these NFS stories are very tricky. The only moment everything worked fine was when we were running Solaris on the server.

I can't speak to the earlier part about NFS with Linux, but at least I very much agree with your conclusion: Solaris makes one of the best NFS servers available, over a broad range of use cases.

However, I also wish to note that if you want to use NFS and you need remote locking to work, your best hope is when the software you use is willing to use explicit lockfiles rather than depending on rpc.lockd to provide remote flock()/lockf()-style locking.

There is plenty of software out there which includes locking tests (sendmail does, UWash IMAP does, Perl does, etc), and my observation has been that actually using NFS-based remote locking under anything beyond trivial load tends to make rpc.lockd terminate within seconds (maybe with a core dump, if you get lucky), or end up with processes getting stuck forever waiting on locks that don't ever return because they've been lost somewhere in limbo.

YMMV. :-)

--
-Chuck
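[Editor's sketch, not part of Chuck's message.] The "explicit lockfile" approach mentioned above can be illustrated roughly as follows. This is a minimal sketch of the classic hard-link trick, which relies only on ordinary file operations and never talks to rpc.lockd. The function names, the lock path, and the temp-file prefix are hypothetical, and the code assumes the filesystem supports hard links.

```python
import os
import socket
import tempfile


def acquire_lockfile(lock_path):
    """Try to take an explicit lock file; return True if we now hold it."""
    directory = os.path.dirname(lock_path) or "."
    # Unique scratch file in the same directory (hence same filesystem).
    fd, tmp_path = tempfile.mkstemp(prefix=".lock-", dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            # Record the owner so a stale lock can be investigated by hand.
            handle.write("%s:%d\n" % (socket.gethostname(), os.getpid()))
        try:
            # link() either creates lock_path or fails if it already exists.
            os.link(tmp_path, lock_path)
        except OSError:
            pass  # The reply can be lost over NFS; decide from the link count.
        # Two links to the scratch file means our link() really succeeded.
        return os.stat(tmp_path).st_nlink == 2
    finally:
        os.unlink(tmp_path)


def release_lockfile(lock_path):
    """Drop the lock by removing the lock file."""
    os.unlink(lock_path)
```

A caller would wrap its critical section between acquire_lockfile("/mnt/mail/inbox.lock") and release_lockfile(...), retrying with a delay while the first call returns False. The trade-off is that a crashed holder leaves a stale lock file behind, so real software usually adds a timeout or an owner check.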
Re: NFS Locking Issue
On Mon, 3 Jul 2006, Francisco Reyes wrote:

> Kostik Belousov writes:
>> I think that then 6.2 and 6.3 is not for you either. Problems cannot be fixed until enough information is given.
>
> I am trying.. but so far only other users who are having the same problem are commenting on this and other similar threads.
>
> We just need some guidance.. Mark gave me a URL to turn on debugging and volunteered to give me some pointers.. I will try, but I will likely try on my own time, on my own machines.. I can not tell the owner of the company I work for to let me "try".. or "play around" in production machines.. as we lose customers because of current problems with the 6.X line.
>
>> Since nobody except you experience that problems (at least, only you notified about the problem existence)
>
> Did you miss the part of:
>
>> User Freebsd writes:
>>> Since there are several of us experiencing what looks to be the same sort of deadlock issue, I beseech you not to give up
>
> I am not the only one reporting or having the issue.

Careful here, I think this is where things are getting confused ... the above is related to the deadlock (high vmstat blockd issue), not the NFS issue ... we're getting two different issues confused :)

>> improved handling of signals in nfs client. If you could test it, that would be useful.
>
> Does it matter if the OS is i386 or amd64? Have an amd64 machine I can more easily play with... with no risk to production.

Does the amd64 machine exhibit the same problem?

Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]                              MSN . [EMAIL PROTECTED]
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
Re: NFS Locking Issue
> So it would appear that you cured the NFS problems inherent with FBSD-6
> by replacing FBSD with Fedora Linux. Nice to know that NFSd works in Linux.
> But won't help those on the FBSD list fix their FBSD-6 boxen. :/

First, NFS is designed to make machines of different OSs interact properly. If a FreeBSD server interacts properly with a FreeBSD client, but not other clients, you cannot say that the situation is fine. Second, I am not the one to choose the NFS server; there are people working in social groups, in the real world. And third, the most important, the OP message seemed to imply that the FreeBSD-6 NFS client was at fault. I pointed out that in my experience my FreeBSD-6.1 client works OK, while the 6.0 doesn't, when interacting with a FC5 server. This is in itself a relevant piece of information for the problem at hand. It may be that the server side is at fault, or some complex interaction between client and server.

Anyway, some people claimed here that they had no problem with FreeBSD-5 clients and servers. My experience is that I had constant problems between FreeBSD-5 clients and Fedora Core 3 servers. I cannot provide any other data point. I am not particularly sure of the quality of the FC3 or FC5 NFS server implementation, except that the ~ 100 workstations running the similar Fedora distribution work like a charm with their homes NFS mounted on the server. On the other hand a Debian client machine also has severe NFS problems. My only conclusion is that these NFS stories are very tricky. The only moment everything worked fine was when we were running Solaris on the server.

--
Michel TALON
Re: NFS Locking Issue
On Mon, Jul 03, 2006 at 10:06:52AM +0100, Robert Watson wrote:
> It sounds like there is also an NFS client race condition or other bug of
> some sort.

It may not be related, directly, but one thing that I noticed, while trying to sort out my own recently commissioned NFS setup, is that the -r1024 mount flag is *crucial* when the network is 100BaseT and the server is a new, fast amd64 box, and the client is an old P3-500 with a RealTek ethernet card. It works fine, now, but tcpdump showed that it was retrying forever without. Even NFS over TCP seemed to suffer a bunch of error-related retries which amounted to stalls in the client.

Is there any way for this sort of thing to be adjusted automatically?

Cheers,

--
Andrew
Re: NFS Locking Issue
On Mon, Jul 03, 2006 at 10:06:52AM +0100, Robert Watson wrote:
>
> On Mon, 3 Jul 2006, Kostik Belousov wrote:
>
>> On Mon, Jul 03, 2006 at 12:50:11AM -0400, Francisco Reyes wrote:
>>> Kostik Belousov writes:
>>>> Since nobody except you experience that problems (at least, only you notified
>>>> about the problem existence)
>>>
>>> Did you miss the part of:
>>>
>>>> User Freebsd writes:
>>>>> Since there are several of us experiencing what looks to be the same sort
>>>>> of deadlock issue, I beseech you not to give up
>>>
>>> I am not the only one reporting or having the issue.
>> I think you have different issues.
>
> I agree. It looks like we have several issues floating around. There are
> some known issues with rpc.lockd (and probably some unknown ones) that will
> require a concerted effort to resolve. There appear to be a number of
> reports relating to this/these problems.
>
> It sounds like there is also an NFS client race condition or other bug of
> some sort.
>
> I think it would be really useful to isolate the two during debugging.
> Specifically, to make sure that the second client bug is reproducible
> without rpc.lockd running on the client (and related mount flags). Once we
> have some more information, such as vnode locking information, client
> thread stack traces, etc, we should probably get Mohan in the loop if
> things seem sticky. I believe he was on vacation last week; he may be back
> this week sometime. With the July 4 weekend afoot, a lot of .us developers
> are offline.

I too noted some time ago that an unresponsive nfs server takes the nfs client down. I then looked at the issue, and have the impression that this is again a case of runningbufspace depletion. I got a lot of processes in wdrain and flswai states. After the nfs server was repaired, active write requests were executed, the number of dirty buffers decreased, and the system returned to normal operation.

This seems to be an architectural issue. I tried to bring the discussion up several months ago, but got no response.

And, there is the small problem of SIGINT being ignored when mounted with the intr flag. A patch to fix this is attached in my previous mail.
Re: NFS Locking Issue
On Mon, 3 Jul 2006, Kostik Belousov wrote:

> On Mon, Jul 03, 2006 at 12:50:11AM -0400, Francisco Reyes wrote:
>> Kostik Belousov writes:
>>> Since nobody except you experience that problems (at least, only you notified about the problem existence)
>>
>> Did you miss the part of:
>>
>>> User Freebsd writes:
>>>> Since there are several of us experiencing what looks to be the same sort of deadlock issue, I beseech you not to give up
>>
>> I am not the only one reporting or having the issue.
> I think you have different issues.

I agree. It looks like we have several issues floating around. There are some known issues with rpc.lockd (and probably some unknown ones) that will require a concerted effort to resolve. There appear to be a number of reports relating to this/these problems.

It sounds like there is also an NFS client race condition or other bug of some sort.

I think it would be really useful to isolate the two during debugging. Specifically, to make sure that the second client bug is reproducible without rpc.lockd running on the client (and related mount flags). Once we have some more information, such as vnode locking information, client thread stack traces, etc, we should probably get Mohan in the loop if things seem sticky. I believe he was on vacation last week; he may be back this week sometime. With the July 4 weekend afoot, a lot of .us developers are offline.

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: NFS Locking Issue
Quoting Michel Talon <[EMAIL PROTECTED]>:

>> I guess I'm still just a bit stunned that a bug this obvious not only found its way into the STABLE branch, but is still there. Maybe it's not as obvious as I think, or not many folks are using it? All I know for sure here is that if I had upgraded to 6.1 my network would have been crippled.
>
> Strange, since i upgraded to FreeBSD-6.1 and the NFS server to Fedora Core 5, my machine, NFS client is happy, and lockd works. It is first time since years i have no problem. It certainly did not work with FreeBSD-5 and i still have a machine with FreeBSD-6.0 which does not work properly (frequently loses the NFS mount, but it gets remounted some times later by amd). Anyways i have exactly 0 problem with the 6.1 machine. I could extend that to say that everything works very well on that machine, nothing is slow, including disk access. This has not always been the case. Stability wise, i have not seen any panic, hang or whatever since i have compiled a kernel adapted to my hardware. I got a panic with the generic kernel soon after installation, but now machine is totally stable.

So it would appear that you cured the NFS problems inherent with FBSD-6 by replacing FBSD with Fedora Linux. Nice to know that NFSd works in Linux. But won't help those on the FBSD list fix their FBSD-6 boxen. :/

> --
> Michel TALON

--
panic: kernel trap (ignored)

- FreeBSD 5.4-RELEASE-p12 (SMP - 900x2) Tue Mar 7 19:37:23 PST 2006 /
Re: NFS Locking Issue
On Mon, Jul 03, 2006 at 12:50:11AM -0400, Francisco Reyes wrote:
> Kostik Belousov writes:
>> Since nobody except you experience that problems (at least, only you notified
>> about the problem existence)
>
> Did you miss the part of:
>
>> User Freebsd writes:
>>> Since there are several of us experiencing what looks to be the same sort
>>> of deadlock issue, I beseech you not to give up
>
> I am not the only one reporting or having the issue.

I think you have different issues.

>> Is this for intr mounts?
>
> "intr" ?

A mount option that allows an nfs operation to be interrupted by a signal. See mount_nfs(8). BTW, I had the impression that this feature not working was one of your problems.

>> improved handling of signals in nfs client. If you could test it, that
>> would be useful.
>
> Does it matter if the OS is i386 or amd64?
> Have an amd64 machine I can more easily play with... with no risk to
> production.

No, this should be applicable to any arch. Except that the patches are several months old, and were developed against CURRENT. But I think that they are applicable to STABLE.
Re: NFS Locking Issue
Kostik Belousov writes:

> I think that then 6.2 and 6.3 is not for you either. Problems cannot be fixed until enough information is given.

I am trying.. but so far only other users who are having the same problem are commenting on this and other similar threads.

We just need some guidance.. Mark gave me a URL to turn on debugging and volunteered to give me some pointers.. I will try, but I will likely try on my own time, on my own machines.. I can not tell the owner of the company I work for to let me "try".. or "play around" in production machines.. as we lose customers because of current problems with the 6.X line.

> Since nobody except you experience that problems (at least, only you notified about the problem existence)

Did you miss the part of:

> User Freebsd writes:
>> Since there are several of us experiencing what looks to be the same sort of deadlock issue, I beseech you not to give up

I am not the only one reporting or having the issue.

> Is this for intr mounts?

"intr" ?

> improved handling of signals in nfs client. If you could test it, that would be useful.

Does it matter if the OS is i386 or amd64? Have an amd64 machine I can more easily play with... with no risk to production.
Re: NFS Locking Issue
On Sun, Jul 02, 2006 at 05:49:44PM -0400, Francisco Reyes wrote:
> User Freebsd writes:
>
>> Since there are several of us experiencing what looks to be the same sort
>> of deadlock issue, I beseech you not to give up
>
> I will try to setup the environment, but to be honest no more 6.X for us
> until 6.2 or 6.3.. We have lost clients already.

I think that then 6.2 and 6.3 is not for you either. Problems cannot be fixed until enough information is given. Since nobody except you experience that problems (at least, only you notified about the problem existence), no bug reports with sufficient information is given.

>> Is this a problem that you can easily recreate
>
> There is one thing I can easily recreate that would very helpfull to solve.
> The 6.X NFS clients freeze if the NFS server goes away.
>
> I have been able to reproduce that every single time.. both in test and
> production.

Is this for intr mounts ? I posted some time ago the patches that improved handling of signals in nfs client. If you could test it, that would be useful.

? sys/nfsclient/.arch-ids
Index: sys/nfsclient/nfs_socket.c
===================================================================
RCS file: /usr/local/arch/ncvs/src/sys/nfsclient/nfs_socket.c,v
retrieving revision 1.141
diff -u -r1.141 nfs_socket.c
--- sys/nfsclient/nfs_socket.c	23 May 2006 18:33:58 -0000	1.141
+++ sys/nfsclient/nfs_socket.c	3 Jul 2006 04:19:23 -0000
@@ -1701,11 +1701,13 @@
 	p = td->td_proc;
 	PROC_LOCK(p);
 	tmpset = p->p_siglist;
+	SIGSETOR(tmpset, td->td_siglist);
 	SIGSETNAND(tmpset, td->td_sigmask);
 	mtx_lock(&p->p_sigacts->ps_mtx);
 	SIGSETNAND(tmpset, p->p_sigacts->ps_sigignore);
 	mtx_unlock(&p->p_sigacts->ps_mtx);
-	if (SIGNOTEMPTY(p->p_siglist) && nfs_sig_pending(tmpset)) {
+	if ((SIGNOTEMPTY(p->p_siglist) || SIGNOTEMPTY(td->td_siglist))
+	    && nfs_sig_pending(tmpset)) {
 		PROC_UNLOCK(p);
 		return (EINTR);
 	}
Index: sys/nfsclient/nfs_vnops.c
===================================================================
RCS file: /usr/local/arch/ncvs/src/sys/nfsclient/nfs_vnops.c,v
retrieving revision 1.266
diff -u -r1.266 nfs_vnops.c
--- sys/nfsclient/nfs_vnops.c	19 May 2006 00:04:24 -0000	1.266
+++ sys/nfsclient/nfs_vnops.c	3 Jul 2006 04:19:24 -0000
@@ -2716,7 +2716,7 @@
 	 * otherwise just do it ourselves.
 	 */
 	if ((bp->b_flags & B_ASYNC) == 0 ||
-	    nfs_asyncio(VFSTONFS(ap->a_vp->v_mount), bp, NOCRED, td))
+	    nfs_asyncio(VFSTONFS(ap->a_vp->v_mount), bp, NOCRED, curthread))
 		(void)nfs_doio(ap->a_vp, bp, cr, td);
 	return (0);
 }
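[Editor's sketch, not part of Kostik's message.] The nfs_socket.c hunk above changes which pending signals are considered before an "intr" mount returns EINTR. The following is a rough user-space model of that set arithmetic, not kernel code. The function name and the `patched` switch are hypothetical, and the list of interrupting signals is an assumption about what nfs_sig_pending() checks.

```python
import signal

# Assumption: the signals the NFS client lets interrupt an "intr" mount.
NFS_INTERRUPT_SIGNALS = {
    signal.SIGINT, signal.SIGTERM, signal.SIGHUP,
    signal.SIGKILL, signal.SIGSTOP, signal.SIGQUIT,
}


def interrupt_pending(proc_pending, thread_pending, thread_mask, ignored,
                      patched=True):
    """Model of the check: should the NFS operation return EINTR?"""
    pending = set(proc_pending)             # tmpset = p->p_siglist
    if patched:
        pending |= set(thread_pending)      # SIGSETOR(tmpset, td->td_siglist)
    pending -= set(thread_mask)             # SIGSETNAND(tmpset, td->td_sigmask)
    pending -= set(ignored)                 # SIGSETNAND(tmpset, ps_sigignore)
    # Before the patch only the process-wide queue was tested for emptiness.
    any_queued = bool(proc_pending) or (patched and bool(thread_pending))
    return any_queued and bool(pending & NFS_INTERRUPT_SIGNALS)
```

In this model a SIGINT queued only on the thread is missed when patched=False and seen when patched=True, which matches the symptom described in the thread (SIGINT ignored on intr mounts).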
Re: NFS Locking Issue
User Freebsd writes:

> Since there are several of us experiencing what looks to be the same sort of deadlock issue, I beseech you not to give up

I will try to set up the environment, but to be honest no more 6.X for us until 6.2 or 6.3.. We have lost clients already.

> Is this a problem that you can easily recreate

There is one thing I can easily recreate that would be very helpful to solve. The 6.X NFS clients freeze if the NFS server goes away.

I have been able to reproduce that every single time.. both in test and production.

> machine? In my case, I have one machine fully configured for debugging,

Although solving both, server and client, would be great for us, if we could at least solve the client.. it would be very helpful.. until our next server comes.. on which we are going to install 5.5

> information to the developers to debug this, the faster it will get fixed

Agree.. but with 4+ crashes in less than a week it has reached the point where we have moved workload away from the most problematic machine.. to try to alleviate the problem.. but it still was not enough.. to prevent at least one big customer of ours from going.. We don't keep tight track of the smaller ones. :-)

> different than your auto-mechanic ... try telling him there is a 'knocking under the hood, please tell me how to fix it, but you can't have my car', and he'll brush you off ... give him 30 minutes under the hood, and not only will he have identified it, but he'll probably fix it too ...

The problem is when you are a taxi driver... and it costs you money to have the car off the streets.. and you don't know when the 'knocking' will occur... :-)

Will set up my laptop with the debug settings and will then work on trying to debug the client problem... depending on how that goes will then possibly try the server that is giving us problems.
Re: NFS Locking Issue
On Sat, 1 Jul 2006, Francisco Reyes wrote:

> John Hay writes:
>> I only started to see the lockd problems when upgrading the server side to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x and 7-current and the lockd problem only showed up when upgrading the server from 5.x to 6.x.
>
> It confirms the same we are experiencing.. constant freezing/locking issues. I guess no more 6.X for us.. for the foreseeable future..

Since there are several of us experiencing what looks to be the same sort of deadlock issue, I beseech you not to give up ... right now, all we've been able to get to the developers is virtually useless information (vmstat and such shows the problem, but it doesn't allow developers to identify the problem) ...

Is this a problem that you can easily recreate, even on a non-production machine? In my case, I have one machine fully configured for debugging, but, of course, since re-configuring it, it hasn't exhibited the problem ... if most of us get our machines configured properly to give useful information to the developers to debug this, the faster it will get fixed ...

My experience with most of the developers is that if you can get into DDB and give them 'internal traces' of the code, bugs tend to get fixed very quickly ... vmstat/ps give "external views", more summaries than anything ... it's the details "under the hood" that they need ... it's not much different than your auto-mechanic ... try telling him there is a 'knocking under the hood, please tell me how to fix it, but you can't have my car', and he'll brush you off ... give him 30 minutes under the hood, and not only will he have identified it, but he'll probably fix it too ...

Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]                              MSN . [EMAIL PROTECTED]
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
Re: NFS Locking Issue
> John Hay writes:
>
> > I only started to see the lockd problems when upgrading the server side
> > to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x
> > and 7-current and the lockd problem only showed up when upgrading the
> > server from 5.x to 6.x.
>
> It confirms the same we are experiencing.. constant freezing/locking issues.
> I guess no more 6.X for us.. for the foreseeable future..

just to add some more 'ingredients' to the problems:
1- we are suffering from the lockd syndrome
2- am-utils sometimes fails - especially /net (type:=host) [there seems to be a race condition]

both problems are new since 6.1

and now, on a 'mostly idle' machine, after failing to compile openoffice-2.0, the lockd is 'spinning' with no real work, at least so it seems:

last pid: 69935;  load averages:  0.16,  0.10,  0.08    up 1+16:37:25  09:37:09
44 processes:  1 running, 43 sleeping
CPU states:  2.6% user,  0.0% nice,  0.4% system,  0.4% interrupt, 96.6% idle
Mem: 129M Active, 2796M Inact, 157M Wired, 106M Cache, 214M Buf, 132M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
  513 root        1  96    0 48628K 45304K select 1  67:39  5.13% rpc.lockd
  498 root        1   4    0  2420K   868K -      1  23:38  0.83% nfsd
  419 root        1  96    0  5408K  2088K select 1  98:13  0.00% amd-6.1.5

danny
Re: NFS Locking Issue
John Hay writes:

> I only started to see the lockd problems when upgrading the server side to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x and 7-current and the lockd problem only showed up when upgrading the server from 5.x to 6.x.

It confirms the same we are experiencing.. constant freezing/locking issues. I guess no more 6.X for us.. for the foreseeable future..
Re: NFS Locking Issue
On 6/29/06, Michael Collette <[EMAIL PROTECTED]> wrote:
> This last week I had been working on a test network to test out 6.1 prior to upgrading our production boxes from 5.4. That's when I ran across the rpc.lockd issues that have been discussed earlier.
>
> Our production setup has diskless clients running KDE, which due to this bug is now dead on 6.1. I also have my mail server delivering messages to a file server via NFS. I even have servers booting diskless with NFS provided file systems... all of which are dead on 6.1.
>
> The last discussion or bug updates I've seen on this issue were about 3 months ago. This leaves me with a number of questions I hope can be answered here on this list.
>
> Is NFS a big deal for most other users, or am I out here on the fringe using it as much as I do?
>
> Is anyone working on a fix for this? If so, is there any kind of time frame where this fix might be MFC'd to 6-STABLE?
>
> I guess I'm still just a bit stunned that a bug this obvious not only found its way into the STABLE branch, but is still there. Maybe it's not as obvious as I think, or not many folks are using it? All I know for sure here is that if I had upgraded to 6.1 my network would have been crippled.

Try 6.1-STABLE, especially make sure you have

$FreeBSD: src/usr.sbin/rpc.lockd/kern.c,v 1.16.2.1 2006/06/02 01:20:58 rodrigc Exp $

for usr.sbin/rpc.lockd/kern.c, and see if this helps.

Regards,
Rong-En Fan
Re: NFS Locking Issue
> Based on prior reading about this problem, I'd venture to guess that the
> file locking between FC5 and FreeBSD simply isn't. See, between just 2
> machines sharing files without rpc.lockd running you won't see a
> problem. Both the client and the server must not only be running
> rpc.lockd, but they must be able to actually talk to each other.

I definitely disagree with that. I have written a little program just to check locking on files on the NFS share, and I can assure you it works. Before FC5 the same program did not work; in fact it hung. You could not kill the program without unmounting the NFS share. After the upgrade FC3 -> FC5 the lockd works, and if I try setting a second lock on the same file it will fail. I am using this daily with mutt, no problem.

But it is not only lockd which now works, it is more generally NFS. On a 6.0 machine I regularly get things like:

Jun 22 17:30:10 asmodee kernel: for server ada:/ada1
Jun 22 17:30:10 asmodee kernel: nfs send error 1 for server ada:/ada1
Jun 22 17:30:10 asmodee last message repeated 797 times
Jun 22 17:30:15 asmodee kernel: for server ada:/ada
Jun 22 17:30:15 asmodee kernel: nfs send error 1 for server ada:/ada
Jun 22 17:30:15 asmodee last message repeated 817 times
Jun 22 17:30:20 asmodee kernel: nfs send error 35 for server ada:/ada

and the home directories are inaccessible for a couple of minutes. I have never seen that once on the 6.1 machine.

--
Michel TALON
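[Editor's sketch, not part of Michel's message.] Michel's test program is not shown in the thread. A lock check of the kind he describes (take a lock, then confirm a second attempt on the same file fails) might look roughly like this. The function names and the test path are hypothetical. The second attempt is made from a child process because POSIX record locks never conflict within a single process.

```python
import fcntl
import os


def try_exclusive_lock(path):
    """Return an open fd holding an exclusive lock, or None if already locked."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking POSIX (fcntl-style) lock on the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


def second_process_can_lock(path):
    """Fork a child that tries to lock the same file; report its result."""
    pid = os.fork()
    if pid == 0:
        # Child: exit status 0 means it got the lock, 1 means it was refused.
        os._exit(0 if try_exclusive_lock(path) is not None else 1)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

With a file on the NFS mount, a working lockd should give a non-None result from try_exclusive_lock() and then False from second_process_can_lock() while the first lock is held. On the broken setups described in this thread the first call may simply never return, which is itself the diagnostic.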
Re: NFS Locking Issue
> I only started to see the lockd problems when upgrading the server side
> to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x
> and 7-current and the lockd problem only showed up when upgrading the
> server from 5.x to 6.x.

As far as I remember, FreeBSD-4 did not have a true lockd, only a fake one, so it always worked without problems. I have used all versions of FreeBSD-5 up to 6.0 and 6.1 on my client with a Linux server, and I can say that 6.1 is the first one which works OK for me. I don't have any experience with a FreeBSD server, except the occasional NFS mounting after a make world.

--
Michel TALON
Re: NFS Locking Issue
> the one thing that sticks out to me about this report is that they
> upgraded the NFS server to FC5 ... what was the server running before? if
> FreeBSD, could the problem be an interaction problem between the NFS
> server and client, vs just the client side?

Previously the server ran Fedora Core 3. Like you, I think it is an interaction between client and server. For example, we have a client machine running Debian Unstable which had NFS problems interacting with the FC3 server and still has them with the FC5 server. But I don't have any more with FreeBSD-6.1.

As to the problem of the machine freezing when the server freezes, I have always seen that, both under Linux and FreeBSD; nothing new. The freeze seems less severe to me now, in that I have been able to log in as root with the server down. The load on the server is rather big: we are talking about around 100 machines having their home directories on the server.

--
Michel TALON
Re: NFS Locking Issue
Michel Talon wrote:
> > I guess I'm still just a bit stunned that a bug this obvious not only found its way into the STABLE branch, but is still there. Maybe it's not as obvious as I think, or not many folks are using it? All I know for sure here is that if I had upgraded to 6.1 my network would have been crippled.
>
> Strange, since I upgraded to FreeBSD-6.1 and the NFS server to Fedora Core 5, my machine, NFS client is happy, and lockd works. It is first time since years I have no problem. It certainly did not work with FreeBSD-5 and I still have a machine with FreeBSD-6.0 which does not work properly (frequently loses the NFS mount, but it gets remounted some times later by amd). Anyways I have exactly 0 problem with the 6.1 machine. I could extend that to say that everything works very well on that machine, nothing is slow, including disk access. This has not always been the case. Stability wise, I have not seen any panic, hang or whatever since I have compiled a kernel adapted to my hardware. I got a panic with the generic kernel soon after installation, but now machine is totally stable.

Based on prior reading about this problem, I'd venture to guess that the file locking between FC5 and FreeBSD simply isn't. See, between just 2 machines sharing files without rpc.lockd running you won't see a problem. Both the client and the server must not only be running rpc.lockd, but they must be able to actually talk to each other.

For a simple 2 machine setup, you don't really need much in the way of locking control, as you don't have to deal with multiple requests for the same resource. This is why folks just running the "-L" flag on their mount command also aren't having any problems.

To actually see the problem isn't too hard to set up. Just have rpc.lockd, rpc.statd, and rpcbind enabled on both the client and the server. Then just start trying to transfer a stack of files from one to the other. I found this to be true even trying to go from a 5.4 server to my 6.1 laptop here.

There was quite a thread on this back in March of this year, along with a few PRs that are still open. I'm personally just coming headlong into all of this.

Later on,
--
Michael Collette
IT Manager
TestEquity Inc
[EMAIL PROTECTED]
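The reproduction described above (lock daemons running on both ends, then a stack of files pushed across) can be approximated with a small script. The following is only a sketch under assumptions: the function name, file names, file count and timeout are all invented here, and it exercises fcntl locks directly rather than whatever locking KDE or a mail delivery agent actually performs. It writes a batch of files into a directory, takes a blocking lock on each, and records the files where the lock request did not return within the timeout.

```python
import fcntl
import os
import signal
import tempfile


class LockTimeout(Exception):
    """Raised by the alarm handler when a lock request takes too long."""


def _on_alarm(signum, frame):
    raise LockTimeout()


def write_locked_files(directory, count=100, timeout=10):
    """Create `count` files, locking each while writing; return paths that timed out."""
    stalled = []
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        for i in range(count):
            path = os.path.join(directory, "locktest-%04d" % i)
            with open(path, "w") as f:
                signal.alarm(timeout)  # give up on this file after `timeout` seconds
                try:
                    fcntl.lockf(f, fcntl.LOCK_EX)  # blocking lock request
                    f.write("x" * 4096)
                    f.flush()
                    fcntl.lockf(f, fcntl.LOCK_UN)
                except LockTimeout:
                    stalled.append(path)
                finally:
                    signal.alarm(0)  # cancel any pending alarm
    finally:
        signal.signal(signal.SIGALRM, previous)
    return stalled


if __name__ == "__main__":
    # A temporary local directory is used here; for a real test, pass a
    # directory on the NFS mount instead.
    with tempfile.TemporaryDirectory() as workdir:
        print("stalled files:", write_locked_files(workdir, count=20, timeout=5))
```

One caveat: an earlier message in this thread mentions a lock test program that could not be killed while it hung. If the lock request blocks uninterruptibly like that, the alarm will not fire and the script will simply hang, which is itself the symptom being looked for.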
Re: NFS Locking Issue
On Fri, Jun 30, 2006 at 01:03:09AM +0200, Michel Talon wrote:
> > I guess I'm still just a bit stunned that a bug this obvious not only
> > found its way into the STABLE branch, but is still there. Maybe it's
> > not as obvious as I think, or not many folks are using it? All I know
> > for sure here is that if I had upgraded to 6.1 my network would have
> > been crippled.
>
> Strange, since I upgraded to FreeBSD-6.1 and the NFS server to Fedora Core 5,
> my machine, NFS client is happy, and lockd works. It is first time since
> years I have no problem. It certainly did not work with FreeBSD-5 and I still
> have a machine with FreeBSD-6.0 which does not work properly (frequently loses
> the NFS mount, but it gets remounted some times later by amd). Anyways I have
> exactly 0 problem with the 6.1 machine. I could extend that to say that
> everything works very well on that machine, nothing is slow, including disk
> access. This has not always been the case. Stability wise, I have not seen any
> panic, hang or whatever since I have compiled a kernel adapted to my hardware.
> I got a panic with the generic kernel soon after installation, but now
> machine is totally stable.

I only started to see the lockd problems when upgrading the server side to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x and 7-current, and the lockd problem only showed up when upgrading the server from 5.x to 6.x.

John
--
John Hay -- [EMAIL PROTECTED] / [EMAIL PROTECTED]
Re: NFS Locking Issue
User Freebsd writes:
> the one thing that sticks out to me about this report is that they upgraded the NFS server to FC5

I wonder if the FreeBSD 6.X client would freeze with a non-FreeBSD NFS server. It would be interesting to have that info for comparison.
Re: NFS Locking Issue
On Thu, 29 Jun 2006, Francisco Reyes wrote:
> Michel Talon writes:
> > Strange, since I upgraded to FreeBSD-6.1 and the NFS server to Fedora Core 5, my machine, NFS client is happy, and lockd works.
>
> What volume are we talking about? My own problems and other reports I see are all under heavy load.

The one thing that sticks out to me about this report is that they upgraded the NFS server to FC5 ... what was the server running before? If FreeBSD, could the problem be an interaction problem between the NFS server and client, vs. just the client side?

Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]   MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.org   ICQ . 7615664
Re: NFS Locking Issue
Michel Talon writes:
> Strange, since I upgraded to FreeBSD-6.1 and the NFS server to Fedora Core 5, my machine, NFS client is happy, and lockd works.

What volume are we talking about? My own problems and other reports I see are all under heavy load.
Re: NFS Locking Issue
On Thu, 29 Jun 2006 22:25:30 +0200, Michael Collette <[EMAIL PROTECTED]> wrote:
> Rong-en Fan wrote:
> > On 6/29/06, Michael Collette <[EMAIL PROTECTED]> wrote:
> > > This last week I had been working on a test network to test out 6.1 prior to upgrading our production boxes from 5.4. That's when I ran across the rpc.lockd issues that have been discussed earlier.
> > >
> > > Our production setup has diskless clients running KDE, which due to this bug is now dead on 6.1. I also have my mail server delivering messages to a file server via NFS. I even have servers booting diskless with NFS provided file systems... all of which are dead on 6.1.
> > >
> > > The last discussion or bug updates I've seen on this issue were about 3 months ago. This leaves me with a number of questions I hope can be answered here on this list.
> > >
> > > Is NFS a big deal for most other users, or am I out here on the fringe using it as much as I do?
> > >
> > > Is anyone working on a fix for this? If so, is there any kind of time frame where this fix might be MFC'd to 6-STABLE?
> > >
> > > I guess I'm still just a bit stunned that a bug this obvious not only found its way into the STABLE branch, but is still there. Maybe it's not as obvious as I think, or not many folks are using it? All I know for sure here is that if I had upgraded to 6.1 my network would have been crippled.
> >
> > Try 6.1-STABLE, especially make sure you have
> >
> > $FreeBSD: src/usr.sbin/rpc.lockd/kern.c,v 1.16.2.1 2006/06/02 01:20:58 rodrigc Exp $
> >
> > for usr.sbin/rpc.lockd/kern.c, and see if this helps.
>
> I am running STABLE on all my test boxes, and the problem is very much there. It's not everything that locks up though. I'm able to bring X up with twm, but unable to launch any Gnome or KDE applications without them being stranded in a lock state.
>
> I sure would have loved for your suggestion to be correct. For what it's worth, all the boxes I'm working with are on STABLE no more than a week old. I ran fresh build worlds on all of them before getting the rest of my configs going.
>
> Thanks,

Hello,

I run my client with the -L mount option. This makes NFS locks local to the client, which is a workaround for me. If you depend on locks enforced on the server, it will not work.

Ronald.

--
Ronald Klop
Amsterdam, The Netherlands
Re: NFS Locking Issue
Rong-en Fan wrote:
> On 6/29/06, Michael Collette <[EMAIL PROTECTED]> wrote:
> > This last week I had been working on a test network to test out 6.1 prior to upgrading our production boxes from 5.4. That's when I ran across the rpc.lockd issues that have been discussed earlier.
> >
> > Our production setup has diskless clients running KDE, which due to this bug is now dead on 6.1. I also have my mail server delivering messages to a file server via NFS. I even have servers booting diskless with NFS provided file systems... all of which are dead on 6.1.
> >
> > The last discussion or bug updates I've seen on this issue were about 3 months ago. This leaves me with a number of questions I hope can be answered here on this list.
> >
> > Is NFS a big deal for most other users, or am I out here on the fringe using it as much as I do?
> >
> > Is anyone working on a fix for this? If so, is there any kind of time frame where this fix might be MFC'd to 6-STABLE?
> >
> > I guess I'm still just a bit stunned that a bug this obvious not only found its way into the STABLE branch, but is still there. Maybe it's not as obvious as I think, or not many folks are using it? All I know for sure here is that if I had upgraded to 6.1 my network would have been crippled.
>
> Try 6.1-STABLE, especially make sure you have
>
> $FreeBSD: src/usr.sbin/rpc.lockd/kern.c,v 1.16.2.1 2006/06/02 01:20:58 rodrigc Exp $
>
> for usr.sbin/rpc.lockd/kern.c, and see if this helps.

I am running STABLE on all my test boxes, and the problem is very much there. It's not everything that locks up though. I'm able to bring X up with twm, but unable to launch any Gnome or KDE applications without them being stranded in a lock state.

I sure would have loved for your suggestion to be correct. For what it's worth, all the boxes I'm working with are on STABLE no more than a week old. I ran fresh build worlds on all of them before getting the rest of my configs going.

Thanks,
--
Michael Collette
IT Manager
TestEquity LLC
[EMAIL PROTECTED]
Re: NFS Locking Issue
Michael Collette writes:
> This last week I had been working on a test network to test out 6.1 prior to upgrading our production boxes from 5.4.

I wish I had done that.. :-(

> That's when I ran across the rpc.lockd issues that have been discussed earlier.

I am not familiar with that, but I can tell you from experience that the NFS client code in 6.X has issues. In particular, if the server goes down the client machine doesn't allow you to unmount the volume, and if you have programs trying to access the downed mount, the whole machine may end up freezing.

> ... I also have my mail server delivering messages to a file server via NFS.

We use NFS as our "storage" server for pop/imap, but use the MTA to deliver to the machine.

> Is NFS a big deal for most other users, or am I out here on the fringe using it as much as I do?

It is for us. I am even trying to see if we can pay someone to expedite getting NFS fixed in 6. Unfortunately we decided to increase our NFS usage after I had installed 6.X on a number of new machines.

> Is anyone working on a fix for this?

If there is, I have not read about it.

> I guess I'm still just a bit stunned that a bug this obvious not only found its way into the STABLE branch, but is still there.

I am fairly new to NFS, but I am getting the impression that FreeBSD's NFS is not as mature as on other platforms. I also think it has a lot to do with usage patterns. I have seen mentions of people having hundreds of clients connected to a single NFS server, yet I see problems with just a handful of clients.

Maybe the issue is only with the 6.X branch. Sadly, part of the reason I moved some newer machines to 6.X was because of some comments I saw on how NFS had been improved in 6.X :-(
Re: NFS Locking Issue
Me too ...

2006/6/29, Michael Collette <[EMAIL PROTECTED]>:
> This last week I had been working on a test network to test out 6.1 prior to upgrading our production boxes from 5.4. That's when I ran across the rpc.lockd issues that have been discussed earlier.
>
> Our production setup has diskless clients running KDE, which due to this bug is now dead on 6.1. I also have my mail server delivering messages to a file server via NFS. I even have servers booting diskless with NFS provided file systems... all of which are dead on 6.1.
>
> The last discussion or bug updates I've seen on this issue were about 3 months ago. This leaves me with a number of questions I hope can be answered here on this list.
>
> Is NFS a big deal for most other users, or am I out here on the fringe using it as much as I do?
>
> Is anyone working on a fix for this? If so, is there any kind of time frame where this fix might be MFC'd to 6-STABLE?
>
> I guess I'm still just a bit stunned that a bug this obvious not only found its way into the STABLE branch, but is still there. Maybe it's not as obvious as I think, or not many folks are using it? All I know for sure here is that if I had upgraded to 6.1 my network would have been crippled.
>
> Later on,
> --
> Michael Collette
> IT Manager
> TestEquity LLC
> [EMAIL PROTECTED]

--
Best regards,
Alexey Karagodov.
Design, construction, administration, and support of information systems.