Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
Quoting Kris Kennaway <[EMAIL PROTECTED]>:
Chris H. wrote:

-8uz_name);
2268
2269            if (zone->uz_dtor)
2270                    zone->uz_dtor(item, keg->uk_size, udata);
2271    #ifdef INVARIANTS
2272            ZONE_LOCK(zone);
2273            if (keg->uk_flags & UMA_ZONE_MALLOC)
2274                    uma_dbg_free(zone, udata, item);
(kgdb)
(kgdb) list *0xc0667e49
0xc0667e49 is in uma_zfree_arg (/usr/src/sys/vm/uma_core.c:2270).
2265    #endif
2266            CTR2(KTR_UMA, "uma_zfree_arg thread %x zone %s", curthread,
2267                zone->uz_name);
2268
2269            if (zone->uz_dtor)
2270                    zone->uz_dtor(item, keg->uk_size, udata);
2271    #ifdef INVARIANTS
2272            ZONE_LOCK(zone);
2273            if (keg->uk_flags & UMA_ZONE_MALLOC)
2274                    uma_dbg_free(zone, udata, item);
(kgdb) backtrace
#0  doadump () at pcpu.h:165
#1  0xc052a7aa in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc052aa40 in panic (fmt=0xc070e8e5 "double fault") at /usr/src/sys/kern/kern_shutdown.c:565
#3  0xc06bc82e in dblfault_handler () at /usr/src/sys/i386/i386/trap.c:866
#4  0x in ?? ()
(kgdb) quit

Script done, output file is /tmp/vmdump
/usr/obj/usr/src/sys/NS1_01 3:52am Fri, 02 ns1#

Hope this helps.

--Chris

P.S. Note to onlookers: I would have produced this information months ago
except for a preconceived notion that it would be a difficult/time consuming
task. D'OH! WRONG! It is truly a *trivial* task. So /please/ give generously! :)

--
panic: kernel trap (ignored)

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
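When quoting a trace like the one above on-list, it can help to reduce the frame lines to frame number and function name. What follows is only a minimal sketch, assuming the kgdb session was saved to a text file (as script(1) did above). The helper name `bt_frames` is hypothetical and is not part of kgdb.

```shell
#!/bin/sh
# bt_frames: read saved kgdb "backtrace" output on stdin and print
# "frame function" pairs. Hypothetical helper, not a kgdb feature.
bt_frames() {
    awk '/^#[0-9]+ / {
        # Frame lines look like "#N 0xADDR in func (...)" or,
        # for the innermost frame, "#N func (...)".
        name = ($3 == "in") ? $4 : $2
        print $1, name
    }'
}

# Example using two frames copied from the trace in this message.
printf '%s\n' \
    '#0  doadump () at pcpu.h:165' \
    '#2  0xc052aa40 in panic (fmt=0xc070e8e5 "double fault") at /usr/src/sys/kern/kern_shutdown.c:565' |
    bt_frames
```

Lines that are not frames (prompts, source listings) are ignored, so the whole saved session can be piped through it.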
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
Chris H. wrote:
> I was recently able to find a small window in my workload. So I decided to
> use it to provide the "non-bogus" ;) information needed. After reading:
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
> and:
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html
> a few days ago, I was only unclear on one point in setting up the required
> environment. So I posted my question to the list "dumpdev question
> (probably stupid)" which Andrey V. Elsukov immediately responded to.
> I'll be creating a Crash Dump in the next couple of days. So if it's not
> already abundantly clear that this is the first time I've attempted to
> produce this information - now would be the perfect time to /enlighten/ me
> as to anything you can think of that will ensure you get the information
> you're looking for. :)
>
> Thank you again for your reply.

I think that document explains everything that is necessary, but if you are
unsure about something please feel free to ask. Good luck :)

Kris
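One part of the setup covered by the handbook chapter linked above is telling the system where to write the dump. A minimal sketch of the rc.conf side, assuming swap is large enough to hold the dump. The commented-out device name is a hypothetical placeholder and is not taken from this thread.

```shell
# /etc/rc.conf fragment (rc.conf is read by sh at boot)
dumpdev="AUTO"          # use the first suitable swap device from /etc/fstab
#dumpdev="/dev/ad4s1b"  # hypothetical: or name a swap partition explicitly
dumpdir="/var/crash"    # where savecore(8) writes vmcore.N after the reboot
```

After a panic and reboot, the saved vmcore.N under dumpdir is what gets opened with kgdb together with the matching kernel.debug.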
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
Quoting Kris Kennaway <[EMAIL PROTECTED]>:

> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>
> It's very unlikely NFS is relevant to the problem (which is what made it
> bogus, together with the lack of debugging) and likely that nve is the
> cause. The above URL explains in detail how to obtain the necessary
> debugging to confirm this.
>
> Kris

Thank you Kris,
I was recently able to find a small window in my workload. So I decided to
use it to provide the "non-bogus" ;) information needed. After reading:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
and:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html
a few days ago, I was only unclear on one point in setting up the required
environment. So I posted my question to the list "dumpdev question
(probably stupid)" which Andrey V. Elsukov immediately responded to.
I'll be creating a Crash Dump in the next couple of days. So if it's not
already abundantly clear that this is the first time I've attempted to
produce this information - now would be the perfect time to /enlighten/ me
as to anything you can think of that will ensure you get the information
you're looking for. :)

Thank you again for your reply.

--Chris

--
panic: kernel trap (ignored)
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
Chris H. wrote:
> Quoting Kris Kennaway <[EMAIL PROTECTED]>:
>
>> Er, yes, of course it does. The old message he is quoting is bogus on its
>> own,
>
> While I'll grant you that I haven't *yet* found/taken the time to create a
> dump device and re-enable rpc_lockd && rpc_statd && cp 10Mb file to mount
> point to produce an *instantaneous* "Fatal double fault", I don't think
> it's fair to label my original post entirely /bogus/ - especially in light
> of the recent post I replied to, which seems to have some very common
> ground.
>
> I should probably mention that since my last posting (my original thread),
> I have some 20+ RELENG_6_2 boxen that *do* have rpc_lockd + rpc_statd
> enabled. Yet none of them produce a "Fatal double fault". They are all
> Tyan SMP boards with dual onboard fxp's - as opposed to the Nvidia UP which
> has a single onboard nve. They are all inter-connected via NFS. I have a
> 750Gb drive hanging off the /problematic/ Nvidia board, that I had intended
> to use for NFS back-up's. But given the NFS issue I had with it, it didn't
> seem to be the best solution.
>
> If anyone felt like throwing me a "cheat sheet" for creating a dump device
> out of that drive and a "quickie" for producing a backtrace, I'm sure I'd
> be better able to find the required time to produce the required
> information. I'm sorry. It's just that I'm a hundred million miles away
> from that right now, as I've been building several large web applications,
> and their deadline is fast approaching.
>
> FWIW I bounced all the servers today, and therefore have recent /verbose/
> dmesg's, should any of the information they provide be of any help/use to
> anyone.
>
> Take care. :)

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

It's very unlikely NFS is relevant to the problem (which is what made it
bogus, together with the lack of debugging) and likely that nve is the
cause. The above URL explains in detail how to obtain the necessary
debugging to confirm this.

Kris
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
Kris Kennaway <[EMAIL PROTECTED]> writes:

> Bengt Ahlgren wrote:
>> Esa Karkkainen <[EMAIL PROTECTED]> writes:
>>
>>> On Sun, Oct 14, 2007 at 02:37:23PM +0200, Kris Kennaway wrote:
>>>> Esa Karkkainen wrote:
>>>>> I get "Fatal double fault" error when writing to a filesystem
>>>>> mounted from NFS server.
>>> I got an offlist reply in which he suggested that the problem might be
>>> in nve driver.
>> That was me. I indeed got the same fault when running NFS over nve.
>> Switching to nfe solved the problem for me. The on-screen backtrace
>> reveals the true location of the problem. See:
>> http://www.sics.se/~bengta/FBSD/DSC00585.JPG
>> I do have a dump, but for some reason kgdb is not able to show the
>> same information.
>
> If you're using a module you have to do extra (but documented)
> steps. Or maybe kgdb has forgotten how to decode a double fault.

Just for the record: if_nve was compiled into the kernel.

Bengt
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
Quoting Kris Kennaway <[EMAIL PROTECTED]>:

> Clifton Royston wrote:
>> Thanks for this very timely mention! The cluster of servers I am about
>> to upgrade from 4.8 to 6.2 relies heavily on NFS to an old Netapp. If I
>> have got to disable rpc_lockd and rpc_statd, it's good to know that now!
>>
>> Can I ask, can anybody confirm that they're running 6.2 on NFS
>> successfully *with* lockd and statd?
>
> Er, yes, of course it does. The old message he is quoting is bogus on its
> own,

While I'll grant you that I haven't *yet* found/taken the time to create a
dump device and re-enable rpc_lockd && rpc_statd && cp 10Mb file to mount
point to produce an *instantaneous* "Fatal double fault", I don't think
it's fair to label my original post entirely /bogus/ - especially in light
of the recent post I replied to, which seems to have some very common
ground.

I should probably mention that since my last posting (my original thread),
I have some 20+ RELENG_6_2 boxen that *do* have rpc_lockd + rpc_statd
enabled. Yet none of them produce a "Fatal double fault". They are all
Tyan SMP boards with dual onboard fxp's - as opposed to the Nvidia UP which
has a single onboard nve. They are all inter-connected via NFS. I have a
750Gb drive hanging off the /problematic/ Nvidia board, that I had intended
to use for NFS back-up's. But given the NFS issue I had with it, it didn't
seem to be the best solution.

If anyone felt like throwing me a "cheat sheet" for creating a dump device
out of that drive and a "quickie" for producing a backtrace, I'm sure I'd
be better able to find the required time to produce the required
information. I'm sorry. It's just that I'm a hundred million miles away
from that right now, as I've been building several large web applications,
and their deadline is fast approaching.

FWIW I bounced all the servers today, and therefore have recent /verbose/
dmesg's, should any of the information they provide be of any help/use to
anyone.

Take care. :)

--Chris

> I don't know if he ever was able to provide meaningful traces but it may
> well be nve as in the upthread discussion.
>
> Kris

--
panic: kernel trap (ignored)
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
Bengt Ahlgren wrote:
> Esa Karkkainen <[EMAIL PROTECTED]> writes:
>
>> On Sun, Oct 14, 2007 at 02:37:23PM +0200, Kris Kennaway wrote:
>>> Esa Karkkainen wrote:
>>>> I get "Fatal double fault" error when writing to a filesystem
>>>> mounted from NFS server.
>> I got an offlist reply in which he suggested that the problem might be
>> in nve driver.
> That was me. I indeed got the same fault when running NFS over nve.
> Switching to nfe solved the problem for me. The on-screen backtrace
> reveals the true location of the problem. See:
> http://www.sics.se/~bengta/FBSD/DSC00585.JPG
> I do have a dump, but for some reason kgdb is not able to show the
> same information.

If you're using a module you have to do extra (but documented) steps. Or
maybe kgdb has forgotten how to decode a double fault.

Anyway, this information is indeed definitive, and it's what others seeing
this problem need to provide if they still have doubts.

Kris
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
Esa Karkkainen <[EMAIL PROTECTED]> writes:

> On Sun, Oct 14, 2007 at 02:37:23PM +0200, Kris Kennaway wrote:
>> Esa Karkkainen wrote:
>> >I get "Fatal double fault" error when writing to a filesystem
>> >mounted from NFS server.
>
> I got an offlist reply in which he suggested that the problem might be
> in nve driver.

That was me. I indeed got the same fault when running NFS over nve.
Switching to nfe solved the problem for me. The on-screen backtrace
reveals the true location of the problem. See:
http://www.sics.se/~bengta/FBSD/DSC00585.JPG
I do have a dump, but for some reason kgdb is not able to show the
same information.

Regards,
Bengt
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
On Wed, Oct 17, 2007 at 12:24:29PM +1000, Greg Black wrote:
> On 2007-10-16, Clifton Royston wrote:
>
> > Thanks for this very timely mention! The cluster of servers I am
> > about to upgrade from 4.8 to 6.2 relies heavily on
> > NFS to an old Netapp. If I have got to disable rpc_lockd and
> > rpc_statd, it's good to know that now!
> >
> > Can I ask, can anybody confirm that they're running 6.2 on NFS
> > successfully *with* lockd and statd?
>
> I have this combination running without any drama on a couple of
> networks, so I doubt very much if that is the fatal combination.

Thanks for the rapid feedback. Glad to hear it was mistaken alarmism.
I shall return to my usual state of apathy.
  -- Clifton

--
Clifton Royston -- [EMAIL PROTECTED] / [EMAIL PROTECTED]
President - I and I Computing * http://www.iandicomputing.com/
Custom programming, network design, systems and network consulting services
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
On 2007-10-16, Clifton Royston wrote:

> Thanks for this very timely mention! The cluster of servers I am
> about to upgrade from 4.8 to 6.2 relies heavily on
> NFS to an old Netapp. If I have got to disable rpc_lockd and
> rpc_statd, it's good to know that now!
>
> Can I ask, can anybody confirm that they're running 6.2 on NFS
> successfully *with* lockd and statd?

I have this combination running without any drama on a couple of
networks, so I doubt very much if that is the fatal combination.

Greg
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
Clifton Royston wrote:
> Thanks for this very timely mention! The cluster of servers I am about
> to upgrade from 4.8 to 6.2 relies heavily on NFS to an old Netapp. If I
> have got to disable rpc_lockd and rpc_statd, it's good to know that now!
>
> Can I ask, can anybody confirm that they're running 6.2 on NFS
> successfully *with* lockd and statd?

Er, yes, of course it does. The old message he is quoting is bogus on its
own, I don't know if he ever was able to provide meaningful traces but it
may well be nve as in the upthread discussion.

Kris
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
On Tue, Oct 16, 2007 at 01:01:46PM -0700, Chris H. wrote:
> excerpt from this list titled: NFS == lock && reboot, that I posted follows:
>
> --8<---SNIP---8<-SNIP-8<---
> # uname -a
> FreeBSD host.domain.tld 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 26
> 16:27:14 PST 2007
>
> Greetings,
> Does anyone know when NFS and friends will be working again? I haven't
> been able to /safely/ use it from 4.8 on. I remember some talk on the
> list sometime ago and then it seemed to be resolved, as the discussion
> ended. So I thought it was fixed. Seems not. :(
>
> My scenario;
> mount host off root:
> mount script exec'd follows...
>
> #!/bin/sh -
> mount -t nfs host.domain.tld:/ /host
> mount -t nfs host.domain.tld:/var /host/var
>
> confirm mount...
>
> # ls /host
> .snap COPYRIGHT bin
> ...
> usr var tmp
>
> OK looks good...
>
> # cp /path/to/approx/10Mb/file /host/path/to/dest/dir/
>
> Fatal double fault
> eis 0x0blah
> eiblah blah0x
> panic double fault
> no dump device defined
> rebooting in 15sec...
>
> Hmmm... that's not good. :(
>
> --8<---SNIP---8<-SNIP-8<---
>
> My final solution was to change the lines in /etc/rc.conf
> from:
> nfs_client_enable="YES"
> nfs_reserved_port_only="YES"
> nfs_server_enable="YES"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
> rpcbind_enable="YES"
>
> to:
> nfs_client_enable="YES"
> nfs_reserved_port_only="YES"
> nfs_server_enable="YES"
> #rpc_lockd_enable="YES"
> #rpc_statd_enable="YES"
> rpcbind_enable="YES"
>
> Making those changes ended the "Fatal double fault && reboot in 15
> seconds..."

Thanks for this very timely mention! The cluster of servers I am about to
upgrade from 4.8 to 6.2 relies heavily on NFS to an old Netapp. If I have
got to disable rpc_lockd and rpc_statd, it's good to know that now!

Can I ask, can anybody confirm that they're running 6.2 on NFS
successfully *with* lockd and statd?
  -- Clifton

--
Clifton Royston -- [EMAIL PROTECTED] / [EMAIL PROTECTED]
President - I and I Computing * http://www.iandicomputing.com/
Custom programming, network design, systems and network consulting services
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
Quoting Esa Karkkainen <[EMAIL PROTECTED]>:

> On Tue, Oct 16, 2007 at 09:46:37AM +0900, Pyun YongHyeon wrote:
>> I remember that nve(4) is NOT stable under heavy network loads.
>
> Yup, that seems to be correct. Usually this machine, i.e. my home
> workstation, does not have a load, network-wise or in general.
>
>> I'd like to say use nfe(4) which is believed to be more stable/fast
>> than nve(4). nfe(4) is also default NVIDIA NIC driver for
>> CURRENT/RELENG_7. If you have to use RELENG_6 try nfe(4) at the
>> following URL.
>
> Well, I could use -CURRENT or RELENG_7 in this machine, but I made a
> decision some time ago to use RELENG_6_2, because it's hassle free.

Greetings,
I had a situation that was exactly the same -
excerpt from this list titled: NFS == lock && reboot, that I posted follows:

--8<---SNIP---8<-SNIP-8<---
# uname -a
FreeBSD host.domain.tld 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 26
16:27:14 PST 2007

Greetings,
Does anyone know when NFS and friends will be working again? I haven't
been able to /safely/ use it from 4.8 on. I remember some talk on the
list sometime ago and then it seemed to be resolved, as the discussion
ended. So I thought it was fixed. Seems not. :(

My scenario;
mount host off root:
mount script exec'd follows...

#!/bin/sh -
mount -t nfs host.domain.tld:/ /host
mount -t nfs host.domain.tld:/var /host/var

confirm mount...

# ls /host
.snap COPYRIGHT bin
...
usr var tmp

OK looks good...

# cp /path/to/approx/10Mb/file /host/path/to/dest/dir/

Fatal double fault
eis 0x0blah
eiblah blah0x
panic double fault
no dump device defined
rebooting in 15sec...

Hmmm... that's not good. :(

--8<---SNIP---8<-SNIP-8<---

My final solution was to change the lines in /etc/rc.conf
from:
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"

to:
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
#rpc_lockd_enable="YES"
#rpc_statd_enable="YES"
rpcbind_enable="YES"

Making those changes ended the "Fatal double fault && reboot in 15
seconds..."

My nic is: ifconfig_nve0

Thanks for reporting the /buggy/ nve driver. So there are no issues with
the nfe driver?

Thanks again.

--Chris

--
panic: kernel trap (ignored)
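When comparing boxes that do and do not show the fault, it helps to confirm which of the knobs above are really switched on in a given file. A minimal sketch, assuming an rc.conf-style file of plain sh assignments. The helper name `rc_knob_enabled` is hypothetical, and the sketch looks only at the one file it is given, so it ignores /etc/defaults/rc.conf.

```shell
#!/bin/sh
# rc_knob_enabled VAR FILE
# Succeeds if FILE sets VAR to YES (any case). Commented-out lines are never
# assigned, so they count as off. Hypothetical helper, not part of rc(8).
rc_knob_enabled() {
    (
        # Source the file in a subshell so the caller's variables are untouched.
        . "$2" >/dev/null 2>&1
        eval "val=\${$1:-NO}"
        case "$val" in
            [Yy][Ee][Ss]) exit 0 ;;
            *) exit 1 ;;
        esac
    )
}

# Example: the "after" settings from this message, written to a scratch file.
conf=$(mktemp /tmp/rcconf.XXXXXX)
cat > "$conf" <<'EOF'
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
#rpc_lockd_enable="YES"
#rpc_statd_enable="YES"
rpcbind_enable="YES"
EOF
for knob in nfs_client_enable rpc_lockd_enable rpc_statd_enable rpcbind_enable; do
    if rc_knob_enabled "$knob" "$conf"; then
        echo "$knob: on"
    else
        echo "$knob: off"
    fi
done
rm -f "$conf"
```

The file path must contain a slash (as the scratch path here does), because the sh `.` builtin otherwise searches PATH.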
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
On Tue, Oct 16, 2007 at 09:46:37AM +0900, Pyun YongHyeon wrote:
> I remember that nve(4) is NOT stable under heavy network loads.

Yup, that seems to be correct. Usually this machine, i.e. my home
workstation, does not have a load, network-wise or in general.

> I'd like to say use nfe(4) which is believed to be more stable/fast
> than nve(4). nfe(4) is also default NVIDIA NIC driver for
> CURRENT/RELENG_7. If you have to use RELENG_6 try nfe(4) at the
> following URL.

Well, I could use -CURRENT or RELENG_7 in this machine, but I made a
decision some time ago to use RELENG_6_2, because it's hassle free.

--
"In the beginning the Universe was created. This has made a lot
of people very angry and been widely regarded as a bad move."
-- Douglas Adams 1952 - 2001
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
Esa Karkkainen wrote:
> On Tue, Oct 16, 2007 at 02:33:49AM +0200, Kris Kennaway wrote:
>> Esa Karkkainen wrote:
>>> This machine has two 512MB DDR333 DIMMs.
>>>
>>> I installed sysutils/memtest and ran three simultaneously, first two
>>> allocated 326 MB each and last one allocated 150 MB of memory, so I'd
>>> start to swap. No errors.
>>
>> Well, as you say, such a limited test doesn't mean much. Anyway, it may
>> well have been nve, so see how you go without it.
>
> I downloaded Memtest86+ version 1.70 iso image, burned image to a CD,
> booted from the CD and then I let memtest run for sixteen hours.
>
> Memtest did not find any errors during that time.

OK, higher probability that it is OK, but some memory errors are highly
pattern dependent :) Physically replacing the RAM is the only way to be
sure when there are lingering problems. Anyway, probably no need to worry
about it unless you have further issues.

Kris
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
On Tue, Oct 16, 2007 at 02:33:49AM +0200, Kris Kennaway wrote:
> Esa Karkkainen wrote:
> >This machine has two 512MB DDR333 DIMMs.
> >
> >I installed sysutils/memtest and ran three simultaneously, first two
> >allocated 326 MB each and last one allocated 150 MB of memory, so I'd
> >start to swap. No errors.
>
> Well, as you say, such a limited test doesn't mean much. Anyway, it may
> well have been nve, so see how you go without it.

I downloaded Memtest86+ version 1.70 iso image, burned image to a CD,
booted from the CD and then I let memtest run for sixteen hours.

Memtest did not find any errors during that time.

--
"In the beginning the Universe was created. This has made a lot
of people very angry and been widely regarded as a bad move."
-- Douglas Adams 1952 - 2001
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
On Mon, Oct 15, 2007 at 11:32:02PM +0300, Esa Karkkainen wrote:
> On Sun, Oct 14, 2007 at 02:37:23PM +0200, Kris Kennaway wrote:
> > Esa Karkkainen wrote:
> > > I get "Fatal double fault" error when writing to a filesystem
> > > mounted from NFS server.
>
> I got an offlist reply in which he suggested that the problem might be
> in nve driver.
>
> I installed an additional Intel nic, appropriate lines from dmesg are
> as follows
>
> fxp0: port 0xb000-0xb03f mem
> 0xe720-0xe7200fff,0xe700-0xe70f irq 11 at device 6.0 on pci1
> miibus1: on fxp0
> inphy0: on miibus1
> inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
>
> After I started to use fxp0, I can dump(8) all the necessary filesystems
> to the NFS mount, without panic.
>
> When I used nve0 dump(8) or cp(1) managed to write less than megabyte to NFS
> mount and then machine panicked.
>

I remember that nve(4) is NOT stable under heavy network loads.
I'd like to say use nfe(4) which is believed to be more stable/fast
than nve(4). nfe(4) is also default NVIDIA NIC driver for
CURRENT/RELENG_7. If you have to use RELENG_6 try nfe(4) at the
following URL.
http://www.f.csce.kyushu-u.ac.jp/~shigeaki/software/freebsd-nfe.html

> It didn't matter if I made dump(8) write to the NFS mount or to a local
> filesystem and then copied the file to NFS mount, the end result was a
> panic.
>
> > > Both NFS server and client are running 6.2-RELEASE-p7.
>
> Both machines have been updated to -p8.
>
> > ># kgdb kernel.debug /home/crash/vmcore.2
> > >Fatal double fault:
> > >eip = 0xc063242a
> >
> > Can you look up these IPs in the kernel symbol table (see the developers
> > handbook)? This might give at least one clue, although I'm not sure it
> > is relevant.
>
> I'm sorry, but I need to learn a lot more about gdb and debugging in
> general before I can find that information. IIRC I have written about
> ten or twenty lines of C in this millennium.
>
> I do have matching kernel.debug and vmcore files, but kernel modules etc
> have been removed before I made new kernel and world.
>
> > You might also update to RELENG_6, I think there was at least one bug
> > fixed that might have caused such a thing.
>
> At the moment I don't have any stability problems with this machine, but
> I can upgrade to RELENG_6 before RELENG_6_3 is branched if that is
> necessary.
>
> > Also try to rule out memory failure etc.
>
> This machine has two 512MB DDR333 DIMMs.
>
> I installed sysutils/memtest and ran three simultaneously, first two
> allocated 326 MB each and last one allocated 150 MB of memory, so I'd
> start to swap. No errors.
>
> I know these tests are not conclusive, but I don't think the DIMMs are
> faulty.
>
> --

--
Regards,
Pyun YongHyeon
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
Esa Karkkainen wrote:
> On Sun, Oct 14, 2007 at 02:37:23PM +0200, Kris Kennaway wrote:
> > Esa Karkkainen wrote:
> > > I get "Fatal double fault" error when writing to a filesystem
> > > mounted from NFS server.
>
> I got an offlist reply in which he suggested that the problem might be
> in nve driver.
>
> I installed an additional Intel nic, appropriate lines from dmesg are
> as follows
>
> fxp0: port 0xb000-0xb03f mem
> 0xe720-0xe7200fff,0xe700-0xe70f irq 11 at device 6.0 on pci1
> miibus1: on fxp0
> inphy0: on miibus1
> inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
>
> After I started to use fxp0, I can dump(8) all the necessary filesystems
> to the NFS mount, without panic.
>
> When I used nve0 dump(8) or cp(1) managed to write less than megabyte to NFS
> mount and then machine panicked.
>
> It didn't matter if I made dump(8) write to the NFS mount or to a local
> filesystem and then copied the file to NFS mount, the end result was a
> panic.
>
> > > Both NFS server and client are running 6.2-RELEASE-p7.
>
> Both machines have been updated to -p8.
>
> > > # kgdb kernel.debug /home/crash/vmcore.2
> > > Fatal double fault:
> > > eip = 0xc063242a
> >
> > Can you look up these IPs in the kernel symbol table (see the developers
> > handbook)? This might give at least one clue, although I'm not sure it
> > is relevant.
>
> I'm sorry, but I need to learn a lot more about gdb and debugging in
> general before I can find that information. IIRC I have written about
> ten or twenty lines of C in this millennium.

Well, it's explained in explicit detail in that document. C code is not involved.

> I do have matching kernel.debug and vmcore files, but kernel modules etc
> have been removed before I made new kernel and world.

OK, most likely too late then.

> > You might also update to RELENG_6, I think there was at least one bug
> > fixed that might have caused such a thing.
>
> At the moment I don't have any stability problems with this machine, but
> I can upgrade to RELENG_6 before RELENG_6_3 is branched if that is
> necessary.
>
> > Also try to rule out memory failure etc.
>
> This machine has two 512MB DDR333 DIMM's.
>
> I installed sysutils/memtest and ran three simultaneously, first two
> allocated 326 MB each and last one allocated 150 MB of memory, so I'd
> start to swap. No errors.

Well, as you say, such a limited test doesn't mean much. Anyway, it may well have been nve, so see how you go without it.

Kris
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
On Sun, Oct 14, 2007 at 02:37:23PM +0200, Kris Kennaway wrote:
> Esa Karkkainen wrote:
> > I get "Fatal double fault" error when writing to a filesystem
> > mounted from NFS server.

I got an off-list reply suggesting that the problem might be in the nve driver.

I installed an additional Intel NIC; the relevant lines from dmesg are as follows:

fxp0: port 0xb000-0xb03f mem 0xe720-0xe7200fff,0xe700-0xe70f irq 11 at device 6.0 on pci1
miibus1: on fxp0
inphy0: on miibus1
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

Since I started using fxp0, I can dump(8) all the necessary filesystems to the NFS mount without a panic.

When I used nve0, dump(8) or cp(1) managed to write less than a megabyte to the NFS mount before the machine panicked.

It didn't matter whether I made dump(8) write directly to the NFS mount or to a local filesystem and then copied the file to the NFS mount; the end result was a panic.

> > Both NFS server and client are running 6.2-RELEASE-p7.

Both machines have been updated to -p8.

> > # kgdb kernel.debug /home/crash/vmcore.2
> > Fatal double fault:
> > eip = 0xc063242a
>
> Can you look up these IPs in the kernel symbol table (see the developers
> handbook)? This might give at least one clue, although I'm not sure it
> is relevant.

I'm sorry, but I need to learn a lot more about gdb and debugging in general before I can find that information. IIRC I have written about ten or twenty lines of C in this millennium.

I do have matching kernel.debug and vmcore files, but the kernel modules etc. were removed before I built the new kernel and world.

> You might also update to RELENG_6, I think there was at least one bug
> fixed that might have caused such a thing.

At the moment I don't have any stability problems with this machine, but I can upgrade to RELENG_6 before RELENG_6_3 is branched if that is necessary.

> Also try to rule out memory failure etc.

This machine has two 512MB DDR333 DIMMs.

I installed sysutils/memtest and ran three instances simultaneously; the first two allocated 326 MB each and the last one allocated 150 MB of memory, so that the machine would start to swap. No errors.

I know these tests are not conclusive, but I don't think the DIMMs are faulty.

--
"In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move." -- Douglas Adams 1952 - 2001
Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
Esa Karkkainen wrote:
> I get "Fatal double fault" error when writing to a filesystem
> mounted from NFS server.
>
> Both NFS server and client are running 6.2-RELEASE-p7.
>
> I've attached dmesg from client and kernel config from server and
> client.
>
> Both have these same NFS options in /etc/rc.conf
>
> rpcbind_enable="YES"
> nfs_server_enable="YES"
> nfs_client_enable="YES"
> nfs_reserved_port_only="YES"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
>
> I have three kernel crash dumps available.
>
> The panic message is the same in vmcore.0 and .1
>
> Fatal double fault:
> eip = 0xc0608015
> esp = 0xe3955000
> ebp = 0xe3955020
> panic: double fault
>
> Panic message in vmcore.2 has different eip and ebp values.
>
> Fatal double fault:
> eip = 0xc063242a
> esp = 0xe3955000
> ebp = 0xe3955008
> panic: double fault
>
> And here is backtrace from vmcore.2, which is identical to backtrace
> found in vmcore.0 and vmcore.1.
>
> Unfortunately the backtrace contains no information.
>
> # kgdb kernel.debug /home/crash/vmcore.2
> Fatal double fault:
> eip = 0xc063242a

Can you look up these IPs in the kernel symbol table (see the developers handbook)? This might give at least one clue, although I'm not sure it is relevant.

You might also update to RELENG_6; I think at least one bug was fixed there that might have caused such a thing.

Also try to rule out memory failure etc.

Kris
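For anyone following along: looking up an eip in the kernel symbol table means finding the symbol with the highest start address that does not exceed the faulting address. Inside kgdb, `list *0xc063242a` does this against kernel.debug and also shows the source lines. Outside the debugger, the same idea can be sketched in Python over the output of `nm -n kernel.debug`. This is only an illustrative sketch, not the handbook's procedure. The helper names `parse_nm` and `lookup` are made up for this example, and so is the three-entry symbol table, because the real symbols around this address are not known from the thread.

```python
import bisect


def parse_nm(lines):
    """Parse `nm -n`-style lines ("address type name") into a sorted list."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # e.g. undefined symbols, which have no address column
        addr, _kind, name = parts
        try:
            symbols.append((int(addr, 16), name))
        except ValueError:
            continue  # not a hexadecimal address, skip the line
    symbols.sort()
    return symbols


def lookup(symbols, address):
    """Return (name, offset) for the closest symbol at or below address."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return None  # address lies below every known symbol
    start, name = symbols[i]
    return name, address - start


# Made-up table for illustration only; these are NOT real kernel symbols.
SAMPLE = [
    "c0632000 T hypothetical_func_a",
    "c0632400 T hypothetical_func_b",
    "c0632800 T hypothetical_func_c",
]

if __name__ == "__main__":
    table = parse_nm(SAMPLE)
    name, offset = lookup(table, 0xc063242a)
    print(f"{name}+0x{offset:x}")
```

To use it on a real crash, the SAMPLE lines would be replaced by the actual output of `nm -n kernel.debug` for the kernel that produced the vmcore; a table from a different build would give misleading names.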