Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-11-02 Thread Chris H.

Quoting Kris Kennaway <[EMAIL PROTECTED]>:


Chris H. wrote:


-8uz_name);
2268
2269            if (zone->uz_dtor)
2270                    zone->uz_dtor(item, keg->uk_size, udata);
2271    #ifdef INVARIANTS
2272            ZONE_LOCK(zone);
2273            if (keg->uk_flags & UMA_ZONE_MALLOC)
2274                    uma_dbg_free(zone, udata, item);
(kgdb)
(kgdb) list *0xc0667e49
0xc0667e49 is in uma_zfree_arg (/usr/src/sys/vm/uma_core.c:2270).
2265    #endif
2266            CTR2(KTR_UMA, "uma_zfree_arg thread %x zone %s", curthread,
2267                zone->uz_name);
2268
2269            if (zone->uz_dtor)
2270                    zone->uz_dtor(item, keg->uk_size, udata);
2271    #ifdef INVARIANTS
2272            ZONE_LOCK(zone);
2273            if (keg->uk_flags & UMA_ZONE_MALLOC)
2274                    uma_dbg_free(zone, udata, item);
(kgdb) backtrace
#0  doadump () at pcpu.h:165
#1  0xc052a7aa in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc052aa40 in panic (fmt=0xc070e8e5 "double fault")
   at /usr/src/sys/kern/kern_shutdown.c:565
#3  0xc06bc82e in dblfault_handler () at /usr/src/sys/i386/i386/trap.c:866
#4  0x in ?? ()
(kgdb) quit

Script done, output file is /tmp/vmdump
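The `list *0xc0667e49` step above is kgdb resolving an address to a source line. When kgdb cannot decode the double-fault frame, as Bengt reports further down the thread, a rougher version of the lookup Kris asks for can be done straight from the symbol table: in `nm -n kernel.debug` output, the function containing an eip is the text symbol with the highest address not above it. `lookup_eip` below is a hypothetical sh/awk sketch of that idea, not a FreeBSD utility, and it gives only symbol+offset, not a source line.

```shell
#!/bin/sh -
# lookup_eip ADDR: hypothetical helper, not part of FreeBSD.
# Reads "nm -n" style lines ("c0100200 T name") on stdin and prints
# NAME+0xOFFSET for the text symbol with the highest address <= ADDR.
# Exits non-zero if no text symbol lies at or below ADDR.
lookup_eip() {
    want=${1#0x}        # strip an optional leading "0x"
    awk -v want="$want" '
        # Convert a hex string (no 0x prefix) to a number.
        function hex2num(h,    i, n) {
            h = tolower(h)
            n = 0
            for (i = 1; i <= length(h); i++)
                n = n * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
            return n
        }
        BEGIN { w = hex2num(want); besta = -1; best = "" }
        # Only consider text (code) symbols: type T (global) or t (local).
        NF >= 3 && $2 ~ /^[tT]$/ {
            a = hex2num($1)
            if (a <= w && a > besta) { besta = a; best = $3 }
        }
        END {
            if (best == "") exit 1
            printf "%s+0x%x\n", best, w - besta
        }'
}
```

For example, `nm -n kernel.debug | lookup_eip 0xc063242a` would name the function behind the eip in Esa's report, assuming kernel.debug matches the crashed kernel.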



Hope this helps.

--Chris

P.S.
Note to onlookers:
I would have produced this information months ago, except for a preconceived
notion that it would be a difficult, time-consuming task. D'OH! WRONG!
It is truly a *trivial* task. So /please/ give generously! :)




___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"





--
panic: kernel trap (ignored)





Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-30 Thread Kris Kennaway

Chris H. wrote:


I was recently able to find a small window in my workload, so I decided to
use it to provide the "non-bogus" ;) information needed. After reading:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
and:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html
a few days ago, I was unclear on only one point in setting up the required
environment, so I posted my question to the list as "dumpdev question
(probably stupid)", which Andrey V. Elsukov immediately responded to.
I'll be creating a crash dump in the next couple of days. So if it's not
already abundantly clear that this is the first time I've attempted to
produce this information, now would be the perfect time to /enlighten/ me
as to anything you can think of that will ensure you get the information
you're looking for. :)


Thank you again for your reply.


I think that document explains everything that is necessary, but if you 
are unsure about something please feel free to ask.  Good luck :)


Kris



Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-30 Thread Chris H.

Quoting Kris Kennaway <[EMAIL PROTECTED]>:


Chris H. wrote:

Quoting Kris Kennaway <[EMAIL PROTECTED]>:


Clifton Royston wrote:

On Tue, Oct 16, 2007 at 01:01:46PM -0700, Chris H. wrote:
excerpt from this list titled: NFS == lock && reboot, that I posted follows:


--8<---SNIP---8<-SNIP-8<---
# uname -a
FreeBSD host.domain.tld 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 26 16:27:14 PST 2007


Greetings,
Does anyone know when NFS and friends will be working again? I haven't been
able to /safely/ use it from 4.8 on. I remember some talk on the list some
time ago, and then it seemed to be resolved, as the discussion ended. So I
thought it was fixed. Seems not. :(

My scenario:
mount host off root:
mount script exec'd follows...

#!/bin/sh -
mount -t nfs host.domain.tld:/ /host
mount -t nfs host.domain.tld:/var /host/var

confirm mount...

# ls /host
.snap  COPYRIGHT  bin
...
usr  var  tmp

OK looks good...

# cp /path/to/approx/10Mb/file /host/path/to/dest/dir/

Fatal double fault
eis 0x0blah
eiblah blah0x
panic double fault
no dump device defined
rebooting in 15sec...

Hmmm... that's not good. :(

--8<---SNIP---8<-SNIP-8<---
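The reproduction above boils down to writing roughly 10 MB to the NFS mount. To make it repeatable, `nfs_write_test` below (a hypothetical helper, sketched under the assumption that a scratch file in /tmp is acceptable) writes a 10 MiB file of zeros, copies it into a destination directory, and compares the copy. On an nve(4) machine affected by this bug, this is the workload reported to panic, so it is worth running only after a dump device is configured.

```shell
#!/bin/sh -
# nfs_write_test DESTDIR: hypothetical helper that mirrors the reported
# reproduction (cp of a ~10 MB file onto the NFS mount). It writes a
# 10 MiB scratch file, copies it into DESTDIR and compares the copy.
nfs_write_test() {
    destdir=$1
    src=$(mktemp /tmp/nfswrite.XXXXXX) || return 1
    # 160 blocks of 64 KiB = 10 MiB; sizes in plain bytes for portability.
    if ! dd if=/dev/zero of="$src" bs=65536 count=160 2>/dev/null; then
        rm -f "$src"
        return 1
    fi
    # This cp is the step that was reported to trigger the double fault.
    if ! cp "$src" "$destdir/"; then
        rm -f "$src"
        return 1
    fi
    # Verify the copy byte for byte; status 0 means identical.
    cmp -s "$src" "$destdir/${src##*/}"
    status=$?
    rm -f "$src"
    return $status
}
```

For example, `nfs_write_test /host/path/to/dest/dir` against the mount from the script above.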

My final solution was to change the lines in /etc/rc.conf
from:
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"

to:
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
#rpc_lockd_enable="YES"
#rpc_statd_enable="YES"
rpcbind_enable="YES"

Making those changes ended the "Fatal double fault && reboot in 
15 seconds..."


  Thanks for this very timely mention!  The cluster of servers I am
about to upgrade from 4.8  to 6.2 relies heavily on
NFS to an old Netapp.  If I have got to disable rpc_lockd and
rpc_statd, it's good to know that now!
   Can I ask, can anybody confirm that they're running 6.2 on NFS
successfully *with* lockd and statd?


Er, yes, of course it does.  The old message he is quoting is bogus on its own,

While I'll grant you that I haven't *yet* found/taken the time to create a
dump device, re-enable rpc_lockd && rpc_statd, and cp a 10Mb file to the
mount point to produce an *instantaneous* "Fatal double fault", I don't
think it's fair to label my original post entirely /bogus/, especially in
light of the recent post I replied to, which seems to have some very common
ground. I should probably mention that since my last posting (my original
thread), I have some 20+ RELENG_6_2 boxen that *do* have rpc_lockd +
rpc_statd enabled. Yet none of them produce a "Fatal double fault". They
are all Tyan SMP boards with dual onboard fxp's, as opposed to the Nvidia
UP board, which has a single onboard nve. They are all interconnected via
NFS. I have a 750Gb drive hanging off the /problematic/ Nvidia board that I
had intended to use for NFS backups, but given the NFS issue I had with it,
that didn't seem to be the best solution. If anyone felt like throwing me a
"cheat sheet" for creating a dump device out of that drive and a "quickie"
for producing a backtrace, I'm sure I'd be better able to find the time to
produce the required information. I'm sorry; it's just that I'm a hundred
million miles away from that right now, as I've been building several large
web applications and their deadline is fast approaching. FWIW, I bounced
all the servers today, and therefore have recent /verbose/ dmesgs, should
any of the information they provide be of any help/use to anyone.

Take care. :)


http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

It's very unlikely NFS is relevant to the problem (which is what made 
it bogus, together with the lack of debugging) and likely that nve is 
the cause.  The above URL explains in detail how to obtain the 
necessary debugging to confirm this.
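The handbook page Kris points to covers this in detail. Condensed, the "cheat sheet" asked for above has two parts: a kernel built with `makeoptions DEBUG=-g` in its config, so that a kernel.debug is left in the build directory, and a dump device named in /etc/rc.conf. A minimal sketch of the rc.conf part, assuming the swap partition is `/dev/ad4s1b`; that device name is hypothetical, and the dump device is normally a swap partition at least as large as physical memory.

```shell
# /etc/rc.conf fragment (sketch). "/dev/ad4s1b" is a hypothetical swap
# partition; substitute the real swap device from /etc/fstab.
# The kernel should also be built with "makeoptions DEBUG=-g" so that a
# matching kernel.debug exists for kgdb.
dumpdev="/dev/ad4s1b"   # where the kernel writes the crash dump on panic
dumpdir="/var/crash"    # where savecore(8) copies it on the next boot
```

After the next panic and reboot, savecore(8) should leave a vmcore.N in dumpdir, which can be opened with `kgdb kernel.debug /var/crash/vmcore.N` as in the session at the top of this thread.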


Kris



Thank you, Kris,
I was recently able to find a small window in my workload, so I decided to
use it to provide the "non-bogus" ;) information needed. After reading:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
and:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html
a few days ago, I was unclear on only one point in setting up the required
environment, so I posted my question to the list as "dumpdev question
(probably stupid)", which Andrey V. Elsukov immediately responded to.
I'll be creating a crash dump in the next couple of days. So if it's not
already abundantly clear that this is the first time I've attempted to
produce this information, now would be the perfect time to /enlighten/ me
as to anything you can think of that will ensure you get the information
you're looking for. :)

Thank you again for your reply.

--Chris


Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-30 Thread Kris Kennaway

Chris H. wrote:

Quoting Kris Kennaway <[EMAIL PROTECTED]>:


Clifton Royston wrote:

On Tue, Oct 16, 2007 at 01:01:46PM -0700, Chris H. wrote:
excerpt from this list titled: NFS == lock && reboot, that I posted follows:


--8<---SNIP---8<-SNIP-8<---
# uname -a
FreeBSD host.domain.tld 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 26 16:27:14 PST 2007


Greetings,
Does anyone know when NFS and friends will be working again? I haven't been
able to /safely/ use it from 4.8 on. I remember some talk on the list some
time ago, and then it seemed to be resolved, as the discussion ended. So I
thought it was fixed. Seems not. :(

My scenario:
mount host off root:
mount script exec'd follows...

#!/bin/sh -
mount -t nfs host.domain.tld:/ /host
mount -t nfs host.domain.tld:/var /host/var

confirm mount...

# ls /host
.snap  COPYRIGHT  bin
...
usr  var  tmp

OK looks good...

# cp /path/to/approx/10Mb/file /host/path/to/dest/dir/

Fatal double fault
eis 0x0blah
eiblah blah0x
panic double fault
no dump device defined
rebooting in 15sec...

Hmmm... that's not good. :(

--8<---SNIP---8<-SNIP-8<---

My final solution was to change the lines in /etc/rc.conf
from:
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"

to:
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
#rpc_lockd_enable="YES"
#rpc_statd_enable="YES"
rpcbind_enable="YES"

Making those changes ended the "Fatal double fault && reboot in 15 
seconds..."


  Thanks for this very timely mention!  The cluster of servers I am
about to upgrade from 4.8  to 6.2 relies heavily on
NFS to an old Netapp.  If I have got to disable rpc_lockd and
rpc_statd, it's good to know that now!
   Can I ask, can anybody confirm that they're running 6.2 on NFS
successfully *with* lockd and statd?


Er, yes, of course it does.  The old message he is quoting is bogus on its own,

While I'll grant you that I haven't *yet* found/taken the time to create a
dump device, re-enable rpc_lockd && rpc_statd, and cp a 10Mb file to the
mount point to produce an *instantaneous* "Fatal double fault", I don't
think it's fair to label my original post entirely /bogus/, especially in
light of the recent post I replied to, which seems to have some very common
ground. I should probably mention that since my last posting (my original
thread), I have some 20+ RELENG_6_2 boxen that *do* have rpc_lockd +
rpc_statd enabled. Yet none of them produce a "Fatal double fault". They
are all Tyan SMP boards with dual onboard fxp's, as opposed to the Nvidia
UP board, which has a single onboard nve. They are all interconnected via
NFS. I have a 750Gb drive hanging off the /problematic/ Nvidia board that I
had intended to use for NFS backups, but given the NFS issue I had with it,
that didn't seem to be the best solution. If anyone felt like throwing me a
"cheat sheet" for creating a dump device out of that drive and a "quickie"
for producing a backtrace, I'm sure I'd be better able to find the time to
produce the required information. I'm sorry; it's just that I'm a hundred
million miles away from that right now, as I've been building several large
web applications and their deadline is fast approaching. FWIW, I bounced
all the servers today, and therefore have recent /verbose/ dmesgs, should
any of the information they provide be of any help/use to anyone.

Take care. :)


http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

It's very unlikely NFS is relevant to the problem (which is what made it 
bogus, together with the lack of debugging) and likely that nve is the 
cause.  The above URL explains in detail how to obtain the necessary 
debugging to confirm this.


Kris



Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-26 Thread Bengt Ahlgren
Kris Kennaway <[EMAIL PROTECTED]> writes:

> Bengt Ahlgren wrote:
>> Esa Karkkainen <[EMAIL PROTECTED]> writes:
>>
>>> On Sun, Oct 14, 2007 at 02:37:23PM +0200, Kris Kennaway wrote:
>>>> Esa Karkkainen wrote:
>>>>>   I get "Fatal double fault" error when writing to a filesystem
>>>>> mounted from NFS server.
>>> I got an offlist reply in which he suggested that the problem might be
>>> in nve driver.
>> That was me.  I indeed got the same fault when running NFS over nve.
>> Switching to nfe solved the problem for me.  The on-screen backtrace
>> reveals the true location of the problem.  See:
>> http://www.sics.se/~bengta/FBSD/DSC00585.JPG
>> I do have a dump, but for some reason kgdb is not able to show the
>> same information.
>
> If you're using a module you have to do extra (but documented)
> steps. Or maybe kgdb has forgotten how to decode a double fault.

Just for the record: if_nve was compiled into the kernel.

Bengt


Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-19 Thread Chris H.

Quoting Kris Kennaway <[EMAIL PROTECTED]>:


Clifton Royston wrote:

On Tue, Oct 16, 2007 at 01:01:46PM -0700, Chris H. wrote:
excerpt from this list titled: NFS == lock && reboot, that I posted follows:


--8<---SNIP---8<-SNIP-8<---
# uname -a
FreeBSD host.domain.tld 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 26 16:27:14 PST 2007


Greetings,
Does anyone know when NFS and friends will be working again? I haven't been
able to /safely/ use it from 4.8 on. I remember some talk on the list some
time ago, and then it seemed to be resolved, as the discussion ended. So I
thought it was fixed. Seems not. :(

My scenario:
mount host off root:
mount script exec'd follows...

#!/bin/sh -
mount -t nfs host.domain.tld:/ /host
mount -t nfs host.domain.tld:/var /host/var

confirm mount...

# ls /host
.snap  COPYRIGHT  bin
...
usr  var  tmp

OK looks good...

# cp /path/to/approx/10Mb/file /host/path/to/dest/dir/

Fatal double fault
eis 0x0blah
eiblah blah0x
panic double fault
no dump device defined
rebooting in 15sec...

Hmmm... that's not good. :(

--8<---SNIP---8<-SNIP-8<---

My final solution was to change the lines in /etc/rc.conf
from:
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"

to:
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
#rpc_lockd_enable="YES"
#rpc_statd_enable="YES"
rpcbind_enable="YES"

Making those changes ended the "Fatal double fault && reboot in 15 
seconds..."


  Thanks for this very timely mention!  The cluster of servers I am
about to upgrade from 4.8  to 6.2 relies heavily on
NFS to an old Netapp.  If I have got to disable rpc_lockd and
rpc_statd, it's good to know that now!
   Can I ask, can anybody confirm that they're running 6.2 on NFS
successfully *with* lockd and statd?


Er, yes, of course it does.  The old message he is quoting is bogus on its own,

While I'll grant you that I haven't *yet* found/taken the time to create a
dump device, re-enable rpc_lockd && rpc_statd, and cp a 10Mb file to the
mount point to produce an *instantaneous* "Fatal double fault", I don't
think it's fair to label my original post entirely /bogus/, especially in
light of the recent post I replied to, which seems to have some very common
ground. I should probably mention that since my last posting (my original
thread), I have some 20+ RELENG_6_2 boxen that *do* have rpc_lockd +
rpc_statd enabled. Yet none of them produce a "Fatal double fault". They
are all Tyan SMP boards with dual onboard fxp's, as opposed to the Nvidia
UP board, which has a single onboard nve. They are all interconnected via
NFS. I have a 750Gb drive hanging off the /problematic/ Nvidia board that I
had intended to use for NFS backups, but given the NFS issue I had with it,
that didn't seem to be the best solution. If anyone felt like throwing me a
"cheat sheet" for creating a dump device out of that drive and a "quickie"
for producing a backtrace, I'm sure I'd be better able to find the time to
produce the required information. I'm sorry; it's just that I'm a hundred
million miles away from that right now, as I've been building several large
web applications and their deadline is fast approaching. FWIW, I bounced
all the servers today, and therefore have recent /verbose/ dmesgs, should
any of the information they provide be of any help/use to anyone.

Take care. :)

--Chris

I don't know if he ever was able to provide meaningful traces but it 
may well be nve as in the upthread discussion.


Kris




Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-17 Thread Kris Kennaway

Bengt Ahlgren wrote:

Esa Karkkainen <[EMAIL PROTECTED]> writes:


On Sun, Oct 14, 2007 at 02:37:23PM +0200, Kris Kennaway wrote:

Esa Karkkainen wrote:

I get "Fatal double fault" error when writing to a filesystem
mounted from NFS server.

I got an offlist reply in which he suggested that the problem might be
in nve driver.


That was me.  I indeed got the same fault when running NFS over nve.
Switching to nfe solved the problem for me.  The on-screen backtrace
reveals the true location of the problem.  See:

http://www.sics.se/~bengta/FBSD/DSC00585.JPG

I do have a dump, but for some reason kgdb is not able to show the
same information.


If you're using a module you have to do extra (but documented) steps. 
Or maybe kgdb has forgotten how to decode a double fault.


Anyway, this information is indeed definitive, and it's what others 
seeing this problem need to provide if they still have doubts.


Kris



Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-17 Thread Bengt Ahlgren
Esa Karkkainen <[EMAIL PROTECTED]> writes:

> On Sun, Oct 14, 2007 at 02:37:23PM +0200, Kris Kennaway wrote:
>> Esa Karkkainen wrote:
>> >I get "Fatal double fault" error when writing to a filesystem
>> >mounted from NFS server.
>
> I got an offlist reply in which he suggested that the problem might be
> in nve driver.

That was me.  I indeed got the same fault when running NFS over nve.
Switching to nfe solved the problem for me.  The on-screen backtrace
reveals the true location of the problem.  See:

http://www.sics.se/~bengta/FBSD/DSC00585.JPG

I do have a dump, but for some reason kgdb is not able to show the
same information.

Regards,

Bengt


Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-16 Thread Clifton Royston
On Wed, Oct 17, 2007 at 12:24:29PM +1000, Greg Black wrote:
> On 2007-10-16, Clifton Royston wrote:
> 
> >   Thanks for this very timely mention!  The cluster of servers I am
> > about to upgrade from 4.8  to 6.2 relies heavily on
> > NFS to an old Netapp.  If I have got to disable rpc_lockd and
> > rpc_statd, it's good to know that now!
> >  
> >   Can I ask, can anybody confirm that they're running 6.2 on NFS
> > successfully *with* lockd and statd?
> 
> I have this combination running without any drama on a couple of
> networks, so I doubt very much if that is the fatal combination.

  Thanks for the rapid feedback.  Glad to hear it was mistaken
alarmism.  I shall return to my usual state of apathy.
  -- Clifton

-- 
Clifton Royston  --  [EMAIL PROTECTED] / [EMAIL PROTECTED]
   President  - I and I Computing * http://www.iandicomputing.com/
 Custom programming, network design, systems and network consulting services


Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-16 Thread Greg Black
On 2007-10-16, Clifton Royston wrote:

>   Thanks for this very timely mention!  The cluster of servers I am
> about to upgrade from 4.8  to 6.2 relies heavily on
> NFS to an old Netapp.  If I have got to disable rpc_lockd and
> rpc_statd, it's good to know that now!
>  
>   Can I ask, can anybody confirm that they're running 6.2 on NFS
> successfully *with* lockd and statd?

I have this combination running without any drama on a couple of
networks, so I doubt very much if that is the fatal combination.

Greg


Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-16 Thread Kris Kennaway

Clifton Royston wrote:

On Tue, Oct 16, 2007 at 01:01:46PM -0700, Chris H. wrote:

excerpt from this list titled: NFS == lock && reboot, that I posted follows:

--8<---SNIP---8<-SNIP-8<---
# uname -a
FreeBSD host.domain.tld 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 26 16:27:14 PST 2007


Greetings,
Does anyone know when NFS and friends will be working again? I haven't been
able to /safely/ use it from 4.8 on. I remember some talk on the list some
time ago, and then it seemed to be resolved, as the discussion ended. So I
thought it was fixed. Seems not. :(

My scenario:
mount host off root:
mount script exec'd follows...

#!/bin/sh -
mount -t nfs host.domain.tld:/ /host
mount -t nfs host.domain.tld:/var /host/var

confirm mount...

# ls /host
.snap  COPYRIGHT  bin
...
usr  var  tmp

OK looks good...

# cp /path/to/approx/10Mb/file /host/path/to/dest/dir/

Fatal double fault
eis 0x0blah
eiblah blah0x
panic double fault
no dump device defined
rebooting in 15sec...

Hmmm... that's not good. :(

--8<---SNIP---8<-SNIP-8<---

My final solution was to change the lines in /etc/rc.conf
from:
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"

to:
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
#rpc_lockd_enable="YES"
#rpc_statd_enable="YES"
rpcbind_enable="YES"

Making those changes ended the "Fatal double fault && reboot in 15 
seconds..."


  Thanks for this very timely mention!  The cluster of servers I am
about to upgrade from 4.8  to 6.2 relies heavily on
NFS to an old Netapp.  If I have got to disable rpc_lockd and
rpc_statd, it's good to know that now!
 
  Can I ask, can anybody confirm that they're running 6.2 on NFS
successfully *with* lockd and statd?


Er, yes, of course it does.  The old message he is quoting is bogus on 
its own, I don't know if he ever was able to provide meaningful traces 
but it may well be nve as in the upthread discussion.


Kris




Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-16 Thread Clifton Royston
On Tue, Oct 16, 2007 at 01:01:46PM -0700, Chris H. wrote:
> excerpt from this list titled: NFS == lock && reboot, that I posted follows:
> 
> --8<---SNIP---8<-SNIP-8<---
> # uname -a
> FreeBSD host.domain.tld 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 26 16:27:14 PST 2007
> 
> Greetings,
> Does anyone know when NFS and friends will be working again? I haven't been
> able to /safely/ use it from 4.8 on. I remember some talk on the list some
> time ago, and then it seemed to be resolved, as the discussion ended. So I
> thought it was fixed. Seems not. :(
> 
> My scenario:
> mount host off root:
> mount script exec'd follows...
> 
> #!/bin/sh -
> mount -t nfs host.domain.tld:/ /host
> mount -t nfs host.domain.tld:/var /host/var
> 
> confirm mount...
> 
> # ls /host
> .snap  COPYRIGHT  bin
> ...
> usr  var  tmp
> 
> OK looks good...
> 
> # cp /path/to/approx/10Mb/file /host/path/to/dest/dir/
> 
> Fatal double fault
> eis 0x0blah
> eiblah blah0x
> panic double fault
> no dump device defined
> rebooting in 15sec...
> 
> Hmmm... that's not good. :(
> 
> --8<---SNIP---8<-SNIP-8<---
> 
> My final solution was to change the lines in /etc/rc.conf
> from:
> nfs_client_enable="YES"
> nfs_reserved_port_only="YES"
> nfs_server_enable="YES"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
> rpcbind_enable="YES"
> 
> to:
> nfs_client_enable="YES"
> nfs_reserved_port_only="YES"
> nfs_server_enable="YES"
> #rpc_lockd_enable="YES"
> #rpc_statd_enable="YES"
> rpcbind_enable="YES"
> 
> Making those changes ended the "Fatal double fault && reboot in 15 
> seconds..."

  Thanks for this very timely mention!  The cluster of servers I am
about to upgrade from 4.8  to 6.2 relies heavily on
NFS to an old Netapp.  If I have got to disable rpc_lockd and
rpc_statd, it's good to know that now!
 
  Can I ask, can anybody confirm that they're running 6.2 on NFS
successfully *with* lockd and statd?

  -- Clifton



Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-16 Thread Chris H.

Quoting Esa Karkkainen <[EMAIL PROTECTED]>:


On Tue, Oct 16, 2007 at 09:46:37AM +0900, Pyun YongHyeon wrote:

I remember that nve(4) is NOT stable under heavy network loads.


Yup, that seems to be correct. Usually this machine, i.e. my home
workstation, does not have a load, network-wise or in general.


I'd suggest using nfe(4), which is believed to be more stable and faster
than nve(4). nfe(4) is also the default NVIDIA NIC driver for
CURRENT/RELENG_7. If you have to use RELENG_6, try the nfe(4) at the
following URL.


Well, I could use -CURRENT or RELENG_7 on this machine, but I made a
decision some time ago to use RELENG_6_2, because it's hassle-free.


Greetings,
I had a situation that was exactly the same -

excerpt from this list titled: NFS == lock && reboot, that I posted follows:

--8<---SNIP---8<-SNIP-8<---
# uname -a
FreeBSD host.domain.tld 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 26 16:27:14 PST 2007


Greetings,
Does anyone know when NFS and friends will be working again? I haven't been
able to /safely/ use it from 4.8 on. I remember some talk on the list some
time ago, and then it seemed to be resolved, as the discussion ended. So I
thought it was fixed. Seems not. :(

My scenario:
mount host off root:
mount script exec'd follows...

#!/bin/sh -
mount -t nfs host.domain.tld:/ /host
mount -t nfs host.domain.tld:/var /host/var

confirm mount...

# ls /host
.snap  COPYRIGHT  bin
...
usr  var  tmp

OK looks good...

# cp /path/to/approx/10Mb/file /host/path/to/dest/dir/

Fatal double fault
eis 0x0blah
eiblah blah0x
panic double fault
no dump device defined
rebooting in 15sec...

Hmmm... that's not good. :(

--8<---SNIP---8<-SNIP-8<---

My final solution was to change the lines in /etc/rc.conf
from:
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"

to:
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
#rpc_lockd_enable="YES"
#rpc_statd_enable="YES"
rpcbind_enable="YES"

Making those changes ended the "Fatal double fault && reboot in 15 seconds..."

My nic is: ifconfig_nve0

Thanks for reporting the /buggy/ nve driver.
So there are no issues with the nfe driver?

Thanks again.

--Chris




Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-16 Thread Esa Karkkainen
On Tue, Oct 16, 2007 at 09:46:37AM +0900, Pyun YongHyeon wrote:
> I remember that nve(4) is NOT stable under heavy network loads.

Yup, that seems to be correct. Usually this machine, i.e. my home
workstation, does not have a load, network-wise or in general.

> I'd suggest using nfe(4), which is believed to be more stable and faster
> than nve(4). nfe(4) is also the default NVIDIA NIC driver for
> CURRENT/RELENG_7. If you have to use RELENG_6, try the nfe(4) at the
> following URL.

Well, I could use -CURRENT or RELENG_7 on this machine, but I made a
decision some time ago to use RELENG_6_2, because it's hassle-free.

-- 
"In the beginning the Universe was created. This has made a lot of
people very angry and been widely regarded as a bad move."
-- Douglas Adams 1952 - 2001


Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-16 Thread Kris Kennaway

Esa Karkkainen wrote:

On Tue, Oct 16, 2007 at 02:33:49AM +0200, Kris Kennaway wrote:

Esa Karkkainen wrote:

This machine has two 512MB DDR333 DIMMs.

I installed sysutils/memtest and ran three instances simultaneously; the
first two allocated 326 MB each and the last one allocated 150 MB of
memory, so that the machine would start to swap. No errors.

Well, as you say, such a limited test doesn't mean much.  Anyway, it may 
well have been nve, so see how you go without it.


I downloaded the Memtest86+ version 1.70 ISO image, burned the image to a CD,
booted from the CD, and then let memtest run for sixteen hours.

Memtest did not find any errors during that time.


OK, higher probability that it is OK, but some memory errors are highly 
pattern dependent :)  Physically replacing the RAM is the only way to be 
sure when there are lingering problems.


Anyway, probably no need to worry about it unless you have further issues.

Kris


Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-16 Thread Esa Karkkainen
On Tue, Oct 16, 2007 at 02:33:49AM +0200, Kris Kennaway wrote:
> Esa Karkkainen wrote:
> >This machine has two 512MB DDR333 DIMMs.
> >
> >I installed sysutils/memtest and ran three instances simultaneously; the
> >first two allocated 326 MB each and the last one allocated 150 MB of
> >memory, so that the machine would start to swap. No errors.
> 
> Well, as you say, such a limited test doesn't mean much.  Anyway, it may 
> well have been nve, so see how you go without it.

I downloaded the Memtest86+ version 1.70 ISO image, burned the image to a CD,
booted from the CD, and then let memtest run for sixteen hours.

Memtest did not find any errors during that time.



Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-15 Thread Pyun YongHyeon
On Mon, Oct 15, 2007 at 11:32:02PM +0300, Esa Karkkainen wrote:
 > On Sun, Oct 14, 2007 at 02:37:23PM +0200, Kris Kennaway wrote:
 > > Esa Karkkainen wrote:
 > > >  I get a "Fatal double fault" error when writing to a filesystem
 > > >mounted from an NFS server.
 > 
 > I got an off-list reply suggesting that the problem might be
 > in the nve driver.
 > 
 > I installed an additional Intel NIC; the relevant lines from dmesg are
 > as follows:
 > 
 > fxp0:  port 0xb000-0xb03f mem
 > 0xe720-0xe7200fff,0xe700-0xe70f irq 11 at device 6.0 on pci1
 > miibus1:  on fxp0
 > inphy0:  on miibus1
 > inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 > 
 > Since I started using fxp0, I can dump(8) all the necessary filesystems
 > to the NFS mount without a panic.
 > 
 > When I used nve0, dump(8) or cp(1) managed to write less than a megabyte to the NFS
 > mount and then the machine panicked.
 > 

I remember that nve(4) is NOT stable under heavy network loads.
I'd suggest using nfe(4), which is believed to be more stable and faster
than nve(4). nfe(4) is also the default NVIDIA NIC driver in
CURRENT/RELENG_7. If you have to use RELENG_6, try the nfe(4) version
available at the following URL:

http://www.f.csce.kyushu-u.ac.jp/~shigeaki/software/freebsd-nfe.html

 > It didn't matter whether I made dump(8) write to the NFS mount or to a local
 > filesystem and then copied the file to the NFS mount; the end result was a
 > panic.
 > 
 > > >  Both NFS server and client are running 6.2-RELEASE-p7.
 > 
 > Both machines have been updated to -p8.
 > 
 > > ># kgdb kernel.debug /home/crash/vmcore.2 
 > > >Fatal double fault:
 > > >eip = 0xc063242a
 > > 
 > > Can you look up these IPs in the kernel symbol table (see the Developers'
 > > Handbook)?  This might give at least one clue, although I'm not sure it
 > > is relevant.
 > 
 > I'm sorry, but I need to learn a lot more about gdb and debugging in
 > general before I can find that information. IIRC I have written about
 > ten or twenty lines of C in this millennium.
 > 
 > I do have matching kernel.debug and vmcore files, but the kernel modules etc.
 > were removed before I built a new kernel and world.
 > 
 > > You might also update to RELENG_6, I think there was at least one bug 
 > > fixed that might have caused such a thing.
 > 
 > At the moment I don't have any stability problems with this machine, but
 > I can upgrade to RELENG_6 before RELENG_6_3 is branched if that is
 > necessary.
 > 
 > > Also try to rule out memory failure etc.
 > 
 > This machine has two 512MB DDR333 DIMMs.
 > 
 > I installed sysutils/memtest and ran three instances simultaneously; the first two
 > allocated 326 MB each and the last one allocated 150 MB of memory, so the machine would
 > start to swap. No errors.
 > 
 > I know these tests are not conclusive, but I don't think the DIMMs are
 > faulty.

-- 
Regards,
Pyun YongHyeon


Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-15 Thread Kris Kennaway

Esa Karkkainen wrote:
> On Sun, Oct 14, 2007 at 02:37:23PM +0200, Kris Kennaway wrote:
> > Esa Karkkainen wrote:
> > > I get a "Fatal double fault" error when writing to a filesystem
> > > mounted from an NFS server.
>
> I got an off-list reply suggesting that the problem might be
> in the nve driver.
>
> I installed an additional Intel NIC; the relevant lines from dmesg are
> as follows:
>
> fxp0:  port 0xb000-0xb03f mem
> 0xe720-0xe7200fff,0xe700-0xe70f irq 11 at device 6.0 on pci1
> miibus1:  on fxp0
> inphy0:  on miibus1
> inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
>
> Since I started using fxp0, I can dump(8) all the necessary filesystems
> to the NFS mount without a panic.
>
> When I used nve0, dump(8) or cp(1) managed to write less than a megabyte to the NFS
> mount and then the machine panicked.
>
> It didn't matter whether I made dump(8) write to the NFS mount or to a local
> filesystem and then copied the file to the NFS mount; the end result was a
> panic.
>
> > > Both NFS server and client are running 6.2-RELEASE-p7.
>
> Both machines have been updated to -p8.
>
> > > # kgdb kernel.debug /home/crash/vmcore.2
> > > Fatal double fault:
> > > eip = 0xc063242a
> >
> > Can you look up these IPs in the kernel symbol table (see the Developers'
> > Handbook)?  This might give at least one clue, although I'm not sure it
> > is relevant.
>
> I'm sorry, but I need to learn a lot more about gdb and debugging in
> general before I can find that information. IIRC I have written about
> ten or twenty lines of C in this millennium.

Well, it's explained in explicit detail in that document.  C code is not
involved.

> I do have matching kernel.debug and vmcore files, but the kernel modules etc.
> were removed before I built a new kernel and world.

OK, most likely too late then.

> > You might also update to RELENG_6, I think there was at least one bug
> > fixed that might have caused such a thing.
>
> At the moment I don't have any stability problems with this machine, but
> I can upgrade to RELENG_6 before RELENG_6_3 is branched if that is
> necessary.
>
> > Also try to rule out memory failure etc.
>
> This machine has two 512MB DDR333 DIMMs.
>
> I installed sysutils/memtest and ran three instances simultaneously; the first two
> allocated 326 MB each and the last one allocated 150 MB of memory, so the machine would
> start to swap. No errors.

Well, as you say, such a limited test doesn't mean much.  Anyway, it may
well have been nve, so see how you go without it.

kris


Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-15 Thread Esa Karkkainen
On Sun, Oct 14, 2007 at 02:37:23PM +0200, Kris Kennaway wrote:
> Esa Karkkainen wrote:
> > I get a "Fatal double fault" error when writing to a filesystem
> >mounted from an NFS server.

I got an off-list reply suggesting that the problem might be
in the nve driver.

I installed an additional Intel NIC; the relevant lines from dmesg are
as follows:

fxp0:  port 0xb000-0xb03f mem
0xe720-0xe7200fff,0xe700-0xe70f irq 11 at device 6.0 on pci1
miibus1:  on fxp0
inphy0:  on miibus1
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

Since I started using fxp0, I can dump(8) all the necessary filesystems
to the NFS mount without a panic.

When I used nve0, dump(8) or cp(1) managed to write less than a megabyte to the NFS
mount and then the machine panicked.

It didn't matter whether I made dump(8) write to the NFS mount or to a local
filesystem and then copied the file to the NFS mount; the end result was a
panic.

> > Both NFS server and client are running 6.2-RELEASE-p7.

Both machines have been updated to -p8.

> ># kgdb kernel.debug /home/crash/vmcore.2 
> >Fatal double fault:
> >eip = 0xc063242a
> 
> Can you look up these IPs in the kernel symbol table (see the Developers'
> Handbook)?  This might give at least one clue, although I'm not sure it
> is relevant.

I'm sorry, but I need to learn a lot more about gdb and debugging in
general before I can find that information. IIRC I have written about
ten or twenty lines of C in this millennium.

I do have matching kernel.debug and vmcore files, but the kernel modules etc.
were removed before I built a new kernel and world.

> You might also update to RELENG_6, I think there was at least one bug 
> fixed that might have caused such a thing.

At the moment I don't have any stability problems with this machine, but
I can upgrade to RELENG_6 before RELENG_6_3 is branched if that is
necessary.

> Also try to rule out memory failure etc.

This machine has two 512MB DDR333 DIMMs.

I installed sysutils/memtest and ran three instances simultaneously; the first two
allocated 326 MB each and the last one allocated 150 MB of memory, so the machine would
start to swap. No errors.

I know these tests are not conclusive, but I don't think the DIMMs are
faulty.
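A quick back-of-the-envelope check, using only the figures in the message above, shows how much of the installed RAM the three userland memtest instances could have had resident at once, at best. This is one reason such a test is limited: a userland process cannot choose which physical pages it receives, and memory the kernel has wired for itself is never handed to it. The snippet is just that arithmetic; the variable names are illustrative.

```python
# Figures taken from the message above: two 512 MB DIMMs and three
# memtest instances allocating 326, 326 and 150 MB respectively.
installed_mb = 2 * 512
allocated_mb = [326, 326, 150]

total_mb = sum(allocated_mb)
fraction = total_mb / installed_mb

# At most this much could be resident at once; the kernel's own wired
# memory is never given to a userland process, so it goes untested.
print(f"{total_mb} MB of {installed_mb} MB ({fraction:.0%})")
# prints: 802 MB of 1024 MB (78%)
```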

-- 
"In the beginning the Universe was created. This has made a lot of
people very angry and been widely regarded as a bad move."
-- Douglas Adams 1952 - 2001


Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

2007-10-14 Thread Kris Kennaway

Esa Karkkainen wrote:

I get a "Fatal double fault" error when writing to a filesystem
mounted from an NFS server.

Both NFS server and client are running 6.2-RELEASE-p7.

I've attached the dmesg from the client and the kernel configs from the server
and client.

Both have the same NFS options in /etc/rc.conf:

rpcbind_enable="YES"
nfs_server_enable="YES"
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES" 


I have three kernel crash dumps available.

The panic message is the same in vmcore.0 and vmcore.1:


Fatal double fault:
eip = 0xc0608015
esp = 0xe3955000
ebp = 0xe3955020
panic: double fault

The panic message in vmcore.2 has different eip and ebp values:

Fatal double fault:
eip = 0xc063242a
esp = 0xe3955000
ebp = 0xe3955008
panic: double fault

And here is the backtrace from vmcore.2, which is identical to
the backtraces found in vmcore.0 and vmcore.1.


Unfortunately the backtrace contains no information.

# kgdb kernel.debug /home/crash/vmcore.2 
Fatal double fault:

eip = 0xc063242a


Can you look up these IPs in the kernel symbol table (see the Developers'
Handbook)?  This might give at least one clue, although I'm not sure it
is relevant.
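For readers following along: looking up an instruction pointer amounts to finding the function whose start address is the largest one not exceeding the faulting eip. kgdb's `list *ADDRESS` and `info symbol ADDRESS` commands do this directly against kernel.debug. The Python below is only a sketch of the same idea, not necessarily the Handbook's procedure. It assumes a symbol listing in the usual `address type name` format, such as `nm -n kernel.debug` prints. `parse_nm` and `lookup` are hypothetical helpers, and the symbol names in the example are made up.

```python
import bisect


def parse_nm(lines):
    """Turn `nm -n`-style lines ("c0608000 T name") into a sorted list of
    (address, name) pairs, keeping only text (code) symbols."""
    syms = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # e.g. undefined symbols, which carry no address
        addr, kind, name = fields
        if kind in ("T", "t"):
            syms.append((int(addr, 16), name))
    syms.sort()
    return syms


def lookup(syms, addr):
    """Return "name+0xoffset" for the nearest text symbol at or below addr,
    or None if addr lies below every symbol."""
    starts = [a for a, _ in syms]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    base, name = syms[i]
    return f"{name}+{addr - base:#x}"


if __name__ == "__main__":
    # Made-up symbol table for illustration only; real names and addresses
    # must come from the kernel.debug that matches the vmcore.
    table = parse_nm([
        "c0600000 T func_a",
        "c0608000 T func_b",
        "c0630000 t func_c",
        "         U undefined_sym",
    ])
    print(lookup(table, 0xc0608015))  # func_b+0x15
    print(lookup(table, 0xc063242a))  # func_c+0x242a
```

The sketch assumes the address lies in the kernel proper. An address inside a loadable module will not resolve against kernel.debug, and an address past the end of the last listed function would be misattributed. `nm -S` can print symbol sizes to guard against that.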


You might also update to RELENG_6, I think there was at least one bug 
fixed that might have caused such a thing.  Also try to rule out memory 
failure etc.


Kris
