Re: [panic]Fatal trap 12: page fault while in kernel mode

2007-08-02 Thread Robert Watson

On Tue, 31 Jul 2007, ytriffy wrote:

> Trap 12 occurred when I rebooted the PC. Sending you the backtrace. My system:
> amd64 3200+ Venice, MB ECS nForce4 A939, Samsung 250 GB and WD 250 GB, 2 memory
> banks 512 MB each, video card: GeForce 6600 GT 128 MB, NIC on a Realtek chip,
> sound card Cirrus Logic CS4281. It's very unstable, crashes happen every
> day, so I'm hoping you can say why (any hints about what hardware may cause it).
> How to repeat it? I don't know. It happened once during the reboot process.


In general, you want to report this sort of bug using the send-pr interface
or the GNATS web submission form.  In the past, I've seen quite a few bug
reports sent to hackers@ get lost because many FreeBSD developers don't
subscribe to the list.  You could also consider sending it to stable@, since
that's the mailing list for discussing 6-STABLE development.  FYI, this looks
like a NULL-pointer dereference in the VFS shutdown code.
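
One way to check that reading against the report, sketched in Python: kgdb
prints the saved registers in the trap frame as signed decimals, so
tf_eip = -1067932448 has to be reinterpreted as an unsigned 32-bit value
before it can be compared with the "instruction pointer" line. The helper
names below are made up for illustration, and the 4096-byte page size is an
assumption (it holds for i386).

```python
# Sketch only: these helpers are hypothetical, not part of kgdb or FreeBSD.

def as_kernel_address(signed_value: int) -> str:
    """Reinterpret a signed 32-bit decimal, as kgdb prints trap-frame
    registers on i386, as an unsigned hexadecimal address."""
    return hex(signed_value & 0xFFFFFFFF)


def looks_like_null_deref(fault_address: int, page_size: int = 4096) -> bool:
    """Heuristic: a faulting address inside the first page is usually a
    structure-member access through a NULL pointer."""
    return 0 <= fault_address < page_size


print(as_kernel_address(-1067932448))  # tf_eip from the report: 0xc058a4e0
print(looks_like_null_deref(0x4))      # fault virtual address: True
```

The converted value matches the instruction pointer 0xc058a4e0 and frame #12
(cache_purgevfs) in the backtrace, and the fault address 0x4 lies in the
first page, which is consistent with reading a member at offset 4 through a
NULL pointer.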


Robert N M Watson
Computer Laboratory
University of Cambridge




[panic]Fatal trap 12: page fault while in kernel mode

2007-07-31 Thread ytriffy

Hello.
Trap 12 occurred when I rebooted the PC. Sending you the backtrace.
My system: amd64 3200+ Venice, MB ECS nForce4 A939, Samsung 250 GB and WD
250 GB, 2 memory banks 512 MB each, video card: GeForce 6600 GT 128 MB,
NIC on a Realtek chip, sound card Cirrus Logic CS4281. It's very unstable,
crashes happen every day, so I'm hoping you can say why (any hints about
what hardware may cause it).
How to repeat it? I don't know. It happened once during the reboot process.

[EMAIL PROTECTED] /var]# uname -a
FreeBSD freelanc.dubki.ru 6.2-STABLE-200706 FreeBSD 6.2-STABLE-200706 #1: Mon Jul 23 13:34:27 MSD 2007 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/DEBUGGERKERN i386

[EMAIL PROTECTED] /usr/obj/usr/src/sys/DEBUGGERKERN]# kgdb kernel.debug /var/crash/vmcore.3
kgdb: kvm_nlist(_stopped_cpus):
kgdb: kvm_nlist(_stoppcbs):
[GDB will not be able to debug user-mode threads:
/usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
<118>Jul 25 14:06:32 freelanc syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...6 5 3 1 0 0 done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
All buffers synced.


Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x4
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc058a4e0
stack pointer = 0x28:0xe9455c48
frame pointer = 0x28:0xe9455c58
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 44922 (reboot)
panic: from debugger
Uptime: 2h45m36s
Dumping 1022 MB (2 chunks)
chunk 0: 1MB (159 pages) ... ok
chunk 1: 1022MB (261600 pages) 1006 990 974 958 942 926 910 894 878 862
846 830 814 798 782 766 750 734 718 702 686 670 654 638 622 606 590 574
558 542 526 510 494 478 462 446 430 414 398 382 366 350 334 318 302 286
270 254 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14

#0 doadump () at pcpu.h:165
165 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0 doadump () at pcpu.h:165
#1 0xc053d916 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0xc053dbdc in panic (fmt=0xc06f5278 "from debugger")
at /usr/src/sys/kern/kern_shutdown.c:565
#3 0xc045361d in db_panic (addr=-1067932448, have_addr=0, count=-1,
modif=0xe9455a74 "") at /usr/src/sys/ddb/db_command.c:438
#4 0xc04535b4 in db_command (last_cmdp=0xc0766784, cmd_table=0x0,
aux_cmd_tablep=0xc0728e90, aux_cmd_tablep_end=0xc0728e94)
at /usr/src/sys/ddb/db_command.c:350
#5 0xc045367c in db_command_loop () at /usr/src/sys/ddb/db_command.c:458
#6 0xc0455291 in db_trap (type=12, code=0) at
/usr/src/sys/ddb/db_main.c:222
#7 0xc0556a2b in kdb_trap (type=12, code=0, tf=0xe9455c08)
at /usr/src/sys/kern/subr_kdb.c:473
#8 0xc06cba6c in trap_fatal (frame=0xe9455c08, eva=4)
at /usr/src/sys/i386/i386/trap.c:828
#9 0xc06cb7d7 in trap_pfault (frame=0xe9455c08, usermode=0, eva=4)
at /usr/src/sys/i386/i386/trap.c:745
#10 0xc06cb3f1 in trap (frame=
{tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = -381330360, tf_esi =
-993547624, tf_ebp = -381330344, tf_isp = -381330380, tf_ebx = 0, tf_edx
= -992513384, tf_ecx = 4, tf_eax = -950651024, tf_trapno = 12, tf_err =
0, tf_eip = -1067932448, tf_cs = 32, tf_eflags = 590338, tf_esp = 0,
tf_ss = -992305712})
at /usr/src/sys/i386/i386/trap.c:435
#11 0xc06b8b1a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#12 0xc058a4e0 in cache_purgevfs (mp=0xc4d77298)
at /usr/src/sys/kern/vfs_cache.c:622
#13 0xc0591f29 in dounmount (mp=0xc4d77298, flags=524288, td=0xc62ce300)
at /usr/src/sys/kern/vfs_mount.c:1214
#14 0xc0597d0a in vfs_unmountall () at /usr/src/sys/kern/vfs_subr.c:2837
#15 0xc053d807 in boot (howto=0) at /usr/src/sys/kern/kern_shutdown.c:391
#16 0xc053d2a2 in reboot (td=0xc62ce300, uap=0xc7563770)
at /usr/src/sys/kern/kern_shutdown.c:169
#17 0xc06cbdbb in syscall (frame=
{tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 2, tf_esi = 18, tf_ebp =
-1077941304, tf_isp = -381330076, tf_ebx = 0, tf_edx = -1, tf_ecx =
672491264, tf_eax = 55, tf_trapno = 12, tf_err = 2, tf_eip = 671802263,
tf_cs = 51, tf_eflags = 662, tf_esp = -1077941380, tf_ss = 59}) at
/usr/src/sys/i386/i386/trap.c:983
#18 0xc06b8b6f in Xint0x80_syscall () at
/usr/src/sys/i386/i386/exception.s:200
#19 0x0033 in ?? ()
Previous frame inner to this frame (cor

Re: Help? 6.1-S: Fatal trap 12: page fault while in kernel mode

2006-06-16 Thread Konstantin Belousov
On Fri, Jun 16, 2006 at 07:58:05PM +0400, Maxim Konovalov wrote:
> On Fri, 16 Jun 2006, 08:45-0700, David Wolfskill wrote:
> 
> > On Thu, Jun 15, 2006 at 04:22:40PM -0700, David Wolfskill wrote:
> > > I had one of these [kernel panics] a couple of weeks ago or so...
> > > ...[upgrade to -STABLE as of 15 June; repeat panic]...
> >
> > The message to which I'm replying (posted to -stable) has the
> > particulars about the panic in question, and the machine in question is
> > still sitting at the DDB prompt, if anyone wishes to work with me on
> > that.
> >
> > But the reason for this message is to report that I upgraded the other
> > > test machines -- identical configuration: 2x3 GHz Xeons w/ 4 GB RAM;
> > > kernel config is called "SMP_PAE_DDB" for a fairly good reason -- to
> > > today's -CURRENT, then started the same test that caused -STABLE to crash
> > > & burn within a couple of minutes.
> >
> > That was 30 minutes ago; the test is still running on
> >
> > FreeBSD localhost 7.0-CURRENT FreeBSD 7.0-CURRENT #1: Fri Jun 16 07:28:18 
> > PDT 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP_PAE_DDB  i386
> >
> > As I commented in email to some colleagues, "color me surprised."
> >
> > I've suggested to the vendor (the program under test on the box is
> > from a vendor, built under & for FreeBSD 5.x; I'm using the
> > misc/compat5x port) that they consider trying this themselves, and
> > perhaps also take advantage of John Birrell's work to date on the
> > FreeBSD port of DTrace.
> >
> > I'm still not too keen to run a production workload on a -CURRENT
> > platform.  I don't know if whatever is causing -CURRENT to keep running
> > while -STABLE dies is an MFC candidate, but it seems to me that
> > identifying the salient change(s) would be helpful in figuring that out.
> >
> > Any suggestions for how to go about doing that?
> 
> "trace" in ddb would be good start.  Do you really need PAE?
The real problem is that trace does not work; the original message
contains the details. It is either a call through a NULL function pointer
(most likely) or a stack array overflow.

I have already sent the OP instructions on how to proceed, and I am waiting
for a response.




Re: Help? 6.1-S: Fatal trap 12: page fault while in kernel mode

2006-06-16 Thread Maxim Konovalov
On Fri, 16 Jun 2006, 08:45-0700, David Wolfskill wrote:

> On Thu, Jun 15, 2006 at 04:22:40PM -0700, David Wolfskill wrote:
> > I had one of these [kernel panics] a couple of weeks ago or so...
> > ...[upgrade to -STABLE as of 15 June; repeat panic]...
>
> The message to which I'm replying (posted to -stable) has the
> particulars about the panic in question, and the machine in question is
> still sitting at the DDB prompt, if anyone wishes to work with me on
> that.
>
> But the reason for this message is to report that I upgraded the other
> test machines -- identical configuration: 2x3 GHz Xeons w/ 4 GB RAM;
> kernel config is called "SMP_PAE_DDB" for a fairly good reason -- to
> today's -CURRENT, then started the same test that caused -STABLE to crash
> & burn within a couple of minutes.
>
> That was 30 minutes ago; the test is still running on
>
> FreeBSD localhost 7.0-CURRENT FreeBSD 7.0-CURRENT #1: Fri Jun 16 07:28:18 PDT 
> 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP_PAE_DDB  i386
>
> As I commented in email to some colleagues, "color me surprised."
>
> I've suggested to the vendor (the program under test on the box is
> from a vendor, built under & for FreeBSD 5.x; I'm using the
> misc/compat5x port) that they consider trying this themselves, and
> perhaps also take advantage of John Birrell's work to date on the
> FreeBSD port of DTrace.
>
> I'm still not too keen to run a production workload on a -CURRENT
> platform.  I don't know if whatever is causing -CURRENT to keep running
> while -STABLE dies is an MFC candidate, but it seems to me that
> identifying the salient change(s) would be helpful in figuring that out.
>
> Any suggestions for how to go about doing that?

"trace" in ddb would be good start.  Do you really need PAE?

-- 
Maxim Konovalov
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Help? 6.1-S: Fatal trap 12: page fault while in kernel mode

2006-06-16 Thread David Wolfskill
On Thu, Jun 15, 2006 at 04:22:40PM -0700, David Wolfskill wrote:
> I had one of these [kernel panics] a couple of weeks ago or so...
> ...[upgrade to -STABLE as of 15 June; repeat panic]...

The message to which I'm replying (posted to -stable) has the
particulars about the panic in question, and the machine in question is
still sitting at the DDB prompt, if anyone wishes to work with me on
that.

But the reason for this message is to report that I upgraded the other
test machines -- identical configuration: 2x3 GHz Xeons w/ 4 GB RAM;
kernel config is called "SMP_PAE_DDB" for a fairly good reason -- to
today's -CURRENT, then started the same test that caused -STABLE to crash
& burn within a couple of minutes.

That was 30 minutes ago; the test is still running on

FreeBSD localhost 7.0-CURRENT FreeBSD 7.0-CURRENT #1: Fri Jun 16 07:28:18 PDT 
2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP_PAE_DDB  i386

As I commented in email to some colleagues, "color me surprised."

I've suggested to the vendor (the program under test on the box is
from a vendor, built under & for FreeBSD 5.x; I'm using the
misc/compat5x port) that they consider trying this themselves, and
perhaps also take advantage of John Birrell's work to date on the
FreeBSD port of DTrace.

I'm still not too keen to run a production workload on a -CURRENT
platform.  I don't know if whatever is causing -CURRENT to keep running
while -STABLE dies is an MFC candidate, but it seems to me that
identifying the salient change(s) would be helpful in figuring that out.

Any suggestions for how to go about doing that?

Thanks!

Peace,
david
-- 
David H. Wolfskill  [EMAIL PROTECTED]
Doing business with spammers only encourages them.  Please boycott spammers.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: Fatal trap 12: page fault while in kernel mode

2005-12-25 Thread kamal kc
Thanks,
I will try the INVARIANTS and WITNESS options and will try to get
FreeBSD 6.0. I will only be able to do this tomorrow, because it is
already evening and I will not be back at my office until then.

In the meantime, if memory corruption is the problem, is there any
option or configuration that would make the kernel stop, print a
message, or panic at the moment the corruption takes place, rather than
some time later when another program is affected by it?

That way I could locate any bug in my code, if present.

Thanks,
kamal
 

Xin LI <[EMAIL PROTECTED]> wrote:

Hi,

On 12/25/05, kamal kc  wrote:
[...]
> Is the problem related to memory leaks or sleeping
> on mutexes or some other causes.

From the backtrace you have provided, it looks like memory
corruption.  To aid your debugging, you will want INVARIANTS,
WITNESS, etc. to be enabled.  Also, if feasible, please consider
using code from -CURRENT or at least RELENG_6_0, as there are more
debugging aids there that are likely to catch bugs early.

Cheers,
--
Xin LI  http://www.delphij.net


Re: Fatal trap 12: page fault while in kernel mode

2005-12-25 Thread Xin LI
Hi,

On 12/25/05, kamal kc <[EMAIL PROTECTED]> wrote:
[...]
> Is the problem related to memory leaks or sleeping
> on mutexes or some other causes.

From the backtrace you have provided, it looks like memory
corruption.  To aid your debugging, you will want INVARIANTS,
WITNESS, etc. to be enabled.  Also, if feasible, please consider
using code from -CURRENT or at least RELENG_6_0, as there are more
debugging aids there that are likely to catch bugs early.

Cheers,
--
Xin LI <[EMAIL PROTECTED]> http://www.delphij.net


Fatal trap 12: page fault while in kernel mode

2005-12-25 Thread kamal kc
Hello everybody,

I have recently been troubled by kernel panics that occur as soon as
I run my modified kernel. The only modification I have made is adding a
compression/decompression function to the bridge.c file. I am running
5.4-RELEASE.

I am a beginner at kernel programming and may have insufficient
knowledge of it.

Things I have done in the function that could affect the kernel's
operation:
1. I frequently allocate memory using malloc() with types M_DEVBUF and
   M_TEMP and the M_WAITOK flag.
2. I allocate memory with malloc() and construct a tree. The node count
   can go up to 350, so I may call malloc() about 600 times in the
   routine. I know that may sound pretty dumb, but right now I have no
   other choice, as I know so little.
3. The functions are fairly long and contain loops, so they may consume
   time, since the bridge code may be called for every packet flowing
   through the network.
4. I have used data structures like linked lists and trees.

Now the problem is that as soon as I run my modified kernel, it panics
with fatal trap 12. The name of the process that crashed is sometimes
cron, sometimes ps, sometimes top, sometimes g_up, and sometimes
sendmail.

I don't know what to do, because I have tested the function separately
and it works fine. I used dmalloc to see whether a memory leak was
present but did not find any. It is possible that my tests with
dmalloc were insufficient.

So I have put the crash dumps here in the hope that some of you can
suggest whether there is anything I can do to solve this panic.

Is the problem related to memory leaks, sleeping on mutexes, or some
other cause?

I have added my function just before the IFQ_HANDOFF().

Thanks,

kamal kc



Panic message:-->

kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
fault virtual address=0x6c
fault code =supervisor read, page not present
instruction pointer=0x8:0xc052eafd
stack pointer=0x10:0xd50349d0
frame pointer=0x10:0xd50349d4
code segment=base 0x0, limit 0xf, type 0x1b
=DPL 0, pres 1, def32 1, gran 1
processor eflags=resume, IOPL=0
current process=462 (sendmail)
trap number=12
panic: page fault


decomp# kgdb kernel.debug /var/crash/vmcore.2
[GDB will not be able to debug user-mode threads:
/usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0  doadump () at pcpu.h:159
159 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:159
#1  0xc0510d86 in boot (howto=260) at ../../../kern/kern_shutdown.c:410
#2  0xc051101c in panic (fmt=0xc06b9b28 "%s")
    at ../../../kern/kern_shutdown.c:566
#3  0xc0692820 in trap_fatal (frame=0xd5034990, eva=108)
    at ../../../i386/i386/trap.c:817
#4  0xc069200d in trap (frame=
      {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = 2, tf_esi = -1049545216,
      tf_ebp = -721204780, tf_isp = -721204804, tf_ebx = -1050635136,
      tf_edx = -1050635136, tf_ecx = 0, tf_eax = -1049545184, tf_trapno = 12,
      tf_err = 0, tf_eip = -1068307715, tf_cs = 8, tf_eflags = 65539,
      tf_esp = -1049545216, tf_ss = -721204748})
    at ../../../i386/i386/trap.c:255
#5  0xc0682cda in calltrap () at ../../../i386/i386/exception.s:140
#6  0x0018 in ?? ()
#7  0x0010 in ?? ()
#8  0x0010 in ?? ()
#9  0x0002 in ?? ()
#10 0xc1713600 in ?? ()
#11 0xd50349d4 in ?? ()
#12 0xd50349bc in ?? ()
#13 0xc1609480 in ?? ()
#14 0xc1609480 in ?? ()
#15 0x in ?? ()
#16 0xc1713620 in ?? ()
---Type <return> to continue, or q <return> to quit---
#17 0x000c in ?? ()
#18 0x in ?? ()
#19 0xc052eafd in turnstile_setowner (ts=0xc1609480, owner=0x0)
    at ../../../kern/subr_turnstile.c:367
#20 0xc052edbf in turnstile_wait (ts=0xc1609480, lock=0xc16ba800, owner=0x0)
    at ../../../kern/subr_turnstile.c:504
#21 0xc0508769 in _mtx_lock_sleep (m=0xc16ba800, td=0xc1713600, opts=0,
    file=0x0, line=0) at ../../../kern/kern_mutex.c:552
#22 0xc063c691 in ufsdirhash_lookup (ip=0xc170d1a4,
    name=0xc16dd009 "nss_compat.so.1", namelen=15, offp=0x0, bpp=0x0,
    prevoffp=0x0) at ../../../ufs/ufs/ufs_dirhash.c:349
#23 0xc063e612 in ufs_lookup (ap=0xd5034b78)
    at ../../../ufs/ufs/ufs_lookup.c:214
#24 0xc0645623 in ufs_vnoperate (ap=0x0) at ../../../ufs/ufs/ufs_vnops.c:2821
#25 0xc0558402 in vfs_cache_lookup (ap=0x0) at vnode_if.h:82
#26 0xc0645623 in ufs_vnoperate (ap=0x0) at ../../../ufs/ufs/ufs_vnops.c:2

Re: Fatal trap 12: page fault while in kernel mode

2005-12-12 Thread John Baldwin
On Wednesday 07 December 2005 05:09 pm, Danilo Asara wrote:
> [EMAIL PROTECTED] [~]$ uname -a
> FreeBSD resolza.fastwebnet.it 6.0-STABLE FreeBSD 6.0-STABLE #0: Fri
> Nov18 11:19:38 CET
> [EMAIL PROTECTED]:/usr/obj/usr/src/sys/RESOLZA  i386
> [EMAIL PROTECTED] [~]$
>
>
> [EMAIL PROTECTED] [/usr/crash]# kgdb kernel.debug.0 vmcore.0
> [GDB will not be able to debug user-mode
> threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you
> are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for
> details.
> This GDB was configured as "i386-marcel-freebsd".
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x0
> fault code  = supervisor read, page not present
> instruction pointer = 0x20:0xc0500411
> stack pointer   = 0x28:0xef58fcac
> frame pointer   = 0x28:0xef58fcdc
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 722 (artsd)
> trap number = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> kdb_backtrace(100,c2a83a80,28,ef58fc6c,c) at kdb_backtrace+0x29
> panic(c06b2fec,c06d9f5b,0,f,c09b) at panic+0x114
> trap_fatal(ef58fc6c,0,c2a83a80,c2890bb8,c) at trap_fatal+0x2ca
> trap_pfault(ef58fc6c,0,0) at trap_pfault+0x1d7
> trap(8,28,28,c2ea9e70,c2a83a80) at trap+0x2fd
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xc0500411, esp = 0xef58fcac, ebp = 0xef58fcdc ---
> kse_release(c2a83a80,ef58fd04,1,0,200292) at kse_release+0x165
> syscall(3b,3b,3b,80f2100,81) at syscall+0x2bf
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (383, FreeBSD ELF32, kse_release), eip = 0x287d81af, esp =
> 0xbf9fef30, ebp = 0xbf9fef8c ---
> Uptime: 12h9m20s
> Dumping 1023 MB (2 chunks)
>   chunk 0: 1MB (159 pages) ... ok
>   chunk 1: 1023MB (261872 pages) 1007 991 975 959 943 927 911 895 879
> 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591
> 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303
> 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15
>
> #0  doadump () at pcpu.h:165
> 165 pcpu.h: No such file or directory.
> in pcpu.h
> (kgdb) where
> #0  doadump () at pcpu.h:165
> #1  0xc05132bf in boot (howto=260)
> at /usr/src/sys/kern/kern_shutdown.c:399
> #2  0xc0513615 in panic (fmt=0xc06b2fec "%s")
> at /usr/src/sys/kern/kern_shutdown.c:555
> #3  0xc068d8ca in trap_fatal (frame=0xef58fc6c, eva=0)
> at /usr/src/sys/i386/i386/trap.c:831
> #4  0xc068d5d7 in trap_pfault (frame=0xef58fc6c, usermode=0, eva=0)
> at /usr/src/sys/i386/i386/trap.c:742
> #5  0xc068d1ed in trap (frame=
>   {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = -1024811408, tf_esi =
> -1029162368, tf_ebp = -279380772, tf_isp = -279380840, tf_ebx =
> -1026066384, tf_edx = -1029162368, tf_ecx = -1026066303, tf_eax = 0,
> tf_trapno = 12, tf_err = 0, tf_eip = -1068497903, tf_cs = 32, tf_eflags
> = 2687622, tf_esp = -1036728832, tf_ss = 30})
> at /usr/src/sys/i386/i386/trap.c:432
> #6  0xc067aaca in calltrap () at /usr/src/sys/i386/i386/exception.s:139
> #7  0xc0500411 in kse_release (td=0xc2a83a80, uap=0xef58fd04)
> at /usr/src/sys/kern/kern_kse.c:428

The problem is here.  You can try posting this to [EMAIL PROTECTED] and see 
if someone there can help you debug this further.

-- 
John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


Fatal trap 12: page fault while in kernel mode

2005-12-07 Thread Danilo Asara
[EMAIL PROTECTED] [~]$ uname -a
FreeBSD resolza.fastwebnet.it 6.0-STABLE FreeBSD 6.0-STABLE #0: Fri
Nov18 11:19:38 CET
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/RESOLZA  i386
[EMAIL PROTECTED] [~]$


[EMAIL PROTECTED] [/usr/crash]# kgdb kernel.debug.0 vmcore.0
[GDB will not be able to debug user-mode
threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for
details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code  = supervisor read, page not present
instruction pointer = 0x20:0xc0500411
stack pointer   = 0x28:0xef58fcac
frame pointer   = 0x28:0xef58fcdc
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 722 (artsd)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
kdb_backtrace(100,c2a83a80,28,ef58fc6c,c) at kdb_backtrace+0x29
panic(c06b2fec,c06d9f5b,0,f,c09b) at panic+0x114
trap_fatal(ef58fc6c,0,c2a83a80,c2890bb8,c) at trap_fatal+0x2ca
trap_pfault(ef58fc6c,0,0) at trap_pfault+0x1d7
trap(8,28,28,c2ea9e70,c2a83a80) at trap+0x2fd
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0xc0500411, esp = 0xef58fcac, ebp = 0xef58fcdc ---
kse_release(c2a83a80,ef58fd04,1,0,200292) at kse_release+0x165
syscall(3b,3b,3b,80f2100,81) at syscall+0x2bf
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (383, FreeBSD ELF32, kse_release), eip = 0x287d81af, esp =
0xbf9fef30, ebp = 0xbf9fef8c ---
Uptime: 12h9m20s
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 1023MB (261872 pages) 1007 991 975 959 943 927 911 895 879
863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591
575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303
287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:165
#1  0xc05132bf in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc0513615 in panic (fmt=0xc06b2fec "%s")
at /usr/src/sys/kern/kern_shutdown.c:555
#3  0xc068d8ca in trap_fatal (frame=0xef58fc6c, eva=0)
at /usr/src/sys/i386/i386/trap.c:831
#4  0xc068d5d7 in trap_pfault (frame=0xef58fc6c, usermode=0, eva=0)
at /usr/src/sys/i386/i386/trap.c:742
#5  0xc068d1ed in trap (frame=
  {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = -1024811408, tf_esi =
-1029162368, tf_ebp = -279380772, tf_isp = -279380840, tf_ebx =
-1026066384, tf_edx = -1029162368, tf_ecx = -1026066303, tf_eax = 0,
tf_trapno = 12, tf_err = 0, tf_eip = -1068497903, tf_cs = 32, tf_eflags
= 2687622, tf_esp = -1036728832, tf_ss = 30})
at /usr/src/sys/i386/i386/trap.c:432
#6  0xc067aaca in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc0500411 in kse_release (td=0xc2a83a80, uap=0xef58fd04)
at /usr/src/sys/kern/kern_kse.c:428
#8  0xc068dc0f in syscall (frame=
{tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 135209216, tf_esi =
129, tf_ebp = -1080037492, tf_isp = -279380636, tf_ebx = 679326900,
tf_edx = 3, tf_ecx = 31, tf_eax = 383, tf_trapno = 32, tf_err = 2,
tf_eip = 679313839, tf_cs = 51, tf_eflags = 2097810, tf_esp =
-1080037584, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:976
#9  0xc067ab1f in Xint0x80_syscall ()
at /usr/src/sys/i386/i386/exception.s:200
#10 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)





Re: Fatal trap 12: page fault while in kernel mode

2005-12-07 Thread John Baldwin
On Wednesday 07 December 2005 02:47 am, Yuri Khotyaintsev wrote:
> On Friday 02 December 2005 14.54, John Baldwin wrote:
> > On Friday 02 December 2005 05:00 am, Yuri Khotyaintsev wrote:
> > > I have the following panic occurring several times a week. The machine
> > > is an NFS server, and it usually panics early in the morning, when
> > > first people try to access it. After reboot it may work OK for 1-2
> > > days, and then panics again. I have tried changing memory and replacing
> > > disk which was exported via NFS, but nothing helped :(
> > >
> > > Any suggestion on how to fix this panic will be very much appreciated !
> >
> > This panic (in propagate_priority) is usually caused when a thread goes
> > to sleep while holding a mutex (which is forbidden).  If you enable
> > INVARIANTS and/or WITNESS you should get a better panic, and with WITNESS
> > you will even be warned when a thread goes to sleep while holding a
> > mutex.  However, these options do introduce considerable execution
> > overhead, and sometimes that overhead changes the timing enough to hide
> > the race. :(
>
> Here are the two panics which I got with INVARIANTS and WITNESS enabled.
>
> Unread portion of the kernel message buffer:
> Memory modified after free 0xc4759e00(508) val=0 @ 0xc4759e00
> panic: Most recently used by UFS dirhash

Well, this isn't the panic I was expecting, but it points to something 
trashing free'd memory via a stale pointer or some such.  You might be able 
to use MEMGUARD to track this down.
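
For readers unfamiliar with the "Memory modified after free" message, here is a rough sketch of the kind of check that produces it. This is a toy model in Python, not the FreeBSD allocator code: the ToyZone class, the four-word buffer, and the junk value are assumptions made purely for illustration. Freed memory is filled with a known pattern, and the pattern is verified when the memory is handed out again, so a write through a stale pointer is caught at the next allocation rather than at the time of the write.

```python
# Toy model of a "trash" debug check (hypothetical names, not kernel code):
# freed buffers are poisoned with a junk pattern, and the pattern is
# verified when the buffer is handed out again.
JUNK = 0xDEADC0DE  # junk word, assumed here for illustration


class ToyZone:
    def __init__(self, nwords):
        self.nwords = nwords
        self.free_items = []

    def free(self, buf):
        # Poison every word so later writes through stale pointers show up.
        for i in range(self.nwords):
            buf[i] = JUNK
        self.free_items.append(buf)

    def alloc(self):
        if not self.free_items:
            return [0] * self.nwords
        buf = self.free_items.pop()
        # Any word that no longer holds the junk pattern was written
        # after the buffer was freed.
        for i, word in enumerate(buf):
            if word != JUNK:
                raise RuntimeError(
                    "Memory modified after free: word %d val=%#x" % (i, word))
        return buf


zone = ToyZone(nwords=4)
item = zone.alloc()
stale = item      # a reference the previous owner forgot to drop
zone.free(item)
stale[0] = 0      # write through the stale reference after the free

try:
    zone.alloc()
    detected = False
    message = ""
except RuntimeError as err:
    detected = True
    message = str(err)

print(detected, message)
```

In the report quoted above, the modified address equals the start of the buffer and val=0, which is what a zero written through a stale pointer to the first word would look like. The check only says who used the memory last, not who wrote to it, which is why a tool that catches the write itself is suggested.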

-- 
John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


Re: Fatal trap 12: page fault while in kernel mode

2005-12-07 Thread Yuri Khotyaintsev
On Friday 02 December 2005 14.54, John Baldwin wrote:
> On Friday 02 December 2005 05:00 am, Yuri Khotyaintsev wrote:
> > I have the following panic occurring several times a week. The machine is
> > an NFS server, and it usually panics early in the morning, when first
> > people try to access it. After reboot it may work OK for 1-2 days, and
> > then panics again. I have tried changing memory and replacing disk which
> > was exported via NFS, but nothing helped :(
> >
> > Any suggestion on how to fix this panic will be very much appreciated !
>
> This panic (in propagate_priority) is usually caused when a thread goes to
> sleep while holding a mutex (which is forbidden).  If you enable INVARIANTS
> and/or WITNESS you should get a better panic, and with WITNESS you will
> even be warned when a thread goes to sleep while holding a mutex.  However,
> these options do introduce considerable execution overhead, and sometimes
> that overhead changes the timing enough to hide the race. :(

Here are the two panics which I got with INVARIANTS and WITNESS enabled.

# kgdb /usr/obj/usr/src/sys/HEM.DEBUG/kernel.debug vmcore.8 
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
Memory modified after free 0xc4759e00(508) val=0 @ 0xc4759e00
panic: Most recently used by UFS dirhash

Uptime: 11h8m36s
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (160 pages) ... ok
  chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351 335 
319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:165
#1  0xc050fd4f in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc0510043 in panic (fmt=0xc06dccbb "Most recently used by %s\n")
at /usr/src/sys/kern/kern_shutdown.c:555
#3  0xc0648ccf in mtrash_ctor (mem=0xc4759e00, size=0, arg=0x0, flags=2)
at /usr/src/sys/vm/uma_dbg.c:137
#4  0xc06469c1 in uma_zalloc_arg (zone=0xc104d980, udata=0x0, flags=2)
at /usr/src/sys/vm/uma_core.c:1850
#5  0xc05043cd in malloc (size=400, mtp=0xc06fb700, flags=2) at uma.h:275
#6  0xc063fba9 in ufs_readdir (ap=0xd56eaaec)
at /usr/src/sys/ufs/ufs/ufs_vnops.c:1846
#7  0xc06a61cc in VOP_READDIR_APV (vop=0x0, a=0xd56eaaec) at vnode_if.c:1427
#8  0xc0607716 in nfsrv_readdir (nfsd=0xc4368c00, slp=0x0, td=0xc3326780, 
mrq=0xd56eac80) at vnode_if.h:746
#9  0xc060fa5b in nfssvc_nfsd (td=0x0)
at /usr/src/sys/nfsserver/nfs_syscalls.c:472
#10 0xc060f280 in nfssvc (td=0xc3326780, uap=0xd56ead04)
at /usr/src/sys/nfsserver/nfs_syscalls.c:181
#11 0xc069b6b0 in syscall (frame=
---Type <return> to continue, or q <return> to quit---
  {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 0, tf_esi = 0, tf_ebp = 
-1077941464, tf_isp = -714166940, tf_ebx = 0, tf_edx = -1077936144, tf_ecx = 
1, tf_eax = 155, tf_trapno = 12, tf_err = 2, tf_eip = 671852067, tf_cs = 51, 
tf_eflags = 582, tf_esp = -1077941492, tf_ss = 59}) 
at /usr/src/sys/i386/i386/trap.c:981
#12 0xc068947f in Xint0x80_syscall () 
at /usr/src/sys/i386/i386/exception.s:200
#13 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) quit

# kgdb /usr/obj/usr/src/sys/HEM.DEBUG/kernel.debug vmcore.9
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
Memory modified after free 0xc5172800(508) val=0 @ 0xc5172800
panic: Most recently used by UFS dirhash

Uptime: 1d1h7m17s
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (160 pages) ... ok
  chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351 335 
319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:165
#1  0xc050fd4f in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc0510043 in panic (fmt=0xc06dccbb "Most recently used by %s\n")
at /usr/src/sys/kern/kern_shutdown.c:555
#3  0xc0648ccf in mtrash_ctor (mem=0xc5172800, 

Re: Fatal trap 12: page fault while in kernel mode

2005-12-02 Thread Yuri Khotyaintsev
On Friday 02 December 2005 14.54, John Baldwin wrote:
> On Friday 02 December 2005 05:00 am, Yuri Khotyaintsev wrote:
> > I have the following panic occurring several times a week. The machine is
> > an NFS server, and it usually panics early in the morning, when first
> > people try to access it. After reboot it may work OK for 1-2 days, and
> > then panics again. I have tried changing memory and replacing disk which
> > was exported via NFS, but nothing helped :(
> >
> > Any suggestion on how to fix this panic will be very much appreciated !
>
> This panic (in propagate_priority) is usually caused when a thread goes to
> sleep while holding a mutex (which is forbidden).  If you enable INVARIANTS
> and/or WITNESS you should get a better panic, and with WITNESS you will
> even be warned when a thread goes to sleep while holding a mutex.  However,
> these options do introduce considerable execution overhead, and sometimes
> that overhead changes the timing enough to hide the race. :(

I am compiling a new kernel with INVARIANTS and WITNESS now. Will wait for a 
"better" panic ;-)

-- 
Dr. Yuri Khotyaintsev
Institutet för rymdfysik (IRF), Uppsala


Re: Fatal trap 12: page fault while in kernel mode

2005-12-02 Thread John Baldwin
On Friday 02 December 2005 05:00 am, Yuri Khotyaintsev wrote:
> I have the following panic occurring several times a week. The machine is
> an NFS server, and it usually panics early in the morning, when first
> people try to access it. After reboot it may work OK for 1-2 days, and then
> panics again. I have tried changing memory and replacing disk which was
> exported via NFS, but nothing helped :(
>
> Any suggestion on how to fix this panic will be very much appreciated !

This panic (in propagate_priority) is usually caused when a thread goes to 
sleep while holding a mutex (which is forbidden).  If you enable INVARIANTS 
and/or WITNESS you should get a better panic, and with WITNESS you will even 
be warned when a thread goes to sleep while holding a mutex.  However, these 
options do introduce considerable execution overhead, and sometimes that 
overhead changes the timing enough to hide the race. :(
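
As a rough illustration of the rule, and of what a WITNESS-style check looks for, here is a toy model in Python. The ToyThread class and the mutex and wait-channel names are hypothetical and are not kernel interfaces; the sketch only shows the bookkeeping idea: remember which mutexes a thread holds, and complain if it goes to sleep while the list is not empty.

```python
# Toy model of a WITNESS-style "sleeping with a mutex held" check.
# All names here are made up for illustration; this is not FreeBSD code.
class ToyThread:
    def __init__(self, name):
        self.name = name
        self.held = []        # names of mutexes currently held
        self.warnings = []    # diagnostics a debug kernel would print

    def mtx_lock(self, mtx):
        self.held.append(mtx)

    def mtx_unlock(self, mtx):
        self.held.remove(mtx)

    def sleep(self, wchan):
        # Sleeping while holding a mutex is forbidden, so record a warning
        # that names the offending lock(s).
        if self.held:
            self.warnings.append(
                "%s: sleeping on %s with mutex(es) held: %s"
                % (self.name, wchan, ", ".join(self.held)))


td = ToyThread("nfsd")
td.mtx_lock("dirhash")
td.sleep("bufwait")       # flagged: a mutex is still held
td.mtx_unlock("dirhash")
td.sleep("bufwait")       # fine: nothing held

for line in td.warnings:
    print(line)
```

The point of such a check is that it fires at the moment of the mistake, in the thread that made it, instead of later in whichever thread happens to trip over the damaged lock state.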

-- 
John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


Fatal trap 12: page fault while in kernel mode

2005-12-02 Thread Yuri Khotyaintsev
I have the following panic occurring several times a week. The machine is an 
NFS server, and it usually panics early in the morning, when first people try 
to access it. After reboot it may work OK for 1-2 days, and then panics 
again. I have tried changing memory and replacing disk which was exported via 
NFS, but nothing helped :(

Any suggestion on how to fix this panic will be very much appreciated ! 

/Yuri

[EMAIL PROTECTED]/var/crash]# uname -a
FreeBSD XXX.irfu.se 6.0-STABLE FreeBSD 6.0-STABLE #0: Tue Nov 29 13:31:15 CET 
2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/HEM  i386
[EMAIL PROTECTED]/var/crash]# kgdb /usr/obj/usr/src/sys/HEM/kernel.debug 
vmcore.7
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x74
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc053a426
stack pointer           = 0x28:0xd56c0b88
frame pointer           = 0x28:0xd56c0b8c
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 77 (vnlru)
trap number             = 12
panic: page fault
Uptime: 2d12h22m11s
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (160 pages) ... ok
  chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351 335 
319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:165
#1  0xc051577a in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc0515a84 in panic (fmt=0xc06ce475 "%s") 
at /usr/src/sys/kern/kern_shutdown.c:555
#3  0xc06b4815 in trap_fatal (frame=0xd56c0b48, eva=0)
at /usr/src/sys/i386/i386/trap.c:836
#4  0xc06b3f2d in trap (frame=
  {tf_fs = 1133445128, tf_es = 40, tf_ds = 40, tf_edi = -1017997312, 
tf_esi = -1020120704, tf_ebp = -714339444, tf_isp = -714339468, tf_ebx = 
-1012942272, tf_edx = -1020120704, tf_ecx = 0, tf_eax = 0, tf_trapno = 12, 
tf_err = 0, tf_eip = -1068260314, tf_cs = 32, tf_eflags = 589831, tf_esp = 
-1020120704, tf_ss = -714339408})
at /usr/src/sys/i386/i386/trap.c:269
#5  0xc06a24fa in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#6  0xc053a426 in turnstile_setowner (ts=0xc39fba40, owner=0x0)
at /usr/src/sys/kern/subr_turnstile.c:417
#7  0xc053a752 in turnstile_wait (lock=0xc461fe00, owner=0x0)
at /usr/src/sys/kern/subr_turnstile.c:576
#8  0xc050b511 in _mtx_lock_sleep (m=0xc461fe00, tid=3274846592, opts=0, 
file=0x0, line=0)
at /usr/src/sys/kern/kern_mutex.c:555
#9  0xc064becd in ufsdirhash_free (ip=0xc4a33840)
at /usr/src/sys/ufs/ufs/ufs_dirhash.c:289
#10 0xc064de66 in ufs_reclaim (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_inode.c:175
#11 0xc06bef38 in VOP_RECLAIM_APV (vop=0x0, a=0xc3323180) at vnode_if.c:1589
#12 0xc057adfe in vgonel (vp=0xc3cf3aa0) at vnode_if.h:818
#13 0xc0577530 in vtryrecycle (vp=0xc3cf3aa0) 
at /usr/src/sys/kern/vfs_subr.c:840
#14 0xc0576ec6 in vnlru_free (count=1376) at /usr/src/sys/kern/vfs_subr.c:668
#15 0xc0577019 in vnlru_proc () at /usr/src/sys/kern/vfs_subr.c:703
#16 0xc04fc310 in fork_exit (callout=0xc0576f24 , arg=0x0, 
frame=0x0)
at /usr/src/sys/kern/kern_fork.c:789
#17 0xc06a255c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208
(kgdb) quit
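
A note on reading the frame dump in frame #4: kgdb prints the trap frame registers as signed decimal integers, so kernel addresses show up as large negative numbers. The small sketch below converts them back to the hexadecimal form used in the rest of the backtrace. The helper name to_u32 is made up for illustration; the register values are copied from the dump.

```python
def to_u32(value):
    """Reinterpret a signed 32-bit integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


# Values copied from the trap frame printed in frame #4.
frame = {
    "tf_eip": -1068260314,
    "tf_ebp": -714339444,
}

for name, value in frame.items():
    print(name, hex(to_u32(value)))
```

tf_eip comes out as 0xc053a426, which is the instruction pointer reported in the panic message and the address of frame #6 (turnstile_setowner). tf_ebp comes out as 0xd56c0b8c, the reported frame pointer. With owner=0x0 in that frame, a fault at virtual address 0x74 is consistent with reading a field at a small offset from a NULL pointer.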

-- 
Dr. Yuri Khotyaintsev
Institutet för rymdfysik (IRF), Uppsala


Re: Fatal trap 12: page fault while in kernel mode - HELP!

2005-02-09 Thread Robert Watson
On Tue, 8 Feb 2005, ALeine wrote:

> [EMAIL PROTECTED] wrote: 
> 
> > We have two NFS file servers running FreeBSD 5.2.1-RELEASE-p9 on
> > Dell Poweredge 1750's that crash randomly. They each have about 1.3TB
> > of disk. They are used to serve email and web content to several
> > WEB/EMAIL servers. Following are the console log messages and the kernel boot
> > messages. Any ideas as to what the problem may be?
> 
> Try turning TCP SACK off by putting net.inet.tcp.sack.enable=0 in
> sysctl.conf. 

TCP SACK first shipped in 5.3...

Robert N M Watson




Re: Fatal trap 12: page fault while in kernel mode - HELP!

2005-02-09 Thread ALeine
[EMAIL PROTECTED] wrote: 

> We have two NFS file servers running FreeBSD 5.2.1-RELEASE-p9 on
> Dell Poweredge 1750's that crash randomly. They each have about 1.3TB
> of disk. They are used to serve email and web content to several
> WEB/EMAIL servers. Following are the console log messages and the kernel boot
> messages. Any ideas as to what the problem may be?

Try turning TCP SACK off by putting net.inet.tcp.sack.enable=0 in sysctl.conf.

ALeine



Re: Fatal trap 12: page fault while in kernel mode - HELP!

2005-02-09 Thread Robert Watson

On Wed, 9 Feb 2005, David Rice wrote:

> We have two NFS file servers running FreeBSD 5.2.1-RELEASE-p9 on Dell
> Poweredge 1750's that crash randomly. They each have about 1.3TB of
> disk. They are used to serve email and web content to several WEB/EMAIL
> servers.  Following are the console log messages and the kernel boot
> messages. Any ideas as to what the problem may be? 

I guess there's no chance of updating to FreeBSD 5.3?  It has a lot of
cleanups, fixes, etc, some of which weren't appropriate to backport to
RELENG_5_2.

Robert N M Watson



Fatal trap 12: page fault while in kernel mode - HELP!

2005-02-09 Thread David Rice
We have two NFS file servers running FreeBSD 5.2.1-RELEASE-p9 on Dell 
Poweredge 1750's that crash randomly. They each have about 1.3TB of disk. 
They are used to serve email and web content to several WEB/EMAIL servers.
Following are the console log messages and the kernel boot messages. Any ideas 
as to what the problem may be?


eme: fvp == tvp (can't happen)
ufs_rename: fvp == tvp (can't happen)
ufs_rename: fvp == tvp (can't happen)
ufs_rename: fvp == tvp (can't happen)
ufs_rename: fvp == tvp (can't happen)
Feb  9 07:57:31 dst5 kernel: pid 14842 (send_nsca), uid 0: exited on signal 11 
(core dumped)
ufs_rename: fvp == tvp (can't happen)
ufs_rename: fvp == tvp (can't happen)
ufs_rename: fvp == tvp (can't happen)
ufs_rename: fvp == tvp (can't happen)
ufs_rename: fvp == tvp (can't happen)
Feb  9 09:23:55 dst5 kernel: pid 20246 (send_nsca), uid 0: exited on signal 11 
(core dumped)
ufs_rename: fvp == tvp (can't happen)
ufs_rename: fvp == tvp (can't happen)
ufs_rename: fvp == tvp (can't happen)


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 06
fault virtual address   = 0x20
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc056471f
stack pointer           = 0x10:0xe1b19920
frame pointer           = 0x10:0xe1b199b8
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 61 (swi1: net)
kernel: type 12 trap, code=0
Stopped at      tcp_output+0xf: movl    0x20(%esi),%eax
db> 
db> 
db> 
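
One way to read the last line of the trap message: the faulting instruction loads from 0x20(%esi), and the fault virtual address is 0x20, so the base register must have held zero. A small sketch of that arithmetic follows. The helper name base_register_value is made up for illustration, and it only handles the simple disp(%reg) operand form seen here.

```python
import re


def base_register_value(fault_addr, operand):
    """Given a faulting address and an AT&T-style 'disp(%reg)' operand,
    return (register name, value the register must have held)."""
    m = re.fullmatch(r"(0x[0-9a-fA-F]+)\(%(\w+)\)", operand)
    if m is None:
        raise ValueError("unsupported operand form: %r" % operand)
    displacement = int(m.group(1), 16)
    # effective address = register + displacement, so solve for the register
    return m.group(2), fault_addr - displacement


reg, value = base_register_value(0x20, "0x20(%esi)")
print(reg, hex(value))
```

In other words, %esi was NULL and tcp_output read from offset 0x20 of it, which fits the "supervisor read, page not present" fault code.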
--
The Regents of the University of California. All rights reserved.
FreeBSD 5.2.1-RELEASE-p9 #1: Wed Sep  1 18:39:24 PDT 2004
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/DST5
Preloaded elf kernel "/boot/kernel/kernel" at 0xc07c9000.
Preloaded elf module "/boot/kernel/acpi.ko" at 0xc07c921c.
ACPI APIC Table: 
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.40GHz (2387.13-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
  
Features=0xbfebfbff
real memory  = 1073573888 (1023 MB)
avail memory = 1037717504 (989 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  6
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic2: Changing APIC ID to 10
ioapic0  irqs 0-15 on motherboard
ioapic1  irqs 16-31 on motherboard
ioapic2  irqs 32-47 on motherboard
Pentium Pro MTRR support enabled
npx0: [FAST]
stray irq13
npx0:  on motherboard
npx0: INT 16 interface
acpi0:  on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
pcibios: BIOS version 2.10
Using $PIR table, 7 entries at 0xc00fc4a0
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
acpi_cpu0:  on acpi0
acpi_cpu1:  on acpi0
acpi_cpu2:  on acpi0
device_probe_and_attach: acpi_cpu2 attach returned 6
acpi_cpu2:  on acpi0
device_probe_and_attach: acpi_cpu2 attach returned 6
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pcib0: slot 15 INTA is routed to irq 15
pci0:  at device 14.0 (no driver attached)
pci0:  at device 15.2 (no driver attached)
isab0:  at device 15.3 on pci0
isa0:  on isab0
pcib1:  on acpi0
pci4:  on pcib1
mpt0:  port 0xdc00-0xdcff mem 
0xfcb2-0xfcb2,0xfcb3-0xfcb3 irq 18 at device 5.0 on pci4
mpt1:  port 0xd800-0xd8ff mem 
0xfcb0-0xfcb0,0xfcb1-0xfcb1 irq 19 at device 5.1 on pci4
pcib2:  on acpi0
pci3:  on pcib2
amr0:  mem 0xfcc0-0xfcc0 irq 24 at device 6.0 on 
pci3
amr0:  Firmware 350O, BIOS 1.09, 128MB RAM
pcib3:  on acpi0
pci2:  on pcib3
bge0:  mem 
0xfcf2-0xfcf2,0xfcf3-0xfcf3 irq 16 at device 0.0 on pci2
bge0: Ethernet address: 00:0f:1f:66:34:0e
miibus0:  on bge0
brgphy0:  on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 
1000baseTX-FDX, auto
bge1:  mem 
0xfcf0-0xfcf0,0xfcf1-0xfcf1 irq 17 at device 0.1 on pci2
bge1: Ethernet address: 00:0f:1f:66:34:0f
miibus1:  on bge1
brgphy1:  on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 
1000baseTX-FDX, auto
pcib4:  on acpi0
pci1:  on pcib4
fdc0:  port 
0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0 port 0x3f8-0x3ff irq 4 on acpi0
sio0: type 16550A, console
acpi_cpu2:  on acpi0
device_probe_and_attach: acpi_cpu2 attach returned 6
acpi_cpu2:  on acpi0
device_probe_and_attach: acpi_cpu2 attach returned 6
orm0:  at iomem 0xec000-0xe,0xc8000-0xcbfff,0xc-0xc7fff 
on isa0
ata0 at port 0x3f6,0x1f0-0x1f7 irq 14 on isa0
ata0: [MPSAFE]
ata1 at port 0x376,0x170-0x177 irq 15 on

Re: Fatal trap 12: page fault while in kernel mode

2004-08-23 Thread Kevin Brunelle
> If this is how I got most of my panics, this little script running in
> two different xterms helped decrease the time to panic.  It got my
> system to panic a lot with the older nvidia drivers.

[script trimmed out]

> This always helped get my system unstable on 4-STABLE rather quickly.  I
> think it was the issue of running two or more GL programs at the same
> time that caused or increased the problem.

lol, I might try that.  Although I really don't need to go that far. 
Lately, I have been able to spontaneously reboot by running five GL
applications at once.  Which isn't pleasant but doesn't concern me too
much.  Each time I've had a panic there has been only one gl application
running... and lately all GL programs are causing this issue.

> Are you using the latest nvidia drivers?

As a matter of fact, that is what I think caused the problem.  I just
upgraded to the latest drivers on the 19th... right before I had these
problems.  That combined with the fact that all of these issues can be
consistently caused by running gl programs gives me strong cause to
suspect it.

> You should not be mixing the FreeBSD AGP and the nvidia AGP together.
> Choose one or the other.

Yes, I suspect this might be part of the issue.  I don't remember seeing
this message before the new driver was installed.  But I do know that
the old kernel had it loaded (it was hard coded with the configuration
file).  I think the driver might have changed the way it handled the
presence of both AGPs.

> I have my own panic on 4-STABLE which I just reported in freebsd-stable:
> http://lists.freebsd.org/pipermail/freebsd-stable/2004-August/008530.html
> Would you like to trade?  :)

lol, I would love to... if I thought I could help.  But I am still
learning as much as I can about the kernel... nowhere near the level
required to help.

Kevin
-- 
"Down with disease, up before the dawn.
A thousand barefoot children, dancin? on my lawn"
-Phish "Down with Disease"



Re: Fatal trap 12: page fault while in kernel mode

2004-08-23 Thread Sean Farley
On Mon, 23 Aug 2004, Kevin Brunelle wrote:

> Alright, this is driving me nuts.  For a little while there I could
> not get the system to panic -- it would spontaneously reboot when
> running a GL program instead of panic.  This afternoon it finally
> panic'd (who would think that would be something I want to see but it
> was).

If this is how I got most of my panics, this little script running in
two different xterms helped decrease the time to panic.  It got my
system to panic a lot with the older nvidia drivers.

#!/usr/local/bin/zsh
# Try with and without.
export __GL_SINGLE_THREADED=1
/bin/rm -f glxinfo.core
# Run glxinfo in a loop; report and remove any core file it leaves behind.
while [ 1 = 1 ]; do
    /usr/X11R6/bin/glxinfo >& /dev/null
    if [ -e glxinfo.core ]; then
        echo "Core found."
        /bin/rm -f glxinfo.core
    fi
done

This always helped get my system unstable on 4-STABLE rather quickly.  I
think it was the issue of running two or more GL programs at the same
time that caused or increased the problem.

Are you using the latest nvidia drivers?

> The error this time was a double fault (are we playing tennis?).  My
> original issue was with a page fault in kernel mode.  And my original
> problem also was related to a different function.  The function this
> time is .

My panics were fairly random.

> Take a look at all those sig-11s.  I would suspect bad memory but I
> ran memtest86+ on this machine less than a week ago and everything was
> fine -- not even a whiff of a problem.  I caused this panic by running
> another gl application and I feel it is related to my original problem.

I also ran memtest86 for over a day without finding fault in the memory.
The sad thing is that almost any type of bad hardware can cause
stability issues.  At least this is what I was told.  Maybe the caps on
your system have started going bad?

> Another thing that interested me is that the kernel dump seems
> "corrupted" or incomplete... does the line "---Can't read userspace
> from dump, or kernel process---" possibly imply that I did not get a
> good dump at the time of the panic?
>
> If anyone has any ideas about what to fix I would love to hear them.
> I am tempted to change a few things myself that might be an issue (for
> example, removing the FreeBSD agp which nvidia complains about in my
> dmesg -- and also upgrading to  3-Beta1 ... so at least my kernel
> panics will relate to making that system better).  But, until I know
> that this is a dead end and no one wants to see anything, I am not
> touching anything.  I don't want to ruin the chances of this being a
> real bug and it not being fixed because I change something that just
> hides it.

You should not be mixing the FreeBSD AGP and the nvidia AGP together.
Choose one or the other.

> If you want me to get any information from the dump or try anything
> please let me know.  You may have to tell me how to go about doing
> stuff with gdb (I am not very experienced with its advanced features)
> but I am willing to learn and do what I can.

I have my own panic on 4-STABLE which I just reported in freebsd-stable:
http://lists.freebsd.org/pipermail/freebsd-stable/2004-August/008530.html
Would you like to trade?  :)

Sean
---
[EMAIL PROTECTED]


Re: Fatal trap 12: page fault while in kernel mode

2004-08-23 Thread Kevin Brunelle
Alright, this is driving me nuts.  For a little while there I could not
get the system to panic -- it would spontaneously reboot when running a
GL program instead of panic.  This afternoon it finally panic'd (who
would think that would be something I want to see but it was).

I am attaching the transcript of me playing around with it.  It includes
the panic message as well as some debug output from gdb.  Although I am
not certain that is as helpful as I hoped it would be.  At the very end
I have included yet another uname -a and copy of my kernel configuration
file.

The error this time was a double fault (are we playing tennis?).  My
original issue was with a page fault in kernel mode.  And my original
problem also was related to a different function.  The function this
time is .

Take a look at all those sig-11s.  I would suspect bad memory but I ran
memtest86+ on this machine less than a week ago and everything was fine
-- not even a whiff of a problem.  I caused this panic by running
another gl application and I feel it is related to my original problem.

Another thing that interested me is that the kernel dump seems
"corrupted" or incomplete... does the line "---Can't read userspace from
dump, or kernel process---" possibly imply that I did not get a good
dump at the time of the panic?

If anyone has any ideas about what to fix I would love to hear them.  I
am tempted to change a few things myself that might be an issue (for
example, removing the FreeBSD agp which nvidia complains about in my
dmesg -- and also upgrading to  3-Beta1 ... so at least my kernel panics
will relate to making that system better).  But, until I know that this
is a dead end and no one wants to see anything, I am not touching
anything.  I don't want to ruin the chances of this being a real bug and
it not being fixed because I change something that just hides it.

If you want me to get any information from the dump or try anything
please let me know.  You may have to tell me how to go about doing stuff
with gdb (I am not very experienced with its advanced features) but I am
willing to learn and do what I can.

-Kevin
-- 
"Down with disease, up before the dawn.
A thousand barefoot children, dancin? on my lawn"
-Phish "Down with Disease"
Script started on Mon Aug 23 16:14:53 2004
/home/kevinb/crash# gdb -k kernel.debug vmcore.1
GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
panic: swp_pager_meta_free_all: failed to locate all swap meta blocks
panic messages:
---
panic: double fault

syncing disks, buffers remaining... 2177 2177 Copyright (c) 1992-2004 The FreeBSD 
Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.2.1-RELEASE-p9 #0: Sun Aug 22 14:00:38 EDT 2004
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/FOOKERN
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0ce4000.
Preloaded elf module "/boot/modules/nvidia.ko" at 0xc0ce4244.
Preloaded elf module "/boot/kernel/linux.ko" at 0xc0ce42f0.
Preloaded elf module "/boot/kernel/acpi.ko" at 0xc0ce439c.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel Pentium III (863.87-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x683  Stepping = 3
  
Features=0x383f9ff
real memory  = 268173312 (255 MB)
avail memory = 246661120 (235 MB)
Pentium Pro MTRR support enabled
npx0: [FAST]
npx0:  on motherboard
npx0: INT 16 interface
acpi0:  on motherboard
pcibios: BIOS version 2.10
Using $PIR table, 12 entries at 0xc00f2d00
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
acpi_cpu0:  port 0x530-0x537 on acpi0
acpi_button0:  on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pcib0: slot 31 INTD is routed to irq 10
pcib0: slot 31 INTB is routed to irq 9
agp0:  mem 0xf800-0xfbff at device 
0.0 on pci0
pcib1:  at device 1.0 on pci0
pci2:  on pcib1
pcib0: slot 1 INTA is routed to irq 11
pcib1: slot 0 INTA is routed to irq 11
nvidia0:  mem 
0xf200-0xf3ff,0xfd00-0xfdff irq 11 at device 0.0 on pci2
pcib2:  at device 30.0 on pci0
pci1:  on pcib2
pcib2: slot 9 INTA is routed to irq 3
pcib2: slot 12 INTA is routed to irq 9
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xdc00-0xdc7f mem 0xfc9ff800-0xfc9ff87f 
irq 3 at device 9.0 on pci1
xl0: Ethernet address: 00:01:03:23:9d:ba
miibus0:  on xl0
xlphy0: <3c905C 10/100 internal PHY> on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcm0:  port 0xdf00-0xdf3f irq 9 at device 12.0 on pci1
pcm0: 
isab0:  at device 31.0 on pci0
isa0:  on isab0
ata

Re: Fatal trap 12: page fault while in kernel mode

2004-08-22 Thread Brian Fundakowski Feldman
On Sun, Aug 22, 2004 at 11:22:43AM -0400, Kevin Brunelle wrote:
> Okay,
> 
> Replication does not look like it will be an issue.  Again, the system
> panic'd while running a gl application when I was at work.  This time I
> did get a core dump (but I still don't have a debugging kernel -- it was
> building ARG).
> 
> Right now, I am going to disable my screensaver and carefully avoid
> applications which might cause the panic again.  Once the proper kernel
> is in place... then it is go-time.
> 
> If anyone is interested I am going to save the dump -- but it probably
> is worth the wait (in saved effort) till I have a proper kernel in
> place.  I am almost 100% sure this is due to the nvidia drivers -- I
> upgraded on the 19th and never had a problem before this... that and gl
> programs seem to be the cause of both crashes so far.

Andreas, are you using the nvidia driver too?

-- 
Brian Fundakowski Feldman   \'[ FreeBSD ]''\
  <> [EMAIL PROTECTED]   \  The Power to Serve! \
 Opinions expressed are my own.   \,,\


Re: Fatal trap 12: page fault while in kernel mode

2004-08-22 Thread Kevin Brunelle
Okay,

Replication does not look like it will be an issue.  Again, the system
panic'd while running a gl application when I was at work.  This time I
did get a core dump (but I still don't have a debugging kernel -- it was
building ARG).

Right now, I am going to disable my screensaver and carefully avoid
applications which might cause the panic again.  Once the proper kernel
is in place... then it is go-time.

If anyone is interested I am going to save the dump -- but it probably
is worth the wait (in saved effort) till I have a proper kernel in
place.  I am almost 100% sure this is due to the nvidia drivers -- I
upgraded on the 19th and never had a problem before this... that and gl
programs seem to be the cause of both crashes so far.

Kevin
-- 
"Down with disease, up before the dawn.
A thousand barefoot children, dancin' on my lawn"
-Phish "Down with Disease"



Fatal trap 12: page fault while in kernel mode

2004-08-22 Thread Kevin Brunelle
pbus0: IEEE1284 device found /NIBBLE/ECP
Probing for PnP devices on ppbus0:
ppbus0:  MLC,PCL,PML
plip0:  on ppbus0
lpt0:  on ppbus0
lpt0: Interrupt-driven port
ppi0:  on ppbus0
pmtimer0 on isa0
sc0:  at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
Timecounter "TSC" frequency 863866798 Hz quality 800
Timecounters tick every 10.000 msec
GEOM: create disk ad0 dp=0xc31c3460
ad0: 28629MB  [58168/16/63] at ata0-master UDMA66
GEOM: create disk ad1 dp=0xc31c3160
ad1: 57220MB  [116257/16/63] at ata0-slave UDMA100
acd0: CDRW  at ata1-master PIO4
acd1: CDROM  at ata1-slave PIO4
Mounting root from ufs:/dev/ad0s1a
WARNING: / was not properly dismounted
WARNING: /tmp was not properly dismounted
WARNING: /usr was not properly dismounted
WARNING: /var was not properly dismounted
/var: mount pending error: blocks 4 files 0
WARNING: /drv1 was not properly dismounted
NVRM: detected agp.ko, aborting NVIDIA AGP setup!
NVRM: detected agp.ko, aborting NVIDIA AGP setup!
pid 1034 (bouncingcow), uid 1000: exited on signal 8 (core dumped)
pid 1420 (antspotlight), uid 1000: exited on signal 11 (core dumped)
Aug 20 07:33:50 fnord syslogd: kernel boot file is /boot/kernel/kernel
Aug 20 07:33:50 fnord kernel: pid 7702 (gleidescope), uid 1000: exited on signal 11 
(core dumped)
Aug 20 07:33:50 fnord kernel: TPTE at 0xbfca0168  IS ZERO @ VA 2805a000
Aug 20 07:33:50 fnord kernel: panic: bad pte
Aug 20 07:33:50 fnord kernel: 
Aug 20 07:33:50 fnord kernel: syncing disks, buffers remaining... kernel trap 12 with 
interrupts disabled
Aug 20 07:33:50 fnord kernel: 
Aug 20 07:33:50 fnord kernel: 
Aug 20 07:33:50 fnord kernel: Fatal trap 12: page fault while in kernel mode
Aug 20 07:33:50 fnord kernel: fault virtual address = 0x24
Aug 20 07:33:50 fnord kernel: fault code= supervisor read, page not 
present
Aug 20 07:33:50 fnord kernel: instruction pointer   = 0x8:0xc05450ae
Aug 20 07:33:50 fnord kernel: stack pointer = 0x10:0xcde47c24
Aug 20 07:33:50 fnord kernel: frame pointer = 0x10:0xcde47c48
Aug 20 07:33:50 fnord kernel: code segment  = base 0x0, limit 0xf, 
type 0x1b
Aug 20 07:33:50 fnord kernel: = DPL 0, pres 1, def32 1, gran 1
Aug 20 07:33:50 fnord kernel: processor eflags  = resume, IOPL = 0
Aug 20 07:33:50 fnord kernel: current process   = 28 (swi8: tty:sio clock)
Aug 20 07:33:50 fnord kernel: trap number   = 12
Aug 20 07:33:50 fnord kernel: panic: page fault
Aug 20 07:33:50 fnord kernel: Uptime: 16h50m58s
Aug 20 07:33:50 fnord kernel: Shutting down ACPI
Aug 20 07:33:50 fnord kernel: kernel trap 12 with interrupts disabled
Aug 20 07:33:50 fnord kernel: 
Aug 20 07:33:50 fnord kernel: 
Aug 20 07:33:50 fnord kernel: Fatal trap 12: page fault while in kernel mode
Aug 20 07:33:50 fnord kernel: fault virtual address = 0x10
Aug 20 07:33:50 fnord kernel: fault code= supervisor write, page not 
present
Aug 20 07:33:50 fnord kernel: instruction pointer   = 0x8:0xc054567a
Aug 20 07:33:50 fnord kernel: stack pointer = 0x10:0xcde478d8
Aug 20 07:33:50 fnord kernel: frame pointer = 0x10:0xcde478f8
Aug 20 07:33:50 fnord kernel: code segment  = base 0x0, limit 0xf, 
type 0x1b
Aug 20 07:33:50 fnord kernel: = DPL 0, pres 1, def32 1, gran 1
Aug 20 07:33:50 fnord kernel: processor eflags  = resume, IOPL = 0
Aug 20 07:33:50 fnord kernel: current process   = 28 (swi8: tty:sio clock)
Aug 20 07:33:50 fnord kernel: trap number   = 12
Aug 20 07:33:50 fnord kernel: panic: page fault
Aug 20 07:33:50 fnord kernel: Uptime: 16h50m58s
Aug 20 07:33:50 fnord kernel: Shutting down ACPI
Aug 20 07:33:50 fnord kernel: Automatic reboot in 15 seconds - press a key on the 
console to abort
Aug 20 07:33:50 fnord kernel: Rebooting...
Aug 20 07:33:50 fnord kernel: Copyright (c) 1992-2004 The FreeBSD Project.
Aug 20 07:33:50 fnord kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 
1992, 1993, 1994
Aug 20 07:33:50 fnord kernel: The Regents of the University of California. All rights 
reserved.
Aug 20 07:33:50 fnord kernel: FreeBSD 5.2.1-RELEASE-p9 #0: Tue Aug  3 19:13:31 EDT 2004
Aug 20 07:33:50 fnord kernel: [EMAIL PROTECTED]:/usr/src/sys/i386/compile/FOOKERN
Aug 20 07:33:50 fnord kernel: Preloaded elf kernel "/boot/kernel/kernel" at 0xc0ce8000.
Aug 20 07:33:50 fnord kernel: Preloaded elf module "/boot/kernel/splash_bmp.ko" at 
0xc0ce8244.
Aug 20 07:33:50 fnord kernel: Preloaded splash_image_data "/boot/splash.bmp" at 
0xc0ce82f4.
Aug 20 07:33:50 fnord kernel: Preloaded elf module "/boot/kernel/linux.ko" at 
0xc0ce8344.
Aug 20 07:33:50 fnord kernel: Preloaded elf module "/boot/modules/nvidia.ko" at 
0xc0ce83f0.
Aug 20 07:33:50 fnord kernel: Preloaded elf module "/boot/kernel/acp

IP fragmentation (was Re: Fatal trap 12: page fault while in kernel mode)

2002-04-05 Thread Bruce A. Mah

[Moving to -net]

If memory serves me right, Andrew Gallatin wrote:

>  > Alternately, it would be a good idea to have a "ip_maxpacketfrags"
>  > instead of an "ip_maxfragpackets", to put a hard limit on the
>  > number of mbufs that can be consumed by the fragment reassembly
>  > process.
> 
> I think this is the best solution.

Just for the heck of it, I started reading through ip_input.c to see how
hard this would be to do.  I haven't got there yet, but I saw something odd:
the variables ip_nfragpackets and nipq look *awfully* similar.

It looks like they both track the number of reassembly queues, because
they're initialized to zero, and incremented and decremented at the same
time.  Their limits (ip_maxfragpackets and maxnipq respectively) are
even initialized on consecutive lines.

The only difference I can see is that in ip_input(), if nipq > maxnipq,
all of the fragments for some other packet in the current hash bucket
get dropped (with the wonderfully descriptive comment "gak").  The check
for ip_nfragpackets comes in ip_reass(), where if ip_nfragpackets >=
ip_maxfragpackets, then we drop the current fragment.  (Is it possible 
that the second check masks the effects of the first?)
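
A toy model may make the question concrete.  The sketch below reproduces
only the two checks as described above, not the actual ip_input.c code.
The struct, the function name, and the choice to give both limits the
same value are illustrative assumptions.

```c
/* Toy model of the two limits described above; not the real ip_input.c. */
struct reass_model {
	int nipq, maxnipq;                      /* checked as in ip_input() */
	int ip_nfragpackets, ip_maxfragpackets; /* checked as in ip_reass() */
	int evicted;  /* times another packet's queue was dropped ("gak") */
	int refused;  /* times the current fragment was dropped instead */
};

/* Model the arrival of the first fragment of a new packet. */
void
first_fragment(struct reass_model *m)
{
	/* First check: over the limit, so evict some other queue. */
	if (m->nipq > m->maxnipq) {
		m->evicted++;
		m->nipq--;
		m->ip_nfragpackets--;
	}
	/* Second check: at the limit, so refuse to create a new queue. */
	if (m->ip_nfragpackets >= m->ip_maxfragpackets) {
		m->refused++;
		return;
	}
	/* Both counters move together when a queue is created. */
	m->nipq++;
	m->ip_nfragpackets++;
}
```

In this model, with equal limits, the counter never rises above the
limit, so the first check never fires.  Whether the real code behaves
the same way depends on details that are not modeled here.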

I couldn't find any obvious explanation in the CVS log for ip_input.c.

Am I missing something, or are these two variables basically doing the 
same thing?

Thanks,

Bruce.



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: Fatal trap 12: page fault while in kernel mode

2002-04-05 Thread Andrew Gallatin


Terry Lambert writes:
 > Andrew Gallatin wrote:
 > > The problem is that ip_maxfragpackets is:
 > > "Maximum number of IPv4 fragment reassembly queue entries"
 > > 
 > > You (& I, & most people probably) took that number to mean the cap on
 > > the number of mbufs sitting on reassembly queues.  However, it's really
 > > a cap on the number of fragmented packets sitting on reassembly
 > > queues:
 > 
 > [ ... ]
 > 
 > > Since the linux host is sending 16K packets, that means that each
 > > packet is made up of 11 cluster mbufs (assuming a 1500 byte mtu).
 > > There can be as many as 10 cluster mbufs on the reassembly queue
 > > for each packet.
 > > 
 > > Let's say we have 2048 cluster mbufs.  That makes maxfragpackets 512.
 > > However, 512 * 10 mbufs = 5120 mbufs.  Oops.
 > > 
 > > I think the limit should probably be something much smaller, like
 > > maybe nmbclusters / (net.inet.udp.recvspace / 1472).  Or the
 > > implementation & name should be changed to "maxfragmbufs"
 > 
 > 
 > This suggests that one could fragment as large a UDP packet
 > as one chooses into "n" fragments, and then supply only "n-1"
 > elements of the whole packet, as an attack, in order to use
 > up system resources.

Essentially what a linux NFS client is already doing.. ;-(

 > I think we are better off with my suggestion, where udp packets
 > above a certain size are intentionally dropped as "not supported".

Depending on what the "certain size" is, that might be reasonable.

 > Alternately, it would be a good idea to have a "ip_maxpacketfrags"
 > instead of an "ip_maxfragpackets", to put a hard limit on the
 > number of mbufs that can be consumed by the fragment reassembly
 > process.

I think this is the best solution.

 > Of course, this also suggests that using TCP instead of UDP for
 > the NFS would result in the problem "just going away", for the
 > original poster, which is probably all the opriginal poster
 > really cares about...

Considering that a modern linux NFS client is going to be a common
scenario, we should probably be able to interoperate with it, no
matter how broken its defaults are.  BTW, 16K UDP packets are legal
according to the NFS V3 spec, if I remember it correctly.

Drew





Re: Fatal trap 12: page fault while in kernel mode

2002-04-05 Thread Terry Lambert

Andrew Gallatin wrote:
> The problem is that ip_maxfragpackets is:
> "Maximum number of IPv4 fragment reassembly queue entries"
> 
> You (& I, & most people probably) took that number to mean the cap on
> the number of mbufs sitting on reassembly queues.  However, it's really
> a cap on the number of fragmented packets sitting on reassembly
> queues:

[ ... ]

> Since the linux host is sending 16K packets, that means that each
> packet is made up of 11 cluster mbufs (assuming a 1500 byte mtu).
> There can be as many as 10 cluster mbufs on the reassembly queue
> for each packet.
> 
> Let's say we have 2048 cluster mbufs.  That makes maxfragpackets 512.
> However, 512 * 10 mbufs = 5120 mbufs.  Oops.
> 
> I think the limit should probably be something much smaller, like
> maybe nmbclusters / (net.inet.udp.recvspace / 1472).  Or the
> implementation & name should be changed to "maxfragmbufs"


This suggests that one could fragment as large a UDP packet
as one chooses into "n" fragments, and then supply only "n-1"
elements of the whole packet, as an attack, in order to use
up system resources.

I think we are better off with my suggestion, where udp packets
above a certain size are intentionally dropped as "not supported".

Alternately, it would be a good idea to have a "ip_maxpacketfrags"
instead of an "ip_maxfragpackets", to put a hard limit on the
number of mbufs that can be consumed by the fragment reassembly
process.
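
A minimal sketch of what such a per-fragment cap could look like is
shown below.  It is an illustration only: the struct, the function
names, and the idea of a single global counter are assumptions, and
nothing here is taken from the FreeBSD source.

```c
/* Hypothetical cap on fragments queued across all reassembly queues. */
struct frag_budget {
	int queued;      /* fragments currently held for reassembly */
	int max_queued;  /* hard limit on fragments held at once */
};

/* Return 1 if a fragment may be queued, 0 if it should be dropped. */
int
frag_admit(struct frag_budget *b)
{
	if (b->queued >= b->max_queued)
		return 0;
	b->queued++;
	return 1;
}

/* Give back budget when a packet completes or its queue is freed. */
void
frag_release(struct frag_budget *b, int nfrags)
{
	b->queued -= nfrags;
	if (b->queued < 0)
		b->queued = 0;
}
```

Because the counter tracks fragments rather than packets, a sender that
withholds one fragment per packet cannot tie up more than max_queued
buffers in this model.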

Of course, this also suggests that using TCP instead of UDP for
the NFS would result in the problem "just going away", for the
original poster, which is probably all the original poster
really cares about...

-- Terry




Re: Fatal trap 12: page fault while in kernel mode

2002-04-05 Thread Andrew Gallatin


Bruce A. Mah writes:
 > 
 > I was discussing this with some of my cow-orkers, as we've had a similar
 > situation (cluster mbufs getting temporarily depleted on a
 > 4.5-RELEASE-p2 NFS server with Linux and FreeBSD clients, but no kernel
 > panics).  Shouldn't the net.inet.ip.maxfragpackets sysctl variable
 > (introduced in 4.4-RELEASE) limit the number of fragments on the
 > reassembly queue(s)?  This value looks to be about 1/4 the number of
 > cluster mbufs, by default.

That's a good point.  When I was bitten by this, I didn't have time to
mess with things & I cranked down the read/write size on the linux
clients.   

The problem is that ip_maxfragpackets is:
"Maximum number of IPv4 fragment reassembly queue entries"


You (& I, & most people probably) took that number to mean the cap on
the number of mbufs sitting on reassembly queues.  However, it's really
a cap on the number of fragmented packets sitting on reassembly
queues:

        /*
         * If first fragment to arrive, create a reassembly queue.
         */
        if (fp == 0) {
                /*
                 * Enforce upper bound on number of fragmented packets
                 * for which we attempt reassembly;
                 * If maxfrag is 0, never accept fragments.
                 * If maxfrag is -1, accept all fragments without limitation.
                 <...>

Since the linux host is sending 16K packets, that means that each
packet is made up of 11 cluster mbufs (assuming a 1500 byte mtu).
There can be as many as 10 cluster mbufs on the reassembly queue
for each packet.

Let's say we have 2048 cluster mbufs.  That makes maxfragpackets 512.
However, 512 * 10 mbufs = 5120 mbufs.  Oops.

I think the limit should probably be something much smaller, like
maybe nmbclusters / (net.inet.udp.recvspace / 1472).  Or the
implementation & name should be changed to "maxfragmbufs"
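
The arithmetic above can be checked with a small C sketch.  The function
names are hypothetical.  The figures of 2048 clusters, the one-quarter
default, 10 queued clusters per packet, and the divisor 1472 come from
this thread.  The udp_recvspace value is left as a parameter, since the
thread does not state it.

```c
/* Default cap described in the thread: about 1/4 of the cluster mbufs. */
int
default_maxfragpackets(int nmbclusters)
{
	return nmbclusters / 4;
}

/* Worst case: every queue holds all but one fragment of its packet. */
int
worst_case_queued_clusters(int maxfragpackets, int queued_per_packet)
{
	return maxfragpackets * queued_per_packet;
}

/* The alternative limit suggested above. */
int
suggested_maxfragpackets(int nmbclusters, int udp_recvspace)
{
	int frags_per_buffer = udp_recvspace / 1472;

	if (frags_per_buffer < 1)  /* avoid dividing by zero */
		frags_per_buffer = 1;
	return nmbclusters / frags_per_buffer;
}
```

With 2048 clusters the default cap is 512 packets, and 512 packets with
10 queued clusters each would need 5120 clusters, which is more than the
system has.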

Drew





Re: Fatal trap 12: page fault while in kernel mode

2002-04-04 Thread Bruce A. Mah

If memory serves me right, Andrew Gallatin wrote:
> 
> Will Froning writes:
>  > I have a 4.5-RELEASE-p2 box that is my Firewall/NAT/NFS server.  As a
>  > NFS client I have a RH7.2 linux box.  When I do massive NFS writes to
>  > my FBSD (from RH7.2 box), I get a panic.  I've attached the info I got
>  > from my debug kernel.
>  > 
> 
> While the fix being discussed by Peter & others will prevent panics,
> the linux box will still run your server out of mbuf clusters.  This
> is happening because the linux box is using a 16K write size over UDP
> by default.  This is a stupid default.  If there is any lossage
> between the hosts (eg, any packets get dropped), more and more packets
> will end up on the reassembly queues.  Eventually, all your cluster
> mbufs will be there.

I was discussing this with some of my cow-orkers, as we've had a similar
situation (cluster mbufs getting temporarily depleted on a
4.5-RELEASE-p2 NFS server with Linux and FreeBSD clients, but no kernel
panics).  Shouldn't the net.inet.ip.maxfragpackets sysctl variable
(introduced in 4.4-RELEASE) limit the number of fragments on the
reassembly queue(s)?  This value looks to be about 1/4 the number of
cluster mbufs, by default.

Bruce.






Re: Fatal trap 12: page fault while in kernel mode

2002-04-04 Thread Terry Lambert

Andrew Gallatin wrote:
> While the fix being discussed by Peter & others will prevent panics,
> the linux box will still run your server out of mbuf clusters.  This
> is happening because the linux box is using a 16K write size over UDP
> by default.  This is a stupid default.  If there is any lossage
> between the hosts (eg, any packets get dropped), more and more packets
> will end up on the reassembly queues.  Eventually, all your cluster
> mbufs will be there.
> 
> I suggest changing the mount options on the linux box to use 8k reads
> and writes, or use TCP.

Good observation.  Actually, for a firewall box, it might be
reasonable to drop UDP packets over a certain size, and to
drop certain classes of frags.

This won't help the original poster with the Linux problem;
they would still have to reconfigure their Linux machine to
use smaller writes.

> Another problem I've seen w/Linux NFS clients is that recent linux NFS
> clients seem to spew ACCESS requests like there's no tomorrow & beat
> the snot out of my NFS server.  When building large software packages
> via "make -j4" over NFSv3 (100Mb ethernet) on a dual PIII 1GHz system,
> a FreeBSD 4.5 host issues 400-500 ACCESS calls/sec.  A Linux 2.4.18
> host spews 12,000 - 14,000 ACCESS calls/sec, or roughly 30 times as
> many.  Needless to say, the build finishes a whole lot quicker on
> FreeBSD.  Does anybody know what I can do to make the linux client
> cache ACCESS info?

Apart from installing FreeBSD instead?  8-).

I think that it will take some hacking of the Linux NFS code
by someone who cares about Linux performance.

-- Terry




Re: Fatal trap 12: page fault while in kernel mode

2002-04-04 Thread Andrew Gallatin


Will Froning writes:
 > I have a 4.5-RELEASE-p2 box that is my Firewall/NAT/NFS server.  As a
 > NFS client I have a RH7.2 linux box.  When I do massive NFS writes to
 > my FBSD (from RH7.2 box), I get a panic.  I've attached the info I got
 > from my debug kernel.
 > 

While the fix being discussed by Peter & others will prevent panics,
the linux box will still run your server out of mbuf clusters.  This
is happening because the linux box is using a 16K write size over UDP
by default.  This is a stupid default.  If there is any lossage
between the hosts (eg, any packets get dropped), more and more packets
will end up on the reassembly queues.  Eventually, all your cluster
mbufs will be there.

I suggest changing the mount options on the linux box to use 8k reads
and writes, or use TCP.

Another problem I've seen w/Linux NFS clients is that recent linux NFS
clients seem to spew ACCESS requests like there's no tomorrow & beat
the snot out of my NFS server.  When building large software packages
via "make -j4" over NFSv3 (100Mb ethernet) on a dual PIII 1GHz system,
a FreeBSD 4.5 host issues 400-500 ACCESS calls/sec.  A Linux 2.4.18
host spews 12,000 - 14,000 ACCESS calls/sec, or roughly 30 times as
many.  Needless to say, the build finishes a whole lot quicker on
FreeBSD.  Does anybody know what I can do to make the linux client
cache ACCESS info?

Cheers,

Drew






Re: Fatal trap 12: page fault while in kernel mode

2002-04-04 Thread Peter Wemm

Terry Lambert wrote:
> David Greenman wrote:
> > >#16 0xc0152220 in tsleep ()
> > >#17 0xc016abfe in m_clalloc_wait ()
> > >#18 0xc01c8b14 in nfs_realign ()
> > >#19 0xc01c9653 in nfsrv_rcv ()
> > >#20 0xc01701d0 in sowakeup ()
> > >#21 0xc01abd7c in udp_input ()
> > >#22 0xc01a1bfb in ip_input ()
> > >#23 0xc01a1c5b in ipintr ()
> > 
> >This is basically telling you that there is a bug in the NFS code that is
> > incorrectly trying to do a "wait" type of allocation in an interrupt context,
> > which is not valid. You can't sleep when there is no process context.
> 
> Amusing.
> 
> Then the fix is probably to take the proc pointer of the
> proc whose socket is being used to do the call, which is
> the third argument to nfssvc_addsock(), and put it into
> the structure pointed to by "struct nfssvc_sock *" as the
> argument to the upcall.
> 
> Then, in the upcall code in nfsrv_rcv(), pass the proc
> pointer down as the process context.
> 
> I think, actually, that multiple sleeps by the same process
> are also disallowed (;^)), so probably...
> 
> 
> You will need to modify nfs_realign() to take a waitflag,
> as propagated from nfsrv_rcv()... and then pass it through
> on the MCLGET and the MGET, to make sure that if the alloc
> fails, that it's OK.
> 
> This does point out a problem in MCLGET() (the macro that
> wraps m_clalloc_wait()) wanting a process context.
> 
> Probably, the best thing would be to pass a proc p in, and
> if it's NULL, just imply no wait semantics.
> 
> What an ugly mess...

Terry, if you spent half of the time reading the code as speculating and
writing about your wild speculation, you'd know that we already have a
"waitflag" for nfsrv_rcv() to track safeness to wait or not.  The bug is that
nfs_realign doesn't take the 'waitflag' argument and has two 'can wait'
mbuf allocation calls.

The fix is trivial and hardly ugly.  But then again, anybody who actually
bothered to read the code before posting would know that.
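
The shape of that fix can be sketched in user-space C.  This is a model
under stated assumptions, not the actual patch: the pool, the
MODEL_WAIT/MODEL_DONTWAIT constants, and the function names are
hypothetical stand-ins, and a real kernel would sleep where the model
only counts.

```c
enum { MODEL_DONTWAIT = 0, MODEL_WAIT = 1 };

/* Hypothetical cluster pool standing in for the mbuf allocator. */
struct cluster_pool {
	int free_clusters;
	int would_sleep;   /* times an allocation would have slept */
};

/* Take one cluster; return 1 on success, 0 if none is available. */
int
pool_get(struct cluster_pool *p, int how)
{
	if (p->free_clusters > 0) {
		p->free_clusters--;
		return 1;
	}
	if (how == MODEL_WAIT)
		p->would_sleep++;  /* only legal with a process context */
	return 0;
}

/* Model of a realign step with two allocations honoring waitflag. */
int
realign_model(struct cluster_pool *p, int waitflag)
{
	if (!pool_get(p, waitflag))
		return -1;
	if (!pool_get(p, waitflag)) {
		p->free_clusters++;  /* undo the first allocation */
		return -1;
	}
	return 0;
}
```

Called with MODEL_DONTWAIT from the interrupt path, the model fails
cleanly when the pool is empty instead of attempting to sleep.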

Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
"All of this is for nothing if we don't go to the stars" - JMS/B5





Re: Fatal trap 12: page fault while in kernel mode

2002-04-03 Thread Terry Lambert

Peter Wemm wrote:
> > You will need to modify nfs_realign() to take a waitflag,
 +++
> > as propagated from nfsrv_rcv()... and then pass it through
   *

> Terry, if you spent half of the time reading the code as speculating and
> writing about your wild speculation, you'd know that we already have a
> "waitflag" for nfsrv_rcv() to track safeness to wait or not.

If you had read the above, you'd see I knew that.  Note the
asterisk marked phrase.

> The bug is that
> nfs_realign doesn't take the 'waitflag' argument and has two 'can wait'
> mbuf allocation calls.

I said that, too.  Note the plus sign marked phrase.  8-).

> The fix is trivial and hardly ugly.  But then again, anybody who actually
> bothered to read the code before posting would know that.

It was a general comment on the NFS code.

You suggested exactly the same fix I did...

-- Terry




Re: Fatal trap 12: page fault while in kernel mode

2002-04-03 Thread Terry Lambert

David Greenman wrote:
> >#16 0xc0152220 in tsleep ()
> >#17 0xc016abfe in m_clalloc_wait ()
> >#18 0xc01c8b14 in nfs_realign ()
> >#19 0xc01c9653 in nfsrv_rcv ()
> >#20 0xc01701d0 in sowakeup ()
> >#21 0xc01abd7c in udp_input ()
> >#22 0xc01a1bfb in ip_input ()
> >#23 0xc01a1c5b in ipintr ()
> 
>This is basically telling you that there is a bug in the NFS code that is
> incorrectly trying to do a "wait" type of allocation in an interrupt context,
> which is not valid. You can't sleep when there is no process context.

Amusing.

Then the fix is probably to take the proc pointer of the
proc whose socket is being used to do the call, which is
the third argument to nfssvc_addsock(), and put it into
the structure pointed to by "struct nfssvc_sock *" as the
argument to the upcall.

Then, in the upcall code in nfsrv_rcv(), pass the proc
pointer down as the process context.

I think, actually, that multiple sleeps by the same process
are also disallowed (;^)), so probably...


You will need to modify nfs_realign() to take a waitflag,
as propagated from nfsrv_rcv()... and then pass it through
on the MCLGET and the MGET, to make sure that if the alloc
fails, that it's OK.

This does point out a problem in MCLGET() (the macro that
wraps m_clalloc_wait()) wanting a process context.

Probably, the best thing would be to pass a proc p in, and
if it's NULL, just imply no wait semantics.

What an ugly mess...

-- Terry




Re: Fatal trap 12: page fault while in kernel mode

2002-04-03 Thread David Greenman

>Fatal trap 12: page fault while in kernel mode
>fault virtual address  = 0x70

>#12 0xc014f61d in panic ()
>#13 0xc025c02f in trap_fatal ()
>#14 0xc025bcdd in trap_pfault ()
>#15 0xc025b883 in trap ()
>#16 0xc0152220 in tsleep ()
>#17 0xc016abfe in m_clalloc_wait ()
>#18 0xc01c8b14 in nfs_realign ()
>#19 0xc01c9653 in nfsrv_rcv ()
>#20 0xc01701d0 in sowakeup ()
>#21 0xc01abd7c in udp_input ()
>#22 0xc01a1bfb in ip_input ()
>#23 0xc01a1c5b in ipintr ()

   This is basically telling you that there is a bug in the NFS code that is
incorrectly trying to do a "wait" type of allocation in an interrupt context,
which is not valid. You can't sleep when there is no process context.
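
The small fault address (0x70) fits this reading if the sleep path reads
a field through a process pointer that is NULL when there is no process
context.  That link is an assumption, not something the trace proves.
The structure below is hypothetical: its padding is chosen only so that
one field sits at offset 0x70.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the padding only places the field at 0x70. */
struct proc_model {
	unsigned char pad[0x70];
	int field_read_by_sleep;
};

/* Address touched when the field is read through a given base pointer. */
uintptr_t
field_address(uintptr_t base)
{
	return base + offsetof(struct proc_model, field_read_by_sleep);
}
```

With a base of 0, the address touched equals the field offset, which is
why faults at very low addresses usually point to a NULL structure
pointer rather than to a page that was swapped out.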

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
President, TeraSolutions, Inc. - http://www.terasolutions.com
President, Download Technologies, Inc. - http://www.downloadtech.com
Pave the road of life with opportunities.




Re: Fatal trap 12: page fault while in kernel mode

2002-04-03 Thread Terry Lambert

Will Froning wrote:
> #12 0xc014f61d in panic ()
> #13 0xc025c02f in trap_fatal ()
> #14 0xc025bcdd in trap_pfault ()
> #15 0xc025b883 in trap ()
> #16 0xc0152220 in tsleep ()
> #17 0xc016abfe in m_clalloc_wait ()

The tsleep tried to reference a page that wasn't there.  This
supposedly can't happen.  Here is the tsleep:

caddr_t
m_clalloc_wait(void)
{
        ...
        /* Sleep until something's available or until we expire. */
        m_clalloc_wid++;
        if ((tsleep(&m_clalloc_wid, PVM, "mclalc", mbuf_wait)) ==
            EWOULDBLOCK)
                m_clalloc_wid--;

The m_clalloc_wid is a global variable, so it's not swapped out.

The mbuf_wait is a tunable; it defaults to 32.  You might want to
try tuning this higher... making it wait longer... or setting it
to 0 -- making it wait forever.  This would work around, or eliminate,
your problem.

That the thing panics implies to me that the page that it references
got swapped out from under it (or freed).

The call happens when you are in an extremely low memory condition,
out of mbufs, and then you try to allocate more.  The wakeup on
the free does not guarantee that you will get the resource, in
a resource-starved state.  This would be much better served by a
wakeup_one() instead of a wakeup(), though wakeup_one() will not keep the
process being awakened from losing the race for the allocation, if
someone else comes in for the allocation at the same time, through
another path (e.g. handling a network card interrupt allocation for
an mbuf).

The best answer is probably to say: "add mbufs".  This will keep
you from hitting the starvation problem here.  Fixing the other
issues is a deeper problem (the second panic is also related to
memory, that time for a lock).

Other than that, you will have to do some tracking down, if
you really want it to fail gracefully... one option is make
m_clalloc_wait actually frigging wait.  It was an incredible
error when "_wait" no longer meant "wait", but instead turned
into "wait for a bit, then fail spectacularly".


-- Terry




Fatal trap 12: page fault while in kernel mode

2002-04-03 Thread Will Froning

I have a 4.5-RELEASE-p2 box that is my Firewall/NAT/NFS server.  As a
NFS client I have a RH7.2 linux box.  When I do massive NFS writes to
my FBSD (from RH7.2 box), I get a panic.  I've attached the info I got
from my debug kernel.

If there is more info you need, just ask (this is all the FBSD faq
said to include).

Thanks,
Will

==

GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...(no debugging symbols found)...
IdlePTD at phsyical address 0x0036d000
initial pcb at physical address 0x002d6fa0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x70
fault code  = supervisor read, page not present
instruction pointer = 0x8:0xc0152220
stack pointer   = 0x10:0xc02ade58
frame pointer   = 0x10:0xc02ade7c
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = Idle
interrupt mask  = net tty bio cam
trap number = 12
panic: page fault

syncing disks...

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x30
fault code  = supervisor read, page not present
instruction pointer = 0x8:0xc01f0820
stack pointer   = 0x10:0xc02adc80
frame pointer   = 0x10:0xc02adc88
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = Idle
interrupt mask  = net tty bio cam
trap number = 12
panic: page fault
Uptime: 3d17h3m28s

dumping to dev #ad/0x20001, offset 399488
dump ata0: resetting devices .. done
191 190 189 188 187 186 185 184 183 182 181 180 179 178 177 176 175 174 173 172 171 
170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154 153 152 151 150 
149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 
128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 
107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 
81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 
52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
---
#0  0xc014f40e in dumpsys ()
(kgdb) where
#0  0xc014f40e in dumpsys ()
#1  0xc014f223 in boot ()
#2  0xc014f61d in panic ()
#3  0xc025c02f in trap_fatal ()
#4  0xc025bcdd in trap_pfault ()
#5  0xc025b883 in trap ()
#6  0xc01f0820 in acquire_lock ()
#7  0xc01f4884 in softdep_update_inodeblock ()
#8  0xc01ef969 in ffs_update ()
#9  0xc01f7d1e in ffs_sync ()
#10 0xc017f69b in sync ()
#11 0xc014efd6 in boot ()
#12 0xc014f61d in panic ()
#13 0xc025c02f in trap_fatal ()
#14 0xc025bcdd in trap_pfault ()
#15 0xc025b883 in trap ()
#16 0xc0152220 in tsleep ()
#17 0xc016abfe in m_clalloc_wait ()
#18 0xc01c8b14 in nfs_realign ()
#19 0xc01c9653 in nfsrv_rcv ()
#20 0xc01701d0 in sowakeup ()
#21 0xc01abd7c in udp_input ()
#22 0xc01a1bfb in ip_input ()
#23 0xc01a1c5b in ipintr ()
(kgdb) quit

-- 
Will Froning
Unix Sys. Admin.
[EMAIL PROTECTED]





Re: fatal trap 12: page fault while in kernel mode

2000-05-26 Thread Bosko Milekic


On Fri, 26 May 2000, Greg Skouby wrote:

> Hello,
> 
> I posted a message to -questions yesterday about a machine that had the
> /dev directory somewhat corrupt. I could ls -la /dev/wd0*, but when I was
> in the /dev directory and did an ls, it was not showing any of the files.
> Now, today the machine was rebooting over and over again, freezing with
> this message:
> 
> 
> fatal trap 12: page fault while in kernel mode
> 
> fault virtual address = 0xc33a3c6d
> 
> fault code = supervisor read, page not present
> 
> Instruction Pointer  = 0x8:0xc022798F
> 

You have to post more information. For example, what is at the
  location pointed at by the instruction pointer? Get a stack trace, if
  possible (from the debugger), and any other relevant info., most of which
  is explained in the Handbook.
  

--
 Bosko Milekic * pages.infinit.net/bmilekic/index.html * www.technokratis.com
 [EMAIL PROTECTED] * [EMAIL PROTECTED] * [EMAIL PROTECTED]







fatal trap 12: page fault while in kernel mode

2000-05-26 Thread Greg Skouby

Hello,

I posted a message to -questions yesterday about a machine that had the
/dev directory somewhat corrupt. I could ls -la /dev/wd0*, but when I was
in the /dev directory and did an ls, it was not showing any of the files.
Now, today the machine was rebooting over and over again, freezing with
this message:


fatal trap 12: page fault while in kernel mode

fault virtual address = 0xc33a3c6d

fault code = supervisor read, page not present

Instruction Pointer  = 0x8:0xc022798F

Stack Pointer = 0x 10: 0xc5dc6988

code segment = base 0 x0, limit 0xf type 0x1b
 = DPL 0, pres 1, def32 1, gran 1

processor eflags = interrupt enabled, resume, IOPL =0

current process = 5 (init)

interrupt mask =

trap number = 12

panic: page fault

syncing disk 1 1 1 1 1 1 1 1 1 giving up

rebooting in 15 seconds



It does this over and over again. I am running 3.3-R. Is it a memory
problem?
Thanks for any help or hints.



