Re: Fatal trap 12: page fault while in kernel mode

2005-12-25 Thread kamal kc
Thanks,

I will try the INVARIANTS and WITNESS options and will try to get
FreeBSD 6.0.  I won't be able to do this until tomorrow, because it
is already evening and I will only be back in my office tomorrow.

In the meantime, if memory corruption is the problem, is there any
option or configuration I could use to make the kernel stop, print a
message, or panic at the moment the corruption takes place, rather
than some time later when another program is affected by it?

That way I could locate any bug in my code, if present.

Thanks,
kamal
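Guard-page allocators are built to answer exactly this question. Each allocation gets its own page(s), and on free the allocator revokes access to those pages instead of recycling them, so the first stale read or write faults on the spot rather than corrupting a later user. As far as we know, FreeBSD's memguard(9), the MEMGUARD option John Baldwin suggests elsewhere in this archive, applies the same idea inside the kernel. The userspace sketch below assumes a POSIX system with mmap(2) and mprotect(2). guard_alloc() and guard_free() are hypothetical names, not a kernel interface.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#if !defined(MAP_ANON) && defined(MAP_ANONYMOUS)
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Round a request up to whole pages; each allocation gets pages of its own. */
static size_t guard_len(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size + page - 1) / page * page;
    return len != 0 ? len : page;
}

/* Hypothetical guard_alloc(): fresh, page-aligned, readable/writable memory. */
void *guard_alloc(size_t size)
{
    void *p = mmap(NULL, guard_len(size), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical guard_free(): rather than recycling the pages, revoke all
 * access to them.  Any later access through a stale pointer raises SIGSEGV
 * at the faulting instruction.  Returns 0 on success, -1 on error. */
int guard_free(void *p, size_t size)
{
    assert(p != NULL);
    return mprotect(p, guard_len(size), PROT_NONE);
}
```

The cost is at least one page per allocation, plus address space that this sketch never gives back. For that reason, this kind of checking is normally aimed at one suspect allocation type rather than at everything. Because the object starts at the beginning of its page, a small overrun past its end still lands in the same page and goes unnoticed. Placing the object at the end of the page instead would catch overruns.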
 

Xin LI <[EMAIL PROTECTED]> wrote:
Hi,

On 12/25/05, kamal kc  wrote:
[...]
> Is the problem related to memory leaks or sleeping
> on mutexes or some other causes.

From the backtrace you have provided, it looks like memory
corruption.  To aid your debugging, you will want INVARIANTS
and WITNESS, etc. to be enabled.  Also, if feasible, please consider
using code from -CURRENT or at least RELENG_6_0, as it has more
debugging aids that are likely to catch bugs early.

Cheers,
--
Xin LI  http://www.delphij.net
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"






Re: Fatal trap 12: page fault while in kernel mode

2005-12-25 Thread Xin LI
Hi,

On 12/25/05, kamal kc <[EMAIL PROTECTED]> wrote:
[...]
> Is the problem related to memory leaks or sleeping
> on mutexes or some other causes.

From the backtrace you have provided, it looks like memory
corruption.  To aid your debugging, you will want INVARIANTS
and WITNESS, etc. to be enabled.  Also, if feasible, please consider
using code from -CURRENT or at least RELENG_6_0, as it has more
debugging aids that are likely to catch bugs early.
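One thing INVARIANTS does is compile in the kernel's KASSERT() consistency checks, which vanish entirely from a kernel built without it. The userspace model below imitates that pattern so the trade-off is easy to see. abort() stands in for panic(), and `struct obj` / `obj_release()` are made-up examples, not kernel code.

```c
#include <assert.h>   /* assert(3) is controlled by NDEBUG; KASSERT below is not */
#include <stdio.h>
#include <stdlib.h>

/* Model of the kernel's KASSERT(exp, msg): compiled in only when INVARIANTS
 * is defined (e.g. cc -DINVARIANTS), the way kernel checks are enabled by
 * "options INVARIANTS".  msg is a parenthesised printf argument list, as in
 * the kernel macro; abort() stands in for panic(). */
#ifdef INVARIANTS
#define KASSERT(exp, msg) do {          \
        if (!(exp)) {                   \
            printf msg;                 \
            putchar('\n');              \
            abort();                    \
        }                               \
    } while (0)
#else
#define KASSERT(exp, msg) do { } while (0)
#endif

/* Hypothetical reference-counted object used to exercise the macro. */
struct obj {
    int refcount;
};

/* Drop one reference and return the new count. */
int obj_release(struct obj *o)
{
    KASSERT(o->refcount > 0, ("obj_release: refcount %d", o->refcount));
    return --o->refcount;
}
```

Built with -DINVARIANTS, releasing an object whose count is already zero stops immediately and prints the message. Built without it, the count silently goes negative and the damage surfaces later, somewhere else. That difference is why you run with INVARIANTS while debugging.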

Cheers,
--
Xin LI <[EMAIL PROTECTED]> http://www.delphij.net


Re: Fatal trap 12: page fault while in kernel mode

2005-12-12 Thread John Baldwin
On Wednesday 07 December 2005 05:09 pm, Danilo Asara wrote:
> [EMAIL PROTECTED] [~]$ uname -a
> FreeBSD resolza.fastwebnet.it 6.0-STABLE FreeBSD 6.0-STABLE #0: Fri
> Nov18 11:19:38 CET
> [EMAIL PROTECTED]:/usr/obj/usr/src/sys/RESOLZA  i386
> [EMAIL PROTECTED] [~]$
>
>
> [EMAIL PROTECTED] [/usr/crash]# kgdb kernel.debug.0 vmcore.0
> [GDB will not be able to debug user-mode
> threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you
> are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for
> details.
> This GDB was configured as "i386-marcel-freebsd".
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x0
> fault code  = supervisor read, page not present
> instruction pointer = 0x20:0xc0500411
> stack pointer   = 0x28:0xef58fcac
> frame pointer   = 0x28:0xef58fcdc
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 722 (artsd)
> trap number = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> kdb_backtrace(100,c2a83a80,28,ef58fc6c,c) at kdb_backtrace+0x29
> panic(c06b2fec,c06d9f5b,0,f,c09b) at panic+0x114
> trap_fatal(ef58fc6c,0,c2a83a80,c2890bb8,c) at trap_fatal+0x2ca
> trap_pfault(ef58fc6c,0,0) at trap_pfault+0x1d7
> trap(8,28,28,c2ea9e70,c2a83a80) at trap+0x2fd
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xc0500411, esp = 0xef58fcac, ebp = 0xef58fcdc ---
> kse_release(c2a83a80,ef58fd04,1,0,200292) at kse_release+0x165
> syscall(3b,3b,3b,80f2100,81) at syscall+0x2bf
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (383, FreeBSD ELF32, kse_release), eip = 0x287d81af, esp =
> 0xbf9fef30, ebp = 0xbf9fef8c ---
> Uptime: 12h9m20s
> Dumping 1023 MB (2 chunks)
>   chunk 0: 1MB (159 pages) ... ok
>   chunk 1: 1023MB (261872 pages) 1007 991 975 959 943 927 911 895 879
> 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591
> 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303
> 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15
>
> #0  doadump () at pcpu.h:165
> 165 pcpu.h: No such file or directory.
> in pcpu.h
> (kgdb) where
> #0  doadump () at pcpu.h:165
> #1  0xc05132bf in boot (howto=260)
> at /usr/src/sys/kern/kern_shutdown.c:399
> #2  0xc0513615 in panic (fmt=0xc06b2fec "%s")
> at /usr/src/sys/kern/kern_shutdown.c:555
> #3  0xc068d8ca in trap_fatal (frame=0xef58fc6c, eva=0)
> at /usr/src/sys/i386/i386/trap.c:831
> #4  0xc068d5d7 in trap_pfault (frame=0xef58fc6c, usermode=0, eva=0)
> at /usr/src/sys/i386/i386/trap.c:742
> #5  0xc068d1ed in trap (frame=
>   {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = -1024811408, tf_esi =
> -1029162368, tf_ebp = -279380772, tf_isp = -279380840, tf_ebx =
> -1026066384, tf_edx = -1029162368, tf_ecx = -1026066303, tf_eax = 0,
> tf_trapno = 12, tf_err = 0, tf_eip = -1068497903, tf_cs = 32, tf_eflags
> = 2687622, tf_esp = -1036728832, tf_ss = 30})
> at /usr/src/sys/i386/i386/trap.c:432
> #6  0xc067aaca in calltrap () at /usr/src/sys/i386/i386/exception.s:139
> #7  0xc0500411 in kse_release (td=0xc2a83a80, uap=0xef58fd04)
> at /usr/src/sys/kern/kern_kse.c:428

The fault is here, in kse_release() at kern_kse.c:428 (frame #7).  You can
try posting this to [EMAIL PROTECTED] and see if someone there can help
you debug this further.
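The trap report above gives `fault virtual address = 0x0` and `fault code = supervisor read, page not present`, and the trap frame shows `tf_err = 0`. On i386 the CPU pushes a page-fault error code whose low three bits encode exactly those three words. An all-zero code at address 0 is therefore the usual signature of kernel code reading through a NULL pointer, which is consistent with the fault in kse_release(). The sketch below decodes such a code. We believe the PGEX_* names follow FreeBSD's `<machine/pmap.h>`, but treat them and `describe_fault()` as illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* i386 page-fault error code bits (pushed by the CPU; tf_err in the frame). */
#define PGEX_P 0x01  /* 0: page not present, 1: protection violation */
#define PGEX_W 0x02  /* 0: read access,      1: write access */
#define PGEX_U 0x04  /* 0: supervisor mode,  1: user mode */

/* Render an error code in the style of the "fault code" line above. */
void describe_fault(unsigned err, char *buf, size_t len)
{
    snprintf(buf, len, "%s %s, %s",
             (err & PGEX_U) ? "user" : "supervisor",
             (err & PGEX_W) ? "write" : "read",
             (err & PGEX_P) ? "protection violation" : "page not present");
}
```

For example, `describe_fault(0, buf, sizeof buf)` yields "supervisor read, page not present", matching the report above.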

-- 
John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


Re: Fatal trap 12: page fault while in kernel mode

2005-12-07 Thread John Baldwin
On Wednesday 07 December 2005 02:47 am, Yuri Khotyaintsev wrote:
> On Friday 02 December 2005 14.54, John Baldwin wrote:
> > On Friday 02 December 2005 05:00 am, Yuri Khotyaintsev wrote:
> > > I have the following panic occurring several times a week. The machine
> > > is an NFS server, and it usually panics early in the morning, when the
> > > first people try to access it. After a reboot it may work OK for 1-2
> > > days, and then it panics again. I have tried changing the memory and
> > > replacing the disk that was exported via NFS, but nothing helped :(
> > >
> > > Any suggestion on how to fix this panic will be very much appreciated!
> >
> > This panic (in propagate_priority) is usually caused when a thread goes
> > to sleep while holding a mutex (which is forbidden).  If you enable
> > INVARIANTS and/or WITNESS you should get a better panic, and with WITNESS
> > you will even be warned when a thread goes to sleep while holding a
> > mutex.  However, these options do introduce considerable execution
> > overhead, and sometimes that overhead changes the timing enough to hide
> > the race. :(
>
> Here are the two panics which I got with INVARIANTS and WITNESS enabled.
>
> Unread portion of the kernel message buffer:
> Memory modified after free 0xc4759e00(508) val=0 @ 0xc4759e00
> panic: Most recently used by UFS dirhash

Well, this isn't the panic I was expecting, but it points to something 
trashing freed memory via a stale pointer or some such.  You might be able
to use MEMGUARD to track this down.
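The "Memory modified after free" panic quoted above comes from this kind of check. With INVARIANTS, freed items are filled with a known pattern, and the pattern is verified when the item is next allocated (mtrash_ctor() in uma_dbg.c, in Yuri's backtrace). In the report, the address after `@` equals the item's own address, so the item's first word had been overwritten (with 0). The check can only name the previous owner ("UFS dirhash"), not the code that wrote through the stale pointer. That is why a tool that faults at the moment of the write, such as MEMGUARD, is the next step. The sketch below models the fill-and-verify idea in userspace. The pattern and function names are illustrative, not the kernel's.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define JUNK UINT32_C(0xdeadc0de)   /* illustrative fill pattern */

/* On free: overwrite the whole item with the fill pattern. */
void trash_on_free(void *mem, size_t size)
{
    uint32_t *p = mem;

    assert(size % sizeof *p == 0);  /* sketch handles whole words only */
    for (size_t i = 0; i < size / sizeof *p; i++)
        p[i] = JUNK;
}

/* On the next allocation: every word must still hold the pattern.
 * Returns NULL if the item is intact, else the first modified word. */
uint32_t *check_on_alloc(void *mem, size_t size)
{
    uint32_t *p = mem;

    for (size_t i = 0; i < size / sizeof *p; i++) {
        if (p[i] != JUNK) {
            fprintf(stderr, "Memory modified after free %p(%zu) val=%" PRIx32
                    " @ %p\n", mem, size, p[i], (void *)&p[i]);
            return &p[i];
        }
    }
    return NULL;
}
```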

-- 
John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


Re: Fatal trap 12: page fault while in kernel mode

2005-12-07 Thread Yuri Khotyaintsev
On Friday 02 December 2005 14.54, John Baldwin wrote:
> On Friday 02 December 2005 05:00 am, Yuri Khotyaintsev wrote:
> > I have the following panic occurring several times a week. The machine is
> > an NFS server, and it usually panics early in the morning, when the first
> > people try to access it. After a reboot it may work OK for 1-2 days, and
> > then it panics again. I have tried changing the memory and replacing the
> > disk that was exported via NFS, but nothing helped :(
> >
> > Any suggestion on how to fix this panic will be very much appreciated!
>
> This panic (in propagate_priority) is usually caused when a thread goes to
> sleep while holding a mutex (which is forbidden).  If you enable INVARIANTS
> and/or WITNESS you should get a better panic, and with WITNESS you will
> even be warned when a thread goes to sleep while holding a mutex.  However,
> these options do introduce considerable execution overhead, and sometimes
> that overhead changes the timing enough to hide the race. :(

Here are the two panics which I got with INVARIANTS and WITNESS enabled.

# kgdb /usr/obj/usr/src/sys/HEM.DEBUG/kernel.debug vmcore.8 
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
Memory modified after free 0xc4759e00(508) val=0 @ 0xc4759e00
panic: Most recently used by UFS dirhash

Uptime: 11h8m36s
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (160 pages) ... ok
  chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351 335 
319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:165
#1  0xc050fd4f in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc0510043 in panic (fmt=0xc06dccbb "Most recently used by %s\n")
at /usr/src/sys/kern/kern_shutdown.c:555
#3  0xc0648ccf in mtrash_ctor (mem=0xc4759e00, size=0, arg=0x0, flags=2)
at /usr/src/sys/vm/uma_dbg.c:137
#4  0xc06469c1 in uma_zalloc_arg (zone=0xc104d980, udata=0x0, flags=2)
at /usr/src/sys/vm/uma_core.c:1850
#5  0xc05043cd in malloc (size=400, mtp=0xc06fb700, flags=2) at uma.h:275
#6  0xc063fba9 in ufs_readdir (ap=0xd56eaaec)
at /usr/src/sys/ufs/ufs/ufs_vnops.c:1846
#7  0xc06a61cc in VOP_READDIR_APV (vop=0x0, a=0xd56eaaec) at vnode_if.c:1427
#8  0xc0607716 in nfsrv_readdir (nfsd=0xc4368c00, slp=0x0, td=0xc3326780, 
mrq=0xd56eac80) at vnode_if.h:746
#9  0xc060fa5b in nfssvc_nfsd (td=0x0)
at /usr/src/sys/nfsserver/nfs_syscalls.c:472
#10 0xc060f280 in nfssvc (td=0xc3326780, uap=0xd56ead04)
at /usr/src/sys/nfsserver/nfs_syscalls.c:181
#11 0xc069b6b0 in syscall (frame=
  {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 0, tf_esi = 0, tf_ebp = 
-1077941464, tf_isp = -714166940, tf_ebx = 0, tf_edx = -1077936144, tf_ecx = 
1, tf_eax = 155, tf_trapno = 12, tf_err = 2, tf_eip = 671852067, tf_cs = 51, 
tf_eflags = 582, tf_esp = -1077941492, tf_ss = 59}) 
at /usr/src/sys/i386/i386/trap.c:981
#12 0xc068947f in Xint0x80_syscall () 
at /usr/src/sys/i386/i386/exception.s:200
#13 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) quit

# kgdb /usr/obj/usr/src/sys/HEM.DEBUG/kernel.debug vmcore.9
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
Memory modified after free 0xc5172800(508) val=0 @ 0xc5172800
panic: Most recently used by UFS dirhash

Uptime: 1d1h7m17s
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (160 pages) ... ok
  chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351 335 
319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:165
#1  0xc050fd4f in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc0510043 in panic (fmt=0xc06dccbb "Most recently used by %s\n")
at /usr/src/sys/kern/kern_shutdown.c:555
#3  0xc0648ccf in mtrash_ctor (mem=0xc5172800, 

Re: Fatal trap 12: page fault while in kernel mode

2005-12-02 Thread Yuri Khotyaintsev
On Friday 02 December 2005 14.54, John Baldwin wrote:
> On Friday 02 December 2005 05:00 am, Yuri Khotyaintsev wrote:
> > I have the following panic occurring several times a week. The machine is
> > an NFS server, and it usually panics early in the morning, when the first
> > people try to access it. After a reboot it may work OK for 1-2 days, and
> > then it panics again. I have tried changing the memory and replacing the
> > disk that was exported via NFS, but nothing helped :(
> >
> > Any suggestion on how to fix this panic will be very much appreciated!
>
> This panic (in propagate_priority) is usually caused when a thread goes to
> sleep while holding a mutex (which is forbidden).  If you enable INVARIANTS
> and/or WITNESS you should get a better panic, and with WITNESS you will
> even be warned when a thread goes to sleep while holding a mutex.  However,
> these options do introduce considerable execution overhead, and sometimes
> that overhead changes the timing enough to hide the race. :(

I am compiling a new kernel with INVARIANTS and WITNESS now. Will wait for a 
"better" panic ;-)

-- 
Dr. Yuri Khotyaintsev
Institutet för rymdfysik (IRF), Uppsala


Re: Fatal trap 12: page fault while in kernel mode

2005-12-02 Thread John Baldwin
On Friday 02 December 2005 05:00 am, Yuri Khotyaintsev wrote:
> I have the following panic occurring several times a week. The machine is
> an NFS server, and it usually panics early in the morning, when the first
> people try to access it. After a reboot it may work OK for 1-2 days, and
> then it panics again. I have tried changing the memory and replacing the
> disk that was exported via NFS, but nothing helped :(
>
> Any suggestion on how to fix this panic will be very much appreciated!

This panic (in propagate_priority) is usually caused when a thread goes to 
sleep while holding a mutex (which is forbidden).  If you enable INVARIANTS 
and/or WITNESS you should get a better panic, and with WITNESS you will even 
be warned when a thread goes to sleep while holding a mutex.  However, these 
options do introduce considerable execution overhead, and sometimes that 
overhead changes the timing enough to hide the race. :(
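WITNESS's warning for this case rests on simple bookkeeping: count the locks a thread holds, and complain if it tries to sleep while the count is non-zero. The sketch below shows only that part; WITNESS also checks lock ordering, among other things. It is single-threaded for clarity, whereas a real checker keeps the count per thread. `tl_acquire()`, `tl_release()` and `may_sleep()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tracked lock (single-threaded sketch; a real checker keeps
 * this bookkeeping per thread). */
struct tracked_lock {
    const char *name;
    int held;
};

static int nlocks_held;   /* tracked locks currently held */

void tl_acquire(struct tracked_lock *l)
{
    assert(!l->held);     /* recursion is not allowed in this sketch */
    l->held = 1;
    nlocks_held++;
}

void tl_release(struct tracked_lock *l)
{
    assert(l->held);
    l->held = 0;
    nlocks_held--;
}

/* Call at the top of every would-be sleep: returns 1 if sleeping is
 * allowed, or warns and returns 0 if any tracked lock is still held. */
int may_sleep(const char *where)
{
    if (nlocks_held > 0) {
        fprintf(stderr, "%s: sleeping with %d lock(s) held\n",
                where, nlocks_held);
        return 0;
    }
    return 1;
}
```

A check like this fires at the moment of the mistake. Without it, the problem typically shows up later in propagate_priority(), when another thread blocks on a mutex whose owner is asleep. The price, as noted above, is extra work on every lock operation.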

-- 
John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


Re: Fatal trap 12: page fault while in kernel mode - HELP!

2005-02-09 Thread Robert Watson
On Tue, 8 Feb 2005, ALeine wrote:

> [EMAIL PROTECTED] wrote: 
> 
> > We have two NFS file servers running FreeBSD 5.2.1-RELEASE-p9 on
> > Dell PowerEdge 1750s that crash randomly. They each have about 1.3TB
> > of disk. They are used to serve email and web content to several
> > WEB/EMAIL servers. Following are the console log messages and the kernel boot
> > messages. Any ideas as to what the problem may be?
> 
> Try turning TCP SACK off by putting net.inet.tcp.sack.enable=0 in
> sysctl.conf. 

TCP SACK first shipped in 5.3...

Robert N M Watson




Re: Fatal trap 12: page fault while in kernel mode - HELP!

2005-02-09 Thread ALeine
[EMAIL PROTECTED] wrote: 

> We have two NFS file servers running FreeBSD 5.2.1-RELEASE-p9 on
> Dell PowerEdge 1750s that crash randomly. They each have about 1.3TB
> of disk. They are used to serve email and web content to several
> WEB/EMAIL servers. Following are the console log messages and the kernel boot
> messages. Any ideas as to what the problem may be?

Try turning TCP SACK off by putting net.inet.tcp.sack.enable=0 in sysctl.conf.

ALeine



Re: Fatal trap 12: page fault while in kernel mode - HELP!

2005-02-09 Thread Robert Watson

On Wed, 9 Feb 2005, David Rice wrote:

> We have two NFS file servers running FreeBSD 5.2.1-RELEASE-p9 on Dell
> PowerEdge 1750s that crash randomly. They each have about 1.3TB of
> disk. They are used to serve email and web content to several WEB/EMAIL
> servers.  Following are the console log messages and the kernel boot
> messages. Any ideas as to what the problem may be? 

I guess there's no chance of updating to FreeBSD 5.3?  It has a lot of
cleanups, fixes, etc, some of which weren't appropriate to backport to
RELENG_5_2.

Robert N M Watson



Re: Fatal trap 12: page fault while in kernel mode

2004-08-23 Thread Kevin Brunelle
> If this is how I got most of my panics, this little script running in
> two different xterms helped decrease the time to panic.  It got my
> system to panic a lot with the older nvidia drivers.

[script trimmed out]

> This always helped get my system unstable on 4-STABLE rather quickly.  I
> think it was the issue of running two or more GL programs at the same
> time that caused or increased the problem.

lol, I might try that.  Although I really don't need to go that far. 
Lately, I have been able to spontaneously reboot by running five GL
applications at once.  Which isn't pleasant but doesn't concern me too
much.  Each time I've had a panic there has been only one gl application
running... and lately all GL programs are causing this issue.

> Are you using the latest nvidia drivers?

As a matter of fact, that is what I think caused the problem.  I just
upgraded to the latest drivers on the 19th... right before I had these
problems.  That combined with the fact that all of these issues can be
consistently caused by running gl programs gives me strong cause to
suspect it.

> You should not be mixing the FreeBSD AGP and the nvidia AGP together.
> Choose one or the other.

Yes, I suspect this might be part of the issue.  I don't remember seeing
this message before the new driver was installed.  But I do know that
the old kernel had it loaded (it was hard coded with the configuration
file).  I think the driver might have changed the way it handled the
presence of both AGPs.

> I have my own panic on 4-STABLE which I just reported in freebsd-stable:
> http://lists.freebsd.org/pipermail/freebsd-stable/2004-August/008530.html
> Would you like to trade?  :)

lol, I would love to... if I thought I could help.  But I am still
learning as much as I can about the kernel... nowhere near the level
required to help.

Kevin
-- 
"Down with disease, up before the dawn.
A thousand barefoot children, dancin' on my lawn"
-Phish "Down with Disease"



Re: Fatal trap 12: page fault while in kernel mode

2004-08-23 Thread Sean Farley
On Mon, 23 Aug 2004, Kevin Brunelle wrote:
> Alright, this is driving me nuts.  For a little while there I could
> not get the system to panic -- it would spontaneously reboot when
> running a GL program instead of panicking.  This afternoon it finally
> panic'd (who would think that would be something I want to see, but it
> was).

If this is how I got most of my panics, this little script running in
two different xterms helped decrease the time to panic.  It got my
system to panic a lot with the older nvidia drivers.

#!/usr/local/bin/zsh
# Try with and without.
export __GL_SINGLE_THREADED=1
/bin/rm -f glxinfo.core
while [ 1 = 1 ]; do
  /usr/X11R6/bin/glxinfo >& /dev/null
  if [ -e glxinfo.core ]; then
    echo "Core found."
    /bin/rm -f glxinfo.core
  fi
done

This always helped get my system unstable on 4-STABLE rather quickly.  I
think it was the issue of running two or more GL programs at the same
time that caused or increased the problem.

Are you using the latest nvidia drivers?

> The error this time was a double fault (are we playing tennis?).  My
> original issue was with a page fault in kernel mode.  And my original
> problem also was related to a different function.  The function this
> time is .

My panics were fairly random.

> Take a look at all those sig-11s.  I would suspect bad memory but I
> ran memtest86+ on this machine less than a week ago and everything was
> fine -- not even a whiff of a problem.  I caused this panic by running
> another gl application and I feel it is related to my original problem.

I also ran memtest86 for over a day without finding fault in the memory.
The sad thing is that almost any type of bad hardware can cause
stability issues.  At least this is what I was told.  Maybe the caps on
your system have started going bad?

> Another thing that interested me is that the kernel dump seems
> "corrupted" or incomplete... does the line "---Can't read userspace
> from dump, or kernel process---" possibly imply that I did not get a
> good dump at the time of the panic?
>
> If anyone has any ideas about what to fix I would love to hear them.
> I am tempted to change a few things myself that might be an issue (for
> example, removing the FreeBSD agp which nvidia complains about in my
> dmesg -- and also upgrading to  3-Beta1 ... so at least my kernel
> panics will relate to making that system better).  But, until I know
> that this is a dead end and no one wants to see anything, I am not
> touching anything.  I don't want to ruin the chances of this being a
> real bug and it not being fixed because I change something that just
> hides it.

You should not be mixing the FreeBSD AGP and the nvidia AGP together.
Choose one or the other.

> If you want me to get any information from the dump or try anything
> please let me know.  You may have to tell me how to go about doing
> stuff with gdb (I am not very experienced with its advanced features)
> but I am willing to learn and do what I can.

I have my own panic on 4-STABLE which I just reported in freebsd-stable:
http://lists.freebsd.org/pipermail/freebsd-stable/2004-August/008530.html
Would you like to trade?  :)
Sean
---
[EMAIL PROTECTED]


Re: Fatal trap 12: page fault while in kernel mode

2004-08-23 Thread Kevin Brunelle
Alright, this is driving me nuts.  For a little while there I could not
get the system to panic -- it would spontaneously reboot when running a
GL program instead of panicking.  This afternoon it finally panic'd (who
would think that would be something I want to see, but it was).

I am attaching the transcript of me playing around with it.  It includes
the panic message as well as some debug output from gdb.  Although I am
not certain that is as helpful as I hoped it would be.  At the very end
I have included yet another uname -a and copy of my kernel configuration
file.

The error this time was a double fault (are we playing tennis?).  My
original issue was with a page fault in kernel mode.  And my original
problem also was related to a different function.  The function this
time is .

Take a look at all those sig-11s.  I would suspect bad memory but I ran
memtest86+ on this machine less than a week ago and everything was fine
-- not even a whiff of a problem.  I caused this panic by running
another gl application and I feel it is related to my original problem.

Another thing that interested me is that the kernel dump seems
"corrupted" or incomplete... does the line "---Can't read userspace from
dump, or kernel process---" possibly imply that I did not get a good
dump at the time of the panic?

If anyone has any ideas about what to fix I would love to hear them.  I
am tempted to change a few things myself that might be an issue (for
example, removing the FreeBSD agp which nvidia complains about in my
dmesg -- and also upgrading to  3-Beta1 ... so at least my kernel panics
will relate to making that system better).  But, until I know that this
is a dead end and no one wants to see anything, I am not touching
anything.  I don't want to ruin the chances of this being a real bug and
it not being fixed because I change something that just hides it.

If you want me to get any information from the dump or try anything
please let me know.  You may have to tell me how to go about doing stuff
with gdb (I am not very experienced with its advanced features) but I am
willing to learn and do what I can.

-Kevin
-- 
"Down with disease, up before the dawn.
A thousand barefoot children, dancin' on my lawn"
-Phish "Down with Disease"
Script started on Mon Aug 23 16:14:53 2004
/home/kevinb/crash# gdb -k kernel.debug vmcore.1
GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
panic: swp_pager_meta_free_all: failed to locate all swap meta blocks
panic messages:
---
panic: double fault

syncing disks, buffers remaining... 2177 2177 Copyright (c) 1992-2004 The FreeBSD 
Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.2.1-RELEASE-p9 #0: Sun Aug 22 14:00:38 EDT 2004
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/FOOKERN
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0ce4000.
Preloaded elf module "/boot/modules/nvidia.ko" at 0xc0ce4244.
Preloaded elf module "/boot/kernel/linux.ko" at 0xc0ce42f0.
Preloaded elf module "/boot/kernel/acpi.ko" at 0xc0ce439c.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel Pentium III (863.87-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x683  Stepping = 3
  Features=0x383f9ff
real memory  = 268173312 (255 MB)
avail memory = 246661120 (235 MB)
Pentium Pro MTRR support enabled
npx0: [FAST]
npx0:  on motherboard
npx0: INT 16 interface
acpi0:  on motherboard
pcibios: BIOS version 2.10
Using $PIR table, 12 entries at 0xc00f2d00
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
acpi_cpu0:  port 0x530-0x537 on acpi0
acpi_button0:  on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pcib0: slot 31 INTD is routed to irq 10
pcib0: slot 31 INTB is routed to irq 9
agp0:  mem 0xf800-0xfbff at device 
0.0 on pci0
pcib1:  at device 1.0 on pci0
pci2:  on pcib1
pcib0: slot 1 INTA is routed to irq 11
pcib1: slot 0 INTA is routed to irq 11
nvidia0:  mem 
0xf200-0xf3ff,0xfd00-0xfdff irq 11 at device 0.0 on pci2
pcib2:  at device 30.0 on pci0
pci1:  on pcib2
pcib2: slot 9 INTA is routed to irq 3
pcib2: slot 12 INTA is routed to irq 9
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xdc00-0xdc7f mem 0xfc9ff800-0xfc9ff87f 
irq 3 at device 9.0 on pci1
xl0: Ethernet address: 00:01:03:23:9d:ba
miibus0:  on xl0
xlphy0: <3c905C 10/100 internal PHY> on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcm0:  port 0xdf00-0xdf3f irq 9 at device 12.0 on pci1
pcm0: 
isab0:  at device 31.0 on pci0
isa0:  on isab0
ata

Re: Fatal trap 12: page fault while in kernel mode

2004-08-22 Thread Brian Fundakowski Feldman
On Sun, Aug 22, 2004 at 11:22:43AM -0400, Kevin Brunelle wrote:
> Okay,
> 
> Replication does not look like it will be an issue.  Again, the system
> panic'd while running a gl application when I was at work.  This time I
> did get a core dump (but I still don't have a debugging kernel -- it was
> building ARG).
> 
> Right now, I am going to disable my screensaver and carefully avoid
> applications which might cause the panic again.  Once the proper kernel
> is in place... then it is go-time.
> 
> If anyone is interested I am going to save the dump -- but it probably
> is worth the wait (in saved effort) till I have a proper kernel in
> place.  I am almost 100% sure this is due to the nvidia drivers -- I
> upgraded on the 19th and never had a problem before this... that and gl
> programs seem to be the cause of both crashes so far.

Andreas, are you using the nvidia driver too?

-- 
Brian Fundakowski Feldman   \'[ FreeBSD ]''\
  <> [EMAIL PROTECTED]   \  The Power to Serve! \
 Opinions expressed are my own.   \,,\


Re: Fatal trap 12: page fault while in kernel mode

2004-08-22 Thread Kevin Brunelle
Okay,

Replication does not look like it will be an issue.  Again, the system
panic'd while running a gl application when I was at work.  This time I
did get a core dump (but I still don't have a debugging kernel -- it was
building ARG).

Right now, I am going to disable my screensaver and carefully avoid
applications which might cause the panic again.  Once the proper kernel
is in place... then it is go-time.

If anyone is interested I am going to save the dump -- but it probably
is worth the wait (in saved effort) till I have a proper kernel in
place.  I am almost 100% sure this is due to the nvidia drivers -- I
upgraded on the 19th and never had a problem before this... that and gl
programs seem to be the cause of both crashes so far.

Kevin
-- 
"Down with disease, up before the dawn.
A thousand barefoot children, dancin' on my lawn"
-Phish "Down with Disease"



IP fragmentation (was Re: Fatal trap 12: page fault while in kernel mode)

2002-04-05 Thread Bruce A. Mah

[Moving to -net]

If memory serves me right, Andrew Gallatin wrote:

>  > Alternately, it would be a good idea to have a "ip_maxpacketfrags"
>  > instead of an "ip_maxfragpackets", to put a hard limit on the
>  > number of mbufs that can be consumed by the fragment reassembly
>  > process.
> 
> I think this is the best solution.

Just for the heck of it, I started reading through ip_input.c to see how
hard this would be to do.  I haven't got that far yet, but I saw something
odd: the variables ip_nfragpackets and nipq look *awfully* similar.

It looks like they both track the number of reassembly queues, because
they're initialized to zero, and incremented and decremented at the same
time.  Their limits (ip_maxfragpackets and maxnipq respectively) are
even initialized on consecutive lines.

The only difference I can see is that in ip_input(), if nipq > maxnipq,
all of the fragments for some other packet in the current hash bucket
get dropped (with the wonderfully descriptive comment "gak").  The check
for ip_nfragpackets comes in ip_reass(), where if ip_nfragpackets >=
ip_maxfragpackets, then we drop the current fragment.  (Is it possible 
that the second check masks the effects of the first?)
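A simplified model may help with the question above. It is a sketch under stated assumptions, not the ip_input.c code: one hypothetical counter stands in for both nipq and ip_nfragpackets (since they move in lockstep), only first fragments arrive and no datagram ever completes, and both limits are passed in explicitly. With the two limits equal, the ip_input() eviction never fires because the ip_reass() check already caps the counter at the limit; with maxfragpackets set to -1 (unlimited), the eviction path takes over.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one counter stands in for both nipq and
 * ip_nfragpackets, which the code keeps in lockstep. */
struct reass_model {
    int nqueues;        /* reassembly queues currently allocated */
    int maxnipq;        /* ip_input() evicts another queue if nqueues > maxnipq */
    int maxfragpackets; /* ip_reass() drops if nqueues >= this; negative = no limit */
    int evictions;      /* times the ip_input() ("gak") eviction path ran */
    int drops;          /* times ip_reass() refused to start a new queue */
};

/* The first fragment of a new datagram arrives; nothing ever completes. */
static void first_fragment(struct reass_model *m)
{
    assert(m != NULL);

    /* Check done in ip_input(): free some other queue if over the limit. */
    if (m->nqueues > m->maxnipq) {
        m->nqueues--;
        m->evictions++;
    }
    /* Check done in ip_reass(): refuse to create a queue at the cap. */
    if (m->maxfragpackets >= 0 && m->nqueues >= m->maxfragpackets) {
        m->drops++;
        return;
    }
    m->nqueues++;
}
```

Feeding 1000 first fragments into a model with both limits at 512 leaves 512 queues, 488 drops and no evictions; with maxfragpackets at -1 it ends with 513 queues, 487 evictions and no drops.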

I couldn't find any obvious explanation in the CVS log for ip_input.c.

Am I missing something, or are these two variables basically doing the 
same thing?

Thanks,

Bruce.



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: Fatal trap 12: page fault while in kernel mode

2002-04-05 Thread Andrew Gallatin


Terry Lambert writes:
 > Andrew Gallatin wrote:
 > > The problem is that ip_maxfragpackets is:
 > > "Maximum number of IPv4 fragment reassembly queue entries"
 > > 
 > > You (& I, & most people probably) took that number to mean the cap on
 > > the number of mbufs sitting on reassembly queues.  However, it's really
 > > a cap on the number of fragmented packets sitting on reassembly
 > > queues:
 > 
 > [ ... ]
 > 
 > > Since the linux host is sending 16K packets, that means that each
 > > packet is made up of 11 cluster mbufs (assuming a 1500 byte mtu).
 > > There can be as many as 10 cluster mbufs on the reassembly queue
 > > for each packet.
 > > 
 > > Let's say we have 2048 cluster mbufs.  That makes maxfragpackets 512.
 > > However, 512 * 10 mbufs = 5120 mbufs.  Oops.
 > > 
 > > I think the limit should probably be something much smaller, like
 > > maybe nmbclusters / (net.inet.udp.recvspace / 1472).  Or the
 > > implementation & name should be changed to "maxfragmbufs"
 > 
 > 
 > This suggests that one could fragment as large a UDP packet
 > as one chooses into "n" fragments, and then supply only "n-1"
 > elements of the whole packet, as an attack, in order to use
 > up system resources.

Essentially what a linux NFS client is already doing.. ;-(

 > I think we are better off with my suggestion, where udp packets
 > above a certain size are intentionally dropped as "not supported".

Depending on what the "certain size" is, that might be reasonable.

 > Alternately, it would be a good idea to have a "ip_maxpacketfrags"
 > instead of an "ip_maxfragpackets", to put a hard limit on the
 > number of mbufs that can be consumed by the fragment reassembly
 > process.

I think this is the best solution.

 > Of course, this also suggests that using TCP instead of UDP for
 > the NFS would result in the problem "just going away", for the
 > original poster, which is probably all the original poster
 > really cares about...

Considering that a modern linux NFS client is going to be a common
scenario, we should probably be able to interoperate with it, no
matter how broken its defaults are.  BTW, 16K UDP packets are legal
according to the NFS V3 spec, if I remember it correctly.

Drew





Re: Fatal trap 12: page fault while in kernel mode

2002-04-05 Thread Terry Lambert

Andrew Gallatin wrote:
> The problem is that ip_maxfragpackets is:
> "Maximum number of IPv4 fragment reassembly queue entries"
> 
> You (& I, & most people probably) took that number to mean the cap on
> the number of mbufs sitting on reassembly queues.  However, it's really
> a cap on the number of fragmented packets sitting on reassembly
> queues:

[ ... ]

> Since the linux host is sending 16K packets, that means that each
> packet is made up of 11 cluster mbufs (assuming a 1500 byte mtu).
> There can be as many as 10 cluster mbufs on the reassembly queue
> for each packet.
> 
> Let's say we have 2048 cluster mbufs.  That makes maxfragpackets 512.
> However, 512 * 10 mbufs = 5120 mbufs.  Oops.
> 
> I think the limit should probably be something much smaller, like
> maybe nmbclusters / (net.inet.udp.recvspace / 1472).  Or the
> implementation & name should be changed to "maxfragmbufs"


This suggests that one could fragment as large a UDP packet
as one chooses into "n" fragments, and then supply only "n-1"
elements of the whole packet, as an attack, in order to use
up system resources.

I think we are better off with my suggestion, where udp packets
above a certain size are intentionally dropped as "not supported".
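One way to express the "drop UDP packets above a certain size" idea is a per-fragment check: a fragment is rejected if the data it carries would end beyond a chosen datagram size. This is a sketch only; the cap value and the function name are hypothetical (the thread does not pick a size), ip_off is assumed to be in host byte order, and a caller would apply the check only to fragments whose IP protocol field is UDP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low 13 bits of ip_off hold the fragment offset in 8-byte units
 * (the same value as IP_OFFMASK in <netinet/ip.h>). */
#define FRAG_OFFSET_MASK 0x1fffu

/* Hypothetical check: true if this fragment would put data past
 * max_datagram bytes of the reassembled datagram, i.e. it should be
 * dropped. The flag bits (such as MF) are masked off. */
static bool frag_too_big(uint16_t ip_off, uint16_t payload_len,
                         uint32_t max_datagram)
{
    uint32_t start = (uint32_t)(ip_off & FRAG_OFFSET_MASK) * 8u;

    assert(max_datagram > 0);
    return start + payload_len > max_datagram;
}
```

With a hypothetical 8192-byte cap, a 1480-byte first fragment passes, while a fragment at offset 1110 (byte 8880) is dropped.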

Alternately, it would be a good idea to have a "ip_maxpacketfrags"
instead of an "ip_maxfragpackets", to put a hard limit on the
number of mbufs that can be consumed by the fragment reassembly
process.
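A minimal sketch of the "ip_maxpacketfrags" idea, assuming the count is kept per fragment (one mbuf or cluster each) across all reassembly queues rather than per queue. The struct and function names are hypothetical, and the queue bookkeeping itself is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical budget: fragments held on all reassembly queues combined. */
struct frag_budget {
    int held;       /* fragments currently parked on reassembly queues */
    int max_held;   /* hypothetical "ip_maxpacketfrags"-style hard limit */
};

/* Called for every arriving fragment; false means drop it. */
static bool frag_admit(struct frag_budget *b)
{
    assert(b != NULL);
    if (b->held >= b->max_held)
        return false;
    b->held++;
    return true;
}

/* Called when a queue is freed (datagram reassembled or timed out). */
static void frag_release(struct frag_budget *b, int nfrags)
{
    assert(b != NULL && nfrags >= 0 && nfrags <= b->held);
    b->held -= nfrags;
}
```

A sender that supplies only n-1 of every n fragments can then pin at most max_held mbufs, no matter how many datagrams it starts.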

Of course, this also suggests that using TCP instead of UDP for
the NFS would result in the problem "just going away", for the
original poster, which is probably all the original poster
really cares about...

-- Terry




Re: Fatal trap 12: page fault while in kernel mode

2002-04-05 Thread Andrew Gallatin


Bruce A. Mah writes:
 > 
 > I was discussing this with some of my cow-orkers, as we've had a similar
 > situation (cluster mbufs getting temporarily depleted on a
 > 4.5-RELEASE-p2 NFS server with Linux and FreeBSD clients, but no kernel
 > panics).  Shouldn't the net.inet.ip.maxfragpackets sysctl variable
 > (introduced in 4.4-RELEASE) limit the number of fragments on the
 > reassembly queue(s)?  This value looks to be about 1/4 the number of
 > cluster mbufs, by default.

That's a good point.  When I was bitten by this, I didn't have time to
mess with things & I cranked down the read/write size on the linux
clients.   

The problem is that ip_maxfragpackets is:
"Maximum number of IPv4 fragment reassembly queue entries"


You (& I, & most people probably) took that number to mean the cap on
the number of mbufs sitting on reassembly queues.  However, it's really
a cap on the number of fragmented packets sitting on reassembly
queues:

    /*
     * If first fragment to arrive, create a reassembly queue.
     */
    if (fp == 0) {
        /*
         * Enforce upper bound on number of fragmented packets
         * for which we attempt reassembly;
         * If maxfrag is 0, never accept fragments.
         * If maxfrag is -1, accept all fragments without limitation.
         <...>
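The admission rule in that comment can be restated as a small function. This is a hypothetical restatement for illustration, not the code hidden by the elision; note that the counter it tests is the number of datagrams under reassembly, not the number of mbufs they hold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical restatement: may a new reassembly queue be created when
 * nfragpackets datagrams are already being reassembled? */
static bool may_start_reassembly(int maxfrag, int nfragpackets)
{
    assert(nfragpackets >= 0);
    if (maxfrag < 0)
        return true;                 /* negative (-1): no limit at all */
    return nfragpackets < maxfrag;   /* maxfrag == 0 rejects every queue */
}
```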

Since the linux host is sending 16K packets, that means that each
packet is made up of 11 cluster mbufs (assuming a 1500 byte mtu).
There can be as many as 10 cluster mbufs on the reassembly queue
for each packet.

Let's say we have 2048 cluster mbufs.  That makes maxfragpackets 512.
However, 512 * 10 mbufs = 5120 mbufs.  Oops.

I think the limit should probably be something much smaller, like
maybe nmbclusters / (net.inet.udp.recvspace / 1472).  Or the
implementation & name should be changed to "maxfragmbufs"
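The arithmetic above, written out as a sketch. The inputs are the figures from this message (2048 clusters, maxfragpackets at a quarter of that, 11 fragments per datagram); the 41600-byte receive space in the usage note is an assumed value for illustration only, so check net.inet.udp.recvspace on the actual system.

```c
#include <assert.h>

/* Worst case when the limit counts queues (datagrams), not fragments:
 * every queue can hold all fragments of its datagram except the last. */
static long worst_case_parked(long maxfragpackets, long frags_per_datagram)
{
    assert(maxfragpackets >= 0 && frags_per_datagram >= 1);
    return maxfragpackets * (frags_per_datagram - 1);
}

/* The suggested cap, nmbclusters / (recvspace / 1472), where 1472 is
 * the UDP payload of one unfragmented datagram on a 1500-byte MTU
 * (1500 - 20-byte IP header - 8-byte UDP header). */
static long suggested_cap(long nmbclusters, long udp_recvspace)
{
    long datagrams_per_socket = udp_recvspace / 1472;

    assert(datagrams_per_socket > 0);
    return nmbclusters / datagrams_per_socket;
}
```

With those inputs, worst_case_parked(512, 11) gives 5120 clusters, more than the 2048 that exist, while suggested_cap(2048, 41600) gives 2048 / 28 = 73 reassembly queues.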

Drew





Re: Fatal trap 12: page fault while in kernel mode

2002-04-04 Thread Bruce A. Mah

If memory serves me right, Andrew Gallatin wrote:
> 
> Will Froning writes:
>  > I have a 4.5-RELEASE-p2 box that is my Firewall/NAT/NFS server.  As a
>  > NFS client I have a RH7.2 linux box.  When I do massive NFS writes to
>  > my FBSD (from RH7.2 box), I get a panic.  I've attached the info I got
>  > from my debug kernel.
>  > 
> 
> While the fix being discussed by Peter & others will prevent panics,
> the linux box will still run your server out of mbuf clusters.  This
> is happening because the linux box is using a 16K write size over UDP
> by default.  This is a stupid default.  If there is any lossage
> between the hosts (eg, any packets get dropped), more and more packets
> will end up on the reassembly queues.  Eventually, all your cluster
> mbufs will be there.

I was discussing this with some of my cow-orkers, as we've had a similar
situation (cluster mbufs getting temporarily depleted on a
4.5-RELEASE-p2 NFS server with Linux and FreeBSD clients, but no kernel
panics).  Shouldn't the net.inet.ip.maxfragpackets sysctl variable
(introduced in 4.4-RELEASE) limit the number of fragments on the
reassembly queue(s)?  This value looks to be about 1/4 the number of
cluster mbufs, by default.

Bruce.






Re: Fatal trap 12: page fault while in kernel mode

2002-04-04 Thread Terry Lambert

Andrew Gallatin wrote:
> While the fix being discussed by Peter & others will prevent panics,
> the linux box will still run your server out of mbuf clusters.  This
> is happening because the linux box is using a 16K write size over UDP
> by default.  This is a stupid default.  If there is any lossage
> between the hosts (eg, any packets get dropped), more and more packets
> will end up on the reassembly queues.  Eventually, all your cluster
> mbufs will be there.
> 
> I suggest changing the mount options on the linux box to use 8k reads
> and writes, or use TCP.

Good observation.  Actually, for a firewall box, it might be
reasonable to drop UDP packets over a certain size, and to
drop certain classes of frags.

This won't help the original poster with the Linux problem;
they would still have to reconfigure their Linux machine to
use smaller writes.

> Another problem I've seen w/Linux NFS clients is that recent linux NFS
> clients seem to spew ACCESS requests like there's no tomorrow & beat
> the snot out of my NFS server.  When building large software packages
> via "make -j4" over NFSv3 (100Mb ethernet) on a dual PIII 1GHz system,
> a FreeBSD 4.5 host issues 400-500 ACCESS calls/sec.  A Linux 2.4.18
> host spews 12,000 - 14,000 ACCESS calls/sec, or roughly 30 times as
> many.  Needless to say, the build finishes a whole lot quicker on
> FreeBSD.  Does anybody know what I can do to make the linux client
> cache ACCESS info?

Apart from installing FreeBSD instead?  8-).

I think that it will take some hacking of the Linux NFS code
by someone who cares about Linux performance.

-- Terry




Re: Fatal trap 12: page fault while in kernel mode

2002-04-04 Thread Andrew Gallatin


Will Froning writes:
 > I have a 4.5-RELEASE-p2 box that is my Firewall/NAT/NFS server.  As a
 > NFS client I have a RH7.2 linux box.  When I do massive NFS writes to
 > my FBSD (from RH7.2 box), I get a panic.  I've attached the info I got
 > from my debug kernel.
 > 

While the fix being discussed by Peter & others will prevent panics,
the linux box will still run your server out of mbuf clusters.  This
is happening because the linux box is using a 16K write size over UDP
by default.  This is a stupid default.  If there is any lossage
between the hosts (eg, any packets get dropped), more and more packets
will end up on the reassembly queues.  Eventually, all your cluster
mbufs will be there.

I suggest changing the mount options on the linux box to use 8k reads
and writes, or use TCP.

Another problem I've seen w/Linux NFS clients is that recent linux NFS
clients seem to spew ACCESS requests like there's no tomorrow & beat
the snot out of my NFS server.  When building large software packages
via "make -j4" over NFSv3 (100Mb ethernet) on a dual PIII 1GHz system,
a FreeBSD 4.5 host issues 400-500 ACCESS calls/sec.  A Linux 2.4.18
host spews 12,000 - 14,000 ACCESS calls/sec, or roughly 30 times as
many.  Needless to say, the build finishes a whole lot quicker on
FreeBSD.  Does anybody know what I can do to make the linux client
cache ACCESS info?

Cheers,

Drew






Re: Fatal trap 12: page fault while in kernel mode

2002-04-04 Thread Peter Wemm

Terry Lambert wrote:
> David Greenman wrote:
> > >#16 0xc0152220 in tsleep ()
> > >#17 0xc016abfe in m_clalloc_wait ()
> > >#18 0xc01c8b14 in nfs_realign ()
> > >#19 0xc01c9653 in nfsrv_rcv ()
> > >#20 0xc01701d0 in sowakeup ()
> > >#21 0xc01abd7c in udp_input ()
> > >#22 0xc01a1bfb in ip_input ()
> > >#23 0xc01a1c5b in ipintr ()
> > 
> >This is basically telling you that there is a bug in the NFS code that is
> > incorrectly trying to do a "wait" type of allocation in an interrupt context,
> > which is not valid. You can't sleep when there is no process context.
> 
> Amusing.
> 
> Then the fix is probably to take the proc pointer of the
> proc whose socket is being used to do the call, which is
> the third argument to nfssvc_addsock(), and put it into
> the structure pointed to by "struct nfssvc_sock *" as the
> argument to the upcall.
> 
> Then, in the upcall code in nfsrv_rcv(), pass the proc
> pointer down as the process context.
> 
> I think, actually, that multiple sleeps by the same process
> are also disallowed (;^)), so probably...
> 
> 
> You will need to modify nfs_realign() to take a waitflag,
> as propagated from nfsrv_rcv()... and then pass it through
> on the MCLGET and the MGET, to make sure that if the alloc
> fails, that it's OK.
> 
> This does point out a problem in MCLGET() (the macro that
> wraps m_clalloc_wait()) wanting a process context.
> 
> Probably, the best thing would be to pass a proc p in, and
> if it's NULL, just imply no wait semantics.
> 
> What an ugly mess...

Terry, if you spent half of the time reading the code as speculating and
writing about your wild speculation, you'd know that we already have a
"waitflag" for nfsrv_rcv() to track safeness to wait or not.  The bug is that
nfs_realign doesn't take the 'waitflag' argument and has two 'can wait'
mbuf allocation calls.

The fix is trivial and hardly ugly.  But then again, anybody who actually
bothered to read the code before posting would know that.
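A userland model of the fix described here may make the control flow clearer: the wait flag that nfsrv_rcv() already has is passed down to every allocation in the realign step, so the interrupt path never asks for a sleeping allocation. Everything below is hypothetical (HOW_WAIT and HOW_DONTWAIT stand in for the kernel's M_WAIT and M_DONTWAIT, and the pool is a plain counter); it illustrates the flag propagation only, not the real nfs_realign().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's M_WAIT / M_DONTWAIT flags. */
enum alloc_how { HOW_WAIT, HOW_DONTWAIT };

/* Hypothetical model state. */
static int  clusters_free;   /* clusters left in the pool */
static bool in_interrupt;    /* true when called from the ipintr() path */
static int  illegal_sleeps;  /* sleeps attempted without a process context */

/* Allocator model: with an empty pool, HOW_WAIT would sleep (counted as
 * illegal in interrupt context); HOW_DONTWAIT fails at once. */
static bool alloc_cluster(enum alloc_how how)
{
    if (clusters_free > 0) {
        clusters_free--;
        return true;
    }
    if (how == HOW_WAIT && in_interrupt)
        illegal_sleeps++;   /* the situation shown in the backtrace */
    return false;           /* model: no cluster was obtained */
}

/* Realign sketch: the caller's wait flag is passed through to the
 * allocation instead of hard-coding a sleeping allocation. */
static int realign_model(enum alloc_how waitflag)
{
    assert(waitflag == HOW_WAIT || waitflag == HOW_DONTWAIT);
    if (!alloc_cluster(waitflag))
        return ENOBUFS;     /* the caller can drop the request instead */
    return 0;
}
```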

Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
"All of this is for nothing if we don't go to the stars" - JMS/B5





Re: Fatal trap 12: page fault while in kernel mode

2002-04-03 Thread Terry Lambert

Peter Wemm wrote:
> > You will need to modify nfs_realign() to take a waitflag,
 +++
> > as propagated from nfsrv_rcv()... and then pass it through
   *

> Terry, if you spent half of the time reading the code as speculating and
> writing about your wild speculation, you'd know that we already have a
> "waitflag" for nfsrv_rcv() to track safeness to wait or not.

If you had read the above, you'd see I knew that.  Note the
asterisk marked phrase.

> The bug is that
> nfs_realign doesn't take the 'waitflag' argument and has two 'can wait'
> mbuf allocation calls.

I said that, too.  Note the plus sign marked phrase.  8-).

> The fix is trivial and hardly ugly.  But then again, anybody who actually
> bothered to read the code before posting would know that.

It was a general comment on the NFS code.

You suggested exactly the same fix I did...

-- Terry




Re: Fatal trap 12: page fault while in kernel mode

2002-04-03 Thread Terry Lambert

David Greenman wrote:
> >#16 0xc0152220 in tsleep ()
> >#17 0xc016abfe in m_clalloc_wait ()
> >#18 0xc01c8b14 in nfs_realign ()
> >#19 0xc01c9653 in nfsrv_rcv ()
> >#20 0xc01701d0 in sowakeup ()
> >#21 0xc01abd7c in udp_input ()
> >#22 0xc01a1bfb in ip_input ()
> >#23 0xc01a1c5b in ipintr ()
> 
>This is basically telling you that there is a bug in the NFS code that is
> incorrectly trying to do a "wait" type of allocation in an interrupt context,
> which is not valid. You can't sleep when there is no process context.

Amusing.

Then the fix is probably to take the proc pointer of the
proc whose socket is being used to do the call, which is
the third argument to nfssvc_addsock(), and put it into
the structure pointed to by "struct nfssvc_sock *" as the
argument to the upcall.

Then, in the upcall code in nfsrv_rcv(), pass the proc
pointer down as the process context.

I think, actually, that multiple sleeps by the same process
are also disallowed (;^)), so probably...


You will need to modify nfs_realign() to take a waitflag,
as propagated from nfsrv_rcv()... and then pass it through
on the MCLGET and the MGET, to make sure that if the alloc
fails, that it's OK.

This does point out a problem in MCLGET() (the macro that
wraps m_clalloc_wait()) wanting a process context.

Probably, the best thing would be to pass a proc p in, and
if it's NULL, just imply no wait semantics.

What an ugly mess...

-- Terry




Re: Fatal trap 12: page fault while in kernel mode

2002-04-03 Thread David Greenman

>Fatal trap 12: page fault while in kernel mode
>fault virtual address  = 0x70

>#12 0xc014f61d in panic ()
>#13 0xc025c02f in trap_fatal ()
>#14 0xc025bcdd in trap_pfault ()
>#15 0xc025b883 in trap ()
>#16 0xc0152220 in tsleep ()
>#17 0xc016abfe in m_clalloc_wait ()
>#18 0xc01c8b14 in nfs_realign ()
>#19 0xc01c9653 in nfsrv_rcv ()
>#20 0xc01701d0 in sowakeup ()
>#21 0xc01abd7c in udp_input ()
>#22 0xc01a1bfb in ip_input ()
>#23 0xc01a1c5b in ipintr ()

   This is basically telling you that there is a bug in the NFS code that is
incorrectly trying to do a "wait" type of allocation in an interrupt context,
which is not valid. You can't sleep when there is no process context.

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
President, TeraSolutions, Inc. - http://www.terasolutions.com
President, Download Technologies, Inc. - http://www.downloadtech.com
Pave the road of life with opportunities.




Re: Fatal trap 12: page fault while in kernel mode

2002-04-03 Thread Terry Lambert

Will Froning wrote:
> #12 0xc014f61d in panic ()
> #13 0xc025c02f in trap_fatal ()
> #14 0xc025bcdd in trap_pfault ()
> #15 0xc025b883 in trap ()
> #16 0xc0152220 in tsleep ()
> #17 0xc016abfe in m_clalloc_wait ()

The tsleep tried to reference a page that wasn't there.  This
supposedly can't happen.  Here is the tsleep:

caddr_t
m_clalloc_wait(void)
{
    ...
    /* Sleep until something's available or until we expire. */
    m_clalloc_wid++;
    if ((tsleep(&m_clalloc_wid, PVM, "mclalc", mbuf_wait)) ==
        EWOULDBLOCK)
        m_clalloc_wid--;

The m_clalloc_wid is a global variable, so it's not swapped out.

The mbuf_wait is a tunable; it defaults to 32 (clock ticks, since it is
passed to tsleep() as the timeout).  You might want to try tuning this
higher... making it wait longer... or setting it to 0 -- making it wait
forever.  This would work around, or eliminate, your problem.

That the thing panics implies to me that the page that it references
got swapped out from under it (or freed).

The call happens when you are in an extremely low memory condition,
out of mbufs, and then you try to allocate more.  In a resource-starved
state, the wakeup on the free does not guarantee that you will get the
resource.  This would be much better served by a wakeup_one() instead
of a wakeup(), although even wakeup_one() will not keep the process
being awakened from losing the race for the allocation if someone else
comes in for it at the same time through another path (e.g. handling
an mbuf allocation for a network card interrupt).
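The sleep-with-timeout and wakeup behaviour described above has a close userland analogue in POSIX threads, sketched here under that assumption (this is not the kernel's code): pthread_cond_timedwait() plays the role of tsleep() with a timeout, a timeout of 0 means wait forever as with mbuf_wait, pthread_cond_signal() corresponds to wakeup_one() and pthread_cond_broadcast() to wakeup(). The struct and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userland analogue of the cluster pool. */
struct pool {
    pthread_mutex_t lock;
    pthread_cond_t  freed;   /* signalled when a cluster is returned */
    int             nfree;
};

/* Take one cluster, sleeping up to timeout_ms; 0 means wait forever.
 * Returns true on success, false on timeout. */
static bool pool_get_wait(struct pool *p, long timeout_ms)
{
    struct timespec deadline;

    assert(p != NULL && timeout_ms >= 0);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&p->lock);
    /* Re-check after every wakeup: being woken does not guarantee the
     * cluster is still there, since another thread may have taken it. */
    while (p->nfree == 0) {
        int rc = (timeout_ms == 0)
            ? pthread_cond_wait(&p->freed, &p->lock)
            : pthread_cond_timedwait(&p->freed, &p->lock, &deadline);
        if (rc == ETIMEDOUT && p->nfree == 0) {
            pthread_mutex_unlock(&p->lock);
            return false;
        }
    }
    p->nfree--;
    pthread_mutex_unlock(&p->lock);
    return true;
}

/* Return a cluster and wake exactly one waiter (the wakeup_one() analogue). */
static void pool_put(struct pool *p)
{
    assert(p != NULL);
    pthread_mutex_lock(&p->lock);
    p->nfree++;
    pthread_cond_signal(&p->freed);
    pthread_mutex_unlock(&p->lock);
}
```

Re-checking in a loop is what lets a waiter survive losing the race; signalling a single waiter instead of broadcasting avoids waking threads that would lose it anyway.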

The best answer is probably to say: "add mbufs".  This will keep
you from hitting the starvation problem here.  Fixing the other
issues is a deeper problem (the second panic is also related to
memory, that time for a lock).

Other than that, you will have to do some tracking down, if
you really want it to fail gracefully... one option is to make
m_clalloc_wait actually frigging wait.  It was an incredible
error when "_wait" no longer meant "wait", but instead turned
into "wait for a bit, then fail spectacularly".


-- Terry




Re: fatal trap 12: page fault while in kernel mode

2000-05-26 Thread Bosko Milekic


On Fri, 26 May 2000, Greg Skouby wrote:

> Hello,
> 
> I posted a message to -questions yesterday about a machine that had the
> /dev directory somewhat corrupt. I could ls -la /dev/wd0* but when I was
> in the /dev directory when I did an ls it was not showing any of the files.  
> Now, today the machine was rebooting over and over again, freezing with
> this message:
> 
> 
> fatal trap 12: page fault while in kernel mode
> 
> fault virtual address = 0xc33a3c6d
> 
> fault code = supervisor read, page not present
> 
> Instruction Pointer  = 0x8:0xc022798F
> 

You have to post more information. For example, what is at the
location pointed to by the instruction pointer? Get a stack trace from
the debugger if possible, along with any other relevant information;
the Handbook explains how to gather most of it.
  

--
 Bosko Milekic * pages.infinit.net/bmilekic/index.html * www.technokratis.com
 [EMAIL PROTECTED] * [EMAIL PROTECTED] * [EMAIL PROTECTED]



