Re: "Fatal trap 12: page fault while in kernel mode" on 7.1/amd64, but not 7.0

2009-03-07 Thread Kostik Belousov
On Fri, Mar 06, 2009 at 05:01:12PM -0500, Boris Kochergin wrote:
> Gavin Atkinson wrote:
> >On Thu, 2009-03-05 at 19:55 -0500, Boris Kochergin wrote:
> >  
> >>Ahoy. I recently upgraded an amd64 machine to 7.1-RELEASE, and started 
> >>getting a bunch of these at a pretty high frequency (a few hours to a 
> >>day apart):
> >>
> >>http://acm.poly.edu/~spawk/IMG00033.jpg
> >>
> >>The "current process" is always httpd. They're particularly annoying 
> >>because the machine doesn't actually ever reboot, requiring manual 
> >>intervention. Reverting the kernel back to 7.0 makes the panic go away, 
> >>and the machine had been happily running 7.0 for about a year 
> >>beforehand. I realize that the photo hardly contains any useful 
> >>debugging information, but I was hoping it might look familiar to 
> >>someone. If not, I guess I'll come back with a backtrace.
> >>
> >
> >A backtrace will almost certainly be necessary to figure out what this
> >issue is, although there is a possibility that the output of
> >"addr2line -e /boot/kernel/kernel.symbols 0x8:0x802d7010"
> >might help, assuming you've not recompiled your kernel yet.  (That
> >number should be the same as the "instruction pointer" shown by the
> >panic, but as the photo is quite blurred there's a chance I've got it
> >wrong, if you have a better picture of it or wrote it down then use
> >that)
> >
> >Gavin
> >___
> >freebsd-stable@freebsd.org mailing list
> >http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> >To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
> >  
> Here it is, with some additional information afterward:
> 
> Unread portion of the kernel message buffer:
> kernel trap 12 with interrupts disabled
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address   = 0x30
> fault code  = supervisor read data, page not present
> instruction pointer = 0x8:0x80293faf
> stack pointer   = 0x10:0x9cbaea70
> frame pointer   = 0x10:0xff000fc14000
> code segment= base 0x0, limit 0xf, type 0x1b
>   = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= resume, IOPL = 0
> current process = 881 (httpd)
> trap number = 12
> panic: page fault
> cpuid = 1
> Uptime: 1m51s
> Physical memory: 8185 MB
> Dumping 328 MB: 313 297 281 265 249 233 217 201 185 169 153 137 121 105 
> 89 73 57 41 25 9
> 
> #0  doadump () at pcpu.h:195
> 195 pcpu.h: No such file or directory.
>   in pcpu.h
> (kgdb) where
> #0  doadump () at pcpu.h:195
> #1  0xff000fc14000 in ?? ()
> #2  0x8025eba9 in boot (howto=260) at 
> /usr/src-7.1/sys/kern/kern_shutdown.c:418
> #3  0x8025efb2 in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src-7.1/sys/kern/kern_shutdown.c:574
> #4  0x803df5c3 in trap_fatal (frame=0xff000fc14000, 
> eva=Variable "eva" is not available.
> ) at /usr/src-7.1/sys/amd64/amd64/trap.c:764
> #5  0x803e018f in trap (frame=0x9cbae9c0) at 
> /usr/src-7.1/sys/amd64/amd64/trap.c:290
> #6  0x803c5c4e in calltrap () at 
> /usr/src-7.1/sys/amd64/amd64/exception.S:209
> #7  0x80293faf in turnstile_broadcast (ts=0x0, queue=0) at 
> /usr/src-7.1/sys/kern/subr_turnstile.c:836
> #8  0x8025256a in _mtx_unlock_sleep (m=0x80593538, 
> opts=Variable "opts" is not available.
> ) at /usr/src-7.1/sys/kern/kern_mutex.c:619
> #9  0x80275ed3 in __umtx_op_cv_wait (td=0x1ee, uap=Variable 
> "uap" is not available.
> ) at /usr/src-7.1/sys/kern/kern_umtx.c:312
> #10 0x803dfb78 in syscall (frame=0x9cbaec80) at 
> /usr/src-7.1/sys/amd64/amd64/trap.c:907
> #11 0x803c5e5b in Xfast_syscall () at 
> /usr/src-7.1/sys/amd64/amd64/exception.S:330
> #12 0x000800f5354c in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)
> 
> The dump was difficult to acquire--the system would often lock up after 
> dumping only a portion of the memory it wanted to save. I can also now 
> trigger the panic pretty reliably using this bit of script:
> 
> #!/usr/local/bin/bash
> 
> for i in {1..900}
> do
> wget --quiet -O /dev/null http://acm.poly.edu/wiki/Hosting &
> done
> 
> ...where the URL is a MediaWiki installation on the afflicted machine.

Can you please recompile the kernel with debugging options and
provoke the panic on it?

We need at least options INVARIANTS, INVARIANT_SUPPORT and WITNESS.
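
For reference, a minimal sketch of one way to set that up. The config name
DEBUG71 and the temporary output directory are hypothetical; only the three
options come from the request above, and the build commands in the trailing
comments assume the /usr/src-7.1 tree seen in the backtrace paths.

```shell
#!/bin/sh
# Sketch: write a kernel config that pulls in GENERIC and adds the three
# debugging options requested above. DEBUG71 is a made-up name.
conf_dir=${CONF_DIR:-$(mktemp -d)}
conf="$conf_dir/DEBUG71"

cat > "$conf" <<'EOF'
include GENERIC
ident DEBUG71
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
EOF

echo "wrote $conf"

# Next steps, roughly (source path is an assumption taken from the backtrace):
#   cp "$conf" /usr/src-7.1/sys/amd64/conf/
#   cd /usr/src-7.1
#   make buildkernel KERNCONF=DEBUG71 && make installkernel KERNCONF=DEBUG71
```

Expect the machine to run noticeably slower with WITNESS enabled, so this is
probably only worth running until the panic has been captured.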





Re: "Fatal trap 12: page fault while in kernel mode" on 7.1/amd64, but not 7.0

2009-03-06 Thread Boris Kochergin

Gavin Atkinson wrote:
>On Thu, 2009-03-05 at 19:55 -0500, Boris Kochergin wrote:
>
>>Ahoy. I recently upgraded an amd64 machine to 7.1-RELEASE, and started 
>>getting a bunch of these at a pretty high frequency (a few hours to a 
>>day apart):
>>
>>http://acm.poly.edu/~spawk/IMG00033.jpg
>>
>>The "current process" is always httpd. They're particularly annoying 
>>because the machine doesn't actually ever reboot, requiring manual 
>>intervention. Reverting the kernel back to 7.0 makes the panic go away, 
>>and the machine had been happily running 7.0 for about a year 
>>beforehand. I realize that the photo hardly contains any useful 
>>debugging information, but I was hoping it might look familiar to 
>>someone. If not, I guess I'll come back with a backtrace.
>>
>
>A backtrace will almost certainly be necessary to figure out what this
>issue is, although there is a possibility that the output of
>"addr2line -e /boot/kernel/kernel.symbols 0x8:0x802d7010"
>might help, assuming you've not recompiled your kernel yet.  (That
>number should be the same as the "instruction pointer" shown by the
>panic, but as the photo is quite blurred there's a chance I've got it
>wrong, if you have a better picture of it or wrote it down then use
>that)
>
>Gavin

Here it is, with some additional information afterward:

Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x30
fault code  = supervisor read data, page not present
instruction pointer = 0x8:0x80293faf
stack pointer   = 0x10:0x9cbaea70
frame pointer   = 0x10:0xff000fc14000
code segment= base 0x0, limit 0xf, type 0x1b
  = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 881 (httpd)
trap number = 12
panic: page fault
cpuid = 1
Uptime: 1m51s
Physical memory: 8185 MB
Dumping 328 MB: 313 297 281 265 249 233 217 201 185 169 153 137 121 105 
89 73 57 41 25 9


#0  doadump () at pcpu.h:195
195 pcpu.h: No such file or directory.
  in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:195
#1  0xff000fc14000 in ?? ()
#2  0x8025eba9 in boot (howto=260) at 
/usr/src-7.1/sys/kern/kern_shutdown.c:418
#3  0x8025efb2 in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src-7.1/sys/kern/kern_shutdown.c:574
#4  0x803df5c3 in trap_fatal (frame=0xff000fc14000, 
eva=Variable "eva" is not available.
) at /usr/src-7.1/sys/amd64/amd64/trap.c:764
#5  0x803e018f in trap (frame=0x9cbae9c0) at 
/usr/src-7.1/sys/amd64/amd64/trap.c:290
#6  0x803c5c4e in calltrap () at 
/usr/src-7.1/sys/amd64/amd64/exception.S:209
#7  0x80293faf in turnstile_broadcast (ts=0x0, queue=0) at 
/usr/src-7.1/sys/kern/subr_turnstile.c:836
#8  0x8025256a in _mtx_unlock_sleep (m=0x80593538, 
opts=Variable "opts" is not available.
) at /usr/src-7.1/sys/kern/kern_mutex.c:619
#9  0x80275ed3 in __umtx_op_cv_wait (td=0x1ee, uap=Variable 
"uap" is not available.
) at /usr/src-7.1/sys/kern/kern_umtx.c:312
#10 0x803dfb78 in syscall (frame=0x9cbaec80) at 
/usr/src-7.1/sys/amd64/amd64/trap.c:907
#11 0x803c5e5b in Xfast_syscall () at 
/usr/src-7.1/sys/amd64/amd64/exception.S:330
#12 0x000800f5354c in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)

The dump was difficult to acquire--the system would often lock up after 
dumping only a portion of the memory it wanted to save. I can also now 
trigger the panic pretty reliably using this bit of script:


#!/usr/local/bin/bash

# Start 900 concurrent fetches of the wiki page, discarding the output.
for i in {1..900}
do
    wget --quiet -O /dev/null http://acm.poly.edu/wiki/Hosting &
done

...where the URL is a MediaWiki installation on the afflicted machine.

-Boris


Re: "Fatal trap 12: page fault while in kernel mode" on 7.1/amd64, but not 7.0

2009-03-06 Thread Gavin Atkinson
On Thu, 2009-03-05 at 19:55 -0500, Boris Kochergin wrote:
> Ahoy. I recently upgraded an amd64 machine to 7.1-RELEASE, and started 
> getting a bunch of these at a pretty high frequency (a few hours to a 
> day apart):
> 
> http://acm.poly.edu/~spawk/IMG00033.jpg
> 
> The "current process" is always httpd. They're particularly annoying 
> because the machine doesn't actually ever reboot, requiring manual 
> intervention. Reverting the kernel back to 7.0 makes the panic go away, 
> and the machine had been happily running 7.0 for about a year 
> beforehand. I realize that the photo hardly contains any useful 
> debugging information, but I was hoping it might look familiar to 
> someone. If not, I guess I'll come back with a backtrace.

A backtrace will almost certainly be necessary to figure out what this
issue is, although there is a possibility that the output of
"addr2line -e /boot/kernel/kernel.symbols 0x8:0x802d7010"
might help, assuming you've not recompiled your kernel yet.  (That
number should be the same as the "instruction pointer" shown by the
panic, but as the photo is quite blurred there's a chance I've got it
wrong. If you have a better picture of it or wrote it down, then use
that.)
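
As a rough sketch of that lookup: addr2line takes plain hexadecimal
addresses, so the "0x8:" segment selector printed by the panic presumably
needs to be dropped first. The helper name ip_offset is made up for
illustration, and the address is the one read from the blurry photo, so
substitute your own.

```shell
#!/bin/sh
# Hypothetical helper: strip the segment selector ("0x8:") from a panic's
# "instruction pointer" value, leaving only the offset.
ip_offset() {
    printf '%s\n' "${1#*:}"
}

# Value as read from the photo; replace with the one from your own panic.
ip="0x8:0x802d7010"
addr=$(ip_offset "$ip")
echo "$addr"

# Symbol file path as given above; only attempt the lookup if it is readable.
symbols=/boot/kernel/kernel.symbols
if [ -r "$symbols" ]; then
    # -f also prints the enclosing function name, -e selects the file.
    addr2line -f -e "$symbols" "$addr"
fi
```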

Gavin


Re: "Fatal trap 12: page fault while in kernel mode" on 7.1/amd64, but not 7.0

2009-03-06 Thread Kostik Belousov
On Thu, Mar 05, 2009 at 07:55:30PM -0500, Boris Kochergin wrote:
> Ahoy. I recently upgraded an amd64 machine to 7.1-RELEASE, and started 
> getting a bunch of these at a pretty high frequency (a few hours to a 
> day apart):
> 
> http://acm.poly.edu/~spawk/IMG00033.jpg
> 
> The "current process" is always httpd. They're particularly annoying 
> because the machine doesn't actually ever reboot, requiring manual 
> intervention. Reverting the kernel back to 7.0 makes the panic go away, 
> and the machine had been happily running 7.0 for about a year 
> beforehand. I realize that the photo hardly contains any useful 
> debugging information, but I was hoping it might look familiar to 
> someone. If not, I guess I'll come back with a backtrace.

You need to provide the backtrace from kgdb.
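
A minimal sketch of how that might be done, assuming crash dumps are being
saved as vmcore.N files under /var/crash and that the kernel was installed
with its symbol file at /boot/kernel/kernel.symbols. Both paths are
assumptions about the setup, and latest_vmcore is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the highest-numbered vmcore.N in a directory.
latest_vmcore() {
    best=""
    best_n=-1
    for f in "$1"/vmcore.[0-9]*; do
        [ -e "$f" ] || continue
        n=${f##*.}
        # Skip anything whose suffix is not purely numeric.
        case $n in *[!0-9]*) continue ;; esac
        if [ "$n" -gt "$best_n" ]; then
            best=$f
            best_n=$n
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}

# Assumed dump directory; override with CRASH_DIR if yours differs.
crash_dir=${CRASH_DIR:-/var/crash}
core=$(latest_vmcore "$crash_dir")
if [ -n "$core" ]; then
    # Print the command rather than running it, since kgdb is interactive.
    echo "kgdb /boot/kernel/kernel.symbols $core"
else
    echo "no vmcore.N found in $crash_dir" >&2
fi
```

At the (kgdb) prompt, "where" prints the backtrace.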




"Fatal trap 12: page fault while in kernel mode" on 7.1/amd64, but not 7.0

2009-03-05 Thread Boris Kochergin
Ahoy. I recently upgraded an amd64 machine to 7.1-RELEASE, and started 
getting a bunch of these at a pretty high frequency (a few hours to a 
day apart):


http://acm.poly.edu/~spawk/IMG00033.jpg

The "current process" is always httpd. They're particularly annoying 
because the machine doesn't actually ever reboot, requiring manual 
intervention. Reverting the kernel back to 7.0 makes the panic go away, 
and the machine had been happily running 7.0 for about a year 
beforehand. I realize that the photo hardly contains any useful 
debugging information, but I was hoping it might look familiar to 
someone. If not, I guess I'll come back with a backtrace.


-Boris