Re: panic in propagate_priority w/ postgresql under heavy load

2005-11-04 Thread Koen Martens
Robert Watson wrote:

>
> On Sun, 2 Oct 2005, Koen Martens wrote:
>
>> kernel trap 12 with interrupts disabled
>>
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 1; apic id = 06
>> fault virtual address   = 0x24
>> fault code              = supervisor read, page not present
>> instruction pointer     = 0x8:0xc051c253
>> stack pointer           = 0x10:0xe93efb3c
>> frame pointer           = 0x10:0xe93efb50
>> code segment            = base 0x0, limit 0xf, type 0x1b
>>                         = DPL 0, pres 1, def32 1, gran 1
>> processor eflags        = resume, IOPL = 0
>> current process         = 6092 (postgres)
>>
>> And that, that is all.. No ddb> no 'dumping MB', just that. So 
>> basically, i fear this is a non-debugable problem, since putting in 
>> witness and such slows the kernel to a point where the panic does not 
>> occur anymore (at least, not in the 4 weeks i've been running the box 
>> with witness & invariants). Clueless :)
>
>
> This looks like a NULL pointer dereference in kernel code.  Probably, 
> this is not a locking problem, so running without WITNESS to debug 
> this should be OK.  Are you using a serial console?  If not, you might 
> find that it increases the reliability of entering DDB.  If this box 
> is an SMP box, you may also want to add options KDB_STOP_NMI to your 
> kernel config.
>
> Using gdb, could you work out what function 0xc051c253 is, and where 
> in the function.  You should be able to run gdb on your kernel.debug 
> (or kernel on 7.x), and use "l *0xc051c253" to generate a pointer to 
> the line and snippet, which will give us a substantial hint about what 
> is happening.

Sorry for not getting back on this in a timely manner; I have had rather
a busy period (lousy excuse, I know). Anyway, I have currently downgraded
the machine to a 5.3-RELEASE-p22 kernel, which seems to have solved the
problem. There are no panics anymore (it has been two weeks since the
downgrade). Makes me a bit wary about upgrading anything to 6.x :)

Anyway, I did get into the ddb prompt on one of the last panics, and put
some of the resources online:

http://www.sonologic.nl/fbsd/

As you can see, I was pretty clueless about what to do, and just traced
about everything that was not swapped out..

I did not put the core dump online, as I don't feel like sharing that
with the world. It is available upon request, though, for those who want
to get a crack at this.

I don't have a copy of the kernel.debug lying around, for which I
apologise. I cannot, however, upgrade to 5.4 again: we've had enough
trouble with this machine, and the user load on it has increased to a
point where I cannot afford these random panics anymore. I don't have
spare identical hardware lying around at this point to copy the entire
setup for testing purposes.

What I will try, when I find some time, is doing incremental upgrades
from 5.3-RELEASE-p22 to 5.4-RELEASE-p6, step by step, to see which patch
level introduces the problem.

Best,

Koen


-- 
K.F.J. Martens, Sonologic, http://www.sonologic.nl/
Networking, hosting, embedded systems, unix, artificial intelligence.
Public PGP key: http://www.metro.cx/pubkey-gmc.asc
Wondering about the funny attachment your mail program
can't read? Visit http://www.openpgp.org/
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: panic in propagate_priority w/ postgresql under heavy load

2005-10-05 Thread Robert Watson


On Sun, 2 Oct 2005, Koen Martens wrote:


> kernel trap 12 with interrupts disabled
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 06
> fault virtual address   = 0x24
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc051c253
> stack pointer           = 0x10:0xe93efb3c
> frame pointer           = 0x10:0xe93efb50
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 6092 (postgres)
>
> And that, that is all.. No ddb> no 'dumping MB', just that. So
> basically, i fear this is a non-debugable problem, since putting in
> witness and such slows the kernel to a point where the panic does not
> occur anymore (at least, not in the 4 weeks i've been running the box
> with witness & invariants). Clueless :)


This looks like a NULL pointer dereference in kernel code.  Probably, this 
is not a locking problem, so running without WITNESS to debug this should 
be OK.  Are you using a serial console?  If not, you might find that it 
increases the reliability of entering DDB.  If this box is an SMP box, you 
may also want to add options KDB_STOP_NMI to your kernel config.
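
A minimal sketch of why a fault address as small as 0x24 suggests a NULL
pointer: reading a structure member through a NULL base pointer touches
address 0 plus the member's offset. The structure below is hypothetical
(HypotheticalRecord, target and member_address are illustrative names);
the thread does not identify which kernel structure was actually involved.

```python
import ctypes


class HypotheticalRecord(ctypes.Structure):
    # Nine 32-bit fields occupy bytes 0x00-0x23, so `target` lands at 0x24.
    # This layout is invented for illustration; it is not a FreeBSD structure.
    _fields_ = [(f"pad{i}", ctypes.c_uint32) for i in range(9)] + [
        ("target", ctypes.c_uint32),
    ]


def member_address(base: int, field) -> int:
    """Address that is read when `field` is accessed through pointer `base`."""
    return base + field.offset


# A NULL base pointer plus the member offset gives the faulting address.
print(hex(member_address(0, HypotheticalRecord.target)))
```

Under that assumption, the gdb lookup of the instruction pointer should
show which member access faulted.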


Using gdb, could you work out what function 0xc051c253 is, and where in 
the function.  You should be able to run gdb on your kernel.debug (or 
kernel on 7.x), and use "l *0xc051c253" to generate a pointer to the line 
and snippet, which will give us a substantial hint about what is 
happening.


Robert N M Watson



> Best,
>
> Koen
>
>
> [ full kernel config:
>
> #
> # GENERIC -- Generic kernel configuration file for FreeBSD/i386
> #
> # For more information on this file, please read the handbook section on
> # Kernel Configuration Files:
> #
> #    http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-config.html
> #
> # The handbook is also available locally in /usr/share/doc/handbook
> # if you've installed the doc distribution, otherwise always see the
> # FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the
> # latest information.
> #
> # An exhaustive list of options and more detailed explanations of the
> # device lines is also present in the ../../conf/NOTES and NOTES files.
> # If you are in doubt as to the purpose or necessity of a line, check first
> # in NOTES.
> #
> # $FreeBSD: src/sys/i386/conf/GENERIC,v 1.413.2.13 2005/04/02 16:37:58 scottl Exp $
>
> machine         i386
> cpu             I486_CPU
> cpu             I586_CPU
> cpu             I686_CPU
> ident           YIN-YANG
>
> # debug
> #options        WITNESS
> #options        INVARIANTS
> #options        INVARIANT_SUPPORT
> options         KDB
> options         DDB
> #
> options         KDB_TRACE
> #
> makeoptions     DEBUG=-g
> # debug
>
>
> # To statically compile in device wiring instead of /boot/device.hints
> #hints          "GENERIC.hints"         # Default places to look for devices.
>
> options         SCHED_4BSD              # 4BSD scheduler
> options         INET                    # InterNETworking
> options         INET6                   # IPv6 communications protocols
> options         FFS                     # Berkeley Fast Filesystem
> options         SOFTUPDATES             # Enable FFS soft updates support
> options         UFS_ACL                 # Support for access control lists
> options         UFS_DIRHASH             # Improve performance on big directories
> options         MD_ROOT                 # MD is a potential root device
> options         NFSCLIENT               # Network Filesystem Client
> options         NFSSERVER               # Network Filesystem Server
> options         NFS_ROOT                # NFS usable as /, requires NFSCLIENT
> options         MSDOSFS                 # MSDOS Filesystem
> options         CD9660                  # ISO 9660 Filesystem
> options         PROCFS                  # Process filesystem (requires PSEUDOFS)
> options         PSEUDOFS                # Pseudo-filesystem framework
> options         GEOM_GPT                # GUID Partition Tables.
> options         COMPAT_43               # Compatible with BSD 4.3 [KEEP THIS!]
> options         COMPAT_FREEBSD4         # Compatible with FreeBSD4
> options         SCSI_DELAY=15000        # Delay (in ms) before probing SCSI
> options         KTRACE                  # ktrace(1) support
> options         SYSVSHM                 # SYSV-style shared memory
> options         SYSVMSG                 # SYSV-style message queues
> options         SYSVSEM                 # SYSV-style semaphores
> options         _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
> options         KBD_INSTALL_CDEV        # install a CDEV entry in /dev
> options         AHC_REG_PRETTY_PRINT    # Print register bitfields in debug
>                                         # output.  Adds ~128k to driver.
> options         AHD_REG_PRETTY_PRINT    # Print register bitfields in debug
>                                         # output.  Adds ~215k to driver.
> options         ADAPTIVE_GIANT

Re: panic in propagate_priority w/ postgresql under heavy load

2005-10-02 Thread Koen Martens
Robert Watson wrote:
> I can't speak to the problem with the core dumps, as it sounds like that
> is device/firmware related.  However, I probably can lend a hand in
> debugging the problems you're seeing.

I don't think the dump problem is device/firmware related, as a
reboot -d  gives me a dump just fine.

> Often, this means the actual panic or failure has not
> occurred in the thread that prints out the panic you see, but another
> panic.  So the first task on hitting a propagate_priority() panic is to
> identify the thread that actually had the problem.

Hmmm, so we have a very elusive problem here, one that is not easily
pinpointed.. Somehow, this does not come as a surprise :)

> If you want to do this by e-mail so we can lend a hand, you probably
> want to hook up a serial console so you can copy and paste the debugging
> session.  Compile DDB into the kernel (this should have no performance
> overhead), and when the system panics, you'll (ideally) get a db>
> prompt. [excellent help in debugging deleted for brevity]

Right, so perhaps i am missing something here, but this in my kernel
config should do it (full config included below for completeness
sake, as well as dmesg output):

# debug
options KDB
options DDB
#
options KDB_TRACE
#
makeoptions DEBUG=-g
# debug

Furthermore, since reboot -d does dump to swap now (by limiting
physical memory to just below the swap partition size in the
bootloader config), i would expect to get a dump also when a panic
occurs, and i would expect a ddb> prompt. Alas, this is what i get:


kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 06
fault virtual address   = 0x24
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc051c253
stack pointer           = 0x10:0xe93efb3c
frame pointer           = 0x10:0xe93efb50
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 6092 (postgres)


And that, that is all.. No ddb> no 'dumping MB', just that. So
basically, i fear this is a non-debugable problem, since putting in
witness and such slows the kernel to a point where the panic does
not occur anymore (at least, not in the 4 weeks i've been running
the box with witness & invariants). Clueless :)

Best,

Koen


[ full kernel config:

#
# GENERIC -- Generic kernel configuration file for FreeBSD/i386
#
# For more information on this file, please read the handbook section on
# Kernel Configuration Files:
#
#    http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-config.html
#
# The handbook is also available locally in /usr/share/doc/handbook
# if you've installed the doc distribution, otherwise always see the
# FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the
# latest information.
#
# An exhaustive list of options and more detailed explanations of the
# device lines is also present in the ../../conf/NOTES and NOTES files.
# If you are in doubt as to the purpose or necessity of a line, check first
# in NOTES.
#
# $FreeBSD: src/sys/i386/conf/GENERIC,v 1.413.2.13 2005/04/02 16:37:58 scottl Exp $

machine         i386
cpu             I486_CPU
cpu             I586_CPU
cpu             I686_CPU
ident           YIN-YANG

# debug
#options        WITNESS
#options        INVARIANTS
#options        INVARIANT_SUPPORT
options         KDB
options         DDB
#
options         KDB_TRACE
#
makeoptions     DEBUG=-g
# debug


# To statically compile in device wiring instead of /boot/device.hints
#hints          "GENERIC.hints"         # Default places to look for devices.

options         SCHED_4BSD              # 4BSD scheduler
options         INET                    # InterNETworking
options         INET6                   # IPv6 communications protocols
options         FFS                     # Berkeley Fast Filesystem
options         SOFTUPDATES             # Enable FFS soft updates support
options         UFS_ACL                 # Support for access control lists
options         UFS_DIRHASH             # Improve performance on big directories
options         MD_ROOT                 # MD is a potential root device
options         NFSCLIENT               # Network Filesystem Client
options         NFSSERVER               # Network Filesystem Server
options         NFS_ROOT                # NFS usable as /, requires NFSCLIENT
options         MSDOSFS                 # MSDOS Filesystem
options         CD9660                  # ISO 9660 Filesystem
options         PROCFS                  # Process filesystem (requires PSEUDOFS)
options         PSEUDOFS                # Pseudo-filesystem framework
options         GEOM_GPT                # GUID Partition Tables.
options         COMPAT_43               # Compatible with BSD 4.3 [KEEP THIS!]
options         COMPAT_FREEBSD4         # C

Re: panic in propagate_priority w/ postgresql under heavy load

2005-09-20 Thread John Baldwin
On Monday 19 September 2005 03:35 pm, Koen Martens wrote:
> Vinod Kashyap wrote:
> > You seem to be booting off of a 9000 (twa) controller and not 7000/8000
> > (twe).  It could be because of a 9000 firmware bug that you are not
> > being able to get the dump.  The firmware wrongly interprets physical
> > address 0x0 as invalid during dumps, and fails the operations.  This
> > bug will be fixed in future firmware releases.
>
> Ok, it's been a while, here is an update on this.
>
> I ran a heavily instrumented kernel for two weeks on the server, it
> did not crash in that time. I then took out the witness and kdb/ddb
> stuff, because the decreased performance was a bit of a nuisance,
> however i retained the ability to obtain a crash dump. I had to
> limit physical memory, put it on 1.8GB in loader.conf:hw.physmem
> because swap and physmem are both 2GB. Tested with 'reboot -d' gave
> me a core dump.
>
> Without the debug stuff in the kernel, it crashed within 2 days,
> same story: postgresql process, function propagate_priority.
> However, no dump was written to disk :(
>
> Furthermore, i've been seeing the same crash (in propagate_priority)
> on another box in mysql processes. Both servers seem to panic every
> 2-3 days. I have another server of the exact same hardware
> configuration, but it is mainly idling most of the time. Haven't
> seen that one crash yet.
>
> I am thinking now that it is a bug in the twa driver, so i'll have
> to dig in to that. Furthermore, it seems to have to do with some
> sort of concurrency issue or otherwise timing-sensitive issue,
> because slowing the kernel down with debug code seems to avoid the
> panic. But, as i am completely new to the freebsd kernel and don't
> even know what turnstiles are, i imagine i will have a hard time. So
> if anyone can offer some help, please :)
>
> Ok, thanks for your attention,

This panic usually happens because a thread went to sleep while holding
a mutex (WITNESS will warn you about this when it happens, but as you
noted, it slows things down).  It can also happen, perhaps, if a thread
exits while holding a lock, or if a thread is blocked on a mutex that is
destroyed after it blocks on it.

-- 
John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


Re: panic in propagate_priority w/ postgresql under heavy load

2005-09-20 Thread Robert Watson


On Mon, 19 Sep 2005, Koen Martens wrote:

> Without the debug stuff in the kernel, it crashed within 2 days, same
> story: postgresql process, function propagate_priority. However, no dump
> was written to disk :(
>
> Furthermore, i've been seeing the same crash (in propagate_priority) on
> another box in mysql processes. Both servers seem to panic every 2-3
> days. I have another server of the exact same hardware configuration,
> but it is mainly idling most of the time. Haven't seen that one crash
> yet.
>
> I am thinking now that it is a bug in the twa driver, so i'll have to
> dig in to that. Furthermore, it seems to have to do with some sort of
> concurrency issue or otherwise timing-sensitive issue, because slowing
> the kernel down with debug code seems to avoid the panic. But, as i am
> completely new to the freebsd kernel and don't even know what turnstiles
> are, i imagine i will have a hard time. So if anyone can offer some
> help, please :)
>
> Ok, thanks for your attention,


I can't speak to the problem with the core dumps, as it sounds like that 
is device/firmware related.  However, I probably can lend a hand in 
debugging the problems you're seeing.


First off, propagate_priority() is part of the priority propagation
mechanism associated with mutexes, which are a locking primitive in the
FreeBSD kernel.  Most panics in propagate_priority() are actually the
result of a corrupted mutex: when the mutex code goes to perform
priority propagation, it trips over bad pointers and panics in some form
or another.  Often, this means the actual panic or failure has not
occurred in the thread that prints out the panic you see, but in another
thread.  So the first task on hitting a propagate_priority() panic is to
identify the thread that actually had the problem.
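
The idea can be illustrated with a toy model. This is only a sketch under
stated assumptions, not the FreeBSD implementation: Lock, Thread and the
Python propagate_priority below are hypothetical stand-ins, and a missing
owner (None) plays the role of a corrupted mutex pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lock:
    owner: Optional["Thread"] = None


@dataclass
class Thread:
    name: str
    priority: int                      # lower number = more urgent
    blocked_on: Optional[Lock] = None


def propagate_priority(waiter: Thread) -> None:
    """Lend the waiter's priority down the chain of lock owners."""
    pri = waiter.priority
    lock = waiter.blocked_on
    while lock is not None:
        owner = lock.owner             # a corrupted lock yields a bad owner here
        if owner.priority <= pri:      # owner is already urgent enough: stop
            return
        owner.priority = pri           # lend the priority
        lock = owner.blocked_on        # the owner may itself be blocked


low = Thread("low", priority=200)
mid = Thread("mid", priority=150, blocked_on=Lock(owner=low))
high = Thread("high", priority=100, blocked_on=Lock(owner=mid))
propagate_priority(high)
print(low.priority, mid.priority)

# The walk fails in the waiter, although the lock was damaged elsewhere.
try:
    propagate_priority(Thread("victim", 100, blocked_on=Lock(owner=None)))
except AttributeError as exc:
    print("propagation failed:", exc)
```

In the toy model the error surfaces in the waiting thread, which is the
point made above: the thread that reports the failure is not necessarily
the one that damaged the lock.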


Usually, I do this from DDB, rather than a core dump, because I find that 
DDB's tools for inspecting running state are a little easier to use.  First,
I identify what code called into the mutex call that resulted in 
propagate_priority() being called.  The reason to do this is that what you 
want to do next is use "ps" and "trace" to identify other 
processes/threads in the same code, and hence likely to have caused a 
problem with the mutex storage in memory.  Generally, you're looking for a 
panic in another thread, so once you identify a set of threads that might 
be to blame, you can trace them to find one that is in panic().  Usually, 
that thread will be in the RUN state, or on an SMP box, possibly running 
on another CPU.  If you're running 6.x, the thread that panicked was 
likely preempted as it had problems, perhaps due to an untimely interrupt.


If you want to do this by e-mail so we can lend a hand, you probably want 
to hook up a serial console so you can copy and paste the debugging 
session.  Compile DDB into the kernel (this should have no performance 
overhead), and when the system panics, you'll (ideally) get a db> prompt. 
The panic message and any related context (such as trap information) is 
useful.  I usually then use "show percpu" to see what CPU I'm running on, 
the thread that's running, etc.  I'll then use "trace" with no argument to 
see the stack of the thread.  If I'm trying to find another thread that
may have been preempted, I'll use "ps" to show the running processes and
threads, then "trace <pid>" to trace the main thread of processes that
look interesting.  Generally, those are the ones in the RUN state, because
the thread will be runnable.
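
As a rough sketch of that last step, assuming the "ps" listing has been
transcribed by hand into (pid, state, name) rows: the helper below picks
the RUN-state entries and prints the DDB commands to type. The function
name and the sample rows are hypothetical; only pid 6092 (postgres) comes
from the trap output in this thread.

```python
def trace_candidates(processes):
    """Return DDB 'trace <pid>' commands for processes in the RUN state."""
    return [f"trace {pid}" for pid, state, _name in processes if state == "RUN"]


# Hand-transcribed (pid, state, name) rows; the values are illustrative only.
rows = [
    (6092, "RUN", "postgres"),
    (6101, "SLP", "postgres"),
    (412, "RUN", "syslogd"),
]
for command in trace_candidates(rows):
    print(command)
```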


If you're running on an SMP system, you may occasionally find that the
information needed to inspect the stacks of threads currently running on
other processors is not consistent in memory -- i.e., it is cached, the
stack frame is partially written, or whatever.  There's a kernel option,
KDB_STOP_NMI, which when combined with a sysctl, will cause the debugger
to deliver an NMI IPI instead of a debug IPI, which may help kick those
processors into the debugger if they are stuck in spin locks.  However,
the chances are fairly good this isn't the case so you're probably fine
without it.


Robert N M Watson


Re: panic in propagate_priority w/ postgresql under heavy load

2005-09-19 Thread Koen Martens
Vinod Kashyap wrote:
> You seem to be booting off of a 9000 (twa) controller and not 7000/8000
> (twe).  It could be because of a 9000 firmware bug that you are not
> being able to get the dump.  The firmware wrongly interprets physical
> address 0x0 as invalid during dumps, and fails the operations.  This
> bug will be fixed in future firmware releases.

Ok, it's been a while, here is an update on this.

I ran a heavily instrumented kernel for two weeks on the server, it
did not crash in that time. I then took out the witness and kdb/ddb
stuff, because the decreased performance was a bit of a nuisance,
however i retained the ability to obtain a crash dump. I had to
limit physical memory, put it on 1.8GB in loader.conf:hw.physmem
because swap and physmem are both 2GB. Tested with 'reboot -d' gave
me a core dump.

Without the debug stuff in the kernel, it crashed within 2 days,
same story: postgresql process, function propagate_priority.
However, no dump was written to disk :(

Furthermore, i've been seeing the same crash (in propagate_priority)
on another box in mysql processes. Both servers seem to panic every
2-3 days. I have another server of the exact same hardware
configuration, but it is mainly idling most of the time. Haven't
seen that one crash yet.

I am thinking now that it is a bug in the twa driver, so i'll have
to dig in to that. Furthermore, it seems to have to do with some
sort of concurrency issue or otherwise timing-sensitive issue,
because slowing the kernel down with debug code seems to avoid the
panic. But, as i am completely new to the freebsd kernel and don't
even know what turnstiles are, i imagine i will have a hard time. So
if anyone can offer some help, please :)

Ok, thanks for your attention,

Koen

-- 
K.F.J. Martens, Sonologic, http://www.sonologic.nl/
Networking, hosting, embedded systems, unix, artificial intelligence.
Public PGP key: http://www.metro.cx/pubkey-gmc.asc
Wondering about the funny attachment your mail program
can't read? Visit http://www.openpgp.org/


Re: panic in propagate_priority w/ postgresql under heavy load

2005-09-08 Thread Mike Meyer
In <[EMAIL PROTECTED]>, Gary Jennejohn <[EMAIL PROTECTED]> typed:
> Koen Martens writes:
> > (Note: swap is 2048mb, physical memory is also 2048mb).
> IIRC swap has to be a little (64kB?) bigger than memory because the
> kernel writes a header containing necessary information about the
> dump to swap.

That information used to be in the dumpon man page. It was replaced by
a "better" explanation. I've submitted a PR to get this information
added back to the man page.

  http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.


Re: panic in propagate_priority w/ postgresql under heavy load

2005-09-08 Thread Gary Jennejohn
Koen Martens writes:
> (Note: swap is 2048mb, physical memory is also 2048mb).
> 

IIRC swap has to be a little (64kB?) bigger than memory because the
kernel writes a header containing necessary information about the
dump to swap.
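
A small sketch of that size check, assuming the header really is about
64kB (a recollection above, not a verified figure). The function name
dump_fits and the header_bytes default are illustrative; the 2048MB sizes
are the ones Koen reported, and 1800MB stands in for a memory limit set
below the swap size.

```python
MIB = 1024 * 1024


def dump_fits(swap_bytes: int, mem_bytes: int, header_bytes: int = 64 * 1024) -> bool:
    """True if a full memory dump plus its header fits in the dump device."""
    return swap_bytes >= mem_bytes + header_bytes


# Equal swap and memory sizes leave no room for the header.
print(dump_fits(2048 * MIB, 2048 * MIB))
# Memory capped below the swap size leaves room.
print(dump_fits(2048 * MIB, 1800 * MIB))
```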

---
Gary Jennejohn / garyjATjennejohnDOTorg gjATfreebsdDOTorg garyjATdenxDOTde



Re: panic in propagate_priority w/ postgresql under heavy load

2005-09-07 Thread Koen Martens
Vinod Kashyap wrote:
> You seem to be booting off of a 9000 (twa) controller and not 7000/8000
> (twe).  It could be because of a 9000 firmware bug that you are not
> being able to get the dump.  The firmware wrongly interprets physical
> address 0x0 as invalid during dumps, and fails the operations.  This
> bug will be fixed in future firmware releases.

Indeed, I am booting off twa, and swap is also on there. Just got back
from vacation; we did have another panic. The box was booted into
single user mode right after that, after which an image of the swap
partition was made with dd. When I got back, I turned off swap
momentarily and dd'ed the image back onto the swap partition, after
which I ran savecore. However, savecore reports 'no dumps found'.
With the -f option it also says: 'unable to force dump - bad magic'

(Note: swap is 2048mb, physical memory is also 2048mb).

So basically, what i'm left with now is a production server that
crashes every x days, possibly resulting in some database
corruption, and no way to obtain more info about the crash than i
have already provided...

I will try and install a comparable setup on a spare box, without
the twa (plain IDE) and see if i can reproduce something, if i can
find some time to do this in.

I will also try bumping the witness limits. What is this witness
business anyway, what does it output and/or where do i RTFM about it?

Gr,

Koen

-- 
K.F.J. Martens, Sonologic, http://www.sonologic.nl/
Networking, hosting, embedded systems, unix, artificial intelligence.
Public PGP key: http://www.metro.cx/pubkey-gmc.asc
Wondering about the funny attachment your mail program
can't read? Visit http://www.openpgp.org/


Re: panic in propagate_priority w/ postgresql under heavy load

2005-09-02 Thread John Baldwin
On Thursday 01 September 2005 06:04 pm, Koen Martens wrote:
> John Baldwin wrote:
> > On Thursday 01 September 2005 01:02 pm, Koen Martens wrote:
> >>I've had a little chat with neologism on ircnet/#freebsd about this
> >>already, and done as he suggested: compile a debug kernel to obtain
> >>a stack trace.
> >
> > Can you reproduce it with a kernel that has INVARIANTS and
> > INVARIANT_SUPPORT on?  I see that you had WITNESS on, can you check to
> > see if there were any witness messages about sleeping with non-sleepable
> > locks held before the crash?
>
> I will do this when I get back. I did a grep -i on witness in the
> console log but this did not turn up anything suspicious (exact
> output pasted below). Also, i checked again the logs right before
> the crashes, nothing special output to console before the Kernel
> trap 12..
>
>
> voltaire# grep -i witness yin.log
> WARNING: WITNESS option enabled, expect reduced performance.
> witness_get: witness exhausted
> WARNING: WITNESS option enabled, expect reduced performance.
> witness_get: witness exhausted

This last means that witness had turned itself off because it had run out of 
resources.  Try bumping up the WITNESS_COUNT constant in 
sys/kern/subr_witness.c.
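
A quick way to check a saved console log for this condition, as a sketch:
the helper name witness_disabled is hypothetical, and the sample lines are
the ones from the grep output quoted above.

```python
def witness_disabled(log_lines):
    """True if the log shows WITNESS running out of resources."""
    return any("witness_get: witness exhausted" in line for line in log_lines)


console_log = [
    "WARNING: WITNESS option enabled, expect reduced performance.",
    "witness_get: witness exhausted",
]
print(witness_disabled(console_log))
```

If this reports the exhausted message, the absence of later WITNESS
warnings in the same log says nothing about whether locks were misused.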

-- 
John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


RE: panic in propagate_priority w/ postgresql under heavy load

2005-09-01 Thread Vinod Kashyap
> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Koen Martens
> Sent: Thursday, September 01, 2005 3:11 PM
> To: Dimitry Andric
> Cc: freebsd-hackers@freebsd.org
> Subject: Re: panic in propagate_priority w/ postgresql under 
> heavy load
> 
> Hi Dim,
> 
> Dimitry Andric wrote:
> > On 2005-09-01 at 19:02:06 Koen Martens wrote: 
> > 
> >>Anyway, it seems the dump should've gone to the swap partition, but 
> >>i'm into multi-user mode again so i guess i'll have to wait for 
> >>another panic to obtain it?
> > 
> > In RELENG_6, the dump device is chosen automagically during boot by 
> > /etc/rc.d/dumpon, but this is (alas) not the case in RELENG_5_x, so 
> > you'll have to manually specify it in /etc/rc.conf, i.e:
> > 
> >   dumpdev=/dev/ad0s1b
> > 
> > Then make sure you have enough space in /var/crash, and try to 
> > reproduce your panic...
> 
> Ok, i get it.. When it reboots it detects a dump on the swap 
> partition and dd's that to /var/crash.. Which has plenty of 
> free space on the particular box, 59 gigs ought to be enough 
> for everyone :)
> 
> > Also, I think I read somewhere that there used to be problems with 
> > dumping and 3Ware RAID cards (you seem to be using one according to 
> > your kernel config, but you apparently didn't post a dmesg).
> 
> You're right, dmesg included below.
> 
> > However,
> > it looks like revision 1.22.2.1 of src/sys/dev/twe/twe.c fixed that:
> > 
> > 
> > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/twe/twe.c?rev=1.22.2.1&content-type=text/x-cvsweb-markup
> > 
> > Just to be sure, can you check if you got this version of twe.c, or
> > 1.22.2.2 (which seems to be the RELENG_5_4 version, and 
> thus it should 
> > be fixed).
> 
>  *  $FreeBSD: src/sys/dev/twe/twe.c,v 1.22.2.2 2005/02/18
> 18:42:16 vkashyap
> Exp $
> 
> (nice wrapping, i think you get the idea :)
> 
> 
> dmesg:
> 
> 
> 
> Copyright (c) 1992-2005 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 
> 1993, 1994
>   The Regents of the University of California. All rights 
> reserved.
> FreeBSD 5.4-RELEASE-p6 #1: Thu Sep  1 14:06:03 CEST 2005
> [EMAIL PROTECTED]:/usr/obj/usr/src/sys/yin-yang-5.4
> WARNING: WITNESS option enabled, expect reduced performance.
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Intel(R) Xeon(TM) CPU 3.06GHz (3056.50-MHz 686-class CPU)
>   Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
> 
> Features=0xbfebfbff
>   Hyperthreading: 2 logical CPUs
> real memory  = 2146959360 (2047 MB)
> avail memory = 2095415296 (1998 MB)
> ACPI APIC Table: 
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
>  cpu0 (BSP): APIC ID:  0
>  cpu1 (AP): APIC ID:  6
> ioapic0  irqs 0-23 on motherboard
> ioapic1  irqs 24-47 on motherboard
> ioapic2  irqs 48-71 on motherboard
> npx0:  on motherboard
> npx0: INT 16 interface
> acpi0:  on motherboard
> acpi0: Power Button (fixed)
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
> cpu0:  on acpi0
> cpu1:  on acpi0
> pcib0:  port 0xcf8-0xcff on acpi0
> pci0:  on pcib0
> pci0:  at device 0.1 (no driver attached)
> pcib1:  at device 2.0 on pci0
> pci1:  on pcib1
> pci1:  at device 28.0 (no driver attached)
> pcib2:  at device 29.0 on pci1
> pci2:  on pcib2
> pci1:  at device 30.0 (no driver attached)
> pcib3:  at device 31.0 on pci1
> pci3:  on pcib3
> 3ware device driver for 9000 series storage controllers, version: 2.50.02.012
> twa0: <3ware 9000 series Storage Controller> port 0x7000-0x70ff mem 0xfd80-0xfdff,0xfb20-0xfb2000ff irq 48 at device 1.0 on pci3
> twa0: 4 ports, Firmware FE9X 2.02.00.008, BIOS BE9X 2.02.01.037
> pci0:  at device 2.1 (no driver attached)
> pcib4:  at device 30.0 on pci0
> pci4:  on pcib4
> pci4:  at device 3.0 (no driver attached)
> fxp0:  port 0x8400-0x843f mem 0xfb30-0xfb31,0xfb341000-0xfb341fff irq 20 at device 4.0 on pci4
> miibus0:  on fxp0
> inphy0:  on miibus0
> inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> fxp0: Ethernet address: 00:02:b3:d8:8a:b5
> em0:  port 0x8440-0x847f mem 0xfb32-0xfb33 irq 23 at device 5.0 on pci4
> em0: Ethernet address: 00:02:b3:d8:8b:05
> em0:  Speed:N/A  Duplex:N/A
> isab0:  at device 31.0 on pci0
> isa0:  on isab0
> atapci0:  port
> 0x6c60-0x6c6f,0x376,0x170-0x177,0x3f6,

Re: panic in propagate_priority w/ postgresql under heavy load

2005-09-01 Thread Koen Martens
Hi Dim,

Dimitry Andric wrote:
> On 2005-09-01 at 19:02:06 Koen Martens wrote: 
> 
>>Anyway, it seems the dump should've gone to the swap partition, but
>>i'm into multi-user mode again so i guess i'll have to wait for
>>another panic to obtain it?
> 
> In RELENG_6, the dump device is chosen automagically during boot by
> /etc/rc.d/dumpon, but this is (alas) not the case in RELENG_5_x, so
> you'll have to manually specify it in /etc/rc.conf, i.e:
> 
>   dumpdev=/dev/ad0s1b
> 
> Then make sure you have enough space in /var/crash, and try to
> reproduce your panic...

Ok, i get it.. When it reboots it detects a dump on the swap
partition and dd's that to /var/crash.. Which has plenty of free
space on the particular box, 59 gigs ought to be enough for everyone :)

> Also, I think I read somewhere that there used to be problems with
> dumping and 3Ware RAID cards (you seem to be using one according to
> your kernel config, but you apparently didn't post a dmesg).

You're right, dmesg included below.

> However,
> it looks like revision 1.22.2.1 of src/sys/dev/twe/twe.c fixed that:
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/twe/twe.c?rev=1.22.2.1&content-type=text/x-cvsweb-markup
> 
> Just to be sure, can you check if you got this version of twe.c, or
> 1.22.2.2 (which seems to be the RELENG_5_4 version, and thus it should
> be fixed).

 *  $FreeBSD: src/sys/dev/twe/twe.c,v 1.22.2.2 2005/02/18
18:42:16 vkashyap
Exp $

(nice wrapping, i think you get the idea :)


dmesg:



Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RELEASE-p6 #1: Thu Sep  1 14:06:03 CEST 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/yin-yang-5.4
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (3056.50-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9

Features=0xbfebfbff
  Hyperthreading: 2 logical CPUs
real memory  = 2146959360 (2047 MB)
avail memory = 2095415296 (1998 MB)
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  6
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-47 on motherboard
ioapic2  irqs 48-71 on motherboard
npx0:  on motherboard
npx0: INT 16 interface
acpi0:  on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0:  on acpi0
cpu1:  on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pci0:  at device 0.1 (no driver attached)
pcib1:  at device 2.0 on pci0
pci1:  on pcib1
pci1:  at device 28.0 (no driver attached)
pcib2:  at device 29.0 on pci1
pci2:  on pcib2
pci1:  at device 30.0 (no driver attached)
pcib3:  at device 31.0 on pci1
pci3:  on pcib3
3ware device driver for 9000 series storage controllers, version: 2.50.02.012
twa0: <3ware 9000 series Storage Controller> port 0x7000-0x70ff mem 0xfd80-0xfdff,0xfb20-0xfb2000ff irq 48 at device 1.0 on pci3
twa0: 4 ports, Firmware FE9X 2.02.00.008, BIOS BE9X 2.02.01.037
pci0:  at device 2.1 (no driver attached)
pcib4:  at device 30.0 on pci0
pci4:  on pcib4
pci4:  at device 3.0 (no driver attached)
fxp0:  port 0x8400-0x843f mem 0xfb30-0xfb31,0xfb341000-0xfb341fff irq 20 at device 4.0 on pci4
miibus0:  on fxp0
inphy0:  on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:02:b3:d8:8a:b5
em0:  port 0x8440-0x847f mem 0xfb32-0xfb33 irq 23 at device 5.0 on pci4
em0: Ethernet address: 00:02:b3:d8:8b:05
em0:  Speed:N/A  Duplex:N/A
isab0:  at device 31.0 on pci0
isa0:  on isab0
atapci0:  port 0x6c60-0x6c6f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
pci0:  at device 31.3 (no driver attached)
acpi_button0:  on acpi0
atkbdc0:  port 0x64,0x60 irq 1 on acpi0
atkbd0:  irq 1 on atkbdc0
kbd0 at atkbd0
fdc0:  port 0x3f7,0x3f4-0x3f5,0x3f0-0x3f3 irq 6 drq 2 on acpi0
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A, console
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0:  port 0x778-0x77b,0x378-0x37f irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0:  on ppc0
plip0:  on ppbus0
lpt0:  on ppbus0
lpt0: Interrupt-driven port
ppi0:  on ppbus0
pmtimer0 on isa0
orm0:  at iomem 0xe3000-0xe3fff,0xc8000-0xc97ff,0xc-0xc7fff on isa0
sc0:  at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
Timecounters tick every 10.000 msec
IPv6 packet filtering initialized, default to accept, logging
limited to 10 pack

Re: panic in propagate_priority w/ postgresql under heavy load

2005-09-01 Thread Koen Martens
John Baldwin wrote:
> On Thursday 01 September 2005 01:02 pm, Koen Martens wrote:
> 
>>I've had a little chat with neologism on ircnet/#freebsd about this
>>already, and done as he suggested: compile a debug kernel to obtain
>>a stack trace.
> 
> Can you reproduce it with a kernel that has INVARIANTS and INVARIANT_SUPPORT 
> on?  I see that you had WITNESS on, can you check to see if there were any 
> witness messages about sleeping with non-sleepable locks held before the 
> crash?

I will do this when I get back. I did a grep -i on witness in the
console log but this did not turn up anything suspicious (exact
output pasted below). Also, I checked the logs from right before
the crashes again: nothing special was output to the console before
the kernel trap 12.


voltaire# grep -i witness yin.log
WARNING: WITNESS option enabled, expect reduced performance.
witness_get: witness exhausted
WARNING: WITNESS option enabled, expect reduced performance.
witness_get: witness exhausted


Gr,

Koen

-- 
K.F.J. Martens, Sonologic, http://www.sonologic.nl/
Networking, hosting, embedded systems, unix, artificial intelligence.
Public PGP key: http://www.metro.cx/pubkey-gmc.asc
Wondering about the funny attachment your mail program
can't read? Visit http://www.openpgp.org/
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: panic in propagate_priority w/ postgresql under heavy load

2005-09-01 Thread Dimitry Andric
On 2005-09-01 at 19:02:06 Koen Martens wrote:

> Anyway, it seems the dump should've gone to the swap partition, but
> i'm into multi-user mode again so i guess i'll have to wait for
> another panic to obtain it?

Yes.  By now, if any dump was ever written to your swap partition, it
will most probably have been overwritten by your heavy postgres
activity. :)

In RELENG_6, the dump device is chosen automagically during boot by
/etc/rc.d/dumpon, but this is (alas) not the case in RELENG_5_x, so
you'll have to manually specify it in /etc/rc.conf, for example:

  dumpdev=/dev/ad0s1b

Then make sure you have enough space in /var/crash, and try to
reproduce your panic...

Also, I think I read somewhere that there used to be problems with
dumping and 3Ware RAID cards (you seem to be using one according to
your kernel config, but you apparently didn't post a dmesg).  However,
it looks like revision 1.22.2.1 of src/sys/dev/twe/twe.c fixed that:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/twe/twe.c?rev=1.22.2.1&content-type=text/x-cvsweb-markup

Just to be sure, can you check if you got this version of twe.c, or
1.22.2.2 (which seems to be the RELENG_5_4 version, and thus it should
be fixed).




Re: panic in propagate_priority w/ postgresql under heavy load

2005-09-01 Thread John Baldwin
On Thursday 01 September 2005 01:02 pm, Koen Martens wrote:
> Hi Hackers,
>
> I've had a little chat with neologism on ircnet/#freebsd about this
> already, and done as he suggested: compile a debug kernel to obtain
> a stack trace.

Can you reproduce it with a kernel that has INVARIANTS and INVARIANT_SUPPORT 
on?  I see that you had WITNESS on, can you check to see if there were any 
witness messages about sleeping with non-sleepable locks held before the 
crash?
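
One reason INVARIANTS matters here: the listing quoted below shows an
MPASS(ts != NULL) check right before the faulting line, and to my
knowledge MPASS() only expands to a real check when the kernel is built
with INVARIANTS. The following is a minimal userland sketch of that
pattern, not the kernel's actual macro; SKETCH_MPASS and
null_reaches_dereference are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for a kernel assertion macro.
 * Assumption: like MPASS(), it expands to nothing unless the build
 * defines INVARIANTS.
 */
#ifdef INVARIANTS
#define	SKETCH_MPASS(expr)	assert(expr)
#else
#define	SKETCH_MPASS(expr)	((void)0)
#endif

/*
 * Returns 1 if a NULL pointer got past the assertion and would reach
 * the code that dereferences it, 0 if the pointer is usable.
 * With INVARIANTS defined, a NULL pointer stops at the assertion instead.
 */
static int
null_reaches_dereference(const void *ts)
{
	SKETCH_MPASS(ts != NULL);
	return (ts == NULL);
}
```

Built without -DINVARIANTS, a NULL argument passes straight through the
check, which would match a page fault on the next line rather than an
assertion panic.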

> Anyway, what is happening is that there is a crash when running
> postgresql 8.0.3 with a very large database and doing heavy queries.
>
> Kernel is 5.4-RELEASE-p6 (5.4-RELENG i checked out tuesday a week
> ago). Kernel conf is down below.
>
> Here is the message i get on the console:
>
> 
> kernel trap 12 with interrupts disabled
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 06
> fault virtual address   = 0x24
> fault code  = supervisor read, page not present
> instruction pointer = 0x8:0xc050cff7
> stack pointer   = 0x10:0xe99c2b0c
> frame pointer   = 0x10:0xe99c2b20
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags= resume, IOPL = 0
> current process = 4571 (postgres)
> 
>
> It has been a postgres process in all of the observed cases.
>
> I've looked it up with gdb on my kernel.debug, here's what i get:
>
> =
> yin# gdb kernel.debug
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd"...
> (gdb) l *propagate_priority+0x7f
> 0xc050cff7 is in propagate_priority (/usr/src/sys/kern/subr_turnstile.c:245).
> 240 /*
> 241  * Pick up the lock that td is blocked on.
> 242  */
> 243 ts = td->td_blocked;
> 244 MPASS(ts != NULL);
> 245 tc = TC_LOOKUP(ts->ts_lockobj);
> 246 mtx_lock_spin(&tc->tc_lock);
> 247
> 248 /*
> 249  * This thread may not be blocked on this turnstile anymore
> (gdb)
> =
>
>
> So the next thing you'll ask for is a stack trace, but i haven't
> been able to obtain one. According to the freebsd handbook
> (http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html)
> there should be a core dump in /var/crash, but there is
> none and the handbook chapter seems outdated anyway judging by it
> mentioning kgdb...
>
> Anyway, it seems the dump should've gone to the swap partition, but
> i'm into multi-user mode again so i guess i'll have to wait for
> another panic to obtain it?
>
> I'm not very knowledgeable about the freebsd kernel internals (yet),
> so i'm not sure what could be wrong here.. I hope some of you can
> provide some insight, and ideally a fix :)
>
> ==[ kernel config:
>
> #
> # GENERIC -- Generic kernel configuration file for FreeBSD/i386
> #
> # For more information on this file, please read the handbook section on
> # Kernel Configuration Files:
> #
> #    http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-config.html
> #
> # The handbook is also available locally in /usr/share/doc/handbook
> # if you've installed the doc distribution, otherwise always see the
> # FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the
> # latest information.
> #
> # An exhaustive list of options and more detailed explanations of the
> # device lines is also present in the ../../conf/NOTES and NOTES files.
> # If you are in doubt as to the purpose or necessity of a line, check first
> # in NOTES.
> #
> # $FreeBSD: src/sys/i386/conf/GENERIC,v 1.413.2.13 2005/04/02 16:37:58 scottl Exp $
>
> machine   i386
> cpu   I486_CPU
> cpu   I586_CPU
> cpu   I686_CPU
> ident YIN-YANG
>
> # debug
> options WITNESS
> options KDB
> options DDB
> #
> makeoptions   DEBUG=-g
> # debug
>
>
> # To statically compile in device wiring instead of /boot/device.hints
> #hints  "GENERIC.hints" # Default places to look for devices.
>
> options   SCHED_4BSD  # 4BSD scheduler
> options   INET        # InterNETworking
> options   INET6   # IPv6 communications protocols
> options   FFS # Berkeley Fast Filesystem
> options   SOFTUPDATES # Enable FFS soft updates support
> options   UFS_ACL # Support for access c

panic in propagate_priority w/ postgresql under heavy load

2005-09-01 Thread Koen Martens
Hi Hackers,

I've had a little chat with neologism on ircnet/#freebsd about this
already, and done as he suggested: compile a debug kernel to obtain
a stack trace.

Anyway, what is happening is that there is a crash when running
postgresql 8.0.3 with a very large database and doing heavy queries.

Kernel is 5.4-RELEASE-p6 (5.4-RELENG i checked out tuesday a week
ago). Kernel conf is down below.

Here is the message i get on the console:


kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 06
fault virtual address   = 0x24
fault code  = supervisor read, page not present
instruction pointer = 0x8:0xc050cff7
stack pointer   = 0x10:0xe99c2b0c
frame pointer   = 0x10:0xe99c2b20
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= resume, IOPL = 0
current process = 4571 (postgres)


It has been a postgres process in all of the observed cases.

I've looked it up with gdb on my kernel.debug, here's what i get:

=
yin# gdb kernel.debug
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
(gdb) l *propagate_priority+0x7f
0xc050cff7 is in propagate_priority (/usr/src/sys/kern/subr_turnstile.c:245).
240 /*
241  * Pick up the lock that td is blocked on.
242  */
243 ts = td->td_blocked;
244 MPASS(ts != NULL);
245 tc = TC_LOOKUP(ts->ts_lockobj);
246 mtx_lock_spin(&tc->tc_lock);
247
248 /*
249  * This thread may not be blocked on this turnstile anymore
(gdb)
=
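
A fault virtual address as small as 0x24 is what one would expect if the
pointer being dereferenced at line 245 (ts) is NULL and the member being
read sits 0x24 bytes into the structure. The sketch below only
illustrates that arithmetic. The struct sketch_turnstile, its padding and
member_address() are hypothetical: the real struct turnstile layout is
not reproduced here, and the 0x24 offset is an assumption chosen to match
the trap message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct turnstile. Pointers are 32 bits wide
 * on i386, so every member is modelled as a uint32_t. Nine assumed
 * earlier members (9 * 4 = 36 = 0x24 bytes) precede ts_lockobj.
 */
struct sketch_turnstile {
	uint32_t	earlier_members[9];	/* assumed, not the real layout */
	uint32_t	ts_lockobj;		/* member read at line 245 */
};

/*
 * Address the CPU touches when evaluating ts->ts_lockobj for a
 * structure located at ts_base.
 */
static uintptr_t
member_address(uintptr_t ts_base)
{
	return (ts_base + offsetof(struct sketch_turnstile, ts_lockobj));
}
```

Under these assumptions member_address(0) evaluates to 0x24, the same
value the trap reports, which would point at td->td_blocked having been
NULL when propagate_priority picked it up.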


So the next thing you'll ask for is a stack trace, but i haven't
been able to obtain one. According to the freebsd handbook
(http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html)
there should be a core dump in /var/crash, but there is none and the
handbook chapter seems outdated anyway judging by it mentioning kgdb...

Anyway, it seems the dump should've gone to the swap partition, but
i'm into multi-user mode again so i guess i'll have to wait for
another panic to obtain it?

I'm not very knowledgeable about the freebsd kernel internals (yet),
so i'm not sure what could be wrong here.. I hope some of you can
provide some insight, and ideally a fix :)

==[ kernel config:

#
# GENERIC -- Generic kernel configuration file for FreeBSD/i386
#
# For more information on this file, please read the handbook section on
# Kernel Configuration Files:
#
#    http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-config.html
#
# The handbook is also available locally in /usr/share/doc/handbook
# if you've installed the doc distribution, otherwise always see the
# FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the
# latest information.
#
# An exhaustive list of options and more detailed explanations of the
# device lines is also present in the ../../conf/NOTES and NOTES files.
# If you are in doubt as to the purpose or necessity of a line, check first
# in NOTES.
#
# $FreeBSD: src/sys/i386/conf/GENERIC,v 1.413.2.13 2005/04/02 16:37:58 scottl Exp $

machine i386
cpu I486_CPU
cpu I586_CPU
cpu I686_CPU
ident   YIN-YANG

# debug
options WITNESS
options KDB
options DDB
#
makeoptions DEBUG=-g
# debug


# To statically compile in device wiring instead of /boot/device.hints
#hints  "GENERIC.hints" # Default places to look for devices.

options SCHED_4BSD  # 4BSD scheduler
options INET        # InterNETworking
options INET6   # IPv6 communications protocols
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates support
options UFS_ACL # Support for access control lists
options UFS_DIRHASH # Improve performance on big directories
options MD_ROOT # MD is a potential root device
options NFSCLIENT   # Network Filesystem Client
options NFSSERVER   # Network Filesystem Server
options NFS_ROOT    # NFS usable as /, requires NFSCLIENT
options MSDOSFS # MSDOS Filesystem
options CD9660  # ISO 9660 Filesystem
options PROCFS