Re: Repeatable crash with 5.4-p1-RELEASE and SMP

2005-06-04 Thread Robert Watson

On Sat, 4 Jun 2005, Palle Girgensohn wrote:

Anyway, I have managed to get an automatic reboot and a core dump. Giant 
leap for mankind :-) . It looks kind of partly overwritten, though. 
According to the Developer's Handbook, the core should be saved *before* 
the swap partition is added to the system. I can easily verify that 
this is not the case; the swap is mounted first. I once again raise 
the question of whether PR conf/73834 shouldn't be addressed. Or perhaps 
my core dump is quite normal? Doesn't look like it. In rc.conf, I have:


I can't speak to the crash itself, but regarding swap and cores: the 
problem is that fsck requires quite a lot of memory in order to operate on 
large file systems, so you have to configure swap before you fsck. 
However, you can't write the core dump to the file system until it has 
been fsck'd.  Normally, if fsck actually uses swap, it will overwrite the 
core dump header, and savecore will recognize that the entire dump is 
invalidated, so usually you don't see the corrupted core, just that the 
core is missing.  Whether this happens depends on how large your file 
systems are, how many you have (since fsck runs in parallel), and how much 
memory you have.  If you want to be sure this doesn't happen, boot to 
single user mode after the crash, manually fsck without swap enabled (fsck 
-p), mount -a, then sh /etc/rc.d/savecore start to save the core.
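
The manual sequence above can be sketched as a small shell function. This 
is only a sketch of the steps named in this message, meant to be run from 
single-user mode; the function name recover_core and the RUN dry-run hook 
are hypothetical additions for illustration.

```shell
#!/bin/sh
# Sketch of the manual recovery steps: fsck without swap, mount, then savecore.
# RUN is a hypothetical dry-run hook: set RUN=echo to print the commands
# instead of executing them. Left empty, the commands run for real.
RUN=${RUN:-}

recover_core() {
    # Preen the file systems while swap is still disabled, so the dump
    # sitting on the swap partition is not overwritten.
    $RUN fsck -p || return 1
    # Mount the file systems so the dump directory is available.
    $RUN mount -a || return 1
    # Save the core from the dump device.
    $RUN sh /etc/rc.d/savecore start
}
```

Running it with RUN=echo first prints the three commands without touching 
the system, which is a cheap way to check the sequence before the next crash.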


My suspicion is that the corruption you're seeing is not a property of 
swap being started, but it's easy to rule out if you have a reproducible 
crash and can be there to boot single-user after the reboot.


Robert N M Watson
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable


Repeatable crash with 5.4-p1-RELEASE and SMP

2005-06-03 Thread Palle Girgensohn

Hi!

This is very similar to Brendan White's problem, just reported here. My 
guess is that it is the very same problem. I've reported it on a few 
occasions before (although I use amd64, so my postings are to 
[EMAIL PROTECTED]).


My system is also a Dell 2850, dual CPUs, 3GB RAM, running amd64 FreeBSD 
5.4-p1. It is quite stable (but slow) when running without SMP. With SMP 
enabled, it crashes within a few hours. The load is high, around 4. See my 
postings on amd64@ for many more details.


Anyway, I have managed to get an automatic reboot and a core dump. Giant 
leap for mankind :-) . It looks kind of partly overwritten, though. 
According to the Developer's Handbook, the core should be saved *before* 
the swap partition is added to the system. I can easily verify that this 
is not the case; the swap is mounted first. I once again raise the 
question of whether PR conf/73834 shouldn't be addressed. Or perhaps my 
core dump is quite normal? Doesn't look like it. In rc.conf, I have:


# kernel crash dumps
dumpdev=/dev/amrd0s2b
dumpdir=/misc/crash
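
One way to check the startup ordering mentioned above is to number the 
output of rcorder and pick out the relevant scripts. This is a minimal 
sketch: rc_positions is a hypothetical helper, and the script names 
swap1, fsck and savecore are assumptions about what /etc/rc.d contains 
on this release.

```shell
#!/bin/sh
# rc_positions NAMES
# Reads an rcorder-style listing (one script path per line) on stdin and
# prints "position:path" for scripts whose basename matches NAMES, an
# egrep alternation such as 'swap1|fsck|savecore'.
rc_positions() {
    grep -n -E "/($1)\$"
}

# Example use on a live system (script names are assumptions):
#   rcorder /etc/rc.d/* | rc_positions 'swap1|fsck|savecore'
```

A lower position means the script runs earlier at boot, so if the swap 
script is listed before savecore, swap is enabled before the core is saved.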


Here's the dump. If there is anything else I should extract, please just ask.

# kgdb kernel.debug /misc/crash/vmcore.11
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".
#0  doadump () at pcpu.h:167
167     __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) backtrace
#0  doadump () at pcpu.h:167
#1  0x in ?? ()
#2  0x80341267 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
#3  0x80341ac6 in panic (fmt=0xff007b76d000  «x{) at /usr/src/sys/kern/kern_shutdown.c:566
#4  0x804f0f52 in trap_fatal (frame=0xc, eva=18446742976269307904)
   at /usr/src/sys/amd64/amd64/trap.c:639
#5  0x804f11ef in trap_pfault (frame=0xb1d229b0, usermode=0)
   at /usr/src/sys/amd64/amd64/trap.c:562
#6  0x804f1457 in trap (frame=
 {tf_rdi = -1097427517200, tf_rsi = -1097440243712, tf_rdx = 1056,
  tf_rcx = 0, tf_r8 = 0, tf_r9 = 0, tf_rax = 1056, tf_rbx = 0,
  tf_rbp = -1098069766144, tf_r10 = 4503599627366400, tf_r11 = 3392,
  tf_r12 = 4, tf_r13 = 4, tf_r14 = -1099313881192,
  tf_r15 = -1097364452848, tf_trapno = 12, tf_addr = 136,
  tf_flags = -1099313881192, tf_err = 0, tf_rip = -2144020582,
  tf_cs = 8, tf_rflags = 66050, tf_rsp = -1311626640, tf_ss = 0})
   at /usr/src/sys/amd64/amd64/trap.c:341
#7  0x804deb0b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:171
#8  0xff007c3900f0 in ?? ()
#9  0xff007b76d000 in ?? ()
#10 0x0420 in ?? ()
#11 0x in ?? ()
#12 0x in ?? ()
#13 0x in ?? ()
#14 0x0420 in ?? ()
#15 0x in ?? ()
#16 0xff0055f11000 in ?? ()
#17 0x000ff000 in ?? ()
#18 0x0d40 in ?? ()
#19 0x0004 in ?? ()
#20 0x0004 in ?? ()
#21 0xff000bc95f98 in ?? ()
#22 0xff007ffb4a10 in ?? ()
#23 0x000c in ?? ()
#24 0x0088 in ?? ()
#25 0xff000bc95f98 in ?? ()
#26 0x in ?? ()
#27 0x8034d79a in thread_fini (mem=0x0, size=0) at /usr/src/sys/kern/kern_thread.c:271
#28 0x in ?? ()
#29 0x0001 in ?? ()
#30 0xff007ffb4a00 in ?? ()
#31 0xff0055f11f98 in ?? ()
#32 0x804d46ff in zone_drain (zone=0x8) at /usr/src/sys/vm/uma_core.c:749
#33 0x804d22b6 in zone_foreach (zfunc=0x804d4530 <zone_drain>)
   at /usr/src/sys/vm/uma_core.c:1494
#34 0x804d5ec9 in uma_reclaim () at /usr/src/sys/vm/uma_core.c:2623
#35 0x804cfcac in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:674
#36 0x8032805c in fork_exit (callout=0x804cf6b0 <vm_pageout>, arg=0x0,
   frame=0xb1d22c50) at /usr/src/sys/kern/kern_fork.c:791
#37 0x804ded0e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:296
#38 0x in ?? ()
#39 0x in ?? ()
#40 0x0001 in ?? ()
#41 0x in ?? ()
#42 0x in ?? ()
#43 0x in ?? ()
#44 0x in ?? ()
#45 0x in ?? ()
#46 0x in ?? ()
#47 0x in ?? ()
#48 0x in ?? ()
---Type <return> to continue, or q <return> to quit---
#49 0x in ?? ()
#50 0x in ?? ()
#51 0x in ?? ()
#52 0x in ?? ()
#53 0x in ?? ()
#54 0x in ?? ()
#55 0x in ?? ()
#56 0x in ?? ()
#57 0x in ?? ()
#58
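
kgdb prints the trap frame registers in frame #6 as signed decimals, which 
makes them hard to compare with the hexadecimal addresses in the other 
frames. The conversion can be sketched as below, assuming a shell with 
64-bit arithmetic; low32_hex is a hypothetical helper, not part of kgdb, 
and it keeps only the low 32 bits of the value.

```shell
#!/bin/sh
# low32_hex VALUE
# Prints the low 32 bits of a signed decimal VALUE as a zero-padded hex number.
low32_hex() {
    printf '0x%08x\n' $(( $1 & 0xffffffff ))
}

low32_hex -2144020582   # tf_rip from the trap frame
low32_hex 136           # tf_addr from the trap frame
```

With the values from the trace, tf_rip comes out as 0x8034d79a, the same 
address shown for thread_fini in frame #27, and tf_addr comes out as 
0x00000088.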