On Wed, Mar 28, 2001 at 04:45:50PM -0500, Roland McGrath wrote:
> > I wouldn't know how to get it, so I don't know if I can. What do I need for
> > this?
>
> Does ddb work these days? Last time I did kernel hacking it was
> oskit-mach, and that dumps a stack trace when it panics.
I don't know, I never used ddb.
> > If it isn't "wire" I am looking for, I don't know what I am looking for (a
> > grep showed nothing in proc/).
>
> You are right. proc used to wire itself (wire_task_self), but it doesn't
> now (init does). So this kernel bug is of more concern than I thought.
I should mention something. I attached two gdbs, and exited the first one
before the second. (I didn't clear the suspend count when starting the
first, and it didn't ask me for the suspend count when exiting, as it did
in another session I tried.) So this might be related to gdb mayhem. I don't
know if running two gdbs is fine (it shouldn't crash the kernel, but...).
Anyway, I stuck with one gdb only this time and it didn't crash. After I
exited gdb, the subhurd reported that it can't emulate the crash and would
reboot the Hurd now. So the kernel panic in thread_invoke is either a random
crash or a side effect of the two gdbs. I would need to do more testing to
find out, but reproducing the crash takes about one hour, so I'd like to
avoid that.
> > Sometimes I wonder if the kernel ring buffer proposed by RMS wouldn't be
> > helpful in situations like this.
>
> Well, maybe. But it is a lot of overhead. I'd be more inclined to work
> on a way to make it possible to trace a sub-hurd using rpctrace on
> the parent hurd.
Ok, sounds fine, too.
I have reproduced exactly the crash Jeff reported, and I have collected the
data. I used a ring buffer of 16 entries (I can increase it if needed), and
the full gdb log is attached. Here are the three ports on which RPCs were
logged immediately before the crash (in interleaved order, see left column).
If a field is blank, it is the same as the previous one in the same column:
port 218:
real-
order  bits        size  seqno  id
 1.    2147488018    32   1246  24021  dostop
 2.                       1247  24031  task2proc
 3.                       1248  24031
 5.    4370          24   1249  24018  get_arg_locations
 7.    2147488018    32   1250  24030  task2pid
 8.                       1251  24012  child

port 229:
order  bits        size  seqno  id
 4.    2147488018    32      0  24013  setmsgport
 6.    4370          40      1  24017  set_arg_locations
 9.                  24      2  24016  getpids
10.    2147488018   120      3  24022  handle_exceptions
11.                  32      4  24021  dostop
12.                          5  24031  task2proc
13.                          6  24031
15.    4370          24      7  24018  get_arg_locations

port 276:
order  bits        size  seqno  id
14.    2147488018    32      0  24013  setmsgport
16.    4370          40      1  24017  set_arg_locations

*** crash ***
Of course, one data point is not very much. I can run this a few more times,
and we can see if a pattern emerges. We can insert assertions etc.
We can probably log whole messages.
Can we run proc single threaded, so that we know where exactly it crashed?
Thanks,
Marcus
--
`Rhubarb is no Egyptian god.' Debian http://www.debian.org [EMAIL PROTECTED]
Marcus Brinkmann GNU http://www.gnu.org [EMAIL PROTECTED]
[EMAIL PROTECTED]
http://www.marcus-brinkmann.de
Script started on Thu Mar 29 00:30:37 2001
hurd:~# gdb /proc.exe 86
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-gnu0.2"...
/root/86: No such file or directory.
Attaching to program `/proc.exe', pid 86
warning: Can't modify tracing state for pid 86: No signal thread
Reading symbols from /lib/libhurdbugaddr.so.0.2...done.
Loaded symbols for /lib/libhurdbugaddr.so.0.2
Reading symbols from /lib/libthreads.so.0.2...done.
Loaded symbols for /lib/libthreads.so.0.2
Reading symbols from /lib/libihash.so.0.2...done.
Loaded symbols for /lib/libihash.so.0.2
Reading symbols from /lib/libports.so.0.2...done.
Loaded symbols for /lib/libports.so.0.2
Reading symbols from /lib/libshouldbeinlibc.so.0.2...done.
Loaded symbols for /lib/libshouldbeinlibc.so.0.2
Reading symbols from /lib/libc.so.0.2...done.
Loaded symbols for /lib/libc.so.0.2
Reading symbols from /lib/ld.so...done.
Loaded symbols for /lib/ld.so
Reading symbols from /lib/libmachuser.so.1...done.
Loaded symbols for /lib/libmachuser.so.1
Reading symbols from /lib/libhurduser.so.0.0...done.
Loaded symbols for /lib/libhurduser.so.0.0
[Switching to thread 86.1]
(gdb) cont
Continuing.
warning: Can't wait for pid 86: No child processes
Program received signal EXC_BAD_ACCESS, Could not access memory.
[Switching to thread 86.12]
0x1000100 in ?? ()
(gdb) info thr
18 thread 86.18 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
17 thread 86.17 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
16 thread 86.16 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
15 thread 86.15 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
14 thread 86.14 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
13 thread 86.13 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
* 12 thread 86.12 0x1000100 in ?? ()
11 thread 86.11 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
10 thread 86.10 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
9 thread 86.9 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
8 thread 86.8 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
7 thread 86.7 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
6 thread 86.6 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
5 thread 86.5 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
4 thread 86.4 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
3 thread 86.3 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
2 thread 86.2 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
1 thread 86.1 0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
(gdb) bt full
#0 0x1000100 in ?? ()
No symbol table info available.
#1 0x1000100 in ?? ()
No symbol table info available.
(gdb) x/5i $pc
0x1000100: add %al,(%eax)
0x1000102: add %al,(%eax)
0x1000104: add %al,(%eax)
0x1000106: add %al,(%eax)
0x1000108: add %al,(%eax)
(gdb) i reg
eax 0x0 0
ecx 0x1038730 17008432
edx 0xe 14
ebx 0x118d718 18405144
esp 0x128bee0 0x128bee0
ebp 0x128bf18 0x128bf18
esi 0x128df40 19455808
edi 0x803 2051
eip 0x1000100 0x1000100
eflags 0x10207 66055
cs 0x17 23
ss 0x1f 31
ds 0x1f 31
es 0x1f 31
fs 0x1f 31
gs 0x1f 31
fctrl 0x0 0
fstat 0x0 0
ftag 0x0 0
fiseg 0x0 0
fioff 0x0 0
foseg 0x0 0
fooff 0x0 0
fop 0x0 0
(gdb) print rolling_index
$9 = 7
(gdb) print rolling_buffer
$10 = {{msgh_bits = 2147488018, msgh_size = 120, msgh_remote_port = 163,
msgh_local_port = 229, msgh_seqno = 3, msgh_id = 24022}, {
msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
msgh_local_port = 229, msgh_seqno = 4, msgh_id = 24021}, {
msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
msgh_local_port = 229, msgh_seqno = 5, msgh_id = 24031}, {
msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
msgh_local_port = 229, msgh_seqno = 6, msgh_id = 24031}, {
msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
msgh_local_port = 276, msgh_seqno = 0, msgh_id = 24013}, {
msgh_bits = 4370, msgh_size = 24, msgh_remote_port = 163,
msgh_local_port = 229, msgh_seqno = 7, msgh_id = 24018}, {
msgh_bits = 4370, msgh_size = 40, msgh_remote_port = 163,
msgh_local_port = 276, msgh_seqno = 1, msgh_id = 24017}, {
msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
msgh_local_port = 218, msgh_seqno = 1246, msgh_id = 24021}, {
msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
msgh_local_port = 218, msgh_seqno = 1247, msgh_id = 24031}, {
msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
msgh_local_port = 218, msgh_seqno = 1248, msgh_id = 24031}, {
msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
msgh_local_port = 229, msgh_seqno = 0, msgh_id = 24013}, {
msgh_bits = 4370, msgh_size = 24, msgh_remote_port = 163,
msgh_local_port = 218, msgh_seqno = 1249, msgh_id = 24018}, {
---Type <return> to continue, or q <return> to quit---
msgh_bits = 4370, msgh_size = 40, msgh_remote_port = 163,
msgh_local_port = 229, msgh_seqno = 1, msgh_id = 24017}, {
msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
msgh_local_port = 218, msgh_seqno = 1250, msgh_id = 24030}, {
msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
msgh_local_port = 218, msgh_seqno = 1251, msgh_id = 24012}, {
msgh_bits = 4370, msgh_size = 24, msgh_remote_port = 163,
msgh_local_port = 229, msgh_seqno = 2, msgh_id = 24016}}
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program `/proc.exe' pid 86
hurd:~# exit
Script done on Thu Mar 29 01:28:51 2001