Re: KGDB/i386 broken/supposed to work?
I'll keep digging.

The problem is that KGDB isn't prepared to deal with the non-blocking serial port reads that matt's commit introduced in 2013 (am I really the first one to try this on bare metal since then?).

Where the read function `com_common_getc` used to block, it now returns -1, which KGDB happily takes for real input (an endless stream of 0xff) having arrived on the serial port. This leads to excessively long input (from KGDB's perspective), which eventually makes it bail out of interpreting the received data.

Compare sys/kern/kgdb_stub.c:

  268         while ((c = GETC()) != KGDB_END && len < maxlen) {
  269                 DPRINTF(("%c",c));
  [...]
  273                 len++;
  274         }
  [...][`len` now has a significant value due to `c` repeatedly being -1]
  [...][so the following conditional is taken:]
  279         if (len >= maxlen) {
  280                 DPRINTF(("Long- "));
  281                 PUTC(KGDB_BADP);
  282                 continue;
  283         }

(It would have been easy to spot thanks to the DPRINTFs, but the one on line 269 completely flooded the console, preventing me from catching the one on line 280.)

If I replace the `GETC` macro with a function that spins as long as -1 is read, as in the patch below, the problem disappears.

The patch below is of course just an ad-hoc fix that Works For Me(TM); I'm currently not sure how to address the problem in a more general manner -- perhaps by providing a blocking interface to the serial port on top of the non-blocking one, in a way similar to what the `GETC` function below does, and directing kgdb to use that instead?

--- sys/kern/kgdb_stub.c.orig   2015-06-26 01:49:32.0 +0200
+++ sys/kern/kgdb_stub.c        2015-06-26 01:51:31.0 +0200
@@ -85,7 +85,17 @@
 static u_char buffer[KGDB_BUFLEN];
 static kgdb_reg_t gdb_regs[KGDB_NUMREGS];
 
-#define GETC() ((*kgdb_getc)(kgdb_ioarg))
+static int
+GETC(void)
+{
+       int c;
+
+       while ((c = kgdb_getc(kgdb_ioarg)) == -1)
+               ;
+
+       return c;
+}
+//#define GETC()       ((*kgdb_getc)(kgdb_ioarg))
 #define PUTC(c)        ((*kgdb_putc)(kgdb_ioarg, c))
 
 /*
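For anyone who wants to reproduce the effect without a second machine: the failure mode, and the "blocking layer on top of the non-blocking read" idea, can be modelled in userland roughly as below. This is only a sketch under assumptions. `fake_port`, `fake_getc`, `blocking_port`, `blocking_getc` and `read_payload` are names made up for this illustration, `read_payload` is a simplified imitation of the receive loop quoted above rather than the kernel code, and the fake port merely pretends to be a polled UART that answers -1 while no byte is pending.

```c
#include <assert.h>
#include <stddef.h>

#define KGDB_START '$'	/* packet delimiters of the GDB remote protocol */
#define KGDB_END   '#'

typedef int (*getc_fn)(void *);

/*
 * Hypothetical polled port: hands out `data` one byte at a time, but
 * answers -1 ("nothing there yet") `idle` times before every byte.
 * Initialize idle_left to the same value as idle.
 */
struct fake_port {
	const char *data;
	size_t pos;
	int idle;
	int idle_left;
};

static int
fake_getc(void *arg)
{
	struct fake_port *fp = arg;

	if (fp->idle_left > 0 || fp->data[fp->pos] == '\0') {
		if (fp->idle_left > 0)
			fp->idle_left--;
		return -1;
	}
	fp->idle_left = fp->idle;
	return (unsigned char)fp->data[fp->pos++];
}

/* Blocking layer: spin on the non-blocking read until a byte arrives. */
struct blocking_port {
	getc_fn poll;	/* underlying non-blocking read */
	void *arg;	/* its argument */
};

static int
blocking_getc(void *arg)
{
	struct blocking_port *bp = arg;
	int c;

	assert(bp->poll != NULL);
	while ((c = bp->poll(bp->arg)) == -1)
		continue;
	return c;
}

/*
 * Simplified imitation of the stub's receive loop: skip to the start
 * marker, then collect bytes until the end marker or until maxlen.
 * A return value >= maxlen corresponds to the "Long- " case.
 */
static int
read_payload(getc_fn fn, void *arg, unsigned char *buf, int maxlen)
{
	int c, len = 0;

	while ((c = fn(arg)) != KGDB_START)
		continue;
	while ((c = fn(arg)) != KGDB_END && len < maxlen)
		buf[len++] = c & 0x7f;
	return len;
}
```

Calling `read_payload` with `fake_getc` directly fills the buffer with junk up to `maxlen` whenever the port idles long enough, whereas passing `blocking_getc` with a `blocking_port` wrapped around the same fake port yields only the real payload. Whether such a wrapper belongs in the stub or in the com driver is exactly the open question above; the sketch takes no position on that.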
Re: KGDB/i386 broken/supposed to work?
>> There is no delay between ``kgdb waiting...'' and ``fatal breakpoint trap in supervisor mode''.
>
> Looks like that behavior was introduced by the following commit to src/sys/arch/i386/i386/trap.c:
>
>   revision 1.239
>   date: 2008-05-30 13:38:21 +0300; author: ad; state: Exp; lines: +10 -11;
>   Since breakpoints don't work, dump basic info about the trap before
>   entering the debugger. Sometimes ddb only makes the situation worse.

Yes, I've arrived at that commit too, after a longish bisect session. It's indeed the expected behavior.

I'm now bisecting 6-stable (it seems to work there indeed) vs. -head, but this will take quite a while. I'll report back once I've arrived somewhere.

Thanks for your replies,
Timo
Re: KGDB/i386 broken/supposed to work?
Timo Buhrmester wrote:
> Using a GENERIC kernel with only the modifications required to enable KGDB (see bottom for config diff), I get the following behavior on the TARGET machine:
>
> | boot netbsd -d
> | 15741968+590492+466076 [689568+730405]=0x1161fd4
> | kernel text is mapped with 4 large pages and 5 normal pages
> | Loaded initial symtab at 0xc110750c, strtab at 0xc11afaac, # entries 43075
> | kgdb waiting...fatal breakpoint trap in supervisor mode
> | trap type 1 code 0 eip c02a6744 cs 8 eflags 202 cr2 0 ilevel 8 esp c1265ea0
> | curlwp 0xc1078900 pid 0 lid 1 lowest kstack 0xc12632c0
>
> There is no delay between ``kgdb waiting...'' and ``fatal breakpoint trap in supervisor mode''. I'm not sure whether or not this is the expected behavior, because eip c02a6744 is in the `breakpoint` function so that would make sense; but the documentation makes it sound like it should just say ``kgdb waiting...''.

Looks like that behavior was introduced by the following commit to src/sys/arch/i386/i386/trap.c:

  revision 1.239
  date: 2008-05-30 13:38:21 +0300; author: ad; state: Exp; lines: +10 -11;
  Since breakpoints don't work, dump basic info about the trap before
  entering the debugger. Sometimes ddb only makes the situation worse.

> Any idea whether a) KGDB is tested/supposed to work and b) what I might be doing wrong?

KGDB over serial on i386 worked for me in January 2008. I don't know if it is still working now; the kernel debugging I've done since then has been using qemu's built-in gdb stub instead, as described in https://wiki.netbsd.org/kernel_debugging_with_qemu/.

--
Andreas Gustafsson, g...@gson.org
Re: KGDB/i386 broken/supposed to work?
I am a bit fuzzy on the details, but definitely in 2010 on netbsd-5 remote kgdb on i386 worked. I am 95% sure it still worked on netbsd-6 in 2011/2012.
Re: KGDB/i386 broken/supposed to work?
In article 20150619201302.GA243@frozen.localdomain, Timo Buhrmester fstd.l...@gmail.com wrote:
> I'm failing to get KGDB on i386 working for kernel debugging over a serial (nullmodem) link, as described in http://www.netbsd.org/docs/kernel/kgdb.html
>
> The TARGET (to-be-debugged) system has two serial ports, com0 is the boot console, com1 is what I set KGDB to operate on. The REMOTE (debugger) system uses its com0 port to connect to the target's com1.
>
> Using a GENERIC kernel with only the modifications required to enable KGDB (see bottom for config diff), I get the following behavior on the TARGET machine:
>
> | boot netbsd -d
> | 15741968+590492+466076 [689568+730405]=0x1161fd4
> | kernel text is mapped with 4 large pages and 5 normal pages
> | Loaded initial symtab at 0xc110750c, strtab at 0xc11afaac, # entries 43075
> | kgdb waiting...fatal breakpoint trap in supervisor mode
> | trap type 1 code 0 eip c02a6744 cs 8 eflags 202 cr2 0 ilevel 8 esp c1265ea0
> | curlwp 0xc1078900 pid 0 lid 1 lowest kstack 0xc12632c0
>
> There is no delay between ``kgdb waiting...'' and ``fatal breakpoint trap in supervisor mode''. I'm not sure whether or not this is the expected behavior, because eip c02a6744 is in the `breakpoint` function so that would make sense; but the documentation makes it sound like it should just say ``kgdb waiting...''.
>
> On the REMOTE (debugger) machine (serial port tty00) I get/do:
>
> | # gdb -q netbsd.gdb
> | Reading symbols from netbsd.gdb...done.
> | (gdb) set remotebaud 38400
> | Warning: command 'set remotebaud' is deprecated.
> | Use 'set serial baud'.
> |
> | (gdb) set serial baud 38400
> | (gdb) set remotebreak 1
> | Warning: command 'set remotebreak' is deprecated.
> | Use 'set remote interrupt-sequence'.
> |
> | (gdb) set remote interrupt-sequence Ctrl-C
> | (gdb) set remotetimeout 5
> | (gdb) target remote /dev/tty00
> | Remote debugging using /dev/tty00
> | Ignoring packet error, continuing...
> | warning: unrecognized item "timeout" in "qSupported" response
> | Ignoring packet error, continuing...
> | Ignoring packet error, continuing...
> | Bogus trace status reply from target: timeout
> | (gdb)
>
> ..which I presume is due to the target already having ceased execution.
>
> Both machines run the same, recent -current build (7.99.18) on i386. I have verified that the serial connection works in both directions, using a non-KGDB GENERIC kernel. I have also verified that kgdb is actually in the kernel and using the right port (com1) when booting the KGDB kernel without -d:
>
> | com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
> | com0: console
> | com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
> | com1: kgdb
>
> The difference between GENERIC and my KGDB-enabled version of it:
>
> -#options DEBUG # expensive debugging checks/support
> +options DEBUG # expensive debugging checks/support
>  #options LOCKDEBUG # expensive locking checks/support
>  #options KMEMSTATS # kernel memory statistics (vmstat -m)
> -options DDB # in-kernel debugger
> +#options DDB # in-kernel debugger
>  #options DDB_ONPANIC=1 # see also sysctl(7): `ddb.onpanic'
> -options DDB_HISTORY_SIZE=512 # enable history editing in DDB
> +#options DDB_HISTORY_SIZE=512 # enable history editing in DDB
>  #options DDB_VERBOSE_HELP
> -#options KGDB # remote debugger
> -#options KGDB_DEVNAME="\"com\"",KGDB_DEVADDR=0x3f8,KGDB_DEVRATE=9600
> -#makeoptions DEBUG="-g" # compile full symbol table
> +options KGDB # remote debugger
> +options KGDB_DEVNAME="\"com\"",KGDB_DEVADDR=0x2f8,KGDB_DEVRATE=38400
> +makeoptions DEBUG="-g" # compile full symbol table
>  #options SYSCALL_STATS # per syscall counts
>  #options SYSCALL_TIMES # per syscall times
>  #options SYSCALL_TIMES_HASCOUNTER # use 'broken' rdtsc (soekris)
>
> Any idea whether a) KGDB is tested/supposed to work and b) what I might be doing wrong? Is there any other relevant information I missed that would be useful to provide?

No, but the explanation is that support for it has probably rotted out. I would file a PR so this information is not lost.

christos