IPv6-related panic

2023-05-22 Thread Timo Buhrmester
Our 8.2_STABLE (amd64) system panicked the other day in what appears to be an 
IPv6-related issue.

We had a tcpdump running at the time, which may or may not be related (I 
suspect it is, though, since that machine had some years of uptime before).

Our own research led to sys/external/bsd/ipf/netinet/fil.c:942, with 
fin->fin_m == NULL.

Here's the DDB info:

| fatal page fault in supervisor mode
| trap type 6 code 0 rip 0x803e5099 cs 0x8 rflags 0x10206 cr2 0x20 
ilevel 0x4 rsp 0x8000ad83e990
| curlwp 0xfe81ff624420 pid 0.3 lowest kstack 0x8000ad83c2c0
| kernel: page fault trap, code=0
| Stopped in pid 0.3 (system) at  netbsd:ipf_makefrip+0x12e0: cmpl 20(%rax),%edx
| db{0}> bt
| ipf_makefrip() at netbsd:ipf_makefrip+0x12e0
| ipf_checkicmp6matchingstate() at netbsd:ipf_checkicmp6matchingstate+0xed
| ipf_state_lookup() at netbsd:ipf_state_lookup+0x749
| ipf_state_check() at netbsd:ipf_state_check+0x67
| ipf_check() at netbsd:ipf_check+0x94d
| pfil_run_hooks() at netbsd:pfil_run_hooks+0x11b
| ip6_input() at netbsd:ip6_input+0x1c2
| ip6intr() at netbsd:ip6intr+0x7b
| softint_dispatch() at netbsd:softint_dispatch+0x91
| DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0x8000ad83f0f0
| Xsoftintr() at netbsd:Xsoftintr+0x4f
| --- interrupt ---
| 0:
| db{0}> show reg
| ds      1f80
| es      0
| fs      3528
| gs      0
| rdi     0
| rsi     fe81228f6868
| rbp     8000ad83ea00
| rbx     3a
| rdx     bc
| rcx     94
| rax     0
| r8      80a897a0 ipfmain
| r9      fe81228f6840
| r10     fe81fc917808
| r11     8000ad83ed48
| r12     8000ad83ed48
| r13     4400
| r14     8e708000
| r15     8000ad83ea68
| rip     80e35099 ipf_makefrip+0x12e0
| cs      8
| rflags  10206
| rsp     8000ad83e990
| ss      10
| netbsd:ipf_makefrip+0x12e0: cmpl 20(%rax),%edx
| db{0}>
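
The dump above is consistent with a NULL pointer dereference: rax is 0, the faulting instruction reads at displacement 0x20 from rax, and cr2 (the faulting address) is 0x20. The following is only a minimal sketch of that arithmetic. The structure `example_obj` and the helper `fault_address` are hypothetical stand-ins, not the real kernel layout, which is not reproduced in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the object being dereferenced.  The only
 * assumption is that a 32-bit member lives 0x20 bytes into it.
 */
struct example_obj {
	uint8_t  pad[0x20];	/* earlier members (assumed size) */
	uint32_t field;		/* member a "cmpl 20(%rax),%edx" would read */
};

/* Address the CPU touches when reading `field` through pointer `base`. */
static uintptr_t
fault_address(uintptr_t base)
{
	return base + offsetof(struct example_obj, field);
}
```

With `base` equal to 0, `fault_address` yields 0x20, which matches the cr2 value in the trap message. A small non-zero fault address like this is the usual signature of a member access through a NULL pointer.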


USB-related panic in 8.2_STABLE

2023-04-27 Thread Timo Buhrmester
Seemingly out of nowhere, one of our servers panicked.


uname -a gives:

| NetBSD trave.math.uni-bonn.de 8.2_STABLE NetBSD 8.2_STABLE (MI-Server) #17: 
Fri Jul 16 14:01:03 CEST 2021  
supp...@trave.math.uni-bonn.de:/var/work/obj-8/sys/arch/amd64/compile/miserv 
amd64

I've transcribed the panic message and backtrace:

| ohci0: 1 scheduling overruns
| ugen0: detached
| ugen0: at uhub4 port 2 (addr 2) disconnected
| ugen0 at uhub4 port 2
| ugen0: Phoenixtec Power (0x6da) USB Cable (V2.00) (0x02), rev 1.00/0.06, addr 
2
| uvm_fault(0xfe82574c2458, 0x0, 1) -> e
| fatal page fault in supervisor mode
| trap type 6 code 0 rip 0x802f627e cs 0x8 rflags 0x10246 cr2 0x2 
ilevel 6 (NB: could be ilevel 0 as well) rsp 0x80013f482c10
| curlwp 0xfe83002b2000 pid 8393.1 lowest kstack 0x80013f4802c0
| kernel: page fault trap, code=0
| Stopped in pid 8393.1 (nutdrv_qx_usb) at   netbsd:ugen_get_cdesc+0xb1:
| movzwl 2(%rax),%edx
| db{2}> bt
| ugen_get_cdesc() at netbsd:ugen_get_cdesc+0xb1
| ugenioctl() at netbsd:ugenioctl+0x9a4
| cdev_ioctl() at netbsd:cdev_ioctl+0xb4
| VOP_IOCTL() at netbsd:VOP_IOCTL+0x54
| vn_ioctl() at netbsd:vn_ioctl+0xa6
| sys_ioctl() at netbsd:sys_ioctl+0x11a
| syscall() at netbsd:syscall+0x1ec
| --- syscall (number 54) ---
| 7a73c9eff13a:
| db{2}>

Any idea what's going on?

Cheers
Timo Buhrmester


Re: A nit-pick on gettime(9) man-page

2017-01-04 Thread Timo Buhrmester
> A "monotonically increasing" function would guarantee a results that is
> strictly greater than any previous results.
AFAIK "monotonically increasing" and "non-decreasing" are synonyms which both
allow for constant sections, without which it could be called "strictly
increasing".
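
To make the distinction concrete, here is a small sketch with two checks over a sequence of timestamps. The function names are made up for illustration and have nothing to do with the gettime(9) interface itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if every element is >= its predecessor (constant sections allowed). */
static bool
is_nondecreasing(const long *v, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (v[i] < v[i - 1])
			return false;
	return true;
}

/* True if every element is > its predecessor (no constant sections). */
static bool
is_strictly_increasing(const long *v, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (v[i] <= v[i - 1])
			return false;
	return true;
}
```

The sequence 1, 2, 2, 3 passes the first check but fails the second, which is exactly the case of two consecutive reads returning the same time.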


Re: In kernel mixing - audio cloning

2016-12-28 Thread Timo Buhrmester
> I do get some audible artifacts with this patch
It was pointed out to me on IRC that this is not caused by
your patch, so, never mind.


Re: In kernel mixing - audio cloning

2016-12-28 Thread Timo Buhrmester
> The audio device is cloned and their is no longer a limitation of one device 
> per process and file descriptors may be shared among processes.
Awesome!  Cloning works, the volume-going-down-with-each-open(2) problem
is gone.

I do get some audible artifacts with this patch, though, at least for
mp3 playback using mpg123 (will poke at it in more depth later).  Hard
to explain, but something like a periodic (every 2-3 seconds), white
noise stutter intermixed with the playing track.  Sounds like buffer skip(?).

My audio chip being
hdafg1 at hdaudio1: vendor 1106 product 0397
audio0 at hdafg1: full duplex, playback, capture, mmap, independent

In any case, thanks for implementing this!


Re: Audio - In kernel audio mixing

2016-05-15 Thread Timo Buhrmester
> It works so far [...]
Forgot to mention, the pc beeper does not work.  Is this device correct?
crw-rw-rw-  1 root  wheel  156,   0 May 16 02:19 /dev/speaker

Is it still supposed to use the real integrated beeper, or would it
synthesize a beep through the sound card?


Re: Audio - In kernel audio mixing

2016-05-15 Thread Timo Buhrmester
> /usr/src/sys/dev/isa/files.isa:435: redefinition of `spkr'
It builds when removing the pertinent line from that file.

It works so far, apart from what appears to be a logarithmic decrease
in global volume with each concurrent opening of the device.  I assume
this is done to avoid clipping when multiple large amplitude signals
happen to be in phase (?).

I wonder how other mixers solve the issue.
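
For reference, two common ways of combining signed 16-bit samples are sketched below. This is not taken from the vaudio patches, and which strategy they actually use is not stated in this thread; the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a wide intermediate sum back into the signed 16-bit range. */
static int16_t
clamp16(int32_t x)
{
	if (x > INT16_MAX)
		return INT16_MAX;
	if (x < INT16_MIN)
		return INT16_MIN;
	return (int16_t)x;
}

/* Strategy A: sum at full volume, saturate if the result overflows. */
static int16_t
mix_saturate(const int16_t *s, size_t n)
{
	int32_t acc = 0;

	for (size_t i = 0; i < n; i++)
		acc += s[i];
	return clamp16(acc);
}

/* Strategy B: scale by 1/n; never clips, but each stream gets quieter. */
static int16_t
mix_average(const int16_t *s, size_t n)
{
	int32_t acc = 0;

	if (n == 0)
		return 0;
	for (size_t i = 0; i < n; i++)
		acc += s[i];
	return (int16_t)(acc / (int32_t)n);
}
```

Strategy A keeps every stream at its original level and only distorts when loud signals coincide; strategy B trades overall volume for a guarantee against clipping, which would produce the kind of volume drop described above. Both assume a small number of streams, so the 32-bit accumulator cannot overflow.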


Re: Audio - In kernel audio mixing

2016-05-15 Thread Timo Buhrmester
I'm getting this when trying to build with your vaudio-kern patches applied:

$ ./build.sh -O /usr/obj.i386 -j3 -U -u kernel=GENERIC
===> build.sh command:    ./build.sh -O /usr/obj.i386 -j3 -U -u kernel=GENERIC
===> build.sh started:    Mon May 16 01:24:25 CEST 2016
===> NetBSD version:      7.99.29
===> MACHINE:             i386
===> MACHINE_ARCH:        i386
===> Build platform:      NetBSD 7.99.29 i386
===> HOST_SH:             /bin/sh
===> MAKECONF file:       /etc/mk.conf
===> TOOLDIR path:        /usr/obj.i386/tooldir.NetBSD-7.99.29-i386
===> DESTDIR path:        /usr/obj.i386/destdir.i386
===> RELEASEDIR path:     /usr/obj.i386/releasedir
===> Updated makewrapper: /usr/obj.i386/tooldir.NetBSD-7.99.29-i386/bin/nbmake-i386
===> Building kernel without building new tools
--- obj ---
===> Building kernel:     GENERIC
===> Build directory:     /usr/obj.i386/sys/arch/i386/compile/GENERIC
/usr/src/sys/dev/isa/files.isa:435: redefinition of `spkr'
/usr/src/sys/dev/isa/files.isa:436: redefinition of `spkr'
/usr/src/sys/dev/isa/files.isa:437: duplicate file dev/isa/spkr.c
/usr/src/sys/dev/vaudio/files.vaudio:10: here is the original definition
*** Stop.

ERROR: nbconfig failed for GENERIC
*** BUILD ABORTED ***

(I have adjusted the patch to remove spkr from
sys/arch/i386/conf/majors.i386 as well)


Re: Audio - In kernel audio mixing

2016-05-15 Thread Timo Buhrmester
> I believe that the vaudio approach is better and wanted to start a discussion 
> about in kernel-mixing and hopefully which approach (if any) should be 
> included in NetBSD in future.
A third option would be taking OpenBSD's sndiod (which we have in
pkgsrc/wip); it seems rather sane and it's probably not all too much
work to make it support our OSS (there are two or three ioctls missing).


Re: fssconfig/raidframe/dump-related crashes

2016-03-11 Thread Timo Buhrmester
On Fri, Mar 11, 2016 at 08:21:04AM +, Emmanuel Dreyfus wrote:
> On Fri, Mar 11, 2016 at 07:37:01AM +0100, Timo Buhrmester wrote:
> > It just completed one entire dump this way with no crash.
> > That hadn't happened in 6 days.  Hmm.
> 
> My experience of the thing is that it is not reproductible. I
> will backup fine for weeks and sometimes crash.

The problem seems to disappear when building the kernel with -O1 or lower.
I'm approaching a (statistically significant) streak of 16 complete runs
of my backup script, whereas with a -O2 kernel it rarely takes more than
one or two attempts to crash it.


Re: fssconfig/raidframe/dump-related crashes

2016-03-10 Thread Timo Buhrmester
> > I use the built-in snapshot capability of dump(8), with the -X option:
> > 
> > dump -0au -h 0 -X -f- ${fs} | gzip -9 > $outfile
> Thanks, I wasn't aware of that.  I'll see if it changes anything.
It just completed one entire dump this way with no crash.
That hadn't happened in 6 days.  Hmm.


Re: fssconfig/raidframe/dump-related crashes

2016-03-10 Thread Timo Buhrmester
> Perhaps you can get it to provide a backtrace rather than just simply
> rebooting?
I'd love to, but how?  The machine is set to drop into ddb on panic, but
doesn't, or this isn't a panic.

> I use the built-in snapshot capability of dump(8), with the -X option:
> 
>   dump -0au -h 0 -X -f- ${fs} | gzip -9 > $outfile
Thanks, I wasn't aware of that.  I'll see if it changes anything.


fssconfig/raidframe/dump-related crashes

2016-03-10 Thread Timo Buhrmester
My 7-stable/amd64 server crashes nearly every night while my backup
routine is in progress.  There's no backtrace and no crash dump is saved,
but the console reads:
> ohci1: 1 scheduling overruns
> ohci1: WARNING: addr 0x01cf not found
> ohci1: WARNING: addr 0x012c not found
> ohci1: WARNING: addr 0x01cf not found
> ohci1: WARNING: addr 0x01d5 not found
> ohci1: WARNING: addr 0x01d7 not found
> ohci1: WARNING: addr 0x01d6 not found
> ohci1: WARNING: addr 0x012c not found
> ohci1: WARNING: addr 0x01cf not found
> ohci1: 44 scheduling overruns
> [more of these]
> ohci1: 46 scheduling overruns
before it reboots (sometimes hangs instead).

Probably, these messages are /not/ traces of the root cause, since the
machine will also crash with a kernel with no ohci support compiled in
whatsoever - the crashes are silent, then.

It happens while dump(8)ing an in-filesystem fss(4)-snapshot of an empty-ish
FFSv1 (fslevel 4) filesystem sitting on a raid(4)-1 with two components.

There should not, conceptually, be a problem with dumping a fss device,
right?

The command my script runs to create the snapshot is
# fssconfig -cx fss0 /stor /stor/snapshot
and the dump
# dump -$lvl -uant -h 0 -L "$nam" -f - /dev/rfss0 >/tmp/dumpfifo
where /tmp/dumpfifo is a fifo from which
# gzip -1 </tmp/dumpfifo >/var/tmp/dump.gz
reads.  (I don't remember the reason for going via a fifo, but there
was one...)

Any suggestions where I could start looking?  So far, I've tried running
a DEBUG kernel but that didn't provide additional information.
The filesystem is clean as far as fsck_ffs is concerned, too.


Here's some information on the filesystem:

# mount -v | grep /stor
/dev/raid0g on /stor type ffs (log, noatime, local, fsid: 0x1206/0x78b, reads: 
sync 8489 async 0, writes: sync 0 async 1791)


# df -h /stor
Filesystem    Size   Used  Avail %Cap Mounted on
/dev/raid0g   416G    19G   376G   4% /stor


# dumpfs -s /stor
file system: /dev/rraid0g
format  FFSv1
endian  little-endian
magic   11954   time    Fri Mar 11 05:55:54 2016
superblock location 8192    id  [ 564b7b58 793a9223 ]
cylgrp  dynamic inodes  4.4BSD  sblock  FFSv2   fslevel 4
nbfree  12996876    ndir    56002   nifree  26708703    nffree  4718
ncg     580     size    109891568   blocks  109028517
bsize   32768   shift   15  mask    0x8000
fsize   4096    shift   12  mask    0xf000
frag    8       shift   3   fsbtodb 3
bpg     23684   fpg     189472  ipg 47104
minfree 5%      optim   time    maxcontig 2 maxbpg  8192
symlinklen 60   contigsumsize 2
maxfilesize 0x004002001005
nindir  8192    inopb   256
avgfilesize 16384   avgfpdir 64
sblkno  8       cblkno  16  iblkno  24  dblkno  1496
sbsize  4096    cgsize  32768
csaddr  1496    cssize  12288
cgrotor 0       fmod    0   ronly   0   clean   0x02
wapbl version 0x1   location 2  flags 0x0
wapbl loc0 439587072    loc1 131072 loc2 512    loc3 3
flags   wapbl
fsmnt   /stor
volname         swuid   0


# raidctl -sv raid0
Components:
   /dev/wd0a: optimal
   /dev/wd1a: optimal
No spares.
Component label for /dev/wd0a:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2015111701, Mod Counter: 1213
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 913211264
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Force
   Last configured as: raid0
Component label for /dev/wd1a:
   Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2015111701, Mod Counter: 1213
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 913211264
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Force
   Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
# exit


Re: panic: locking xyz against myself (linux DRM?!)

2015-07-25 Thread Timo Buhrmester
> I've been offering the attached patch to try to debug the source of
> the problem before the symptom you described happens.  I haven't
> gotten any diagnostics back from anyone yet.  If you can, please try
> it out and let me know.
I believe we're already discussing this on IRC, but for the record, with your 
patch and LOCKDEBUG I get:

panic: kernel diagnostic assertion "((mutex->wwm_state != WW_OWNED) || 
(mutex->wwm_u.owner != curlwp))" failed: file 
"/usr/src/sys/external/bsd/drm2/linux/linux_ww_mutex.c", line 760 locking 
0xfe8084178220 against myself: 0xfe811d915700
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 8028be15 cs 8 rflags 246 cr2 7f7ff2e62000 ilevel 
8 rsp fe804175dad8
curlwp 0xfe811d915700 pid 2288.1 lowest kstack 0xfe804175a2c0
Stopped in pid 2288.1 (Xorg) at netbsd:breakpoint+0x5:  leave
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x13c
kern_assert() at netbsd:kern_assert+0x4f
linux_ww_mutex_trylock() at netbsd:linux_ww_mutex_trylock+0xed
ttm_bo_uvm_fault() at netbsd:ttm_bo_uvm_fault+0x69
radeon_ttm_fault() at netbsd:radeon_ttm_fault+0x6a
uvm_fault_internal() at netbsd:uvm_fault_internal+0x828
trap() at netbsd:trap+0x249
--- trap (number 6) ---
7f7ff002e4ed:
db{0}>


Cheers


panic: locking xyz against myself (linux DRM?!)

2015-07-25 Thread Timo Buhrmester
For a while now, I've been getting an occasional panic that I can't reproduce 
on demand, but it seems to correlate with long, memory-intensive and/or 
video-intensive firefox sessions.

This is a recent -current (7.99.20 on amd64, although the problem exists as of 
at least 7.99.10, likely earlier too), my (onboard) video chip is
> radeon0 at pci1 dev 5 function 0: vendor 1002 product 9614 (rev. 0x00)
all with
> radeondrmkmsfb0 at radeon0
> radeondrmkmsfb0: framebuffer at 0x800046148000, size 1280x1024, depth 32, 
> stride 5120

Besides that, I'm using Xorg from pkgsrc-2015Q2 (i.e. X11_TYPE=modular)

Whenever the system has been up for a few days and I haven't restarted 
firefox for a while, it eventually crashes with:
> panic: kernel diagnostic assertion "((mutex->wwm_state != WW_OWNED) || 
> (mutex->wwm_u.owner != curlwp))" failed: file 
> "/usr/src/sys/external/bsd/drm2/linux/linux_ww_mutex.c", line 760 locking 
> 0xfe804fc70220 against myself: 0xfe811c5b2840
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 rip 80193685 cs 8 rflags 246 cr2 7f7ff7e43000 
> ilevel 8 rsp fe8041791ac0
> curlwp 0xfe811c5b2840 pid 1237.1 lowest kstack 0xfe804178e2c0
> Stopped in pid 1237.1 (Xorg) at netbsd:breakpoint+0x5:  leave
> breakpoint() at netbsd:breakpoint+0x5
> vpanic() at netbsd:vpanic+0x13c
> kern_assert() at netbsd:kern_assert+0x4f
> linux_ww_mutex_trylock() at netbsd:linux_ww_mutex_trylock+0xc1
> ttm_bo_uvm_fault() at netbsd:ttm_bo_uvm_fault+0x69
> radeon_ttm_fault() at netbsd:radeon_ttm_fault+0x6a
> uvm_fault_internal() at netbsd:uvm_fault_internal+0x828
> trap() at netbsd:trap+0x32a
> --- trap (number 6) ---
> 7f7ff002e4ed:
> db{1}>
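
The asserted condition in the panic message above fails only when the mutex is in the owned state and its recorded owner is the current LWP, i.e. when a thread tries to take a lock it already holds. A toy sketch of that check follows. All names (`toy_mutex`, `toy_trylock`, and so on) are hypothetical and much simpler than the real linux_ww_mutex code, which is not reproduced here.

```c
#include <stdbool.h>

enum toy_state { TOY_UNLOCKED, TOY_OWNED };

/* Toy lock that remembers who holds it. */
struct toy_mutex {
	enum toy_state state;
	int owner;		/* id of the thread holding the lock */
};

/*
 * Negation of the asserted condition: true exactly when the lock is
 * owned AND the owner is the caller itself.
 */
static bool
locking_against_myself(const struct toy_mutex *m, int self)
{
	return m->state == TOY_OWNED && m->owner == self;
}

/* Returns true on success, false if the lock is already held. */
static bool
toy_trylock(struct toy_mutex *m, int self)
{
	if (m->state == TOY_OWNED)
		return false;
	m->state = TOY_OWNED;
	m->owner = self;
	return true;
}
```

A second trylock by a different thread simply fails, which is normal contention. A second trylock by the same thread is the case the kernel assertion treats as a bug in the caller.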

Any ideas?  Does anyone else have the same problem?

I can extract information from the crash dump if required.


Re: kgdb on amd64

2015-06-26 Thread Timo Buhrmester
On Wed, Jun 24, 2015 at 05:50:41PM +, Christos Zoulas wrote:
> >> The reason kgdb wasn't working all this time on amd64 is that GETC()
> >> returns -1 immediately whether or not a character is available =>
> >> all of kgdb's checksums fail due to the extra "-1" characters.
Wow, seems like we analyzed the same bug simultaneously (see my "KGDB/i386 
broken/supposed to work?" mails).

FWIW, I independently came to the same conclusion, with an almost identical 
"fix", except for the stray
> + continue;

(Writing this just so we can avoid further duplication of effort)


Cheers,
Timo Buhrmester


Re: KGDB/i386 broken/supposed to work?

2015-06-25 Thread Timo Buhrmester
> I'll keep digging.

The problem is that KGDB isn't prepared to deal with the non-blocking serial 
port reads that matt's commit introduced in 2013 (am I really the first one to 
try this on bare metal since then?)

Where the read function `com_common_getc` used to block, it will now return -1, 
which KGDB happily takes for real input (an endless stream of 0xff) having 
arrived on the serial port.
This leads to excessively long input (from KGDB's perspective) which eventually 
makes it bail out of interpreting the received data.
Compare sys/kern/kgdb_stub.c:

268      while ((c = GETC()) != KGDB_END && len < maxlen) {
269              DPRINTF(("%c",c));
         [...]
273              len++;
274      }
         [...][`len` now has a significant value due to `c` repeatedly being -1]
         [...][so the following conditional is taken:]
279      if (len >= maxlen) {
280              DPRINTF(("Long- "));
281              PUTC(KGDB_BADP);
282              continue;
283      }

(It would have been easy to spot thanks to the DPRINTFs, but the one on line 
269 completely flooded the console, preventing me from catching the one on line 
280)
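
The failure mode can be reproduced in user space with a fake serial port. The sketch below is only a simulation under stated assumptions: `fake_port`, `fake_getc` and `read_packet_naive` are made-up names, and `END_CHAR` merely stands in for KGDB_END. The loop has the same shape as the excerpt above and counts every return value, including -1, as received input.

```c
#include <stddef.h>

#define END_CHAR '#'	/* packet terminator, standing in for KGDB_END */

/* Hypothetical non-blocking reader: returns -1 when no byte is ready. */
typedef int (*getc_fn)(void *);

struct fake_port {
	const char *data;	/* bytes "on the wire"; must contain END_CHAR */
	size_t pos;		/* index of the next byte to deliver */
	unsigned idle;		/* polls answered with -1 before each byte */
	unsigned left;		/* idle polls remaining before the next byte */
};

static int
fake_getc(void *arg)
{
	struct fake_port *p = arg;

	if (p->left > 0) {
		p->left--;
		return -1;	/* nothing received yet */
	}
	p->left = p->idle;
	return (unsigned char)p->data[p->pos++];
}

/* Same loop shape as the excerpt: -1 is mistaken for a data byte. */
static size_t
read_packet_naive(getc_fn getc_nb, void *arg, size_t maxlen)
{
	size_t len = 0;
	int c;

	while ((c = getc_nb(arg)) != END_CHAR && len < maxlen)
		len++;
	return len;
}
```

With a port that always has data ready, the payload "g#" is read as a packet of length 1. With 100 idle polls before each byte and a 64-byte limit, the loop hits the limit before a single real byte arrives, which corresponds to the "Long- " branch.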

If I replace the `GETC` macro with a function that spins as long as -1 is 
"read", as in the patch below, the problem disappears.

The patch below is of course just an ad-hoc fix that Works For Me(TM), I'm 
currently not sure how to address the problem in a more general manner -- 
perhaps by providing a blocking interface to the serial port on top of the 
non-blocking one, in a way similar to what the `GETC` function below does, and 
directing kgdb to use that instead?


--- sys/kern/kgdb_stub.c.orig   2015-06-26 01:49:32.0 +0200
+++ sys/kern/kgdb_stub.c        2015-06-26 01:51:31.0 +0200
@@ -85,7 +85,17 @@
 static u_char buffer[KGDB_BUFLEN];
 static kgdb_reg_t gdb_regs[KGDB_NUMREGS];
 
-#define GETC() ((*kgdb_getc)(kgdb_ioarg))
+static int
+GETC(void)
+{
+   int c;
+
+   while ((c = kgdb_getc(kgdb_ioarg)) == -1)
+   ;
+
+   return c;
+}
+//#define GETC()   ((*kgdb_getc)(kgdb_ioarg))
 #define PUTC(c)        ((*kgdb_putc)(kgdb_ioarg, c))
 
 /*
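
As a variation on the idea of layering a blocking read on top of the non-blocking one, the sketch below adds an optional poll limit so that a dead line cannot hang the caller forever. It is a user-space illustration only: `getc_blocking`, `getc_fn` and `demo_getc` are hypothetical names and not part of the kernel's kgdb or com interfaces.

```c
/* Hypothetical non-blocking reader: returns -1 when no byte is ready. */
typedef int (*getc_fn)(void *);

/*
 * Poll until a real byte arrives.  If max_polls is non-zero, give up
 * after that many empty polls and return -1; max_polls == 0 means
 * wait forever, like the spin loop in the patch above.
 */
static int
getc_blocking(getc_fn getc_nb, void *arg, unsigned long max_polls)
{
	unsigned long polls = 0;
	int c;

	while ((c = getc_nb(arg)) == -1) {
		if (max_polls != 0 && ++polls >= max_polls)
			return -1;
	}
	return c;
}

/* Demo reader: reports "no data" *arg times, then delivers 'A'. */
static int
demo_getc(void *arg)
{
	unsigned *idle = arg;

	if (*idle > 0) {
		(*idle)--;
		return -1;
	}
	return 'A';
}
```

With five idle polls and no limit, `getc_blocking` returns 'A'; with a limit of three it gives up and returns -1, so the caller has to treat -1 as a timeout rather than as data.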


Re: KGDB/i386 broken/supposed to work?

2015-06-22 Thread Timo Buhrmester
> > There is no delay between ``kgdb waiting...'' and ``fatal breakpoint trap 
> > in supervisor mode''.
> 
> Looks like that behavior was introduced by the following commit
> to src/sys/arch/i386/i386/trap.c:
> 
>   
>   revision 1.239
>   date: 2008-05-30 13:38:21 +0300;  author: ad;  state: Exp;  lines: +10 -11;
>   Since breakpoints don't work, dump basic info about the trap before
>   entering the debugger. Sometimes ddb only makes the situation worse.
>   
Yes, I've arrived at that commit too, after a longish bisect session.  It's 
indeed the expected behavior.

I'm now bisecting 6-stable (where it does indeed seem to work) against -head, 
but this will take quite a while.  I'll report back once I've arrived somewhere.

Thanks for your replies,

Timo


KGDB/i386 broken/supposed to work?

2015-06-19 Thread Timo Buhrmester
I'm failing to get KGDB on i386 working for kernel debugging over a serial 
(nullmodem) link, as described in http://www.netbsd.org/docs/kernel/kgdb.html

The TARGET (to-be-debugged) system has two serial ports, com0 is the boot 
console, com1 is what I set KGDB to operate on.
The REMOTE (debugger) system uses its com0 port to connect to the target's com1.

Using a GENERIC kernel with only the modifications required to enable KGDB (see 
bottom for config diff), I get the following behavior on the TARGET machine:
| > boot netbsd -d
| 15741968+590492+466076 [689568+730405]=0x1161fd4
| kernel text is mapped with 4 large pages and 5 normal pages
| Loaded initial symtab at 0xc110750c, strtab at 0xc11afaac, # entries 43075
| kgdb waiting...fatal breakpoint trap in supervisor mode
| trap type 1 code 0 eip c02a6744 cs 8 eflags 202 cr2 0 ilevel 8 esp c1265ea0
| curlwp 0xc1078900 pid 0 lid 1 lowest kstack 0xc12632c0

There is no delay between ``kgdb waiting...'' and ``fatal breakpoint trap in 
supervisor mode''.
I'm not sure whether or not this is the expected behavior, because eip c02a6744 
is in the `breakpoint` function so that would make sense; but the documentation 
makes it sound like it should just say ``kgdb waiting...''.


On the REMOTE (debugger) machine (serial port tty00) I get/do:
| # gdb -q netbsd.gdb
| Reading symbols from netbsd.gdb...done.
| (gdb) set remotebaud 38400 
| Warning: command 'set remotebaud' is deprecated.
| Use 'set serial baud'.
|
| (gdb) set serial baud 38400
| (gdb) set remotebreak 1
| Warning: command 'set remotebreak' is deprecated.
| Use 'set remote interrupt-sequence'.
|
| (gdb) set remote interrupt-sequence Ctrl-C 
| (gdb) set remotetimeout 5 
| (gdb) target remote /dev/tty00
| Remote debugging using /dev/tty00
| Ignoring packet error, continuing...
| warning: unrecognized item "timeout" in "qSupported" response
| Ignoring packet error, continuing...
| Ignoring packet error, continuing...
| Bogus trace status reply from target: timeout
| (gdb)

..which I presume is due to the target already having ceased execution.


Both machines run the same, recent -current build (7.99.18) on i386.
I have verified that the serial connection works in both directions, using a 
non-KGDB GENERIC kernel.
I have also verified that kgdb is actually in the kernel and using the right 
port (com1) when booting the KGDB kernel without -d:
| com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
| com0: console
| com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
| com1: kgdb


The difference between GENERIC and my KGDB-enabled version of it:
-#options   DEBUG           # expensive debugging checks/support
+options    DEBUG           # expensive debugging checks/support
 #options   LOCKDEBUG       # expensive locking checks/support
 #options   KMEMSTATS       # kernel memory statistics (vmstat -m)
-options    DDB             # in-kernel debugger
+#options   DDB             # in-kernel debugger
 #options   DDB_ONPANIC=1   # see also sysctl(7): `ddb.onpanic'
-options    DDB_HISTORY_SIZE=512    # enable history editing in DDB
+#options   DDB_HISTORY_SIZE=512    # enable history editing in DDB
 #options   DDB_VERBOSE_HELP
-#options   KGDB            # remote debugger
-#options   KGDB_DEVNAME="\"com\"",KGDB_DEVADDR=0x3f8,KGDB_DEVRATE=9600
-#makeoptions   DEBUG="-g"  # compile full symbol table
+options    KGDB            # remote debugger
+options    KGDB_DEVNAME="\"com\"",KGDB_DEVADDR=0x2f8,KGDB_DEVRATE=38400
+makeoptions    DEBUG="-g"  # compile full symbol table
 #options   SYSCALL_STATS   # per syscall counts
 #options   SYSCALL_TIMES   # per syscall times
 #options   SYSCALL_TIMES_HASCOUNTER    # use 'broken' rdtsc (soekris)


Any idea a) whether KGDB is tested/supposed to work and b) what I might be 
doing wrong?
Is there any other relevant information that would be useful to provide?


Timo Buhrmester