Strange kernel panic-like problem

2019-02-07 Thread Kyle
I recently upgraded a box running 6.2 to 6.4 via clean install. After a few 
days of running normally it started locking up, usually within a minute or so 
after booting up to the login prompt. ddb appears on the console.

I eventually thought to try booting bsd.sp, which has been running for about a 
day now without locking up.

Any clues to point me in the right direction would be much appreciated.

Here are some excerpts from the serial console (different sessions) and a dmesg:


ddb{0}> show panic
the kernel did not panic
ddb{0}> trace
acpicpu_idle() at acpicpu_idle+0x1ea
sched_idle(0) at sched_idle+0x245
end trace frame: 0x0, count: -2



login: NMI ... going o debugger
ddb{0}䀿   movq$0x8,%rcxuaei[1;24r c
   db{}> 
ddb{0}> 
d{0}> 
ddb{0} 
ddb{0}> 
ddb{0}> 
ddb{�}> 
ddb{0> 
db{0}> show panic
the kernel did not panic
ddb{0}> 
the kernel did not panic
ddb{0}> 
the kernel did not panic
ddb{0}> 
the kernel did not panic
ddb{0}> 
the kernel di no paniddb{0}> 
the erne i nic
ddb{0}> 
the kernel did not nic
ddb{0}> 
thernel id not nc
ddb{0}> 
the keel did notpanic
ddb{0}> 
theknel did not nic
ddb{0}> 
the kernel did not panic
ddb{0}> 
the kernel did not panic
ddb{0}> 
the kernel did not panc
ddb0}> 
the kernel did not panic
ddb{0}> traccce
No such command
ddb{0}> 
the kernel did not panic
ddb{0}> tracce
No such command
ddb{0}> traace
No such command
ddb{0}> trace
acpicpu_idle() at acpicpu_idle+0x1ea
sched_idle(0) at sched_idle+0x245
end trace frame: 0x0, count: -2
ddb{0}> 
acpicpu_idle() at acpicpu_idle+0x1ea
end trace frame: 0x8000218fca60, count: 0
ddb{0}> 
acpicpu_idle() at acpicpu_idle+0x1ea
end trace frame: 0x8000218fca60, count: 0
ddb{0}> 
acpicpu_idle() at acpicpu_idle+0x1ea
end trace frame: 0x8000218fca60, count: 0
ddb{0}> 
acpicpu_idle() at acpicpu_idle+0x1ea
end trace frame: 0x8000218fca60, count: 0
ddb{0}> 
acpicpu_idle() at acpicpu_idle+0x1ea
end trace frame: 0x8000218fca60, count: 0
ddb{0}> 
acpicpu_idle() at acpicpu_idle+0x1ea
end trace frame: 0x8000218fca60, count: 0
ddb{0}> boot dump
syncing disks..




ddb{1}> boot dump
syncing disks...panic: kernel diagnostic assertion "p->p_wchan == NULL" failed: file "/usr/src/sys/kern/kern_sched.c", line 338
Stopped at  db_enter+0x12:  popq    %r11
TID  PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
  72309  19595  73  0x100010  0x800  syslogd
db_enter() at db_enter+0x12
panic() at panic+0x120
__assert(811929f4,80002191fa00,800021750ff0,8000e960) at __assert+0x24
sched_chooseproc() at sched_chooseproc+0x241
mi_switch() at mi_switch+0x1b4
sleep_finish(6d3fbc03daad5a02,80002191fb10) at sleep_finish+0x7f
sleep_finish_all(270c6ff115c03531,80002191fb10) at sleep_finish_all+0x1f
tsleep(64263b673ec8ed92,ff02417d4200,ff027f616830,65420) at tsleep+0xcd
getblk(f6e14443bcd449df,ff027f6167d0,80002191fd00,0,ff027f3d3000) at getblk+0xf5
bread(80145000,ff027f616a28,ff027f3d3000,0) at bread+0x1b
ffs_update(292bb10918b461bf,ff027f616a28) at ffs_update+0xfc
VOP_FSYNC(f6e14443bcb92266,80002191fe38,2be1d547d68afbf,8000e960) at VOP_FSYNC+0x52
ffs_sync_vnode(5116a32c89f5a557,80002191fe38) at ffs_sync_vnode+0xd2
vfs_mount_foreach_vnode(9582de35d54b8f61,2,8000e960) at vfs_mount_foreach_vnode+0x4e
end trace frame: 0x80002191fea0, count: 0
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{1}> boot sync
panic: kernel diagnostic assertion "__mp_lock_held(&sched_lock, curcpu()) == 0" failed: file "/usr/src/sys/kern/kern_lock.c", line 63
Stopped at  db_enter+0x12:  popq    %r11
db_enter() at db_enter+0x12
panic() at panic+0x120
__assert(811929f4,80002191f530,0,ff02369aeae8) at __assert+0x24
_kernel_lock(778d687e7309b96e,1) at _kernel_lock+0xea
solock(537a4000962b3bf3) at solock+0x44
route_input(6e54ffebe841494b,80002191f610,8012f000) at route_input+0xd1
if_down(8012f000) at if_down+0x94
if_downall() at if_downall+0x62
boot(c) at boot+0x8d
reboot(4800) at reboot+0x5a
nvramattach(81d05260) at nvramattach
db_boot_sync_cmd(819a1c9e,80002191f6c0,81d05260,1) at db_boot_sync_cmd+0xe
db_command(be4ef64b60647db8,0) at db_command+0x2b4
db_command_loop() at db_command_loop+0x96
end trace frame: 0x80002191f820, count: 0
ddb{1}>



ddb{1}> dmesg  
OpenBSD 6.4 (GENERIC.MP) #6: Sat Jan 26 20:37:44 CET 2019
r...@syspatch-64-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8544854016 (8149MB)
avail mem = 8276611072 (7893MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x7f4d8000 (50 entries)
bios0: vendor American Megatrends Inc. version "1.1a" date 08/27/2015
bios0: Supermicro A1SAi
acpi0 at bios0: rev 2
acpi0: sleep states S0 S5
acpi0: tables DSDT FACP FPDT FIDT SPMI MCFG WDA
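
For anyone wanting to compare the traces above across sessions, here is a minimal sketch (not part of the original report) that pulls the "function() at symbol+offset" frames out of a captured serial log and skips everything else, including the noise-damaged prompts. The names FRAME_RE, parse_frames and SAMPLE are made up for illustration, and the pattern is only an assumption based on the frame lines visible in this thread, not a description of everything ddb can print.

```python
import re

# Assumed shape of a ddb trace frame, based only on lines seen in this thread,
# e.g. "panic() at panic+0x120" or "nvramattach(81d05260) at nvramattach".
FRAME_RE = re.compile(
    r"^(?P<func>\w+)\((?P<args>[^)]*)\) at (?P<sym>\w+)(?:\+(?P<off>0x[0-9a-f]+))?$"
)


def parse_frames(text):
    """Return a list of (symbol, offset) tuples for every frame line in text.

    Lines that do not look like a frame (prompts, panic messages, garbled
    serial output) are ignored. A frame without "+0x..." gets offset 0.
    """
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue
        offset = int(match.group("off"), 16) if match.group("off") else 0
        frames.append((match.group("sym"), offset))
    return frames


# Sample input copied from the console excerpts in this thread.
SAMPLE = """\
ddb{0}> show panic
the kernel did not panic
ddb{0}> trace
acpicpu_idle() at acpicpu_idle+0x1ea
sched_idle(0) at sched_idle+0x245
end trace frame: 0x0, count: -2
nvramattach(81d05260) at nvramattach
"""

if __name__ == "__main__":
    for symbol, offset in parse_frames(SAMPLE):
        print(f"{symbol}+{offset:#x}")
```

Frames that the console wrapped across two lines would need to be rejoined before parsing; this sketch does not attempt that.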

Re: Strange kernel panic-like problem

2019-02-07 Thread Otto Moerbeek
On Thu, Feb 07, 2019 at 11:18:40PM -0600, Kyle wrote:

> I recently upgraded a box running 6.2 to 6.4 via clean install. After a few 
> days of running normally it started locking up, usually within a minute or so 
> after booting up to the login prompt. ddb appears on the console.
> 
> I eventually thought to try booting bsd.sp, which has been running for about 
> a day now without locking up.
> 
> Any clues to point me in the right direction would be much appreciated.
> 
> Here are some excerpts from the serial console (different sessions) and a dmesg:
> 
> 
> ddb{0}> show panic
> the kernel did not panic
> ddb{0}> trace
> acpicpu_idle() at acpicpu_idle+0x1ea
> sched_idle(0) at sched_idle+0x245
> end trace frame: 0x0, count: -2
> 
> 
> 
> login: NMI ... going o debugger

NMIs (Non Maskable Interrupts) are often an indication of hardware problems.

-Otto
