Hi all,

I've got a couple of VMs that I update quite frequently.  I use a
script to automate the update (in-place untarring of the various sets)
on each VM, and have another script on the host that iterates over the
VMs and upgrades each one sequentially, orchestrating the process.

After the orchestrator script has completed the upgrade, it sshs in
and runs `doas reboot`:

    /usr/bin/ssh -t ${VM} doas reboot
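
For context, the host-side flow looks roughly like the sketch below.
This is not my actual script: the VM names and the update command are
placeholders I made up for illustration, and only the reboot line is
the real command.  SSH is a variable purely so the sketch can be
dry-run with a stub.

```shell
#!/bin/sh
# Rough sketch of the host-side orchestrator.  VMS and UPDATE_CMD are
# hypothetical placeholders; only the reboot command matches the post.
SSH=${SSH:-/usr/bin/ssh}
VMS=${VMS:-"vm1 vm2"}
UPDATE_CMD=${UPDATE_CMD:-"doas /usr/local/sbin/update-sets"}

upgrade_vm() {
	vm=$1
	# Run the in-place update on the VM; skip the reboot if it fails.
	# UPDATE_CMD is left unquoted on purpose so it splits into words.
	"$SSH" "$vm" $UPDATE_CMD || return 1
	# Reboot through a tty-allocating session.
	"$SSH" -t "$vm" doas reboot
}

upgrade_all() {
	# Upgrade the VMs one at a time, in the order listed.
	for vm in $VMS; do
		upgrade_vm "$vm" || echo "upgrade of $vm failed" >&2
	done
}
```

Calling upgrade_all at the end of the script runs the whole sequence.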

Recently (over the last few weeks; I haven't really tracked this,
sorry), more often than not, I'm seeing panics because init gets SIGBUS:

syncing disks... done
panic: init died (signal 10, exit 0)
Stopped at      db_enter+0x10:  popq    %rbp
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*475527      1      0       0x802     0x2000    0  init
db_enter() at db_enter+0x10
panic(ffffffff81e6fc03) at panic+0xb8
exit1(ffff8000fffffa40,0,a,1) at exit1+0x61d
trapsignal(ffff8000fffffa40,a,6,3,a7d799f19e0) at trapsignal+0x158
upageflttrap(ffff800014c96730,a7d799f19e0) at upageflttrap+0xf0
usertrap(ffff800014c96730) at usertrap+0x179
recall_trap() at recall_trap+0x8
end of kernel
end trace frame: 0x7f7ffffc1280, count: 8
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 23914  217005      1      0  2         0x3                reboot
 85576  149845      0      0  3     0x14280  nfsidl        nfsio
 85516  509272      0      0  3     0x14280  nfsidl        nfsio
 27313   82613      0      0  3     0x14280  nfsidl        nfsio
 74778  248263      0      0  3     0x14280  nfsidl        nfsio
 46983  166023      0      0  3     0x14200  bored         smr
 68429  136470      0      0  2     0x14200                zerothread
 86312  246770      0      0  3     0x14200  aiodoned      aiodoned
 45749   87877      0      0  2     0x14600                update
 67717  192012      0      0  3     0x14200  cleaner       cleaner
 15199  366441      0      0  3     0x14200  reaper        reaper
 35818  112882      0      0  3     0x14200  pgdaemon      pagedaemon
 47013  345768      0      0  3     0x14200  bored         crynlk
 74571  370572      0      0  3     0x14200  bored         crypto
 69117   88807      0      0  2     0x14200                softnet
 53953  240257      0      0  2     0x14200                systqmp
 53101  304322      0      0  3     0x14200  bored         systq
 46072  394366      0      0  3  0x40014200  bored         softclock
 59261  410646      0      0  3  0x40014200                idle0
*    1  475527      0      0  7      0x2802                init
     0       0     -1      0  3     0x10200  scheduler     swapper
ddb> 

With some printf debugging, I've determined that this happens in
if_downall().  From the trace above, vfs_shutdown() has just completed;
with a debugging kernel, the printf after resettodr() was shown, but
the printf after if_downall() wasn't.

I'll dig around a bit further, but if anyone has anything obvious I
should look into, I'm keen to hear it.

Thanks,

Paul

-- 
>++++++++[<++++++++++>-]<+++++++.>+++[<------>-]<.>+++[<+
+++++++++++>-]<.>++[<------------>-]<+.--------------.[-]
                 http://www.weirdnet.nl/                 
