On 18/05/25(Sun) 23:43, Mark Kettenis wrote:
> > From: Dave Voutila <[email protected]>
> > Date: Sun, 18 May 2025 15:21:28 -0400
> >
> > Claudio Jeker <[email protected]> writes:
> >
> > > On Thu, May 15, 2025 at 01:15:43PM +0200, Martin Pieuchot wrote:
> > >> Fault below has been triggered by dong 'vmctl $myvm stop' while
> > >> rebooting a 7.7 amd64 VM:
> > >>
> > >> OpenBSD 7.7 (GENERIC) #619: Sun Apr 13 08:19:34 MDT 2025
> > >> [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC
> > >> real mem = 117428224 (111MB)
> > >> avail mem = 87916544 (83MB)
> > >> random: good seed from bootblocks
> > >> mpath0 at root
> > >> scsibus0 at mpath0: 256 targets
> > >> mainbus0 at root
> > >> bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf2760 (10 entries)
> > >> bios0: vendor SeaBIOS version "1.16.3p0-OpenBSD-vmm" date 01/01/2011
> > >> bios0: OpenBSD VMM
> > >> acpi at bios0 not configured
> > >> cpu0 at mainbus0: (uniprocessor)
> > >> cpu0: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz, 2394.43 MHz, 06-3d-04
> > >> cpu0: cpuid 1
> > >> edx=78ba97f<FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2>
> > >> ecx=f6d83203<SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV>
> > >> cpu0: cpuid 7.0
> > >> ebx=1c23a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,RDSEED,ADX,SMAP>
> > >> edx=400<MD_CLEAR>
> > >> cpu0: cpuid d.1 eax=1<XSAVEOPT>
> > >> cpu0: cpuid 80000001 edx=24100800<NXE,PAGE1GB,LONG>
> > >> ecx=121<LAHF,ABM,3DNOWP>
> > >> cpu0: cpuid 80000007 edx=100<ITSC>
> > >> cpu0: MELTDOWN
> > >> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB
> > >> 64b/line 8-way L2 cache, 4MB 64b/line 16-way L3 cache
> > >> cpu0: smt 0, core 0, package 0
> > >> cpu0: using VERW MDS workaround
> > >> pvbus0 at mainbus0: OpenBSD
> > >> pvclock0 at pvbus0
> > >> pci0 at mainbus0 bus 0
> > >> pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00
> > >> virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
> > >> viornd0 at virtio0
> > >> virtio0: irq 3
> > >> virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Network" rev 0x00
> > >> vio0 at virtio1: 1 queue, address fe:e1:bb:d1:de:21
> > >> virtio1: irq 5
> > >> virtio2 at pci0 dev 3 function 0 "Qumranet Virtio Storage" rev 0x00
> > >> vioblk0 at virtio2
> > >> virtio2: irq 6
> > >> scsibus1 at vioblk0: 1 targets
> > >> sd0 at scsibus1 targ 0 lun 0: <VirtIO, Block Device, >
> > >> sd0: 3072MB, 512 bytes/sector, 6291456 sectors
> > >> virtio3 at pci0 dev 4 function 0 "OpenBSD VMM Control" rev 0x00
> > >> vmmci0 at virtio3
> > >> virtio3: irq 7
> > >> isa0 at mainbus0
> > >> isadma0 at isa0
> > >> com0 at isa0 port 0x3f8/8 irq 4: ns8250, no fifo
> > >> com0: console
> > >> Rebooting in response to request from vmmci0 host
> > >> uvm_fault(0xffffffff82a56848, 0xf0, 0, 1) -> e
> > >> kernel: page fault trap, code=0
> > >> Stopped at mtx_enter+0x2f: movq 0(%rdi),%rax
> > >> TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > >> * 0 0 0 0x10000 0x200 0 swapper
> > >> mtx_enter(f0,f0,cea0d74fd2320eb6,f0,10,ffffffff82fbdcc0) at
> > >> mtx_enter+0x2f
> > >> prsignal(0,2,2,1,ffff800000090400,d) at prsignal+0x26
> > >> vmmci_config_change(ffff800000090400,ffff800000090400,86acf72f78ea7d07,0,ffff80
> > >> 0000090400,0) at vmmci_config_change+0xf6
> > >> virtio_pci_legacy_intr(ffff800000090400,ffff800000090400,c9a59ef2495be15f,fffff
> > >> fff81004000,0,4) at virtio_pci_legacy_intr+0x63
> > >> intr_handler(ffffffff82fbddf0,ffff80000008e500,cbae99a56f027dfe,ffff80000008e48
> > >> 0,ffffffff8152e8b6,ffffffff82fbdde0) at intr_handler+0x56
> > >> Xintr_legacy7_untramp(0,ffffffff824e65b0,0,18041969,ffffffff81004000,e)
> > >> at Xint
> > >> r_legacy7_untramp+0x1a3
> > >> Xspllower(0,0,c0d762ae039e8784,ffffffff81004000,ffffffff8152ed5c,ffffffff82fb70
> > >> 08) at Xspllower+0x1d
> > >> cpu_configure(4449abb23155d965,0,0,ffff800000032000,ffffffff812a613f,ffffffff82
> > >> fbdf10) at cpu_configure+0xaf
> > >> main(e,e,0,1,ffffffff816ab787,ffffffff82fbdf40) at main+0x427
> > >> end trace frame: 0x0, count: 6
> > >> https://www.openbsd.org/ddb.html describes the minimum info required in
> > >> bug
> > >>
> > >
> > > I guess initprocess is still NULL in pvbus_reboot() and so prsignal blows
> > > up trying to access the ps_mtx.
> > >
> > > I would suggest to use startuphook_establish() to delay the attach of
> > > vmmci0 until initprocess is set.
> >
> > So here's what I'm thinking...
> >
> > I first tried delaying the virtio_attach_finish call using
> > startuphook_establish(9) but there seems to be a lifetime issue with the
> > virtio_attach_args struct. Specifically, virtio_pci_attach_finish
> > dereferences it as a virtio_pci_attach_args. At this point, the
> > pci_attach_args* member vpa_pa seems trashed.
>
> Yes, as soon as you return from the attach function, the attach_args
> struct will be freed. You can make a copy of the struct and stash it
> away for later. But even then you have to be careful that pointer in
> the struct may no longer be valid.
>
> > Instead, I took the approach of using kthread_create_deferred(9) to
> > queue up signalling the shutdown or reboot command. If called late
> > enough, this ends up a direct call of the callback function instead of
> > adding it to the deferred kthread queue.
> >
> > In my testing, it can still result in the signal hitting initprocess
> > after it's alive, but before it can handle it. If that happens, it ends
> > up not causing a shutdown / reboot. To deal with this and the fact that
> > vmd thinks the command is in flight, I arm a timeout to try again after
> > 5 seconds.
> >
> > It kinda sucks, but it works. I'm all ears as to another approach or if
> > someone has a clue as to the lifetime issue I'm seeing with deferring
> > virtio pci attachment.
>
> I wonder if it would make sense to have a more generic solution. I
> suspect a similar issue may arise if you press the (ACPI) powerbutton
> on a machine during boot. I think an API, that can be safely called
> from interrupt context, would make sense. Currently we have several
> ssubtly different implementations of the same code, at least for
> shutdown, that all have one or more bugs...
Are you suggesting using something in-kernel instead of sending a signal
to the init process which might not yet exist?
> > diff refs/heads/master refs/heads/vmmci-panic
> > commit - 3c994bb556f544ca04982fd977c153c1485aab1e
> > commit + c58acacf15c284967253b4196b7d24aedf9ad1a2
> > blob - 984626393cd85a97b6b84a19c82c8175aa2e4379
> > blob + 0a4eac6d9ed700c383d332e5360bf0aea0a464de
> > --- sys/dev/pv/vmmci.c
> > +++ sys/dev/pv/vmmci.c
> > @@ -21,6 +21,7 @@
> > #include <sys/timeout.h>
> > #include <sys/device.h>
> > #include <sys/sensors.h>
> > +#include <sys/kthread.h>
> >
> > #include <machine/bus.h>
> >
> > @@ -39,10 +40,12 @@ struct vmmci_softc {
> > struct device sc_dev;
> > struct virtio_softc *sc_virtio;
> > enum vmmci_cmd sc_cmd;
> > + enum vmmci_cmd sc_cmd_async;
> > unsigned int sc_interval;
> > struct ksensordev sc_sensordev;
> > struct ksensor sc_sensor;
> > struct timeout sc_tick;
> > + struct timeout sc_to_shutdown;
> > };
> >
> > int vmmci_match(struct device *, void *, void *);
> > @@ -52,6 +55,7 @@ int vmmci_activate(struct device *, int);
> > int vmmci_config_change(struct virtio_softc *);
> > void vmmci_tick(void *);
> > void vmmci_tick_hook(struct device *);
> > +void vmmci_worker(void *);
> >
> > const struct cfattach vmmci_ca = {
> > sizeof(struct vmmci_softc),
> > @@ -116,6 +120,8 @@ vmmci_attach(struct device *parent, struct device *sel
> > config_mountroot(self, vmmci_tick_hook);
> > }
> >
> > + timeout_set(&sc->sc_to_shutdown, vmmci_worker, sc);
> > +
> > printf("\n");
> > if (virtio_attach_finish(vsc, va) != 0)
> > goto err;
> > @@ -171,18 +177,22 @@ vmmci_config_change(struct virtio_softc *vsc)
> > /* no action */
> > break;
> > case VMMCI_SHUTDOWN:
> > - pvbus_shutdown(&sc->sc_dev);
> > - break;
> > case VMMCI_REBOOT:
> > - pvbus_reboot(&sc->sc_dev);
> > + /*
> > + * Asynchronously process shutdown/reboot to allow us to handle
> > + * when vmmci(4) receives a command during system startup or
> > + * shutdown.
> > + */
> > + sc->sc_cmd_async = cmd;
> > + kthread_create_deferred(vmmci_worker, sc);
> > break;
> > case VMMCI_SYNCRTC:
> > inittodr(gettime());
> > sc->sc_cmd = VMMCI_NONE;
> > - break;
> > + break;
> > default:
> > printf("%s: invalid command %d\n", sc->sc_dev.dv_xname, cmd);
> > - cmd = VMMCI_NONE;
> > + cmd = VMMCI_NONE;
> > break;
> > }
> >
> > @@ -226,3 +236,25 @@ vmmci_tick_hook(struct device *self)
> > timeout_set(&sc->sc_tick, vmmci_tick, sc);
> > vmmci_tick(sc);
> > }
> > +
> > +void
> > +vmmci_worker(void *arg)
> > +{
> > + struct vmmci_softc *sc = (struct vmmci_softc *)arg;
> > +
> > + switch (sc->sc_cmd_async) {
> > + case VMMCI_SHUTDOWN:
> > + pvbus_shutdown(&sc->sc_dev);
> > + break;
> > + case VMMCI_REBOOT:
> > + pvbus_reboot(&sc->sc_dev);
> > + break;
> > + default:
> > + printf("%s: invalid async command %d\n", sc->sc_dev.dv_xname,
> > + sc->sc_cmd_async);
> > + return;
> > + }
> > +
> > + /* Re-arm in case initprocess misses our signal. */
> > + timeout_add_sec(&sc->sc_to_shutdown, 5);
> > +}
> >
> >
>