On Tue, Jan 09, 2024 at 08:28:29PM +0100, Mark Kettenis wrote:
> > From: Dave Voutila <d...@sisu.io>
> > Date: Tue, 09 Jan 2024 09:19:56 -0500
> >
> > Stefan Fritsch <s...@openbsd.org> writes:
> >
> > > On 08.01.24 22:24, Alexander Bluhm wrote:
> > >> Hi,
> > >> When running a guest in vmm and doing ifconfig operations on vio
> > >> interface, I can crash the guest.
> > >> I run these loops in the guest:
> > >> while doas ifconfig vio1 inet 10.188.234.74/24; do :; done
> > >> while doas ifconfig vio1 -inet; do :; done
> > >> while doas ifconfig vio1 down; do :; done
> > >> And from host I ping the guest:
> > >> ping -f 10.188.234.74
> > >
> > > I suspect there is a race condition in vmd. The vio(4) kernel driver
> > > resets the device and then frees all the mbufs from the tx and rx
> > > rings. If vmd continues doing dma for a bit after the reset, this
> > > could result in corruption. From this code in vmd's vionet.c
> > >
> > >                 case VIODEV_MSG_IO_WRITE:
> > >                         /* Write IO: no reply needed */
> > >                         if (handle_io_write(&msg, dev) == 1)
> > >                                 virtio_assert_pic_irq(dev, 0);
> > >                         break;
> > >
> > > it looks like the main vmd process will just send a pio write message
> > > to the vionet process but does not wait for the vionet process to
> > > actually execute the device reset. The pio write instruction in the
> > > vcpu must complete after the device reset is complete.
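If I'm reading that suggestion right, the idea is to turn the status
register write into a request/reply round trip, so the vcpu thread only
retires the OUT once the device process has finished the reset. A rough
sketch of that idea -- VIODEV_MSG_IO_WRITE and handle_io_write() are from
the snippet above, everything else is a made-up name, not the actual vmd
interface:

                case VIODEV_MSG_IO_WRITE:
                        if (msg_is_device_reset(&msg)) {
                                /*
                                 * Hypothetical synchronous path: finish
                                 * the reset (stop all ring processing and
                                 * dma into guest memory) before acking, so
                                 * the vcpu cannot run past the OUT while
                                 * the device is still touching the rings.
                                 */
                                vionet_do_reset(dev);
                                send_reset_ack(dev, &msg);
                        } else if (handle_io_write(&msg, dev) == 1)
                                virtio_assert_pic_irq(dev, 0);
                        break;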
> >
> > Are you saying we need to wait for the emulation of the OUT instruction
> > that the vcpu is executing? I don't believe we should be blocking the
> > vcpu here as that's not how port io works with real hardware. It makes
> > no sense to block on an OUT until the device finishes emulation.
>
> Well, I/O address space is highly synchronous.  See 16.6 "Ordering
> I/O" in the Intel SDM.  There it clearly states that execution of the
> next instruction after an OUT instruction is delayed until the store
> completes.  Now that isn't necessarily the same as completing all
> device emulation for the device.  But it does mean the store has to
> reach the device register before the next instruction gets executed.

This is exactly what we do. We (vmd) deliver the "out" data to the emulated
device and return.

If virtio requires "out" to mean "the device must complete all of its tasks
associated with that 'out' before the next instruction can be executed",
then IMO that expectation is wrong. I don't think this is the case, based
on my reading of the spec. More likely, some people got lazy and just
assumed that behavior was the norm based on how some early VMMs implemented
things.
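
FWIW, the non-legacy spec puts that synchronization on the driver side:
after writing 0 to the device status register, the driver is expected to
keep reading the register back until it returns 0 before it reuses the
rings. The legacy interface is less explicit about this. A minimal sketch
of that pattern on the guest side -- the accessor names are illustrative,
not the actual virtio(4)/vio(4) functions:

        /* driver-initiated reset, virtio 1.x style */
        write_device_status(sc, 0);             /* request the reset */
        while (read_device_status(sc) != 0)     /* device reports completion */
                delay(10);
        /* only now is it safe to free and rebuild the rx/tx rings */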

>
> Yes, this is slow.  Avoid I/O address space if you can; use
> Memory-Mapped I/O instead.
>

Working on it. But as dv points out in a previous reply, the I/O performance
improvement here really comes from being able to process more data at a time
(rep, and larger data sizes) rather than from avoiding the synchronous,
bus-locking nature of IN/OUT.
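
To illustrate that point, a string op can move a whole buffer per
instruction (and potentially per exit), where plain OUT pays the full exit
cost per dword. Contrived example, not from vmd:

        /* uint32_t *buf; unsigned long i, n; uint16_t port; */

        /* one exit per dword */
        for (i = 0; i < n; i++)
                outl(port, buf[i]);

        /* one instruction for the whole buffer */
        __asm volatile("rep outsl"
            : "+S" (buf), "+c" (n)
            : "d" (port)
            : "memory");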

> > I *do* think there could be something wrong in the device status
> > register emulation, but blocking the vcpu on an OUT isn't the way to
> > solve this. In fact, that's what previously happened before I split
> > device emulation out into subprocesses...so if there's a bug in the
> > emulation logic, it was hiding it.
> >
> > >
> > > I could not reproduce this issue with kvm/qemu.
> > >
> >
> > Thanks!
> >
> > >
> > >> Then I see various kind of mbuf corruption:
> > >> kernel: protection fault trap, code=0
> > >> Stopped at      pool_do_put+0xc9:       movq    0x8(%rcx),%rcx
> > >> ddb> trace
> > >> pool_do_put(ffffffff82519e30,fffffd807db89000) at pool_do_put+0xc9
> > >> pool_put(ffffffff82519e30,fffffd807db89000) at pool_put+0x53
> > >> m_extfree(fffffd807d330300) at m_extfree+0xa5
> > >> m_free(fffffd807d330300) at m_free+0x97
> > >> soreceive(fffffd806f33ac88,0,ffff80002a3e97f8,0,0,ffff80002a3e9724,76299c7990301bf1) at soreceive+0xa3e
> > >> soo_read(fffffd807ed4a168,ffff80002a3e97f8,0) at soo_read+0x4a
> > >> dofilereadv(ffff80002a399548,7,ffff80002a3e97f8,0,ffff80002a3e98c0) at dofilereadv+0x143
> > >> sys_read(ffff80002a399548,ffff80002a3e9870,ffff80002a3e98c0) at sys_read+0x55
> > >> syscall(ffff80002a3e9930) at syscall+0x33a
> > >> Xsyscall() at Xsyscall+0x128
> > >> end of kernel
> > >> end trace frame: 0x7469f8836930, count: -10
> > >> pool_do_put(ffffffff8259a500,fffffd807e7fa800) at pool_do_put+0xc9
> > >> pool_put(ffffffff8259a500,fffffd807e7fa800) at pool_put+0x53
> > >> m_extfree(fffffd807f838a00) at m_extfree+0xa5
> > >> m_free(fffffd807f838a00) at m_free+0x97
> > >> m_freem(fffffd807f838a00) at m_freem+0x38
> > >> vio_txeof(ffff800000030118) at vio_txeof+0x11d
> > >> vio_tx_intr(ffff800000030118) at vio_tx_intr+0x31
> > >> virtio_check_vqs(ffff800000024800) at virtio_check_vqs+0x102
> > >> virtio_pci_legacy_intr(ffff800000024800) at virtio_pci_legacy_intr+0x65
> > >> intr_handler(ffff80002a52dae0,ffff800000081000) at intr_handler+0x3c
> > >> Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> > >> Xspllower() at Xspllower+0x1d
> > >> vio_ioctl(ffff8000000822a8,80206910,ffff80002a52dd00) at vio_ioctl+0x16a
> > >> ifioctl(fffffd807c0ba7a0,80206910,ffff80002a52dd00,ffff80002a41c810) at ifioctl+0x721
> > >> sys_ioctl(ffff80002a41c810,ffff80002a52de00,ffff80002a52de50) at sys_ioctl+0x2ab
> > >> syscall(ffff80002a52dec0) at syscall+0x33a
> > >> Xsyscall() at Xsyscall+0x128
> > >> end of kernel
> > >> end trace frame: 0x7b3d36d55eb0, count: -17
> > >> panic: pool_do_get: mcl2k free list modified: page 0xfffffd80068bd000; item addr 0xfffffd80068bf800; offset 0x0=0xa0000 != 0x83dcdb591c6b8bf
> > >> Stopped at      db_enter+0x14:  popq    %rbp
> > >>      TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > >> *143851  19121      0         0x3          0    0  ifconfig
> > >> db_enter() at db_enter+0x14
> > >> panic(ffffffff8206e651) at panic+0xb5
> > >> pool_do_get(ffffffff824a1b30,2,ffff80002a4a55d4) at pool_do_get+0x320
> > >> pool_get(ffffffff824a1b30,2) at pool_get+0x7d
> > >> m_clget(fffffd807c4e4f00,2,800) at m_clget+0x18d
> > >> rtm_msg1(e,ffff80002a4a56f0) at rtm_msg1+0xde
> > >> rtm_ifchg(ffff8000000822a8) at rtm_ifchg+0x65
> > >> if_down(ffff8000000822a8) at if_down+0xa4
> > >> ifioctl(fffffd8006898978,80206910,ffff80002a4a58c0,ffff80002a474ff0) at ifioctl+0xcd5
> > >> sys_ioctl(ffff80002a474ff0,ffff80002a4a59c0,ffff80002a4a5a10) at sys_ioctl+0x2ab
> > >> syscall(ffff80002a4a5a80) at syscall+0x33a
> > >> Xsyscall() at Xsyscall+0x128
> > >> end of kernel
> > >> end trace frame: 0x7f6c22492130, count: 3
> > >> OpenBSD 7.4-current (GENERIC) #3213: Mon Jan  8 22:05:58 CET 2024
> > >>     bluhm@t430s.bluhm.invalid:/home/bluhm/openbsd/cvs/src/sys/arch/amd64/compile/GENERIC*master
> > >> real mem = 2130706432 (2032MB)
> > >> avail mem = 2046525440 (1951MB)
> > >> random: boothowto does not indicate good seed
> > >> mpath0 at root
> > >> scsibus0 at mpath0: 256 targets
> > >> mainbus0 at root
> > >> bios0 at mainbus0
> > >> acpi at bios0 not configured
> > >> cpu0 at mainbus0: (uniprocessor)
> > >> cpu0: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 2893.78 MHz, 06-3a-09
> > >> cpu0:
> > >> FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,LONG,LAHF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,MELTDOWN
> > >> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 4MB 64b/line 16-way L3 cache
> > >> cpu0: smt 0, core 0, package 0
> > >> cpu0: using VERW MDS workaround
> > >> pvbus0 at mainbus0: OpenBSD
> > >> pvclock0 at pvbus0
> > >> pci0 at mainbus0 bus 0
> > >> pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00
> > >> virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
> > >> viornd0 at virtio0
> > >> virtio0: irq 3
> > >> virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Network" rev 0x00
> > >> vio0 at virtio1: address 70:5f:ca:21:8d:74
> > >> virtio1: irq 5
> > >> virtio2 at pci0 dev 3 function 0 "Qumranet Virtio Network" rev 0x00
> > >> vio1 at virtio2: address 70:5f:ca:21:8d:84
> > >> virtio2: irq 6
> > >> virtio3 at pci0 dev 4 function 0 "Qumranet Virtio Network" rev 0x00
> > >> vio2 at virtio3: address 70:5f:ca:21:8d:94
> > >> virtio3: irq 7
> > >> virtio4 at pci0 dev 5 function 0 "Qumranet Virtio Storage" rev 0x00
> > >> vioblk0 at virtio4
> > >> scsibus1 at vioblk0: 1 targets
> > >> sd0 at scsibus1 targ 0 lun 0: <VirtIO, Block Device, >
> > >> sd0: 10240MB, 512 bytes/sector, 20971520 sectors
> > >> virtio4: irq 9
> > >> virtio5 at pci0 dev 6 function 0 "Qumranet Virtio SCSI" rev 0x00
> > >> vioscsi0 at virtio5: qsize 128
> > >> scsibus2 at vioscsi0: 1 targets
> > >> cd0 at scsibus2 targ 0 lun 0: <OpenBSD, VMM CD-ROM, 001> removable
> > >> virtio5: irq 10
> > >> virtio6 at pci0 dev 7 function 0 "OpenBSD VMM Control" rev 0x00
> > >> vmmci0 at virtio6
> > >> virtio6: irq 11
> > >> isa0 at mainbus0
> > >> isadma0 at isa0
> > >> com0 at isa0 port 0x3f8/8 irq 4: ns8250, no fifo
> > >> com0: console
> > >> vscsi0 at root
> > >> scsibus3 at vscsi0: 256 targets
> > >> softraid0 at root
> > >> scsibus4 at softraid0: 256 targets
> > >> root on sd0a (16c451c199802b57.a) swap on sd0b dump on sd0b
> >
> >
>
