On Thu, Jun 18, 2026 at 11:39:11AM +0200, Philippe Mathieu-Daudé wrote:
> Hi,
> 
> Old patch committed as a23151e8cc8 (Oct 2020).
> I'm revisiting it in the context of dynamic heterogeneous machines.
> 
> On 6/10/20 14:39, Maxim Levitsky wrote:
> > Some code might race with placement of new devices on a bus.
> 
> So IIUC at least 2 vCPUs plug distinct devices on the same bus,
> and the problem is the bus slot assigned for the device. What is
> deciding the free slot, the guest code or QEMU?

Hi Philippe,
The issue was not about a race in slot assignment, it was about hotplug
racing with an IOThread. When the guest triggers device emulation
activity in an IOThread and the monitor thread is doing hotplug, then
there was a race condition since the IOThread does not execute under the
Big QEMU Lock.

In other words, the virtio-scsi device model must not see partially
hotplugged/unplugged devices when processing SCSI requests in an
IOThread.

> 
> > We currently first place a (unrealized) device on the bus
> > and then realize it.
> 
> Sound like ill design. Maybe resulting from the unclear definition
> on what "realized" means. "Guest visible"? "Wired on some bus"?

Back when everything ran inside the BQL it worked fine because device
emulation activity would not happen during hotplug. Hotplug was atomic.

IOThreads broke the BQL assumption and that's why the race condition
occurred.

> 
> Something I tried to clarify once (now outdated):
> https://lore.kernel.org/qemu-devel/[email protected]/
> 
> A vCPU thread shouldn't poke at QOM/QDev internals. When it accesses
> a device (to check "is it out of reset?"), this is already existing
> realized.
> 
> Actually, when the guest asks to assign the device to the bus, the
> device is already existing (from guest PoV), so must be realized.

I'm not sure I understand. The guest doesn't ask to assign a device to a
bus. QEMU assigns devices to busses. (I'm ignoring cooperative hotplug
and hot unplug workflows where the device and the driver communicate in
a multi-step process.)

> 
> > As a workaround, users that scan the child device list, can
> > check the realized property to see if it is safe to access such a device.
> > Use an atomic write here too to aid with this.
> 
> I'd appreciate to be pointed at real use case we fixed here, to
> better figure what could be reworked on the QBus / QDev layer in
> order to not depend anymore on this workaround.

SCSI controller emulation involves iterating over the SCSI devices on
the bus. This can run in an IOThread outside the BQL. Hotplug modifies
the bus and this needs to be done in a thread-safe way. Otherwise the
IOThread seem partially hotplugged/hot unplugged devices on the SCSI
bus.

This patch series provided a way for devices (e.g. virtio-scsi) that
face this problem to safely skip devices on their busses that are not
realized.

> > A separate discussion is what to do with devices that are unrealized:
> > It looks like for this case we only call the hotplug handler's unplug
> > callback and its up to it to unrealize the device.
> 
> Likely the hotplugging path is what needs to be improved (I'm seeing
> a similar problem with vCPU hotplug).
> 
> Thanks,
> 
> Phil.
> 
> > An atomic operation doesn't cause harm for this code path though.
> > 
> > Signed-off-by: Maxim Levitsky <[email protected]>
> > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > Message-Id: <[email protected]>
> > Signed-off-by: Paolo Bonzini <[email protected]>
> > ---
> >   hw/core/qdev.c         | 19 ++++++++++++++++++-
> >   include/hw/qdev-core.h |  2 ++
> >   2 files changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> > index 59e5e710b7..fc4daa36fa 100644
> > --- a/hw/core/qdev.c
> > +++ b/hw/core/qdev.c
> > @@ -946,7 +946,25 @@ static void device_set_realized(Object *obj, bool 
> > value, Error **errp)
> >               }
> >          }
> > +       qatomic_store_release(&dev->realized, value);
> > +
> >       } else if (!value && dev->realized) {
> > +
> > +        /*
> > +         * Change the value so that any concurrent users are aware
> > +         * that the device is going to be unrealized
> > +         *
> > +         * TODO: change .realized property to enum that states
> > +         * each phase of the device realization/unrealization
> > +         */
> > +
> > +        qatomic_set(&dev->realized, value);
> > +        /*
> > +         * Ensure that concurrent users see this update prior to
> > +         * any other changes done by unrealize.
> > +         */
> > +        smp_wmb();
> > +
> >           QLIST_FOREACH(bus, &dev->child_bus, sibling) {
> >               qbus_unrealize(bus);
> >           }
> > @@ -961,7 +979,6 @@ static void device_set_realized(Object *obj, bool 
> > value, Error **errp)
> >       }
> >       assert(local_err == NULL);
> > -    dev->realized = value;
> >       return;
> >   child_realize_fail:
> > diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> > index 2c6307e3ed..868973319e 100644
> > --- a/include/hw/qdev-core.h
> > +++ b/include/hw/qdev-core.h
> > @@ -163,6 +163,8 @@ struct NamedClockList {
> >   /**
> >    * DeviceState:
> >    * @realized: Indicates whether the device has been fully constructed.
> > + *            When accessed outsize big qemu lock, must be accessed with
> > + *            atomic_load_acquire()
> >    * @reset: ResettableState for the device; handled by Resettable 
> > interface.
> >    *
> >    * This structure should not be accessed directly.  We declare it here
> 

Attachment: signature.asc
Description: PGP signature

Reply via email to