On 1/24/24 07:52, Philippe Mathieu-Daudé wrote:
Hi Hannes,

[+Markus as QOM/QDev rubber duck]

On 23/1/24 13:40, Hannes Reinecke wrote:
On 1/23/24 11:59, Damien Hedde wrote:
Hi all,

We are currently looking into hotplugging nvme devices and it is currently not possible:
When nvme was introduced 2 years ago, the feature was disabled.
commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
Author: Klaus Jensen
Date:   Tue Jul 6 10:48:40 2021 +0200

    hw/nvme: mark nvme-subsys non-hotpluggable
    We currently lack the infrastructure to handle subsystem hotplugging, so
    disable it.

Do someone know what's lacking or anyone have some tips/idea of what we should develop to add the support ?

Problem is that the object model is messed up. In qemu namespaces are attached to controllers, which in turn are children of the PCI device.
There are subsystems, but these just reference the controller.

So if you hotunplug the PCI device you detach/destroy the controller and detach the namespaces from the controller. But if you hotplug the PCI device again the NVMe controller will be attached to the PCI device, but the namespace are still be detached.

Klaus said he was going to fix that, and I dimly remember some patches
floating around. But apparently it never went anywhere.

Fundamental problem is that the NVMe hierarchy as per spec is incompatible with the qemu object model; qemu requires a strict
tree model where every object has exactly _one_ parent.

The modelling problem is not clear to me.
Do you have an example of how the NVMe hierarchy should be?

Sure.

As per NVMe spec we have this hierarchy:

     --->  subsys ---
    |                |
    |                V
controller      namespaces

There can be several controllers, and several
namespaces.
The initiator (ie the linux 'nvme' driver) connects
to a controller, queries the subsystem for the attached
namespaces, and presents each namespace as a block device.

For Qemu we have the problem that every device _must_ be
a direct descendant of the parent (expressed by the fact
that each 'parent' object is embedded in the device object).

So if we were to present a NVMe PCI device, the controller
must be derived from the PCI device:

pci -> controller

but now we have to express the NVMe hierarchy, too:

pci -> ctrl1 -> subsys1 -> namespace1

which actually works.
We can easily attach several namespaces:

pci -> ctrl1 ->subsys1 -> namespace2

For a single controller and a single subsystem.
However, as mentioned above, there can be _several_
controllers attached to the same subsystem.
So we can express the second controller:

pci -> ctrl2

but we cannot attach the controller to 'subsys1'
as then 'subsys1' would need to be derived from
'ctrl2', and not (as it is now) from 'ctrl1'.

The most logical step would be to have 'subsystems'
their own entity, independent of any controllers.
But then the block devices (which are derived from
the namespaces) could not be traced back
to the PCI device, and a PCI hotplug would not
'automatically' disconnect the nvme block devices.

Plus the subsystem would be independent from the NVMe
PCI devices, so you could have a subsystem with
no controllers attached. And one would wonder who
should be responsible for cleaning up that.

Cheers,

Hannes


Reply via email to