Hi Igor,
My sincere apologies, I just realized that I missed replying to this
mail. I was distracted by something else in February and only resumed
working on hotplug in March, but I will still reply to this mail. Also,
I have incorporated most of the points below into the vcpu hotplug
patches, as per your suggestions.


> From: Qemu-arm [mailto:qemu-arm-bounces+salil.mehta=huawei....@nongnu.org]
> On Behalf Of Igor Mammedov
> Sent: Monday, January 27, 2020 3:04 PM
> To: Salil Mehta <salil.me...@huawei.com>
> 
> On Fri, 24 Jan 2020 18:44:16 +0000
> Salil Mehta <salil.me...@huawei.com> wrote:
> 
> > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > Sent: Friday, January 24, 2020 4:07 PM
> > >
> > > On Fri, 24 Jan 2020 15:02:10 +0000
> > > Salil Mehta <salil.me...@huawei.com> wrote:
> > >
> > > > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > > > Sent: Friday, January 24, 2020 1:54 PM
> > > > > To: Salil Mehta <salil.me...@huawei.com>
> > > > >
> > > > > On Fri, 24 Jan 2020 11:20:15 +0000
> > > > > Salil Mehta <salil.me...@huawei.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > > I am working on the vCPU Hotplug feature for ARM64, and I am in
> > > > > > the middle of understanding some aspects of the
> > > > > > device_add/device_del interface of QEMU.
> > > > > >
> > > > > > Observations:
> > > > > > 1. Any object initialised by qmp_device_add() gets into the
> > > > > > /machine/unattached container. I traced the flow to the code leg
> > > > > > inside device_set_realized().
> > > > > > 2. I could see that the reverse, qmp_device_del(), expects the
> > > > > > device to be in the /machine/peripheral container.
> > > > > > 3. I could see that any object initially added to the unattached
> > > > > > container did not have a parent until object_property_add_child()
> > > > > > was called further in the leg, which effectively meant a new
> > > > > > property was created, the property table populated, and the child
> > > > > > parented.
> > > > > > 4. Generally, the container /machine/peripheral was being used
> > > > > > wherever DEVICE(dev)->id was present and non-null.
> > > > > >
> > > > > > Question:
> > > > > > 1. I wanted to confirm my understanding of the use of having
> > > > > > separate containers like unattached, peripheral and anonymous.
> > > > >
> > > > > > 2. At init time all the vcpus go under the *unattached* container.
> > > > > > Now, qmp_device_del() cannot be used to unplug them. I am wondering
> > > > >
> > > > > device is put into 'unattached' in case it wasn't assigned a parent.
> > > > > Usually it happens when board creates device directly.
> > > >
> > > >
> > > > Sure, but if we decide that a certain number (N) of vcpus are
> > > > hotpluggable and a certain subset of N (say 'n' < 'N') should be
> > > > allowed to be present or cold-plugged at init time, then I wonder
> > > > which of the following is the correct approach:
> > > >
> > > > 1. Bring all of N vcpus at boot time under "peripheral" container
> > > >                                    OR
> > > > 2. Just bring subset 'n' of 'N' under "peripheral" container and rest
> > > >     under "unattached" container? And later as and when rest of the
> > > >     vcpus are hotplugged they should be transferred from "unattached"
> > > >     container to "peripheral" container?
> > >
> > > the issue with that is that to put a device into the "peripheral"
> > > container, 'the user' must provide an 'id'. (currently QEMU isn't
> > > able to do it on its own [1])
> > >
> > > But it doesn't mean that cold-plugged CPUs can't be unplugged.
> > > What current users could do is start QEMU like this (simplified version):
> > >
> > >  $QEMU -smp 1,maxcpus=N -device foo-cpu-type,id=CPU00 -device
> > > foo-cpu-type,id=CPU01 ...
> > >
> > > i.e. the 1st CPU is not manageable due to the lack of an 'id' and is
> > > created by board code; the rest have an 'id' and can be managed.
> >
> >
> > I understand that we can somehow assign ids from the QMP interface, but
> > the above will not push vcpus into the "peripheral" container. They will
> > appear in the "unattached" container, but with the specified names, and
> > as far as I can see in the code, 'device_del' can only delete
> > objects/devices from the 'peripheral' container?
> 
> qemu-system-x86_64 -monitor stdio \
>     -smp 1,maxcpus=3 \
>     -device qemu64-x86_64-cpu,id=foo,socket-id=1,core-id=0,thread-id=0 \
>     -device qemu64-x86_64-cpu,socket-id=2,core-id=0,thread-id=0
> 
> (qemu) info hotpluggable-cpus
> Hotpluggable CPUs:
>   type: "qemu64-x86_64-cpu"
>   vcpus_count: "1"
>   qom_path: "/machine/peripheral-anon/device[0]"
>                       ^^^^^^^^^^^^^^^
>   CPUInstance Properties:
>     socket-id: "2"
>     core-id: "0"
>     thread-id: "0"
>   type: "qemu64-x86_64-cpu"
>   vcpus_count: "1"
>   qom_path: "/machine/peripheral/foo"
>                       ^^^^^^^^^^
> 
> in gist, if device is created with any variant of device_add,
> it goes to "peripheral[-anon]" including cold-plugged one.


I am not sure why you said "including cold-plugged one". I assume that
by cold-plugging you mean the CPUs which already exist at boot time of
the Guest VM and were plugged using -device?


>   CPUInstance Properties:
>     socket-id: "1"
>     core-id: "0"
>     thread-id: "0"
>   type: "qemu64-x86_64-cpu"
>   vcpus_count: "1"
>   qom_path: "/machine/unattached/device[0]"
                          ^^^^^^^^^^^^^

Unless you pasted it wrongly here, the above output also shows the qom
path as 'unattached' for cold-plugged CPUs. Am I missing something? :)


>   CPUInstance Properties:
>     socket-id: "0"
>     core-id: "0"
>     thread-id: "0"



> > Plus, having that many ids specified for a large number of vcpus does
> > not look like a very practical solution. We need an interface for auto
> > generation of ids, even at boot time.
> 
> the number of IDs is not a problem since it's usually handled by
> management software just fine (ex: libvirt does it)
> 
> > I could see from the link you have shared that it is already being
> > used by the ID_BLOCK subsystem. Can we create a new subsystem for cpus
> > under this and do the auto generation of vcpu ids as well?
> 
> I'm not sure that auto ids was actually merged.
> (I thought it wasn't)


Well, if you were referring to the below, then it has been part of qemu
for quite a long time now:

Patch: util - add automated ID generation utility
File: https://github.com/qemu/qemu/blob/master/util/id.c
Commit-id: https://github.com/qemu/qemu/commit/a0f1913637e6


> Anyway auto IDs are not directly related to enabling CPU hotplug for ARM,
> if you feel they should be generated you can try to propose patches.

Sure. 


> > Plus, there is another unique requirement specifically for realizing
> > vcpu hotplug for ARM64.
> >
> > Summary:
> > Right now the ARM architecture does not allow reinitializing the GIC
> > after the VM has booted. Therefore, we are planning to pre-size the GIC
> > interfaces at init time by initializing all of the possible vcpus
> > and keeping them in a 'realized' but 'unavailable' state for the Guest
> > VM. They shall be made available as-and-when vcpus are hot-plugged
> > later on. Therefore, the current effort is to be able to plug and
> > unplug from the qemu QOM without destroying the existing state of
> > the devices/objects representing the vcpus. All of these possible vcpus
> > shall be created once, at the boot time of the VM. The vcpus which
> > are not available to the Guest VM can be parked.
> >
> > Once the vcpus are hot-(un)plugged, only the (pre-)plug/unplug(-request)
> > interfaces are used to convey this event information to the Guest
> > VM.
> >
> > I have tested this solution and it works, but I wanted to make sure
> > that I am not doing anything which breaks any of the existing Qemu
> > QOM interfaces, and that the basic fundamental idea of being able to
> > attach to and detach from the Qemu QOM is okay.
> >
> > Any suggestions are welcome in this regard.
> 
> From discussion with Drew [CCed], I got that kvm/arm isn't designed
> to support vCPU hotplug (and it would require heavy refactoring to
> separate GIC and VCPUs, which probably won't be welcomed by maintainers).

Agreed. I have pre-sized the GIC with the possible vcpus. I think Marc
mentioned this earlier.

> 
> But that's only KVM side of the equation. Assuming that we don't
> touch KVM much, the only QEMU side is left.
> 
> Further, let's call:
>  * vCPU - the kvm part of a CPU
>  * CPU - the QEMU object which is linked to a vCPU via a file descriptor.
> 
> In QEMU we have CPU devices which optionally might create vCPUs
> during device realize time (if QEMU runs with KVM accelerator).
> 
> So from a design point of view, I'd suggest dynamically
> creating/removing CPU devices on demand using the existing
>  -device/device_add/device_del interface
> like we do for other architectures.
> 
> But in the case of running with the KVM accelerator, to accommodate the
> current non-dynamic ARM/KVM, I'd move vCPU creation to "kvm_init()"
> time or board init time, so it would pre-create vCPUs in KVM early, in a
> parked state, and put them on the 'kvm_parked_vcpus' list, but wouldn't
> create CPU devices for them.


Something similar, but I am doing it from the virt machine init.


> Then later when management adds CPU device either with
> '-device' or 'device_add', a new CPU device will pick up
> pre-created parked vCPU file descriptor and re-use it.
> 
> The parked vCPU infrastructure already exists in QEMU, as we use it
> for hot-unplugged CPUs for the same reason (it needs too much
> refactoring on the KVM side to really remove a vCPU).

Agreed, with respect to the KVM change. I have added some new code in
the qemu/kvm layer, but some re-factoring might be needed later to
extract common code. Also, ARM arch-specific functions had to be
introduced.

> 
> So when CPU is hot-unplugged, we put linked vCPU file descriptor
> into kvm_parked_vcpus (see: kvm_destroy_vcpu) and completely delete
> CPU device on QEMU side. Then when the same CPU is hot-plugged again,
> we reuse previously parked vCPU file descriptor (see: kvm_get_vcpu).


Yes, I have done exactly this, and perhaps this approach was discussed
in earlier mail-chains involving you as well. So I have taken your
suggestion, and many thanks for it :)


> > NOTE: I plan to share the patches with the community which includes
> > both the changes of the Linux Kernel and the QEMU in near future.


I plan to share an RFC within a week, for sure.


> > > 1) here is what I could find on IDs topic
> > >    https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00011.html
> > >
> > > > > >    if all the hotplug devices need to go under the *peripheral* 
> > > > > > container while
> > > > > > they are hotplugged and during object init time as well?
> > > > >
> > > > > theoretically device_del may use the QOM path (which the user
> > > > > can later get with query-hotpluggable-cpus),
> > > > > but I think it's mostly a debugging feature.
> > > >
> > > >
> > > > Sure.
> > > >
> > > >
> > > > > users are supposed to specify an 'id' during -device/device_add
> > > > > if they are going to manage that device afterwards (like
> > > > > unplugging it). Then they can use that 'id' in other commands
> > > > > (including device_del).
> > > > >
> > > > > So 'id'-ed devices end up in the 'peripheral' container.
> > > >
> > > >
> > > > Sure, but what if a hotplugged device is removed and then added
> > > > again? It looks like the qmp_device_add() interface will again end
> > > > up calling device_set_realized(), which eventually would put the
> > > > hotplugged device under the "unattached" container?
> > >
> > > it won't, see call chain:
> > >
> > >   qmp_device_add()
> > >       -> qdev_device_add()
> > >           -> qdev_set_id()
> >
> > Ok, sure. I did see the qdev_set_id() interface. In fact, earlier I
> > was actually trying to play with it by making it more generic, adding
> > even the 'unattached' container handling to it (which is missing right
> > now), and calling it inside device_set_realized() instead of the below
> > code:
> >
> >         if (!obj->parent) {
> >             gchar *name = g_strdup_printf("device[%d]", unattached_count++);
> >
> >             object_property_add_child(container_get(qdev_get_machine(),
> >                                                     "/unattached"),
> >                                       name, obj, &error_abort);
> >             unattached_parent = true;
> >             g_free(name);
> >         }
> >
> > The idea of the above dabbling was to have a common interface for the
> > 'unattached' container and call it from machvirt_init() in virt.c,
> > where the possible vcpus are being created, forcing them to be located
> > either inside the 'unattached' or the 'peripheral' container, akin to
> > the example you had given.
> >
> > If we look at the qdev_device_add() function, after setting dev->id
> > using qdev_set_id() (which would also result in parenting the object
> > under the 'peripheral' container), it calls the function to 'realize'
> > the device/object, which would end up hitting the code excerpt shared
> > above; and now, because the object will already have a parent set, it
> > won't go into the 'unattached' container.
> >
> > Currently, the latter cannot be controlled for the cold-plugged vcpus.
> > Therefore, before this discussion my initial thought was to make
> > qdev_set_id() universal (i.e. include handling for all types of
> > containers, unattached/peripheral/anonymous, in qdev_set_id()), and
> > then call it from machvirt_init() just before the vcpus are
> > 'realized', so that cold-plugged cpus could get into the 'peripheral'
> > container. This would help in hot-unplugging using the standard
> > 'device_del' interface.
> 
> I think we understand cold-plugged vcpus differently.
> From QEMU point of view there are 2 kinds of cold-plugged CPU devices.
> 
> Ones that are currently created by the board directly, following the
> pattern
> 
>  object_new()
>  set properties
>  realize
> 
> these are created (let's call them builtin) in the amount specified by
> -smp X and are not manageable by external applications.
> 
> The second kind of cold-plugged CPUs are the ones created on command
> line with help of:
> 
>  -device cpu-type-foo
> 
> these are created by management applications and can be hot-removed
> if management supplied an 'id'. For example, libvirt starts QEMU with 1
> built-in cpu in paused mode (-S), then it hotplugs N cpus via QMP
> and lets the VM run (it's essentially the same as using -device on the
> CLI). This way it can remove all CPUs except the 1st one, which is good
> enough in almost all cases.
> 

Sure, I can see your point, and I have tried to incorporate part of it.
Maybe you would like to check the code, which will be sent soon.


Many thanks
Salil

