Hi Igor,

My sincere apologies, I just realized that I missed replying to this mail.
I was distracted by something else in February and only resumed working on
hotplug in March, but I will still reply to this mail. Also, I have
incorporated most of the points below into the vCPU hotplug patches, as per
your suggestions.
> From: Qemu-arm [mailto:qemu-arm-bounces+salil.mehta=huawei....@nongnu.org]
> On Behalf Of Igor Mammedov
> Sent: Monday, January 27, 2020 3:04 PM
> To: Salil Mehta <salil.me...@huawei.com>
>
> On Fri, 24 Jan 2020 18:44:16 +0000
> Salil Mehta <salil.me...@huawei.com> wrote:
>
> > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > Sent: Friday, January 24, 2020 4:07 PM
> > >
> > > On Fri, 24 Jan 2020 15:02:10 +0000
> > > Salil Mehta <salil.me...@huawei.com> wrote:
> > >
> > > > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > > > Sent: Friday, January 24, 2020 1:54 PM
> > > > > To: Salil Mehta <salil.me...@huawei.com>
> > > > >
> > > > > On Fri, 24 Jan 2020 11:20:15 +0000
> > > > > Salil Mehta <salil.me...@huawei.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > > I am working on the vCPU hotplug feature for ARM64 and I am in the
> > > > > > middle of understanding some aspects of QEMU's
> > > > > > device_add/device_del interface.
> > > > > >
> > > > > > Observations:
> > > > > > 1. Any object initialised by qmp_device_add() goes into the
> > > > > >    /machine/unattached container. I traced the flow to a code leg
> > > > > >    inside device_set_realized().
> > > > > > 2. I could see the reverse, qmp_device_del(), expects the device
> > > > > >    to be in the /machine/peripheral container.
> > > > > > 3. I could see that any object initially added to the unattached
> > > > > >    container did not have a parent until object_add_property_child()
> > > > > >    was called further in the leg, which effectively meant a new
> > > > > >    property was created, the property table populated, and the
> > > > > >    child parented.
> > > > > > 4. Generally, the container /machine/peripheral was being used
> > > > > >    wherever DEVICE(dev)->id was present and non-null.
> > > > > >
> > > > > > Question:
> > > > > > 1. Wanted to confirm my understanding about the use of having
> > > > > >    separate containers like unattached, peripheral and anonymous.
> > > > > > 2. At init time all the vcpus go under the *unattached* container.
> > > > > >    Now, qmp_device_del() cannot be used to unplug them. I am
> > > > > >    wondering
> > > > >
> > > > > device is put into 'unattached' in case it wasn't assigned a parent.
> > > > > Usually it happens when board creates device directly.
> > > >
> > > > Sure, but if we decide that a certain number (N) of vcpus are
> > > > hotpluggable and a certain subset of N (say 'n' < 'N') should be
> > > > allowed to be present or cold-plugged at init time, then I wonder
> > > > which of the following is the correct approach:
> > > >
> > > > 1. Bring all N vcpus at boot time under the "peripheral" container,
> > > >    OR
> > > > 2. Just bring the subset 'n' of 'N' under the "peripheral" container
> > > >    and the rest under the "unattached" container? And later, as and
> > > >    when the rest of the vcpus are hotplugged, they should be
> > > >    transferred from the "unattached" container to the "peripheral"
> > > >    container?
> > >
> > > issue with that is that to put device into "peripheral" container,
> > > 'the user' must provide 'id'. (currently QEMU isn't able to do it on
> > > its own [1])
> > >
> > > But it doesn't mean that cold-plugged CPUs can't be unplugged.
> > > What current users could do is start QEMU like this (simplified version):
> > >
> > >   $QEMU -smp 1,maxcpus=N -device foo-cpu-type,id=CPU00 \
> > >         -device foo-cpu-type,id=CPU01 ...
> > >
> > > i.e. 1st CPU is not manageable due to lack of 'id' and is created by
> > > board code; the rest have 'id' and could be managed.
> >
> > I understand that we can somehow assign ids from the QMP interface, but
> > the above will not push vcpus into the "peripheral" container. They will
> > appear in the "unattached" container, but with specified names, and as
> > far as I can see in the code, 'device_del' can only delete
> > objects/devices from the 'peripheral' container?
> qemu-system-x86_64 -monitor stdio \
>   -smp 1,maxcpus=3 \
>   -device qemu64-x86_64-cpu,id=foo,socket-id=1,core-id=0,thread-id=0 \
>   -device qemu64-x86_64-cpu,socket-id=2,core-id=0,thread-id=0
>
> (qemu) info hotpluggable-cpus
> Hotpluggable CPUs:
>   type: "qemu64-x86_64-cpu"
>   vcpus_count: "1"
>   qom_path: "/machine/peripheral-anon/device[0]"
>              ^^^^^^^^^^^^^^^
>   CPUInstance Properties:
>     socket-id: "2"
>     core-id: "0"
>     thread-id: "0"
>   type: "qemu64-x86_64-cpu"
>   vcpus_count: "1"
>   qom_path: "/machine/peripheral/foo"
>              ^^^^^^^^^^
>
> in gist, if device is created with any variant of device_add,
> it goes to "peripheral[-anon]" including cold-plugged one.

I am not sure why you said "including cold-plugged one"? I hope that by
cold-plugging you mean the CPUs which already exist at boot time of the
Guest VM and were plugged using -device?

>   CPUInstance Properties:
>     socket-id: "1"
>     core-id: "0"
>     thread-id: "0"
>   type: "qemu64-x86_64-cpu"
>   vcpus_count: "1"
>   qom_path: "/machine/unattached/device[0]"
               ^^^^^^^^^^^^^

Unless you have pasted wrongly here, the above output also shows the qom
path as 'unattached' for the cold-plugged CPU. Am I missing something? :)

>   CPUInstance Properties:
>     socket-id: "0"
>     core-id: "0"
>     thread-id: "0"

> > Plus, having that many ids specified for a large number of vcpus does
> > not look like a very practical solution. We need an interface like auto

> number of IDs is not a problem since it's usually handled by management
> software just fine (ex: libvirt does it)

> > generation of ids even at boot time. I could see from the link you
> > have shared that it is already being used by the ID_BLOCK subsystem.
> > Can we create a new subsystem for cpus under this and do the auto
> > generation of vcpu ids as well?

> I'm not sure that auto ids was actually merged.
> (I thought it wasn't)

Well, if you were referring to the below, then it has been part of qemu for
quite long now:

Patch: util - add automated ID generation utility
File: https://github.com/qemu/qemu/blob/master/util/id.c
Commit-id: https://github.com/qemu/qemu/commit/a0f1913637e6

> Anyway auto IDs are not directly related to enabling CPU hotplug for ARM,
> if you feel they should be generated you can try to propose patches.

Sure.

> > Plus, there is another unique requirement specifically for realizing
> > vcpu hotplug for ARM64.
> >
> > Summary:
> > Right now the ARM architecture does not allow reinitializing the GIC
> > after the VM has booted. Therefore, we are planning to pre-size the GIC
> > interfaces at init time by initializing all of the possible vcpus and
> > keeping them in a 'realized' but 'unavailable' state for the Guest VM.
> > They shall be made available as-and-when vcpus are hot-plugged later
> > on. Therefore, the current effort is to be able to plug and unplug from
> > the QEMU QOM without destroying the existing state of the
> > devices/objects representing vcpus. All of these possible vcpus shall
> > be created once at the boot time of the VM. The vcpus which are not
> > available to the Guest VM can be parked.
> >
> > Once the vcpus are hot-(un)plugged, only the (pre-)plug/unplug(-request)
> > interfaces are used to convey this event information to the Guest VM.
> >
> > I have tested this solution and it works, but I wanted to make sure
> > that I am not doing anything which breaks any of the existing QEMU QOM
> > interfaces, and that the basic fundamental idea of being able to attach
> > to and detach from the QEMU QOM is okay?
> >
> > Any suggestions are welcome in this regard.

> From discussion with Drew [CCed], I got that kvm/arm isn't designed
> to support vCPU hotplug (and it would require heavy refactoring to
> separate GIC and VCPUs, which probably won't be welcomed by maintainers).

Agreed. I have pre-sized the GIC with the possible vcpus.
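As a rough illustration of the pre-sizing idea described above, here is a
minimal Python sketch; all class and method names here are invented for
illustration and do not correspond to the actual patches or QEMU code:

```python
# Toy model of "pre-size at init, enable on hotplug" for ARM64 vCPUs.
# All names are hypothetical; the real implementation differs.

class VCPU:
    def __init__(self, idx):
        self.idx = idx
        self.realized = True    # GIC interfaces sized against it at init
        self.available = False  # not yet exposed to the guest

class VirtMachine:
    def __init__(self, smp, maxcpus):
        # Create ALL possible vCPUs once at boot, so the (non-resizable)
        # GIC can be pre-sized for maxcpus; only 'smp' start available.
        self.vcpus = [VCPU(i) for i in range(maxcpus)]
        for cpu in self.vcpus[:smp]:
            cpu.available = True

    def hotplug(self, idx):
        cpu = self.vcpus[idx]   # already realized at boot, just parked
        cpu.available = True    # expose to guest via plug notification
        return cpu

    def hotunplug(self, idx):
        cpu = self.vcpus[idx]
        cpu.available = False   # park again; object state is preserved
        return cpu

vm = VirtMachine(smp=2, maxcpus=4)
print(sum(c.available for c in vm.vcpus))  # 2
vm.hotplug(2)
print(sum(c.available for c in vm.vcpus))  # 3
```

The key property the sketch captures is that plug/unplug only flips
availability; no vCPU object is ever destroyed or re-created after boot.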
I think Marc did mention this earlier.

> But that's only the KVM side of the equation. Assuming that we don't
> touch KVM much, only the QEMU side is left.
>
> Further lets call
>  * vCPU - a kvm's part of CPU
>  * CPU - QEMU object which is linked to vCPU via file descriptor.
>
> In QEMU we have CPU devices which optionally might create vCPUs
> during device realize time (if QEMU runs with KVM accelerator).
>
> So from design point of view, I'd suggest to dynamically
> create/remove CPU devices on demand using existing
> -device/device_add/device_del interface
> like we do for other architectures.
>
> But in case of running with KVM accelerator, to accommodate
> current non dynamic ARM/KVM, I'd move vCPU creation to "kvm_init()"
> time or board init time, so it would pre-create vCPUs in
> KVM early in parked state and put them in 'kvm_parked_vcpus' list
> but won't create CPU devices for them.

Something similar, but I am doing it from the virt machine init.

> Then later when management adds CPU device either with
> '-device' or 'device_add', a new CPU device will pick up
> pre-created parked vCPU file descriptor and re-use it.
>
> Parked vCPU infrastructure already exists in QEMU as we use it
> for hot-unplugged CPUs for the same reasons (it needs too much
> refactoring on KVM side to really remove vCPU).

Agreed with respect to the KVM change. I have added some new code in the
qemu/kvm layer, but some re-factoring might be needed later to extract the
common code. Also, ARM arch-specific functions had to be introduced.

> So when CPU is hot-unplugged, we put linked vCPU file descriptor
> into kvm_parked_vcpus (see: kvm_destroy_vcpu) and completely delete
> CPU device on QEMU side. Then when the same CPU is hot-plugged again,
> we reuse previously parked vCPU file descriptor (see: kvm_get_vcpu).

Yes, I have done exactly this, and perhaps this approach was discussed in
earlier mail-chains as well, involving you.
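The park/reuse flow described above can be modelled roughly as follows; this
is a toy Python sketch of the mechanism, not QEMU code, and the function
bodies are stand-ins for the real kvm_get_vcpu()/kvm_destroy_vcpu() logic:

```python
# Toy model of QEMU's parked-vCPU list: hot-unplug parks the kvm vCPU fd,
# re-plug reuses it instead of asking KVM to create a new vCPU.
kvm_parked_vcpus = {}   # vcpu_id -> parked fd
_next_fd = [100]        # fake fd allocator state

def kvm_create_vcpu_ioctl(vcpu_id):
    # Stand-in for the real KVM_CREATE_VCPU ioctl.
    _next_fd[0] += 1
    return _next_fd[0]

def kvm_get_vcpu(vcpu_id):
    # Reuse a previously parked fd if this vCPU existed before.
    if vcpu_id in kvm_parked_vcpus:
        return kvm_parked_vcpus.pop(vcpu_id)
    return kvm_create_vcpu_ioctl(vcpu_id)

def kvm_destroy_vcpu(vcpu_id, fd):
    # KVM can't really destroy a vCPU, so park its fd for later reuse;
    # the QEMU-side CPU device object is what actually gets deleted.
    kvm_parked_vcpus[vcpu_id] = fd

fd = kvm_get_vcpu(3)           # first plug: creates a new vCPU
kvm_destroy_vcpu(3, fd)        # hot-unplug: fd parked, CPU device deleted
assert kvm_get_vcpu(3) == fd   # re-plug: same fd is reused
```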
So I have taken your suggestion, and many thanks for this. :)

> > NOTE: I plan to share the patches with the community which includes
> > both the changes of the Linux Kernel and the QEMU in near future.

I plan to share an RFC within a week for sure.

> > > 1) here is what I could find on IDs topic
> > > https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00011.html

> > > > > > if all the hotplug devices need to go under the *peripheral*
> > > > > > container while they are hotplugged, and during object init
> > > > > > time as well?
> > > > >
> > > > > theoretically device_del may use QOM path (the later users can
> > > > > get with query-hotpluggable-cpus),
> > > > > but I think it's mostly debugging feature.
> > > >
> > > > Sure.
> > > >
> > > > > users are supposed to specify 'id' during -device/device_add if
> > > > > they are going to manage that device
> > > > > afterwards (like unplugging it). Then they could use that 'id' in
> > > > > other commands (including device_del)
> > > > >
> > > > > So 'id'-ed devices end up in 'peripheral' container.
> > > >
> > > > Sure, what if a hotplugged device is removed and then added again?
> > > > It looks like the qmp_device_add() interface will again end up
> > > > calling device_set_realized(), which eventually would put
> > > > hotplugged devices under the "unattached" container?
> > >
> > > it won't, see call chain:
> > >
> > >   qmp_device_add()
> > >     -> qdev_device_add()
> > >       -> qdev_set_id()
> >
> > Ok, sure. I did see the qdev_set_id() interface.
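To summarise the container routing that qdev_set_id() and
device_set_realized() implement, as discussed in this thread, here is a toy
Python model; the class and method names are invented for illustration and
are not the QEMU API:

```python
# Toy model of QOM container routing for devices, per the thread:
#  - device_add with an 'id'      -> /machine/peripheral/<id>
#  - device_add without an 'id'   -> /machine/peripheral-anon/device[N]
#  - board-created, no parent set -> /machine/unattached/device[N]

class Machine:
    def __init__(self):
        self.containers = {"peripheral": {}, "peripheral-anon": {},
                           "unattached": {}}
        self.anon_count = 0
        self.unattached_count = 0

    def device_add(self, dev, dev_id=None):
        # Any device_add variant parents the device under peripheral[-anon],
        # including cold-plugged devices created with -device on the CLI.
        if dev_id is not None:
            self.containers["peripheral"][dev_id] = dev
            return f"/machine/peripheral/{dev_id}"
        name = f"device[{self.anon_count}]"
        self.anon_count += 1
        self.containers["peripheral-anon"][name] = dev
        return f"/machine/peripheral-anon/{name}"

    def board_create(self, dev):
        # Board-created devices get no parent, so realize drops them
        # into the catch-all /machine/unattached container.
        name = f"device[{self.unattached_count}]"
        self.unattached_count += 1
        self.containers["unattached"][name] = dev
        return f"/machine/unattached/{name}"

    def device_del(self, dev_id):
        # device_del works by 'id', i.e. on the peripheral container.
        if dev_id not in self.containers["peripheral"]:
            raise KeyError(f"device '{dev_id}' not in peripheral")
        return self.containers["peripheral"].pop(dev_id)

m = Machine()
print(m.board_create("cpu0"))       # /machine/unattached/device[0]
print(m.device_add("cpu1", "foo"))  # /machine/peripheral/foo
print(m.device_add("cpu2"))         # /machine/peripheral-anon/device[0]
```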
> > In fact, earlier I was actually trying to play with it by making it
> > more generic, adding even the 'unattached' container handling to it
> > (which is missing right now), and calling it inside
> > device_set_realized() instead of the code below:
> >
> >     if (!obj->parent) {
> >         gchar *name = g_strdup_printf("device[%d]", unattached_count++);
> >
> >         object_property_add_child(container_get(qdev_get_machine(),
> >                                                 "/unattached"),
> >                                   name, obj, &error_abort);
> >         unattached_parent = true;
> >         g_free(name);
> >     }
> >
> > The idea of the above dabbling was to have a common interface for the
> > 'unattached' container and call it from machvirt_init() in virt.c,
> > where the possible vcpus are being created, to force them to be located
> > either inside the 'unattached' or the 'peripheral' container, akin to
> > the example you had given.
> >
> > If we look at the qdev_device_add() function, after setting dev->id
> > using qdev_set_id() (which would also result in parenting the object
> > under the 'peripheral' container), it calls the function to 'realize'
> > the device/object, which would end up hitting the code excerpt shared
> > above; and now, because the object will already have a parent set, it
> > won't go into the 'unattached' container.
> >
> > Currently, the latter cannot be controlled for the cold-plugged vcpus.
> > Therefore, before this discussion my initial thought was to make
> > qdev_set_id() universal (i.e. include handling for all types of
> > containers, unattached/peripheral/anonymous, in qdev_set_id()), then
> > call it from machvirt_init() just before the vcpus are 'realized', so
> > that the cold-plugged cpus could get into the 'peripheral' container.
> > This would help in hot-unplugging using the standard 'device_del'
> > interface.

> I think we understand cold-plugged vcpus differently.
> From QEMU point of view there are 2 kinds of cold-plugged CPU devices.
> Ones that are currently created by board directly following pattern
>
>   object_new() -> set properties -> realize
>
> these are created (lets call them builtin) in amount specified by -smp X
> and are not manageable by external applications.
>
> The second kind of cold-plugged CPUs are the ones created on command
> line with help of:
>
>   -device cpu-type-foo
>
> these are created by management applications and could be hot-removed
> if management supplied 'id'. For example libvirt starts QEMU with 1
> built-in cpu in paused mode (-S), then it hotplugs N cpus via QMP and
> lets the VM run (it's essentially the same as using -device on CLI).
> This way it can remove all CPUs except the 1st one, which is good
> enough in almost all the cases.

Sure, I can see your point. I have tried to incorporate part of it. Maybe
you would like to check the code, which will be sent soon.

Many thanks
Salil