Hi Salil,

On 7/4/23 19:58, Salil Mehta wrote:
> Latest Qemu Prototype (Pre RFC V2) (Not in the final shape of the patches)
> https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v1-port11052023.dev-1
>
> should work against below kernel changes as confirmed by James,
>
> Latest Kernel Prototype (Pre RFC V2 = RFC V1 + Fixes)
> https://git.gitlab.arm.com/linux-arm/linux-jm.git virtual_cpu_hotplug/rfc/v2

I think it'd be better to have these discussions on the mailing list.
The threads and all follow-up replies are archived there, so nothing
gets lost. Besides, other people may be interested in the same points
and can join the discussion directly.

I got a chance to run some tests on the RFC patchsets. Not all cases
work as expected. I know the patchset is still being polished. I
summarize the issues below:

(1) A coredump is triggered when the topology is out of range. It's
    the issue we discussed in private. I'm just recapping it here in
    case other people are also blocked by it.

    (a) start VM with the following command lines

    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
    -accel kvm -machine virt,gic-version=host,nvdimm=on -cpu host \
    -smp cpus=1,maxcpus=2,sockets=1,clusters=1,cores=1,threads=2 \
    -m 512M,slots=16,maxmem=64G \
    -object memory-backend-ram,id=mem0,size=512M \
    -numa node,nodeid=0,cpus=0-1,memdev=mem0 \

    (b) hot add CPU whose topology is out of range

    (qemu) device_add driver=host-arm-cpu,id=cpu1,core-id=1

    It's actually caused by a typo in hw/arm/virt.c::virt_cpu_pre_plug(),
    where 'ms->possible_cpus->len' needs to be replaced with
    'ms->smp.cores'. With this change, the hot-added CPU object is
    rejected instead of crashing QEMU.

(2) I don't think TCG has been tested, since it doesn't seem to work
    at all.

    (a) start VM with the following command lines

    /home/gshan/sandbox/src/qemu/main/build/qemu-system-aarch64 \
    -machine virt,gic-version=3 -cpu max -m 1024 \
    -smp maxcpus=2,cpus=1,sockets=1,clusters=1,cores=1,threads=2 \

    (b) failure while hot-adding CPU

    (qemu) device_add driver=max-arm-cpu,id=cpu1,thread-id=1
    Error: cpu(id1=0:0:0:1) with arch-id 1 exists

    The error message is printed by hw/arm/virt.c::virt_cpu_pre_plug()
    because the specified CPU is already present in its slot. For the
    KVM case, the disabled CPUs are detached from
    'ms->possible_cpus->cpus[1].cpu' and destroyed. I think we need to
    do a similar thing for the TCG case in
    hw/arm/virt.c::virt_cpu_post_init(). I'm able to add the CPU with
    the following hunk of changes.

--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2122,6 +2122,18 @@ static void virt_cpu_post_init(VirtMachineState *vms, MemoryRegion *sysmem)
                 exit(1);
             }
         }
+
+#if 1
+        for (n = 0; n < possible_cpus->len; n++) {
+            cpu = qemu_get_possible_cpu(n);
+            if (!qemu_enabled_cpu(cpu)) {
+                CPUArchId *cpu_slot;
+                cpu_slot = virt_find_cpu_slot(ms, cpu->cpu_index);
+                cpu_slot->cpu = NULL;
+                object_unref(OBJECT(cpu));
+            }
+        }
+#endif
     }
 }

(3) An assertion is triggered by the sequence of hot-add, hot-remove
    and hot-add when TCG mode is enabled.

    (a) Include the hack from (2) and start VM with the following
        command lines

    /home/gshan/sandbox/src/qemu/main/build/qemu-system-aarch64 \
    -machine virt,gic-version=3 -cpu max -m 1024 \
    -smp maxcpus=2,cpus=1,sockets=1,clusters=1,cores=1,threads=2 \

    (b) assertion on the sequence of hot-add, hot-remove and hot-add

    (qemu) device_add driver=max-arm-cpu,id=cpu1,thread-id=1
    (qemu) device_del cpu1
    (qemu) device_add driver=max-arm-cpu,id=cpu1,thread-id=1
    **
    ERROR:../tcg/tcg.c:669:tcg_register_thread: assertion failed: (n < tcg_max_ctxs)
    Bail out! ERROR:../tcg/tcg.c:669:tcg_register_thread: assertion failed: (n < tcg_max_ctxs)
    Aborted (core dumped)

    I'm not sure if x86 has a similar issue. It seems the management
    of TCG contexts, i.e. the variables @tcg_max_ctxs and @tcg_ctxs,
    needs some improvement so that TCG contexts can be registered and
    unregistered properly to accommodate CPU hotplug.

Apart from what has been found in the tests, I've started to look into
the code changes. I may reply with more specific comments. However, it
would be ideal to comment on the specific changes after the patchset is
posted for review.

Salil, you may have mentioned the plan somewhere already. As I
understand it, the QEMU patchset will be posted after James's RFCv2
kernel series is posted. Please let me know if my understanding is
correct. Again, thanks for your efforts to get vCPU hotplug
supported :)

Thanks,
Gavin
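
P.S. To illustrate the point in (3), here is a minimal standalone
sketch of a context registry that hands slots back on unregistration.
It is a toy model and not QEMU code: CtxRegistry, ctx_register(),
ctx_unregister(), replay_add_del_add() and MAX_CTXS are all
hypothetical names, and the assumption that a hot-removed vCPU's slot
is never reclaimed is only inferred from the assertion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CTXS 2   /* hypothetical: one slot per possible vCPU */

/* Toy registry: in_use[i] says whether slot i is held by a live vCPU thread. */
typedef struct {
    bool in_use[MAX_CTXS];
} CtxRegistry;

/* Claim the lowest free slot; return its index, or -1 if all are taken. */
static int ctx_register(CtxRegistry *r)
{
    for (size_t i = 0; i < MAX_CTXS; i++) {
        if (!r->in_use[i]) {
            r->in_use[i] = true;
            return (int)i;
        }
    }
    return -1;
}

/* Release a slot when its vCPU thread exits so a later hot-add can reuse it. */
static void ctx_unregister(CtxRegistry *r, int slot)
{
    assert(slot >= 0 && slot < MAX_CTXS && r->in_use[slot]);
    r->in_use[slot] = false;
}

/* Replay boot CPU, hot-add, hot-remove, hot-add; return the last slot. */
static int replay_add_del_add(void)
{
    CtxRegistry r = { { false } };
    int boot = ctx_register(&r);   /* boot vCPU takes slot 0 */
    int cpu1 = ctx_register(&r);   /* first device_add takes slot 1 */

    assert(boot == 0 && cpu1 == 1);
    ctx_unregister(&r, cpu1);      /* device_del frees slot 1 */
    return ctx_register(&r);       /* second device_add reuses slot 1 */
}
```

With a counter that only ever grows, the second hot-add in this
sequence would ask for a third slot and run past the limit. With slot
reuse as sketched here it gets slot 1 back. Whether the real fix should
look like this depends on how the TCG contexts are used elsewhere.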