Hi Salil,

On 7/4/23 19:58, Salil Mehta wrote:


Latest Qemu Prototype (Pre RFC V2) (Not in the final shape of the patches)
https://github.com/salil-mehta/qemu.git   virt-cpuhp-armv8/rfc-v1-port11052023.dev-1


should work against below kernel changes as confirmed by James,

Latest Kernel Prototype (Pre RFC V2 = RFC V1 + Fixes)
https://git.gitlab.arm.com/linux-arm/linux-jm.git   virtual_cpu_hotplug/rfc/v2


I think it'd be better to have the discussions on the mailing list. The threads and
all follow-up replies can be archived there so that nothing gets lost. Besides, other
people may be interested in the same points and can join the discussion directly.

I got a chance to run some tests on the RFC patchsets. Not all cases work as
expected, though I know the patchset is still being polished. I've summarized the
issues below:

(1) A coredump is triggered when the topology is out of range. This is the issue we
    discussed in private. I'm just recapping it here in case other people are also
    blocked by it.

    (a) start VM with the following command lines
     /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64       \
     -accel kvm -machine virt,gic-version=host,nvdimm=on -cpu host \
     -smp cpus=1,maxcpus=2,sockets=1,clusters=1,cores=1,threads=2  \
     -m 512M,slots=16,maxmem=64G                                   \
     -object memory-backend-ram,id=mem0,size=512M                  \
     -numa node,nodeid=0,cpus=0-1,memdev=mem0                      \

    (b) hot add CPU whose topology is out of range
    (qemu) device_add driver=host-arm-cpu,id=cpu1,core-id=1


    It's actually caused by typos in hw/arm/virt.c::virt_cpu_pre_plug(), where
    'ms->possible_cpus->len' needs to be replaced with 'ms->smp.cores'. With this
    change, the out-of-range hot-added CPU object is rejected.
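    To illustrate the kind of range check meant here, below is a minimal standalone
    sketch. It is not the code from virt_cpu_pre_plug(); the struct and function names
    are hypothetical, and it only assumes that each topology ID has to be compared
    against its own -smp limit rather than against the total number of possible CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the -smp limits; not QEMU's own topology struct. */
struct smp_limits {
    unsigned sockets;
    unsigned clusters;
    unsigned cores;
    unsigned threads;
};

/* Returns true only if every topology ID is below its own limit. */
static bool topo_ids_in_range(const struct smp_limits *smp,
                              unsigned socket_id, unsigned cluster_id,
                              unsigned core_id, unsigned thread_id)
{
    assert(smp != NULL);
    return socket_id < smp->sockets &&
           cluster_id < smp->clusters &&
           core_id < smp->cores &&     /* per-level limit, not the total CPU count */
           thread_id < smp->threads;
}
```

    With sockets=1,clusters=1,cores=1,threads=2 as in (a), core-id=1 fails this check,
    whereas comparing it against the number of possible CPUs (2) would let it through.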

(2) I don't think TCG has been tested, since it doesn't seem to work at all.

    (a) start VM with the following command lines
    /home/gshan/sandbox/src/qemu/main/build/qemu-system-aarch64     \
    -machine virt,gic-version=3 -cpu max -m 1024                    \
    -smp maxcpus=2,cpus=1,sockets=1,clusters=1,cores=1,threads=2    \

    (b) failure while hot-adding CPU
    (qemu) device_add driver=max-arm-cpu,id=cpu1,thread-id=1
    Error: cpu(id1=0:0:0:1) with arch-id 1 exists

    The error message is printed by hw/arm/virt.c::virt_cpu_pre_plug() because the
    specified CPU is already present. For the KVM case, the disabled CPUs are detached
    from 'ms->possible_cpus->cpus[1].cpu' and destroyed. I think we need to do something
    similar for the TCG case in hw/arm/virt.c::virt_cpu_post_init(). I'm able to add the
    CPU with the following hunk of changes.

--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2122,6 +2122,18 @@ static void virt_cpu_post_init(VirtMachineState *vms, MemoryRegion *sysmem)
                 exit(1);
             }
         }
+
+#if 1
+        for (n = 0; n < possible_cpus->len; n++) {
+            cpu = qemu_get_possible_cpu(n);
+            if (!qemu_enabled_cpu(cpu)) {
+                CPUArchId *cpu_slot;
+                cpu_slot = virt_find_cpu_slot(ms, cpu->cpu_index);
+                cpu_slot->cpu = NULL;
+                object_unref(OBJECT(cpu));
+            }
+        }
+#endif
     }
 }

(3) Assertion failure on the sequence of hot-add, hot-remove and hot-add when TCG
    mode is enabled.

    (a) Include the hack from (2) and start VM with the following command lines
    /home/gshan/sandbox/src/qemu/main/build/qemu-system-aarch64     \
    -machine virt,gic-version=3 -cpu max -m 1024                    \
    -smp maxcpus=2,cpus=1,sockets=1,clusters=1,cores=1,threads=2    \

    (b) assertion on the sequence of hot-add, hot-remove and hot-add
    (qemu) device_add driver=max-arm-cpu,id=cpu1,thread-id=1
    (qemu) device_del cpu1
    (qemu) device_add driver=max-arm-cpu,id=cpu1,thread-id=1
    **
    ERROR:../tcg/tcg.c:669:tcg_register_thread: assertion failed: (n < tcg_max_ctxs)
    Bail out! ERROR:../tcg/tcg.c:669:tcg_register_thread: assertion failed: (n < tcg_max_ctxs)
    Aborted (core dumped)

    I'm not sure if x86 has a similar issue. It seems the management of TCG contexts,
    corresponding to the variables @tcg_max_ctxs and @tcg_ctxs, needs some improvement
    so that TCG context registration and unregistration can accommodate CPU hotplug.
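
    One possible direction, sketched as a standalone model rather than a patch against
    tcg/tcg.c: keep a table of context slots and release a slot on hot-remove so that
    the next hot-add can reuse it. The assumption here is that the assertion fires
    because context indices are only ever handed out and never returned. All names
    below are hypothetical, and the sketch ignores locking and atomics.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CTXS 2   /* plays the role of tcg_max_ctxs for maxcpus=2 */

/* Hypothetical slot table; not QEMU's actual data structure. */
static bool ctx_in_use[MAX_CTXS];

/* Claim the first free slot; returns its index, or -1 if all are busy. */
static int ctx_register(void)
{
    for (int n = 0; n < MAX_CTXS; n++) {
        if (!ctx_in_use[n]) {
            ctx_in_use[n] = true;
            return n;
        }
    }
    return -1;
}

/* Release a slot so that a later hot-added vCPU thread can reuse it. */
static void ctx_unregister(int n)
{
    assert(n >= 0 && n < MAX_CTXS && ctx_in_use[n]);
    ctx_in_use[n] = false;
}
```

    In this model, the hot-add, hot-remove, hot-add sequence from (b) gets slot 1 back
    on the second hot-add instead of running past the limit.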


Apart from what has been found in the tests, I've started to look into the code
changes. I may reply with more specific comments. However, it would be ideal to
comment on the specific changes after the patchset is posted for review. Salil, you
may have mentioned the plan somewhere already. As I understand it, the QEMU patchset
will be posted after James's RFCv2 kernel series is posted. Please let me know if my
understanding is correct. Again, thanks for your efforts to get vCPU hotplug
supported :)

Thanks,
Gavin



