Hi Harsh,

Sorry for the delay in my reply. I've been off the grid for some time, so I
missed this earlier mail. Please find my reply to your query below.

Thanks

>  From: Harsh Prateek Bora <hars...@linux.ibm.com>
>  Sent: Friday, March 22, 2024 8:15 AM
>  
>  + Vaibhav, Shiva
>  
>  Hi Salil,
>  
>  I came across your patch while trying to solve a related problem on spapr.
>  One query below ..
>  
>  On 3/12/24 07:29, Salil Mehta via wrote:
>  > KVM vCPU creation is done once during the vCPU realization when Qemu
>  > vCPU thread is spawned. This is common to all the architectures as of now.
>  >
>  > Hot-unplug of vCPU results in destruction of the vCPU object in QOM
>  > but the corresponding KVM vCPU object in the Host KVM is not destroyed
>  > as KVM doesn't support vCPU removal. Therefore, its representative KVM
>  > vCPU object/context in Qemu is parked.
>  >
>  > Refactor architecture common logic so that some APIs could be reused
>  > by vCPU Hotplug code of some architectures likes ARM, Loongson etc.
>  > Update new/old APIs with trace events instead of DPRINTF. No functional
>  > change is intended here.
>  >
>  > Signed-off-by: Salil Mehta <salil.me...@huawei.com>
>  > Reviewed-by: Gavin Shan <gs...@redhat.com>
>  > Tested-by: Vishnu Pajjuri <vis...@os.amperecomputing.com>
>  > Reviewed-by: Jonathan Cameron <jonathan.came...@huawei.com>
>  > Tested-by: Xianglai Li <lixiang...@loongson.cn>
>  > Tested-by: Miguel Luis <miguel.l...@oracle.com>
>  > Reviewed-by: Shaoqin Huang <shahu...@redhat.com>
>  > ---
>  >   accel/kvm/kvm-all.c    | 64 ++++++++++++++++++++++++++++++++----------
>  >   accel/kvm/trace-events |  5 +++-
>  >   include/sysemu/kvm.h   | 16 +++++++++++
>  >   3 files changed, 69 insertions(+), 16 deletions(-)
>  >
>  > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>  > index a8cecd040e..3bc3207bda 100644
>  > --- a/accel/kvm/kvm-all.c
>  > +++ b/accel/kvm/kvm-all.c
>  > @@ -126,6 +126,7 @@ static QemuMutex kml_slots_lock;
>  >   #define kvm_slots_unlock()  qemu_mutex_unlock(&kml_slots_lock)
>  >
>  >   static void kvm_slot_init_dirty_bitmap(KVMSlot *mem);
>  > +static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id);
>  >
>  >   static inline void kvm_resample_fd_remove(int gsi)
>  >   {
>  > @@ -314,14 +315,53 @@ err:
>  >       return ret;
>  >   }
>  >
>  > +void kvm_park_vcpu(CPUState *cpu)
>  > +{
>  > +    struct KVMParkedVcpu *vcpu;
>  > +
>  > +    trace_kvm_park_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  > +
>  > +    vcpu = g_malloc0(sizeof(*vcpu));
>  > +    vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
>  > +    vcpu->kvm_fd = cpu->kvm_fd;
>  > +    QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
>  > +}
>  > +
>  > +int kvm_create_vcpu(CPUState *cpu)
>  > +{
>  > +    unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
>  > +    KVMState *s = kvm_state;
>  > +    int kvm_fd;
>  > +
>  > +    trace_kvm_create_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  > +
>  > +    /* check if the KVM vCPU already exist but is parked */
>  > +    kvm_fd = kvm_get_vcpu(s, vcpu_id);
>  > +    if (kvm_fd < 0) {
>  > +        /* vCPU not parked: create a new KVM vCPU */
>  > +        kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
>  > +        if (kvm_fd < 0) {
>  > +            error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu", vcpu_id);
>  > +            return kvm_fd;
>  > +        }
>  > +    }
>  > +
>  > +    cpu->kvm_fd = kvm_fd;
>  > +    cpu->kvm_state = s;
>  > +    cpu->vcpu_dirty = true;
>  > +    cpu->dirty_pages = 0;
>  > +    cpu->throttle_us_per_full = 0;
>  > +
>  > +    return 0;
>  > +}
>  > +
>  >   static int do_kvm_destroy_vcpu(CPUState *cpu)
>  >   {
>  >       KVMState *s = kvm_state;
>  >       long mmap_size;
>  > -    struct KVMParkedVcpu *vcpu = NULL;
>  >       int ret = 0;
>  >
>  > -    trace_kvm_destroy_vcpu();
>  > +    trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  >
>  >       ret = kvm_arch_destroy_vcpu(cpu);
>  >       if (ret < 0) {
>  > @@ -347,10 +387,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
>  >           }
>  >       }
>  >
>  > -    vcpu = g_malloc0(sizeof(*vcpu));
>  > -    vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
>  > -    vcpu->kvm_fd = cpu->kvm_fd;
>  > -    QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
>  > +    kvm_park_vcpu(cpu);
>  >   err:
>  >       return ret;
>  >   }
>  > @@ -371,6 +408,8 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
>  >           if (cpu->vcpu_id == vcpu_id) {
>  >               int kvm_fd;
>  >
>  > +            trace_kvm_get_vcpu(vcpu_id);
>  > +
>  >               QLIST_REMOVE(cpu, node);
>  >               kvm_fd = cpu->kvm_fd;
>  >               g_free(cpu);
>  > @@ -378,7 +417,7 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
>  >           }
>  >       }
>  >
>  > -    return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
>  > +    return -ENOENT;
>  >   }
>  >
>  >   int kvm_init_vcpu(CPUState *cpu, Error **errp)
>  > @@ -389,19 +428,14 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
>  >
>  >       trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  >
>  > -    ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
>  > +    ret = kvm_create_vcpu(cpu);
>  >       if (ret < 0) {
>  > -        error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
>  > +        error_setg_errno(errp, -ret,
>  > +                         "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
>  >                            kvm_arch_vcpu_id(cpu));
>  
>  If a vCPU hotplug fails due to a failure of the kvm_create_vcpu ioctl, the
>  current behaviour is to bring down the guest, as errp is &error_fatal. Any
>  thoughts on how we can ensure that a failure of the kvm_create_vcpu ioctl
>  for hotplugged CPUs (only) doesn't bring down the guest and instead fails
>  gracefully (by reporting the error to the user on the monitor)?

On ARM, we are by design pre-creating all the vCPUs in KVM during QEMU/KVM
init. This is to satisfy the constraints posed by the ARM architecture: we are
not allowed to meddle with any initialization at the KVM level or guest kernel
level after the system has booted. The constraints mainly come from the GIC
and related per-CPU features, which can only be initialized once in KVM during
init; their presence is then made known to the guest kernel only once, during
enumeration of the CPUs and the related GIC CPU interfaces, and cannot be
changed later. Hence, if all of the KVM vCPUs have been created successfully
during init, then hot(un)plug operations later won't hit fatal initialization
errors in KVM, as all operations for the hot(un)plugged vCPUs get handled at
the QOM level only.

I feel that if there is a failure to create a KVM vCPU at QEMU/KVM init time,
then there is something severely wrong either with the inputs or with the
system. Hence, to keep the handling simple, I was in favor of aborting the
initialization.


But all of the above is ARM arch specific. Do you have anything specific in
mind as to why you need graceful handling at init time?

Thanks
Salil.

>  
>  regards,
>  Harsh
>  >           goto err;
>  >       }
>  >
