Re: [PATCH 19/19] KVM: introduce a KVM_DELETE_DEVICE ioctl
On Wed, 2019-01-23 at 19:39 +0100, Cédric Le Goater wrote:
> > The reason I ask is that we will have to be much more careful about
> > memory allocation lifetimes with this patch.
>
> yes. bad refcounting will lead the host kernel to a crash.

One way to alleviate that is to make sure this is only supported on
selected devices such as XICS via some flag or the presence of a
callback.

Cheers,
Ben.
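A minimal sketch of the "presence of a callback" idea above, written as a
userspace model rather than kernel code. Everything named model_* is
hypothetical and only mirrors the shape of struct kvm_device_ops from the
patch; the -ENOTTY return value is an assumption made for the illustration,
not something the thread settles.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_device;

/* Hypothetical, reduced model of struct kvm_device_ops. */
struct model_device_ops {
	const char *name;
	void (*destroy)(struct model_device *dev);
	/* Optional: left NULL by devices that cannot be deleted at runtime. */
	int (*delete)(struct model_device *dev);
};

struct model_device {
	const struct model_device_ops *ops;
	int deleted;	/* set by a successful delete, for the illustration */
};

/*
 * Model of the ioctl dispatch: deletion is refused unless the device
 * type opted in by providing a delete callback.
 */
static int model_delete_device(struct model_device *dev)
{
	assert(dev != NULL && dev->ops != NULL);

	if (!dev->ops->delete)
		return -ENOTTY;	/* assumption: error code is illustrative */

	return dev->ops->delete(dev);
}

static int model_xics_delete(struct model_device *dev)
{
	dev->deleted = 1;
	return 0;
}

/* A device type that supports deletion. */
static const struct model_device_ops model_xics_ops = {
	.name = "xics",
	.delete = model_xics_delete,
};

/* A device type that does not: no .delete callback is set. */
static const struct model_device_ops model_other_ops = {
	.name = "other",
};
```

With this shape, device types that have not been audited for lifetime
issues keep today's behaviour simply by not setting the callback.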
Re: [PATCH 19/19] KVM: introduce a KVM_DELETE_DEVICE ioctl
On 1/22/19 6:42 AM, Paul Mackerras wrote:
> On Mon, Jan 07, 2019 at 08:10:06PM +0100, Cédric Le Goater wrote:
>> This will be used to destroy the KVM XICS or XIVE device when the
>> sPAPR machine is reset. When the VM boots, the CAS negotiation
>> process will determine which interrupt mode to use and the appropriate
>> KVM device will then be created.
>
> What would be the consequence if we didn't destroy the device?

So, if we don't destroy the device, it would mean that we are
maintaining its availability under the KVM PPC structures, VM and
vCPUs. I think the changes needed to have two interrupt devices under
the VM would be significant. We would also need a way to activate one
or the other depending on the interrupt mode chosen by CAS. In other
words, it's moving all the interrupt mode policy from QEMU to KVM.
It's possible of course but I would prefer to leave the ugly details
in QEMU.

Let's suppose now that we keep the device alive but disconnect the
presenters from it, and from the VM also. We would have an unused
device in the VM. We would need a way to keep a handle on it (an fd,
certainly) and a KVM interface to soft-reset a partially initialized
KVM device. That's one other option.

It seemed easier to do a hard reset: create/destroy.

> The reason I ask is that we will have to be much more careful about
> memory allocation lifetimes with this patch.

yes. bad refcounting will lead the host kernel to a crash.

> Having KVM devices last
> until the KVM instance is destroyed means that we generally avoid
> use-after-free bugs. With this patch we will have to do a careful
> analysis of the lifetime of the xive structures vs. possible accesses
> on other threads to prove there are no use-after-free bugs.
>
> For example, it is not sufficient to set any pointers in struct kvm or
> struct kvm_vcpu that point into xive structures to NULL before freeing
> the structures. There could be code on another CPU that has read the
> pointer value before you set it to NULL and then goes and accesses it
> after you have freed it. You need to prove that can't happen,
> possibly using some sort of explicit synchronization that ensures that
> no other CPU could still be accessing the structure at the time when
> you free it. RCU can help with this, but in general means you need
> RCU synchronization primitives (rcu_read_lock() etc.) at all the
> places where you use the pointer, which I don't think you currently
> have.

No, indeed. I overlooked the synchronization aspect.

> If there is a good fundamental reason why this can't happen, even
> though you don't have explicit synchronization, then at a minimum you
> need to explain that in the patch description, and ideally also in
> code comments.

OK. I left that patch at the end for a reason: it needs more care.

Thanks,

C.
Re: [PATCH 19/19] KVM: introduce a KVM_DELETE_DEVICE ioctl
On Mon, Jan 07, 2019 at 08:10:06PM +0100, Cédric Le Goater wrote:
> This will be used to destroy the KVM XICS or XIVE device when the
> sPAPR machine is reset. When the VM boots, the CAS negotiation
> process will determine which interrupt mode to use and the appropriate
> KVM device will then be created.

What would be the consequence if we didn't destroy the device?

The reason I ask is that we will have to be much more careful about
memory allocation lifetimes with this patch. Having KVM devices last
until the KVM instance is destroyed means that we generally avoid
use-after-free bugs. With this patch we will have to do a careful
analysis of the lifetime of the xive structures vs. possible accesses
on other threads to prove there are no use-after-free bugs.

For example, it is not sufficient to set any pointers in struct kvm or
struct kvm_vcpu that point into xive structures to NULL before freeing
the structures. There could be code on another CPU that has read the
pointer value before you set it to NULL and then goes and accesses it
after you have freed it. You need to prove that can't happen,
possibly using some sort of explicit synchronization that ensures that
no other CPU could still be accessing the structure at the time when
you free it. RCU can help with this, but in general means you need
RCU synchronization primitives (rcu_read_lock() etc.) at all the
places where you use the pointer, which I don't think you currently
have.

If there is a good fundamental reason why this can't happen, even
though you don't have explicit synchronization, then at a minimum you
need to explain that in the patch description, and ideally also in
code comments.

Paul.
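To make the "NULL the pointer, then free" hazard described above concrete,
here is a small userspace sketch of one form of explicit synchronization.
It is not RCU and not the kernel's code: it swaps in a lock-protected
lookup plus a reference count, using C11 atomics. All model_* names are
hypothetical stand-ins for struct kvm and the xive structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the xive structure. */
struct model_xive {
	atomic_int refs;	/* one for the published pointer, one per user */
	int payload;
};

/* Hypothetical stand-in for struct kvm: owns the published pointer. */
struct model_kvm {
	atomic_flag lock;	/* tiny spinlock guarding 'xive' */
	struct model_xive *xive;
	atomic_int frees;	/* counts frees, for the illustration only */
};

static void model_lock(struct model_kvm *kvm)
{
	while (atomic_flag_test_and_set_explicit(&kvm->lock,
						 memory_order_acquire))
		;	/* spin */
}

static void model_unlock(struct model_kvm *kvm)
{
	atomic_flag_clear_explicit(&kvm->lock, memory_order_release);
}

static void model_kvm_init(struct model_kvm *kvm)
{
	atomic_flag_clear(&kvm->lock);
	kvm->xive = NULL;
	atomic_init(&kvm->frees, 0);
}

static bool model_create(struct model_kvm *kvm, int payload)
{
	struct model_xive *x = malloc(sizeof(*x));

	if (!x)
		return false;
	atomic_init(&x->refs, 1);	/* reference held by kvm->xive */
	x->payload = payload;

	model_lock(kvm);
	kvm->xive = x;
	model_unlock(kvm);
	return true;
}

/*
 * Read the pointer and take a reference in one critical section, so a
 * user never holds a bare pointer that a concurrent delete could free.
 */
static struct model_xive *model_get(struct model_kvm *kvm)
{
	struct model_xive *x;

	model_lock(kvm);
	x = kvm->xive;
	if (x)
		atomic_fetch_add(&x->refs, 1);
	model_unlock(kvm);
	return x;
}

/* Drop a reference; only the last holder frees the structure. */
static void model_put(struct model_kvm *kvm, struct model_xive *x)
{
	if (atomic_fetch_sub(&x->refs, 1) == 1) {
		free(x);
		atomic_fetch_add(&kvm->frees, 1);
	}
}

/* Unpublish the pointer, then drop the reference it was holding. */
static void model_delete(struct model_kvm *kvm)
{
	struct model_xive *x;

	model_lock(kvm);
	x = kvm->xive;
	kvm->xive = NULL;
	model_unlock(kvm);

	if (x)
		model_put(kvm, x);
}
```

A user that called model_get() before model_delete() keeps a valid
structure until its own model_put(); setting the pointer to NULL alone
would not give that guarantee. RCU would aim at the same property with
cheaper readers, at the cost of annotating every access site.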
[PATCH 19/19] KVM: introduce a KVM_DELETE_DEVICE ioctl
This will be used to destroy the KVM XICS or XIVE device when the
sPAPR machine is reset. When the VM boots, the CAS negotiation
process will determine which interrupt mode to use and the appropriate
KVM device will then be created.

Signed-off-by: Cédric Le Goater
---
 include/linux/kvm_host.h              |  2 ++
 include/uapi/linux/kvm.h              |  2 ++
 arch/powerpc/kvm/book3s_xive.c        | 38 +-
 arch/powerpc/kvm/book3s_xive_native.c | 24 +
 virt/kvm/kvm_main.c                   | 39 +++
 5 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c38cc5eb7e73..259b6885dc74 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1218,6 +1218,8 @@ struct kvm_device_ops {
 	 */
 	void (*destroy)(struct kvm_device *dev);
 
+	int (*delete)(struct kvm_device *dev);
+
 	int (*set_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
 	int (*get_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
 	int (*has_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 52bf74a1616e..b00cb4d986cf 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1331,6 +1331,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr)
 #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr)
 
+#define KVM_DELETE_DEVICE _IOWR(KVMIO, 0xf0, struct kvm_create_device)
+
 /*
  * ioctls for vcpu fds
  */
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 9b4751713554..5449fb4c87f9 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1109,11 +1109,19 @@ void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
 void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
-	struct kvmppc_xive *xive = xc->xive;
+	struct kvmppc_xive *xive;
 	int i;
 
+	if (!kvmppc_xics_enabled(vcpu))
+		return;
+
+	if (!xc)
+		return;
+
 	pr_devel("cleanup_vcpu(cpu=%d)\n", xc->server_num);
 
+	xive = xc->xive;
+
 	/* Ensure no interrupt is still routed to that VP */
 	xc->valid = false;
 	kvmppc_xive_disable_vcpu_interrupts(vcpu);
@@ -1150,6 +1158,10 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
 	}
 	/* Free the VP */
 	kfree(xc);
+
+	/* Cleanup the vcpu */
+	vcpu->arch.irq_type = KVMPPC_IRQ_DEFAULT;
+	vcpu->arch.xive_vcpu = NULL;
 }
 
 int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
@@ -1861,6 +1873,29 @@ static void kvmppc_xive_free(struct kvm_device *dev)
 	kfree(dev);
 }
 
+static int kvmppc_xive_delete(struct kvm_device *dev)
+{
+	struct kvm *kvm = dev->kvm;
+	unsigned int i;
+	struct kvm_vcpu *vcpu;
+
+	if (!kvm->arch.xive)
+		return -EPERM;
+
+	/*
+	 * call kick_all_cpus_sync() to ensure that all CPUs have
+	 * executed any pending interrupts
+	 */
+	if (is_kvmppc_hv_enabled(kvm))
+		kick_all_cpus_sync();
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		kvmppc_xive_cleanup_vcpu(vcpu);
+
+	kvmppc_xive_free(dev);
+	return 0;
+}
+
 static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
 {
 	struct kvmppc_xive *xive;
@@ -2035,6 +2070,7 @@ struct kvm_device_ops kvm_xive_ops = {
 	.create = kvmppc_xive_create,
 	.init = kvmppc_xive_init,
 	.destroy = kvmppc_xive_free,
+	.delete = kvmppc_xive_delete,
 	.set_attr = xive_set_attr,
 	.get_attr = xive_get_attr,
 	.has_attr = xive_has_attr,
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 12edac29995e..7367962e670a 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -979,6 +979,29 @@ static void kvmppc_xive_native_free(struct kvm_device *dev)
 	kfree(dev);
 }
 
+static int kvmppc_xive_native_delete(struct kvm_device *dev)
+{
+	struct kvm *kvm = dev->kvm;
+	unsigned int i;
+	struct kvm_vcpu *vcpu;
+
+	if (!kvm->arch.xive)
+		return -EPERM;
+
+	/*
+	 * call kick_all_cpus_sync() to ensure that all CPUs have
+	 * executed any pending interrupts
+	 */
+	if (is_kvmppc_hv_enabled(kvm))
+		kick_all_cpus_sync();
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		kvmppc_xive_native_cleanup_vcpu(vcpu);
+
+	kvmppc_xive_native_free(dev);
+	return 0;
+}
+
 /*
  * ESB MMIO address of chip 0
  */
@@ -1350,6 +1373,7 @@ struct kvm_device_ops kvm_xive_native_ops = {
 	.create = kvmppc_xive_native_create,
 	.init = kvmppc_xive_na