Re: [PATCH] KVM: arm64: fix potential bug
Hi Andre,

On Tue, Mar 12, 2019 at 9:44 AM André Przywara wrote:
> On 12/03/2019 00:32, John Gong wrote:
> > Hi,
> >
> > Since intid always >= VGIC_NR_PRIVATE_IRQS,
>
> How so? The PMU and the arch timer emulation use PPIs, so intid is
> definitely < VGIC_NR_PRIVATE_IRQS there.
>
> > so then even vcpu == NULL, it never return -EINVAL.
>
> I am not sure I follow.
> To uniquely identify an SPI interrupt, we just need the interrupt ID
> (which is always >= 32). For PPIs and SGIs, we additionally need the
> vCPU ID this private interrupt belongs to, as there are multiple
> interrupts with the same INTID (one per VCPU).
> The VCPU ID passed in for SPIs is just a dummy value (because we use the
> same function to inject private and shared interrupts), so we don't need
> to check for its validity.
>
> Cheers,
> Andre.

Thanks for your explanation. My mistake — I did not consider PPI and SGI
injection. Sorry for the noise on the list.

Cheers,
John Gong

> > Signed-off-by: Shengmin Gong
> > Signed-off-by: John Gong
> > ---
> >  virt/kvm/arm/vgic/vgic.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > index abd9c7352677..d3cb1ce880e2 100644
> > --- a/virt/kvm/arm/vgic/vgic.c
> > +++ b/virt/kvm/arm/vgic/vgic.c
> > @@ -424,7 +424,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
> >  		return ret;
> >
> >  	vcpu = kvm_get_vcpu(kvm, cpuid);
> > -	if (!vcpu && intid < VGIC_NR_PRIVATE_IRQS)
> > +	if (!vcpu)
> >  		return -EINVAL;
> >
> >  	irq = vgic_get_irq(kvm, vcpu, intid);

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] KVM: arm64: fix potential bug
On 12/03/2019 00:32, John Gong wrote:
> Hi,
>
> Since intid always >= VGIC_NR_PRIVATE_IRQS,

How so? The PMU and the arch timer emulation use PPIs, so intid is
definitely < VGIC_NR_PRIVATE_IRQS there.

> so then even vcpu == NULL, it never return -EINVAL.

I am not sure I follow.
To uniquely identify an SPI interrupt, we just need the interrupt ID
(which is always >= 32). For PPIs and SGIs, we additionally need the
vCPU ID this private interrupt belongs to, as there are multiple
interrupts with the same INTID (one per VCPU).
The VCPU ID passed in for SPIs is just a dummy value (because we use the
same function to inject private and shared interrupts), so we don't need
to check for its validity.

Cheers,
Andre.

> Signed-off-by: Shengmin Gong
> Signed-off-by: John Gong
> ---
>  virt/kvm/arm/vgic/vgic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index abd9c7352677..d3cb1ce880e2 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -424,7 +424,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
>  		return ret;
>
>  	vcpu = kvm_get_vcpu(kvm, cpuid);
> -	if (!vcpu && intid < VGIC_NR_PRIVATE_IRQS)
> +	if (!vcpu)
>  		return -EINVAL;
>
>  	irq = vgic_get_irq(kvm, vcpu, intid);
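Andre's point above can be sketched in a few lines of C. This is a toy model of the GIC interrupt ID space only (the constant value matches the kernel's, but the helper names are illustrative, not kernel APIs): SGIs (0-15) and PPIs (16-31) are banked per vCPU, so a valid vcpu pointer is needed to identify them, while SPIs (>= 32) are global and the vcpu argument is just a dummy.

```c
#include <assert.h>
#include <stdbool.h>

/* VGIC_NR_PRIVATE_IRQS is 32 in the kernel: SGIs 0-15 plus PPIs 16-31. */
#define TOY_NR_SGIS          16
#define TOY_NR_PRIVATE_IRQS  32

/* Private interrupts exist once per vCPU, so they can only be identified
 * together with a vCPU; shared interrupts (SPIs) are identified by INTID
 * alone. Helper names are illustrative, not the kernel's. */
static bool toy_irq_is_private(unsigned int intid)
{
	return intid < TOY_NR_PRIVATE_IRQS;
}

static bool toy_vcpu_required(unsigned int intid)
{
	return toy_irq_is_private(intid);
}
```

So the removed `intid < VGIC_NR_PRIVATE_IRQS` condition was load-bearing: for SPI injection a NULL vcpu is fine, and only private INTIDs need the vcpu check.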
[PATCH] KVM: arm64: fix potential bug
Since intid is always >= VGIC_NR_PRIVATE_IRQS, the check can never
return -EINVAL, even when vcpu == NULL.

Signed-off-by: Shengmin Gong
Signed-off-by: John Gong
---
 virt/kvm/arm/vgic/vgic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index abd9c7352677..d3cb1ce880e2 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -424,7 +424,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 		return ret;
 
 	vcpu = kvm_get_vcpu(kvm, cpuid);
-	if (!vcpu && intid < VGIC_NR_PRIVATE_IRQS)
+	if (!vcpu)
 		return -EINVAL;
 
 	irq = vgic_get_irq(kvm, vcpu, intid);
--
2.17.1
[PATCH AUTOSEL 4.14 11/27] KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded
From: Christoffer Dall

[ Upstream commit e761a927bc9a7ee6ceb7c4f63d5922dbced87f0d ]

We have two ways to reset a vcpu:
- either through VCPU_INIT
- or through a PSCI_ON call

The first one is easy to reason about. The second one is implemented
in a more bizarre way, as it is the vcpu that handles PSCI_ON that
resets the vcpu that is being powered-on. As we need to turn the logic
around and have the target vcpu reset itself, we must take some
preliminary steps.

Resetting the VCPU state modifies the system register state in memory,
but this may interact with vcpu_load/vcpu_put if running with preemption
disabled, which in turn may lead to corrupted system register state.

Address this by disabling preemption and doing put/load if required
around the reset logic.

Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Sasha Levin
---
 arch/arm64/kvm/reset.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index a74311beda35..c1c5a57249d2 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -95,16 +95,33 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
  * This function finds the right table above and sets the registers on
  * the virtual CPU struct to their architecturally defined reset
  * values.
+ *
+ * Note: This function can be called from two paths: The KVM_ARM_VCPU_INIT
+ * ioctl or as part of handling a request issued by another VCPU in the PSCI
+ * handling code.  In the first case, the VCPU will not be loaded, and in the
+ * second case the VCPU will be loaded.  Because this function operates purely
+ * on the memory-backed values of system registers, we want to do a full put if
+ * we were loaded (handling a request) and load the values back at the end of
+ * the function.  Otherwise we leave the state alone.  In both cases, we
+ * disable preemption around the vcpu reset as we would otherwise race with
+ * preempt notifiers which also call put/load.
  */
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 {
 	const struct kvm_regs *cpu_reset;
+	int ret = -EINVAL;
+	bool loaded;
+
+	preempt_disable();
+	loaded = (vcpu->cpu != -1);
+	if (loaded)
+		kvm_arch_vcpu_put(vcpu);
 
 	switch (vcpu->arch.target) {
 	default:
 		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
 			if (!cpu_has_32bit_el1())
-				return -EINVAL;
+				goto out;
 			cpu_reset = &default_regs_reset32;
 		} else {
 			cpu_reset = &default_regs_reset;
@@ -127,5 +144,10 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG;
 
 	/* Reset timer */
-	return kvm_timer_vcpu_reset(vcpu);
+	ret = kvm_timer_vcpu_reset(vcpu);
+out:
+	if (loaded)
+		kvm_arch_vcpu_load(vcpu, smp_processor_id());
+	preempt_enable();
+	return ret;
 }
--
2.19.1
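The put/load bracketing in the patch above can be modelled in userspace C. This is a toy model under stated assumptions (toy types and helpers, not kernel code): a vCPU that is currently loaded on a physical CPU (`cpu != -1`) must have its register state put back to memory before the memory-backed reset runs, and loaded again afterwards, so the reset never races with state living in hardware registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the bracketing done by kvm_reset_vcpu() in the patch:
 * put before touching memory-backed state, load again afterwards. */
struct toy_vcpu {
	int cpu;          /* -1 when not loaded, else physical CPU id */
	bool on_hardware; /* state currently lives in CPU registers   */
};

static void toy_put(struct toy_vcpu *v)  { v->on_hardware = false; }
static void toy_load(struct toy_vcpu *v) { v->on_hardware = true;  }

/* Returns true if the reset ran against memory-backed state only. */
static bool toy_reset(struct toy_vcpu *v)
{
	bool loaded = (v->cpu != -1);

	if (loaded)
		toy_put(v);

	/* ... memory-backed reset would happen here ... */
	bool reset_was_safe = !v->on_hardware;

	if (loaded)
		toy_load(v);
	return reset_was_safe;
}
```

The real function additionally wraps the whole sequence in preempt_disable()/preempt_enable(), because preempt notifiers also call put/load; that part has no userspace analogue here.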
[PATCH AUTOSEL 4.19 19/44] KVM: arm/arm64: vgic: Always initialize the group of private IRQs
From: Christoffer Dall

[ Upstream commit ab2d5eb03dbb7b37a1c6356686fb48626ab0c93e ]

We currently initialize the group of private IRQs during
kvm_vgic_vcpu_init, and the value of the group depends on the GIC model
we are emulating. However, CPUs created before creating (and
initializing) the VGIC might end up with the wrong group if the VGIC is
created as GICv3 later.

Since we have no enforced ordering of creating the VGIC and creating
VCPUs, we can end up with part of the VCPUs being properly initialized
and the remaining incorrectly initialized. That also means that we have
no single place to do the per-cpu data structure initialization which
depends on knowing the emulated GIC model (which is only the group
field).

This patch removes the incorrect comment from kvm_vgic_vcpu_init and
initializes the group of all previously created VCPUs' private
interrupts in vgic_init in addition to the existing initialization in
kvm_vgic_vcpu_init.

Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Sasha Levin
---
 virt/kvm/arm/vgic/vgic-init.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index 33e7ee814f7b..8196e4f8731f 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -231,13 +231,6 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
 			irq->config = VGIC_CONFIG_LEVEL;
 		}
 
-		/*
-		 * GICv3 can only be created via the KVM_DEVICE_CREATE API and
-		 * so we always know the emulation type at this point as it's
-		 * either explicitly configured as GICv3, or explicitly
-		 * configured as GICv2, or not configured yet which also
-		 * implies GICv2.
-		 */
 		if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
 			irq->group = 1;
 		else
@@ -281,7 +274,7 @@ int vgic_init(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 	struct kvm_vcpu *vcpu;
-	int ret = 0, i;
+	int ret = 0, i, idx;
 
 	if (vgic_initialized(kvm))
 		return 0;
@@ -298,6 +291,19 @@ int vgic_init(struct kvm *kvm)
 	if (ret)
 		goto out;
 
+	/* Initialize groups on CPUs created before the VGIC type was known */
+	kvm_for_each_vcpu(idx, vcpu, kvm) {
+		struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+		for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++) {
+			struct vgic_irq *irq = &vgic_cpu->private_irqs[i];
+
+			if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
+				irq->group = 1;
+			else
+				irq->group = 0;
+		}
+	}
+
 	if (vgic_has_its(kvm)) {
 		ret = vgic_v4_init(kvm);
 		if (ret)
--
2.19.1
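The group rule the patch duplicates into vgic_init() is small enough to state as a one-line function. This is a toy sketch (the enum values are illustrative, not the kernel's KVM_DEV_TYPE_* constants): private IRQs get interrupt group 1 under an emulated GICv3 and group 0 under GICv2.

```c
#include <assert.h>

/* Toy version of the group assignment for private interrupts; enum
 * values are illustrative, not the kernel's device-type constants. */
enum toy_vgic_model { TOY_VGIC_V2, TOY_VGIC_V3 };

static int toy_private_irq_group(enum toy_vgic_model model)
{
	return (model == TOY_VGIC_V3) ? 1 : 0;
}
```

The bug was not the rule itself but when it ran: vCPUs created before the VGIC model was known had it applied with the wrong (default) model, which is why the patch re-applies it to all existing vCPUs once the model is settled.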
[PATCH AUTOSEL 4.19 18/44] arm/arm64: KVM: Don't panic on failure to properly reset system registers
From: Marc Zyngier

[ Upstream commit 20589c8cc47dce5854c8bf1b44a9fc63d798d26d ]

Failing to properly reset system registers is pretty bad. But not
quite as bad as bringing the whole machine down... So warn loudly,
but slightly more gracefully.

Signed-off-by: Marc Zyngier
Acked-by: Christoffer Dall
Signed-off-by: Sasha Levin
---
 arch/arm/kvm/coproc.c     | 4 ++--
 arch/arm64/kvm/sys_regs.c | 8 +++++---
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index cb094e55dc5f..fd6cde23bb5d 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -1450,6 +1450,6 @@ void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
 		reset_coproc_regs(vcpu, table, num);
 
 	for (num = 1; num < NR_CP15_REGS; num++)
-		if (vcpu_cp15(vcpu, num) == 0x42424242)
-			panic("Didn't reset vcpu_cp15(vcpu, %zi)", num);
+		WARN(vcpu_cp15(vcpu, num) == 0x42424242,
+		     "Didn't reset vcpu_cp15(vcpu, %zi)", num);
 }
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 22fbbdbece3c..fe18e68f9a20 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2586,7 +2586,9 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
 	table = get_target_table(vcpu->arch.target, true, &num);
 	reset_sys_reg_descs(vcpu, table, num);
 
-	for (num = 1; num < NR_SYS_REGS; num++)
-		if (__vcpu_sys_reg(vcpu, num) == 0x4242424242424242)
-			panic("Didn't reset __vcpu_sys_reg(%zi)", num);
+	for (num = 1; num < NR_SYS_REGS; num++) {
+		if (WARN(__vcpu_sys_reg(vcpu, num) == 0x4242424242424242,
+			 "Didn't reset __vcpu_sys_reg(%zi)\n", num))
+			break;
+	}
 }
--
2.19.1
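The check being relaxed above follows a common poison-scan pattern: registers are pre-filled with a recognizable poison value (0x42...42) before reset, and any register still holding the poison afterwards was missed by the reset tables. A userspace sketch of that scan (function name and array layout are illustrative; the poison value and the skipped index 0 mirror the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Poison pattern used by the kernel's reset sanity check. */
#define TOY_POISON64 0x4242424242424242ULL

/* Returns the index of the first unreset register, or -1 if all were
 * reset. Index 0 is skipped, as in the patch. Warning instead of
 * aborting is exactly the panic() -> WARN() change being backported. */
static long toy_first_unreset_reg(const uint64_t *regs, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (regs[i] == TOY_POISON64) {
			fprintf(stderr, "Didn't reset reg %zu\n", i);
			return (long)i;
		}
	}
	return -1;
}
```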
[PATCH AUTOSEL 4.19 07/44] KVM: arm/arm64: vgic: Make vgic_dist->lpi_list_lock a raw_spinlock
From: Julien Thierry

[ Upstream commit fc3bc475231e12e9c0142f60100cf84d077c79e1 ]

vgic_dist->lpi_list_lock must always be taken with interrupts disabled
as it is used in interrupt context.

For configurations such as PREEMPT_RT_FULL, this means that it should
be a raw_spinlock since RT spinlocks are interruptible.

Signed-off-by: Julien Thierry
Acked-by: Christoffer Dall
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall
Signed-off-by: Sasha Levin
---
 include/kvm/arm_vgic.h        |  2 +-
 virt/kvm/arm/vgic/vgic-init.c |  2 +-
 virt/kvm/arm/vgic/vgic-its.c  |  8 ++++----
 virt/kvm/arm/vgic/vgic.c      | 10 +++++-----
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4f31f96bbfab..90ac450745f1 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -256,7 +256,7 @@ struct vgic_dist {
 	u64			propbaser;
 
 	/* Protects the lpi_list and the count value below. */
-	spinlock_t		lpi_list_lock;
+	raw_spinlock_t		lpi_list_lock;
 	struct list_head	lpi_list_head;
 	int			lpi_list_count;
diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index c0c0b88af1d5..33e7ee814f7b 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -64,7 +64,7 @@ void kvm_vgic_early_init(struct kvm *kvm)
 	struct vgic_dist *dist = &kvm->arch.vgic;
 
 	INIT_LIST_HEAD(&dist->lpi_list_head);
-	spin_lock_init(&dist->lpi_list_lock);
+	raw_spin_lock_init(&dist->lpi_list_lock);
 }
 
 /* CREATION */
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 12502251727e..f376c82afb61 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -73,7 +73,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
 	irq->target_vcpu = vcpu;
 	irq->group = 1;
 
-	spin_lock_irqsave(&dist->lpi_list_lock, flags);
+	raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
 
 	/*
 	 * There could be a race with another vgic_add_lpi(), so we need to
@@ -101,7 +101,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
 	dist->lpi_list_count++;
 
 out_unlock:
-	spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
+	raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
 
 	/*
 	 * We "cache" the configuration table entries in our struct vgic_irq's.
@@ -339,7 +339,7 @@ int vgic_copy_lpi_list(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 **intid_ptr)
 	if (!intids)
 		return -ENOMEM;
 
-	spin_lock_irqsave(&dist->lpi_list_lock, flags);
+	raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
 	list_for_each_entry(irq, &dist->lpi_list_head, lpi_list) {
 		if (i == irq_count)
 			break;
@@ -348,7 +348,7 @@ int vgic_copy_lpi_list(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 **intid_ptr)
 			continue;
 		intids[i++] = irq->intid;
 	}
-	spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
+	raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
 
 	*intid_ptr = intids;
 	return i;
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index f884a54b2601..c5165e3b80cb 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -72,7 +72,7 @@ static struct vgic_irq *vgic_get_lpi(struct kvm *kvm, u32 intid)
 	struct vgic_irq *irq = NULL;
 	unsigned long flags;
 
-	spin_lock_irqsave(&dist->lpi_list_lock, flags);
+	raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
 
 	list_for_each_entry(irq, &dist->lpi_list_head, lpi_list) {
 		if (irq->intid != intid)
@@ -88,7 +88,7 @@ static struct vgic_irq *vgic_get_lpi(struct kvm *kvm, u32 intid)
 	irq = NULL;
 
 out_unlock:
-	spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
+	raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
 
 	return irq;
 }
@@ -138,15 +138,15 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
 	if (irq->intid < VGIC_MIN_LPI)
 		return;
 
-	spin_lock_irqsave(&dist->lpi_list_lock, flags);
+	raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
 	if (!kref_put(&irq->refcount, vgic_irq_release)) {
-		spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
+		raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
 		return;
 	};
 
 	list_del(&irq->lpi_list);
 	dist->lpi_list_count--;
-	spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
+	raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
 
 	kfree(irq);
 }
--
2.19.1
[PATCH AUTOSEL 4.19 17/44] arm/arm64: KVM: Allow a VCPU to fully reset itself
From: Marc Zyngier

[ Upstream commit 358b28f09f0ab074d781df72b8a671edb1547789 ]

The current kvm_psci_vcpu_on implementation will directly try to
manipulate the state of the VCPU to reset it. However, since this is
not done on the thread that runs the VCPU, we can end up in a strangely
corrupted state when the source and target VCPUs are running at the
same time.

Fix this by factoring out all reset logic from the PSCI implementation
and forwarding the required information along with a request to the
target VCPU.

Reviewed-by: Andrew Jones
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall
Signed-off-by: Sasha Levin
---
 arch/arm/include/asm/kvm_host.h   | 10 +
 arch/arm/kvm/reset.c              | 24 +
 arch/arm64/include/asm/kvm_host.h | 11 ++
 arch/arm64/kvm/reset.c            | 24 +
 virt/kvm/arm/arm.c                | 10 +
 virt/kvm/arm/psci.c               | 36 ++-
 6 files changed, 95 insertions(+), 20 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 3ad482d2f1eb..d0d0227fc70d 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -48,6 +48,7 @@
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
+#define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
 
 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
@@ -147,6 +148,13 @@ struct kvm_cpu_context {
 
 typedef struct kvm_cpu_context kvm_cpu_context_t;
 
+struct vcpu_reset_state {
+	unsigned long	pc;
+	unsigned long	r0;
+	bool		be;
+	bool		reset;
+};
+
 struct kvm_vcpu_arch {
 	struct kvm_cpu_context ctxt;
 
@@ -186,6 +194,8 @@ struct kvm_vcpu_arch {
 	/* Cache some mmu pages needed inside spinlock regions */
 	struct kvm_mmu_memory_cache mmu_page_cache;
 
+	struct vcpu_reset_state reset_state;
+
 	/* Detect first run of a vcpu */
 	bool has_run_once;
 };
diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
index 5ed0c3ee33d6..e53327912adc 100644
--- a/arch/arm/kvm/reset.c
+++ b/arch/arm/kvm/reset.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -69,6 +70,29 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	/* Reset CP15 registers */
 	kvm_reset_coprocs(vcpu);
 
+	/*
+	 * Additional reset state handling that PSCI may have imposed on us.
+	 * Must be done after all the sys_reg reset.
+	 */
+	if (READ_ONCE(vcpu->arch.reset_state.reset)) {
+		unsigned long target_pc = vcpu->arch.reset_state.pc;
+
+		/* Gracefully handle Thumb2 entry point */
+		if (target_pc & 1) {
+			target_pc &= ~1UL;
+			vcpu_set_thumb(vcpu);
+		}
+
+		/* Propagate caller endianness */
+		if (vcpu->arch.reset_state.be)
+			kvm_vcpu_set_be(vcpu);
+
+		*vcpu_pc(vcpu) = target_pc;
+		vcpu_set_reg(vcpu, 0, vcpu->arch.reset_state.r0);
+
+		vcpu->arch.reset_state.reset = false;
+	}
+
 	/* Reset arch_timer context */
 	return kvm_timer_vcpu_reset(vcpu);
 }
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3d6d7336f871..6abe4002945f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -48,6 +48,7 @@
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
+#define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
 
 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
@@ -206,6 +207,13 @@ struct kvm_cpu_context {
 
 typedef struct kvm_cpu_context kvm_cpu_context_t;
 
+struct vcpu_reset_state {
+	unsigned long	pc;
+	unsigned long	r0;
+	bool		be;
+	bool		reset;
+};
+
 struct kvm_vcpu_arch {
 	struct kvm_cpu_context ctxt;
 
@@ -295,6 +303,9 @@ struct kvm_vcpu_arch {
 	/* Virtual SError ESR to restore when HCR_EL2.VSE is set */
 	u64 vsesr_el2;
 
+	/* Additional reset state */
+	struct vcpu_reset_state	reset_state;
+
 	/* True when deferrable sysregs are loaded on the physical CPU,
 	 * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
 	bool sysregs_loaded_on_cpu;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 644dd0050766..18b9a522a2b3 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -140,6 +141,29 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	/* Reset system registers */
 	kvm_reset_sys_regs(vcpu);
 
+	/*
+	 * Additional reset state handling that PSCI may have imposed on us.
+
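The Thumb2 entry-point fixup quoted in the 32-bit hunk above is self-contained enough to demonstrate in isolation. This is a toy sketch (toy struct, not the kernel's vcpu type): PSCI callers signal a Thumb2 entry point by setting bit 0 of the target PC, so the reset path clears that bit and switches the vCPU to Thumb mode before writing the PC.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the vCPU fields touched by the reset fixup. */
struct toy_entry {
	unsigned long pc;
	bool thumb;
};

static void toy_apply_reset_pc(struct toy_entry *e, unsigned long target_pc)
{
	/* Gracefully handle a Thumb2 entry point, as in the patch. */
	if (target_pc & 1) {
		target_pc &= ~1UL;
		e->thumb = true;
	}
	e->pc = target_pc;
}
```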
[PATCH AUTOSEL 4.19 16/44] KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded
From: Christoffer Dall

[ Upstream commit e761a927bc9a7ee6ceb7c4f63d5922dbced87f0d ]

We have two ways to reset a vcpu:
- either through VCPU_INIT
- or through a PSCI_ON call

The first one is easy to reason about. The second one is implemented
in a more bizarre way, as it is the vcpu that handles PSCI_ON that
resets the vcpu that is being powered-on. As we need to turn the logic
around and have the target vcpu reset itself, we must take some
preliminary steps.

Resetting the VCPU state modifies the system register state in memory,
but this may interact with vcpu_load/vcpu_put if running with preemption
disabled, which in turn may lead to corrupted system register state.

Address this by disabling preemption and doing put/load if required
around the reset logic.

Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Sasha Levin
---
 arch/arm64/kvm/reset.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index e37c78bbe1ca..644dd0050766 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -99,16 +99,33 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
  * This function finds the right table above and sets the registers on
  * the virtual CPU struct to their architecturally defined reset
  * values.
+ *
+ * Note: This function can be called from two paths: The KVM_ARM_VCPU_INIT
+ * ioctl or as part of handling a request issued by another VCPU in the PSCI
+ * handling code.  In the first case, the VCPU will not be loaded, and in the
+ * second case the VCPU will be loaded.  Because this function operates purely
+ * on the memory-backed values of system registers, we want to do a full put if
+ * we were loaded (handling a request) and load the values back at the end of
+ * the function.  Otherwise we leave the state alone.  In both cases, we
+ * disable preemption around the vcpu reset as we would otherwise race with
+ * preempt notifiers which also call put/load.
  */
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 {
 	const struct kvm_regs *cpu_reset;
+	int ret = -EINVAL;
+	bool loaded;
+
+	preempt_disable();
+	loaded = (vcpu->cpu != -1);
+	if (loaded)
+		kvm_arch_vcpu_put(vcpu);
 
 	switch (vcpu->arch.target) {
 	default:
 		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
 			if (!cpu_has_32bit_el1())
-				return -EINVAL;
+				goto out;
 			cpu_reset = &default_regs_reset32;
 		} else {
 			cpu_reset = &default_regs_reset;
@@ -131,5 +148,10 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG;
 
 	/* Reset timer */
-	return kvm_timer_vcpu_reset(vcpu);
+	ret = kvm_timer_vcpu_reset(vcpu);
+out:
+	if (loaded)
+		kvm_arch_vcpu_load(vcpu, smp_processor_id());
+	preempt_enable();
+	return ret;
 }
--
2.19.1
[PATCH AUTOSEL 4.19 20/44] KVM: arm64: Forbid kprobing of the VHE world-switch code
From: James Morse

[ Upstream commit 7d82602909ed9c73b34ad26f05d10db4850a4f8c ]

On systems with VHE the kernel and KVM's world-switch code run at the
same exception level. Code that is only used on a VHE system does not
need to be annotated as __hyp_text as it can reside anywhere in the
kernel text. __hyp_text was also used to prevent kprobes from patching
breakpoint instructions into this region, as this code runs at a
different exception level. While this is no longer true with VHE, KVM
still switches VBAR_EL1, meaning a kprobe's breakpoint executed in the
world-switch code will cause a hyp-panic.

echo "p:weasel sysreg_save_guest_state_vhe" > /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/weasel/enable
lkvm run -k /boot/Image --console serial -p "console=ttyS0 earlycon=uart,mmio,0x3f8"

  # lkvm run -k /boot/Image -m 384 -c 3 --name guest-1474
  Info: Placing fdt at 0x8fe0 - 0x8fff
  Info: virtio-mmio.devices=0x200@0x1:36
  Info: virtio-mmio.devices=0x200@0x10200:37
  Info: virtio-mmio.devices=0x200@0x10400:38
  [  614.178186] Kernel panic - not syncing: HYP panic:
  [  614.178186] PS:404003c9 PC:100d70e0 ESR:f204
  [  614.178186] FAR:8008 HPFAR:00800800 PAR:1d7edbadc0de
  [  614.178186] VCPU:f8de32f1
  [  614.178383] CPU: 2 PID: 1482 Comm: kvm-vcpu-0 Not tainted 5.0.0-rc2 #10799
  [  614.178446] Call trace:
  [  614.178480]  dump_backtrace+0x0/0x148
  [  614.178567]  show_stack+0x24/0x30
  [  614.178658]  dump_stack+0x90/0xb4
  [  614.178710]  panic+0x13c/0x2d8
  [  614.178793]  hyp_panic+0xac/0xd8
  [  614.178880]  kvm_vcpu_run_vhe+0x9c/0xe0
  [  614.178958]  kvm_arch_vcpu_ioctl_run+0x454/0x798
  [  614.179038]  kvm_vcpu_ioctl+0x360/0x898
  [  614.179087]  do_vfs_ioctl+0xc4/0x858
  [  614.179174]  ksys_ioctl+0x84/0xb8
  [  614.179261]  __arm64_sys_ioctl+0x28/0x38
  [  614.179348]  el0_svc_common+0x94/0x108
  [  614.179401]  el0_svc_handler+0x38/0x78
  [  614.179487]  el0_svc+0x8/0xc
  [  614.179558] SMP: stopping secondary CPUs
  [  614.179661] Kernel Offset: disabled
  [  614.179695] CPU features: 0x003,2a80aa38
  [  614.179758] Memory Limit: none
  [  614.179858] ---[ end Kernel panic - not syncing: HYP panic:
  [  614.179858] PS:404003c9 PC:100d70e0 ESR:f204
  [  614.179858] FAR:8008 HPFAR:00800800 PAR:1d7edbadc0de
  [  614.179858] VCPU:f8de32f1 ]---

Annotate the VHE world-switch functions that aren't marked __hyp_text
using NOKPROBE_SYMBOL().

Signed-off-by: James Morse
Fixes: 3f5c90b890ac ("KVM: arm64: Introduce VHE-specific kvm_vcpu_run")
Acked-by: Masami Hiramatsu
Signed-off-by: Marc Zyngier
Signed-off-by: Sasha Levin
---
 arch/arm64/kvm/hyp/switch.c    | 5 +++++
 arch/arm64/kvm/hyp/sysreg-sr.c | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index a1c32c1f2267..6290a4e81d57 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -23,6 +23,7 @@
 #include
 #include
+#include
 #include
 #include
 #include
@@ -107,6 +108,7 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
 
 	write_sysreg(kvm_get_hyp_vector(), vbar_el1);
 }
+NOKPROBE_SYMBOL(activate_traps_vhe);
 
 static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 {
@@ -146,6 +148,7 @@ static void deactivate_traps_vhe(void)
 	write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
 	write_sysreg(vectors, vbar_el1);
 }
+NOKPROBE_SYMBOL(deactivate_traps_vhe);
 
 static void __hyp_text __deactivate_traps_nvhe(void)
 {
@@ -529,6 +532,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 
 	return exit_code;
 }
+NOKPROBE_SYMBOL(kvm_vcpu_run_vhe);
 
 /* Switch to the guest for legacy non-VHE systems */
 int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
@@ -636,6 +640,7 @@ static void __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
 		      read_sysreg_el2(esr), read_sysreg_el2(far),
 		      read_sysreg(hpfar_el2), par, vcpu);
 }
+NOKPROBE_SYMBOL(__hyp_call_panic_vhe);
 
 void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt)
 {
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 9ce223944983..963d669ae3a2 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -18,6 +18,7 @@
 #include
 #include
+#include
 #include
 #include
 #include
@@ -98,12 +99,14 @@ void sysreg_save_host_state_vhe(struct kvm_cpu_context *ctxt)
 {
 	__sysreg_save_common_state(ctxt);
 }
+NOKPROBE_SYMBOL(sysreg_save_host_state_vhe);
 
 void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt)
 {
 	__sysreg_save_common_state(ctxt);
 	__sysreg_save_el2_return_state(ctxt);
 }
+NOKPROBE_SYMBOL(sysreg_save_guest_state_vhe);
 
 static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
 {
@@ -171,12 +174,14 @@ void sysreg_restore_host_state_vhe(struct kvm_cpu_context *ctxt)
 {
 	__sysreg_restore_common_sta
[PATCH AUTOSEL 4.20 18/52] KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded
From: Christoffer Dall

[ Upstream commit e761a927bc9a7ee6ceb7c4f63d5922dbced87f0d ]

We have two ways to reset a vcpu:
- either through VCPU_INIT
- or through a PSCI_ON call

The first one is easy to reason about. The second one is implemented
in a more bizarre way, as it is the vcpu that handles PSCI_ON that
resets the vcpu that is being powered-on. As we need to turn the logic
around and have the target vcpu reset itself, we must take some
preliminary steps.

Resetting the VCPU state modifies the system register state in memory,
but this may interact with vcpu_load/vcpu_put if running with preemption
disabled, which in turn may lead to corrupted system register state.

Address this by disabling preemption and doing put/load if required
around the reset logic.

Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Sasha Levin
---
 arch/arm64/kvm/reset.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index b72a3dd56204..f21a2a575939 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -105,16 +105,33 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
  * This function finds the right table above and sets the registers on
  * the virtual CPU struct to their architecturally defined reset
  * values.
+ *
+ * Note: This function can be called from two paths: The KVM_ARM_VCPU_INIT
+ * ioctl or as part of handling a request issued by another VCPU in the PSCI
+ * handling code.  In the first case, the VCPU will not be loaded, and in the
+ * second case the VCPU will be loaded.  Because this function operates purely
+ * on the memory-backed values of system registers, we want to do a full put if
+ * we were loaded (handling a request) and load the values back at the end of
+ * the function.  Otherwise we leave the state alone.  In both cases, we
+ * disable preemption around the vcpu reset as we would otherwise race with
+ * preempt notifiers which also call put/load.
  */
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 {
 	const struct kvm_regs *cpu_reset;
+	int ret = -EINVAL;
+	bool loaded;
+
+	preempt_disable();
+	loaded = (vcpu->cpu != -1);
+	if (loaded)
+		kvm_arch_vcpu_put(vcpu);
 
 	switch (vcpu->arch.target) {
 	default:
 		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
 			if (!cpu_has_32bit_el1())
-				return -EINVAL;
+				goto out;
 			cpu_reset = &default_regs_reset32;
 		} else {
 			cpu_reset = &default_regs_reset;
@@ -137,7 +154,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG;
 
 	/* Reset timer */
-	return kvm_timer_vcpu_reset(vcpu);
+	ret = kvm_timer_vcpu_reset(vcpu);
+out:
+	if (loaded)
+		kvm_arch_vcpu_load(vcpu, smp_processor_id());
+	preempt_enable();
+	return ret;
 }
 
 void kvm_set_ipa_limit(void)
--
2.19.1
[PATCH AUTOSEL 4.20 21/52] KVM: arm/arm64: vgic: Always initialize the group of private IRQs
From: Christoffer Dall [ Upstream commit ab2d5eb03dbb7b37a1c6356686fb48626ab0c93e ] We currently initialize the group of private IRQs during kvm_vgic_vcpu_init, and the value of the group depends on the GIC model we are emulating. However, CPUs created before creating (and initializing) the VGIC might end up with the wrong group if the VGIC is created as GICv3 later. Since we have no enforced ordering of creating the VGIC and creating VCPUs, we can end up with part the VCPUs being properly intialized and the remaining incorrectly initialized. That also means that we have no single place to do the per-cpu data structure initialization which depends on knowing the emulated GIC model (which is only the group field). This patch removes the incorrect comment from kvm_vgic_vcpu_init and initializes the group of all previously created VCPUs's private interrupts in vgic_init in addition to the existing initialization in kvm_vgic_vcpu_init. Signed-off-by: Christoffer Dall Signed-off-by: Marc Zyngier Signed-off-by: Sasha Levin --- virt/kvm/arm/vgic/vgic-init.c | 22 ++ 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c index 33e7ee814f7b..8196e4f8731f 100644 --- a/virt/kvm/arm/vgic/vgic-init.c +++ b/virt/kvm/arm/vgic/vgic-init.c @@ -231,13 +231,6 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu) irq->config = VGIC_CONFIG_LEVEL; } - /* -* GICv3 can only be created via the KVM_DEVICE_CREATE API and -* so we always know the emulation type at this point as it's -* either explicitly configured as GICv3, or explicitly -* configured as GICv2, or not configured yet which also -* implies GICv2. 
-*/ if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) irq->group = 1; else @@ -281,7 +274,7 @@ int vgic_init(struct kvm *kvm) { struct vgic_dist *dist = &kvm->arch.vgic; struct kvm_vcpu *vcpu; - int ret = 0, i; + int ret = 0, i, idx; if (vgic_initialized(kvm)) return 0; @@ -298,6 +291,19 @@ int vgic_init(struct kvm *kvm) if (ret) goto out; + /* Initialize groups on CPUs created before the VGIC type was known */ + kvm_for_each_vcpu(idx, vcpu, kvm) { + struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; + + for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++) { + struct vgic_irq *irq = &vgic_cpu->private_irqs[i]; + if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) + irq->group = 1; + else + irq->group = 0; + } + } + if (vgic_has_its(kvm)) { ret = vgic_v4_init(kvm); if (ret) -- 2.19.1
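As a side note, the late fix-up that this patch adds to vgic_init() reduces to a single predicate on the emulated GIC model, applied to every private IRQ of every already-created VCPU. The following is a stand-alone sketch of that loop in plain C, using made-up stand-in types rather than the kernel's:

```c
#include <assert.h>
#include <string.h>

#define VGIC_NR_PRIVATE_IRQS 32

/* Illustrative stand-ins for the kernel's types and constants. */
enum vgic_model { MODEL_GICV2, MODEL_GICV3 };

struct vgic_irq { int group; };
struct vcpu_model { struct vgic_irq private_irqs[VGIC_NR_PRIVATE_IRQS]; };

/*
 * Mirrors the loop added to vgic_init(): once the emulated GIC model is
 * finally known, walk every previously created VCPU and re-derive the
 * group of each private IRQ (group 1 for GICv3, group 0 otherwise).
 */
static void fixup_private_irq_groups(struct vcpu_model *vcpus, int nr_vcpus,
				     enum vgic_model model)
{
	for (int v = 0; v < nr_vcpus; v++)
		for (int i = 0; i < VGIC_NR_PRIVATE_IRQS; i++)
			vcpus[v].private_irqs[i].group =
				(model == MODEL_GICV3) ? 1 : 0;
}
```

Running this over VCPUs "created before the VGIC type was known" (i.e. with the group still at its zero-initialized GICv2 value) leaves every private IRQ with the correct group once the model is decided.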
[PATCH AUTOSEL 4.20 19/52] arm/arm64: KVM: Allow a VCPU to fully reset itself
From: Marc Zyngier [ Upstream commit 358b28f09f0ab074d781df72b8a671edb1547789 ] The current kvm_psci_vcpu_on implementation will directly try to manipulate the state of the VCPU to reset it. However, since this is not done on the thread that runs the VCPU, we can end up in a strangely corrupted state when the source and target VCPUs are running at the same time. Fix this by factoring out all reset logic from the PSCI implementation and forwarding the required information along with a request to the target VCPU. Reviewed-by: Andrew Jones Signed-off-by: Marc Zyngier Signed-off-by: Christoffer Dall Signed-off-by: Sasha Levin --- arch/arm/include/asm/kvm_host.h | 10 + arch/arm/kvm/reset.c | 24 + arch/arm64/include/asm/kvm_host.h | 11 ++ arch/arm64/kvm/reset.c| 24 + virt/kvm/arm/arm.c| 10 + virt/kvm/arm/psci.c | 36 ++- 6 files changed, 95 insertions(+), 20 deletions(-) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 5ca5d9af0c26..4ecd426e9b1c 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -48,6 +48,7 @@ #define KVM_REQ_SLEEP \ KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQ_IRQ_PENDINGKVM_ARCH_REQ(1) +#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2) DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use); @@ -147,6 +148,13 @@ struct kvm_cpu_context { typedef struct kvm_cpu_context kvm_cpu_context_t; +struct vcpu_reset_state { + unsigned long pc; + unsigned long r0; + boolbe; + boolreset; +}; + struct kvm_vcpu_arch { struct kvm_cpu_context ctxt; @@ -186,6 +194,8 @@ struct kvm_vcpu_arch { /* Cache some mmu pages needed inside spinlock regions */ struct kvm_mmu_memory_cache mmu_page_cache; + struct vcpu_reset_state reset_state; + /* Detect first run of a vcpu */ bool has_run_once; }; diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c index 5ed0c3ee33d6..e53327912adc 100644 --- a/arch/arm/kvm/reset.c +++ b/arch/arm/kvm/reset.c @@ -26,6 +26,7 @@ #include #include #include 
+#include #include @@ -69,6 +70,29 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu) /* Reset CP15 registers */ kvm_reset_coprocs(vcpu); + /* +* Additional reset state handling that PSCI may have imposed on us. +* Must be done after all the sys_reg reset. +*/ + if (READ_ONCE(vcpu->arch.reset_state.reset)) { + unsigned long target_pc = vcpu->arch.reset_state.pc; + + /* Gracefully handle Thumb2 entry point */ + if (target_pc & 1) { + target_pc &= ~1UL; + vcpu_set_thumb(vcpu); + } + + /* Propagate caller endianness */ + if (vcpu->arch.reset_state.be) + kvm_vcpu_set_be(vcpu); + + *vcpu_pc(vcpu) = target_pc; + vcpu_set_reg(vcpu, 0, vcpu->arch.reset_state.r0); + + vcpu->arch.reset_state.reset = false; + } + /* Reset arch_timer context */ return kvm_timer_vcpu_reset(vcpu); } diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 52fbc823ff8c..4fc7c8f0b717 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -48,6 +48,7 @@ #define KVM_REQ_SLEEP \ KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQ_IRQ_PENDINGKVM_ARCH_REQ(1) +#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2) DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use); @@ -208,6 +209,13 @@ struct kvm_cpu_context { typedef struct kvm_cpu_context kvm_cpu_context_t; +struct vcpu_reset_state { + unsigned long pc; + unsigned long r0; + boolbe; + boolreset; +}; + struct kvm_vcpu_arch { struct kvm_cpu_context ctxt; @@ -297,6 +305,9 @@ struct kvm_vcpu_arch { /* Virtual SError ESR to restore when HCR_EL2.VSE is set */ u64 vsesr_el2; + /* Additional reset state */ + struct vcpu_reset_state reset_state; + /* True when deferrable sysregs are loaded on the physical CPU, * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. 
*/ bool sysregs_loaded_on_cpu; diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index f21a2a575939..f16a5f8ff2b4 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -32,6 +32,7 @@ #include #include #include +#include #include /* Maximum phys_shift supported for any VM on this host */ @@ -146,6 +147,29 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu) /* Reset system registers */ kvm_reset_sys_regs(vcpu); + /* +* Additional reset
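For illustration, the PSCI-imposed reset-state handling that both the 32-bit and 64-bit hunks add can be modelled in isolation. The sketch below is plain user-space C with hypothetical stand-in structures (not the kernel's); it captures the Thumb2 entry-point and endianness propagation logic from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct reset_state_model {
	unsigned long pc;
	unsigned long r0;
	bool be;
};

struct vcpu_core_state {
	unsigned long pc;
	unsigned long r0;
	bool thumb;
	bool be;
};

/*
 * Mirrors the reset_state handling added to kvm_reset_vcpu(): a set
 * bit 0 in the target PC marks a Thumb2 entry point (strip it and set
 * the Thumb bit), and the caller's endianness is propagated to the
 * freshly reset VCPU along with PC and r0.
 */
static void apply_reset_state(struct vcpu_core_state *vcpu,
			      const struct reset_state_model *state)
{
	unsigned long target_pc = state->pc;

	/* Gracefully handle a Thumb2 entry point. */
	if (target_pc & 1) {
		target_pc &= ~1UL;
		vcpu->thumb = true;
	}

	/* Propagate caller endianness. */
	if (state->be)
		vcpu->be = true;

	vcpu->pc = target_pc;
	vcpu->r0 = state->r0;
}
```

The point of the patch is that this runs on the target VCPU's own thread (triggered by KVM_REQ_VCPU_RESET) rather than from the caller's PSCI handler, so the state above is only a message between the two.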
[PATCH AUTOSEL 4.20 20/52] arm/arm64: KVM: Don't panic on failure to properly reset system registers
From: Marc Zyngier [ Upstream commit 20589c8cc47dce5854c8bf1b44a9fc63d798d26d ] Failing to properly reset system registers is pretty bad. But not quite as bad as bringing the whole machine down... So warn loudly, but slightly more gracefully. Signed-off-by: Marc Zyngier Acked-by: Christoffer Dall Signed-off-by: Sasha Levin --- arch/arm/kvm/coproc.c | 4 ++-- arch/arm64/kvm/sys_regs.c | 8 +--- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c index cb094e55dc5f..fd6cde23bb5d 100644 --- a/arch/arm/kvm/coproc.c +++ b/arch/arm/kvm/coproc.c @@ -1450,6 +1450,6 @@ void kvm_reset_coprocs(struct kvm_vcpu *vcpu) reset_coproc_regs(vcpu, table, num); for (num = 1; num < NR_CP15_REGS; num++) - if (vcpu_cp15(vcpu, num) == 0x42424242) - panic("Didn't reset vcpu_cp15(vcpu, %zi)", num); + WARN(vcpu_cp15(vcpu, num) == 0x42424242, +"Didn't reset vcpu_cp15(vcpu, %zi)", num); } diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 22fbbdbece3c..fe18e68f9a20 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -2586,7 +2586,9 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu) table = get_target_table(vcpu->arch.target, true, &num); reset_sys_reg_descs(vcpu, table, num); - for (num = 1; num < NR_SYS_REGS; num++) - if (__vcpu_sys_reg(vcpu, num) == 0x4242424242424242) - panic("Didn't reset __vcpu_sys_reg(%zi)", num); + for (num = 1; num < NR_SYS_REGS; num++) { + if (WARN(__vcpu_sys_reg(vcpu, num) == 0x4242424242424242, +"Didn't reset __vcpu_sys_reg(%zi)\n", num)) + break; + } } -- 2.19.1
[PATCH AUTOSEL 4.20 22/52] KVM: arm64: Forbid kprobing of the VHE world-switch code
From: James Morse [ Upstream commit 7d82602909ed9c73b34ad26f05d10db4850a4f8c ] On systems with VHE the kernel and KVM's world-switch code run at the same exception level. Code that is only used on a VHE system does not need to be annotated as __hyp_text as it can reside anywhere in the kernel text. __hyp_text was also used to prevent kprobes from patching breakpoint instructions into this region, as this code runs at a different exception level. While this is no longer true with VHE, KVM still switches VBAR_EL1, meaning a kprobe's breakpoint executed in the world-switch code will cause a hyp-panic. echo "p:weasel sysreg_save_guest_state_vhe" > /sys/kernel/debug/tracing/kprobe_events echo 1 > /sys/kernel/debug/tracing/events/kprobes/weasel/enable lkvm run -k /boot/Image --console serial -p "console=ttyS0 earlycon=uart,mmio,0x3f8" # lkvm run -k /boot/Image -m 384 -c 3 --name guest-1474 Info: Placing fdt at 0x8fe0 - 0x8fff Info: virtio-mmio.devices=0x200@0x1:36 Info: virtio-mmio.devices=0x200@0x10200:37 Info: virtio-mmio.devices=0x200@0x10400:38 [ 614.178186] Kernel panic - not syncing: HYP panic: [ 614.178186] PS:404003c9 PC:100d70e0 ESR:f204 [ 614.178186] FAR:8008 HPFAR:00800800 PAR:1d7edbadc0de [ 614.178186] VCPU:f8de32f1 [ 614.178383] CPU: 2 PID: 1482 Comm: kvm-vcpu-0 Not tainted 5.0.0-rc2 #10799 [ 614.178446] Call trace: [ 614.178480] dump_backtrace+0x0/0x148 [ 614.178567] show_stack+0x24/0x30 [ 614.178658] dump_stack+0x90/0xb4 [ 614.178710] panic+0x13c/0x2d8 [ 614.178793] hyp_panic+0xac/0xd8 [ 614.178880] kvm_vcpu_run_vhe+0x9c/0xe0 [ 614.178958] kvm_arch_vcpu_ioctl_run+0x454/0x798 [ 614.179038] kvm_vcpu_ioctl+0x360/0x898 [ 614.179087] do_vfs_ioctl+0xc4/0x858 [ 614.179174] ksys_ioctl+0x84/0xb8 [ 614.179261] __arm64_sys_ioctl+0x28/0x38 [ 614.179348] el0_svc_common+0x94/0x108 [ 614.179401] el0_svc_handler+0x38/0x78 [ 614.179487] el0_svc+0x8/0xc [ 614.179558] SMP: stopping secondary CPUs [ 614.179661] Kernel Offset: disabled [ 614.179695] CPU features: 
0x003,2a80aa38 [ 614.179758] Memory Limit: none [ 614.179858] ---[ end Kernel panic - not syncing: HYP panic: [ 614.179858] PS:404003c9 PC:100d70e0 ESR:f204 [ 614.179858] FAR:8008 HPFAR:00800800 PAR:1d7edbadc0de [ 614.179858] VCPU:f8de32f1 ]--- Annotate the VHE world-switch functions that aren't marked __hyp_text using NOKPROBE_SYMBOL(). Signed-off-by: James Morse Fixes: 3f5c90b890ac ("KVM: arm64: Introduce VHE-specific kvm_vcpu_run") Acked-by: Masami Hiramatsu Signed-off-by: Marc Zyngier Signed-off-by: Sasha Levin --- arch/arm64/kvm/hyp/switch.c| 5 + arch/arm64/kvm/hyp/sysreg-sr.c | 5 + 2 files changed, 10 insertions(+) diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c index f6e02cc4d856..c9f4b25f67d9 100644 --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -23,6 +23,7 @@ #include #include +#include #include #include #include @@ -107,6 +108,7 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu) write_sysreg(kvm_get_hyp_vector(), vbar_el1); } +NOKPROBE_SYMBOL(activate_traps_vhe); static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu) { @@ -146,6 +148,7 @@ static void deactivate_traps_vhe(void) write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1); write_sysreg(vectors, vbar_el1); } +NOKPROBE_SYMBOL(deactivate_traps_vhe); static void __hyp_text __deactivate_traps_nvhe(void) { @@ -529,6 +532,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) return exit_code; } +NOKPROBE_SYMBOL(kvm_vcpu_run_vhe); /* Switch to the guest for legacy non-VHE systems */ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu) @@ -636,6 +640,7 @@ static void __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par, read_sysreg_el2(esr), read_sysreg_el2(far), read_sysreg(hpfar_el2), par, vcpu); } +NOKPROBE_SYMBOL(__hyp_call_panic_vhe); void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt) { diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c index 68d6f7c3b237..b426e2cf973c 100644 --- 
a/arch/arm64/kvm/hyp/sysreg-sr.c +++ b/arch/arm64/kvm/hyp/sysreg-sr.c @@ -18,6 +18,7 @@ #include #include +#include #include #include #include @@ -98,12 +99,14 @@ void sysreg_save_host_state_vhe(struct kvm_cpu_context *ctxt) { __sysreg_save_common_state(ctxt); } +NOKPROBE_SYMBOL(sysreg_save_host_state_vhe); void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt) { __sysreg_save_common_state(ctxt); __sysreg_save_el2_return_state(ctxt); } +NOKPROBE_SYMBOL(sysreg_save_guest_state_vhe); static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt) { @@ -188,12 +191,14 @@ void sysreg_restore_host_state_vhe(struct kvm_cpu_context *ctxt) { __sysreg_restore_common_sta
[PATCH AUTOSEL 4.20 07/52] KVM: arm/arm64: vgic: Make vgic_dist->lpi_list_lock a raw_spinlock
From: Julien Thierry [ Upstream commit fc3bc475231e12e9c0142f60100cf84d077c79e1 ] vgic_dist->lpi_list_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by: Julien Thierry Acked-by: Christoffer Dall Acked-by: Marc Zyngier Signed-off-by: Christoffer Dall Signed-off-by: Sasha Levin --- include/kvm/arm_vgic.h| 2 +- virt/kvm/arm/vgic/vgic-init.c | 2 +- virt/kvm/arm/vgic/vgic-its.c | 8 virt/kvm/arm/vgic/vgic.c | 10 +- 4 files changed, 11 insertions(+), 11 deletions(-) diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 4f31f96bbfab..90ac450745f1 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -256,7 +256,7 @@ struct vgic_dist { u64 propbaser; /* Protects the lpi_list and the count value below. */ - spinlock_t lpi_list_lock; + raw_spinlock_t lpi_list_lock; struct list_headlpi_list_head; int lpi_list_count; diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c index c0c0b88af1d5..33e7ee814f7b 100644 --- a/virt/kvm/arm/vgic/vgic-init.c +++ b/virt/kvm/arm/vgic/vgic-init.c @@ -64,7 +64,7 @@ void kvm_vgic_early_init(struct kvm *kvm) struct vgic_dist *dist = &kvm->arch.vgic; INIT_LIST_HEAD(&dist->lpi_list_head); - spin_lock_init(&dist->lpi_list_lock); + raw_spin_lock_init(&dist->lpi_list_lock); } /* CREATION */ diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index eb2a390a6c86..e99fb31582fb 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -73,7 +73,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid, irq->target_vcpu = vcpu; irq->group = 1; - spin_lock_irqsave(&dist->lpi_list_lock, flags); + raw_spin_lock_irqsave(&dist->lpi_list_lock, flags); /* * There could be a race with another vgic_add_lpi(), so we need to @@ -101,7 +101,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm 
*kvm, u32 intid, dist->lpi_list_count++; out_unlock: - spin_unlock_irqrestore(&dist->lpi_list_lock, flags); + raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags); /* * We "cache" the configuration table entries in our struct vgic_irq's. @@ -332,7 +332,7 @@ int vgic_copy_lpi_list(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 **intid_ptr) if (!intids) return -ENOMEM; - spin_lock_irqsave(&dist->lpi_list_lock, flags); + raw_spin_lock_irqsave(&dist->lpi_list_lock, flags); list_for_each_entry(irq, &dist->lpi_list_head, lpi_list) { if (i == irq_count) break; @@ -341,7 +341,7 @@ int vgic_copy_lpi_list(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 **intid_ptr) continue; intids[i++] = irq->intid; } - spin_unlock_irqrestore(&dist->lpi_list_lock, flags); + raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags); *intid_ptr = intids; return i; diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c index f884a54b2601..c5165e3b80cb 100644 --- a/virt/kvm/arm/vgic/vgic.c +++ b/virt/kvm/arm/vgic/vgic.c @@ -72,7 +72,7 @@ static struct vgic_irq *vgic_get_lpi(struct kvm *kvm, u32 intid) struct vgic_irq *irq = NULL; unsigned long flags; - spin_lock_irqsave(&dist->lpi_list_lock, flags); + raw_spin_lock_irqsave(&dist->lpi_list_lock, flags); list_for_each_entry(irq, &dist->lpi_list_head, lpi_list) { if (irq->intid != intid) @@ -88,7 +88,7 @@ static struct vgic_irq *vgic_get_lpi(struct kvm *kvm, u32 intid) irq = NULL; out_unlock: - spin_unlock_irqrestore(&dist->lpi_list_lock, flags); + raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags); return irq; } @@ -138,15 +138,15 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq) if (irq->intid < VGIC_MIN_LPI) return; - spin_lock_irqsave(&dist->lpi_list_lock, flags); + raw_spin_lock_irqsave(&dist->lpi_list_lock, flags); if (!kref_put(&irq->refcount, vgic_irq_release)) { - spin_unlock_irqrestore(&dist->lpi_list_lock, flags); + raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags); return; }; list_del(&irq->lpi_list); 
dist->lpi_list_count--; - spin_unlock_irqrestore(&dist->lpi_list_lock, flags); + raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags); kfree(irq); } -- 2.19.1
[RFC] Question about TLB flush while set Stage-2 huge pages
Hi all,

When a set of pages is merged into a transparent huge page, KVM invalidates Stage-2 for the base address of the huge page and the whole of Stage-1. However, this only invalidates the TLB entry for the first page within the huge page; the entries for the other pages are not invalidated, see below:

+----------------------+
| a b c d e   2MB-Page |
+----------------------+

TLB before setting the new pmd:

+----+----------+
| VA | PAGESIZE |
+----+----------+
| a  | 4KB      |
| b  | 4KB      |
| c  | 4KB      |
| d  | 4KB      |
+----+----------+

TLB after setting the new pmd:

+----+----------+
| VA | PAGESIZE |
+----+----------+
| a  | 2MB      |
| b  | 4KB      |
| c  | 4KB      |
| d  | 4KB      |
+----+----------+

When the VM accesses address *b*, it will hit the stale TLB entry, which can result in TLB conflict aborts or other potential exceptions. For example, we need to keep track of the VM's dirty memory pages while the VM is in live migration: KVM sets the memslot READONLY and splits the huge pages. If the live migration is later canceled or aborted, the pages are merged back into THPs, and subsequent accesses to these pages, whose stale entries are still READONLY, cause level-3 Permission Faults until the entries are invalidated.

So should we invalidate the TLB entries for all the affected pages (e.g. a, b, c, d), like __flush_tlb_range()? Or we could call __kvm_tlb_flush_vmid() to invalidate all TLB entries.

--
Thanks,
Xiang
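A user-space model may make the situation described above concrete. The sketch below is plain C with nothing kernel-specific (all names are invented for illustration): it models a TLB as a small table and shows that invalidating only the huge-page base address leaves the stale 4KB entries for b, c and d behind, so a lookup can hit two conflicting entries, while a range invalidation in the style of __flush_tlb_range() removes them all:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SZ_4K  0x1000UL
#define SZ_2M  0x200000UL
#define MAX_TLB 16

/* A toy TLB: each entry maps [va, va + size). */
struct tlb_entry { unsigned long va; unsigned long size; bool valid; };

static struct tlb_entry tlb[MAX_TLB];

static void tlb_insert(unsigned long va, unsigned long size)
{
	for (size_t i = 0; i < MAX_TLB; i++) {
		if (!tlb[i].valid) {
			tlb[i] = (struct tlb_entry){ va, size, true };
			return;
		}
	}
}

/* Invalidate only the entry starting exactly at @va (the huge-page base). */
static void tlb_flush_va(unsigned long va)
{
	for (size_t i = 0; i < MAX_TLB; i++)
		if (tlb[i].valid && tlb[i].va == va)
			tlb[i].valid = false;
}

/* __flush_tlb_range()-style: drop every entry overlapping [start, end). */
static void tlb_flush_range(unsigned long start, unsigned long end)
{
	for (size_t i = 0; i < MAX_TLB; i++)
		if (tlb[i].valid &&
		    tlb[i].va < end && tlb[i].va + tlb[i].size > start)
			tlb[i].valid = false;
}

/* How many valid entries cover @va? More than one means a conflict. */
static int tlb_lookups(unsigned long va)
{
	int hits = 0;

	for (size_t i = 0; i < MAX_TLB; i++)
		if (tlb[i].valid &&
		    va >= tlb[i].va && va < tlb[i].va + tlb[i].size)
			hits++;
	return hits;
}
```

With the 4KB entries for a, b, c, d installed, flushing only the base address and then inserting the 2MB block leaves a lookup of b matching two entries (the stale 4KB one and the new block), which is exactly the conflict scenario described above; a range flush avoids it.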
Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2
Hi Eric, [ + Mark Rutland ] On Mon, Mar 11, 2019 at 10:47:22AM +0100, Auger Eric wrote: [...] > >>> P.s. I also checked the sysfs node and found it doesn't contain node > >>> 'iommu_group': > >>> > >>> # ls /sys/bus/pci/devices/\:08\:00.0/iommu_group > >>> ls: cannot access '/sys/bus/pci/devices/:08:00.0/iommu_group': No > >>> such file or directory > >> > >> please can you give the output of the following command: > >> find /sys/kernel/iommu_groups/ > > > > I get below result on Juno board: > > > > root@debian:~# find /sys/kernel/iommu_groups/ > > /sys/kernel/iommu_groups/ > > /sys/kernel/iommu_groups/1 > > /sys/kernel/iommu_groups/1/devices > > /sys/kernel/iommu_groups/1/devices/2007.etr > > /sys/kernel/iommu_groups/1/type > > /sys/kernel/iommu_groups/1/reserved_regions > > /sys/kernel/iommu_groups/0 > > /sys/kernel/iommu_groups/0/devices > > /sys/kernel/iommu_groups/0/devices/7ffb.ohci > > /sys/kernel/iommu_groups/0/devices/7ffc.ehci > > /sys/kernel/iommu_groups/0/type > > /sys/kernel/iommu_groups/0/reserved_regions > > > > So the 'iommu_groups' is not created for pci-e devices, right? > > Yes that's correct. Juno's pci-e devices without 'iommu_groups' is caused by dt binding, in dts file [1] it sets 'status = "disabled"' for 'smmu_pcie' node. Thus the SMMU device will not be enabled for pci-e devices. @Mark Rutland, could you have chance to confirm this should be fixed in juno-base.dtsi or juno-r1.dts/juno-r2.dts? > > Will debug into dt binding and related code and keep posted at here.OK So now I made some progress and can see the networking card is pass-through to guest OS, though the networking card reports errors now. 
Below is detailed steps and info: - Bind devices in the same IOMMU group to vfio driver: echo :03:00.0 > /sys/bus/pci/devices/\:03\:00.0/driver/unbind echo 1095 3132 > /sys/bus/pci/drivers/vfio-pci/new_id echo :08:00.0 > /sys/bus/pci/devices/\:08\:00.0/driver/unbind echo 11ab 4380 > /sys/bus/pci/drivers/vfio-pci/new_id - Enable 'allow_unsafe_interrupts=1' for module vfio_iommu_type1; - Use qemu to launch guest OS: qemu-system-aarch64 \ -cpu host -M virt,accel=kvm -m 4096 -nographic \ -kernel /root/virt/Image -append root=/dev/vda2 \ -net none -device vfio-pci,host=08:00.0 \ -drive if=virtio,file=/root/virt/qemu/debian.img \ -append 'loglevel=8 root=/dev/vda2 rw console=ttyAMA0 earlyprintk ip=dhcp' - Host log: [ 188.329861] vfio-pci :08:00.0: enabling device ( -> 0003) - Below is guest log, from log though the driver has been registered but it reports PCI hardware failure and the timeout for the interrupt. So is this caused by very 'slow' forward interrupt handling? Juno board uses GICv2 (I think it has GICv2m extension). [...] [1.024483] sky2 :00:01.0 eth0: enabling interface [1.026822] sky2 :00:01.0: error interrupt status=0x8000 [1.029155] sky2 :00:01.0: PCI hardware error (0x1010) [4.000699] sky2 :00:01.0 eth0: Link is up at 1000 Mbps, full duplex, flow control both [4.026116] Sending DHCP requests . [4.026201] sky2 :00:01.0: error interrupt status=0x8000 [4.030043] sky2 :00:01.0: PCI hardware error (0x1010) [6.546111] .. 
[ 14.118106] [ cut here ] [ 14.120672] NETDEV WATCHDOG: eth0 (sky2): transmit queue 0 timed out [ 14.123555] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x2b4/0x2c0 [ 14.127082] Modules linked in: [ 14.128631] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.0.0-rc8-00061-ga98f9a047756-dirty # [ 14.132800] Hardware name: linux,dummy-virt (DT) [ 14.135082] pstate: 6005 (nZCv daif -PAN -UAO) [ 14.137459] pc : dev_watchdog+0x2b4/0x2c0 [ 14.139457] lr : dev_watchdog+0x2b4/0x2c0 [ 14.141351] sp : 10003d70 [ 14.142924] x29: 10003d70 x28: 112f60c0 [ 14.145433] x27: 0140 x26: 8000fa6eb3b8 [ 14.147936] x25: x24: 8000fa7a7c80 [ 14.150428] x23: 8000fa6eb39c x22: 8000fa6eafb8 [ 14.152934] x21: 8000fa6eb000 x20: 112f7000 [ 14.155437] x19: x18: [ 14.157929] x17: x16: [ 14.160432] x15: 112fd6c8 x14: 90003a97 [ 14.162927] x13: 10003aa5 x12: 11315878 [ 14.165428] x11: 11315000 x10: 05f5e0ff [ 14.167935] x9 : ffd0 x8 : 64656d6974203020 [ 14.170430] x7 : 6575657571207469 x6 : 00e3 [ 14.172935] x5 : x4 : [ 14.175443] x3 : x2 : 113158a8 [ 14.177938] x1 : f2db9128b1f08600 x0 : [ 14.180443] Call trace: [ 14.181625] dev_watchdog+0x2b4/0x2c0 [ 14.183377] call_timer_fn+0x20/0x78 [ 14.185078] expire_timers+0xa4/0xb0 [ 14.186777] run_timer_softirq+0xa0/0x190 [ 14.188687] __do_softirq+0x108/0x234 [
Re: [PATCH v11 6/8] arm64: KVM: Enable VHE support for :G/:H perf event modifiers
On Mon, Mar 11, 2019 at 09:39:19AM +, Julien Thierry wrote: > Hi Andrew, > > On 08/03/2019 12:07, Andrew Murray wrote: > > With VHE different exception levels are used between the host (EL2) and > > guest (EL1) with a shared exception level for userpace (EL0). We can take > > advantage of this and use the PMU's exception level filtering to avoid > > enabling/disabling counters in the world-switch code. Instead we just > > modify the counter type to include or exclude EL0 at vcpu_{load,put} time. > > > > We also ensure that trapped PMU system register writes do not re-enable > > EL0 when reconfiguring the backing perf events. > > > > This approach completely avoids blackout windows seen with !VHE. > > > > Suggested-by: Christoffer Dall > > Signed-off-by: Andrew Murray > > --- > > arch/arm/include/asm/kvm_host.h | 3 ++ > > arch/arm64/include/asm/kvm_host.h | 5 +- > > arch/arm64/kernel/perf_event.c| 6 ++- > > arch/arm64/kvm/pmu.c | 87 ++- > > arch/arm64/kvm/sys_regs.c | 3 ++ > > virt/kvm/arm/arm.c| 2 + > > 6 files changed, 102 insertions(+), 4 deletions(-) > > > > diff --git a/arch/arm/include/asm/kvm_host.h > > b/arch/arm/include/asm/kvm_host.h > > index a358cb15bb0d..3ce429954306 100644 > > --- a/arch/arm/include/asm/kvm_host.h > > +++ b/arch/arm/include/asm/kvm_host.h > > @@ -326,6 +326,9 @@ static inline void kvm_arch_vcpu_load_fp(struct > > kvm_vcpu *vcpu) {} > > static inline void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) {} > > static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {} > > > > +static inline void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) {} > > +static inline void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) {} > > + > > static inline void kvm_arm_vhe_guest_enter(void) {} > > static inline void kvm_arm_vhe_guest_exit(void) {} > > > > diff --git a/arch/arm64/include/asm/kvm_host.h > > b/arch/arm64/include/asm/kvm_host.h > > index 7ca4e094626d..d631528898b5 100644 > > --- a/arch/arm64/include/asm/kvm_host.h > > +++ 
b/arch/arm64/include/asm/kvm_host.h > > @@ -487,7 +487,7 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu); > > > > static inline bool kvm_pmu_counter_defered(struct perf_event_attr *attr) > > { > > - return attr->exclude_host; > > + return (!has_vhe() && attr->exclude_host); > > } > > > > #ifdef CONFIG_KVM /* Avoid conflicts with core headers if CONFIG_KVM=n */ > > @@ -501,6 +501,9 @@ void kvm_clr_pmu_events(u32 clr); > > > > void __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt); > > bool __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt); > > + > > +void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu); > > +void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu); > > #else > > static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr > > *attr) {} > > static inline void kvm_clr_pmu_events(u32 clr) {} > > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c > > index 64f02a9fd7cd..a121a82fc54c 100644 > > --- a/arch/arm64/kernel/perf_event.c > > +++ b/arch/arm64/kernel/perf_event.c > > @@ -847,8 +847,12 @@ static int armv8pmu_set_event_filter(struct > > hw_perf_event *event, > > * with other architectures (x86 and Power). 
> > */ > > if (is_kernel_in_hyp_mode()) { > > - if (!attr->exclude_kernel) > > + if (!attr->exclude_kernel && !attr->exclude_host) > > config_base |= ARMV8_PMU_INCLUDE_EL2; > > + if (attr->exclude_guest) > > + config_base |= ARMV8_PMU_EXCLUDE_EL1; > > + if (attr->exclude_host) > > + config_base |= ARMV8_PMU_EXCLUDE_EL0; > > } else { > > if (!attr->exclude_hv && !attr->exclude_host) > > config_base |= ARMV8_PMU_INCLUDE_EL2; > > diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c > > index a1cee7919953..a0830c70ece5 100644 > > --- a/arch/arm64/kvm/pmu.c > > +++ b/arch/arm64/kvm/pmu.c > > @@ -12,11 +12,19 @@ > > DECLARE_PER_CPU(kvm_host_data_t, kvm_host_data); > > > > /* > > - * Given the exclude_{host,guest} attributes, determine if we are going > > - * to need to switch counters at guest entry/exit. > > + * Given the perf event attributes and system type, determine > > + * if we are going to need to switch counters at guest entry/exit. > > */ > > static bool kvm_pmu_switch_needed(struct perf_event_attr *attr) > > { > > + /** > > +* With VHE the guest kernel runs at EL1 and the host at EL2, > > +* where user (EL0) is excluded then we have no reason to switch > > +* counters. > > +*/ > > + if (has_vhe() && attr->exclude_user) > > + return false; > > + > > /* Only switch if attributes are different */ > > return (attr->exclude_host ^ attr->exclude_guest); > > } > > @@ -87,3 +95,78 @@ void __hyp_text __pmu_switch_to_host(struct > > kvm_cpu_context *host_ctxt) > >
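To make the decision the patch implements concrete, the switch-needed predicate can be modelled outside the kernel. In this sketch (plain C, hypothetical stand-in types rather than the kernel's), on VHE a counter that excludes EL0 never needs switching at guest entry/exit, because EL1/EL2 filtering alone separates guest and host; otherwise a switch is only needed when the host/guest exclude attributes differ:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the relevant perf_event_attr bits. */
struct attr_model {
	bool exclude_user;
	bool exclude_host;
	bool exclude_guest;
};

/*
 * Mirrors kvm_pmu_switch_needed() from the quoted patch: with VHE,
 * excluding EL0 means the exception-level filter already does the
 * guest/host separation, so no counter switching is required.
 */
static bool pmu_switch_needed(bool has_vhe, const struct attr_model *attr)
{
	if (has_vhe && attr->exclude_user)
		return false;

	/* Only switch if the host/guest attributes differ. */
	return attr->exclude_host != attr->exclude_guest;
}
```

This is why the series can claim that on VHE the world-switch blackout window disappears: the only remaining work at vcpu_{load,put} time is reprogramming the EL0 filter bit, not disabling counters.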
Re: [PATCH v11 3/8] arm64: KVM: add accessors to track guest/host only counters
On Mon, Mar 11, 2019 at 11:00:04AM +, Suzuki K Poulose wrote: > Hi Andrew, > > > On 08/03/2019 12:07, Andrew Murray wrote: > > In order to effeciently switch events_{guest,host} perf counters at > > guest entry/exit we add bitfields to kvm_cpu_context for guest and host > > events as well as accessors for updating them. > > > > A function is also provided which allows the PMU driver to determine > > if a counter should start counting when it is enabled. With exclude_host, > > events on !VHE we may only start counting when entering the guest. > > Some minor comments below. > > > > > Signed-off-by: Andrew Murray > > --- > > arch/arm64/include/asm/kvm_host.h | 17 +++ > > arch/arm64/kvm/Makefile | 2 +- > > arch/arm64/kvm/pmu.c | 49 +++ > > 3 files changed, 67 insertions(+), 1 deletion(-) > > create mode 100644 arch/arm64/kvm/pmu.c > > > > diff --git a/arch/arm64/include/asm/kvm_host.h > > b/arch/arm64/include/asm/kvm_host.h > > index 1d36619d6650..4b7219128f2d 100644 > > --- a/arch/arm64/include/asm/kvm_host.h > > +++ b/arch/arm64/include/asm/kvm_host.h > > @@ -207,8 +207,14 @@ struct kvm_cpu_context { > > struct kvm_vcpu *__hyp_running_vcpu; > > }; > > +struct kvm_pmu_events { > > + u32 events_host; > > + u32 events_guest; > > +}; > > + > > struct kvm_host_data { > > struct kvm_cpu_context host_ctxt; > > + struct kvm_pmu_events pmu_events; > > }; > > typedef struct kvm_host_data kvm_host_data_t; > > @@ -479,11 +485,22 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu); > > void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu); > > void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu); > > +static inline bool kvm_pmu_counter_defered(struct perf_event_attr *attr) > > nit: s/defered/deferred/ Thanks. > > > +{ > > + return attr->exclude_host; > > +} > > + > > Does it need a definition for !CONFIG_KVM case, to make sure that > the events are always enabled for that case ? i.e, does this introduce > a change in behavior for !CONFIG_KVM case ? I think this hunk is correct. 
It makes sense to not count with exclude_host regardless to if there are guests or not. (By the way v10 has this test, we've just moved it to this file.) Later in this series we update the hunk to condition it on !has_vhe(), this is still OK - it means for VHE we always enable to counter (despite exclude_host) but that's OK because on VHE with exclude_host we exclude EL2, EL0 (armv8pmu_set_event_filter). This does mean that we're enabling a counter that doesn't do anything, but then from the user perspective it is a bit pointless to use exclude_host on a system without KVM or guests. > > > #ifdef CONFIG_KVM /* Avoid conflicts with core headers if CONFIG_KVM=n */ > > static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu) > > { > > return kvm_arch_vcpu_run_map_fp(vcpu); > > } > > + > > +void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr); > > +void kvm_clr_pmu_events(u32 clr); > > +#else > > +static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr > > *attr) {} > > +static inline void kvm_clr_pmu_events(u32 clr) {} > > #endif > > static inline void kvm_arm_vhe_guest_enter(void) > > diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile > > index 0f2a135ba15b..f34cb49b66ae 100644 > > --- a/arch/arm64/kvm/Makefile > > +++ b/arch/arm64/kvm/Makefile > > @@ -19,7 +19,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o > > $(KVM)/arm/perf.o > > kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o > > kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o > > kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o > > sys_regs_generic_v8.o > > -kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o > > +kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o pmu.o > > kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o > > kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic.o > > diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c > > new file mode 100644 > > index ..43965a3cc0f4 > > --- 
/dev/null > > +++ b/arch/arm64/kvm/pmu.c > > @@ -0,0 +1,49 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > +/* > > + * arch/arm64/kvm/pmu.c: Switching between guest and host counters > > minor nit: You don't need the file name, it is superfluous. > > > + * > > + * Copyright 2019 Arm Limited > > + * Author: Andrew Murray > > + */ > > +#include > > +#include > > + > > > +DECLARE_PER_CPU(kvm_host_data_t, kvm_host_data); > > nit: Do we really need this ? This is already in asm/kvm_host.h. No we don't - I'll remove it. > > > + > > +/* > > + * Given the exclude_{host,guest} attributes, determine if we are going > > + * to need to switch counters at guest entry/exit. > > + */ > > +static bool kvm_pmu_switch_needed(struct perf_event_attr *attr) > > +{ > > + /* Only switch if attributes are different */ > > + return (attr->exclude_h
Re: [PATCH v11 3/8] arm64: KVM: add accessors to track guest/host only counters
On Fri, Mar 08, 2019 at 04:40:51PM +, Julien Thierry wrote: > Hi Andrew, > > On 08/03/2019 12:07, Andrew Murray wrote: > > In order to effeciently switch events_{guest,host} perf counters at > > guest entry/exit we add bitfields to kvm_cpu_context for guest and host > > events as well as accessors for updating them. > > > > A function is also provided which allows the PMU driver to determine > > if a counter should start counting when it is enabled. With exclude_host, > > events on !VHE we may only start counting when entering the guest. > > > > I might have missed something here. Why is that true only for !VHE? Is > it because with VHE we can just exclude EL1? That's correct. For VHE we never defer counting and can rely on the PMU filtering capabilities (even though for EL0 we have to reconfigure the filtering upon entering/exiting the guest). > (It's also a bit confusing since the patch does not seem to contain any > VHE/nVHE distinction) This is updated in the later patches of this series. I felt the series would be easier to understand if I add the special VHE case last. I will update the commit message such that it reads "With exclude_host, we may only start counting when entering the guest.". I.e. the changes here are valid for both VHE and !VHE until the later patches change it for VHE. 
> > > Signed-off-by: Andrew Murray > > --- > > arch/arm64/include/asm/kvm_host.h | 17 +++ > > arch/arm64/kvm/Makefile | 2 +- > > arch/arm64/kvm/pmu.c | 49 +++ > > 3 files changed, 67 insertions(+), 1 deletion(-) > > create mode 100644 arch/arm64/kvm/pmu.c > > > > diff --git a/arch/arm64/include/asm/kvm_host.h > > b/arch/arm64/include/asm/kvm_host.h > > index 1d36619d6650..4b7219128f2d 100644 > > --- a/arch/arm64/include/asm/kvm_host.h > > +++ b/arch/arm64/include/asm/kvm_host.h > > @@ -207,8 +207,14 @@ struct kvm_cpu_context { > > struct kvm_vcpu *__hyp_running_vcpu; > > }; > > > > +struct kvm_pmu_events { > > + u32 events_host; > > + u32 events_guest; > > +}; > > + > > struct kvm_host_data { > > struct kvm_cpu_context host_ctxt; > > + struct kvm_pmu_events pmu_events; > > }; > > > > typedef struct kvm_host_data kvm_host_data_t; > > @@ -479,11 +485,22 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu); > > void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu); > > void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu); > > > > +static inline bool kvm_pmu_counter_defered(struct perf_event_attr *attr) > > +{ > > + return attr->exclude_host; > > +} > > + > > #ifdef CONFIG_KVM /* Avoid conflicts with core headers if CONFIG_KVM=n */ > > static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu) > > { > > return kvm_arch_vcpu_run_map_fp(vcpu); > > } > > + > > +void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr); > > +void kvm_clr_pmu_events(u32 clr); > > +#else > > +static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr > > *attr) {} > > +static inline void kvm_clr_pmu_events(u32 clr) {} > > #endif > > > > static inline void kvm_arm_vhe_guest_enter(void) > > diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile > > index 0f2a135ba15b..f34cb49b66ae 100644 > > --- a/arch/arm64/kvm/Makefile > > +++ b/arch/arm64/kvm/Makefile > > @@ -19,7 +19,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o > > $(KVM)/arm/perf.o > > 
kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o > > kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o > > kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o > > sys_regs_generic_v8.o > > -kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o > > +kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o pmu.o > > kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o > > > > kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic.o > > diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c > > new file mode 100644 > > index ..43965a3cc0f4 > > --- /dev/null > > +++ b/arch/arm64/kvm/pmu.c > > @@ -0,0 +1,49 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > +/* > > + * arch/arm64/kvm/pmu.c: Switching between guest and host counters > > + * > > + * Copyright 2019 Arm Limited > > + * Author: Andrew Murray > > + */ > > +#include > > +#include > > + > > +DECLARE_PER_CPU(kvm_host_data_t, kvm_host_data); > > + > > +/* > > + * Given the exclude_{host,guest} attributes, determine if we are going > > + * to need to switch counters at guest entry/exit. > > + */ > > +static bool kvm_pmu_switch_needed(struct perf_event_attr *attr) > > +{ > > + /* Only switch if attributes are different */ > > + return (attr->exclude_host ^ attr->exclude_guest); > > Nit: > > Is there any benefit to this rather than doing "attr->exclude_host != > attr->exclude_guest" ? The code generated is most likely the same, I > just find the latter slightly more straightforward. Nope, I'll change it. Not sure why I ended up with th
Re: [PATCH v11 3/8] arm64: KVM: add accessors to track guest/host only counters
Hi Andrew, On 08/03/2019 12:07, Andrew Murray wrote: In order to effeciently switch events_{guest,host} perf counters at guest entry/exit we add bitfields to kvm_cpu_context for guest and host events as well as accessors for updating them. A function is also provided which allows the PMU driver to determine if a counter should start counting when it is enabled. With exclude_host, events on !VHE we may only start counting when entering the guest. Some minor comments below. Signed-off-by: Andrew Murray --- arch/arm64/include/asm/kvm_host.h | 17 +++ arch/arm64/kvm/Makefile | 2 +- arch/arm64/kvm/pmu.c | 49 +++ 3 files changed, 67 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/kvm/pmu.c diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 1d36619d6650..4b7219128f2d 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -207,8 +207,14 @@ struct kvm_cpu_context { struct kvm_vcpu *__hyp_running_vcpu; }; +struct kvm_pmu_events { + u32 events_host; + u32 events_guest; +}; + struct kvm_host_data { struct kvm_cpu_context host_ctxt; + struct kvm_pmu_events pmu_events; }; typedef struct kvm_host_data kvm_host_data_t; @@ -479,11 +485,22 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu); void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu); void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu); +static inline bool kvm_pmu_counter_defered(struct perf_event_attr *attr) nit: s/defered/deferred/ +{ + return attr->exclude_host; +} + Does it need a definition for !CONFIG_KVM case, to make sure that the events are always enabled for that case ? i.e, does this introduce a change in behavior for !CONFIG_KVM case ? 
#ifdef CONFIG_KVM /* Avoid conflicts with core headers if CONFIG_KVM=n */ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu) { return kvm_arch_vcpu_run_map_fp(vcpu); } + +void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr); +void kvm_clr_pmu_events(u32 clr); +#else +static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {} +static inline void kvm_clr_pmu_events(u32 clr) {} #endif static inline void kvm_arm_vhe_guest_enter(void) diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile index 0f2a135ba15b..f34cb49b66ae 100644 --- a/arch/arm64/kvm/Makefile +++ b/arch/arm64/kvm/Makefile @@ -19,7 +19,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o -kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o +kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o pmu.o kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic.o diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c new file mode 100644 index ..43965a3cc0f4 --- /dev/null +++ b/arch/arm64/kvm/pmu.c @@ -0,0 +1,49 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * arch/arm64/kvm/pmu.c: Switching between guest and host counters minor nit: You don't need the file name, it is superfluous. + * + * Copyright 2019 Arm Limited + * Author: Andrew Murray + */ +#include +#include + +DECLARE_PER_CPU(kvm_host_data_t, kvm_host_data); nit: Do we really need this ? This is already in asm/kvm_host.h. + +/* + * Given the exclude_{host,guest} attributes, determine if we are going + * to need to switch counters at guest entry/exit. 
+ */ +static bool kvm_pmu_switch_needed(struct perf_event_attr *attr) +{ + /* Only switch if attributes are different */ + return (attr->exclude_host ^ attr->exclude_guest); I back Julien's suggestion to keep this simple. +} + +/* + * Add events to track that we may want to switch at guest entry/exit + * time. + */ +void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) +{ + struct kvm_host_data *ctx = this_cpu_ptr(&kvm_host_data); + + if (!kvm_pmu_switch_needed(attr)) + return; + + if (!attr->exclude_host) + ctx->pmu_events.events_host |= set; + if (!attr->exclude_guest) + ctx->pmu_events.events_guest |= set; +} + +/* + * Stop tracking events + */ +void kvm_clr_pmu_events(u32 clr) +{ + struct kvm_host_data *ctx = this_cpu_ptr(&kvm_host_data); + + ctx->pmu_events.events_host &= ~clr; + ctx->pmu_events.events_guest &= ~clr; +} Rest looks fine. Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2
Hi Leo, On 3/11/19 10:39 AM, Leo Yan wrote: > Hi Auger, > > On Mon, Mar 11, 2019 at 09:23:20AM +0100, Auger Eric wrote: > > [...] > >>> P.s. I also checked the sysfs node and found it doesn't contain node >>> 'iommu_group': >>> >>> # ls /sys/bus/pci/devices/\:08\:00.0/iommu_group >>> ls: cannot access '/sys/bus/pci/devices/:08:00.0/iommu_group': No >>> such file or directory >> >> please can you give the output of the following command: >> find /sys/kernel/iommu_groups/ > > I get below result on Juno board: > > root@debian:~# find /sys/kernel/iommu_groups/ > /sys/kernel/iommu_groups/ > /sys/kernel/iommu_groups/1 > /sys/kernel/iommu_groups/1/devices > /sys/kernel/iommu_groups/1/devices/2007.etr > /sys/kernel/iommu_groups/1/type > /sys/kernel/iommu_groups/1/reserved_regions > /sys/kernel/iommu_groups/0 > /sys/kernel/iommu_groups/0/devices > /sys/kernel/iommu_groups/0/devices/7ffb.ohci > /sys/kernel/iommu_groups/0/devices/7ffc.ehci > /sys/kernel/iommu_groups/0/type > /sys/kernel/iommu_groups/0/reserved_regions > > So the 'iommu_groups' is not created for pci-e devices, right? Yes that's correct. > > Will debug into dt binding and related code and keep posted here. OK Thanks Eric > >> when booting your host without noiommu=true >> >> At first sight I would say you have trouble with your iommu groups. > > Thanks a lot for guidance. > Leo Yan
Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2
Hi Auger, On Mon, Mar 11, 2019 at 09:23:20AM +0100, Auger Eric wrote: [...] > > P.s. I also checked the sysfs node and found it doesn't contain node > > 'iommu_group': > > > > # ls /sys/bus/pci/devices/\:08\:00.0/iommu_group > > ls: cannot access '/sys/bus/pci/devices/:08:00.0/iommu_group': No > > such file or directory > > please can you give the output of the following command: > find /sys/kernel/iommu_groups/ I get below result on Juno board: root@debian:~# find /sys/kernel/iommu_groups/ /sys/kernel/iommu_groups/ /sys/kernel/iommu_groups/1 /sys/kernel/iommu_groups/1/devices /sys/kernel/iommu_groups/1/devices/2007.etr /sys/kernel/iommu_groups/1/type /sys/kernel/iommu_groups/1/reserved_regions /sys/kernel/iommu_groups/0 /sys/kernel/iommu_groups/0/devices /sys/kernel/iommu_groups/0/devices/7ffb.ohci /sys/kernel/iommu_groups/0/devices/7ffc.ehci /sys/kernel/iommu_groups/0/type /sys/kernel/iommu_groups/0/reserved_regions So the 'iommu_groups' is not created for pci-e devices, right? Will debug into dt binding and related code and keep posted here. > when booting your host without noiommu=true > > At first sight I would say you have trouble with your iommu groups. Thanks a lot for guidance. Leo Yan
Re: [PATCH v11 6/8] arm64: KVM: Enable VHE support for :G/:H perf event modifiers
Hi Andrew, On 08/03/2019 12:07, Andrew Murray wrote: > With VHE different exception levels are used between the host (EL2) and > guest (EL1) with a shared exception level for userpace (EL0). We can take > advantage of this and use the PMU's exception level filtering to avoid > enabling/disabling counters in the world-switch code. Instead we just > modify the counter type to include or exclude EL0 at vcpu_{load,put} time. > > We also ensure that trapped PMU system register writes do not re-enable > EL0 when reconfiguring the backing perf events. > > This approach completely avoids blackout windows seen with !VHE. > > Suggested-by: Christoffer Dall > Signed-off-by: Andrew Murray > --- > arch/arm/include/asm/kvm_host.h | 3 ++ > arch/arm64/include/asm/kvm_host.h | 5 +- > arch/arm64/kernel/perf_event.c| 6 ++- > arch/arm64/kvm/pmu.c | 87 ++- > arch/arm64/kvm/sys_regs.c | 3 ++ > virt/kvm/arm/arm.c| 2 + > 6 files changed, 102 insertions(+), 4 deletions(-) > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > index a358cb15bb0d..3ce429954306 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -326,6 +326,9 @@ static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu > *vcpu) {} > static inline void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) {} > static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {} > > +static inline void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) {} > +static inline void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) {} > + > static inline void kvm_arm_vhe_guest_enter(void) {} > static inline void kvm_arm_vhe_guest_exit(void) {} > > diff --git a/arch/arm64/include/asm/kvm_host.h > b/arch/arm64/include/asm/kvm_host.h > index 7ca4e094626d..d631528898b5 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -487,7 +487,7 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu); > > static inline bool 
kvm_pmu_counter_defered(struct perf_event_attr *attr) > { > - return attr->exclude_host; > + return (!has_vhe() && attr->exclude_host); > } > > #ifdef CONFIG_KVM /* Avoid conflicts with core headers if CONFIG_KVM=n */ > @@ -501,6 +501,9 @@ void kvm_clr_pmu_events(u32 clr); > > void __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt); > bool __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt); > + > +void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu); > +void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu); > #else > static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) > {} > static inline void kvm_clr_pmu_events(u32 clr) {} > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c > index 64f02a9fd7cd..a121a82fc54c 100644 > --- a/arch/arm64/kernel/perf_event.c > +++ b/arch/arm64/kernel/perf_event.c > @@ -847,8 +847,12 @@ static int armv8pmu_set_event_filter(struct > hw_perf_event *event, >* with other architectures (x86 and Power). >*/ > if (is_kernel_in_hyp_mode()) { > - if (!attr->exclude_kernel) > + if (!attr->exclude_kernel && !attr->exclude_host) > config_base |= ARMV8_PMU_INCLUDE_EL2; > + if (attr->exclude_guest) > + config_base |= ARMV8_PMU_EXCLUDE_EL1; > + if (attr->exclude_host) > + config_base |= ARMV8_PMU_EXCLUDE_EL0; > } else { > if (!attr->exclude_hv && !attr->exclude_host) > config_base |= ARMV8_PMU_INCLUDE_EL2; > diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c > index a1cee7919953..a0830c70ece5 100644 > --- a/arch/arm64/kvm/pmu.c > +++ b/arch/arm64/kvm/pmu.c > @@ -12,11 +12,19 @@ > DECLARE_PER_CPU(kvm_host_data_t, kvm_host_data); > > /* > - * Given the exclude_{host,guest} attributes, determine if we are going > - * to need to switch counters at guest entry/exit. > + * Given the perf event attributes and system type, determine > + * if we are going to need to switch counters at guest entry/exit. 
> */ > static bool kvm_pmu_switch_needed(struct perf_event_attr *attr) > { > + /** > + * With VHE the guest kernel runs at EL1 and the host at EL2, > + * where user (EL0) is excluded then we have no reason to switch > + * counters. > + */ > + if (has_vhe() && attr->exclude_user) > + return false; > + > /* Only switch if attributes are different */ > return (attr->exclude_host ^ attr->exclude_guest); > } > @@ -87,3 +95,78 @@ void __hyp_text __pmu_switch_to_host(struct > kvm_cpu_context *host_ctxt) > write_sysreg(pmu->events_host, pmcntenset_el0); > } > > +/* > + * Modify ARMv8 PMU events to include EL0 counting > + */ > +static void kvm_vcpu_pmu_enable_el0(unsigned long events) > +{ > + u64 typer; > + u32 counter; > + > +
Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2
Hi Leo, On 3/11/19 7:42 AM, Leo Yan wrote: > Hi all, > > I am trying to enable PCI-e device pass-through mode with KVM, since > the Juno-r2 board has a PCI-e bus, so I first try to use vfio to > pass through the network card on the PCI-e bus. > > According to the Juno-r2 board TRM [1], there is a CoreLink MMU-401 (SMMU) > between the PCI-e devices and the CCI bus; IIUC, the PCI-e device and the SMMU can > be used for vfio for address isolation and from a hardware perspective it > is sufficient to support pass-through mode. > > I followed Eric's blog [2] for 'VFIO-PCI driver binding', so I > executed the below commands on the Juno-r2 board: > > echo vfio-pci > /sys/bus/pci/devices/\:08\:00.0/driver_override > echo :08:00.0 > /sys/bus/pci/drivers/sky2/unbind > echo :08:00.0 > /sys/bus/pci/drivers_probe > > But at the last command for vfio probing, it reports failure as below: > > [ 21.553889] sky2 :08:00.0 enp8s0: disabling interface > [ 21.616720] vfio-pci: probe of :08:00.0 failed with error -22 > > I looked into the code; though 'dev->bus->iommu_ops' points to the > data structure 'arm_smmu_ops', 'dev->iommu_group' is NULL, thus the > probe function returns failure with the below flow: > > vfio_pci_probe() > `-> vfio_iommu_group_get() > `-> iommu_group_get() > `-> return NULL; > > Alternatively, if I enable the kconfig CONFIG_VFIO_NOIOMMU & set the global > variable 'noiommu' = true, the probe function still returns an error; since > the function iommu_present(dev->bus) returns 'arm_smmu_ops', you > can see the code will run into the below logic: > > vfio_iommu_group_get() > { > group = iommu_group_get(dev); > > #ifdef CONFIG_VFIO_NOIOMMU > > /* >* With noiommu enabled, an IOMMU group will be created for a device >* that doesn't already have one and doesn't have an iommu_ops on their >* bus. We set iommudata simply to be able to identify these groups >* as special use and for reclamation later. 
>*/ > if (group || !noiommu || iommu_present(dev->bus)) > return group; ==> returns 'group', and 'group' is NULL > > [...] > } > > So either using the SMMU or with the kernel config CONFIG_VFIO_NOIOMMU, neither can > bind the vfio driver for the network card device on the Juno-r2 board. > > P.s. I also checked the sysfs node and found it doesn't contain node > 'iommu_group': > > # ls /sys/bus/pci/devices/\:08\:00.0/iommu_group > ls: cannot access '/sys/bus/pci/devices/:08:00.0/iommu_group': No > such file or directory please can you give the output of the following command: find /sys/kernel/iommu_groups/ when booting your host without noiommu=true At first sight I would say you have trouble with your iommu groups. Thanks Eric > > Could you give some suggestions for this so that I can proceed? I'd very much > appreciate any comment. > > Thanks, > Leo Yan > > [1] > http://infocenter.arm.com/help/topic/com.arm.doc.ddi0515f/DDI0515F_juno_arm_development_platform_soc_trm.pdf > [2] https://www.linaro.org/blog/kvm-pciemsi-passthrough-armarm64/