[PATCH] KVM: PPC: Book3S HV: Fix decrementer migration
We used to have a workaround[1] for a hang during migration that was
rendered ineffective when we converted the decrementer expiry to be
relative to guest timebase. The point of the workaround was that, in
the absence of an explicit decrementer expiry value provided by
userspace during migration, KVM needs to initialize dec_expires to a
value that results in an expired decrementer after subtracting the
current guest timebase. That stops the vcpu from hanging after
migration due to a decrementer that is too large.

Now that dec_expires is relative to guest timebase, its initialization
needs to be guest timebase-relative as well; otherwise we end up with
a decrementer expiry that is still larger than the guest timebase.

1- https://git.kernel.org/torvalds/c/5855564c8ab2

Fixes: 3c1a4322bba7 ("KVM: PPC: Book3S HV: Change dec_expires to be relative to guest timebase")
Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/book3s_hv.c | 18 ++++++++++++++++--
 arch/powerpc/kvm/powerpc.c   |  1 -
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 57d0835e56fd..917abda9e5ce 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2517,10 +2517,24 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		r = set_vpa(vcpu, &vcpu->arch.dtl, addr, len);
 		break;
 	case KVM_REG_PPC_TB_OFFSET:
+	{
 		/* round up to multiple of 2^24 */
-		vcpu->arch.vcore->tb_offset =
-			ALIGN(set_reg_val(id, *val), 1UL << 24);
+		u64 tb_offset = ALIGN(set_reg_val(id, *val), 1UL << 24);
+
+		/*
+		 * Now that we know the timebase offset, update the
+		 * decrementer expiry with a guest timebase value. If
+		 * the userspace does not set DEC_EXPIRY, this ensures
+		 * a migrated vcpu at least starts with an expired
+		 * decrementer, which is better than a large one that
+		 * causes a hang.
+		 */
+		if (!vcpu->arch.dec_expires && tb_offset)
+			vcpu->arch.dec_expires = get_tb() + tb_offset;
+
+		vcpu->arch.vcore->tb_offset = tb_offset;
 		break;
+	}
 	case KVM_REG_PPC_LPCR:
 		kvmppc_set_lpcr(vcpu, set_reg_val(id, *val), true);
 		break;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index fb1490761c87..757491dd6b7b 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -786,7 +786,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	hrtimer_init(&vcpu->arch.dec_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
 	vcpu->arch.dec_timer.function = kvmppc_decrementer_wakeup;
-	vcpu->arch.dec_expires = get_tb();
 
 #ifdef CONFIG_KVM_EXIT_TIMING
 	mutex_init(&vcpu->arch.exit_timing_lock);
-- 
2.35.3
Re: [PATCH v2 2/2] powerpc/kvm: don't crash on missing rng, and use darn
"Jason A. Donenfeld" writes:

> On POWER8 systems that don't have ibm,power-rng available, a guest that
> ignores the KVM_CAP_PPC_HWRNG flag and calls H_RANDOM anyway will
> dereference a NULL pointer. And on machines with darn instead of
> ibm,power-rng, H_RANDOM won't work at all.
>
> This patch kills two birds with one stone, by routing H_RANDOM calls to
> ppc_md.get_random_seed, and doing the real mode check inside of it.
>
> Cc: sta...@vger.kernel.org # v4.1+
> Cc: Michael Ellerman
> Fixes: e928e9cb3601 ("KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM
> implementation.")
> Signed-off-by: Jason A. Donenfeld

Reviewed-by: Fabiano Rosas
Re: [PATCH v2 1/2] powerpc/powernv: rename remaining rng powernv_ functions to pnv_
"Jason A. Donenfeld" writes:

> The preferred nomenclature is pnv_, not powernv_, but rng.c used
> powernv_ for some reason, which isn't consistent with the rest. A recent
> commit added a few pnv_ functions to rng.c, making the file a bit of a
> mishmash. This commit just replaces the rest of them.
>
> Cc: Michael Ellerman
> Fixes: f3eac426657 ("powerpc/powernv: wire up rng during setup_arch")
> Signed-off-by: Jason A. Donenfeld

Reviewed-by: Fabiano Rosas
[PATCH v2] KVM: PPC: Align pt_regs in kvm_vcpu_arch structure
The H_ENTER_NESTED hypercall receives as its second parameter the
address of a region of memory containing the values for the nested
guest privileged registers. We currently use the pt_regs structure
contained within kvm_vcpu_arch for that end.

Most hypercalls that receive a memory address expect that region not
to cross a 4K page boundary. We would like H_ENTER_NESTED to follow
the same pattern, so this patch ensures the pt_regs structure sits
within a page.

Note: the pt_regs structure is currently 384 bytes in size, so
aligning to 512 is sufficient to ensure it will not cross a 4K page
and avoids punching too big a hole in struct kvm_vcpu_arch.

Signed-off-by: Fabiano Rosas
Signed-off-by: Murilo Opsfelder Araújo
---
v2:
- updated commit message to inform the rationale for aligning to 512;
- added Murilo's sign-off which I had forgotten, we worked on this
  together.
---
 arch/powerpc/include/asm/kvm_host.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2909a88acd16..2c7219cef4ec 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -523,7 +523,11 @@ struct kvm_vcpu_arch {
 	struct kvmppc_book3s_shadow_vcpu *shadow_vcpu;
 #endif
 
-	struct pt_regs regs;
+	/*
+	 * This is passed along to the HV via H_ENTER_NESTED. Align to
+	 * prevent it crossing a real 4K page.
+	 */
+	struct pt_regs regs __aligned(512);
 
 	struct thread_fp_state fp;
-- 
2.35.3
Re: [PATCH] powerpc/kvm: don't crash on missing rng, and use darn
"Jason A. Donenfeld" writes: > On POWER8 systems that don't have ibm,power-rng available, a guest that > ignores the KVM_CAP_PPC_HWRNG flag and calls H_RANDOM anyway will > dereference a NULL pointer. And on machines with darn instead of > ibm,power-rng, H_RANDOM won't work at all. > > This patch kills two birds with one stone, by routing H_RANDOM calls to > ppc_md.get_random_seed, and doing the real mode check inside of it. > > Cc: sta...@vger.kernel.org # v4.1+ > Cc: Michael Ellerman > Fixes: e928e9cb3601 ("KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM > implementation.") > Signed-off-by: Jason A. Donenfeld > --- > > This patch must be applied ontop of: > 1) > https://github.com/linuxppc/linux/commit/f3eac426657d985b97c92fa5f7ae1d43f04721f3 > 2) https://lore.kernel.org/all/20220622102532.173393-1-ja...@zx2c4.com/ > > > arch/powerpc/include/asm/archrandom.h | 5 > arch/powerpc/kvm/book3s_hv_builtin.c | 5 ++-- > arch/powerpc/platforms/powernv/rng.c | 33 +++ > 3 files changed, 11 insertions(+), 32 deletions(-) > > diff --git a/arch/powerpc/include/asm/archrandom.h > b/arch/powerpc/include/asm/archrandom.h > index 11d4815841ab..3af27bb84a3d 100644 > --- a/arch/powerpc/include/asm/archrandom.h > +++ b/arch/powerpc/include/asm/archrandom.h > @@ -38,12 +38,7 @@ static inline bool __must_check > arch_get_random_seed_int(unsigned int *v) > #endif /* CONFIG_ARCH_RANDOM */ > > #ifdef CONFIG_PPC_POWERNV > -int pnv_hwrng_present(void); > int pnv_get_random_long(unsigned long *v); > -int pnv_get_random_real_mode(unsigned long *v); > -#else > -static inline int pnv_hwrng_present(void) { return 0; } > -static inline int pnv_get_random_real_mode(unsigned long *v) { return 0; } > #endif > > #endif /* _ASM_POWERPC_ARCHRANDOM_H */ > diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c > b/arch/powerpc/kvm/book3s_hv_builtin.c > index 799d40c2ab4f..1c6672826db5 100644 > --- a/arch/powerpc/kvm/book3s_hv_builtin.c > +++ b/arch/powerpc/kvm/book3s_hv_builtin.c > @@ -176,13 +176,14 @@ 
EXPORT_SYMBOL_GPL(kvmppc_hcall_impl_hv_realmode); > > int kvmppc_hwrng_present(void) > { > - return pnv_hwrng_present(); > + return ppc_md.get_random_seed != NULL; > } > EXPORT_SYMBOL_GPL(kvmppc_hwrng_present); > > long kvmppc_rm_h_random(struct kvm_vcpu *vcpu) > { > - if (pnv_get_random_real_mode(&vcpu->arch.regs.gpr[4])) > + if (ppc_md.get_random_seed && > + ppc_md.get_random_seed(&vcpu->arch.regs.gpr[4])) > return H_SUCCESS; This is the same as arch_get_random_seed_long, perhaps you could use it instead. Otherwise, the archrandom.h include is not needed anymore and could be replaced with #include for ppc_md. > > return H_HARDWARE; > diff --git a/arch/powerpc/platforms/powernv/rng.c > b/arch/powerpc/platforms/powernv/rng.c > index 868bb9777425..c748567cd47e 100644 > --- a/arch/powerpc/platforms/powernv/rng.c > +++ b/arch/powerpc/platforms/powernv/rng.c > @@ -29,15 +29,6 @@ struct pnv_rng { > > static DEFINE_PER_CPU(struct pnv_rng *, pnv_rng); > > -int pnv_hwrng_present(void) > -{ > - struct pnv_rng *rng; > - > - rng = get_cpu_var(pnv_rng); > - put_cpu_var(rng); > - return rng != NULL; > -} > - > static unsigned long rng_whiten(struct pnv_rng *rng, unsigned long val) > { > unsigned long parity; > @@ -58,17 +49,6 @@ static unsigned long rng_whiten(struct pnv_rng *rng, > unsigned long val) > return val; > } > > -int pnv_get_random_real_mode(unsigned long *v) > -{ > - struct pnv_rng *rng; > - > - rng = raw_cpu_read(pnv_rng); > - > - *v = rng_whiten(rng, __raw_rm_readq(rng->regs_real)); > - > - return 1; > -} > - > static int pnv_get_random_darn(unsigned long *v) > { > unsigned long val; > @@ -105,11 +85,14 @@ int pnv_get_random_long(unsigned long *v) > { > struct pnv_rng *rng; > > - rng = get_cpu_var(pnv_rng); > - > - *v = rng_whiten(rng, in_be64(rng->regs)); > - > - put_cpu_var(rng); > + if (mfmsr() & MSR_DR) { > + rng = raw_cpu_read(pnv_rng); > + *v = rng_whiten(rng, __raw_rm_readq(rng->regs_real)); > + } else { > + rng = get_cpu_var(pnv_rng); > + *v = 
rng_whiten(rng, in_be64(rng->regs)); > + put_cpu_var(rng); > + } > > return 1; > }
[PATCH] KVM: PPC: Book3S HV: tracing: Add missing hcall names
The kvm_trace_symbol_hcall macro is missing several of the hypercalls
defined in hvcall.h. Add the most common ones that are issued during
guest lifetime, including the ones that are only used by QEMU and
SLOF.

Signed-off-by: Fabiano Rosas
---
 arch/powerpc/include/asm/hvcall.h |  8 ++++++++
 arch/powerpc/kvm/trace_hv.h       | 21 ++++++++++++++++++++-
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index d92a20a85395..1d454c70e7c6 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -350,6 +350,14 @@
 /* Platform specific hcalls, used by KVM */
 #define H_RTAS			0xf000
 
+/*
+ * Platform specific hcalls, used by QEMU/SLOF. These are ignored by
+ * KVM and only kept here so we can identify them during tracing.
+ */
+#define H_LOGICAL_MEMOP		0xF001
+#define H_CAS			0xF002
+#define H_UPDATE_DT		0xF003
+
 /* "Platform specific hcalls", provided by PHYP */
 #define H_GET_24X7_CATALOG_PAGE	0xF078
 #define H_GET_24X7_DATA		0xF07C
diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
index 32e2cb5811cc..8d57c8428531 100644
--- a/arch/powerpc/kvm/trace_hv.h
+++ b/arch/powerpc/kvm/trace_hv.h
@@ -94,6 +94,7 @@
 	{H_GET_HCA_INFO,	"H_GET_HCA_INFO"}, \
 	{H_GET_PERF_COUNT,	"H_GET_PERF_COUNT"}, \
 	{H_MANAGE_TRACE,	"H_MANAGE_TRACE"}, \
+	{H_GET_CPU_CHARACTERISTICS, "H_GET_CPU_CHARACTERISTICS"}, \
 	{H_FREE_LOGICAL_LAN_BUFFER, "H_FREE_LOGICAL_LAN_BUFFER"}, \
 	{H_QUERY_INT_STATE,	"H_QUERY_INT_STATE"}, \
 	{H_POLL_PENDING,	"H_POLL_PENDING"}, \
@@ -125,7 +126,25 @@
 	{H_COP,			"H_COP"}, \
 	{H_GET_MPP_X,		"H_GET_MPP_X"}, \
 	{H_SET_MODE,		"H_SET_MODE"}, \
-	{H_RTAS,		"H_RTAS"}
+	{H_REGISTER_PROC_TBL,	"H_REGISTER_PROC_TBL"}, \
+	{H_QUERY_VAS_CAPABILITIES, "H_QUERY_VAS_CAPABILITIES"}, \
+	{H_INT_GET_SOURCE_INFO,	"H_INT_GET_SOURCE_INFO"}, \
+	{H_INT_SET_SOURCE_CONFIG, "H_INT_SET_SOURCE_CONFIG"}, \
+	{H_INT_GET_QUEUE_INFO,	"H_INT_GET_QUEUE_INFO"}, \
+	{H_INT_SET_QUEUE_CONFIG, "H_INT_SET_QUEUE_CONFIG"}, \
+	{H_INT_ESB,		"H_INT_ESB"}, \
+	{H_INT_RESET,		"H_INT_RESET"}, \
+	{H_RPT_INVALIDATE,	"H_RPT_INVALIDATE"}, \
+	{H_RTAS,		"H_RTAS"}, \
+	{H_LOGICAL_MEMOP,	"H_LOGICAL_MEMOP"}, \
+	{H_CAS,			"H_CAS"}, \
+	{H_UPDATE_DT,		"H_UPDATE_DT"}, \
+	{H_GET_PERF_COUNTER_INFO, "H_GET_PERF_COUNTER_INFO"}, \
+	{H_SET_PARTITION_TABLE,	"H_SET_PARTITION_TABLE"}, \
+	{H_ENTER_NESTED,	"H_ENTER_NESTED"}, \
+	{H_TLB_INVALIDATE,	"H_TLB_INVALIDATE"}, \
+	{H_COPY_TOFROM_GUEST,	"H_COPY_TOFROM_GUEST"}
+
 #define kvm_trace_symbol_kvmret \
 	{RESUME_GUEST,		"RESUME_GUEST"}, \
-- 
2.35.3
Re: [PATCH 4/4] powerpc/pseries: Implement CONFIG_PARAVIRT_TIME_ACCOUNTING
Nicholas Piggin writes: > CONFIG_VIRT_CPU_ACCOUNTING_GEN under pseries does not implement > stolen time accounting. Implement it with the paravirt time > accounting option. > > Signed-off-by: Nicholas Piggin > --- > .../admin-guide/kernel-parameters.txt | 6 +++--- > arch/powerpc/include/asm/paravirt.h | 12 > arch/powerpc/platforms/pseries/Kconfig| 8 > arch/powerpc/platforms/pseries/lpar.c | 11 +++ > arch/powerpc/platforms/pseries/setup.c| 19 +++ > 5 files changed, 53 insertions(+), 3 deletions(-) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt > b/Documentation/admin-guide/kernel-parameters.txt > index 3f1cc5e317ed..855fc7b02261 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -3604,9 +3604,9 @@ > [X86,PV_OPS] Disable paravirtualized VMware scheduler > clock and use the default one. > > - no-steal-acc[X86,PV_OPS,ARM64] Disable paravirtualized steal time > - accounting. steal time is computed, but won't > - influence scheduler behaviour > + no-steal-acc[X86,PV_OPS,ARM64,PPC/PSERIES] Disable paravirtualized > + steal time accounting. steal time is computed, but > + won't influence scheduler behaviour > > nolapic [X86-32,APIC] Do not enable or use the local APIC. 
> > diff --git a/arch/powerpc/include/asm/paravirt.h > b/arch/powerpc/include/asm/paravirt.h > index eb7df559ae74..f5ba1a3c41f8 100644 > --- a/arch/powerpc/include/asm/paravirt.h > +++ b/arch/powerpc/include/asm/paravirt.h > @@ -21,6 +21,18 @@ static inline bool is_shared_processor(void) > return static_branch_unlikely(&shared_processor); > } > > +#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING > +extern struct static_key paravirt_steal_enabled; > +extern struct static_key paravirt_steal_rq_enabled; > + > +u64 pseries_paravirt_steal_clock(int cpu); > + > +static inline u64 paravirt_steal_clock(int cpu) > +{ > + return pseries_paravirt_steal_clock(cpu); > +} > +#endif > + > /* If bit 0 is set, the cpu has been ceded, conferred, or preempted */ > static inline u32 yield_count_of(int cpu) > { > diff --git a/arch/powerpc/platforms/pseries/Kconfig > b/arch/powerpc/platforms/pseries/Kconfig > index f7fd91d153a4..d4306ebdca5e 100644 > --- a/arch/powerpc/platforms/pseries/Kconfig > +++ b/arch/powerpc/platforms/pseries/Kconfig > @@ -24,13 +24,21 @@ config PPC_PSERIES > select SWIOTLB > default y > > +config PARAVIRT > + bool > + In file included from ../kernel/sched/build_utility.c:53: ../kernel/sched/sched.h:87:11: fatal error: asm/paravirt_api_clock.h: No such file or directory 87 | # include $ find . -name paravirt_api_clock.h ./arch/arm64/include/asm/paravirt_api_clock.h ./arch/x86/include/asm/paravirt_api_clock.h ./arch/arm/include/asm/paravirt_api_clock.h > config PARAVIRT_SPINLOCKS > bool > > +config PARAVIRT_TIME_ACCOUNTING > + select PARAVIRT > + bool > + > config PPC_SPLPAR > bool "Support for shared-processor logical partitions" > depends on PPC_PSERIES > select PARAVIRT_SPINLOCKS if PPC_QUEUED_SPINLOCKS > + select PARAVIRT_TIME_ACCOUNTING if VIRT_CPU_ACCOUNTING_GEN > default y > help > Enabling this option will make the kernel run more efficiently
Re: [PATCH 3/4] KVM: PPC: Book3S HV: Implement scheduling wait interval counters in the VPA
Nicholas Piggin writes:

> PAPR specifies accumulated virtual processor wait intervals that relate
> to partition scheduling interval times. Implement these counters in the
> same way as they are reported by dtl.
>
> Signed-off-by: Nicholas Piggin

Reviewed-by: Fabiano Rosas
Re: [PATCH 2/4] powerpc/pseries: Add wait interval counters to VPA
Nicholas Piggin writes:

> The hypervisor exposes accumulated partition scheduling interval times
> in the VPA (lppaca). These can be used to implement a simple stolen time
> in the guest without complex and costly dtl scanning.
>
> Signed-off-by: Nicholas Piggin
> ---
>  arch/powerpc/include/asm/lppaca.h | 10 +++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
> index c390ec377bae..34d44cb17c87 100644
> --- a/arch/powerpc/include/asm/lppaca.h
> +++ b/arch/powerpc/include/asm/lppaca.h
> @@ -104,14 +104,18 @@ struct lppaca {
>  	volatile __be32 dispersion_count; /* dispatch changed physical cpu */
>  	volatile __be64 cmo_faults;	/* CMO page fault count */
>  	volatile __be64 cmo_fault_time;	/* CMO page fault time */
> -	u8	reserved10[104];
> +	u8	reserved10[64];	/* [S]PURR expropriated/donated */
> +	volatile __be64 enqueue_dispatch_tb; /* Total TB enqueue->dispatch */
> +	volatile __be64 ready_enqueue_tb; /* Total TB ready->enqueue */
> +	volatile __be64 wait_ready_tb;	/* Total TB wait->ready */

This last one is unused, but I assume you are adding it anyway because
it could later be added to lparcfg. So:

Reviewed-by: Fabiano Rosas
Re: [PATCH 1/4] KVM: PPC: Book3S HV P9: Restore stolen time logging in dtl
Nicholas Piggin writes:

> Stolen time logging in dtl was removed from the P9 path, so guests had
> no stolen time accounting. Add it back in a simpler way that still
> avoids locks and per-core accounting code.
>
> Fixes: ecb6a7207f92 ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic")
> Signed-off-by: Nicholas Piggin
> ---
>  arch/powerpc/kvm/book3s_hv.c | 49 +++++++++++++++++++++++++++++++++---
>  1 file changed, 45 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 6fa518f6501d..0a0835edb64a 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -248,6 +248,7 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
>
>  /*
>   * We use the vcpu_load/put functions to measure stolen time.
> + *
>   * Stolen time is counted as time when either the vcpu is able to
>   * run as part of a virtual core, but the task running the vcore
>   * is preempted or sleeping, or when the vcpu needs something done
> @@ -277,6 +278,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
>   * lock. The stolen times are measured in units of timebase ticks.
>   * (Note that the != TB_NIL checks below are purely defensive;
>   * they should never fail.)
> + *
> + * The POWER9 path is simpler, one vcpu per virtual core so the
> + * former case does not exist. If a vcpu is preempted when it is
> + * BUSY_IN_HOST and not ceded or otherwise blocked, then accumulate
> + * the stolen cycles in busy_stolen. RUNNING is not a preemptible
> + * state in the P9 path.

Do you mean RUNNABLE? The only RUNNING state I see is in relation to
the vcore.
[PATCH 5/5] KVM: PPC: Book3S HV: Provide more detailed timings for P9 entry path
Alter the data collection points for the debug timing code in the P9 path to be more in line with what the code does. The points where we accumulate time are now the following: vcpu_entry: From vcpu_run_hv entry until the start of the inner loop; guest_entry: From the start of the inner loop until the guest entry in asm; in_guest: From the guest entry in asm until the return to KVM C code; guest_exit: From the return into KVM C code until the corresponding hypercall/page fault handling or re-entry into the guest; hypercall: Time spent handling hcalls in the kernel (hcalls can go to QEMU, not accounted here); page_fault: Time spent handling page faults; vcpu_exit: vcpu_run_hv exit (almost no code here currently). Like before, these are exposed in debugfs in a file called "timings". There are four values: - number of occurrences of the accumulation point; - total time the vcpu spent in the phase in ns; - shortest time the vcpu spent in the phase in ns; - longest time the vcpu spent in the phase in ns; === Before: rm_entry: 53132 16793518 256 4060 rm_intr: 53132 2125914 22 340 rm_exit: 53132 24108344 374 2180 guest: 53132 40980507996 404 9997650 cede: 0 0 0 0 After: vcpu_entry: 34637 7716108 178 4416 guest_entry: 52414 49365608 324 747542 in_guest: 52411 40828715840 258 9997480 guest_exit: 52410 19681717182 826 102496674 vcpu_exit: 34636 1744462 38 182 hypercall: 45712 22878288 38 1307962 page_fault: 992 04034 568 168688 With just one instruction (hcall): vcpu_entry: 1 942 942 942 guest_entry: 1 4044 4044 4044 in_guest: 1 1540 1540 1540 guest_exit: 1 3542 3542 3542 vcpu_exit: 1 80 80 80 hypercall: 0 0 0 0 page_fault: 0 0 0 0 === Signed-off-by: Fabiano Rosas --- arch/powerpc/include/asm/kvm_host.h | 12 +++- arch/powerpc/kvm/Kconfig | 9 + arch/powerpc/kvm/book3s_hv.c | 23 ++- arch/powerpc/kvm/book3s_hv_p9_entry.c | 14 -- 4 files changed, 34 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 
37f03665bfa2..de2b226aa350 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -827,11 +827,13 @@ struct kvm_vcpu_arch { struct kvmhv_tb_accumulator *cur_activity; /* What we're timing */ u64 cur_tb_start; /* when it started */ #ifdef CONFIG_KVM_BOOK3S_HV_P9_TIMING - struct kvmhv_tb_accumulator rm_entry; /* real-mode entry code */ - struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */ - struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */ - struct kvmhv_tb_accumulator guest_time; /* guest execution */ - struct kvmhv_tb_accumulator cede_time; /* time napping inside guest */ + struct kvmhv_tb_accumulator vcpu_entry; + struct kvmhv_tb_accumulator vcpu_exit; + struct kvmhv_tb_accumulator in_guest; + struct kvmhv_tb_accumulator hcall; + struct kvmhv_tb_accumulator pg_fault; + struct kvmhv_tb_accumulator guest_entry; + struct kvmhv_tb_accumulator guest_exit; #else struct kvmhv_tb_accumulator rm_entry; /* real-mode entry code */ struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */ diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index 191347f44731..cedf1e0f50e1 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -135,10 +135,11 @@ config KVM_BOOK3S_HV_P9_TIMING select KVM_BOOK3S_HV_EXIT_TIMING depends on KVM_BOOK3S_HV_POSSIBLE && DEBUG_FS help - Calculate time taken for each vcpu in various parts of the - code. The total, minimum and maximum times in nanoseconds - together with the number of executions are reported in debugfs in - kvm/vm#/vcpu#/timings. + Calculate time taken for each vcpu during vcpu entry and + exit, time spent inside the guest and time spent handling + hypercalls and page faults. The total, minimum and maximum + times in nanoseconds together with the number of executions + are reported in debugfs in kvm/vm#/vcpu#/timings. If unsure, say N. 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 69a6b40d58b9..f485632f247a 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -2654,11 +2654,13 @@ static struct debugfs_timings_element { size_t offset; } timings[] = { #ifdef CONFIG_KVM_BOOK3S_HV_P9_TIMING - {"rm_entry",offsetof(struct kvm_vcpu, arch.rm_entry)}, - {"rm_intr", offsetof(struct kvm_vcpu, arch.rm_intr)}, - {"rm_exit", offsetof(struct kvm_vcpu, arch.rm_exit)}, - {"guest", offsetof(struct kvm_vcpu, arch.guest_time)}, -
[PATCH 4/5] KVM: PPC: Book3S HV: Expose timing functions to module code
The next patch adds new timing points to the P9 entry path, some of which are in the module code, so we need to export the timing functions. Signed-off-by: Fabiano Rosas --- arch/powerpc/kvm/book3s_hv.h | 10 ++ arch/powerpc/kvm/book3s_hv_p9_entry.c | 11 ++- 2 files changed, 12 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.h b/arch/powerpc/kvm/book3s_hv.h index 6b7f07d9026b..2f2e59d7d433 100644 --- a/arch/powerpc/kvm/book3s_hv.h +++ b/arch/powerpc/kvm/book3s_hv.h @@ -40,3 +40,13 @@ void switch_pmu_to_guest(struct kvm_vcpu *vcpu, struct p9_host_os_sprs *host_os_sprs); void switch_pmu_to_host(struct kvm_vcpu *vcpu, struct p9_host_os_sprs *host_os_sprs); + +#ifdef CONFIG_KVM_BOOK3S_HV_P9_TIMING +void accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next); +#define start_timing(vcpu, next) accumulate_time(vcpu, next) +#define end_timing(vcpu) accumulate_time(vcpu, NULL) +#else +#define accumulate_time(vcpu, next) do {} while (0) +#define start_timing(vcpu, next) do {} while (0) +#define end_timing(vcpu) do {} while (0) +#endif diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c b/arch/powerpc/kvm/book3s_hv_p9_entry.c index f8ce473149b7..8b2a9a360e4e 100644 --- a/arch/powerpc/kvm/book3s_hv_p9_entry.c +++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c @@ -438,7 +438,7 @@ void restore_p9_host_os_sprs(struct kvm_vcpu *vcpu, EXPORT_SYMBOL_GPL(restore_p9_host_os_sprs); #ifdef CONFIG_KVM_BOOK3S_HV_P9_TIMING -static void __accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next) +void accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next) { struct kvmppc_vcore *vc = vcpu->arch.vcore; struct kvmhv_tb_accumulator *curr; @@ -468,14 +468,7 @@ static void __accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator smp_wmb(); curr->seqcount = seq + 2; } - -#define start_timing(vcpu, next) __accumulate_time(vcpu, next) -#define end_timing(vcpu) __accumulate_time(vcpu, NULL) -#define accumulate_time(vcpu, 
next) __accumulate_time(vcpu, next) -#else -#define start_timing(vcpu, next) do {} while (0) -#define end_timing(vcpu) do {} while (0) -#define accumulate_time(vcpu, next) do {} while (0) +EXPORT_SYMBOL_GPL(accumulate_time); #endif static inline u64 mfslbv(unsigned int idx) -- 2.35.1
[PATCH 3/5] KVM: PPC: Book3S HV: Decouple the debug timing from the P8 entry path
We are currently doing the timing for debug purposes of the P9 entry path using the accumulators and terminology defined by the old entry path for P8 machines. Not only the "real-mode" and "napping" mentions are out of place for the P9 Radix entry path but also we cannot change them because the timing code is coupled to the structures defined in struct kvm_vcpu_arch. Add a new CONFIG_KVM_BOOK3S_HV_P9_TIMING to enable the timing code for the P9 entry path. For now, just add the new CONFIG and duplicate the structures. A subsequent patch will add the P9 changes. Signed-off-by: Fabiano Rosas --- arch/powerpc/include/asm/kvm_host.h | 8 arch/powerpc/kvm/Kconfig | 14 +- arch/powerpc/kvm/book3s_hv.c | 13 +++-- arch/powerpc/kvm/book3s_hv_p9_entry.c | 2 +- 4 files changed, 33 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index faf301d0dec0..37f03665bfa2 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -826,11 +826,19 @@ struct kvm_vcpu_arch { #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING struct kvmhv_tb_accumulator *cur_activity; /* What we're timing */ u64 cur_tb_start; /* when it started */ +#ifdef CONFIG_KVM_BOOK3S_HV_P9_TIMING struct kvmhv_tb_accumulator rm_entry; /* real-mode entry code */ struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */ struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */ struct kvmhv_tb_accumulator guest_time; /* guest execution */ struct kvmhv_tb_accumulator cede_time; /* time napping inside guest */ +#else + struct kvmhv_tb_accumulator rm_entry; /* real-mode entry code */ + struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */ + struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */ + struct kvmhv_tb_accumulator guest_time; /* guest execution */ + struct kvmhv_tb_accumulator cede_time; /* time napping inside guest */ +#endif #endif /* CONFIG_KVM_BOOK3S_HV_EXIT_TIMING */ }; diff --git 
a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index 73f8277df7d1..191347f44731 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -130,10 +130,22 @@ config KVM_BOOK3S_64_PR config KVM_BOOK3S_HV_EXIT_TIMING bool +config KVM_BOOK3S_HV_P9_TIMING + bool "Detailed timing for the P9 entry point" + select KVM_BOOK3S_HV_EXIT_TIMING + depends on KVM_BOOK3S_HV_POSSIBLE && DEBUG_FS + help + Calculate time taken for each vcpu in various parts of the + code. The total, minimum and maximum times in nanoseconds + together with the number of executions are reported in debugfs in + kvm/vm#/vcpu#/timings. + + If unsure, say N. + config KVM_BOOK3S_HV_P8_TIMING bool "Detailed timing for hypervisor real-mode code (for POWER8)" select KVM_BOOK3S_HV_EXIT_TIMING - depends on KVM_BOOK3S_HV_POSSIBLE && DEBUG_FS + depends on KVM_BOOK3S_HV_POSSIBLE && DEBUG_FS && !KVM_BOOK3S_HV_P9_TIMING help Calculate time taken for each vcpu in the real-mode guest entry, exit, and interrupt handling code, plus time spent in the guest diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 6fa518f6501d..69a6b40d58b9 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -2653,11 +2653,19 @@ static struct debugfs_timings_element { const char *name; size_t offset; } timings[] = { +#ifdef CONFIG_KVM_BOOK3S_HV_P9_TIMING {"rm_entry",offsetof(struct kvm_vcpu, arch.rm_entry)}, {"rm_intr", offsetof(struct kvm_vcpu, arch.rm_intr)}, {"rm_exit", offsetof(struct kvm_vcpu, arch.rm_exit)}, {"guest", offsetof(struct kvm_vcpu, arch.guest_time)}, {"cede",offsetof(struct kvm_vcpu, arch.cede_time)}, +#else + {"rm_entry",offsetof(struct kvm_vcpu, arch.rm_entry)}, + {"rm_intr", offsetof(struct kvm_vcpu, arch.rm_intr)}, + {"rm_exit", offsetof(struct kvm_vcpu, arch.rm_exit)}, + {"guest", offsetof(struct kvm_vcpu, arch.guest_time)}, + {"cede",offsetof(struct kvm_vcpu, arch.cede_time)}, +#endif }; #define N_TIMINGS (ARRAY_SIZE(timings)) @@ -2776,8 +2784,9 
@@ static const struct file_operations debugfs_timings_ops = { /* Create a debugfs directory for the vcpu */ static int kvmppc_arch_create_vcpu_debugfs_hv(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry) { - debugfs_create_file("timings", 0444, debugfs_dentry, vcpu, - &am
[PATCH 2/5] KVM: PPC: Book3S HV: Add a new config for P8 debug timing
Turn the existing Kconfig KVM_BOOK3S_HV_EXIT_TIMING into KVM_BOOK3S_HV_P8_TIMING in preparation for the addition of a new config for P9 timings. This applies only to P8 code, the generic timing code is still kept under KVM_BOOK3S_HV_EXIT_TIMING. Signed-off-by: Fabiano Rosas --- arch/powerpc/kernel/asm-offsets.c | 2 +- arch/powerpc/kvm/Kconfig| 6 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 24 3 files changed, 18 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index eec536aef83a..8c10f536e478 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -379,7 +379,7 @@ int main(void) OFFSET(VCPU_SPRG2, kvm_vcpu, arch.shregs.sprg2); OFFSET(VCPU_SPRG3, kvm_vcpu, arch.shregs.sprg3); #endif -#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING +#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING OFFSET(VCPU_TB_RMENTRY, kvm_vcpu, arch.rm_entry); OFFSET(VCPU_TB_RMINTR, kvm_vcpu, arch.rm_intr); OFFSET(VCPU_TB_RMEXIT, kvm_vcpu, arch.rm_exit); diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index ddd88179110a..73f8277df7d1 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -128,7 +128,11 @@ config KVM_BOOK3S_64_PR and system calls on the host. 
 config KVM_BOOK3S_HV_EXIT_TIMING
-	bool "Detailed timing for hypervisor real-mode code"
+	bool
+
+config KVM_BOOK3S_HV_P8_TIMING
+	bool "Detailed timing for hypervisor real-mode code (for POWER8)"
+	select KVM_BOOK3S_HV_EXIT_TIMING
 	depends on KVM_BOOK3S_HV_POSSIBLE && DEBUG_FS
 	help
 	  Calculate time taken for each vcpu in the real-mode guest entry,
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index d185dee26026..c34932e31dcd 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -229,14 +229,14 @@ kvm_novcpu_wakeup:
 	cmpdi	r4, 0
 	beq	kvmppc_primary_no_guest
 
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
 	addi	r3, r4, VCPU_TB_RMENTRY
 	bl	kvmhv_start_timing
 #endif
 	b	kvmppc_got_guest
 
 kvm_novcpu_exit:
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
 	ld	r4, HSTATE_KVM_VCPU(r13)
 	cmpdi	r4, 0
 	beq	13f
@@ -515,7 +515,7 @@ kvmppc_hv_entry:
 	li	r6, KVM_GUEST_MODE_HOST_HV
 	stb	r6, HSTATE_IN_GUEST(r13)
 
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
 	/* Store initial timestamp */
 	cmpdi	r4, 0
 	beq	1f
@@ -886,7 +886,7 @@ fast_guest_return:
 	li	r9, KVM_GUEST_MODE_GUEST_HV
 	stb	r9, HSTATE_IN_GUEST(r13)
 
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
 	/* Accumulate timing */
 	addi	r3, r4, VCPU_TB_GUEST
 	bl	kvmhv_accumulate_time
@@ -937,7 +937,7 @@ secondary_too_late:
 	cmpdi	r4, 0
 	beq	11f
 	stw	r12, VCPU_TRAP(r4)
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
 	addi	r3, r4, VCPU_TB_RMEXIT
 	bl	kvmhv_accumulate_time
 #endif
@@ -951,7 +951,7 @@ hdec_soon:
 	li	r12, BOOK3S_INTERRUPT_HV_DECREMENTER
 12:	stw	r12, VCPU_TRAP(r4)
 	mr	r9, r4
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
 	addi	r3, r4, VCPU_TB_RMEXIT
 	bl	kvmhv_accumulate_time
 #endif
@@ -1048,7 +1048,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	li	r0, MSR_RI
 	mtmsrd	r0, 1
 
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
 	addi	r3, r9, VCPU_TB_RMINTR
 	mr	r4, r9
 	bl	kvmhv_accumulate_time
@@ -1127,7 +1127,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 guest_exit_cont:		/* r9 = vcpu, r12 = trap, r13 = paca */
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
 	addi	r3, r9, VCPU_TB_RMEXIT
 	mr	r4, r9
 	bl	kvmhv_accumulate_time
@@ -1487,7 +1487,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	mtspr	SPRN_LPCR,r8
 	isync
 
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
 	/* Finish timing, if we have a vcpu */
 	ld	r4, HSTATE_KVM_VCPU(r13)
 	cmpdi	r4, 0
@@ -2155,7 +2155,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_TM)
 	ld	r4, HSTATE_KVM_VCPU(r13)
 	std	r3, VCPU_DEC_EXPIRES(r4)
 
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
 	ld	r4, HSTATE_KVM_VCPU(r13)
 	addi	r3, r4, VCPU_TB_CEDE
 	bl	kvmhv_accumulate_time
@@ -2223,7 +2223,7 @@ kvm_end_cede:
 	/* get vcpu p
[PATCH 1/5] KVM: PPC: Book3S HV: Fix "rm_exit" entry in debugfs timings
At debugfs/kvm//vcpu0/timings we show how long each part of the code
takes to run:

$ cat /sys/kernel/debug/kvm/*-*/vcpu0/timings
rm_entry: 123785 49398892 118 4898
rm_intr: 123780 6075890 22 390
rm_exit: 0 0 0 0                <-- NOK
guest: 123780 46732919988 402 9997638
cede: 0 0 0 0                   <-- OK, no cede napping in P9

The "rm_exit" is always showing zero because it is the last one and
end_timing does not increment the counter of the previous entry.

We can fix it by calling accumulate_time again instead of end_timing.
That way the counter gets incremented. The rest of the arithmetic can
be ignored because there are no timing points after this and the
accumulators are reset before the next round.

Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 13 ++---
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index a28e5b3daabd..f7591b6c92d1 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -438,15 +438,6 @@ void restore_p9_host_os_sprs(struct kvm_vcpu *vcpu,
 EXPORT_SYMBOL_GPL(restore_p9_host_os_sprs);
 
 #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
-static void __start_timing(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next)
-{
-	struct kvmppc_vcore *vc = vcpu->arch.vcore;
-	u64 tb = mftb() - vc->tb_offset_applied;
-
-	vcpu->arch.cur_activity = next;
-	vcpu->arch.cur_tb_start = tb;
-}
-
 static void __accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next)
 {
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
@@ -478,8 +469,8 @@ static void __accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator
 	curr->seqcount = seq + 2;
 }
 
-#define start_timing(vcpu, next) __start_timing(vcpu, next)
-#define end_timing(vcpu) __start_timing(vcpu, NULL)
+#define start_timing(vcpu, next) __accumulate_time(vcpu, next)
+#define end_timing(vcpu) __accumulate_time(vcpu, NULL)
 #define accumulate_time(vcpu, next) __accumulate_time(vcpu, next)
 #else
 #define start_timing(vcpu, next) do {} while (0)
-- 
2.35.1
[PATCH 0/5] KVM: PPC: Book3S HV: Update debug timing code
We have some debug information at /sys/kernel/debug/kvm//vcpu#/timings
which shows the time it takes to run various parts of the code. That
infrastructure was written in the P8 timeframe and wasn't updated along
with the guest entry point changes for P9.

Ideally we would be able to just add new/different accounting points to
the code as it changes over time, but since the P8 and P9 entry points
are different code paths we first need to separate them from each
other. This series alters the KVM Kconfig to make that distinction.

Currently:

CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
 - timing infrastructure in asm (P8 only)
 - timing infrastructure in C (P9 only)
 - generic timing variables (P8/P9)
 - debugfs code
 - timing points for P8

After this series:

CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
 - generic timing variables (P8/P9)
 - debugfs code

CONFIG_KVM_BOOK3S_HV_P8_TIMING
 - timing infrastructure in asm (P8 only)
 - timing points for P8

CONFIG_KVM_BOOK3S_HV_P9_TIMING
 - timing infrastructure in C (P9 only)
 - timing points for P9

The new Kconfig rules are:

a) CONFIG_KVM_BOOK3S_HV_P8_TIMING selects
   CONFIG_KVM_BOOK3S_HV_EXIT_TIMING, resulting in the previous
   behavior. Tested on P8.

b) CONFIG_KVM_BOOK3S_HV_P9_TIMING selects
   CONFIG_KVM_BOOK3S_HV_EXIT_TIMING, resulting in the new behavior.
   Tested on P9.

c) CONFIG_KVM_BOOK3S_HV_P8_TIMING and CONFIG_KVM_BOOK3S_HV_P9_TIMING
   are mutually exclusive. If both are set, P9 takes precedence.
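The resulting Kconfig relationships described above would look roughly like this (a sketch of the intended structure, not the exact patch; the P9 option's prompt text is assumed):

```kconfig
config KVM_BOOK3S_HV_EXIT_TIMING
	bool

config KVM_BOOK3S_HV_P8_TIMING
	bool "Detailed timing for hypervisor real-mode code (for POWER8)"
	select KVM_BOOK3S_HV_EXIT_TIMING
	depends on KVM_BOOK3S_HV_POSSIBLE && DEBUG_FS

config KVM_BOOK3S_HV_P9_TIMING
	bool "Detailed timing for the P9 entry point"
	select KVM_BOOK3S_HV_EXIT_TIMING
	depends on KVM_BOOK3S_HV_POSSIBLE && DEBUG_FS
```

The shared option loses its prompt and becomes select-only, so the generic variables and debugfs code are pulled in by whichever platform-specific option is enabled; the P9-over-P8 precedence is then decided in the C/asm code rather than in Kconfig.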
Fabiano Rosas (5):
  KVM: PPC: Book3S HV: Fix "rm_exit" entry in debugfs timings
  KVM: PPC: Book3S HV: Add a new config for P8 debug timing
  KVM: PPC: Book3S HV: Decouple the debug timing from the P8 entry path
  KVM: PPC: Book3S HV: Expose timing functions to module code
  KVM: PPC: Book3S HV: Provide more detailed timings for P9 entry path

 arch/powerpc/include/asm/kvm_host.h     | 10 +++
 arch/powerpc/kernel/asm-offsets.c       |  2 +-
 arch/powerpc/kvm/Kconfig                | 19 -
 arch/powerpc/kvm/book3s_hv.c            | 26 --
 arch/powerpc/kvm/book3s_hv.h            | 10 +++
 arch/powerpc/kvm/book3s_hv_p9_entry.c   | 36 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 24 -
 7 files changed, 82 insertions(+), 45 deletions(-)

-- 
2.35.1
[PATCH] KVM: PPC: Align pt_regs in kvm_vcpu_arch structure
The H_ENTER_NESTED hypercall receives as second parameter the address
of a region of memory containing the values for the nested guest
privileged registers. We currently use the pt_regs structure contained
within kvm_vcpu_arch for that end.

Most hypercalls that receive a memory address expect that region to
not cross a 4k page boundary. We would want H_ENTER_NESTED to follow
the same pattern, so this patch ensures the pt_regs structure sits
within a page.

Signed-off-by: Fabiano Rosas
---
 arch/powerpc/include/asm/kvm_host.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index faf301d0dec0..87eba60f2920 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -519,7 +519,11 @@ struct kvm_vcpu_arch {
 	struct kvmppc_book3s_shadow_vcpu *shadow_vcpu;
 #endif
 
-	struct pt_regs regs;
+	/*
+	 * This is passed along to the HV via H_ENTER_NESTED. Align to
+	 * prevent it crossing a real 4K page.
+	 */
+	struct pt_regs regs __aligned(512);
 
 	struct thread_fp_state fp;
-- 
2.35.1
Re: [PATCH kernel] KVM: PPC: Book3s: PR: Enable default TCE hypercalls
Alexey Kardashevskiy writes:

> When KVM_CAP_PPC_ENABLE_HCALL was introduced, H_GET_TCE and H_PUT_TCE
> were already implemented and enabled by default; however H_GET_TCE
> was missed out on PR KVM (probably because the handler was in
> the real mode code at the time).
>
> This enables H_GET_TCE by default. While at this, this wraps
> the checks in ifdef CONFIG_SPAPR_TCE_IOMMU just like HV KVM.
>
> Signed-off-by: Alexey Kardashevskiy

Reviewed-by: Fabiano Rosas
Re: [PATCH v2] powerpc/rtas: Keep MSR[RI] set when calling RTAS
Michael Ellerman writes:

>> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>> index 9581906b5ee9..65cb14b56f8d 100644
>> --- a/arch/powerpc/kernel/entry_64.S
>> +++ b/arch/powerpc/kernel/entry_64.S
>> @@ -330,22 +330,18 @@ _GLOBAL(enter_rtas)
>>  	clrldi	r4,r4,2		/* convert to realmode address */
>>  	mtlr	r4
>>  
>> -	li	r0,0
>> -	ori	r0,r0,MSR_EE|MSR_SE|MSR_BE|MSR_RI
>> -	andc	r0,r6,r0
>> -
>> -	li	r9,1
>> -	rldicr	r9,r9,MSR_SF_LG,(63-MSR_SF_LG)
>> -	ori	r9,r9,MSR_IR|MSR_DR|MSR_FE0|MSR_FE1|MSR_FP|MSR_RI|MSR_LE
>> -	andc	r6,r0,r9
>
> One advantage of the old method is it can adapt to new MSR bits being
> set by the kernel.
>
> For example we used to use RTAS on powernv, and this code didn't need
> updating to cater to MSR_HV being set. We will probably never use RTAS
> on bare-metal again, so that's OK.
>
> But your change might break secure virtual machines, because it clears
> MSR_S whereas the old code didn't. I think SVMs did use RTAS, but I
> don't know whether it matters if it's called with MSR_S set or not?
>
> Not sure if anyone will remember, or has a working setup they can test.
> Maybe for now we just copy MSR_S from the kernel MSR the way the
> current code does.

Would the kernel even be able to change the bit? I think only urfid can
clear MSR_S.
Re: [PATCH kernel] KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers
Alexey Kardashevskiy writes:

> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index d185dee26026..44d74bfe05df 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1784,13 +1784,8 @@ hcall_real_table:
>  	.long	DOTSYM(kvmppc_h_clear_mod) - hcall_real_table
>  	.long	DOTSYM(kvmppc_h_clear_ref) - hcall_real_table
>  	.long	DOTSYM(kvmppc_h_protect) - hcall_real_table
> -#ifdef CONFIG_SPAPR_TCE_IOMMU
> -	.long	DOTSYM(kvmppc_h_get_tce) - hcall_real_table
> -	.long	DOTSYM(kvmppc_rm_h_put_tce) - hcall_real_table
> -#else
>  	.long	0		/* 0x1c */
>  	.long	0		/* 0x20 */
> -#endif
>  	.long	0		/* 0x24 - H_SET_SPRG0 */
>  	.long	DOTSYM(kvmppc_h_set_dabr) - hcall_real_table
>  	.long	DOTSYM(kvmppc_rm_h_page_init) - hcall_real_table
> @@ -1868,13 +1863,8 @@ hcall_real_table:
>  	.long	0		/* 0x12c */
>  	.long	0		/* 0x130 */
>  	.long	DOTSYM(kvmppc_h_set_xdabr) - hcall_real_table
> -#ifdef CONFIG_SPAPR_TCE_IOMMU
> -	.long	DOTSYM(kvmppc_rm_h_stuff_tce) - hcall_real_table
> -	.long	DOTSYM(kvmppc_rm_h_put_tce_indirect) - hcall_real_table
> -#else
>  	.long	0		/* 0x138 */
>  	.long	0		/* 0x13c */
> -#endif
>  	.long	0		/* 0x140 */
>  	.long	0		/* 0x144 */
>  	.long	0		/* 0x148 */

The ones you remove from here need to be added to kvmppc_hcall_impl_hv,
otherwise we get the WARN at init_default_hcalls because
kvmppc_hcall_impl_hv_realmode can't find them.
Re: serial hang in qemu-system-ppc64 -M pseries
Rob Landley writes:

> On 4/27/22 10:27, Thomas Huth wrote:
>> On 26/04/2022 12.26, Rob Landley wrote:
>>> When I cut and paste 80-ish characters of text into the Linux serial
>>> console, it reads 16 characters and stops. When I hit space, it reads
>>> another 16 characters, and if I keep at it will eventually catch up
>>> without losing data. If I type, every character shows up immediately.
>>
>> That "16" certainly comes from VTERM_BUFSIZE in hw/char/spapr_vty.c in
>> the QEMU sources, I think.
>>
>>> (On other qemu targets and kernels I can cut and paste an entire
>>> uuencoded binary and it goes through just fine in one go, but this
>>> target hangs with big pastes until I hit keys.)
>>>
>>> Is this a qemu-side bug, or a kernel-side bug?
>>>
>>> Kernel config attached (linux 5.18-rc3 or thereabouts), qemu invocation is:
>>>
>>> qemu-system-ppc64 -M pseries -vga none -nographic -no-reboot -m 256 -kernel vmlinux -initrd powerpc64leroot.cpio.gz -append "panic=1 HOST=powerpc64le console=hvc0"
>>
>> Which version of QEMU are you using?
>
> $ qemu-system-ppc64 --version
> QEMU emulator version 6.2.92 (v6.2.0-rc2)
> Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
>
> From november. I can pull and rebuild but it'll take a bit. (Hopefully
> rebuilding would fix the need to echo -e '\e[?7h' afterwards to undo the
> "bash command line history marches up the screen because qemu's x86 bios
> disabled line wrap and then left it that way" issue...)
>
>> Which frontend (GTK or terminal?) ...
>
> The above command line has -nographic, forcing terminal. Running ldd on
> the binary doesn't pull up anything gtk. (It pulls up libncursesw though.)
>
> If you want to reproduce my test locally:
>
> wget https://landley.net/toybox/downloads/binaries/mkroot/0.8.5/powerpc64le.tgz
> tar xvzf powerpc64le.tgz
> cd powerpc64le
> ./qemu-*.sh
>
> Then paste something longer than 16 characters at the eventual command
> prompt once the kernel finishes booting.

I suspect this is due to how the tty driver (n_tty.c) interacts with
the console (hvc_console.c) driver's buffer size. This is the stack:

#0  hvc_push                     <-- calls into KVM/QEMU to write up to 16 bytes
#1  hvc_write
#2  tty_put_char
#3  do_output_char
#4  __process_echoes             <-- writes up to tty_write_room() chars
#5  flush_echoes                 <-- returns early if ~ECHO && ~ECHONL
#6  n_tty_receive_buf_common     <-- buffers more than 16 bytes
#7  tty_ldisc_receive_buf
#8  tty_port_default_receive_buf
#9  receive_buf
#10 process_one_work

In my busybox instance which does not have this issue I can see that
termios.c_lflag = 0x447, so in the stack above #4 is not called (ECHO
is 0x8, ECHONL is 0x10). In the bug scenario: termios.c_lflag = 0x5cf,
so we go into #4 which is supposed to write (say) 17 bytes, but ends
up writing only 16 because that is what tty_write_room() returns.

What I think is causing this issue is that the hvc_vio.c driver is
configured with hp->outbuf_size = 16. That number comes from the
H_PUT_TERM_CHAR hypercall spec which reads two registers at a time.
However, the hvc_write function supports writes larger than 16 bytes,
so it seems we're needlessly propagating the 16 byte limitation to the
upper layer. The driver is also not buffering the write, so if it gets
called to write one char at a time (like __process_echoes does) there
should be no limit to how much it can write.

I think if we increase hp->outbuf_size to a value that is larger than
N_TTY_BUF_SIZE=4096 the echo buffer would drain before reaching this
new hvc driver limit.
I tested that and it seems to work, but I'm not sure if it's the right
fix, there are some things I couldn't figure out:

 i) if a driver actually runs out of buffer space, what
    __process_echoes should do about it;

 ii) why the bug sometimes happens only at the 32 characters boundary
     (or other multiple of 16);

 iii) why the ECHO flag differs from the working case.

> If you want to reproduce it all from source:
>
> git clone https://github.com/landley/toybox
> cd toybox && mkdir ccc && cd ccc
> wget https://landley.net/toybox/downloads/binaries/toolchains/latest/powerpc64le-linux-musl-cross.tar.xz -O - | tar xv
> cd ..
> CROSS=powerpc64le LINUX=~/linux scripts/mkroot.sh
> cd root/powerpc64le
> ./qemu-*.sh
>
> This assumes your linux kernel source directory is in ~/linux of
> course, and that qemu-system-ppc64 is in the $PATH...
>
>> this rings a distant bell, but I thought we had fixed these issues
>> long ago in the past... e.g.:
>>
>> https://yhbt.net/lore/all/1380113886-16845-10-git-send-email-mdr...@linux.vnet.ibm.com/
>>
>> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=8a273cbe53221d28
>
> The qemu I'm running is newer than 2016. :)
>
> Most targets are fine with this: I cut and paste entire uuencoded
> binaries into the serial console as an easy way to insert a file into
> an initramfs. It can usually take multiple megabytes without
Re: [PATCH] KVM: PPC: Book3S HV: Initialize AMOR in nested entry
Nicholas Piggin writes:

> Excerpts from Fabiano Rosas's message of April 26, 2022 12:21 am:
>> The hypervisor always sets AMOR to ~0, but let's ensure we're not
>> passing stale values around.
>
> Reviewed-by: Nicholas Piggin
>
> Looks like our L0 doesn't do anything with hvregs.amor ?

It doesn't. And if the HV ever starts clearing bits from AMOR, then we
would need to change any kernel code that writes and reads AMR (such
as KUAP) to take into consideration that we might read a different
value from what we wrote.
[PATCH] KVM: PPC: Book3S HV: Initialize AMOR in nested entry
The hypervisor always sets AMOR to ~0, but let's ensure we're not
passing stale values around.

Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/book3s_hv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6fa518f6501d..b5f504576765 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3967,6 +3967,7 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu *vcpu, u64 time_limit, uns
 	kvmhv_save_hv_regs(vcpu, &hvregs);
 	hvregs.lpcr = lpcr;
+	hvregs.amor = ~0;
 	vcpu->arch.regs.msr = vcpu->arch.shregs.msr;
 	hvregs.version = HV_GUEST_STATE_VERSION;
 	if (vcpu->arch.nested) {
-- 
2.35.1
Re: [PATCH 3/3] KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused
Sean Christopherson writes:

> Add wrappers to acquire/release KVM's SRCU lock when stashing the index
> in vcpu->src_idx, along with rudimentary detection of illegal usage,
> e.g. re-acquiring SRCU and thus overwriting vcpu->src_idx. Because the
> SRCU index is (currently) either 0 or 1, illegal nesting bugs can go
> unnoticed for quite some time and only cause problems when the nested
> lock happens to get a different index.
>
> Wrap the WARNs in PROVE_RCU=y, and make them ONCE, otherwise KVM will
> likely yell so loudly that it will bring the kernel to its knees.
>
> Signed-off-by: Sean Christopherson

For the powerpc part:

Tested-by: Fabiano Rosas

> ---
>  arch/powerpc/kvm/book3s_64_mmu_radix.c |  9 +
>  arch/powerpc/kvm/book3s_hv_nested.c    | 16 +++
>  arch/powerpc/kvm/book3s_rtas.c         |  4 ++--
>  arch/powerpc/kvm/powerpc.c             |  4 ++--
>  arch/riscv/kvm/vcpu.c                  | 16 +++
>  arch/riscv/kvm/vcpu_exit.c             |  4 ++--
>  arch/s390/kvm/interrupt.c              |  4 ++--
>  arch/s390/kvm/kvm-s390.c               |  8
>  arch/s390/kvm/vsie.c                   |  4 ++--
>  arch/x86/kvm/x86.c                     | 28 --
>  include/linux/kvm_host.h               | 24 +-
>  11 files changed, 71 insertions(+), 50 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> index e4ce2a35483f..42851c32ff3b 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> @@ -168,9 +168,10 @@ int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, gva_t eaddr,
>  		return -EINVAL;
>  	/* Read the entry from guest memory */
>  	addr = base + (index * sizeof(rpte));
> -	vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> +
> +	kvm_vcpu_srcu_read_lock(vcpu);
>  	ret = kvm_read_guest(kvm, addr, &rpte, sizeof(rpte));
> -	srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> +	kvm_vcpu_srcu_read_unlock(vcpu);
>  	if (ret) {
>  		if (pte_ret_p)
>  			*pte_ret_p = addr;
> @@ -246,9 +247,9 @@ int kvmppc_mmu_radix_translate_table(struct kvm_vcpu *vcpu, gva_t eaddr,
>  
>  	/* Read the table to find the root of the radix tree */
>  	ptbl = (table & PRTB_MASK) + (table_index * sizeof(entry));
> -	vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> +	kvm_vcpu_srcu_read_lock(vcpu);
>  	ret = kvm_read_guest(kvm, ptbl, &entry, sizeof(entry));
> -	srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> +	kvm_vcpu_srcu_read_unlock(vcpu);
>  	if (ret)
>  		return ret;
>  
> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
> index 9d373f8963ee..c943a051c6e7 100644
> --- a/arch/powerpc/kvm/book3s_hv_nested.c
> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
> @@ -306,10 +306,10 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>  	/* copy parameters in */
>  	hv_ptr = kvmppc_get_gpr(vcpu, 4);
>  	regs_ptr = kvmppc_get_gpr(vcpu, 5);
> -	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> +	kvm_vcpu_srcu_read_lock(vcpu);
>  	err = kvmhv_read_guest_state_and_regs(vcpu, &l2_hv, &l2_regs,
>  					      hv_ptr, regs_ptr);
> -	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> +	kvm_vcpu_srcu_read_unlock(vcpu);
>  	if (err)
>  		return H_PARAMETER;
>  
> @@ -410,10 +410,10 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>  		byteswap_hv_regs(&l2_hv);
>  		byteswap_pt_regs(&l2_regs);
>  	}
> -	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> +	kvm_vcpu_srcu_read_lock(vcpu);
>  	err = kvmhv_write_guest_state_and_regs(vcpu, &l2_hv, &l2_regs,
>  					       hv_ptr, regs_ptr);
> -	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> +	kvm_vcpu_srcu_read_unlock(vcpu);
>  	if (err)
>  		return H_AUTHORITY;
>  
> @@ -600,16 +600,16 @@ long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu)
>  		goto not_found;
>  
>  	/* Write what was loaded into our buffer back to the L1 guest */
> -	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> +	kvm_vcpu_srcu_read_lock(vcpu);
>  	rc = kvm_vcpu_write_guest(vcpu, gp_to, buf, n
[PATCH] KVM: PPC: Book3S HV: Fix vcore_blocked tracepoint
We removed most of the vcore logic from the P9 path, but there's still
a tracepoint that tried to dereference vc->runner.

Fixes: ecb6a7207f92 ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic")
Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/book3s_hv.c | 8
 arch/powerpc/kvm/trace_hv.h  | 8
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c886557638a1..5f5b2d0dee8c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4218,13 +4218,13 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 	start_wait = ktime_get();
 
 	vc->vcore_state = VCORE_SLEEPING;
-	trace_kvmppc_vcore_blocked(vc, 0);
+	trace_kvmppc_vcore_blocked(vc->runner, 0);
 	spin_unlock(&vc->lock);
 	schedule();
 	finish_rcuwait(&vc->wait);
 	spin_lock(&vc->lock);
 	vc->vcore_state = VCORE_INACTIVE;
-	trace_kvmppc_vcore_blocked(vc, 1);
+	trace_kvmppc_vcore_blocked(vc->runner, 1);
 	++vc->runner->stat.halt_successful_wait;
 
 	cur = ktime_get();
@@ -4596,9 +4596,9 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 			if (kvmppc_vcpu_check_block(vcpu))
 				break;
 
-			trace_kvmppc_vcore_blocked(vc, 0);
+			trace_kvmppc_vcore_blocked(vcpu, 0);
 			schedule();
-			trace_kvmppc_vcore_blocked(vc, 1);
+			trace_kvmppc_vcore_blocked(vcpu, 1);
 		}
 		finish_rcuwait(wait);
 	}
diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
index 38cd0ed0a617..32e2cb5811cc 100644
--- a/arch/powerpc/kvm/trace_hv.h
+++ b/arch/powerpc/kvm/trace_hv.h
@@ -409,9 +409,9 @@ TRACE_EVENT(kvmppc_run_core,
 );
 
 TRACE_EVENT(kvmppc_vcore_blocked,
-	TP_PROTO(struct kvmppc_vcore *vc, int where),
+	TP_PROTO(struct kvm_vcpu *vcpu, int where),
 
-	TP_ARGS(vc, where),
+	TP_ARGS(vcpu, where),
 
 	TP_STRUCT__entry(
 		__field(int,	n_runnable)
@@ -421,8 +421,8 @@ TRACE_EVENT(kvmppc_vcore_blocked,
 	),
 
 	TP_fast_assign(
-		__entry->runner_vcpu = vc->runner->vcpu_id;
-		__entry->n_runnable  = vc->n_runnable;
+		__entry->runner_vcpu = vcpu->vcpu_id;
+		__entry->n_runnable  = vcpu->arch.vcore->n_runnable;
 		__entry->where = where;
 		__entry->tgid	= current->tgid;
 	),
-- 
2.35.1
Re: [PATCH 6/6] KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting
Nicholas Piggin writes:

> The L1 should not be able to adjust LPES mode for the L2. Setting LPES
> if the L0 needs it clear would cause external interrupts to be sent to
> L2 and missed by the L0.
>
> Clearing LPES when it may be set, as typically happens with XIVE
> enabled, could cause a performance issue despite having no native XIVE
> support in the guest, because it will cause mediated interrupts for
> the L2 to be taken in HV mode, which then have to be injected.
>
> Signed-off-by: Nicholas Piggin

Reviewed-by: Fabiano Rosas
Re: [PATCH 4/6] KVM: PPC: Book3S HV P9: Split !nested case out from guest entry
Nicholas Piggin writes:

> The differences between nested and !nested will become larger in
> later changes so split them out for readability.
>
> Signed-off-by: Nicholas Piggin

Reviewed-by: Fabiano Rosas
Re: [PATCH 3/6] KVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming
Nicholas Piggin writes:

> Move the cede abort logic out of xive escalation rearming and into
> the caller to prepare for handling a similar case with nested guest
> entry.
>
> Signed-off-by: Nicholas Piggin
> ---
>  arch/powerpc/include/asm/kvm_ppc.h |  4 ++--
>  arch/powerpc/kvm/book3s_hv.c       | 10 --
>  arch/powerpc/kvm/book3s_xive.c     |  9 ++---
>  3 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index a14dbcd1b8ce..94fa5f246657 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -671,7 +671,7 @@ extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
>  			       int level, bool line_status);
>  extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
>  extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu);
> -extern void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu);
> +extern bool kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu);
>  
>  static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>  {
> @@ -709,7 +709,7 @@ static inline int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 ir
>  					int level, bool line_status) { return -ENODEV; }
>  static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
>  static inline void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu) { }
> -static inline void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu) { }
> +static inline bool kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu) { return true; }
>  
>  static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>  	{ return 0; }
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 5df359053147..a0b674d3a2da 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4073,10 +4073,16 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
>  	    !(vcpu->arch.shregs.msr & MSR_PR)) {
>  		unsigned long req = kvmppc_get_gpr(vcpu, 3);
>  
> -		/* H_CEDE has to be handled now, not later */
> +		/* H_CEDE has to be handled now */
>  		if (req == H_CEDE) {
>  			kvmppc_cede(vcpu);
> -			kvmppc_xive_rearm_escalation(vcpu); /* may un-cede */
> +			if (!kvmppc_xive_rearm_escalation(vcpu)) {
> +				/*
> +				 * Pending escalation so abort
> +				 * the cede.
> +				 */
> +				vcpu->arch.ceded = 0;

This was moved after the mb() in kvmppc_xive_rearm_escalation, so I
think a concurrent H_PROD might continue to see tvcpu->arch.ceded = 1
after the escalation has been set. Is this an issue?

> +			}
>  			kvmppc_set_gpr(vcpu, 3, 0);
>  			trap = 0;
>  
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index e216c068075d..7b513e14cada 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -179,12 +179,13 @@ void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_xive_pull_vcpu);
>  
> -void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
> +bool kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
>  {
>  	void __iomem *esc_vaddr = (void __iomem *)vcpu->arch.xive_esc_vaddr;
> +	bool ret = true;
>  
>  	if (!esc_vaddr)
> -		return;
> +		return ret;
>  
>  	/* we are using XIVE with single escalation */
>  
> @@ -197,7 +198,7 @@ void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
>  	 * we also don't want to set xive_esc_on to 1 here in
>  	 * case we race with xive_esc_irq().
>  	 */
> -	vcpu->arch.ceded = 0;
> +	ret = false;
>  	/*
>  	 * The escalation interrupts are special as we don't EOI them.
>  	 * There is no need to use the load-after-store ordering offset
> @@ -210,6 +211,8 @@ void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
>  		__raw_readq(esc_vaddr + XIVE_ESB_SET_PQ_00);
>  	}
>  	mb();
> +
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_xive_rearm_escalation);
Re: [PATCH 1/6] KVM: PPC: Book3S HV P9: Fix "lost kick" race
Nicholas Piggin writes:

> When new work is created that requires attention from the hypervisor
> (e.g., to inject an interrupt into the guest), fast_vcpu_kick is used to
> pull the target vcpu out of the guest if it may have been running.
>
> Therefore the work creation side looks like this:
>
>   vcpu->arch.doorbell_request = 1;
>   kvmppc_fast_vcpu_kick_hv(vcpu) {
>       smp_mb();
>       cpu = vcpu->cpu;
>       if (cpu != -1)
>           send_ipi(cpu);
>   }
>
> And the guest entry side *should* look like this:
>
>   vcpu->cpu = smp_processor_id();
>   smp_mb();
>   if (vcpu->arch.doorbell_request) {
>       // do something (abort entry or inject doorbell etc)
>   }
>
> But currently the store and load are flipped, so it is possible for the
> entry to see no doorbell pending, and the doorbell creation misses the
> store to set cpu, resulting lost work (or at least delayed until the
> next guest exit).
>
> Fix this by reordering the entry operations and adding a smp_mb
> between them. The P8 path appears to have a similar race which is
> commented but not addressed yet.
>
> Signed-off-by: Nicholas Piggin

Reviewed-by: Fabiano Rosas
[RFC PATCH] KVM: PPC: Book3S HV: Add KVM_CAP_PPC_GTSE
This patch adds a new KVM capability to address a crash we're
currently having inside the nested guest kernel when running with GTSE
disabled in the nested hypervisor.

The summary is: We allow any guest a cmdline override of GTSE for
migration purposes. The nested guest does not know it needs to use the
option and tries to run 'tlbie' with LPCR_GTSE=0.

The details are a bit more intricate:

QEMU always sets GTSE=1 in OV5 even before calling KVM. At prom_init,
guests use the OV5 value to set MMU_FTR_GTSE. This setting can be
overridden by 'radix_hcall_invalidate=on' in the kernel cmdline. The
option itself depends on the availability of FW_FEATURE_RPT_INVALIDATE,
which is tied to QEMU's cap-rpt-invalidate capability.

The MMU_FTR_GTSE flag leads guests to set PROC_TABLE_GTSE in their
process tables and after H_REGISTER_PROC_TBL, both QEMU and KVM will
set LPCR_GTSE=1 for that guest. Unless the guest uses the cmdline
override, in which case:

MMU_FTR_GTSE=0 -> PROC_TABLE_GTSE=0 -> LPCR_GTSE=0

We don't allow the nested hypervisor to set some LPCR bits for its
nested guests, so if the nested HV has LPCR_GTSE=0, its nested guests
will also have LPCR_GTSE=0. But since the only thing that can really
flip GTSE is the cmdline override, if a nested guest runs without it,
then the sequence goes:

MMU_FTR_GTSE=1 -> PROC_TABLE_GTSE=1 -> LPCR_GTSE=0

With LPCR_GTSE=0 the HW will treat 'tlbie' as HV privileged.

How the new capability helps:

By having QEMU consult KVM on what the correct GTSE value is, we can
have the nested hypervisor return the same value that it is currently
using. QEMU will then put the correct value in the device-tree for the
nested guest and MMU_FTR_GTSE will match LPCR_GTSE.

Fixes: b87cc116c7e1 ("KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability")
Signed-off-by: Fabiano Rosas
---
This supersedes the previous RFC: "KVM: PPC: Book3s HV: Allow setting
GTSE for the nested guest"*.
Aneesh explained to me that we don't want to allow L1 and L2 GTSE
values to differ.

*- https://lore.kernel.org/r/20220304182657.2489303-1-faro...@linux.ibm.com
---
 arch/powerpc/kvm/powerpc.c | 3 +++
 include/uapi/linux/kvm.h   | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2ad0ccd202d5..dd08b3b729cd 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -677,6 +677,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_PPC_RPT_INVALIDATE:
 		r = 1;
 		break;
+	case KVM_CAP_PPC_GTSE:
+		r = mmu_has_feature(MMU_FTR_GTSE);
+		break;
 #endif
 	default:
 		r = 0;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 507ee1f2aa96..cc581e345d2a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1135,6 +1135,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_XSAVE2 208
 #define KVM_CAP_SYS_ATTRIBUTES 209
 #define KVM_CAP_PPC_AIL_MODE_3 210
+#define KVM_CAP_PPC_GTSE 211
 
 #ifdef KVM_CAP_IRQ_ROUTING
-- 
2.34.1
Re: [RFC PATCH 2/2] KVM: PPC: Book3S HV: Provide a more accurate MAX_VCPU_ID in P9
Christophe Leroy writes:

> Le 13/04/2021 à 00:26, Fabiano Rosas a écrit :
>> The KVM_CAP_MAX_VCPU_ID capability was added by commit 0b1b1dfd52a6
>> ("kvm: introduce KVM_MAX_VCPU_ID") to allow for vcpu ids larger than
>> KVM_MAX_VCPU in powerpc.
>>
>> For a P9 host we depend on the guest VSMT value to know what is the
>> maximum number of vcpu id we can support:
>>
>> kvmppc_core_vcpu_create_hv:
>> (...)
>>     if (cpu_has_feature(CPU_FTR_ARCH_300)) {
>> -->     if (id >= (KVM_MAX_VCPUS * kvm->arch.emul_smt_mode)) {
>>             pr_devel("KVM: VCPU ID too high\n");
>>             core = KVM_MAX_VCORES;
>>         } else {
>>             BUG_ON(kvm->arch.smt_mode != 1);
>>             core = kvmppc_pack_vcpu_id(kvm, id);
>>         }
>>     } else {
>>         core = id / kvm->arch.smt_mode;
>>     }
>>
>> which means that the value being returned by the capability today for
>> a given guest is potentially way larger than what we actually support:
>>
>> #define KVM_MAX_VCPU_ID (MAX_SMT_THREADS * KVM_MAX_VCORES)
>>
>> If the capability is queried before userspace enables the
>> KVM_CAP_PPC_SMT ioctl there is not much we can do, but if the emulated
>> smt mode is already known we could provide a more accurate value.
>>
>> The only practical effect of this change today is to make the
>> kvm_create_max_vcpus test pass for powerpc. The QEMU spapr machine has
>> a lower max vcpu than what KVM allows, so even KVM_MAX_VCPU is not
>> reached.
>>
>> Signed-off-by: Fabiano Rosas
>
> This patch won't apply after commit a1c42ddedf35 ("kvm: rename
> KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS")
>
> Feel free to resubmit if still applicable.

Thanks for the reminder, Christophe. I was focusing on enabling the
rest of the kvm-selftests:

https://lore.kernel.org/r/20220120170109.948681-1-faro...@linux.ibm.com

I'm preparing a v2 for that series and will try to include these
patches as well.
Re: [PATCH 2/6] KVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry
Nicholas Piggin writes: > If there is a pending xive interrupt, inject it at guest entry (if > MSR[EE] is enabled) rather than take another interrupt when the guest > is entered. If xive is enabled then LPCR[LPES] is set so this behaviour > should be expected. > > Signed-off-by: Nicholas Piggin > --- > arch/powerpc/kvm/book3s_hv.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index f8c0f1f52a1e..5df359053147 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -4524,9 +4524,14 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 > time_limit, > > if (!nested) { > kvmppc_core_prepare_to_enter(vcpu); > - if (test_bit(BOOK3S_IRQPRIO_EXTERNAL, > - &vcpu->arch.pending_exceptions)) > + if (vcpu->arch.shregs.msr & MSR_EE) { > + if (xive_interrupt_pending(vcpu)) > + kvmppc_inject_interrupt_hv(vcpu, > + BOOK3S_INTERRUPT_EXTERNAL, 0); > + } else if (test_bit(BOOK3S_IRQPRIO_EXTERNAL, > + &vcpu->arch.pending_exceptions)) { > lpcr |= LPCR_MER; > + } > } else if (vcpu->arch.pending_exceptions || > vcpu->arch.doorbell_request || > xive_interrupt_pending(vcpu)) { Reviewed-by: Fabiano Rosas
Re: [RFC PATCH] KVM: PPC: Book3s HV: Allow setting GTSE for the nested guest
"Aneesh Kumar K.V" writes: > Fabiano Rosas writes: > >> We're currently getting a Program Interrupt inside the nested guest >> kernel when running with GTSE disabled in the nested hypervisor. We >> allow any guest a cmdline override of GTSE for migration purposes. The >> nested guest does not know it needs to use the option and tries to run >> 'tlbie' with LPCR_GTSE=0. >> >> The details are a bit more intricate: >> >> QEMU always sets GTSE=1 in OV5 even before calling KVM. At prom_init, >> guests use the OV5 value to set MMU_FTR_GTSE. This setting can be >> overridden by 'radix_hcall_invalidate=on' in the kernel cmdline. The >> option itself depends on the availability of >> FW_FEATURE_RPT_INVALIDATE, which it tied to QEMU's cap-rpt-invalidate >> capability. >> >> The MMU_FTR_GTSE flag leads guests to set PROC_TABLE_GTSE in their >> process tables and after H_REGISTER_PROC_TBL, both QEMU and KVM will >> set LPCR_GTSE=1 for that guest. Unless the guest uses the cmdline >> override, in which case: >> >> MMU_FTR_GTSE=0 -> PROC_TABLE_GTSE=0 -> LPCR_GTSE=0 >> >> We don't allow the nested hypervisor to set some LPCR bits for its >> nested guests, so if the nested HV has LPCR_GTSE=0, its nested guests >> will also have LPCR_GTSE=0. But since the only thing that can really >> flip GTSE is the cmdline override, if a nested guest runs without it, >> then the sequence goes: >> >> MMU_FTR_GTSE=1 -> PROC_TABLE_GTSE=1 -> LPCR_GTSE=0. >> >> With LPCR_GTSE=0 the HW will treat 'tlbie' as HV privileged. >> >> This patch allows a nested HV to set LPCR_GTSE for its nested guests >> so the LPCR setting will match what the nested guest sees in OV5. > > This needs a Fixes: tag? This feature was done in many pieces, I think it will end up being the commit that enabled the H_RPT_INVALIDATE capability: Fixes: b87cc116c7e1 ("KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability") > I am not sure what is broken. 
If L1 doesn't support GTSE, then it should > publish the same to L2 and L2 should not use tlbie. L1 cannot set L2's LPCR to the correct value because L0 will not allow it. That is what this patch is changing. I looked into having QEMU set the proper values to use with CAS, but that is done in QEMU too early, before the first dispatch of L2 (which is when L0 decides that L1 is not allowed to modify some bits). So QEMU always advertises GTSE=1. > That was working before? Or is it that the kernel command to disable > gtse when used with L2 kernel is broken? The command line works, but it needs to be explicitly given when starting the L2. There is no link between L1 and L2 when it comes to GTSE aside from the LPCR value L1 chose to use. So L2 can start with no command line at all, while L1 had GTSE disabled. AFAICT, this has always been broken. The stack leading to this is: NIP [c008615c] radix__flush_tlb_kernel_range+0x13c/0x420 [c0075840] change_page_attr+0xb0/0x240 [c044624c] __apply_to_page_range+0x5ac/0xb90 [c0075bbc] change_memory_attr+0x7c/0x150 [c0350390] bpf_prog_select_runtime+0x200/0x290 [c0d9400c] bpf_migrate_filter+0x18c/0x1e0 [c0d95f38] bpf_prog_create+0x178/0x1d0 [c130e4f4] ptp_classifier_init+0x4c/0x80 [c130d874] sock_init+0xe0/0x100 [c00121e0] do_one_initcall+0x60/0x2c0 [c12b48cc] kernel_init_freeable+0x33c/0x3dc [c00127c8] kernel_init+0x44/0x18c [c000ce64] ret_from_kernel_thread+0x5c/0x64 >> >> Signed-off-by: Fabiano Rosas >> --- >> arch/powerpc/kvm/book3s_hv_nested.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c >> b/arch/powerpc/kvm/book3s_hv_nested.c >> index 9d373f8963ee..5b9008d89f90 100644 >> --- a/arch/powerpc/kvm/book3s_hv_nested.c >> +++ b/arch/powerpc/kvm/book3s_hv_nested.c >> @@ -262,7 +262,7 @@ static void load_l2_hv_regs(struct kvm_vcpu *vcpu, >> * Don't let L1 change LPCR bits for the L2 except these: >> */ >> mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD | >> 
-LPCR_LPES | LPCR_MER; >> +LPCR_LPES | LPCR_MER | LPCR_GTSE; >> >> /* >> * Additional filtering is required depending on hardware >> -- >> 2.34.1
[RFC PATCH] KVM: PPC: Book3s HV: Allow setting GTSE for the nested guest
We're currently getting a Program Interrupt inside the nested guest kernel when running with GTSE disabled in the nested hypervisor. We allow any guest a cmdline override of GTSE for migration purposes. The nested guest does not know it needs to use the option and tries to run 'tlbie' with LPCR_GTSE=0. The details are a bit more intricate: QEMU always sets GTSE=1 in OV5 even before calling KVM. At prom_init, guests use the OV5 value to set MMU_FTR_GTSE. This setting can be overridden by 'radix_hcall_invalidate=on' in the kernel cmdline. The option itself depends on the availability of FW_FEATURE_RPT_INVALIDATE, which is tied to QEMU's cap-rpt-invalidate capability. The MMU_FTR_GTSE flag leads guests to set PROC_TABLE_GTSE in their process tables and after H_REGISTER_PROC_TBL, both QEMU and KVM will set LPCR_GTSE=1 for that guest. Unless the guest uses the cmdline override, in which case: MMU_FTR_GTSE=0 -> PROC_TABLE_GTSE=0 -> LPCR_GTSE=0 We don't allow the nested hypervisor to set some LPCR bits for its nested guests, so if the nested HV has LPCR_GTSE=0, its nested guests will also have LPCR_GTSE=0. But since the only thing that can really flip GTSE is the cmdline override, if a nested guest runs without it, then the sequence goes: MMU_FTR_GTSE=1 -> PROC_TABLE_GTSE=1 -> LPCR_GTSE=0. With LPCR_GTSE=0 the HW will treat 'tlbie' as HV privileged. This patch allows a nested HV to set LPCR_GTSE for its nested guests so the LPCR setting will match what the nested guest sees in OV5.
Signed-off-by: Fabiano Rosas --- arch/powerpc/kvm/book3s_hv_nested.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 9d373f8963ee..5b9008d89f90 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -262,7 +262,7 @@ static void load_l2_hv_regs(struct kvm_vcpu *vcpu, * Don't let L1 change LPCR bits for the L2 except these: */ mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD | - LPCR_LPES | LPCR_MER; + LPCR_LPES | LPCR_MER | LPCR_GTSE; /* * Additional filtering is required depending on hardware -- 2.34.1
Re: [PATCH v3 3/3] KVM: PPC: Add KVM_CAP_PPC_AIL_MODE_3
Nicholas Piggin writes: > Add KVM_CAP_PPC_AIL_MODE_3 to advertise the capability to set the AIL > resource mode to 3 with the H_SET_MODE hypercall. This capability > differs between processor types and KVM types (PR, HV, Nested HV), and > affects guest-visible behaviour. > > QEMU will implement a cap-ail-mode-3 to control this behaviour[1], and > use the KVM CAP if available to determine KVM support[2]. > > [1] https://lists.nongnu.org/archive/html/qemu-ppc/2022-02/msg00437.html > [2] https://lists.nongnu.org/archive/html/qemu-ppc/2022-02/msg00439.html > > Signed-off-by: Nicholas Piggin Reviewed-by: Fabiano Rosas > --- > Documentation/virt/kvm/api.rst | 14 ++ > arch/powerpc/include/asm/setup.h | 2 ++ > arch/powerpc/kvm/powerpc.c | 20 > arch/powerpc/platforms/pseries/setup.c | 12 +++- > include/uapi/linux/kvm.h | 1 + > tools/include/uapi/linux/kvm.h | 1 + > 6 files changed, 49 insertions(+), 1 deletion(-) > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index bb8cfddbb22d..404056a9a35a 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -6995,6 +6995,20 @@ indicated by the fd to the VM this is called on. > This is intended to support intra-host migration of VMs between userspace > VMMs, > upgrading the VMM process without interrupting the guest. > > +7.30 KVM_CAP_PPC_AIL_MODE_3 > +--- > + > +:Capability: KVM_CAP_PPC_AIL_MODE_3 > +:Architectures: ppc > +:Type: vm > + > +This capability indicates that the kernel supports the mode 3 setting for the > +"Address Translation Mode on Interrupt" aka "Alternate Interrupt Location" > +resource that is controlled with the H_SET_MODE hypercall. > + > +This capability allows a guest kernel to use a better-performance mode for > +handling interrupts and system calls. > + > 8. Other capabilities. 
> == > > diff --git a/arch/powerpc/include/asm/setup.h > b/arch/powerpc/include/asm/setup.h > index d0d3dd531c7f..a555fb77258a 100644 > --- a/arch/powerpc/include/asm/setup.h > +++ b/arch/powerpc/include/asm/setup.h > @@ -28,11 +28,13 @@ void setup_panic(void); > #define ARCH_PANIC_TIMEOUT 180 > > #ifdef CONFIG_PPC_PSERIES > +extern bool pseries_reloc_on_exception(void); > extern bool pseries_enable_reloc_on_exc(void); > extern void pseries_disable_reloc_on_exc(void); > extern void pseries_big_endian_exceptions(void); > void __init pseries_little_endian_exceptions(void); > #else > +static inline bool pseries_reloc_on_exception(void) { return false; } > static inline bool pseries_enable_reloc_on_exc(void) { return false; } > static inline void pseries_disable_reloc_on_exc(void) {} > static inline void pseries_big_endian_exceptions(void) {} > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > index 2ad0ccd202d5..7dc101ea778c 100644 > --- a/arch/powerpc/kvm/powerpc.c > +++ b/arch/powerpc/kvm/powerpc.c > @@ -678,6 +678,26 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long > ext) > r = 1; > break; > #endif > + case KVM_CAP_PPC_AIL_MODE_3: > + /* > + * KVM PR, POWER7, and some POWER9s don't support AIL=3 mode. > + * The POWER9s can support it if the guest runs in hash mode, > + * but QEMU doesn't necessarily query the capability in time. 
> + */ > + if (hv_enabled) { > + if (kvmhv_on_pseries()) { > + if (pseries_reloc_on_exception()) > + r = 1; > + } else if (cpu_has_feature(CPU_FTR_ARCH_207S) && > + > !cpu_has_feature(CPU_FTR_P9_RADIX_PREFETCH_BUG)) { > + r = 1; > + } else { > + r = 0; > + } > + } else { > + r = 0; > + } > + break; > default: > r = 0; > break; > diff --git a/arch/powerpc/platforms/pseries/setup.c > b/arch/powerpc/platforms/pseries/setup.c > index 83a04d967a59..182525c2abd5 100644 > --- a/arch/powerpc/platforms/pseries/setup.c > +++ b/arch/powerpc/platforms/pseries/setup.c > @@ -353,6 +353,13 @@ static void pseries_lpar_idle(void) > pseries_idle_epilog(); > } > > +static bool pseries_reloc_on_exception_enabled; > + > +bool pseries_reloc_on_exception(void) > +{ > + return pseries_
Re: [PATCH v2 1/2] KVM: PPC: Book3S PR: Disable SCV when AIL could be disabled
Nicholas Piggin writes: > PR KVM does not support running with AIL enabled, and SCV is not > supported with AIL disabled. Fix this by ensuring the SCV facility is > disabled with FSCR while a CPU could be running with AIL=0. > > The PowerNV host supports disabling AIL on a per-CPU basis, so SCV just > needs to be disabled when a vCPU is being run. > > The pSeries machine can only switch AIL on a system-wide basis, so it > must disable SCV support at boot if the configuration can potentially > run a PR KVM guest. > > Also ensure the FSCR[SCV] bit can not be enabled when emulating > mtFSCR for the guest. > > SCV is not emulated for the PR guest at the moment, this just fixes the > host crashes. > > Alternatives considered and rejected: > - SCV support can not be disabled by PR KVM after boot, because it is > advertised to userspace with HWCAP. > - AIL can not be disabled on a per-CPU basis. At least when running on > pseries it is a per-LPAR setting. > - Support for real-mode SCV vectors will not be added because they are > at 0x17000 so making such a large fixed head space causes immediate > value limits to be exceeded, requiring a lot of rework and more code. > - Disabling SCV for any PR KVM possible kernel will cause a slowdown > when not using PR KVM. > - A boot time option to disable SCV to use PR KVM is user-hostile. > - System call instruction emulation for SCV facility unavailable > instructions is too complex and old emulation code was subtly broken > and removed.
> > Signed-off-by: Nicholas Piggin > --- > arch/powerpc/kernel/exceptions-64s.S | 4 > arch/powerpc/kernel/setup_64.c | 28 > arch/powerpc/kvm/Kconfig | 9 + > arch/powerpc/kvm/book3s_pr.c | 20 ++-- > 4 files changed, 55 insertions(+), 6 deletions(-) > > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 55caeee37c08..b66dd6f775a4 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -809,6 +809,10 @@ __start_interrupts: > * - MSR_EE|MSR_RI is clear (no reentrant exceptions) > * - Standard kernel environment is set up (stack, paca, etc) > * > + * KVM: > + * These interrupts do not elevate HV 0->1, so HV is not involved. PR KVM > + * ensures that FSCR[SCV] is disabled whenever it has to force AIL off. > + * > * Call convention: > * > * syscall register convention is in Documentation/powerpc/syscall64-abi.rst > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c > index be8577ac9397..7f7da641e551 100644 > --- a/arch/powerpc/kernel/setup_64.c > +++ b/arch/powerpc/kernel/setup_64.c > @@ -197,6 +197,34 @@ static void __init configure_exceptions(void) > > /* Under a PAPR hypervisor, we need hypercalls */ > if (firmware_has_feature(FW_FEATURE_SET_MODE)) { > + /* > + * - PR KVM does not support AIL mode interrupts in the host > + * while a PR guest is running. > + * > + * - SCV system call interrupt vectors are only implemented for > + * AIL mode interrupts. > + * > + * - On pseries, AIL mode can only be enabled and disabled > + * system-wide so when a PR VM is created on a pseries host, > + * all CPUs of the host are set to AIL=0 mode. > + * > + * - Therefore host CPUs must not execute scv while a PR VM > + * exists. > + * > + * - SCV support can not be disabled dynamically because the > + * feature is advertised to host userspace. Disabling the > + * facility and emulating it would be possible but is not > + * implemented. 
> + * > + * - So SCV support is blanket disabled if PR KVM could possibly disabled Reviewed-by: Fabiano Rosas
Re: [PATCH v2 2/2] KVM: PPC: Book3S PR: Disallow AIL != 0
Nicholas Piggin writes: > KVM PR does not implement address translation modes on interrupt, so it > must not allow H_SET_MODE to succeed. The behaviour change caused by > this mode is architected and not advisory (interrupts *must* behave > differently). > > QEMU does not deal with differences in AIL support in the host. The > solution to that is a spapr capability and corresponding KVM CAP, but > this patch does not break things more than before (the host behaviour > already differs, this change just disallows some modes that are not > implemented properly). > > By happy coincidence, this allows PR Linux guests that are using the SCV > facility to boot and run, because Linux disables the use of SCV if AIL > can not be set to 3. This does not fix the underlying problem of missing > SCV support (an OS could implement real-mode SCV vectors and try to > enable the facility). The true fix for that is for KVM PR to emulate scv > interrupts from the facility unavailable interrupt. > > Signed-off-by: Nicholas Piggin Reviewed-by: Fabiano Rosas > --- > arch/powerpc/kvm/book3s_pr_papr.c | 20 > 1 file changed, 20 insertions(+) > > diff --git a/arch/powerpc/kvm/book3s_pr_papr.c > b/arch/powerpc/kvm/book3s_pr_papr.c > index 1f10e7dfcdd0..dc4f51ac84bc 100644 > --- a/arch/powerpc/kvm/book3s_pr_papr.c > +++ b/arch/powerpc/kvm/book3s_pr_papr.c > @@ -281,6 +281,22 @@ static int kvmppc_h_pr_logical_ci_store(struct kvm_vcpu > *vcpu) > return EMULATE_DONE; > } > > +static int kvmppc_h_pr_set_mode(struct kvm_vcpu *vcpu) > +{ > + unsigned long mflags = kvmppc_get_gpr(vcpu, 4); > + unsigned long resource = kvmppc_get_gpr(vcpu, 5); > + > + if (resource == H_SET_MODE_RESOURCE_ADDR_TRANS_MODE) { > + /* KVM PR does not provide AIL!=0 to guests */ > + if (mflags == 0) > + kvmppc_set_gpr(vcpu, 3, H_SUCCESS); > + else > + kvmppc_set_gpr(vcpu, 3, H_UNSUPPORTED_FLAG_START - 63); > + return EMULATE_DONE; > + } > + return EMULATE_FAIL; > +} > + > #ifdef CONFIG_SPAPR_TCE_IOMMU > static int 
kvmppc_h_pr_put_tce(struct kvm_vcpu *vcpu) > { > @@ -384,6 +400,8 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd) > return kvmppc_h_pr_logical_ci_load(vcpu); > case H_LOGICAL_CI_STORE: > return kvmppc_h_pr_logical_ci_store(vcpu); > + case H_SET_MODE: > + return kvmppc_h_pr_set_mode(vcpu); > case H_XIRR: > case H_CPPR: > case H_EOI: > @@ -421,6 +439,7 @@ int kvmppc_hcall_impl_pr(unsigned long cmd) > case H_CEDE: > case H_LOGICAL_CI_LOAD: > case H_LOGICAL_CI_STORE: > + case H_SET_MODE: > #ifdef CONFIG_KVM_XICS > case H_XIRR: > case H_CPPR: > @@ -447,6 +466,7 @@ static unsigned int default_hcall_list[] = { > H_BULK_REMOVE, > H_PUT_TCE, > H_CEDE, > + H_SET_MODE, > #ifdef CONFIG_KVM_XICS > H_XIRR, > H_CPPR,
[PATCH v5 5/5] KVM: PPC: Book3s: mmio: Deliver DSI after emulation failure
MMIO emulation can fail if the guest uses an instruction that we are not prepared to emulate. Since these instructions can be and most likely are valid ones, this is (slightly) closer to an access fault than to an illegal instruction, so deliver a Data Storage interrupt instead of a Program interrupt. BookE ignores bad faults, so it will keep using a Program interrupt because a DSI would cause a fault loop in the guest. Suggested-by: Nicholas Piggin Signed-off-by: Fabiano Rosas --- arch/powerpc/kvm/emulate_loadstore.c | 10 +++--- arch/powerpc/kvm/powerpc.c | 22 ++ 2 files changed, 25 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c index 48272a9b9c30..cfc9114b87d0 100644 --- a/arch/powerpc/kvm/emulate_loadstore.c +++ b/arch/powerpc/kvm/emulate_loadstore.c @@ -73,7 +73,6 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu) { u32 inst; enum emulation_result emulated = EMULATE_FAIL; - int advance = 1; struct instruction_op op; /* this default type might be overwritten by subcategories */ @@ -98,6 +97,8 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu) int type = op.type & INSTR_TYPE_MASK; int size = GETSIZE(op.type); + vcpu->mmio_is_write = OP_IS_STORE(type); + switch (type) { case LOAD: { int instr_byte_swap = op.type & BYTEREV; @@ -355,15 +356,10 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu) } } - if (emulated == EMULATE_FAIL) { - advance = 0; - kvmppc_core_queue_program(vcpu, 0); - } - trace_kvm_ppc_instr(inst, kvmppc_get_pc(vcpu), emulated); /* Advance past emulated instruction. 
*/ - if (advance) + if (emulated != EMULATE_FAIL) kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4); return emulated; diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index acb0d2a4bdb9..82d889db2b6b 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -309,6 +309,28 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu) kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst); kvm_debug_ratelimited("Guest access to device memory using unsupported instruction (opcode: %#08x)\n", last_inst); + + /* +* Injecting a Data Storage here is a bit more +* accurate since the instruction that caused the +* access could still be a valid one. +*/ + if (!IS_ENABLED(CONFIG_BOOKE)) { + ulong dsisr = DSISR_BADACCESS; + + if (vcpu->mmio_is_write) + dsisr |= DSISR_ISSTORE; + + kvmppc_core_queue_data_storage(vcpu, vcpu->arch.vaddr_accessed, dsisr); + } else { + /* +* BookE does not send a SIGBUS on a bad +* fault, so use a Program interrupt instead +* to avoid a fault loop. +*/ + kvmppc_core_queue_program(vcpu, 0); + } + r = RESUME_GUEST; break; } -- 2.34.1
[PATCH v5 4/5] KVM: PPC: mmio: Return to guest after emulation failure
If MMIO emulation fails we don't want to crash the whole guest by returning to userspace. The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM implementation") added a todo: /* XXX Deliver Program interrupt to guest. */ and later the commit d69614a295ae ("KVM: PPC: Separate loadstore emulation from priv emulation") added the Program interrupt injection but in another file, so I'm assuming it was missed that this block needed to be altered. Also change the message to a ratelimited one since we're letting the guest run and it could flood the host logs. Signed-off-by: Fabiano Rosas Reviewed-by: Nicholas Piggin --- arch/powerpc/kvm/powerpc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 27fb2b70f631..acb0d2a4bdb9 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -307,9 +307,9 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu) u32 last_inst; kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst); - /* XXX Deliver Program interrupt to guest. */ - pr_emerg("%s: emulation failed (%08x)\n", __func__, last_inst); - r = RESUME_HOST; + kvm_debug_ratelimited("Guest access to device memory using unsupported instruction (opcode: %#08x)\n", + last_inst); + r = RESUME_GUEST; break; } default: -- 2.34.1
[PATCH v5 3/5] KVM: PPC: mmio: Reject instructions that access more than mmio.data size
The MMIO interface between the kernel and userspace uses a structure that supports a maximum of 8-bytes of data. Instructions that access more than that need to be emulated in parts. We currently don't have generic support for splitting the emulation in parts and each set of instructions needs to be explicitly included. There's already an error message being printed when a load or store exceeds the mmio.data buffer but we don't fail the emulation until later at kvmppc_complete_mmio_load and even then we allow userspace to make a partial copy of the data, which ends up overwriting some fields of the mmio structure. This patch makes the emulation fail earlier at kvmppc_handle_load|store, which will send a Program interrupt to the guest. This is better than allowing the guest to proceed with partial data. Note that this was caught in a somewhat artificial scenario using quadword instructions (lq/stq), there's no account of an actual guest in the wild running instructions that are not properly emulated. (While here, remove the "bad MMIO" messages. The caller already has an error message.) 
Signed-off-by: Fabiano Rosas Reviewed-by: Alexey Kardashevskiy Reviewed-by: Nicholas Piggin --- arch/powerpc/kvm/powerpc.c | 16 +--- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index c2bd29e90314..27fb2b70f631 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -1114,10 +1114,8 @@ static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu) struct kvm_run *run = vcpu->run; u64 gpr; - if (run->mmio.len > sizeof(gpr)) { - printk(KERN_ERR "bad MMIO length: %d\n", run->mmio.len); + if (run->mmio.len > sizeof(gpr)) return; - } if (!vcpu->arch.mmio_host_swabbed) { switch (run->mmio.len) { @@ -1236,10 +1234,8 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu, host_swabbed = !is_default_endian; } - if (bytes > sizeof(run->mmio.data)) { - printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__, - run->mmio.len); - } + if (bytes > sizeof(run->mmio.data)) + return EMULATE_FAIL; run->mmio.phys_addr = vcpu->arch.paddr_accessed; run->mmio.len = bytes; @@ -1325,10 +1321,8 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu, host_swabbed = !is_default_endian; } - if (bytes > sizeof(run->mmio.data)) { - printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__, - run->mmio.len); - } + if (bytes > sizeof(run->mmio.data)) + return EMULATE_FAIL; run->mmio.phys_addr = vcpu->arch.paddr_accessed; run->mmio.len = bytes; -- 2.34.1
[PATCH v5 2/5] KVM: PPC: Fix vmx/vsx mixup in mmio emulation
The MMIO emulation code for vector instructions is duplicated between VSX and VMX. When emulating VMX we should check the VMX copy size instead of the VSX one. Fixes: acc9eb9305fe ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction ...") Signed-off-by: Fabiano Rosas Reviewed-by: Nicholas Piggin --- arch/powerpc/kvm/powerpc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 50414fb2a5ea..c2bd29e90314 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -1499,7 +1499,7 @@ int kvmppc_handle_vmx_load(struct kvm_vcpu *vcpu, { enum emulation_result emulated = EMULATE_DONE; - if (vcpu->arch.mmio_vsx_copy_nums > 2) + if (vcpu->arch.mmio_vmx_copy_nums > 2) return EMULATE_FAIL; while (vcpu->arch.mmio_vmx_copy_nums) { @@ -1596,7 +1596,7 @@ int kvmppc_handle_vmx_store(struct kvm_vcpu *vcpu, unsigned int index = rs & KVM_MMIO_REG_MASK; enum emulation_result emulated = EMULATE_DONE; - if (vcpu->arch.mmio_vsx_copy_nums > 2) + if (vcpu->arch.mmio_vmx_copy_nums > 2) return EMULATE_FAIL; vcpu->arch.io_gpr = rs; -- 2.34.1
[PATCH v5 0/5] KVM: PPC: MMIO fixes
Changes from v4: -patch 4: switched to kvm_debug_ratelimited. -patch 5: kept the Program interrupt for BookE. v4: https://lore.kernel.org/r/20220121222626.972495-1-faro...@linux.ibm.com v3: https://lore.kernel.org/r/20220107210012.4091153-1-faro...@linux.ibm.com v2: https://lore.kernel.org/r/20220106200304.4070825-1-faro...@linux.ibm.com v1: https://lore.kernel.org/r/20211223211528.3560711-1-faro...@linux.ibm.com Fabiano Rosas (5): KVM: PPC: Book3S HV: Stop returning internal values to userspace KVM: PPC: Fix vmx/vsx mixup in mmio emulation KVM: PPC: mmio: Reject instructions that access more than mmio.data size KVM: PPC: mmio: Return to guest after emulation failure KVM: PPC: Book3s: mmio: Deliver DSI after emulation failure arch/powerpc/kvm/emulate_loadstore.c | 10 ++--- arch/powerpc/kvm/powerpc.c | 56 2 files changed, 43 insertions(+), 23 deletions(-) -- 2.34.1
[PATCH v5 1/5] KVM: PPC: Book3S HV: Stop returning internal values to userspace
Our kvm_arch_vcpu_ioctl_run currently returns the RESUME_HOST values to userspace, against the API of the KVM_RUN ioctl which returns 0 on success. Signed-off-by: Fabiano Rosas Reviewed-by: Nicholas Piggin --- This was noticed while enabling the kvm selftests for powerpc. There's an assert at the _vcpu_run function when we return a value different from the expected. --- arch/powerpc/kvm/powerpc.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 2ad0ccd202d5..50414fb2a5ea 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -1841,6 +1841,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) #ifdef CONFIG_ALTIVEC out: #endif + + /* +* We're already returning to userspace, don't pass the +* RESUME_HOST flags along. +*/ + if (r > 0) + r = 0; + vcpu_put(vcpu); return r; } -- 2.34.1
[PATCH v3 4/4] KVM: PPC: Decrement module refcount if init_vm fails
We increment the reference count for KVM-HV/PR before the call to kvmppc_core_init_vm. If that function fails we need to decrement the refcount. Also remove the check on kvm_ops->owner because try_module_get can handle a NULL module. Signed-off-by: Fabiano Rosas Reviewed-by: Nicholas Piggin --- arch/powerpc/kvm/powerpc.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 2ad0ccd202d5..a6d6d452243f 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -431,6 +431,8 @@ int kvm_arch_check_processor_compat(void *opaque) int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) { struct kvmppc_ops *kvm_ops = NULL; + int r; + /* * if we have both HV and PR enabled, default is HV */ @@ -452,11 +454,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) } else goto err_out; - if (kvm_ops->owner && !try_module_get(kvm_ops->owner)) + if (!try_module_get(kvm_ops->owner)) return -ENOENT; kvm->arch.kvm_ops = kvm_ops; - return kvmppc_core_init_vm(kvm); + r = kvmppc_core_init_vm(kvm); + if (r) + module_put(kvm_ops->owner); + return r; err_out: return -EINVAL; } -- 2.34.1
[PATCH v3 3/4] KVM: PPC: Book3S HV: Free allocated memory if module init fails
The module's exit function is not called when the init fails, we need to do cleanup before returning. Signed-off-by: Fabiano Rosas Reviewed-by: Nicholas Piggin --- arch/powerpc/kvm/book3s_hv.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index b9aace212599..87a49651a402 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -6104,7 +6104,7 @@ static int kvmppc_book3s_init_hv(void) if (!cpu_has_feature(CPU_FTR_ARCH_300)) { r = kvm_init_subcore_bitmap(); if (r) - return r; + goto err; } /* @@ -6120,7 +6120,8 @@ static int kvmppc_book3s_init_hv(void) np = of_find_compatible_node(NULL, NULL, "ibm,opal-intc"); if (!np) { pr_err("KVM-HV: Cannot determine method for accessing XICS\n"); - return -ENODEV; + r = -ENODEV; + goto err; } /* presence of intc confirmed - node can be dropped again */ of_node_put(np); @@ -6133,12 +6134,12 @@ static int kvmppc_book3s_init_hv(void) r = kvmppc_mmu_hv_init(); if (r) - return r; + goto err; if (kvmppc_radix_possible()) { r = kvmppc_radix_init(); if (r) - return r; + goto err; } r = kvmppc_uvmem_init(); @@ -6151,6 +6152,12 @@ static int kvmppc_book3s_init_hv(void) kvmppc_hv_ops = &kvm_ops_hv; return 0; + +err: + kvmhv_nested_exit(); + kvmppc_radix_exit(); + + return r; } static void kvmppc_book3s_exit_hv(void) -- 2.34.1
[PATCH v3 2/4] KVM: PPC: Book3S HV: Delay setting of kvm ops
Delay the setting of kvm_hv_ops until after all init code has completed. This avoids leaving the ops still accessible if the init fails. Signed-off-by: Fabiano Rosas --- arch/powerpc/kvm/book3s_hv.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 3a3845f366d4..b9aace212599 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -6127,9 +6127,6 @@ static int kvmppc_book3s_init_hv(void) } #endif - kvm_ops_hv.owner = THIS_MODULE; - kvmppc_hv_ops = &kvm_ops_hv; - init_default_hcalls(); init_vcore_lists(); @@ -6145,10 +6142,15 @@ static int kvmppc_book3s_init_hv(void) } r = kvmppc_uvmem_init(); - if (r < 0) + if (r < 0) { pr_err("KVM-HV: kvmppc_uvmem_init failed %d\n", r); + return r; + } - return r; + kvm_ops_hv.owner = THIS_MODULE; + kvmppc_hv_ops = &kvm_ops_hv; + + return 0; } static void kvmppc_book3s_exit_hv(void) -- 2.34.1
[PATCH v3 1/4] KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
The return of the function is being shadowed by the call to kvmppc_uvmem_init. Fixes: ca9f4942670c ("KVM: PPC: Book3S HV: Support for running secure guests") Signed-off-by: Fabiano Rosas Reviewed-by: Nicholas Piggin --- arch/powerpc/kvm/book3s_hv.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index d1817cd9a691..3a3845f366d4 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -6138,8 +6138,11 @@ static int kvmppc_book3s_init_hv(void) if (r) return r; - if (kvmppc_radix_possible()) + if (kvmppc_radix_possible()) { r = kvmppc_radix_init(); + if (r) + return r; + } r = kvmppc_uvmem_init(); if (r < 0) -- 2.34.1
[PATCH v3 0/4] KVM: PPC: KVM module exit fixes
changes from v2: - patch 4: Matched module_put() to try_module_get() v2: https://lore.kernel.org/r/20220124220803.1011673-1-faro...@linux.ibm.com v1: https://lore.kernel.org/r/20211223211931.3560887-1-faro...@linux.ibm.com Fabiano Rosas (4): KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init KVM: PPC: Book3S HV: Delay setting of kvm ops KVM: PPC: Book3S HV: Free allocated memory if module init fails KVM: PPC: Decrement module refcount if init_vm fails arch/powerpc/kvm/book3s_hv.c | 28 arch/powerpc/kvm/powerpc.c | 9 +++-- 2 files changed, 27 insertions(+), 10 deletions(-) -- 2.34.1
Re: [PATCH v2 4/4] KVM: PPC: Decrement module refcount if init_vm fails
Nicholas Piggin writes: > Excerpts from Fabiano Rosas's message of January 25, 2022 8:08 am: >> We increment the reference count for KVM-HV/PR before the call to >> kvmppc_core_init_vm. If that function fails we need to decrement the >> refcount. >> >> Signed-off-by: Fabiano Rosas >> --- >> Caught this while testing Nick's LPID patches by looking at >> /sys/module/kvm_hv/refcnt > > Nice catch. Is this the only change in the series? Yes. > You can just use kvm_ops->owner like try_module_get() does I think? Also > try_module_get works on a NULL module same as module_put by the looks, > so you could adjust that in this patch to remove the NULL check so it > is consistent with the put. Sure, I'll send a v3.
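The refcount bookkeeping under discussion can be modelled with a plain counter (sketch only; try_module_get()/module_put() are stand-ins, and the errno values are illustrative):

```c
static int refcnt;

static int try_get_sketch(void) { refcnt++; return 1; }
static void put_sketch(void)    { refcnt--; }

/* Sketch of kvm_arch_init_vm() after the fix: the reference taken on
 * the ops owner is dropped again if core VM init fails, so
 * /sys/module/kvm_hv/refcnt goes back to its old value. */
static int init_vm_sketch(int core_init_rc)
{
	if (!try_get_sketch())
		return -2;		/* -ENOENT */

	if (core_init_rc) {
		put_sketch();		/* the fix: balance the get above */
		return core_init_rc;
	}
	return 0;
}
```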
Re: [PATCH 2/2] KVM: PPC: Book3S PR: Disallow AIL != 0
Nicholas Piggin writes: > KVM PR does not implement address translation modes on interrupt, so it > must not allow H_SET_MODE to succeed. > > This is not compatible with QEMU behaviour. The solution might be to > have a cap-ail for this, but now it's broken either way so fix it in > KVM to start with. > > This allows PR Linux guests that are using the SCV facility to boot and > run, because Linux disables the use of SCV if AIL can not be set to 3. > This isn't a real fix because Linux or another OS could implement real > mode SCV vectors and try to enable it. The right solution is for KVM to > emulate scv interrupts from the facility unavailable interrupt. > > Signed-off-by: Nicholas Piggin > --- Reviewed-by: Fabiano Rosas > arch/powerpc/kvm/book3s_pr_papr.c | 20 > 1 file changed, 20 insertions(+) > > diff --git a/arch/powerpc/kvm/book3s_pr_papr.c > b/arch/powerpc/kvm/book3s_pr_papr.c > index 1f10e7dfcdd0..dc4f51ac84bc 100644 > --- a/arch/powerpc/kvm/book3s_pr_papr.c > +++ b/arch/powerpc/kvm/book3s_pr_papr.c > @@ -281,6 +281,22 @@ static int kvmppc_h_pr_logical_ci_store(struct kvm_vcpu > *vcpu) > return EMULATE_DONE; > } > > +static int kvmppc_h_pr_set_mode(struct kvm_vcpu *vcpu) > +{ > + unsigned long mflags = kvmppc_get_gpr(vcpu, 4); > + unsigned long resource = kvmppc_get_gpr(vcpu, 5); > + > + if (resource == H_SET_MODE_RESOURCE_ADDR_TRANS_MODE) { > + /* KVM PR does not provide AIL!=0 to guests */ > + if (mflags == 0) > + kvmppc_set_gpr(vcpu, 3, H_SUCCESS); > + else > + kvmppc_set_gpr(vcpu, 3, H_UNSUPPORTED_FLAG_START - 63); > + return EMULATE_DONE; > + } > + return EMULATE_FAIL; > +} > + > #ifdef CONFIG_SPAPR_TCE_IOMMU > static int kvmppc_h_pr_put_tce(struct kvm_vcpu *vcpu) > { > @@ -384,6 +400,8 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd) > return kvmppc_h_pr_logical_ci_load(vcpu); > case H_LOGICAL_CI_STORE: > return kvmppc_h_pr_logical_ci_store(vcpu); > + case H_SET_MODE: > + return kvmppc_h_pr_set_mode(vcpu); > case H_XIRR: > case H_CPPR: > 
case H_EOI: > @@ -421,6 +439,7 @@ int kvmppc_hcall_impl_pr(unsigned long cmd) > case H_CEDE: > case H_LOGICAL_CI_LOAD: > case H_LOGICAL_CI_STORE: > + case H_SET_MODE: > #ifdef CONFIG_KVM_XICS > case H_XIRR: > case H_CPPR: > @@ -447,6 +466,7 @@ static unsigned int default_hcall_list[] = { > H_BULK_REMOVE, > H_PUT_TCE, > H_CEDE, > + H_SET_MODE, > #ifdef CONFIG_KVM_XICS > H_XIRR, > H_CPPR,
Re: [PATCH 1/2] KVM: PPC: Book3S PR: Disable SCV when running AIL is disabled
Nicholas Piggin writes: > PR KVM does not support running with AIL enabled, and SCV is not > supported with AIL disabled. > > Fix this by ensuring the SCV facility is disabled with FSCR while a > CPU can be running with AIL=0. PowerNV host supports disabling AIL on a > per-CPU basis, so SCV just needs to be disabled when a vCPU is run. > > The pSeries machine can only switch AIL on a system-wide basis, so it > must disable SCV support at boot if the configuration can potentially > run a PR KVM guest. > > SCV is not emulated for the PR guest at the moment; this just fixes the > host crashes. > > Alternatives considered and rejected: > - SCV support can not be disabled by PR KVM after boot, because it is > advertised to userspace with HWCAP. > - AIL can not be disabled on a per-CPU basis. At least when running on > pseries it is a per-LPAR setting. > - Support for real-mode SCV vectors will not be added because they are > at 0x17000 so making such a large fixed head space causes immediate > value limits to be exceeded, requiring a lot of rework and more code. > - Disabling SCV for any PR KVM possible kernel will cause a slowdown > when not using PR KVM. > - A boot time option to disable SCV to use PR KVM is user-hostile. > - System call instruction emulation for SCV facility unavailable > instructions is too complex and old emulation code was subtly broken > and removed.
> > Signed-off-by: Nicholas Piggin > --- > arch/powerpc/kernel/exceptions-64s.S | 4 > arch/powerpc/kernel/setup_64.c | 15 +++ > arch/powerpc/kvm/book3s_pr.c | 20 ++-- > 3 files changed, 33 insertions(+), 6 deletions(-) > > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 55caeee37c08..b66dd6f775a4 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -809,6 +809,10 @@ __start_interrupts: > * - MSR_EE|MSR_RI is clear (no reentrant exceptions) > * - Standard kernel environment is set up (stack, paca, etc) > * > + * KVM: > + * These interrupts do not elevate HV 0->1, so HV is not involved. PR KVM > + * ensures that FSCR[SCV] is disabled whenever it has to force AIL off. > + * > * Call convention: > * > * syscall register convention is in Documentation/powerpc/syscall64-abi.rst > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c > index be8577ac9397..ac52c69a3811 100644 > --- a/arch/powerpc/kernel/setup_64.c > +++ b/arch/powerpc/kernel/setup_64.c > @@ -197,6 +197,21 @@ static void __init configure_exceptions(void) > > /* Under a PAPR hypervisor, we need hypercalls */ > if (firmware_has_feature(FW_FEATURE_SET_MODE)) { > + /* > + * PR KVM does not support AIL mode interrupts in the host, and > + * SCV system call interrupt vectors are only implemented for > + * AIL mode. Under pseries, AIL mode can only be enabled and > + * disabled system-wide so when PR KVM is loaded, all CPUs in > + * the host are set to AIL=0 mode. SCV can not be disabled > + * dynamically because the feature is advertised to host > + * userspace, so SCV support must not be enabled if PR KVM can > + * possibly be run. 
> + */ > + if (IS_ENABLED(CONFIG_KVM_BOOK3S_PR_POSSIBLE) && > !radix_enabled()) { > + init_task.thread.fscr &= ~FSCR_SCV; > + cur_cpu_spec->cpu_user_features2 &= ~PPC_FEATURE2_SCV; > + } > + "Under pseries, AIL mode can only be enabled and disabled system-wide so when PR KVM is loaded, all CPUs in the host are set to AIL=0 mode." Loaded as in 'modprobe kvm_pr'? And host as in "nested host" surely. Unless I completely misunderstood the patch (likely). Is there a way to make this less unexpected to users? Maybe a few words in the Kconfig entry for PR_POSSIBLE saying "if you enable this and run a Hash MMU guest, you lose SCV"? > /* Enable AIL if possible */ > if (!pseries_enable_reloc_on_exc()) { > init_task.thread.fscr &= ~FSCR_SCV; > diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c > index 34a801c3604a..4d1c84b94b77 100644 > --- a/arch/powerpc/kvm/book3s_pr.c > +++ b/arch/powerpc/kvm/book3s_pr.c > @@ -140,9 +140,12 @@ static void kvmppc_core_vcpu_load_pr(struct kvm_vcpu > *vcpu, int cpu) > #endif > > /* Disable AIL if supported */ > - if (cpu_has_feature(CPU_FTR_HVMODE) && > - cpu_has_feature(CPU_FTR_ARCH_207S)) > - mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_AIL); > + if (cpu_has_feature(CPU_FTR_HVMODE)) { > + if (cpu_has_feature(CPU_FTR_ARCH_207S)) > + mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_AIL); > + if (cpu_has_feature(CPU_FTR_ARCH_300) && (current->thread.fscr > & FSCR_SCV)) > +
[PATCH v2 0/4] KVM: PPC: KVM module exit fixes
I stumbled upon another issue with our module exit so I'm sending another version to add a fix for it. - patches 1 and 3 are already reviewed; - patch 2 lacks a Reviewed-by. Nick asked about an issue Alexey might have encountered. I haven't heard of any issues with the module exit aside from the ones that this series fixes; - patch 4 is new. It fixes an issue with module refcounting. v1: https://lore.kernel.org/r/20211223211931.3560887-1-faro...@linux.ibm.com Fabiano Rosas (4): KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init KVM: PPC: Book3S HV: Delay setting of kvm ops KVM: PPC: Book3S HV: Free allocated memory if module init fails KVM: PPC: Decrement module refcount if init_vm fails arch/powerpc/kvm/book3s_hv.c | 28 arch/powerpc/kvm/powerpc.c | 7 ++- 2 files changed, 26 insertions(+), 9 deletions(-) -- 2.34.1
Re: [PATCH 6/6] KVM: PPC: Book3S HV: Remove KVMPPC_NR_LPIDS
Nicholas Piggin writes: > KVMPPC_NR_LPIDS no longer represents any size restriction on the > LPID space and can be removed. A CPU with more than 12 LPID bits > implemented will now be able to create more than 4095 guests. > > Signed-off-by: Nicholas Piggin Reviewed-by: Fabiano Rosas > --- > arch/powerpc/include/asm/kvm_book3s_asm.h | 3 --- > arch/powerpc/kvm/book3s_64_mmu_hv.c | 3 --- > 2 files changed, 6 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h > b/arch/powerpc/include/asm/kvm_book3s_asm.h > index e6bda70b1d93..c8882d9b86c2 100644 > --- a/arch/powerpc/include/asm/kvm_book3s_asm.h > +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h > @@ -14,9 +14,6 @@ > #define XICS_MFRR0xc > #define XICS_IPI 2 /* interrupt source # for IPIs */ > > -/* LPIDs we support with this build -- runtime limit may be lower */ > -#define KVMPPC_NR_LPIDS (1UL << 12) > - > /* Maximum number of threads per physical core */ > #define MAX_SMT_THREADS 8 > > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c > b/arch/powerpc/kvm/book3s_64_mmu_hv.c > index f983fb36cbf2..aafd2a74304c 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c > @@ -269,9 +269,6 @@ int kvmppc_mmu_hv_init(void) > nr_lpids = 1UL << KVM_MAX_NESTED_GUESTS_SHIFT; > } > > - if (nr_lpids > KVMPPC_NR_LPIDS) > - nr_lpids = KVMPPC_NR_LPIDS; > - > if (!cpu_has_feature(CPU_FTR_ARCH_300)) { > /* POWER7 has 10-bit LPIDs, POWER8 has 12-bit LPIDs */ > if (cpu_has_feature(CPU_FTR_ARCH_207S))
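The guest-count arithmetic behind the commit message checks out directly: with n implemented LPID bits there are 1<<n LPIDs, one of which (LPID 0) always belongs to the host, hence "more than 4095 guests" once n exceeds 12. A one-line sketch (this ignores the extra LPID that POWER7/8 reserve for partition switching):

```c
/* Maximum number of guests for a given number of implemented LPID
 * bits on POWER9 and newer; LPID 0 is always the host. */
static unsigned long max_guests(unsigned int lpid_bits)
{
	return (1UL << lpid_bits) - 1;
}
```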
Re: [PATCH 5/6] KVM: PPC: Book3S Nested: Use explicit 4096 LPID maximum
Nicholas Piggin writes: > Rather than tie this to KVMPPC_NR_LPIDS which is becoming more dynamic, > fix it to 4096 (12-bits) explicitly for now. > > kvmhv_get_nested() does not have to check against KVM_MAX_NESTED_GUESTS > because the L1 partition table registration hcall already did that, and > it checks against the partition table size. > > This patch also puts all the partition table size calculations into the > same form, using 12 for the architected size field shift and 4 for the > shift corresponding to the partition table entry size. > > Signed-off-by: Nicholas Piggin Reviewed-by: Fabiano Rosas > --- > arch/powerpc/include/asm/kvm_host.h | 7 ++- > arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +- > arch/powerpc/kvm/book3s_hv_nested.c | 24 +++- > 3 files changed, 18 insertions(+), 15 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_host.h > b/arch/powerpc/include/asm/kvm_host.h > index 5fd0564e5c94..e6fb03884dcc 100644 > --- a/arch/powerpc/include/asm/kvm_host.h > +++ b/arch/powerpc/include/asm/kvm_host.h > @@ -34,7 +34,12 @@ > #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE > #include /* for MAX_SMT_THREADS */ > #define KVM_MAX_VCPU_IDS (MAX_SMT_THREADS * KVM_MAX_VCORES) > -#define KVM_MAX_NESTED_GUESTSKVMPPC_NR_LPIDS > + > +/* > + * Limit the nested partition table to 4096 entries (because that's what > + * hardware supports). Both guest and host use this value. 
> + */ > +#define KVM_MAX_NESTED_GUESTS_SHIFT 12 > > #else > #define KVM_MAX_VCPU_IDS KVM_MAX_VCPUS > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c > b/arch/powerpc/kvm/book3s_64_mmu_hv.c > index 5be92d5bc099..f983fb36cbf2 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c > @@ -266,7 +266,7 @@ int kvmppc_mmu_hv_init(void) > return -EINVAL; > nr_lpids = 1UL << mmu_lpid_bits; > } else { > - nr_lpids = KVM_MAX_NESTED_GUESTS; > + nr_lpids = 1UL << KVM_MAX_NESTED_GUESTS_SHIFT; > } > > if (nr_lpids > KVMPPC_NR_LPIDS) > diff --git a/arch/powerpc/kvm/book3s_hv_nested.c > b/arch/powerpc/kvm/book3s_hv_nested.c > index 1eff969b095c..75169e0753ce 100644 > --- a/arch/powerpc/kvm/book3s_hv_nested.c > +++ b/arch/powerpc/kvm/book3s_hv_nested.c > @@ -439,10 +439,11 @@ long kvmhv_nested_init(void) > if (!radix_enabled()) > return -ENODEV; > > - /* find log base 2 of KVMPPC_NR_LPIDS, rounding up */ > - ptb_order = __ilog2(KVMPPC_NR_LPIDS - 1) + 1; > - if (ptb_order < 8) > - ptb_order = 8; > + /* Partition table entry is 1<<4 bytes in size, hence the 4. */ > + ptb_order = KVM_MAX_NESTED_GUESTS_SHIFT + 4; > + /* Minimum partition table size is 1<<12 bytes */ > + if (ptb_order < 12) > + ptb_order = 12; > pseries_partition_tb = kmalloc(sizeof(struct patb_entry) << ptb_order, > GFP_KERNEL); > if (!pseries_partition_tb) { > @@ -450,7 +451,7 @@ long kvmhv_nested_init(void) > return -ENOMEM; > } > > - ptcr = __pa(pseries_partition_tb) | (ptb_order - 8); > + ptcr = __pa(pseries_partition_tb) | (ptb_order - 12); > rc = plpar_hcall_norets(H_SET_PARTITION_TABLE, ptcr); > if (rc != H_SUCCESS) { > pr_err("kvm-hv: Parent hypervisor does not support nesting > (rc=%ld)\n", > @@ -534,16 +535,14 @@ long kvmhv_set_partition_table(struct kvm_vcpu *vcpu) > long ret = H_SUCCESS; > > srcu_idx = srcu_read_lock(&kvm->srcu); > - /* > - * Limit the partition table to 4096 entries (because that's what > - * hardware supports), and check the base address. 
> - */ > - if ((ptcr & PRTS_MASK) > 12 - 8 || > + /* Check partition size and base address. */ > + if ((ptcr & PRTS_MASK) + 12 - 4 > KVM_MAX_NESTED_GUESTS_SHIFT || > !kvm_is_visible_gfn(vcpu->kvm, (ptcr & PRTB_MASK) >> PAGE_SHIFT)) > ret = H_PARAMETER; > srcu_read_unlock(&kvm->srcu, srcu_idx); > if (ret == H_SUCCESS) > kvm->arch.l1_ptcr = ptcr; > + > return ret; > } > > @@ -639,7 +638,7 @@ static void kvmhv_update_ptbl_cache(struct > kvm_nested_guest *gp) > > ret = -EFAULT; > ptbl_addr = (kvm->arch.l1_ptcr & PRTB_MASK) + (gp->l1_lpid << 4); > - if (gp->l1_lpid < (1ul << ((kvm->arch.l1_ptcr & PRTS_MASK) + 8))) { > + if (gp->l1_lpid < (1ul << ((kvm->arch.l1_ptcr & PRTS_MASK) + 12
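The size arithmetic the patch normalizes can be sanity-checked in isolation: each partition table entry is 1<<4 bytes, the PRTS field encodes log2(table size in bytes) - 12, and the allocated order is the guest shift plus the entry shift, with an architected 4kB minimum. A sketch of those relations (local macro names, not the kernel's):

```c
#define NESTED_GUESTS_SHIFT	12	/* KVM_MAX_NESTED_GUESTS_SHIFT */
#define PATB_ENTRY_SHIFT	4	/* each entry is 16 bytes */

/* Order (log2 bytes) of the table the L0 allocates for itself. */
static int ptb_order(void)
{
	int order = NESTED_GUESTS_SHIFT + PATB_ENTRY_SHIFT;

	if (order < 12)			/* architected minimum: 4kB table */
		order = 12;
	return order;
}

/* Number of entries a given PRTS value describes. */
static unsigned long entries_for_prts(unsigned long prts)
{
	return 1UL << (prts + 12 - PATB_ENTRY_SHIFT);
}

/* The H_SET_PARTITION_TABLE size check from the patch, in isolation. */
static int prts_valid(unsigned long prts)
{
	return prts + 12 - PATB_ENTRY_SHIFT <= NESTED_GUESTS_SHIFT;
}
```

So a 4096-entry table is 1<<16 bytes, the PRTS value written into the PTCR is 16 - 12 = 4, and any PRTS above 4 is rejected.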
Re: [PATCH 4/6] KVM: PPC: Book3S HV Nested: Change nested guest lookup to use idr
Nicholas Piggin writes: > This removes the fixed sized kvm->arch.nested_guests array. > > Signed-off-by: Nicholas Piggin > --- Reviewed-by: Fabiano Rosas > arch/powerpc/include/asm/kvm_host.h | 3 +- > arch/powerpc/kvm/book3s_hv_nested.c | 110 +++- > 2 files changed, 59 insertions(+), 54 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_host.h > b/arch/powerpc/include/asm/kvm_host.h > index d9bf60bf0816..5fd0564e5c94 100644 > --- a/arch/powerpc/include/asm/kvm_host.h > +++ b/arch/powerpc/include/asm/kvm_host.h > @@ -326,8 +326,7 @@ struct kvm_arch { > struct list_head uvmem_pfns; > struct mutex mmu_setup_lock;/* nests inside vcpu mutexes */ > u64 l1_ptcr; > - int max_nested_lpid; > - struct kvm_nested_guest *nested_guests[KVM_MAX_NESTED_GUESTS]; > + struct idr kvm_nested_guest_idr; > /* This array can grow quite large, keep it at the end */ > struct kvmppc_vcore *vcores[KVM_MAX_VCORES]; > #endif > diff --git a/arch/powerpc/kvm/book3s_hv_nested.c > b/arch/powerpc/kvm/book3s_hv_nested.c > index 9d373f8963ee..1eff969b095c 100644 > --- a/arch/powerpc/kvm/book3s_hv_nested.c > +++ b/arch/powerpc/kvm/book3s_hv_nested.c > @@ -521,11 +521,6 @@ static void kvmhv_set_nested_ptbl(struct > kvm_nested_guest *gp) > kvmhv_set_ptbl_entry(gp->shadow_lpid, dw0, gp->process_table); > } > > -void kvmhv_vm_nested_init(struct kvm *kvm) > -{ > - kvm->arch.max_nested_lpid = -1; > -} > - > /* > * Handle the H_SET_PARTITION_TABLE hcall. 
> * r4 = guest real address of partition table + log_2(size) - 12 > @@ -660,6 +655,35 @@ static void kvmhv_update_ptbl_cache(struct > kvm_nested_guest *gp) > kvmhv_set_nested_ptbl(gp); > } > > +void kvmhv_vm_nested_init(struct kvm *kvm) > +{ > + idr_init(&kvm->arch.kvm_nested_guest_idr); > +} > + > +static struct kvm_nested_guest *__find_nested(struct kvm *kvm, int lpid) > +{ > + return idr_find(&kvm->arch.kvm_nested_guest_idr, lpid); > +} > + > +static bool __prealloc_nested(struct kvm *kvm, int lpid) > +{ > + if (idr_alloc(&kvm->arch.kvm_nested_guest_idr, > + NULL, lpid, lpid + 1, GFP_KERNEL) != lpid) > + return false; > + return true; > +} > + > +static void __add_nested(struct kvm *kvm, int lpid, struct kvm_nested_guest > *gp) > +{ > + if (idr_replace(&kvm->arch.kvm_nested_guest_idr, gp, lpid)) > + WARN_ON(1); > +} > + > +static void __remove_nested(struct kvm *kvm, int lpid) > +{ > + idr_remove(&kvm->arch.kvm_nested_guest_idr, lpid); > +} > + > static struct kvm_nested_guest *kvmhv_alloc_nested(struct kvm *kvm, unsigned > int lpid) > { > struct kvm_nested_guest *gp; > @@ -720,13 +744,8 @@ static void kvmhv_remove_nested(struct kvm_nested_guest > *gp) > long ref; > > spin_lock(&kvm->mmu_lock); > - if (gp == kvm->arch.nested_guests[lpid]) { > - kvm->arch.nested_guests[lpid] = NULL; > - if (lpid == kvm->arch.max_nested_lpid) { > - while (--lpid >= 0 && !kvm->arch.nested_guests[lpid]) > - ; > - kvm->arch.max_nested_lpid = lpid; > - } > + if (gp == __find_nested(kvm, lpid)) { > + __remove_nested(kvm, lpid); > --gp->refcnt; > } > ref = gp->refcnt; > @@ -743,24 +762,22 @@ static void kvmhv_remove_nested(struct kvm_nested_guest > *gp) > */ > void kvmhv_release_all_nested(struct kvm *kvm) > { > - int i; > + int lpid; > struct kvm_nested_guest *gp; > struct kvm_nested_guest *freelist = NULL; > struct kvm_memory_slot *memslot; > int srcu_idx, bkt; > > spin_lock(&kvm->mmu_lock); > - for (i = 0; i <= kvm->arch.max_nested_lpid; i++) { > - gp = kvm->arch.nested_guests[i]; > 
- if (!gp) > - continue; > - kvm->arch.nested_guests[i] = NULL; > + idr_for_each_entry(&kvm->arch.kvm_nested_guest_idr, gp, lpid) { > + __remove_nested(kvm, lpid); > if (--gp->refcnt == 0) { > gp->next = freelist; > freelist = gp; > } > } > - kvm->arch.max_nested_lpid = -1; > + idr_destroy(&kvm->arch.kvm_nested_guest_idr); > + /* idr is empty and may be reused at this point */ > spin_unlock(&kvm->mmu_lock); > while ((gp = freelist) != NULL) { >
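The two-phase insert in this patch (reserve the lpid slot first with a NULL entry, since idr_alloc may sleep, then publish the pointer under mmu_lock with idr_replace) can be sketched with a plain array standing in for the idr:

```c
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 8

struct guest { int lpid; };

static struct guest the_guest = { 3 };

/* reserved[] marks ids handed out by the prealloc phase (idr_alloc
 * with a NULL pointer); table[] holds the published pointers. */
static bool reserved[NSLOTS];
static struct guest *table[NSLOTS];

static bool prealloc_nested(int lpid)	/* may sleep in the kernel */
{
	if (lpid < 0 || lpid >= NSLOTS || reserved[lpid])
		return false;
	reserved[lpid] = true;
	return true;
}

static void add_nested(int lpid, struct guest *gp)	/* under the lock */
{
	table[lpid] = gp;	/* like idr_replace() */
}

static struct guest *find_nested(int lpid)
{
	if (lpid < 0 || lpid >= NSLOTS)
		return NULL;
	return table[lpid];
}

static void remove_nested(int lpid)
{
	table[lpid] = NULL;
	reserved[lpid] = false;
}
```

Lookups between the reserve and the publish see NULL, so a half-created guest is never visible, and there is no fixed-size max_nested_lpid bookkeeping to maintain.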
Re: [PATCH 3/6] KVM: PPC: Book3S HV: Use IDA allocator for LPID allocator
Nicholas Piggin writes: > This removes the fixed-size lpid_inuse array. > > Signed-off-by: Nicholas Piggin > --- Reviewed-by: Fabiano Rosas > arch/powerpc/kvm/powerpc.c | 25 + > 1 file changed, 13 insertions(+), 12 deletions(-) > > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > index 102993462872..c527a5751b46 100644 > --- a/arch/powerpc/kvm/powerpc.c > +++ b/arch/powerpc/kvm/powerpc.c > @@ -2453,20 +2453,22 @@ long kvm_arch_vm_ioctl(struct file *filp, > return r; > } > > -static unsigned long lpid_inuse[BITS_TO_LONGS(KVMPPC_NR_LPIDS)]; > +static DEFINE_IDA(lpid_inuse); > static unsigned long nr_lpids; > > long kvmppc_alloc_lpid(void) > { > - long lpid; > + int lpid; > > - do { > - lpid = find_first_zero_bit(lpid_inuse, KVMPPC_NR_LPIDS); > - if (lpid >= nr_lpids) { > + /* The host LPID must always be 0 (allocation starts at 1) */ > + lpid = ida_alloc_range(&lpid_inuse, 1, nr_lpids - 1, GFP_KERNEL); > + if (lpid < 0) { > + if (lpid == -ENOMEM) > + pr_err("%s: Out of memory\n", __func__); > + else > pr_err("%s: No LPIDs free\n", __func__); > - return -ENOMEM; > - } > - } while (test_and_set_bit(lpid, lpid_inuse)); > + return -ENOMEM; > + } > > return lpid; > } > @@ -2474,15 +2476,14 @@ EXPORT_SYMBOL_GPL(kvmppc_alloc_lpid); > > void kvmppc_free_lpid(long lpid) > { > - clear_bit(lpid, lpid_inuse); > + ida_free(&lpid_inuse, lpid); > } > EXPORT_SYMBOL_GPL(kvmppc_free_lpid); > > +/* nr_lpids_param includes the host LPID */ > void kvmppc_init_lpid(unsigned long nr_lpids_param) > { > - nr_lpids = min_t(unsigned long, KVMPPC_NR_LPIDS, nr_lpids_param); > - memset(lpid_inuse, 0, sizeof(lpid_inuse)); > - set_bit(0, lpid_inuse); /* The host LPID must always be 0 */ > + nr_lpids = nr_lpids_param; > } > EXPORT_SYMBOL_GPL(kvmppc_init_lpid);
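The allocator semantics after the IDA conversion are simple to state: IDs come from the range 1..nr_lpids-1 so the host keeps LPID 0, and a freed ID becomes reusable. A bitmap-backed sketch of kvmppc_alloc_lpid()/kvmppc_free_lpid() (fixed small table, illustrative -ENOMEM value):

```c
#define MAX_LPIDS 16

static unsigned char lpid_used[MAX_LPIDS];
static unsigned long nr_lpids = MAX_LPIDS;

static long alloc_lpid(void)
{
	unsigned long i;

	/* The host LPID must always be 0 (allocation starts at 1). */
	for (i = 1; i < nr_lpids; i++) {
		if (!lpid_used[i]) {
			lpid_used[i] = 1;
			return i;
		}
	}
	return -12;	/* -ENOMEM, mirroring kvmppc_alloc_lpid() */
}

static void free_lpid(long lpid)
{
	lpid_used[lpid] = 0;
}
```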
Re: [PATCH 2/6] KVM: PPC: Book3S HV: Update LPID allocator init for POWER9, Nested
Nicholas Piggin writes: > The LPID allocator init is changed to: > - use mmu_lpid_bits rather than hard-coding; > - use KVM_MAX_NESTED_GUESTS for nested hypervisors; > - not reserve the top LPID on POWER9 and newer CPUs. > > The reserved LPID is made a POWER7/8-specific detail. > > Signed-off-by: Nicholas Piggin > --- Reviewed-by: Fabiano Rosas > arch/powerpc/include/asm/kvm_book3s_asm.h | 2 +- > arch/powerpc/include/asm/reg.h| 2 -- > arch/powerpc/kvm/book3s_64_mmu_hv.c | 29 --- > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 8 +++ > arch/powerpc/mm/init_64.c | 3 +++ > 5 files changed, 33 insertions(+), 11 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h > b/arch/powerpc/include/asm/kvm_book3s_asm.h > index b6d31bff5209..e6bda70b1d93 100644 > --- a/arch/powerpc/include/asm/kvm_book3s_asm.h > +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h > @@ -15,7 +15,7 @@ > #define XICS_IPI 2 /* interrupt source # for IPIs */ > > /* LPIDs we support with this build -- runtime limit may be lower */ > -#define KVMPPC_NR_LPIDS (LPID_RSVD + 1) > +#define KVMPPC_NR_LPIDS (1UL << 12) > > /* Maximum number of threads per physical core */ > #define MAX_SMT_THREADS 8 > diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h > index 1e14324c5190..1e8b2e04e626 100644 > --- a/arch/powerpc/include/asm/reg.h > +++ b/arch/powerpc/include/asm/reg.h > @@ -473,8 +473,6 @@ > #ifndef SPRN_LPID > #define SPRN_LPID0x13F /* Logical Partition Identifier */ > #endif > -#define LPID_RSVD_POWER7 0x3ff /* Reserved LPID for partn switching */ > -#define LPID_RSVD 0xfff /* Reserved LPID for partn switching */ > #define SPRN_HMER 0x150 /* Hypervisor maintenance exception reg > */ > #define HMER_DEBUG_TRIG(1ul << (63 - 17)) /* Debug trigger */ > #define SPRN_HMEER 0x151 /* Hyp maintenance exception enable reg > */ > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c > b/arch/powerpc/kvm/book3s_64_mmu_hv.c > index 09fc52b6f390..5be92d5bc099 100644 > --- 
a/arch/powerpc/kvm/book3s_64_mmu_hv.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c > @@ -256,7 +256,7 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct > kvm_memory_slot *memslot, > > int kvmppc_mmu_hv_init(void) > { > - unsigned long rsvd_lpid; > + unsigned long nr_lpids; > > if (!mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE)) > return -EINVAL; > @@ -264,16 +264,29 @@ int kvmppc_mmu_hv_init(void) > if (cpu_has_feature(CPU_FTR_HVMODE)) { > if (WARN_ON(mfspr(SPRN_LPID) != 0)) > return -EINVAL; > + nr_lpids = 1UL << mmu_lpid_bits; > + } else { > + nr_lpids = KVM_MAX_NESTED_GUESTS; > } > > - /* POWER8 and above have 12-bit LPIDs (10-bit in POWER7) */ > - if (cpu_has_feature(CPU_FTR_ARCH_207S)) > - rsvd_lpid = LPID_RSVD; > - else > - rsvd_lpid = LPID_RSVD_POWER7; > + if (nr_lpids > KVMPPC_NR_LPIDS) > + nr_lpids = KVMPPC_NR_LPIDS; > + > + if (!cpu_has_feature(CPU_FTR_ARCH_300)) { > + /* POWER7 has 10-bit LPIDs, POWER8 has 12-bit LPIDs */ > + if (cpu_has_feature(CPU_FTR_ARCH_207S)) > + WARN_ON(nr_lpids != 1UL << 12); > + else > + WARN_ON(nr_lpids != 1UL << 10); > + > + /* > + * Reserve the last implemented LPID use in partition > + * switching for POWER7 and POWER8. > + */ > + nr_lpids -= 1; > + } > > - /* rsvd_lpid is reserved for use in partition switching */ > - kvmppc_init_lpid(rsvd_lpid); > + kvmppc_init_lpid(nr_lpids); > > return 0; > } > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > index d185dee26026..0c552885a032 100644 > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > @@ -50,6 +50,14 @@ > #define STACK_SLOT_UAMOR (SFS-88) > #define STACK_SLOT_FSCR (SFS-96) > > +/* > + * Use the last LPID (all implemented LPID bits = 1) for partition switching. > + * This is reserved in the LPID allocator. POWER7 only implements 0x3ff, but > + * we write 0xfff into the LPID SPR anyway, which seems to work and just > + * ignores the top bits. 
> + */ > +#define LPID_RSVD 0xfff > + > /* > * Call kvmppc_hv_entry in real mode. > *
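The comment's claim that POWER7 "just ignores the top bits" of the 0xfff write is plain masking by the implemented LPID width (10 bits on POWER7, 12 on POWER8). A model of that, assuming the SPR truncates rather than traps, as the comment states:

```c
/* What the LPID SPR retains when software writes `val` on a CPU that
 * implements `bits` LPID bits: the top bits are simply dropped. */
static unsigned long lpid_spr_write(unsigned long val, unsigned int bits)
{
	return val & ((1UL << bits) - 1);
}
```

So writing the reserved value 0xfff lands on 0x3ff on POWER7, which is exactly the LPID the allocator reserves there.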
[PATCH v2 4/4] KVM: PPC: Decrement module refcount if init_vm fails
We increment the reference count for KVM-HV/PR before the call to kvmppc_core_init_vm. If that function fails we need to decrement the refcount. Signed-off-by: Fabiano Rosas --- Caught this while testing Nick's LPID patches by looking at /sys/module/kvm_hv/refcnt --- arch/powerpc/kvm/powerpc.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 2ad0ccd202d5..4285d0eac900 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -431,6 +431,8 @@ int kvm_arch_check_processor_compat(void *opaque) int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) { struct kvmppc_ops *kvm_ops = NULL; + int r; + /* * if we have both HV and PR enabled, default is HV */ @@ -456,7 +458,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) return -ENOENT; kvm->arch.kvm_ops = kvm_ops; - return kvmppc_core_init_vm(kvm); + r = kvmppc_core_init_vm(kvm); + if (r) + module_put(kvm->arch.kvm_ops->owner); + return r; err_out: return -EINVAL; } -- 2.34.1
[PATCH v2 3/4] KVM: PPC: Book3S HV: Free allocated memory if module init fails
The module's exit function is not called when the init fails, so we need to do cleanup before returning. Signed-off-by: Fabiano Rosas Reviewed-by: Nicholas Piggin --- arch/powerpc/kvm/book3s_hv.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index b9aace212599..87a49651a402 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -6104,7 +6104,7 @@ static int kvmppc_book3s_init_hv(void) if (!cpu_has_feature(CPU_FTR_ARCH_300)) { r = kvm_init_subcore_bitmap(); if (r) - return r; + goto err; } /* @@ -6120,7 +6120,8 @@ static int kvmppc_book3s_init_hv(void) np = of_find_compatible_node(NULL, NULL, "ibm,opal-intc"); if (!np) { pr_err("KVM-HV: Cannot determine method for accessing XICS\n"); - return -ENODEV; + r = -ENODEV; + goto err; } /* presence of intc confirmed - node can be dropped again */ of_node_put(np); @@ -6133,12 +6134,12 @@ static int kvmppc_book3s_init_hv(void) r = kvmppc_mmu_hv_init(); if (r) - return r; + goto err; if (kvmppc_radix_possible()) { r = kvmppc_radix_init(); if (r) - return r; + goto err; } r = kvmppc_uvmem_init(); @@ -6151,6 +6152,12 @@ static int kvmppc_book3s_init_hv(void) kvmppc_hv_ops = &kvm_ops_hv; return 0; + +err: + kvmhv_nested_exit(); + kvmppc_radix_exit(); + + return r; } static void kvmppc_book3s_exit_hv(void) -- 2.34.1
[PATCH v2 2/4] KVM: PPC: Book3S HV: Delay setting of kvm ops
Delay the setting of kvm_hv_ops until after all init code has completed. This avoids leaving the ops still accessible if the init fails. Signed-off-by: Fabiano Rosas --- arch/powerpc/kvm/book3s_hv.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 3a3845f366d4..b9aace212599 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -6127,9 +6127,6 @@ static int kvmppc_book3s_init_hv(void) } #endif - kvm_ops_hv.owner = THIS_MODULE; - kvmppc_hv_ops = &kvm_ops_hv; - init_default_hcalls(); init_vcore_lists(); @@ -6145,10 +6142,15 @@ static int kvmppc_book3s_init_hv(void) } r = kvmppc_uvmem_init(); - if (r < 0) + if (r < 0) { pr_err("KVM-HV: kvmppc_uvmem_init failed %d\n", r); + return r; + } - return r; + kvm_ops_hv.owner = THIS_MODULE; + kvmppc_hv_ops = &kvm_ops_hv; + + return 0; } static void kvmppc_book3s_exit_hv(void) -- 2.34.1
[PATCH v2 1/4] KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
The return value of kvmppc_radix_init is being shadowed by the call to kvmppc_uvmem_init. Fixes: ca9f4942670c ("KVM: PPC: Book3S HV: Support for running secure guests") Signed-off-by: Fabiano Rosas Reviewed-by: Nicholas Piggin --- arch/powerpc/kvm/book3s_hv.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index d1817cd9a691..3a3845f366d4 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -6138,8 +6138,11 @@ static int kvmppc_book3s_init_hv(void) if (r) return r; - if (kvmppc_radix_possible()) + if (kvmppc_radix_possible()) { r = kvmppc_radix_init(); + if (r) + return r; + } r = kvmppc_uvmem_init(); if (r < 0) -- 2.34.1
Re: [PATCH] KVM: PPC: Book3S HV P9: Optimise loads around context switch
Nicholas Piggin writes: > It is better to get all loads for the register values in flight > before starting to switch LPID, PID, and LPCR because those > mtSPRs are expensive and serialising. > > This also just tidies up the code for a potential future change > to the context switching sequence. > > Signed-off-by: Nicholas Piggin Reviewed-by: Fabiano Rosas
Re: [PATCH] KVM: PPC: Book3S HV: HFSCR[PREFIX] does not exist
Nicholas Piggin writes: > This facility is controlled by FSCR only. Reserved bits should not be > set in the HFSCR register (although it's likely harmless as this > position would not be re-used, and the L0 is forgiving here too). > > Signed-off-by: Nicholas Piggin Reviewed-by: Fabiano Rosas > --- > arch/powerpc/include/asm/reg.h | 1 - > arch/powerpc/kvm/book3s_hv.c | 2 +- > 2 files changed, 1 insertion(+), 2 deletions(-) > > diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h > index 2835f6363228..1e14324c5190 100644 > --- a/arch/powerpc/include/asm/reg.h > +++ b/arch/powerpc/include/asm/reg.h > @@ -417,7 +417,6 @@ > #define FSCR_DSCR __MASK(FSCR_DSCR_LG) > #define FSCR_INTR_CAUSE (ASM_CONST(0xFF) << 56)/* interrupt cause */ > #define SPRN_HFSCR 0xbe/* HV=1 Facility Status & Control Register */ > -#define HFSCR_PREFIX __MASK(FSCR_PREFIX_LG) > #define HFSCR_MSGP __MASK(FSCR_MSGP_LG) > #define HFSCR_TAR __MASK(FSCR_TAR_LG) > #define HFSCR_EBB __MASK(FSCR_EBB_LG) > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 84c89f08ae9a..be8914c3dde9 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -2830,7 +2830,7 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu > *vcpu) >* to trap and then we emulate them. >*/ > vcpu->arch.hfscr = HFSCR_TAR | HFSCR_EBB | HFSCR_PM | HFSCR_BHRB | > - HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP | HFSCR_PREFIX; > + HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP; > if (cpu_has_feature(CPU_FTR_HVMODE)) { > vcpu->arch.hfscr &= mfspr(SPRN_HFSCR); > #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
Re: [PATCH] KVM: PPC: Book3S HV Nested: Fix nested HFSCR being clobbered with multiple vCPUs
Nicholas Piggin writes: > The L0 is storing HFSCR requested by the L1 for the L2 in struct > kvm_nested_guest when the L1 requests a vCPU enter L2. kvm_nested_guest > is not a per-vCPU structure. Hilarity ensues. > > Fix it by moving the nested hfscr into the vCPU structure together with > the other per-vCPU nested fields. > > Fixes: 8b210a880b35 ("KVM: PPC: Book3S HV Nested: Make nested HFSCR state > accessible") > Signed-off-by: Nicholas Piggin Reviewed-by: Fabiano Rosas
[PATCH v4 5/5] KVM: PPC: mmio: Deliver DSI after emulation failure
MMIO emulation can fail if the guest uses an instruction that we are
not prepared to emulate. Since these instructions can be and most
likely are valid ones, this is (slightly) closer to an access fault
than to an illegal instruction, so deliver a Data Storage interrupt
instead of a Program interrupt.

Suggested-by: Nicholas Piggin
Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/emulate_loadstore.c | 10 +++-------
 arch/powerpc/kvm/powerpc.c           | 12 ++++++++++++
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index 48272a9b9c30..cfc9114b87d0 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -73,7 +73,6 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 {
 	u32 inst;
 	enum emulation_result emulated = EMULATE_FAIL;
-	int advance = 1;
 	struct instruction_op op;
 
 	/* this default type might be overwritten by subcategories */
@@ -98,6 +97,8 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 		int type = op.type & INSTR_TYPE_MASK;
 		int size = GETSIZE(op.type);
 
+		vcpu->mmio_is_write = OP_IS_STORE(type);
+
 		switch (type) {
 		case LOAD: {
 			int instr_byte_swap = op.type & BYTEREV;
@@ -355,15 +356,10 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	if (emulated == EMULATE_FAIL) {
-		advance = 0;
-		kvmppc_core_queue_program(vcpu, 0);
-	}
-
 	trace_kvm_ppc_instr(inst, kvmppc_get_pc(vcpu), emulated);
 
 	/* Advance past emulated instruction. */
-	if (advance)
+	if (emulated != EMULATE_FAIL)
 		kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
 
 	return emulated;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 214602c58f13..9befb121dddb 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -305,10 +305,22 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 	case EMULATE_FAIL:
 	{
 		u32 last_inst;
+		ulong store_bit = DSISR_ISSTORE;
+		ulong cause = DSISR_BADACCESS;
+#ifdef CONFIG_BOOKE
+		store_bit = ESR_ST;
+		cause = 0;
+#endif
 
 		kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
 		pr_info_ratelimited("KVM: guest access to device memory using unsupported instruction (PID: %d opcode: %#08x)\n",
 				    current->pid, last_inst);
+
+		if (vcpu->mmio_is_write)
+			cause |= store_bit;
+
+		kvmppc_core_queue_data_storage(vcpu, vcpu->arch.vaddr_accessed,
+					       cause);
 		r = RESUME_GUEST;
 		break;
 	}
-- 
2.34.1
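The cause-bit selection in the hunk above can be sketched in isolation. This is a hedged illustration: the bit values below are placeholders (the real `DSISR_*`/`ESR_*` definitions live in the kernel's asm headers), and only the branching mirrors the patch.

```c
#include <assert.h>

/* Placeholder values -- NOT the kernel's real DSISR/ESR definitions. */
#define DSISR_BADACCESS	0x1UL
#define DSISR_ISSTORE	0x2UL
#define ESR_ST		0x4UL

/*
 * Mirror of the patch's logic: choose the "cause" bits for the Data
 * Storage interrupt queued after a failed MMIO emulation. BookE parts
 * report the cause through ESR instead of DSISR, so the base cause
 * there is 0 and only the store bit may be set.
 */
static unsigned long mmio_fail_cause(int is_booke, int is_write)
{
	unsigned long store_bit = DSISR_ISSTORE;
	unsigned long cause = DSISR_BADACCESS;

	if (is_booke) {
		store_bit = ESR_ST;
		cause = 0;
	}

	if (is_write)
		cause |= store_bit;

	return cause;
}
```

A load on a Book3S-style guest thus reports only the bad-access bit, while a store adds the store bit on top.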
[PATCH v4 4/5] KVM: PPC: mmio: Return to guest after emulation failure
If MMIO emulation fails we don't want to crash the whole guest by
returning to userspace.

The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
implementation") added a todo:

  /* XXX Deliver Program interrupt to guest. */

and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
emulation from priv emulation") added the Program interrupt injection
but in another file, so I'm assuming it was missed that this block
needed to be altered.

Also change the message to a ratelimited one since we're letting the
guest run and it could flood the host logs.

Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/powerpc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 27fb2b70f631..214602c58f13 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -307,9 +307,9 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 		u32 last_inst;
 
 		kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
-		/* XXX Deliver Program interrupt to guest. */
-		pr_emerg("%s: emulation failed (%08x)\n", __func__, last_inst);
-		r = RESUME_HOST;
+		pr_info_ratelimited("KVM: guest access to device memory using unsupported instruction (PID: %d opcode: %#08x)\n",
+				    current->pid, last_inst);
+		r = RESUME_GUEST;
 		break;
 	}
 	default:
-- 
2.34.1
[PATCH v4 3/5] KVM: PPC: mmio: Reject instructions that access more than mmio.data size
The MMIO interface between the kernel and userspace uses a structure
that supports a maximum of 8-bytes of data. Instructions that access
more than that need to be emulated in parts.

We currently don't have generic support for splitting the emulation in
parts and each set of instructions needs to be explicitly included.

There's already an error message being printed when a load or store
exceeds the mmio.data buffer but we don't fail the emulation until
later at kvmppc_complete_mmio_load and even then we allow userspace to
make a partial copy of the data, which ends up overwriting some fields
of the mmio structure.

This patch makes the emulation fail earlier at kvmppc_handle_load|store,
which will send a Program interrupt to the guest. This is better than
allowing the guest to proceed with partial data.

Note that this was caught in a somewhat artificial scenario using
quadword instructions (lq/stq), there's no account of an actual guest
in the wild running instructions that are not properly emulated.

(While here, remove the "bad MMIO" messages. The caller already has an
error message.)

Signed-off-by: Fabiano Rosas
Reviewed-by: Alexey Kardashevskiy
---
 arch/powerpc/kvm/powerpc.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index c2bd29e90314..27fb2b70f631 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1114,10 +1114,8 @@ static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu)
 	struct kvm_run *run = vcpu->run;
 	u64 gpr;
 
-	if (run->mmio.len > sizeof(gpr)) {
-		printk(KERN_ERR "bad MMIO length: %d\n", run->mmio.len);
+	if (run->mmio.len > sizeof(gpr))
 		return;
-	}
 
 	if (!vcpu->arch.mmio_host_swabbed) {
 		switch (run->mmio.len) {
@@ -1236,10 +1234,8 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
 		host_swabbed = !is_default_endian;
 	}
 
-	if (bytes > sizeof(run->mmio.data)) {
-		printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-		       run->mmio.len);
-	}
+	if (bytes > sizeof(run->mmio.data))
+		return EMULATE_FAIL;
 
 	run->mmio.phys_addr = vcpu->arch.paddr_accessed;
 	run->mmio.len = bytes;
@@ -1325,10 +1321,8 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
 		host_swabbed = !is_default_endian;
 	}
 
-	if (bytes > sizeof(run->mmio.data)) {
-		printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-		       run->mmio.len);
-	}
+	if (bytes > sizeof(run->mmio.data))
+		return EMULATE_FAIL;
 
 	run->mmio.phys_addr = vcpu->arch.paddr_accessed;
 	run->mmio.len = bytes;
-- 
2.34.1
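The early-rejection rule can be demonstrated with a self-contained sketch; `struct mmio_req` below is a simplified stand-in for the mmio fields of `struct kvm_run` (whose data buffer is 8 bytes), not the real UAPI layout.

```c
#include <assert.h>

enum emulation_result { EMULATE_DONE, EMULATE_FAIL };

/* Simplified stand-in for the mmio fields of struct kvm_run. */
struct mmio_req {
	unsigned long long phys_addr;
	unsigned char data[8];	/* the buffer shared with userspace */
	unsigned int len;
};

/*
 * Mirror of the patch: refuse any access wider than the buffer before
 * touching the request, instead of letting a partial copy overwrite
 * adjacent fields of the structure.
 */
static enum emulation_result mmio_start(struct mmio_req *req,
					unsigned long long paddr,
					unsigned int bytes)
{
	if (bytes > sizeof(req->data))
		return EMULATE_FAIL;

	req->phys_addr = paddr;
	req->len = bytes;
	return EMULATE_DONE;
}
```

An 8-byte access goes through; a 16-byte one (the lq/stq case from the commit message) fails before the request is populated.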
[PATCH v4 2/5] KVM: PPC: Fix vmx/vsx mixup in mmio emulation
The MMIO emulation code for vector instructions is duplicated between
VSX and VMX. When emulating VMX we should check the VMX copy size
instead of the VSX one.

Fixes: acc9eb9305fe ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction ...")
Signed-off-by: Fabiano Rosas
Reviewed-by: Nicholas Piggin
---
 arch/powerpc/kvm/powerpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 50414fb2a5ea..c2bd29e90314 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1499,7 +1499,7 @@ int kvmppc_handle_vmx_load(struct kvm_vcpu *vcpu,
 {
 	enum emulation_result emulated = EMULATE_DONE;
 
-	if (vcpu->arch.mmio_vsx_copy_nums > 2)
+	if (vcpu->arch.mmio_vmx_copy_nums > 2)
 		return EMULATE_FAIL;
 
 	while (vcpu->arch.mmio_vmx_copy_nums) {
@@ -1596,7 +1596,7 @@ int kvmppc_handle_vmx_store(struct kvm_vcpu *vcpu,
 	unsigned int index = rs & KVM_MMIO_REG_MASK;
 	enum emulation_result emulated = EMULATE_DONE;
 
-	if (vcpu->arch.mmio_vsx_copy_nums > 2)
+	if (vcpu->arch.mmio_vmx_copy_nums > 2)
 		return EMULATE_FAIL;
 
 	vcpu->arch.io_gpr = rs;
-- 
2.34.1
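The bug class here is a guard that checks a sibling path's counter. A hypothetical reduction (the field names echo the `vcpu->arch` ones, but the struct itself is invented for illustration):

```c
#include <assert.h>

struct mmio_state {
	int mmio_vsx_copy_nums;	/* VSX emulation progress */
	int mmio_vmx_copy_nums;	/* VMX emulation progress */
};

/*
 * Fixed guard: the VMX path must bound-check its own counter. Before
 * the patch it tested mmio_vsx_copy_nums by mistake, so a stale VSX
 * count could wrongly fail (or wrongly admit) a VMX access.
 */
static int vmx_copy_nums_ok(const struct mmio_state *s)
{
	return s->mmio_vmx_copy_nums <= 2;
}
```

With the fix, whatever value the VSX counter holds no longer influences the VMX bound check.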
[PATCH v4 1/5] KVM: PPC: Book3S HV: Stop returning internal values to userspace
Our kvm_arch_vcpu_ioctl_run currently returns the RESUME_HOST values
to userspace, against the API of the KVM_RUN ioctl which returns 0 on
success.

Signed-off-by: Fabiano Rosas
Reviewed-by: Nicholas Piggin
---
This was noticed while enabling the kvm selftests for powerpc. There's
an assert at the _vcpu_run function when we return a value different
from the expected.
---
 arch/powerpc/kvm/powerpc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2ad0ccd202d5..50414fb2a5ea 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1841,6 +1841,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_ALTIVEC
 out:
 #endif
+
+	/*
+	 * We're already returning to userspace, don't pass the
+	 * RESUME_HOST flags along.
+	 */
+	if (r > 0)
+		r = 0;
+
 	vcpu_put(vcpu);
 	return r;
 }
-- 
2.34.1
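The return-value contract can be sketched like this; the `RESUME_*` values below are placeholders, since the real ones come from the kernel's KVM headers:

```c
#include <assert.h>

/* Placeholder internal resume codes (positive, by convention). */
#define RESUME_GUEST	1
#define RESUME_HOST	2

/*
 * Mirror of the patch: the KVM_RUN ioctl must return 0 on success or a
 * negative errno, so squash any internal positive resume code before
 * handing the value back to userspace. Negative errors pass through.
 */
static int vcpu_run_retval(int r)
{
	return r > 0 ? 0 : r;
}
```

This keeps the kernel-internal RESUME_* protocol invisible to callers of the ioctl, which is what the selftests' `_vcpu_run` assertion expects.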
[PATCH v4 0/5] KVM: PPC: MMIO fixes
Changes from v3:

- Removed all of the low level messages and altered the pr_emerg in
  the emulate_mmio to a more descriptive message.
- Changed the Program interrupt to a Data Storage. There's an ifdef
  needed because this code is shared by HV, PR and booke.

v3: https://lore.kernel.org/r/20220107210012.4091153-1-faro...@linux.ibm.com
v2: https://lore.kernel.org/r/20220106200304.4070825-1-faro...@linux.ibm.com
v1: https://lore.kernel.org/r/20211223211528.3560711-1-faro...@linux.ibm.com

Fabiano Rosas (5):
  KVM: PPC: Book3S HV: Stop returning internal values to userspace
  KVM: PPC: Fix vmx/vsx mixup in mmio emulation
  KVM: PPC: mmio: Reject instructions that access more than mmio.data size
  KVM: PPC: mmio: Return to guest after emulation failure
  KVM: PPC: mmio: Deliver DSI after emulation failure

 arch/powerpc/kvm/emulate_loadstore.c | 10 ++
 arch/powerpc/kvm/powerpc.c           | 46 ++--
 2 files changed, 33 insertions(+), 23 deletions(-)

-- 
2.34.1
[PATCH 2/2] KVM: selftests: Add support for ppc64le
This adds the infrastructure for writing tests for the powerpc
platform (Only Radix MMU for now).

This patch also enables two tests:

- a dummy sample test that creates a guest with one vcpu, issues
  hypercalls and reads/writes test values from memory.

- the kvm_page_table test, although at this point I'm not using it to
  test KVM, but mostly as a way to stress test this code.

$ make -C tools/testing/selftests TARGETS=kvm
$ make -C tools/testing/selftests TARGETS=kvm run_tests

Signed-off-by: Fabiano Rosas
---
 MAINTAINERS                                   |   3 +
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |  14 +-
 .../selftests/kvm/include/kvm_util_base.h     |   7 +
 .../selftests/kvm/include/ppc64le/processor.h |  43 +++
 tools/testing/selftests/kvm/lib/kvm_util.c    |   5 +
 .../testing/selftests/kvm/lib/powerpc/hcall.S |   6 +
 .../selftests/kvm/lib/powerpc/processor.c     | 343 ++
 .../testing/selftests/kvm/lib/powerpc/ucall.c |  67
 .../selftests/kvm/powerpc/sample_test.c       | 144
 10 files changed, 630 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/ppc64le/processor.h
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/hcall.S
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/processor.c
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/ucall.c
 create mode 100644 tools/testing/selftests/kvm/powerpc/sample_test.c

diff --git a/MAINTAINERS b/MAINTAINERS
index a76e7558b151..15c89d33d584 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10537,6 +10537,9 @@ F:	arch/powerpc/include/asm/kvm*
 F:	arch/powerpc/include/uapi/asm/kvm*
 F:	arch/powerpc/kernel/kvm*
 F:	arch/powerpc/kvm/
+F:	tools/testing/selftests/kvm/include/ppc64le/
+F:	tools/testing/selftests/kvm/lib/powerpc/
+F:	tools/testing/selftests/kvm/powerpc/
 
 KERNEL VIRTUAL MACHINE FOR RISC-V (KVM/riscv)
 M:	Anup Patel
diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 8c129961accf..45ab993e2845 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -46,6 +46,7 @@
 /x86_64/xen_vmcall_test
 /x86_64/xss_msr_test
 /x86_64/vmx_pmu_msrs_test
+/powerpc/sample_test
 /access_tracking_perf_test
 /demand_paging_test
 /dirty_log_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 556da71c33b8..5ae27109e9b9 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -17,9 +17,9 @@ KSFT_KHDR_INSTALL := 1
 # LINUX_TOOL_ARCH_INCLUDE is set using ARCH variable.
 #
 # x86_64 targets are named to include x86_64 as a suffix and directories
-# for includes are in x86_64 sub-directory. s390x and aarch64 follow the
-# same convention. "uname -m" doesn't result in the correct mapping for
-# s390x and aarch64.
+# for includes are in x86_64 sub-directory. s390x, aarch64 and ppc64le
+# follow the same convention. "uname -m" doesn't result in the correct
+# mapping for s390x, aarch64 and ppc64le.
 #
 # No change necessary for x86_64
 UNAME_M := $(shell uname -m)
@@ -36,12 +36,17 @@ endif
 ifeq ($(ARCH),riscv)
 	UNAME_M := riscv
 endif
+# Set UNAME_M for ppc64le compile/install to work
+ifeq ($(ARCH),powerpc)
+	UNAME_M := ppc64le
+endif
 
 LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/rbtree.c lib/sparsebit.c lib/test_util.c lib/guest_modes.c lib/perf_test_util.c
 LIBKVM_x86_64 = lib/x86_64/apic.c lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c lib/x86_64/handlers.S
 LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c lib/aarch64/handlers.S lib/aarch64/spinlock.c lib/aarch64/gic.c lib/aarch64/gic_v3.c lib/aarch64/vgic.c
 LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c lib/s390x/diag318_test_handler.c
 LIBKVM_riscv = lib/riscv/processor.c lib/riscv/ucall.c
+LIBKVM_ppc64le = lib/powerpc/processor.c lib/powerpc/ucall.c lib/powerpc/hcall.S
 
 TEST_GEN_PROGS_x86_64 = x86_64/cr4_cpuid_sync_test
 TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features
@@ -133,6 +138,9 @@ TEST_GEN_PROGS_riscv += kvm_page_table_test
 TEST_GEN_PROGS_riscv += set_memory_region_test
 TEST_GEN_PROGS_riscv += kvm_binary_stats_test
 
+TEST_GEN_PROGS_ppc64le += powerpc/sample_test
+TEST_GEN_PROGS_ppc64le += kvm_page_table_test
+
 TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
 LIBKVM += $(LIBKVM_$(UNAME_M))
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 66775de26952..a930d663fe67 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -54,6 +54,7 @@ enum vm_guest_mode {
 	VM_MODE_P36V48_16K,
 	VM_MODE_P36V48_64K,
 	VM_MODE_P36V47_16K,
+	VM_MODE_P51V52_64K,
[PATCH 0/2] KVM: selftests: Add powerpc support
This series adds the initial support for ppc64le Book3s with Radix
MMU. At this time I'm including only the kvm_page_table test and a
dummy test to serve as a sample of what can be done with these tests.

I intend to make a pass over the remaining common tests and add the
ones which could be built for powerpc as well.

patch 1: a prerequisite small fix for the powerpc vcpu_ioctl. It is
the same one I already sent to the ppc mailing list but I'll include
it here to make this a complete series.

patch 2: the actual infrastructure support.

Fabiano Rosas (2):
  KVM: PPC: Book3S HV: Stop returning internal values to userspace
  KVM: selftests: Add support for ppc64le

 MAINTAINERS                                   |   3 +
 arch/powerpc/kvm/powerpc.c                    |   8 +
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |  14 +-
 .../selftests/kvm/include/kvm_util_base.h     |   7 +
 .../selftests/kvm/include/ppc64le/processor.h |  43 +++
 tools/testing/selftests/kvm/lib/kvm_util.c    |   5 +
 .../testing/selftests/kvm/lib/powerpc/hcall.S |   6 +
 .../selftests/kvm/lib/powerpc/processor.c     | 343 ++
 .../testing/selftests/kvm/lib/powerpc/ucall.c |  67
 .../selftests/kvm/powerpc/sample_test.c       | 144
 11 files changed, 638 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/ppc64le/processor.h
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/hcall.S
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/processor.c
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/ucall.c
 create mode 100644 tools/testing/selftests/kvm/powerpc/sample_test.c

-- 
2.34.1
[PATCH 1/2] KVM: PPC: Book3S HV: Stop returning internal values to userspace
Our kvm_arch_vcpu_ioctl_run currently returns the RESUME_HOST values
to userspace, against the API of the KVM_RUN ioctl which returns 0 on
success.

Signed-off-by: Fabiano Rosas
Reviewed-by: Nicholas Piggin
---
This was noticed while enabling the kvm selftests for powerpc. There's
an assert at the _vcpu_run function when we return a value different
from the expected.
---
 arch/powerpc/kvm/powerpc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2ad0ccd202d5..50414fb2a5ea 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1841,6 +1841,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_ALTIVEC
 out:
 #endif
+
+	/*
+	 * We're already returning to userspace, don't pass the
+	 * RESUME_HOST flags along.
+	 */
+	if (r > 0)
+		r = 0;
+
 	vcpu_put(vcpu);
 	return r;
 }
-- 
2.34.1
Re: [PATCH 2/3] KVM: PPC: Book3S HV: Delay setting of kvm ops
Nicholas Piggin writes:

> Excerpts from Fabiano Rosas's message of December 24, 2021 7:19 am:
>> Delay the setting of kvm_hv_ops until after all init code has
>> completed. This avoids leaving the ops still accessible if the init
>> fails.
>>
>> Signed-off-by: Fabiano Rosas
>> ---
>>  arch/powerpc/kvm/book3s_hv.c | 12 +++++++-----
>>  1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 9f4765951733..53400932f5d8 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -6087,9 +6087,6 @@ static int kvmppc_book3s_init_hv(void)
>>  	}
>>  #endif
>>
>> -	kvm_ops_hv.owner = THIS_MODULE;
>> -	kvmppc_hv_ops = &kvm_ops_hv;
>> -
>>  	init_default_hcalls();
>>
>>  	init_vcore_lists();
>> @@ -6105,10 +6102,15 @@ static int kvmppc_book3s_init_hv(void)
>>  	}
>>
>>  	r = kvmppc_uvmem_init();
>> -	if (r < 0)
>> +	if (r < 0) {
>>  		pr_err("KVM-HV: kvmppc_uvmem_init failed %d\n", r);
>> +		return r;
>> +	}
>>
>> -	return r;
>> +	kvm_ops_hv.owner = THIS_MODULE;
>> +	kvmppc_hv_ops = &kvm_ops_hv;
>> +
>> +	return 0;
>>  }
>>
>>  static void kvmppc_book3s_exit_hv(void)
>> --
>> 2.33.1
>>
>
> Also looks okay to me but KVM init has lots of details. IIRC Alexey may
> have run into a related issue with ops being set too early (or was it
> cleared too late?)
>
> Thanks,
> Nick

CC Alexey
Re: [PATCH v3 5/6] KVM: PPC: mmio: Return to guest after emulation failure
Nicholas Piggin writes:

> Excerpts from Alexey Kardashevskiy's message of January 11, 2022 9:51 am:
>>
>> On 1/10/22 18:36, Nicholas Piggin wrote:
>>> Excerpts from Fabiano Rosas's message of January 8, 2022 7:00 am:
>>>> If MMIO emulation fails we don't want to crash the whole guest by
>>>> returning to userspace.
>>>>
>>>> The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
>>>> implementation") added a todo:
>>>>
>>>>   /* XXX Deliver Program interrupt to guest. */
>>>>
>>>> and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
>>>> emulation from priv emulation") added the Program interrupt injection
>>>> but in another file, so I'm assuming it was missed that this block
>>>> needed to be altered.
>>>>
>>>> Signed-off-by: Fabiano Rosas
>>>> Reviewed-by: Alexey Kardashevskiy
>>>> ---
>>>>  arch/powerpc/kvm/powerpc.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>>>> index 6daeea4a7de1..56b0faab7a5f 100644
>>>> --- a/arch/powerpc/kvm/powerpc.c
>>>> +++ b/arch/powerpc/kvm/powerpc.c
>>>> @@ -309,7 +309,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
>>>>  		kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
>>>>  		kvmppc_core_queue_program(vcpu, 0);
>>>>  		pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
>>>> -		r = RESUME_HOST;
>>>> +		r = RESUME_GUEST;
>>>
>>> So at this point can the pr_info just go away?
>>>
>>> I wonder if this shouldn't be a DSI rather than a program check.
>>> DSI with DSISR[37] looks a bit more expected. Not that Linux
>>> probably does much with it but at least it would give a SIGBUS
>>> rather than SIGILL.
>>
>> It does not look like it is more expected to me, it is not about wrong
>> memory attributes, it is the instruction itself which cannot execute.
>
> It's not an illegal instruction though, it can't execute because of the
> nature of the data / address it is operating on. That says d-side to me.
>
> DSISR[37] isn't perfect but if you squint it's not terrible. It's about
> certain instructions that have restrictions operating on other than
> normal cacheable mappings.

I think I agree with Nick on this one. At least the DSISR gives _some_
information while the Program interrupt is maybe too generic. I would
probably be staring at the opcode wondering what is wrong with it.
Re: [PATCH v3 3/6] KVM: PPC: Don't use pr_emerg when mmio emulation fails
Nicholas Piggin writes:

> Excerpts from Fabiano Rosas's message of January 8, 2022 7:00 am:
>> If MMIO emulation fails we deliver a Program interrupt to the
>> guest. This is a normal event for the host, so use pr_info.
>>
>> Signed-off-by: Fabiano Rosas
>> ---
>
> Should it be rate limited to prevent guest spamming host log? In the
> case of informational messages it's always good if it gives the
> administrator some idea of what to do with it. If it's normal
> for the host does it even need a message? If yes, then identifying
> which guest and adding something like "(this might be caused by a bug
> in guest driver)" would set the poor admin's mind at rest.

I'll drop this message then and improve the other one that is emitted
earlier at the emulation code.
Re: [PATCH v3 6/6] KVM: PPC: mmio: Reject instructions that access more than mmio.data size
Nicholas Piggin writes:

> Excerpts from Fabiano Rosas's message of January 8, 2022 7:00 am:
>> The MMIO interface between the kernel and userspace uses a structure
>> that supports a maximum of 8-bytes of data. Instructions that access
>> more than that need to be emulated in parts.
>>
>> We currently don't have generic support for splitting the emulation in
>> parts and each set of instructions needs to be explicitly included.
>>
>> There's already an error message being printed when a load or store
>> exceeds the mmio.data buffer but we don't fail the emulation until
>> later at kvmppc_complete_mmio_load and even then we allow userspace to
>> make a partial copy of the data, which ends up overwriting some fields
>> of the mmio structure.
>>
>> This patch makes the emulation fail earlier at kvmppc_handle_load|store,
>> which will send a Program interrupt to the guest. This is better than
>> allowing the guest to proceed with partial data.
>>
>> Note that this was caught in a somewhat artificial scenario using
>> quadword instructions (lq/stq), there's no account of an actual guest
>> in the wild running instructions that are not properly emulated.
>>
>> (While here, fix the error message to check against 'bytes' and not
>> 'run->mmio.len' which at this point has an old value.)
>
> This looks good to me
>
> Reviewed-by: Nicholas Piggin
>
>>
>> Signed-off-by: Fabiano Rosas
>> Reviewed-by: Alexey Kardashevskiy
>> ---
>>  arch/powerpc/kvm/powerpc.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>> index 56b0faab7a5f..a1643ca988e0 100644
>> --- a/arch/powerpc/kvm/powerpc.c
>> +++ b/arch/powerpc/kvm/powerpc.c
>> @@ -1246,7 +1246,8 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
>>
>>  	if (bytes > sizeof(run->mmio.data)) {
>>  		printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
>> -		       run->mmio.len);
>> +		       bytes);
>
> I wonder though this should probably be ratelimited, informational (or
> at least warning because it's a host message), and perhaps a bit more
> explanatory that it's a guest problem (or at least lack of host support
> for particular guest MMIO sizes).

Yes, I'll ratelimit it and try to make it clear that this is something
that happened inside the guest but it lacks support in KVM. Then
hopefully people will tell us if they ever need that support.
[PATCH v3 3/6] KVM: PPC: Don't use pr_emerg when mmio emulation fails
If MMIO emulation fails we deliver a Program interrupt to the
guest. This is a normal event for the host, so use pr_info.

Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/powerpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 92e552ab5a77..4d7d0d080232 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -308,7 +308,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 
 		kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
 		/* XXX Deliver Program interrupt to guest. */
-		pr_emerg("%s: emulation failed (%08x)\n", __func__, last_inst);
+		pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
 		r = RESUME_HOST;
 		break;
 	}
-- 
2.33.1
[PATCH v3 4/6] KVM: PPC: mmio: Queue interrupt at kvmppc_emulate_mmio
If MMIO emulation fails, we queue a Program interrupt to the
guest. Move that line up into kvmppc_emulate_mmio, which is where we
set RESUME_GUEST/HOST. This allows the removal of the 'advance'
variable.

No functional change, just separation of responsibilities.

Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/emulate_loadstore.c | 8 +-------
 arch/powerpc/kvm/powerpc.c           | 2 +-
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index 48272a9b9c30..4dec920fe4c9 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -73,7 +73,6 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 {
 	u32 inst;
 	enum emulation_result emulated = EMULATE_FAIL;
-	int advance = 1;
 	struct instruction_op op;
 
 	/* this default type might be overwritten by subcategories */
@@ -355,15 +354,10 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	if (emulated == EMULATE_FAIL) {
-		advance = 0;
-		kvmppc_core_queue_program(vcpu, 0);
-	}
-
 	trace_kvm_ppc_instr(inst, kvmppc_get_pc(vcpu), emulated);
 
 	/* Advance past emulated instruction. */
-	if (advance)
+	if (emulated != EMULATE_FAIL)
 		kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
 
 	return emulated;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 4d7d0d080232..6daeea4a7de1 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -307,7 +307,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 		u32 last_inst;
 
 		kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
-		/* XXX Deliver Program interrupt to guest. */
+		kvmppc_core_queue_program(vcpu, 0);
 		pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
 		r = RESUME_HOST;
 		break;
-- 
2.33.1
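The 'advance' removal is behavior-preserving for the PC update. A small equivalence sketch (the enum is trimmed to what the check needs, and the interrupt-queueing side effect is omitted since it moved to the caller):

```c
#include <assert.h>

enum emulation_result { EMULATE_DONE, EMULATE_FAIL, EMULATE_DO_MMIO };

/* Before: a separate flag recorded whether to advance past the insn. */
static unsigned long next_pc_old(enum emulation_result e, unsigned long pc)
{
	int advance = 1;

	if (e == EMULATE_FAIL)
		advance = 0;

	return advance ? pc + 4 : pc;
}

/* After: test the result directly; with the interrupt queued by the
 * caller, the flag carried no extra information. */
static unsigned long next_pc_new(enum emulation_result e, unsigned long pc)
{
	return e != EMULATE_FAIL ? pc + 4 : pc;
}
```

Both versions advance the PC by one 4-byte instruction except on EMULATE_FAIL, which is exactly the "no functional change" claim above.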
[PATCH v3 6/6] KVM: PPC: mmio: Reject instructions that access more than mmio.data size
The MMIO interface between the kernel and userspace uses a structure
that supports a maximum of 8-bytes of data. Instructions that access
more than that need to be emulated in parts.

We currently don't have generic support for splitting the emulation in
parts and each set of instructions needs to be explicitly included.

There's already an error message being printed when a load or store
exceeds the mmio.data buffer but we don't fail the emulation until
later at kvmppc_complete_mmio_load and even then we allow userspace to
make a partial copy of the data, which ends up overwriting some fields
of the mmio structure.

This patch makes the emulation fail earlier at kvmppc_handle_load|store,
which will send a Program interrupt to the guest. This is better than
allowing the guest to proceed with partial data.

Note that this was caught in a somewhat artificial scenario using
quadword instructions (lq/stq), there's no account of an actual guest
in the wild running instructions that are not properly emulated.

(While here, fix the error message to check against 'bytes' and not
'run->mmio.len' which at this point has an old value.)

Signed-off-by: Fabiano Rosas
Reviewed-by: Alexey Kardashevskiy
---
 arch/powerpc/kvm/powerpc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 56b0faab7a5f..a1643ca988e0 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1246,7 +1246,8 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
 
 	if (bytes > sizeof(run->mmio.data)) {
 		printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-		       run->mmio.len);
+		       bytes);
+		return EMULATE_FAIL;
 	}
 
 	run->mmio.phys_addr = vcpu->arch.paddr_accessed;
@@ -1335,7 +1336,8 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
 
 	if (bytes > sizeof(run->mmio.data)) {
 		printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-		       run->mmio.len);
+		       bytes);
+		return EMULATE_FAIL;
 	}
 
 	run->mmio.phys_addr = vcpu->arch.paddr_accessed;
-- 
2.33.1
[PATCH v3 5/6] KVM: PPC: mmio: Return to guest after emulation failure
If MMIO emulation fails we don't want to crash the whole guest by
returning to userspace.

The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
implementation") added a todo:

  /* XXX Deliver Program interrupt to guest. */

and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
emulation from priv emulation") added the Program interrupt injection
but in another file, so I'm assuming it was missed that this block
needed to be altered.

Signed-off-by: Fabiano Rosas
Reviewed-by: Alexey Kardashevskiy
---
 arch/powerpc/kvm/powerpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6daeea4a7de1..56b0faab7a5f 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -309,7 +309,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 		kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
 		kvmppc_core_queue_program(vcpu, 0);
 		pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
-		r = RESUME_HOST;
+		r = RESUME_GUEST;
 		break;
 	}
 	default:
-- 
2.33.1
[PATCH v3 1/6] KVM: PPC: Book3S HV: Stop returning internal values to userspace
Our kvm_arch_vcpu_ioctl_run currently returns the RESUME_HOST values
to userspace, against the API of the KVM_RUN ioctl which returns 0 on
success.

Signed-off-by: Fabiano Rosas
Reviewed-by: Nicholas Piggin
---
This was noticed while enabling the kvm selftests for powerpc. There's
an assert at the _vcpu_run function when we return a value different
from the expected.
---
 arch/powerpc/kvm/powerpc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a72920f4f221..1e130bb087c4 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1849,6 +1849,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_ALTIVEC
 out:
 #endif
+
+	/*
+	 * We're already returning to userspace, don't pass the
+	 * RESUME_HOST flags along.
+	 */
+	if (r > 0)
+		r = 0;
+
 	vcpu_put(vcpu);
 	return r;
 }
-- 
2.33.1
[PATCH v3 2/6] KVM: PPC: Fix vmx/vsx mixup in mmio emulation
The MMIO emulation code for vector instructions is duplicated between
VSX and VMX. When emulating VMX we should check the VMX copy size
instead of the VSX one.

Fixes: acc9eb9305fe ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction ...")
Signed-off-by: Fabiano Rosas
Reviewed-by: Nicholas Piggin
---
 arch/powerpc/kvm/powerpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 1e130bb087c4..92e552ab5a77 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1507,7 +1507,7 @@ int kvmppc_handle_vmx_load(struct kvm_vcpu *vcpu,
 {
 	enum emulation_result emulated = EMULATE_DONE;
 
-	if (vcpu->arch.mmio_vsx_copy_nums > 2)
+	if (vcpu->arch.mmio_vmx_copy_nums > 2)
 		return EMULATE_FAIL;
 
 	while (vcpu->arch.mmio_vmx_copy_nums) {
@@ -1604,7 +1604,7 @@ int kvmppc_handle_vmx_store(struct kvm_vcpu *vcpu,
 	unsigned int index = rs & KVM_MMIO_REG_MASK;
 	enum emulation_result emulated = EMULATE_DONE;
 
-	if (vcpu->arch.mmio_vsx_copy_nums > 2)
+	if (vcpu->arch.mmio_vmx_copy_nums > 2)
 		return EMULATE_FAIL;
 
 	vcpu->arch.io_gpr = rs;
-- 
2.33.1
[PATCH v3 0/6] KVM: PPC: MMIO fixes
This v3 addresses review comments:

Merge patches 3 and 7, now patch 6, which returns EMULATE_FAIL and now
also alters the error message.

Remove the now unnecessary 'advance' variable from emulate_loadstore
in patch 4.

v2: https://lore.kernel.org/r/20220106200304.4070825-1-faro...@linux.ibm.com
v1: https://lore.kernel.org/r/20211223211528.3560711-1-faro...@linux.ibm.com

Fabiano Rosas (6):
  KVM: PPC: Book3S HV: Stop returning internal values to userspace
  KVM: PPC: Fix vmx/vsx mixup in mmio emulation
  KVM: PPC: Don't use pr_emerg when mmio emulation fails
  KVM: PPC: mmio: Queue interrupt at kvmppc_emulate_mmio
  KVM: PPC: mmio: Return to guest after emulation failure
  KVM: PPC: mmio: Reject instructions that access more than mmio.data size

 arch/powerpc/kvm/emulate_loadstore.c |  8 +---
 arch/powerpc/kvm/powerpc.c           | 24 +---
 2 files changed, 18 insertions(+), 14 deletions(-)

--
2.33.1
Re: [PATCH v2 6/7] KVM: PPC: mmio: Return to guest after emulation failure
Alexey Kardashevskiy writes:

> On 07/01/2022 07:03, Fabiano Rosas wrote:
>> If MMIO emulation fails we don't want to crash the whole guest by
>> returning to userspace.
>>
>> The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
>> implementation") added a todo:
>>
>>    /* XXX Deliver Program interrupt to guest. */
>>
>> and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
>> emulation from priv emulation") added the Program interrupt injection
>> but in another file, so I'm assuming it was missed that this block
>> needed to be altered.
>>
>> Signed-off-by: Fabiano Rosas
>
> Looks right.
> Reviewed-by: Alexey Kardashevskiy
>
> but this means if I want to keep debugging those kvm selftests in
> comfort, I'll have to have some exception handlers in the vm as
> otherwise the failing $pc is lost after this change :)

Yes! But that will be a problem for any test. These kinds of issues
are why I wanted a trial period before sending the test infrastructure
upstream.

Maybe we don't need exception handlers, but just a way to force the
test to crash if it tries to fetch from 0x700.
[PATCH v2 7/7] KVM: PPC: mmio: Reject instructions that access more than mmio.data size
The MMIO interface between the kernel and userspace uses a structure
that supports a maximum of 8 bytes of data. Instructions that access
more than that need to be emulated in parts.

We currently don't have generic support for splitting the emulation
in parts and each set of instructions needs to be explicitly included.

There's already an error message being printed when a load or store
exceeds the mmio.data buffer, but we don't fail the emulation until
later at kvmppc_complete_mmio_load and even then we allow userspace
to make a partial copy of the data, which ends up overwriting some
fields of the mmio structure.

This patch makes the emulation fail earlier at
kvmppc_handle_load|store, which will send a Program interrupt to the
guest. This is better than allowing the guest to proceed with partial
data.

Note that this was caught in a somewhat artificial scenario using
quadword instructions (lq/stq); there's no account of an actual guest
in the wild running instructions that are not properly emulated.

Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/powerpc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 50e08635e18a..a1643ca988e0 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1247,6 +1247,7 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
 	if (bytes > sizeof(run->mmio.data)) {
 		printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
 		       bytes);
+		return EMULATE_FAIL;
 	}

 	run->mmio.phys_addr = vcpu->arch.paddr_accessed;
@@ -1336,6 +1337,7 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
 	if (bytes > sizeof(run->mmio.data)) {
 		printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
 		       bytes);
+		return EMULATE_FAIL;
 	}

 	run->mmio.phys_addr = vcpu->arch.paddr_accessed;
--
2.33.1
[PATCH v2 6/7] KVM: PPC: mmio: Return to guest after emulation failure
If MMIO emulation fails we don't want to crash the whole guest by
returning to userspace.

The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
implementation") added a todo:

  /* XXX Deliver Program interrupt to guest. */

and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
emulation from priv emulation") added the Program interrupt injection
but in another file, so I'm assuming it was missed that this block
needed to be altered.

Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/powerpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a2e78229d645..50e08635e18a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -309,7 +309,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 		kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
 		kvmppc_core_queue_program(vcpu, 0);
 		pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
-		r = RESUME_HOST;
+		r = RESUME_GUEST;
 		break;
 	}
 	default:
--
2.33.1
[PATCH v2 5/7] KVM: PPC: mmio: Queue interrupt at kvmppc_emulate_mmio
If MMIO emulation fails, we queue a Program interrupt to the
guest. Move that line up into kvmppc_emulate_mmio, which is where we
set RESUME_GUEST/HOST.

No functional change, just separation of responsibilities.

Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/emulate_loadstore.c | 4 +---
 arch/powerpc/kvm/powerpc.c           | 2 +-
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index 48272a9b9c30..ef50e8cfd988 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -355,10 +355,8 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 		}
 	}

-	if (emulated == EMULATE_FAIL) {
+	if (emulated == EMULATE_FAIL)
 		advance = 0;
-		kvmppc_core_queue_program(vcpu, 0);
-	}

 	trace_kvm_ppc_instr(inst, kvmppc_get_pc(vcpu), emulated);

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 3fc8057db4b4..a2e78229d645 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -307,7 +307,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 		u32 last_inst;

 		kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
-		/* XXX Deliver Program interrupt to guest. */
+		kvmppc_core_queue_program(vcpu, 0);
 		pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
 		r = RESUME_HOST;
 		break;
--
2.33.1
[PATCH v2 4/7] KVM: PPC: Don't use pr_emerg when mmio emulation fails
If MMIO emulation fails we deliver a Program interrupt to the
guest. This is a normal event for the host, so use pr_info.

Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/powerpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 0b0818d032e1..3fc8057db4b4 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -308,7 +308,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 		kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
 		/* XXX Deliver Program interrupt to guest. */
-		pr_emerg("%s: emulation failed (%08x)\n", __func__, last_inst);
+		pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
 		r = RESUME_HOST;
 		break;
 	}
--
2.33.1
[PATCH v2 3/7] KVM: PPC: Fix mmio length message
We check against 'bytes' but print 'run->mmio.len' which at that point
has an old value.

e.g. 16-byte load:

before:
__kvmppc_handle_load: bad MMIO length: 8

now:
__kvmppc_handle_load: bad MMIO length: 16

Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/powerpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 92e552ab5a77..0b0818d032e1 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1246,7 +1246,7 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
 	if (bytes > sizeof(run->mmio.data)) {
 		printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-		       run->mmio.len);
+		       bytes);
 	}

 	run->mmio.phys_addr = vcpu->arch.paddr_accessed;
@@ -1335,7 +1335,7 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
 	if (bytes > sizeof(run->mmio.data)) {
 		printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-		       run->mmio.len);
+		       bytes);
 	}

 	run->mmio.phys_addr = vcpu->arch.paddr_accessed;
--
2.33.1
[PATCH v2 2/7] KVM: PPC: Fix vmx/vsx mixup in mmio emulation
The MMIO emulation code for vector instructions is duplicated between
VSX and VMX. When emulating VMX we should check the VMX copy size
instead of the VSX one.

Fixes: acc9eb9305fe ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction ...")
Signed-off-by: Fabiano Rosas
Reviewed-by: Nicholas Piggin
---
 arch/powerpc/kvm/powerpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 1e130bb087c4..92e552ab5a77 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1507,7 +1507,7 @@ int kvmppc_handle_vmx_load(struct kvm_vcpu *vcpu,
 {
 	enum emulation_result emulated = EMULATE_DONE;

-	if (vcpu->arch.mmio_vsx_copy_nums > 2)
+	if (vcpu->arch.mmio_vmx_copy_nums > 2)
 		return EMULATE_FAIL;

 	while (vcpu->arch.mmio_vmx_copy_nums) {
@@ -1604,7 +1604,7 @@ int kvmppc_handle_vmx_store(struct kvm_vcpu *vcpu,
 	unsigned int index = rs & KVM_MMIO_REG_MASK;
 	enum emulation_result emulated = EMULATE_DONE;

-	if (vcpu->arch.mmio_vsx_copy_nums > 2)
+	if (vcpu->arch.mmio_vmx_copy_nums > 2)
 		return EMULATE_FAIL;

 	vcpu->arch.io_gpr = rs;
--
2.33.1
[PATCH v2 1/7] KVM: PPC: Book3S HV: Stop returning internal values to userspace
Our kvm_arch_vcpu_ioctl_run currently returns the RESUME_HOST values
to userspace, against the API of the KVM_RUN ioctl which returns 0 on
success.

Signed-off-by: Fabiano Rosas
Reviewed-by: Nicholas Piggin
---
This was noticed while enabling the kvm selftests for powerpc. There's
an assert at the _vcpu_run function when we return a value different
from the expected.
---
 arch/powerpc/kvm/powerpc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a72920f4f221..1e130bb087c4 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1849,6 +1849,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_ALTIVEC
 out:
 #endif
+
+	/*
+	 * We're already returning to userspace, don't pass the
+	 * RESUME_HOST flags along.
+	 */
+	if (r > 0)
+		r = 0;
+
 	vcpu_put(vcpu);
 	return r;
 }
--
2.33.1
[PATCH v2 0/7] KVM: PPC: MMIO fixes
The change from v1 is that I have altered the MMIO size check to fail
the emulation in case the size exceeds 8 bytes. That brought some
cleanups and another fix along with it, we were returning to userspace
in case of failure instead of the guest.

This has now become an MMIO series, but since the first commit has
been reviewed already, I'm leaving it here.

v1: https://lore.kernel.org/r/20211223211528.3560711-1-faro...@linux.ibm.com

Fabiano Rosas (7):
  KVM: PPC: Book3S HV: Stop returning internal values to userspace
  KVM: PPC: Fix vmx/vsx mixup in mmio emulation
  KVM: PPC: Fix mmio length message
  KVM: PPC: Don't use pr_emerg when mmio emulation fails
  KVM: PPC: mmio: Queue interrupt at kvmppc_emulate_mmio
  KVM: PPC: mmio: Return to guest after emulation failure
  KVM: PPC: mmio: Reject instructions that access more than mmio.data size

 arch/powerpc/kvm/emulate_loadstore.c |  4 +---
 arch/powerpc/kvm/powerpc.c           | 24 +---
 2 files changed, 18 insertions(+), 10 deletions(-)

--
2.33.1
Re: [PATCH 3/3] KVM: PPC: Fix mmio length message
Nicholas Piggin writes:

> Excerpts from Fabiano Rosas's message of December 24, 2021 7:15 am:
>> We check against 'bytes' but print 'run->mmio.len' which at that point
>> has an old value.
>>
>> e.g. 16-byte load:
>>
>> before:
>> __kvmppc_handle_load: bad MMIO length: 8
>>
>> now:
>> __kvmppc_handle_load: bad MMIO length: 16
>>
>> Signed-off-by: Fabiano Rosas
>
> This patch is fine, but in the case of overflow we continue anyway here.
> Can that overwrite some other memory in the kvm_run struct?

I tested this and QEMU will indeed overwrite the subsequent fields of
kvm_run. A `lq` on this data:

mmio_test_data:
	.long 0xdeadbeef
	.long 0x8badf00d
	.long 0x1337c0de
	.long 0x01abcdef

produces:

__kvmppc_handle_load: bad MMIO length: 16
kvmppc_complete_mmio_load data: 0x8badf00ddeadbeef
bad MMIO length: 322420958  <-- mmio.len got nuked

But then we return from kvmppc_complete_mmio_load without writing to
the registers.

> This is familiar, maybe something Alexey has noticed in the past too?
> What was the consensus on fixing it? (at least it should have a comment
> if it's not a problem IMO)

My plan was to just add quadword support. And whatever else might be
missing. But I got sidetracked with how to test this so I'm just now
coming back to it. Perhaps a more immediate fix is needed before that?
We could block loads and stores larger than 8 bytes earlier at
kvmppc_emulate_loadstore for instance.
Re: [PATCH 2/3] KVM: PPC: Fix vmx/vsx mixup in mmio emulation
Nicholas Piggin writes:

> Excerpts from Fabiano Rosas's message of December 24, 2021 7:15 am:
>> The MMIO emulation code for vector instructions is duplicated between
>> VSX and VMX. When emulating VMX we should check the VMX copy size
>> instead of the VSX one.
>>
>> Fixes: acc9eb9305fe ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction ...")
>> Signed-off-by: Fabiano Rosas
>
> Good catch. AFAIKS handle_vmx_store needs the same treatment? If you
> agree then

Half the bug now, half the bug next year... haha

I'll send a v2.

aside:
All this duplication is kind of annoying. I'm looking into what it
would take to have quadword instruction emulation here as well (Alexey
caught a bug with syzkaller) and the code would be really similar. I
see that x86 has a more generic implementation that maybe we could
take advantage of. See "f78146b0f923 (KVM: Fix page-crossing MMIO)"
[PATCH 2/3] KVM: PPC: Book3S HV: Delay setting of kvm ops
Delay the setting of kvm_hv_ops until after all init code has
completed. This avoids leaving the ops still accessible if the init
fails.

Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/book3s_hv.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 9f4765951733..53400932f5d8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6087,9 +6087,6 @@ static int kvmppc_book3s_init_hv(void)
 	}
 #endif

-	kvm_ops_hv.owner = THIS_MODULE;
-	kvmppc_hv_ops = &kvm_ops_hv;
-
 	init_default_hcalls();

 	init_vcore_lists();
@@ -6105,10 +6102,15 @@ static int kvmppc_book3s_init_hv(void)
 	}

 	r = kvmppc_uvmem_init();
-	if (r < 0)
+	if (r < 0) {
 		pr_err("KVM-HV: kvmppc_uvmem_init failed %d\n", r);
+		return r;
+	}

-	return r;
+	kvm_ops_hv.owner = THIS_MODULE;
+	kvmppc_hv_ops = &kvm_ops_hv;
+
+	return 0;
 }

 static void kvmppc_book3s_exit_hv(void)
--
2.33.1
[PATCH 1/3] KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
The return of the function is being shadowed by the call to
kvmppc_uvmem_init.

Fixes: ca9f4942670c ("KVM: PPC: Book3S HV: Support for running secure guests")
Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/book3s_hv.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7b74fc0a986b..9f4765951733 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6098,8 +6098,11 @@ static int kvmppc_book3s_init_hv(void)
 	if (r)
 		return r;

-	if (kvmppc_radix_possible())
+	if (kvmppc_radix_possible()) {
 		r = kvmppc_radix_init();
+		if (r)
+			return r;
+	}

 	r = kvmppc_uvmem_init();
 	if (r < 0)
--
2.33.1
[PATCH 3/3] KVM: PPC: Book3S HV: Free allocated memory if module init fails
The module's exit function is not called when the init fails, we need
to do cleanup before returning.

Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/book3s_hv.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 53400932f5d8..2d79298e7ca4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6065,7 +6065,7 @@ static int kvmppc_book3s_init_hv(void)

 	r = kvm_init_subcore_bitmap();
 	if (r)
-		return r;
+		goto err;

 	/*
 	 * We need a way of accessing the XICS interrupt controller,
@@ -6080,7 +6080,8 @@ static int kvmppc_book3s_init_hv(void)
 		np = of_find_compatible_node(NULL, NULL, "ibm,opal-intc");
 		if (!np) {
 			pr_err("KVM-HV: Cannot determine method for accessing XICS\n");
-			return -ENODEV;
+			r = -ENODEV;
+			goto err;
 		}
 		/* presence of intc confirmed - node can be dropped again */
 		of_node_put(np);
@@ -6093,12 +6094,12 @@ static int kvmppc_book3s_init_hv(void)

 	r = kvmppc_mmu_hv_init();
 	if (r)
-		return r;
+		goto err;

 	if (kvmppc_radix_possible()) {
 		r = kvmppc_radix_init();
 		if (r)
-			return r;
+			goto err;
 	}

 	r = kvmppc_uvmem_init();
@@ -6111,6 +6112,12 @@ static int kvmppc_book3s_init_hv(void)
 	kvmppc_hv_ops = &kvm_ops_hv;

 	return 0;
+
+err:
+	kvmhv_nested_exit();
+	kvmppc_radix_exit();
+
+	return r;
 }

 static void kvmppc_book3s_exit_hv(void)
--
2.33.1
[PATCH 0/3] KVM: PPC: KVM module exit fixes
This is a resend of the module cleanup fixes but this time without the
HV/PR merge.

Fabiano Rosas (3):
  KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
  KVM: PPC: Book3S HV: Delay setting of kvm ops
  KVM: PPC: Book3S HV: Free allocated memory if module init fails

 arch/powerpc/kvm/book3s_hv.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

--
2.33.1
[PATCH 3/3] KVM: PPC: Fix mmio length message
We check against 'bytes' but print 'run->mmio.len' which at that point
has an old value.

e.g. 16-byte load:

before:
__kvmppc_handle_load: bad MMIO length: 8

now:
__kvmppc_handle_load: bad MMIO length: 16

Signed-off-by: Fabiano Rosas
---
 arch/powerpc/kvm/powerpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 793d42bd6c8f..7823207eb8f1 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1246,7 +1246,7 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
 	if (bytes > sizeof(run->mmio.data)) {
 		printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-		       run->mmio.len);
+		       bytes);
 	}

 	run->mmio.phys_addr = vcpu->arch.paddr_accessed;
@@ -1335,7 +1335,7 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
 	if (bytes > sizeof(run->mmio.data)) {
 		printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-		       run->mmio.len);
+		       bytes);
 	}

 	run->mmio.phys_addr = vcpu->arch.paddr_accessed;
--
2.33.1