[PATCH v1 03/10] powerpc/bpf/32: No need to zeroise r4 when not doing tail call
r4 is cleared at function entry and used as tail call count.

But when the function does not perform tail call, r4 is ignored,
so no need to clear it. Replace it by a NOP in that case.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/net/bpf_jit_comp32.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index a379b0ce19ff..4e6caee9c98a 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -114,7 +114,10 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 	int i;
 
 	/* Initialize tail_call_cnt, to be skipped if we do tail calls. */
-	EMIT(PPC_RAW_LI(_R4, 0));
+	if (ctx->seen & SEEN_TAILCALL)
+		EMIT(PPC_RAW_LI(_R4, 0));
+	else
+		EMIT(PPC_RAW_NOP());
 
 #define BPF_TAILCALL_PROLOGUE_SIZE	4
 
-- 
2.38.1
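The choice made by this patch can be sketched in standalone C. This is a user-space illustration only, not the kernel code: `SEEN_TAILCALL_FLAG` and `prologue_first_insn()` are names assumed for the example, and the two constants are the standard 32-bit PowerPC encodings of `li r4,0` (`addi r4,0,0`) and `nop` (`ori 0,0,0`). The NOP presumably keeps the first prologue slot in place so that the prologue length does not depend on the flag.

```c
#include <stdint.h>

/* Hypothetical flag standing in for the kernel's SEEN_TAILCALL bit. */
#define SEEN_TAILCALL_FLAG 0x1u

/* Standard 32-bit PowerPC instruction encodings. */
#define INSN_LI_R4_0 0x38800000u /* li r4,0 (addi r4,0,0) */
#define INSN_NOP     0x60000000u /* nop     (ori 0,0,0)   */

/*
 * Return the first prologue instruction: r4 is cleared only when the
 * program performs tail calls, otherwise a NOP fills the same slot.
 */
static uint32_t prologue_first_insn(unsigned int seen)
{
	return (seen & SEEN_TAILCALL_FLAG) ? INSN_LI_R4_0 : INSN_NOP;
}
```

Either way exactly one 4-byte instruction is produced, which matches the `BPF_TAILCALL_PROLOGUE_SIZE` of 4 in the hunk.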
[PATCH v1 07/10] powerpc/bpf: Only pad length-variable code at initial pass
Now that two real additional passes are performed in case of extra pass
requested by BPF core, padding is not needed anymore except during
initial pass done before memory allocation to count maximum possible
program size.

So, only do the padding when 'image' is NULL.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/net/bpf_jit_comp32.c |  8 +++-----
 arch/powerpc/net/bpf_jit_comp64.c | 12 +++++++-----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 7c129fe810f5..6c45d953d4e8 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -206,9 +206,6 @@ int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func
 
 	if (image && rel < 0x2000000 && rel >= -0x2000000) {
 		PPC_BL(func);
-		EMIT(PPC_RAW_NOP());
-		EMIT(PPC_RAW_NOP());
-		EMIT(PPC_RAW_NOP());
 	} else {
 		/* Load function address into r0 */
 		EMIT(PPC_RAW_LIS(_R0, IMM_H(func)));
@@ -973,8 +970,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 			PPC_LI32(dst_reg_h, (u32)insn[i + 1].imm);
 			PPC_LI32(dst_reg, (u32)insn[i].imm);
 			/* padding to allow full 4 instructions for later patching */
-			for (j = ctx->idx - tmp_idx; j < 4; j++)
-				EMIT(PPC_RAW_NOP());
+			if (!image)
+				for (j = ctx->idx - tmp_idx; j < 4; j++)
+					EMIT(PPC_RAW_NOP());
 			/* Adjust for two bpf instructions */
 			addrs[++i] = ctx->idx * 4;
 			break;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 29ee306d6302..1c2931b4bc15 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -240,13 +240,14 @@ int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func
 	 * load the callee's address, but this may optimize the number of
 	 * instructions required based on the nature of the address.
 	 *
-	 * Since we don't want the number of instructions emitted to change,
+	 * Since we don't want the number of instructions emitted to increase,
 	 * we pad the optimized PPC_LI64() call with NOPs to guarantee that
 	 * we always have a five-instruction sequence, which is the maximum
 	 * that PPC_LI64() can emit.
 	 */
-	for (i = ctx->idx - ctx_idx; i < 5; i++)
-		EMIT(PPC_RAW_NOP());
+	if (!image)
+		for (i = ctx->idx - ctx_idx; i < 5; i++)
+			EMIT(PPC_RAW_NOP());
 
 	EMIT(PPC_RAW_MTCTR(_R12));
 	EMIT(PPC_RAW_BCTRL());
@@ -938,8 +939,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 			tmp_idx = ctx->idx;
 			PPC_LI64(dst_reg, imm64);
 			/* padding to allow full 5 instructions for later patching */
-			for (j = ctx->idx - tmp_idx; j < 5; j++)
-				EMIT(PPC_RAW_NOP());
+			if (!image)
+				for (j = ctx->idx - tmp_idx; j < 5; j++)
+					EMIT(PPC_RAW_NOP());
 			/* Adjust for two bpf instructions */
 			addrs[++i] = ctx->idx * 4;
 			break;
-- 
2.38.1
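The effect of gating the padding on 'image' can be illustrated with a small standalone model. It is a sketch under stated assumptions, not the JIT itself: `struct ctx`, `emit()` and `emit_load_imm()` are hypothetical names, and the instruction words are placeholders. The sizing pass runs with a NULL image and reserves the worst case; a pass with a real buffer writes only what is needed.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LOAD_INSNS  5           /* maximum the patch cites for PPC_LI64() */
#define PLACEHOLDER_NOP 0x60000000u /* PowerPC nop */
#define PLACEHOLDER_OP  0x38000000u /* stands in for a real load instruction */

struct ctx { unsigned int idx; };   /* instruction index, hypothetical */

/* Write one instruction if an image exists; always advance the index. */
static void emit(uint32_t *image, struct ctx *c, uint32_t insn)
{
	if (image)
		image[c->idx] = insn;
	c->idx++;
}

/*
 * Emit an immediate load that needs 'needed' instructions (1..5).
 * Padding up to the maximum happens only in the sizing pass (image == NULL).
 */
static void emit_load_imm(uint32_t *image, struct ctx *c, unsigned int needed)
{
	unsigned int start = c->idx, i;

	for (i = 0; i < needed; i++)
		emit(image, c, PLACEHOLDER_OP);

	if (!image)
		for (i = c->idx - start; i < MAX_LOAD_INSNS; i++)
			emit(image, c, PLACEHOLDER_NOP);
}
```

With `needed` set to 2, the sizing pass advances the index by 5 while a pass with a buffer advances it by 2, so the size computed up front stays an upper bound on what later passes write.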
[PATCH v1 02/10] powerpc: Remove __kernel_text_address() in show_instructions()
That test was introduced in 2006 by
commit 00ae36de49cc ("[POWERPC] Better check in show_instructions").
At that time, there were no BPF progs.

As seen in message of commit 89d21e259a94 ("powerpc/bpf/32: Fix Oops
on tail call tests"), when a page fault occurs in test_bpf.ko for
instance, the code is dumped as XXXXXXXXs. Although
__kernel_text_address() checks is_bpf_text_address(), it seems it is
not enough.

Today, show_instructions() uses get_kernel_nofault() to read the code,
so there is no real need for additional verifications.

ARM64 and x86 don't do any additional check before dumping
instructions. Do the same and remove __kernel_text_address()
in show_instructions().

Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/process.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index e3e1feaa536a..c4e9f090ad22 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1373,8 +1373,7 @@ static void show_instructions(struct pt_regs *regs)
 	for (i = 0; i < NR_INSN_TO_PRINT; i++) {
 		int instr;
 
-		if (!__kernel_text_address(pc) ||
-		    get_kernel_nofault(instr, (const void *)pc)) {
+		if (get_kernel_nofault(instr, (const void *)pc)) {
 			pr_cont("XXXXXXXX ");
 		} else {
 			if (nip == pc)
-- 
2.38.1
[PATCH v1 05/10] powerpc/bpf/32: BPF prog is never called with more than one arg
BPF progs are never called with more than one argument, plus the
tail call count as a second argument when needed.

So, no need to retrieve 9th and 10th argument (5th 64 bits argument)
from the stack in prologue.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/net/bpf_jit_comp32.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 7f54d37bede6..7c129fe810f5 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -159,12 +159,6 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 		if (bpf_is_seen_register(ctx, i))
 			EMIT(PPC_RAW_STW(i, _R1, bpf_jit_stack_offsetof(ctx, i)));
 
-	/* If needed retrieve arguments 9 and 10, ie 5th 64 bits arg.*/
-	if (bpf_is_seen_register(ctx, bpf_to_ppc(BPF_REG_5))) {
-		EMIT(PPC_RAW_LWZ(bpf_to_ppc(BPF_REG_5) - 1, _R1, BPF_PPC_STACKFRAME(ctx)) + 8);
-		EMIT(PPC_RAW_LWZ(bpf_to_ppc(BPF_REG_5), _R1, BPF_PPC_STACKFRAME(ctx)) + 12);
-	}
-
 	/* Setup frame pointer to point to the bpf stack area */
 	if (bpf_is_seen_register(ctx, bpf_to_ppc(BPF_REG_FP))) {
 		EMIT(PPC_RAW_LI(bpf_to_ppc(BPF_REG_FP) - 1, 0));
-- 
2.38.1
[PATCH v1 01/10] powerpc/bpf/32: Fix Oops on tail call tests
test_bpf tail call tests end up as:

  test_bpf: #0 Tail call leaf jited:1 85 PASS
  test_bpf: #1 Tail call 2 jited:1 111 PASS
  test_bpf: #2 Tail call 3 jited:1 145 PASS
  test_bpf: #3 Tail call 4 jited:1 170 PASS
  test_bpf: #4 Tail call load/store leaf jited:1 190 PASS
  test_bpf: #5 Tail call load/store jited:1
  BUG: Unable to handle kernel data access on write at 0xf1b4e000
  Faulting instruction address: 0xbe86b710
  Oops: Kernel access of bad area, sig: 11 [#1]
  BE PAGE_SIZE=4K MMU=Hash PowerMac
  Modules linked in: test_bpf(+)
  CPU: 0 PID: 97 Comm: insmod Not tainted 6.1.0-rc4+ #195
  Hardware name: PowerMac3,1 750CL 0x87210 PowerMac
  NIP: be86b710 LR: be857e88 CTR: be86b704
  REGS: f1b4df20 TRAP: 0300 Not tainted (6.1.0-rc4+)
  MSR: 9032 CR: 28008242 XER:
  DAR: f1b4e000 DSISR: 4200
  GPR00: 0001 f1b4dfe0 c11d2280 0002
  GPR08: f1b4e000 be86b704 f1b4e000 100d816a f244 fe73baa8
  GPR16: f2458000 c1941ae4 f1fe2248 0045 c0de f2458030
  GPR24: 03e8 000f f2458000 f1b4dc90 3e584b46 f24466a0 c1941a00
  NIP [be86b710] 0xbe86b710
  LR [be857e88] __run_one+0xec/0x264 [test_bpf]
  Call Trace:
  [f1b4dfe0] [0002] 0x2 (unreliable)
  Instruction dump:
  ---[ end trace ]---

This is an attempt to write above the stack. The problem is encountered
with tests added by commit 38608ee7b690 ("bpf, tests: Add load store
test case for tail call")

This happens because tail call is done to a BPF prog with a different
stack_depth. At the time being, the stack is kept as is when the caller
tail calls its callee. But at exit, the callee restores the stack based
on its own properties. Therefore here, at each run, r1 is erroneously
increased by 32 - 16 = 16 bytes.

This was done that way in order to pass the tail call count from caller
to callee through the stack. As powerpc32 doesn't have a red zone in
the stack, it was necessary to maintain the stack as is for the tail
call. But it was not anticipated that the BPF frame size could be
different.

Let's take a new approach. Use register r4 to carry the tail call count
during the tail call, and save it into the stack at function entry if
required. This means the input parameter must be in r3, which is more
correct as it is a 32 bits parameter, so that tail calls better match
normal BPF function entry, the down side being that we move that input
parameter back and forth between r3 and r4. That can be optimised later.

Doing that also has the advantage of maximising the common parts between
tail calls and a normal function exit.

With the fix, tail call tests are now successful:

  test_bpf: #0 Tail call leaf jited:1 53 PASS
  test_bpf: #1 Tail call 2 jited:1 115 PASS
  test_bpf: #2 Tail call 3 jited:1 154 PASS
  test_bpf: #3 Tail call 4 jited:1 165 PASS
  test_bpf: #4 Tail call load/store leaf jited:1 101 PASS
  test_bpf: #5 Tail call load/store jited:1 141 PASS
  test_bpf: #6 Tail call error path, max count reached jited:1 994 PASS
  test_bpf: #7 Tail call count preserved across function calls jited:1 140975 PASS
  test_bpf: #8 Tail call error path, NULL target jited:1 110 PASS
  test_bpf: #9 Tail call error path, index out of range jited:1 69 PASS
  test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

Suggested-by: Naveen N. Rao
Fixes: 51c66ad849a7 ("powerpc/bpf: Implement extended BPF on PPC32")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy
Tested-by: Naveen N. Rao
Link: https://lore.kernel.org/r/757acccb7fbfc78efa42dcf3c974b46678198905.1669278887.git.christophe.le...@csgroup.eu
---
 arch/powerpc/net/bpf_jit_comp32.c | 52 +++++++++++++++++++++-------------------------------
 1 file changed, 21 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 43f1c76d48ce..a379b0ce19ff 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -113,23 +113,19 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 {
 	int i;
 
-	/* First arg comes in as a 32 bits pointer. */
-	EMIT(PPC_RAW_MR(bpf_to_ppc(BPF_REG_1), _R3));
-	EMIT(PPC_RAW_LI(bpf_to_ppc(BPF_REG_1) - 1, 0));
+	/* Initialize tail_call_cnt, to be skipped if we do tail calls. */
+	EMIT(PPC_RAW_LI(_R4, 0));
+
+#define BPF_TAILCALL_PROLOGUE_SIZE	4
+
 	EMIT(PPC_RAW_STWU(_R1, _R1, -BPF_PPC_STACKFRAME(ctx)));
 
-	/*
-	 * Initialize tail_call_cnt in stack frame if we do tail calls.
-	 * Otherwise, put in NOPs so that it can be skipped when we are
-	 * invoked through a tail call.
-	 */
 	if (ctx->seen & SEEN_TAILCALL)
-		EMIT(PPC_RAW_STW(bpf_to_ppc(BPF_REG_1) - 1, _R1,
-
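The "32 - 16 = 16 bytes" drift in the commit message can be reproduced with plain arithmetic. The sketch below is a user-space model only: `run_tail_call()` is a hypothetical helper, and the frame sizes 16 and 32 are the ones implied by the figure in the message, not values read from the JIT.

```c
#include <stdint.h>

/*
 * Model of the old scheme: the caller's prologue allocates the caller's
 * frame, the tail call keeps the stack as is, and the callee's epilogue
 * releases a frame of the callee's size.
 * Returns the stack pointer seen after the callee exits.
 */
static uint32_t run_tail_call(uint32_t sp, uint32_t caller_frame,
			      uint32_t callee_frame)
{
	sp -= caller_frame; /* caller prologue allocates its frame */
	/* tail call: callee frame setup is skipped, stack kept as is */
	sp += callee_frame; /* callee epilogue restores with its own size */
	return sp;
}
```

With equal frame sizes the stack pointer comes back unchanged; with a 16-byte caller frame and a 32-byte callee frame it ends 16 bytes too high on every run, which is the condition the new r4-based scheme is meant to avoid.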
[PATCH v5 7/7] powerpc/64: Sanitise user registers on interrupt in pseries, POWERNV
Cause pseries and POWERNV platforms to default to zeroising all potentially
user-defined registers when entering the kernel by means of any interrupt
source, reducing user-influence of the kernel and the likelihood of
producing speculation gadgets.

Acked-by: Nicholas Piggin
Signed-off-by: Rohan McLure
---
Resubmitting patches as their own series after v6 partially merged:
Link: https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/

v4: Default on POWERNV as well.
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 280c797e0f30..2ab114a02f62 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -534,7 +534,7 @@ config HOTPLUG_CPU
 config INTERRUPT_SANITIZE_REGISTERS
 	bool "Clear gprs on interrupt arrival"
 	depends on PPC64 && ARCH_HAS_SYSCALL_WRAPPER
-	default PPC_BOOK3E_64
+	default PPC_BOOK3E_64 || PPC_PSERIES || PPC_POWERNV
 	help
 	  Reduce the influence of user register state on interrupt handlers and
 	  syscalls through clearing user state from registers before handling
-- 
2.37.2
[PATCH v5 6/7] powerpc/64e: Clear gprs on interrupt routine entry on Book3E
Zero GPRS r14-r31 on entry into the kernel for interrupt sources to limit influence of user-space values in potential speculation gadgets. Prior to this commit, all other GPRS are reassigned during the common prologue to interrupt handlers and so need not be zeroised explicitly. This may be done safely, without loss of register state prior to the interrupt, as the common prologue saves the initial values of non-volatiles, which are unconditionally restored in interrupt_64.S. Mitigation defaults to enabled by INTERRUPT_SANITIZE_REGISTERS. Reviewed-by: Nicholas Piggin Signed-off-by: Rohan McLure --- Resubmitting patches as their own series after v6 partially merged: Link: https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/ --- arch/powerpc/kernel/exceptions-64e.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 930e36099015..52431cb0c083 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -358,7 +358,6 @@ ret_from_mc_except: std r14,PACA_EXMC+EX_R14(r13); \ std r15,PACA_EXMC+EX_R15(r13) - /* Core exception code for all exceptions except TLB misses. */ #define EXCEPTION_COMMON_LVL(n, scratch, excf) \ exc_##n##_common: \ @@ -394,7 +393,8 @@ exc_##n##_common: \ std r12,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame */ \ std r3,_TRAP(r1); /* set trap number */ \ std r0,RESULT(r1); /* clear regs->result */\ - SAVE_NVGPRS(r1); + SAVE_NVGPRS(r1);\ + SANITIZE_NVGPRS(); /* minimise speculation influence */ #define EXCEPTION_COMMON(n) \ EXCEPTION_COMMON_LVL(n, SPRN_SPRG_GEN_SCRATCH, PACA_EXGEN) -- 2.37.2
[PATCH v5 5/7] powerpc/64s: Zeroise gprs on interrupt routine entry on Book3S
Zeroise user state in gprs (assign to zero) to reduce the influence of user
registers on speculation within kernel syscall handlers. Clears occur
at the very beginning of the sc and scv 0 interrupt handlers, with
restores occurring following the execution of the syscall handler.

Zeroise GPRS r0, r2-r11, r14-r31, on entry into the kernel for all
other interrupt sources. The remaining gprs are overwritten by
entry macros to interrupt handlers, irrespective of whether or not a
given handler consumes these register values. If an interrupt does not
select the IMSR_R12 IOption, zeroise r12.

Prior to this commit, r14-r31 are restored on a per-interrupt basis at
exit, but now they are always restored on 64bit Book3S. Remove explicit
REST_NVGPRS invocations on 64-bit Book3S. 32-bit systems do not clear
user registers on interrupt, and continue to depend on the return value
of interrupt_exit_user_prepare to determine whether or not to restore
non-volatiles.

The mmap_bench benchmark in selftests should rapidly invoke pagefaults.
See ~0.8% performance regression with this mitigation, but this
indicates the worst-case performance due to heavier-weight interrupt
handlers. This mitigation is able to be enabled/disabled through
CONFIG_INTERRUPT_SANITIZE_REGISTERS.

Reviewed-by: Nicholas Piggin
Signed-off-by: Rohan McLure
---
v2: REST_NVGPRS should be conditional on mitigation in scv handler. Fix
improper multi-line preprocessor macro in interrupt_64.S
v4: Split off IMSR_R12 definition into its own patch. Move macro
definitions for register sanitisation into asm/ppc_asm.h
v5: Replace unconditional ZEROIZE_... with conditional SANITIZE_...
counterparts
---
 arch/powerpc/kernel/exceptions-64s.S | 27 ++-
 arch/powerpc/kernel/interrupt_64.S   | 16 ++--
 2 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 58d72db1d484..68de42e42268 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -506,6 +506,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real, text)
 	std	r10,0(r1)		/* make stack chain pointer */
 	std	r0,GPR0(r1)		/* save r0 in stackframe */
 	std	r10,GPR1(r1)		/* save r1 in stackframe */
+	SANITIZE_GPR(0)
 
 	/* Mark our [H]SRRs valid for return */
 	li	r10,1
@@ -548,8 +549,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	std	r9,GPR11(r1)
 	std	r10,GPR12(r1)
 	std	r11,GPR13(r1)
+	.if !IMSR_R12
+	SANITIZE_GPRS(9, 12)
+	.else
+	SANITIZE_GPRS(9, 11)
+	.endif
 
 	SAVE_NVGPRS(r1)
+	SANITIZE_NVGPRS()
 
 	.if IDAR
 	.if IISIDE
@@ -581,8 +588,8 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 	ld	r10,IAREA+EX_CTR(r13)
 	std	r10,_CTR(r1)
-	std	r2,GPR2(r1)		/* save r2 in stackframe */
-	SAVE_GPRS(3, 8, r1)		/* save r3 - r8 in stackframe */
+	SAVE_GPRS(2, 8, r1)		/* save r2 - r8 in stackframe */
+	SANITIZE_GPRS(2, 8)
 	mflr	r9			/* Get LR, later save to stack */
 	LOAD_PACA_TOC()			/* get kernel TOC into r2 */
 	std	r9,_LINK(r1)
@@ -700,6 +707,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 	mtlr	r9
 	ld	r9,_CCR(r1)
 	mtcr	r9
+	SANITIZE_RESTORE_NVGPRS()
 	REST_GPRS(2, 13, r1)
 	REST_GPR(0, r1)
 	/* restore original r1. */
@@ -1445,7 +1453,7 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 	 * do_break() may have changed the NV GPRS while handling a breakpoint.
 	 * If so, we need to restore them with their updated values.
 	 */
-	REST_NVGPRS(r1)
+	HANDLER_RESTORE_NVGPRS()
 	b	interrupt_return_srr
 
 
@@ -1671,7 +1679,7 @@ EXC_COMMON_BEGIN(alignment_common)
 	GEN_COMMON alignment
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	alignment_exception
-	REST_NVGPRS(r1) /* instruction emulation may change GPRs */
+	HANDLER_RESTORE_NVGPRS() /* instruction emulation may change GPRs */
 	b	interrupt_return_srr
 
 
@@ -1737,7 +1745,7 @@ EXC_COMMON_BEGIN(program_check_common)
 .Ldo_program_check:
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	program_check_exception
-	REST_NVGPRS(r1) /* instruction emulation may change GPRs */
+	HANDLER_RESTORE_NVGPRS() /* instruction emulation may change GPRs */
 	b	interrupt_return_srr
 
 
@@ -2169,7 +2177,7 @@ EXC_COMMON_BEGIN(emulation_assist_common)
 	GEN_COMMON emulation_assist
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	emulation_assist_interrupt
-	REST_NVGPRS(r1) /* instruction emulation may change GPRs */
+	HANDLER_RESTORE_NVGPRS() /* instruction emulation may change GPRs */
[PATCH v5 4/7] powerpc/64s: IOption for MSR stored in r12
Interrupt handlers in asm/exceptions-64s.S contain a great deal of common code produced by the GEN_COMMON macros. Currently, at the exit point of the macro, r12 will contain the contents of the MSR. A future patch will cause these macros to zeroise architected registers to avoid potential speculation influence of user data. Provide an IOption that signals that r12 must be retained, as the interrupt handler assumes it to hold the contents of the MSR. Reviewed-by: Nicholas Piggin Signed-off-by: Rohan McLure --- v4: Split 64s register sanitisation commit to establish this IOption --- arch/powerpc/kernel/exceptions-64s.S | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 5381a43e50fe..58d72db1d484 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -111,6 +111,7 @@ name: #define ISTACK .L_ISTACK_\name\() /* Set regular kernel stack */ #define __ISTACK(name) .L_ISTACK_ ## name #define IKUAP .L_IKUAP_\name\() /* Do KUAP lock */ +#define IMSR_R12 .L_IMSR_R12_\name\()/* Assumes MSR saved to r12 */ #define INT_DEFINE_BEGIN(n)\ .macro int_define_ ## n name @@ -176,6 +177,9 @@ do_define_int n .ifndef IKUAP IKUAP=1 .endif + .ifndef IMSR_R12 + IMSR_R12=0 + .endif .endm /* @@ -1751,6 +1755,7 @@ INT_DEFINE_BEGIN(fp_unavailable) #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE IKVM_REAL=1 #endif + IMSR_R12=1 INT_DEFINE_END(fp_unavailable) EXC_REAL_BEGIN(fp_unavailable, 0x800, 0x100) @@ -2372,6 +2377,7 @@ INT_DEFINE_BEGIN(altivec_unavailable) #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE IKVM_REAL=1 #endif + IMSR_R12=1 INT_DEFINE_END(altivec_unavailable) EXC_REAL_BEGIN(altivec_unavailable, 0xf20, 0x20) @@ -2421,6 +2427,7 @@ INT_DEFINE_BEGIN(vsx_unavailable) #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE IKVM_REAL=1 #endif + IMSR_R12=1 INT_DEFINE_END(vsx_unavailable) EXC_REAL_BEGIN(vsx_unavailable, 0xf40, 0x20) -- 2.37.2
[PATCH v5 1/7] powerpc/64: Add INTERRUPT_SANITIZE_REGISTERS Kconfig
Add Kconfig option for enabling clearing of registers on arrival in an interrupt handler. This reduces the speculation influence of registers on kernel internals. The option will be consumed by 64-bit systems that feature speculation and wish to implement this mitigation. This patch only introduces the Kconfig option, no actual mitigations. The primary overhead of this mitigation lies in an increased number of registers that must be saved and restored by interrupt handlers on Book3S systems. Enable by default on Book3E systems, which prior to this patch eagerly save and restore register state, meaning that the mitigation when implemented will have minimal overhead. Acked-by: Nicholas Piggin Signed-off-by: Rohan McLure --- Resubmitting patches as their own series after v6 partially merged: Link: https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/ --- arch/powerpc/Kconfig | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 4fd4924f6d50..280c797e0f30 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -531,6 +531,15 @@ config HOTPLUG_CPU Say N if you are unsure. +config INTERRUPT_SANITIZE_REGISTERS + bool "Clear gprs on interrupt arrival" + depends on PPC64 && ARCH_HAS_SYSCALL_WRAPPER + default PPC_BOOK3E_64 + help + Reduce the influence of user register state on interrupt handlers and + syscalls through clearing user state from registers before handling + the exception. + config PPC_QUEUED_SPINLOCKS bool "Queued spinlocks" if EXPERT depends on SMP -- 2.37.2
[PATCH v5 2/7] powerpc/64: Add interrupt register sanitisation macros
Include in asm/ppc_asm.h macros to be used in multiple successive
patches to implement zeroising architected registers in interrupt
handlers. Registers will be sanitised in this fashion in future patches
to reduce the speculation influence of user-controlled register values.
These mitigations will be configurable through the
CONFIG_INTERRUPT_SANITIZE_REGISTERS Kconfig option.

Included are macros for conditionally zeroising registers and restoring
as required with the mitigation enabled. With the mitigation disabled,
non-volatiles must be restored on demand at separate locations to
those required by the mitigation.

Reviewed-by: Nicholas Piggin
Signed-off-by: Rohan McLure
---
v4: New patch
v5: Remove unnecessary _ZEROIZE_ parts of macro titles, as the
implementation of how registers are sanitised doesn't need to be
immediately accessible, only that values will be clobbered. Introduce
arbitrary sanitize gpr(s) macros.
---
 arch/powerpc/include/asm/ppc_asm.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 753a2757bcd4..d2f44612f4b0 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -74,6 +74,25 @@
 #define SAVE_GPR(n, base)		SAVE_GPRS(n, n, base)
 #define REST_GPR(n, base)		REST_GPRS(n, n, base)
 
+/* macros for handling user register sanitisation */
+#ifdef CONFIG_INTERRUPT_SANITIZE_REGISTERS
+#define SANITIZE_SYSCALL_GPRS()		ZEROIZE_GPR(0);		\
+					ZEROIZE_GPRS(5, 12);	\
+					ZEROIZE_NVGPRS()
+#define SANITIZE_GPR(n)			ZEROIZE_GPR(n)
+#define SANITIZE_GPRS(start, end)	ZEROIZE_GPRS(start, end)
+#define SANITIZE_NVGPRS()		ZEROIZE_NVGPRS()
+#define SANITIZE_RESTORE_NVGPRS()	REST_NVGPRS(r1)
+#define HANDLER_RESTORE_NVGPRS()
+#else
+#define SANITIZE_SYSCALL_GPRS()
+#define SANITIZE_GPR(n)
+#define SANITIZE_GPRS(start, end)
+#define SANITIZE_NVGPRS()
+#define SANITIZE_RESTORE_NVGPRS()
+#define HANDLER_RESTORE_NVGPRS()	REST_NVGPRS(r1)
+#endif /* CONFIG_INTERRUPT_SANITIZE_REGISTERS */
+
 #define SAVE_FPR(n, base)	stfd	n,8*TS_FPRWIDTH*(n)(base)
 #define SAVE_2FPRS(n, base)	SAVE_FPR(n, base); SAVE_FPR(n+1, base)
 #define SAVE_4FPRS(n, base)	SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
-- 
2.37.2
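The two restore macros are complementary: in either configuration exactly one of them expands to a real REST_NVGPRS. A rough C model of that pairing follows. It is only an illustration of the preprocessor pattern, not of the assembly: `SKETCH_SANITIZE`, `MODEL_REST_NVGPRS()` and `model_interrupt()` are names assumed for the example, with `SKETCH_SANITIZE` standing in for CONFIG_INTERRUPT_SANITIZE_REGISTERS.

```c
/* Define SKETCH_SANITIZE to model CONFIG_INTERRUPT_SANITIZE_REGISTERS=y. */
static int restores; /* counts modelled REST_NVGPRS expansions */

#define MODEL_REST_NVGPRS() (restores++)

#ifdef SKETCH_SANITIZE
#define SANITIZE_RESTORE_NVGPRS() MODEL_REST_NVGPRS()
#define HANDLER_RESTORE_NVGPRS()  ((void)0)
#else
#define SANITIZE_RESTORE_NVGPRS() ((void)0)
#define HANDLER_RESTORE_NVGPRS()  MODEL_REST_NVGPRS()
#endif

/* One interrupt passing both restore sites: the handler, then common exit. */
static int model_interrupt(void)
{
	restores = 0;
	HANDLER_RESTORE_NVGPRS();  /* per-handler site */
	SANITIZE_RESTORE_NVGPRS(); /* common exit path */
	return restores;
}
```

Built with or without `-DSKETCH_SANITIZE`, `model_interrupt()` counts a single restore; only the site at which it happens changes.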
[PATCH v5 3/7] powerpc/64: Sanitise common exit code for interrupts
Interrupt code is shared between Book3E/S 64-bit systems for interrupt
handlers. Ensure that exit code correctly restores non-volatile gprs on
each system when CONFIG_INTERRUPT_SANITIZE_REGISTERS is enabled.

Also introduce macros for clearing/restoring registers on interrupt
entry for when this configuration option is either disabled or enabled.

Reviewed-by: Nicholas Piggin
Signed-off-by: Rohan McLure
---
v4: New patch
---
 arch/powerpc/kernel/interrupt_64.S | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index 978a173eb339..1ef4fdef74fb 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -408,9 +408,11 @@ interrupt_return_\srr\()_user: /* make backtraces match the _kernel variant */
 _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_user)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	interrupt_exit_user_prepare
+#ifndef CONFIG_INTERRUPT_SANITIZE_REGISTERS
 	cmpdi	r3,0
 	bne-	.Lrestore_nvgprs_\srr
 .Lrestore_nvgprs_\srr\()_cont:
+#endif
 	std	r1,PACA_EXIT_SAVE_R1(r13) /* save r1 for restart */
 #ifdef CONFIG_PPC_BOOK3S
 .Linterrupt_return_\srr\()_user_rst_start:
@@ -424,6 +426,7 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_user)
 	stb	r11,PACAIRQHAPPENED(r13) # clear out possible HARD_DIS
 
 .Lfast_user_interrupt_return_\srr\():
+	SANITIZE_RESTORE_NVGPRS()
 #ifdef CONFIG_PPC_BOOK3S
 	.ifc \srr,srr
 	lbz	r4,PACASRR_VALID(r13)
@@ -493,9 +496,11 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 	b	.	/* prevent speculative execution */
 .Linterrupt_return_\srr\()_user_rst_end:
 
+#ifndef CONFIG_INTERRUPT_SANITIZE_REGISTERS
 .Lrestore_nvgprs_\srr\():
 	REST_NVGPRS(r1)
 	b	.Lrestore_nvgprs_\srr\()_cont
+#endif
 
 #ifdef CONFIG_PPC_BOOK3S
 interrupt_return_\srr\()_user_restart:
@@ -576,6 +581,7 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_kernel)
 	stb	r11,PACAIRQHAPPENED(r13) // clear the possible HARD_DIS
 
 .Lfast_kernel_interrupt_return_\srr\():
+	SANITIZE_RESTORE_NVGPRS()
 	cmpdi	cr1,r3,0
 #ifdef CONFIG_PPC_BOOK3S
 	.ifc \srr,srr
-- 
2.37.2
Re: [PATCH 2/5] arm: dts: remove label = "cpu" from DSA dt-binding
On Wed, Nov 30, 2022 at 05:10:37PM +0300, Arınç ÜNAL wrote:
> This is not used by the DSA dt-binding, so remove it from all devicetrees.
>
> Signed-off-by: Arınç ÜNAL
> ---
>  arch/arm/boot/dts/imx6qdl-skov-cpu.dtsi | 1 -
>  arch/arm/boot/dts/imx6qp-prtwd3.dts | 1 -

Reviewed-by: Oleksij Rempel

Thx!
-- 
Pengutronix e.K.                 |                            |
Steuerwalder Str. 21             | http://www.pengutronix.de/ |
31137 Hildesheim, Germany        | Phone: +49-5121-206917-0   |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-      |
Re: [PATCH 4/5] mips: dts: remove label = "cpu" from DSA dt-binding
On Wed, Nov 30, 2022 at 3:14 PM Arınç ÜNAL wrote:
>
> This is not used by the DSA dt-binding, so remove it from all devicetrees.
>
> Signed-off-by: Arınç ÜNAL
> ---
>  arch/mips/boot/dts/ralink/mt7621.dtsi | 1 -

Acked-by: Sergio Paracuellos

Thanks,
Sergio Paracuellos
Re: [PATCH v3 3/3] block: sed-opal: keyring support for SED keys
On Wed, Nov 30, 2022 at 09:19:25 -0600, Greg Joyce wrote:
> On Wed, 2022-11-30 at 08:00 +0100, Hannes Reinecke wrote:
> > On 11/30/22 00:25, gjo...@linux.vnet.ibm.com wrote:
> > > +	case OPAL_KEYRING:
> > > +		/* the key is in the keyring */
> > > +		ret = read_sed_opal_key(OPAL_AUTH_KEY, key->key,
> > > OPAL_KEY_MAX);
> > > +		if (ret > 0) {
> > > +			if (ret > 255) {
> >
> > Why is a key longer than 255 an error?
> > If this is a requirement, why not move the check into
> > read_sed_opal_key() such that one only has to check for
> > ret < 0 on errors?
>
> The check is done here because the SED Opal spec stipulates 255 as the
> maximum key length. The key length (key->key_len) in the existing data
> structures is __u8, so a length greater than 255 can not be conveyed.
> For defensive purposes, I thought it best to check here.

Perhaps naming it `OPAL_MAX_KEY_LEN` would help clarify this?

--Ben
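To make the __u8 point concrete, here is a minimal sketch of the kind of check being discussed. It is not the sed-opal code: `SKETCH_MAX_KEY_LEN` and `set_key_len()` are hypothetical names, and treating a zero or negative return as failure is an assumption of the example. Without the bound, a length of 256 stored into an 8-bit field would silently become 0.

```c
#include <stdint.h>

#define SKETCH_MAX_KEY_LEN 255 /* limit quoted in the thread */

/*
 * Store a key length returned by a reader into an 8-bit field.
 * Returns 0 on success, -1 if the value is an error, empty, or too
 * long to be represented; *key_len is left untouched on failure.
 */
static int set_key_len(int ret, uint8_t *key_len)
{
	if (ret <= 0 || ret > SKETCH_MAX_KEY_LEN)
		return -1;
	*key_len = (uint8_t)ret;
	return 0;
}
```

A named constant in place of the bare 255 is the readability change suggested above; whether the check belongs in the caller or in the reader is the question Hannes raised.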
Re: [RFC PATCH] Disable Book-E KVM support?
On Thu Dec 1, 2022 at 6:45 AM AEST, Crystal Wood wrote:
> On Mon, 2022-11-28 at 14:36 +1000, Nicholas Piggin wrote:
> > BookE KVM is in a deep maintenance state, I'm not sure how much testing
> > it gets. I don't have a test setup, and it does not look like QEMU has
> > any HV architecture enabled. It hasn't been too painful but there are
> > some cases where it causes a bit of problem not being able to test, e.g.,
> >
> > https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-November/251452.html
> >
> > Time to begin removal process, or are there still people using it? I'm
> > happy to keep making occasional patches to try keep it going if
> > there are people testing upstream. Getting HV support into QEMU would
> > help with long term support, not sure how big of a job that would be.
>
> Not sure what you mean about QEMU not having e500 HV support? I don't know if
> it's bitrotted, but it's there.
>
> I don't know whether anyone is still using this, but if they are, it's
> probably e500mc and not e500v2 (which involved a bunch of hacks to get
> almost-sorta-usable performance out of hardware not designed for
> virtualization). I do see that there have been a few recent patches on QEMU
> e500 (beyond the treewide cleanup type stuff), though I don't know if they're
> using KVM. CCing them and the QEMU list.

Well I could be wrong about it, but it doesn't look like it implements LPIDR
or GSPRs. The only use of MSR_GS seems to be a couple of places including an
instruction that aborts because there is no HV implementation. It does have
an MMU index selector but before d764184ddb22 that apparently didn't really
work.

QEMU probably should be able to run BookE KVM in PR mode at least.

> I have an e6500 I could occasionally test on, if it turns out people do still
> care about this. Don't count me as the use case, though. :-)

Do you have a KVM setup on it? And it works with recentish upstream?

> FWIW, as far as the RECONCILE_IRQ_STATE issue, that used to be done in
> kvmppc_handle_exit(), but was moved in commit 9bd880a2c882 to be "cleaner and
> faster". :-P

Yeah that was probably reasonable at the time, that was the common way
to do it and the patch avoids an unnecessary expensive write to MSR
(which my patch retains). I think it must have always clobbered r4
though. It's possible it wasn't tested with the right build option, or
the right tracer active, or maybe the call was simple enough that it was
lucky and the compiler didn't use r4. Easy bug to miss when it's not
obvious that macro can call into C.

Thanks,
Nick
Re: [PATCH v2 26/50] KVM: PPC: Move processor compatibility check to module init
Sean Christopherson writes:
> Move KVM PPC's compatibility checks to their respective module_init()
> hooks, there's no need to wait until KVM's common compat check, nor is
> there a need to perform the check on every CPU (provided by common KVM's
> hook), as the compatibility checks operate on global data.
>
>   arch/powerpc/include/asm/cputable.h: extern struct cpu_spec *cur_cpu_spec;
>   arch/powerpc/kvm/book3s.c: return 0
>   arch/powerpc/kvm/e500.c: strcmp(cur_cpu_spec->cpu_name, "e500v2")
>   arch/powerpc/kvm/e500mc.c: strcmp(cur_cpu_spec->cpu_name, "e500mc")
>                              strcmp(cur_cpu_spec->cpu_name, "e5500")
>                              strcmp(cur_cpu_spec->cpu_name, "e6500")

I'm not sure that output is really useful in the change log unless you
explain more about what it is.

> diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
> index 57e0ad6a2ca3..795667f7ebf0 100644
> --- a/arch/powerpc/kvm/e500mc.c
> +++ b/arch/powerpc/kvm/e500mc.c
> @@ -388,6 +388,10 @@ static int __init kvmppc_e500mc_init(void)
>  {
>  	int r;
>
> +	r = kvmppc_e500mc_check_processor_compat();
> +	if (r)
> +		return kvmppc_e500mc;

This doesn't build:

linux/arch/powerpc/kvm/e500mc.c: In function ‘kvmppc_e500mc_init’:
linux/arch/powerpc/kvm/e500mc.c:391:13: error: implicit declaration of function ‘kvmppc_e500mc_check_processor_compat’; did you mean ‘kvmppc_core_check_processor_compat’? [-Werror=implicit-function-declaration]
  391 |         r = kvmppc_e500mc_check_processor_compat();
      |             ^~~~
      |             kvmppc_core_check_processor_compat
linux/arch/powerpc/kvm/e500mc.c:393:24: error: ‘kvmppc_e500mc’ undeclared (first use in this function); did you mean ‘kvm_ops_e500mc’?
  393 |                 return kvmppc_e500mc;
      |                        ^
      |                        kvm_ops_e500mc
linux/arch/powerpc/kvm/e500mc.c:393:24: note: each undeclared identifier is reported only once for each function it appears in

It needs the delta below to compile.

With that:

Acked-by: Michael Ellerman (powerpc)

cheers

diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index 795667f7ebf0..4564aa27edcf 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -168,7 +168,7 @@ static void kvmppc_core_vcpu_put_e500mc(struct kvm_vcpu *vcpu)
 	kvmppc_booke_vcpu_put(vcpu);
 }
 
-int kvmppc_core_check_processor_compat(void)
+int kvmppc_e500mc_check_processor_compat(void)
 {
 	int r;
 
@@ -390,7 +390,7 @@ static int __init kvmppc_e500mc_init(void)
 
 	r = kvmppc_e500mc_check_processor_compat();
 	if (r)
-		return kvmppc_e500mc;
+		goto err_out;
 
 	r = kvmppc_booke_init();
 	if (r)
[PATCH] selftests: powerpc: Use "grep -E" instead of "egrep"
The latest version of grep claims that egrep is now obsolete, so the build now contains warnings that look like: egrep: warning: egrep is obsolescent; using grep -E Fix this by using "grep -E" instead. sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests/powerpc` Here are the steps to install the latest grep: wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz tar xf grep-3.8.tar.gz cd grep-3.8 && ./configure && make sudo make install export PATH=/usr/local/bin:$PATH Signed-off-by: Tiezhu Yang --- As Shuah suggested, this patch should go through powerpc/linux.git tools/testing/selftests/powerpc/scripts/hmi.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/powerpc/scripts/hmi.sh b/tools/testing/selftests/powerpc/scripts/hmi.sh index dcdb392..bcc7b6b 100755 --- a/tools/testing/selftests/powerpc/scripts/hmi.sh +++ b/tools/testing/selftests/powerpc/scripts/hmi.sh @@ -36,7 +36,7 @@ trap "ppc64_cpu --smt-snooze-delay=100" 0 1 # for each chip+core combination # todo - less fragile parsing -egrep -o 'OCC: Chip [0-9a-f]+ Core [0-9a-f]' < /sys/firmware/opal/msglog | +grep -E -o 'OCC: Chip [0-9a-f]+ Core [0-9a-f]' < /sys/firmware/opal/msglog | while read chipcore; do chip=$(echo "$chipcore"|awk '{print $3}') core=$(echo "$chipcore"|awk '{print $5}') -- 2.1.0
Re: [PATCH 32/44] KVM: x86: Unify pr_fmt to use module name for all KVM modules
On Wed, 2022-11-30 at 23:02 +, Sean Christopherson wrote:
> On Thu, Nov 10, 2022, Sean Christopherson wrote:
> > On Thu, Nov 10, 2022, Robert Hoo wrote:
> > > After this patch set, still find some printk()s left in
> > > arch/x86/kvm/*,
> > > consider clean all of them up?
> >
> > Hmm, yeah, I suppose at this point it makes sense to tack on a
> > patch to clean
> > them up.
>
> Actually, I'm going to pass on this for now. The series is already
> too big. I'll
> add this to my todo list for the future.

That's all right, thanks for the update.
Re: [PATCH v2] hvc/xen: lock console list traversal
On Wed, 30 Nov 2022, Roger Pau Monne wrote: > The currently lockless access to the xen console list in > vtermno_to_xencons() is incorrect, as additions and removals from the > list can happen anytime, and as such the traversal of the list to get > the private console data for a given termno needs to happen with the > lock held. Note users that modify the list already do so with the > lock taken. > > Adjust current lock takers to use the _irq{save,restore} helpers, > since the context in which vtermno_to_xencons() is called can have > interrupts disabled. Use the _irq{save,restore} set of helpers to > switch the current callers to disable interrupts in the locked region. > I haven't checked if existing users could instead use the _irq > variant, as I think it's safer to use _irq{save,restore} upfront. > > While there switch from using list_for_each_entry_safe to > list_for_each_entry: the current entry cursor won't be removed as > part of the code in the loop body, so using the _safe variant is > pointless. > > Fixes: 02e19f9c7cac ('hvc_xen: implement multiconsole support') > Signed-off-by: Roger Pau Monné Reviewed-by: Stefano Stabellini > --- > Changes since v1: > - Switch current lock users to disable interrupts in the locked >region. 
> --- > drivers/tty/hvc/hvc_xen.c | 46 --- > 1 file changed, 29 insertions(+), 17 deletions(-) > > diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c > index e63c1761a361..d9d023275328 100644 > --- a/drivers/tty/hvc/hvc_xen.c > +++ b/drivers/tty/hvc/hvc_xen.c > @@ -53,17 +53,22 @@ static DEFINE_SPINLOCK(xencons_lock); > > static struct xencons_info *vtermno_to_xencons(int vtermno) > { > - struct xencons_info *entry, *n, *ret = NULL; > + struct xencons_info *entry, *ret = NULL; > + unsigned long flags; > > - if (list_empty()) > - return NULL; > + spin_lock_irqsave(_lock, flags); > + if (list_empty()) { > + spin_unlock_irqrestore(_lock, flags); > + return NULL; > + } > > - list_for_each_entry_safe(entry, n, , list) { > + list_for_each_entry(entry, , list) { > if (entry->vtermno == vtermno) { > ret = entry; > break; > } > } > + spin_unlock_irqrestore(_lock, flags); > > return ret; > } > @@ -234,7 +239,7 @@ static int xen_hvm_console_init(void) > { > int r; > uint64_t v = 0; > - unsigned long gfn; > + unsigned long gfn, flags; > struct xencons_info *info; > > if (!xen_hvm_domain()) > @@ -270,9 +275,9 @@ static int xen_hvm_console_init(void) > goto err; > info->vtermno = HVC_COOKIE; > > - spin_lock(_lock); > + spin_lock_irqsave(_lock, flags); > list_add_tail(>list, ); > - spin_unlock(_lock); > + spin_unlock_irqrestore(_lock, flags); > > return 0; > err: > @@ -296,6 +301,7 @@ static int xencons_info_pv_init(struct xencons_info > *info, int vtermno) > static int xen_pv_console_init(void) > { > struct xencons_info *info; > + unsigned long flags; > > if (!xen_pv_domain()) > return -ENODEV; > @@ -312,9 +318,9 @@ static int xen_pv_console_init(void) > /* already configured */ > return 0; > } > - spin_lock(_lock); > + spin_lock_irqsave(_lock, flags); > xencons_info_pv_init(info, HVC_COOKIE); > - spin_unlock(_lock); > + spin_unlock_irqrestore(_lock, flags); > > return 0; > } > @@ -322,6 +328,7 @@ static int xen_pv_console_init(void) > static int 
xen_initial_domain_console_init(void) > { > struct xencons_info *info; > + unsigned long flags; > > if (!xen_initial_domain()) > return -ENODEV; > @@ -337,9 +344,9 @@ static int xen_initial_domain_console_init(void) > info->irq = bind_virq_to_irq(VIRQ_CONSOLE, 0, false); > info->vtermno = HVC_COOKIE; > > - spin_lock(_lock); > + spin_lock_irqsave(_lock, flags); > list_add_tail(>list, ); > - spin_unlock(_lock); > + spin_unlock_irqrestore(_lock, flags); > > return 0; > } > @@ -394,10 +401,12 @@ static void xencons_free(struct xencons_info *info) > > static int xen_console_remove(struct xencons_info *info) > { > + unsigned long flags; > + > xencons_disconnect_backend(info); > - spin_lock(_lock); > + spin_lock_irqsave(_lock, flags); > list_del(>list); > - spin_unlock(_lock); > + spin_unlock_irqrestore(_lock, flags); > if (info->xbdev != NULL) > xencons_free(info); > else { > @@ -478,6 +487,7 @@ static int xencons_probe(struct xenbus_device *dev, > { > int ret, devid; > struct xencons_info *info; > + unsigned long flags; > > devid = dev->nodename[strlen(dev->nodename) - 1] - '0'; > if (devid == 0) > @@ -497,9 +507,9 @@ static int xencons_probe(struct
Re: [PATCH v2] hvc/xen: prevent concurrent accesses to the shared ring
On Wed, 30 Nov 2022, Roger Pau Monne wrote: > The hvc machinery registers both a console and a tty device based on > the hv ops provided by the specific implementation. Those two > interfaces however have different locks, and there's no single lock > that's shared between the tty and the console implementations, hence > the driver needs to protect itself against concurrent accesses. > Otherwise concurrent calls using the split interfaces are likely to > corrupt the ring indexes, leaving the console unusable. > > Introduce a lock to xencons_info to serialize accesses to the shared > ring. This is only required when using the shared memory console, > concurrent accesses to the hypercall based console implementation are > not an issue. > > Note the conditional logic in domU_read_console() is slightly modified > so the notify_daemon() call can be done outside of the locked region: > it's a hypercall and there's no need for it to be done with the lock > held. For domU_read_console: I don't mean to block this patch but we need to be sure about the semantics of hv_ops.get_chars. Either it is expected to be already locked, then we definitely shouldn't add another lock to domU_read_console. Or it is not expected to be already locked, then we should add the lock. My impression is that it is expected to be already locked, but I think we need Greg or Jiri to confirm one way or the other. Aside from that the rest looks fine. > Fixes: b536b4b96230 ('xen: use the hvc console infrastructure for Xen > console') > Signed-off-by: Roger Pau Monné > --- > Changes since v1: > - Properly initialize the introduced lock in all paths. 
> --- > drivers/tty/hvc/hvc_xen.c | 19 +-- > 1 file changed, 17 insertions(+), 2 deletions(-) > > diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c > index 7c23112dc923..e63c1761a361 100644 > --- a/drivers/tty/hvc/hvc_xen.c > +++ b/drivers/tty/hvc/hvc_xen.c > @@ -43,6 +43,7 @@ struct xencons_info { > int irq; > int vtermno; > grant_ref_t gntref; > + spinlock_t ring_lock; > }; > > static LIST_HEAD(xenconsoles); > @@ -84,12 +85,15 @@ static int __write_console(struct xencons_info *xencons, > XENCONS_RING_IDX cons, prod; > struct xencons_interface *intf = xencons->intf; > int sent = 0; > + unsigned long flags; > > + spin_lock_irqsave(>ring_lock, flags); > cons = intf->out_cons; > prod = intf->out_prod; > mb(); /* update queue values before going on */ > > if ((prod - cons) > sizeof(intf->out)) { > + spin_unlock_irqrestore(>ring_lock, flags); > pr_err_once("xencons: Illegal ring page indices"); > return -EINVAL; > } > @@ -99,6 +103,7 @@ static int __write_console(struct xencons_info *xencons, > > wmb(); /* write ring before updating pointer */ > intf->out_prod = prod; > + spin_unlock_irqrestore(>ring_lock, flags); > > if (sent) > notify_daemon(xencons); > @@ -141,16 +146,19 @@ static int domU_read_console(uint32_t vtermno, char > *buf, int len) > int recv = 0; > struct xencons_info *xencons = vtermno_to_xencons(vtermno); > unsigned int eoiflag = 0; > + unsigned long flags; > > if (xencons == NULL) > return -EINVAL; > intf = xencons->intf; > > + spin_lock_irqsave(>ring_lock, flags); > cons = intf->in_cons; > prod = intf->in_prod; > mb(); /* get pointers before reading ring */ > > if ((prod - cons) > sizeof(intf->in)) { > + spin_unlock_irqrestore(>ring_lock, flags); > pr_err_once("xencons: Illegal ring page indices"); > return -EINVAL; > } > @@ -174,10 +182,13 @@ static int domU_read_console(uint32_t vtermno, char > *buf, int len) > xencons->out_cons = intf->out_cons; > xencons->out_cons_same = 0; > } > + if (!recv && xencons->out_cons_same++ > 1) { > + 
eoiflag = XEN_EOI_FLAG_SPURIOUS; > + } > + spin_unlock_irqrestore(>ring_lock, flags); > + > if (recv) { > notify_daemon(xencons); > - } else if (xencons->out_cons_same++ > 1) { > - eoiflag = XEN_EOI_FLAG_SPURIOUS; > } > > xen_irq_lateeoi(xencons->irq, eoiflag); > @@ -234,6 +245,7 @@ static int xen_hvm_console_init(void) > info = kzalloc(sizeof(struct xencons_info), GFP_KERNEL); > if (!info) > return -ENOMEM; > + spin_lock_init(>ring_lock); > } else if (info->intf != NULL) { > /* already configured */ > return 0; > @@ -270,6 +282,7 @@ static int xen_hvm_console_init(void) > > static int xencons_info_pv_init(struct xencons_info *info, int vtermno) > { > + spin_lock_init(>ring_lock); > info->evtchn = xen_start_info->console.domU.evtchn; > /* GFN ==
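The locking pattern discussed in this thread (update the ring indices under a lock, drop the lock on every early return, and notify the other side only after unlocking) can be sketched in user space roughly as follows. This is only an illustration under stated assumptions, not the driver code: `demo_ring`, `demo_notify()` and the C11 `atomic_flag` spinlock are hypothetical stand-ins for `xencons_interface`, `notify_daemon()` and `ring_lock`, and the sketch does not settle whether hv_ops.get_chars callers already hold a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define DEMO_RING_SIZE 16u	/* power of two, so "idx & (size - 1)" wraps */

struct demo_ring {
	char buf[DEMO_RING_SIZE];
	unsigned int cons;	/* free-running consumer index */
	unsigned int prod;	/* free-running producer index */
	atomic_flag lock;	/* hypothetical stand-in for ring_lock */
	int notifications;	/* counts calls to demo_notify() */
};

void demo_ring_init(struct demo_ring *r)
{
	assert(r != NULL);
	memset(r, 0, sizeof(*r));
	atomic_flag_clear(&r->lock);
}

static void demo_lock(struct demo_ring *r)
{
	while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void demo_unlock(struct demo_ring *r)
{
	atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Hypothetical stand-in for notify_daemon(); it does not need the lock. */
static void demo_notify(struct demo_ring *r)
{
	r->notifications++;
}

/* Copy up to len bytes into the ring; returns bytes written, or -1. */
int demo_ring_write(struct demo_ring *r, const char *data, int len)
{
	unsigned int cons, prod;
	int sent = 0;

	demo_lock(r);
	cons = r->cons;
	prod = r->prod;
	if (prod - cons > DEMO_RING_SIZE) {
		demo_unlock(r);	/* early return still drops the lock */
		return -1;
	}
	while (sent < len && prod - cons < DEMO_RING_SIZE)
		r->buf[prod++ & (DEMO_RING_SIZE - 1)] = data[sent++];
	r->prod = prod;
	demo_unlock(r);

	if (sent)
		demo_notify(r);	/* issued after the lock is released */
	return sent;
}

/* Copy up to len bytes out of the ring; returns bytes read, or -1. */
int demo_ring_read(struct demo_ring *r, char *out, int len)
{
	unsigned int cons, prod;
	int recv = 0;

	demo_lock(r);
	cons = r->cons;
	prod = r->prod;
	if (prod - cons > DEMO_RING_SIZE) {
		demo_unlock(r);
		return -1;
	}
	while (recv < len && cons != prod)
		out[recv++] = r->buf[cons++ & (DEMO_RING_SIZE - 1)];
	r->cons = cons;
	demo_unlock(r);

	if (recv)
		demo_notify(r);
	return recv;
}
```

A caller would run demo_ring_init() once and then may call demo_ring_write() and demo_ring_read() from different contexts; only the index and buffer updates are serialized, the notification is not.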
[PATCH v2 50/50] KVM: Clean up error labels in kvm_init()
Convert the last two "out" labels to "err" labels now that the dust has settled, i.e. now that there are no more planned changes to the order of things in kvm_init(). Use "err" instead of "out" as it's easier to describe what failed than it is to describe what needs to be unwound, e.g. if allocating a per-CPU kick mask fails, KVM needs to free any masks that were allocated, and of course needs to unwind previous operations. Reported-by: Chao Gao Signed-off-by: Sean Christopherson --- virt/kvm/kvm_main.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 6a2be96557c2..b8c6bfb46066 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5907,14 +5907,14 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) NULL); if (!kvm_vcpu_cache) { r = -ENOMEM; - goto out_free_3; + goto err_vcpu_cache; } for_each_possible_cpu(cpu) { if (!alloc_cpumask_var_node(&per_cpu(cpu_kick_mask, cpu), GFP_KERNEL, cpu_to_node(cpu))) { r = -ENOMEM; - goto out_free_4; + goto err_cpu_kick_mask; } } @@ -5956,11 +5956,11 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) err_async_pf: kvm_irqfd_exit(); err_irqfd: -out_free_4: +err_cpu_kick_mask: for_each_possible_cpu(cpu) free_cpumask_var(per_cpu(cpu_kick_mask, cpu)); kmem_cache_destroy(kvm_vcpu_cache); -out_free_3: +err_vcpu_cache: #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING unregister_syscore_ops(&kvm_syscore_ops); unregister_reboot_notifier(&kvm_reboot_notifier); -- 2.38.1.584.g0f3c55d4c2-goog
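To make the naming convention concrete, here is a small user-space sketch of goto-based unwinding where each label names the step that failed and the code under it undoes everything that came before that step. All names here (`demo_state`, `demo_init()`, the `fail_at` knob) are hypothetical, and the resources are plain malloc() buffers rather than KVM's caches and cpumasks.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resources standing in for the vCPU cache and kick masks. */
struct demo_state {
	int notifier_registered;
	void *vcpu_cache;
	void *kick_mask;
};

/*
 * fail_at selects which step fails (0 = none) so that each unwind path
 * can be exercised. Returns 0 on success, a negative value on failure.
 */
int demo_init(struct demo_state *s, int fail_at)
{
	int r;

	assert(s != NULL);
	s->notifier_registered = 1;

	s->vcpu_cache = (fail_at == 1) ? NULL : malloc(64);
	if (!s->vcpu_cache) {
		r = -1;
		goto err_vcpu_cache;	/* named after what failed */
	}

	s->kick_mask = (fail_at == 2) ? NULL : malloc(64);
	if (!s->kick_mask) {
		r = -2;
		goto err_kick_mask;
	}

	return 0;

err_kick_mask:
	/* The kick mask failed, so undo the step before it ... */
	free(s->vcpu_cache);
	s->vcpu_cache = NULL;
err_vcpu_cache:
	/* ... and fall through to undo the earliest step. */
	s->notifier_registered = 0;
	return r;
}
```

With "out_free_N" style labels a reader has to count steps to know what each label releases; with "err_<step>" labels the goto site documents itself and the fall-through order mirrors the setup order in reverse.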
[PATCH v2 49/50] KVM: Opt out of generic hardware enabling on s390 and PPC
Allow architectures to opt out of the generic hardware enabling logic, and opt out on both s390 and PPC, which don't need to manually enable virtualization as it's always on (when available). In addition to letting s390 and PPC drop a bit of dead code, this will hopefully also allow ARM to clean up its related code, e.g. ARM has its own per-CPU flag to track which CPUs have enabled hardware due to the need to keep hardware enabled indefinitely when pKVM is enabled. Signed-off-by: Sean Christopherson Acked-by: Anup Patel --- arch/arm64/kvm/Kconfig | 1 + arch/mips/kvm/Kconfig | 1 + arch/powerpc/include/asm/kvm_host.h | 1 - arch/powerpc/kvm/powerpc.c | 5 - arch/riscv/kvm/Kconfig | 1 + arch/s390/include/asm/kvm_host.h| 1 - arch/s390/kvm/kvm-s390.c| 6 -- arch/x86/kvm/Kconfig| 1 + include/linux/kvm_host.h| 4 virt/kvm/Kconfig| 3 +++ virt/kvm/kvm_main.c | 30 +++-- 11 files changed, 35 insertions(+), 19 deletions(-) diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index 815cc118c675..0a7d2116b27b 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -21,6 +21,7 @@ if VIRTUALIZATION menuconfig KVM bool "Kernel-based Virtual Machine (KVM) support" depends on HAVE_KVM + select KVM_GENERIC_HARDWARE_ENABLING select MMU_NOTIFIER select PREEMPT_NOTIFIERS select HAVE_KVM_CPU_RELAX_INTERCEPT diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig index 91d197bee9c0..29e51649203b 100644 --- a/arch/mips/kvm/Kconfig +++ b/arch/mips/kvm/Kconfig @@ -28,6 +28,7 @@ config KVM select MMU_NOTIFIER select SRCU select INTERVAL_TREE + select KVM_GENERIC_HARDWARE_ENABLING help Support for hosting Guest kernels. 
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 0a80e80c7b9e..959f566a455c 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -876,7 +876,6 @@ struct kvm_vcpu_arch { #define __KVM_HAVE_ARCH_WQP #define __KVM_HAVE_CREATE_DEVICE -static inline void kvm_arch_hardware_disable(void) {} static inline void kvm_arch_sync_events(struct kvm *kvm) {} static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {} static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {} diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index f5b4ff6bfc89..4c5405fc5538 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -435,11 +435,6 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, } EXPORT_SYMBOL_GPL(kvmppc_ld); -int kvm_arch_hardware_enable(void) -{ - return 0; -} - int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) { struct kvmppc_ops *kvm_ops = NULL; diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig index f36a737d5f96..d5a658a047a7 100644 --- a/arch/riscv/kvm/Kconfig +++ b/arch/riscv/kvm/Kconfig @@ -20,6 +20,7 @@ if VIRTUALIZATION config KVM tristate "Kernel-based Virtual Machine (KVM) support (EXPERIMENTAL)" depends on RISCV_SBI && MMU + select KVM_GENERIC_HARDWARE_ENABLING select MMU_NOTIFIER select PREEMPT_NOTIFIERS select KVM_MMIO diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index d67ce719d16a..2bbc3d54959d 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -1031,7 +1031,6 @@ extern char sie_exit; extern int kvm_s390_gisc_register(struct kvm *kvm, u32 gisc); extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc); -static inline void kvm_arch_hardware_disable(void) {} static inline void kvm_arch_sync_events(struct kvm *kvm) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} static inline void 
kvm_arch_free_memslot(struct kvm *kvm, diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 7ad8252e92c2..bd25076aa19b 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -256,12 +256,6 @@ debug_info_t *kvm_s390_dbf; debug_info_t *kvm_s390_dbf_uv; /* Section: not file related */ -int kvm_arch_hardware_enable(void) -{ - /* every s390 is virtualization enabled ;-) */ - return 0; -} - /* forward declarations */ static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start, unsigned long end); diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index fbeaa9ddef59..8e578311ca9d 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -49,6 +49,7 @@ config KVM select SRCU select INTERVAL_TREE select HAVE_KVM_PM_NOTIFIER if PM + select KVM_GENERIC_HARDWARE_ENABLING help Support hosting fully virtualized guest machines using hardware
[PATCH v2 48/50] KVM: Register syscore (suspend/resume) ops early in kvm_init()
Register the suspend/resume notifier hooks at the same time KVM registers its reboot notifier so that all the code in kvm_init() that deals with enabling/disabling hardware is bundled together. Opportunistically move KVM's implementations to reside near the reboot notifier code for the same reason. Bunching the code together will allow architectures to opt out of KVM's generic hardware enable/disable logic with minimal #ifdeffery. Signed-off-by: Sean Christopherson --- virt/kvm/kvm_main.c | 68 ++--- 1 file changed, 34 insertions(+), 34 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 674a9dab5411..c12db3839114 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5154,6 +5154,38 @@ static struct notifier_block kvm_reboot_notifier = { .priority = 0, }; +static int kvm_suspend(void) +{ + /* +* Secondary CPUs and CPU hotplug are disabled across the suspend/resume +* callbacks, i.e. no need to acquire kvm_lock to ensure the usage count +* is stable. Assert that kvm_lock is not held to ensure the system +* isn't suspended while KVM is enabling hardware. Hardware enabling +* can be preempted, but the task cannot be frozen until it has dropped +* all locks (userspace tasks are frozen via a fake signal). +*/ + lockdep_assert_not_held(&kvm_lock); + lockdep_assert_irqs_disabled(); + + if (kvm_usage_count) + hardware_disable_nolock(NULL); + return 0; +} + +static void kvm_resume(void) +{ + lockdep_assert_not_held(&kvm_lock); + lockdep_assert_irqs_disabled(); + + if (kvm_usage_count) + WARN_ON_ONCE(__hardware_enable_nolock()); +} + +static struct syscore_ops kvm_syscore_ops = { + .suspend = kvm_suspend, + .resume = kvm_resume, +}; + static void kvm_io_bus_destroy(struct kvm_io_bus *bus) { int i; @@ -5732,38 +5764,6 @@ static void kvm_init_debug(void) } } -static int kvm_suspend(void) -{ - /* -* Secondary CPUs and CPU hotplug are disabled across the suspend/resume -* callbacks, i.e. no need to acquire kvm_lock to ensure the usage count -* is stable. 
Assert that kvm_lock is not held to ensure the system -* isn't suspended while KVM is enabling hardware. Hardware enabling -* can be preempted, but the task cannot be frozen until it has dropped -* all locks (userspace tasks are frozen via a fake signal). -*/ - lockdep_assert_not_held(_lock); - lockdep_assert_irqs_disabled(); - - if (kvm_usage_count) - hardware_disable_nolock(NULL); - return 0; -} - -static void kvm_resume(void) -{ - lockdep_assert_not_held(_lock); - lockdep_assert_irqs_disabled(); - - if (kvm_usage_count) - WARN_ON_ONCE(__hardware_enable_nolock()); -} - -static struct syscore_ops kvm_syscore_ops = { - .suspend = kvm_suspend, - .resume = kvm_resume, -}; - static inline struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn) { @@ -5879,6 +5879,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) return r; register_reboot_notifier(_reboot_notifier); + register_syscore_ops(_syscore_ops); /* A kmem cache lets us meet the alignment requirements of fx_save. */ if (!vcpu_align) @@ -5913,8 +5914,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) kvm_chardev_ops.owner = module; - register_syscore_ops(_syscore_ops); - kvm_preempt_ops.sched_in = kvm_sched_in; kvm_preempt_ops.sched_out = kvm_sched_out; @@ -5948,6 +5947,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) free_cpumask_var(per_cpu(cpu_kick_mask, cpu)); kmem_cache_destroy(kvm_vcpu_cache); out_free_3: + unregister_syscore_ops(_syscore_ops); unregister_reboot_notifier(_reboot_notifier); cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE); return r; -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 47/50] KVM: Make hardware_enable_failed a local variable in the "enable all" path
From: Isaku Yamahata Rework detecting hardware enabling errors to use a local variable in the "enable all" path to track whether or not enabling was successful across all CPUs. Using a global variable complicates paths that enable hardware only on the current CPU, e.g. kvm_resume() and kvm_online_cpu(). Opportunistically add a WARN if hardware enabling fails during kvm_resume(), KVM is all kinds of hosed if CPU0 fails to enable hardware. The WARN is largely futile in the current code, as KVM BUG()s on spurious faults on VMX instructions, e.g. attempting to run a vCPU on CPU if hardware enabling fails will explode. [ cut here ] kernel BUG at arch/x86/kvm/x86.c:508! invalid opcode: [#1] SMP CPU: 3 PID: 1009 Comm: CPU 4/KVM Not tainted 6.1.0-rc1+ #11 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_spurious_fault+0xa/0x10 Call Trace: vmx_vcpu_load_vmcs+0x192/0x230 [kvm_intel] vmx_vcpu_load+0x16/0x60 [kvm_intel] kvm_arch_vcpu_load+0x32/0x1f0 vcpu_load+0x2f/0x40 kvm_arch_vcpu_ioctl_run+0x19/0x9d0 kvm_vcpu_ioctl+0x271/0x660 __x64_sys_ioctl+0x80/0xb0 do_syscall_64+0x2b/0x50 entry_SYSCALL_64_after_hwframe+0x46/0xb0 But, the WARN may provide a breadcrumb to understand what went awry, and someday KVM may fix one or both of those bugs, e.g. by finding a way to eat spurious faults no matter the context (easier said than done due to side effects of certain operations, e.g. Intel's VMCLEAR). 
Signed-off-by: Isaku Yamahata [sean: rebase, WARN on failure in kvm_resume()] Signed-off-by: Sean Christopherson --- virt/kvm/kvm_main.c | 35 --- 1 file changed, 16 insertions(+), 19 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index c1e48c18e2d9..674a9dab5411 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -104,7 +104,6 @@ LIST_HEAD(vm_list); static DEFINE_PER_CPU(bool, hardware_enabled); static int kvm_usage_count; -static atomic_t hardware_enable_failed; static struct kmem_cache *kvm_vcpu_cache; @@ -5025,19 +5024,25 @@ static struct miscdevice kvm_dev = { _chardev_ops, }; -static void hardware_enable_nolock(void *junk) +static int __hardware_enable_nolock(void) { if (__this_cpu_read(hardware_enabled)) - return; + return 0; if (kvm_arch_hardware_enable()) { - atomic_inc(_enable_failed); pr_info("kvm: enabling virtualization on CPU%d failed\n", raw_smp_processor_id()); - return; + return -EIO; } __this_cpu_write(hardware_enabled, true); + return 0; +} + +static void hardware_enable_nolock(void *failed) +{ + if (__hardware_enable_nolock()) + atomic_inc(failed); } static int kvm_online_cpu(unsigned int cpu) @@ -5050,16 +5055,8 @@ static int kvm_online_cpu(unsigned int cpu) * errors when scheduled to this CPU. 
*/ mutex_lock(_lock); - if (kvm_usage_count) { - WARN_ON_ONCE(atomic_read(_enable_failed)); - - hardware_enable_nolock(NULL); - - if (atomic_read(_enable_failed)) { - atomic_set(_enable_failed, 0); - ret = -EIO; - } - } + if (kvm_usage_count) + ret = __hardware_enable_nolock(); mutex_unlock(_lock); return ret; } @@ -5107,6 +5104,7 @@ static void hardware_disable_all(void) static int hardware_enable_all(void) { + atomic_t failed = ATOMIC_INIT(0); int r = 0; /* @@ -5122,10 +5120,9 @@ static int hardware_enable_all(void) kvm_usage_count++; if (kvm_usage_count == 1) { - atomic_set(_enable_failed, 0); - on_each_cpu(hardware_enable_nolock, NULL, 1); + on_each_cpu(hardware_enable_nolock, , 1); - if (atomic_read(_enable_failed)) { + if (atomic_read()) { hardware_disable_all_nolock(); r = -EBUSY; } @@ -5759,7 +5756,7 @@ static void kvm_resume(void) lockdep_assert_irqs_disabled(); if (kvm_usage_count) - hardware_enable_nolock(NULL); + WARN_ON_ONCE(__hardware_enable_nolock()); } static struct syscore_ops kvm_syscore_ops = { -- 2.38.1.584.g0f3c55d4c2-goog
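A rough user-space illustration of the split this patch makes: one helper that enables a single CPU and returns an error code, and a thin callback wrapper that bumps a failure counter owned by the "enable all" caller. This is a sketch under stated assumptions, not kernel code: the sequential loop is a hypothetical stand-in for on_each_cpu(), `demo_broken[]` is a test knob, and -1 replaces the kernel's error codes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 4

bool demo_enabled[DEMO_NR_CPUS];	/* stand-in for the per-CPU flag */
bool demo_broken[DEMO_NR_CPUS];		/* test knob: CPUs that refuse to enable */

/* Enable one CPU; returns 0 on success and -1 on failure. */
int demo_enable_one(int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	if (demo_enabled[cpu])
		return 0;
	if (demo_broken[cpu])
		return -1;
	demo_enabled[cpu] = true;
	return 0;
}

/* Callback form: the caller's counter arrives through an opaque pointer. */
void demo_enable_cb(int cpu, void *failed)
{
	if (demo_enable_one(cpu))
		atomic_fetch_add((atomic_int *)failed, 1);
}

/* "Enable all" path: the failure count lives on this function's stack. */
int demo_enable_all(void)
{
	atomic_int failed;
	int cpu;

	atomic_init(&failed, 0);

	/* Sequential stand-in for running the callback on every CPU. */
	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		demo_enable_cb(cpu, &failed);

	if (atomic_load(&failed)) {
		/* Roll back so that no CPU is left half enabled. */
		for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
			demo_enabled[cpu] = false;
		return -1;
	}
	return 0;
}
```

Paths that touch only the current CPU call demo_enable_one() directly and look at its return value, so they never have to reset or inspect shared failure state.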
[PATCH v2 46/50] KVM: Use a per-CPU variable to track which CPUs have enabled virtualization
Use a per-CPU variable instead of a shared bitmap to track which CPUs have successfully enabled virtualization hardware. Using a per-CPU bool avoids the need for an additional allocation, and arguably yields easier to read code. Using a bitmap would be advantageous if KVM used it to avoid generating IPIs to CPUs that failed to enable hardware, but that's an extreme edge case and not worth optimizing, and the low level helpers would still want to keep their individual checks as attempting to enable virtualization hardware when it's already enabled can be problematic, e.g. Intel's VMXON will fault. Opportunistically change the order in hardware_enable_nolock() to set the flag if and only if hardware enabling is successful, instead of speculatively setting the flag and then clearing it on failure. Add a comment explaining that the check in hardware_disable_nolock() isn't simply paranoia. Waaay back when, commit 1b6c016818a5 ("KVM: Keep track of which cpus have virtualization enabled") added the logic as a guard against CPU hotplug racing with hardware enable/disable. Now that KVM has eliminated the race by taking cpu_hotplug_lock for read (via cpus_read_lock()) when enabling or disabling hardware, at first glance it appears that the check is now superfluous, i.e. it's tempting to remove the per-CPU flag entirely... 
Signed-off-by: Sean Christopherson --- virt/kvm/kvm_main.c | 41 ++--- 1 file changed, 18 insertions(+), 23 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index a27ded004644..c1e48c18e2d9 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -102,7 +102,7 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink); DEFINE_MUTEX(kvm_lock); LIST_HEAD(vm_list); -static cpumask_var_t cpus_hardware_enabled; +static DEFINE_PER_CPU(bool, hardware_enabled); static int kvm_usage_count; static atomic_t hardware_enable_failed; @@ -5027,21 +5027,17 @@ static struct miscdevice kvm_dev = { static void hardware_enable_nolock(void *junk) { - int cpu = smp_processor_id(); - int r; - - if (cpumask_test_cpu(cpu, cpus_hardware_enabled)) + if (__this_cpu_read(hardware_enabled)) return; - cpumask_set_cpu(cpu, cpus_hardware_enabled); - - r = kvm_arch_hardware_enable(); - - if (r) { - cpumask_clear_cpu(cpu, cpus_hardware_enabled); + if (kvm_arch_hardware_enable()) { atomic_inc(_enable_failed); - pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu); + pr_info("kvm: enabling virtualization on CPU%d failed\n", + raw_smp_processor_id()); + return; } + + __this_cpu_write(hardware_enabled, true); } static int kvm_online_cpu(unsigned int cpu) @@ -5070,12 +5066,16 @@ static int kvm_online_cpu(unsigned int cpu) static void hardware_disable_nolock(void *junk) { - int cpu = smp_processor_id(); - - if (!cpumask_test_cpu(cpu, cpus_hardware_enabled)) + /* +* Note, hardware_disable_all_nolock() tells all online CPUs to disable +* hardware, not just CPUs that successfully enabled hardware! 
+*/ + if (!__this_cpu_read(hardware_enabled)) return; - cpumask_clear_cpu(cpu, cpus_hardware_enabled); + kvm_arch_hardware_disable(); + + __this_cpu_write(hardware_enabled, false); } static int kvm_offline_cpu(unsigned int cpu) @@ -5876,13 +5876,11 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) int r; int cpu; - if (!zalloc_cpumask_var(_hardware_enabled, GFP_KERNEL)) - return -ENOMEM; - r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online", kvm_online_cpu, kvm_offline_cpu); if (r) - goto out_free_2; + return r; + register_reboot_notifier(_reboot_notifier); /* A kmem cache lets us meet the alignment requirements of fx_save. */ @@ -5955,8 +5953,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) out_free_3: unregister_reboot_notifier(_reboot_notifier); cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE); -out_free_2: - free_cpumask_var(cpus_hardware_enabled); return r; } EXPORT_SYMBOL_GPL(kvm_init); @@ -5982,7 +5978,6 @@ void kvm_exit(void) unregister_reboot_notifier(_reboot_notifier); cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE); kvm_irqfd_exit(); - free_cpumask_var(cpus_hardware_enabled); } EXPORT_SYMBOL_GPL(kvm_exit); -- 2.38.1.584.g0f3c55d4c2-goog
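The two points made in the changelog (set the flag only after enabling succeeds, and keep the check on the disable side because "disable all" visits CPUs that never enabled) can be modelled in user space along these lines. This is an assumption-laden sketch: `demo_hw_on[]` models the hardware state, `demo_flag[]` models the per-CPU bool, and `demo_arch_enable()`/`demo_arch_disable()` are hypothetical stand-ins for the arch hooks.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 4

bool demo_hw_on[DEMO_NR_CPUS];	/* models the hardware state itself */
bool demo_flag[DEMO_NR_CPUS];	/* models the per-CPU "enabled" bool */
int demo_arch_disable_calls;	/* how often the arch hook really ran */

/* Like VMXON, enabling twice is an error rather than a harmless no-op. */
int demo_arch_enable(int cpu)
{
	if (demo_hw_on[cpu])
		return -1;
	demo_hw_on[cpu] = true;
	return 0;
}

/* Disabling hardware that was never enabled is treated as a bug. */
void demo_arch_disable(int cpu)
{
	assert(demo_hw_on[cpu]);
	demo_hw_on[cpu] = false;
	demo_arch_disable_calls++;
}

int demo_enable(int cpu)
{
	if (demo_flag[cpu])
		return 0;	/* already on: skip the arch hook */
	if (demo_arch_enable(cpu))
		return -1;	/* flag stays false on failure */
	demo_flag[cpu] = true;	/* set only after success */
	return 0;
}

void demo_disable(int cpu)
{
	if (!demo_flag[cpu])
		return;		/* this CPU never enabled hardware */
	demo_arch_disable(cpu);
	demo_flag[cpu] = false;
}

/* Visits every CPU, not just the ones that enabled successfully. */
void demo_disable_all(void)
{
	int cpu;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		demo_disable(cpu);
}
```

If the flag check in demo_disable() were dropped, demo_disable_all() would trip the assertion on the first CPU that never enabled hardware, which is the situation the added kernel comment warns about.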
[PATCH v2 45/50] KVM: Remove on_each_cpu(hardware_disable_nolock) in kvm_exit()
From: Isaku Yamahata Drop the superfluous invocation of hardware_disable_nolock() during kvm_exit(), as it's nothing more than a glorified nop. KVM automatically disables hardware on all CPUs when the last VM is destroyed, and kvm_exit() cannot be called until the last VM goes away as the calling module is pinned by an elevated refcount of the fops associated with /dev/kvm. This holds true even on x86, where the caller of kvm_exit() is not kvm.ko, but is instead a dependent module, kvm_amd.ko or kvm_intel.ko, as kvm_chardev_ops.owner is set to the module that calls kvm_init(), not hardcoded to the base kvm.ko module. Signed-off-by: Isaku Yamahata [sean: rework changelog] Signed-off-by: Sean Christopherson --- virt/kvm/kvm_main.c | 1 - 1 file changed, 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 6a8fb53b32f0..a27ded004644 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5981,7 +5981,6 @@ void kvm_exit(void) unregister_syscore_ops(_syscore_ops); unregister_reboot_notifier(_reboot_notifier); cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE); - on_each_cpu(hardware_disable_nolock, NULL, 1); kvm_irqfd_exit(); free_cpumask_var(cpus_hardware_enabled); } -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 44/50] KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock
From: Isaku Yamahata Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock now that KVM hooks CPU hotplug during the ONLINE phase, which can sleep. Previously, KVM hooked the STARTING phase, which is not allowed to sleep and thus could not take kvm_lock (a mutex). This effectively allows the task that's initiating hardware enabling/disabling to be preempted and/or migrated. Note, the Documentation/virt/kvm/locking.rst statement that kvm_count_lock is "raw" because hardware enabling/disabling needs to be atomic with respect to migration is wrong on multiple fronts. First, while regular spinlocks can be preempted, the task holding the lock cannot be migrated. Second, preventing migration is not required. on_each_cpu() disables preemption, which ensures that cpus_hardware_enabled correctly reflects hardware state. The task may be preempted/migrated between bumping kvm_usage_count and invoking on_each_cpu(), but that's perfectly ok as kvm_usage_count is still protected, e.g. other tasks that call hardware_enable_all() will be blocked until the preempted/migrated owner exits its critical section. KVM does have lockless accesses to kvm_usage_count in the suspend/resume flows, but those are safe because all tasks must be frozen prior to suspending CPUs, and a task cannot be frozen while it holds one or more locks (userspace tasks are frozen via a fake signal). Preemption doesn't need to be explicitly disabled in the hotplug path. The hotplug thread is pinned to the CPU that's being hotplugged, and KVM only cares about having a stable CPU, i.e. to ensure hardware is enabled on the correct CPU. Lockdep, i.e. check_preemption_disabled(), plays nice with this state too, as is_percpu_thread() is true for the hotplug thread. 
Signed-off-by: Isaku Yamahata Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson --- Documentation/virt/kvm/locking.rst | 19 virt/kvm/kvm_main.c| 36 -- 2 files changed, 34 insertions(+), 21 deletions(-) diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst index 132a9e5436e5..cd570e565522 100644 --- a/Documentation/virt/kvm/locking.rst +++ b/Documentation/virt/kvm/locking.rst @@ -9,6 +9,8 @@ KVM Lock Overview The acquisition orders for mutexes are as follows: +- cpus_read_lock() is taken outside kvm_lock + - kvm->lock is taken outside vcpu->mutex - kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock @@ -216,15 +218,10 @@ time it will be set using the Dirty tracking mechanism described above. :Type: mutex :Arch: any :Protects: - vm_list - -``kvm_count_lock`` -^^ - -:Type: raw_spinlock_t -:Arch: any -:Protects: - hardware virtualization enable/disable -:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt - migration. + - kvm_usage_count + - hardware virtualization enable/disable +:Comment: KVM also disables CPU hotplug via cpus_read_lock() during + enable/disable. ``kvm->mn_invalidate_lock`` ^^^ @@ -288,3 +285,7 @@ time it will be set using the Dirty tracking mechanism described above. :Type: mutex :Arch: x86 :Protects: loading a vendor module (kvm_amd or kvm_intel) +:Comment: Exists because using kvm_lock leads to deadlock. cpu_hotplug_lock is +taken outside of kvm_lock, e.g. in KVM's CPU online/offline callbacks, and +many operations need to take cpu_hotplug_lock when loading a vendor module, +e.g. updating static calls. 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index a46d61e9c053..6a8fb53b32f0 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -100,7 +100,6 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink); */ DEFINE_MUTEX(kvm_lock); -static DEFINE_RAW_SPINLOCK(kvm_count_lock); LIST_HEAD(vm_list); static cpumask_var_t cpus_hardware_enabled; @@ -5054,17 +5053,18 @@ static int kvm_online_cpu(unsigned int cpu) * be enabled. Otherwise running VMs would encounter unrecoverable * errors when scheduled to this CPU. */ - raw_spin_lock(_count_lock); + mutex_lock(_lock); if (kvm_usage_count) { WARN_ON_ONCE(atomic_read(_enable_failed)); hardware_enable_nolock(NULL); + if (atomic_read(_enable_failed)) { atomic_set(_enable_failed, 0); ret = -EIO; } } - raw_spin_unlock(_count_lock); + mutex_unlock(_lock); return ret; } @@ -5080,10 +5080,10 @@ static void hardware_disable_nolock(void *junk) static int kvm_offline_cpu(unsigned int cpu) { - raw_spin_lock(_count_lock); + mutex_lock(_lock); if (kvm_usage_count) hardware_disable_nolock(NULL); - raw_spin_unlock(_count_lock); +
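The locking change in the patch above can be pictured with a small userspace sketch. This is only an illustration under stated assumptions, not KVM code: usage_lock, usage_count, hw_enabled and the three helpers are hypothetical names, and a pthread mutex stands in for kvm_lock. It shows one sleepable lock guarding both the usage count and the "enable on first user, disable on last user" decision, plus a hotplug-style query that takes the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kvm_lock, kvm_usage_count and hardware state. */
static pthread_mutex_t usage_lock = PTHREAD_MUTEX_INITIALIZER;
static int usage_count;
static bool hw_enabled;

/* Take a reference; the first user turns the "hardware" on. */
static void usage_get(void)
{
	pthread_mutex_lock(&usage_lock);
	if (++usage_count == 1)
		hw_enabled = true;
	pthread_mutex_unlock(&usage_lock);
}

/* Drop a reference; the last user turns the "hardware" off. */
static void usage_put(void)
{
	pthread_mutex_lock(&usage_lock);
	assert(usage_count > 0);
	if (--usage_count == 0)
		hw_enabled = false;
	pthread_mutex_unlock(&usage_lock);
}

/* Mirrors the shape of an online callback: act only if there are users. */
static bool online_cpu_needs_enable(void)
{
	bool need;

	pthread_mutex_lock(&usage_lock);
	need = usage_count > 0;
	pthread_mutex_unlock(&usage_lock);
	return need;
}
```

Because the lock is a mutex, the holder may sleep or be preempted while other callers simply block, which is the property the changelog relies on once the callback runs in a context that can sleep.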
[PATCH v2 43/50] KVM: Ensure CPU is stable during low level hardware enable/disable
Use the non-raw smp_processor_id() in the low level hardware enable/disable
helpers as KVM absolutely relies on the CPU being stable, e.g. KVM would
end up with incorrect state if the task were migrated between accessing
cpus_hardware_enabled and actually enabling/disabling hardware.

Signed-off-by: Sean Christopherson
---
 virt/kvm/kvm_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d985b24c423b..a46d61e9c053 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5028,7 +5028,7 @@ static struct miscdevice kvm_dev = {
 
 static void hardware_enable_nolock(void *junk)
 {
-	int cpu = raw_smp_processor_id();
+	int cpu = smp_processor_id();
 	int r;
 
 	if (cpumask_test_cpu(cpu, cpus_hardware_enabled))
@@ -5070,7 +5070,7 @@ static int kvm_online_cpu(unsigned int cpu)
 
 static void hardware_disable_nolock(void *junk)
 {
-	int cpu = raw_smp_processor_id();
+	int cpu = smp_processor_id();
 
 	if (!cpumask_test_cpu(cpu, cpus_hardware_enabled))
 		return;
-- 
2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 42/50] KVM: Disable CPU hotplug during hardware enabling/disabling
From: Chao Gao

Disable CPU hotplug when enabling/disabling hardware to prevent the
corner case where, if the following sequence occurs:

  1. A hotplugged CPU marks itself online in cpu_online_mask
  2. The hotplugged CPU enables interrupts before invoking KVM's ONLINE
     callback
  3. hardware_{en,dis}able_all() is invoked on another CPU

the hotplugged CPU will be included in on_each_cpu() and thus get sent
through hardware_{en,dis}able_nolock() before kvm_online_cpu() is called.

        start_secondary { ...
                set_cpu_online(smp_processor_id(), true); <- 1
                ...
                local_irq_enable();  <- 2
                ...
                cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); <- 3
        }

KVM currently fudges around this race by keeping track of which CPUs have
done hardware enabling (see commit 1b6c016818a5 "KVM: Keep track of which
cpus have virtualization enabled"), but that's an inefficient, convoluted,
and hacky solution.

Signed-off-by: Chao Gao
[sean: split to separate patch, write changelog]
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c  | 11 ++++++++++-
 virt/kvm/kvm_main.c | 12 ++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dad30097f0c3..d2ad383da998 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9281,7 +9281,16 @@ static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
 
 static int kvm_x86_check_processor_compatibility(void)
 {
-	struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
+	int cpu = smp_processor_id();
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+
+	/*
+	 * Compatibility checks are done when loading KVM and when enabling
+	 * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
+	 * compatible, i.e. KVM should never perform a compatibility check on
+	 * an offline CPU.
+	 */
+	WARN_ON(!cpu_online(cpu));
 
 	if (__cr4_reserved_bits(cpu_has, c) !=
 	    __cr4_reserved_bits(cpu_has, &boot_cpu_data))
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f26ea779710a..d985b24c423b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5098,15 +5098,26 @@ static void hardware_disable_all_nolock(void)
 
 static void hardware_disable_all(void)
 {
+	cpus_read_lock();
 	raw_spin_lock(&kvm_count_lock);
 	hardware_disable_all_nolock();
 	raw_spin_unlock(&kvm_count_lock);
+	cpus_read_unlock();
 }
 
 static int hardware_enable_all(void)
 {
 	int r = 0;
 
+	/*
+	 * When onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
+	 * is called, and so on_each_cpu() between them includes the CPU that
+	 * is being onlined. As a result, hardware_enable_nolock() may get
+	 * invoked before kvm_online_cpu(), which also enables hardware if the
+	 * usage count is non-zero. Disable CPU hotplug to avoid attempting to
+	 * enable hardware multiple times.
+	 */
+	cpus_read_lock();
 	raw_spin_lock(&kvm_count_lock);
 
 	kvm_usage_count++;
@@ -5121,6 +5132,7 @@ static int hardware_enable_all(void)
 	}
 
 	raw_spin_unlock(&kvm_count_lock);
+	cpus_read_unlock();
 
 	return r;
 }
-- 
2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 41/50] KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section
From: Chao Gao The CPU STARTING section doesn't allow callbacks to fail. Move KVM's hotplug callback to ONLINE section so that it can abort onlining a CPU in certain cases to avoid potentially breaking VMs running on existing CPUs. For example, when KVM fails to enable hardware virtualization on the hotplugged CPU. Place KVM's hotplug state before CPUHP_AP_SCHED_WAIT_EMPTY as it ensures when offlining a CPU, all user tasks and non-pinned kernel tasks have left the CPU, i.e. there cannot be a vCPU task around. So, it is safe for KVM's CPU offline callback to disable hardware virtualization at that point. Likewise, KVM's online callback can enable hardware virtualization before any vCPU task gets a chance to run on hotplugged CPUs. Drop kvm_x86_check_processor_compatibility()'s WARN that IRQs are disabled, as the ONLINE section runs with IRQs disabled. The WARN wasn't intended to be a requirement, e.g. disabling preemption is sufficient, the IRQ thing was purely an aggressive sanity check since the helper was only ever invoked via SMP function call. Rename KVM's CPU hotplug callbacks accordingly. 
Suggested-by: Thomas Gleixner Signed-off-by: Chao Gao Signed-off-by: Isaku Yamahata Reviewed-by: Yuan Yao [sean: drop WARN that IRQs are disabled] Signed-off-by: Sean Christopherson --- arch/x86/kvm/x86.c | 2 -- include/linux/cpuhotplug.h | 2 +- virt/kvm/kvm_main.c| 30 ++ 3 files changed, 23 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5a9e74cedbc6..dad30097f0c3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9283,8 +9283,6 @@ static int kvm_x86_check_processor_compatibility(void) { struct cpuinfo_x86 *c = _data(smp_processor_id()); - WARN_ON(!irqs_disabled()); - if (__cr4_reserved_bits(cpu_has, c) != __cr4_reserved_bits(cpu_has, _cpu_data)) return -EIO; diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 7337414e4947..de45be38dd27 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -185,7 +185,6 @@ enum cpuhp_state { CPUHP_AP_CSKY_TIMER_STARTING, CPUHP_AP_TI_GP_TIMER_STARTING, CPUHP_AP_HYPERV_TIMER_STARTING, - CPUHP_AP_KVM_STARTING, /* Must be the last timer callback */ CPUHP_AP_DUMMY_TIMER_STARTING, CPUHP_AP_ARM_XEN_STARTING, @@ -200,6 +199,7 @@ enum cpuhp_state { /* Online section invoked on the hotplugged CPU from the hotplug thread */ CPUHP_AP_ONLINE_IDLE, + CPUHP_AP_KVM_ONLINE, CPUHP_AP_SCHED_WAIT_EMPTY, CPUHP_AP_SMPBOOT_THREADS, CPUHP_AP_X86_VDSO_VMA_ONLINE, diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 3900bd3d75cb..f26ea779710a 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5045,13 +5045,27 @@ static void hardware_enable_nolock(void *junk) } } -static int kvm_starting_cpu(unsigned int cpu) +static int kvm_online_cpu(unsigned int cpu) { + int ret = 0; + + /* +* Abort the CPU online process if hardware virtualization cannot +* be enabled. Otherwise running VMs would encounter unrecoverable +* errors when scheduled to this CPU. 
+*/ raw_spin_lock(_count_lock); - if (kvm_usage_count) + if (kvm_usage_count) { + WARN_ON_ONCE(atomic_read(_enable_failed)); + hardware_enable_nolock(NULL); + if (atomic_read(_enable_failed)) { + atomic_set(_enable_failed, 0); + ret = -EIO; + } + } raw_spin_unlock(_count_lock); - return 0; + return ret; } static void hardware_disable_nolock(void *junk) @@ -5064,7 +5078,7 @@ static void hardware_disable_nolock(void *junk) kvm_arch_hardware_disable(); } -static int kvm_dying_cpu(unsigned int cpu) +static int kvm_offline_cpu(unsigned int cpu) { raw_spin_lock(_count_lock); if (kvm_usage_count) @@ -5841,8 +5855,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) if (!zalloc_cpumask_var(_hardware_enabled, GFP_KERNEL)) return -ENOMEM; - r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_STARTING, "kvm/cpu:starting", - kvm_starting_cpu, kvm_dying_cpu); + r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online", + kvm_online_cpu, kvm_offline_cpu); if (r) goto out_free_2; register_reboot_notifier(_reboot_notifier); @@ -5916,7 +5930,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) kmem_cache_destroy(kvm_vcpu_cache); out_free_3: unregister_reboot_notifier(_reboot_notifier); - cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING); + cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
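The key property used by the patch above is that a callback in the ONLINE section may fail and thereby abort the bring-up. A minimal sketch of that idea follows, assuming a much simpler model than the real CPU hotplug state machine: struct hp_step, bring_up() and the sample callbacks are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical pair of callbacks, loosely modelled on a hotplug state. */
struct hp_step {
	int (*startup)(void);
	void (*teardown)(void);
};

/* Run startup callbacks in order; on failure undo the completed ones. */
static int bring_up(const struct hp_step *steps, size_t n)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = steps[i].startup();
		if (ret) {
			/* Roll back in reverse order, skipping the failed step. */
			while (i-- > 0)
				steps[i].teardown();
			return ret;
		}
	}
	return 0;
}

/* Sample callbacks with counters so the behaviour can be observed. */
static int ups, downs;

static int step_ok(void)
{
	ups++;
	return 0;
}

static int step_fail(void)
{
	return -EIO;	/* e.g. enabling virtualization failed on this CPU */
}

static void step_undo(void)
{
	downs++;
}
```

A caller would pass an array of steps; if any startup returns non-zero, the earlier steps are torn down and the error is propagated, which is the behaviour a callback returning -EIO relies on to keep a broken CPU from going online.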
[PATCH v2 40/50] KVM: x86: Do compatibility checks when onlining CPU
From: Chao Gao Do compatibility checks when enabling hardware to effectively add compatibility checks when onlining a CPU. Abort enabling, i.e. the online process, if the (hotplugged) CPU is incompatible with the known good setup. At init time, KVM does compatibility checks to ensure that all online CPUs support hardware virtualization and a common set of features. But KVM uses hotplugged CPUs without such compatibility checks. On Intel CPUs, this leads to #GP if the hotplugged CPU doesn't support VMX, or VM-Entry failure if the hotplugged CPU doesn't support all features enabled by KVM. Note, this is little more than a NOP on SVM, as SVM already checks for full SVM support during hardware enabling. Opportunistically add a pr_err() if setup_vmcs_config() fails, and tweak all error messages to output which CPU failed. Signed-off-by: Chao Gao Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/x86/kvm/svm/svm.c | 8 +++- arch/x86/kvm/vmx/vmx.c | 15 ++- arch/x86/kvm/x86.c | 5 + 3 files changed, 18 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index c2e95c0d9fd8..46b658d0f46e 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -521,11 +521,12 @@ static void svm_init_osvw(struct kvm_vcpu *vcpu) static bool kvm_is_svm_supported(void) { + int cpu = raw_smp_processor_id(); const char *msg; u64 vm_cr; if (!cpu_has_svm()) { - pr_err("SVM not supported, %s\n", msg); + pr_err("SVM not supported by CPU %d, %s\n", cpu, msg); return false; } @@ -536,7 +537,7 @@ static bool kvm_is_svm_supported(void) rdmsrl(MSR_VM_CR, vm_cr); if (vm_cr & (1 << SVM_VM_CR_SVM_DISABLE)) { - pr_err("SVM disabled (by BIOS) in MSR_VM_CR\n"); + pr_err("SVM disabled (by BIOS) in MSR_VM_CR on CPU %d\n", cpu); return false; } @@ -587,9 +588,6 @@ static int svm_hardware_enable(void) if (efer & EFER_SVME) return -EBUSY; - if (!kvm_is_svm_supported()) - return -EINVAL; - sd = per_cpu_ptr(_data, me); sd->asid_generation = 
1; sd->max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 6416ed5b7f89..39dd3082fcd8 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2711,14 +2711,16 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf, static bool kvm_is_vmx_supported(void) { + int cpu = raw_smp_processor_id(); + if (!cpu_has_vmx()) { - pr_err("CPU doesn't support VMX\n"); + pr_err("VMX not supported by CPU %d\n", cpu); return false; } if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) || !this_cpu_has(X86_FEATURE_VMX)) { - pr_err("VMX not enabled (by BIOS) in MSR_IA32_FEAT_CTL\n"); + pr_err("VMX not enabled (by BIOS) in MSR_IA32_FEAT_CTL on CPU %d\n", cpu); return false; } @@ -2727,18 +2729,21 @@ static bool kvm_is_vmx_supported(void) static int vmx_check_processor_compat(void) { + int cpu = raw_smp_processor_id(); struct vmcs_config vmcs_conf; struct vmx_capability vmx_cap; if (!kvm_is_vmx_supported()) return -EIO; - if (setup_vmcs_config(_conf, _cap) < 0) + if (setup_vmcs_config(_conf, _cap) < 0) { + pr_err("Failed to setup VMCS config on CPU %d\n", cpu); return -EIO; + } if (nested) nested_vmx_setup_ctls_msrs(_conf, vmx_cap.ept); - if (memcmp(_config, _conf, sizeof(struct vmcs_config)) != 0) { - pr_err("CPU %d feature inconsistency!\n", smp_processor_id()); + if (memcmp(_config, _conf, sizeof(struct vmcs_config))) { + pr_err("Inconsistent VMCS config on CPU %d\n", cpu); return -EIO; } return 0; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ee9af412ffd4..5a9e74cedbc6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11967,6 +11967,11 @@ int kvm_arch_hardware_enable(void) bool stable, backwards_tsc = false; kvm_user_return_msr_cpu_online(); + + ret = kvm_x86_check_processor_compatibility(); + if (ret) + return ret; + ret = static_call(kvm_x86_hardware_enable)(); if (ret != 0) return ret; -- 2.38.1.584.g0f3c55d4c2-goog
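The VMX check in the patch above compares a freshly computed per-CPU configuration against the known good one with memcmp(). A hedged userspace sketch of that pattern is shown below; struct feature_config, baseline, config_init() and check_compat() are hypothetical names, not the kernel's struct vmcs_config or its helpers. One detail worth noting in such code is that memcmp() also compares padding bytes, so the sketch zeroes each structure before filling it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature set; the real structure has many more fields. */
struct feature_config {
	uint32_t pin_ctls;
	uint32_t exec_ctls;
	uint64_t misc;
};

/* Configuration recorded at load time, i.e. the known good setup. */
static struct feature_config baseline;

static void config_init(struct feature_config *c, uint32_t pin,
			uint32_t exec, uint64_t misc)
{
	/* Zero everything first so padding cannot cause false mismatches. */
	memset(c, 0, sizeof(*c));
	c->pin_ctls = pin;
	c->exec_ctls = exec;
	c->misc = misc;
}

/* Return 0 if this CPU's config matches the baseline, -EIO otherwise. */
static int check_compat(const struct feature_config *cpu_cfg)
{
	if (memcmp(&baseline, cpu_cfg, sizeof(baseline)))
		return -EIO;
	return 0;
}
```

Running the check from the enable path, before touching hardware, means an incompatible CPU is rejected with an error instead of failing later in a harder-to-diagnose way.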
[PATCH v2 39/50] KVM: x86: Move CPU compat checks hook to kvm_x86_ops (from kvm_x86_init_ops)
Move the .check_processor_compatibility() callback from kvm_x86_init_ops to kvm_x86_ops to allow a future patch to do compatibility checks during CPU hotplug. Do kvm_ops_update() before compat checks so that static_call() can be used during compat checks. Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h| 3 ++- arch/x86/kvm/svm/svm.c | 5 +++-- arch/x86/kvm/vmx/vmx.c | 16 +++ arch/x86/kvm/x86.c | 31 +++--- 5 files changed, 25 insertions(+), 31 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index abccd51dcfca..dba2909e5ae2 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -14,6 +14,7 @@ BUILD_BUG_ON(1) * to make a definition optional, but in this case the default will * be __static_call_return0. */ +KVM_X86_OP(check_processor_compatibility) KVM_X86_OP(hardware_enable) KVM_X86_OP(hardware_disable) KVM_X86_OP(hardware_unsetup) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d79aedf70908..ba74fea6850b 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1518,6 +1518,8 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical) struct kvm_x86_ops { const char *name; + int (*check_processor_compatibility)(void); + int (*hardware_enable)(void); void (*hardware_disable)(void); void (*hardware_unsetup)(void); @@ -1729,7 +1731,6 @@ struct kvm_x86_nested_ops { }; struct kvm_x86_init_ops { - int (*check_processor_compatibility)(void); int (*hardware_setup)(void); unsigned int (*handle_intel_pt_intr)(void); diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 9f94efcb9aa6..c2e95c0d9fd8 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -543,7 +543,7 @@ static bool kvm_is_svm_supported(void) return true; } -static int __init svm_check_processor_compat(void) +static int svm_check_processor_compat(void) { if 
(!kvm_is_svm_supported()) return -EIO; @@ -4695,6 +4695,8 @@ static int svm_vm_init(struct kvm *kvm) static struct kvm_x86_ops svm_x86_ops __initdata = { .name = KBUILD_MODNAME, + .check_processor_compatibility = svm_check_processor_compat, + .hardware_unsetup = svm_hardware_unsetup, .hardware_enable = svm_hardware_enable, .hardware_disable = svm_hardware_disable, @@ -5079,7 +5081,6 @@ static __init int svm_hardware_setup(void) static struct kvm_x86_init_ops svm_init_ops __initdata = { .hardware_setup = svm_hardware_setup, - .check_processor_compatibility = svm_check_processor_compat, .runtime_ops = _x86_ops, .pmu_ops = _pmu_ops, diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 2a8a6e481c76..6416ed5b7f89 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2520,8 +2520,7 @@ static bool cpu_has_perf_global_ctrl_bug(void) return false; } -static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt, - u32 msr, u32 *result) +static int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt, u32 msr, u32 *result) { u32 vmx_msr_low, vmx_msr_high; u32 ctl = ctl_min | ctl_opt; @@ -2539,7 +2538,7 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt, return 0; } -static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr) +static u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr) { u64 allowed; @@ -2548,8 +2547,8 @@ static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr) return ctl_opt & allowed; } -static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, - struct vmx_capability *vmx_cap) +static int setup_vmcs_config(struct vmcs_config *vmcs_conf, +struct vmx_capability *vmx_cap) { u32 vmx_msr_low, vmx_msr_high; u32 _pin_based_exec_control = 0; @@ -2710,7 +2709,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, return 0; } -static bool __init kvm_is_vmx_supported(void) +static bool kvm_is_vmx_supported(void) { if (!cpu_has_vmx()) { pr_err("CPU doesn't support VMX\n"); @@ -2726,7 +2725,7 @@ static 
bool __init kvm_is_vmx_supported(void) return true; } -static int __init vmx_check_processor_compat(void) +static int vmx_check_processor_compat(void) { struct vmcs_config vmcs_conf; struct vmx_capability vmx_cap; @@ -8104,6 +8103,8 @@ static void vmx_vm_destroy(struct kvm *kvm) static struct kvm_x86_ops vmx_x86_ops __initdata = { .name = KBUILD_MODNAME, + .check_processor_compatibility = vmx_check_processor_compat, + .hardware_unsetup =
[PATCH v2 38/50] KVM: SVM: Check for SVM support in CPU compatibility checks
Check that SVM is supported and enabled in the processor compatibility checks. SVM already checks for support during hardware enabling, i.e. this doesn't really add new functionality. The net effect is that KVM will refuse to load if a CPU doesn't have SVM fully enabled, as opposed to failing KVM_CREATE_VM. Opportunistically move svm_check_processor_compat() up in svm.c so that it can be invoked during hardware enabling in a future patch. Signed-off-by: Sean Christopherson --- arch/x86/kvm/svm/svm.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 49ccef9fae81..9f94efcb9aa6 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -543,6 +543,14 @@ static bool kvm_is_svm_supported(void) return true; } +static int __init svm_check_processor_compat(void) +{ + if (!kvm_is_svm_supported()) + return -EIO; + + return 0; +} + void __svm_write_tsc_multiplier(u64 multiplier) { preempt_disable(); @@ -4087,11 +4095,6 @@ svm_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) hypercall[2] = 0xd9; } -static int __init svm_check_processor_compat(void) -{ - return 0; -} - /* * The kvm parameter can be NULL (module initialization, or invocation before * VM creation). Be sure to check the kvm parameter before using it. -- 2.38.1.584.g0f3c55d4c2-goog
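The shape of the check added above (feature present, and not disabled in a control register) can be sketched outside the kernel. The code below is illustrative only: it takes the CPU capability and the register value as plain parameters instead of reading CPUID or an MSR, the function names are hypothetical, and the bit position 4 is an assumption about the value of SVM_VM_CR_SVM_DISABLE rather than something stated in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of the "SVM disabled" flag in the VM_CR value. */
#define VM_CR_SVM_DISABLE_BIT 4

/* True only if the feature exists and firmware has not disabled it. */
static bool svm_usable(bool cpu_has_svm, uint64_t vm_cr)
{
	if (!cpu_has_svm)
		return false;

	if (vm_cr & (1ULL << VM_CR_SVM_DISABLE_BIT))
		return false;

	return true;
}

/* Compatibility hook in the style of the patch: 0 on success, -EIO otherwise. */
static int check_processor_compat(bool cpu_has_svm, uint64_t vm_cr)
{
	return svm_usable(cpu_has_svm, vm_cr) ? 0 : -EIO;
}
```

Folding both conditions into one helper lets the same function serve module load and the compatibility hook, which is why the net effect described in the changelog is an earlier failure rather than new functionality.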
[PATCH v2 37/50] KVM: VMX: Shuffle support checks and hardware enabling code around
Reorder code in vmx.c so that the VMX support check helpers reside above the hardware enabling helpers, which will allow KVM to perform support checks during hardware enabling (in a future patch). No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/vmx.c | 216 - 1 file changed, 108 insertions(+), 108 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 23b64bf4bfcf..2a8a6e481c76 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2485,79 +2485,6 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) } } -static int kvm_cpu_vmxon(u64 vmxon_pointer) -{ - u64 msr; - - cr4_set_bits(X86_CR4_VMXE); - - asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t" - _ASM_EXTABLE(1b, %l[fault]) - : : [vmxon_pointer] "m"(vmxon_pointer) - : : fault); - return 0; - -fault: - WARN_ONCE(1, "VMXON faulted, MSR_IA32_FEAT_CTL (0x3a) = 0x%llx\n", - rdmsrl_safe(MSR_IA32_FEAT_CTL, ) ? 0xdeadbeef : msr); - cr4_clear_bits(X86_CR4_VMXE); - - return -EFAULT; -} - -static int vmx_hardware_enable(void) -{ - int cpu = raw_smp_processor_id(); - u64 phys_addr = __pa(per_cpu(vmxarea, cpu)); - int r; - - if (cr4_read_shadow() & X86_CR4_VMXE) - return -EBUSY; - - /* -* This can happen if we hot-added a CPU but failed to allocate -* VP assist page for it. 
-*/ - if (static_branch_unlikely(_evmcs) && - !hv_get_vp_assist_page(cpu)) - return -EFAULT; - - intel_pt_handle_vmx(1); - - r = kvm_cpu_vmxon(phys_addr); - if (r) { - intel_pt_handle_vmx(0); - return r; - } - - if (enable_ept) - ept_sync_global(); - - return 0; -} - -static void vmclear_local_loaded_vmcss(void) -{ - int cpu = raw_smp_processor_id(); - struct loaded_vmcs *v, *n; - - list_for_each_entry_safe(v, n, _cpu(loaded_vmcss_on_cpu, cpu), -loaded_vmcss_on_cpu_link) - __loaded_vmcs_clear(v); -} - -static void vmx_hardware_disable(void) -{ - vmclear_local_loaded_vmcss(); - - if (cpu_vmxoff()) - kvm_spurious_fault(); - - hv_reset_evmcs(); - - intel_pt_handle_vmx(0); -} - /* * There is no X86_FEATURE for SGX yet, but anyway we need to query CPUID * directly instead of going through cpu_has(), to ensure KVM is trapping @@ -2783,6 +2710,114 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, return 0; } +static bool __init kvm_is_vmx_supported(void) +{ + if (!cpu_has_vmx()) { + pr_err("CPU doesn't support VMX\n"); + return false; + } + + if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) || + !this_cpu_has(X86_FEATURE_VMX)) { + pr_err("VMX not enabled (by BIOS) in MSR_IA32_FEAT_CTL\n"); + return false; + } + + return true; +} + +static int __init vmx_check_processor_compat(void) +{ + struct vmcs_config vmcs_conf; + struct vmx_capability vmx_cap; + + if (!kvm_is_vmx_supported()) + return -EIO; + + if (setup_vmcs_config(_conf, _cap) < 0) + return -EIO; + if (nested) + nested_vmx_setup_ctls_msrs(_conf, vmx_cap.ept); + if (memcmp(_config, _conf, sizeof(struct vmcs_config)) != 0) { + pr_err("CPU %d feature inconsistency!\n", smp_processor_id()); + return -EIO; + } + return 0; +} + +static int kvm_cpu_vmxon(u64 vmxon_pointer) +{ + u64 msr; + + cr4_set_bits(X86_CR4_VMXE); + + asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t" + _ASM_EXTABLE(1b, %l[fault]) + : : [vmxon_pointer] "m"(vmxon_pointer) + : : fault); + return 0; + +fault: + WARN_ONCE(1, "VMXON 
faulted, MSR_IA32_FEAT_CTL (0x3a) = 0x%llx\n", + rdmsrl_safe(MSR_IA32_FEAT_CTL, ) ? 0xdeadbeef : msr); + cr4_clear_bits(X86_CR4_VMXE); + + return -EFAULT; +} + +static int vmx_hardware_enable(void) +{ + int cpu = raw_smp_processor_id(); + u64 phys_addr = __pa(per_cpu(vmxarea, cpu)); + int r; + + if (cr4_read_shadow() & X86_CR4_VMXE) + return -EBUSY; + + /* +* This can happen if we hot-added a CPU but failed to allocate +* VP assist page for it. +*/ + if (static_branch_unlikely(_evmcs) && + !hv_get_vp_assist_page(cpu)) + return -EFAULT; + + intel_pt_handle_vmx(1); + + r = kvm_cpu_vmxon(phys_addr); + if (r) { + intel_pt_handle_vmx(0); + return r; + } + + if (enable_ept) +
[PATCH v2 36/50] KVM: x86: Do VMX/SVM support checks directly in vendor code
Do basic VMX/SVM support checks directly in vendor code instead of implementing them via kvm_x86_ops hooks. Beyond the superficial benefit of providing common messages, which isn't even clearly a net positive since vendor code can provide more precise/detailed messages, there's zero advantage to bouncing through common x86 code. Consolidating the checks will also simplify performing the checks across all CPUs (in a future patch). Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 2 -- arch/x86/kvm/svm/svm.c | 38 +++-- arch/x86/kvm/vmx/vmx.c | 37 +--- arch/x86/kvm/x86.c | 11 -- 4 files changed, 37 insertions(+), 51 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 04a9ae66fb8d..d79aedf70908 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1729,8 +1729,6 @@ struct kvm_x86_nested_ops { }; struct kvm_x86_init_ops { - int (*cpu_has_kvm_support)(void); - int (*disabled_by_bios)(void); int (*check_processor_compatibility)(void); int (*hardware_setup)(void); unsigned int (*handle_intel_pt_intr)(void); diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index ab53da3fbcd1..49ccef9fae81 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -519,21 +519,28 @@ static void svm_init_osvw(struct kvm_vcpu *vcpu) vcpu->arch.osvw.status |= 1; } -static int has_svm(void) +static bool kvm_is_svm_supported(void) { const char *msg; + u64 vm_cr; if (!cpu_has_svm()) { - printk(KERN_INFO "has_svm: %s\n", msg); - return 0; + pr_err("SVM not supported, %s\n", msg); + return false; } if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) { pr_info("KVM is unsupported when running as an SEV guest\n"); - return 0; + return false; } - return 1; + rdmsrl(MSR_VM_CR, vm_cr); + if (vm_cr & (1 << SVM_VM_CR_SVM_DISABLE)) { + pr_err("SVM disabled (by BIOS) in MSR_VM_CR\n"); + return false; + } + + return true; } void __svm_write_tsc_multiplier(u64 multiplier) @@ -572,10 +579,9 @@ 
static int svm_hardware_enable(void) if (efer & EFER_SVME) return -EBUSY; - if (!has_svm()) { - pr_err("%s: err EOPNOTSUPP on %d\n", __func__, me); + if (!kvm_is_svm_supported()) return -EINVAL; - } + sd = per_cpu_ptr(_data, me); sd->asid_generation = 1; sd->max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1; @@ -4070,17 +4076,6 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, vmcb_mark_dirty(svm->vmcb, VMCB_CR); } -static int is_disabled(void) -{ - u64 vm_cr; - - rdmsrl(MSR_VM_CR, vm_cr); - if (vm_cr & (1 << SVM_VM_CR_SVM_DISABLE)) - return 1; - - return 0; -} - static void svm_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) { @@ -5080,8 +5075,6 @@ static __init int svm_hardware_setup(void) static struct kvm_x86_init_ops svm_init_ops __initdata = { - .cpu_has_kvm_support = has_svm, - .disabled_by_bios = is_disabled, .hardware_setup = svm_hardware_setup, .check_processor_compatibility = svm_check_processor_compat, @@ -5095,6 +5088,9 @@ static int __init svm_init(void) __unused_size_checks(); + if (!kvm_is_svm_supported()) + return -EOPNOTSUPP; + r = kvm_x86_vendor_init(_init_ops); if (r) return r; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 3f7d9f88b314..23b64bf4bfcf 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2485,17 +2485,6 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) } } -static __init int cpu_has_kvm_support(void) -{ - return cpu_has_vmx(); -} - -static __init int vmx_disabled_by_bios(void) -{ - return !this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) || - !this_cpu_has(X86_FEATURE_VMX); -} - static int kvm_cpu_vmxon(u64 vmxon_pointer) { u64 msr; @@ -7479,16 +7468,29 @@ static int vmx_vm_init(struct kvm *kvm) return 0; } +static bool __init kvm_is_vmx_supported(void) +{ + if (!cpu_has_vmx()) { + pr_err("CPU doesn't support VMX\n"); + return false; + } + + if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) || + !this_cpu_has(X86_FEATURE_VMX)) { + pr_err("VMX not 
enabled (by BIOS) in MSR_IA32_FEAT_CTL\n"); + return false; + } + + return true; +} + static int __init vmx_check_processor_compat(void) { struct vmcs_config vmcs_conf; struct vmx_capability vmx_cap; -
[PATCH v2 35/50] KVM: VMX: Use current CPU's info to perform "disabled by BIOS?" checks
Use this_cpu_has() instead of boot_cpu_has() to perform the effective "disabled by BIOS?" checks for VMX. This will allow consolidating code between vmx_disabled_by_bios() and vmx_check_processor_compat(). Checking the boot CPU isn't a strict requirement as any divergence in VMX enabling between the boot CPU and other CPUs will result in KVM refusing to load thanks to the aforementioned vmx_check_processor_compat(). Furthermore, using the boot CPU was an unintentional change introduced by commit a4d0b2fdbcf7 ("KVM: VMX: Use VMX feature flag to query BIOS enabling"). Prior to using the feature flags, KVM checked the raw MSR value from the current CPU. Reported-by: Kai Huang Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/vmx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index e859d2b7daa4..3f7d9f88b314 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2492,8 +2492,8 @@ static __init int cpu_has_kvm_support(void) static __init int vmx_disabled_by_bios(void) { - return !boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) || - !boot_cpu_has(X86_FEATURE_VMX); + return !this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) || + !this_cpu_has(X86_FEATURE_VMX); } static int kvm_cpu_vmxon(u64 vmxon_pointer) -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 32/50] KVM: Drop kvm_arch_check_processor_compat() hook
Drop kvm_arch_check_processor_compat() and its support code now that all architecture implementations are nops. Signed-off-by: Sean Christopherson Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Eric Farman # s390 Acked-by: Anup Patel --- arch/arm64/kvm/arm.c | 7 +-- arch/mips/kvm/mips.c | 7 +-- arch/powerpc/kvm/book3s.c | 2 +- arch/powerpc/kvm/e500.c| 2 +- arch/powerpc/kvm/e500mc.c | 2 +- arch/powerpc/kvm/powerpc.c | 5 - arch/riscv/kvm/main.c | 7 +-- arch/s390/kvm/kvm-s390.c | 7 +-- arch/x86/kvm/svm/svm.c | 4 ++-- arch/x86/kvm/vmx/vmx.c | 4 ++-- arch/x86/kvm/x86.c | 5 - include/linux/kvm_host.h | 4 +--- virt/kvm/kvm_main.c| 24 +--- 13 files changed, 13 insertions(+), 67 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 936ef7d1ea94..e915b1d9f2cd 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -63,11 +63,6 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu) return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE; } -int kvm_arch_check_processor_compat(void *opaque) -{ - return 0; -} - int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) { @@ -2273,7 +2268,7 @@ static __init int kvm_arm_init(void) * FIXME: Do something reasonable if kvm_init() fails after pKVM * hypervisor protection is finalized. 
*/ - err = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); + err = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE); if (err) goto out_subs; diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index 3cade648827a..36c8991b5d39 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -135,11 +135,6 @@ void kvm_arch_hardware_disable(void) kvm_mips_callbacks->hardware_disable(); } -int kvm_arch_check_processor_compat(void *opaque) -{ - return 0; -} - extern void kvm_init_loongson_ipi(struct kvm *kvm); int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) @@ -1636,7 +1631,7 @@ static int __init kvm_mips_init(void) register_die_notifier(_mips_csr_die_notifier); - ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); + ret = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE); if (ret) { unregister_die_notifier(_mips_csr_die_notifier); return ret; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 87283a0e33d8..57f4e7896d67 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1052,7 +1052,7 @@ static int kvmppc_book3s_init(void) { int r; - r = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); + r = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE); if (r) return r; #ifdef CONFIG_KVM_BOOK3S_32_HANDLER diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c index 0ea61190ec04..b0f695428733 100644 --- a/arch/powerpc/kvm/e500.c +++ b/arch/powerpc/kvm/e500.c @@ -531,7 +531,7 @@ static int __init kvmppc_e500_init(void) flush_icache_range(kvmppc_booke_handlers, kvmppc_booke_handlers + ivor[max_ivor] + handler_len); - r = kvm_init(NULL, sizeof(struct kvmppc_vcpu_e500), 0, THIS_MODULE); + r = kvm_init(sizeof(struct kvmppc_vcpu_e500), 0, THIS_MODULE); if (r) goto err_out; kvm_ops_e500.owner = THIS_MODULE; diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c index 795667f7ebf0..611532a0dedc 100644 --- a/arch/powerpc/kvm/e500mc.c +++ b/arch/powerpc/kvm/e500mc.c @@ -404,7 +404,7 @@ 
static int __init kvmppc_e500mc_init(void) */ kvmppc_init_lpid(KVMPPC_NR_LPIDS/threads_per_core); - r = kvm_init(NULL, sizeof(struct kvmppc_vcpu_e500), 0, THIS_MODULE); + r = kvm_init(sizeof(struct kvmppc_vcpu_e500), 0, THIS_MODULE); if (r) goto err_out; kvm_ops_e500mc.owner = THIS_MODULE; diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 01d0f9935e6c..f5b4ff6bfc89 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -440,11 +440,6 @@ int kvm_arch_hardware_enable(void) return 0; } -int kvm_arch_check_processor_compat(void *opaque) -{ - return 0; -} - int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) { struct kvmppc_ops *kvm_ops = NULL; diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c index 4710a6751687..34c3dece6990 100644 --- a/arch/riscv/kvm/main.c +++ b/arch/riscv/kvm/main.c @@ -20,11 +20,6 @@ long kvm_arch_dev_ioctl(struct file *filp, return -EINVAL; } -int kvm_arch_check_processor_compat(void *opaque) -{ - return 0; -} - int kvm_arch_hardware_enable(void) { unsigned long hideleg, hedeleg; @@ -110,6 +105,6 @@ static int __init riscv_kvm_init(void)
[PATCH v2 34/50] KVM: x86: Unify pr_fmt to use module name for all KVM modules
Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that printks use consistent formatting across common x86, Intel, and AMD code. In addition to providing consistent print formatting, using KBUILD_MODNAME, e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV and SGX and ...) as technologies without generating weird messages, and without causing naming conflicts with other kernel code, e.g. "SEV: ", "tdx: ", "sgx: " etc. are all used by the kernel for non-KVM subsystems. Opportunistically move away from printk() for prints that need to be modified anyways, e.g. to drop a manual "kvm: " prefix. Opportunistically convert a few SGX WARNs that are similarly modified to WARN_ONCE; in the very unlikely event that the WARNs fire, odds are good that they would fire repeatedly and spam the kernel log without providing unique information in each print. Note, defining pr_fmt yields undesirable results for code that uses KVM's printk wrappers, e.g. vcpu_unimpl(). But, that's a pre-existing problem as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's wrappers is relatively limited in KVM x86 code. 
Signed-off-by: Sean Christopherson --- arch/x86/kvm/cpuid.c| 1 + arch/x86/kvm/debugfs.c | 2 ++ arch/x86/kvm/emulate.c | 1 + arch/x86/kvm/hyperv.c | 1 + arch/x86/kvm/i8254.c| 4 ++-- arch/x86/kvm/i8259.c| 4 +++- arch/x86/kvm/ioapic.c | 1 + arch/x86/kvm/irq.c | 1 + arch/x86/kvm/irq_comm.c | 7 +++--- arch/x86/kvm/kvm_onhyperv.c | 1 + arch/x86/kvm/lapic.c| 8 +++ arch/x86/kvm/mmu/mmu.c | 6 ++--- arch/x86/kvm/mmu/page_track.c | 1 + arch/x86/kvm/mmu/spte.c | 4 ++-- arch/x86/kvm/mmu/spte.h | 4 ++-- arch/x86/kvm/mmu/tdp_iter.c | 1 + arch/x86/kvm/mmu/tdp_mmu.c | 1 + arch/x86/kvm/mtrr.c | 1 + arch/x86/kvm/pmu.c | 1 + arch/x86/kvm/smm.c | 1 + arch/x86/kvm/svm/avic.c | 2 +- arch/x86/kvm/svm/nested.c | 2 +- arch/x86/kvm/svm/pmu.c | 2 ++ arch/x86/kvm/svm/sev.c | 1 + arch/x86/kvm/svm/svm.c | 10 - arch/x86/kvm/svm/svm_onhyperv.c | 1 + arch/x86/kvm/svm/svm_onhyperv.h | 4 ++-- arch/x86/kvm/vmx/hyperv.c | 1 + arch/x86/kvm/vmx/hyperv.h | 4 +--- arch/x86/kvm/vmx/nested.c | 3 ++- arch/x86/kvm/vmx/pmu_intel.c| 5 +++-- arch/x86/kvm/vmx/posted_intr.c | 2 ++ arch/x86/kvm/vmx/sgx.c | 5 +++-- arch/x86/kvm/vmx/vmcs12.c | 1 + arch/x86/kvm/vmx/vmx.c | 40 - arch/x86/kvm/vmx/vmx_ops.h | 4 ++-- arch/x86/kvm/x86.c | 28 --- arch/x86/kvm/xen.c | 1 + 38 files changed, 97 insertions(+), 70 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 723502181a3a..82411693e604 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -8,6 +8,7 @@ * Copyright 2011 Red Hat, Inc. and/or its affiliates. * Copyright IBM Corporation, 2008 */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include #include diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c index c1390357126a..ee8c4c3496ed 100644 --- a/arch/x86/kvm/debugfs.c +++ b/arch/x86/kvm/debugfs.c @@ -4,6 +4,8 @@ * * Copyright 2016 Red Hat, Inc. and/or its affiliates. 
*/ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include #include #include "lapic.h" diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 5cc3efa0e21c..c3443045cd93 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -17,6 +17,7 @@ * * From: xen-unstable 10676:af9809f51f81a3c43f276f00c81a52ef558afda4 */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include #include "kvm_cache_regs.h" diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c index 2c7f2a26421e..4c47892d72bb 100644 --- a/arch/x86/kvm/hyperv.c +++ b/arch/x86/kvm/hyperv.c @@ -17,6 +17,7 @@ * Ben-Ami Yassour * Andrey Smetanin */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include "x86.h" #include "lapic.h" diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index e0a7a0e7a73c..cd57a517d04a 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -30,7 +30,7 @@ * Based on QEMU and Xen. */ -#define pr_fmt(fmt) "pit: " fmt +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include #include @@ -351,7 +351,7 @@ static void create_pit_timer(struct kvm_pit *pit, u32 val, int is_period) if (ps->period < min_period) { pr_info_ratelimited( - "kvm: requested %lld ns " + "requested %lld ns " "i8254 timer period limited to %lld ns\n", ps->period, min_period); ps->period = min_period; diff --git a/arch/x86/kvm/i8259.c
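To make the effect of the pr_fmt definition concrete, here is a minimal user-space sketch. It assumes KBUILD_MODNAME is "kvm" (in a real build kbuild supplies it per module), and format_pit_period_msg() is a hypothetical helper that writes to a buffer where the kernel would call pr_info_ratelimited().

```c
#include <stdio.h>

/* Normally supplied by kbuild; hard-coded here as an assumption for the demo. */
#ifndef KBUILD_MODNAME
#define KBUILD_MODNAME "kvm"
#endif

/* Same definition the patch adds near the top of each file. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/*
 * Hypothetical helper: formats the i8254 message into buf rather than
 * the kernel log. String literal concatenation glues the module-name
 * prefix onto the format, so the message itself no longer needs a
 * hand-written "kvm: " prefix.
 */
int format_pit_period_msg(char *buf, size_t size,
			  long long period, long long min_period)
{
	return snprintf(buf, size,
			pr_fmt("requested %lld ns "
			       "i8254 timer period limited to %lld ns\n"),
			period, min_period);
}
```

Built with KBUILD_MODNAME set to "kvm_intel" or "kvm_amd" instead, the same source would produce the vendor module's prefix, which is the consistency the commit message is after.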
[PATCH v2 33/50] KVM: x86: Use KBUILD_MODNAME to specify vendor module name
Use KBUILD_MODNAME to specify the vendor module name instead of manually writing out the name to make it a bit more obvious that the name isn't completely arbitrary. A future patch will also use KBUILD_MODNAME to define pr_fmt, at which point using KBUILD_MODNAME for kvm_x86_ops.name further reinforces the intended usage of kvm_x86_ops.name. No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/svm/svm.c | 2 +- arch/x86/kvm/vmx/vmx.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index d9a54590591d..a875cf7b2942 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4695,7 +4695,7 @@ static int svm_vm_init(struct kvm *kvm) } static struct kvm_x86_ops svm_x86_ops __initdata = { - .name = "kvm_amd", + .name = KBUILD_MODNAME, .hardware_unsetup = svm_hardware_unsetup, .hardware_enable = svm_hardware_enable, diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b6f08a0a1435..229a9cf485f0 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -8102,7 +8102,7 @@ static void vmx_vm_destroy(struct kvm *kvm) } static struct kvm_x86_ops vmx_x86_ops __initdata = { - .name = "kvm_intel", + .name = KBUILD_MODNAME, .hardware_unsetup = vmx_hardware_unsetup, -- 2.38.1.584.g0f3c55d4c2-goog
Re: [PATCH 3/5] arm64: dts: remove label = "cpu" from DSA dt-binding
On Wednesday, 30 November 2022, 15:10:38 CET, Arınç ÜNAL wrote: > This is not used by the DSA dt-binding, so remove it from all devicetrees. > > Signed-off-by: Arınç ÜNAL > --- > diff --git a/arch/arm64/boot/dts/rockchip/rk3568-bpi-r2-pro.dts > b/arch/arm64/boot/dts/rockchip/rk3568-bpi-r2-pro.dts > index c282f6e79960..b71162d65d2e 100644 > --- a/arch/arm64/boot/dts/rockchip/rk3568-bpi-r2-pro.dts > +++ b/arch/arm64/boot/dts/rockchip/rk3568-bpi-r2-pro.dts > @@ -552,7 +552,6 @@ port@4 { > > port@5 { > reg = <5>; > - label = "cpu"; > ethernet = <>; > phy-mode = "rgmii"; > > Rockchip-part: Acked-by: Heiko Stuebner
Re: [PATCH 0/5] remove label = "cpu" from DSA dt-binding
On 30.11.2022 18:55, Andrew Lunn wrote:
> On Wed, Nov 30, 2022 at 05:10:35PM +0300, Arınç ÜNAL wrote:
>> Hello folks,
>>
>> With this patch series, we're completely getting rid of 'label = "cpu";'
>> which is not used by the DSA dt-binding at all.
>>
>> Information for taking the patches for maintainers:
>> Patch 1: netdev maintainers (based off netdev/net-next.git main)
>> Patch 2-3: SoC maintainers (based off soc/soc.git soc/dt)
>> Patch 4: MIPS maintainers (based off mips/linux.git mips-next)
>> Patch 5: PowerPC maintainers (based off powerpc/linux.git next-test)
>
> Hi Arınç
>
> So your plan is that each architecture maintainer merges one patch?

Initially, I sent this series to s...@kernel.org to take it all but Rob said it must be this way instead.

> That is fine, but it is good to be explicit, otherwise patches will
> fall through the cracks because nobody picks them up. I generally use
> To: to indicate who i expect to merge a patch, and everybody else in
> the Cc:

Thanks for this, I'll follow suit if I don't see any activity for a few weeks.

> Reviewed-by: Andrew Lunn
>
> Andrew

Arınç
Re: [PATCH 0/5] remove label = "cpu" from DSA dt-binding
On Wed, Nov 30, 2022 at 05:10:35PM +0300, Arınç ÜNAL wrote: > Hello folks, > > With this patch series, we're completely getting rid of 'label = "cpu";' > which is not used by the DSA dt-binding at all. > > Information for taking the patches for maintainers: > Patch 1: netdev maintainers (based off netdev/net-next.git main) > Patch 2-3: SoC maintainers (based off soc/soc.git soc/dt) > Patch 4: MIPS maintainers (based off mips/linux.git mips-next) > Patch 5: PowerPC maintainers (based off powerpc/linux.git next-test) Hi Arınç So your plan is that each architecture maintainer merges one patch? That is fine, but it is good to be explicit, otherwise patches will fall through the cracks because nobody picks them up. I generally use To: to indicate who i expect to merge a patch, and everybody else in the Cc: Reviewed-by: Andrew Lunn Andrew
Re: [PATCH 2/5] arm: dts: remove label = "cpu" from DSA dt-binding
CC cleger On Wed, Nov 30, 2022 at 3:33 PM Arınç ÜNAL wrote: > This is not used by the DSA dt-binding, so remove it from all devicetrees. > > Signed-off-by: Arınç ÜNAL > arch/arm/boot/dts/r9a06g032.dtsi | 1 - Acked-by: Geert Uytterhoeven > --- a/arch/arm/boot/dts/r9a06g032.dtsi > +++ b/arch/arm/boot/dts/r9a06g032.dtsi > @@ -401,7 +401,6 @@ switch_port3: port@3 { > switch_port4: port@4 { > reg = <4>; > ethernet = <>; > - label = "cpu"; > phy-mode = "internal"; > status = "disabled"; > fixed-link { Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
[PATCH 4/5] mips: dts: remove label = "cpu" from DSA dt-binding
This is not used by the DSA dt-binding, so remove it from all devicetrees. Signed-off-by: Arınç ÜNAL --- arch/mips/boot/dts/qca/ar9331.dtsi| 1 - arch/mips/boot/dts/ralink/mt7621.dtsi | 1 - 2 files changed, 2 deletions(-) diff --git a/arch/mips/boot/dts/qca/ar9331.dtsi b/arch/mips/boot/dts/qca/ar9331.dtsi index c4102b280b47..768ac0f869b1 100644 --- a/arch/mips/boot/dts/qca/ar9331.dtsi +++ b/arch/mips/boot/dts/qca/ar9331.dtsi @@ -176,7 +176,6 @@ ports { switch_port0: port@0 { reg = <0x0>; - label = "cpu"; ethernet = <>; phy-mode = "gmii"; diff --git a/arch/mips/boot/dts/ralink/mt7621.dtsi b/arch/mips/boot/dts/ralink/mt7621.dtsi index f3f4c1f26e01..445817cbf376 100644 --- a/arch/mips/boot/dts/ralink/mt7621.dtsi +++ b/arch/mips/boot/dts/ralink/mt7621.dtsi @@ -386,7 +386,6 @@ port@4 { port@6 { reg = <6>; - label = "cpu"; ethernet = <>; phy-mode = "trgmii"; -- 2.34.1
[PATCH 5/5] powerpc: dts: remove label = "cpu" from DSA dt-binding
This is not used by the DSA dt-binding, so remove it from all devicetrees. Signed-off-by: Arınç ÜNAL --- arch/powerpc/boot/dts/turris1x.dts | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/boot/dts/turris1x.dts b/arch/powerpc/boot/dts/turris1x.dts index 045af668e928..3841c8d96d00 100644 --- a/arch/powerpc/boot/dts/turris1x.dts +++ b/arch/powerpc/boot/dts/turris1x.dts @@ -147,7 +147,6 @@ ports { port@0 { reg = <0>; - label = "cpu"; ethernet = <>; phy-mode = "rgmii-id"; @@ -184,7 +183,6 @@ port@5 { port@6 { reg = <6>; - label = "cpu"; ethernet = <>; phy-mode = "rgmii-id"; -- 2.34.1
[PATCH 3/5] arm64: dts: remove label = "cpu" from DSA dt-binding
This is not used by the DSA dt-binding, so remove it from all devicetrees. Signed-off-by: Arınç ÜNAL --- arch/arm64/boot/dts/freescale/imx8mm-venice-gw7901.dts | 1 - arch/arm64/boot/dts/freescale/imx8mp-venice-gw74xx.dts | 1 - arch/arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi| 1 - arch/arm64/boot/dts/marvell/armada-3720-espressobin.dtsi | 1 - arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts | 2 -- arch/arm64/boot/dts/marvell/armada-7040-mochabin.dts | 1 - arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts | 1 - arch/arm64/boot/dts/marvell/cn9130-crb.dtsi| 1 - arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts | 1 - arch/arm64/boot/dts/mediatek/mt7622-rfb1.dts | 1 - arch/arm64/boot/dts/mediatek/mt7986a-rfb.dts | 1 - arch/arm64/boot/dts/mediatek/mt7986b-rfb.dts | 1 - arch/arm64/boot/dts/rockchip/rk3568-bpi-r2-pro.dts | 1 - 13 files changed, 14 deletions(-) diff --git a/arch/arm64/boot/dts/freescale/imx8mm-venice-gw7901.dts b/arch/arm64/boot/dts/freescale/imx8mm-venice-gw7901.dts index 750a1f07ecb7..2b1fd70acdec 100644 --- a/arch/arm64/boot/dts/freescale/imx8mm-venice-gw7901.dts +++ b/arch/arm64/boot/dts/freescale/imx8mm-venice-gw7901.dts @@ -662,7 +662,6 @@ lan4: port@3 { port@5 { reg = <5>; - label = "cpu"; ethernet = <>; phy-mode = "rgmii-id"; diff --git a/arch/arm64/boot/dts/freescale/imx8mp-venice-gw74xx.dts b/arch/arm64/boot/dts/freescale/imx8mp-venice-gw74xx.dts index ceeca4966fc5..7a70eb2d1275 100644 --- a/arch/arm64/boot/dts/freescale/imx8mp-venice-gw74xx.dts +++ b/arch/arm64/boot/dts/freescale/imx8mp-venice-gw74xx.dts @@ -546,7 +546,6 @@ lan5: port@4 { port@5 { reg = <5>; - label = "cpu"; ethernet = <>; phy-mode = "rgmii-id"; diff --git a/arch/arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi b/arch/arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi index 4e05120c62d4..efa895b2316d 100644 --- a/arch/arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi @@ -177,7 +177,6 @@ port@1 { port@2 { reg 
= <2>; - label = "cpu"; ethernet = <>; fixed-link { diff --git a/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dtsi b/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dtsi index 5fc613d24151..21e7fb64515c 100644 --- a/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dtsi +++ b/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dtsi @@ -159,7 +159,6 @@ ports { switch0port0: port@0 { reg = <0>; - label = "cpu"; ethernet = <>; phy-mode = "rgmii-id"; fixed-link { diff --git a/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts b/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts index ada164d423f3..d8601b188cca 100644 --- a/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts +++ b/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts @@ -400,7 +400,6 @@ port@8 { port@9 { reg = <0x9>; - label = "cpu"; ethernet = <>; phy-mode = "2500base-x"; managed = "in-band-status"; @@ -485,7 +484,6 @@ port@4 { port@5 { reg = <0x5>; - label = "cpu"; phy-mode = "2500base-x"; managed = "in-band-status"; ethernet = <>; diff --git a/arch/arm64/boot/dts/marvell/armada-7040-mochabin.dts b/arch/arm64/boot/dts/marvell/armada-7040-mochabin.dts index 7ca71f2d7afb..7c65c0772208 100644 --- a/arch/arm64/boot/dts/marvell/armada-7040-mochabin.dts +++ b/arch/arm64/boot/dts/marvell/armada-7040-mochabin.dts @@ -344,7 +344,6 @@ swport4: port@4 { port@5 { reg = <5>; - label = "cpu"; ethernet = <_eth1>; phy-mode = "2500base-x"; managed = "in-band-status"; diff --git a/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts b/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts index 4125202028c8..60b11eeaeb9e 100644 ---
[PATCH 0/5] remove label = "cpu" from DSA dt-binding
Hello folks,

With this patch series, we're completely getting rid of 'label = "cpu";'
which is not used by the DSA dt-binding at all.

Information for taking the patches for maintainers:
Patch 1: netdev maintainers (based off netdev/net-next.git main)
Patch 2-3: SoC maintainers (based off soc/soc.git soc/dt)
Patch 4: MIPS maintainers (based off mips/linux.git mips-next)
Patch 5: PowerPC maintainers (based off powerpc/linux.git next-test)

I've been meaning to submit this for a few months. Find the relevant
conversation here:
https://lore.kernel.org/netdev/20220913155408.ga3802998-r...@kernel.org/

Here's how I did it, for the interested (or suggestions):

Find the platforms which have got 'label = "cpu";' defined.
grep -rnw . -e 'label = "cpu";'

Remove the line where 'label = "cpu";' is included.
sed -i /'label = "cpu";'/,+d arch/arm/boot/dts/*
sed -i /'label = "cpu";'/,+d arch/arm64/boot/dts/freescale/*
sed -i /'label = "cpu";'/,+d arch/arm64/boot/dts/marvell/*
sed -i /'label = "cpu";'/,+d arch/arm64/boot/dts/mediatek/*
sed -i /'label = "cpu";'/,+d arch/arm64/boot/dts/rockchip/*
sed -i /'label = "cpu";'/,+d arch/mips/boot/dts/qca/*
sed -i /'label = "cpu";'/,+d arch/mips/boot/dts/ralink/*
sed -i /'label = "cpu";'/,+d arch/powerpc/boot/dts/turris1x.dts
sed -i /'label = "cpu";'/,+d Documentation/devicetree/bindings/net/qca,ar71xx.yaml

Restore the symlink files which typechange after running sed.

Arınç ÜNAL (5):
  dt-bindings: net: qca,ar71xx: remove label = "cpu" from examples
  arm: dts: remove label = "cpu" from DSA dt-binding
  arm64: dts: remove label = "cpu" from DSA dt-binding
  mips: dts: remove label = "cpu" from DSA dt-binding
  powerpc: dts: remove label = "cpu" from DSA dt-binding
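The line-removal step described in the cover letter can be sketched in C as an in-memory filter. This is only an illustration of the idea (drop every line containing a given string); drop_lines_containing() is a hypothetical helper, it does not reproduce sed's address syntax, and it does not touch files or symlinks.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `in` to `out`, skipping every line that contains `needle`.
 * Returns the number of lines dropped, or -1 if `out` is too small.
 */
int drop_lines_containing(const char *in, const char *needle,
			  char *out, size_t out_size)
{
	size_t nlen = strlen(needle);
	size_t used = 0;
	int dropped = 0;

	if (out_size == 0)
		return -1;

	while (*in) {
		const char *eol = strchr(in, '\n');
		/* Line length, including the trailing newline when present. */
		size_t len = eol ? (size_t)(eol - in) + 1 : strlen(in);
		int match = 0;

		/* Search for the needle within the current line only. */
		for (size_t i = 0; nlen && i + nlen <= len; i++) {
			if (memcmp(in + i, needle, nlen) == 0) {
				match = 1;
				break;
			}
		}

		if (match) {
			dropped++;
		} else {
			/* Keep one byte free for the terminating NUL. */
			if (used + len >= out_size)
				return -1;
			memcpy(out + used, in, len);
			used += len;
		}
		in += len;
	}

	out[used] = '\0';
	return dropped;
}
```

A file-based version would additionally need to skip symlinks, which is the problem the cover letter works around by restoring them afterwards.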
[PATCH 2/5] arm: dts: remove label = "cpu" from DSA dt-binding
This is not used by the DSA dt-binding, so remove it from all devicetrees. Signed-off-by: Arınç ÜNAL --- arch/arm/boot/dts/armada-370-rd.dts | 1 - arch/arm/boot/dts/armada-381-netgear-gs110emx.dts | 1 - arch/arm/boot/dts/armada-385-clearfog-gtr-l8.dts | 1 - arch/arm/boot/dts/armada-385-clearfog-gtr-s4.dts | 1 - arch/arm/boot/dts/armada-385-linksys.dtsi | 1 - arch/arm/boot/dts/armada-385-turris-omnia.dts | 1 - arch/arm/boot/dts/armada-388-clearfog.dts | 1 - arch/arm/boot/dts/armada-xp-linksys-mamba.dts | 1 - arch/arm/boot/dts/at91-sama5d2_icp.dts| 1 - arch/arm/boot/dts/at91-sama5d3_ksz9477_evb.dts| 1 - arch/arm/boot/dts/bcm-cygnus.dtsi | 1 - arch/arm/boot/dts/bcm4708-buffalo-wzr-1166dhp-common.dtsi | 1 - arch/arm/boot/dts/bcm4708-luxul-xap-1510.dts | 1 - arch/arm/boot/dts/bcm4708-luxul-xwc-1000.dts | 1 - arch/arm/boot/dts/bcm4708-netgear-r6250.dts | 1 - arch/arm/boot/dts/bcm4708-smartrg-sr400ac.dts | 1 - arch/arm/boot/dts/bcm47081-buffalo-wzr-600dhp2.dts| 1 - arch/arm/boot/dts/bcm47081-luxul-xap-1410.dts | 1 - arch/arm/boot/dts/bcm47081-luxul-xwr-1200.dts | 1 - arch/arm/boot/dts/bcm4709-netgear-r8000.dts | 1 - arch/arm/boot/dts/bcm47094-asus-rt-ac88u.dts | 3 --- arch/arm/boot/dts/bcm47094-dlink-dir-885l.dts | 1 - arch/arm/boot/dts/bcm47094-linksys-panamera.dts | 4 arch/arm/boot/dts/bcm47094-luxul-abr-4500.dts | 1 - arch/arm/boot/dts/bcm47094-luxul-xap-1610.dts | 1 - arch/arm/boot/dts/bcm47094-luxul-xbr-4500.dts | 1 - arch/arm/boot/dts/bcm47094-luxul-xwc-2000.dts | 1 - arch/arm/boot/dts/bcm47094-luxul-xwr-3100.dts | 1 - arch/arm/boot/dts/bcm47094-luxul-xwr-3150-v1.dts | 1 - arch/arm/boot/dts/bcm47189-tenda-ac9.dts | 1 - arch/arm/boot/dts/bcm53015-meraki-mr26.dts| 1 - arch/arm/boot/dts/bcm53016-meraki-mr32.dts| 1 - arch/arm/boot/dts/bcm953012er.dts | 1 - arch/arm/boot/dts/bcm958622hr.dts | 1 - arch/arm/boot/dts/bcm958623hr.dts | 1 - arch/arm/boot/dts/bcm958625hr.dts | 1 - arch/arm/boot/dts/bcm958625k.dts | 1 - arch/arm/boot/dts/bcm988312hr.dts | 1 - 
arch/arm/boot/dts/gemini-dlink-dir-685.dts| 1 - arch/arm/boot/dts/gemini-sl93512r.dts | 1 - arch/arm/boot/dts/gemini-sq201.dts| 1 - arch/arm/boot/dts/imx51-zii-rdu1.dts | 1 - arch/arm/boot/dts/imx51-zii-scu2-mezz.dts | 1 - arch/arm/boot/dts/imx51-zii-scu3-esb.dts | 1 - arch/arm/boot/dts/imx53-kp-hsc.dts| 1 - arch/arm/boot/dts/imx6dl-yapp4-common.dtsi| 1 - arch/arm/boot/dts/imx6q-b450v3.dts| 1 - arch/arm/boot/dts/imx6q-b650v3.dts| 1 - arch/arm/boot/dts/imx6q-b850v3.dts| 1 - arch/arm/boot/dts/imx6qdl-gw5904.dtsi | 1 - arch/arm/boot/dts/imx6qdl-skov-cpu.dtsi | 1 - arch/arm/boot/dts/imx6qdl-zii-rdu2.dtsi | 1 - arch/arm/boot/dts/imx6qp-prtwd3.dts | 1 - arch/arm/boot/dts/imx7d-zii-rpu2.dts | 1 - arch/arm/boot/dts/kirkwood-dir665.dts | 1 - arch/arm/boot/dts/kirkwood-l-50.dts | 1 - arch/arm/boot/dts/kirkwood-linksys-viper.dts | 1 - arch/arm/boot/dts/kirkwood-mv88f6281gtw-ge.dts| 1 - arch/arm/boot/dts/kirkwood-rd88f6281.dtsi | 1 - arch/arm/boot/dts/mt7623a-rfb-emmc.dts| 1 - arch/arm/boot/dts/mt7623a-rfb-nand.dts| 1 - arch/arm/boot/dts/mt7623n-bananapi-bpi-r2.dts | 1 - arch/arm/boot/dts/mt7623n-rfb-emmc.dts| 1 - arch/arm/boot/dts/orion5x-netgear-wnr854t.dts | 1 - arch/arm/boot/dts/qcom-ipq8064-rb3011.dts | 2 -- arch/arm/boot/dts/r9a06g032.dtsi | 1 - arch/arm/boot/dts/stm32mp151a-prtt1c.dts | 1 - arch/arm/boot/dts/sun7i-a20-lamobo-r1.dts | 1 - arch/arm/boot/dts/vf610-zii-cfu1.dts | 1 - arch/arm/boot/dts/vf610-zii-dev-rev-b.dts | 1 - arch/arm/boot/dts/vf610-zii-dev-rev-c.dts | 1 - arch/arm/boot/dts/vf610-zii-scu4-aib.dts | 1 - arch/arm/boot/dts/vf610-zii-spb4.dts | 1 - arch/arm/boot/dts/vf610-zii-ssmb-dtu.dts | 1 - arch/arm/boot/dts/vf610-zii-ssmb-spu3.dts | 1 - 75 files
[PATCH 1/5] dt-bindings: net: qca,ar71xx: remove label = "cpu" from examples
This is not used by the DSA dt-binding, so remove it from the examples. Signed-off-by: Arınç ÜNAL --- Documentation/devicetree/bindings/net/qca,ar71xx.yaml | 1 - 1 file changed, 1 deletion(-) diff --git a/Documentation/devicetree/bindings/net/qca,ar71xx.yaml b/Documentation/devicetree/bindings/net/qca,ar71xx.yaml index 1ebf9e8c8a1d..89f94b31b546 100644 --- a/Documentation/devicetree/bindings/net/qca,ar71xx.yaml +++ b/Documentation/devicetree/bindings/net/qca,ar71xx.yaml @@ -123,7 +123,6 @@ examples: switch_port0: port@0 { reg = <0x0>; -label = "cpu"; ethernet = <>; phy-mode = "gmii"; -- 2.34.1
[PATCH v2 31/50] KVM: x86: Do CPU compatibility checks in x86 code
Move the CPU compatibility checks to pure x86 code, i.e. drop x86's use of the common kvm_arch_check_processor_compat() arch hook. x86 is the only architecture that "needs" to do per-CPU compatibility checks; moving the logic to x86 will allow dropping the common code, and will also give x86 more control over when/how the compatibility checks are performed, e.g. TDX will need to enable hardware (do VMXON) in order to perform compatibility checks. Signed-off-by: Sean Christopherson --- arch/x86/kvm/svm/svm.c | 2 +- arch/x86/kvm/vmx/vmx.c | 2 +- arch/x86/kvm/x86.c | 49 -- 3 files changed, 40 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 19e81a99c58f..d7ea1c1175c2 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -5103,7 +5103,7 @@ static int __init svm_init(void) * Common KVM initialization _must_ come last, after this, /dev/kvm is * exposed to userspace! */ - r = kvm_init(_init_ops, sizeof(struct vcpu_svm), + r = kvm_init(NULL, sizeof(struct vcpu_svm), __alignof__(struct vcpu_svm), THIS_MODULE); if (r) goto err_kvm_init; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 654d81f781da..8deb1bd60c10 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -8592,7 +8592,7 @@ static int __init vmx_init(void) * Common KVM initialization _must_ come last, after this, /dev/kvm is * exposed to userspace! 
*/ - r = kvm_init(_init_ops, sizeof(struct vcpu_vmx), + r = kvm_init(NULL, sizeof(struct vcpu_vmx), __alignof__(struct vcpu_vmx), THIS_MODULE); if (r) goto err_kvm_init; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 66f16458aa97..3571bc968cf8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9277,10 +9277,36 @@ static inline void kvm_ops_update(struct kvm_x86_init_ops *ops) kvm_pmu_ops_update(ops->pmu_ops); } +struct kvm_cpu_compat_check { + struct kvm_x86_init_ops *ops; + int *ret; +}; + +static int kvm_x86_check_processor_compatibility(struct kvm_x86_init_ops *ops) +{ + struct cpuinfo_x86 *c = _data(smp_processor_id()); + + WARN_ON(!irqs_disabled()); + + if (__cr4_reserved_bits(cpu_has, c) != + __cr4_reserved_bits(cpu_has, _cpu_data)) + return -EIO; + + return ops->check_processor_compatibility(); +} + +static void kvm_x86_check_cpu_compat(void *data) +{ + struct kvm_cpu_compat_check *c = data; + + *c->ret = kvm_x86_check_processor_compatibility(c->ops); +} + static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops) { + struct kvm_cpu_compat_check c; u64 host_pat; - int r; + int r, cpu; if (kvm_x86_ops.hardware_enable) { pr_err("kvm: already loaded vendor module '%s'\n", kvm_x86_ops.name); @@ -9360,6 +9386,14 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops) if (r != 0) goto out_mmu_exit; + c.ret = + c.ops = ops; + for_each_online_cpu(cpu) { + smp_call_function_single(cpu, kvm_x86_check_cpu_compat, , 1); + if (r < 0) + goto out_hardware_unsetup; + } + /* * Point of no return! 
DO NOT add error paths below this point unless * absolutely necessary, as most operations from this point forward @@ -9402,6 +9436,8 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops) kvm_init_msr_list(); return 0; +out_hardware_unsetup: + ops->runtime_ops->hardware_unsetup(); out_mmu_exit: kvm_mmu_vendor_module_exit(); out_free_percpu: @@ -12037,16 +12073,7 @@ void kvm_arch_hardware_disable(void) int kvm_arch_check_processor_compat(void *opaque) { - struct cpuinfo_x86 *c = _data(smp_processor_id()); - struct kvm_x86_init_ops *ops = opaque; - - WARN_ON(!irqs_disabled()); - - if (__cr4_reserved_bits(cpu_has, c) != - __cr4_reserved_bits(cpu_has, _cpu_data)) - return -EIO; - - return ops->check_processor_compatibility(); + return 0; } bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu) -- 2.38.1.584.g0f3c55d4c2-goog
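The control flow this patch adds to __kvm_x86_vendor_init() (check each online CPU, bail out on the first failure) can be sketched in user-space C roughly as below. Everything here is a simplified model under stated assumptions: struct model_cpu, vendor_check_fn and the model_* functions are hypothetical, the loop runs sequentially instead of dispatching to each CPU via smp_call_function_single(), and a single mask comparison stands in for the __cr4_reserved_bits() check.

```c
#include <errno.h>

/* Hypothetical per-CPU data: the CR4 reserved-bit mask each CPU would compute. */
struct model_cpu {
	unsigned long cr4_reserved_bits;
};

/* Hypothetical vendor callback, standing in for ops->check_processor_compatibility(). */
typedef int (*vendor_check_fn)(int cpu);

/* Mirrors the shape of kvm_x86_check_processor_compatibility() for one CPU. */
int model_check_one_cpu(const struct model_cpu *cpus, int cpu, int boot_cpu,
			vendor_check_fn vendor_check)
{
	/* Any divergence from the boot CPU is treated as incompatible. */
	if (cpus[cpu].cr4_reserved_bits != cpus[boot_cpu].cr4_reserved_bits)
		return -EIO;

	return vendor_check(cpu);
}

/* Mirrors the for_each_online_cpu() loop: stop at the first failing CPU. */
int model_check_all_cpus(const struct model_cpu *cpus, int nr_cpus, int boot_cpu,
			 vendor_check_fn vendor_check)
{
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		int r = model_check_one_cpu(cpus, cpu, boot_cpu, vendor_check);

		if (r < 0)
			return r;
	}
	return 0;
}

/* Trivial vendor check that accepts every CPU. */
int model_vendor_ok(int cpu)
{
	(void)cpu;
	return 0;
}
```

In the patch itself a failure jumps to out_hardware_unsetup so that the vendor's hardware_unsetup() runs before the remaining error-path cleanup; the sketch only models the return value.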
[PATCH v2 30/50] KVM: VMX: Make VMCS configuration/capabilities structs read-only after init
Tag vmcs_config and vmx_capability structs as __ro_after_init; the canonical configuration is generated during hardware_setup() and must never be modified after that point. Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/capabilities.h | 4 ++-- arch/x86/kvm/vmx/vmx.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index cd2ac9536c99..45162c1bcd8f 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -66,13 +66,13 @@ struct vmcs_config { u64 misc; struct nested_vmx_msrs nested; }; -extern struct vmcs_config vmcs_config; +extern struct vmcs_config vmcs_config __ro_after_init; struct vmx_capability { u32 ept; u32 vpid; }; -extern struct vmx_capability vmx_capability; +extern struct vmx_capability vmx_capability __ro_after_init; static inline bool cpu_has_vmx_basic_inout(void) { diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 76185a7a7ded..654d81f781da 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -488,8 +488,8 @@ static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu); static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS); static DEFINE_SPINLOCK(vmx_vpid_lock); -struct vmcs_config vmcs_config; -struct vmx_capability vmx_capability; +struct vmcs_config vmcs_config __ro_after_init; +struct vmx_capability vmx_capability __ro_after_init; #define VMX_SEGMENT_FIELD(seg) \ [VCPU_SREG_##seg] = { \ -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 29/50] KVM: Drop kvm_arch_{init,exit}() hooks
Drop kvm_arch_init() and kvm_arch_exit() now that all implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson Reviewed-by: Eric Farman # s390 Reviewed-by: Philippe Mathieu-Daudé Acked-by: Anup Patel --- arch/arm64/kvm/arm.c| 11 --- arch/mips/kvm/mips.c| 10 -- arch/powerpc/include/asm/kvm_host.h | 1 - arch/powerpc/kvm/powerpc.c | 5 - arch/riscv/kvm/main.c | 9 - arch/s390/kvm/kvm-s390.c| 10 -- arch/x86/kvm/x86.c | 10 -- include/linux/kvm_host.h| 3 --- virt/kvm/kvm_main.c | 19 ++- 9 files changed, 2 insertions(+), 76 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 4d34abcfc9a9..936ef7d1ea94 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -2289,17 +2289,6 @@ static __init int kvm_arm_init(void) return err; } -int kvm_arch_init(void *opaque) -{ - return 0; -} - -/* NOP: Compiling as a module not supported */ -void kvm_arch_exit(void) -{ - -} - static int __init early_kvm_mode_cfg(char *arg) { if (!arg) diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index ae7a24342fdf..3cade648827a 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -1010,16 +1010,6 @@ long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) return r; } -int kvm_arch_init(void *opaque) -{ - return 0; -} - -void kvm_arch_exit(void) -{ - -} - int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs) { diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 5d2c3a487e73..0a80e80c7b9e 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -881,7 +881,6 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {} static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {} static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} -static inline void kvm_arch_exit(void) {} static inline void 
kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {} diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index d44b85ba8cef..01d0f9935e6c 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -2539,11 +2539,6 @@ void kvmppc_init_lpid(unsigned long nr_lpids_param) } EXPORT_SYMBOL_GPL(kvmppc_init_lpid); -int kvm_arch_init(void *opaque) -{ - return 0; -} - EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ppc_instr); void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry) diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c index cb063b8a9a0f..4710a6751687 100644 --- a/arch/riscv/kvm/main.c +++ b/arch/riscv/kvm/main.c @@ -65,15 +65,6 @@ void kvm_arch_hardware_disable(void) csr_write(CSR_HIDELEG, 0); } -int kvm_arch_init(void *opaque) -{ - return 0; -} - -void kvm_arch_exit(void) -{ -} - static int __init riscv_kvm_init(void) { const char *str; diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 66d162723d21..25b08b956888 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -541,16 +541,6 @@ static void __kvm_s390_exit(void) debug_unregister(kvm_s390_dbf_uv); } -int kvm_arch_init(void *opaque) -{ - return 0; -} - -void kvm_arch_exit(void) -{ - -} - /* Section: device related */ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 45184ca89317..66f16458aa97 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9277,16 +9277,6 @@ static inline void kvm_ops_update(struct kvm_x86_init_ops *ops) kvm_pmu_ops_update(ops->pmu_ops); } -int kvm_arch_init(void *opaque) -{ - return 0; -} - -void kvm_arch_exit(void) -{ - -} - static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops) { u64 host_pat; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index f2e0e78d2d92..7dde28333e7c 100644 --- 
a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1439,9 +1439,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg); int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu); -int kvm_arch_init(void *opaque); -void kvm_arch_exit(void); - void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu); void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 0e62887e8ce1..a4a10a0b322f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5852,20 +5852,8 @@ int
[PATCH v2 28/50] KVM: s390: Mark __kvm_s390_init() and its descendants as __init
Tag __kvm_s390_init() and its unique helpers as __init. These functions are only ever called during module_init(), but could not be tagged accordingly while they were invoked from the common kvm_arch_init(), which is not __init because of x86. Signed-off-by: Sean Christopherson Reviewed-by: Eric Farman --- arch/s390/kvm/interrupt.c | 2 +- arch/s390/kvm/kvm-s390.c | 4 ++-- arch/s390/kvm/kvm-s390.h | 2 +- arch/s390/kvm/pci.c | 2 +- arch/s390/kvm/pci.h | 2 +- 5 files changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c index 1dae78deddf2..3754d7937530 100644 --- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ -3411,7 +3411,7 @@ void kvm_s390_gib_destroy(void) gib = NULL; } -int kvm_s390_gib_init(u8 nisc) +int __init kvm_s390_gib_init(u8 nisc) { int rc = 0; diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 8c4fcaf2bd36..66d162723d21 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -366,7 +366,7 @@ static __always_inline void __insn32_query(unsigned int opcode, u8 *query) #define INSN_SORTL 0xb938 #define INSN_DFLTCC 0xb939 -static void kvm_s390_cpu_feat_init(void) +static void __init kvm_s390_cpu_feat_init(void) { int i; @@ -469,7 +469,7 @@ static void kvm_s390_cpu_feat_init(void) */ } -static int __kvm_s390_init(void) +static int __init __kvm_s390_init(void) { int rc = -ENOMEM; diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h index d48588c207d8..0261d42c7d01 100644 --- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -470,7 +470,7 @@ void kvm_s390_gisa_clear(struct kvm *kvm); void kvm_s390_gisa_destroy(struct kvm *kvm); void kvm_s390_gisa_disable(struct kvm *kvm); void kvm_s390_gisa_enable(struct kvm *kvm); -int kvm_s390_gib_init(u8 nisc); +int __init kvm_s390_gib_init(u8 nisc); void kvm_s390_gib_destroy(void); /* implemented in guestdbg.c */ diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c index ded1af2ddae9..39544c92ce3d 
100644 --- a/arch/s390/kvm/pci.c +++ b/arch/s390/kvm/pci.c @@ -670,7 +670,7 @@ int kvm_s390_pci_zpci_op(struct kvm *kvm, struct kvm_s390_zpci_op *args) return r; } -int kvm_s390_pci_init(void) +int __init kvm_s390_pci_init(void) { zpci_kvm_hook.kvm_register = kvm_s390_pci_register_kvm; zpci_kvm_hook.kvm_unregister = kvm_s390_pci_unregister_kvm; diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h index 486d06ef563f..ff0972dd5e71 100644 --- a/arch/s390/kvm/pci.h +++ b/arch/s390/kvm/pci.h @@ -60,7 +60,7 @@ void kvm_s390_pci_clear_list(struct kvm *kvm); int kvm_s390_pci_zpci_op(struct kvm *kvm, struct kvm_s390_zpci_op *args); -int kvm_s390_pci_init(void); +int __init kvm_s390_pci_init(void); void kvm_s390_pci_exit(void); static inline bool kvm_s390_pci_interp_allowed(void) -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 27/50] KVM: s390: Do s390 specific init without bouncing through kvm_init()
Move the guts of kvm_arch_init() into a new helper, __kvm_s390_init(), and invoke the new helper directly from kvm_s390_init() instead of bouncing through kvm_init(). Invoking kvm_arch_init() is the very first action performed by kvm_init(), i.e. this is a glorified nop. Moving setup to __kvm_s390_init() will allow tagging more functions as __init, and emptying kvm_arch_init() will allow dropping the hook entirely once all architecture implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson Reviewed-by: Eric Farman Reviewed-by: Philippe Mathieu-Daudé --- arch/s390/kvm/kvm-s390.c | 29 + 1 file changed, 25 insertions(+), 4 deletions(-) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 829e6e046003..8c4fcaf2bd36 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -469,7 +469,7 @@ static void kvm_s390_cpu_feat_init(void) */ } -int kvm_arch_init(void *opaque) +static int __kvm_s390_init(void) { int rc = -ENOMEM; @@ -527,7 +527,7 @@ int kvm_arch_init(void *opaque) return rc; } -void kvm_arch_exit(void) +static void __kvm_s390_exit(void) { gmap_unregister_pte_notifier(&gmap_notifier); gmap_unregister_pte_notifier(&vsie_gmap_notifier); @@ -541,6 +541,16 @@ void kvm_arch_exit(void) debug_unregister(kvm_s390_dbf_uv); } +int kvm_arch_init(void *opaque) +{ + return 0; +} + +void kvm_arch_exit(void) +{ + +} + /* Section: device related */ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) @@ -5696,7 +5706,7 @@ static inline unsigned long nonhyp_mask(int i) static int __init kvm_s390_init(void) { - int i; + int i, r; if (!sclp.has_sief2) { pr_info("SIE is not available\n"); @@ -5712,12 +5722,23 @@ static int __init kvm_s390_init(void) kvm_s390_fac_base[i] |= stfle_fac_list[i] & nonhyp_mask(i); - return kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); + r = __kvm_s390_init(); + if (r) + return r; + + r = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); + if (r) { + 
__kvm_s390_exit(); + return r; + } + return 0; } static void __exit kvm_s390_exit(void) { kvm_exit(); + + __kvm_s390_exit(); } module_init(kvm_s390_init); -- 2.38.1.584.g0f3c55d4c2-goog
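The ordering this patch establishes (arch-specific setup first, common init second, arch teardown if the common step fails, and the reverse order on unload) can be modelled in plain userspace C. This is only a sketch under stated assumptions: arch_setup(), common_init(), demo_module_init() and the two state flags are hypothetical stand-ins, not kernel symbols, and the `fail` parameter exists purely to force the error path.

```c
#include <assert.h>

/* Hypothetical state flags; none of these names exist in the kernel. */
static int arch_ready;      /* set by the arch-specific setup step */
static int device_exposed;  /* set once the common layer is up */

static int arch_setup(void) { arch_ready = 1; return 0; }
static void arch_teardown(void) { arch_ready = 0; }

/* Stand-in for the common init step; 'fail' forces the error path. */
static int common_init(int fail)
{
	/* The common step relies on arch setup having completed. */
	assert(arch_ready);
	if (fail)
		return -1;
	device_exposed = 1;
	return 0;
}

static void common_exit(void) { device_exposed = 0; }

/* Arch setup first, then common init; unwind arch setup on failure. */
int demo_module_init(int fail_common)
{
	int r;

	r = arch_setup();
	if (r)
		return r;

	r = common_init(fail_common);
	if (r) {
		arch_teardown();
		return r;
	}
	return 0;
}

/* Tear down in the reverse order of initialization. */
void demo_module_exit(void)
{
	common_exit();
	arch_teardown();
}
```

Calling demo_module_init(1) leaves both flags cleared, which is the property the explicit __kvm_s390_exit() call on the kvm_init() failure path provides in the patch.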
[PATCH v2 26/50] KVM: PPC: Move processor compatibility check to module init
Move KVM PPC's compatibility checks to their respective module_init() hooks, there's no need to wait until KVM's common compat check, nor is there a need to perform the check on every CPU (provided by common KVM's hook), as the compatibility checks operate on global data. arch/powerpc/include/asm/cputable.h: extern struct cpu_spec *cur_cpu_spec; arch/powerpc/kvm/book3s.c: return 0 arch/powerpc/kvm/e500.c: strcmp(cur_cpu_spec->cpu_name, "e500v2") arch/powerpc/kvm/e500mc.c: strcmp(cur_cpu_spec->cpu_name, "e500mc") strcmp(cur_cpu_spec->cpu_name, "e5500") strcmp(cur_cpu_spec->cpu_name, "e6500") Cc: Fabiano Rosas Cc: Michael Ellerman Signed-off-by: Sean Christopherson --- arch/powerpc/include/asm/kvm_ppc.h | 1 - arch/powerpc/kvm/book3s.c | 10 -- arch/powerpc/kvm/e500.c| 4 ++-- arch/powerpc/kvm/e500mc.c | 4 arch/powerpc/kvm/powerpc.c | 2 +- 5 files changed, 7 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index bfacf12784dd..51a1824b0a16 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -118,7 +118,6 @@ extern int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, extern int kvmppc_core_vcpu_create(struct kvm_vcpu *vcpu); extern void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu); extern int kvmppc_core_vcpu_setup(struct kvm_vcpu *vcpu); -extern int kvmppc_core_check_processor_compat(void); extern int kvmppc_core_vcpu_translate(struct kvm_vcpu *vcpu, struct kvm_translation *tr); diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 6d525285dbe8..87283a0e33d8 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -999,16 +999,6 @@ int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_store); -int kvmppc_core_check_processor_compat(void) -{ - /* -* We always return 0 for book3s. 
We check -* for compatibility while loading the HV -* or PR module -*/ - return 0; -} - int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hcall) { return kvm->arch.kvm_ops->hcall_implemented(hcall); diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c index c8b2b4478545..0ea61190ec04 100644 --- a/arch/powerpc/kvm/e500.c +++ b/arch/powerpc/kvm/e500.c @@ -314,7 +314,7 @@ static void kvmppc_core_vcpu_put_e500(struct kvm_vcpu *vcpu) kvmppc_booke_vcpu_put(vcpu); } -int kvmppc_core_check_processor_compat(void) +static int kvmppc_e500_check_processor_compat(void) { int r; @@ -507,7 +507,7 @@ static int __init kvmppc_e500_init(void) unsigned long handler_len; unsigned long max_ivor = 0; - r = kvmppc_core_check_processor_compat(); + r = kvmppc_e500_check_processor_compat(); if (r) goto err_out; diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c index 57e0ad6a2ca3..795667f7ebf0 100644 --- a/arch/powerpc/kvm/e500mc.c +++ b/arch/powerpc/kvm/e500mc.c @@ -388,6 +388,10 @@ static int __init kvmppc_e500mc_init(void) { int r; + r = kvmppc_e500mc_check_processor_compat(); + if (r) + return r; + r = kvmppc_booke_init(); if (r) goto err_out; diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 5faf69421f13..d44b85ba8cef 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -442,7 +442,7 @@ int kvm_arch_hardware_enable(void) int kvm_arch_check_processor_compat(void *opaque) { - return kvmppc_core_check_processor_compat(); + return 0; } int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) -- 2.38.1.584.g0f3c55d4c2-goog
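Because the compatibility checks listed in the commit message compare only a global CPU name, one call at module init is enough and nothing needs to run per CPU. A minimal userspace sketch of that idea follows; cpu_name is a hypothetical stand-in for cur_cpu_spec->cpu_name, demo_e500mc_init() is an invented name, and -ENODEV is chosen only for illustration rather than taken from the kernel sources.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical global standing in for cur_cpu_spec->cpu_name. */
static const char *cpu_name = "e500mc";

/* Return 0 if the global CPU name is in the supported list. */
static int check_processor_compat(void)
{
	static const char *const supported[] = { "e500mc", "e5500", "e6500" };
	size_t i;

	assert(cpu_name != NULL);
	for (i = 0; i < sizeof(supported) / sizeof(supported[0]); i++)
		if (strcmp(cpu_name, supported[i]) == 0)
			return 0;
	return -ENODEV;
}

/* Module-init style entry point: check once, propagate the error code. */
int demo_e500mc_init(void)
{
	int r;

	r = check_processor_compat();
	if (r)
		return r;

	/* Remaining setup would go here. */
	return 0;
}
```

Since nothing has been set up when the check runs, returning the error code directly is sufficient; no unwind label is needed at that point.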
[PATCH v2 25/50] KVM: RISC-V: Tag init functions and data with __init, __ro_after_init
Now that KVM setup is handled directly in riscv_kvm_init(), tag functions and data that are used/set only during init with __init/__ro_after_init. Signed-off-by: Sean Christopherson Acked-by: Anup Patel --- arch/riscv/include/asm/kvm_host.h | 6 +++--- arch/riscv/kvm/mmu.c | 12 ++-- arch/riscv/kvm/vmid.c | 4 ++-- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h index 8c771fc4f5d2..778ff0f282b7 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -295,11 +295,11 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu, int kvm_riscv_gstage_alloc_pgd(struct kvm *kvm); void kvm_riscv_gstage_free_pgd(struct kvm *kvm); void kvm_riscv_gstage_update_hgatp(struct kvm_vcpu *vcpu); -void kvm_riscv_gstage_mode_detect(void); -unsigned long kvm_riscv_gstage_mode(void); +void __init kvm_riscv_gstage_mode_detect(void); +unsigned long __init kvm_riscv_gstage_mode(void); int kvm_riscv_gstage_gpa_bits(void); -void kvm_riscv_gstage_vmid_detect(void); +void __init kvm_riscv_gstage_vmid_detect(void); unsigned long kvm_riscv_gstage_vmid_bits(void); int kvm_riscv_gstage_vmid_init(struct kvm *kvm); bool kvm_riscv_gstage_vmid_ver_changed(struct kvm_vmid *vmid); diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c index 3620ecac2fa1..f42a34c7879a 100644 --- a/arch/riscv/kvm/mmu.c +++ b/arch/riscv/kvm/mmu.c @@ -20,12 +20,12 @@ #include #ifdef CONFIG_64BIT -static unsigned long gstage_mode = (HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT); -static unsigned long gstage_pgd_levels = 3; +static unsigned long gstage_mode __ro_after_init = (HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT); +static unsigned long gstage_pgd_levels __ro_after_init = 3; #define gstage_index_bits 9 #else -static unsigned long gstage_mode = (HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT); -static unsigned long gstage_pgd_levels = 2; +static unsigned long gstage_mode __ro_after_init = (HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT); 
+static unsigned long gstage_pgd_levels __ro_after_init = 2; #define gstage_index_bits 10 #endif @@ -760,7 +760,7 @@ void kvm_riscv_gstage_update_hgatp(struct kvm_vcpu *vcpu) kvm_riscv_local_hfence_gvma_all(); } -void kvm_riscv_gstage_mode_detect(void) +void __init kvm_riscv_gstage_mode_detect(void) { #ifdef CONFIG_64BIT /* Try Sv57x4 G-stage mode */ @@ -784,7 +784,7 @@ void kvm_riscv_gstage_mode_detect(void) #endif } -unsigned long kvm_riscv_gstage_mode(void) +unsigned long __init kvm_riscv_gstage_mode(void) { return gstage_mode >> HGATP_MODE_SHIFT; } diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c index 6cd93995fb65..5246da1c9167 100644 --- a/arch/riscv/kvm/vmid.c +++ b/arch/riscv/kvm/vmid.c @@ -17,10 +17,10 @@ static unsigned long vmid_version = 1; static unsigned long vmid_next; -static unsigned long vmid_bits; +static unsigned long vmid_bits __ro_after_init; static DEFINE_SPINLOCK(vmid_lock); -void kvm_riscv_gstage_vmid_detect(void) +void __init kvm_riscv_gstage_vmid_detect(void) { unsigned long old; -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 24/50] KVM: RISC-V: Do arch init directly in riscv_kvm_init()
Fold the guts of kvm_arch_init() into riscv_kvm_init() instead of bouncing through kvm_init()=>kvm_arch_init(). Functionally, this is a glorified nop as invoking kvm_arch_init() is the very first action performed by kvm_init(). Moving setup to riscv_kvm_init(), which is tagged __init, will allow tagging more functions and data with __init and __ro_after_init. And emptying kvm_arch_init() will allow dropping the hook entirely once all architecture implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson Reviewed-by: Philippe Mathieu-Daudé Acked-by: Anup Patel --- arch/riscv/kvm/main.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c index a146fa0ce4d2..cb063b8a9a0f 100644 --- a/arch/riscv/kvm/main.c +++ b/arch/riscv/kvm/main.c @@ -66,6 +66,15 @@ void kvm_arch_hardware_disable(void) } int kvm_arch_init(void *opaque) +{ + return 0; +} + +void kvm_arch_exit(void) +{ +} + +static int __init riscv_kvm_init(void) { const char *str; @@ -110,15 +119,6 @@ int kvm_arch_init(void *opaque) kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits()); - return 0; -} - -void kvm_arch_exit(void) -{ -} - -static int __init riscv_kvm_init(void) -{ return kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); } module_init(riscv_kvm_init); -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 23/50] KVM: MIPS: Register die notifier prior to kvm_init()
Call kvm_init() only after _all_ setup is complete, as kvm_init() exposes /dev/kvm to userspace and thus allows userspace to create VMs (and call other ioctls). Signed-off-by: Sean Christopherson Reviewed-by: Philippe Mathieu-Daudé --- arch/mips/kvm/mips.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index 75681281e2df..ae7a24342fdf 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -1640,16 +1640,17 @@ static int __init kvm_mips_init(void) if (ret) return ret; - ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); - - if (ret) - return ret; if (boot_cpu_type() == CPU_LOONGSON64) kvm_priority_to_irq = kvm_loongson3_priority_to_irq; register_die_notifier(&kvm_mips_csr_die_notifier); + ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); + if (ret) { + unregister_die_notifier(&kvm_mips_csr_die_notifier); + return ret; + } return 0; } -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 22/50] KVM: MIPS: Setup VZ emulation? directly from kvm_mips_init()
Invoke kvm_mips_emulation_init() directly from kvm_mips_init() instead of bouncing through kvm_init()=>kvm_arch_init(). Functionally, this is a glorified nop as invoking kvm_arch_init() is the very first action performed by kvm_init(). Emptying kvm_arch_init() will allow dropping the hook entirely once all architecture implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson Reviewed-by: Philippe Mathieu-Daudé --- arch/mips/kvm/mips.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index f0a6c245d1ff..75681281e2df 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -1012,7 +1012,7 @@ long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) int kvm_arch_init(void *opaque) { - return kvm_mips_emulation_init(); + return 0; } void kvm_arch_exit(void) @@ -1636,6 +1636,10 @@ static int __init kvm_mips_init(void) if (ret) return ret; + ret = kvm_mips_emulation_init(); + if (ret) + return ret; + ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); if (ret) -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 21/50] KVM: MIPS: Hardcode callbacks to hardware virtualization extensions
Now that KVM no longer supports trap-and-emulate (see commit 45c7e8af4a5e "MIPS: Remove KVM_TE support"), hardcode the MIPS callbacks to the virtualization callbacks. Hardcoding the callbacks eliminates the technically-unnecessary check on non-NULL kvm_mips_callbacks in kvm_arch_init(). MIPS has never supported multiple in-tree modules, i.e. barring an out-of-tree module, where copying and renaming kvm.ko counts as "out-of-tree", KVM could never encounter a non-NULL set of callbacks during module init. The callback check is also subtly broken, as it is not thread safe, i.e. if there were multiple modules, loading both concurrently would create a race between checking and setting kvm_mips_callbacks. Given that out-of-tree shenanigans are not the kernel's responsibility, hardcode the callbacks to simplify the code. Signed-off-by: Sean Christopherson --- arch/mips/include/asm/kvm_host.h | 2 +- arch/mips/kvm/Makefile | 2 +- arch/mips/kvm/callback.c | 14 -- arch/mips/kvm/mips.c | 9 ++--- arch/mips/kvm/vz.c | 7 --- 5 files changed, 8 insertions(+), 26 deletions(-) delete mode 100644 arch/mips/kvm/callback.c diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h index 28f0ba97db71..2803c9c21ef9 100644 --- a/arch/mips/include/asm/kvm_host.h +++ b/arch/mips/include/asm/kvm_host.h @@ -758,7 +758,7 @@ struct kvm_mips_callbacks { void (*vcpu_reenter)(struct kvm_vcpu *vcpu); }; extern struct kvm_mips_callbacks *kvm_mips_callbacks; -int kvm_mips_emulation_init(struct kvm_mips_callbacks **install_callbacks); +int kvm_mips_emulation_init(void); /* Debug: dump vcpu state */ int kvm_arch_vcpu_dump_regs(struct kvm_vcpu *vcpu); diff --git a/arch/mips/kvm/Makefile b/arch/mips/kvm/Makefile index 21ff75bcdbc4..805aeea2166e 100644 --- a/arch/mips/kvm/Makefile +++ b/arch/mips/kvm/Makefile @@ -17,4 +17,4 @@ kvm-$(CONFIG_CPU_LOONGSON64) += loongson_ipi.o kvm-y += vz.o obj-$(CONFIG_KVM) += kvm.o -obj-y += callback.o tlb.o +obj-y += tlb.o diff --git 
a/arch/mips/kvm/callback.c b/arch/mips/kvm/callback.c deleted file mode 100644 index d88aa2173fb0.. --- a/arch/mips/kvm/callback.c +++ /dev/null @@ -1,14 +0,0 @@ -/* - * This file is subject to the terms and conditions of the GNU General Public - * License. See the file "COPYING" in the main directory of this archive - * for more details. - * - * Copyright (C) 2012 MIPS Technologies, Inc. All rights reserved. - * Authors: Yann Le Du - */ - -#include -#include - -struct kvm_mips_callbacks *kvm_mips_callbacks; -EXPORT_SYMBOL_GPL(kvm_mips_callbacks); diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index af29490d9740..f0a6c245d1ff 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -1012,17 +1012,12 @@ long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) int kvm_arch_init(void *opaque) { - if (kvm_mips_callbacks) { - kvm_err("kvm: module already exists\n"); - return -EEXIST; - } - - return kvm_mips_emulation_init(&kvm_mips_callbacks); + return kvm_mips_emulation_init(); } void kvm_arch_exit(void) { - kvm_mips_callbacks = NULL; + } int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, diff --git a/arch/mips/kvm/vz.c b/arch/mips/kvm/vz.c index c706f5890a05..dafab003ea0d 100644 --- a/arch/mips/kvm/vz.c +++ b/arch/mips/kvm/vz.c @@ -3304,7 +3304,10 @@ static struct kvm_mips_callbacks kvm_vz_callbacks = { .vcpu_reenter = kvm_vz_vcpu_reenter, }; -int kvm_mips_emulation_init(struct kvm_mips_callbacks **install_callbacks) +/* FIXME: Get rid of the callbacks now that trap-and-emulate is gone. */ +struct kvm_mips_callbacks *kvm_mips_callbacks = &kvm_vz_callbacks; + +int kvm_mips_emulation_init(void) { if (!cpu_has_vz) return -ENODEV; @@ -3318,7 +3321,5 @@ int kvm_mips_emulation_init(struct kvm_mips_callbacks **install_callbacks) return -ENODEV; pr_info("Starting KVM with MIPS VZ extensions\n"); - - *install_callbacks = &kvm_vz_callbacks; return 0; } -- 2.38.1.584.g0f3c55d4c2-goog
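The effect of hardcoding the callbacks can be illustrated outside the kernel: when the pointer is bound by a static initializer, there is no window in which it is NULL and no check-then-set sequence that two loaders could race on. The sketch below is a userspace model only; struct demo_callbacks, vz_handle_exit(), demo_dispatch() and demo_emulation_init() are hypothetical names invented for the example, and the -1 return merely stands in for an error code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback table; not the kernel's struct kvm_mips_callbacks. */
struct demo_callbacks {
	int (*handle_exit)(int code);
};

static int vz_handle_exit(int code) { return code + 1; }

static struct demo_callbacks vz_callbacks = {
	.handle_exit = vz_handle_exit,
};

/*
 * Bound by a static initializer: the pointer is valid before any init
 * code runs, so no "already installed?" check is needed at runtime.
 */
struct demo_callbacks *demo_callbacks = &vz_callbacks;

/* Dispatch through the table; the pointer can never be observed as NULL. */
int demo_dispatch(int code)
{
	assert(demo_callbacks != NULL);
	return demo_callbacks->handle_exit(code);
}

/* Init now only probes for the feature; it no longer installs anything. */
int demo_emulation_init(int cpu_has_vz)
{
	if (!cpu_has_vz)
		return -1;
	return 0;
}
```

The init function takes no output parameter in this model, mirroring how the patch changes kvm_mips_emulation_init() to take void.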
[PATCH v2 20/50] KVM: arm64: Mark kvm_arm_init() and its unique descendants as __init
Tag kvm_arm_init() and its unique helper as __init, and tag data that is only ever modified under the kvm_arm_init() umbrella as read-only after init. Opportunistically name the boolean param in kvm_timer_hyp_init()'s prototype to match its definition. Signed-off-by: Sean Christopherson --- arch/arm64/include/asm/kvm_host.h | 14 ++--- arch/arm64/include/asm/kvm_mmu.h | 4 ++-- arch/arm64/kvm/arch_timer.c | 2 +- arch/arm64/kvm/arm.c | 34 +++ arch/arm64/kvm/mmu.c | 12 +-- arch/arm64/kvm/reset.c| 8 arch/arm64/kvm/sys_regs.c | 6 +++--- arch/arm64/kvm/vmid.c | 6 +++--- include/kvm/arm_arch_timer.h | 2 +- 9 files changed, 44 insertions(+), 44 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 5d5a887e63a5..4863fe356be1 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -66,8 +66,8 @@ enum kvm_mode kvm_get_mode(void); DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use); -extern unsigned int kvm_sve_max_vl; -int kvm_arm_init_sve(void); +extern unsigned int __ro_after_init kvm_sve_max_vl; +int __init kvm_arm_init_sve(void); u32 __attribute_const__ kvm_target_cpu(void); int kvm_reset_vcpu(struct kvm_vcpu *vcpu); @@ -793,7 +793,7 @@ int kvm_handle_cp10_id(struct kvm_vcpu *vcpu); void kvm_reset_sys_regs(struct kvm_vcpu *vcpu); -int kvm_sys_reg_table_init(void); +int __init kvm_sys_reg_table_init(void); /* MMIO helpers */ void kvm_mmio_write_buf(void *buf, unsigned int len, unsigned long data); @@ -824,9 +824,9 @@ int kvm_arm_pvtime_get_attr(struct kvm_vcpu *vcpu, int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr); -extern unsigned int kvm_arm_vmid_bits; -int kvm_arm_vmid_alloc_init(void); -void kvm_arm_vmid_alloc_free(void); +extern unsigned int __ro_after_init kvm_arm_vmid_bits; +int __init kvm_arm_vmid_alloc_init(void); +void __init kvm_arm_vmid_alloc_free(void); void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid); void kvm_arm_vmid_clear_active(void); @@ 
-909,7 +909,7 @@ static inline void kvm_clr_pmu_events(u32 clr) {} void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu); void kvm_vcpu_put_sysregs_vhe(struct kvm_vcpu *vcpu); -int kvm_set_ipa_limit(void); +int __init kvm_set_ipa_limit(void); #define __KVM_HAVE_ARCH_VM_ALLOC struct kvm *kvm_arch_alloc_vm(void); diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 7784081088e7..ced5b0028933 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -163,7 +163,7 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size, void __iomem **haddr); int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size, void **haddr); -void free_hyp_pgds(void); +void __init free_hyp_pgds(void); void stage2_unmap_vm(struct kvm *kvm); int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu); @@ -175,7 +175,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu); phys_addr_t kvm_mmu_get_httbr(void); phys_addr_t kvm_get_idmap_vector(void); -int kvm_mmu_init(u32 *hyp_va_bits); +int __init kvm_mmu_init(u32 *hyp_va_bits); static inline void *__kvm_vector_slot2addr(void *base, enum arm64_hyp_spectre_vector slot) diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c index 33fca1a691a5..23346585a294 100644 --- a/arch/arm64/kvm/arch_timer.c +++ b/arch/arm64/kvm/arch_timer.c @@ -1113,7 +1113,7 @@ static int kvm_irq_init(struct arch_timer_kvm_info *info) return 0; } -int kvm_timer_hyp_init(bool has_gic) +int __init kvm_timer_hyp_init(bool has_gic) { struct arch_timer_kvm_info *info; int err; diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index d3a4db1abf32..4d34abcfc9a9 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -1513,7 +1513,7 @@ static int kvm_init_vector_slots(void) return 0; } -static void cpu_prepare_hyp_mode(int cpu) +static void __init cpu_prepare_hyp_mode(int cpu) { struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu); unsigned long 
tcr; @@ -1739,26 +1739,26 @@ static struct notifier_block hyp_init_cpu_pm_nb = { .notifier_call = hyp_init_cpu_pm_notifier, }; -static void hyp_cpu_pm_init(void) +static void __init hyp_cpu_pm_init(void) { if (!is_protected_kvm_enabled()) cpu_pm_register_notifier(&hyp_init_cpu_pm_nb); } -static void hyp_cpu_pm_exit(void) +static void __init hyp_cpu_pm_exit(void) { if (!is_protected_kvm_enabled()) cpu_pm_unregister_notifier(&hyp_init_cpu_pm_nb); } #else -static inline void hyp_cpu_pm_init(void) +static inline void __init hyp_cpu_pm_init(void) { } -static inline void
[PATCH v2 19/50] KVM: arm64: Do arm/arch initialization without bouncing through kvm_init()
Do arm/arch specific initialization directly in arm's module_init(), now called kvm_arm_init(), instead of bouncing through kvm_init() to reach kvm_arch_init(). Invoking kvm_arch_init() is the very first action performed by kvm_init(), so from an initialization perspective this is a glorified nop. Avoiding kvm_arch_init() also fixes a mostly benign bug as kvm_arch_exit() doesn't properly unwind if a later stage of kvm_init() fails. While the soon-to-be-deleted comment about compiling as a module being unsupported is correct, kvm_arch_exit() can still be called by kvm_init() if any step after the call to kvm_arch_init() succeeds. Add a FIXME to call out that pKVM initialization isn't unwound if kvm_init() fails, which is a pre-existing problem inherited from kvm_arch_exit(). Making kvm_arch_init() a nop will also allow dropping kvm_arch_init() and kvm_arch_exit() entirely once all other architectures follow suit. Signed-off-by: Sean Christopherson --- arch/arm64/kvm/arm.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index e6f6fcfe6bcc..d3a4db1abf32 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -2195,7 +2195,7 @@ void kvm_arch_irq_bypass_start(struct irq_bypass_consumer *cons) /** * Initialize Hyp-mode and memory mappings on all CPUs. */ -int kvm_arch_init(void *opaque) +int kvm_arm_init(void) { int err; bool in_hyp_mode; @@ -2269,6 +2269,14 @@ int kvm_arch_init(void *opaque) kvm_info("Hyp mode initialized successfully\n"); } + /* +* FIXME: Do something reasonable if kvm_init() fails after pKVM +* hypervisor protection is finalized. 
+*/ + err = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); + if (err) + goto out_subs; + return 0; out_subs: @@ -2281,10 +2289,15 @@ int kvm_arch_init(void *opaque) return err; } +int kvm_arch_init(void *opaque) +{ + return 0; +} + /* NOP: Compiling as a module not supported */ void kvm_arch_exit(void) { - kvm_unregister_perf_callbacks(); + } static int __init early_kvm_mode_cfg(char *arg) @@ -2325,10 +2338,4 @@ enum kvm_mode kvm_get_mode(void) return kvm_mode; } -static int arm_init(void) -{ - int rc = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); - return rc; -} - -module_init(arm_init); +module_init(kvm_arm_init); -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 18/50] KVM: arm64: Unregister perf callbacks if hypervisor finalization fails
Undo everything done by init_subsystems() if a later initialization step fails, i.e. unregister perf callbacks in addition to unregistering the power management notifier. Fixes: bfa79a805454 ("KVM: arm64: Elevate hypervisor mappings creation at EL2") Signed-off-by: Sean Christopherson --- arch/arm64/kvm/arm.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index fa986ebb4793..e6f6fcfe6bcc 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -1839,12 +1839,21 @@ static int init_subsystems(void) kvm_register_perf_callbacks(NULL); out: + if (err) + hyp_cpu_pm_exit(); + if (err || !is_protected_kvm_enabled()) on_each_cpu(_kvm_arch_hardware_disable, NULL, 1); return err; } +static void teardown_subsystems(void) +{ + kvm_unregister_perf_callbacks(); + hyp_cpu_pm_exit(); +} + static void teardown_hyp_mode(void) { int cpu; @@ -2242,7 +2251,7 @@ int kvm_arch_init(void *opaque) err = init_subsystems(); if (err) - goto out_subs; + goto out_hyp; if (!in_hyp_mode) { err = finalize_hyp_mode(); @@ -2263,7 +2272,7 @@ int kvm_arch_init(void *opaque) return 0; out_subs: - hyp_cpu_pm_exit(); + teardown_subsystems(); out_hyp: if (!in_hyp_mode) teardown_hyp_mode(); -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 17/50] KVM: arm64: Free hypervisor allocations if vector slot init fails
Teardown hypervisor mode if vector slot setup fails in order to avoid leaking any allocations done by init_hyp_mode(). Fixes: b881cdce77b4 ("KVM: arm64: Allocate hyp vectors statically") Signed-off-by: Sean Christopherson --- arch/arm64/kvm/arm.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 07f5cef5c33b..fa986ebb4793 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -2237,18 +2237,18 @@ int kvm_arch_init(void *opaque) err = kvm_init_vector_slots(); if (err) { kvm_err("Cannot initialise vector slots\n"); - goto out_err; - } - - err = init_subsystems(); - if (err) goto out_hyp; + } + + err = init_subsystems(); + if (err) + goto out_subs; if (!in_hyp_mode) { err = finalize_hyp_mode(); if (err) { kvm_err("Failed to finalize Hyp protection\n"); - goto out_hyp; + goto out_subs; } } @@ -2262,8 +2262,9 @@ int kvm_arch_init(void *opaque) return 0; -out_hyp: +out_subs: hyp_cpu_pm_exit(); +out_hyp: if (!in_hyp_mode) teardown_hyp_mode(); out_err: -- 2.38.1.584.g0f3c55d4c2-goog
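The label reshuffle in this patch follows the usual stacked-goto unwind idiom: labels are ordered so that a failure at a given step falls through exactly the teardown calls for the steps that already succeeded. The following userspace sketch shows the general pattern rather than a transcription of kvm_arch_init(); init_hyp(), init_subs(), demo_init() and the fail_at parameter are all hypothetical and exist only to exercise each error path.

```c
#include <assert.h>

/* Hypothetical state flags tracking what has been set up. */
static int hyp_up;
static int subs_up;

static int init_hyp(void) { hyp_up = 1; return 0; }
static void teardown_hyp(void) { hyp_up = 0; }

static int init_subs(void)
{
	/* Subsystems are layered on top of hyp mode. */
	assert(hyp_up);
	subs_up = 1;
	return 0;
}

static void teardown_subs(void) { subs_up = 0; }

/*
 * fail_at: 0 = succeed, 1 = fail at vector slot setup,
 *          2 = fail at subsystem init, 3 = fail at finalization.
 */
int demo_init(int fail_at)
{
	int err;

	err = init_hyp();
	if (err)
		goto out_err;

	err = (fail_at == 1) ? -1 : 0;		/* vector slot setup */
	if (err)
		goto out_hyp;			/* hyp allocations must be freed */

	err = (fail_at == 2) ? -1 : init_subs();
	if (err)
		goto out_hyp;			/* subsystems never came up */

	err = (fail_at == 3) ? -1 : 0;		/* finalization */
	if (err)
		goto out_subs;			/* undo subsystems, then hyp */

	return 0;

out_subs:
	teardown_subs();
out_hyp:
	teardown_hyp();
out_err:
	return err;
}
```

With fail_at set to 1, control reaches teardown_hyp(), which models the leak this patch closes by jumping to out_hyp instead of out_err when vector slot setup fails.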
[PATCH v2 16/50] KVM: arm64: Simplify the CPUHP logic
From: Marc Zyngier For a number of historical reasons, the KVM/arm64 hotplug setup is pretty complicated, and we have two extra CPUHP notifiers for vGIC and timers. It looks pretty pointless, and gets in the way of further changes. So let's just expose some helpers that can be called from the core CPUHP callback, and get rid of everything else. This gives us the opportunity to drop a useless notifier entry, as well as tidy-up the timer enable/disable, which was a bit odd. Signed-off-by: Marc Zyngier Signed-off-by: Isaku Yamahata Signed-off-by: Sean Christopherson --- arch/arm64/kvm/arch_timer.c | 27 ++- arch/arm64/kvm/arm.c| 13 + arch/arm64/kvm/vgic/vgic-init.c | 19 ++- include/kvm/arm_arch_timer.h| 4 include/kvm/arm_vgic.h | 4 include/linux/cpuhotplug.h | 3 --- 6 files changed, 33 insertions(+), 37 deletions(-) diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c index bb24a76b4224..33fca1a691a5 100644 --- a/arch/arm64/kvm/arch_timer.c +++ b/arch/arm64/kvm/arch_timer.c @@ -811,10 +811,18 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu) ptimer->host_timer_irq_flags = host_ptimer_irq_flags; } -static void kvm_timer_init_interrupt(void *info) +void kvm_timer_cpu_up(void) { enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags); - enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags); + if (host_ptimer_irq) + enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags); +} + +void kvm_timer_cpu_down(void) +{ + disable_percpu_irq(host_vtimer_irq); + if (host_ptimer_irq) + disable_percpu_irq(host_ptimer_irq); } int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value) @@ -976,18 +984,6 @@ void kvm_arm_timer_write_sysreg(struct kvm_vcpu *vcpu, preempt_enable(); } -static int kvm_timer_starting_cpu(unsigned int cpu) -{ - kvm_timer_init_interrupt(NULL); - return 0; -} - -static int kvm_timer_dying_cpu(unsigned int cpu) -{ - disable_percpu_irq(host_vtimer_irq); - return 0; -} - static int timer_irq_set_vcpu_affinity(struct irq_data *d, 
void *vcpu) { if (vcpu) @@ -1185,9 +1181,6 @@ int kvm_timer_hyp_init(bool has_gic) goto out_free_irq; } - cpuhp_setup_state(CPUHP_AP_KVM_ARM_TIMER_STARTING, - "kvm/arm/timer:starting", kvm_timer_starting_cpu, - kvm_timer_dying_cpu); return 0; out_free_irq: free_percpu_irq(host_vtimer_irq, kvm_get_running_vcpus()); diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index c6732ac329ca..07f5cef5c33b 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -1670,7 +1670,15 @@ static void _kvm_arch_hardware_enable(void *discard) int kvm_arch_hardware_enable(void) { + int was_enabled = __this_cpu_read(kvm_arm_hardware_enabled); + _kvm_arch_hardware_enable(NULL); + + if (!was_enabled) { + kvm_vgic_cpu_up(); + kvm_timer_cpu_up(); + } + return 0; } @@ -1684,6 +1692,11 @@ static void _kvm_arch_hardware_disable(void *discard) void kvm_arch_hardware_disable(void) { + if (__this_cpu_read(kvm_arm_hardware_enabled)) { + kvm_timer_cpu_down(); + kvm_vgic_cpu_down(); + } + if (!is_protected_kvm_enabled()) _kvm_arch_hardware_disable(NULL); } diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c index f6d4f4052555..6c7f6ae21ec0 100644 --- a/arch/arm64/kvm/vgic/vgic-init.c +++ b/arch/arm64/kvm/vgic/vgic-init.c @@ -465,17 +465,15 @@ int kvm_vgic_map_resources(struct kvm *kvm) /* GENERIC PROBE */ -static int vgic_init_cpu_starting(unsigned int cpu) +void kvm_vgic_cpu_up(void) { enable_percpu_irq(kvm_vgic_global_state.maint_irq, 0); - return 0; } -static int vgic_init_cpu_dying(unsigned int cpu) +void kvm_vgic_cpu_down(void) { disable_percpu_irq(kvm_vgic_global_state.maint_irq); - return 0; } static irqreturn_t vgic_maintenance_handler(int irq, void *data) @@ -584,19 +582,6 @@ int kvm_vgic_hyp_init(void) return ret; } - ret = cpuhp_setup_state(CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING, - "kvm/arm/vgic:starting", - vgic_init_cpu_starting, vgic_init_cpu_dying); - if (ret) { - kvm_err("Cannot register vgic CPU notifier\n"); - goto out_free_irq; - } - 
kvm_info("vgic interrupt IRQ%d\n", kvm_vgic_global_state.maint_irq); return 0; - -out_free_irq: - free_percpu_irq(kvm_vgic_global_state.maint_irq, - kvm_get_running_vcpus()); - return ret; } diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h index cd6d8f260eab..1638418f72dd 100644 ---
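A note on the was_enabled check in kvm_arch_hardware_enable()/kvm_arch_hardware_disable() above: the per-CPU IRQ helpers should only run on an actual state transition, otherwise a repeated enable would call enable_percpu_irq() twice. The following is a minimal userspace sketch of that idea, not the kernel code. The struct cpu_state, hw_enable(), hw_disable() and the counters are hypothetical stand-ins, and the sketch leaves out the protected-KVM case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one CPU's state; not a kernel structure. */
struct cpu_state {
	bool hw_enabled;      /* stands in for kvm_arm_hardware_enabled */
	int  irq_enable_cnt;  /* times the per-CPU IRQs were enabled */
	int  irq_disable_cnt; /* times the per-CPU IRQs were disabled */
};

/* Enable hardware; touch the per-CPU IRQs only on a 0 -> 1 transition. */
static int hw_enable(struct cpu_state *c)
{
	bool was_enabled;

	assert(c != NULL);
	was_enabled = c->hw_enabled;
	c->hw_enabled = true;

	if (!was_enabled)
		c->irq_enable_cnt++; /* models kvm_vgic_cpu_up() + kvm_timer_cpu_up() */

	return 0;
}

/* Disable hardware; touch the per-CPU IRQs only on a 1 -> 0 transition. */
static void hw_disable(struct cpu_state *c)
{
	assert(c != NULL);

	if (c->hw_enabled)
		c->irq_disable_cnt++; /* models kvm_timer_cpu_down() + kvm_vgic_cpu_down() */

	c->hw_enabled = false;
}
```

Calling hw_enable() twice in a row bumps irq_enable_cnt only once, which is the property the patch relies on.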
[PATCH v2 15/50] KVM: x86: Serialize vendor module initialization (hardware setup)
Acquire a new mutex, vendor_module_lock, in kvm_x86_vendor_init() while doing hardware setup to ensure that concurrent calls are fully serialized. KVM rejects attempts to load vendor modules if a different module has already been loaded, but doesn't handle the case where multiple vendor modules are loaded at the same time, and module_init() doesn't run under the global module_mutex.

Note, in practice, this is likely a benign bug as no platform exists that supports both SVM and VMX, i.e. barring a weird VM setup, one of the vendor modules is guaranteed to fail a support check before modifying common KVM state.

Alternatively, KVM could perform an atomic CMPXCHG on .hardware_enable, but that comes with its own ugliness as it would require setting .hardware_enable before success is guaranteed, e.g. attempting to load the "wrong" module could result in spurious failure to load the "right" module.

Introduce a new mutex as using kvm_lock is extremely deadlock prone due to kvm_lock being taken under cpus_write_lock(), and in the future, under cpus_read_lock(). Any operation that takes cpus_read_lock() while holding kvm_lock would potentially deadlock, e.g. kvm_timer_init() takes cpus_read_lock() to register a callback. In theory, KVM could avoid such problematic paths, i.e. do less setup under kvm_lock, but avoiding all calls to cpus_read_lock() is subtly difficult and thus fragile. E.g. updating static calls also acquires cpus_read_lock().

Inverting the lock ordering, i.e. always taking kvm_lock outside cpus_read_lock(), is not a viable option as kvm_lock is taken in various callbacks that may be invoked under cpus_read_lock(), e.g. x86's kvmclock_cpufreq_notifier().

The lockdep splat below is dependent on future patches to take cpus_read_lock() in hardware_enable_all(), but as above, deadlock is already possible.
== WARNING: possible circular locking dependency detected 6.0.0-smp--7ec93244f194-init2 #27 Tainted: G O -- stable/251833 is trying to acquire lock: c097ea28 (kvm_lock){+.+.}-{3:3}, at: hardware_enable_all+0x1f/0xc0 [kvm] but task is already holding lock: a2456828 (cpu_hotplug_lock){}-{0:0}, at: hardware_enable_all+0xf/0xc0 [kvm] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (cpu_hotplug_lock){}-{0:0}: cpus_read_lock+0x2a/0xa0 __cpuhp_setup_state+0x2b/0x60 __kvm_x86_vendor_init+0x16a/0x1870 [kvm] kvm_x86_vendor_init+0x23/0x40 [kvm] 0xc0a4d02b do_one_initcall+0x110/0x200 do_init_module+0x4f/0x250 load_module+0x1730/0x18f0 __se_sys_finit_module+0xca/0x100 __x64_sys_finit_module+0x1d/0x20 do_syscall_64+0x3d/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> #0 (kvm_lock){+.+.}-{3:3}: __lock_acquire+0x16f4/0x30d0 lock_acquire+0xb2/0x190 __mutex_lock+0x98/0x6f0 mutex_lock_nested+0x1b/0x20 hardware_enable_all+0x1f/0xc0 [kvm] kvm_dev_ioctl+0x45e/0x930 [kvm] __se_sys_ioctl+0x77/0xc0 __x64_sys_ioctl+0x1d/0x20 do_syscall_64+0x3d/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd other info that might help us debug this: Possible unsafe locking scenario: CPU0CPU1 lock(cpu_hotplug_lock); lock(kvm_lock); lock(cpu_hotplug_lock); lock(kvm_lock); *** DEADLOCK *** 1 lock held by stable/251833: #0: a2456828 (cpu_hotplug_lock){}-{0:0}, at: hardware_enable_all+0xf/0xc0 [kvm] Signed-off-by: Sean Christopherson --- Documentation/virt/kvm/locking.rst | 6 ++ arch/x86/kvm/x86.c | 18 -- 2 files changed, 22 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst index 845a561629f1..132a9e5436e5 100644 --- a/Documentation/virt/kvm/locking.rst +++ b/Documentation/virt/kvm/locking.rst @@ -282,3 +282,9 @@ time it will be set using the Dirty tracking mechanism described above. 
wakeup notification event since external interrupts from the assigned devices happens, we will find the vCPU on the list to wakeup. + +``vendor_module_lock`` + +:Type: mutex +:Arch: x86 +:Protects: loading a vendor module (kvm_amd or kvm_intel) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b33932fca36e..45184ca89317 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -128,6 +128,7 @@ static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu); static
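To make the serialization described in this patch concrete, here is a small userspace sketch, assuming a pthread mutex in place of the kernel mutex. It is not the actual x86.c change (the hunk is cut off above); struct vendor_ops, vendor_init(), vendor_init_locked(), vendor_exit(), setup_ok() and setup_unsupported() are hypothetical names. The point it shows is that the "is another vendor loaded?" check, the fallible hardware setup and the publication of the ops all happen under one dedicated lock, so two concurrent loaders cannot both pass the check.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the vendor init ops; not the kernel type. */
struct vendor_ops {
	const char *name;
	int (*hardware_setup)(void);
};

static pthread_mutex_t vendor_module_lock = PTHREAD_MUTEX_INITIALIZER;
static const struct vendor_ops *loaded_ops; /* NULL until a vendor wins */

/* Example setup callbacks, purely illustrative. */
static int setup_ok(void) { return 0; }
static int setup_unsupported(void) { return -EOPNOTSUPP; }

/* Caller must hold vendor_module_lock. */
static int vendor_init_locked(const struct vendor_ops *ops)
{
	int r;

	if (loaded_ops)
		return -EEXIST; /* a vendor module is already loaded */

	r = ops->hardware_setup();
	if (r)
		return r; /* nothing was published, nothing to undo */

	loaded_ops = ops; /* publish only after setup succeeded */
	return 0;
}

static int vendor_init(const struct vendor_ops *ops)
{
	int r;

	pthread_mutex_lock(&vendor_module_lock);
	r = vendor_init_locked(ops);
	pthread_mutex_unlock(&vendor_module_lock);
	return r;
}

static void vendor_exit(void)
{
	pthread_mutex_lock(&vendor_module_lock);
	assert(loaded_ops != NULL); /* exit is only valid after a successful init */
	loaded_ops = NULL;
	pthread_mutex_unlock(&vendor_module_lock);
}
```

Because the ops are published only after hardware_setup() succeeds, a failed load of the "wrong" module leaves no state behind that could make a later load of the "right" module fail spuriously, which is the drawback the commit message attributes to the CMPXCHG alternative.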
[PATCH v2 14/50] KVM: VMX: Do _all_ initialization before exposing /dev/kvm to userspace
Call kvm_init() only after _all_ setup is complete, as kvm_init() exposes /dev/kvm to userspace and thus allows userspace to create VMs (and call other ioctls). E.g. KVM will encounter a NULL pointer when attempting to add a vCPU to the per-CPU loaded_vmcss_on_cpu list if userspace is able to create a VM before vmx_init() configures said list. BUG: kernel NULL pointer dereference, address: 0008 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP CPU: 6 PID: 1143 Comm: stable Not tainted 6.0.0-rc7+ #988 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:vmx_vcpu_load_vmcs+0x68/0x230 [kvm_intel] vmx_vcpu_load+0x16/0x60 [kvm_intel] kvm_arch_vcpu_load+0x32/0x1f0 [kvm] vcpu_load+0x2f/0x40 [kvm] kvm_arch_vcpu_create+0x231/0x310 [kvm] kvm_vm_ioctl+0x79f/0xe10 [kvm] ? handle_mm_fault+0xb1/0x220 __x64_sys_ioctl+0x80/0xb0 do_syscall_64+0x2b/0x50 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f5a6b05743b Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel(+) kvm irqbypass Cc: sta...@vger.kernel.org Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/vmx.c | 34 +- 1 file changed, 21 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 8e81cd94407d..76185a7a7ded 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -8521,19 +8521,23 @@ static void vmx_cleanup_l1d_flush(void) l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO; } -static void vmx_exit(void) +static void __vmx_exit(void) { + allow_smaller_maxphyaddr = false; + #ifdef CONFIG_KEXEC_CORE RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL); synchronize_rcu(); #endif - - kvm_exit(); - kvm_x86_vendor_exit(); - vmx_cleanup_l1d_flush(); +} - allow_smaller_maxphyaddr = false; +static void vmx_exit(void) +{ + kvm_exit(); + kvm_x86_vendor_exit(); + + __vmx_exit(); } module_exit(vmx_exit); @@ -8551,11 +8555,6 @@ static int __init vmx_init(void) if (r) 
return r; - r = kvm_init(_init_ops, sizeof(struct vcpu_vmx), -__alignof__(struct vcpu_vmx), THIS_MODULE); - if (r) - goto err_kvm_init; - /* * Must be called after common x86 init so enable_ept is properly set * up. Hand the parameter mitigation value in which was stored in @@ -8589,11 +8588,20 @@ static int __init vmx_init(void) if (!enable_ept) allow_smaller_maxphyaddr = true; + /* +* Common KVM initialization _must_ come last, after this, /dev/kvm is +* exposed to userspace! +*/ + r = kvm_init(_init_ops, sizeof(struct vcpu_vmx), +__alignof__(struct vcpu_vmx), THIS_MODULE); + if (r) + goto err_kvm_init; + return 0; -err_l1d_flush: - vmx_exit(); err_kvm_init: + __vmx_exit(); +err_l1d_flush: kvm_x86_vendor_exit(); return r; } -- 2.38.1.584.g0f3c55d4c2-goog
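The ordering rule in this patch (finish all private setup, then expose the device) can be illustrated with a small userspace model. This is only a sketch under simplifying assumptions: dev_exposed stands in for /dev/kvm existing, loaded_on_cpu[] for the per-CPU loaded_vmcss_on_cpu lists, and init_lists(), expose_dev(), create_vcpu() and module_init_model() are hypothetical names. The assert in create_vcpu() marks the spot where the real code would hit the NULL pointer dereference from the oops above if the lists were still uninitialized.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Minimal doubly linked list head, modeled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static struct list_head loaded_on_cpu[NR_CPUS]; /* zeroed: next == NULL */
static bool dev_exposed;                        /* models /dev/kvm existing */

static void init_lists(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		loaded_on_cpu[cpu].next = &loaded_on_cpu[cpu];
		loaded_on_cpu[cpu].prev = &loaded_on_cpu[cpu];
	}
}

static int expose_dev(void)
{
	dev_exposed = true;
	return 0;
}

/* What a userspace ioctl would end up doing once the device exists. */
static int create_vcpu(int cpu, struct list_head *node)
{
	struct list_head *head;

	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!dev_exposed)
		return -ENODEV;

	head = &loaded_on_cpu[cpu];
	assert(head->next != NULL); /* uninitialized list == the NULL deref */

	/* Insert node right after head. */
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
	return 0;
}

static int module_init_model(void)
{
	init_lists();        /* all private setup first ... */
	return expose_dev(); /* ... exposing the device must come last */
}
```

If expose_dev() were called before init_lists(), a create_vcpu() racing with module load would trip the assert, which mirrors the window that the patch closes.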
[PATCH v2 13/50] KVM: x86: Move guts of kvm_arch_init() to standalone helper
Move the guts of kvm_arch_init() to a new helper, kvm_x86_vendor_init(), so that VMX can do _all_ arch and vendor initialization before calling kvm_init(). Calling kvm_init() must be the _very_ last step during init, as kvm_init() exposes /dev/kvm to userspace, i.e. allows creating VMs. No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 3 +++ arch/x86/kvm/svm/svm.c | 23 +-- arch/x86/kvm/vmx/vmx.c | 21 +++-- arch/x86/kvm/x86.c | 15 +-- 4 files changed, 52 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 70af7240a1d5..04a9ae66fb8d 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1758,6 +1758,9 @@ extern struct kvm_x86_ops kvm_x86_ops; #define KVM_X86_OP_OPTIONAL_RET0 KVM_X86_OP #include +int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops); +void kvm_x86_vendor_exit(void); + #define __KVM_HAVE_ARCH_VM_ALLOC static inline struct kvm *kvm_arch_alloc_vm(void) { diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 91352d692845..19e81a99c58f 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -5091,15 +5091,34 @@ static struct kvm_x86_init_ops svm_init_ops __initdata = { static int __init svm_init(void) { + int r; + __unused_size_checks(); - return kvm_init(_init_ops, sizeof(struct vcpu_svm), - __alignof__(struct vcpu_svm), THIS_MODULE); + r = kvm_x86_vendor_init(_init_ops); + if (r) + return r; + + /* +* Common KVM initialization _must_ come last, after this, /dev/kvm is +* exposed to userspace! 
+*/ + r = kvm_init(_init_ops, sizeof(struct vcpu_svm), +__alignof__(struct vcpu_svm), THIS_MODULE); + if (r) + goto err_kvm_init; + + return 0; + +err_kvm_init: + kvm_x86_vendor_exit(); + return r; } static void __exit svm_exit(void) { kvm_exit(); + kvm_x86_vendor_exit(); } module_init(svm_init) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b8bf95b9710d..8e81cd94407d 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -8529,6 +8529,7 @@ static void vmx_exit(void) #endif kvm_exit(); + kvm_x86_vendor_exit(); vmx_cleanup_l1d_flush(); @@ -8546,23 +8547,25 @@ static int __init vmx_init(void) */ hv_init_evmcs(); + r = kvm_x86_vendor_init(_init_ops); + if (r) + return r; + r = kvm_init(_init_ops, sizeof(struct vcpu_vmx), __alignof__(struct vcpu_vmx), THIS_MODULE); if (r) - return r; + goto err_kvm_init; /* -* Must be called after kvm_init() so enable_ept is properly set +* Must be called after common x86 init so enable_ept is properly set * up. Hand the parameter mitigation value in which was stored in * the pre module init parser. If no parameter was given, it will * contain 'auto' which will be turned into the default 'cond' * mitigation mode. 
*/ r = vmx_setup_l1d_flush(vmentry_l1d_flush_param); - if (r) { - vmx_exit(); - return r; - } + if (r) + goto err_l1d_flush; vmx_setup_fb_clear_ctrl(); @@ -8587,5 +8590,11 @@ static int __init vmx_init(void) allow_smaller_maxphyaddr = true; return 0; + +err_l1d_flush: + vmx_exit(); +err_kvm_init: + kvm_x86_vendor_exit(); + return r; } module_init(vmx_init); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 915d57c3b41d..b33932fca36e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9278,7 +9278,16 @@ static inline void kvm_ops_update(struct kvm_x86_init_ops *ops) int kvm_arch_init(void *opaque) { - struct kvm_x86_init_ops *ops = opaque; + return 0; +} + +void kvm_arch_exit(void) +{ + +} + +int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops) +{ u64 host_pat; int r; @@ -9410,8 +9419,9 @@ int kvm_arch_init(void *opaque) kmem_cache_destroy(x86_emulator_cache); return r; } +EXPORT_SYMBOL_GPL(kvm_x86_vendor_init); -void kvm_arch_exit(void) +void kvm_x86_vendor_exit(void) { kvm_unregister_perf_callbacks(); @@ -9440,6 +9450,7 @@ void kvm_arch_exit(void) WARN_ON(static_branch_unlikely(_xen_enabled.key)); #endif } +EXPORT_SYMBOL_GPL(kvm_x86_vendor_exit); static int __kvm_emulate_halt(struct kvm_vcpu *vcpu, int state, int reason) { -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 12/50] KVM: VMX: Move Hyper-V eVMCS initialization to helper
Move Hyper-V's eVMCS initialization to a dedicated helper to clean up vmx_init(), and add a comment to call out that the Hyper-V init code doesn't need to be unwound if vmx_init() ultimately fails. No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/vmx.c | 73 +- 1 file changed, 43 insertions(+), 30 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index c0de7160700b..b8bf95b9710d 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -523,6 +523,8 @@ static inline void vmx_segment_cache_clear(struct vcpu_vmx *vmx) static unsigned long host_idt_base; #if IS_ENABLED(CONFIG_HYPERV) +static struct kvm_x86_ops vmx_x86_ops __initdata; + static bool __read_mostly enlightened_vmcs = true; module_param(enlightened_vmcs, bool, 0444); @@ -551,6 +553,43 @@ static int hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu) return 0; } +static __init void hv_init_evmcs(void) +{ + int cpu; + + if (!enlightened_vmcs) + return; + + /* +* Enlightened VMCS usage should be recommended and the host needs +* to support eVMCS v1 or above. 
+*/ + if (ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED && + (ms_hyperv.nested_features & HV_X64_ENLIGHTENED_VMCS_VERSION) >= +KVM_EVMCS_VERSION) { + + /* Check that we have assist pages on all online CPUs */ + for_each_online_cpu(cpu) { + if (!hv_get_vp_assist_page(cpu)) { + enlightened_vmcs = false; + break; + } + } + + if (enlightened_vmcs) { + pr_info("KVM: vmx: using Hyper-V Enlightened VMCS\n"); + static_branch_enable(_evmcs); + } + + if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH) + vmx_x86_ops.enable_l2_tlb_flush + = hv_enable_l2_tlb_flush; + + } else { + enlightened_vmcs = false; + } +} + static void hv_reset_evmcs(void) { struct hv_vp_assist_page *vp_ap; @@ -577,6 +616,7 @@ static void hv_reset_evmcs(void) } #else /* IS_ENABLED(CONFIG_HYPERV) */ +static void hv_init_evmcs(void) {} static void hv_reset_evmcs(void) {} #endif /* IS_ENABLED(CONFIG_HYPERV) */ @@ -8500,38 +8540,11 @@ static int __init vmx_init(void) { int r, cpu; -#if IS_ENABLED(CONFIG_HYPERV) /* -* Enlightened VMCS usage should be recommended and the host needs -* to support eVMCS v1 or above. We can also disable eVMCS support -* with module parameter. +* Note, hv_init_evmcs() touches only VMX knobs, i.e. there's nothing +* to unwind if a later step fails. 
*/ - if (enlightened_vmcs && - ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED && - (ms_hyperv.nested_features & HV_X64_ENLIGHTENED_VMCS_VERSION) >= - KVM_EVMCS_VERSION) { - - /* Check that we have assist pages on all online CPUs */ - for_each_online_cpu(cpu) { - if (!hv_get_vp_assist_page(cpu)) { - enlightened_vmcs = false; - break; - } - } - - if (enlightened_vmcs) { - pr_info("KVM: vmx: using Hyper-V Enlightened VMCS\n"); - static_branch_enable(_evmcs); - } - - if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH) - vmx_x86_ops.enable_l2_tlb_flush - = hv_enable_l2_tlb_flush; - - } else { - enlightened_vmcs = false; - } -#endif + hv_init_evmcs(); r = kvm_init(_init_ops, sizeof(struct vcpu_vmx), __alignof__(struct vcpu_vmx), THIS_MODULE); -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 11/50] KVM: VMX: Don't bother disabling eVMCS static key on module exit
Don't disable the eVMCS static key on module exit, kvm_intel.ko owns the key so there can't possibly be users after the kvm_intel.ko is unloaded, at least not without much bigger issues. Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/vmx.c | 4 1 file changed, 4 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index d85d175dca70..c0de7160700b 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -8490,10 +8490,6 @@ static void vmx_exit(void) kvm_exit(); -#if IS_ENABLED(CONFIG_HYPERV) - if (static_branch_unlikely(_evmcs)) - static_branch_disable(_evmcs); -#endif vmx_cleanup_l1d_flush(); allow_smaller_maxphyaddr = false; -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 10/50] KVM: VMX: Reset eVMCS controls in VP assist page during hardware disabling
Reset the eVMCS controls in the per-CPU VP assist page during hardware disabling instead of waiting until kvm-intel's module exit. The controls are activated if and only if KVM creates a VM, i.e. don't need to be reset if hardware is never enabled. Doing the reset during hardware disabling will naturally fix a potential NULL pointer deref bug once KVM disables CPU hotplug while enabling and disabling hardware (which is necessary to fix a variety of bugs). If the kernel is running as the root partition, the VP assist page is unmapped during CPU hot unplug, and so KVM's clearing of the eVMCS controls needs to occur with CPU hot(un)plug disabled, otherwise KVM could attempt to write to a CPU's VP assist page after it's unmapped. Reported-by: Vitaly Kuznetsov Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/vmx.c | 50 +- 1 file changed, 30 insertions(+), 20 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index cea8c07f5229..d85d175dca70 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -551,6 +551,33 @@ static int hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu) return 0; } +static void hv_reset_evmcs(void) +{ + struct hv_vp_assist_page *vp_ap; + + if (!static_branch_unlikely(_evmcs)) + return; + + /* +* KVM should enable eVMCS if and only if all CPUs have a VP assist +* page, and should reject CPU onlining if eVMCS is enabled the CPU +* doesn't have a VP assist page allocated. +*/ + vp_ap = hv_get_vp_assist_page(smp_processor_id()); + if (WARN_ON_ONCE(!vp_ap)) + return; + + /* +* Reset everything to support using non-enlightened VMCS access later +* (e.g. 
when we reload the module with enlightened_vmcs=0) +*/ + vp_ap->nested_control.features.directhypercall = 0; + vp_ap->current_nested_vmcs = 0; + vp_ap->enlighten_vmentry = 0; +} + +#else /* IS_ENABLED(CONFIG_HYPERV) */ +static void hv_reset_evmcs(void) {} #endif /* IS_ENABLED(CONFIG_HYPERV) */ /* @@ -2496,6 +2523,8 @@ static void vmx_hardware_disable(void) if (cpu_vmxoff()) kvm_spurious_fault(); + hv_reset_evmcs(); + intel_pt_handle_vmx(0); } @@ -8462,27 +8491,8 @@ static void vmx_exit(void) kvm_exit(); #if IS_ENABLED(CONFIG_HYPERV) - if (static_branch_unlikely(_evmcs)) { - int cpu; - struct hv_vp_assist_page *vp_ap; - /* -* Reset everything to support using non-enlightened VMCS -* access later (e.g. when we reload the module with -* enlightened_vmcs=0) -*/ - for_each_online_cpu(cpu) { - vp_ap = hv_get_vp_assist_page(cpu); - - if (!vp_ap) - continue; - - vp_ap->nested_control.features.directhypercall = 0; - vp_ap->current_nested_vmcs = 0; - vp_ap->enlighten_vmentry = 0; - } - + if (static_branch_unlikely(_evmcs)) static_branch_disable(_evmcs); - } #endif vmx_cleanup_l1d_flush(); -- 2.38.1.584.g0f3c55d4c2-goog
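As a rough illustration of why the reset moves from a for_each_online_cpu() loop at module exit to the per-CPU hardware disable path, consider the following userspace sketch. It is a model only: struct assist_page is a trimmed-down, hypothetical stand-in for struct hv_vp_assist_page, and assist_page_of[], reset_evmcs_on() and hardware_disable_on() are invented for the example. Each CPU resets only its own page while it is still guaranteed to be mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical, trimmed-down stand-in for struct hv_vp_assist_page. */
struct assist_page {
	unsigned int  enlighten_vmentry;
	unsigned long current_nested_vmcs;
};

static struct assist_page pages[NR_CPUS];
static struct assist_page *assist_page_of[NR_CPUS]; /* NULL == unmapped */
static bool evmcs_enabled;

/* Runs on the CPU being disabled, so only that CPU's page is touched. */
static void reset_evmcs_on(int cpu)
{
	struct assist_page *ap;

	assert(cpu >= 0 && cpu < NR_CPUS);

	if (!evmcs_enabled)
		return;

	ap = assist_page_of[cpu];
	if (!ap) /* the patch WARNs here; the sketch simply bails out */
		return;

	ap->current_nested_vmcs = 0;
	ap->enlighten_vmentry = 0;
}

static void hardware_disable_on(int cpu)
{
	/* ... the VMXOFF step would go here ... */
	reset_evmcs_on(cpu);
}
```

In the sketch, disabling CPU 0 leaves CPU 1's page untouched, and a CPU without a mapped page is skipped rather than dereferenced.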
[PATCH v2 09/50] KVM: Drop arch hardware (un)setup hooks
Drop kvm_arch_hardware_setup() and kvm_arch_hardware_unsetup() now that all implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson Reviewed-by: Eric Farman # s390 Acked-by: Anup Patel --- arch/arm64/include/asm/kvm_host.h | 1 - arch/arm64/kvm/arm.c| 5 - arch/mips/include/asm/kvm_host.h| 1 - arch/mips/kvm/mips.c| 5 - arch/powerpc/include/asm/kvm_host.h | 1 - arch/powerpc/kvm/powerpc.c | 5 - arch/riscv/include/asm/kvm_host.h | 1 - arch/riscv/kvm/main.c | 5 - arch/s390/kvm/kvm-s390.c| 10 -- arch/x86/kvm/x86.c | 10 -- include/linux/kvm_host.h| 2 -- virt/kvm/kvm_main.c | 7 --- 12 files changed, 53 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 45e2136322ba..5d5a887e63a5 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -859,7 +859,6 @@ static inline bool kvm_system_needs_idmapped_vectors(void) void kvm_arm_vcpu_ptrauth_trap(struct kvm_vcpu *vcpu); -static inline void kvm_arch_hardware_unsetup(void) {} static inline void kvm_arch_sync_events(struct kvm *kvm) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 7b107fa540fa..c6732ac329ca 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -63,11 +63,6 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu) return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE; } -int kvm_arch_hardware_setup(void *opaque) -{ - return 0; -} - int kvm_arch_check_processor_compat(void *opaque) { return 0; diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h index 5cedb28e8a40..28f0ba97db71 100644 --- a/arch/mips/include/asm/kvm_host.h +++ b/arch/mips/include/asm/kvm_host.h @@ -888,7 +888,6 @@ extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm); extern int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_mips_interrupt *irq); -static inline void 
kvm_arch_hardware_unsetup(void) {} static inline void kvm_arch_sync_events(struct kvm *kvm) {} static inline void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot) {} diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index a25e0b73ee70..af29490d9740 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -135,11 +135,6 @@ void kvm_arch_hardware_disable(void) kvm_mips_callbacks->hardware_disable(); } -int kvm_arch_hardware_setup(void *opaque) -{ - return 0; -} - int kvm_arch_check_processor_compat(void *opaque) { return 0; diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index caea15dcb91d..5d2c3a487e73 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -877,7 +877,6 @@ struct kvm_vcpu_arch { #define __KVM_HAVE_CREATE_DEVICE static inline void kvm_arch_hardware_disable(void) {} -static inline void kvm_arch_hardware_unsetup(void) {} static inline void kvm_arch_sync_events(struct kvm *kvm) {} static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {} static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {} diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 04494a4fb37a..5faf69421f13 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -440,11 +440,6 @@ int kvm_arch_hardware_enable(void) return 0; } -int kvm_arch_hardware_setup(void *opaque) -{ - return 0; -} - int kvm_arch_check_processor_compat(void *opaque) { return kvmppc_core_check_processor_compat(); diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h index dbbf43d52623..8c771fc4f5d2 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -229,7 +229,6 @@ struct kvm_vcpu_arch { bool pause; }; -static inline void kvm_arch_hardware_unsetup(void) {} static inline void kvm_arch_sync_events(struct kvm *kvm) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int 
cpu) {} diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c index df2d8716851f..a146fa0ce4d2 100644 --- a/arch/riscv/kvm/main.c +++ b/arch/riscv/kvm/main.c @@ -25,11 +25,6 @@ int kvm_arch_check_processor_compat(void *opaque) return 0; } -int kvm_arch_hardware_setup(void *opaque) -{ - return 0; -} - int kvm_arch_hardware_enable(void) { unsigned long hideleg, hedeleg; diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 97c7ccd189eb..829e6e046003 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -329,16 +329,6 @@ static struct notifier_block kvm_clock_notifier = {
[PATCH v2 08/50] KVM: x86: Move hardware setup/unsetup to init/exit
Now that kvm_arch_hardware_setup() is called immediately after kvm_arch_init(), fold the guts of kvm_arch_hardware_(un)setup() into kvm_arch_{init,exit}() as a step towards dropping one of the hooks. To avoid having to unwind various setup, e.g. registration of several notifiers, slot in the vendor hardware setup before the registration of said notifiers and callbacks. Introducing a functional change while moving code is less than ideal, but the alternative is adding a pile of unwinding code, which is much more error prone, e.g. several attempts to move the setup code verbatim all introduced bugs. Add a comment to document that kvm_ops_update() is effectively the point of no return, e.g. it sets the kvm_x86_ops.hardware_enable canary and so needs to be unwound. Signed-off-by: Sean Christopherson --- arch/x86/kvm/x86.c | 121 +++-- 1 file changed, 63 insertions(+), 58 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a873618564cd..fe5f2e49b5eb 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9258,6 +9258,24 @@ static struct notifier_block pvclock_gtod_notifier = { }; #endif +static inline void kvm_ops_update(struct kvm_x86_init_ops *ops) +{ + memcpy(_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops)); + +#define __KVM_X86_OP(func) \ + static_call_update(kvm_x86_##func, kvm_x86_ops.func); +#define KVM_X86_OP(func) \ + WARN_ON(!kvm_x86_ops.func); __KVM_X86_OP(func) +#define KVM_X86_OP_OPTIONAL __KVM_X86_OP +#define KVM_X86_OP_OPTIONAL_RET0(func) \ + static_call_update(kvm_x86_##func, (void *)kvm_x86_ops.func ?
: \ + (void *)__static_call_return0); +#include +#undef __KVM_X86_OP + + kvm_pmu_ops_update(ops->pmu_ops); +} + int kvm_arch_init(void *opaque) { struct kvm_x86_init_ops *ops = opaque; @@ -9331,6 +9349,24 @@ int kvm_arch_init(void *opaque) kvm_caps.supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0; } + rdmsrl_safe(MSR_EFER, _efer); + + if (boot_cpu_has(X86_FEATURE_XSAVES)) + rdmsrl(MSR_IA32_XSS, host_xss); + + kvm_init_pmu_capability(); + + r = ops->hardware_setup(); + if (r != 0) + goto out_mmu_exit; + + /* +* Point of no return! DO NOT add error paths below this point unless +* absolutely necessary, as most operations from this point forward +* require unwinding. +*/ + kvm_ops_update(ops); + kvm_timer_init(); if (pi_inject_timer == -1) @@ -9342,8 +9378,32 @@ int kvm_arch_init(void *opaque) set_hv_tscchange_cb(kvm_hyperv_tsc_notifier); #endif + kvm_register_perf_callbacks(ops->handle_intel_pt_intr); + + if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES)) + kvm_caps.supported_xss = 0; + +#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f) + cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_); +#undef __kvm_cpu_cap_has + + if (kvm_caps.has_tsc_control) { + /* +* Make sure the user can only configure tsc_khz values that +* fit into a signed integer. +* A min value is not calculated because it will always +* be 1 on all machines. 
+*/ + u64 max = min(0x7fffULL, + __scale_tsc(kvm_caps.max_tsc_scaling_ratio, tsc_khz)); + kvm_caps.max_guest_tsc_khz = max; + } + kvm_caps.default_tsc_scaling_ratio = 1ULL << kvm_caps.tsc_scaling_ratio_frac_bits; + kvm_init_msr_list(); return 0; +out_mmu_exit: + kvm_mmu_vendor_module_exit(); out_free_percpu: free_percpu(user_return_msrs); out_free_x86_emulator_cache: @@ -9353,6 +9413,8 @@ int kvm_arch_init(void *opaque) void kvm_arch_exit(void) { + kvm_unregister_perf_callbacks(); + #ifdef CONFIG_X86_64 if (hypervisor_is_type(X86_HYPER_MS_HYPERV)) clear_hv_tscchange_cb(); @@ -9368,6 +9430,7 @@ void kvm_arch_exit(void) irq_work_sync(_irq_work); cancel_work_sync(_gtod_work); #endif + static_call(kvm_x86_hardware_unsetup)(); kvm_x86_ops.hardware_enable = NULL; kvm_mmu_vendor_module_exit(); free_percpu(user_return_msrs); @@ -11957,72 +12020,14 @@ void kvm_arch_hardware_disable(void) drop_user_return_notifiers(); } -static inline void kvm_ops_update(struct kvm_x86_init_ops *ops) -{ - memcpy(_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops)); - -#define __KVM_X86_OP(func) \ - static_call_update(kvm_x86_##func, kvm_x86_ops.func); -#define KVM_X86_OP(func) \ - WARN_ON(!kvm_x86_ops.func); __KVM_X86_OP(func) -#define KVM_X86_OP_OPTIONAL __KVM_X86_OP -#define KVM_X86_OP_OPTIONAL_RET0(func) \ - static_call_update(kvm_x86_##func, (void *)kvm_x86_ops.func ? : \ - (void
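For readers unfamiliar with the macro trick used by kvm_ops_update(), the sketch below shows the same idea in plain userspace C. It is an approximation under stated assumptions: the kernel expands the op list from a header and patches static calls, whereas this example uses a hypothetical OPS_LIST() macro and ordinary function pointers. struct ops, ops_update(), return0(), enable_ok() and missing_required are invented for illustration. One list of ops is expanded with different macro bodies, required ops are checked, and optional ops fall back to a "return 0" default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ops table; far smaller than struct kvm_x86_ops. */
struct ops {
	int (*hardware_enable)(void); /* required */
	int (*get_flags)(void);       /* optional, defaults to "return 0" */
};

/* One entry per op; expanded with different macro bodies as needed. */
#define OPS_LIST(REQUIRED, OPTIONAL_RET0) \
	REQUIRED(hardware_enable)         \
	OPTIONAL_RET0(get_flags)

static int return0(void) { return 0; }
static int enable_ok(void) { return 0; } /* example vendor callback */

static struct ops active_ops;
static int missing_required; /* stands in for WARN_ON() firing */

static void ops_update(const struct ops *src)
{
	assert(src != NULL);

	/* Copy the vendor table, then fix up each entry via the list. */
	memcpy(&active_ops, src, sizeof(active_ops));
	missing_required = 0;

#define REQ(func)  if (!active_ops.func) missing_required++;
#define OPT0(func) if (!active_ops.func) active_ops.func = return0;
	OPS_LIST(REQ, OPT0)
#undef REQ
#undef OPT0
}
```

After ops_update() the optional get_flags can be called unconditionally, which is the property that lets callers skip NULL checks; in this sketch a missing required op is merely counted rather than warned about.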
[PATCH v2 07/50] KVM: x86: Do timer initialization after XCR0 configuration
Move kvm_arch_init()'s call to kvm_timer_init() down a few lines below the XCR0 configuration code. A future patch will move hardware setup into kvm_arch_init() and slot in vendor hardware setup before the call to kvm_timer_init() so that timer initialization (among other stuff) doesn't need to be unwound if vendor setup fails. XCR0 setup on the other hand needs to happen before vendor hardware setup. Signed-off-by: Sean Christopherson --- arch/x86/kvm/x86.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f18f579ebde8..a873618564cd 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9326,13 +9326,13 @@ int kvm_arch_init(void *opaque) if (r) goto out_free_percpu; - kvm_timer_init(); - if (boot_cpu_has(X86_FEATURE_XSAVE)) { host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK); kvm_caps.supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0; } + kvm_timer_init(); + if (pi_inject_timer == -1) pi_inject_timer = housekeeping_enabled(HK_TYPE_TIMER); #ifdef CONFIG_X86_64 -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 06/50] KVM: s390: Move hardware setup/unsetup to init/exit
Now that kvm_arch_hardware_setup() is called immediately after kvm_arch_init(), fold the guts of kvm_arch_hardware_(un)setup() into kvm_arch_{init,exit}() as a step towards dropping one of the hooks. No functional change intended. Signed-off-by: Sean Christopherson Reviewed-by: Eric Farman --- arch/s390/kvm/kvm-s390.c | 23 +-- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 221481a09742..97c7ccd189eb 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -331,21 +331,12 @@ static struct notifier_block kvm_clock_notifier = { int kvm_arch_hardware_setup(void *opaque) { - gmap_notifier.notifier_call = kvm_gmap_notifier; - gmap_register_pte_notifier(_notifier); - vsie_gmap_notifier.notifier_call = kvm_s390_vsie_gmap_notifier; - gmap_register_pte_notifier(_gmap_notifier); - atomic_notifier_chain_register(_epoch_delta_notifier, - _clock_notifier); return 0; } void kvm_arch_hardware_unsetup(void) { - gmap_unregister_pte_notifier(_notifier); - gmap_unregister_pte_notifier(_gmap_notifier); - atomic_notifier_chain_unregister(_epoch_delta_notifier, -_clock_notifier); + } static void allow_cpu_feat(unsigned long nr) @@ -525,6 +516,13 @@ int kvm_arch_init(void *opaque) if (rc) goto err_gib; + gmap_notifier.notifier_call = kvm_gmap_notifier; + gmap_register_pte_notifier(_notifier); + vsie_gmap_notifier.notifier_call = kvm_s390_vsie_gmap_notifier; + gmap_register_pte_notifier(_gmap_notifier); + atomic_notifier_chain_register(_epoch_delta_notifier, + _clock_notifier); + return 0; err_gib: @@ -541,6 +539,11 @@ int kvm_arch_init(void *opaque) void kvm_arch_exit(void) { + gmap_unregister_pte_notifier(_notifier); + gmap_unregister_pte_notifier(_gmap_notifier); + atomic_notifier_chain_unregister(_epoch_delta_notifier, +_clock_notifier); + kvm_s390_gib_destroy(); if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV_KVM)) kvm_s390_pci_exit(); -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 05/50] KVM: s390: Unwind kvm_arch_init() piece-by-piece if a step fails
In preparation for folding kvm_arch_hardware_setup() into kvm_arch_init(), unwind initialization one step at a time instead of simply calling kvm_arch_exit(). Using kvm_arch_exit() regardless of which initialization step failed relies on all affected state playing nice with being undone even if said state wasn't first setup. That holds true for state that is currently configured by kvm_arch_init(), but not for state that's handled by kvm_arch_hardware_setup(), e.g. calling gmap_unregister_pte_notifier() without first registering a notifier would result in list corruption due to attempting to delete an entry that was never added to the list. Signed-off-by: Sean Christopherson Reviewed-by: Eric Farman --- arch/s390/kvm/kvm-s390.c | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index e4890e04b210..221481a09742 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -498,11 +498,11 @@ int kvm_arch_init(void *opaque) kvm_s390_dbf_uv = debug_register("kvm-uv", 32, 1, 7 * sizeof(long)); if (!kvm_s390_dbf_uv) - goto out; + goto err_kvm_uv; if (debug_register_view(kvm_s390_dbf, _sprintf_view) || debug_register_view(kvm_s390_dbf_uv, _sprintf_view)) - goto out; + goto err_debug_view; kvm_s390_cpu_feat_init(); @@ -510,25 +510,32 @@ int kvm_arch_init(void *opaque) rc = kvm_register_device_ops(_flic_ops, KVM_DEV_TYPE_FLIC); if (rc) { pr_err("A FLIC registration call failed with rc=%d\n", rc); - goto out; + goto err_flic; } if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV_KVM)) { rc = kvm_s390_pci_init(); if (rc) { pr_err("Unable to allocate AIFT for PCI\n"); - goto out; + goto err_pci; } } rc = kvm_s390_gib_init(GAL_ISC); if (rc) - goto out; + goto err_gib; return 0; -out: - kvm_arch_exit(); +err_gib: + if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV_KVM)) + kvm_s390_pci_exit(); +err_pci: +err_flic: +err_debug_view: + debug_unregister(kvm_s390_dbf_uv); +err_kvm_uv: + debug_unregister(kvm_s390_dbf); return rc; } 
-- 2.38.1.584.g0f3c55d4c2-goog
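For readers who want the unwinding pattern from the s390 patch above in isolation, here is a minimal user-space sketch. It is not the kernel code: the three "steps", the `fail_step` knob, and every `demo_`/`step_` name are hypothetical and exist only for illustration. It shows that each error label undoes only the steps that completed, so a teardown helper never runs on state that was never set up.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical resources; each flag records whether a step is set up. */
static bool dbf_ready, dbf_uv_ready, gib_ready;

/* Index of the step that should fail, or -1 for none (test knob only). */
static int fail_step = -1;

static int step_init(int step, bool *ready)
{
	if (step == fail_step)
		return -ENOMEM;
	*ready = true;
	return 0;
}

static void step_exit(bool *ready)
{
	/* Tearing down something that was never set up would be a bug. */
	assert(*ready);
	*ready = false;
}

/* Initialize in order; on failure, undo only the completed steps. */
int demo_arch_init(void)
{
	int rc;

	rc = step_init(0, &dbf_ready);
	if (rc)
		goto err_dbf;

	rc = step_init(1, &dbf_uv_ready);
	if (rc)
		goto err_dbf_uv;

	rc = step_init(2, &gib_ready);
	if (rc)
		goto err_gib;

	return 0;

err_gib:
	step_exit(&dbf_uv_ready);
err_dbf_uv:
	step_exit(&dbf_ready);
err_dbf:
	return rc;
}
```

With a single catch-all exit routine, a failure in step 1 would call the teardown for step 2 as well, which is the class of problem the changelog describes for gmap_unregister_pte_notifier().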
[PATCH v2 04/50] KVM: Teardown VFIO ops earlier in kvm_exit()
Move the call to kvm_vfio_ops_exit() further up kvm_exit() to try and bring some amount of symmetry to the setup order in kvm_init(), and more importantly so that the arch hooks are invoked dead last by kvm_exit(). This will allow arch code to move away from the arch hooks without any change in ordering between arch code and common code in kvm_exit(). That kvm_vfio_ops_exit() is called last appears to be 100% arbitrary. It was bolted on after the fact by commit 571ee1b68598 ("kvm: vfio: fix unregister kvm_device_ops of vfio"). The nullified kvm_device_ops_table is also local to kvm_main.c and is used only when there are active VMs, so unless arch code is doing something truly bizarre, nullifying the table earlier in kvm_exit() is little more than a nop. Signed-off-by: Sean Christopherson Reviewed-by: Cornelia Huck Reviewed-by: Eric Farman --- virt/kvm/kvm_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index ded88ad6c2d8..988f7d92db2e 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5980,6 +5980,7 @@ void kvm_exit(void) for_each_possible_cpu(cpu) free_cpumask_var(per_cpu(cpu_kick_mask, cpu)); kmem_cache_destroy(kvm_vcpu_cache); + kvm_vfio_ops_exit(); kvm_async_pf_deinit(); unregister_syscore_ops(&kvm_syscore_ops); unregister_reboot_notifier(&kvm_reboot_notifier); @@ -5989,7 +5990,6 @@ void kvm_exit(void) free_cpumask_var(cpus_hardware_enabled); kvm_arch_hardware_unsetup(); kvm_arch_exit(); - kvm_vfio_ops_exit(); } EXPORT_SYMBOL_GPL(kvm_exit); -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 03/50] KVM: Allocate cpus_hardware_enabled after arch hardware setup
Allocate cpus_hardware_enabled after arch hardware setup so that arch "init" and "hardware setup" are called back-to-back and thus can be combined in a future patch. cpus_hardware_enabled is never used before kvm_create_vm(), i.e. doesn't have a dependency with hardware setup and only needs to be allocated before /dev/kvm is exposed to userspace. Free the object before the arch hooks are invoked to maintain symmetry, and so that arch code can move away from the hooks without having to worry about ordering changes. Signed-off-by: Sean Christopherson Reviewed-by: Yuan Yao --- virt/kvm/kvm_main.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 43e2e4f38151..ded88ad6c2d8 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5862,15 +5862,15 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, if (r) return r; + r = kvm_arch_hardware_setup(opaque); + if (r < 0) + goto err_hw_setup; + if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL)) { r = -ENOMEM; goto err_hw_enabled; } - r = kvm_arch_hardware_setup(opaque); - if (r < 0) - goto out_free_1; - c.ret = &r; c.opaque = opaque; for_each_online_cpu(cpu) { @@ -5956,10 +5956,10 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, unregister_reboot_notifier(&kvm_reboot_notifier); cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING); out_free_2: - kvm_arch_hardware_unsetup(); -out_free_1: free_cpumask_var(cpus_hardware_enabled); err_hw_enabled: + kvm_arch_hardware_unsetup(); +err_hw_setup: kvm_arch_exit(); return r; } @@ -5986,9 +5986,9 @@ void kvm_exit(void) cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING); on_each_cpu(hardware_disable_nolock, NULL, 1); kvm_irqfd_exit(); + free_cpumask_var(cpus_hardware_enabled); kvm_arch_hardware_unsetup(); kvm_arch_exit(); - free_cpumask_var(cpus_hardware_enabled); kvm_vfio_ops_exit(); } EXPORT_SYMBOL_GPL(kvm_exit); -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 02/50] KVM: Initialize IRQ FD after arch hardware setup
Move initialization of KVM's IRQ FD workqueue below arch hardware setup as a step towards consolidating arch "init" and "hardware setup", and eventually towards dropping the hooks entirely. There is no dependency on the workqueue being created before hardware setup, the workqueue is used only when destroying VMs, i.e. only needs to be created before /dev/kvm is exposed to userspace. Move the destruction of the workqueue before the arch hooks to maintain symmetry, and so that arch code can move away from the hooks without having to worry about ordering changes. Reword the comment about kvm_irqfd_init() needing to come after kvm_arch_init() to call out that kvm_arch_init() must come before common KVM does _anything_, as x86 very subtly relies on that behavior to deal with multiple calls to kvm_init(), e.g. if userspace attempts to load kvm_amd.ko and kvm_intel.ko. Tag the code with a FIXME, as x86's subtle requirement is gross, and invoking an arch callback as the very first action in a helper that is called only from arch code is silly. Signed-off-by: Sean Christopherson --- virt/kvm/kvm_main.c | 37 ++--- 1 file changed, 18 insertions(+), 19 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index b60abb03606b..43e2e4f38151 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5852,24 +5852,19 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, int r; int cpu; + /* +* FIXME: Get rid of kvm_arch_init(), vendor code should call arch code +* directly. Note, kvm_arch_init() _must_ be called before anything +* else as x86 relies on checks buried in kvm_arch_init() to guard +* against multiple calls to kvm_init(). +*/ r = kvm_arch_init(opaque); if (r) - goto out_fail; - - /* -* kvm_arch_init makes sure there's at most one caller -* for architectures that support multiple implementations, -* like intel and amd on x86. 
-* kvm_arch_init must be called before kvm_irqfd_init to avoid creating -* conflicts in case kvm is already setup for another implementation. -*/ - r = kvm_irqfd_init(); - if (r) - goto out_irqfd; + return r; if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL)) { r = -ENOMEM; - goto out_free_0; + goto err_hw_enabled; } r = kvm_arch_hardware_setup(opaque); @@ -5913,9 +5908,13 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, } } + r = kvm_irqfd_init(); + if (r) + goto err_irqfd; + r = kvm_async_pf_init(); if (r) - goto out_free_4; + goto err_async_pf; kvm_chardev_ops.owner = module; @@ -5946,6 +5945,9 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, kvm_vfio_ops_exit(); err_vfio: kvm_async_pf_deinit(); +err_async_pf: + kvm_irqfd_exit(); +err_irqfd: out_free_4: for_each_possible_cpu(cpu) free_cpumask_var(per_cpu(cpu_kick_mask, cpu)); @@ -5957,11 +5959,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, kvm_arch_hardware_unsetup(); out_free_1: free_cpumask_var(cpus_hardware_enabled); -out_free_0: - kvm_irqfd_exit(); -out_irqfd: +err_hw_enabled: kvm_arch_exit(); -out_fail: return r; } EXPORT_SYMBOL_GPL(kvm_init); @@ -5986,9 +5985,9 @@ void kvm_exit(void) unregister_reboot_notifier(&kvm_reboot_notifier); cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING); on_each_cpu(hardware_disable_nolock, NULL, 1); + kvm_irqfd_exit(); kvm_arch_hardware_unsetup(); kvm_arch_exit(); - kvm_irqfd_exit(); free_cpumask_var(cpus_hardware_enabled); kvm_vfio_ops_exit(); } -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 00/50] KVM: Rework kvm_init() and hardware enabling
The main theme of this series is to kill off kvm_arch_init(), kvm_arch_hardware_(un)setup(), and kvm_arch_check_processor_compat(), which all originated in x86 code from way back when, and needlessly complicate both common KVM code and architecture code. E.g. many architectures don't mark functions/data as __init/__ro_after_init purely because kvm_init() isn't marked __init to support x86's separate vendor modules. The idea/hope is that with those hooks gone (moved to arch code), it will be easier for x86 (and other architectures) to modify their module init sequences as needed without having to fight common KVM code. E.g. I'm hoping that ARM can build on this to simplify its hardware enabling logic, especially the pKVM side of things. There are bug fixes throughout this series. They are more scattered than I would usually prefer, but getting the sequencing correct was a gigantic pain for many of the x86 fixes due to needing to fix common code in order for the x86 fix to have any meaning. And while the bugs are often fatal, they aren't all that interesting for most users as they either require a malicious admin or broken hardware, i.e. aren't likely to be encountered by the vast majority of KVM users. So unless someone _really_ wants a particular fix isolated for backporting, I'm not planning on shuffling patches. v2: - Collect reviews/acks. - Reset eVMCS controls in VP assist page when disabling hardware. [Vitaly] - Clean up labels in kvm_init(). [Chao] - Fix a goof where VMX compat checks used boot_cpu_has. [Kai] - Reorder patches and/or tweak changelogs to not require time travel. [Paolo, Kai] - Rewrite the changelog for the patch to move ARM away from kvm_arch_init() to call out that it fixes theoretical bugs. [Philippe] - Document why it's safe to allow preemption and/or migration when accessing kvm_usage_count. 
v1: https://lore.kernel.org/all/20221102231911.3107438-1-sea...@google.com Chao Gao (3): KVM: x86: Do compatibility checks when onlining CPU KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section KVM: Disable CPU hotplug during hardware enabling/disabling Isaku Yamahata (3): KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock KVM: Remove on_each_cpu(hardware_disable_nolock) in kvm_exit() KVM: Make hardware_enable_failed a local variable in the "enable all" path Marc Zyngier (1): KVM: arm64: Simplify the CPUHP logic Sean Christopherson (43): KVM: Register /dev/kvm as the _very_ last thing during initialization KVM: Initialize IRQ FD after arch hardware setup KVM: Allocate cpus_hardware_enabled after arch hardware setup KVM: Teardown VFIO ops earlier in kvm_exit() KVM: s390: Unwind kvm_arch_init() piece-by-piece() if a step fails KVM: s390: Move hardware setup/unsetup to init/exit KVM: x86: Do timer initialization after XCR0 configuration KVM: x86: Move hardware setup/unsetup to init/exit KVM: Drop arch hardware (un)setup hooks KVM: VMX: Reset eVMCS controls in VP assist page during hardware disabling KVM: VMX: Don't bother disabling eVMCS static key on module exit KVM: VMX: Move Hyper-V eVMCS initialization to helper KVM: x86: Move guts of kvm_arch_init() to standalone helper KVM: VMX: Do _all_ initialization before exposing /dev/kvm to userspace KVM: x86: Serialize vendor module initialization (hardware setup) KVM: arm64: Free hypervisor allocations if vector slot init fails KVM: arm64: Unregister perf callbacks if hypervisor finalization fails KVM: arm64: Do arm/arch initialization without bouncing through kvm_init() KVM: arm64: Mark kvm_arm_init() and its unique descendants as __init KVM: MIPS: Hardcode callbacks to hardware virtualization extensions KVM: MIPS: Setup VZ emulation? 
directly from kvm_mips_init() KVM: MIPS: Register die notifier prior to kvm_init() KVM: RISC-V: Do arch init directly in riscv_kvm_init() KVM: RISC-V: Tag init functions and data with __init, __ro_after_init KVM: PPC: Move processor compatibility check to module init KVM: s390: Do s390 specific init without bouncing through kvm_init() KVM: s390: Mark __kvm_s390_init() and its descendants as __init KVM: Drop kvm_arch_{init,exit}() hooks KVM: VMX: Make VMCS configuration/capabilities structs read-only after init KVM: x86: Do CPU compatibility checks in x86 code KVM: Drop kvm_arch_check_processor_compat() hook KVM: x86: Use KBUILD_MODNAME to specify vendor module name KVM: x86: Unify pr_fmt to use module name for all KVM modules KVM: VMX: Use current CPU's info to perform "disabled by BIOS?" checks KVM: x86: Do VMX/SVM support checks directly in vendor code KVM: VMX: Shuffle support checks and hardware enabling code around KVM: SVM: Check for SVM support in CPU compatibility checks KVM: x86: Move CPU compat checks hook to kvm_x86_ops (from kvm_x86_init_ops) KVM: Ensure CPU is stable during low level hardware enable/disable KVM: Use a per-CPU
[PATCH v2 01/50] KVM: Register /dev/kvm as the _very_ last thing during initialization
Register /dev/kvm, i.e. expose KVM to userspace, only after all other setup has completed. Once /dev/kvm is exposed, userspace can start invoking KVM ioctls, creating VMs, etc... If userspace creates a VM before KVM is done with its configuration, bad things may happen, e.g. KVM will fail to properly migrate vCPU state if a VM is created before KVM has registered preemption notifiers. Cc: sta...@vger.kernel.org Signed-off-by: Sean Christopherson --- virt/kvm/kvm_main.c | 31 ++- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 1782c4555d94..b60abb03606b 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5919,12 +5919,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, kvm_chardev_ops.owner = module; - r = misc_register(&kvm_dev); - if (r) { - pr_err("kvm: misc device register failed\n"); - goto out_unreg; - } - register_syscore_ops(&kvm_syscore_ops); kvm_preempt_ops.sched_in = kvm_sched_in; @@ -5933,11 +5927,24 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, kvm_init_debug(); r = kvm_vfio_ops_init(); - WARN_ON(r); + if (WARN_ON_ONCE(r)) + goto err_vfio; + + /* +* Registration _must_ be the very last thing done, as this exposes +* /dev/kvm to userspace, i.e. all infrastructure must be setup! +*/ + r = misc_register(&kvm_dev); + if (r) { + pr_err("kvm: misc device register failed\n"); + goto err_register; + } return 0; -out_unreg: +err_register: + kvm_vfio_ops_exit(); +err_vfio: kvm_async_pf_deinit(); out_free_4: for_each_possible_cpu(cpu) @@ -5963,8 +5970,14 @@ void kvm_exit(void) { int cpu; - debugfs_remove_recursive(kvm_debugfs_dir); + /* +* Note, unregistering /dev/kvm doesn't strictly need to come first, +* fops_get(), a.k.a. try_module_get(), prevents acquiring references +* to KVM while the module is being stopped.
+*/ misc_deregister(&kvm_dev); + + debugfs_remove_recursive(kvm_debugfs_dir); for_each_possible_cpu(cpu) free_cpumask_var(per_cpu(cpu_kick_mask, cpu)); kmem_cache_destroy(kvm_vcpu_cache); -- 2.38.1.584.g0f3c55d4c2-goog
Re: [PATCH 32/44] KVM: x86: Unify pr_fmt to use module name for all KVM modules
On Thu, Nov 10, 2022, Sean Christopherson wrote: > On Thu, Nov 10, 2022, Robert Hoo wrote: > > After this patch set, still find some printk()s left in arch/x86/kvm/*, > > consider clean all of them up? > > Hmm, yeah, I suppose at this point it makes sense to tack on a patch to clean > them up. Actually, I'm going to pass on this for now. The series is already too big. I'll add this to my todo list for the future.
[Bug 216713] BUG: Bad page map in process init pte:c0ab684c pmd:01182000 (on a PowerMac G4 DP)
https://bugzilla.kernel.org/show_bug.cgi?id=216713 Erhard F. (erhar...@mailbox.org) changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |MOVED --- Comment #2 from Erhard F. (erhar...@mailbox.org) --- Moved to linux-mm. -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
Re: [RFC PATCH] Disable Book-E KVM support?
On Mon, 2022-11-28 at 14:36 +1000, Nicholas Piggin wrote: > BookE KVM is in a deep maintenance state, I'm not sure how much testing > it gets. I don't have a test setup, and it does not look like QEMU has > any HV architecture enabled. It hasn't been too painful but there are > some cases where it causes a bit of problem not being able to test, e.g., > > https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-November/251452.html > > Time to begin removal process, or are there still people using it? I'm > happy to to keep making occasional patches to try keep it going if > there are people testing upstream. Getting HV support into QEMU would > help with long term support, not sure how big of a job that would be. Not sure what you mean about QEMU not having e500 HV support? I don't know if it's bitrotted, but it's there. I don't know whether anyone is still using this, but if they are, it's probably e500mc and not e500v2 (which involved a bunch of hacks to get almost- sorta-usable performance out of hardware not designed for virtualization). I do see that there have been a few recent patches on QEMU e500 (beyond the treewide cleanup type stuff), though I don't know if they're using KVM. CCing them and the QEMU list. I have an e6500 I could occasionally test on, if it turns out people do still care about this. Don't count me as the use case, though. :-) FWIW, as far as the RECONCILE_IRQ_STATE issue, that used to be done in kvmppc_handle_exit(), but was moved in commit 9bd880a2c882 to be "cleaner and faster". :-P -Crystal
[PATCH v5 0/3] generic and PowerPC SED Opal keystore
From: Greg Joyce Changelog v5: - added check for key length based on review comment by "Elliott, Robert (Servers)" Changelog v4: - scope reduced to cover just SED Opal keys - base SED Opal keystore is now in SED block driver - removed use of enum to indicate type - refactored common code into common function that read and write use - removed cast to void - added use of SED Opal keystore functions to SED block driver Generic functions have been defined for accessing SED Opal keys. The generic functions are defined as weak so that they may be superseded by keystore specific versions. PowerPC/pseries versions of these functions provide read/write access to SED Opal keys in the PLPKS keystore. The SED block driver has been modified to read the SED Opal keystore to populate a key in the SED Opal keyring. Changes to the SED Opal key will be written to the SED Opal keystore. Greg Joyce (3): block: sed-opal: SED Opal keystore powerpc/pseries: PLPKS SED Opal keystore support block: sed-opal: keystore access for SED Opal keys arch/powerpc/platforms/pseries/Makefile | 1 + .../powerpc/platforms/pseries/plpks_sed_ops.c | 129 ++ block/Makefile| 2 +- block/sed-opal-key.c | 23 block/sed-opal.c | 18 ++- include/linux/sed-opal-key.h | 15 ++ 6 files changed, 185 insertions(+), 3 deletions(-) create mode 100644 arch/powerpc/platforms/pseries/plpks_sed_ops.c create mode 100644 block/sed-opal-key.c create mode 100644 include/linux/sed-opal-key.h -- gjo...@linux.vnet.ibm.com
[PATCH v5 3/3] block: sed-opal: keystore access for SED Opal keys
From: Greg Joyce Allow for permanent SED authentication keys by reading/writing to the SED Opal non-volatile keystore. Signed-off-by: Greg Joyce Reviewed-by: Jonathan Derrick --- block/sed-opal.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/block/sed-opal.c b/block/sed-opal.c index a8729892178b..e280631b932e 100644 --- a/block/sed-opal.c +++ b/block/sed-opal.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -2762,7 +2763,13 @@ static int opal_set_new_pw(struct opal_dev *dev, struct opal_new_pw *opal_pw) if (ret) return ret; - /* update keyring with new password */ + /* update keyring and key store with new password */ + ret = sed_write_key(OPAL_AUTH_KEY, + opal_pw->new_user_pw.opal_key.key, + opal_pw->new_user_pw.opal_key.key_len); + if (ret != -EOPNOTSUPP) + pr_warn("error updating SED key: %d\n", ret); + ret = update_sed_opal_key(OPAL_AUTH_KEY, opal_pw->new_user_pw.opal_key.key, opal_pw->new_user_pw.opal_key.key_len); @@ -3009,6 +3016,8 @@ EXPORT_SYMBOL_GPL(sed_ioctl); static int __init sed_opal_init(void) { struct key *kr; + char init_sed_key[OPAL_KEY_MAX]; + int keylen = OPAL_KEY_MAX; kr = keyring_alloc(".sed_opal", GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, current_cred(), @@ -3021,6 +3030,11 @@ static int __init sed_opal_init(void) sed_opal_keyring = kr; - return 0; + if (sed_read_key(OPAL_AUTH_KEY, init_sed_key, &keylen) < 0) { + memset(init_sed_key, '\0', sizeof(init_sed_key)); + keylen = OPAL_KEY_MAX; + } + + return update_sed_opal_key(OPAL_AUTH_KEY, init_sed_key, keylen); } late_initcall(sed_opal_init); -- gjo...@linux.vnet.ibm.com
[PATCH v5 2/3] powerpc/pseries: PLPKS SED Opal keystore support
From: Greg Joyce Define operations for SED Opal to read/write keys from POWER LPAR Platform KeyStore(PLPKS). This allows for non-volatile storage of SED Opal keys. Signed-off-by: Greg Joyce Reviewed-by: Jonathan Derrick --- arch/powerpc/platforms/pseries/Makefile | 1 + .../powerpc/platforms/pseries/plpks_sed_ops.c | 129 ++ 2 files changed, 130 insertions(+) create mode 100644 arch/powerpc/platforms/pseries/plpks_sed_ops.c diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile index 92310202bdd7..5bedc06ee2cc 100644 --- a/arch/powerpc/platforms/pseries/Makefile +++ b/arch/powerpc/platforms/pseries/Makefile @@ -28,6 +28,7 @@ obj-$(CONFIG_PPC_SPLPAR) += vphn.o obj-$(CONFIG_PPC_SVM) += svm.o obj-$(CONFIG_FA_DUMP) += rtas-fadump.o obj-$(CONFIG_PSERIES_PLPKS) += plpks.o +obj-$(CONFIG_PSERIES_PLPKS) += plpks_sed_ops.o obj-$(CONFIG_SUSPEND) += suspend.o obj-$(CONFIG_PPC_VAS) += vas.o vas-sysfs.o diff --git a/arch/powerpc/platforms/pseries/plpks_sed_ops.c b/arch/powerpc/platforms/pseries/plpks_sed_ops.c new file mode 100644 index ..cd1084d07716 --- /dev/null +++ b/arch/powerpc/platforms/pseries/plpks_sed_ops.c @@ -0,0 +1,129 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * POWER Platform specific code for non-volatile SED key access + * Copyright (C) 2022 IBM Corporation + * + * Define operations for SED Opal to read/write keys + * from POWER LPAR Platform KeyStore(PLPKS). 
+ * + * Self Encrypting Drives(SED) key storage using PLPKS + */ + +#include +#include +#include +#include +#include +#include "plpks.h" + +/* + * structure that contains all SED data + */ +struct plpks_sed_object_data { + u_char version; + u_char pad1[7]; + u_long authority; + u_long range; + u_int key_len; + u_char key[32]; +}; + +#define PLPKS_PLATVAR_POLICY WORLDREADABLE +#define PLPKS_PLATVAR_OS_COMMON 4 + +#define PLPKS_SED_OBJECT_DATA_V0 0 +#define PLPKS_SED_MANGLED_LABEL "/default/pri" +#define PLPKS_SED_COMPONENT "sed-opal" +#define PLPKS_SED_KEY "opal-boot-pin" + +/* + * authority is admin1 and range is global + */ +#define PLPKS_SED_AUTHORITY 0x000900010001 +#define PLPKS_SED_RANGE 0x08020001 + +void plpks_init_var(struct plpks_var *var, char *keyname) +{ + var->name = keyname; + var->namelen = strlen(keyname); + if (strcmp(PLPKS_SED_KEY, keyname) == 0) { + var->name = PLPKS_SED_MANGLED_LABEL; + var->namelen = strlen(keyname); + } + var->policy = PLPKS_PLATVAR_POLICY; + var->os = PLPKS_PLATVAR_OS_COMMON; + var->data = NULL; + var->datalen = 0; + var->component = PLPKS_SED_COMPONENT; +} + +/* + * Read the SED Opal key from PLPKS given the label + */ +int sed_read_key(char *keyname, char *key, u_int *keylen) +{ + struct plpks_var var; + struct plpks_sed_object_data *data; + u_int offset; + int ret; + u_int len; + + plpks_init_var(&var, keyname); + + ret = plpks_read_os_var(&var); + if (ret != 0) + return ret; + + offset = offsetof(struct plpks_sed_object_data, key); + if (offset > var.datalen) { + kfree(var.data); + return -EINVAL; + } + + data = (struct plpks_sed_object_data *)var.data; + len = min(be32_to_cpu(data->key_len), *keylen); + + if (var.data) { + memcpy(key, data->key, len); + kfree(var.data); + } else + len = 0; + + key[len] = '\0'; + *keylen = len; + + return 0; +} + +/* + * Write the SED Opal key to PLPKS given the label + */ +int sed_write_key(char *keyname, char *key, u_int keylen) +{ + struct plpks_var var; + struct plpks_sed_object_data data; +
struct plpks_var_name vname; + + plpks_init_var(&var, keyname); + + var.datalen = sizeof(struct plpks_sed_object_data); + var.data = (u8 *)&data; + + /* initialize SED object */ + data.version = PLPKS_SED_OBJECT_DATA_V0; + data.authority = cpu_to_be64(PLPKS_SED_AUTHORITY); + data.range = cpu_to_be64(PLPKS_SED_RANGE); + memset(&data.pad1, '\0', sizeof(data.pad1)); + data.key_len = cpu_to_be32(keylen); + memcpy(data.key, (char *)key, keylen); + + /* +* Key update requires remove first. The return value +* is ignored since it's okay if the key doesn't exist. +*/ + vname.namelen = var.namelen; + vname.name = var.name; + plpks_remove_var(var.component, var.os, vname); + + return plpks_write_var(var); +} -- gjo...@linux.vnet.ibm.com
[PATCH v5 1/3] block: sed-opal: SED Opal keystore
From: Greg Joyce Add read and write functions that allow SED Opal keys to be stored in a permanent keystore. Signed-off-by: Greg Joyce Reviewed-by: Jonathan Derrick --- block/Makefile | 2 +- block/sed-opal-key.c | 23 +++ include/linux/sed-opal-key.h | 15 +++ 3 files changed, 39 insertions(+), 1 deletion(-) create mode 100644 block/sed-opal-key.c create mode 100644 include/linux/sed-opal-key.h diff --git a/block/Makefile b/block/Makefile index 4e01bb71ad6e..464a9f209552 100644 --- a/block/Makefile +++ b/block/Makefile @@ -35,7 +35,7 @@ obj-$(CONFIG_BLK_DEV_ZONED) += blk-zoned.o obj-$(CONFIG_BLK_WBT) += blk-wbt.o obj-$(CONFIG_BLK_DEBUG_FS) += blk-mq-debugfs.o obj-$(CONFIG_BLK_DEBUG_FS_ZONED)+= blk-mq-debugfs-zoned.o -obj-$(CONFIG_BLK_SED_OPAL) += sed-opal.o +obj-$(CONFIG_BLK_SED_OPAL) += sed-opal.o sed-opal-key.o obj-$(CONFIG_BLK_PM) += blk-pm.o obj-$(CONFIG_BLK_INLINE_ENCRYPTION)+= blk-crypto.o blk-crypto-profile.o \ blk-crypto-sysfs.o diff --git a/block/sed-opal-key.c b/block/sed-opal-key.c new file mode 100644 index ..32ef988cd53b --- /dev/null +++ b/block/sed-opal-key.c @@ -0,0 +1,23 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * SED key operations. + * + * Copyright (C) 2022 IBM Corporation + * + * These are the accessor functions (read/write) for SED Opal + * keys. Specific keystores can provide overrides. + * + */ + +#include +#include + +int __weak sed_read_key(char *keyname, char *key, u_int *keylen) +{ + return -EOPNOTSUPP; +} + +int __weak sed_write_key(char *keyname, char *key, u_int keylen) +{ + return -EOPNOTSUPP; +} diff --git a/include/linux/sed-opal-key.h b/include/linux/sed-opal-key.h new file mode 100644 index ..c9b1447986d8 --- /dev/null +++ b/include/linux/sed-opal-key.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * SED key operations. + * + * Copyright (C) 2022 IBM Corporation + * + * These are the accessor functions (read/write) for SED Opal + * keys. Specific keystores can provide overrides.
+ * + */ + +#include + +int sed_read_key(char *keyname, char *key, u_int *keylen); +int sed_write_key(char *keyname, char *key, u_int keylen); -- gjo...@linux.vnet.ibm.com
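The weak-default arrangement used by sed-opal-key.c can be tried outside the kernel. The sketch below is a user-space approximation under stated assumptions: it uses the GCC/Clang `__attribute__((weak))` spelling in place of the kernel's `__weak`, and `demo_read_key()`, `demo_load_key()` and the buffer handling are hypothetical names that only mirror the shape of sed_read_key() and the fallback in sed_opal_init().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Default stub. It is used unless another object file linked into the
 * program provides a non-weak definition with the same name, which is
 * how a platform keystore would override it.
 */
__attribute__((weak)) int demo_read_key(const char *name, char *key,
					unsigned int *keylen)
{
	(void)name;
	(void)key;
	(void)keylen;
	return -EOPNOTSUPP;
}

/*
 * Caller-side fallback (hypothetical): if no keystore answers, hand back
 * a zero-filled key of the full buffer size instead of failing.
 */
int demo_load_key(const char *name, char *key, unsigned int *keylen)
{
	unsigned int cap;

	assert(key != NULL && keylen != NULL);
	cap = *keylen;

	if (demo_read_key(name, key, keylen) < 0) {
		memset(key, 0, cap);
		*keylen = cap;
	}
	return 0;
}
```

Linking a second file that defines a regular (non-weak) `demo_read_key()` replaces the stub without touching the caller, which appears to be the mechanism the PLPKS patch in this series relies on.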
[PATCH v3] powerpc/hv-gpci: Fix hv_gpci event list
Based on getPerfCountInfo v1.018 documentation, some of the hv_gpci events were deprecated for platform firmware that supports counter_info_version 0x8 or above. Fix the hv_gpci event list by adding a new attribute group called "hv_gpci_event_attrs_v6" and a "ENABLE_EVENTS_COUNTERINFO_V6" macro to enable these events for platform firmware that supports counter_info_version 0x6 or below. Assign the hv_gpci event list based on the output counter info version of the underlying platform. Fixes: 97bf2640184f ("powerpc/perf/hv-gpci: add the remaining gpci requests") Signed-off-by: Kajol Jain Reviewed-by: Madhavan Srinivasan Reviewed-by: Athira Rajeev --- Changelog: v2 -> v3 - Make nit commit/comment changes and changed the name of macro as suggested by Michael Ellerman - Add reviewed by tag - Add information about not having counter_info_version 0x7 in comment to avoid off-by-one error confusion as suggested by Michael Ellerman v1 -> v2 - As suggested by Michael Ellerman, using counter_info_version value rather than cpu_has_feature() to assign hv-gpci event list. arch/powerpc/perf/hv-gpci-requests.h | 4 arch/powerpc/perf/hv-gpci.c | 35 ++-- arch/powerpc/perf/hv-gpci.h | 1 + arch/powerpc/perf/req-gen/perf.h | 20 4 files changed, 58 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/perf/hv-gpci-requests.h b/arch/powerpc/perf/hv-gpci-requests.h index 8965b4463d43..5e86371a20c7 100644 --- a/arch/powerpc/perf/hv-gpci-requests.h +++ b/arch/powerpc/perf/hv-gpci-requests.h @@ -79,6 +79,7 @@ REQUEST(__field(0,8, partition_id) ) #include I(REQUEST_END) +#ifdef ENABLE_EVENTS_COUNTERINFO_V6 /* * Not available for counter_info_version >= 0x8, use * run_instruction_cycles_by_partition(0x100) instead.
@@ -92,6 +93,7 @@ REQUEST(__field(0,8, partition_id) __count(0x10, 8, cycles) ) #include I(REQUEST_END) +#endif #define REQUEST_NAME system_performance_capabilities #define REQUEST_NUM 0x40 @@ -103,6 +105,7 @@ REQUEST(__field(0, 1, perf_collect_privileged) ) #include I(REQUEST_END) +#ifdef ENABLE_EVENTS_COUNTERINFO_V6 #define REQUEST_NAME processor_bus_utilization_abc_links #define REQUEST_NUM 0x50 #define REQUEST_IDX_KIND "hw_chip_id=?" @@ -194,6 +197,7 @@ REQUEST(__field(0, 4, phys_processor_idx) __count(0x28, 8, instructions_completed) ) #include I(REQUEST_END) +#endif /* Processor_core_power_mode (0x95) skipped, no counters */ /* Affinity_domain_information_by_virtual_processor (0xA0) skipped, diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c index 5eb60ed5b5e8..7ff8ff3509f5 100644 --- a/arch/powerpc/perf/hv-gpci.c +++ b/arch/powerpc/perf/hv-gpci.c @@ -70,9 +70,9 @@ static const struct attribute_group format_group = { .attrs = format_attrs, }; -static const struct attribute_group event_group = { +static struct attribute_group event_group = { .name = "events", - .attrs = hv_gpci_event_attrs, + /* .attrs is set in init */ }; #define HV_CAPS_ATTR(_name, _format) \ @@ -330,6 +330,7 @@ static int hv_gpci_init(void) int r; unsigned long hret; struct hv_perf_caps caps; + struct hv_gpci_request_buffer *arg; hv_gpci_assert_offsets_correct(); @@ -353,6 +354,36 @@ static int hv_gpci_init(void) /* sampling not supported */ h_gpci_pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT; + arg = (void *)get_cpu_var(hv_gpci_reqb); + memset(arg, 0, HGPCI_REQ_BUFFER_SIZE); + + /* +* hcall H_GET_PERF_COUNTER_INFO populates the output +* counter_info_version value based on the system hypervisor. +* Pass the counter request 0x10 corresponds to request type +* 'Dispatch_timebase_by_processor', to get the supported +* counter_info_version. 
+*/ + arg->params.counter_request = cpu_to_be32(0x10); + + r = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO, + virt_to_phys(arg), HGPCI_REQ_BUFFER_SIZE); + if (r) { + pr_devel("hcall failed, can't get supported counter_info_version: 0x%x\n", r); + arg->params.counter_info_version_out = 0x8; + } + + /* +* Use counter_info_version_out value to assign +* required hv-gpci event list. +*/ + if (arg->params.counter_info_version_out >= 0x8) + event_group.attrs = hv_gpci_event_attrs; + else + event_group.attrs = hv_gpci_event_attrs_v6; + + put_cpu_var(hv_gpci_reqb); + r = perf_pmu_register(&h_gpci_pmu, h_gpci_pmu.name, -1); if (r) return r; diff --git a/arch/powerpc/perf/hv-gpci.h b/arch/powerpc/perf/hv-gpci.h index 4d108262bed7..c72020912dea 100644 --- a/arch/powerpc/perf/hv-gpci.h +++ b/arch/powerpc/perf/hv-gpci.h @@ -26,6 +26,7 @@ enum { #define REQUEST_FILE
[linux-next:master] BUILD REGRESSION 700e0cd3a5ce6a2cb90d9a2aab729b52f092a7d6
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master branch HEAD: 700e0cd3a5ce6a2cb90d9a2aab729b52f092a7d6 Add linux-next specific files for 20221130 Error/Warning reports: https://lore.kernel.org/oe-kbuild-all/202211090634.ryfkk0ws-...@intel.com https://lore.kernel.org/oe-kbuild-all/20220149.0etifpy6-...@intel.com https://lore.kernel.org/oe-kbuild-all/202211242021.fdzrfna8-...@intel.com https://lore.kernel.org/oe-kbuild-all/202211242120.mzzvguln-...@intel.com https://lore.kernel.org/oe-kbuild-all/202211282102.qur7hhrw-...@intel.com https://lore.kernel.org/oe-kbuild-all/202211301622.rxgmfgtv-...@intel.com https://lore.kernel.org/oe-kbuild-all/202211301634.cejlltjp-...@intel.com https://lore.kernel.org/oe-kbuild-all/202211301840.y7rrob13-...@intel.com https://lore.kernel.org/oe-kbuild-all/202211302059.viaoimsf-...@intel.com Error/Warning: (recently discovered and may have been fixed) arch/arm/mach-s3c/devs.c:32:10: fatal error: linux/platform_data/dma-s3c24xx.h: No such file or directory arch/powerpc/kernel/kvm_emul.o: warning: objtool: kvm_template_end(): can't find starting instruction arch/powerpc/kernel/optprobes_head.o: warning: objtool: optprobe_template_end(): can't find starting instruction drivers/gpu/drm/amd/amdgpu/../display/dc/irq/dcn201/irq_service_dcn201.c:40:20: warning: no previous prototype for 'to_dal_irq_source_dcn201' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c:451:1: warning: no previous prototype for 'gf100_fifo_nonstall_block' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c:451:1: warning: no previous prototype for function 'gf100_fifo_nonstall_block' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c:34:1: warning: no previous prototype for 'nvkm_engn_cgrp_get' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c:34:1: warning: no previous prototype for function 'nvkm_engn_cgrp_get' [-Wmissing-prototypes] 
drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c:210:1: warning: no previous prototype for 'tu102_gr_load' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c:210:1: warning: no previous prototype for function 'tu102_gr_load' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c:49:1: warning: no previous prototype for 'wpr_generic_header_dump' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c:49:1: warning: no previous prototype for function 'wpr_generic_header_dump' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c:221:21: warning: variable 'loc' set but not used [-Wunused-but-set-variable] drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1474:38: warning: unused variable 'mt8173_jpeg_drvdata' [-Wunused-const-variable] drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1489:38: warning: unused variable 'mtk_jpeg_drvdata' [-Wunused-const-variable] drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1890:38: warning: unused variable 'mtk8195_jpegdec_drvdata' [-Wunused-const-variable] drivers/regulator/tps65219-regulator.c:310:60: warning: parameter 'dev' set but not used [-Wunused-but-set-parameter] drivers/regulator/tps65219-regulator.c:370:26: warning: ordered comparison of pointer with integer zero [-Wextra] include/linux/hugetlb.h:1240:33: error: 'VM_MAYSHARE' undeclared (first use in this function) include/linux/hugetlb.h:1240:47: error: 'VM_SHARED' undeclared (first use in this function); did you mean 'MNT_SHARED'? 
include/linux/hugetlb.h:1262:56: error: invalid use of undefined type 'struct hugetlb_vma_lock' net/netfilter/nf_conntrack_netlink.c:2674:6: warning: unused variable 'mark' [-Wunused-variable] vmlinux.o: warning: objtool: __btrfs_map_block+0x1d77: unreachable instruction Unverified Error/Warning (likely false positive, please contact us if interested): drivers/media/i2c/ov9282.c:470:3: warning: Value stored to 'ret' is never read [clang-analyzer-deadcode.DeadStores] Error/Warning ids grouped by kconfigs: gcc_recent_errors |-- alpha-allyesconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201 | |-- drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no-previous-prototype-for-gf100_fifo_nonstall_block | |-- drivers-gpu-drm-nouveau-nvkm-engine-fifo-runl.c:warning:no-previous-prototype-for-nvkm_engn_cgrp_get | |-- drivers-gpu-drm-nouveau-nvkm-engine-gr-tu102.c:warning:no-previous-prototype-for-tu102_gr_load | |-- drivers-gpu-drm-nouveau-nvkm-nvfw-acr.c:warning:no-previous-prototype-for-wpr_generic_header_dump | `-- drivers-gpu-drm-nouveau-nvkm-subdev-acr-lsfw.c:warning:variable-loc-set-but-not-used |-- arc-allyesconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201 | |-- drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no
[PATCH v2] hvc/xen: lock console list traversal
The currently lockless access to the xen console list in vtermno_to_xencons() is incorrect: additions to and removals from the list can happen at any time, so the traversal of the list to get the private console data for a given termno needs to happen with the lock held. Note that users that modify the list already do so with the lock taken. Adjust current lock takers to use the _irq{save,restore} helpers, since the context in which vtermno_to_xencons() is called can have interrupts disabled. This switches the current callers to disable interrupts in the locked region. I haven't checked if existing users could instead use the _irq variant, as I think it's safer to use _irq{save,restore} upfront. While there, switch from using list_for_each_entry_safe to list_for_each_entry: the current entry cursor won't be removed as part of the code in the loop body, so using the _safe variant is pointless. Fixes: 02e19f9c7cac ('hvc_xen: implement multiconsole support') Signed-off-by: Roger Pau Monné --- Changes since v1: - Switch current lock users to disable interrupts in the locked region. 
--- drivers/tty/hvc/hvc_xen.c | 46 --- 1 file changed, 29 insertions(+), 17 deletions(-) diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c index e63c1761a361..d9d023275328 100644 --- a/drivers/tty/hvc/hvc_xen.c +++ b/drivers/tty/hvc/hvc_xen.c @@ -53,17 +53,22 @@ static DEFINE_SPINLOCK(xencons_lock); static struct xencons_info *vtermno_to_xencons(int vtermno) { - struct xencons_info *entry, *n, *ret = NULL; + struct xencons_info *entry, *ret = NULL; + unsigned long flags; - if (list_empty(&xenconsoles)) - return NULL; + spin_lock_irqsave(&xencons_lock, flags); + if (list_empty(&xenconsoles)) { + spin_unlock_irqrestore(&xencons_lock, flags); + return NULL; + } - list_for_each_entry_safe(entry, n, &xenconsoles, list) { + list_for_each_entry(entry, &xenconsoles, list) { if (entry->vtermno == vtermno) { ret = entry; break; } } + spin_unlock_irqrestore(&xencons_lock, flags); return ret; } @@ -234,7 +239,7 @@ static int xen_hvm_console_init(void) { int r; uint64_t v = 0; - unsigned long gfn; + unsigned long gfn, flags; struct xencons_info *info; if (!xen_hvm_domain()) @@ -270,9 +275,9 @@ static int xen_hvm_console_init(void) goto err; info->vtermno = HVC_COOKIE; - spin_lock(&xencons_lock); + spin_lock_irqsave(&xencons_lock, flags); list_add_tail(&info->list, &xenconsoles); - spin_unlock(&xencons_lock); + spin_unlock_irqrestore(&xencons_lock, flags); return 0; err: @@ -296,6 +301,7 @@ static int xencons_info_pv_init(struct xencons_info *info, int vtermno) static int xen_pv_console_init(void) { struct xencons_info *info; + unsigned long flags; if (!xen_pv_domain()) return -ENODEV; @@ -312,9 +318,9 @@ static int xen_pv_console_init(void) /* already configured */ return 0; } - spin_lock(&xencons_lock); + spin_lock_irqsave(&xencons_lock, flags); xencons_info_pv_init(info, HVC_COOKIE); - spin_unlock(&xencons_lock); + spin_unlock_irqrestore(&xencons_lock, flags); return 0; } @@ -322,6 +328,7 @@ static int xen_pv_console_init(void) static int xen_initial_domain_console_init(void) { struct xencons_info *info; + unsigned long flags; if (!xen_initial_domain()) return -ENODEV; @@ -337,9 +344,9 @@ static int 
xen_initial_domain_console_init(void) info->irq = bind_virq_to_irq(VIRQ_CONSOLE, 0, false); info->vtermno = HVC_COOKIE; - spin_lock(&xencons_lock); + spin_lock_irqsave(&xencons_lock, flags); list_add_tail(&info->list, &xenconsoles); - spin_unlock(&xencons_lock); + spin_unlock_irqrestore(&xencons_lock, flags); return 0; } @@ -394,10 +401,12 @@ static void xencons_free(struct xencons_info *info) static int xen_console_remove(struct xencons_info *info) { + unsigned long flags; + xencons_disconnect_backend(info); - spin_lock(&xencons_lock); + spin_lock_irqsave(&xencons_lock, flags); list_del(&info->list); - spin_unlock(&xencons_lock); + spin_unlock_irqrestore(&xencons_lock, flags); if (info->xbdev != NULL) xencons_free(info); else { @@ -478,6 +487,7 @@ static int xencons_probe(struct xenbus_device *dev, { int ret, devid; struct xencons_info *info; + unsigned long flags; devid = dev->nodename[strlen(dev->nodename) - 1] - '0'; if (devid == 0) @@ -497,9 +507,9 @@ static int xencons_probe(struct xenbus_device *dev, ret = xencons_connect_backend(dev, info); if (ret < 0) goto error; - spin_lock(&xencons_lock); + spin_lock_irqsave(&xencons_lock, flags); list_add_tail(&info->list, &xenconsoles); -
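For readers who want to experiment with the locking rule outside the kernel, the idea in the patch above (every traversal of the list takes the same lock as every modification) can be sketched in userspace. This is only an illustration under stated assumptions: `console_entry`, `console_add()`, `console_remove()` and `console_find()` are hypothetical names, a pthread mutex stands in for the spinlock, and there is no equivalent of the `_irq{save,restore}` interrupt handling.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct xencons_info. */
struct console_entry {
	int vtermno;
	struct console_entry *next;
};

/* One lock protects both modification and traversal of the list. */
static pthread_mutex_t console_lock = PTHREAD_MUTEX_INITIALIZER;
static struct console_entry *console_head;

/* Insert an entry at the head of the list with the lock held. */
void console_add(struct console_entry *entry)
{
	assert(entry != NULL);
	pthread_mutex_lock(&console_lock);
	entry->next = console_head;
	console_head = entry;
	pthread_mutex_unlock(&console_lock);
}

/* Unlink an entry (if present) with the lock held. */
void console_remove(struct console_entry *entry)
{
	struct console_entry **link;

	pthread_mutex_lock(&console_lock);
	for (link = &console_head; *link != NULL; link = &(*link)->next) {
		if (*link == entry) {
			*link = entry->next;
			break;
		}
	}
	pthread_mutex_unlock(&console_lock);
}

/*
 * Look up an entry by vtermno. The traversal happens with the lock
 * held, so a concurrent add or remove cannot change the links while
 * we are walking them.
 */
struct console_entry *console_find(int vtermno)
{
	struct console_entry *entry, *ret = NULL;

	pthread_mutex_lock(&console_lock);
	for (entry = console_head; entry != NULL; entry = entry->next) {
		if (entry->vtermno == vtermno) {
			ret = entry;
			break;
		}
	}
	pthread_mutex_unlock(&console_lock);
	return ret;
}
```

As in the patch, the lock only covers the walk itself; what keeps the returned entry alive after the unlock is a separate question that this sketch does not address.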
Re: [PATCH v3 3/3] block: sed-opal: keyring support for SED keys
On Wed, 2022-11-30 at 08:00 +0100, Hannes Reinecke wrote: > On 11/30/22 00:25, gjo...@linux.vnet.ibm.com wrote: > > From: Greg Joyce > > > > Extend the SED block driver so it can alternatively > > obtain a key from a sed-opal kernel keyring. The SED > > ioctls will indicate the source of the key, either > > directly in the ioctl data or from the keyring. > > > > This allows the use of SED commands in scripts such as > > udev scripts so that drives may be automatically unlocked > > as they become available. > > > > Signed-off-by: Greg Joyce > > Reviewed-by: Jonathan Derrick > > --- > > block/Kconfig | 1 + > > block/sed-opal.c | 174 > > +- > > include/linux/sed-opal.h | 3 + > > include/uapi/linux/sed-opal.h | 8 +- > > 4 files changed, 183 insertions(+), 3 deletions(-) > > > > diff --git a/block/Kconfig b/block/Kconfig > > index 444c5ab3b67e..b46f93ac8405 100644 > > --- a/block/Kconfig > > +++ b/block/Kconfig > > @@ -181,6 +181,7 @@ config BLK_DEBUG_FS_ZONED > > > > config BLK_SED_OPAL > > bool "Logic for interfacing with Opal enabled SEDs" > > + depends on KEYS > > help > > Builds Logic for interfacing with Opal enabled controllers. > > Enabling this option enables users to setup/unlock/lock > > diff --git a/block/sed-opal.c b/block/sed-opal.c > > index 993b2b7cc4c2..a8729892178b 100644 > > --- a/block/sed-opal.c > > +++ b/block/sed-opal.c > > @@ -20,6 +20,9 @@ > > #include > > #include > > #include > > +#include > > +#include > > +#include > > > > #include "opal_proto.h" > > > > @@ -29,6 +32,8 @@ > > /* Number of bytes needed by cmd_finalize. */ > > #define CMD_FINALIZE_BYTES_NEEDED 7 > > > > +static struct key *sed_opal_keyring; > > + > > struct opal_step { > > int (*fn)(struct opal_dev *dev, void *data); > > void *data; > > @@ -265,6 +270,101 @@ static void print_buffer(const u8 *ptr, u32 > > length) > > #endif > > } > > > > +/* > > + * Allocate/update a SED Opal key and add it to the SED Opal > > keyring. 
> > + */ > > +static int update_sed_opal_key(const char *desc, u_char *key_data, > > int keylen) > > +{ > > + key_ref_t kr; > > + > > + if (!sed_opal_keyring) > > + return -ENOKEY; > > + > > + kr = key_create_or_update(make_key_ref(sed_opal_keyring, true), > > "user", > > + desc, (const void *)key_data, keylen, > > + KEY_USR_VIEW | KEY_USR_SEARCH | > > KEY_USR_WRITE, > > + KEY_ALLOC_NOT_IN_QUOTA | > > KEY_ALLOC_BUILT_IN | > > + KEY_ALLOC_BYPASS_RESTRICTION); > > + if (IS_ERR(kr)) { > > + pr_err("Error adding SED key (%ld)\n", PTR_ERR(kr)); > > + return PTR_ERR(kr); > > + } > > + > > + return 0; > > +} > > + > > +/* > > + * Read a SED Opal key from the SED Opal keyring. > > + */ > > +static int read_sed_opal_key(const char *key_name, u_char *buffer, > > int buflen) > > +{ > > + int ret; > > + key_ref_t kref; > > + struct key *key; > > + > > + if (!sed_opal_keyring) > > + return -ENOKEY; > > + > > + kref = keyring_search(make_key_ref(sed_opal_keyring, true), > > + &key_type_user, key_name, true); > > + > > + if (IS_ERR(kref)) > > + ret = PTR_ERR(kref); > > + > > + key = key_ref_to_ptr(kref); > > + down_read(&key->sem); > > + ret = key_validate(key); > > + if (ret == 0) { > > + if (buflen > key->datalen) > > + buflen = key->datalen; > > + > > + ret = key->type->read(key, (char *)buffer, buflen); > > + } > > + up_read(&key->sem); > > + > > + key_ref_put(kref); > > + > > + return ret; > > +} > > + > > +static int opal_get_key(struct opal_dev *dev, struct opal_key > > *key) > > +{ > > + int ret = 0; > > + > > + switch (key->key_type) { > > + case OPAL_INCLUDED: > > + /* the key is ready to use */ > > + break; > > + case OPAL_KEYRING: > > + /* the key is in the keyring */ > > + ret = read_sed_opal_key(OPAL_AUTH_KEY, key->key, > > OPAL_KEY_MAX); > > + if (ret > 0) { > > + if (ret > 255) { > > Why is a key longer than 255 an error? > If this is a requirement, why not move the check into > read_sed_opal_key() such that one only has to check for > ret < 0 on errors? 
The check is done here because the SED Opal spec stipulates 255 as the maximum key length. The key length (key->key_len) in the existing data structures is __u8, so a length greater than 255 cannot be conveyed. For defensive purposes, I thought it best to check here. > > > + ret = -ENOSPC; > > + goto error; > > + } > > + key->key_len = ret; > > + key->key_type = OPAL_INCLUDED; > > + } > > + break; > > + default: > > + ret = -EINVAL;
[PATCH v2] hvc/xen: prevent concurrent accesses to the shared ring
The hvc machinery registers both a console and a tty device based on the hv ops provided by the specific implementation. Those two interfaces however have different locks, and there's no single lock that's shared between the tty and the console implementations, hence the driver needs to protect itself against concurrent accesses. Otherwise concurrent calls using the split interfaces are likely to corrupt the ring indexes, leaving the console unusable. Introduce a lock to xencons_info to serialize accesses to the shared ring. This is only required when using the shared memory console, concurrent accesses to the hypercall based console implementation are not an issue. Note the conditional logic in domU_read_console() is slightly modified so the notify_daemon() call can be done outside of the locked region: it's a hypercall and there's no need for it to be done with the lock held. Fixes: b536b4b96230 ('xen: use the hvc console infrastructure for Xen console') Signed-off-by: Roger Pau Monné --- Changes since v1: - Properly initialize the introduced lock in all paths. 
--- drivers/tty/hvc/hvc_xen.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c index 7c23112dc923..e63c1761a361 100644 --- a/drivers/tty/hvc/hvc_xen.c +++ b/drivers/tty/hvc/hvc_xen.c @@ -43,6 +43,7 @@ struct xencons_info { int irq; int vtermno; grant_ref_t gntref; + spinlock_t ring_lock; }; static LIST_HEAD(xenconsoles); @@ -84,12 +85,15 @@ static int __write_console(struct xencons_info *xencons, XENCONS_RING_IDX cons, prod; struct xencons_interface *intf = xencons->intf; int sent = 0; + unsigned long flags; + spin_lock_irqsave(&xencons->ring_lock, flags); cons = intf->out_cons; prod = intf->out_prod; mb(); /* update queue values before going on */ if ((prod - cons) > sizeof(intf->out)) { + spin_unlock_irqrestore(&xencons->ring_lock, flags); pr_err_once("xencons: Illegal ring page indices"); return -EINVAL; } @@ -99,6 +103,7 @@ static int __write_console(struct xencons_info *xencons, wmb(); /* write ring before updating pointer */ intf->out_prod = prod; + spin_unlock_irqrestore(&xencons->ring_lock, flags); if (sent) notify_daemon(xencons); @@ -141,16 +146,19 @@ static int domU_read_console(uint32_t vtermno, char *buf, int len) int recv = 0; struct xencons_info *xencons = vtermno_to_xencons(vtermno); unsigned int eoiflag = 0; + unsigned long flags; if (xencons == NULL) return -EINVAL; intf = xencons->intf; + spin_lock_irqsave(&xencons->ring_lock, flags); cons = intf->in_cons; prod = intf->in_prod; mb(); /* get pointers before reading ring */ if ((prod - cons) > sizeof(intf->in)) { + spin_unlock_irqrestore(&xencons->ring_lock, flags); pr_err_once("xencons: Illegal ring page indices"); return -EINVAL; } @@ -174,10 +182,13 @@ static int domU_read_console(uint32_t vtermno, char *buf, int len) xencons->out_cons = intf->out_cons; xencons->out_cons_same = 0; } + if (!recv && xencons->out_cons_same++ > 1) { + eoiflag = XEN_EOI_FLAG_SPURIOUS; + } + spin_unlock_irqrestore(&xencons->ring_lock, flags); + if (recv) { notify_daemon(xencons); - } else if 
(xencons->out_cons_same++ > 1) { - eoiflag = XEN_EOI_FLAG_SPURIOUS; } xen_irq_lateeoi(xencons->irq, eoiflag); @@ -234,6 +245,7 @@ static int xen_hvm_console_init(void) info = kzalloc(sizeof(struct xencons_info), GFP_KERNEL); if (!info) return -ENOMEM; + spin_lock_init(&info->ring_lock); } else if (info->intf != NULL) { /* already configured */ return 0; @@ -270,6 +282,7 @@ static int xen_hvm_console_init(void) static int xencons_info_pv_init(struct xencons_info *info, int vtermno) { + spin_lock_init(&info->ring_lock); info->evtchn = xen_start_info->console.domU.evtchn; /* GFN == MFN for PV guest */ info->intf = gfn_to_virt(xen_start_info->console.domU.mfn); @@ -318,6 +331,7 @@ static int xen_initial_domain_console_init(void) info = kzalloc(sizeof(struct xencons_info), GFP_KERNEL); if (!info) return -ENOMEM; + spin_lock_init(&info->ring_lock); } info->irq = bind_virq_to_irq(VIRQ_CONSOLE, 0, false); @@ -472,6 +486,7 @@ static int xencons_probe(struct xenbus_device *dev, info = kzalloc(sizeof(struct xencons_info), GFP_KERNEL); if (!info) return -ENOMEM; +
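The pattern in the patch above (read both ring indexes under a lock, reject them if `prod - cons` exceeds the ring size, copy data, then publish the new producer index) can be tried out with a small userspace sketch. The assumptions are significant: `struct sketch_ring`, `RING_SIZE`, `sketch_ring_init()` and `sketch_ring_write()` are hypothetical names, a pthread mutex replaces the spinlock, and the `mb()`/`wmb()` barriers are left out because nothing outside this process touches the ring here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define RING_SIZE 16u	/* power of two, so masking an index wraps it */

/* Hypothetical ring with free-running 32-bit indexes. */
struct sketch_ring {
	pthread_mutex_t lock;
	uint32_t cons;		/* total bytes consumed so far */
	uint32_t prod;		/* total bytes produced so far */
	char data[RING_SIZE];
};

void sketch_ring_init(struct sketch_ring *ring)
{
	assert(ring != NULL);
	pthread_mutex_init(&ring->lock, NULL);
	ring->cons = 0;
	ring->prod = 0;
}

/*
 * Queue up to len bytes. Returns the number of bytes queued, or -1 if
 * the indexes are inconsistent. Unsigned subtraction keeps
 * (prod - cons) correct even after the indexes wrap around 2^32.
 */
int sketch_ring_write(struct sketch_ring *ring, const char *buf, int len)
{
	uint32_t cons, prod;
	int sent = 0;

	pthread_mutex_lock(&ring->lock);
	cons = ring->cons;
	prod = ring->prod;

	if ((uint32_t)(prod - cons) > RING_SIZE) {
		/* Drop the lock on the error path as well. */
		pthread_mutex_unlock(&ring->lock);
		return -1;
	}

	while (sent < len && (uint32_t)(prod - cons) < RING_SIZE)
		ring->data[prod++ & (RING_SIZE - 1)] = buf[sent++];

	ring->prod = prod;	/* publish only after the data is written */
	pthread_mutex_unlock(&ring->lock);
	return sent;
}
```

Without the lock, two writers could both read the same `prod`, overwrite each other's bytes and publish conflicting indexes, which is the corruption the commit message describes.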
Re: linux-next: build warnings after merge of the powerpc-objtool tree
On 29/11/22 20:58, Christophe Leroy wrote: Le 29/11/2022 à 16:13, Sathvika Vasireddy a écrit : Hi all, On 25/11/22 09:00, Stephen Rothwell wrote: Hi all, After merging the powerpc-objtool tree, today's linux-next build (powerpc pseries_le_defconfig) produced these warnings: arch/powerpc/kernel/head_64.o: warning: objtool: end_first_256B(): can't find starting instruction arch/powerpc/kernel/optprobes_head.o: warning: objtool: optprobe_template_end(): can't find starting instruction I have no idea what started this (they may have been there yesterday). I was able to recreate the above mentioned warnings with pseries_le_defconfig and powernv_defconfig. The regression report also mentions a warning (https://lore.kernel.org/oe-kbuild-all/202211282102.qur7hhrw-...@intel.com/) seen with arch/powerpc/kernel/kvm_emul.S assembly file. [1] arch/powerpc/kernel/optprobes_head.o: warning: objtool: optprobe_template_end(): can't find starting instruction [2] arch/powerpc/kernel/kvm_emul.o: warning: objtool: kvm_template_end(): can't find starting instruction [3] arch/powerpc/kernel/head_64.o: warning: objtool: end_first_256B(): can't find starting instruction The warnings [1] and [2] go away after adding 'nop' instruction. Below diff fixes it for me: You have to add NOPs just because those labels are at the end of the files. That's a bit odd. I think either we are missing some kind of flagging for the symbols, or objtool has a bug. In both cases, I'm not sure adding an artificial 'nop' is the solution. At least there should be a big hammer warning explaining why. I don't see these warnings with powerpc/topic/objtool branch. However, they are seen with linux-next master branch. Commit dbcdbdfdf137b49144204571f1a5e5dc01b8aaad objtool: Rework instruction -> symbol mapping in linux-next is resulting in objtool can't find starting instruction warnings on powerpc. Reverting this particular hunk (pasted below), resolves it and we don't see the problem anymore. 
@@ -427,7 +427,10 @@ static int decode_instructions(struct objtool_file *file) } list_for_each_entry(func, &sec->symbol_list, list) { - if (func->type != STT_FUNC || func->alias != func) + if (func->type != STT_NOTYPE && func->type != STT_FUNC) + continue; + + if (func->return_thunk || func->alias != func) continue; if (!find_insn(file, sec, func->offset)) { Peterz, can we ignore STT_NOTYPE symbols? diff --git a/arch/powerpc/kernel/optprobes_head.S b/arch/powerpc/kernel/optprobes_head.S index cd4e7bc32609..ea4e3bd82f4f 100644 --- a/arch/powerpc/kernel/optprobes_head.S +++ b/arch/powerpc/kernel/optprobes_head.S @@ -134,3 +134,4 @@ optprobe_template_ret: .global optprobe_template_end optprobe_template_end: + nop diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index 7af6f8b50c5d..41fd664e3ba0 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -352,3 +352,4 @@ kvm_tmp_end: .global kvm_template_end kvm_template_end: + nop For warning [3], objtool is throwing can't find starting instruction warning because it finds that the symbol (end_first_256B) is zero sized, and such symbols are not added to the rbtree. I tried to fix it by adding a 'nop' instruction (pasted diff below), but that resulted in a kernel build failure. What's the failure ? 
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index 874efd25cc45..d48850fe159f 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -192,6 +192,7 @@ __secondary_hold: EMIT_BUG_ENTRY 0b, __FILE__, __LINE__, 0 #endif CLOSE_FIXED_SECTION(first_256B) +nop /* * On server, we include the exception vectors code here as it diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 26f8fef53c72..f7517d443e9b 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -3104,9 +3104,13 @@ __end_interrupts: DEFINE_FIXED_SYMBOL(__end_interrupts, virt_trampolines) CLOSE_FIXED_SECTION(real_vectors); +nop CLOSE_FIXED_SECTION(real_trampolines); +nop CLOSE_FIXED_SECTION(virt_vectors); +nop CLOSE_FIXED_SECTION(virt_trampolines); +nop What are the NOPs after the CLOSE_FIXED_SECTION() ? You don't explain them, and I can't see any related warning in the warnings you show. After fixing arch/powerpc/kernel/head_64.o: warning: objtool: end_first_256B(): can't find starting instruction warning, objtool started showing more warnings in the same file. Below is the list of warnings: arch/powerpc/kernel/head_64.o: warning: objtool: end_real_vectors(): can't find starting instruction arch/powerpc/kernel/head_64.o: warning: objtool: end_real_trampolines(): can't find starting
Re: [syzbot] WARNING in btrfs_free_reserved_data_space_noquota
syzbot writes: > Hello, > > syzbot found the following issue on: > > HEAD commit:b7b275e60bcd Linux 6.1-rc7 > git tree: upstream > console+strace: https://syzkaller.appspot.com/x/log.txt?x=158a7b7388 > kernel config: https://syzkaller.appspot.com/x/.config?x=2325e409a9a893e1 > dashboard link: https://syzkaller.appspot.com/bug?extid=adec8406ad17413d4c06 > compiler: Debian clang version > 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU > Binutils for Debian) 2.35.2 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=169ccb7588 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17bf715388 > > Downloadable assets: > disk image: > https://storage.googleapis.com/syzbot-assets/525233126d34/disk-b7b275e6.raw.xz > vmlinux: > https://storage.googleapis.com/syzbot-assets/e8299bf41400/vmlinux-b7b275e6.xz > kernel image: > https://storage.googleapis.com/syzbot-assets/eebf691dbf6f/bzImage-b7b275e6.xz > mounted in repro: > https://storage.googleapis.com/syzbot-assets/5423c2d2ad62/mount_0.gz > > The issue was bisected to: > > commit c814bf958926ff45a9c1e899bd001006ab6cfbae > Author: ye xingchen > Date: Tue Aug 16 10:51:06 2022 + > > powerpc/selftests: Use timersub() for gettimeofday() That can't be right, that patch only touches tools/testing/selftests/powerpc/benchmarks/gettimeofday.c which isn't built into vmlinux - and definitely not for an x86 build. > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=118c3d0388 This says: Reproducer flagged being flaky AFAICS there isn't a syzbot command to ask for a new bisection, so someone will have to do it manually. 
cheers > final oops: https://syzkaller.appspot.com/x/report.txt?x=138c3d0388 > console output: https://syzkaller.appspot.com/x/log.txt?x=158c3d0388 > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+adec8406ad17413d4...@syzkaller.appspotmail.com > Fixes: c814bf958926 ("powerpc/selftests: Use timersub() for gettimeofday()") > > RDX: 0001 RSI: 2280 RDI: 0005 > RBP: 7ffd32e91c70 R08: R09: > R10: 0800 R11: 0246 R12: 0006 > R13: 7ffd32e91cb0 R14: 7ffd32e91c90 R15: 0006 > > [ cut here ] > WARNING: CPU: 1 PID: 3764 at fs/btrfs/space-info.h:122 > btrfs_space_info_free_bytes_may_use fs/btrfs/space-info.h:154 [inline] > WARNING: CPU: 1 PID: 3764 at fs/btrfs/space-info.h:122 > btrfs_free_reserved_data_space_noquota+0x219/0x2b0 > fs/btrfs/delalloc-space.c:179 > Modules linked in: > CPU: 1 PID: 3764 Comm: syz-executor759 Not tainted 6.1.0-rc7-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 10/26/2022 > RIP: 0010:btrfs_space_info_update_bytes_may_use fs/btrfs/space-info.h:122 > [inline] > RIP: 0010:btrfs_space_info_free_bytes_may_use fs/btrfs/space-info.h:154 > [inline] > RIP: 0010:btrfs_free_reserved_data_space_noquota+0x219/0x2b0 > fs/btrfs/delalloc-space.c:179 > Code: 2f 00 74 08 4c 89 ef e8 b5 98 32 fe 49 8b 5d 00 48 89 df 4c 8b 74 24 08 > 4c 89 f6 e8 21 81 de fd 4c 39 f3 73 16 e8 d7 7e de fd <0f> 0b 31 db 4c 8b 34 > 24 41 80 3c 2f 00 75 8c eb 92 e8 c1 7e de fd > RSP: 0018:c9000443f410 EFLAGS: 00010293 > RAX: 83ac1919 RBX: 005cb000 RCX: 888027989d40 > RDX: RSI: 0080 RDI: 005cb000 > RBP: dc00 R08: 83ac190f R09: fbfff1cebe0e > R10: fbfff1cebe0e R11: 11cebe0d R12: 8880774f3800 > R13: 8880774f3860 R14: 0080 R15: 11100ee9e70c > FS: 55aaa300() GS:8880b990() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 7f0d98f20140 CR3: 25ccf000 CR4: 003506e0 > DR0: DR1: DR2: > DR3: DR6: fffe0ff0 DR7: 0400 > Call Trace: > > btrfs_free_reserved_data_space+0x9d/0xd0 fs/btrfs/delalloc-space.c:199 > 
btrfs_dio_iomap_begin+0x8f7/0x1070 fs/btrfs/inode.c:7762 > iomap_iter+0x606/0x8a0 fs/iomap/iter.c:74 > __iomap_dio_rw+0xd91/0x20d0 fs/iomap/direct-io.c:601 > btrfs_dio_write+0x9c/0xe0 fs/btrfs/inode.c:8094 > btrfs_direct_write fs/btrfs/file.c:1835 [inline] > btrfs_do_write_iter+0x871/0x1260 fs/btrfs/file.c:1980 > do_iter_write+0x6c2/0xc20 fs/read_write.c:861 > vfs_writev fs/read_write.c:934 [inline] > do_pwritev+0x200/0x350 fs/read_write.c:1031 > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x7f0d98ea8ea9 > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 > 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 > 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 >
Re: [PATCH] hvc/xen: prevent concurrent accesses to the shared ring
On Wed, Nov 30, 2022 at 10:34:41AM +0100, Jan Beulich wrote: > On 30.11.2022 10:26, Roger Pau Monné wrote: > > On Tue, Nov 29, 2022 at 02:12:10PM -0800, Stefano Stabellini wrote: > >> On Tue, 29 Nov 2022, Roger Pau Monne wrote: > >>> The hvc machinery registers both a console and a tty device based on > >>> the hv ops provided by the specific implementation. Those two > >>> interfaces however have different locks, and there's no single locks > >>> that's shared between the tty and the console implementations, hence > >>> the driver needs to protect itself against concurrent accesses. > >>> Otherwise concurrent calls using the split interfaces are likely to > >>> corrupt the ring indexes, leaving the console unusable. > >>> > >>> Introduce a lock to xencons_info to serialize accesses to the shared > >>> ring. This is only required when using the shared memory console, > >>> concurrent accesses to the hypercall based console implementation are > >>> not an issue. > >>> > >>> Note the conditional logic in domU_read_console() is slightly modified > >>> so the notify_daemon() call can be done outside of the locked region: > >>> it's an hypercall and there's no need for it to be done with the lock > >>> held. > >>> > >>> Fixes: b536b4b96230 ('xen: use the hvc console infrastructure for Xen > >>> console') > >>> Signed-off-by: Roger Pau Monné > >>> --- > >>> While the write handler (domU_write_console()) is used by both the > >>> console and the tty ops, that's not the case for the read side > >>> (domU_read_console()). It's not obvious to me whether we could get > >>> concurrent poll calls from the poll_get_char tty hook, hence stay on > >>> the safe side also serialize read accesses in domU_read_console(). > >> > >> I think domU_read_console doesn't need it. struct hv_ops and struct > >> console are both already locked although independently locked. > >> > >> I think we shouldn't add an unrequired lock there. > > > > Not all accesses are done using the tty lock. 
There's a path using > > tty_find_polling_driver() in kgdboc.c that directly calls into the > > ->poll_get_char() hook without any locks apparently taken. > > Simply by the name of the file I'm inclined to say that debugger code > not respecting locks may be kind of intentional (but would then need > to be accompanied by certain other precautions there). I'm also confused because hvc_poll() which calls get_chars() does so while holding an hvc lock, while hvc_poll_get_char() calls get_chars() without holding any lock. The call to get_chars() being done with a lock held in hvc_poll() might just be a side-effect of the lock being held to keep consistency in the hvc_struct struct. I also wonder whether new users of tty_find_polling_driver() and ->poll_get_char() could start appearing and assuming that the underlying implementation would already take the necessary locks for consistency. Just looking at hvc_vio.c it does take a lock in its get_chars() implementation to serialize accesses to the buffer. Thanks, Roger.
Re: [PATCH] hvc/xen: prevent concurrent accesses to the shared ring
On 30.11.2022 10:26, Roger Pau Monné wrote: > On Tue, Nov 29, 2022 at 02:12:10PM -0800, Stefano Stabellini wrote: >> On Tue, 29 Nov 2022, Roger Pau Monne wrote: >>> The hvc machinery registers both a console and a tty device based on >>> the hv ops provided by the specific implementation. Those two >>> interfaces however have different locks, and there's no single locks >>> that's shared between the tty and the console implementations, hence >>> the driver needs to protect itself against concurrent accesses. >>> Otherwise concurrent calls using the split interfaces are likely to >>> corrupt the ring indexes, leaving the console unusable. >>> >>> Introduce a lock to xencons_info to serialize accesses to the shared >>> ring. This is only required when using the shared memory console, >>> concurrent accesses to the hypercall based console implementation are >>> not an issue. >>> >>> Note the conditional logic in domU_read_console() is slightly modified >>> so the notify_daemon() call can be done outside of the locked region: >>> it's an hypercall and there's no need for it to be done with the lock >>> held. >>> >>> Fixes: b536b4b96230 ('xen: use the hvc console infrastructure for Xen >>> console') >>> Signed-off-by: Roger Pau Monné >>> --- >>> While the write handler (domU_write_console()) is used by both the >>> console and the tty ops, that's not the case for the read side >>> (domU_read_console()). It's not obvious to me whether we could get >>> concurrent poll calls from the poll_get_char tty hook, hence stay on >>> the safe side also serialize read accesses in domU_read_console(). >> >> I think domU_read_console doesn't need it. struct hv_ops and struct >> console are both already locked although independently locked. >> >> I think we shouldn't add an unrequired lock there. > > Not all accesses are done using the tty lock. 
There's a path using > tty_find_polling_driver() in kgdboc.c that directly calls into the > ->poll_get_char() hook without any locks apparently taken. Simply by the name of the file I'm inclined to say that debugger code not respecting locks may be kind of intentional (but would then need to be accompanied by certain other precautions there). Jan
Re: [PATCH] macintosh/windfarm_pid: Add header file macro definition
On Thu, 7 Jul 2022 09:59:49 +0800, Li zeming wrote: > I think the header file could avoid redefinition errors. > at compile time by adding macro definitions. > > Applied to powerpc/next. [1/1] macintosh/windfarm_pid: Add header file macro definition https://git.kernel.org/powerpc/c/e3e528d29d13c01289f382a0d3ddb5312ac3dae3 cheers
Re: [PATCH] macintosh/ams/ams: Add header file macro definition
On Thu, 7 Jul 2022 09:53:52 +0800, Li zeming wrote: > Add header file macro definition. > > Applied to powerpc/next. [1/1] macintosh/ams/ams: Add header file macro definition https://git.kernel.org/powerpc/c/2dfcace75e1e1dfbd89af63fce1bfe8aebe38427 cheers