Re: [PATCH 0/2] powerpc: Disable syscall emulation and stepping
On 24/01/2022 at 06:57, Nicholas Piggin wrote:
> As discussed previously
>
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/238946.html
>
> I'm wondering whether PPC32 should be returning -1 for syscall
> instructions too here? That could be done in another patch anyway.

The 'Programming Environments Manual for 32-Bit Implementations of the
PowerPC™ Architecture' says:

  The following are not traced:
  • rfi instruction
  • sc and trap instructions that trap
  • Other instructions that cause interrupts (other than trace interrupts)
  • The first instruction of any interrupt handler
  • Instructions that are emulated by software

So I think PPC32 should return -1 as well.

Christophe
[PATCH 2/2] powerpc/uprobes: Reject uprobe on a system call instruction
Per the ISA, a Trace interrupt is not generated for a system call
[vectored] instruction. Reject uprobes on such instructions as we are
not emulating a system call [vectored] instruction anymore.

Signed-off-by: Naveen N. Rao
[np: Switch to pr_info_ratelimited]
Signed-off-by: Nicholas Piggin
---
 arch/powerpc/include/asm/ppc-opcode.h | 1 +
 arch/powerpc/kernel/uprobes.c         | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 9675303b724e..8bbe16ce5173 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -411,6 +411,7 @@
 #define PPC_RAW_DCBFPS(a, b)		(0x7c0000ac | ___PPC_RA(a) | ___PPC_RB(b) | (4 << 21))
 #define PPC_RAW_DCBSTPS(a, b)		(0x7c0000ac | ___PPC_RA(a) | ___PPC_RB(b) | (6 << 21))
 #define PPC_RAW_SC()			(0x44000002)
+#define PPC_RAW_SCV()			(0x44000001)
 #define PPC_RAW_SYNC()			(0x7c0004ac)
 #define PPC_RAW_ISYNC()			(0x4c00012c)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index c6975467d9ff..3779fde804bd 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -41,6 +41,12 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
 	if (addr & 0x03)
 		return -EINVAL;
 
+	if (ppc_inst_val(ppc_inst_read(auprobe->insn)) == PPC_RAW_SC() ||
+	    ppc_inst_val(ppc_inst_read(auprobe->insn)) == PPC_RAW_SCV()) {
+		pr_info_ratelimited("Rejecting uprobe on system call instruction\n");
+		return -EINVAL;
+	}
+
 	if (cpu_has_feature(CPU_FTR_ARCH_31) &&
 	    ppc_inst_prefixed(ppc_inst_read(auprobe->insn)) &&
 	    (addr & 0x3f) == 60) {
-- 
2.23.0
[PATCH 0/2] powerpc: Disable syscall emulation and stepping
As discussed previously

https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/238946.html

I'm wondering whether PPC32 should be returning -1 for syscall
instructions too here? That could be done in another patch anyway.

Thanks,
Nick

Nicholas Piggin (2):
  powerpc/64: remove system call instruction emulation
  powerpc/uprobes: Reject uprobe on a system call instruction

 arch/powerpc/include/asm/ppc-opcode.h |  1 +
 arch/powerpc/kernel/interrupt_64.S    | 10 ----------
 arch/powerpc/kernel/uprobes.c         |  6 ++++++
 arch/powerpc/lib/sstep.c              | 42 +++++++++++-----------------------
 4 files changed, 18 insertions(+), 41 deletions(-)

-- 
2.23.0
[PATCH 1/2] powerpc/64: remove system call instruction emulation
emulate_step() instruction emulation, including sc instruction
emulation, initially appeared in xmon. The emulation code was then
moved into sstep.c where kprobes could use it too, and later
hw_breakpoint and uprobes started to use it.

Until uprobes, the only instruction emulation users were for kernel
mode instructions:

- xmon only steps / breaks on kernel addresses.
- kprobes is kernel only.
- hw_breakpoint only emulates kernel instructions, single steps user.

At one point there was support for the kernel to execute sc
instructions, although that is long removed and it's not clear whether
there was any in-tree code. So system call emulation is not required by
the above users.

uprobes uses emulate_step, and it appears possible to emulate an sc
instruction in userspace. Userspace system call emulation is broken,
and it's not clear it ever worked well.

The big complication is that userspace takes an interrupt to the
kernel to emulate the instruction. The user->kernel interrupt sets up
registers and an interrupt stack frame expecting to return to
userspace, then system call instruction emulation re-directs that
stack frame to the kernel, early in the system call interrupt handler.
This means the interrupt return code takes the kernel->kernel restore
path, which does not restore everything as the system call interrupt
handler would expect coming from userspace. regs->iamr appears to get
lost, for example, because the kernel->kernel return does not restore
the user iamr. Accounting such as irqflags tracing and CPU accounting
does not get flipped back to user mode as the system call handler
expects, so those appear to enter the kernel twice without returning
to userspace.

These things may be individually fixable with various complication,
but it is a big complexity for unclear real benefit.

This patch removes system call emulation and disables stepping system
calls (because they don't work with trace interrupts, as commented).

Acked-by: Naveen N. Rao
Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/interrupt_64.S | 10 ----------
 arch/powerpc/lib/sstep.c           | 42 +++++++-------------------------------
 2 files changed, 11 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index 7bab2d7de372..6471034c7909 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -219,16 +219,6 @@ system_call_vectored common 0x3000
  */
 system_call_vectored sigill 0x7ff0
 
-
-/*
- * Entered via kernel return set up by kernel/sstep.c, must match entry regs
- */
-	.globl system_call_vectored_emulate
-system_call_vectored_emulate:
-_ASM_NOKPROBE_SYMBOL(system_call_vectored_emulate)
-	li r10,IRQS_ALL_DISABLED
-	stb r10,PACAIRQSOFTMASK(r13)
-	b system_call_vectored_common
 #endif /* CONFIG_PPC_BOOK3S */
 
 .balign IFETCH_ALIGN_BYTES

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index a94b0cd0bdc5..5f317b12b2db 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -15,9 +15,6 @@
 #include
 #include
 
-extern char system_call_common[];
-extern char system_call_vectored_emulate[];
-
 #ifdef CONFIG_PPC64
 /* Bits in SRR1 that are copied from MSR */
 #define MSR_MASK	0xffffffff87c0ffffUL
@@ -3650,39 +3647,22 @@ int emulate_step(struct pt_regs *regs, ppc_inst_t instr)
 		goto instr_done;
 
 #ifdef CONFIG_PPC64
-	case SYSCALL:	/* sc */
 		/*
-		 * N.B. this uses knowledge about how the syscall
-		 * entry code works. If that is changed, this will
-		 * need to be changed also.
+		 * Per ISA v3.1, section 7.5.15 'Trace Interrupt', we can't
+		 * single step a system call instruction:
+		 *
+		 * Successful completion for an instruction means that the
+		 * instruction caused no other interrupt. Thus a Trace
+		 * interrupt never occurs for a System Call or System Call
+		 * Vectored instruction, or for a Trap instruction that
+		 * traps.
 		 */
-		if (IS_ENABLED(CONFIG_PPC_FAST_ENDIAN_SWITCH) &&
-				cpu_has_feature(CPU_FTR_REAL_LE) &&
-				regs->gpr[0] == 0x1ebe) {
-			regs_set_return_msr(regs, regs->msr ^ MSR_LE);
-			goto instr_done;
-		}
-		regs->gpr[9] = regs->gpr[13];
-		regs->gpr[10] = MSR_KERNEL;
-		regs->gpr[11] = regs->nip + 4;
-		regs->gpr[12] = regs->msr & MSR_MASK;
-		regs->gpr[13] = (unsigned long) get_paca();
-		regs_set_return_ip(regs, (unsigned long) &system_call_common);
-		regs_set_return_msr(regs, MSR_KERNEL);
-		return 1;
-
+	case SYSCALL:	/* sc */
+		return -1;
 #ifdef CONFIG_PPC_BOOK3S
Re: [PATCH v3 1/2] mm/cma: provide option to opt out from exposing pages on activation failure
Hi Andrew,

Could you please pick these patches via the -mm tree.

On 17/01/22 1:22 pm, Hari Bathini wrote:
> Commit 072355c1cf2d ("mm/cma: expose all pages to the buddy if
> activation of an area fails") started exposing all pages to buddy
> allocator on CMA activation failure. But there can be CMA users that
> want to handle the reserved memory differently on CMA allocation
> failure. Provide an option to opt out from exposing pages to buddy
> for such cases.
>
> Signed-off-by: Hari Bathini
> Reviewed-by: David Hildenbrand
> ---
>
> Changes in v3:
> * Dropped NULL check in cma_reserve_pages_on_error().
> * Dropped explicit initialization of cma->reserve_pages_on_error to
>   'false' in cma_init_reserved_mem().
> * Added Reviewed-by tag from David.
>
> Changes in v2:
> * Changed cma->free_pages_on_error to cma->reserve_pages_on_error and
>   cma_dont_free_pages_on_error() to cma_reserve_pages_on_error() to
>   avoid confusion.
>
>  include/linux/cma.h |  2 ++
>  mm/cma.c            | 11 +++++++++--
>  mm/cma.h            |  1 +
>  3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/cma.h b/include/linux/cma.h
> index bd801023504b..51d540eee18a 100644
> --- a/include/linux/cma.h
> +++ b/include/linux/cma.h
> @@ -50,4 +50,6 @@ extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned
>  extern bool cma_release(struct cma *cma, const struct page *pages, unsigned long count);
>  
>  extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data);
> +
> +extern void cma_reserve_pages_on_error(struct cma *cma);
>  #endif
> diff --git a/mm/cma.c b/mm/cma.c
> index bc9ca8f3c487..766f1b82b532 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -131,8 +131,10 @@ static void __init cma_activate_area(struct cma *cma)
>  	bitmap_free(cma->bitmap);
>  out_error:
>  	/* Expose all pages to the buddy, they are useless for CMA. */
> -	for (pfn = base_pfn; pfn < base_pfn + cma->count; pfn++)
> -		free_reserved_page(pfn_to_page(pfn));
> +	if (!cma->reserve_pages_on_error) {
> +		for (pfn = base_pfn; pfn < base_pfn + cma->count; pfn++)
> +			free_reserved_page(pfn_to_page(pfn));
> +	}
>  	totalcma_pages -= cma->count;
>  	cma->count = 0;
>  	pr_err("CMA area %s could not be activated\n", cma->name);
> @@ -150,6 +152,11 @@ static int __init cma_init_reserved_areas(void)
>  }
>  core_initcall(cma_init_reserved_areas);
>  
> +void __init cma_reserve_pages_on_error(struct cma *cma)
> +{
> +	cma->reserve_pages_on_error = true;
> +}
> +
>  /**
>   * cma_init_reserved_mem() - create custom contiguous area from reserved memory
>   * @base: Base address of the reserved area
> diff --git a/mm/cma.h b/mm/cma.h
> index 2c775877eae2..88a0595670b7 100644
> --- a/mm/cma.h
> +++ b/mm/cma.h
> @@ -30,6 +30,7 @@ struct cma {
>  	/* kobject requires dynamic object */
>  	struct cma_kobject *cma_kobj;
>  #endif
> +	bool reserve_pages_on_error;
>  };
>  
>  extern struct cma cma_areas[MAX_CMA_AREAS];

Thanks
Hari
Re: [PATCH v3 2/2] powerpc/fadump: opt out from freeing pages on cma activation failure
Hari Bathini writes:
> With commit a4e92ce8e4c8 ("powerpc/fadump: Reservationless firmware
> assisted dump"), Linux kernel's Contiguous Memory Allocator (CMA)
> based reservation was introduced in fadump. That change was aimed at
> using CMA to let applications utilize the memory reserved for fadump
> while blocking it from being used for kernel pages. The assumption
> was, even if CMA activation fails for whatever reason, the memory
> still remains reserved to avoid it from being used for kernel pages.
> But commit 072355c1cf2d ("mm/cma: expose all pages to the buddy if
> activation of an area fails") breaks this assumption as it started
> exposing all pages to buddy allocator on CMA activation failure.
> It led to warning messages like below while running crash-utility
> on vmcore of a kernel having above two commits:
>
>   crash: seek error: kernel virtual address:
>
> To fix this problem, opt out from exposing pages to buddy allocator
> on CMA activation failure for fadump reserved memory.
>
> Signed-off-by: Hari Bathini
> Acked-by: David Hildenbrand
> ---
>
> Changes in v3:
> * Added Acked-by tag from David.
>
>
>  arch/powerpc/kernel/fadump.c | 6 ++++++
>  1 file changed, 6 insertions(+)

Acked-by: Michael Ellerman

cheers

> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index d03e488cfe9c..d0ad86b67e66 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -112,6 +112,12 @@ static int __init fadump_cma_init(void)
>  		return 1;
>  	}
>  
> +	/*
> +	 * If CMA activation fails, keep the pages reserved, instead of
> +	 * exposing them to buddy allocator. Same as 'fadump=nocma' case.
> +	 */
> +	cma_reserve_pages_on_error(fadump_cma);
> +
>  	/*
>  	 * So we now have successfully initialized cma area for fadump.
>  	 */
> -- 
> 2.34.1
Re: [RFC PATCH 0/2] powerpc/pseries: add support for local secure storage called Platform Keystore(PKS)
Hi Greg,

> Ok, this is like the 3rd or 4th different platform-specific proposal for
> this type of functionality. I think we need to give up on
> platform-specific user/kernel apis on this (random sysfs/securityfs
> files scattered around the tree), and come up with a standard place for
> all of this.

I agree that we do have a number of platforms exposing superficially
similar functionality. Indeed, back in 2019 I had a crack at a unified
approach: [1] [2]. Looking back at it now, I am not sure it ever would
have worked because the semantics of the underlying firmware stores are
quite different. Here are the ones I know about:

- OpenPower/PowerNV Secure Variables:

  * Firmware semantics:
    - flat variable space
    - variables are fixed in firmware, can neither be created nor
      destroyed
    - variable names are ASCII
    - no concept of policy/attributes

  * Current kernel interface semantics:
    - names are case sensitive
    - directory per variable

- (U)EFI variables:

  * Firmware semantics:
    - flat variable space
    - variables can be created/destroyed but the semantics are fiddly [3]
    - variable names are UTF-16 + UUID
    - variables have 32-bit attributes

  * efivarfs interface semantics:
    - file per variable
    - attributes are the first 4 bytes of the file
    - names are partially case-insensitive (UUID part) and partially
      case-sensitive ('name' part)

  * sysfs interface semantics (as used by CONFIG_GOOGLE_SMI):
    - directory per variable
    - attributes are a separate sysfs file
    - to create a variable you write a serialised structure to
      `/sys/firmware/efi/vars/new_var`, to delete a var you write to
      `.../del_var`
    - names are case-sensitive including the UUID

- PowerVM Partition Key Store Variables:

  * Firmware semantics:
    - _not_ a flat space, there are 3 domains ("consumers"): firmware,
      bootloader and OS (not yet supported by the patch set)
    - variables can be created and destroyed but the semantics are
      fiddly and fiddly in different ways to UEFI [4]
    - variable names are arbitrary byte strings: the hypervisor permits
      names to contain nul and /.
    - variables have 32-bit attributes ("policy") that don't align with
      UEFI attributes

  * No stable kernel interface yet

Even if we could come up with some stable kernel interface features
(e.g. decide if we want file per variable vs directory per variable),
I don't know how easy it would be to deal with the underlying semantic
differences - I think userspace would still need substantial
per-platform knowledge.

Or have I misunderstood what you're asking for? (If you want them all
to live under /sys/firmware, these ones all already do...)

Kind regards,
Daniel

[1]: https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-May/190735.html
[2]: discussion continues at
     https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-June/191365.html
[3]: https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-June/191496.html
[4]: An unsigned variable cannot be updated, it can only be deleted
     (unless it was created with the immutable policy) and then
     re-created. A signed variable, on the other hand, can be updated
     and the only way to delete it is to submit a validly signed empty
     update.
[PATCH 46/54] soc: replace cpumask_weight with cpumask_weight_lt
qman_test_stash() calls cpumask_weight() to compare the weight of a
cpumask with a given number. We can do it more efficiently with
cpumask_weight_lt() because the conditional cpumask_weight may stop
traversing the cpumask earlier, as soon as the condition is met.

Signed-off-by: Yury Norov
---
 drivers/soc/fsl/qbman/qman_test_stash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qbman/qman_test_stash.c b/drivers/soc/fsl/qbman/qman_test_stash.c
index b7e8e5ec884c..28b08568a349 100644
--- a/drivers/soc/fsl/qbman/qman_test_stash.c
+++ b/drivers/soc/fsl/qbman/qman_test_stash.c
@@ -561,7 +561,7 @@ int qman_test_stash(void)
 {
 	int err;
 
-	if (cpumask_weight(cpu_online_mask) < 2) {
+	if (cpumask_weight_lt(cpu_online_mask, 2)) {
 		pr_info("%s(): skip - only 1 CPU\n", __func__);
 		return 0;
 	}
-- 
2.30.2
[PATCH 39/54] arch/powerpc: replace cpumask_weight with cpumask_weight_{eq, ...} where appropriate
PowerPC code uses cpumask_weight() to compare the weight of a cpumask
with a given number. We can do it more efficiently with
cpumask_weight_{eq, ...} because the conditional cpumask_weight may
stop traversing the cpumask earlier, as soon as the condition is met.

Signed-off-by: Yury Norov
---
 arch/powerpc/kernel/smp.c      | 2 +-
 arch/powerpc/kernel/watchdog.c | 2 +-
 arch/powerpc/xmon/xmon.c       | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index b7fd6a72aa76..8bff748df402 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1656,7 +1656,7 @@ void start_secondary(void *unused)
 		if (has_big_cores)
 			sibling_mask = cpu_smallcore_mask;
 
-		if (cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu)))
+		if (cpumask_weight_gt(mask, cpumask_weight(sibling_mask(cpu))))
 			shared_caches = true;
 	}

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index bfc27496fe7e..62937a077de7 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -483,7 +483,7 @@ static void start_watchdog(void *arg)
 
 	wd_smp_lock(&flags);
 	cpumask_set_cpu(cpu, &wd_cpus_enabled);
-	if (cpumask_weight(&wd_cpus_enabled) == 1) {
+	if (cpumask_weight_eq(&wd_cpus_enabled, 1)) {
 		cpumask_set_cpu(cpu, &wd_smp_cpus_pending);
 		wd_smp_last_reset_tb = get_tb();
 	}

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index fd72753e8ad5..b423812e94e0 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -469,7 +469,7 @@ static bool wait_for_other_cpus(int ncpus)
 
 	/* We wait for 2s, which is a metric "little while" */
 	for (timeout = 20000; timeout != 0; --timeout) {
-		if (cpumask_weight(&cpus_in_xmon) >= ncpus)
+		if (cpumask_weight_ge(&cpus_in_xmon, ncpus))
 			return true;
 		udelay(100);
 		barrier();
@@ -1338,7 +1338,7 @@ static int cpu_cmd(void)
 	case 'S':
 	case 't':
 		cpumask_copy(&xmon_batch_cpus, &cpus_in_xmon);
-		if (cpumask_weight(&xmon_batch_cpus) <= 1) {
+		if (cpumask_weight_le(&xmon_batch_cpus, 1)) {
			printf("There are no other cpus in xmon\n");
			break;
		}
-- 
2.30.2
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.17-2 tag
The pull request you sent on Sun, 23 Jan 2022 22:19:16 +1100:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.17-2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/dd81e1c7d5fb126e5fbc5c9e334d7b3ec29a16a0

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
Re: [PATCH] powerpc: fix building after binutils changes.
Now I remembered reading something from 2013 on 'lwsync',

https://gcc.gnu.org/legacy-ml/gcc-patches/2006-11/msg01238.html
https://gcc.gnu.org/legacy-ml/gcc-patches/2012-07/msg01062.html

so that would end up something like

--- a/media/thread/12fd50d6-d14c-42af-ad1d-a595e5f080cd/dev/linux-main/linux/arch/powerpc/lib/sstep.c
+++ b/home/thread/dev/linus/linux/arch/powerpc/lib/sstep.c
@@ -3265,7 +3265,11 @@ void emulate_update_regs(struct pt_regs *regs, struct instruction_op *op)
 		eieio();
 		break;
 	case BARRIER_LWSYNC:
+#if defined(CONFIG_40x) || defined(CONFIG_44x) || defined(CONFIG_E500) || defined(CONFIG_PPA8548) || defined(CONFIG_TQM8548) || defined(CONFIG_MPC8540_ADS) || defined(CONFIG_PPC_BOOK3S_603)
+		asm volatile("sync" : : : "memory");
+#else
 		asm volatile("lwsync" : : : "memory");
+#endif
 		break;
 #ifdef CONFIG_PPC64
 	case BARRIER_PTESYNC:

On Sun, 23 Jan 2022 at 15:18, Mike wrote:
>
> Maybe cite the correct parts of the patch where my questions arose for
> context.
> ---
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index a94b0cd0bdc5ca..4ffd6791b03ec0 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -1465,7 +1465,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
>  	switch ((word >> 1) & 0x3ff) {
>  	case 598:	/* sync */
>  		op->type = BARRIER + BARRIER_SYNC;
> -#ifdef __powerpc64__
> +#ifdef CONFIG_PPC64
>  		switch ((word >> 21) & 3) {
>  		case 1:		/* lwsync */
>  			op->type = BARRIER + BARRIER_LWSYNC;
> @@ -3267,9 +3267,11 @@ void emulate_update_regs(struct pt_regs *regs, struct instruction_op *op)
>  	case BARRIER_LWSYNC:
>  		asm volatile("lwsync" : : : "memory");
>  		break;
> +#ifdef CONFIG_PPC64
>  	case BARRIER_PTESYNC:
>  		asm volatile("ptesync" : : : "memory");
>  		break;
> +#endif
>  	}
>  	break;
> -
>
> On Sun, 23 Jan 2022 at 14:43, Mike wrote:
> >
> > As some have probably noticed, we are seeing errors like
> > 'Error: unrecognized opcode: `ptesync`', 'dssall' and 'stbcix' as a
> > result of binutils changes, making compiling all that more fun
> > again. The only question on my mind still is this:
> >
> > diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
> > index beba4979bff939..d3a9c91cd06a8b 100644
> > --- a/arch/powerpc/include/asm/io.h
> > +++ b/arch/powerpc/include/asm/io.h
> > @@ -334,7 +334,7 @@ static inline void __raw_writel(unsigned int v, volatile void __iomem *addr)
> >  }
> >  #define __raw_writel __raw_writel
> >
> > -#ifdef __powerpc64__
> > +#ifdef CONFIG_PPC64
> >  static inline unsigned long __raw_readq(const volatile void __iomem *addr)
> >  {
> >  	return *(volatile unsigned long __force *)PCI_FIX_ADDR(addr);
> > @@ -352,7 +352,8 @@ static inline void __raw_writeq_be(unsigned long v, volatile void __iomem *addr)
> >  	__raw_writeq((__force unsigned long)cpu_to_be64(v), addr);
> >  }
> >  #define __raw_writeq_be __raw_writeq_be
> > -
> > +#endif
> > +#ifdef CONFIG_POWER6_CPU
> >  /*
> >   * Real mode versions of the above. Those instructions are only supposed
> >   * to be used in hypervisor real mode as per the architecture spec.
> > @@ -417,7 +418,7 @@ static inline u64 __raw_rm_readq(volatile void __iomem *paddr)
> >  	: "=r" (ret) : "r" (paddr) : "memory");
> >  	return ret;
> >  }
> > -#endif /* __powerpc64__ */
> > +#endif /* CONFIG_POWER6_CPU */
> >
> > ---
> > Will there come a mail saying this broke the PPC6'ish based CPU
> > someone made in their garage? And lwsync is a valid PPC32
> > instruction, should I just follow the example above where
> > BARRIER_LWSYNC is PPC64 only?
> >
> > https://github.com/threader/linux/commits/master-build-ppc - linux-next
> >
> > Best regards.
> > Michael Heltne
Re: [PATCH] powerpc: fix building after binutils changes.
Maybe cite the correct parts of the patch where my questions arose for
context.
---
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index a94b0cd0bdc5ca..4ffd6791b03ec0 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1465,7 +1465,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 	switch ((word >> 1) & 0x3ff) {
 	case 598:	/* sync */
 		op->type = BARRIER + BARRIER_SYNC;
-#ifdef __powerpc64__
+#ifdef CONFIG_PPC64
 		switch ((word >> 21) & 3) {
 		case 1:		/* lwsync */
 			op->type = BARRIER + BARRIER_LWSYNC;
@@ -3267,9 +3267,11 @@ void emulate_update_regs(struct pt_regs *regs, struct instruction_op *op)
 	case BARRIER_LWSYNC:
 		asm volatile("lwsync" : : : "memory");
 		break;
+#ifdef CONFIG_PPC64
 	case BARRIER_PTESYNC:
 		asm volatile("ptesync" : : : "memory");
 		break;
+#endif
 	}
 	break;
-

On Sun, 23 Jan 2022 at 14:43, Mike wrote:
>
> As some have probably noticed, we are seeing errors like
> 'Error: unrecognized opcode: `ptesync`', 'dssall' and 'stbcix' as a
> result of binutils changes, making compiling all that more fun again.
> The only question on my mind still is this:
>
> diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
> index beba4979bff939..d3a9c91cd06a8b 100644
> --- a/arch/powerpc/include/asm/io.h
> +++ b/arch/powerpc/include/asm/io.h
> @@ -334,7 +334,7 @@ static inline void __raw_writel(unsigned int v, volatile void __iomem *addr)
>  }
>  #define __raw_writel __raw_writel
>
> -#ifdef __powerpc64__
> +#ifdef CONFIG_PPC64
>  static inline unsigned long __raw_readq(const volatile void __iomem *addr)
>  {
>  	return *(volatile unsigned long __force *)PCI_FIX_ADDR(addr);
> @@ -352,7 +352,8 @@ static inline void __raw_writeq_be(unsigned long v, volatile void __iomem *addr)
>  	__raw_writeq((__force unsigned long)cpu_to_be64(v), addr);
>  }
>  #define __raw_writeq_be __raw_writeq_be
> -
> +#endif
> +#ifdef CONFIG_POWER6_CPU
>  /*
>   * Real mode versions of the above. Those instructions are only supposed
>   * to be used in hypervisor real mode as per the architecture spec.
> @@ -417,7 +418,7 @@ static inline u64 __raw_rm_readq(volatile void __iomem *paddr)
>  	: "=r" (ret) : "r" (paddr) : "memory");
>  	return ret;
>  }
> -#endif /* __powerpc64__ */
> +#endif /* CONFIG_POWER6_CPU */
>
> ---
> Will there come a mail saying this broke the PPC6'ish based CPU
> someone made in their garage? And lwsync is a valid PPC32
> instruction, should I just follow the example above where
> BARRIER_LWSYNC is PPC64 only?
>
> https://github.com/threader/linux/commits/master-build-ppc - linux-next
>
> Best regards.
> Michael Heltne
[PATCH 6/6] KVM: PPC: Book3S HV: Remove KVMPPC_NR_LPIDS
KVMPPC_NR_LPIDS no longer represents any size restriction on the LPID
space and can be removed. A CPU with more than 12 LPID bits implemented
will now be able to create more than 4095 guests.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/include/asm/kvm_book3s_asm.h | 3 ---
 arch/powerpc/kvm/book3s_64_mmu_hv.c       | 3 ---
 2 files changed, 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index e6bda70b1d93..c8882d9b86c2 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -14,9 +14,6 @@
 #define XICS_MFRR		0xc
 #define XICS_IPI		2	/* interrupt source # for IPIs */
 
-/* LPIDs we support with this build -- runtime limit may be lower */
-#define KVMPPC_NR_LPIDS	(1UL << 12)
-
 /* Maximum number of threads per physical core */
 #define MAX_SMT_THREADS	8

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index f983fb36cbf2..aafd2a74304c 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -269,9 +269,6 @@ int kvmppc_mmu_hv_init(void)
 		nr_lpids = 1UL << KVM_MAX_NESTED_GUESTS_SHIFT;
 	}
 
-	if (nr_lpids > KVMPPC_NR_LPIDS)
-		nr_lpids = KVMPPC_NR_LPIDS;
-
 	if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
 		/* POWER7 has 10-bit LPIDs, POWER8 has 12-bit LPIDs */
 		if (cpu_has_feature(CPU_FTR_ARCH_207S))
-- 
2.23.0
[PATCH 5/6] KVM: PPC: Book3S Nested: Use explicit 4096 LPID maximum
Rather than tie this to KVMPPC_NR_LPIDS which is becoming more dynamic,
fix it to 4096 (12-bits) explicitly for now.

kvmhv_get_nested() does not have to check against KVM_MAX_NESTED_GUESTS
because the L1 partition table registration hcall already did that, and
it checks against the partition table size.

This patch also puts all the partition table size calculations into the
same form, using 12 for the architected size field shift and 4 for the
shift corresponding to the partition table entry size.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/include/asm/kvm_host.h |  7 ++++++-
 arch/powerpc/kvm/book3s_64_mmu_hv.c |  2 +-
 arch/powerpc/kvm/book3s_hv_nested.c | 24 +++++++++++-------------
 3 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 5fd0564e5c94..e6fb03884dcc 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -34,7 +34,12 @@
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 #include	/* for MAX_SMT_THREADS */
 #define KVM_MAX_VCPU_IDS	(MAX_SMT_THREADS * KVM_MAX_VCORES)
-#define KVM_MAX_NESTED_GUESTS	KVMPPC_NR_LPIDS
+
+/*
+ * Limit the nested partition table to 4096 entries (because that's what
+ * hardware supports). Both guest and host use this value.
+ */
+#define KVM_MAX_NESTED_GUESTS_SHIFT	12
 
 #else
 #define KVM_MAX_VCPU_IDS	KVM_MAX_VCPUS

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 5be92d5bc099..f983fb36cbf2 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -266,7 +266,7 @@ int kvmppc_mmu_hv_init(void)
 			return -EINVAL;
 		nr_lpids = 1UL << mmu_lpid_bits;
 	} else {
-		nr_lpids = KVM_MAX_NESTED_GUESTS;
+		nr_lpids = 1UL << KVM_MAX_NESTED_GUESTS_SHIFT;
 	}
 
 	if (nr_lpids > KVMPPC_NR_LPIDS)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 1eff969b095c..75169e0753ce 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -439,10 +439,11 @@ long kvmhv_nested_init(void)
 	if (!radix_enabled())
 		return -ENODEV;
 
-	/* find log base 2 of KVMPPC_NR_LPIDS, rounding up */
-	ptb_order = __ilog2(KVMPPC_NR_LPIDS - 1) + 1;
-	if (ptb_order < 8)
-		ptb_order = 8;
+	/* Partition table entry is 1<<4 bytes in size, hence the 4. */
+	ptb_order = KVM_MAX_NESTED_GUESTS_SHIFT + 4;
+	/* Minimum partition table size is 1<<12 bytes */
+	if (ptb_order < 12)
+		ptb_order = 12;
 	pseries_partition_tb = kmalloc(sizeof(struct patb_entry) << ptb_order,
 				       GFP_KERNEL);
 	if (!pseries_partition_tb) {
@@ -450,7 +451,7 @@ long kvmhv_nested_init(void)
 		return -ENOMEM;
 	}
 
-	ptcr = __pa(pseries_partition_tb) | (ptb_order - 8);
+	ptcr = __pa(pseries_partition_tb) | (ptb_order - 12);
 	rc = plpar_hcall_norets(H_SET_PARTITION_TABLE, ptcr);
 	if (rc != H_SUCCESS) {
 		pr_err("kvm-hv: Parent hypervisor does not support nesting (rc=%ld)\n",
@@ -534,16 +535,14 @@ long kvmhv_set_partition_table(struct kvm_vcpu *vcpu)
 	long ret = H_SUCCESS;
 
 	srcu_idx = srcu_read_lock(&kvm->srcu);
-	/*
-	 * Limit the partition table to 4096 entries (because that's what
-	 * hardware supports), and check the base address.
-	 */
-	if ((ptcr & PRTS_MASK) > 12 - 8 ||
+	/* Check partition size and base address. */
+	if ((ptcr & PRTS_MASK) + 12 - 4 > KVM_MAX_NESTED_GUESTS_SHIFT ||
 	    !kvm_is_visible_gfn(vcpu->kvm, (ptcr & PRTB_MASK) >> PAGE_SHIFT))
 		ret = H_PARAMETER;
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
 	if (ret == H_SUCCESS)
 		kvm->arch.l1_ptcr = ptcr;
+
 	return ret;
 }
@@ -639,7 +638,7 @@ static void kvmhv_update_ptbl_cache(struct kvm_nested_guest *gp)
 	ret = -EFAULT;
 	ptbl_addr = (kvm->arch.l1_ptcr & PRTB_MASK) + (gp->l1_lpid << 4);
-	if (gp->l1_lpid < (1ul << ((kvm->arch.l1_ptcr & PRTS_MASK) + 8))) {
+	if (gp->l1_lpid < (1ul << ((kvm->arch.l1_ptcr & PRTS_MASK) + 12 - 4))) {
 		int srcu_idx = srcu_read_lock(&kvm->srcu);
 		ret = kvm_read_guest(kvm, ptbl_addr,
 				     &ptbl_entry, sizeof(ptbl_entry));
@@ -809,8 +808,7 @@ struct kvm_nested_guest *kvmhv_get_nested(struct kvm *kvm, int l1_lpid,
 {
 	struct kvm_nested_guest *gp, *newgp;
 
-	if (l1_lpid >= KVM_MAX_NESTED_GUESTS ||
-	    l1_lpid >= (1ul << ((kvm->arch.l1_ptcr & PRTS_MASK) + 12 - 4)))
+	if (l1_lpid >= (1ul << ((kvm->arch.l1_ptcr & PRTS_MASK) + 12 - 4)))
 		return NULL;
 
 	spin_lock(&kvm->mmu_lock);
-- 
2.23.0
[PATCH 4/6] KVM: PPC: Book3S HV Nested: Change nested guest lookup to use idr
This removes the fixed-size kvm->arch.nested_guests array.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/include/asm/kvm_host.h |   3 +-
 arch/powerpc/kvm/book3s_hv_nested.c | 110 +++-
 2 files changed, 59 insertions(+), 54 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index d9bf60bf0816..5fd0564e5c94 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -326,8 +326,7 @@ struct kvm_arch {
 	struct list_head uvmem_pfns;
 	struct mutex mmu_setup_lock;	/* nests inside vcpu mutexes */
 	u64 l1_ptcr;
-	int max_nested_lpid;
-	struct kvm_nested_guest *nested_guests[KVM_MAX_NESTED_GUESTS];
+	struct idr kvm_nested_guest_idr;
 	/* This array can grow quite large, keep it at the end */
 	struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
 #endif
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 9d373f8963ee..1eff969b095c 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -521,11 +521,6 @@ static void kvmhv_set_nested_ptbl(struct kvm_nested_guest *gp)
 	kvmhv_set_ptbl_entry(gp->shadow_lpid, dw0, gp->process_table);
 }
 
-void kvmhv_vm_nested_init(struct kvm *kvm)
-{
-	kvm->arch.max_nested_lpid = -1;
-}
-
 /*
  * Handle the H_SET_PARTITION_TABLE hcall.
  * r4 = guest real address of partition table + log_2(size) - 12
@@ -660,6 +655,35 @@ static void kvmhv_update_ptbl_cache(struct kvm_nested_guest *gp)
 	kvmhv_set_nested_ptbl(gp);
 }
 
+void kvmhv_vm_nested_init(struct kvm *kvm)
+{
+	idr_init(&kvm->arch.kvm_nested_guest_idr);
+}
+
+static struct kvm_nested_guest *__find_nested(struct kvm *kvm, int lpid)
+{
+	return idr_find(&kvm->arch.kvm_nested_guest_idr, lpid);
+}
+
+static bool __prealloc_nested(struct kvm *kvm, int lpid)
+{
+	if (idr_alloc(&kvm->arch.kvm_nested_guest_idr,
+				NULL, lpid, lpid + 1, GFP_KERNEL) != lpid)
+		return false;
+	return true;
+}
+
+static void __add_nested(struct kvm *kvm, int lpid, struct kvm_nested_guest *gp)
+{
+	if (idr_replace(&kvm->arch.kvm_nested_guest_idr, gp, lpid))
+		WARN_ON(1);
+}
+
+static void __remove_nested(struct kvm *kvm, int lpid)
+{
+	idr_remove(&kvm->arch.kvm_nested_guest_idr, lpid);
+}
+
 static struct kvm_nested_guest *kvmhv_alloc_nested(struct kvm *kvm, unsigned int lpid)
 {
 	struct kvm_nested_guest *gp;
@@ -720,13 +744,8 @@ static void kvmhv_remove_nested(struct kvm_nested_guest *gp)
 	long ref;
 
 	spin_lock(&kvm->mmu_lock);
-	if (gp == kvm->arch.nested_guests[lpid]) {
-		kvm->arch.nested_guests[lpid] = NULL;
-		if (lpid == kvm->arch.max_nested_lpid) {
-			while (--lpid >= 0 && !kvm->arch.nested_guests[lpid])
-				;
-			kvm->arch.max_nested_lpid = lpid;
-		}
+	if (gp == __find_nested(kvm, lpid)) {
+		__remove_nested(kvm, lpid);
 		--gp->refcnt;
 	}
 	ref = gp->refcnt;
@@ -743,24 +762,22 @@ static void kvmhv_remove_nested(struct kvm_nested_guest *gp)
  */
 void kvmhv_release_all_nested(struct kvm *kvm)
 {
-	int i;
+	int lpid;
 	struct kvm_nested_guest *gp;
 	struct kvm_nested_guest *freelist = NULL;
 	struct kvm_memory_slot *memslot;
 	int srcu_idx, bkt;
 
 	spin_lock(&kvm->mmu_lock);
-	for (i = 0; i <= kvm->arch.max_nested_lpid; i++) {
-		gp = kvm->arch.nested_guests[i];
-		if (!gp)
-			continue;
-		kvm->arch.nested_guests[i] = NULL;
+	idr_for_each_entry(&kvm->arch.kvm_nested_guest_idr, gp, lpid) {
+		__remove_nested(kvm, lpid);
 		if (--gp->refcnt == 0) {
 			gp->next = freelist;
 			freelist = gp;
 		}
 	}
-	kvm->arch.max_nested_lpid = -1;
+	idr_destroy(&kvm->arch.kvm_nested_guest_idr);
+	/* idr is empty and may be reused at this point */
 	spin_unlock(&kvm->mmu_lock);
 	while ((gp = freelist) != NULL) {
 		freelist = gp->next;
@@ -797,7 +814,7 @@ struct kvm_nested_guest *kvmhv_get_nested(struct kvm *kvm, int l1_lpid,
 		return NULL;
 
 	spin_lock(&kvm->mmu_lock);
-	gp = kvm->arch.nested_guests[l1_lpid];
+	gp = __find_nested(kvm, l1_lpid);
 	if (gp)
 		++gp->refcnt;
 	spin_unlock(&kvm->mmu_lock);
@@ -808,17 +825,19 @@ struct kvm_nested_guest *kvmhv_get_nested(struct kvm *kvm, int l1_lpid,
 	newgp = kvmhv_alloc_nested(kvm, l1_lpid);
 	if (!newgp)
 		return NULL;
+
+	if (!__prealloc_nested(kvm, l1_lpid)) {
+		kvmhv_release_nested(newgp);
+		return NULL;
+	}
+
 	spin_loc
[PATCH 3/6] KVM: PPC: Book3S HV: Use IDA allocator for LPID allocator
This removes the fixed-size lpid_inuse array.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kvm/powerpc.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 102993462872..c527a5751b46 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -2453,20 +2453,22 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	return r;
 }
 
-static unsigned long lpid_inuse[BITS_TO_LONGS(KVMPPC_NR_LPIDS)];
+static DEFINE_IDA(lpid_inuse);
 static unsigned long nr_lpids;
 
 long kvmppc_alloc_lpid(void)
 {
-	long lpid;
+	int lpid;
 
-	do {
-		lpid = find_first_zero_bit(lpid_inuse, KVMPPC_NR_LPIDS);
-		if (lpid >= nr_lpids) {
+	/* The host LPID must always be 0 (allocation starts at 1) */
+	lpid = ida_alloc_range(&lpid_inuse, 1, nr_lpids - 1, GFP_KERNEL);
+	if (lpid < 0) {
+		if (lpid == -ENOMEM)
+			pr_err("%s: Out of memory\n", __func__);
+		else
 			pr_err("%s: No LPIDs free\n", __func__);
-			return -ENOMEM;
-		}
-	} while (test_and_set_bit(lpid, lpid_inuse));
+		return -ENOMEM;
+	}
 
 	return lpid;
 }
@@ -2474,15 +2476,14 @@ EXPORT_SYMBOL_GPL(kvmppc_alloc_lpid);
 
 void kvmppc_free_lpid(long lpid)
 {
-	clear_bit(lpid, lpid_inuse);
+	ida_free(&lpid_inuse, lpid);
 }
 EXPORT_SYMBOL_GPL(kvmppc_free_lpid);
 
+/* nr_lpids_param includes the host LPID */
 void kvmppc_init_lpid(unsigned long nr_lpids_param)
 {
-	nr_lpids = min_t(unsigned long, KVMPPC_NR_LPIDS, nr_lpids_param);
-	memset(lpid_inuse, 0, sizeof(lpid_inuse));
-	set_bit(0, lpid_inuse);	/* The host LPID must always be 0 */
+	nr_lpids = nr_lpids_param;
 }
 EXPORT_SYMBOL_GPL(kvmppc_init_lpid);
-- 
2.23.0
[PATCH 2/6] KVM: PPC: Book3S HV: Update LPID allocator init for POWER9, Nested
The LPID allocator init is changed to:

- use mmu_lpid_bits rather than hard-coding;
- use KVM_MAX_NESTED_GUESTS for nested hypervisors;
- not reserve the top LPID on POWER9 and newer CPUs.

The reserved LPID is made a POWER7/8-specific detail.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  2 +-
 arch/powerpc/include/asm/reg.h            |  2 --
 arch/powerpc/kvm/book3s_64_mmu_hv.c       | 29 ---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  8 +++
 arch/powerpc/mm/init_64.c                 |  3 +++
 5 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index b6d31bff5209..e6bda70b1d93 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -15,7 +15,7 @@
 #define XICS_IPI		2	/* interrupt source # for IPIs */
 
 /* LPIDs we support with this build -- runtime limit may be lower */
-#define KVMPPC_NR_LPIDS	(LPID_RSVD + 1)
+#define KVMPPC_NR_LPIDS	(1UL << 12)
 
 /* Maximum number of threads per physical core */
 #define MAX_SMT_THREADS	8
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1e14324c5190..1e8b2e04e626 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -473,8 +473,6 @@
 #ifndef SPRN_LPID
 #define SPRN_LPID	0x13F	/* Logical Partition Identifier */
 #endif
-#define LPID_RSVD_POWER7	0x3ff	/* Reserved LPID for partn switching */
-#define LPID_RSVD	0xfff	/* Reserved LPID for partn switching */
 #define	SPRN_HMER	0x150	/* Hypervisor maintenance exception reg */
 #define   HMER_DEBUG_TRIG	(1ul << (63 - 17)) /* Debug trigger */
 #define	SPRN_HMEER	0x151	/* Hyp maintenance exception enable reg */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 09fc52b6f390..5be92d5bc099 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -256,7 +256,7 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
 
 int kvmppc_mmu_hv_init(void)
 {
-	unsigned long rsvd_lpid;
+	unsigned long nr_lpids;
 
 	if (!mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE))
 		return -EINVAL;
@@ -264,16 +264,29 @@ int kvmppc_mmu_hv_init(void)
 	if (cpu_has_feature(CPU_FTR_HVMODE)) {
 		if (WARN_ON(mfspr(SPRN_LPID) != 0))
 			return -EINVAL;
+		nr_lpids = 1UL << mmu_lpid_bits;
+	} else {
+		nr_lpids = KVM_MAX_NESTED_GUESTS;
 	}
 
-	/* POWER8 and above have 12-bit LPIDs (10-bit in POWER7) */
-	if (cpu_has_feature(CPU_FTR_ARCH_207S))
-		rsvd_lpid = LPID_RSVD;
-	else
-		rsvd_lpid = LPID_RSVD_POWER7;
+	if (nr_lpids > KVMPPC_NR_LPIDS)
+		nr_lpids = KVMPPC_NR_LPIDS;
+
+	if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
+		/* POWER7 has 10-bit LPIDs, POWER8 has 12-bit LPIDs */
+		if (cpu_has_feature(CPU_FTR_ARCH_207S))
+			WARN_ON(nr_lpids != 1UL << 12);
+		else
+			WARN_ON(nr_lpids != 1UL << 10);
+
+		/*
+		 * Reserve the last implemented LPID use in partition
+		 * switching for POWER7 and POWER8.
+		 */
+		nr_lpids -= 1;
+	}
 
-	/* rsvd_lpid is reserved for use in partition switching */
-	kvmppc_init_lpid(rsvd_lpid);
+	kvmppc_init_lpid(nr_lpids);
 
 	return 0;
 }
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index d185dee26026..0c552885a032 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -50,6 +50,14 @@
 #define STACK_SLOT_UAMOR	(SFS-88)
 #define STACK_SLOT_FSCR		(SFS-96)
 
+/*
+ * Use the last LPID (all implemented LPID bits = 1) for partition switching.
+ * This is reserved in the LPID allocator. POWER7 only implements 0x3ff, but
+ * we write 0xfff into the LPID SPR anyway, which seems to work and just
+ * ignores the top bits.
+ */
+#define LPID_RSVD	0xfff
+
 /*
  * Call kvmppc_hv_entry in real mode.
  * Must be called with interrupts hard-disabled.
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 35f46bf54281..ad1a41e3ff1c 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -371,6 +371,9 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 
 #ifdef CONFIG_PPC_BOOK3S_64
 unsigned int mmu_lpid_bits;
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+EXPORT_SYMBOL_GPL(mmu_lpid_bits);
+#endif
 unsigned int mmu_pid_bits;
 
 static bool disable_radix = !IS_ENABLED(CONFIG_PPC_RADIX_MMU_DEFAULT);
-- 
2.23.0
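To summarise the new kvmppc_mmu_hv_init() sizing logic: HV mode sizes the allocator from the firmware-reported LPID width, nested (non-HV) mode uses the nested-guest maximum, and pre-POWER9 CPUs give one LPID back for partition switching. A userspace sketch of that computation (the KVM_MAX_NESTED_GUESTS value of 4096 is assumed here for illustration, matching the 12-bit maximum used later in the series):

```c
#include <stdbool.h>

#define KVMPPC_NR_LPIDS		(1UL << 12)
#define KVM_MAX_NESTED_GUESTS	(1UL << 12)	/* assumed value for this sketch */

/* Userspace model of the nr_lpids computation in kvmppc_mmu_hv_init():
 * HV mode uses the firmware-reported mmu_lpid_bits, nested mode uses the
 * nested-guest maximum, both are capped at the build-time limit, and
 * pre-POWER9 CPUs reserve the top LPID for partition switching. */
static unsigned long compute_nr_lpids(bool hv_mode, unsigned int mmu_lpid_bits,
				      bool is_power9_or_later)
{
	unsigned long nr_lpids;

	if (hv_mode)
		nr_lpids = 1UL << mmu_lpid_bits;
	else
		nr_lpids = KVM_MAX_NESTED_GUESTS;

	if (nr_lpids > KVMPPC_NR_LPIDS)
		nr_lpids = KVMPPC_NR_LPIDS;

	if (!is_power9_or_later)
		nr_lpids -= 1;	/* top LPID reserved for partition switching */

	return nr_lpids;
}
```

For example, a 12-bit-LPID hypervisor gets 4096 usable LPIDs on POWER9 and later but only 4095 on POWER8, and a 10-bit POWER7 gets 1023.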
[PATCH 1/6] KVM: PPC: Remove kvmppc_claim_lpid
Removing kvmppc_claim_lpid makes the lpid allocator API a bit simpler, so
the underlying implementation can be changed in a future patch.

The host LPID is always 0, so that can be a detail of the allocator. If
the allocator range is restricted, that can reserve LPIDs at the top of
the range.

This allows kvmppc_claim_lpid to be removed.
---
 arch/powerpc/include/asm/kvm_ppc.h  |  1 -
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 14 ++
 arch/powerpc/kvm/e500mc.c           |  1 -
 arch/powerpc/kvm/powerpc.c          |  7 +--
 4 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index a14dbcd1b8ce..7e22199a95c9 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -863,7 +863,6 @@ int kvm_vcpu_ioctl_dirty_tlb(struct kvm_vcpu *vcpu,
 			     struct kvm_dirty_tlb *cfg);
 
 long kvmppc_alloc_lpid(void);
-void kvmppc_claim_lpid(long lpid);
 void kvmppc_free_lpid(long lpid);
 void kvmppc_init_lpid(unsigned long nr_lpids);
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 213232914367..09fc52b6f390 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -256,14 +256,15 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
 
 int kvmppc_mmu_hv_init(void)
 {
-	unsigned long host_lpid, rsvd_lpid;
+	unsigned long rsvd_lpid;
 
 	if (!mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE))
 		return -EINVAL;
 
-	host_lpid = 0;
-	if (cpu_has_feature(CPU_FTR_HVMODE))
-		host_lpid = mfspr(SPRN_LPID);
+	if (cpu_has_feature(CPU_FTR_HVMODE)) {
+		if (WARN_ON(mfspr(SPRN_LPID) != 0))
+			return -EINVAL;
+	}
 
 	/* POWER8 and above have 12-bit LPIDs (10-bit in POWER7) */
 	if (cpu_has_feature(CPU_FTR_ARCH_207S))
@@ -271,11 +272,8 @@ int kvmppc_mmu_hv_init(void)
 	else
 		rsvd_lpid = LPID_RSVD_POWER7;
 
-	kvmppc_init_lpid(rsvd_lpid + 1);
-
-	kvmppc_claim_lpid(host_lpid);
 	/* rsvd_lpid is reserved for use in partition switching */
-	kvmppc_claim_lpid(rsvd_lpid);
+	kvmppc_init_lpid(rsvd_lpid);
 
 	return 0;
 }
diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index 1c189b5aadcc..7087d8f2037a 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -398,7 +398,6 @@ static int __init kvmppc_e500mc_init(void)
 	 * allocator.
 	 */
 	kvmppc_init_lpid(KVMPPC_NR_LPIDS/threads_per_core);
-	kvmppc_claim_lpid(0); /* host */
 
 	r = kvm_init(NULL, sizeof(struct kvmppc_vcpu_e500), 0, THIS_MODULE);
 	if (r)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2ad0ccd202d5..102993462872 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -2472,12 +2472,6 @@ long kvmppc_alloc_lpid(void)
 }
 EXPORT_SYMBOL_GPL(kvmppc_alloc_lpid);
 
-void kvmppc_claim_lpid(long lpid)
-{
-	set_bit(lpid, lpid_inuse);
-}
-EXPORT_SYMBOL_GPL(kvmppc_claim_lpid);
-
 void kvmppc_free_lpid(long lpid)
 {
 	clear_bit(lpid, lpid_inuse);
@@ -2488,6 +2482,7 @@ void kvmppc_init_lpid(unsigned long nr_lpids_param)
 {
 	nr_lpids = min_t(unsigned long, KVMPPC_NR_LPIDS, nr_lpids_param);
 	memset(lpid_inuse, 0, sizeof(lpid_inuse));
+	set_bit(0, lpid_inuse);	/* The host LPID must always be 0 */
 }
 EXPORT_SYMBOL_GPL(kvmppc_init_lpid);
-- 
2.23.0
[PATCH 0/6] KVM: PPC: Book3S: Make LPID/nested LPID allocations dynamic
With LPID width plumbed through from firmware, LPID allocations can now
be dynamic, which requires changing the fixed-size bitmap. Rather than
just dynamically sizing it, switch to an IDA allocator.

Nested KVM stays with a fixed 12-bit LPID width for now, but it is also
moved to a more dynamic allocator. In future, if the nested LPID width is
advertised to a guest, it will be simple to take advantage of it.

Thanks,
Nick

Nicholas Piggin (6):
  KVM: PPC: Remove kvmppc_claim_lpid
  KVM: PPC: Book3S HV: Update LPID allocator init for POWER9, Nested
  KVM: PPC: Book3S HV: Use IDA allocator for LPID allocator
  KVM: PPC: Book3S HV Nested: Change nested guest lookup to use idr
  KVM: PPC: Book3S Nested: Use explicit 4096 LPID maximum
  KVM: PPC: Book3S HV: Remove KVMPPC_NR_LPIDS

 arch/powerpc/include/asm/kvm_book3s_asm.h |   3 -
 arch/powerpc/include/asm/kvm_host.h       |  10 +-
 arch/powerpc/include/asm/kvm_ppc.h        |   1 -
 arch/powerpc/include/asm/reg.h            |   2 -
 arch/powerpc/kvm/book3s_64_mmu_hv.c       |  34 +++---
 arch/powerpc/kvm/book3s_hv_nested.c       | 134 +++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |   8 ++
 arch/powerpc/kvm/e500mc.c                 |   1 -
 arch/powerpc/kvm/powerpc.c                |  30 +++--
 arch/powerpc/mm/init_64.c                 |   3 +
 10 files changed, 121 insertions(+), 105 deletions(-)
-- 
2.23.0
[PATCH] KVM: PPC: Book3S HV P9: Optimise loads around context switch
It is better to get all loads for the register values in flight before
starting to switch LPID, PID, and LPCR because those mtSPRs are expensive
and serialising. This also just tidies up the code for a potential future
change to the context switching sequence.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index a28e5b3daabd..9dba3e3f65a0 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -539,8 +539,10 @@ static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, u6
 {
 	struct kvm_nested_guest *nested = vcpu->arch.nested;
 	u32 lpid;
+	u32 pid;
 
 	lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
+	pid = vcpu->arch.pid;
 
 	/*
 	 * Prior memory accesses to host PID Q3 must be completed before we
@@ -551,7 +553,7 @@ static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, u6
 	isync();
 	mtspr(SPRN_LPID, lpid);
 	mtspr(SPRN_LPCR, lpcr);
-	mtspr(SPRN_PID, vcpu->arch.pid);
+	mtspr(SPRN_PID, pid);
 	/*
 	 * isync not required here because we are HRFID'ing to guest before
 	 * any guest context access, which is context synchronising.
@@ -561,9 +563,11 @@ static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, u6
 static void switch_mmu_to_guest_hpt(struct kvm *kvm, struct kvm_vcpu *vcpu, u64 lpcr)
 {
 	u32 lpid;
+	u32 pid;
 	int i;
 
 	lpid = kvm->arch.lpid;
+	pid = vcpu->arch.pid;
 
 	/*
 	 * See switch_mmu_to_guest_radix. ptesync should not be required here
@@ -574,7 +578,7 @@ static void switch_mmu_to_guest_hpt(struct kvm *kvm, struct kvm_vcpu *vcpu, u64
 	isync();
 	mtspr(SPRN_LPID, lpid);
 	mtspr(SPRN_LPCR, lpcr);
-	mtspr(SPRN_PID, vcpu->arch.pid);
+	mtspr(SPRN_PID, pid);
 
 	for (i = 0; i < vcpu->arch.slb_max; i++)
 		mtslb(vcpu->arch.slb[i].orige, vcpu->arch.slb[i].origv);
@@ -585,6 +589,9 @@ static void switch_mmu_to_guest_hpt(struct kvm *kvm, struct kvm_vcpu *vcpu, u64
 
 static void switch_mmu_to_host(struct kvm *kvm, u32 pid)
 {
+	u32 lpid = kvm->arch.host_lpid;
+	u64 lpcr = kvm->arch.host_lpcr;
+
 	/*
 	 * The guest has exited, so guest MMU context is no longer being
 	 * non-speculatively accessed, but a hwsync is needed before the
@@ -594,8 +601,8 @@ static void switch_mmu_to_host(struct kvm *kvm, u32 pid)
 	asm volatile("hwsync" ::: "memory");
 	isync();
 	mtspr(SPRN_PID, pid);
-	mtspr(SPRN_LPID, kvm->arch.host_lpid);
-	mtspr(SPRN_LPCR, kvm->arch.host_lpcr);
+	mtspr(SPRN_LPID, lpid);
+	mtspr(SPRN_LPCR, lpcr);
 	/*
 	 * isync is not required after the switch, because mtmsrd with L=0
 	 * is performed after this switch, which is context synchronising.
-- 
2.23.0
[PATCH] powerpc: fix building after binutils changes.
As some have probably noticed, we are seeing errors like "Error:
unrecognized opcode: `ptesync'" (likewise for `dssall' and `stbcix') as a
result of binutils changes, making compiling all that more fun again. The
only question on my mind still is this:

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index beba4979bff939..d3a9c91cd06a8b 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -334,7 +334,7 @@ static inline void __raw_writel(unsigned int v, volatile void __iomem *addr)
 }
 #define __raw_writel __raw_writel
-#ifdef __powerpc64__
+#ifdef CONFIG_PPC64
 static inline unsigned long __raw_readq(const volatile void __iomem *addr)
 {
 	return *(volatile unsigned long __force *)PCI_FIX_ADDR(addr);
 }
@@ -352,7 +352,8 @@ static inline void __raw_writeq_be(unsigned long v, volatile void __iomem *addr)
 	__raw_writeq((__force unsigned long)cpu_to_be64(v), addr);
 }
 #define __raw_writeq_be __raw_writeq_be
-
+#endif
+#ifdef CONFIG_POWER6_CPU
 /*
  * Real mode versions of the above. Those instructions are only supposed
  * to be used in hypervisor real mode as per the architecture spec.
@@ -417,7 +418,7 @@ static inline u64 __raw_rm_readq(volatile void __iomem *paddr)
 		     : "=r" (ret) : "r" (paddr) : "memory");
 	return ret;
 }
-#endif /* __powerpc64__ */
+#endif /* CONFIG_POWER6_CPU */
---

Will there come a mail saying this broke the PPC6'ish based CPU someone
made in their garage? And lwsync is a valid PPC32 instruction; should I
just follow the example above, where BARRIER_LWSYNC is PPC64-only?

https://github.com/threader/linux/commits/master-build-ppc - linux-next

Best regards,
Michael Heltne

From 226efa05733457bb5c483f30aab6d5c6a304422c Mon Sep 17 00:00:00 2001
From: threader
Date: Sun, 23 Jan 2022 14:17:10 +0100
Subject: [PATCH] arch: powerpc: fix building after binutils changes.

'dssall' in mmu_context.c is an AltiVec instruction, so build that file
accordingly. 'ptesync' is a PPC64 instruction, so don't emit it when not
building for PPC64. And apparently #ifdef __powerpc64__ isn't enough in
all configurations: 'stbcix' and friends are POWER6 instructions,
hopefully not needed by CONFIG_PPC64 in general.

Signed-off-by: Michael B Heltne
---
 arch/powerpc/include/asm/io.h | 7 ---
 arch/powerpc/lib/sstep.c      | 4 +++-
 arch/powerpc/mm/Makefile      | 3 +++
 arch/powerpc/mm/pageattr.c    | 4 ++--
 4 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index beba4979bff939..d3a9c91cd06a8b 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -334,7 +334,7 @@ static inline void __raw_writel(unsigned int v, volatile void __iomem *addr)
 }
 #define __raw_writel __raw_writel
-#ifdef __powerpc64__
+#ifdef CONFIG_PPC64
 static inline unsigned long __raw_readq(const volatile void __iomem *addr)
 {
 	return *(volatile unsigned long __force *)PCI_FIX_ADDR(addr);
 }
@@ -352,7 +352,8 @@ static inline void __raw_writeq_be(unsigned long v, volatile void __iomem *addr)
 	__raw_writeq((__force unsigned long)cpu_to_be64(v), addr);
 }
 #define __raw_writeq_be __raw_writeq_be
-
+#endif
+#ifdef CONFIG_POWER6_CPU
 /*
  * Real mode versions of the above. Those instructions are only supposed
  * to be used in hypervisor real mode as per the architecture spec.
@@ -417,7 +418,7 @@ static inline u64 __raw_rm_readq(volatile void __iomem *paddr)
 		     : "=r" (ret) : "r" (paddr) : "memory");
 	return ret;
 }
-#endif /* __powerpc64__ */
+#endif /* CONFIG_POWER6_CPU */
 
 /*
  *
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index a94b0cd0bdc5ca..4ffd6791b03ec0 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1465,7 +1465,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 		switch ((word >> 1) & 0x3ff) {
 		case 598:	/* sync */
 			op->type = BARRIER + BARRIER_SYNC;
-#ifdef __powerpc64__
+#ifdef CONFIG_PPC64
 			switch ((word >> 21) & 3) {
 			case 1:		/* lwsync */
 				op->type = BARRIER + BARRIER_LWSYNC;
@@ -3267,9 +3267,11 @@ void emulate_update_regs(struct pt_regs *regs, struct instruction_op *op)
 		case BARRIER_LWSYNC:
 			asm volatile("lwsync" : : : "memory");
 			break;
+#ifdef CONFIG_PPC64
 		case BARRIER_PTESYNC:
 			asm volatile("ptesync" : : : "memory");
 			break;
+#endif
 		}
 		break;
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index df8172da2301b7..2f87e77315997a 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -4,6 +4,9 @@
 #
 
 ccflags-$(CONFIG_PPC64)	:= $(NO_MINIMAL_TOC)
+ifeq ($(CONFIG_ALTIVEC),y)
+CFLAGS_mmu_context.o += $(call cc-option, -maltivec, -mabi=altivec)
+endif
 
 obj-y	:= fault.o mem.o pgtable.o mmap.o maccess.o pageattr.o \
 	   init_$(BITS).o pgtable_$(BITS).o \
diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
index edea388e9d3fbb..ccd04a386e28fc 100644
--- a/arch/powerpc/mm/pageattr.c
+++ b
[GIT PULL] Please pull powerpc/linux.git powerpc-5.17-2 tag
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Linus,

Please pull powerpc fixes for 5.17. There's a change to kernel/bpf and
one in tools/bpf, both have Daniel's ack.

cheers

The following changes since commit 29ec39fcf11e4583eb8d5174f756ea109c77cc44:

  Merge tag 'powerpc-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (2022-01-14 15:17:26 +0100)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.17-2

for you to fetch changes up to aee101d7b95a03078945681dd7f7ea5e4a1e7686:

  powerpc/64s: Mask SRR0 before checking against the masked NIP (2022-01-18 10:25:18 +1100)

- --
powerpc fixes for 5.17 #2

- A series of bpf fixes, including an oops fix and some codegen fixes.

- Fix a regression in syscall_get_arch() for compat processes.

- Fix boot failure on some 32-bit systems with KASAN enabled.

- A couple of other build/minor fixes.

Thanks to: Athira Rajeev, Christophe Leroy, Dmitry V. Levin, Jiri Olsa,
Johan Almbladh, Maxime Bizon, Naveen N. Rao, Nicholas Piggin.

- --
Athira Rajeev (1):
      powerpc/perf: Only define power_pmu_wants_prompt_pmi() for CONFIG_PPC64

Christophe Leroy (3):
      powerpc/audit: Fix syscall_get_arch()
      powerpc/time: Fix build failure due to do_hard_irq_enable() on PPC32
      powerpc/32s: Fix kasan_init_region() for KASAN

Naveen N. Rao (5):
      bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
      powerpc32/bpf: Fix codegen for bpf-to-bpf calls
      powerpc/bpf: Update ldimm64 instructions during extra pass
      tools/bpf: Rename 'struct event' to avoid naming conflict
      powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06

Nicholas Piggin (1):
      powerpc/64s: Mask SRR0 before checking against the masked NIP

 arch/powerpc/include/asm/book3s/32/mmu-hash.h |  2 +
 arch/powerpc/include/asm/hw_irq.h             |  2 +-
 arch/powerpc/include/asm/ppc-opcode.h         |  1 +
 arch/powerpc/include/asm/syscall.h            |  4 +-
 arch/powerpc/include/asm/thread_info.h        |  2 +
 arch/powerpc/kernel/interrupt_64.S            |  2 +
 arch/powerpc/mm/book3s32/mmu.c                | 10 ++--
 arch/powerpc/mm/kasan/book3s_32.c             | 59 ++--
 arch/powerpc/net/bpf_jit_comp.c               | 29 --
 arch/powerpc/net/bpf_jit_comp32.c             |  9 +++
 arch/powerpc/net/bpf_jit_comp64.c             | 29 ++
 arch/powerpc/perf/core-book3s.c               | 58 ++-
 kernel/bpf/stackmap.c                         |  5 +-
 tools/bpf/runqslower/runqslower.bpf.c         |  2 +-
 tools/bpf/runqslower/runqslower.c             |  2 +-
 tools/bpf/runqslower/runqslower.h             |  2 +-
 16 files changed, 131 insertions(+), 87 deletions(-)
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmHtOYQACgkQUevqPMjh
pYAZHw//UQj2TYAqdcrkDE2tz81s6/ifbnHsypz4vU9YV8muJUFsXpt9MPbvQhoq
gvUnG3gkMNoXxQ+YDKa2ygN/MLC78ch+4VYWyGGzNcpqVxKWhPqbH/Gt7KvMGOZr
LtnUCYjw462GBGrU7VI+yg9ki4c/pRzcSGoU4w346Q2/xIWdcNDb2aZ9a9MiYMCw
/SBOpwj2hPhFQsAINVujXgrIHlybon+cDGJdPQptBSqvEq24wFu+F+elzXBcJvfm
tVoAe81C077AhT8EGwyM9mTvTmBie+0jgZAkGVsvrUsbJJJY3FV/s923Fc9+lm/m
SMD4Pn8ZaN+dPMRUgCMaUZFjCKTyBx182ELlqraZtTTZvFXXt/ZtM5BCvXZqreZU
6XPFs+xMvJN4ZatdVM724hKhR9UoDaDer0zDcMvj1Yqr5E5LL1cl9ZG0fPeIYPdg
+tMKCWxvx64OWYwZNyeGr12JNvtrzWruvO/2TD60gGdqXIQH39ds8voaW6AUJOeX
xWP5UdEeh1LUPTb5HIEloy7K9QsUlE+fJ+3McbPk2vL01TBbrAjLymPdqCKEDGWe
Z74u7iRjggXEopUOLQPQS4L60P/T6a+5oq2j0eUh4NCWXlJA4Iyfez/76BIiov3L
qHNn4PjNXNQzR5r9xuhTe+WSZselnCnaVZgqsYnptkfdps5Yd6w=
=bxy0
-----END PGP SIGNATURE-----