Re: remove the legacy ide driver
On Fri, Mar 19, 2021 at 12:43:48PM +1100, Finn Thain wrote: > A few months ago I wrote another patch to move some more platforms away > from macide but it has not been tested yet. That is not to say you should > wait. However, my patch does have some changes that are missing from your > patch series, relating to ide platform devices in arch/m68k/mac/config.c. > I hope to be able to test this patch before the 5.13 merge window closes. Normally we do not remove drivers for hardware that is still used. So at least for macide my plan was not to take it away unless the users are sufficiently happy. Or in other words: I think waiting is the right choice, but hopefully we can make that wait as short as possible.
Re: [PATCH 00/36] [Set 4] Rid W=1 warnings in SCSI
Lee, > This set is part of a larger effort attempting to clean-up W=1 kernel > builds, which are currently overwhelmingly riddled with niggly little > warnings. Applied to 5.13/scsi-staging, thanks! I fixed a few little things. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 04/10] MIPS: disable CONFIG_IDE in sb1250_swarm_defconfig
On Thu, 18 Mar 2021, Christoph Hellwig wrote: > sb1250_swarm_defconfig enables CONFIG_IDE but no actual host controller > driver, so just drop CONFIG_IDE, CONFIG_BLK_DEV_IDECD and > CONFIG_BLK_DEV_IDETAPE as they are useless. Actually BLK_DEV_PLATFORM would handle the SWARM's platform driver as an IDE device, however the driver has supported libata ever since commit 2fef357cf391 ("IDE: Fix platform device registration in Swarm IDE driver (v2)") back in 2008, so this is good to go. We should probably enable PATA_PLATFORM in the defconfig instead. The printed name of the driver could be improved I suppose though: scsi host0: pata_platform ata1: PATA max PIO0 mmio cmd 0x100b3e00 ctl 0x100b7ec0 irq 36 (PIO3 is actually hardwired; it's an odd interface and people reported issues with it, but I have never had any myself be it with IDE or libata). Acked-by: Maciej W. Rozycki Maciej
[for-stable-4.19 PATCH 0/2] Backport patches to fix KASAN+LKDTM with recent clang on ARM64
Backport 2 patches that are required to make KASAN+LKDTM work with recent clang (patch 2/2 has a complete description). Tested on our chromeos-4.19 branch. Patch 1/2 is context conflict only, and 2/2 is a clean backport. These patches have been merged to 5.4 stable already. We might need to backport to older stable branches, but this is what I could test for now. Mark Rutland (1): lkdtm: don't move ctors to .rodata Thomas Gleixner (1): vmlinux.lds.h: Create section for protection against instrumentation arch/powerpc/kernel/vmlinux.lds.S | 1 + drivers/misc/lkdtm/Makefile | 2 +- drivers/misc/lkdtm/rodata.c | 2 +- include/asm-generic/sections.h| 3 ++ include/asm-generic/vmlinux.lds.h | 10 ++ include/linux/compiler.h | 54 +++ include/linux/compiler_types.h| 4 +++ scripts/mod/modpost.c | 2 +- 8 files changed, 75 insertions(+), 3 deletions(-) -- 2.31.0.rc2.261.g7f71774620-goog
[for-stable-4.19 PATCH 1/2] vmlinux.lds.h: Create section for protection against instrumentation
From: Thomas Gleixner commit 655389433e7efec589838b400a2a652b3ffa upstream. Some code pathes, especially the low level entry code, must be protected against instrumentation for various reasons: - Low level entry code can be a fragile beast, especially on x86. - With NO_HZ_FULL RCU state needs to be established before using it. Having a dedicated section for such code allows to validate with tooling that no unsafe functions are invoked. Add the .noinstr.text section and the noinstr attribute to mark functions. noinstr implies notrace. Kprobes will gain a section check later. Provide also a set of markers: instrumentation_begin()/end() These are used to mark code inside a noinstr function which calls into regular instrumentable text section as safe. The instrumentation markers are only active when CONFIG_DEBUG_ENTRY is enabled as the end marker emits a NOP to prevent the compiler from merging the annotation points. This means the objtool verification requires a kernel compiled with this option. Signed-off-by: Thomas Gleixner Reviewed-by: Alexandre Chartre Acked-by: Peter Zijlstra Link: https://lkml.kernel.org/r/20200505134100.075416...@linutronix.de [Nicolas: context conflicts in: arch/powerpc/kernel/vmlinux.lds.S include/asm-generic/vmlinux.lds.h include/linux/compiler.h include/linux/compiler_types.h] Signed-off-by: Nicolas Boichat --- arch/powerpc/kernel/vmlinux.lds.S | 1 + include/asm-generic/sections.h| 3 ++ include/asm-generic/vmlinux.lds.h | 10 ++ include/linux/compiler.h | 54 +++ include/linux/compiler_types.h| 4 +++ scripts/mod/modpost.c | 2 +- 6 files changed, 73 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 695432965f20..9b346f3d2814 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -99,6 +99,7 @@ SECTIONS #endif /* careful! 
__ftr_alt_* sections need to be close to .text */ *(.text.hot TEXT_MAIN .text.fixup .text.unlikely .fixup __ftr_alt_* .ref.text); + NOINSTR_TEXT SCHED_TEXT CPUIDLE_TEXT LOCK_TEXT diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index 849cd8eb5ca0..ea5987bb0b84 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -53,6 +53,9 @@ extern char __ctors_start[], __ctors_end[]; /* Start and end of .opd section - used for function descriptors. */ extern char __start_opd[], __end_opd[]; +/* Start and end of instrumentation protected text section */ +extern char __noinstr_text_start[], __noinstr_text_end[]; + extern __visible const void __nosave_begin, __nosave_end; /* Function descriptor handling (if any). Override in asm/sections.h */ diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 2d632a74cc5e..88484ee023ca 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -482,6 +482,15 @@ __security_initcall_end = .;\ } +/* + * Non-instrumentable text section + */ +#define NOINSTR_TEXT \ + ALIGN_FUNCTION(); \ + __noinstr_text_start = .; \ + *(.noinstr.text)\ + __noinstr_text_end = .; + /* * .text section. 
Map to function alignment to avoid address changes * during second ld run in second ld pass when generating System.map @@ -496,6 +505,7 @@ *(TEXT_MAIN .text.fixup)\ *(.text.unlikely .text.unlikely.*) \ *(.text.unknown .text.unknown.*)\ + NOINSTR_TEXT\ *(.text..refcount) \ *(.ref.text)\ MEM_KEEP(init.text*)\ diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 6b6505e3b2c7..6a53300cbd1e 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -129,11 +129,65 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, ".pushsection .discard.unreachable\n\t" \ ".long 999b - .\n\t"\ ".popsection\n\t" + +#ifdef CONFIG_DEBUG_ENTRY +/* Begin/end of an instrumentation safe region */ +#define instrumentation_begin() ({ \ + asm volatile("%c0:\n\t" \ +".pushsection
Re: [RFC PATCH 8/8] powerpc/64/asm: don't reassign labels
Excerpts from Daniel Axtens's message of February 26, 2021 10:28 am: > Segher Boessenkool writes: > >> On Thu, Feb 25, 2021 at 02:10:06PM +1100, Daniel Axtens wrote: >>> The assembler really does not like us reassigning things to the same >>> label: >>> >>> :7:9: error: invalid reassignment of non-absolute variable >>> 'fs_label' >>> >>> This happens across a bunch of platforms: >>> https://github.com/ClangBuiltLinux/linux/issues/1043 >>> https://github.com/ClangBuiltLinux/linux/issues/1008 >>> https://github.com/ClangBuiltLinux/linux/issues/920 >>> https://github.com/ClangBuiltLinux/linux/issues/1050 >>> >>> There is no hope of getting this fixed in LLVM, so if we want to build >>> with LLVM_IAS, we need to hack around it ourselves. >>> >>> For us the big problem comes from this: >>> >>> \#define USE_FIXED_SECTION(sname) \ >>> fs_label = start_##sname; \ >>> fs_start = sname##_start; \ >>> use_ftsec sname; >>> >>> \#define USE_TEXT_SECTION() >>> fs_label = start_text; \ >>> fs_start = text_start; \ >>> .text >>> >>> and in particular fs_label. >> >> The "Setting Symbols" super short chapter reads: >> >> "A symbol can be given an arbitrary value by writing a symbol, followed >> by an equals sign '=', followed by an expression. This is equivalent >> to using the '.set' directive." >> >> And ".set" has >> >> "Set the value of SYMBOL to EXPRESSION. This changes SYMBOL's value and >> type to conform to EXPRESSION. If SYMBOL was flagged as external, it >> remains flagged. >> >> You may '.set' a symbol many times in the same assembly provided that >> the values given to the symbol are constants. Values that are based on >> expressions involving other symbols are allowed, but some targets may >> restrict this to only being done once per assembly. This is because >> those targets do not set the addresses of symbols at assembly time, but >> rather delay the assignment until a final link is performed. 
This >> allows the linker a chance to change the code in the files, changing the >> location of, and the relative distance between, various different >> symbols. >> >> If you '.set' a global symbol, the value stored in the object file is >> the last value stored into it." >> >> So this really should be fixed in clang: it is basic assembler syntax. > > No doubt I have explained this poorly. > > LLVM does allow some things, this builds fine for example: > > .set foo, 8192 > addi %r3, %r3, foo > .set foo, 1234 > addi %r3, %r3, foo > > However, this does not: > > a: > .set foo, a > addi %r3, %r3, foo@l > b: > .set foo, b > addi %r3, %r3, foo-a > > clang -target ppc64le -integrated-as foo.s -o foo.o -c > foo.s:5:11: error: invalid reassignment of non-absolute variable 'foo' in > '.set' directive > .set foo, b > ^ So that does seem to be allowed by the specification. I don't have a huge problem with the patch actually, doesn't seem too bad. Thanks, Nick
Re: [RFC PATCH 7/8] powerpc/purgatory: drop .machine specifier
Excerpts from Segher Boessenkool's message of February 26, 2021 1:58 am: > On Thu, Feb 25, 2021 at 02:10:05PM +1100, Daniel Axtens wrote: >> It's ignored by future versions of llvm's integrated assembler (but not -11). >> I'm not sure what it does for us in gas. > > It enables all insns that exist on 620 (the first 64-bit PowerPC CPU). Same question for this, why do we have it at all? Thanks, Nick
Re: [RFC PATCH 6/8] powerpc/mm/book3s64/hash: drop pre 2.06 tlbiel for clang
Excerpts from Daniel Axtens's message of February 25, 2021 1:10 pm: > The llvm integrated assembler does not recognise the ISA 2.05 tlbiel > version. Eventually do this more smartly. The whole thing with TLBIE and TLBIEL in this file seems a bit too clever. We should have PPC_TLBIE* macros for all of them. Thanks, Nick > > Signed-off-by: Daniel Axtens > --- > arch/powerpc/mm/book3s64/hash_native.c | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/arch/powerpc/mm/book3s64/hash_native.c > b/arch/powerpc/mm/book3s64/hash_native.c > index 52e170bd95ae..c5937f69a452 100644 > --- a/arch/powerpc/mm/book3s64/hash_native.c > +++ b/arch/powerpc/mm/book3s64/hash_native.c > @@ -267,9 +267,14 @@ static inline void __tlbiel(unsigned long vpn, int > psize, int apsize, int ssize) > va |= ssize << 8; > sllp = get_sllp_encoding(apsize); > va |= sllp << 5; > +#if 0 > asm volatile(ASM_FTR_IFSET("tlbiel %0", "tlbiel %0,0", %1) >: : "r" (va), "i" (CPU_FTR_ARCH_206) >: "memory"); > +#endif > + asm volatile("tlbiel %0" > + : : "r" (va) > + : "memory"); > break; > default: > /* We need 14 to 14 + i bits of va */ > @@ -286,9 +291,14 @@ static inline void __tlbiel(unsigned long vpn, int > psize, int apsize, int ssize) >*/ > va |= (vpn & 0xfe); > va |= 1; /* L */ > +#if 0 > asm volatile(ASM_FTR_IFSET("tlbiel %0", "tlbiel %0,1", %1) >: : "r" (va), "i" (CPU_FTR_ARCH_206) >: "memory"); > +#endif > + asm volatile("tlbiel %0" > + : : "r" (va) > + : "memory"); > break; > } > trace_tlbie(0, 1, va, 0, 0, 0, 0); > -- > 2.27.0 > >
Re: remove the legacy ide driver
On Thu, 18 Mar 2021, Christoph Hellwig wrote: > Hi all, > > we've been trying to get rid of the legacy ide driver for a while now, > and finally scheduled a removal for 2021, which is three months old now. > > In general distros and most defconfigs have switched to libata long ago, > but there are a few exceptions. This series first switches over all > remaining defconfigs to use libata and then removes the legacy ide > driver. > > libata mostly covers all hardware supported by the legacy ide driver. > There are three mips drivers that are not supported, but the linux-mips > list could not identify any users of those. There also are two m68k > drivers that do not have libata equivalents, which might or might not > have users, so we'll need some input and possibly help from the m68k > community here. > A few months ago I wrote another patch to move some more platforms away from macide but it has not been tested yet. That is not to say you should wait. However, my patch does have some changes that are missing from your patch series, relating to ide platform devices in arch/m68k/mac/config.c. I hope to be able to test this patch before the 5.13 merge window closes.
Re: [RFC PATCH 4/8] powerpc/ppc_asm: use plain numbers for registers
Excerpts from Daniel Axtens's message of February 26, 2021 10:12 am: > Segher Boessenkool writes: > >> On Thu, Feb 25, 2021 at 02:10:02PM +1100, Daniel Axtens wrote: >>> This is dumb but makes the llvm integrated assembler happy. >>> https://github.com/ClangBuiltLinux/linux/issues/764 >> >>> -#define r0 %r0 >> >>> +#define r0 0 >> >> This is a big step back (compare 9a13a524ba37). >> >> If you use a new enough GAS, you can use the -mregnames option and just >> say "r0" directly (so not define it at all, or define it to itself). >> >> === >> addi 3,3,3 >> addi r3,r3,3 >> addi %r3,%r3,3 >> >> addi 3,3,3 >> addi r3,r3,r3 >> addi %r3,%r3,%r3 >> === >> >> $ as t.s -o t.o -mregnames >> t.s: Assembler messages: >> t.s:6: Warning: invalid register expression >> t.s:7: Warning: invalid register expression >> >> >> Many people do not like bare numbers. It is a bit like not wearing >> seatbelts (but so is all assembler code really: you just have to pay >> attention). A better argument is that it is harder to read for people >> not used to assembler code like this. >> >> We used to have "#define r0 0" etc., and that was quite problematic. >> Like that "addi r3,r3,r3" example, but also, people wrote "r0" where >> only a plain 0 is allowed (like in "lwzx r3,0,r3": "r0" would be >> misleading there!) > > So an overarching comment on all of these patches is that they're not > intended to be ready to merge, nor are they necessarily what I think is > the best solution. I'm just swinging a big hammer to see how far towards > LLVM_IAS=1 I can get on powerpc, and I accept I'm going to have to come > back and clean things up. > > Anyway, noted, I'll push harder on trying to get llvm to accept %rN: > there was a patch that went in after llvm-11 that should help. If you put it under ifdef CONFIG_CC_IS_CLANG in the meantime I think that would be okay. Then we get error checking with gcc compiles and llvm at least builds with its assembler which would be nice. Thanks, Nick
Re: [RFC PATCH 3/8] powerpc/head-64: do less gas-specific stuff with sections
Excerpts from Daniel Axtens's message of February 25, 2021 1:10 pm: > Reopening the section without specifying the same flags breaks > the llvm integrated assembler. Don't do it: just specify all the > flags all the time. I don't have a problem with this but llvm might want to track the issue if it aims to be compatible with gas, if you haven't already opened an issue. When you fix the patch (perhaps add a quick comment as well?), then Acked-by: Nicholas Piggin Thanks, Nick > > Signed-off-by: Daniel Axtens > --- > arch/powerpc/include/asm/head-64.h | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/arch/powerpc/include/asm/head-64.h > b/arch/powerpc/include/asm/head-64.h > index 4cb9efa2eb21..7d8ccab47e86 100644 > --- a/arch/powerpc/include/asm/head-64.h > +++ b/arch/powerpc/include/asm/head-64.h > @@ -15,10 +15,10 @@ > .macro define_data_ftsec name > .section ".head.data.\name\()","a",@progbits > .endm > -.macro use_ftsec name > - .section ".head.text.\name\()" > -.endm > - > +//.macro use_ftsec name > +// .section ".head.text.\name\()" > +//.endm > +#define use_ftsec define_ftsec > /* > * Fixed (location) sections are used by opening fixed sections and emitting > * fixed section entries into them before closing them. Multiple fixed > sections > -- > 2.27.0 > >
Re: [RFC PATCH 2/8] powerpc: check for support for -Wa, -m{power4, any}
Excerpts from Daniel Axtens's message of February 25, 2021 1:10 pm: > LLVM's integrated assembler does not like either -Wa,-mpower4 > or -Wa,-many. So just don't pass them if they're not supported. > > Signed-off-by: Daniel Axtens > --- > arch/powerpc/Makefile | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile > index 08cf0eade56a..3e2c72d20bb8 100644 > --- a/arch/powerpc/Makefile > +++ b/arch/powerpc/Makefile > @@ -252,7 +252,9 @@ cpu-as-$(CONFIG_E500) += -Wa,-me500 > # When using '-many -mpower4' gas will first try and find a matching power4 > # mnemonic and failing that it will allow any valid mnemonic that GAS knows > # about. GCC will pass -many to GAS when assembling, clang does not. > -cpu-as-$(CONFIG_PPC_BOOK3S_64) += -Wa,-mpower4 -Wa,-many > +# LLVM IAS doesn't understand either flag: > https://github.com/ClangBuiltLinux/linux/issues/675 > +# but LLVM IAS only supports ISA >= 2.06 for Book3S 64 anyway... > +cpu-as-$(CONFIG_PPC_BOOK3S_64) += $(call > as-option,-Wa$(comma)-mpower4) $(call as-option,-Wa$(comma)-many) > cpu-as-$(CONFIG_PPC_E500MC) += $(call as-option,-Wa$(comma)-me500mc) > > KBUILD_AFLAGS += $(cpu-as-y) I'm wondering why we even have this now. Kbuild's "AS" command goes through the C compiler now with relevant options like -mcpu. I assume it used to be useful for cross compiling when as was called directly but I'm not sure. Thanks, Nick
Re: [PATCH v9 1/8] powerpc/mm: Implement set_memory() routines
Jordan Niethe writes: > From: Russell Currey > > The set_memory_{ro/rw/nx/x}() functions are required for STRICT_MODULE_RWX, > and are generally useful primitives to have. This implementation is > designed to be completely generic across powerpc's many MMUs. > > It's possible that this could be optimised to be faster for specific > MMUs, but the focus is on having a generic and safe implementation for > now. This won't work for the linear mapping with HPT on book3s 64. Because the linear mapping is not in the kernel page tables. apply_to_existing_page_range() should work that out and return an error. But I'm not sure if callers handle that well or at all. We might want to add a WARN_ON_ONCE() in change_memory_attr(), at least to begin with, to report those errors, so we know when we are failing to set permissions. Rather than silently failing and then crashing some time later due to the permissions being wrong for some mapping. cheers > This implementation does not handle cases where the caller is attempting > to change the mapping of the page it is executing from, or if another > CPU is concurrently using the page being altered. These cases likely > shouldn't happen, but a more complex implementation with MMU-specific code > could safely handle them, so that is left as a TODO for now. > > These functions do nothing if STRICT_KERNEL_RWX is not enabled. 
> > Reviewed-by: Daniel Axtens > Signed-off-by: Russell Currey > Signed-off-by: Christophe Leroy > [jpn: rebase on next plus "powerpc/mm/64s: Allow STRICT_KERNEL_RWX again"] > Signed-off-by: Jordan Niethe > --- > arch/powerpc/Kconfig | 1 + > arch/powerpc/include/asm/set_memory.h | 32 +++ > arch/powerpc/mm/Makefile | 2 +- > arch/powerpc/mm/pageattr.c| 81 +++ > 4 files changed, 115 insertions(+), 1 deletion(-) > create mode 100644 arch/powerpc/include/asm/set_memory.h > create mode 100644 arch/powerpc/mm/pageattr.c > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index fc7f5c5933e6..4498a27ac9db 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -135,6 +135,7 @@ config PPC > select ARCH_HAS_MEMBARRIER_CALLBACKS > select ARCH_HAS_MEMBARRIER_SYNC_CORE > select ARCH_HAS_SCALED_CPUTIME if VIRT_CPU_ACCOUNTING_NATIVE > && PPC_BOOK3S_64 > + select ARCH_HAS_SET_MEMORY > select ARCH_HAS_STRICT_KERNEL_RWX if ((PPC_BOOK3S_64 || PPC32) && > !HIBERNATION) > select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST > select ARCH_HAS_UACCESS_FLUSHCACHE > diff --git a/arch/powerpc/include/asm/set_memory.h > b/arch/powerpc/include/asm/set_memory.h > new file mode 100644 > index ..64011ea444b4 > --- /dev/null > +++ b/arch/powerpc/include/asm/set_memory.h > @@ -0,0 +1,32 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef _ASM_POWERPC_SET_MEMORY_H > +#define _ASM_POWERPC_SET_MEMORY_H > + > +#define SET_MEMORY_RO0 > +#define SET_MEMORY_RW1 > +#define SET_MEMORY_NX2 > +#define SET_MEMORY_X 3 > + > +int change_memory_attr(unsigned long addr, int numpages, long action); > + > +static inline int set_memory_ro(unsigned long addr, int numpages) > +{ > + return change_memory_attr(addr, numpages, SET_MEMORY_RO); > +} > + > +static inline int set_memory_rw(unsigned long addr, int numpages) > +{ > + return change_memory_attr(addr, numpages, SET_MEMORY_RW); > +} > + > +static inline int set_memory_nx(unsigned long addr, int numpages) > +{ > + return 
change_memory_attr(addr, numpages, SET_MEMORY_NX); > +} > + > +static inline int set_memory_x(unsigned long addr, int numpages) > +{ > + return change_memory_attr(addr, numpages, SET_MEMORY_X); > +} > + > +#endif > diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile > index 3b4e9e4e25ea..d8a08abde1ae 100644 > --- a/arch/powerpc/mm/Makefile > +++ b/arch/powerpc/mm/Makefile > @@ -5,7 +5,7 @@ > > ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC) > > -obj-y:= fault.o mem.o pgtable.o mmap.o > maccess.o \ > +obj-y:= fault.o mem.o pgtable.o mmap.o > maccess.o pageattr.o \ > init_$(BITS).o pgtable_$(BITS).o \ > pgtable-frag.o ioremap.o ioremap_$(BITS).o \ > init-common.o mmu_context.o drmem.o > diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c > new file mode 100644 > index ..2da3fbab6ff7 > --- /dev/null > +++ b/arch/powerpc/mm/pageattr.c > @@ -0,0 +1,81 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +/* > + * MMU-generic set_memory implementation for powerpc > + * > + * Copyright 2019, IBM Corporation. > + */ > + > +#include > +#include > + > +#include > +#include > +#include > + > + > +/* > + * Updates the attributes of a page in three steps: > + * > + * 1. invalidate the page
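The review comment above suggests reporting a failed permission change once instead of failing silently. A minimal userspace sketch of that idea follows. It is a model only, not the kernel implementation: the permission bits, warn_on_once(), and change_memory_attr_model() are hypothetical names for illustration, and the "mapped" flag stands in for a range that has no entries in the kernel page tables (such as the linear mapping with HPT).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Action codes mirroring the SET_MEMORY_* values in the quoted patch. */
enum { SET_MEMORY_RO, SET_MEMORY_RW, SET_MEMORY_NX, SET_MEMORY_X };

/* Hypothetical permission bits, used by this model only. */
#define PERM_WRITE 0x1u
#define PERM_EXEC  0x2u

/* Counts how many warnings were actually printed. */
static int warn_count;

/* Hypothetical stand-in for WARN_ON_ONCE(): print on the first hit only. */
static bool warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Apply one action to a permission word; -EINVAL for an unknown action. */
static int apply_action(unsigned int *perm, long action)
{
	switch (action) {
	case SET_MEMORY_RO: *perm &= ~PERM_WRITE; return 0;
	case SET_MEMORY_RW: *perm |= PERM_WRITE;  return 0;
	case SET_MEMORY_NX: *perm &= ~PERM_EXEC;  return 0;
	case SET_MEMORY_X:  *perm |= PERM_EXEC;   return 0;
	default:            return -EINVAL;
	}
}

/* Model of the caller: unmapped ranges fail, and the failure is reported. */
static int change_memory_attr_model(unsigned int *perm, bool mapped, long action)
{
	int err = mapped ? apply_action(perm, action) : -EINVAL;

	warn_on_once(err != 0, "failed to change memory attributes");
	return err;
}
```

The error is still returned to the caller on every failure; only the log message is limited to the first occurrence.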
Re: [PATCH] powerpc/mm: Revert "powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc"
"Aneesh Kumar K.V" writes: > This reverts commit 675bceb097e6 ("powerpc/mm: Remove DEBUG_VM_PGTABLE > support on powerpc") > > All the related issues are fixed by the series > https://lore.kernel.org/linux-mm/20200902114222.181353-1-aneesh.ku...@linux.ibm.com Was that series merged? If so this seems like this could be tagged as a Fix for the last commit in that series. cheers > Hence enable it back > > Signed-off-by: Aneesh Kumar K.V > --- > Documentation/features/debug/debug-vm-pgtable/arch-support.txt | 2 +- > arch/powerpc/Kconfig | 1 + > 2 files changed, 2 insertions(+), 1 deletion(-) > > diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt > b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt > index 7aff505af706..fa83403b4aec 100644 > --- a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt > +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt > @@ -21,7 +21,7 @@ > | nios2: | TODO | > |openrisc: | TODO | > | parisc: | TODO | > -| powerpc: | TODO | > +| powerpc: | ok | > | riscv: | ok | > |s390: | ok | > | sh: | TODO | > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index 386ae12d8523..982c87d5c051 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -119,6 +119,7 @@ config PPC > # > select ARCH_32BIT_OFF_T if PPC32 > select ARCH_HAS_DEBUG_VIRTUAL > + select ARCH_HAS_DEBUG_VM_PGTABLE > select ARCH_HAS_DEVMEM_IS_ALLOWED > select ARCH_HAS_ELF_RANDOMIZE > select ARCH_HAS_FORTIFY_SOURCE > -- > 2.30.2
[PATCH] powerpc/iommu/debug: fix ifnullfree.cocci warnings
From: kernel test robot arch/powerpc/kernel/iommu.c:76:2-16: WARNING: NULL check before some freeing functions is not needed. NULL check before some freeing functions is not needed. Based on checkpatch warning "kfree(NULL) is safe this check is probably not required" and kfreeaddr.cocci by Julia Lawall. Generated by: scripts/coccinelle/free/ifnullfree.cocci Fixes: 691602aab9c3 ("powerpc/iommu/debug: Add debugfs entries for IOMMU tables") CC: Alexey Kardashevskiy Reported-by: kernel test robot Signed-off-by: kernel test robot --- tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 81aa0968b7ea6dbabcdcda37dc8434dca6e1565b commit: 691602aab9c3cce31d3ff9529c09b7922a5f6224 powerpc/iommu/debug: Add debugfs entries for IOMMU tables iommu.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -72,8 +72,7 @@ static void iommu_debugfs_del(struct iom sprintf(name, "%08lx", tbl->it_index); liobn_entry = debugfs_lookup(name, iommu_debugfs_dir); - if (liobn_entry) - debugfs_remove(liobn_entry); + debugfs_remove(liobn_entry); } #else static void iommu_debugfs_add(struct iommu_table *tbl){}
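The reasoning behind this class of warning can be illustrated with a small userspace analogue. This is a sketch only: entry_remove() and struct entry are hypothetical names, and free() stands in for kernel helpers that accept NULL. Because the release function tolerates NULL itself, callers do not need to guard the call.

```c
#include <stdlib.h>

/* Hypothetical object with one owned allocation. */
struct entry {
	char *name;
};

/* Counts calls, so a caller-side test can see the function was entered. */
static int release_calls;

/*
 * Release an entry. Accepts NULL and does nothing in that case, so a
 * caller-side "if (e)" check before calling it is redundant.
 */
static void entry_remove(struct entry *e)
{
	release_calls++;
	if (!e)
		return;
	free(e->name);	/* free(NULL) is a no-op in standard C */
	free(e);
}
```

With this shape, a caller can write entry_remove(lookup(...)) directly, which is the same simplification the patch applies.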
Re: [PATCH] net: marvell: Remove reference to CONFIG_MV64X60
Hello: This patch was applied to netdev/net.git (refs/heads/master): On Thu, 18 Mar 2021 17:25:08 + (UTC) you wrote: > Commit 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support") > removed last selector of CONFIG_MV64X60. > > As it is not a user selectable config item, all references to it > are stale. Remove them. > > Signed-off-by: Christophe Leroy > > [...] Here is the summary with links: - net: marvell: Remove reference to CONFIG_MV64X60 https://git.kernel.org/netdev/net/c/600cc3c9c62d You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
[PATCH v2] powerpc/qspinlock: Use generic smp_cond_load_relaxed
49a7d46a06c3 (powerpc: Implement smp_cond_load_relaxed()) added busy-waiting pausing with a preferred SMT priority pattern, lowering the priority (reducing decode cycles) during the whole loop slowpath.

However, data shows that while this pattern works well with simple spinlocks, queued spinlocks benefit more being kept in medium priority, with a cpu_relax() instead, being a low+medium combo on powerpc.

Data is from three benchmarks on a Power9: 9008-22L 64 CPUs with 2 sockets and 8 threads per core.

1. locktorture.

This is data for the lowest and most artificial/pathological level, with increasing thread counts pounding on the lock. Metrics are total ops/minute. Despite some small hits in the 4-8 range, scenarios are either neutral or favorable to this patch.

+=========+==========+==========+=======+
| # tasks | vanilla  | dirty    | %diff |
+=========+==========+==========+=======+
| 2       | 46718565 | 48751350 | 4.35  |
+---------+----------+----------+-------+
| 4       | 51740198 | 50369082 | -2.65 |
+---------+----------+----------+-------+
| 8       | 63756510 | 62568821 | -1.86 |
+---------+----------+----------+-------+
| 16      | 67824531 | 70966546 | 4.63  |
+---------+----------+----------+-------+
| 32      | 53843519 | 61155508 | 13.58 |
+---------+----------+----------+-------+
| 64      | 53005778 | 53104412 | 0.18  |
+---------+----------+----------+-------+
| 128     | 53331980 | 54606910 | 2.39  |
+=========+==========+==========+=======+

2. sockperf (tcp throughput)

Here a client will do one-way throughput tests to a localhost server, with increasing message sizes, dealing with the sk_lock. This patch shows to put the performance of the qspinlock back to par with that of the simple lock:

             simple-spinlock        vanilla                  dirty
Hmean 14       73.50 (  0.00%)      54.44 * -25.93%*       73.45 *  -0.07%*
Hmean 100     654.47 (  0.00%)     385.61 * -41.08%*      771.43 *  17.87%*
Hmean 300    2719.39 (  0.00%)    2181.67 * -19.77%*     2666.50 *  -1.94%*
Hmean 500    4400.59 (  0.00%)    3390.77 * -22.95%*     4322.14 *  -1.78%*
Hmean 850    6726.21 (  0.00%)    5264.03 * -21.74%*     6863.12 *   2.04%*

3. dbench (tmpfs)

Configured to run with up to ncpusx8 clients, it shows both latency and throughput metrics.
For the latency, with the exception of the 64 case, there is really nothing to go by:

                         vanilla                dirty
Amean latency-1          1.67 (  0.00%)        1.67 *   0.09%*
Amean latency-2          2.15 (  0.00%)        2.08 *   3.36%*
Amean latency-4          2.50 (  0.00%)        2.56 *  -2.27%*
Amean latency-8          2.49 (  0.00%)        2.48 *   0.31%*
Amean latency-16         2.69 (  0.00%)        2.72 *  -1.37%*
Amean latency-32         2.96 (  0.00%)        3.04 *  -2.60%*
Amean latency-64         7.78 (  0.00%)        8.17 *  -5.07%*
Amean latency-512      186.91 (  0.00%)      186.41 *   0.27%*

For the dbench4 Throughput (misleading but traditional) there's a small but rather constant improvement:

                 vanilla                dirty
Hmean 1         849.13 (  0.00%)      851.51 *   0.28%*
Hmean 2        1664.03 (  0.00%)     1663.94 *  -0.01%*
Hmean 4        3073.70 (  0.00%)     3104.29 *   1.00%*
Hmean 8        5624.02 (  0.00%)     5694.16 *   1.25%*
Hmean 16       9169.49 (  0.00%)     9324.43 *   1.69%*
Hmean 32      11969.37 (  0.00%)    12127.09 *   1.32%*
Hmean 64      15021.12 (  0.00%)    15243.14 *   1.48%*
Hmean 512     14891.27 (  0.00%)    15162.11 *   1.82%*

Measuring the dbench4 Per-VFS Operation latency, shows some very minor differences within the noise level, around the 0-1% ranges.

Fixes: 49a7d46a06c3 (powerpc: Implement smp_cond_load_relaxed())
Acked-by: Nicholas Piggin
Signed-off-by: Davidlohr Bueso
---
Changes from v1: Added small description and labeling smp_cond_load_relaxed requested by Nick. Added Nick's ack.

 arch/powerpc/include/asm/barrier.h   | 16
 arch/powerpc/include/asm/qspinlock.h |  7 +++
 2 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index aecfde829d5d..7ae29cfb06c0 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -80,22 +80,6 @@ do { \
 ___p1; \
 })
 
-#ifdef CONFIG_PPC64
-#define smp_cond_load_relaxed(ptr, cond_expr) ({ \
- typeof(ptr) __PTR = (ptr); \
- __unqual_scalar_typeof(*ptr) VAL; \
- VAL = READ_ONCE(*__PTR);\
- if (unlikely(!(cond_expr))) { \
- spin_begin();
Re: [PATCH 3/3] powerpc/qspinlock: Use generic smp_cond_load_relaxed
On Tue, 16 Mar 2021, Nicholas Piggin wrote:

> One request, could you add a comment in place that references
> smp_cond_load_relaxed() so this commit can be found again if
> someone looks at it? Something like this
>
> /*
>  * smp_cond_load_relaxed was found to have performance problems if
>  * implemented with spin_begin()/spin_end().
>  */

Sure, let me see where I can fit that in and send out a v2.

Similarly, but unrelated to this patch, is there any chance we could remove the whole spin_until_cond() machinery and make it specific to powerpc? This was introduced in 2017 and doesn't really have any users outside of powerpc, except for these:

drivers/firmware/arm_scmi/driver.c: spin_until_cond(scmi_xfer_done_no_timeout(cinfo, xfer, stop));
drivers/firmware/arm_scmi/shmem.c: spin_until_cond(ioread32(>channel_status) &
drivers/net/ethernet/xilinx/ll_temac_main.c: spin_until_cond(hard_acs_rdy_or_timeout(lp, timeout));

... which afaict only the xilinx one can actually build on powerpc. Regardless, these could be converted to smp_cond_load_relaxed(), being the more standard way to do optimized busy-waiting, caring more about the family of barriers than ad-hoc SMT priorities. Of course, I have no way of testing any of these changes.

> I wonder if it should have a Fixes: tag to the original commit as
> well.

I'm not sure either. I've actually been informed recently of other workloads that benefit from the revert on large Power9 boxes. So I'll go ahead and add it.

> Otherwise,
>
> Acked-by: Nicholas Piggin

Thanks,
Davidlohr
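For readers unfamiliar with the busy-wait pattern discussed in this thread, here is a rough userspace sketch using C11 atomics. It is a simplification under stated assumptions: the kernel macro takes an arbitrary condition expression, whereas this model only waits for any bit of a mask to be set, and cpu_relax_model() and cond_load_relaxed_mask() are hypothetical names, not kernel APIs.

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in for cpu_relax(). A real implementation would emit
 * an architecture-specific pause or priority hint; this one is only a
 * compiler barrier.
 */
static inline void cpu_relax_model(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/*
 * Spin with relaxed loads until (value & mask) is non-zero, then return the
 * value that satisfied the condition. The optional spins counter records how
 * many times the loop had to relax before the condition held.
 */
static unsigned int cond_load_relaxed_mask(atomic_uint *ptr, unsigned int mask,
					   unsigned long *spins)
{
	unsigned int val;

	for (;;) {
		val = atomic_load_explicit(ptr, memory_order_relaxed);
		if (val & mask)
			break;
		cpu_relax_model();
		if (spins)
			(*spins)++;
	}
	return val;
}
```

The loop relaxes once per failed check rather than changing priority for the whole slowpath, which is the shape the generic helper has and the one the patch switches powerpc qspinlocks to.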
Re: [PATCH] powerpc/embedded6xx: Remove CONFIG_MV64X60
On Thu, Mar 18, 2021 at 05:25:07PM +, Christophe Leroy wrote: > Commit 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support") > removed the last selector of CONFIG_MV64X60. > > As it is not a user selectable config, it can be removed. > > Signed-off-by: Christophe Leroy Acked-by: Wolfram Sang # for I2C
Re: [PATCH] watchdog: Remove MV64x60 watchdog driver
On 3/18/21 10:25 AM, Christophe Leroy wrote:
> Commit 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support")
> removed the last selector of CONFIG_MV64X60.
>
> Therefore CONFIG_MV64X60_WDT cannot be selected anymore and
> can be removed.
>
> Signed-off-by: Christophe Leroy

Reviewed-by: Guenter Roeck

> ---
>  drivers/watchdog/Kconfig       |   4 -
>  drivers/watchdog/Makefile      |   1 -
>  drivers/watchdog/mv64x60_wdt.c | 324 -
>  include/linux/mv643xx.h        |   8 -
>  4 files changed, 337 deletions(-)
>  delete mode 100644 drivers/watchdog/mv64x60_wdt.c
Re: [PATCH 01/10] alpha: use libata instead of the legacy ide driver
Måns Rullgård writes:

> Christoph Hellwig writes:
>
>> On Thu, Mar 18, 2021 at 05:54:55AM +, Al Viro wrote:
>>> On Thu, Mar 18, 2021 at 05:56:57AM +0100, Christoph Hellwig wrote:
>>> > Switch the alpha defconfig from the legacy ide driver to libata.
>>>
>>> Umm... I don't have an IDE alpha box in a usable shape (fans on
>>> CPU module shat themselves), and it would take a while to resurrect
>>> it, but I remember the joy it used to cause in some versions.
>>>
>>> Do you have reports of libata variants of drivers actually tested on
>>> those?
>>
>> No, I haven't. The whole point is that we're not going to keep 4
>> lines of code around despite notice for users that don't exist or
>> care. If there is a regression we'll fix it, but we're not going to
>> make life miserable just because we can.
>
> The pata_ali driver works fine on my UP1500 machine, unless something
> broke recently. I'll build the latest kernel and report back.

5.11.7 seems fine too.

-- 
Måns Rullgård
[PATCH 1/1] powerpc/kernel/iommu: Use largepool as a last resort when !largealloc
As of today, doing iommu_range_alloc() only for !largealloc (npages <= 15)
will only be able to use 3/4 of the available pages, given pages on
largepool not being available for !largealloc.

This could mean some drivers not being able to fully use all the available
pages for the DMA window.

Add pages on largepool as a last resort for !largealloc, making all pages
of the DMA window available.

Signed-off-by: Leonardo Bras
Reviewed-by: Alexey Kardashevskiy
---
 arch/powerpc/kernel/iommu.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 3329ef045805..ae6ad8dca605 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -255,6 +255,15 @@ static unsigned long iommu_range_alloc(struct device *dev,
 			pass++;
 			goto again;
 
+		} else if (pass == tbl->nr_pools + 1) {
+			/* Last resort: try largepool */
+			spin_unlock(>lock);
+			pool = >large_pool;
+			spin_lock(>lock);
+			pool->hint = pool->start;
+			pass++;
+			goto again;
+
 		} else {
 			/* Give up */
 			spin_unlock_irqrestore(&(pool->lock), flags);
-- 
2.29.2
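The effect of the fallback can be illustrated with a small userspace toy
model. This is only a sketch under stated assumptions: struct toy_table,
NR_POOLS (set to 4 here) and the toy_* helpers are made up for the
example, and the model counts free pages instead of scanning a bitmap as
the real allocator does. Only the 3/4 versus 1/4 split comes from the
description above.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_POOLS 4	/* assumed number of small pools for this sketch */

struct toy_table {
	unsigned long small_free[NR_POOLS];	/* free pages per small pool */
	unsigned long large_free;		/* free pages in the large pool */
};

/* Split a window: 3/4 spread over the small pools, the rest is the large pool. */
static void toy_table_init(struct toy_table *tbl, unsigned long it_size)
{
	unsigned long poolsize = (it_size * 3 / 4) / NR_POOLS;

	assert(it_size >= NR_POOLS);
	for (int i = 0; i < NR_POOLS; i++)
		tbl->small_free[i] = poolsize;
	tbl->large_free = it_size - poolsize * NR_POOLS;
}

/*
 * Allocate npages for a small request. Every small pool is tried first;
 * the large pool is used only when use_large_fallback is set.
 */
static bool toy_alloc_small(struct toy_table *tbl, unsigned long npages,
			    bool use_large_fallback)
{
	for (int i = 0; i < NR_POOLS; i++) {
		if (tbl->small_free[i] >= npages) {
			tbl->small_free[i] -= npages;
			return true;
		}
	}
	if (use_large_fallback && tbl->large_free >= npages) {
		tbl->large_free -= npages;
		return true;
	}
	return false;
}

/* Count how many single-page allocations succeed before the table is full. */
static unsigned long toy_fill(unsigned long it_size, bool use_large_fallback)
{
	struct toy_table tbl;
	unsigned long count = 0;

	toy_table_init(&tbl, it_size);
	while (toy_alloc_small(&tbl, 1, use_large_fallback))
		count++;
	return count;
}
```

With a 1024-page window, toy_fill(1024, false) stops after 768 pages,
while toy_fill(1024, true) reaches all 1024, which is the difference the
patch is after.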
[PATCH 1/1] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs
Currently both iommu_alloc_coherent() and iommu_free_coherent() align the
desired allocation size to PAGE_SIZE, and get system pages and IOMMU
mappings (TCEs) for that value.

When IOMMU_PAGE_SIZE < PAGE_SIZE, this behavior may cause unnecessary
TCEs to be created for mapping the whole system page.

Example:
- PAGE_SIZE = 64k, IOMMU_PAGE_SIZE() = 4k
- iommu_alloc_coherent() is called for 128 bytes
- 1 system page (64k) is allocated
- 16 IOMMU pages (16 x 4k) are allocated (16 TCEs used)

It would be enough to use a single TCE for this, so 15 TCEs are
wasted in the process.

Update iommu_*_coherent() to make sure the size alignment happens only
for IOMMU_PAGE_SIZE() before calling iommu_alloc() and iommu_free().

Also, on iommu_range_alloc(), replace ALIGN(n, 1 << tbl->it_page_shift)
with IOMMU_PAGE_ALIGN(n, tbl), which is easier to read and does the
same.

Signed-off-by: Leonardo Bras
Reviewed-by: Alexey Kardashevskiy
---
 arch/powerpc/kernel/iommu.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 5b69a6a72a0e..3329ef045805 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -851,6 +851,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
 	unsigned int order;
 	unsigned int nio_pages, io_order;
 	struct page *page;
+	size_t size_io = size;
 
 	size = PAGE_ALIGN(size);
 	order = get_order(size);
@@ -877,8 +878,9 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
 	memset(ret, 0, size);
 
 	/* Set up tces to cover the allocated range */
-	nio_pages = size >> tbl->it_page_shift;
-	io_order = get_iommu_order(size, tbl);
+	size_io = IOMMU_PAGE_ALIGN(size_io, tbl);
+	nio_pages = size_io >> tbl->it_page_shift;
+	io_order = get_iommu_order(size_io, tbl);
 	mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
 			      mask >> tbl->it_page_shift, io_order, 0);
 	if (mapping == DMA_MAPPING_ERROR) {
@@ -893,10 +895,9 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size,
 			 void *vaddr, dma_addr_t dma_handle)
 {
 	if (tbl) {
-		unsigned int nio_pages;
+		size_t size_io = IOMMU_PAGE_ALIGN(size, tbl);
+		unsigned int nio_pages = size_io >> tbl->it_page_shift;
 
-		size = PAGE_ALIGN(size);
-		nio_pages = size >> tbl->it_page_shift;
 		iommu_free(tbl, dma_handle, nio_pages);
 		size = PAGE_ALIGN(size);
 		free_pages((unsigned long)vaddr, get_order(size));
-- 
2.29.2
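The arithmetic in the example from the commit message can be checked with
a few lines of standalone C. This is a sketch only: align_up(),
tces_page_aligned() and tces_iommu_aligned() are hypothetical helpers
written for this illustration and are not the kernel's PAGE_ALIGN() or
IOMMU_PAGE_ALIGN() macros, although they are meant to round the same way.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to a multiple of 1 << shift. */
static size_t align_up(size_t size, unsigned int shift)
{
	size_t mask;

	assert(shift < sizeof(size_t) * 8);
	mask = ((size_t)1 << shift) - 1;
	return (size + mask) & ~mask;
}

/* TCEs used when the size is first aligned to the system page size. */
static size_t tces_page_aligned(size_t size, unsigned int page_shift,
				unsigned int iommu_shift)
{
	return align_up(size, page_shift) >> iommu_shift;
}

/* TCEs used when the size is aligned to the IOMMU page size only. */
static size_t tces_iommu_aligned(size_t size, unsigned int iommu_shift)
{
	return align_up(size, iommu_shift) >> iommu_shift;
}
```

For a 128 byte request with 64k system pages (shift 16) and 4k IOMMU
pages (shift 12), the first helper yields 16 TCEs and the second yields
1, matching the 15 wasted TCEs mentioned in the commit message.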
[PATCH] net: marvell: Remove reference to CONFIG_MV64X60
Commit 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support")
removed the last selector of CONFIG_MV64X60.

As it is not a user selectable config item, all references to it
are stale. Remove them.

Signed-off-by: Christophe Leroy
---
 drivers/net/ethernet/marvell/Kconfig       | 4 ++--
 drivers/net/ethernet/marvell/mv643xx_eth.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig
index 7fe15a3286f4..fe0989c0fc25 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -6,7 +6,7 @@
 config NET_VENDOR_MARVELL
 	bool "Marvell devices"
 	default y
-	depends on PCI || CPU_PXA168 || MV64X60 || PPC32 || PLAT_ORION || INET || COMPILE_TEST
+	depends on PCI || CPU_PXA168 || PPC32 || PLAT_ORION || INET || COMPILE_TEST
 	help
 	  If you have a network (Ethernet) card belonging to this class, say Y.
 
@@ -19,7 +19,7 @@ if NET_VENDOR_MARVELL
 
 config MV643XX_ETH
 	tristate "Marvell Discovery (643XX) and Orion ethernet support"
-	depends on MV64X60 || PPC32 || PLAT_ORION || COMPILE_TEST
+	depends on PPC32 || PLAT_ORION || COMPILE_TEST
 	depends on INET
 	select PHYLIB
 	select MVMDIO
diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index 90e6111ce534..3bfb659b5c99 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -2684,7 +2684,7 @@ static const struct of_device_id mv643xx_eth_shared_ids[] = {
 MODULE_DEVICE_TABLE(of, mv643xx_eth_shared_ids);
 #endif
 
-#if defined(CONFIG_OF_IRQ) && !defined(CONFIG_MV64X60)
+#ifdef CONFIG_OF_IRQ
 #define mv643xx_eth_property(_np, _name, _v)				\
 	do {								\
 		u32 tmp;						\
-- 
2.25.0
[PATCH] watchdog: Remove MV64x60 watchdog driver
Commit 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support") removed the last selector of CONFIG_MV64X60. Therefore CONFIG_MV64X60_WDT cannot be selected anymore and can be removed. Signed-off-by: Christophe Leroy --- drivers/watchdog/Kconfig | 4 - drivers/watchdog/Makefile | 1 - drivers/watchdog/mv64x60_wdt.c | 324 - include/linux/mv643xx.h| 8 - 4 files changed, 337 deletions(-) delete mode 100644 drivers/watchdog/mv64x60_wdt.c diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig index 1fe0042a48d2..178296bda151 100644 --- a/drivers/watchdog/Kconfig +++ b/drivers/watchdog/Kconfig @@ -1831,10 +1831,6 @@ config 8xxx_WDT For BookE processors (MPC85xx) use the BOOKE_WDT driver instead. -config MV64X60_WDT - tristate "MV64X60 (Marvell Discovery) Watchdog Timer" - depends on MV64X60 || COMPILE_TEST - config PIKA_WDT tristate "PIKA FPGA Watchdog" depends on WARP || (PPC64 && COMPILE_TEST) diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile index f3a6540e725e..752c6513f731 100644 --- a/drivers/watchdog/Makefile +++ b/drivers/watchdog/Makefile @@ -175,7 +175,6 @@ obj-$(CONFIG_PIC32_DMT) += pic32-dmt.o # POWERPC Architecture obj-$(CONFIG_GEF_WDT) += gef_wdt.o obj-$(CONFIG_8xxx_WDT) += mpc8xxx_wdt.o -obj-$(CONFIG_MV64X60_WDT) += mv64x60_wdt.o obj-$(CONFIG_PIKA_WDT) += pika_wdt.o obj-$(CONFIG_BOOKE_WDT) += booke_wdt.o obj-$(CONFIG_MEN_A21_WDT) += mena21_wdt.o diff --git a/drivers/watchdog/mv64x60_wdt.c b/drivers/watchdog/mv64x60_wdt.c deleted file mode 100644 index 894aa63488d3.. --- a/drivers/watchdog/mv64x60_wdt.c +++ /dev/null @@ -1,324 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * mv64x60_wdt.c - MV64X60 (Marvell Discovery) watchdog userspace interface - * - * Author: James Chapman - * - * Platform-specific setup code should configure the dog to generate - * interrupt or reset as required. This code only enables/disables - * and services the watchdog. - * - * Derived from mpc8xx_wdt.c, with the following copyright. 
- * - * 2002 (c) Florian Schirmer - */ - -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#define MV64x60_WDT_WDC_OFFSET 0 - -/* - * The watchdog configuration register contains a pair of 2-bit fields, - * 1. a reload field, bits 27-26, which triggers a reload of - * the countdown register, and - * 2. an enable field, bits 25-24, which toggles between - * enabling and disabling the watchdog timer. - * Bit 31 is a read-only field which indicates whether the - * watchdog timer is currently enabled. - * - * The low 24 bits contain the timer reload value. - */ -#define MV64x60_WDC_ENABLE_SHIFT 24 -#define MV64x60_WDC_SERVICE_SHIFT 26 -#define MV64x60_WDC_ENABLED_SHIFT 31 - -#define MV64x60_WDC_ENABLED_TRUE 1 -#define MV64x60_WDC_ENABLED_FALSE 0 - -/* Flags bits */ -#define MV64x60_WDOG_FLAG_OPENED 0 - -static unsigned long wdt_flags; -static int wdt_status; -static void __iomem *mv64x60_wdt_regs; -static int mv64x60_wdt_timeout; -static int mv64x60_wdt_count; -static unsigned int bus_clk; -static char expect_close; -static DEFINE_SPINLOCK(mv64x60_wdt_spinlock); - -static bool nowayout = WATCHDOG_NOWAYOUT; -module_param(nowayout, bool, 0); -MODULE_PARM_DESC(nowayout, - "Watchdog cannot be stopped once started (default=" - __MODULE_STRING(WATCHDOG_NOWAYOUT) ")"); - -static int mv64x60_wdt_toggle_wdc(int enabled_predicate, int field_shift) -{ - u32 data; - u32 enabled; - int ret = 0; - - spin_lock(_wdt_spinlock); - data = readl(mv64x60_wdt_regs + MV64x60_WDT_WDC_OFFSET); - enabled = (data >> MV64x60_WDC_ENABLED_SHIFT) & 1; - - /* only toggle the requested field if enabled state matches predicate */ - if ((enabled ^ enabled_predicate) == 0) { - /* We write a 1, then a 2 -- to the appropriate field */ - data = (1 << field_shift) | mv64x60_wdt_count; - writel(data, mv64x60_wdt_regs + MV64x60_WDT_WDC_OFFSET); - - data = (2 << field_shift) | mv64x60_wdt_count; - writel(data, 
mv64x60_wdt_regs + MV64x60_WDT_WDC_OFFSET); - ret = 1; - } - spin_unlock(_wdt_spinlock); - - return ret; -} - -static void mv64x60_wdt_service(void) -{ - mv64x60_wdt_toggle_wdc(MV64x60_WDC_ENABLED_TRUE, - MV64x60_WDC_SERVICE_SHIFT); -} - -static void mv64x60_wdt_handler_enable(void) -{ - if (mv64x60_wdt_toggle_wdc(MV64x60_WDC_ENABLED_FALSE, - MV64x60_WDC_ENABLE_SHIFT)) { - mv64x60_wdt_service(); - pr_notice("watchdog activated\n"); - } -} - -static void mv64x60_wdt_handler_disable(void) -{ - if
[PATCH] powerpc/embedded6xx: Remove CONFIG_MV64X60
Commit 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support")
removed the last selector of CONFIG_MV64X60.

As it is not a user selectable config, it can be removed.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/platforms/embedded6xx/Kconfig | 5 -
 drivers/i2c/busses/Kconfig                 | 2 +-
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/Kconfig b/arch/powerpc/platforms/embedded6xx/Kconfig
index c1920961f410..4c6d703a4284 100644
--- a/arch/powerpc/platforms/embedded6xx/Kconfig
+++ b/arch/powerpc/platforms/embedded6xx/Kconfig
@@ -71,11 +71,6 @@ config MPC10X_BRIDGE
 	bool
 	select PPC_INDIRECT_PCI
 
-config MV64X60
-	bool
-	select PPC_INDIRECT_PCI
-	select CHECK_CACHE_COHERENCY
-
 config GAMECUBE_COMMON
 	bool
 
diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 05ebf7546e3f..20edcda1c6f4 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -776,7 +776,7 @@ config I2C_MT7621
 
 config I2C_MV64XXX
 	tristate "Marvell mv64xxx I2C Controller"
-	depends on MV64X60 || PLAT_ORION || ARCH_SUNXI || ARCH_MVEBU || COMPILE_TEST
+	depends on PLAT_ORION || ARCH_SUNXI || ARCH_MVEBU || COMPILE_TEST
 	help
 	  If you say yes to this option, support will be included for the
 	  built-in I2C interface on the Marvell 64xxx line of host bridges.
-- 
2.25.0
Re: [PATCH 01/10] alpha: use libata instead of the legacy ide driver
Christoph Hellwig writes:

> On Thu, Mar 18, 2021 at 05:54:55AM +, Al Viro wrote:
>> On Thu, Mar 18, 2021 at 05:56:57AM +0100, Christoph Hellwig wrote:
>> > Switch the alpha defconfig from the legacy ide driver to libata.
>>
>> Umm... I don't have an IDE alpha box in a usable shape (fans on
>> CPU module shat themselves), and it would take a while to resurrect
>> it, but I remember the joy it used to cause in some versions.
>>
>> Do you have reports of libata variants of drivers actually tested on
>> those?
>
> No, I haven't. The whole point is that we're not going to keep 4
> lines of code around despite notice for users that don't exist or
> care. If there is a regression we'll fix it, but we're not going to
> make life miserable just because we can.

The pata_ali driver works fine on my UP1500 machine, unless something
broke recently. I'll build the latest kernel and report back.

-- 
Måns Rullgård
Re: [PATCH 08/10] MIPS: disable CONFIG_IDE in malta*_defconfig
On Thu, Mar 18, 2021 at 05:57:04AM +0100, Christoph Hellwig wrote:
>  arch/mips/configs/malta_kvm_guest_defconfig | 3 ---

that file is gone in mips-next. I could take all MIPS patches into
mips-next, if you want...

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]
[PATCH 3/3] swiotlb: remove swiotlb_nr_tbl
All callers just use it to check if swiotlb is active at all, for which
they can just use is_swiotlb_active. In the longer run drivers need
to stop using is_swiotlb_active as well, but let's do the simple step
first.

Signed-off-by: Christoph Hellwig
---
 drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c        | 2 +-
 drivers/pci/xen-pcifront.c                   | 2 +-
 include/linux/swiotlb.h                      | 1 -
 kernel/dma/swiotlb.c                         | 7 +--
 5 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index ad22f42541bda6..a9d65fc8aa0eab 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
 
 	max_order = MAX_ORDER;
 #ifdef CONFIG_SWIOTLB
-	if (swiotlb_nr_tbl()) {
+	if (is_swiotlb_active()) {
 		unsigned int max_segment;
 
 		max_segment = swiotlb_max_segment();
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index a37bc3d7b38b3b..9662522aa0664a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
 	}
 
 #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
-	need_swiotlb = !!swiotlb_nr_tbl();
+	need_swiotlb = is_swiotlb_active();
 #endif
 
 	ret = ttm_bo_device_init(>ttm.bdev, _bo_driver,
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index 2d75026482197d..b7a8f3a1921f83 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct pcifront_device *pdev)
 
 	spin_unlock(_dev_lock);
 
-	if (!err && !swiotlb_nr_tbl()) {
+	if (!err && !is_swiotlb_active()) {
 		err = pci_xen_swiotlb_init_late();
 		if (err)
 			dev_err(>xdev->dev, "Could not setup SWIOTLB!\n");
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 63f7a63f61d098..216854a5e5134b 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -37,7 +37,6 @@ enum swiotlb_force {
 
 extern void swiotlb_init(int verbose);
 int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
-extern unsigned long swiotlb_nr_tbl(void);
 unsigned long swiotlb_size_or_default(void);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
 extern int swiotlb_late_init_with_default_size(size_t default_size);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 13de669a9b4681..539c76beb52e07 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -94,12 +94,6 @@ setup_io_tlb_npages(char *str)
 }
 early_param("swiotlb", setup_io_tlb_npages);
 
-unsigned long swiotlb_nr_tbl(void)
-{
-	return io_tlb_default_mem ? io_tlb_default_mem->nslabs : 0;
-}
-EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
-
 unsigned int swiotlb_max_segment(void)
 {
 	return io_tlb_default_mem ? max_segment : 0;
@@ -652,6 +646,7 @@ bool is_swiotlb_active(void)
 {
 	return io_tlb_default_mem != NULL;
 }
+EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
 #ifdef CONFIG_DEBUG_FS
 
-- 
2.30.1
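Why a boolean query is enough once the pool descriptor is a pointer can
be shown with a small userspace sketch. Everything named toy_* below is
hypothetical and only loosely modelled on struct io_tlb_mem; the
flexible array member follows the idea of patch 2/3 in this series, and
calloc() stands in for whatever allocator the kernel code really uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy pool descriptor; only loosely modelled on struct io_tlb_mem. */
struct toy_pool {
	unsigned long nslabs;
	struct toy_slot {
		size_t alloc_size;
	} slots[];		/* flexible array member: one entry per slab */
};

static struct toy_pool *toy_default_pool;	/* NULL until initialised */

/* Old-style query: callers only ever tested the count against zero. */
static unsigned long toy_nr_slabs(void)
{
	return toy_default_pool ? toy_default_pool->nslabs : 0;
}

/* New-style query: asks the actual question directly. */
static bool toy_is_active(void)
{
	return toy_default_pool != NULL;
}

/* Allocate the descriptor and all of its slots in a single allocation. */
static int toy_init(unsigned long nslabs)
{
	struct toy_pool *pool;

	assert(nslabs > 0);
	pool = calloc(1, sizeof(*pool) + nslabs * sizeof(pool->slots[0]));
	if (!pool)
		return -1;
	pool->nslabs = nslabs;
	toy_default_pool = pool;
	return 0;
}

/* Release the pool and return to the inactive state. */
static void toy_exit(void)
{
	free(toy_default_pool);
	toy_default_pool = NULL;
}
```

In this model toy_nr_slabs() != 0 and toy_is_active() always agree, so
callers that only test for "in use" lose nothing by switching to the
boolean helper.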
[PATCH 2/3] swiotlb: dynamically allocate io_tlb_default_mem
Instead of allocating ->list and ->orig_addr separately just do one dynamic allocation for the actual io_tlb_mem structure. This simplifies a lot of the initialization code, and also allows to just check io_tlb_default_mem to see if swiotlb is in use. Signed-off-by: Christoph Hellwig --- drivers/xen/swiotlb-xen.c | 22 +-- include/linux/swiotlb.h | 18 ++- kernel/dma/swiotlb.c | 306 -- 3 files changed, 117 insertions(+), 229 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 5329ad54a5f34e..4c89afc0df6289 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -158,17 +158,14 @@ static const char *xen_swiotlb_error(enum xen_swiotlb_err err) int __ref xen_swiotlb_init(void) { enum xen_swiotlb_err m_ret = XEN_SWIOTLB_UNKNOWN; - unsigned long nslabs, bytes, order; - unsigned int repeat = 3; + unsigned long bytes = swiotlb_size_or_default(); + unsigned long nslabs = bytes >> IO_TLB_SHIFT; + unsigned int order, repeat = 3; int rc = -ENOMEM; char *start; - nslabs = swiotlb_nr_tbl(); - if (!nslabs) - nslabs = DEFAULT_NSLABS; retry: m_ret = XEN_SWIOTLB_ENOMEM; - bytes = nslabs << IO_TLB_SHIFT; order = get_order(bytes); /* @@ -221,19 +218,16 @@ int __ref xen_swiotlb_init(void) #ifdef CONFIG_X86 void __init xen_swiotlb_init_early(void) { - unsigned long nslabs, bytes; + unsigned long bytes = swiotlb_size_or_default(); + unsigned long nslabs = bytes >> IO_TLB_SHIFT; unsigned int repeat = 3; char *start; int rc; - nslabs = swiotlb_nr_tbl(); - if (!nslabs) - nslabs = DEFAULT_NSLABS; retry: /* * Get IO TLB memory from any location. 
*/ - bytes = nslabs << IO_TLB_SHIFT; start = memblock_alloc(PAGE_ALIGN(bytes), PAGE_SIZE); if (!start) panic("%s: Failed to allocate %lu bytes align=0x%lx\n", @@ -248,8 +242,8 @@ void __init xen_swiotlb_init_early(void) if (repeat--) { /* Min is 2MB */ nslabs = max(1024UL, (nslabs >> 1)); - pr_info("Lowering to %luMB\n", - (nslabs << IO_TLB_SHIFT) >> 20); + bytes = nslabs << IO_TLB_SHIFT; + pr_info("Lowering to %luMB\n", bytes >> 20); goto retry; } panic("%s (rc:%d)", xen_swiotlb_error(XEN_SWIOTLB_EFIXUP), rc); @@ -548,7 +542,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct scatterlist *sgl, static int xen_swiotlb_dma_supported(struct device *hwdev, u64 mask) { - return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask; + return xen_phys_to_dma(hwdev, io_tlb_default_mem->end - 1) <= mask; } const struct dma_map_ops xen_swiotlb_dma_ops = { diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 5ec5378b17c333..63f7a63f61d098 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -90,28 +90,30 @@ struct io_tlb_mem { phys_addr_t end; unsigned long nslabs; unsigned long used; - unsigned int *list; unsigned int index; - phys_addr_t *orig_addr; - size_t *alloc_size; spinlock_t lock; struct dentry *debugfs; bool late_alloc; + struct io_tlb_slot { + phys_addr_t orig_addr; + size_t alloc_size; + unsigned int list; + } slots[]; }; -extern struct io_tlb_mem io_tlb_default_mem; +extern struct io_tlb_mem *io_tlb_default_mem; static inline bool is_swiotlb_buffer(phys_addr_t paddr) { - struct io_tlb_mem *mem = _tlb_default_mem; + struct io_tlb_mem *mem = io_tlb_default_mem; - return paddr >= mem->start && paddr < mem->end; + return mem && paddr >= mem->start && paddr < mem->end; } void __init swiotlb_exit(void); unsigned int swiotlb_max_segment(void); size_t swiotlb_max_mapping_size(struct device *dev); bool is_swiotlb_active(void); -void __init swiotlb_adjust_size(unsigned long new_size); +void __init 
swiotlb_adjust_size(unsigned long size); #else #define swiotlb_force SWIOTLB_NO_FORCE static inline bool is_swiotlb_buffer(phys_addr_t paddr) @@ -135,7 +137,7 @@ static inline bool is_swiotlb_active(void) return false; } -static inline void swiotlb_adjust_size(unsigned long new_size) +static inline void swiotlb_adjust_size(unsigned long size) { } #endif /* CONFIG_SWIOTLB */ diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index d9c097f0f78cec..13de669a9b4681 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -63,7 +63,7 @@ enum swiotlb_force swiotlb_force; -struct io_tlb_mem io_tlb_default_mem; +struct io_tlb_mem *io_tlb_default_mem; /* * Max segment that we can
[PATCH 1/3] swiotlb: move global variables into a new io_tlb_mem structure
From: Claire Chang Added a new struct, io_tlb_mem, as the IO TLB memory pool descriptor and moved relevant global variables into that struct. This will be useful later to allow for restricted DMA pool. Signed-off-by: Claire Chang [hch: rebased] Signed-off-by: Christoph Hellwig --- drivers/xen/swiotlb-xen.c | 2 +- include/linux/swiotlb.h | 43 - kernel/dma/swiotlb.c | 354 ++ 3 files changed, 206 insertions(+), 193 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 4ecfce2c6f7263..5329ad54a5f34e 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -548,7 +548,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct scatterlist *sgl, static int xen_swiotlb_dma_supported(struct device *hwdev, u64 mask) { - return xen_phys_to_dma(hwdev, io_tlb_end - 1) <= mask; + return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask; } const struct dma_map_ops xen_swiotlb_dma_ops = { diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 0696bdc8072e97..5ec5378b17c333 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -6,6 +6,7 @@ #include #include #include +#include struct device; struct page; @@ -61,11 +62,49 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys, #ifdef CONFIG_SWIOTLB extern enum swiotlb_force swiotlb_force; -extern phys_addr_t io_tlb_start, io_tlb_end; + +/** + * struct io_tlb_mem - IO TLB Memory Pool Descriptor + * + * @start: The start address of the swiotlb memory pool. Used to do a quick + * range check to see if the memory was in fact allocated by this + * API. + * @end: The end address of the swiotlb memory pool. Used to do a quick + * range check to see if the memory was in fact allocated by this + * API. + * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and + * @end. This is command line adjustable via setup_io_tlb_npages. + * @used: The number of used IO TLB block. 
+ * @list: The free list describing the number of free entries available + * from each index. + * @index: The index to start searching in the next round. + * @orig_addr: The original address corresponding to a mapped entry. + * @alloc_size:Size of the allocated buffer. + * @lock: The lock to protect the above data structures in the map and + * unmap calls. + * @debugfs: The dentry to debugfs. + * @late_alloc:%true if allocated using the page allocator + */ +struct io_tlb_mem { + phys_addr_t start; + phys_addr_t end; + unsigned long nslabs; + unsigned long used; + unsigned int *list; + unsigned int index; + phys_addr_t *orig_addr; + size_t *alloc_size; + spinlock_t lock; + struct dentry *debugfs; + bool late_alloc; +}; +extern struct io_tlb_mem io_tlb_default_mem; static inline bool is_swiotlb_buffer(phys_addr_t paddr) { - return paddr >= io_tlb_start && paddr < io_tlb_end; + struct io_tlb_mem *mem = _tlb_default_mem; + + return paddr >= mem->start && paddr < mem->end; } void __init swiotlb_exit(void); diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 35e24f0ff8b207..d9c097f0f78cec 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -59,32 +59,11 @@ */ #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT) -enum swiotlb_force swiotlb_force; - -/* - * Used to do a quick range check in swiotlb_tbl_unmap_single and - * swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this - * API. - */ -phys_addr_t io_tlb_start, io_tlb_end; - -/* - * The number of IO TLB blocks (in groups of 64) between io_tlb_start and - * io_tlb_end. This is command line adjustable via setup_io_tlb_npages. 
- */ -static unsigned long io_tlb_nslabs; +#define INVALID_PHYS_ADDR (~(phys_addr_t)0) -/* - * The number of used IO TLB block - */ -static unsigned long io_tlb_used; +enum swiotlb_force swiotlb_force; -/* - * This is a free list describing the number of free entries available from - * each index - */ -static unsigned int *io_tlb_list; -static unsigned int io_tlb_index; +struct io_tlb_mem io_tlb_default_mem; /* * Max segment that we can provide which (if pages are contingous) will @@ -92,32 +71,15 @@ static unsigned int io_tlb_index; */ static unsigned int max_segment; -/* - * We need to save away the original address corresponding to a mapped entry - * for the sync operations. - */ -#define INVALID_PHYS_ADDR (~(phys_addr_t)0) -static phys_addr_t *io_tlb_orig_addr; - -/* - * The mapped buffer's size should be validated during a sync operation. - */ -static size_t *io_tlb_alloc_size; - -/* - * Protect the above data structures in the map and unmap calls - */ -static DEFINE_SPINLOCK(io_tlb_lock); - -static int late_alloc; - static int __init
swiotlb cleanups v3
Hi Konrad,

this series contains a bunch of swiotlb cleanups, mostly to reduce the
amount of internals exposed to code outside of swiotlb.c, which should
help to prepare for supporting multiple different bounce buffer pools.

Changes since v2:
 - fix a bisection hazard that did not allocate the alloc_size array
 - dropped all patches already merged

Changes since v1:
 - rebased to v5.12-rc1
 - a few more cleanups
 - merge and forward port the patch from Claire to move all the global
   variables into a struct to prepare for multiple instances
Re: Advice needed on SMP regression after cpu_core_mask change
On 3/18/21 10:42 AM, Srikar Dronamraju wrote:
> * Daniel Henrique Barboza [2021-03-17 10:00:34]:
>
> > Hello,
> >
> > Patch 4bce545903fa ("powerpc/topology: Update topology_core_cpumask")
> > introduced a regression in both upstream and RHEL downstream kernels [1].
> > The assumption made in the commit:
> >
> > "Further analysis shows that cpu_core_mask and cpu_cpu_mask for any CPU
> > would be equal on Power"
> >
> > Doesn't seem to be true. After this commit, QEMU is now unable to set
> > single NUMA node SMP topologies such as:
> >
> > -smp 8,maxcpus=8,cores=2,threads=2,sockets=2
>
> What does it mean for a NUMA node to have more than one socket?
> If they are all part of the same node, they are at local distance to each
> other. Cache is per core. So what resources are shared by the sockets that
> are part of the same NUMA node? And how does userspace/application make use
> of the same?

Honestly, I sympathize with the idea that multiple sockets in the same NUMA
node are "weird". QEMU has accepted this kind of topology since forever
because we didn't pay attention to these other details.

I don't see any problem with adding more constraints in the virtual layer, as
long as the constraints make sense and are documented. Not allowing multiple
sockets in a single NUMA node seems like a fair restriction.

> Please don't mistake this as an attempt to downplay your report; it is an
> honest attempt to better understand the situation.

It's cool. Ask away.

> For example, if the socket denotes the hemisphere logic in P10, then can we
> see if the coregroup feature can be used. "Coregroup" is supposed to mean a
> set of cores within a NUMA node that have some characteristics, and there
> can be multiple coregroups within a NUMA node. We added that mostly to mimic
> the hemisphere in P10. However, the number of coregroups in a NUMA node is
> not exported to userspace at this time.

I see. I thought that the presence of the hemispheres inside the chip would
justify more than one NUMA node inside the chip, meaning that a chip/socket
would have more than one NUMA node inside of it.

If that's not the case then I guess socket == NUMA node is still valid in P10
as well. The last 'lscpu' example I gave here, claiming that this would be a
Power10 scenario, doesn't represent P10 after all.

> However, if each socket is associated with memory and a node distance,
> should they then be NUMA nodes?
>
> Can you provide me with the unique ibm,chip-ids in your 2 NUMA, 4 node case?
> Does this cause any performance issues with the guest/application?

I can fetch some values, but we're trying to move away from it since it's not
in the pseries spec (PAPR). Perhaps with these restrictions we can live
without ibm,chip-id in QEMU.

> Till your report, I was under the impression that NUMAs == Sockets.

After reading and discussing it, I think the sensible thing to do is to put
this same constraint in QEMU.

In theory it would be nice to let the virtual machine have whatever topology
it wants, multiple sockets in the same NUMA domain and so on, but in the end
we're emulating Power hardware. If Power hardware - and the powerpc kernel -
operates under these assumptions, then I don't see much point in allowing
users to set unrealistic virtual CPU topologies that will be misrepresented
in the kernel.

I'll try this restriction in QEMU and see how the upstream kernel behaves,
with and without ibm,chip-id being advertised in the DT.

Thanks,

DHB

> > lscpu will give the following output in this case:
> >
> > # lscpu
> > Architecture:         ppc64le
> > Byte Order:           Little Endian
> > CPU(s):               8
> > On-line CPU(s) list:  0-7
> > Thread(s) per core:   2
> > Core(s) per socket:   4
> > Socket(s):            1
> > NUMA node(s):         1
> > Model:                2.2 (pvr 004e 1202)
> > Model name:           POWER9 (architected), altivec supported
> > Hypervisor vendor:    KVM
> > Virtualization type:  para
> > L1d cache:            32K
> > L1i cache:            32K
> > NUMA node0 CPU(s):    0-7
> >
> > This is happening because the macro cpu_cpu_mask(cpu) expands to
> > cpumask_of_node(cpu_to_node(cpu)), which in turn expands to
> > node_to_cpumask_map[node]. node_to_cpumask_map is a NUMA array that maps
> > CPUs to NUMA nodes (Aneesh is on CC to correct me if I'm wrong). We're now
> > associating sockets to NUMA nodes directly.
> >
> > If I add a second NUMA node then I can get the intended smp topology:
> >
> > -smp 8,maxcpus=8,cores=2,threads=2,sockets=2
> > -numa node,memdev=mem0,cpus=0-3,nodeid=0 \
> > -numa node,memdev=mem1,cpus=4-7,nodeid=1 \
> >
> > # lscpu
> > Architecture:         ppc64le
> > Byte Order:           Little Endian
> > CPU(s):               8
> > On-line CPU(s) list:  0-7
> > Thread(s) per core:   2
> > Core(s) per socket:   2
> > Socket(s):            2
> > NUMA node(s):         2
> > Model:                2.2 (pvr 004e 1202)
> > Model name:           POWER9 (architected), altivec supported
> > Hypervisor vendor:    KVM
> > Virtualization type:  para
> > L1d cache:            32K
> > L1i cache:            32K
> > NUMA node0 CPU(s):    0-3
> > NUMA node1 CPU(s):    4-7
> >
> > However, if I try a single socket with multiple NUMA nodes topology, which
> > is the case of Power10, e.g.:
> >
> > -smp
Re: Advice needed on SMP regression after cpu_core_mask change
* Daniel Henrique Barboza [2021-03-17 10:00:34]:

> Hello,
>
> Patch 4bce545903fa ("powerpc/topology: Update topology_core_cpumask")
> introduced a regression in both upstream and RHEL downstream kernels [1].
> The assumption made in the commit:
>
> "Further analysis shows that cpu_core_mask and cpu_cpu_mask for any CPU
> would be equal on Power"
>
> Doesn't seem to be true. After this commit, QEMU is now unable to set
> single NUMA node SMP topologies such as:
>
> -smp 8,maxcpus=8,cores=2,threads=2,sockets=2

What does it mean for a NUMA node to have more than one socket?
If they are all part of the same node, they are at local distance to each
other. Cache is per core. So what resources are shared by the sockets that
are part of the same NUMA node? And how does userspace/application make use
of the same?

Please don't mistake this as an attempt to downplay your report; it is an
honest attempt to better understand the situation.

For example, if the socket denotes the hemisphere logic in P10, then can we
see if the coregroup feature can be used. "Coregroup" is supposed to mean a
set of cores within a NUMA node that have some characteristics, and there can
be multiple coregroups within a NUMA node. We added that mostly to mimic the
hemisphere in P10. However, the number of coregroups in a NUMA node is not
exported to userspace at this time.

However, if each socket is associated with memory and a node distance, should
they then be NUMA nodes?

Can you provide me with the unique ibm,chip-ids in your 2 NUMA, 4 node case?
Does this cause any performance issues with the guest/application?

Till your report, I was under the impression that NUMAs == Sockets.

> lscpu will give the following output in this case:
>
> # lscpu
> Architecture:         ppc64le
> Byte Order:           Little Endian
> CPU(s):               8
> On-line CPU(s) list:  0-7
> Thread(s) per core:   2
> Core(s) per socket:   4
> Socket(s):            1
> NUMA node(s):         1
> Model:                2.2 (pvr 004e 1202)
> Model name:           POWER9 (architected), altivec supported
> Hypervisor vendor:    KVM
> Virtualization type:  para
> L1d cache:            32K
> L1i cache:            32K
> NUMA node0 CPU(s):    0-7
>
> This is happening because the macro cpu_cpu_mask(cpu) expands to
> cpumask_of_node(cpu_to_node(cpu)), which in turn expands to
> node_to_cpumask_map[node]. node_to_cpumask_map is a NUMA array that maps
> CPUs to NUMA nodes (Aneesh is on CC to correct me if I'm wrong). We're now
> associating sockets to NUMA nodes directly.
>
> If I add a second NUMA node then I can get the intended smp topology:
>
> -smp 8,maxcpus=8,cores=2,threads=2,sockets=2
> -numa node,memdev=mem0,cpus=0-3,nodeid=0 \
> -numa node,memdev=mem1,cpus=4-7,nodeid=1 \
>
> # lscpu
> Architecture:         ppc64le
> Byte Order:           Little Endian
> CPU(s):               8
> On-line CPU(s) list:  0-7
> Thread(s) per core:   2
> Core(s) per socket:   2
> Socket(s):            2
> NUMA node(s):         2
> Model:                2.2 (pvr 004e 1202)
> Model name:           POWER9 (architected), altivec supported
> Hypervisor vendor:    KVM
> Virtualization type:  para
> L1d cache:            32K
> L1i cache:            32K
> NUMA node0 CPU(s):    0-3
> NUMA node1 CPU(s):    4-7
>
> However, if I try a single socket with multiple NUMA nodes topology, which
> is the case of Power10, e.g.:
>
> -smp 8,maxcpus=8,cores=4,threads=2,sockets=1
> -numa node,memdev=mem0,cpus=0-3,nodeid=0 \
> -numa node,memdev=mem1,cpus=4-7,nodeid=1 \
>
> This is the result:
>
> # lscpu
> Architecture:         ppc64le
> Byte Order:           Little Endian
> CPU(s):               8
> On-line CPU(s) list:  0-7
> Thread(s) per core:   2
> Core(s) per socket:   2
> Socket(s):            2
> NUMA node(s):         2
> Model:                2.2 (pvr 004e 1202)
> Model name:           POWER9 (architected), altivec supported
> Hypervisor vendor:    KVM
> Virtualization type:  para
> L1d cache:            32K
> L1i cache:            32K
> NUMA node0 CPU(s):    0-3
> NUMA node1 CPU(s):    4-7
>
> This confirms my suspicions that, at this moment, we're making sockets ==
> NUMA nodes.
>
> Cedric, the reason I'm CCing you is because this is related to ibm,chip-id.
> The commit after the one that caused the regression, 4ca234a9cbd7c3a65
> ("powerpc/smp: Stop updating cpu_core_mask"), is erasing the code that
> calculated cpu_core_mask. cpu_core_mask, despite its shortcomings that
> caused its removal, was giving a precise SMP topology. And it was using
> physical_package_id/'ibm,chip-id' for that.
>
> Checking in QEMU I can say that the ibm,chip-id calculation is the only
> place in the code that cares about cores per socket information. The kernel
> is now ignoring that, starting on 4bce545903fa, and now QEMU is unable to
> provide this info to the guest.
>
> If we're not going to use ibm,chip-id any longer, which seems sensible
> given that PAPR does not declare it, we need another way of letting the
> guest know how
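The behaviour discussed in this thread can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not kernel code: struct vcpu, node_mask(), chip_mask(), count_sockets() and fill_single_node_two_sockets() are hypothetical names, chip_id merely stands in for ibm,chip-id, and the socket count is assumed to be the number of distinct package masks seen across all CPUs.

```c
#include <stdint.h>

#define NR_CPUS 8

/* Hypothetical per-CPU description used only by this sketch. */
struct vcpu {
	int node;    /* NUMA node id */
	int chip_id; /* stands in for ibm,chip-id */
};

/* CPUs sharing a NUMA node with @cpu; models the node-based package mask. */
static uint32_t node_mask(const struct vcpu *cpus, int cpu)
{
	uint32_t mask = 0;

	for (int i = 0; i < NR_CPUS; i++)
		if (cpus[i].node == cpus[cpu].node)
			mask |= 1u << i;
	return mask;
}

/* CPUs sharing a chip id with @cpu; models the older chip-id based mask. */
static uint32_t chip_mask(const struct vcpu *cpus, int cpu)
{
	uint32_t mask = 0;

	for (int i = 0; i < NR_CPUS; i++)
		if (cpus[i].chip_id == cpus[cpu].chip_id)
			mask |= 1u << i;
	return mask;
}

typedef uint32_t (*mask_fn)(const struct vcpu *cpus, int cpu);

/* Assumption: sockets == number of distinct package masks over all CPUs. */
static int count_sockets(const struct vcpu *cpus, mask_fn fn)
{
	uint32_t seen[NR_CPUS];
	int nr_seen = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		uint32_t mask = fn(cpus, cpu);
		int found = 0;

		for (int j = 0; j < nr_seen; j++)
			if (seen[j] == mask)
				found = 1;
		if (!found)
			seen[nr_seen++] = mask;
	}
	return nr_seen;
}

/* Models -smp 8,maxcpus=8,cores=2,threads=2,sockets=2 with one NUMA node. */
static void fill_single_node_two_sockets(struct vcpu *cpus)
{
	for (int i = 0; i < NR_CPUS; i++) {
		cpus[i].node = 0;        /* every CPU in node 0 */
		cpus[i].chip_id = i / 4; /* 4 threads per socket */
	}
}
```

With that layout, count_sockets(cpus, chip_mask) yields two sockets while count_sockets(cpus, node_mask) yields one, which mirrors the difference in the lscpu output reported in the thread.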
[PATCH v3 00/10] Rid W=1 warnings in Crypto
This is set 1 of 2 sets required to fully clean Crypto.

v2: No functional changes since v1.
v3: Description change and additional struct header fix

Lee Jones (10):
  crypto: hisilicon: sec_drv: Supply missing description for 'sec_queue_empty()'s 'queue' param
  crypto: bcm: Fix a whole host of kernel-doc misdemeanours
  crypto: chelsio: chcr_core: Fix some kernel-doc issues
  crypto: ux500: hash: hash_core: Fix worthy kernel-doc headers and remove others
  crypto: keembay: ocs-hcu: Fix incorrectly named functions/structs
  crypto: atmel-ecc: Struct headers need to start with keyword 'struct'
  crypto: caam: caampkc: Provide the name of the function and provide missing descriptions
  crypto: vmx: Source headers are not good kernel-doc candidates
  crypto: nx: nx-aes-cbc: Repair some kernel-doc problems
  crypto: cavium: nitrox_isr: Demote non-compliant kernel-doc headers

 drivers/crypto/atmel-ecc.c                | 2 +-
 drivers/crypto/bcm/cipher.c               | 7 ++--
 drivers/crypto/bcm/spu.c                  | 16 -
 drivers/crypto/bcm/spu2.c                 | 43 +--
 drivers/crypto/bcm/util.c                 | 4 +--
 drivers/crypto/caam/caamalg_qi2.c         | 3 ++
 drivers/crypto/caam/caampkc.c             | 3 +-
 drivers/crypto/cavium/nitrox/nitrox_isr.c | 4 +--
 drivers/crypto/chelsio/chcr_algo.c        | 8 ++---
 drivers/crypto/chelsio/chcr_core.c        | 2 +-
 drivers/crypto/hisilicon/sec/sec_drv.c    | 1 +
 drivers/crypto/keembay/ocs-hcu.c          | 8 ++---
 drivers/crypto/nx/nx-aes-cbc.c            | 2 +-
 drivers/crypto/nx/nx.c                    | 5 +--
 drivers/crypto/nx/nx_debugfs.c            | 2 +-
 drivers/crypto/ux500/cryp/cryp.c          | 5 +--
 drivers/crypto/ux500/cryp/cryp_core.c     | 5 +--
 drivers/crypto/ux500/cryp/cryp_irq.c      | 2 +-
 drivers/crypto/ux500/hash/hash_core.c     | 15 +++-
 drivers/crypto/vmx/vmx.c                  | 2 +-
 20 files changed, 73 insertions(+), 66 deletions(-)

Cc: Alexandre Belloni
Cc: Andreas Westin
Cc: Atul Gupta
Cc: Aymen Sghaier
Cc: Ayush Sawal
Cc: Benjamin Herrenschmidt
Cc: Berne Hebark
Cc: "Breno Leitão"
Cc: Daniele Alessandrelli
Cc: "David S. Miller"
Cc: Declan Murphy
Cc: Harsh Jain
Cc: Henrique Cerri
Cc: Herbert Xu
Cc: "Horia Geantă"
Cc: Jitendra Lulla
Cc: Joakim Bech
Cc: Jonas Linde
Cc: Jonathan Cameron
Cc: Kent Yoder
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-cry...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Ludovic Desroches
Cc: Manoj Malviya
Cc: Michael Ellerman
Cc: M R Gowda
Cc: Nayna Jain
Cc: Nicolas Ferre
Cc: Niklas Hernaeus
Cc: Paul Mackerras
Cc: Paulo Flabiano Smorigo
Cc: Rob Rice
Cc: Rohit Maheshwari
Cc: Shujuan Chen
Cc: Tudor Ambarus
Cc: Vinay Kumar Yadav
Cc: Zaibo Xu
--
2.27.0
[PATCH 09/10] crypto: nx: nx-aes-cbc: Repair some kernel-doc problems
Fixes the following W=1 kernel build warning(s):

 drivers/crypto/nx/nx-aes-cbc.c:24: warning: Function parameter or member 'tfm' not described in 'cbc_aes_nx_set_key'
 drivers/crypto/nx/nx-aes-cbc.c:24: warning: Function parameter or member 'in_key' not described in 'cbc_aes_nx_set_key'
 drivers/crypto/nx/nx-aes-cbc.c:24: warning: Function parameter or member 'key_len' not described in 'cbc_aes_nx_set_key'
 drivers/crypto/nx/nx-aes-cbc.c:24: warning: expecting prototype for Nest Accelerators driver(). Prototype was for cbc_aes_nx_set_key() instead
 drivers/crypto/nx/nx_debugfs.c:34: warning: Function parameter or member 'drv' not described in 'nx_debugfs_init'
 drivers/crypto/nx/nx_debugfs.c:34: warning: expecting prototype for Nest Accelerators driver(). Prototype was for nx_debugfs_init() instead
 drivers/crypto/nx/nx.c:31: warning: Incorrect use of kernel-doc format: * nx_hcall_sync - make an H_COP_OP hcall for the passed in op structure
 drivers/crypto/nx/nx.c:43: warning: Function parameter or member 'nx_ctx' not described in 'nx_hcall_sync'
 drivers/crypto/nx/nx.c:43: warning: Function parameter or member 'op' not described in 'nx_hcall_sync'
 drivers/crypto/nx/nx.c:43: warning: Function parameter or member 'may_sleep' not described in 'nx_hcall_sync'
 drivers/crypto/nx/nx.c:43: warning: expecting prototype for Nest Accelerators driver(). Prototype was for nx_hcall_sync() instead
 drivers/crypto/nx/nx.c:209: warning: Function parameter or member 'nbytes' not described in 'trim_sg_list'

Cc: "Breno Leitão"
Cc: Nayna Jain
Cc: Paulo Flabiano Smorigo
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Herbert Xu
Cc: "David S. Miller"
Cc: Kent Yoder
Cc: linux-cry...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones
---
 drivers/crypto/nx/nx-aes-cbc.c | 2 +-
 drivers/crypto/nx/nx.c         | 5 +++--
 drivers/crypto/nx/nx_debugfs.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/nx/nx-aes-cbc.c b/drivers/crypto/nx/nx-aes-cbc.c
index 92e921eceed75..d6314ea9ae896 100644
--- a/drivers/crypto/nx/nx-aes-cbc.c
+++ b/drivers/crypto/nx/nx-aes-cbc.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
-/**
+/*
  * AES CBC routines supporting the Power 7+ Nest Accelerators driver
  *
  * Copyright (C) 2011-2012 International Business Machines Inc.
diff --git a/drivers/crypto/nx/nx.c b/drivers/crypto/nx/nx.c
index 1d0e8a1ba1605..010e87d9da36b 100644
--- a/drivers/crypto/nx/nx.c
+++ b/drivers/crypto/nx/nx.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
-/**
+/*
  * Routines supporting the Power 7+ Nest Accelerators driver
  *
  * Copyright (C) 2011-2012 International Business Machines Inc.
@@ -200,7 +200,8 @@ struct nx_sg *nx_walk_and_build(struct nx_sg *nx_dst,
  * @sg: sg list head
  * @end: sg lisg end
  * @delta: is the amount we need to crop in order to bound the list.
- *
+ * @nbytes: length of data in the scatterlists or data length - whichever
+ *          is greater.
  */
 static long int trim_sg_list(struct nx_sg *sg,
 			     struct nx_sg *end,
diff --git a/drivers/crypto/nx/nx_debugfs.c b/drivers/crypto/nx/nx_debugfs.c
index 1975bcbee9974..ee7cd88bb10a7 100644
--- a/drivers/crypto/nx/nx_debugfs.c
+++ b/drivers/crypto/nx/nx_debugfs.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
-/**
+/*
  * debugfs routines supporting the Power 7+ Nest Accelerators driver
  *
  * Copyright (C) 2011-2012 International Business Machines Inc.
--
2.27.0
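The warnings above come from the kernel-doc convention that a comment opened with /** is parsed as documentation for the declaration that follows, whereas a comment opened with /* is ignored by the parser. A minimal sketch of both forms follows; bound_len() is a hypothetical helper invented for illustration and is not part of the nx driver.

```c
/*
 * File banner: a plain comment, so kernel-doc does not try to match it
 * against the first function prototype in the file.
 */
#include <stddef.h>

/**
 * bound_len() - clamp a requested length to the space available
 * @requested: number of bytes the caller asked for
 * @available: number of bytes that can actually be used
 *
 * Every parameter is described, so no "Function parameter or member ...
 * not described" warning is expected for this header.
 *
 * Return: the smaller of @requested and @available.
 */
static size_t bound_len(size_t requested, size_t available)
{
	return requested < available ? requested : available;
}
```

Demoting a banner from /** to /* is the smaller change when the comment was never meant as API documentation; adding the missing @param lines is the alternative when the header is worth keeping.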
[PATCH 08/10] crypto: vmx: Source headers are not good kernel-doc candidates
Fixes the following W=1 kernel build warning(s):

 drivers/crypto/vmx/vmx.c:23: warning: expecting prototype for Routines supporting VMX instructions on the Power 8(). Prototype was for p8_init() instead

Cc: "Breno Leitão"
Cc: Nayna Jain
Cc: Paulo Flabiano Smorigo
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Herbert Xu
Cc: "David S. Miller"
Cc: Henrique Cerri
Cc: linux-cry...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones
---
 drivers/crypto/vmx/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/vmx/vmx.c b/drivers/crypto/vmx/vmx.c
index a40d08e75fc0b..7eb713cc87c8c 100644
--- a/drivers/crypto/vmx/vmx.c
+++ b/drivers/crypto/vmx/vmx.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
-/**
+/*
  * Routines supporting VMX instructions on the Power 8
  *
  * Copyright (C) 2015 International Business Machines Inc.
--
2.27.0
Re: [PATCH 1/2] audit: add support for the openat2 syscall
On 2021-03-18 11:48, Christian Brauner wrote: > [+Cc Aleksa, the author of openat2()] Ah! Thanks for pulling in Aleksa. I thought I caught everyone... > and a comment below. :) Same... > On Wed, Mar 17, 2021 at 09:47:17PM -0400, Richard Guy Briggs wrote: > > The openat2(2) syscall was added in kernel v5.6 with commit fddb5d430ad9 > > ("open: introduce openat2(2) syscall") > > > > Add the openat2(2) syscall to the audit syscall classifier. > > > > See the github issue > > https://github.com/linux-audit/audit-kernel/issues/67 > > > > Signed-off-by: Richard Guy Briggs > > --- > > arch/alpha/kernel/audit.c | 2 ++ > > arch/ia64/kernel/audit.c | 2 ++ > > arch/parisc/kernel/audit.c | 2 ++ > > arch/parisc/kernel/compat_audit.c | 2 ++ > > arch/powerpc/kernel/audit.c| 2 ++ > > arch/powerpc/kernel/compat_audit.c | 2 ++ > > arch/s390/kernel/audit.c | 2 ++ > > arch/s390/kernel/compat_audit.c| 2 ++ > > arch/sparc/kernel/audit.c | 2 ++ > > arch/sparc/kernel/compat_audit.c | 2 ++ > > arch/x86/ia32/audit.c | 2 ++ > > arch/x86/kernel/audit_64.c | 2 ++ > > kernel/auditsc.c | 3 +++ > > lib/audit.c| 4 > > lib/compat_audit.c | 4 > > 15 files changed, 35 insertions(+) > > > > diff --git a/arch/alpha/kernel/audit.c b/arch/alpha/kernel/audit.c > > index 96a9d18ff4c4..06a911b685d1 100644 > > --- a/arch/alpha/kernel/audit.c > > +++ b/arch/alpha/kernel/audit.c > > @@ -42,6 +42,8 @@ int audit_classify_syscall(int abi, unsigned syscall) > > return 3; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 0; > > } > > diff --git a/arch/ia64/kernel/audit.c b/arch/ia64/kernel/audit.c > > index 5192ca899fe6..5eaa888c8fd3 100644 > > --- a/arch/ia64/kernel/audit.c > > +++ b/arch/ia64/kernel/audit.c > > @@ -43,6 +43,8 @@ int audit_classify_syscall(int abi, unsigned syscall) > > return 3; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 0; > > } > > diff --git a/arch/parisc/kernel/audit.c 
b/arch/parisc/kernel/audit.c > > index 9eb47b2225d2..fc721a7727ba 100644 > > --- a/arch/parisc/kernel/audit.c > > +++ b/arch/parisc/kernel/audit.c > > @@ -52,6 +52,8 @@ int audit_classify_syscall(int abi, unsigned syscall) > > return 3; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 0; > > } > > diff --git a/arch/parisc/kernel/compat_audit.c > > b/arch/parisc/kernel/compat_audit.c > > index 20c39c9d86a9..fc6d35918c44 100644 > > --- a/arch/parisc/kernel/compat_audit.c > > +++ b/arch/parisc/kernel/compat_audit.c > > @@ -35,6 +35,8 @@ int parisc32_classify_syscall(unsigned syscall) > > return 3; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 1; > > } > > diff --git a/arch/powerpc/kernel/audit.c b/arch/powerpc/kernel/audit.c > > index a27f3d09..8f32700b0baa 100644 > > --- a/arch/powerpc/kernel/audit.c > > +++ b/arch/powerpc/kernel/audit.c > > @@ -54,6 +54,8 @@ int audit_classify_syscall(int abi, unsigned syscall) > > return 4; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 0; > > } > > diff --git a/arch/powerpc/kernel/compat_audit.c > > b/arch/powerpc/kernel/compat_audit.c > > index 55c6ccda0a85..ebe45534b1c9 100644 > > --- a/arch/powerpc/kernel/compat_audit.c > > +++ b/arch/powerpc/kernel/compat_audit.c > > @@ -38,6 +38,8 @@ int ppc32_classify_syscall(unsigned syscall) > > return 4; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 1; > > } > > diff --git a/arch/s390/kernel/audit.c b/arch/s390/kernel/audit.c > > index d395c6c9944c..d964cb94cfaf 100644 > > --- a/arch/s390/kernel/audit.c > > +++ b/arch/s390/kernel/audit.c > > @@ -54,6 +54,8 @@ int audit_classify_syscall(int abi, unsigned syscall) > > return 4; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 0; > > } > > diff --git 
a/arch/s390/kernel/compat_audit.c > > b/arch/s390/kernel/compat_audit.c > > index 444fb1f66944..f7b32933ce0e 100644 > > --- a/arch/s390/kernel/compat_audit.c > > +++ b/arch/s390/kernel/compat_audit.c > > @@ -39,6 +39,8 @@ int s390_classify_syscall(unsigned syscall) > > return 4; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 1; >
Re: [PATCH 1/2] audit: add support for the openat2 syscall
On 2021-03-18 11:52, Christian Brauner wrote:
> On Thu, Mar 18, 2021 at 11:48:45AM +0100, Christian Brauner wrote:
> > On Wed, Mar 17, 2021 at 09:47:17PM -0400, Richard Guy Briggs wrote:
> > > The openat2(2) syscall was added in kernel v5.6 with commit fddb5d430ad9
> > > ("open: introduce openat2(2) syscall")
> > > Add the openat2(2) syscall to the audit syscall classifier.
> > > See the github issue
> > > https://github.com/linux-audit/audit-kernel/issues/67
> > > Signed-off-by: Richard Guy Briggs

...

> And one more comment, why return a hard-coded integer from all of these
> architectures instead of introducing an enum in a central place with
> proper names idk:

Oh, believe me, I tried hard to do that because I really don't like
hard-coded magic values, but for expediency I continued the same approach
until I could sort out the header file mess. There was an extra preparatory
patch (attached) in this patchset with a different audit syscall perms patch
(also attached). By including "#include " in each of the compat source files
there were warnings of redefinitions of every __NR_* syscall number. The
easiest way to get rid of it would have been to pull the new AUDITSC_*
definitions into a new file and include that from and each of the
arch/*/*/*audit.c (and lib/*audit.c) files.

> enum audit_match_perm_t {
>	.
>	.
>	.
>	AUDIT_MATCH_PERM_EXECVE = 5,
>	AUDIT_MATCH_PERM_OPENAT2 = 6,
>	.
>	.
>	.
> }
>
> Then you can drop these hard-coded comments too and it's way less
> brittle overall.

Totally agree.

> Christian

- RGB

--
Richard Guy Briggs Sr.
S/W Engineer, Kernel Security, Base Operating Systems Remote, Ottawa, Red Hat Canada IRC: rgb, SunRaycer Voice: +1.647.777.2635, Internal: (81) 32635 >From 599ae48091296a3ad3eb4259e7af39cdf0f743c7 Mon Sep 17 00:00:00 2001 Message-Id: <599ae48091296a3ad3eb4259e7af39cdf0f743c7.1616067847.git@redhat.com> In-Reply-To: References: From: Richard Guy Briggs Date: Fri, 22 Jan 2021 16:27:42 -0500 Subject: [PATCH 1/3] audit: replace magic audit syscall class numbers with macros Replace the magic numbers used to indicate audit syscall classes with macros. Signed-off-by: Richard Guy Briggs --- arch/alpha/kernel/audit.c | 8 arch/ia64/kernel/audit.c | 8 arch/parisc/kernel/audit.c | 8 arch/parisc/kernel/compat_audit.c | 9 + arch/powerpc/kernel/audit.c| 10 +- arch/powerpc/kernel/compat_audit.c | 11 ++- arch/s390/kernel/audit.c | 10 +- arch/s390/kernel/compat_audit.c| 11 ++- arch/sparc/kernel/audit.c | 10 +- arch/sparc/kernel/compat_audit.c | 11 ++- arch/x86/ia32/audit.c | 11 ++- arch/x86/kernel/audit_64.c | 8 include/linux/audit.h | 7 +++ kernel/auditsc.c | 12 ++-- lib/audit.c| 10 +- lib/compat_audit.c | 11 ++- 16 files changed, 84 insertions(+), 71 deletions(-) diff --git a/arch/alpha/kernel/audit.c b/arch/alpha/kernel/audit.c index 96a9d18ff4c4..81cbd804e375 100644 --- a/arch/alpha/kernel/audit.c +++ b/arch/alpha/kernel/audit.c @@ -37,13 +37,13 @@ int audit_classify_syscall(int abi, unsigned syscall) { switch(syscall) { case __NR_open: - return 2; + return AUDITSC_OPEN; case __NR_openat: - return 3; + return AUDITSC_OPENAT; case __NR_execve: - return 5; + return AUDITSC_EXECVE; default: - return 0; + return AUDITSC_NATIVE; } } diff --git a/arch/ia64/kernel/audit.c b/arch/ia64/kernel/audit.c index 5192ca899fe6..dba6a74c9ab3 100644 --- a/arch/ia64/kernel/audit.c +++ b/arch/ia64/kernel/audit.c @@ -38,13 +38,13 @@ int audit_classify_syscall(int abi, unsigned syscall) { switch(syscall) { case __NR_open: - return 2; + return AUDITSC_OPEN; case __NR_openat: - return 3; + return 
AUDITSC_OPENAT; case __NR_execve: - return 5; + return AUDITSC_EXECVE; default: - return 0; + return AUDITSC_NATIVE; } } diff --git a/arch/parisc/kernel/audit.c b/arch/parisc/kernel/audit.c index 9eb47b2225d2..14244e83db75 100644 --- a/arch/parisc/kernel/audit.c +++ b/arch/parisc/kernel/audit.c @@ -47,13 +47,13 @@ int audit_classify_syscall(int abi, unsigned syscall) #endif switch (syscall) { case __NR_open: - return 2; + return AUDITSC_OPEN; case __NR_openat: - return 3; + return AUDITSC_OPENAT; case __NR_execve: - return 5; + return AUDITSC_EXECVE; default: - return 0; +
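The enum suggested in the review and the AUDITSC_* macros from the attached patch amount to the same idea: name the class values once and return the names from each classifier. A rough standalone sketch follows. The values 0, 2, 3, 5 and 6 are the ones visible in the patches above; AUDITSC_OPENAT2 as a name, enum demo_syscall, and demo_classify_syscall() are assumptions made for illustration, and the DEMO_NR_* numbers are placeholders rather than real syscall numbers.

```c
/* Class values as they appear in the patches in this thread. */
enum auditsc_class {
	AUDITSC_NATIVE  = 0,
	AUDITSC_OPEN    = 2,
	AUDITSC_OPENAT  = 3,
	AUDITSC_EXECVE  = 5,
	AUDITSC_OPENAT2 = 6, /* name assumed; only the value 6 is in the patch */
};

/* Placeholder syscall numbers; real __NR_* values differ per architecture. */
enum demo_syscall {
	DEMO_NR_open = 100,
	DEMO_NR_openat,
	DEMO_NR_execve,
	DEMO_NR_openat2,
	DEMO_NR_read,
};

/* Hypothetical classifier shaped like the per-arch audit_classify_syscall(). */
static enum auditsc_class demo_classify_syscall(unsigned int syscall)
{
	switch (syscall) {
	case DEMO_NR_open:
		return AUDITSC_OPEN;
	case DEMO_NR_openat:
		return AUDITSC_OPENAT;
	case DEMO_NR_execve:
		return AUDITSC_EXECVE;
	case DEMO_NR_openat2:
		return AUDITSC_OPENAT2;
	default:
		return AUDITSC_NATIVE;
	}
}
```

Keeping the definitions in one shared header would let every arch classifier and the consumer in kernel/auditsc.c agree on the values without repeating the numbers, which is the brittleness the review points at.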
Re: [PATCH 1/2] audit: add support for the openat2 syscall
On Thu, Mar 18, 2021 at 11:48:45AM +0100, Christian Brauner wrote: > [+Cc Aleksa, the author of openat2()] > > and a comment below. :) > > On Wed, Mar 17, 2021 at 09:47:17PM -0400, Richard Guy Briggs wrote: > > The openat2(2) syscall was added in kernel v5.6 with commit fddb5d430ad9 > > ("open: introduce openat2(2) syscall") > > > > Add the openat2(2) syscall to the audit syscall classifier. > > > > See the github issue > > https://github.com/linux-audit/audit-kernel/issues/67 > > > > Signed-off-by: Richard Guy Briggs > > --- > > arch/alpha/kernel/audit.c | 2 ++ > > arch/ia64/kernel/audit.c | 2 ++ > > arch/parisc/kernel/audit.c | 2 ++ > > arch/parisc/kernel/compat_audit.c | 2 ++ > > arch/powerpc/kernel/audit.c| 2 ++ > > arch/powerpc/kernel/compat_audit.c | 2 ++ > > arch/s390/kernel/audit.c | 2 ++ > > arch/s390/kernel/compat_audit.c| 2 ++ > > arch/sparc/kernel/audit.c | 2 ++ > > arch/sparc/kernel/compat_audit.c | 2 ++ > > arch/x86/ia32/audit.c | 2 ++ > > arch/x86/kernel/audit_64.c | 2 ++ > > kernel/auditsc.c | 3 +++ > > lib/audit.c| 4 > > lib/compat_audit.c | 4 > > 15 files changed, 35 insertions(+) > > > > diff --git a/arch/alpha/kernel/audit.c b/arch/alpha/kernel/audit.c > > index 96a9d18ff4c4..06a911b685d1 100644 > > --- a/arch/alpha/kernel/audit.c > > +++ b/arch/alpha/kernel/audit.c > > @@ -42,6 +42,8 @@ int audit_classify_syscall(int abi, unsigned syscall) > > return 3; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 0; > > } > > diff --git a/arch/ia64/kernel/audit.c b/arch/ia64/kernel/audit.c > > index 5192ca899fe6..5eaa888c8fd3 100644 > > --- a/arch/ia64/kernel/audit.c > > +++ b/arch/ia64/kernel/audit.c > > @@ -43,6 +43,8 @@ int audit_classify_syscall(int abi, unsigned syscall) > > return 3; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 0; > > } > > diff --git a/arch/parisc/kernel/audit.c b/arch/parisc/kernel/audit.c > > index 9eb47b2225d2..fc721a7727ba 
100644 > > --- a/arch/parisc/kernel/audit.c > > +++ b/arch/parisc/kernel/audit.c > > @@ -52,6 +52,8 @@ int audit_classify_syscall(int abi, unsigned syscall) > > return 3; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 0; > > } > > diff --git a/arch/parisc/kernel/compat_audit.c > > b/arch/parisc/kernel/compat_audit.c > > index 20c39c9d86a9..fc6d35918c44 100644 > > --- a/arch/parisc/kernel/compat_audit.c > > +++ b/arch/parisc/kernel/compat_audit.c > > @@ -35,6 +35,8 @@ int parisc32_classify_syscall(unsigned syscall) > > return 3; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 1; > > } > > diff --git a/arch/powerpc/kernel/audit.c b/arch/powerpc/kernel/audit.c > > index a27f3d09..8f32700b0baa 100644 > > --- a/arch/powerpc/kernel/audit.c > > +++ b/arch/powerpc/kernel/audit.c > > @@ -54,6 +54,8 @@ int audit_classify_syscall(int abi, unsigned syscall) > > return 4; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 0; > > } > > diff --git a/arch/powerpc/kernel/compat_audit.c > > b/arch/powerpc/kernel/compat_audit.c > > index 55c6ccda0a85..ebe45534b1c9 100644 > > --- a/arch/powerpc/kernel/compat_audit.c > > +++ b/arch/powerpc/kernel/compat_audit.c > > @@ -38,6 +38,8 @@ int ppc32_classify_syscall(unsigned syscall) > > return 4; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 1; > > } > > diff --git a/arch/s390/kernel/audit.c b/arch/s390/kernel/audit.c > > index d395c6c9944c..d964cb94cfaf 100644 > > --- a/arch/s390/kernel/audit.c > > +++ b/arch/s390/kernel/audit.c > > @@ -54,6 +54,8 @@ int audit_classify_syscall(int abi, unsigned syscall) > > return 4; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 0; > > } > > diff --git a/arch/s390/kernel/compat_audit.c > > b/arch/s390/kernel/compat_audit.c > > index 
444fb1f66944..f7b32933ce0e 100644 > > --- a/arch/s390/kernel/compat_audit.c > > +++ b/arch/s390/kernel/compat_audit.c > > @@ -39,6 +39,8 @@ int s390_classify_syscall(unsigned syscall) > > return 4; > > case __NR_execve: > > return 5; > > + case __NR_openat2: > > + return 6; > > default: > > return 1; > > } > > diff --git a/arch/sparc/kernel/audit.c
Re: [PATCH 1/2] audit: add support for the openat2 syscall
[+Cc Aleksa, the author of openat2()] and a comment below. :) On Wed, Mar 17, 2021 at 09:47:17PM -0400, Richard Guy Briggs wrote: > The openat2(2) syscall was added in kernel v5.6 with commit fddb5d430ad9 > ("open: introduce openat2(2) syscall") > > Add the openat2(2) syscall to the audit syscall classifier. > > See the github issue > https://github.com/linux-audit/audit-kernel/issues/67 > > Signed-off-by: Richard Guy Briggs > --- > arch/alpha/kernel/audit.c | 2 ++ > arch/ia64/kernel/audit.c | 2 ++ > arch/parisc/kernel/audit.c | 2 ++ > arch/parisc/kernel/compat_audit.c | 2 ++ > arch/powerpc/kernel/audit.c| 2 ++ > arch/powerpc/kernel/compat_audit.c | 2 ++ > arch/s390/kernel/audit.c | 2 ++ > arch/s390/kernel/compat_audit.c| 2 ++ > arch/sparc/kernel/audit.c | 2 ++ > arch/sparc/kernel/compat_audit.c | 2 ++ > arch/x86/ia32/audit.c | 2 ++ > arch/x86/kernel/audit_64.c | 2 ++ > kernel/auditsc.c | 3 +++ > lib/audit.c| 4 > lib/compat_audit.c | 4 > 15 files changed, 35 insertions(+) > > diff --git a/arch/alpha/kernel/audit.c b/arch/alpha/kernel/audit.c > index 96a9d18ff4c4..06a911b685d1 100644 > --- a/arch/alpha/kernel/audit.c > +++ b/arch/alpha/kernel/audit.c > @@ -42,6 +42,8 @@ int audit_classify_syscall(int abi, unsigned syscall) > return 3; > case __NR_execve: > return 5; > + case __NR_openat2: > + return 6; > default: > return 0; > } > diff --git a/arch/ia64/kernel/audit.c b/arch/ia64/kernel/audit.c > index 5192ca899fe6..5eaa888c8fd3 100644 > --- a/arch/ia64/kernel/audit.c > +++ b/arch/ia64/kernel/audit.c > @@ -43,6 +43,8 @@ int audit_classify_syscall(int abi, unsigned syscall) > return 3; > case __NR_execve: > return 5; > + case __NR_openat2: > + return 6; > default: > return 0; > } > diff --git a/arch/parisc/kernel/audit.c b/arch/parisc/kernel/audit.c > index 9eb47b2225d2..fc721a7727ba 100644 > --- a/arch/parisc/kernel/audit.c > +++ b/arch/parisc/kernel/audit.c > @@ -52,6 +52,8 @@ int audit_classify_syscall(int abi, unsigned syscall) > return 3; > case __NR_execve: > 
return 5; > + case __NR_openat2: > + return 6; > default: > return 0; > } > diff --git a/arch/parisc/kernel/compat_audit.c > b/arch/parisc/kernel/compat_audit.c > index 20c39c9d86a9..fc6d35918c44 100644 > --- a/arch/parisc/kernel/compat_audit.c > +++ b/arch/parisc/kernel/compat_audit.c > @@ -35,6 +35,8 @@ int parisc32_classify_syscall(unsigned syscall) > return 3; > case __NR_execve: > return 5; > + case __NR_openat2: > + return 6; > default: > return 1; > } > diff --git a/arch/powerpc/kernel/audit.c b/arch/powerpc/kernel/audit.c > index a27f3d09..8f32700b0baa 100644 > --- a/arch/powerpc/kernel/audit.c > +++ b/arch/powerpc/kernel/audit.c > @@ -54,6 +54,8 @@ int audit_classify_syscall(int abi, unsigned syscall) > return 4; > case __NR_execve: > return 5; > + case __NR_openat2: > + return 6; > default: > return 0; > } > diff --git a/arch/powerpc/kernel/compat_audit.c > b/arch/powerpc/kernel/compat_audit.c > index 55c6ccda0a85..ebe45534b1c9 100644 > --- a/arch/powerpc/kernel/compat_audit.c > +++ b/arch/powerpc/kernel/compat_audit.c > @@ -38,6 +38,8 @@ int ppc32_classify_syscall(unsigned syscall) > return 4; > case __NR_execve: > return 5; > + case __NR_openat2: > + return 6; > default: > return 1; > } > diff --git a/arch/s390/kernel/audit.c b/arch/s390/kernel/audit.c > index d395c6c9944c..d964cb94cfaf 100644 > --- a/arch/s390/kernel/audit.c > +++ b/arch/s390/kernel/audit.c > @@ -54,6 +54,8 @@ int audit_classify_syscall(int abi, unsigned syscall) > return 4; > case __NR_execve: > return 5; > + case __NR_openat2: > + return 6; > default: > return 0; > } > diff --git a/arch/s390/kernel/compat_audit.c b/arch/s390/kernel/compat_audit.c > index 444fb1f66944..f7b32933ce0e 100644 > --- a/arch/s390/kernel/compat_audit.c > +++ b/arch/s390/kernel/compat_audit.c > @@ -39,6 +39,8 @@ int s390_classify_syscall(unsigned syscall) > return 4; > case __NR_execve: > return 5; > + case __NR_openat2: > + return 6; > default: > return 1; > } > diff --git a/arch/sparc/kernel/audit.c 
b/arch/sparc/kernel/audit.c > index a6e91bf34d48..b6dcca9c6520 100644 > --- a/arch/sparc/kernel/audit.c > +++ b/arch/sparc/kernel/audit.c > @@ -55,6 +55,8 @@ int audit_classify_syscall(int abi, unsigned int syscall) >
Re: [PATCH 08/10] MIPS: disable CONFIG_IDE in malta*_defconfig
On 3/18/21 7:57 AM, Christoph Hellwig wrote:

> Various malta defconfigs enable CONFIG_IDE for the tc86c001 ide driver,
> hich is a Toshiba plug in card that does not make much sense to use on
^ which is for
> bigsur platforms. For all other ATA cards libata support is already
^ Malta.
> enabled.
>
> Signed-off-by: Christoph Hellwig

[...]

MBR, Sergei
Re: [PATCH 07/10] MIPS: disable CONFIG_IDE in bigsur_defconfig
Hi!

On 3/18/21 7:57 AM, Christoph Hellwig wrote:

> bigsur_defconfig enables CONFIG_IDE for the tc86c001 ide driver, which
> is a Toshiba plug in card that does not make much sense to use on bigsur
                                                                 ^ for

Else that doesn't make much sense. :-)

> platforms. For all other ATA cards libata support is already enabled.
>
> Signed-off-by: Christoph Hellwig
[...]

MBR, Sergei
Re: [PATCH 1/1] hotplug-cpu.c: show 'last online CPU' error in dlpar_cpu_remove()
Ping

On 3/5/21 2:38 PM, Daniel Henrique Barboza wrote:

Of all the reasons that dlpar_cpu_remove() can fail, the 'last online
CPU' is one that can be caused directly by the user offlining CPUs
in a partition/virtual machine that has hotplugged CPUs. Trying to
reclaim a hotplugged CPU can fail if the CPU is now the last online in
the system. This is easily reproduced using QEMU [1].

Throwing a more specific error message for this case, instead of just
"Failed to offline CPU", makes it clearer that the error is in fact a
known error situation instead of other generic/unknown cause.

[1] https://bugzilla.redhat.com/1911414

Signed-off-by: Daniel Henrique Barboza
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 12cbffd3c2e3..134f393f09e1 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -514,7 +514,17 @@ static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
 
 	rc = dlpar_offline_cpu(dn);
 	if (rc) {
-		pr_warn("Failed to offline CPU %pOFn, rc: %d\n", dn, rc);
+		/* dlpar_offline_cpu will return -EBUSY from cpu_down() (via
+		 * device_offline()) in 2 cases: cpu_hotplug_disable is true or
+		 * there is only one CPU left. Warn the user about the second
+		 * since this can happen with user offlining CPUs and then
+		 * attempting hotunplugs.
+		 */
+		if (rc == -EBUSY && num_online_cpus() == 1)
+			pr_warn("Unable to remove last online CPU %pOFn\n", dn);
+		else
+			pr_warn("Failed to offline CPU %pOFn, rc: %d\n", dn, rc);
+
 		return -EINVAL;
 	}
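For readers who want to try the message selection outside the kernel, the check in the patch can be approximated in user space. This is only a minimal sketch under stated assumptions, not the kernel code: offline_failure_message() is a hypothetical helper, its online_cpus parameter stands in for num_online_cpus(), and name stands in for the device node name that %pOFn would print.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): format the warning that the
 * patched dlpar_cpu_remove() would emit after a failed offline attempt.
 * rc is the negative errno from the offline step, online_cpus stands in
 * for num_online_cpus(), and name for the device node name.
 */
static int offline_failure_message(char *buf, size_t len, int rc,
				   unsigned int online_cpus, const char *name)
{
	assert(rc < 0);	/* only meaningful on failure */

	/* -EBUSY with a single CPU left: the user offlined all the others. */
	if (rc == -EBUSY && online_cpus == 1)
		return snprintf(buf, len,
				"Unable to remove last online CPU %s", name);

	/* Every other failure keeps the generic message and the error code. */
	return snprintf(buf, len,
			"Failed to offline CPU %s, rc: %d", name, rc);
}
```

Note that -EBUSY alone is not enough to pick the specific message: as the patch comment says, cpu_hotplug_disable can produce the same return code, which is why the online CPU count is checked as well.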
Re: [PATCH] powerpc/numa: Fix topology_physical_package_id() on pSeries
On 3/18/21 4:28 AM, Cédric Le Goater wrote:

>> Also we've been using it for several years and I don't think we should
>> risk breaking anything by changing the value now.
>
> I guess we can leave it that way. Please read the commit log of the
> second patch (not tagged as a v2 ...).
>
> But we should remove ibm,chip-id from QEMU since the property does
> not exist on PAPR and that the calculation is anyhow very broken.

I am a strong advocate of getting rid of ibm,chip-id in QEMU. That said,
we need to make sure that the current problem with CPU topologies, that I
reported in that other thread, can be fixed without it.

Thanks,

DHB

> Thanks,
>
> C.
[PATCH] pseries: prevent free CPU ids to be reused on another node
When a CPU is hot added, its CPU ids are taken from the available mask, from the lowest possible set. If that set of values was previously used for a CPU attached to a different node, it appears to applications as if these CPUs had migrated from one node to another, which is not expected in real life.

To prevent this, the CPU ids used for each node need to be recorded and must not be reused on another node. However, to prevent CPU hot plug from failing when a node runs out of CPU ids, the capability to reuse other nodes’ free CPU ids is kept. A warning is displayed in such a case to warn the user.

A new CPU bit mask (node_recorded_ids_map) is introduced for each possible node. It is populated with the CPUs onlined at boot time, and then whenever a CPU is hot plugged to a node. The bits in that mask remain set when the CPU is hot unplugged, to record that these CPU ids have been used for this node.

If no id set is found, a retry is made without removing the ids used on the other nodes, to try reusing them. This is the way ids were allocated prior to this patch.

The effect of this patch can be seen by removing and adding CPUs using the Qemu monitor. In the following case, the first CPU from the node 2 is removed, then the first one from the node 1 is removed too. Later, the first CPU of the node 2 is added back. Without this patch, the kernel numbers these CPUs using the first CPU ids available, which are the ones freed when removing the second CPU of the node 0. This leads to the CPU ids 16-23 moving from the node 1 to the node 2. With the patch applied, the CPU ids 32-39 are used since they are the lowest free ones which have not been used on another node.
At boot time:
[root@vm40 ~]# numactl -H | grep cpus
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 1 cpus: 16 17 18 19 20 21 22 23 40 41 42 43 44 45 46 47
node 2 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

Unpatched kernel, after the CPU hot unplug/plug operations:
[root@vm40 ~]# numactl -H | grep cpus
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 1 cpus: 24 25 26 27 28 29 30 31
node 2 cpus: 16 17 18 19 20 21 22 23 40 41 42 43 44 45 46 47

Patched kernel, after the CPU hot unplug/plug operations:
[root@vm40 ~]# numactl -H | grep cpus
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 1 cpus: 24 25 26 27 28 29 30 31
node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Signed-off-by: Laurent Dufour
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 83 ++--
 1 file changed, 76 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 12cbffd3c2e3..dc5797110d6e 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -39,6 +39,8 @@
 /* This version can't take the spinlock, because it never returns */
 static int rtas_stop_self_token = RTAS_UNKNOWN_SERVICE;
 
+cpumask_var_t node_recorded_ids_map[MAX_NUMNODES];
+
 static void rtas_stop_self(void)
 {
 	static struct rtas_args args;
@@ -151,29 +153,61 @@ static void pseries_cpu_die(unsigned int cpu)
  */
 static int pseries_add_processor(struct device_node *np)
 {
-	unsigned int cpu;
+	unsigned int cpu, node;
 	cpumask_var_t candidate_mask, tmp;
-	int err = -ENOSPC, len, nthreads, i;
+	int err = -ENOSPC, len, nthreads, i, nid;
 	const __be32 *intserv;
+	bool force_reusing = false;
 
 	intserv = of_get_property(np, "ibm,ppc-interrupt-server#s", &len);
 	if (!intserv)
 		return 0;
 
-	zalloc_cpumask_var(&candidate_mask, GFP_KERNEL);
-	zalloc_cpumask_var(&tmp, GFP_KERNEL);
+	alloc_cpumask_var(&candidate_mask, GFP_KERNEL);
+	alloc_cpumask_var(&tmp, GFP_KERNEL);
+
+	/*
+	 * Fetch from the DT nodes read by dlpar_configure_connector() the NUMA
+	 * node id the added CPU belongs to.
+	 */
+	nid = of_node_to_nid(np);
+	if (nid < 0 || !node_possible(nid))
+		nid = first_online_node;
 
 	nthreads = len / sizeof(u32);
-	for (i = 0; i < nthreads; i++)
-		cpumask_set_cpu(i, tmp);
 
 	cpu_maps_update_begin();
 
 	BUG_ON(!cpumask_subset(cpu_present_mask, cpu_possible_mask));
 
+again:
+	cpumask_clear(candidate_mask);
+	cpumask_clear(tmp);
+	for (i = 0; i < nthreads; i++)
+		cpumask_set_cpu(i, tmp);
+
 	/* Get a bitmap of unoccupied slots. */
 	cpumask_xor(candidate_mask, cpu_possible_mask, cpu_present_mask);
+
+	/*
+	 * Remove free ids previously assigned on the other nodes. We can walk
+	 * only online nodes because once a node became online it is not turned
+	 * offlined back.
+	 */
+	if (!force_reusing)
+		for_each_online_node(node) {
+			if (node == nid) /* Keep our node's recorded ids
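The allocation policy described in the commit message can be modelled in user space to see how the two passes interact. The following is a minimal sketch and not the patch itself: it assumes at most 64 CPU ids held in a uint64_t instead of a cpumask, and struct cpu_ids, id_range() and alloc_cpu_ids() are hypothetical names introduced only for illustration. The stepping by nthreads is also an assumption about how slots are probed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 4
#define MAX_CPUS  64	/* one bit per CPU id in a uint64_t */

struct cpu_ids {
	uint64_t possible;		/* ids that may ever exist */
	uint64_t present;		/* ids currently in use */
	uint64_t recorded[MAX_NODES];	/* ids ever used on each node */
};

/* Mask with bits [first, first + n) set; requires first + n <= 64. */
static uint64_t id_range(unsigned int first, unsigned int n)
{
	uint64_t low = (n >= 64) ? UINT64_MAX : ((UINT64_C(1) << n) - 1);

	return low << first;
}

/*
 * Reserve nthreads consecutive ids for a CPU added to node nid.
 * Returns the first id, or -1 if nothing is free. *reused is set when
 * ids recorded for another node had to be taken (the warning case).
 */
static int alloc_cpu_ids(struct cpu_ids *s, int nid, unsigned int nthreads,
			 bool *reused)
{
	bool force_reusing = false;

	assert(nid >= 0 && nid < MAX_NODES);
	assert(nthreads >= 1 && nthreads <= MAX_CPUS);
	*reused = false;

	for (;;) {
		/* Unoccupied slots: possible but not present. */
		uint64_t candidate = s->possible & ~s->present;

		/* First pass: drop free ids recorded for the other nodes. */
		if (!force_reusing)
			for (int node = 0; node < MAX_NODES; node++)
				if (node != nid)
					candidate &= ~s->recorded[node];

		/* Probe slots in steps of nthreads, lowest first. */
		for (unsigned int first = 0; first + nthreads <= MAX_CPUS;
		     first += nthreads) {
			uint64_t want = id_range(first, nthreads);

			if ((want & candidate) == want) {
				s->present |= want;
				s->recorded[nid] |= want;
				*reused = force_reusing;
				return (int)first;
			}
		}

		if (force_reusing)
			return -1;	/* nothing free at all */
		force_reusing = true;	/* retry, allowing other nodes' ids */
	}
}
```

With ids 16-23 (recorded for node 1) and 32-39 (recorded for node 2) both free, a request for node 2 in this model picks 32 rather than the lower 16; only once node 2 has no recorded ids left does the second pass fall back to 16 and report the reuse.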
Re: [PATCH v2 4/6] mm/mremap: Use mmu gather interface instead of flush_tlb_range
Excerpts from Aneesh Kumar K.V's message of March 15, 2021 9:38 pm:
> Some architectures do have the concept of page walk cache and only mmu gather
> interface supports flushing them. A fast mremap that involves moving page
> table pages instead of copying pte entries should flush page walk cache since
> the old translation cache is no more valid. Hence switch to mm gather to flush
> TLB and mark tlb.freed_tables = 1. No page table pages need to be freed here.
> With this the tlb flush is done outside page table lock (ptl).

I would maybe just get archs that implement it to provide a specific
flush_tlb+pwc_range for it, or else they get flush_tlb_range by default.

I think that would be simpler for now, at least in generic code.

There was some other talk of consolidating the TLB flush APIs, I just
don't know if it's the best way to go to use the page/page table
gathering and freeing API for it.

Thanks,
Nick

>
> Signed-off-by: Aneesh Kumar K.V
> ---
>  mm/mremap.c | 33 +
>  1 file changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 574287f9bb39..fafa73b965d3 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -216,6 +216,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>  {
>  	spinlock_t *old_ptl, *new_ptl;
>  	struct mm_struct *mm = vma->vm_mm;
> +	struct mmu_gather tlb;
>  	pmd_t pmd;
>
>  	/*
> @@ -244,11 +245,12 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>  	if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
>  		return false;
>
> +	tlb_gather_mmu(&tlb, mm);
>  	/*
>  	 * We don't have to worry about the ordering of src and dst
>  	 * ptlocks because exclusive mmap_lock prevents deadlock.
>  	 */
> -	old_ptl = pmd_lock(vma->vm_mm, old_pmd);
> +	old_ptl = pmd_lock(mm, old_pmd);
>  	new_ptl = pmd_lockptr(mm, new_pmd);
>  	if (new_ptl != old_ptl)
>  		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
> @@ -257,13 +259,23 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>  	pmd = *old_pmd;
>  	pmd_clear(old_pmd);
>
> +	/*
> +	 * Mark the range. We are not freeing page table pages nor
> +	 * regular pages. Hence we don't need to call tlb_remove_table()
> +	 * or tlb_remove_page().
> +	 */
> +	tlb_flush_pte_range(&tlb, old_addr, PMD_SIZE);
> +	tlb.freed_tables = 1;
>  	VM_BUG_ON(!pmd_none(*new_pmd));
>  	pmd_populate(mm, new_pmd, (pgtable_t)pmd_page_vaddr(pmd));
>
> -	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
>  	if (new_ptl != old_ptl)
>  		spin_unlock(new_ptl);
>  	spin_unlock(old_ptl);
> +	/*
> +	 * This will invalidate both the old TLB and page table walk caches.
> +	 */
> +	tlb_finish_mmu(&tlb);
>
>  	return true;
>  }
> @@ -282,6 +294,7 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
>  {
>  	spinlock_t *old_ptl, *new_ptl;
>  	struct mm_struct *mm = vma->vm_mm;
> +	struct mmu_gather tlb;
>  	pud_t pud;
>
>  	/*
> @@ -291,11 +304,12 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
>  	if (WARN_ON_ONCE(!pud_none(*new_pud)))
>  		return false;
>
> +	tlb_gather_mmu(&tlb, mm);
>  	/*
>  	 * We don't have to worry about the ordering of src and dst
>  	 * ptlocks because exclusive mmap_lock prevents deadlock.
>  	 */
> -	old_ptl = pud_lock(vma->vm_mm, old_pud);
> +	old_ptl = pud_lock(mm, old_pud);
>  	new_ptl = pud_lockptr(mm, new_pud);
>  	if (new_ptl != old_ptl)
>  		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
> @@ -304,14 +318,25 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
>  	pud = *old_pud;
>  	pud_clear(old_pud);
>
> +	/*
> +	 * Mark the range. We are not freeing page table pages nor
> +	 * regular pages. Hence we don't need to call tlb_remove_table()
> +	 * or tlb_remove_page().
> +	 */
> +	tlb_flush_pte_range(&tlb, old_addr, PUD_SIZE);
> +	tlb.freed_tables = 1;
>  	VM_BUG_ON(!pud_none(*new_pud));
>
>  	pud_populate(mm, new_pud, (pmd_t *)pud_page_vaddr(pud));
> -	flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE);
> +
>  	if (new_ptl != old_ptl)
>  		spin_unlock(new_ptl);
>  	spin_unlock(old_ptl);
>
> +	/*
> +	 * This will invalidate both the old TLB and page table walk caches.
> +	 */
> +	tlb_finish_mmu(&tlb);
>  	return true;
>  }
>  #else
> --
> 2.29.2
>
>
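The alternative Nick describes (an arch-specific flush where one exists, flush_tlb_range otherwise) follows the usual "override or fall back" preprocessor pattern. The sketch below only illustrates that pattern in user space and is not a proposed kernel interface: flush_tlb_pwc_range and ARCH_HAS_PWC_FLUSH are hypothetical names, struct vma stands in for struct vm_area_struct, and the counters exist purely so the behaviour can be observed.

```c
#include <assert.h>

struct vma { const char *name; };	/* stand-in for struct vm_area_struct */

static unsigned int tlb_flushes;	/* plain range flushes issued */
static unsigned int pwc_flushes;	/* page walk cache flushes issued */

/* Stand-in for the existing generic range flush. */
static void flush_tlb_range(struct vma *vma, unsigned long start,
			    unsigned long end)
{
	(void)vma;
	(void)start;
	(void)end;
	tlb_flushes++;
}

#ifdef ARCH_HAS_PWC_FLUSH	/* hypothetical opt-in set by an arch header */
/* An arch with a page walk cache flushes both in one call. */
static void flush_tlb_pwc_range(struct vma *vma, unsigned long start,
				unsigned long end)
{
	(void)vma;
	(void)start;
	(void)end;
	tlb_flushes++;
	pwc_flushes++;
}
#define flush_tlb_pwc_range flush_tlb_pwc_range
#endif

/* Default: archs that provide nothing get flush_tlb_range. */
#ifndef flush_tlb_pwc_range
#define flush_tlb_pwc_range(vma, start, end) flush_tlb_range(vma, start, end)
#endif

/* Caller in the style of move_normal_pmd(): one call on every arch. */
static void move_range(struct vma *vma, unsigned long old_addr,
		       unsigned long size)
{
	assert(size > 0);
	flush_tlb_pwc_range(vma, old_addr, old_addr + size);
}
```

Built without ARCH_HAS_PWC_FLUSH, move_range() ends up in the plain range flush; defining it at compile time switches the same call site to the combined flush, which is the property that would keep the generic code unchanged.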
Re: [PATCH v2 4/6] mm/mremap: Use mmu gather interface instead of flush_tlb_range
Hi "Aneesh,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on kselftest/next v5.12-rc3 next-20210317]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/Speedup-mremap-on-ppc64/20210315-194324
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: x86_64-rhel-8.3 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/d3b9a3e6f414413d8f822185158b937d9f19b7a6
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Aneesh-Kumar-K-V/Speedup-mremap-on-ppc64/20210315-194324
        git checkout d3b9a3e6f414413d8f822185158b937d9f19b7a6
        # save the attached .config to linux build tree
        make W=1 ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

Note: the linux-review/Aneesh-Kumar-K-V/Speedup-mremap-on-ppc64/20210315-194324 HEAD 79633714ff2b990b3e4972873457678bb34d029f builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

   mm/mremap.c: In function 'move_normal_pmd':
>> mm/mremap.c:219:20: error: storage size of 'tlb' isn't known
     219 |  struct mmu_gather tlb;
         |                    ^~~
>> mm/mremap.c:267:2: error: implicit declaration of function 'tlb_flush_pte_range' [-Werror=implicit-function-declaration]
     267 |  tlb_flush_pte_range(&tlb, old_addr, PMD_SIZE);
         |  ^~~
   mm/mremap.c:219:20: warning: unused variable 'tlb' [-Wunused-variable]
     219 |  struct mmu_gather tlb;
         |                    ^~~
   mm/mremap.c: In function 'move_normal_pud':
   mm/mremap.c:297:20: error: storage size of 'tlb' isn't known
     297 |  struct mmu_gather tlb;
         |                    ^~~
   mm/mremap.c:297:20: warning: unused variable 'tlb' [-Wunused-variable]
   cc1: some warnings being treated as errors

vim +219 mm/mremap.c

   212
   213	#ifdef CONFIG_HAVE_MOVE_PMD
   214	static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
   215			  unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
   216	{
   217		spinlock_t *old_ptl, *new_ptl;
   218		struct mm_struct *mm = vma->vm_mm;
 > 219		struct mmu_gather tlb;
   220		pmd_t pmd;
   221
   222		/*
   223		 * The destination pmd shouldn't be established, free_pgtables()
   224		 * should have released it.
   225		 *
   226		 * However, there's a case during execve() where we use mremap
   227		 * to move the initial stack, and in that case the target area
   228		 * may overlap the source area (always moving down).
   229		 *
   230		 * If everything is PMD-aligned, that works fine, as moving
   231		 * each pmd down will clear the source pmd. But if we first
   232		 * have a few 4kB-only pages that get moved down, and then
   233		 * hit the "now the rest is PMD-aligned, let's do everything
   234		 * one pmd at a time", we will still have the old (now empty
   235		 * of any 4kB pages, but still there) PMD in the page table
   236		 * tree.
   237		 *
   238		 * Warn on it once - because we really should try to figure
   239		 * out how to do this better - but then say "I won't move
   240		 * this pmd".
   241		 *
   242		 * One alternative might be to just unmap the target pmd at
   243		 * this point, and verify that it really is empty. We'll see.
   244		 */
   245		if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
   246			return false;
   247
   248		tlb_gather_mmu(&tlb, mm);
   249		/*
   250		 * We don't have to worry about the ordering of src and dst
   251		 * ptlocks because exclusive mmap_lock prevents deadlock.
   252		 */
   253		old_ptl = pmd_lock(mm, old_pmd);
   254		new_ptl = pmd_lockptr(mm, new_pmd);
   255		if (new_ptl != old_ptl)
   256			spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
   257
   258		/* Clear the pmd */
   259		pmd = *old_pmd;
   260		pmd_clear(old_pmd);
   261
   262		/*
   263		 * Mark the range. We are not freeing page table pages nor
   264		 * regular pages. Hence we don't need to call tlb_remove_table()
   265		 * or tlb_remove_page().
   266		 */
 > 267		tlb_flush_pte_range(&tlb, old_addr, PMD_SIZE);
   268		tlb.freed_tables = 1;
   269
Re: [PATCH 01/10] alpha: use libata instead of the legacy ide driver
Hi Al!

On 3/18/21 6:54 AM, Al Viro wrote:
> On Thu, Mar 18, 2021 at 05:56:57AM +0100, Christoph Hellwig wrote:
>> Switch the alpha defconfig from the legacy ide driver to libata.
>
> Umm... I don't have an IDE alpha box in a usable shape (fans on
> CPU module shat themselves), and it would take a while to resurrect
> it, but I remember the joy it used to cause in some versions.
>
> Do you have reports of libata variants of drivers actually tested on
> those?

At least pata_cypress works fine on my AlphaStation XP1000:

root@tsunami:~> lspci
:00:07.0 ISA bridge: Contaq Microsystems 82c693
:00:07.1 IDE interface: Contaq Microsystems 82c693
:00:07.2 IDE interface: Contaq Microsystems 82c693
:00:07.3 USB controller: Contaq Microsystems 82c693
:00:0d.0 VGA compatible controller: Texas Instruments TVP4020 [Permedia 2] (rev 01)
0001:01:03.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
0001:01:06.0 SCSI storage controller: QLogic Corp. ISP1020 Fast-wide SCSI (rev 06)
0001:01:08.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 03)
0001:02:09.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
root@tsunami:~> lsmod|grep pata
pata_cypress            3595  3
libata                235071  2 ata_generic,pata_cypress
root@tsunami:~>

I also have two AlphaStation 233 machines currently in storage, which I
assume use a different IDE chipset and which I could test as well.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Re: [PATCH] powerpc/numa: Fix topology_physical_package_id() on pSeries
> Also we've been using it for several years and I don't think we should
> risk breaking anything by changing the value now.

I guess we can leave it that way. Please read the commit log of the
second patch (not tagged as a v2 ...).

But we should remove ibm,chip-id from QEMU since the property does
not exist on PAPR and that the calculation is anyhow very broken.

Thanks,

C.
Re: [PATCH 01/10] alpha: use libata instead of the legacy ide driver
On Thu, Mar 18, 2021 at 05:54:55AM +, Al Viro wrote:
> On Thu, Mar 18, 2021 at 05:56:57AM +0100, Christoph Hellwig wrote:
> > Switch the alpha defconfig from the legacy ide driver to libata.
>
> Umm... I don't have an IDE alpha box in a usable shape (fans on
> CPU module shat themselves), and it would take a while to resurrect
> it, but I remember the joy it used to cause in some versions.
>
> Do you have reports of libata variants of drivers actually tested on
> those?

No, I haven't. The whole point is that we're not going to keep 4
lines of code around despite notice for users that don't exist or care.
If there is a regression we'll fix it, but we're not going to make life
miserable just because we can.
Re: [PATCH 01/10] alpha: use libata instead of the legacy ide driver
On Thu, Mar 18, 2021 at 05:56:57AM +0100, Christoph Hellwig wrote:
> Switch the alpha defconfig from the legacy ide driver to libata.

Umm... I don't have an IDE alpha box in a usable shape (fans on
CPU module shat themselves), and it would take a while to resurrect
it, but I remember the joy it used to cause in some versions.

Do you have reports of libata variants of drivers actually tested on
those?