date:20210318

[PATCH v6] powerpc/irq: inline call_do_irq() and call_do_softirq() on PPC32

2021-03-18 Thread Christophe Leroy

call_do_irq() and call_do_softirq() are simple enough to be
worth inlining.

Inlining them avoids an mflr/mtlr pair plus a save/reload on stack.
It also allows GCC to keep the saved ksp_limit in a nonvolatile reg.

This is inspired by the S390 arch. Several other arches do more or
less the same. The way the sparc arch does it seems odd, though.

For the time being this is limited to PPC32 because there are
uncertainties about the handling of r2 which is the TOC on PPC64,
see discussion at https://patchwork.ozlabs.org/patch/1174288/

Signed-off-by: Christophe Leroy 
Reviewed-by: Segher Boessenkool 
---
v2: no change.
v3: no change.
v4:
- comment reminding the purpose of the inline asm block.
- added r2 as clobbered reg
v5:
- Limiting the change to PPC32 for now.
- removed r2 from the clobbered regs list (on PPC32 r2 points to current all the time)
- Removed patch 1 and merged ksp_limit handling in here.
v6:
- Rebase on top of merge-test (ca6e327fefb2).
- Remove the ksp_limit stuff as it doesn't exist anymore.
---
 arch/powerpc/include/asm/irq.h |  2 ++
 arch/powerpc/kernel/irq.c  | 34 ++
 arch/powerpc/kernel/misc_32.S  | 25 -
 3 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index f3f264e441a7..23c28974ca29 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -53,8 +53,10 @@ extern void *mcheckirq_ctx[NR_CPUS];
 extern void *hardirq_ctx[NR_CPUS];
 extern void *softirq_ctx[NR_CPUS];
 
+#ifdef CONFIG_PPC64
 void call_do_softirq(void *sp);
 void call_do_irq(struct pt_regs *regs, void *sp);
+#endif
 extern void do_IRQ(struct pt_regs *regs);
 extern void __init init_IRQ(void);
 extern void __do_irq(struct pt_regs *regs);
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 5b72abbff96c..327422c57ae8 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -667,6 +667,40 @@ static inline void check_stack_overflow(void)
}
 }
 
+#ifdef CONFIG_PPC32
+static inline void call_do_softirq(const void *sp)
+{
+   register unsigned long ret asm("r3");
+
+   /* Temporarily switch r1 to sp, call __do_softirq() then restore r1. */
+   asm volatile(
+   "   "PPC_STLU"  1, %2(%1);\n"
+   "   mr  1, %1;\n"
+   "   bl  %3;\n"
+   "   "PPC_LL"  1, 0(1);\n" :
+   "=r"(ret) :
+   "b"(sp), "i"(THREAD_SIZE - STACK_FRAME_OVERHEAD), "i"(__do_softirq) :
+   "lr", "xer", "ctr", "memory", "cr0", "cr1", "cr5", "cr6", "cr7",
+   "r0", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12");
+}
+
+static inline void call_do_irq(struct pt_regs *regs, void *sp)
+{
+   register unsigned long r3 asm("r3") = (unsigned long)regs;
+
+   /* Temporarily switch r1 to sp, call __do_irq() then restore r1. */
+   asm volatile(
+   "   "PPC_STLU"  1, %2(%1);\n"
+   "   mr  1, %1;\n"
+   "   bl  %3;\n"
+   "   "PPC_LL"  1, 0(1);\n" :
+   "+r"(r3) :
+   "b"(sp), "i"(THREAD_SIZE - STACK_FRAME_OVERHEAD), "i"(__do_irq) :
+   "lr", "xer", "ctr", "memory", "cr0", "cr1", "cr5", "cr6", "cr7",
+   "r0", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12");
+}
+#endif
+
 void __do_irq(struct pt_regs *regs)
 {
unsigned int irq;
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index acc410043b96..6a076bef2932 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -27,31 +27,6 @@
 
.text
 
-_GLOBAL(call_do_softirq)
-   mflr r0
-   stw r0,4(r1)
-   stwu r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r3)
-   mr  r1,r3
-   bl  __do_softirq
-   lwz r1,0(r1)
-   lwz r0,4(r1)
-   mtlr r0
-   blr
-
-/*
- * void call_do_irq(struct pt_regs *regs, void *sp);
- */
-_GLOBAL(call_do_irq)
-   mflr r0
-   stw r0,4(r1)
-   stwu r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r4)
-   mr  r1,r4
-   bl  __do_irq
-   lwz r1,0(r1)
-   lwz r0,4(r1)
-   mtlr r0
-   blr
-
 /*
  * This returns the high 64 bits of the product of two 64-bit numbers.
  */
-- 
2.25.0

Re: [PATCH] powerpc/kexec: Don't use .machine ppc64 in trampoline_64.S

2021-03-18 Thread Michael Ellerman

Segher Boessenkool  writes:
> Hi!
>
> On Mon, Mar 15, 2021 at 02:41:59PM +1100, Michael Ellerman wrote:
>> The ".machine" directive allows changing the machine for which code is
>> being generated. It's equivalent to passing an -mcpu option on the
>> command line.
>> 
>> Although it can be useful, it's generally a bad idea because it adds
>> another way to influence code generation separate from the flags
>> passed via the build system. ie. if we need to build different pieces
>> of code with different flags we should do that via our Makefiles, not
>> using ".machine".
>
> It does not influence code generation.  It says which instructions are
> valid, instead.  There are a few cases where the same mnemonic will
> generate a different binary encoding depending on machine selected,
> maybe you mean that?

Yeah that's what I was referring to. Which is code generation in my
mind, but I guess that's probably not the right terminology to use
around compiler people :)

And I guess you're right, the more common case is that the mnemonics are
just not valid for other machines and wouldn't assemble at all.

I'll reword it.

> It is *normal* to use .machine push/pop and a specific .machine around
> instructions that require a machine other than what you are building
> for.  The compiler does this itself, and it is the recommended way to
> use "foreign" instructions in inline assembler.

Right, but it also makes it easy to build code that won't run :) So we'd
like to avoid it.

We had that in the past where we were building the power7-only memcpy
routines for Book3E 64. They weren't being used due to runtime patching
stuff, but it's better to not build them in the first place for those
CPUs because they could never work.
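
To illustrate the push/pop pattern discussed above, here is a minimal
sketch in C. The function name popcount_p7 and the choice of popcntd (an
ISA 2.06 instruction) are assumptions made for this example, not code
from the kernel; on non-PowerPC hosts the sketch falls back to a
compiler builtin so that it still builds.

```c
/*
 * Hypothetical helper, not kernel code: count set bits with the
 * ISA 2.06 "popcntd" instruction even when the build targets an older
 * machine. ".machine push"/".machine pop" limit the override to this
 * one asm block.
 */
unsigned long popcount_p7(unsigned long x)
{
#if defined(__powerpc64__)
	unsigned long r;

	asm(".machine push\n\t"
	    ".machine power7\n\t"
	    "popcntd %0,%1\n\t"
	    ".machine pop"
	    : "=r"(r) : "r"(x));
	return r;
#else
	/* Fallback so the sketch compiles on other architectures. */
	return (unsigned long)__builtin_popcountl(x);
#endif
}
```

The assembler accepts the instruction regardless of the flags passed by
the build system, but nothing stops the function from being called on a
CPU that lacks it, which is the risk described above.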

> That said...
>
>> However as best as I can tell the ".machine" directive in
>> trampoline_64.S is not necessary at all.
>> 
>> It was added in commit 0d97631392c2 ("powerpc: Add purgatory for
>> kexec_file_load() implementation."), which created the file based on
>> the kexec-tools purgatory. It may be/have-been necessary in the
>> kexec-tools version, but we have a completely different build system,
>> and we already pass the desired CPU flags, eg:
>> 
>>   gcc ... -m64 -Wl,-a64 -mabi=elfv2 -Wa,-maltivec -Wa,-mpower4 -Wa,-many
>>   ... arch/powerpc/purgatory/trampoline_64.S
>> 
>> So drop the ".machine" directive and rely on the assembler flags.
>
>> -.machine ppc64
>
> Please make sure to test this on a big endian config.

Done.

> A ppc64le-linux assembler defaults to power8.  A ppc64-linux assembler
> defaults to power3 (that is the same as .machine ppc64).  Or maybe it
> makes it power4?  I get lost :-)

For book3s64 we always specify -mpower4 since 15a3204d24a3
("powerpc/64s: Set assembler machine type to POWER4") (Apr 2018).

That does leave 64-bit book3e, but I just tested that and it also builds
fine.

> It certainly *should* work, but, test please :-)
>
> (And with a *default* powerpc64-linux config, not one that defaults to
> power7 or power8 or similar!  Arnd's toolchains at
> 
> are fine for this.)

Yep, I used Arnd's 10.1.0.

> Reviewed-by: Segher Boessenkool 

Thanks.

cheers

Re: remove the legacy ide driver

2021-03-18 Thread Christoph Hellwig

On Fri, Mar 19, 2021 at 12:43:48PM +1100, Finn Thain wrote:
> A few months ago I wrote another patch to move some more platforms away 
> from macide but it has not been tested yet. That is not to say you should 
> wait. However, my patch does have some changes that are missing from your 
> patch series, relating to ide platform devices in arch/m68k/mac/config.c. 
> I hope to be able to test this patch before the 5.13 merge window closes.

Normally we do not remove drivers for hardware that is still used.  So
at least for macide my plan was not to take it away unless the users
are sufficiently happy.  Or in other words:  I think waiting is the
right choice, but hopefully we can make that wait as short as possible.

Re: [PATCH 00/36] [Set 4] Rid W=1 warnings in SCSI

2021-03-18 Thread Martin K. Petersen



Lee,

> This set is part of a larger effort attempting to clean-up W=1 kernel
> builds, which are currently overwhelmingly riddled with niggly little
> warnings.

Applied to 5.13/scsi-staging, thanks! I fixed a few little things.

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH 04/10] MIPS: disable CONFIG_IDE in sb1250_swarm_defconfig

2021-03-18 Thread Maciej W. Rozycki

On Thu, 18 Mar 2021, Christoph Hellwig wrote:

> sb1250_swarm_defconfig enables CONFIG_IDE but no actual host controller
> driver, so just drop CONFIG_IDE, CONFIG_BLK_DEV_IDECD and
> CONFIG_BLK_DEV_IDETAPE as they are useless.

 Actually BLK_DEV_PLATFORM would handle the SWARM's platform driver as an 
IDE device, however the driver has supported libata ever since commit 
2fef357cf391 ("IDE: Fix platform device registration in Swarm IDE driver 
(v2)") back in 2008, so this is good to go.  We should probably enable 
PATA_PLATFORM in the defconfig instead.

 The printed name of the driver could be improved I suppose though:

scsi host0: pata_platform
ata1: PATA max PIO0 mmio cmd 0x100b3e00 ctl 0x100b7ec0 irq 36

(PIO3 is actually hardwired; it's an odd interface and people reported 
issues with it, but I have never had any myself be it with IDE or libata).

Acked-by: Maciej W. Rozycki 

  Maciej

[for-stable-4.19 PATCH 0/2] Backport patches to fix KASAN+LKDTM with recent clang on ARM64

2021-03-18 Thread Nicolas Boichat



Backport 2 patches that are required to make KASAN+LKDTM work
with recent clang (patch 2/2 has a complete description).
Tested on our chromeos-4.19 branch.

Patch 1/2 is context conflict only, and 2/2 is a clean backport.

These patches have been merged to 5.4 stable already. We might
need to backport to older stable branches, but this is what I
could test for now.


Mark Rutland (1):
  lkdtm: don't move ctors to .rodata

Thomas Gleixner (1):
  vmlinux.lds.h: Create section for protection against instrumentation

 arch/powerpc/kernel/vmlinux.lds.S |  1 +
 drivers/misc/lkdtm/Makefile   |  2 +-
 drivers/misc/lkdtm/rodata.c   |  2 +-
 include/asm-generic/sections.h|  3 ++
 include/asm-generic/vmlinux.lds.h | 10 ++
 include/linux/compiler.h  | 54 +++
 include/linux/compiler_types.h|  4 +++
 scripts/mod/modpost.c |  2 +-
 8 files changed, 75 insertions(+), 3 deletions(-)

-- 
2.31.0.rc2.261.g7f71774620-goog

[for-stable-4.19 PATCH 1/2] vmlinux.lds.h: Create section for protection against instrumentation

2021-03-18 Thread Nicolas Boichat

From: Thomas Gleixner 

commit 655389433e7efec589838b400a2a652b3ffa upstream.

Some code paths, especially the low level entry code, must be protected
against instrumentation for various reasons:

 - Low level entry code can be a fragile beast, especially on x86.

 - With NO_HZ_FULL RCU state needs to be established before using it.

Having a dedicated section for such code allows to validate with tooling
that no unsafe functions are invoked.

Add the .noinstr.text section and the noinstr attribute to mark
functions. noinstr implies notrace. Kprobes will gain a section check
later.

Provide also a set of markers: instrumentation_begin()/end()

These are used to mark code inside a noinstr function which calls
into regular instrumentable text section as safe.

The instrumentation markers are only active when CONFIG_DEBUG_ENTRY is
enabled as the end marker emits a NOP to prevent the compiler from merging
the annotation points. This means the objtool verification requires a
kernel compiled with this option.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Alexandre Chartre 
Acked-by: Peter Zijlstra 
Link: https://lkml.kernel.org/r/20200505134100.075416...@linutronix.de

[Nicolas: context conflicts in:
arch/powerpc/kernel/vmlinux.lds.S
include/asm-generic/vmlinux.lds.h
include/linux/compiler.h
include/linux/compiler_types.h]
Signed-off-by: Nicolas Boichat 

---

 arch/powerpc/kernel/vmlinux.lds.S |  1 +
 include/asm-generic/sections.h|  3 ++
 include/asm-generic/vmlinux.lds.h | 10 ++
 include/linux/compiler.h  | 54 +++
 include/linux/compiler_types.h|  4 +++
 scripts/mod/modpost.c |  2 +-
 6 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index 695432965f20..9b346f3d2814 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -99,6 +99,7 @@ SECTIONS
 #endif
/* careful! __ftr_alt_* sections need to be close to .text */
*(.text.hot TEXT_MAIN .text.fixup .text.unlikely .fixup 
__ftr_alt_* .ref.text);
+   NOINSTR_TEXT
SCHED_TEXT
CPUIDLE_TEXT
LOCK_TEXT
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 849cd8eb5ca0..ea5987bb0b84 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -53,6 +53,9 @@ extern char __ctors_start[], __ctors_end[];
 /* Start and end of .opd section - used for function descriptors. */
 extern char __start_opd[], __end_opd[];
 
+/* Start and end of instrumentation protected text section */
+extern char __noinstr_text_start[], __noinstr_text_end[];
+
 extern __visible const void __nosave_begin, __nosave_end;
 
 /* Function descriptor handling (if any).  Override in asm/sections.h */
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 2d632a74cc5e..88484ee023ca 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -482,6 +482,15 @@
__security_initcall_end = .;\
}
 
+/*
+ * Non-instrumentable text section
+ */
+#define NOINSTR_TEXT   \
+   ALIGN_FUNCTION();   \
+   __noinstr_text_start = .;   \
+   *(.noinstr.text)\
+   __noinstr_text_end = .;
+
 /*
  * .text section. Map to function alignment to avoid address changes
  * during second ld run in second ld pass when generating System.map
@@ -496,6 +505,7 @@
*(TEXT_MAIN .text.fixup)\
*(.text.unlikely .text.unlikely.*)  \
*(.text.unknown .text.unknown.*)\
+   NOINSTR_TEXT\
*(.text..refcount)  \
*(.ref.text)\
MEM_KEEP(init.text*)\
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 6b6505e3b2c7..6a53300cbd1e 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -129,11 +129,65 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
".pushsection .discard.unreachable\n\t" \
".long 999b - .\n\t"\
".popsection\n\t"
+
+#ifdef CONFIG_DEBUG_ENTRY
+/* Begin/end of an instrumentation safe region */
+#define instrumentation_begin() ({ \
+   asm volatile("%c0:\n\t" \
+".pushsection .discard.instr_begin\n\t"

Re: [RFC PATCH 8/8] powerpc/64/asm: don't reassign labels

2021-03-18 Thread Nicholas Piggin

Excerpts from Daniel Axtens's message of February 26, 2021 10:28 am:
> Segher Boessenkool  writes:
> 
>> On Thu, Feb 25, 2021 at 02:10:06PM +1100, Daniel Axtens wrote:
>>> The assembler really does not like us reassigning things to the same
>>> label:
>>> 
>>> :7:9: error: invalid reassignment of non-absolute variable 'fs_label'
>>> 
>>> This happens across a bunch of platforms:
>>> https://github.com/ClangBuiltLinux/linux/issues/1043
>>> https://github.com/ClangBuiltLinux/linux/issues/1008
>>> https://github.com/ClangBuiltLinux/linux/issues/920
>>> https://github.com/ClangBuiltLinux/linux/issues/1050
>>> 
>>> There is no hope of getting this fixed in LLVM, so if we want to build
>>> with LLVM_IAS, we need to hack around it ourselves.
>>> 
>>> For us the big problem comes from this:
>>> 
>>> #define USE_FIXED_SECTION(sname)   \
>>> fs_label = start_##sname;   \
>>> fs_start = sname##_start;   \
>>> use_ftsec sname;
>>> 
>>> #define USE_TEXT_SECTION()         \
>>> fs_label = start_text;  \
>>> fs_start = text_start;  \
>>> .text
>>> 
>>> and in particular fs_label.
>>
>> The "Setting Symbols" super short chapter reads:
>>
>> "A symbol can be given an arbitrary value by writing a symbol, followed
>> by an equals sign '=', followed by an expression.  This is equivalent
>> to using the '.set' directive."
>>
>> And ".set" has
>>
>> "Set the value of SYMBOL to EXPRESSION.  This changes SYMBOL's value and
>> type to conform to EXPRESSION.  If SYMBOL was flagged as external, it
>> remains flagged.
>>
>> You may '.set' a symbol many times in the same assembly provided that
>> the values given to the symbol are constants.  Values that are based on
>> expressions involving other symbols are allowed, but some targets may
>> restrict this to only being done once per assembly.  This is because
>> those targets do not set the addresses of symbols at assembly time, but
>> rather delay the assignment until a final link is performed.  This
>> allows the linker a chance to change the code in the files, changing the
>> location of, and the relative distance between, various different
>> symbols.
>>
>> If you '.set' a global symbol, the value stored in the object file is
>> the last value stored into it."
>>
>> So this really should be fixed in clang: it is basic assembler syntax.
> 
> No doubt I have explained this poorly.
> 
> LLVM does allow some things, this builds fine for example:
> 
> .set foo, 8192
> addi %r3, %r3, foo
> .set foo, 1234
> addi %r3, %r3, foo
> 
> However, this does not:
> 
> a:
> .set foo, a
> addi %r3, %r3, foo@l
> b:
> .set foo, b
> addi %r3, %r3, foo-a
> 
> clang -target ppc64le -integrated-as  foo.s -o foo.o -c
> foo.s:5:11: error: invalid reassignment of non-absolute variable 'foo' in '.set' directive
> .set foo, b
>   ^

So that does seem to be allowed by the specification.

I don't have a huge problem with the patch actually, doesn't seem too 
bad.

Thanks,
Nick

Re: [RFC PATCH 7/8] powerpc/purgatory: drop .machine specifier

2021-03-18 Thread Nicholas Piggin

Excerpts from Segher Boessenkool's message of February 26, 2021 1:58 am:
> On Thu, Feb 25, 2021 at 02:10:05PM +1100, Daniel Axtens wrote:
>> It's ignored by future versions of llvm's integrated assembler (but not -11).
>> I'm not sure what it does for us in gas.
> 
> It enables all insns that exist on 620 (the first 64-bit PowerPC CPU).

Same question for this, why do we have it at all?

Thanks,
Nick

Re: [RFC PATCH 6/8] powerpc/mm/book3s64/hash: drop pre 2.06 tlbiel for clang

2021-03-18 Thread Nicholas Piggin

Excerpts from Daniel Axtens's message of February 25, 2021 1:10 pm:
> The llvm integrated assembler does not recognise the ISA 2.05 tlbiel
> version. Eventually do this more smartly.

The whole thing with TLBIE and TLBIEL in this file seems a bit too 
clever. We should have PPC_TLBIE* macros for all of them.

Thanks,
Nick

> 
> Signed-off-by: Daniel Axtens 
> ---
>  arch/powerpc/mm/book3s64/hash_native.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/mm/book3s64/hash_native.c b/arch/powerpc/mm/book3s64/hash_native.c
> index 52e170bd95ae..c5937f69a452 100644
> --- a/arch/powerpc/mm/book3s64/hash_native.c
> +++ b/arch/powerpc/mm/book3s64/hash_native.c
> @@ -267,9 +267,14 @@ static inline void __tlbiel(unsigned long vpn, int psize, int apsize, int ssize)
>   va |= ssize << 8;
>   sllp = get_sllp_encoding(apsize);
>   va |= sllp << 5;
> +#if 0
>   asm volatile(ASM_FTR_IFSET("tlbiel %0", "tlbiel %0,0", %1)
>: : "r" (va), "i" (CPU_FTR_ARCH_206)
>: "memory");
> +#endif
> + asm volatile("tlbiel %0"
> +  : : "r" (va)
> +  : "memory");
>   break;
>   default:
>   /* We need 14 to 14 + i bits of va */
> @@ -286,9 +291,14 @@ static inline void __tlbiel(unsigned long vpn, int psize, int apsize, int ssize)
>*/
>   va |= (vpn & 0xfe);
>   va |= 1; /* L */
> +#if 0
>   asm volatile(ASM_FTR_IFSET("tlbiel %0", "tlbiel %0,1", %1)
>: : "r" (va), "i" (CPU_FTR_ARCH_206)
>: "memory");
> +#endif
> + asm volatile("tlbiel %0"
> +  : : "r" (va)
> +  : "memory");
>   break;
>   }
>   trace_tlbie(0, 1, va, 0, 0, 0, 0);
> -- 
> 2.27.0
> 
>

Re: remove the legacy ide driver

2021-03-18 Thread Finn Thain

On Thu, 18 Mar 2021, Christoph Hellwig wrote:

> Hi all,
> 
> we've been trying to get rid of the legacy ide driver for a while now,
> and finally scheduled a removal for 2021, which is three months old now.
> 
> In general distros and most defconfigs have switched to libata long ago,
> but there are a few exceptions.  This series first switches over all
> remaining defconfigs to use libata and then removes the legacy ide
> driver.
> 
> libata mostly covers all hardware supported by the legacy ide driver.
> There are three mips drivers that are not supported, but the linux-mips
> list could not identify any users of those.  There also are two m68k
> drivers that do not have libata equivalents, which might or might not
> have users, so we'll need some input and possibly help from the m68k
> community here.
> 

A few months ago I wrote another patch to move some more platforms away 
from macide but it has not been tested yet. That is not to say you should 
wait. However, my patch does have some changes that are missing from your 
patch series, relating to ide platform devices in arch/m68k/mac/config.c. 
I hope to be able to test this patch before the 5.13 merge window closes.

Re: [RFC PATCH 4/8] powerpc/ppc_asm: use plain numbers for registers

2021-03-18 Thread Nicholas Piggin

Excerpts from Daniel Axtens's message of February 26, 2021 10:12 am:
> Segher Boessenkool  writes:
> 
>> On Thu, Feb 25, 2021 at 02:10:02PM +1100, Daniel Axtens wrote:
>>> This is dumb but makes the llvm integrated assembler happy.
>>> https://github.com/ClangBuiltLinux/linux/issues/764
>>
>>> -#define r0  %r0
>>
>>> +#define r0  0
>>
>> This is a big step back (compare 9a13a524ba37).
>>
>> If you use a new enough GAS, you can use the -mregnames option and just
>> say "r0" directly (so not define it at all, or define it to itself).
>>
>> ===
>> addi 3,3,3
>> addi r3,r3,3
>> addi %r3,%r3,3
>>
>> addi 3,3,3
>> addi r3,r3,r3
>> addi %r3,%r3,%r3
>> ===
>>
>> $ as t.s -o t.o -mregnames
>> t.s: Assembler messages:
>> t.s:6: Warning: invalid register expression
>> t.s:7: Warning: invalid register expression
>>
>>
>> Many people do not like bare numbers.  It is a bit like not wearing
>> seatbelts (but so is all assembler code really: you just have to pay
>> attention).  A better argument is that it is harder to read for people
>> not used to assembler code like this.
>>
>> We used to have "#define r0 0" etc., and that was quite problematic.
>> Like that "addi r3,r3,r3" example, but also, people wrote "r0" where
>> only a plain 0 is allowed (like in "lwzx r3,0,r3": "r0" would be
>> misleading there!)
> 
> So an overarching comment on all of these patches is that they're not
> intended to be ready to merge, nor are they necessarily what I think is
> the best solution. I'm just swinging a big hammer to see how far towards
> LLVM_IAS=1 I can get on powerpc, and I accept I'm going to have to come
> back and clean things up.
> 
> Anyway, noted, I'll push harder on trying to get llvm to accept %rN:
> there was a patch that went in after llvm-11 that should help.

If you put it under ifdef CONFIG_CC_IS_CLANG in the meantime I think 
that would be okay. Then we get error checking with gcc compiles and
llvm at least builds with its assembler which would be nice.
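
A minimal sketch of that idea, written as a small C file so the effect
of the preprocessor can be checked. Only r3 is shown; the STR() helper
and the r3_operand() function are assumptions added for illustration
and are not part of ppc_asm.h.

```c
/* Stringify after macro expansion, to show what the assembler would see. */
#define STR_(x) #x
#define STR(x) STR_(x)

#ifdef CONFIG_CC_IS_CLANG
#define r3 3	/* plain number: accepted by the LLVM integrated assembler */
#else
#define r3 %r3	/* %rN form: lets gas flag a register used as an immediate */
#endif

/* Returns the operand text that "r3" expands to in this build. */
const char *r3_operand(void)
{
	return STR(r3);
}
```

Built without CONFIG_CC_IS_CLANG defined, r3_operand() yields "%r3", so
gcc builds keep the operand checking; with it defined, the result is
the bare "3".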

Thanks,
Nick

Re: [RFC PATCH 3/8] powerpc/head-64: do less gas-specific stuff with sections

2021-03-18 Thread Nicholas Piggin

Excerpts from Daniel Axtens's message of February 25, 2021 1:10 pm:
> Reopening the section without specifying the same flags breaks
> the llvm integrated assembler. Don't do it: just specify all the
> flags all the time.

I don't have a problem with this, but llvm might want to track the issue
if it aims to be compatible with gas, if you haven't already opened an
issue.

When you fix the patch (perhaps add a quick comment as well?), then

Acked-by: Nicholas Piggin 

Thanks,
Nick

> 
> Signed-off-by: Daniel Axtens 
> ---
>  arch/powerpc/include/asm/head-64.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h
> index 4cb9efa2eb21..7d8ccab47e86 100644
> --- a/arch/powerpc/include/asm/head-64.h
> +++ b/arch/powerpc/include/asm/head-64.h
> @@ -15,10 +15,10 @@
>  .macro define_data_ftsec name
>   .section ".head.data.\name\()","a",@progbits
>  .endm
> -.macro use_ftsec name
> - .section ".head.text.\name\()"
> -.endm
> -
> +//.macro use_ftsec name
> +//   .section ".head.text.\name\()"
> +//.endm
> +#define use_ftsec define_ftsec
>  /*
>   * Fixed (location) sections are used by opening fixed sections and emitting
>   * fixed section entries into them before closing them. Multiple fixed 
> sections
> -- 
> 2.27.0
> 
>

Re: [RFC PATCH 2/8] powerpc: check for support for -Wa, -m{power4, any}

2021-03-18 Thread Nicholas Piggin

Excerpts from Daniel Axtens's message of February 25, 2021 1:10 pm:
> LLVM's integrated assembler does not like either -Wa,-mpower4
> or -Wa,-many. So just don't pass them if they're not supported.
> 
> Signed-off-by: Daniel Axtens 
> ---
>  arch/powerpc/Makefile | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 08cf0eade56a..3e2c72d20bb8 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -252,7 +252,9 @@ cpu-as-$(CONFIG_E500) += -Wa,-me500
>  # When using '-many -mpower4' gas will first try and find a matching power4
>  # mnemonic and failing that it will allow any valid mnemonic that GAS knows
>  # about. GCC will pass -many to GAS when assembling, clang does not.
> -cpu-as-$(CONFIG_PPC_BOOK3S_64)   += -Wa,-mpower4 -Wa,-many
> +# LLVM IAS doesn't understand either flag: https://github.com/ClangBuiltLinux/linux/issues/675
> +# but LLVM IAS only supports ISA >= 2.06 for Book3S 64 anyway...
> +cpu-as-$(CONFIG_PPC_BOOK3S_64)   += $(call as-option,-Wa$(comma)-mpower4) $(call as-option,-Wa$(comma)-many)
>  cpu-as-$(CONFIG_PPC_E500MC)  += $(call as-option,-Wa$(comma)-me500mc)
>  
>  KBUILD_AFLAGS += $(cpu-as-y)

I'm wondering why we even have this now. Kbuild's "AS" command goes 
through the C compiler now with relevant options like -mcpu. I assume it 
used to be useful for cross compiling when as was called directly but
I'm not sure.

Thanks,
Nick

Re: [PATCH v9 1/8] powerpc/mm: Implement set_memory() routines

2021-03-18 Thread Michael Ellerman

Jordan Niethe  writes:
> From: Russell Currey 
>
> The set_memory_{ro/rw/nx/x}() functions are required for STRICT_MODULE_RWX,
> and are generally useful primitives to have.  This implementation is
> designed to be completely generic across powerpc's many MMUs.
>
> It's possible that this could be optimised to be faster for specific
> MMUs, but the focus is on having a generic and safe implementation for
> now.

This won't work for the linear mapping with HPT on book3s 64. Because
the linear mapping is not in the kernel page tables.

apply_to_existing_page_range() should work that out and return an error.
But I'm not sure if callers handle that well or at all.

We might want to add a WARN_ON_ONCE() in change_memory_attr(), at least
to begin with, to report those errors, so we know when we are failing to
set permissions. Rather than silently failing and then crashing some
time later due to the permissions being wrong for some mapping.
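
A rough userspace model of that suggestion is sketched below. Every
name ending in _model or starting with mock_ is hypothetical, the
warn_on_once() helper only imitates the kernel's WARN_ON_ONCE(), and
-EINVAL is an assumed error code; the point is only that the error is
still returned to the caller while the first failure is reported.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts warnings so the effect of "once" is observable. */
int warnings_printed;

/*
 * Hypothetical stand-in for apply_to_existing_page_range(). It fails
 * when the range is not described by the kernel page tables; -EINVAL
 * is an assumed error code for illustration.
 */
int mock_apply_to_existing_page_range(bool in_page_tables)
{
	return in_page_tables ? 0 : -EINVAL;
}

/* Userspace imitation of WARN_ON_ONCE(): report only the first failure. */
bool warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		warnings_printed++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Model of change_memory_attr(): propagate the error, warn once. */
int change_memory_attr_model(bool in_page_tables)
{
	static bool warned;
	int ret = mock_apply_to_existing_page_range(in_page_tables);

	warn_on_once(ret != 0, &warned, "change_memory_attr() failed");
	return ret;
}
```

Repeated failures print a single warning, so a caller that ignores the
return value still leaves a trace in the log without flooding it.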

cheers


> This implementation does not handle cases where the caller is attempting
> to change the mapping of the page it is executing from, or if another
> CPU is concurrently using the page being altered.  These cases likely
> shouldn't happen, but a more complex implementation with MMU-specific code
> could safely handle them, so that is left as a TODO for now.
>
> These functions do nothing if STRICT_KERNEL_RWX is not enabled.
>
> Reviewed-by: Daniel Axtens 
> Signed-off-by: Russell Currey 
> Signed-off-by: Christophe Leroy 
> [jpn: rebase on next plus "powerpc/mm/64s: Allow STRICT_KERNEL_RWX again"]
> Signed-off-by: Jordan Niethe 
> ---
>  arch/powerpc/Kconfig  |  1 +
>  arch/powerpc/include/asm/set_memory.h | 32 +++
>  arch/powerpc/mm/Makefile  |  2 +-
>  arch/powerpc/mm/pageattr.c| 81 +++
>  4 files changed, 115 insertions(+), 1 deletion(-)
>  create mode 100644 arch/powerpc/include/asm/set_memory.h
>  create mode 100644 arch/powerpc/mm/pageattr.c
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index fc7f5c5933e6..4498a27ac9db 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -135,6 +135,7 @@ config PPC
>   select ARCH_HAS_MEMBARRIER_CALLBACKS
>   select ARCH_HAS_MEMBARRIER_SYNC_CORE
>   select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE 
> && PPC_BOOK3S_64
> + select ARCH_HAS_SET_MEMORY
>   select ARCH_HAS_STRICT_KERNEL_RWX   if ((PPC_BOOK3S_64 || PPC32) && 
> !HIBERNATION)
>   select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
>   select ARCH_HAS_UACCESS_FLUSHCACHE
> diff --git a/arch/powerpc/include/asm/set_memory.h b/arch/powerpc/include/asm/set_memory.h
> new file mode 100644
> index ..64011ea444b4
> --- /dev/null
> +++ b/arch/powerpc/include/asm/set_memory.h
> @@ -0,0 +1,32 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_POWERPC_SET_MEMORY_H
> +#define _ASM_POWERPC_SET_MEMORY_H
> +
> +#define SET_MEMORY_RO 0
> +#define SET_MEMORY_RW 1
> +#define SET_MEMORY_NX 2
> +#define SET_MEMORY_X 3
> +
> +int change_memory_attr(unsigned long addr, int numpages, long action);
> +
> +static inline int set_memory_ro(unsigned long addr, int numpages)
> +{
> + return change_memory_attr(addr, numpages, SET_MEMORY_RO);
> +}
> +
> +static inline int set_memory_rw(unsigned long addr, int numpages)
> +{
> + return change_memory_attr(addr, numpages, SET_MEMORY_RW);
> +}
> +
> +static inline int set_memory_nx(unsigned long addr, int numpages)
> +{
> + return change_memory_attr(addr, numpages, SET_MEMORY_NX);
> +}
> +
> +static inline int set_memory_x(unsigned long addr, int numpages)
> +{
> + return change_memory_attr(addr, numpages, SET_MEMORY_X);
> +}
> +
> +#endif
> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
> index 3b4e9e4e25ea..d8a08abde1ae 100644
> --- a/arch/powerpc/mm/Makefile
> +++ b/arch/powerpc/mm/Makefile
> @@ -5,7 +5,7 @@
>  
>  ccflags-$(CONFIG_PPC64)  := $(NO_MINIMAL_TOC)
>  
> -obj-y := fault.o mem.o pgtable.o mmap.o maccess.o \
> +obj-y := fault.o mem.o pgtable.o mmap.o maccess.o pageattr.o \
>  init_$(BITS).o pgtable_$(BITS).o \
>  pgtable-frag.o ioremap.o ioremap_$(BITS).o \
>  init-common.o mmu_context.o drmem.o
> diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
> new file mode 100644
> index ..2da3fbab6ff7
> --- /dev/null
> +++ b/arch/powerpc/mm/pageattr.c
> @@ -0,0 +1,81 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * MMU-generic set_memory implementation for powerpc
> + *
> + * Copyright 2019, IBM Corporation.
> + */
> +
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> +
> +/*
> + * Updates the attributes of a page in three steps:
> + *
> + * 1. invalidate the page tabl

Re: [PATCH] powerpc/mm: Revert "powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc"

2021-03-18 Thread Michael Ellerman

"Aneesh Kumar K.V"  writes:
> This reverts commit 675bceb097e6 ("powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc")
>
> All the related issues are fixed by the series
> https://lore.kernel.org/linux-mm/20200902114222.181353-1-aneesh.ku...@linux.ibm.com

Was that series merged?

If so, it seems this could be tagged as a Fix for the last commit
in that series.

cheers

> Hence enable it back
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  Documentation/features/debug/debug-vm-pgtable/arch-support.txt | 2 +-
>  arch/powerpc/Kconfig   | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> index 7aff505af706..fa83403b4aec 100644
> --- a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> @@ -21,7 +21,7 @@
>  |   nios2: | TODO |
>  |openrisc: | TODO |
>  |  parisc: | TODO |
> -| powerpc: | TODO |
> +| powerpc: |  ok  |
>  |   riscv: |  ok  |
>  |s390: |  ok  |
>  |  sh: | TODO |
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 386ae12d8523..982c87d5c051 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -119,6 +119,7 @@ config PPC
>   #
>   select ARCH_32BIT_OFF_T if PPC32
>   select ARCH_HAS_DEBUG_VIRTUAL
> + select ARCH_HAS_DEBUG_VM_PGTABLE
>   select ARCH_HAS_DEVMEM_IS_ALLOWED
>   select ARCH_HAS_ELF_RANDOMIZE
>   select ARCH_HAS_FORTIFY_SOURCE
> -- 
> 2.30.2

[PATCH] powerpc/iommu/debug: fix ifnullfree.cocci warnings

2021-03-18 Thread kernel test robot

From: kernel test robot 

arch/powerpc/kernel/iommu.c:76:2-16: WARNING: NULL check before some freeing functions is not needed.

 NULL check before some freeing functions is not needed.

 Based on checkpatch warning
 "kfree(NULL) is safe this check is probably not required"
 and kfreeaddr.cocci by Julia Lawall.

Generated by: scripts/coccinelle/free/ifnullfree.cocci

Fixes: 691602aab9c3 ("powerpc/iommu/debug: Add debugfs entries for IOMMU tables")
CC: Alexey Kardashevskiy 
Reported-by: kernel test robot 
Signed-off-by: kernel test robot 
---

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   81aa0968b7ea6dbabcdcda37dc8434dca6e1565b
commit: 691602aab9c3cce31d3ff9529c09b7922a5f6224 powerpc/iommu/debug: Add debugfs entries for IOMMU tables

 iommu.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -72,8 +72,7 @@ static void iommu_debugfs_del(struct iom
 
sprintf(name, "%08lx", tbl->it_index);
liobn_entry = debugfs_lookup(name, iommu_debugfs_dir);
-   if (liobn_entry)
-   debugfs_remove(liobn_entry);
+   debugfs_remove(liobn_entry);
 }
 #else
 static void iommu_debugfs_add(struct iommu_table *tbl){}
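
The warning relies on the remove helper tolerating a NULL argument by
itself, which makes the caller's check redundant. A minimal user-space
sketch of that contract follows. The names struct entry, entry_remove()
and removals are hypothetical and only illustrate the idea; this is not
the debugfs implementation.

```c
#include <stdlib.h>

/* Hypothetical object type standing in for a directory entry. */
struct entry {
	int id;
};

/* Counts how many real removals happened (illustration only). */
static int removals;

/*
 * Hypothetical remove helper that, like free(), accepts NULL and
 * does nothing in that case. Callers need no "if (e)" guard.
 */
static void entry_remove(struct entry *e)
{
	if (!e)
		return;
	removals++;
	free(e);
}
```

With a helper of this shape, "if (e) entry_remove(e);" and a plain
"entry_remove(e);" behave the same.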

Re: [PATCH] net: marvell: Remove reference to CONFIG_MV64X60

2021-03-18 Thread patchwork-bot+netdevbpf

Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Thu, 18 Mar 2021 17:25:08 +0000 (UTC) you wrote:
> Commit 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support")
> removed the last selector of CONFIG_MV64X60.
> 
> As it is not a user selectable config item, all references to it
> are stale. Remove them.
> 
> Signed-off-by: Christophe Leroy 
> 
> [...]

Here is the summary with links:
  - net: marvell: Remove reference to CONFIG_MV64X60
https://git.kernel.org/netdev/net/c/600cc3c9c62d

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

[PATCH v2] powerpc/qspinlock: Use generic smp_cond_load_relaxed

2021-03-18 Thread Davidlohr Bueso

49a7d46a06c3 (powerpc: Implement smp_cond_load_relaxed()) added
busy-waiting pausing with a preferred SMT priority pattern, lowering
the priority (reducing decode cycles) during the whole loop slowpath.

However, data shows that while this pattern works well with simple
spinlocks, queued spinlocks benefit more from being kept at medium
priority, using cpu_relax() instead, which is a low+medium combo on
powerpc.

Data is from three benchmarks on a Power9: 9008-22L 64 CPUs with
2 sockets and 8 threads per core.

1. locktorture.

This is data for the lowest and most artificial/pathological level,
with increasing thread counts pounding on the lock. Metrics are total
ops/minute. Despite some small hits in the 4-8 range, scenarios are
either neutral or favorable to this patch.

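+=========+==========+==========+=======+
| # tasks | vanilla  | dirty    | %diff |
+=========+==========+==========+=======+
| 2       | 46718565 | 48751350 | 4.35  |
+---------+----------+----------+-------+
| 4       | 51740198 | 50369082 | -2.65 |
+---------+----------+----------+-------+
| 8       | 63756510 | 62568821 | -1.86 |
+---------+----------+----------+-------+
| 16      | 67824531 | 70966546 | 4.63  |
+---------+----------+----------+-------+
| 32      | 53843519 | 61155508 | 13.58 |
+---------+----------+----------+-------+
| 64      | 53005778 | 53104412 | 0.18  |
+---------+----------+----------+-------+
| 128     | 53331980 | 54606910 | 2.39  |
+=========+==========+==========+=======+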

2. sockperf (tcp throughput)

Here a client does one-way throughput tests to a localhost server, with
increasing message sizes, dealing with the sk_lock. This patch puts the
performance of the qspinlock back on par with that of the simple lock:

            simple-spinlock            vanilla                dirty
Hmean 14      73.50 (   0.00%)      54.44 * -25.93%*      73.45 *  -0.07%*
Hmean 100    654.47 (   0.00%)     385.61 * -41.08%*     771.43 *  17.87%*
Hmean 300   2719.39 (   0.00%)    2181.67 * -19.77%*    2666.50 *  -1.94%*
Hmean 500   4400.59 (   0.00%)    3390.77 * -22.95%*    4322.14 *  -1.78%*
Hmean 850   6726.21 (   0.00%)    5264.03 * -21.74%*    6863.12 *   2.04%*

3. dbench (tmpfs)

Configured to run with up to ncpusx8 clients, it shows both latency and
throughput metrics. For the latency, with the exception of the 64 case,
there is really nothing to go by:
                          vanilla                 dirty
Amean latency-1      1.67 (   0.00%)        1.67 *   0.09%*
Amean latency-2      2.15 (   0.00%)        2.08 *   3.36%*
Amean latency-4      2.50 (   0.00%)        2.56 *  -2.27%*
Amean latency-8      2.49 (   0.00%)        2.48 *   0.31%*
Amean latency-16     2.69 (   0.00%)        2.72 *  -1.37%*
Amean latency-32     2.96 (   0.00%)        3.04 *  -2.60%*
Amean latency-64     7.78 (   0.00%)        8.17 *  -5.07%*
Amean latency-512  186.91 (   0.00%)      186.41 *   0.27%*

For the dbench4 Throughput (misleading but traditional) there's a small
but rather constant improvement:

                    vanilla                  dirty
Hmean 1       849.13 (   0.00%)      851.51 *   0.28%*
Hmean 2      1664.03 (   0.00%)     1663.94 *  -0.01%*
Hmean 4      3073.70 (   0.00%)     3104.29 *   1.00%*
Hmean 8      5624.02 (   0.00%)     5694.16 *   1.25%*
Hmean 16     9169.49 (   0.00%)     9324.43 *   1.69%*
Hmean 32    11969.37 (   0.00%)    12127.09 *   1.32%*
Hmean 64    15021.12 (   0.00%)    15243.14 *   1.48%*
Hmean 512   14891.27 (   0.00%)    15162.11 *   1.82%*

Measuring the dbench4 Per-VFS Operation latency shows some very minor
differences within the noise level, in the 0-1% range.

Fixes: 49a7d46a06c3 (powerpc: Implement smp_cond_load_relaxed())
Acked-by: Nicholas Piggin 
Signed-off-by: Davidlohr Bueso 
---
Changes from v1:
Added a small description and a comment labeling smp_cond_load_relaxed, as
requested by Nick.
Added Nick's ack.

 arch/powerpc/include/asm/barrier.h   | 16 
 arch/powerpc/include/asm/qspinlock.h |  7 +++
 2 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index aecfde829d5d..7ae29cfb06c0 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -80,22 +80,6 @@ do { \
___p1;  \
 })
 
-#ifdef CONFIG_PPC64
-#define smp_cond_load_relaxed(ptr, cond_expr) ({   \
-   typeof(ptr) __PTR = (ptr);  \
-   __unqual_scalar_typeof(*ptr) VAL;   \
-   VAL = READ_ONCE(*__PTR);\
-   if (unlikely(!(cond_expr))) {   \
-   spin_begin();

Re: [PATCH 3/3] powerpc/qspinlock: Use generic smp_cond_load_relaxed

2021-03-18 Thread Davidlohr Bueso


On Tue, 16 Mar 2021, Nicholas Piggin wrote:

> One request, could you add a comment in place that references
> smp_cond_load_relaxed() so this commit can be found again if
> someone looks at it? Something like this
>
> /*
>  * smp_cond_load_relaxed was found to have performance problems if
>  * implemented with spin_begin()/spin_end().
>  */

Sure, let me see where I can fit that in and send out a v2.

Similarly, but unrelated to this patch, is there any chance we could
remove the whole spin_until_cond() machinery and make it specific to
powerpc? This was introduced in 2017 and doesn't really have any users
outside of powerpc, except for these:

drivers/firmware/arm_scmi/driver.c: spin_until_cond(scmi_xfer_done_no_timeout(cinfo, xfer, stop));
drivers/firmware/arm_scmi/shmem.c: spin_until_cond(ioread32(&shmem->channel_status) &
drivers/net/ethernet/xilinx/ll_temac_main.c: spin_until_cond(hard_acs_rdy_or_timeout(lp, timeout));

... of which, afaict, only the xilinx one can actually build on powerpc.
Regardless, these could be converted to smp_cond_load_relaxed(), being
the more standard way to do optimized busy-waiting, caring more about
the family of barriers than ad-hoc SMT priorities. Of course, I have
no way of testing any of these changes.

> I wonder if it should have a Fixes: tag to the original commit as
> well.

I'm not sure either. I've actually been informed recently of other
workloads that benefit from the revert on large Power9 boxes. So I'll
go ahead and add it.

> Otherwise,
>
> Acked-by: Nicholas Piggin


Thanks,
Davidlohr
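
The busy-wait shape discussed in this thread (reload the value, test the
condition, relax the CPU between polls) can be sketched in user space
with C11 atomics. This is only an illustration under stated assumptions:
relax_hint() is a hypothetical no-op stand-in for cpu_relax(), and
wait_nonzero_relaxed() is an invented name, not a kernel interface.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's cpu_relax(); a no-op here. */
static inline void relax_hint(void)
{
}

/*
 * Spin until *ptr becomes non-zero and return the value observed.
 * Only relaxed ordering is used; a caller that needs acquire
 * semantics has to add its own barrier afterwards.
 */
static unsigned int wait_nonzero_relaxed(atomic_uint *ptr)
{
	unsigned int val;

	for (;;) {
		val = atomic_load_explicit(ptr, memory_order_relaxed);
		if (val != 0)
			break;
		relax_hint();
	}
	return val;
}
```

The loop relaxes once per poll instead of lowering the priority for the
whole wait, which is the difference the patch description is about.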

Re: [PATCH] powerpc/embedded6xx: Remove CONFIG_MV64X60

2021-03-18 Thread Wolfram Sang

On Thu, Mar 18, 2021 at 05:25:07PM +, Christophe Leroy wrote:
> Commit 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support")
> removed the last selector of CONFIG_MV64X60.
> 
> As it is not a user selectable config, it can be removed.
> 
> Signed-off-by: Christophe Leroy 

Acked-by: Wolfram Sang  # for I2C




Re: [PATCH] watchdog: Remove MV64x60 watchdog driver

2021-03-18 Thread Guenter Roeck

On 3/18/21 10:25 AM, Christophe Leroy wrote:
> Commit 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support")
> removed the last selector of CONFIG_MV64X60.
> 
> Therefore CONFIG_MV64X60_WDT cannot be selected anymore and
> can be removed.
> 
> Signed-off-by: Christophe Leroy 

Reviewed-by: Guenter Roeck 

> ---
>  drivers/watchdog/Kconfig   |   4 -
>  drivers/watchdog/Makefile  |   1 -
>  drivers/watchdog/mv64x60_wdt.c | 324 -
>  include/linux/mv643xx.h|   8 -
>  4 files changed, 337 deletions(-)
>  delete mode 100644 drivers/watchdog/mv64x60_wdt.c
> 
> diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
> index 1fe0042a48d2..178296bda151 100644
> --- a/drivers/watchdog/Kconfig
> +++ b/drivers/watchdog/Kconfig
> @@ -1831,10 +1831,6 @@ config 8xxx_WDT
>  
> For BookE processors (MPC85xx) use the BOOKE_WDT driver instead.
>  
> -config MV64X60_WDT
> - tristate "MV64X60 (Marvell Discovery) Watchdog Timer"
> - depends on MV64X60 || COMPILE_TEST
> -
>  config PIKA_WDT
>   tristate "PIKA FPGA Watchdog"
>   depends on WARP || (PPC64 && COMPILE_TEST)
> diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
> index f3a6540e725e..752c6513f731 100644
> --- a/drivers/watchdog/Makefile
> +++ b/drivers/watchdog/Makefile
> @@ -175,7 +175,6 @@ obj-$(CONFIG_PIC32_DMT) += pic32-dmt.o
>  # POWERPC Architecture
>  obj-$(CONFIG_GEF_WDT) += gef_wdt.o
>  obj-$(CONFIG_8xxx_WDT) += mpc8xxx_wdt.o
> -obj-$(CONFIG_MV64X60_WDT) += mv64x60_wdt.o
>  obj-$(CONFIG_PIKA_WDT) += pika_wdt.o
>  obj-$(CONFIG_BOOKE_WDT) += booke_wdt.o
>  obj-$(CONFIG_MEN_A21_WDT) += mena21_wdt.o
> diff --git a/drivers/watchdog/mv64x60_wdt.c b/drivers/watchdog/mv64x60_wdt.c
> deleted file mode 100644
> index 894aa63488d3..
> --- a/drivers/watchdog/mv64x60_wdt.c
> +++ /dev/null
> @@ -1,324 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -/*
> - * mv64x60_wdt.c - MV64X60 (Marvell Discovery) watchdog userspace interface
> - *
> - * Author: James Chapman 
> - *
> - * Platform-specific setup code should configure the dog to generate
> - * interrupt or reset as required.  This code only enables/disables
> - * and services the watchdog.
> - *
> - * Derived from mpc8xx_wdt.c, with the following copyright.
> - *
> - * 2002 (c) Florian Schirmer 
> - */
> -
> -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> -
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -#define MV64x60_WDT_WDC_OFFSET   0
> -
> -/*
> - * The watchdog configuration register contains a pair of 2-bit fields,
> - *   1.  a reload field, bits 27-26, which triggers a reload of
> - *   the countdown register, and
> - *   2.  an enable field, bits 25-24, which toggles between
> - *   enabling and disabling the watchdog timer.
> - * Bit 31 is a read-only field which indicates whether the
> - * watchdog timer is currently enabled.
> - *
> - * The low 24 bits contain the timer reload value.
> - */
> -#define MV64x60_WDC_ENABLE_SHIFT 24
> -#define MV64x60_WDC_SERVICE_SHIFT26
> -#define MV64x60_WDC_ENABLED_SHIFT31
> -
> -#define MV64x60_WDC_ENABLED_TRUE 1
> -#define MV64x60_WDC_ENABLED_FALSE0
> -
> -/* Flags bits */
> -#define MV64x60_WDOG_FLAG_OPENED 0
> -
> -static unsigned long wdt_flags;
> -static int wdt_status;
> -static void __iomem *mv64x60_wdt_regs;
> -static int mv64x60_wdt_timeout;
> -static int mv64x60_wdt_count;
> -static unsigned int bus_clk;
> -static char expect_close;
> -static DEFINE_SPINLOCK(mv64x60_wdt_spinlock);
> -
> -static bool nowayout = WATCHDOG_NOWAYOUT;
> -module_param(nowayout, bool, 0);
> -MODULE_PARM_DESC(nowayout,
> - "Watchdog cannot be stopped once started (default="
> - __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
> -
> -static int mv64x60_wdt_toggle_wdc(int enabled_predicate, int field_shift)
> -{
> - u32 data;
> - u32 enabled;
> - int ret = 0;
> -
> - spin_lock(&mv64x60_wdt_spinlock);
> - data = readl(mv64x60_wdt_regs + MV64x60_WDT_WDC_OFFSET);
> - enabled = (data >> MV64x60_WDC_ENABLED_SHIFT) & 1;
> -
> - /* only toggle the requested field if enabled state matches predicate */
> - if ((enabled ^ enabled_predicate) == 0) {
> - /* We write a 1, then a 2 -- to the appropriate field */
> - data = (1 << field_shift) | mv64x60_wdt_count;
> - writel(data, mv64x60_wdt_regs + MV64x60_WDT_WDC_OFFSET);
> -
> - data = (2 << field_shift) | mv64x60_wdt_count;
> - writel(data, mv64x60_wdt_regs + MV64x60_WDT_WDC_OFFSET);
> - ret = 1;
> - }
> - spin_unlock(&mv64x60_wdt_spinlock);
> -
> - return ret;
> -}
> -
> -static void mv64x60_wdt_service(void)
> -{
> - mv64x60_wdt_toggle_wdc(MV64x60_WDC_ENABLED_TRUE,
> -MV64x60_WDC_SERVICE_SHIFT);
> -}
> -
> -static void mv64x60_wdt_handler_

Re: [PATCH 01/10] alpha: use libata instead of the legacy ide driver

2021-03-18 Thread Måns Rullgård

Måns Rullgård  writes:

> Christoph Hellwig  writes:
>
>> On Thu, Mar 18, 2021 at 05:54:55AM +, Al Viro wrote:
>>> On Thu, Mar 18, 2021 at 05:56:57AM +0100, Christoph Hellwig wrote:
>>> > Switch the alpha defconfig from the legacy ide driver to libata.
>>> 
>>> Umm...  I don't have an IDE alpha box in a usable shape (fans on
>>> CPU module shat themselves), and it would take a while to resurrect
>>> it, but I remember the joy it used to cause in some versions.
>>> 
>>> Do you have reports of libata variants of drivers actually tested on
>>> those?
>>
>> No, I haven't.  The whole point is that we're not going to keep 4
>> lines of code around despite notice for users that don't exist or
>> care.  If there is a regression we'll fix it, but we're not going to
>> make life miserable just because we can.
>
> The pata_ali driver works fine on my UP1500 machine, unless something
> broke recently.  I'll build the latest kernel and report back.

5.11.7 seems fine too.

-- 
Måns Rullgård

[PATCH 1/1] powerpc/kernel/iommu: Use largepool as a last resort when !largealloc

2021-03-18 Thread Leonardo Bras

As of today, iommu_range_alloc() for !largealloc (npages <= 15) can only
use 3/4 of the available pages, because pages in the largepool are not
available to !largealloc requests.

This can prevent some drivers from using all the available pages of the
DMA window.

Use the largepool as a last resort for !largealloc, making all pages
of the DMA window available.

Signed-off-by: Leonardo Bras 
Reviewed-by: Alexey Kardashevskiy 
---
 arch/powerpc/kernel/iommu.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 3329ef045805..ae6ad8dca605 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -255,6 +255,15 @@ static unsigned long iommu_range_alloc(struct device *dev,
pass++;
goto again;
 
+   } else if (pass == tbl->nr_pools + 1) {
+   /* Last resort: try largepool */
+   spin_unlock(&pool->lock);
+   pool = &tbl->large_pool;
+   spin_lock(&pool->lock);
+   pool->hint = pool->start;
+   pass++;
+   goto again;
+
} else {
/* Give up */
spin_unlock_irqrestore(&(pool->lock), flags);
-- 
2.29.2
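
The fallback order described in the patch above (small pools first, the
large pool only as a last resort) can be sketched with a simplified
model. Everything here is an assumption made for illustration: pools are
reduced to free-page counters, hints and locking are left out, and the
names NR_POOLS, struct table, pool_take() and alloc_small() are
hypothetical.

```c
#define NR_POOLS 4	/* illustrative; the real table sets this at runtime */

/* A pool reduced to a counter of free pages. */
struct pool {
	unsigned long free_pages;
};

struct table {
	struct pool pools[NR_POOLS];
	struct pool large_pool;
};

/* Take npages from one pool if it has room; return 1 on success. */
static int pool_take(struct pool *p, unsigned long npages)
{
	if (p->free_pages < npages)
		return 0;
	p->free_pages -= npages;
	return 1;
}

/*
 * Small allocation: try every small pool in turn, starting at 'first',
 * then fall back to the large pool. Returns the index of the small pool
 * used, NR_POOLS if the large pool was used, or -1 on failure.
 */
static int alloc_small(struct table *t, unsigned int first,
		       unsigned long npages)
{
	unsigned int i;

	for (i = 0; i < NR_POOLS; i++) {
		unsigned int nr = (first + i) % NR_POOLS;

		if (pool_take(&t->pools[nr], npages))
			return (int)nr;
	}

	/* Last resort: the large pool. */
	if (pool_take(&t->large_pool, npages))
		return NR_POOLS;

	return -1;
}
```

Without the last-resort step, a request would fail as soon as the small
pools are exhausted, even while the large pool still has free pages.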

[PATCH 1/1] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs

2021-03-18 Thread Leonardo Bras

Currently both iommu_alloc_coherent() and iommu_free_coherent() align the
desired allocation size to PAGE_SIZE, and get system pages and IOMMU
mappings (TCEs) for that value.

When IOMMU_PAGE_SIZE < PAGE_SIZE, this behavior may cause unnecessary
TCEs to be created for mapping the whole system page.

Example:
- PAGE_SIZE = 64k, IOMMU_PAGE_SIZE() = 4k
- iommu_alloc_coherent() is called for 128 bytes
- 1 system page (64k) is allocated
- 16 IOMMU pages (16 x 4k) are allocated (16 TCEs used)

It would be enough to use a single TCE for this, so 15 TCEs are
wasted in the process.

Update iommu_*_coherent() to make sure the size alignment happens only
for IOMMU_PAGE_SIZE() before calling iommu_alloc() and iommu_free().

Also, on iommu_range_alloc(), replace ALIGN(n, 1 << tbl->it_page_shift)
with IOMMU_PAGE_ALIGN(n, tbl), which is easier to read and does the
same.

Signed-off-by: Leonardo Bras 
Reviewed-by: Alexey Kardashevskiy 
---
 arch/powerpc/kernel/iommu.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 5b69a6a72a0e..3329ef045805 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -851,6 +851,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
unsigned int order;
unsigned int nio_pages, io_order;
struct page *page;
+   size_t size_io = size;
 
size = PAGE_ALIGN(size);
order = get_order(size);
@@ -877,8 +878,9 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
memset(ret, 0, size);
 
/* Set up tces to cover the allocated range */
-   nio_pages = size >> tbl->it_page_shift;
-   io_order = get_iommu_order(size, tbl);
+   size_io = IOMMU_PAGE_ALIGN(size_io, tbl);
+   nio_pages = size_io >> tbl->it_page_shift;
+   io_order = get_iommu_order(size_io, tbl);
mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
  mask >> tbl->it_page_shift, io_order, 0);
if (mapping == DMA_MAPPING_ERROR) {
@@ -893,10 +895,9 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size,
 void *vaddr, dma_addr_t dma_handle)
 {
if (tbl) {
-   unsigned int nio_pages;
+   size_t size_io = IOMMU_PAGE_ALIGN(size, tbl);
+   unsigned int nio_pages = size_io >> tbl->it_page_shift;
 
-   size = PAGE_ALIGN(size);
-   nio_pages = size >> tbl->it_page_shift;
iommu_free(tbl, dma_handle, nio_pages);
size = PAGE_ALIGN(size);
free_pages((unsigned long)vaddr, get_order(size));
-- 
2.29.2
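
The arithmetic in the example from the patch description (128 bytes,
64k system pages, 4k IOMMU pages) can be checked with a small sketch.
The helper names align_up(), tces_page_aligned() and
tces_iommu_aligned() are hypothetical; the shifts 16 and 12 correspond
to 64k and 4k.

```c
#include <stddef.h>

/* Round size up to a multiple of 'align', which must be a power of two. */
static size_t align_up(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* TCEs needed when the size is first aligned to the system page size. */
static size_t tces_page_aligned(size_t size, unsigned int page_shift,
				unsigned int iommu_shift)
{
	return align_up(size, (size_t)1 << page_shift) >> iommu_shift;
}

/* TCEs needed when the size is aligned to the IOMMU page size only. */
static size_t tces_iommu_aligned(size_t size, unsigned int iommu_shift)
{
	return align_up(size, (size_t)1 << iommu_shift) >> iommu_shift;
}
```

For size = 128, tces_page_aligned(128, 16, 12) gives 16 and
tces_iommu_aligned(128, 12) gives 1, matching the 15 wasted TCEs
mentioned in the description.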

[PATCH] net: marvell: Remove reference to CONFIG_MV64X60

2021-03-18 Thread Christophe Leroy

Commit 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support")
removed the last selector of CONFIG_MV64X60.

As it is not a user selectable config item, all references to it
are stale. Remove them.

Signed-off-by: Christophe Leroy 
---
 drivers/net/ethernet/marvell/Kconfig   | 4 ++--
 drivers/net/ethernet/marvell/mv643xx_eth.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig
index 7fe15a3286f4..fe0989c0fc25 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -6,7 +6,7 @@
 config NET_VENDOR_MARVELL
bool "Marvell devices"
default y
-   depends on PCI || CPU_PXA168 || MV64X60 || PPC32 || PLAT_ORION || INET || COMPILE_TEST
+   depends on PCI || CPU_PXA168 || PPC32 || PLAT_ORION || INET || COMPILE_TEST
help
  If you have a network (Ethernet) card belonging to this class, say Y.
 
@@ -19,7 +19,7 @@ if NET_VENDOR_MARVELL
 
 config MV643XX_ETH
tristate "Marvell Discovery (643XX) and Orion ethernet support"
-   depends on MV64X60 || PPC32 || PLAT_ORION || COMPILE_TEST
+   depends on PPC32 || PLAT_ORION || COMPILE_TEST
depends on INET
select PHYLIB
select MVMDIO
diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index 90e6111ce534..3bfb659b5c99 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -2684,7 +2684,7 @@ static const struct of_device_id mv643xx_eth_shared_ids[] = {
 MODULE_DEVICE_TABLE(of, mv643xx_eth_shared_ids);
 #endif
 
-#if defined(CONFIG_OF_IRQ) && !defined(CONFIG_MV64X60)
+#ifdef CONFIG_OF_IRQ
 #define mv643xx_eth_property(_np, _name, _v)   \
do {\
u32 tmp;\
-- 
2.25.0

[PATCH] watchdog: Remove MV64x60 watchdog driver

2021-03-18 Thread Christophe Leroy

Commit 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support")
removed the last selector of CONFIG_MV64X60.

Therefore CONFIG_MV64X60_WDT cannot be selected anymore and
can be removed.

Signed-off-by: Christophe Leroy 
---
 drivers/watchdog/Kconfig   |   4 -
 drivers/watchdog/Makefile  |   1 -
 drivers/watchdog/mv64x60_wdt.c | 324 -
 include/linux/mv643xx.h|   8 -
 4 files changed, 337 deletions(-)
 delete mode 100644 drivers/watchdog/mv64x60_wdt.c

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 1fe0042a48d2..178296bda151 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -1831,10 +1831,6 @@ config 8xxx_WDT
 
  For BookE processors (MPC85xx) use the BOOKE_WDT driver instead.
 
-config MV64X60_WDT
-   tristate "MV64X60 (Marvell Discovery) Watchdog Timer"
-   depends on MV64X60 || COMPILE_TEST
-
 config PIKA_WDT
tristate "PIKA FPGA Watchdog"
depends on WARP || (PPC64 && COMPILE_TEST)
diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
index f3a6540e725e..752c6513f731 100644
--- a/drivers/watchdog/Makefile
+++ b/drivers/watchdog/Makefile
@@ -175,7 +175,6 @@ obj-$(CONFIG_PIC32_DMT) += pic32-dmt.o
 # POWERPC Architecture
 obj-$(CONFIG_GEF_WDT) += gef_wdt.o
 obj-$(CONFIG_8xxx_WDT) += mpc8xxx_wdt.o
-obj-$(CONFIG_MV64X60_WDT) += mv64x60_wdt.o
 obj-$(CONFIG_PIKA_WDT) += pika_wdt.o
 obj-$(CONFIG_BOOKE_WDT) += booke_wdt.o
 obj-$(CONFIG_MEN_A21_WDT) += mena21_wdt.o
diff --git a/drivers/watchdog/mv64x60_wdt.c b/drivers/watchdog/mv64x60_wdt.c
deleted file mode 100644
index 894aa63488d3..
--- a/drivers/watchdog/mv64x60_wdt.c
+++ /dev/null
@@ -1,324 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * mv64x60_wdt.c - MV64X60 (Marvell Discovery) watchdog userspace interface
- *
- * Author: James Chapman 
- *
- * Platform-specific setup code should configure the dog to generate
- * interrupt or reset as required.  This code only enables/disables
- * and services the watchdog.
- *
- * Derived from mpc8xx_wdt.c, with the following copyright.
- *
- * 2002 (c) Florian Schirmer 
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#define MV64x60_WDT_WDC_OFFSET 0
-
-/*
- * The watchdog configuration register contains a pair of 2-bit fields,
- *   1.  a reload field, bits 27-26, which triggers a reload of
- *   the countdown register, and
- *   2.  an enable field, bits 25-24, which toggles between
- *   enabling and disabling the watchdog timer.
- * Bit 31 is a read-only field which indicates whether the
- * watchdog timer is currently enabled.
- *
- * The low 24 bits contain the timer reload value.
- */
-#define MV64x60_WDC_ENABLE_SHIFT   24
-#define MV64x60_WDC_SERVICE_SHIFT  26
-#define MV64x60_WDC_ENABLED_SHIFT  31
-
-#define MV64x60_WDC_ENABLED_TRUE   1
-#define MV64x60_WDC_ENABLED_FALSE  0
-
-/* Flags bits */
-#define MV64x60_WDOG_FLAG_OPENED   0
-
-static unsigned long wdt_flags;
-static int wdt_status;
-static void __iomem *mv64x60_wdt_regs;
-static int mv64x60_wdt_timeout;
-static int mv64x60_wdt_count;
-static unsigned int bus_clk;
-static char expect_close;
-static DEFINE_SPINLOCK(mv64x60_wdt_spinlock);
-
-static bool nowayout = WATCHDOG_NOWAYOUT;
-module_param(nowayout, bool, 0);
-MODULE_PARM_DESC(nowayout,
-   "Watchdog cannot be stopped once started (default="
-   __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
-
-static int mv64x60_wdt_toggle_wdc(int enabled_predicate, int field_shift)
-{
-   u32 data;
-   u32 enabled;
-   int ret = 0;
-
-   spin_lock(&mv64x60_wdt_spinlock);
-   data = readl(mv64x60_wdt_regs + MV64x60_WDT_WDC_OFFSET);
-   enabled = (data >> MV64x60_WDC_ENABLED_SHIFT) & 1;
-
-   /* only toggle the requested field if enabled state matches predicate */
-   if ((enabled ^ enabled_predicate) == 0) {
-   /* We write a 1, then a 2 -- to the appropriate field */
-   data = (1 << field_shift) | mv64x60_wdt_count;
-   writel(data, mv64x60_wdt_regs + MV64x60_WDT_WDC_OFFSET);
-
-   data = (2 << field_shift) | mv64x60_wdt_count;
-   writel(data, mv64x60_wdt_regs + MV64x60_WDT_WDC_OFFSET);
-   ret = 1;
-   }
-   spin_unlock(&mv64x60_wdt_spinlock);
-
-   return ret;
-}
-
-static void mv64x60_wdt_service(void)
-{
-   mv64x60_wdt_toggle_wdc(MV64x60_WDC_ENABLED_TRUE,
-  MV64x60_WDC_SERVICE_SHIFT);
-}
-
-static void mv64x60_wdt_handler_enable(void)
-{
-   if (mv64x60_wdt_toggle_wdc(MV64x60_WDC_ENABLED_FALSE,
-  MV64x60_WDC_ENABLE_SHIFT)) {
-   mv64x60_wdt_service();
-   pr_notice("watchdog activated\n");
-   }
-}
-
-static void mv64x60_wdt_handler_disable(void)
-{
-   if (

[PATCH] powerpc/embedded6xx: Remove CONFIG_MV64X60

2021-03-18 Thread Christophe Leroy

Commit 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support")
removed the last selector of CONFIG_MV64X60.

As it is not a user selectable config, it can be removed.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/platforms/embedded6xx/Kconfig | 5 -
 drivers/i2c/busses/Kconfig | 2 +-
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/Kconfig b/arch/powerpc/platforms/embedded6xx/Kconfig
index c1920961f410..4c6d703a4284 100644
--- a/arch/powerpc/platforms/embedded6xx/Kconfig
+++ b/arch/powerpc/platforms/embedded6xx/Kconfig
@@ -71,11 +71,6 @@ config MPC10X_BRIDGE
bool
select PPC_INDIRECT_PCI
 
-config MV64X60
-   bool
-   select PPC_INDIRECT_PCI
-   select CHECK_CACHE_COHERENCY
-
 config GAMECUBE_COMMON
bool
 
diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 05ebf7546e3f..20edcda1c6f4 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -776,7 +776,7 @@ config I2C_MT7621
 
 config I2C_MV64XXX
tristate "Marvell mv64xxx I2C Controller"
-   depends on MV64X60 || PLAT_ORION || ARCH_SUNXI || ARCH_MVEBU || COMPILE_TEST
+   depends on PLAT_ORION || ARCH_SUNXI || ARCH_MVEBU || COMPILE_TEST
help
  If you say yes to this option, support will be included for the
  built-in I2C interface on the Marvell 64xxx line of host bridges.
-- 
2.25.0

Re: [PATCH 01/10] alpha: use libata instead of the legacy ide driver

2021-03-18 Thread Måns Rullgård

Christoph Hellwig  writes:

> On Thu, Mar 18, 2021 at 05:54:55AM +, Al Viro wrote:
>> On Thu, Mar 18, 2021 at 05:56:57AM +0100, Christoph Hellwig wrote:
>> > Switch the alpha defconfig from the legacy ide driver to libata.
>> 
>> Umm...  I don't have an IDE alpha box in a usable shape (fans on
>> CPU module shat themselves), and it would take a while to resurrect
>> it, but I remember the joy it used to cause in some versions.
>> 
>> Do you have reports of libata variants of drivers actually tested on
>> those?
>
> No, I haven't.  The whole point is that we're not going to keep 4
> lines of code around despite notice for users that don't exist or
> care.  If there is a regression we'll fix it, but we're not going to
> make life miserable just because we can.

The pata_ali driver works fine on my UP1500 machine, unless something
broke recently.  I'll build the latest kernel and report back.

-- 
Måns Rullgård

Re: [PATCH 08/10] MIPS: disable CONFIG_IDE in malta*_defconfig

2021-03-18 Thread Thomas Bogendoerfer

On Thu, Mar 18, 2021 at 05:57:04AM +0100, Christoph Hellwig wrote:
>  arch/mips/configs/malta_kvm_guest_defconfig | 3 ---

that file is gone in mips-next.

I could take all MIPS patches into mips-next, if you want...

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]

[PATCH 3/3] swiotlb: remove swiotlb_nr_tbl

2021-03-18 Thread Christoph Hellwig

All callers just use it to check if swiotlb is active at all, for which
they can just use is_swiotlb_active.  In the longer run drivers need
to stop using is_swiotlb_active as well, but let's do the simple step
first.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +-
 drivers/pci/xen-pcifront.c   | 2 +-
 include/linux/swiotlb.h  | 1 -
 kernel/dma/swiotlb.c | 7 +--
 5 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index ad22f42541bda6..a9d65fc8aa0eab 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
 
max_order = MAX_ORDER;
 #ifdef CONFIG_SWIOTLB
-   if (swiotlb_nr_tbl()) {
+   if (is_swiotlb_active()) {
unsigned int max_segment;
 
max_segment = swiotlb_max_segment();
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index a37bc3d7b38b3b..9662522aa0664a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
}
 
 #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
-   need_swiotlb = !!swiotlb_nr_tbl();
+   need_swiotlb = is_swiotlb_active();
 #endif
 
ret = ttm_bo_device_init(&drm->ttm.bdev, &nouveau_bo_driver,
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index 2d75026482197d..b7a8f3a1921f83 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct pcifront_device *pdev)
 
spin_unlock(&pcifront_dev_lock);
 
-   if (!err && !swiotlb_nr_tbl()) {
+   if (!err && !is_swiotlb_active()) {
err = pci_xen_swiotlb_init_late();
if (err)
dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 63f7a63f61d098..216854a5e5134b 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -37,7 +37,6 @@ enum swiotlb_force {
 
 extern void swiotlb_init(int verbose);
 int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
-extern unsigned long swiotlb_nr_tbl(void);
 unsigned long swiotlb_size_or_default(void);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
 extern int swiotlb_late_init_with_default_size(size_t default_size);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 13de669a9b4681..539c76beb52e07 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -94,12 +94,6 @@ setup_io_tlb_npages(char *str)
 }
 early_param("swiotlb", setup_io_tlb_npages);
 
-unsigned long swiotlb_nr_tbl(void)
-{
-   return io_tlb_default_mem ? io_tlb_default_mem->nslabs : 0;
-}
-EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
-
 unsigned int swiotlb_max_segment(void)
 {
return io_tlb_default_mem ? max_segment : 0;
@@ -652,6 +646,7 @@ bool is_swiotlb_active(void)
 {
return io_tlb_default_mem != NULL;
 }
+EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
 #ifdef CONFIG_DEBUG_FS
 
-- 
2.30.1
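
Why the two checks are interchangeable for these callers can be shown
with a stub model. It assumes, as the removed swiotlb_nr_tbl() body
suggests, that the default pool is a pointer that stays NULL until
swiotlb is set up, and that nslabs is non-zero once it is. The names
struct io_tlb_mem_stub, default_mem, nr_tbl_stub() and is_active_stub()
are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stub of the pool descriptor; only the field needed here. */
struct io_tlb_mem_stub {
	unsigned long nslabs;
};

/* NULL until the pool has been set up (assumption of this sketch). */
static struct io_tlb_mem_stub *default_mem;

/* Old-style query: number of slabs, 0 when there is no pool. */
static unsigned long nr_tbl_stub(void)
{
	return default_mem ? default_mem->nslabs : 0;
}

/* New-style query: is there a pool at all? */
static bool is_active_stub(void)
{
	return default_mem != NULL;
}
```

As long as an initialized pool never has zero slabs, "nr_tbl_stub() != 0"
and "is_active_stub()" give the same answer.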

[PATCH 2/3] swiotlb: dynamically allocate io_tlb_default_mem

2021-03-18 Thread Christoph Hellwig

Instead of allocating ->list and ->orig_addr separately just do one
dynamic allocation for the actual io_tlb_mem structure.  This simplifies
a lot of the initialization code, and also allows just checking
io_tlb_default_mem to see if swiotlb is in use.
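
One way to get a single allocation for the descriptor and its per-slot
data is a C flexible array member. The sketch below is a user-space
illustration under that assumption and does not claim to reproduce the
layout used by the patch; struct slot, struct tlb_mem and
tlb_mem_alloc() are hypothetical names.

```c
#include <stdlib.h>

/* Per-slot bookkeeping that used to live in separate arrays. */
struct slot {
	unsigned long orig_addr;
	unsigned int list;
};

/* Descriptor with the slots stored inline after the header. */
struct tlb_mem {
	unsigned long nslabs;
	struct slot slots[];	/* flexible array member */
};

/*
 * Allocate the descriptor and all of its slots in one zeroed block.
 * Returns NULL on allocation failure; release with free().
 */
static struct tlb_mem *tlb_mem_alloc(unsigned long nslabs)
{
	struct tlb_mem *mem;

	mem = calloc(1, sizeof(*mem) + nslabs * sizeof(mem->slots[0]));
	if (!mem)
		return NULL;
	mem->nslabs = nslabs;
	return mem;
}
```

With this shape, a NULL pointer means "not set up" and one free()
releases everything.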

Signed-off-by: Christoph Hellwig 
---
 drivers/xen/swiotlb-xen.c |  22 +--
 include/linux/swiotlb.h   |  18 ++-
 kernel/dma/swiotlb.c  | 306 --
 3 files changed, 117 insertions(+), 229 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 5329ad54a5f34e..4c89afc0df6289 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -158,17 +158,14 @@ static const char *xen_swiotlb_error(enum xen_swiotlb_err err)
 int __ref xen_swiotlb_init(void)
 {
enum xen_swiotlb_err m_ret = XEN_SWIOTLB_UNKNOWN;
-   unsigned long nslabs, bytes, order;
-   unsigned int repeat = 3;
+   unsigned long bytes = swiotlb_size_or_default();
+   unsigned long nslabs = bytes >> IO_TLB_SHIFT;
+   unsigned int order, repeat = 3;
int rc = -ENOMEM;
char *start;
 
-   nslabs = swiotlb_nr_tbl();
-   if (!nslabs)
-   nslabs = DEFAULT_NSLABS;
 retry:
m_ret = XEN_SWIOTLB_ENOMEM;
-   bytes = nslabs << IO_TLB_SHIFT;
order = get_order(bytes);
 
/*
@@ -221,19 +218,16 @@ int __ref xen_swiotlb_init(void)
 #ifdef CONFIG_X86
 void __init xen_swiotlb_init_early(void)
 {
-   unsigned long nslabs, bytes;
+   unsigned long bytes = swiotlb_size_or_default();
+   unsigned long nslabs = bytes >> IO_TLB_SHIFT;
unsigned int repeat = 3;
char *start;
int rc;
 
-   nslabs = swiotlb_nr_tbl();
-   if (!nslabs)
-   nslabs = DEFAULT_NSLABS;
 retry:
/*
 * Get IO TLB memory from any location.
 */
-   bytes = nslabs << IO_TLB_SHIFT;
start = memblock_alloc(PAGE_ALIGN(bytes), PAGE_SIZE);
if (!start)
panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
@@ -248,8 +242,8 @@ void __init xen_swiotlb_init_early(void)
if (repeat--) {
/* Min is 2MB */
nslabs = max(1024UL, (nslabs >> 1));
-   pr_info("Lowering to %luMB\n",
-   (nslabs << IO_TLB_SHIFT) >> 20);
+   bytes = nslabs << IO_TLB_SHIFT;
+   pr_info("Lowering to %luMB\n", bytes >> 20);
goto retry;
}
panic("%s (rc:%d)", xen_swiotlb_error(XEN_SWIOTLB_EFIXUP), rc);
@@ -548,7 +542,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
 static int
 xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
-   return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask;
+   return xen_phys_to_dma(hwdev, io_tlb_default_mem->end - 1) <= mask;
 }
 
 const struct dma_map_ops xen_swiotlb_dma_ops = {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 5ec5378b17c333..63f7a63f61d098 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -90,28 +90,30 @@ struct io_tlb_mem {
phys_addr_t end;
unsigned long nslabs;
unsigned long used;
-   unsigned int *list;
unsigned int index;
-   phys_addr_t *orig_addr;
-   size_t *alloc_size;
spinlock_t lock;
struct dentry *debugfs;
bool late_alloc;
+   struct io_tlb_slot {
+   phys_addr_t orig_addr;
+   size_t alloc_size;
+   unsigned int list;
+   } slots[];
 };
-extern struct io_tlb_mem io_tlb_default_mem;
+extern struct io_tlb_mem *io_tlb_default_mem;
 
 static inline bool is_swiotlb_buffer(phys_addr_t paddr)
 {
-   struct io_tlb_mem *mem = &io_tlb_default_mem;
+   struct io_tlb_mem *mem = io_tlb_default_mem;
 
-   return paddr >= mem->start && paddr < mem->end;
+   return mem && paddr >= mem->start && paddr < mem->end;
 }
 
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(void);
-void __init swiotlb_adjust_size(unsigned long new_size);
+void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
 static inline bool is_swiotlb_buffer(phys_addr_t paddr)
@@ -135,7 +137,7 @@ static inline bool is_swiotlb_active(void)
return false;
 }
 
-static inline void swiotlb_adjust_size(unsigned long new_size)
+static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
 #endif /* CONFIG_SWIOTLB */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d9c097f0f78cec..13de669a9b4681 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -63,7 +63,7 @@
 
 enum swiotlb_force swiotlb_force;
 
-struct io_tlb_mem io_tlb_default_mem;
+struct io_tlb_mem *io_tlb_default_mem;
 
 /*
  * Max segment that we can

[PATCH 1/3] swiotlb: move global variables into a new io_tlb_mem structure

2021-03-18 Thread Christoph Hellwig

From: Claire Chang 

Added a new struct, io_tlb_mem, as the IO TLB memory pool descriptor and
moved relevant global variables into that struct.
This will be useful later to allow for a restricted DMA pool.

Signed-off-by: Claire Chang 
[hch: rebased]
Signed-off-by: Christoph Hellwig 
---
 drivers/xen/swiotlb-xen.c |   2 +-
 include/linux/swiotlb.h   |  43 -
 kernel/dma/swiotlb.c  | 354 ++
 3 files changed, 206 insertions(+), 193 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 4ecfce2c6f7263..5329ad54a5f34e 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -548,7 +548,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
 static int
 xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
-   return xen_phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
+   return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask;
 }
 
 const struct dma_map_ops xen_swiotlb_dma_ops = {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 0696bdc8072e97..5ec5378b17c333 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device;
 struct page;
@@ -61,11 +62,49 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 
 #ifdef CONFIG_SWIOTLB
 extern enum swiotlb_force swiotlb_force;
-extern phys_addr_t io_tlb_start, io_tlb_end;
+
+/**
+ * struct io_tlb_mem - IO TLB Memory Pool Descriptor
+ *
+ * @start: The start address of the swiotlb memory pool. Used to do a quick
+ * range check to see if the memory was in fact allocated by this
+ * API.
+ * @end:   The end address of the swiotlb memory pool. Used to do a quick
+ * range check to see if the memory was in fact allocated by this
+ * API.
+ * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
+ * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @used:  The number of used IO TLB block.
+ * @list:  The free list describing the number of free entries available
+ * from each index.
+ * @index: The index to start searching in the next round.
+ * @orig_addr: The original address corresponding to a mapped entry.
+ * @alloc_size:Size of the allocated buffer.
+ * @lock:  The lock to protect the above data structures in the map and
+ * unmap calls.
+ * @debugfs:   The dentry to debugfs.
+ * @late_alloc:%true if allocated using the page allocator
+ */
+struct io_tlb_mem {
+   phys_addr_t start;
+   phys_addr_t end;
+   unsigned long nslabs;
+   unsigned long used;
+   unsigned int *list;
+   unsigned int index;
+   phys_addr_t *orig_addr;
+   size_t *alloc_size;
+   spinlock_t lock;
+   struct dentry *debugfs;
+   bool late_alloc;
+};
+extern struct io_tlb_mem io_tlb_default_mem;
 
 static inline bool is_swiotlb_buffer(phys_addr_t paddr)
 {
-   return paddr >= io_tlb_start && paddr < io_tlb_end;
+   struct io_tlb_mem *mem = &io_tlb_default_mem;
+
+   return paddr >= mem->start && paddr < mem->end;
 }
 
 void __init swiotlb_exit(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 35e24f0ff8b207..d9c097f0f78cec 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -59,32 +59,11 @@
  */
 #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
 
-enum swiotlb_force swiotlb_force;
-
-/*
- * Used to do a quick range check in swiotlb_tbl_unmap_single and
- * swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this
- * API.
- */
-phys_addr_t io_tlb_start, io_tlb_end;
-
-/*
- * The number of IO TLB blocks (in groups of 64) between io_tlb_start and
- * io_tlb_end.  This is command line adjustable via setup_io_tlb_npages.
- */
-static unsigned long io_tlb_nslabs;
+#define INVALID_PHYS_ADDR (~(phys_addr_t)0)
 
-/*
- * The number of used IO TLB block
- */
-static unsigned long io_tlb_used;
+enum swiotlb_force swiotlb_force;
 
-/*
- * This is a free list describing the number of free entries available from
- * each index
- */
-static unsigned int *io_tlb_list;
-static unsigned int io_tlb_index;
+struct io_tlb_mem io_tlb_default_mem;
 
 /*
  * Max segment that we can provide which (if pages are contingous) will
@@ -92,32 +71,15 @@ static unsigned int io_tlb_index;
  */
 static unsigned int max_segment;
 
-/*
- * We need to save away the original address corresponding to a mapped entry
- * for the sync operations.
- */
-#define INVALID_PHYS_ADDR (~(phys_addr_t)0)
-static phys_addr_t *io_tlb_orig_addr;
-
-/*
- * The mapped buffer's size should be validated during a sync operation.
- */
-static size_t *io_tlb_alloc_size;
-
-/*
- * Protect the above data structures in the map and unmap calls
- */
-static DEFINE_SPINLOCK(io_tlb_lock);
-
-static int late_alloc;
-
 static int __init
 setup_io_tl

swiotlb cleanups v3

2021-03-18 Thread Christoph Hellwig

Hi Konrad,

this series contains a bunch of swiotlb cleanups, mostly to reduce the
amount of internals exposed to code outside of swiotlb.c, which should
help to prepare for supporting multiple different bounce buffer pools.

Changes since v2:
 - fix a bisection hazard that did not allocate the alloc_size array
 - dropped all patches already merged

Changes since v1:
 - rebased to v5.12-rc1
 - a few more cleanups
 - merge and forward port the patch from Claire to move all the global
   variables into a struct to prepare for multiple instances

Re: Advice needed on SMP regression after cpu_core_mask change

2021-03-18 Thread Daniel Henrique Barboza





On 3/18/21 10:42 AM, Srikar Dronamraju wrote:

* Daniel Henrique Barboza  [2021-03-17 10:00:34]:


Hello,

Patch 4bce545903fa ("powerpc/topology: Update topology_core_cpumask") introduced
a regression in both upstream and RHEL downstream kernels [1]. The assumption
made in the commit:

"Further analysis shows that cpu_core_mask and cpu_cpu_mask for any CPU would be
equal on Power"

Doesn't seem to be true. After this commit, QEMU is now unable to set single
NUMA node SMP topologies such as:

-smp 8,maxcpus=8,cores=2,threads=2,sockets=2


What does it mean for a NUMA node to have more than one socket?
If they are all part of the same node, they are at local distance to each
other. Cache is per core. So what resources are shared by the sockets that
are part of the same NUMA node? And how does userspace/application make use of
the same?


Honestly, I sympathize with the idea that multiple sockets in the same NUMA
node are "weird". QEMU has accepted this kind of topology since forever
because we didn't pay attention to these other details.

I don't see any problem with adding more constraints in the virtual layer,
as long as the constraints make sense and are documented.
Forbidding multiple sockets in a single NUMA node seems like a fair restriction.




Please don't mistake this for an attempt to downplay your report; it is an
honest attempt to better understand the situation.


It's cool. Ask away.



For example, if the socket denotes the hemisphere logic in P10, then can we
see if the coregroup feature can be used. "Coregroup" is supposed to mean a
set of cores within a NUMA that have some characteristics and there can be
multiple coregroups within a NUMA. We add that mostly to mimic hemisphere in
P10. However the number of coregroups in a NUMA is not exported to userspace
at this time.


I see. I thought that the presence of the hemispheres inside the chip would
justify more than one NUMA node inside the chip, meaning that a chip/socket
would have more than one NUMA node inside of it.

If that's not the case then I guess socket == NUMA node is still valid in
P10 as well. The last 'lscpu' example I gave here, claiming that this would
be a Power10 scenario, doesn't represent P10 after all.



However if each Socket is associated with a memory and node distance, then
should they be NUMA?

Can you provide me with the unique ibm,chip-ids in your 2 NUMA, 4 node case?
Does this cause any performance issues with the guest/application?


I can fetch some values, but we're trying to move away from it since it's not
in the pseries spec (PAPR). Perhaps with these restrictions we can live without
ibm,chip-id in QEMU.



Till your report, I was under the impression that NUMAs == Sockets.


After reading about it and discussing it, I think the sensible thing to do is
to put this same constraint in QEMU.

In theory it would be nice to let the virtual machine have whatever topology
it wants, multiple sockets in the same NUMA domain and so on, but in the end
we're emulating Power hardware. If Power hardware - and the powerpc kernel -
operates under these assumptions, then I don't see much point in allowing
users to set unrealistic virtual CPU topologies that will be misrepresented
in the kernel.


I'll try this restriction in QEMU and see how upstream kernel behaves, with
and without ibm,chip-id being advertised in the DT.


Thanks,


DHB





lscpu will give the following output in this case:

# lscpu
Architecture:ppc64le
Byte Order:  Little Endian
CPU(s):  8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):   1
NUMA node(s):1
Model:   2.2 (pvr 004e 1202)
Model name:  POWER9 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:   32K
L1i cache:   32K
NUMA node0 CPU(s):   0-7


This is happening because the macro cpu_cpu_mask(cpu) expands to
cpumask_of_node(cpu_to_node(cpu)), which in turn expands to
node_to_cpumask_map[node]. node_to_cpumask_map is a NUMA array that maps CPUs
to NUMA nodes (Aneesh is on CC to correct me if I'm wrong). We're now
associating sockets to NUMA nodes directly.
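
The expansion chain described above can be modelled with a small userspace
sketch. It is a toy under stated assumptions, not the kernel implementation:
all demo_* names are hypothetical, CPU masks are plain 32-bit bitmaps, and the
"socket" count is assumed to be derived purely from the per-CPU mask returned
by the cpu_cpu_mask() equivalent, which is what the paragraph above suggests
happens after the commit.

```c
#include <stdint.h>

#define DEMO_NR_CPUS	8
#define DEMO_MAX_NODES	2

typedef uint32_t demo_cpumask_t;	/* bit n set means CPU n is in the mask */

static int demo_cpu_node[DEMO_NR_CPUS];
static demo_cpumask_t demo_node_to_cpumask_map[DEMO_MAX_NODES];

/* Place CPUs first_cpu..last_cpu on a node; call this for every CPU before use. */
void demo_assign_cpus(int first_cpu, int last_cpu, int node)
{
	for (int cpu = first_cpu; cpu <= last_cpu; cpu++) {
		/* Drop the CPU from its previous node before moving it. */
		demo_node_to_cpumask_map[demo_cpu_node[cpu]] &= ~(1u << cpu);
		demo_cpu_node[cpu] = node;
		demo_node_to_cpumask_map[node] |= 1u << cpu;
	}
}

int demo_cpu_to_node(int cpu)
{
	return demo_cpu_node[cpu];
}

demo_cpumask_t demo_cpumask_of_node(int node)
{
	return demo_node_to_cpumask_map[node];
}

/* Mirrors the chain: cpu -> node -> node_to_cpumask_map[node]. */
demo_cpumask_t demo_cpu_cpu_mask(int cpu)
{
	return demo_cpumask_of_node(demo_cpu_to_node(cpu));
}

/* Count "sockets" as the number of distinct per-CPU masks. */
int demo_count_sockets(void)
{
	int sockets = 0;

	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		int seen = 0;

		for (int prev = 0; prev < cpu; prev++) {
			if (demo_cpu_cpu_mask(prev) == demo_cpu_cpu_mask(cpu)) {
				seen = 1;
				break;
			}
		}
		if (!seen)
			sockets++;
	}
	return sockets;
}
```

In this model, putting all eight CPUs on node 0 yields a single mask and
therefore one "socket", regardless of the sockets=2 request; splitting CPUs
0-3 and 4-7 across two nodes yields two masks and two "sockets". That matches
the two lscpu outputs quoted in this thread.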

If I add a second NUMA node then I can get the intended smp topology:

-smp 8,maxcpus=8,cores=2,threads=2,sockets=2
-numa node,memdev=mem0,cpus=0-3,nodeid=0 \
-numa node,memdev=mem1,cpus=4-7,nodeid=1 \

# lscpu
Architecture:ppc64le
Byte Order:  Little Endian
CPU(s):  8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):   2
NUMA node(s):2
Model:   2.2 (pvr 004e 1202)
Model name:  POWER9 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:   32K
L1i cache:   32K
NUMA node0 CPU(s):   0-3
NUMA node1 CPU(s):   4-7


However, if I try a single socket with multiple NUMA nodes topology, which is
the case of Power10, e.g.:


-smp 8,maxcpus=

Re: Advice needed on SMP regression after cpu_core_mask change

2021-03-18 Thread Srikar Dronamraju

* Daniel Henrique Barboza  [2021-03-17 10:00:34]:

> Hello,
> 
> Patch 4bce545903fa ("powerpc/topology: Update topology_core_cpumask")
> introduced a regression in both upstream and RHEL downstream kernels [1].
> The assumption made in the commit:
> 
> "Further analysis shows that cpu_core_mask and cpu_cpu_mask for any CPU
> would be equal on Power"
> 
> Doesn't seem to be true. After this commit, QEMU is now unable to set
> single NUMA node SMP topologies such as:
> 
> -smp 8,maxcpus=8,cores=2,threads=2,sockets=2

What does it mean for a NUMA node to have more than one socket?
If they are all part of the same node, they are at local distance to each
other. Cache is per core. So what resources are shared by the sockets that
are part of the same NUMA node? And how does userspace/application make use of
the same?

Please don't mistake this for an attempt to downplay your report; it is an
honest attempt to better understand the situation.

For example, if the socket denotes the hemisphere logic in P10, then can we
see if the coregroup feature can be used. "Coregroup" is supposed to mean a
set of cores within a NUMA that have some characteristics and there can be
multiple coregroups within a NUMA. We add that mostly to mimic hemisphere in
P10. However the number of coregroups in a NUMA is not exported to userspace
at this time.

However if each Socket is associated with a memory and node distance, then
should they be NUMA?

Can you provide me with the unique ibm,chip-ids in your 2 NUMA, 4 node case?
Does this cause any performance issues with the guest/application?

Till your report, I was under the impression that NUMAs == Sockets.

> 
> lscpu will give the following output in this case:
> 
> # lscpu
> Architecture:ppc64le
> Byte Order:  Little Endian
> CPU(s):  8
> On-line CPU(s) list: 0-7
> Thread(s) per core:  2
> Core(s) per socket:  4
> Socket(s):   1
> NUMA node(s):1
> Model:   2.2 (pvr 004e 1202)
> Model name:  POWER9 (architected), altivec supported
> Hypervisor vendor:   KVM
> Virtualization type: para
> L1d cache:   32K
> L1i cache:   32K
> NUMA node0 CPU(s):   0-7
> 
> 
> This is happening because the macro cpu_cpu_mask(cpu) expands to
> cpumask_of_node(cpu_to_node(cpu)), which in turn expands to
> node_to_cpumask_map[node]. node_to_cpumask_map is a NUMA array that maps
> CPUs to NUMA nodes (Aneesh is on CC to correct me if I'm wrong). We're now
> associating sockets to NUMA nodes directly.
> 
> If I add a second NUMA node then I can get the intended smp topology:
> 
> -smp 8,maxcpus=8,cores=2,threads=2,sockets=2
> -numa node,memdev=mem0,cpus=0-3,nodeid=0 \
> -numa node,memdev=mem1,cpus=4-7,nodeid=1 \
> 
> # lscpu
> Architecture:ppc64le
> Byte Order:  Little Endian
> CPU(s):  8
> On-line CPU(s) list: 0-7
> Thread(s) per core:  2
> Core(s) per socket:  2
> Socket(s):   2
> NUMA node(s):2
> Model:   2.2 (pvr 004e 1202)
> Model name:  POWER9 (architected), altivec supported
> Hypervisor vendor:   KVM
> Virtualization type: para
> L1d cache:   32K
> L1i cache:   32K
> NUMA node0 CPU(s):   0-3
> NUMA node1 CPU(s):   4-7
> 
> 
> However, if I try a single socket with multiple NUMA nodes topology, which
> is the case of Power10, e.g.:
> 
> 
> -smp 8,maxcpus=8,cores=4,threads=2,sockets=1
> -numa node,memdev=mem0,cpus=0-3,nodeid=0 \
> -numa node,memdev=mem1,cpus=4-7,nodeid=1 \
> 
> 
> This is the result:
> 
> # lscpu
> Architecture:ppc64le
> Byte Order:  Little Endian
> CPU(s):  8
> On-line CPU(s) list: 0-7
> Thread(s) per core:  2
> Core(s) per socket:  2
> Socket(s):   2
> NUMA node(s):2
> Model:   2.2 (pvr 004e 1202)
> Model name:  POWER9 (architected), altivec supported
> Hypervisor vendor:   KVM
> Virtualization type: para
> L1d cache:   32K
> L1i cache:   32K
> NUMA node0 CPU(s):   0-3
> NUMA node1 CPU(s):   4-7
> 
> 
> This confirms my suspicions that, at this moment, we're making
> sockets == NUMA nodes.
> 
> 
> Cedric, the reason I'm CCing you is because this is related to ibm,chip-id.
> The commit after the one that caused the regression, 4ca234a9cbd7c3a65
> ("powerpc/smp: Stop updating cpu_core_mask"), is erasing the code that
> calculated cpu_core_mask. cpu_core_mask, despite its shortcomings that
> caused its removal, was giving a precise SMP topology. And it was using
> physical_package_id/'ibm,chip-id' for that.
> 
> Checking in QEMU I can say that the ibm,chip-id calculation is the only
> place in the code that cares about cores per socket information. The kernel
> is now ignoring that, starting on 4bce545903fa, and now QEMU is unable to
> provide this info to the guest.
> 
> If we're not going to use ibm,chip-id any longer, which seems sensible given
> that PAPR does not declare it, we need another way of letting the guest know
> how

[PATCH v3 00/10] Rid W=1 warnings in Crypto

2021-03-18 Thread Lee Jones

This is set 1 of 2 sets required to fully clean Crypto.

v2: No functional changes since v1.
v3: Description change and additional struct header fix

Lee Jones (10):
  crypto: hisilicon: sec_drv: Supply missing description for
'sec_queue_empty()'s 'queue' param
  crypto: bcm: Fix a whole host of kernel-doc misdemeanours
  crypto: chelsio: chcr_core: Fix some kernel-doc issues
  crypto: ux500: hash: hash_core: Fix worthy kernel-doc headers and
remove others
  crypto: keembay: ocs-hcu: Fix incorrectly named functions/structs
  crypto: atmel-ecc: Struct headers need to start with keyword 'struct'
  crypto: caam: caampkc: Provide the name of the function and provide
missing descriptions
  crypto: vmx: Source headers are not good kernel-doc candidates
  crypto: nx: nx-aes-cbc: Repair some kernel-doc problems
  crypto: cavium: nitrox_isr: Demote non-compliant kernel-doc headers

 drivers/crypto/atmel-ecc.c|  2 +-
 drivers/crypto/bcm/cipher.c   |  7 ++--
 drivers/crypto/bcm/spu.c  | 16 -
 drivers/crypto/bcm/spu2.c | 43 +--
 drivers/crypto/bcm/util.c |  4 +--
 drivers/crypto/caam/caamalg_qi2.c |  3 ++
 drivers/crypto/caam/caampkc.c |  3 +-
 drivers/crypto/cavium/nitrox/nitrox_isr.c |  4 +--
 drivers/crypto/chelsio/chcr_algo.c|  8 ++---
 drivers/crypto/chelsio/chcr_core.c|  2 +-
 drivers/crypto/hisilicon/sec/sec_drv.c|  1 +
 drivers/crypto/keembay/ocs-hcu.c  |  8 ++---
 drivers/crypto/nx/nx-aes-cbc.c|  2 +-
 drivers/crypto/nx/nx.c|  5 +--
 drivers/crypto/nx/nx_debugfs.c|  2 +-
 drivers/crypto/ux500/cryp/cryp.c  |  5 +--
 drivers/crypto/ux500/cryp/cryp_core.c |  5 +--
 drivers/crypto/ux500/cryp/cryp_irq.c  |  2 +-
 drivers/crypto/ux500/hash/hash_core.c | 15 +++-
 drivers/crypto/vmx/vmx.c  |  2 +-
 20 files changed, 73 insertions(+), 66 deletions(-)

Cc: Alexandre Belloni 
Cc: Andreas Westin 
Cc: Atul Gupta 
Cc: Aymen Sghaier 
Cc: Ayush Sawal 
Cc: Benjamin Herrenschmidt 
Cc: Berne Hebark 
Cc: "Breno Leitão" 
Cc: Daniele Alessandrelli 
Cc: "David S. Miller" 
Cc: Declan Murphy 
Cc: Harsh Jain 
Cc: Henrique Cerri 
Cc: Herbert Xu 
Cc: "Horia Geantă" 
Cc: Jitendra Lulla 
Cc: Joakim Bech 
Cc: Jonas Linde 
Cc: Jonathan Cameron 
Cc: Kent Yoder 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-cry...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Ludovic Desroches 
Cc: Manoj Malviya 
Cc: Michael Ellerman 
Cc: M R Gowda 
Cc: Nayna Jain 
Cc: Nicolas Ferre 
Cc: Niklas Hernaeus 
Cc: Paul Mackerras 
Cc: Paulo Flabiano Smorigo 
Cc: Rob Rice 
Cc: Rohit Maheshwari 
Cc: Shujuan Chen 
Cc: Tudor Ambarus 
Cc: Vinay Kumar Yadav 
Cc: Zaibo Xu 
-- 
2.27.0

[PATCH 09/10] crypto: nx: nx-aes-cbc: Repair some kernel-doc problems

2021-03-18 Thread Lee Jones

Fixes the following W=1 kernel build warning(s):

 drivers/crypto/nx/nx-aes-cbc.c:24: warning: Function parameter or member 'tfm' not described in 'cbc_aes_nx_set_key'
 drivers/crypto/nx/nx-aes-cbc.c:24: warning: Function parameter or member 'in_key' not described in 'cbc_aes_nx_set_key'
 drivers/crypto/nx/nx-aes-cbc.c:24: warning: Function parameter or member 'key_len' not described in 'cbc_aes_nx_set_key'
 drivers/crypto/nx/nx-aes-cbc.c:24: warning: expecting prototype for Nest Accelerators driver(). Prototype was for cbc_aes_nx_set_key() instead
 drivers/crypto/nx/nx_debugfs.c:34: warning: Function parameter or member 'drv' not described in 'nx_debugfs_init'
 drivers/crypto/nx/nx_debugfs.c:34: warning: expecting prototype for Nest Accelerators driver(). Prototype was for nx_debugfs_init() instead
 drivers/crypto/nx/nx.c:31: warning: Incorrect use of kernel-doc format:  * nx_hcall_sync - make an H_COP_OP hcall for the passed in op structure
 drivers/crypto/nx/nx.c:43: warning: Function parameter or member 'nx_ctx' not described in 'nx_hcall_sync'
 drivers/crypto/nx/nx.c:43: warning: Function parameter or member 'op' not described in 'nx_hcall_sync'
 drivers/crypto/nx/nx.c:43: warning: Function parameter or member 'may_sleep' not described in 'nx_hcall_sync'
 drivers/crypto/nx/nx.c:43: warning: expecting prototype for Nest Accelerators driver(). Prototype was for nx_hcall_sync() instead
 drivers/crypto/nx/nx.c:209: warning: Function parameter or member 'nbytes' not described in 'trim_sg_list'

Cc: "Breno Leitão" 
Cc: Nayna Jain 
Cc: Paulo Flabiano Smorigo 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: Kent Yoder 
Cc: linux-cry...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones 
---
 drivers/crypto/nx/nx-aes-cbc.c | 2 +-
 drivers/crypto/nx/nx.c | 5 +++--
 drivers/crypto/nx/nx_debugfs.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/nx/nx-aes-cbc.c b/drivers/crypto/nx/nx-aes-cbc.c
index 92e921eceed75..d6314ea9ae896 100644
--- a/drivers/crypto/nx/nx-aes-cbc.c
+++ b/drivers/crypto/nx/nx-aes-cbc.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
-/**
+/*
  * AES CBC routines supporting the Power 7+ Nest Accelerators driver
  *
  * Copyright (C) 2011-2012 International Business Machines Inc.
diff --git a/drivers/crypto/nx/nx.c b/drivers/crypto/nx/nx.c
index 1d0e8a1ba1605..010e87d9da36b 100644
--- a/drivers/crypto/nx/nx.c
+++ b/drivers/crypto/nx/nx.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
-/**
+/*
  * Routines supporting the Power 7+ Nest Accelerators driver
  *
  * Copyright (C) 2011-2012 International Business Machines Inc.
@@ -200,7 +200,8 @@ struct nx_sg *nx_walk_and_build(struct nx_sg   *nx_dst,
  * @sg: sg list head
  * @end: sg lisg end
  * @delta:  is the amount we need to crop in order to bound the list.
- *
+ * @nbytes: length of data in the scatterlists or data length - whichever
+ *  is greater.
  */
 static long int trim_sg_list(struct nx_sg *sg,
 struct nx_sg *end,
diff --git a/drivers/crypto/nx/nx_debugfs.c b/drivers/crypto/nx/nx_debugfs.c
index 1975bcbee9974..ee7cd88bb10a7 100644
--- a/drivers/crypto/nx/nx_debugfs.c
+++ b/drivers/crypto/nx/nx_debugfs.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
-/**
+/*
  * debugfs routines supporting the Power 7+ Nest Accelerators driver
  *
  * Copyright (C) 2011-2012 International Business Machines Inc.
-- 
2.27.0

[PATCH 08/10] crypto: vmx: Source headers are not good kernel-doc candidates

2021-03-18 Thread Lee Jones

Fixes the following W=1 kernel build warning(s):

 drivers/crypto/vmx/vmx.c:23: warning: expecting prototype for Routines supporting VMX instructions on the Power 8(). Prototype was for p8_init() instead

Cc: "Breno Leitão" 
Cc: Nayna Jain 
Cc: Paulo Flabiano Smorigo 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: Henrique Cerri 
Cc: linux-cry...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones 
---
 drivers/crypto/vmx/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/vmx/vmx.c b/drivers/crypto/vmx/vmx.c
index a40d08e75fc0b..7eb713cc87c8c 100644
--- a/drivers/crypto/vmx/vmx.c
+++ b/drivers/crypto/vmx/vmx.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
-/**
+/*
  * Routines supporting VMX instructions on the Power 8
  *
  * Copyright (C) 2015 International Business Machines Inc.
-- 
2.27.0

Re: [PATCH 1/2] audit: add support for the openat2 syscall

2021-03-18 Thread Richard Guy Briggs

On 2021-03-18 11:48, Christian Brauner wrote:
> [+Cc Aleksa, the author of openat2()]

Ah!  Thanks for pulling in Aleksa.  I thought I caught everyone...

> and a comment below. :)

Same...

> On Wed, Mar 17, 2021 at 09:47:17PM -0400, Richard Guy Briggs wrote:
> > The openat2(2) syscall was added in kernel v5.6 with commit fddb5d430ad9
> > ("open: introduce openat2(2) syscall")
> > 
> > Add the openat2(2) syscall to the audit syscall classifier.
> > 
> > See the github issue
> > https://github.com/linux-audit/audit-kernel/issues/67
> > 
> > Signed-off-by: Richard Guy Briggs 
> > ---
> >  arch/alpha/kernel/audit.c  | 2 ++
> >  arch/ia64/kernel/audit.c   | 2 ++
> >  arch/parisc/kernel/audit.c | 2 ++
> >  arch/parisc/kernel/compat_audit.c  | 2 ++
> >  arch/powerpc/kernel/audit.c| 2 ++
> >  arch/powerpc/kernel/compat_audit.c | 2 ++
> >  arch/s390/kernel/audit.c   | 2 ++
> >  arch/s390/kernel/compat_audit.c| 2 ++
> >  arch/sparc/kernel/audit.c  | 2 ++
> >  arch/sparc/kernel/compat_audit.c   | 2 ++
> >  arch/x86/ia32/audit.c  | 2 ++
> >  arch/x86/kernel/audit_64.c | 2 ++
> >  kernel/auditsc.c   | 3 +++
> >  lib/audit.c| 4 
> >  lib/compat_audit.c | 4 
> >  15 files changed, 35 insertions(+)
> > 
> > diff --git a/arch/alpha/kernel/audit.c b/arch/alpha/kernel/audit.c
> > index 96a9d18ff4c4..06a911b685d1 100644
> > --- a/arch/alpha/kernel/audit.c
> > +++ b/arch/alpha/kernel/audit.c
> > @@ -42,6 +42,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
> > return 3;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 0;
> > }
> > diff --git a/arch/ia64/kernel/audit.c b/arch/ia64/kernel/audit.c
> > index 5192ca899fe6..5eaa888c8fd3 100644
> > --- a/arch/ia64/kernel/audit.c
> > +++ b/arch/ia64/kernel/audit.c
> > @@ -43,6 +43,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
> > return 3;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 0;
> > }
> > diff --git a/arch/parisc/kernel/audit.c b/arch/parisc/kernel/audit.c
> > index 9eb47b2225d2..fc721a7727ba 100644
> > --- a/arch/parisc/kernel/audit.c
> > +++ b/arch/parisc/kernel/audit.c
> > @@ -52,6 +52,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
> > return 3;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 0;
> > }
> > diff --git a/arch/parisc/kernel/compat_audit.c b/arch/parisc/kernel/compat_audit.c
> > index 20c39c9d86a9..fc6d35918c44 100644
> > --- a/arch/parisc/kernel/compat_audit.c
> > +++ b/arch/parisc/kernel/compat_audit.c
> > @@ -35,6 +35,8 @@ int parisc32_classify_syscall(unsigned syscall)
> > return 3;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 1;
> > }
> > diff --git a/arch/powerpc/kernel/audit.c b/arch/powerpc/kernel/audit.c
> > index a27f3d09..8f32700b0baa 100644
> > --- a/arch/powerpc/kernel/audit.c
> > +++ b/arch/powerpc/kernel/audit.c
> > @@ -54,6 +54,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
> > return 4;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 0;
> > }
> > diff --git a/arch/powerpc/kernel/compat_audit.c b/arch/powerpc/kernel/compat_audit.c
> > index 55c6ccda0a85..ebe45534b1c9 100644
> > --- a/arch/powerpc/kernel/compat_audit.c
> > +++ b/arch/powerpc/kernel/compat_audit.c
> > @@ -38,6 +38,8 @@ int ppc32_classify_syscall(unsigned syscall)
> > return 4;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 1;
> > }
> > diff --git a/arch/s390/kernel/audit.c b/arch/s390/kernel/audit.c
> > index d395c6c9944c..d964cb94cfaf 100644
> > --- a/arch/s390/kernel/audit.c
> > +++ b/arch/s390/kernel/audit.c
> > @@ -54,6 +54,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
> > return 4;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 0;
> > }
> > diff --git a/arch/s390/kernel/compat_audit.c b/arch/s390/kernel/compat_audit.c
> > index 444fb1f66944..f7b32933ce0e 100644
> > --- a/arch/s390/kernel/compat_audit.c
> > +++ b/arch/s390/kernel/compat_audit.c
> > @@ -39,6 +39,8 @@ int s390_classify_syscall(unsigned syscall)
> > return 4;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 1;
>

Re: [PATCH 1/2] audit: add support for the openat2 syscall

2021-03-18 Thread Richard Guy Briggs

On 2021-03-18 11:52, Christian Brauner wrote:
> On Thu, Mar 18, 2021 at 11:48:45AM +0100, Christian Brauner wrote:
> > On Wed, Mar 17, 2021 at 09:47:17PM -0400, Richard Guy Briggs wrote:
> > > The openat2(2) syscall was added in kernel v5.6 with commit fddb5d430ad9
> > > ("open: introduce openat2(2) syscall")
> > > Add the openat2(2) syscall to the audit syscall classifier.
> > > See the github issue
> > > https://github.com/linux-audit/audit-kernel/issues/67
> > > Signed-off-by: Richard Guy Briggs 

...

> And one more comment, why return a hard-coded integer from all of these
> architectures instead of introducing an enum in a central place with
> proper names idk:

Oh, believe me, I tried hard to do that because I really don't like
hard-coded magic values, but for expediency I continued the same
approach until I could sort out the header file mess.  There was an
extra preparatory patch (attached) in this patchset with a different
audit syscall perms patch (also attached).  By including "#include
" in each of the compat source files there were warnings
of redefinitions of every __NR_* syscall number.  The easiest way to get
rid of it would have been to pull the new AUDITSC_* definitions into a
new  file and include that from  and
each of the arch/*/*/*audit.c (and lib/*audit.c) files.

> enum audit_match_perm_t {
>   .
>   .
>   .
>   AUDIT_MATCH_PERM_EXECVE = 5,
>   AUDIT_MATCH_PERM_OPENAT2 = 6,
>   .
>   .
>   .
> }
> 
> Then you can drop these hard-coded comments too and it's way less
> brittle overall.

Totally agree.
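
For illustration only, a minimal userspace sketch of what the named
classes could look like. AUDITSC_NATIVE, AUDITSC_OPEN, AUDITSC_OPENAT
and AUDITSC_EXECVE are taken from the attached patch; AUDITSC_COMPAT,
AUDITSC_SOCKETCALL and AUDITSC_OPENAT2 are assumed names for the values
1, 4 and 6, and the DEMO_NR_* numbers are hypothetical stand-ins for the
per-arch __NR_* constants.

```c
#include <assert.h>

/* Named syscall classes replacing the magic return values 0..6. */
enum auditsc_class {
	AUDITSC_NATIVE = 0,
	AUDITSC_COMPAT = 1,	/* assumed name */
	AUDITSC_OPEN = 2,
	AUDITSC_OPENAT = 3,
	AUDITSC_SOCKETCALL = 4,	/* assumed name */
	AUDITSC_EXECVE = 5,
	AUDITSC_OPENAT2 = 6,	/* assumed name */
};

/* Hypothetical syscall numbers; the real __NR_* values differ per arch. */
enum demo_syscall_nr {
	DEMO_NR_open = 100,
	DEMO_NR_openat = 101,
	DEMO_NR_execve = 102,
	DEMO_NR_openat2 = 103,
	DEMO_NR_getpid = 104,
};

/* Same shape as the per-arch classifiers, without the magic numbers. */
enum auditsc_class demo_classify_syscall(unsigned int syscall)
{
	switch (syscall) {
	case DEMO_NR_open:
		return AUDITSC_OPEN;
	case DEMO_NR_openat:
		return AUDITSC_OPENAT;
	case DEMO_NR_execve:
		return AUDITSC_EXECVE;
	case DEMO_NR_openat2:
		return AUDITSC_OPENAT2;
	default:
		return AUDITSC_NATIVE;
	}
}

/* The named classes must keep the values of the old hard-coded returns. */
void demo_check_classes(void)
{
	assert(demo_classify_syscall(DEMO_NR_execve) == 5);
	assert(demo_classify_syscall(DEMO_NR_openat2) == 6);
	assert(demo_classify_syscall(DEMO_NR_getpid) == 0);
}
```

Keeping the numeric values unchanged means callers that still compare
against the old numbers would keep working while the files are converted
one at a time.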

> Christian

- RGB

--
Richard Guy Briggs 
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
>From 599ae48091296a3ad3eb4259e7af39cdf0f743c7 Mon Sep 17 00:00:00 2001
Message-Id: 
<599ae48091296a3ad3eb4259e7af39cdf0f743c7.1616067847.git@redhat.com>
In-Reply-To: 
References: 
From: Richard Guy Briggs 
Date: Fri, 22 Jan 2021 16:27:42 -0500
Subject: [PATCH 1/3] audit: replace magic audit syscall class numbers with
 macros

Replace the magic numbers used to indicate audit syscall classes with macros.

Signed-off-by: Richard Guy Briggs 
---
 arch/alpha/kernel/audit.c  |  8 
 arch/ia64/kernel/audit.c   |  8 
 arch/parisc/kernel/audit.c |  8 
 arch/parisc/kernel/compat_audit.c  |  9 +
 arch/powerpc/kernel/audit.c| 10 +-
 arch/powerpc/kernel/compat_audit.c | 11 ++-
 arch/s390/kernel/audit.c   | 10 +-
 arch/s390/kernel/compat_audit.c| 11 ++-
 arch/sparc/kernel/audit.c  | 10 +-
 arch/sparc/kernel/compat_audit.c   | 11 ++-
 arch/x86/ia32/audit.c  | 11 ++-
 arch/x86/kernel/audit_64.c |  8 
 include/linux/audit.h  |  7 +++
 kernel/auditsc.c   | 12 ++--
 lib/audit.c| 10 +-
 lib/compat_audit.c | 11 ++-
 16 files changed, 84 insertions(+), 71 deletions(-)

diff --git a/arch/alpha/kernel/audit.c b/arch/alpha/kernel/audit.c
index 96a9d18ff4c4..81cbd804e375 100644
--- a/arch/alpha/kernel/audit.c
+++ b/arch/alpha/kernel/audit.c
@@ -37,13 +37,13 @@ int audit_classify_syscall(int abi, unsigned syscall)
 {
switch(syscall) {
case __NR_open:
-   return 2;
+   return AUDITSC_OPEN;
case __NR_openat:
-   return 3;
+   return AUDITSC_OPENAT;
case __NR_execve:
-   return 5;
+   return AUDITSC_EXECVE;
default:
-   return 0;
+   return AUDITSC_NATIVE;
}
 }
 
diff --git a/arch/ia64/kernel/audit.c b/arch/ia64/kernel/audit.c
index 5192ca899fe6..dba6a74c9ab3 100644
--- a/arch/ia64/kernel/audit.c
+++ b/arch/ia64/kernel/audit.c
@@ -38,13 +38,13 @@ int audit_classify_syscall(int abi, unsigned syscall)
 {
switch(syscall) {
case __NR_open:
-   return 2;
+   return AUDITSC_OPEN;
case __NR_openat:
-   return 3;
+   return AUDITSC_OPENAT;
case __NR_execve:
-   return 5;
+   return AUDITSC_EXECVE;
default:
-   return 0;
+   return AUDITSC_NATIVE;
}
 }
 
diff --git a/arch/parisc/kernel/audit.c b/arch/parisc/kernel/audit.c
index 9eb47b2225d2..14244e83db75 100644
--- a/arch/parisc/kernel/audit.c
+++ b/arch/parisc/kernel/audit.c
@@ -47,13 +47,13 @@ int audit_classify_syscall(int abi, unsigned syscall)
 #endif
switch (syscall) {
case __NR_open:
-   return 2;
+   return AUDITSC_OPEN;
case __NR_openat:
-   return 3;
+   return AUDITSC_OPENAT;
case __NR_execve:
-   return 5;
+   return AUDITSC_EXECVE;
default:
-   return 0;
+

Re: [PATCH 1/2] audit: add support for the openat2 syscall

2021-03-18 Thread Christian Brauner

On Thu, Mar 18, 2021 at 11:48:45AM +0100, Christian Brauner wrote:
> [+Cc Aleksa, the author of openat2()]
> 
> and a comment below. :)
> 
> On Wed, Mar 17, 2021 at 09:47:17PM -0400, Richard Guy Briggs wrote:
> > The openat2(2) syscall was added in kernel v5.6 with commit fddb5d430ad9
> > ("open: introduce openat2(2) syscall")
> > 
> > Add the openat2(2) syscall to the audit syscall classifier.
> > 
> > See the github issue
> > https://github.com/linux-audit/audit-kernel/issues/67
> > 
> > Signed-off-by: Richard Guy Briggs 
> > ---
> >  arch/alpha/kernel/audit.c  | 2 ++
> >  arch/ia64/kernel/audit.c   | 2 ++
> >  arch/parisc/kernel/audit.c | 2 ++
> >  arch/parisc/kernel/compat_audit.c  | 2 ++
> >  arch/powerpc/kernel/audit.c| 2 ++
> >  arch/powerpc/kernel/compat_audit.c | 2 ++
> >  arch/s390/kernel/audit.c   | 2 ++
> >  arch/s390/kernel/compat_audit.c| 2 ++
> >  arch/sparc/kernel/audit.c  | 2 ++
> >  arch/sparc/kernel/compat_audit.c   | 2 ++
> >  arch/x86/ia32/audit.c  | 2 ++
> >  arch/x86/kernel/audit_64.c | 2 ++
> >  kernel/auditsc.c   | 3 +++
> >  lib/audit.c| 4 
> >  lib/compat_audit.c | 4 
> >  15 files changed, 35 insertions(+)
> > 
> > diff --git a/arch/alpha/kernel/audit.c b/arch/alpha/kernel/audit.c
> > index 96a9d18ff4c4..06a911b685d1 100644
> > --- a/arch/alpha/kernel/audit.c
> > +++ b/arch/alpha/kernel/audit.c
> > @@ -42,6 +42,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
> > return 3;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 0;
> > }
> > diff --git a/arch/ia64/kernel/audit.c b/arch/ia64/kernel/audit.c
> > index 5192ca899fe6..5eaa888c8fd3 100644
> > --- a/arch/ia64/kernel/audit.c
> > +++ b/arch/ia64/kernel/audit.c
> > @@ -43,6 +43,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
> > return 3;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 0;
> > }
> > diff --git a/arch/parisc/kernel/audit.c b/arch/parisc/kernel/audit.c
> > index 9eb47b2225d2..fc721a7727ba 100644
> > --- a/arch/parisc/kernel/audit.c
> > +++ b/arch/parisc/kernel/audit.c
> > @@ -52,6 +52,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
> > return 3;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 0;
> > }
> > diff --git a/arch/parisc/kernel/compat_audit.c 
> > b/arch/parisc/kernel/compat_audit.c
> > index 20c39c9d86a9..fc6d35918c44 100644
> > --- a/arch/parisc/kernel/compat_audit.c
> > +++ b/arch/parisc/kernel/compat_audit.c
> > @@ -35,6 +35,8 @@ int parisc32_classify_syscall(unsigned syscall)
> > return 3;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 1;
> > }
> > diff --git a/arch/powerpc/kernel/audit.c b/arch/powerpc/kernel/audit.c
> > index a27f3d09..8f32700b0baa 100644
> > --- a/arch/powerpc/kernel/audit.c
> > +++ b/arch/powerpc/kernel/audit.c
> > @@ -54,6 +54,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
> > return 4;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 0;
> > }
> > diff --git a/arch/powerpc/kernel/compat_audit.c 
> > b/arch/powerpc/kernel/compat_audit.c
> > index 55c6ccda0a85..ebe45534b1c9 100644
> > --- a/arch/powerpc/kernel/compat_audit.c
> > +++ b/arch/powerpc/kernel/compat_audit.c
> > @@ -38,6 +38,8 @@ int ppc32_classify_syscall(unsigned syscall)
> > return 4;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 1;
> > }
> > diff --git a/arch/s390/kernel/audit.c b/arch/s390/kernel/audit.c
> > index d395c6c9944c..d964cb94cfaf 100644
> > --- a/arch/s390/kernel/audit.c
> > +++ b/arch/s390/kernel/audit.c
> > @@ -54,6 +54,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
> > return 4;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 0;
> > }
> > diff --git a/arch/s390/kernel/compat_audit.c 
> > b/arch/s390/kernel/compat_audit.c
> > index 444fb1f66944..f7b32933ce0e 100644
> > --- a/arch/s390/kernel/compat_audit.c
> > +++ b/arch/s390/kernel/compat_audit.c
> > @@ -39,6 +39,8 @@ int s390_classify_syscall(unsigned syscall)
> > return 4;
> > case __NR_execve:
> > return 5;
> > +   case __NR_openat2:
> > +   return 6;
> > default:
> > return 1;
> > }
> > diff --git a/arch/sparc/kernel/audit.c b

Re: [PATCH 1/2] audit: add support for the openat2 syscall

2021-03-18 Thread Christian Brauner

[+Cc Aleksa, the author of openat2()]

and a comment below. :)

On Wed, Mar 17, 2021 at 09:47:17PM -0400, Richard Guy Briggs wrote:
> The openat2(2) syscall was added in kernel v5.6 with commit fddb5d430ad9
> ("open: introduce openat2(2) syscall")
> 
> Add the openat2(2) syscall to the audit syscall classifier.
> 
> See the github issue
> https://github.com/linux-audit/audit-kernel/issues/67
> 
> Signed-off-by: Richard Guy Briggs 
> ---
>  arch/alpha/kernel/audit.c  | 2 ++
>  arch/ia64/kernel/audit.c   | 2 ++
>  arch/parisc/kernel/audit.c | 2 ++
>  arch/parisc/kernel/compat_audit.c  | 2 ++
>  arch/powerpc/kernel/audit.c| 2 ++
>  arch/powerpc/kernel/compat_audit.c | 2 ++
>  arch/s390/kernel/audit.c   | 2 ++
>  arch/s390/kernel/compat_audit.c| 2 ++
>  arch/sparc/kernel/audit.c  | 2 ++
>  arch/sparc/kernel/compat_audit.c   | 2 ++
>  arch/x86/ia32/audit.c  | 2 ++
>  arch/x86/kernel/audit_64.c | 2 ++
>  kernel/auditsc.c   | 3 +++
>  lib/audit.c| 4 
>  lib/compat_audit.c | 4 
>  15 files changed, 35 insertions(+)
> 
> diff --git a/arch/alpha/kernel/audit.c b/arch/alpha/kernel/audit.c
> index 96a9d18ff4c4..06a911b685d1 100644
> --- a/arch/alpha/kernel/audit.c
> +++ b/arch/alpha/kernel/audit.c
> @@ -42,6 +42,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
>   return 3;
>   case __NR_execve:
>   return 5;
> + case __NR_openat2:
> + return 6;
>   default:
>   return 0;
>   }
> diff --git a/arch/ia64/kernel/audit.c b/arch/ia64/kernel/audit.c
> index 5192ca899fe6..5eaa888c8fd3 100644
> --- a/arch/ia64/kernel/audit.c
> +++ b/arch/ia64/kernel/audit.c
> @@ -43,6 +43,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
>   return 3;
>   case __NR_execve:
>   return 5;
> + case __NR_openat2:
> + return 6;
>   default:
>   return 0;
>   }
> diff --git a/arch/parisc/kernel/audit.c b/arch/parisc/kernel/audit.c
> index 9eb47b2225d2..fc721a7727ba 100644
> --- a/arch/parisc/kernel/audit.c
> +++ b/arch/parisc/kernel/audit.c
> @@ -52,6 +52,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
>   return 3;
>   case __NR_execve:
>   return 5;
> + case __NR_openat2:
> + return 6;
>   default:
>   return 0;
>   }
> diff --git a/arch/parisc/kernel/compat_audit.c 
> b/arch/parisc/kernel/compat_audit.c
> index 20c39c9d86a9..fc6d35918c44 100644
> --- a/arch/parisc/kernel/compat_audit.c
> +++ b/arch/parisc/kernel/compat_audit.c
> @@ -35,6 +35,8 @@ int parisc32_classify_syscall(unsigned syscall)
>   return 3;
>   case __NR_execve:
>   return 5;
> + case __NR_openat2:
> + return 6;
>   default:
>   return 1;
>   }
> diff --git a/arch/powerpc/kernel/audit.c b/arch/powerpc/kernel/audit.c
> index a27f3d09..8f32700b0baa 100644
> --- a/arch/powerpc/kernel/audit.c
> +++ b/arch/powerpc/kernel/audit.c
> @@ -54,6 +54,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
>   return 4;
>   case __NR_execve:
>   return 5;
> + case __NR_openat2:
> + return 6;
>   default:
>   return 0;
>   }
> diff --git a/arch/powerpc/kernel/compat_audit.c 
> b/arch/powerpc/kernel/compat_audit.c
> index 55c6ccda0a85..ebe45534b1c9 100644
> --- a/arch/powerpc/kernel/compat_audit.c
> +++ b/arch/powerpc/kernel/compat_audit.c
> @@ -38,6 +38,8 @@ int ppc32_classify_syscall(unsigned syscall)
>   return 4;
>   case __NR_execve:
>   return 5;
> + case __NR_openat2:
> + return 6;
>   default:
>   return 1;
>   }
> diff --git a/arch/s390/kernel/audit.c b/arch/s390/kernel/audit.c
> index d395c6c9944c..d964cb94cfaf 100644
> --- a/arch/s390/kernel/audit.c
> +++ b/arch/s390/kernel/audit.c
> @@ -54,6 +54,8 @@ int audit_classify_syscall(int abi, unsigned syscall)
>   return 4;
>   case __NR_execve:
>   return 5;
> + case __NR_openat2:
> + return 6;
>   default:
>   return 0;
>   }
> diff --git a/arch/s390/kernel/compat_audit.c b/arch/s390/kernel/compat_audit.c
> index 444fb1f66944..f7b32933ce0e 100644
> --- a/arch/s390/kernel/compat_audit.c
> +++ b/arch/s390/kernel/compat_audit.c
> @@ -39,6 +39,8 @@ int s390_classify_syscall(unsigned syscall)
>   return 4;
>   case __NR_execve:
>   return 5;
> + case __NR_openat2:
> + return 6;
>   default:
>   return 1;
>   }
> diff --git a/arch/sparc/kernel/audit.c b/arch/sparc/kernel/audit.c
> index a6e91bf34d48..b6dcca9c6520 100644
> --- a/arch/sparc/kernel/audit.c
> +++ b/arch/sparc/kernel/audit.c
> @@ -55,6 +55,8 @@ int audit_classify_syscall(int abi, unsigned int syscall)
>

Re: [PATCH 08/10] MIPS: disable CONFIG_IDE in malta*_defconfig

2021-03-18 Thread Sergei Shtylyov

On 3/18/21 7:57 AM, Christoph Hellwig wrote:

> Various malta defconfigs enable CONFIG_IDE for the tc86c001 ide driver,
> hich is a Toshiba plug in card that does not make much sense to use on
  ^ which is for

> bigsur platforms.  For all other ATA cards libata support is already
  ^ Malta.

> enabled.
> 
> Signed-off-by: Christoph Hellwig 
[...]

MBR, Sergei

Re: [PATCH 07/10] MIPS: disable CONFIG_IDE in bigsur_defconfig

2021-03-18 Thread Sergei Shtylyov

Hi!

On 3/18/21 7:57 AM, Christoph Hellwig wrote:

> bigsur_defconfig enables CONFIG_IDE for the tc86c001 ide driver, which
> is a Toshiba plug in card that does not make much sense to use on bigsur
^ for

   Else that doesn't make much sense. :-)

> platforms.  For all other ATA cards libata support is already enabled.
> 
> Signed-off-by: Christoph Hellwig 
[...]

MBR, Sergei

Re: [PATCH 1/1] hotplug-cpu.c: show 'last online CPU' error in dlpar_cpu_remove()

2021-03-18 Thread Daniel Henrique Barboza


Ping

On 3/5/21 2:38 PM, Daniel Henrique Barboza wrote:

Of all the reasons that dlpar_cpu_remove() can fail, the 'last online
CPU' is one that can be caused directly by the user offlining CPUs
in a partition/virtual machine that has hotplugged CPUs. Trying to
reclaim a hotplugged CPU can fail if the CPU is now the last online in
the system. This is easily reproduced using QEMU [1].

Printing a more specific error message for this case, instead of just
"Failed to offline CPU", makes it clearer that the error is in fact a
known error situation rather than some other generic/unknown cause.

[1] https://bugzilla.redhat.com/1911414

Signed-off-by: Daniel Henrique Barboza 
---
  arch/powerpc/platforms/pseries/hotplug-cpu.c | 12 +++-
  1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 12cbffd3c2e3..134f393f09e1 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -514,7 +514,17 @@ static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
 
 	rc = dlpar_offline_cpu(dn);
 	if (rc) {
-		pr_warn("Failed to offline CPU %pOFn, rc: %d\n", dn, rc);
+		/* dlpar_offline_cpu will return -EBUSY from cpu_down() (via
+		 * device_offline()) in 2 cases: cpu_hotplug_disable is true or
+		 * there is only one CPU left. Warn the user about the second
+		 * since this can happen with user offlining CPUs and then
+		 * attempting hotunplugs.
+		 */
+		if (rc == -EBUSY && num_online_cpus() == 1)
+			pr_warn("Unable to remove last online CPU %pOFn\n", dn);
+		else
+			pr_warn("Failed to offline CPU %pOFn, rc: %d\n", dn, rc);
+
 		return -EINVAL;
 	}
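
The check added by the patch can be modeled in userspace roughly as
follows. This is only a sketch: the demo_* helpers are hypothetical, the
CPU name is passed as a plain string instead of a device node, and
fprintf() stands in for pr_warn().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model of the new condition: -EBUSY with a single online
 * CPU is treated as the "last online CPU" case.
 */
bool demo_is_last_online_cpu_error(int rc, unsigned int online_cpus)
{
	assert(online_cpus >= 1);	/* at least one CPU is always online */
	return rc == -EBUSY && online_cpus == 1;
}

/* Print the warning matching the failure, mirroring the two pr_warn() calls. */
void demo_warn_offline_failure(const char *cpu, int rc, unsigned int online_cpus)
{
	if (demo_is_last_online_cpu_error(rc, online_cpus))
		fprintf(stderr, "Unable to remove last online CPU %s\n", cpu);
	else
		fprintf(stderr, "Failed to offline CPU %s, rc: %d\n", cpu, rc);
}
```

An -EBUSY seen while several CPUs are still online falls through to the
generic message, which matches the cpu_hotplug_disable case mentioned in
the comment.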

Re: [PATCH] powerpc/numa: Fix topology_physical_package_id() on pSeries

2021-03-18 Thread Daniel Henrique Barboza

On 3/18/21 4:28 AM, Cédric Le Goater wrote:
>> Also we've been using it for several years and I don't think we should
>> risk breaking anything by changing the value now.
>
> I guess we can leave it that way. Please read the commit log of
> the second patch (not tagged as a v2 ...).
>
> But we should remove ibm,chip-id from QEMU since the property does
> not exist on PAPR and that the calculation is anyhow very broken.

I am a strong advocate of getting rid of ibm,chip-id in QEMU. That said,
we need to make sure that the current problem with CPU topologies, that
I reported in that other thread, can be fixed without it.

Thanks,

DHB

> Thanks,
>
> C.

[PATCH] pseries: prevent free CPU ids to be reused on another node

2021-03-18 Thread Laurent Dufour

When a CPU is hot added, the CPU ids are taken from the available mask,
starting from the lowest possible set. If that set of values was previously
used for CPUs attached to a different node, it looks to applications as if
these CPUs had migrated from one node to another, which is not expected in
real life.

To prevent this, the CPU ids used for each node need to be recorded and
not reused on another node. However, to prevent CPU hot plug from failing
when a node runs out of CPU ids, the capability to reuse other nodes' free
CPU ids is kept. A warning is displayed in such a case to warn the user.

A new CPU bit mask (node_recorded_ids_map) is introduced for each possible
node. It is populated with the CPUs onlined at boot time, and then each time
a CPU is hot plugged to a node. The bits in that mask remain set when the
CPU is hot unplugged, to record that these CPU ids have been used for this
node.

If no id set is found, a retry is made without removing the ids used on
the other nodes, to try reusing them. This is the way ids were allocated
prior to this patch.

The effect of this patch can be seen by removing and adding CPUs using the
QEMU monitor. In the following case, the first CPU from node 2 is removed,
then the first one from node 1 is removed too. Later, the first CPU of
node 2 is added back. Without this patch, the kernel will number these CPUs
using the first CPU ids available, which are the ones freed when removing
the second CPU of the node 0. This leads to CPU ids 16-23 moving from
node 1 to node 2. With the patch applied, CPU ids 32-39 are used since they
are the lowest free ones which have not been used on another node.

At boot time:
[root@vm40 ~]# numactl -H | grep cpus
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 1 cpus: 16 17 18 19 20 21 22 23 40 41 42 43 44 45 46 47
node 2 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

Unpatched kernel, after the CPU hot unplug/plug operations:
[root@vm40 ~]# numactl -H | grep cpus
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 1 cpus: 24 25 26 27 28 29 30 31
node 2 cpus: 16 17 18 19 20 21 22 23 40 41 42 43 44 45 46 47

Patched kernel, after the CPU hot unplug/plug operations:
[root@vm40 ~]# numactl -H | grep cpus
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 1 cpus: 24 25 26 27 28 29 30 31
node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
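
The allocation policy described above can be modeled in userspace roughly
as follows. This is a sketch under stated assumptions: at most 64 CPU ids
held in a uint64_t instead of a cpumask, three nodes, and a search that
advances in steps of nthreads. All demo_* names are made up for this
illustration and are not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_NODES 3

/* Lowest id starting a block of nthreads free ids, or -1 if none. */
static int demo_find_block(uint64_t candidates, unsigned int nthreads)
{
	uint64_t want;
	unsigned int first;

	assert(nthreads >= 1 && nthreads <= 64);
	want = (nthreads == 64) ? UINT64_MAX : ((UINT64_C(1) << nthreads) - 1);

	for (first = 0; first + nthreads <= 64; first += nthreads)
		if (((candidates >> first) & want) == want)
			return (int)first;
	return -1;
}

/*
 * Pick the first CPU id for a core added to node nid.
 * recorded[n] holds every id ever used on node n. *reused is set to 1
 * when ids recorded for another node had to be reused.
 */
int demo_pick_cpu_ids(uint64_t possible, uint64_t present,
		      const uint64_t recorded[DEMO_NR_NODES],
		      unsigned int nid, unsigned int nthreads, int *reused)
{
	/* Unoccupied slots; present is expected to be a subset of possible. */
	uint64_t candidates = possible ^ present;
	uint64_t filtered = candidates;
	unsigned int node;
	int first;

	/* First pass: drop free ids previously used on the other nodes. */
	for (node = 0; node < DEMO_NR_NODES; node++)
		if (node != nid)
			filtered &= ~recorded[node];

	*reused = 0;
	first = demo_find_block(filtered, nthreads);
	if (first < 0) {
		/* Starved: retry while allowing other nodes' free ids. */
		*reused = 1;
		first = demo_find_block(candidates, nthreads);
	}
	return first;
}
```

With ids 16-23 free but recorded for node 1 and ids 32-39 free and
recorded for node 2, a core added to node 2 gets the block starting at 32
rather than 16. A core added to node 0 finds nothing of its own and falls
back to reusing the block starting at 16.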

Signed-off-by: Laurent Dufour 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 83 ++--
 1 file changed, 76 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 12cbffd3c2e3..dc5797110d6e 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -39,6 +39,8 @@
 /* This version can't take the spinlock, because it never returns */
 static int rtas_stop_self_token = RTAS_UNKNOWN_SERVICE;
 
+cpumask_var_t node_recorded_ids_map[MAX_NUMNODES];
+
 static void rtas_stop_self(void)
 {
static struct rtas_args args;
@@ -151,29 +153,61 @@ static void pseries_cpu_die(unsigned int cpu)
  */
 static int pseries_add_processor(struct device_node *np)
 {
-   unsigned int cpu;
+   unsigned int cpu, node;
cpumask_var_t candidate_mask, tmp;
-   int err = -ENOSPC, len, nthreads, i;
+   int err = -ENOSPC, len, nthreads, i, nid;
const __be32 *intserv;
+   bool force_reusing = false;
 
intserv = of_get_property(np, "ibm,ppc-interrupt-server#s", &len);
if (!intserv)
return 0;
 
-   zalloc_cpumask_var(&candidate_mask, GFP_KERNEL);
-   zalloc_cpumask_var(&tmp, GFP_KERNEL);
+   alloc_cpumask_var(&candidate_mask, GFP_KERNEL);
+   alloc_cpumask_var(&tmp, GFP_KERNEL);
+
+   /*
+* Fetch from the DT nodes read by dlpar_configure_connector() the NUMA
+* node id the added CPU belongs to.
+*/
+   nid = of_node_to_nid(np);
+   if (nid < 0 || !node_possible(nid))
+   nid = first_online_node;
 
nthreads = len / sizeof(u32);
-   for (i = 0; i < nthreads; i++)
-   cpumask_set_cpu(i, tmp);
 
cpu_maps_update_begin();
 
BUG_ON(!cpumask_subset(cpu_present_mask, cpu_possible_mask));
 
+again:
+   cpumask_clear(candidate_mask);
+   cpumask_clear(tmp);
+   for (i = 0; i < nthreads; i++)
+   cpumask_set_cpu(i, tmp);
+
/* Get a bitmap of unoccupied slots. */
cpumask_xor(candidate_mask, cpu_possible_mask, cpu_present_mask);
+
+   /*
+* Remove free ids previously assigned on the other nodes. We can walk
+* only online nodes because once a node became online it is not turned
+* offlined back.
+*/
+   if (!force_reusing)
+   for_each_online_node(node) {
+   if (node == nid)

Re: [PATCH v2 4/6] mm/mremap: Use mmu gather interface instead of flush_tlb_range

2021-03-18 Thread Nicholas Piggin

Excerpts from Aneesh Kumar K.V's message of March 15, 2021 9:38 pm:
> Some architectures do have the concept of page walk cache and only mmu gather
> interface supports flushing them. A fast mremap that involves moving page
> table pages instead of copying pte entries should flush page walk cache since
> the old translation cache is no more valid. Hence switch to mm gather to flush
> TLB and mark tlb.freed_tables = 1. No page table pages need to be freed here.
> With this the tlb flush is done outside page table lock (ptl).

I would maybe just get archs that implement it to provide a specific
flush_tlb+pwc_range for it, or else they get flush_tlb_range by default.

I think that would be simpler for now, at least in generic code.
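
Something along these lines, as a userspace sketch only. The demo_*
names are hypothetical and the flush just bumps a counter; the point is
the #ifndef fallback that lets an arch override the helper.

```c
#include <assert.h>

struct demo_range {
	unsigned long start;
	unsigned long end;
};

/* Counts generic flushes so the fallback can be observed. */
static unsigned int demo_generic_flushes;

/* Stand-in for the generic flush_tlb_range(). */
static void demo_flush_tlb_range(struct demo_range *r)
{
	assert(r->start < r->end);
	demo_generic_flushes++;
}

/*
 * An architecture with a page walk cache would define
 * demo_flush_tlb_pwc_range before this point; everyone else gets the
 * plain range flush.
 */
#ifndef demo_flush_tlb_pwc_range
#define demo_flush_tlb_pwc_range(r) demo_flush_tlb_range(r)
#endif

/* Generic caller: does not need to know which variant is in use. */
unsigned int demo_move_table(unsigned long old_addr, unsigned long size)
{
	struct demo_range r = { old_addr, old_addr + size };

	demo_flush_tlb_pwc_range(&r);
	return demo_generic_flushes;
}
```

The generic code then has a single call site, and only archs that care
about the page walk cache need to provide their own version.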

There was some other talk of consolidating the TLB flush APIs; I just
don't know if using the page/page table gathering and freeing API for it
is the best way to go.

Thanks,
Nick

> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  mm/mremap.c | 33 +
>  1 file changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 574287f9bb39..fafa73b965d3 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -216,6 +216,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, 
> unsigned long old_addr,
>  {
>   spinlock_t *old_ptl, *new_ptl;
>   struct mm_struct *mm = vma->vm_mm;
> + struct mmu_gather tlb;
>   pmd_t pmd;
>  
>   /*
> @@ -244,11 +245,12 @@ static bool move_normal_pmd(struct vm_area_struct *vma, 
> unsigned long old_addr,
>   if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
>   return false;
>  
> + tlb_gather_mmu(&tlb, mm);
>   /*
>* We don't have to worry about the ordering of src and dst
>* ptlocks because exclusive mmap_lock prevents deadlock.
>*/
> - old_ptl = pmd_lock(vma->vm_mm, old_pmd);
> + old_ptl = pmd_lock(mm, old_pmd);
>   new_ptl = pmd_lockptr(mm, new_pmd);
>   if (new_ptl != old_ptl)
>   spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
> @@ -257,13 +259,23 @@ static bool move_normal_pmd(struct vm_area_struct *vma, 
> unsigned long old_addr,
>   pmd = *old_pmd;
>   pmd_clear(old_pmd);
>  
> + /*
> +  * Mark the range. We are not freeing page table pages nor
> +  * regular pages. Hence we don't need to call tlb_remove_table()
> +  * or tlb_remove_page().
> +  */
> + tlb_flush_pte_range(&tlb, old_addr, PMD_SIZE);
> + tlb.freed_tables = 1;
>   VM_BUG_ON(!pmd_none(*new_pmd));
>   pmd_populate(mm, new_pmd, (pgtable_t)pmd_page_vaddr(pmd));
>  
> - flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
>   if (new_ptl != old_ptl)
>   spin_unlock(new_ptl);
>   spin_unlock(old_ptl);
> + /*
> +  * This will invalidate both the old TLB and page table walk caches.
> +  */
> + tlb_finish_mmu(&tlb);
>  
>   return true;
>  }
> @@ -282,6 +294,7 @@ static bool move_normal_pud(struct vm_area_struct *vma, 
> unsigned long old_addr,
>  {
>   spinlock_t *old_ptl, *new_ptl;
>   struct mm_struct *mm = vma->vm_mm;
> + struct mmu_gather tlb;
>   pud_t pud;
>  
>   /*
> @@ -291,11 +304,12 @@ static bool move_normal_pud(struct vm_area_struct *vma, 
> unsigned long old_addr,
>   if (WARN_ON_ONCE(!pud_none(*new_pud)))
>   return false;
>  
> + tlb_gather_mmu(&tlb, mm);
>   /*
>* We don't have to worry about the ordering of src and dst
>* ptlocks because exclusive mmap_lock prevents deadlock.
>*/
> - old_ptl = pud_lock(vma->vm_mm, old_pud);
> + old_ptl = pud_lock(mm, old_pud);
>   new_ptl = pud_lockptr(mm, new_pud);
>   if (new_ptl != old_ptl)
>   spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
> @@ -304,14 +318,25 @@ static bool move_normal_pud(struct vm_area_struct *vma, 
> unsigned long old_addr,
>   pud = *old_pud;
>   pud_clear(old_pud);
>  
> + /*
> +  * Mark the range. We are not freeing page table pages nor
> +  * regular pages. Hence we don't need to call tlb_remove_table()
> +  * or tlb_remove_page().
> +  */
> + tlb_flush_pte_range(&tlb, old_addr, PUD_SIZE);
> + tlb.freed_tables = 1;
>   VM_BUG_ON(!pud_none(*new_pud));
>  
>   pud_populate(mm, new_pud, (pmd_t *)pud_page_vaddr(pud));
> - flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE);
> +
>   if (new_ptl != old_ptl)
>   spin_unlock(new_ptl);
>   spin_unlock(old_ptl);
>  
> + /*
> +  * This will invalidate both the old TLB and page table walk caches.
> +  */
> + tlb_finish_mmu(&tlb);
>   return true;
>  }
>  #else
> -- 
> 2.29.2
> 
>

Re: [PATCH v2 4/6] mm/mremap: Use mmu gather interface instead of flush_tlb_range

2021-03-18 Thread kernel test robot

Hi Aneesh,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on kselftest/next v5.12-rc3 next-20210317]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/Speedup-mremap-on-ppc64/20210315-194324
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: x86_64-rhel-8.3 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
# 
https://github.com/0day-ci/linux/commit/d3b9a3e6f414413d8f822185158b937d9f19b7a6
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Aneesh-Kumar-K-V/Speedup-mremap-on-ppc64/20210315-194324
git checkout d3b9a3e6f414413d8f822185158b937d9f19b7a6
# save the attached .config to linux build tree
make W=1 ARCH=x86_64 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot 

Note: the linux-review/Aneesh-Kumar-K-V/Speedup-mremap-on-ppc64/20210315-194324 
HEAD 79633714ff2b990b3e4972873457678bb34d029f builds fine.
  It only hurts bisectability.

All errors (new ones prefixed by >>):

   mm/mremap.c: In function 'move_normal_pmd':
>> mm/mremap.c:219:20: error: storage size of 'tlb' isn't known
 219 |  struct mmu_gather tlb;
 |^~~
>> mm/mremap.c:267:2: error: implicit declaration of function 
>> 'tlb_flush_pte_range' [-Werror=implicit-function-declaration]
 267 |  tlb_flush_pte_range(&tlb, old_addr, PMD_SIZE);
 |  ^~~
   mm/mremap.c:219:20: warning: unused variable 'tlb' [-Wunused-variable]
 219 |  struct mmu_gather tlb;
 |^~~
   mm/mremap.c: In function 'move_normal_pud':
   mm/mremap.c:297:20: error: storage size of 'tlb' isn't known
 297 |  struct mmu_gather tlb;
 |^~~
   mm/mremap.c:297:20: warning: unused variable 'tlb' [-Wunused-variable]
   cc1: some warnings being treated as errors


vim +219 mm/mremap.c

   212  
   213  #ifdef CONFIG_HAVE_MOVE_PMD
   214  static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long 
old_addr,
   215unsigned long new_addr, pmd_t *old_pmd, pmd_t 
*new_pmd)
   216  {
   217  spinlock_t *old_ptl, *new_ptl;
   218  struct mm_struct *mm = vma->vm_mm;
 > 219  struct mmu_gather tlb;
   220  pmd_t pmd;
   221  
   222  /*
   223   * The destination pmd shouldn't be established, free_pgtables()
   224   * should have released it.
   225   *
   226   * However, there's a case during execve() where we use mremap
   227   * to move the initial stack, and in that case the target area
   228   * may overlap the source area (always moving down).
   229   *
   230   * If everything is PMD-aligned, that works fine, as moving
   231   * each pmd down will clear the source pmd. But if we first
   232   * have a few 4kB-only pages that get moved down, and then
   233   * hit the "now the rest is PMD-aligned, let's do everything
   234   * one pmd at a time", we will still have the old (now empty
   235   * of any 4kB pages, but still there) PMD in the page table
   236   * tree.
   237   *
   238   * Warn on it once - because we really should try to figure
   239   * out how to do this better - but then say "I won't move
   240   * this pmd".
   241   *
   242   * One alternative might be to just unmap the target pmd at
   243   * this point, and verify that it really is empty. We'll see.
   244   */
   245  if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
   246  return false;
   247  
   248  tlb_gather_mmu(&tlb, mm);
   249  /*
   250   * We don't have to worry about the ordering of src and dst
   251   * ptlocks because exclusive mmap_lock prevents deadlock.
   252   */
   253  old_ptl = pmd_lock(mm, old_pmd);
   254  new_ptl = pmd_lockptr(mm, new_pmd);
   255  if (new_ptl != old_ptl)
   256  spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
   257  
   258  /* Clear the pmd */
   259  pmd = *old_pmd;
   260  pmd_clear(old_pmd);
   261  
   262  /*
   263   * Mark the range. We are not freeing page table pages nor
   264   * regular pages. Hence we don't need to call tlb_remove_table()
   265   * or tlb_remove_page().
   266   */
 > 267  tlb_flush_pte_range(&tlb, old_addr, PMD_SIZE);
   268  tlb.freed_tables = 1;
   269  VM_BUG_ON(!pm

Re: [PATCH 01/10] alpha: use libata instead of the legacy ide driver

2021-03-18 Thread John Paul Adrian Glaubitz

Hi Al!

On 3/18/21 6:54 AM, Al Viro wrote:
> On Thu, Mar 18, 2021 at 05:56:57AM +0100, Christoph Hellwig wrote:
>> Switch the alpha defconfig from the legacy ide driver to libata.
> 
> Umm...  I don't have an IDE alpha box in a usable shape (fans on
> CPU module shat themselves), and it would take a while to resurrect
> it, but I remember the joy it used to cause in some versions.
> 
> Do you have reports of libata variants of drivers actually tested on
> those?

At least pata_cypress works fine on my AlphaStation XP1000:

root@tsunami:~> lspci
:00:07.0 ISA bridge: Contaq Microsystems 82c693
:00:07.1 IDE interface: Contaq Microsystems 82c693
:00:07.2 IDE interface: Contaq Microsystems 82c693
:00:07.3 USB controller: Contaq Microsystems 82c693
:00:0d.0 VGA compatible controller: Texas Instruments TVP4020 [Permedia 2] (rev 01)
0001:01:03.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
0001:01:06.0 SCSI storage controller: QLogic Corp. ISP1020 Fast-wide SCSI (rev 06)
0001:01:08.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 03)
0001:02:09.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
root@tsunami:~> lsmod|grep pata
pata_cypress            3595  3
libata                235071  2 ata_generic,pata_cypress
root@tsunami:~>

I also have two AlphaStation 233 machines currently in storage, which I
assume use a different IDE chipset, that I could test as well.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

Re: [PATCH] powerpc/numa: Fix topology_physical_package_id() on pSeries

2021-03-18 Thread Cédric Le Goater

> Also we've been using it for several years and I don't think we should
> risk breaking anything by changing the value now.

I guess we can leave it that way. Please read the commit log of 
the second patch (not tagged as a v2 ...).

But we should remove ibm,chip-id from QEMU since the property does 
not exist on PAPR and that the calculation is anyhow very broken. 

Thanks,

C.
