Re: [PATCH 2/2] MAINTAINERS: Make cxl obsolete

2024-05-03 Thread Andrew Donnellan
On Fri, 2024-05-03 at 13:15 +1000, Andrew Donnellan wrote:
> This doesn't seem quite right to me, I don't think we can just redefine
> CONFIG_CXL as a bool, but I'll do something like this. Probably won't
> bother for CXLFLASH since they'll see it for CXL anyway, but I might
> add a warning message on probe to both drivers.

The more I look at how to do this, the more issues I see, though
perhaps because I personally use olddefconfig more than I use
oldconfig.

Without changing the default to n, running olddefconfig is liable to
switch CXL back on in configs where the user has disabled it.

Conversely, if the user has set CXL=y rather than CXL=m, I'm not sure
if there's any way to make it such that olddefconfig doesn't reset one
symbol or the other to the default m.

Honestly, I'm very tempted to be a little more aggressive and a) not
bother with trying to play games with symbols, b) change the default to
n in this release, c) add a warning printed on probe, and see whether
anyone complains.

We could also print a message during the build itself, though that kind
of noise is liable to break things in other ways?

It would be kind of nice if kbuild had some way to mark a symbol for
deprecation which could print a warning during configuration.

-- 
Andrew Donnellan    OzLabs, ADL Canberra
a...@linux.ibm.com   IBM Australia Limited


[PATCH v2 2/2] powerpc/64: Set _IO_BASE to POISON_POINTER_DELTA not 0 for CONFIG_PCI=n

2024-05-03 Thread Michael Ellerman
There is code that builds with calls to IO accessors even when
CONFIG_PCI=n, but the actual calls are guarded by runtime checks.

If not, those calls would fault, because the page at virtual address
zero is (usually) not mapped into the kernel. As Arnd pointed out, a
large port value could cause the address to land above mmap_min_addr,
in which case the access would hit userspace, which would be a bug.

To avoid any such issues, set _IO_BASE to POISON_POINTER_DELTA. That
is a value chosen to point into unmapped space between the kernel and
userspace, so any access will always fault.

Note that on 32-bit POISON_POINTER_DELTA is 0, so the patch only has an
effect on 64-bit.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/io.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

v2: Patch unchanged, changelog updated to reflect patch 1.

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index ba2e13bb879d..048e3705af20 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -37,7 +37,7 @@ extern struct pci_dev *isa_bridge_pcidev;
  * define properly based on the platform
  */
 #ifndef CONFIG_PCI
-#define _IO_BASE   0
+#define _IO_BASE   POISON_POINTER_DELTA
 #define _ISA_MEM_BASE  0
 #define PCI_DRAM_OFFSET 0
 #elif defined(CONFIG_PPC32)
-- 
2.44.0
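To make the address arithmetic in this patch concrete, here is a small userspace sketch. The constants are assumptions, not taken from the patch: 0x5deadbeef0000000 is what we believe CONFIG_ILLEGAL_POINTER_VALUE (and so POISON_POINTER_DELTA) defaults to on ppc64, 0x10000 is a common mmap_min_addr value, and 4PB (0x0010000000000000) is assumed as the top of the 64-bit user address range. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only; check your own tree and sysctls. */
#define ASSUMED_POISON_POINTER_DELTA 0x5deadbeef0000000ULL
#define ASSUMED_MMAP_MIN_ADDR        0x10000ULL
#define ASSUMED_USER_TOP             0x0010000000000000ULL

/* The poisoned base must sit above anything userspace can map. */
static_assert(ASSUMED_POISON_POINTER_DELTA > ASSUMED_USER_TOP,
	      "poison base must be outside the user range");

/* Address an IO accessor would touch: _IO_BASE + port, done as integers. */
static uint64_t io_addr(uint64_t io_base, unsigned long port)
{
	return io_base + port;
}

/* True if addr falls in the range a user mapping could occupy. */
static bool in_user_range(uint64_t addr)
{
	return addr >= ASSUMED_MMAP_MIN_ADDR && addr < ASSUMED_USER_TOP;
}
```

With a zero base, only small ports are harmless (they land below mmap_min_addr and fault); a port of 0x20000 already lands in the user range. With the poisoned base, even a full 32-bit port stays far above user space.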



[PATCH v2 1/2] powerpc/io: Avoid clang null pointer arithmetic warnings

2024-05-03 Thread Michael Ellerman
With -Wextra clang warns about pointer arithmetic using a null pointer.
When building with CONFIG_PCI=n, that triggers a warning in the IO
accessors, eg:

  In file included from linux/arch/powerpc/include/asm/io.h:672:
  linux/arch/powerpc/include/asm/io-defs.h:23:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     23 | DEF_PCI_AC_RET(inb, u8, (unsigned long port), (port), pio, port)
        | ^~~~
  ...
  linux/arch/powerpc/include/asm/io.h:591:53: note: expanded from macro '__do_inb'
    591 | #define __do_inb(port)  readb((PCI_IO_ADDR)_IO_BASE + port);
        |                                                   ~ ^

That is because when CONFIG_PCI=n, _IO_BASE is defined as 0.

Although _IO_BASE is defined as plain 0, the cast (PCI_IO_ADDR) converts
it to a void * before the addition with port happens.

Instead, the addition can be done first, and then the cast. The resulting
value is the same, but this avoids the warning, and also avoids void
pointer arithmetic, which is a GNU extension rather than standard C.

Reported-by: Naresh Kamboju 
Closes: https://lore.kernel.org/all/CA+G9fYtEh8zmq8k8wE-8RZwW-Qr927RLTn+KqGnq1F=ptaa...@mail.gmail.com
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/io.h | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

v2: New.

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 08c550ed49be..ba2e13bb879d 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -585,12 +585,12 @@ __do_out_asm(_rec_outl, "stwbrx")
 #define __do_inw(port) _rec_inw(port)
 #define __do_inl(port) _rec_inl(port)
 #else /* CONFIG_PPC32 */
-#define __do_outb(val, port)   writeb(val,(PCI_IO_ADDR)_IO_BASE+port);
-#define __do_outw(val, port)   writew(val,(PCI_IO_ADDR)_IO_BASE+port);
-#define __do_outl(val, port)   writel(val,(PCI_IO_ADDR)_IO_BASE+port);
-#define __do_inb(port) readb((PCI_IO_ADDR)_IO_BASE + port);
-#define __do_inw(port) readw((PCI_IO_ADDR)_IO_BASE + port);
-#define __do_inl(port) readl((PCI_IO_ADDR)_IO_BASE + port);
+#define __do_outb(val, port)   writeb(val,(PCI_IO_ADDR)(_IO_BASE+port));
+#define __do_outw(val, port)   writew(val,(PCI_IO_ADDR)(_IO_BASE+port));
+#define __do_outl(val, port)   writel(val,(PCI_IO_ADDR)(_IO_BASE+port));
+#define __do_inb(port) readb((PCI_IO_ADDR)(_IO_BASE + port));
+#define __do_inw(port) readw((PCI_IO_ADDR)(_IO_BASE + port));
+#define __do_inl(port) readl((PCI_IO_ADDR)(_IO_BASE + port));
 #endif /* !CONFIG_PPC32 */
 
 #ifdef CONFIG_EEH
@@ -606,12 +606,12 @@ __do_out_asm(_rec_outl, "stwbrx")
 #define __do_writesw(a, b, n)  _outsw(PCI_FIX_ADDR(a),(b),(n))
 #define __do_writesl(a, b, n)  _outsl(PCI_FIX_ADDR(a),(b),(n))
 
-#define __do_insb(p, b, n) readsb((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
-#define __do_insw(p, b, n) readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
-#define __do_insl(p, b, n) readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
-#define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
-#define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
-#define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
+#define __do_insb(p, b, n) readsb((PCI_IO_ADDR)(_IO_BASE+(p)), (b), (n))
+#define __do_insw(p, b, n) readsw((PCI_IO_ADDR)(_IO_BASE+(p)), (b), (n))
+#define __do_insl(p, b, n) readsl((PCI_IO_ADDR)(_IO_BASE+(p)), (b), (n))
+#define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)(_IO_BASE+(p)),(b),(n))
+#define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)(_IO_BASE+(p)),(b),(n))
+#define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)(_IO_BASE+(p)),(b),(n))
 
 #define __do_memset_io(addr, c, n) \
 	_memset_io(PCI_FIX_ADDR(addr), c, n)
-- 
2.44.0
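The difference between the two macro forms can be reproduced in userspace. In this sketch the names DEMO_IO_BASE, demo_io_addr_t and DEMO_PORT_ADDR are hypothetical stand-ins for _IO_BASE, PCI_IO_ADDR and the __do_*() macros; only the fixed form is compiled, because the old form is exactly the undefined behaviour clang warns about.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's _IO_BASE and PCI_IO_ADDR. */
#define DEMO_IO_BASE 0UL
typedef volatile void *demo_io_addr_t;

/* The sketch assumes unsigned long is pointer-sized, as in the kernel. */
static_assert(sizeof(unsigned long) == sizeof(void *),
	      "unsigned long must be pointer-sized");

/*
 * Old form (deliberately not used): ((demo_io_addr_t)DEMO_IO_BASE + port)
 * turns 0 into a null pointer and then does arithmetic on it, which is
 * undefined behaviour and, for void *, relies on a GNU extension.
 *
 * New form: add in unsigned long first, then convert the sum once.
 */
#define DEMO_PORT_ADDR(port) ((demo_io_addr_t)(DEMO_IO_BASE + (port)))

/* Return the address the fixed macro produces, as an integer. */
static uintptr_t demo_port_to_addr(unsigned long port)
{
	return (uintptr_t)DEMO_PORT_ADDR(port);
}
```

The resulting address is identical to what the old form would have produced on the kernel's compilers; the change is only in where the integer-to-pointer conversion happens.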



Re: [PATCH v15 00/16] Add audio support in v4l2 framework

2024-05-03 Thread Mauro Carvalho Chehab
On Fri, 3 May 2024 10:47:19 +0900
Mark Brown wrote:

> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote:
> > Mauro Carvalho Chehab wrote:
> 
> > > There is still timing control associated with it, as audio and video
> > > need to be in sync. This is done by controlling the buffer size
> > > and could be fine-tuned by checking when the buffer transfer is done.
> 
> ...
> 
> > Just complementing: on media, we do this per video buffer (or
> > per half video buffer). A typical use case on cameras is to have
> > buffers transferred 30 times per second, if the video was streamed 
> > at 30 frames per second.   
> 
> IIRC some big use case for this hardware was transcoding so there was a
> desire to just go at whatever rate the hardware could support as there
> is no interactive user consuming the output as it is generated.

Indeed, codecs could be used just for transcoding, but I would
expect that to be an edge case. Since the chipsets implementing
codecs are typically the ones used in mobile devices, I would expect
the major use cases to be playing audio and video and participating
in audio/video conferences.

Going further, the codec API may end up supporting not only transcoding
(which the CPU can usually handle without too much processing) but also
audio processing that may require more complex algorithms, even deep
learning ones, such as background noise removal, echo detection/removal,
volume auto-gain, audio enhancement and so on.

In other words, in the typical use cases either the input or the
output will be physical hardware (a microphone or a speaker).

> > I would assume that, on an audio/video stream, the audio data
> > transfer will be programmed to also happen on a regular interval.  
> 
> With audio the API is very much "wake userspace every Xms".
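As a rough illustration of the "wake userspace every Xms" model, the period length determines how much data each wakeup carries. The helpers below are hypothetical, and the values used with them (48 kHz, stereo, 16-bit samples, 10 ms periods) are example numbers, not taken from the thread.

```c
#include <assert.h>

/* Frames covered by one period of period_ms milliseconds at rate_hz. */
static unsigned int period_frames(unsigned int rate_hz, unsigned int period_ms)
{
	return rate_hz * period_ms / 1000;
}

/* Bytes in one period of interleaved samples. */
static unsigned int period_bytes(unsigned int frames, unsigned int channels,
				 unsigned int bytes_per_sample)
{
	return frames * channels * bytes_per_sample;
}

/* Wakeups per second; this sketch only handles periods that divide 1 s. */
static unsigned int wakeups_per_second(unsigned int period_ms)
{
	assert(period_ms > 0 && 1000 % period_ms == 0);
	return 1000 / period_ms;
}
```

With those example values each period is 480 frames (1920 bytes) and userspace is woken 100 times per second, compared with the 30 buffer transfers per second in the 30 fps video example above.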


Re: [PATCH v3 00/11] sysctl: treewide: constify ctl_table argument of sysctl handlers

2024-05-03 Thread Joel Granados
Hey Thomas

Here is my feedback for your outstanding constification patches [1] and [2].

# You need to split the patch
The answer you got from Jakub in the network subsystem is very clear: barring
a change of heart from the network folks, this will go in, but as a split
patchset. Please split it considering the following:
1. Create separate patchsets for drivers/, fs/, kernel/ and net/, plus a
   miscellaneous one for whatever does not fit into the others.
2. Consider that this might take several releases.
3. Consider using "_const" as the suffix for the interim function names, as in
   kfree_const. Please do not use "_new".
4. Please publish the final result somewhere. This is important so someone can
   take over in case you need to stop.
5. Consistently mention the motivation in your cover letters. I go into more
   detail further down in "# Motivation".
6. Also mention that this is part of a bigger effort (as you did in your
   original cover letters). I would include [3,4,5,6].
7. Include a way to show what made it into .rodata. I go into more detail
   further down in "# Show the move".

# Motivation
As I read it, the motivations for these constification efforts are:
1. Increased safety: having things in the .rodata section reduces the attack
   surface. This is especially relevant for structures that have function
   pointers (like ctl_table); having these in .rodata means that the pointers
   always point to the "intended" function and cannot be changed.
2. Compiler optimizations: this was only mentioned in passing in the patchsets
   I referenced ([3,4,5]). Do you know which optimizations specifically? Does
   it have to do with improving locality for the data in .rodata? Do you have
   other specific optimizations in mind?
3. Readability: it is easier to know up front that data is not supposed to
   change, or that a function is re-entrant. In fact, most of the readability
   argument is about knowing things up front.

As we move forward with the constification in sysctl, please include a more
detailed motivation in all your cover letters. This helps maintainers (who
don't have the context) understand what you are trying to do. It does not need
to be my three points, but it should be more than just "put things into
.rodata". Please tell me if I have missed anything in the motivation.
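To illustrate points 1 and 3 above, here is a userspace sketch with a hypothetical, much-simplified stand-in for struct ctl_table (the real structure has more fields and a different handler signature). Because the table and its function pointers are const, the compiler may place it in read-only data (.rodata, or .data.rel.ro in position-independent builds), and a handler that takes a const pointer cannot retarget proc_handler.

```c
#include <assert.h>

struct demo_table;

/* Handlers get a const pointer: they may read the entry but not modify it. */
typedef int (*demo_handler_t)(const struct demo_table *table, int write);

/* Hypothetical, much-simplified stand-in for struct ctl_table. */
struct demo_table {
	const char *procname;
	int *data;                  /* the value itself stays writable */
	demo_handler_t proc_handler;
};

static int demo_value = 42;

static int demo_read_handler(const struct demo_table *table, int write)
{
	assert(!write);             /* this sketch only supports reads */
	/* table->proc_handler = 0; would be a compile-time error here */
	return *table->data;
}

/* Fully const and statically initialised, so eligible for read-only data. */
static const struct demo_table demo_tables[] = {
	{
		.procname = "demo_value",
		.data = &demo_value,
		.proc_handler = demo_read_handler,
	},
};
```

Note that constifying the table does not make the sysctl value read-only: data still points at a mutable int, which mirrors how ctl_table entries point at their backing variables.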

# Show the move
I created [8] because there is no easy way to validate which objects made it
into .rodata. I ran [8] on your Dec 2nd patchset [7] and there are fewer objects
in .rodata than I expected (the results are in [9]). Why is that? Is it
something that has not been posted to the lists yet?

Best

[1] https://lore.kernel.org/all/20240423-sysctl-const-handler-v3-0-e0beccb83...@weissschuh.net/
[2] https://lore.kernel.org/all/20240418-sysctl-const-table-arg-v2-1-4012abc31...@weissschuh.net
[3] [PATCH v2 00/14] ASoC: Constify local snd_sof_dsp_ops
    https://lore.kernel.org/all/20240426-n-const-ops-var-v2-0-e553fe67a...@kernel.org
[4] [PATCH v2 00/19] backlight: Constify lcd_ops
    https://lore.kernel.org/all/20240424-video-backlight-lcd-ops-v2-0-1aaa82b07...@kernel.org
[5] [PATCH 1/4] iommu: constify pointer to bus_type
    https://lore.kernel.org/all/20240216144027.185959-1-krzysztof.kozlow...@linaro.org
[6] [PATCH 00/29] const xattr tables
    https://lore.kernel.org/all/20230930050033.41174-1-wedso...@gmail.com
[7] https://lore.kernel.org/all/20231204-const-sysctl-v2-0-7a5060b11...@weissschuh.net/

[8]
#!/usr/bin/python3

import subprocess
import re

def exec_cmd(cmd):
    """Run a shell command and return its stdout as a list of lines."""
    try:
        result = subprocess.run(cmd, shell=True, text=True, check=True,
                                capture_output=True)
        output_lines = result.stdout.splitlines()
        return output_lines
    except Exception as e:
        print(f"An error occurred: {e}")
        return []

def remove_tokens_re(lines, regex_patterns, uniq=True):
    """Strip every regex match from each line, optionally de-duplicating."""
    filtered_lines = []
    seen_lines = set()
    regexes = [re.compile(pattern) for pattern in regex_patterns]

    for line in lines:
        for regex in regexes:
            line = regex.sub('', line)  # Replace matches with empty string

        if uniq:
            if line not in seen_lines:
                seen_lines.add(line)
                filtered_lines.append(line)
        else:
            filtered_lines.append(line)

    return filtered_lines

def filter_in_lines(lines, regex_patterns):
    """Keep only the lines that match at least one of the regexes."""
    filtered_lines = []
    regexes = [re.compile(pattern) for pattern in regex_patterns]

    for line in lines:
        if any(regex.search(line) for regex in regexes):
            filtered_lines.append(line)

    return filtered_lines

# Collect the names of all static (optionally const) ctl_table definitions.
cmd = r"git grep 'static \(const \)\?struct ctl_table '"
regex_patterns = [r'[\}]*;$', r' = \{', r'\[.*\]',
                  r'.*\.(c|h):[ \t]*static (const )?struct ctl_table ']
ctl_table_structs = remove_tokens_re(exec_cmd(cmd), regex_patterns)


Re: [PATCH 2/2] MAINTAINERS: Make cxl obsolete

2024-05-03 Thread Michael Ellerman
Andrew Donnellan  writes:
> On Fri, 2024-05-03 at 13:15 +1000, Andrew Donnellan wrote:
>> This doesn't seem quite right to me, I don't think we can just redefine
>> CONFIG_CXL as a bool, but I'll do something like this. Probably won't
>> bother for CXLFLASH since they'll see it for CXL anyway, but I might
>> add a warning message on probe to both drivers.
>
> The more I look at how to do this, the more issues I see, though
> perhaps because I personally use olddefconfig more than I use
> oldconfig.
>
> Without changing the default to n, running olddefconfig is liable to
> switch CXL back on in configs where the user has disabled it.

Yes that's true.

> Conversely, if the user has set CXL=y rather than CXL=m, I'm not sure
> if there's any way to make it such that olddefconfig doesn't reset one
> symbol or the other to the default m.
>
> Honestly, I'm very tempted to be a little more aggressive and a) not
> bother with trying to play games with symbols, b) change the default to
> n in this release, c) add a warning printed on probe, and see whether
> anyone complains.

You mean just changing CXL to default n?

The problem is that it has no effect on folks with existing configs. Those
of us who build from defconfigs will have it turned off, but any actual
users with existing configs will still have it enabled.

I'm not really convinced printing warnings does much. I guess an actual
WARN_ON might work, but only if someone is watching the console.

> We could also print a message during the build itself, though that kind
> of noise is liable to break things in other ways?

More likely to break some CI somewhere, and a good chance it isn't even
seen by a human unless they're paying close attention to the build
output.

> It would be kind of nice if kbuild had some way to mark a symbol for
> deprecation which could print a warning during configuration.

Yeah, though it suffers from the same problem that there's a good chance
no one notices.

The below I think works. It does print a warning about CXL changing from
tristate to bool, but that seems harmless.

In all cases olddefconfig will turn CXL off, whether it was on, off
or =m beforehand. A fresh defconfig has it off. The only way to turn it
on is explicitly.

cheers

diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index 5efc4151bf58..e62c16cc7292 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -9,11 +9,18 @@ config CXL_BASE
 	select PPC_64S_HASH_MMU

 config CXL
-	tristate "Support for IBM Coherent Accelerators (CXL)"
+	def_bool y
+	depends on DEPRECATED_CXL
+
+config DEPRECATED_CXL
+	tristate "Deprecated support for IBM Coherent Accelerators (CXL)"
 	depends on PPC_POWERNV && PCI_MSI && EEH
 	select CXL_BASE
-	default m
+	default n
 	help
+	  The cxl driver is no longer actively maintained and we intend to
+	  remove it in a future kernel release.
+
 	  Select this option to enable driver support for IBM Coherent
 	  Accelerators (CXL).  CXL is otherwise known as Coherent Accelerator
 	  Processor Interface (CAPI).  CAPI allows accelerators in FPGAs to be


Re: [PATCH v7 00/16] mm: jit/text allocator

2024-05-03 Thread Liviu Dudau
On Fri, May 03, 2024 at 09:28:25AM +0300, Mike Rapoport wrote:
> On Fri, May 03, 2024 at 01:23:30AM +0100, Liviu Dudau wrote:
> > On Thu, May 02, 2024 at 04:07:05PM -0700, Luis Chamberlain wrote:
> > > On Thu, May 02, 2024 at 11:50:36PM +0100, Liviu Dudau wrote:
> > > > On Mon, Apr 29, 2024 at 09:29:20AM -0700, Luis Chamberlain wrote:
> > > > > On Mon, Apr 29, 2024 at 03:16:04PM +0300, Mike Rapoport wrote:
> > > > > > From: "Mike Rapoport (IBM)" 
> > > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > The patches are also available in git:
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v7
> > > > > > 
> > > > > > v7 changes:
> > > > > > * define MODULE_{VADDR,END} for riscv32 to fix the build and avoid
> > > > > >   #ifdefs in a function body
> > > > > > * add Acks, thanks everybody
> > > > > 
> > > > > Thanks, I've pushed this to modules-next for further exposure / testing.
> > > > > Given the status of testing so far with prior revisions, in that only a
> > > > > few issues were found and that those were fixed, and the status of
> > > > > reviews, this just might be ripe for v6.10.
> > > > 
> > > > Looks like there is still some work needed. I've picked up next-20240501
> > > > and on arch/mips with CONFIG_MODULE_COMPRESS_XZ=y and CONFIG_MODULE_DECOMPRESS=y
> > > > I fail to load any module:
> > > > 
> > > > # modprobe rfkill
> > > > [11746.539090] Invalid ELF header magic: != ELF
> > > > [11746.587149] execmem: unable to allocate memory
> > > > modprobe: can't load module rfkill (kernel/net/rfkill/rfkill.ko.xz): Out of memory
> > > > 
> > > > The (hopefully) relevant parts of my .config:
> > > 
> > > Thanks for the report! Any chance we can get you to try a bisection? I
> > > think it should take 2-3 test boots. To help reduce scope, you could try
> > > modules-next:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next
> > > 
> > > Then can you check by resetting your tree to commit 3fbe6c2f820a76 ("mm:
> > > introduce execmem_alloc() and execmem_free()"). I suspect that should
> > > boot, so your bad commit would be the tip 3c2c250cb3a5fbb ("bpf: remove
> > > CONFIG_BPF_JIT dependency on CONFIG_MODULES of").
> > > 
> > > That gives us only a few commits to bisect:
> > > 
> > > git log --oneline 3fbe6c2f820a76bc36d5546bda85832f57c8fce2..
> > > 3c2c250cb3a5 (HEAD -> modules-next, korg/modules-next) bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of
> > > 11e8e65cce5c kprobes: remove dependency on CONFIG_MODULES
> > > e10cbc38697b powerpc: use CONFIG_EXECMEM instead of CONFIG_MODULES where appropriate
> > > 4da3d38f24c5 x86/ftrace: enable dynamic ftrace without CONFIG_MODULES
> > > 13ae3d74ee70 arch: make execmem setup available regardless of CONFIG_MODULES
> > > 460bbbc70a47 powerpc: extend execmem_params for kprobes allocations
> > > e1a14069b5b4 arm64: extend execmem_info for generated code allocations
> > > 971e181c6585 riscv: extend execmem_params for generated code allocations
> > > 0fa276f26721 mm/execmem, arch: convert remaining overrides of module_alloc to execmem
> > > 022cef244287 mm/execmem, arch: convert simple overrides of module_alloc to execmem
> > > 
> > > With 2-3 boots we should be able to tell which is the bad commit.
> > 
> > Looks like 0fa276f26721 is the first bad commit.
> > 
> > $ git bisect log
> > # bad: [3c2c250cb3a5fbbccc4a4ff4c9354c54af91f02c] bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of
> > # good: [3fbe6c2f820a76bc36d5546bda85832f57c8fce2] mm: introduce execmem_alloc() and execmem_free()
> > git bisect start '3c2c250cb3a5' '3fbe6c2f820a76'
> > # bad: [460bbbc70a47e929b1936ca68979f3b79f168fc6] powerpc: extend execmem_params for kprobes allocations
> > git bisect bad 460bbbc70a47e929b1936ca68979f3b79f168fc6
> > # bad: [0fa276f26721e0ffc2ae9c7cf67dcc005b43c67e] mm/execmem, arch: convert remaining overrides of module_alloc to execmem
> > git bisect bad 0fa276f26721e0ffc2ae9c7cf67dcc005b43c67e
> > # good: [022cef2442870db738a366d3b7a636040c081859] mm/execmem, arch: convert simple overrides of module_alloc to execmem
> > git bisect good 022cef2442870db738a366d3b7a636040c081859
> > # first bad commit: [0fa276f26721e0ffc2ae9c7cf67dcc005b43c67e] mm/execmem, arch: convert remaining overrides of module_alloc to execmem
> > 
> > Maybe MIPS also needs an ARCH_WANTS_EXECMEM_LATE?
> 
> I don't think so. It rather seems there's a bug in the initialization of
> the defaults in execmem. This should fix it:
> 
> diff --git a/mm/execmem.c b/mm/execmem.c
> index f6dc3fabc1ca..0c4b36bc6d10 100644
> --- a/mm/execmem.c
> +++ b/mm/execmem.c
> @@ -118,7 +118,6 @@ static void __init __execmem_init(void)
>   info->ranges[EXECMEM_DEFAULT].end = VMALLOC_END;
>   info->ranges[EXECMEM_DEFAULT].pgprot = PAGE_KERNEL_EXEC;
>   info->ranges[EXECMEM_DEFAULT].alignment = 1;
> - 

[powerpc:next-test] BUILD SUCCESS cebb0005e8e4bc482151a261af649ab1a73edffd

2024-05-03 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: cebb0005e8e4bc482151a261af649ab1a73edffd  Documentation: Document PowerPC kernel dynamic DEXCR interface

elapsed time: 1207m

configs tested: 139
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha       allnoconfig                         gcc
alpha       allyesconfig                        gcc
alpha       defconfig                           gcc
arc         allmodconfig                        gcc
arc         allnoconfig                         gcc
arc         allyesconfig                        gcc
arc         axs103_defconfig                    gcc
arc         defconfig                           gcc
arc         randconfig-001-20240503             gcc
arc         randconfig-002-20240503             gcc
arc         tb10x_defconfig                     gcc
arm         allmodconfig                        gcc
arm         allnoconfig                         clang
arm         allyesconfig                        gcc
arm         defconfig                           clang
arm         multi_v5_defconfig                  gcc
arm         randconfig-004-20240503             gcc
arm         s3c6400_defconfig                   gcc
arm         sama5_defconfig                     gcc
arm64       allmodconfig                        clang
arm64       allnoconfig                         gcc
arm64       defconfig                           gcc
arm64       randconfig-002-20240503             gcc
csky        allmodconfig                        gcc
csky        allnoconfig                         gcc
csky        allyesconfig                        gcc
csky        defconfig                           gcc
csky        randconfig-001-20240503             gcc
csky        randconfig-002-20240503             gcc
hexagon     allmodconfig                        clang
hexagon     allnoconfig                         clang
hexagon     allyesconfig                        clang
hexagon     defconfig                           clang
i386        allmodconfig                        gcc
i386        allnoconfig                         gcc
i386        allyesconfig                        gcc
i386        buildonly-randconfig-001-20240503   clang
i386        buildonly-randconfig-002-20240503   clang
i386        buildonly-randconfig-006-20240503   clang
i386        defconfig                           clang
i386        randconfig-002-20240503             clang
i386        randconfig-003-20240503             clang
i386        randconfig-005-20240503             clang
i386        randconfig-006-20240503             clang
i386        randconfig-011-20240503             clang
i386        randconfig-016-20240503             clang
loongarch   allmodconfig                        gcc
loongarch   allnoconfig                         gcc
loongarch   allyesconfig                        gcc
loongarch   defconfig                           gcc
loongarch   randconfig-001-20240503             gcc
loongarch   randconfig-002-20240503             gcc
m68k        allmodconfig                        gcc
m68k        allnoconfig                         gcc
m68k        allyesconfig                        gcc
m68k        defconfig                           gcc
microblaze  allmodconfig                        gcc
microblaze  allnoconfig                         gcc
microblaze  allyesconfig                        gcc
microblaze  defconfig                           gcc
mips        allmodconfig                        gcc
mips        allnoconfig                         gcc
mips        allyesconfig                        gcc
nios2       allmodconfig                        gcc
nios2       allnoconfig                         gcc
nios2       allyesconfig                        gcc
nios2       defconfig                           gcc
nios2       randconfig-001-20240503             gcc
nios2       randconfig-002-20240503             gcc
openrisc    allmodconfig                        gcc
openrisc    allnoconfig                         gcc
openrisc    allyesconfig                        gcc
openrisc    defconfig                           gcc
parisc      allmodconfig                        gcc
parisc      allnoconfig                         gcc
parisc      allyesconfig                        gcc
parisc      defconfig                           gcc
parisc      randconfig-001-20240503             gcc
parisc      randconfig-002-20240503             gcc
parisc64    defconfig                           gcc
powerpc     allmodconfig                        gcc
powerpc     allnoconfig                         gcc
powerpc     allyesconfig                        clang
powerpc     ge_imp3a_defconfig                  gcc
powerpc     mpc8313_rdb_defconfig               gcc
powerpc

Re: [PATCH 1/3] powerpc/mm: Align memory_limit value specified using mem= kernel parameter

2024-05-03 Thread Michael Ellerman
On Wed, 03 Apr 2024 14:06:09 +0530, Aneesh Kumar K.V (IBM) wrote:
> The value specified for the memory limit is used to set a restriction on
> memory usage. It is important to ensure that this restriction is within
> the linear map kernel address space range. The hash page table
> translation uses a 16MB page size to map the kernel linear map address
> space. htab_bolt_mapping() function aligns down the size of the range
> while mapping kernel linear address space. Since the memblock limit is
> enforced very early during boot, before we can detect the type of memory
> translation (radix vs hash), we align the memory limit value specified
> as a kernel parameter to 16MB. This alignment value will work for both
> hash and radix translations.
> 
> [...]

Applied to powerpc/next.

[1/3] powerpc/mm: Align memory_limit value specified using mem= kernel parameter
  https://git.kernel.org/powerpc/c/5ca096161cdccfa328acf6704a4615528471d309
[2/3] powerpc/fadump: Don't update the user-specified memory limit
  https://git.kernel.org/powerpc/c/f94f5ac07983cb53de0c964f5428366c19e81993
[3/3] powerpc/mm: Update the memory limit based on direct mapping restrictions
  https://git.kernel.org/powerpc/c/5a799af9522641517f6d871d9f56e2658ee7db58

cheers
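Aneesh's description says the mem= limit is aligned to 16MB, but the quoted text does not say in which direction. As a sketch only, with hypothetical helpers modeled on the power-of-two arithmetic of the kernel's ALIGN_DOWN()/ALIGN() macros, this is how a mem=1000M value would round either way. Once the limit is a multiple of 16MB, an aligned-down hash mapping of the linear map covers exactly the same range as the memblock limit.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SZ_16M (16ULL << 20)

/* Round x down to a multiple of a (a must be a power of two). */
static uint64_t demo_align_down(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return x & ~(a - 1);
}

/* Round x up to a multiple of a (a must be a power of two). */
static uint64_t demo_align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}
```

For mem=1000M, 1000 / 16 = 62.5, so rounding down gives 992M and rounding up gives 1008M; a limit that is already a multiple of 16MB, such as 1024M, is unchanged either way.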


Re: [PATCH v2] powerpc/eeh: Permanently disable the removed device

2024-05-03 Thread Michael Ellerman
On Mon, 22 Apr 2024 13:27:37 +0530, Ganesh Goudar wrote:
> When a device is hot removed on powernv, the hotplug driver clears
> the device's state. However, on pseries, if a device is removed by
> phyp after reaching the error threshold, the kernel remains unaware,
> leading to the device not being torn down. This prevents necessary
> remediation actions like failover.
> 
> Permanently disable the device if the presence check fails.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/eeh: Permanently disable the removed device
  https://git.kernel.org/powerpc/c/d1679b4fa1722e6bb4a17b13aacdc01a130ba362

cheers


Re: [PATCH v2] powerpc/pseries: remove returning ENODEV when uevent is triggered

2024-05-03 Thread Michael Ellerman
On Thu, 11 Apr 2024 10:04:50 +0800, Lidong Zhong wrote:
> We noticed the following nuisance messages during boot process
> 
> [7.120610][ T1060] vio vio: uevent: failed to send synthetic uevent
> [7.122281][ T1060] vio 4000: uevent: failed to send synthetic uevent
> [7.122304][ T1060] vio 4001: uevent: failed to send synthetic uevent
> [7.122324][ T1060] vio 4002: uevent: failed to send synthetic uevent
> [7.122345][ T1060] vio 4004: uevent: failed to send synthetic uevent
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/pseries: remove returning ENODEV when uevent is triggered
  https://git.kernel.org/powerpc/c/29247de4ad753771afef95ace8af738d807ca279

cheers


Re: [PATCH 1/3] selftest/powerpc: Re-order *FLAGS to follow lib.mk

2024-05-03 Thread Michael Ellerman
On Thu, 29 Feb 2024 15:07:09 +0530, Madhavan Srinivasan wrote:
> In some powerpc/ sub-folder Makefiles, CFLAGS are
> defined before the lib.mk include. Clean it up by
> re-ordering them to follow the lib.mk include.
> This is needed to make sub-folders in powerpc/
> buildable on their own.
> 
> 
> [...]

Applied to powerpc/next.

[1/3] selftest/powerpc: Re-order *FLAGS to follow lib.mk
  https://git.kernel.org/powerpc/c/37496845c812db2a470d51088a59ee38156e8058
[2/3] selftest/powerpc: Add flags.mk to support pmu buildable
  https://git.kernel.org/powerpc/c/5553a79387e92ffd812a49fdcf679f392281f6a9
[3/3] selftest/powerpc: make sub-folders buildable on it own
  https://git.kernel.org/powerpc/c/108e5e68615023265a9a73a29d4c2fa16c70

cheers


Re: [PATCH] MAINTAINERS: MMU GATHER: Update Aneesh's address

2024-05-03 Thread Michael Ellerman
On Tue, 30 Apr 2024 14:43:27 +1000, Michael Ellerman wrote:
> Aneesh's IBM address no longer works, switch to his preferred kernel.org
> address.
> 
> 

Applied to powerpc/next.

[1/1] MAINTAINERS: MMU GATHER: Update Aneesh's address
  https://git.kernel.org/powerpc/c/1fcd254733371cfa5a3602bab5ae2c9dc4bf69e6

cheers


Re: [PATCH] MAINTAINERS: powerpc: Remove Aneesh

2024-05-03 Thread Michael Ellerman
On Tue, 30 Apr 2024 14:42:28 +1000, Michael Ellerman wrote:
> Aneesh is stepping down from powerpc maintenance.
> 
> 

Applied to powerpc/next.

[1/1] MAINTAINERS: powerpc: Remove Aneesh
  https://git.kernel.org/powerpc/c/6a3e640b5dcf56fb44d66d525e01ea08633c6b8b

cheers


Re: [PATCH] powerpc/dart: Drop unnecessary call to kmemleak_no_scan()

2024-05-03 Thread Michael Ellerman
On Fri, 19 Apr 2024 21:59:13 +1000, Michael Ellerman wrote:
> Erhard reported that kmemleak was showing a warning at boot:
> 
>   kmemleak: Not scanning unknown object at 0xc0007f00
>   CPU: 0 PID: 0 Comm: swapper Not tainted 5.19.0-rc3-PMacG5+ #2
>   Call Trace:
>     .dump_stack_lvl+0x7c/0xc4 (unreliable)
>     .kmemleak_no_scan+0xe0/0x100
>     .iommu_init_early_dart+0x2f0/0x924
>     .pmac_probe+0x1b0/0x20c
>     .setup_arch+0x1b8/0x674
>     .start_kernel+0xdc/0xb74
>     start_here_common+0x1c/0x44
>   DART table allocated at: (ptrval)
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/dart: Drop unnecessary call to kmemleak_no_scan()
  https://git.kernel.org/powerpc/c/4ccae23609f589dd69a593f457f76ee8b0e2d4e0

cheers


Re: [PATCH] powerpc: Mark memory_limit as initdata

2024-05-03 Thread Michael Ellerman
On Mon, 22 Apr 2024 21:52:31 +1000, Michael Ellerman wrote:
> The `memory_limit` variable should only be used during boot, enforce
> that by marking it initdata.
> 
> 

Applied to powerpc/next.

[1/1] powerpc: Mark memory_limit as initdata
  https://git.kernel.org/powerpc/c/236a4c63491784ae4814100cca47bc3645c776df

cheers


Re: [PATCH v2 1/2] selftests/powerpc: Convert pmu Makefile to for loop style

2024-05-03 Thread Michael Ellerman
On Mon, 22 Apr 2024 23:34:52 +1000, Michael Ellerman wrote:
> The pmu Makefile has grown more subdirectories over the years. Rather
> than open coding the rules for each subdir, use for loops.
> 
> 

Applied to powerpc/next.

[1/2] selftests/powerpc: Convert pmu Makefile to for loop style
  https://git.kernel.org/powerpc/c/822a04957cc5e675570645f506270797a1cf2865
[2/2] selftests/powerpc: Install tests in sub-directories
  https://git.kernel.org/powerpc/c/dda32e37d397f5937cc24a6e98b71d3645f51afa

cheers


Re: [PATCH] powerpc/pseries: Enforce hcall result buffer validity and size

2024-05-03 Thread Michael Ellerman
On Mon, 08 Apr 2024 09:08:31 -0500, Nathan Lynch wrote:
> plpar_hcall(), plpar_hcall9(), and related functions expect callers to
> provide valid result buffers of certain minimum size. Currently this
> is communicated only through comments in the code and the compiler has
> no idea.
> 
> For example, if I write a bug like this:
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/pseries: Enforce hcall result buffer validity and size
  https://git.kernel.org/powerpc/c/ff2e185cf73df480ec69675936c4ee75a445c3e4

cheers
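One standard C mechanism for telling the compiler about a minimum buffer size is the C99 static keyword inside an array parameter. The sketch below uses a hypothetical demo_hcall() and buffer size rather than the real plpar_hcall() prototype, and we have not checked that the applied patch uses exactly this form.

```c
#include <assert.h>

/* Hypothetical size; not taken from the pseries headers. */
#define DEMO_HCALL_BUFSIZE 4

/*
 * "static" inside the brackets promises the compiler that retbuf is
 * non-NULL and points to at least DEMO_HCALL_BUFSIZE elements, so it can
 * warn about callers passing NULL or a visibly smaller array.
 */
static long demo_hcall(unsigned long opcode,
		       unsigned long retbuf[static DEMO_HCALL_BUFSIZE])
{
	for (int i = 0; i < DEMO_HCALL_BUFSIZE; i++)
		retbuf[i] = opcode + i;   /* fake results for illustration */
	return 0;
}
```

A caller passing an unsigned long small[2] can then be flagged at compile time (Clang warns by default; recent GCC versions can also diagnose it), whereas a comment-only convention gives the compiler nothing to check.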


Re: [PATCH linux-next] macintosh/macio-adb: replace of_node_put() with __free

2024-05-03 Thread Michael Ellerman
On Wed, 24 Apr 2024 20:37:18 +0530, sundar wrote:
> Use the new cleanup helpers to replace of_node_put() with
> __free(device_node) annotations, so that nodes are automatically
> released when they go out of scope.
> 
> 

Applied to powerpc/next.

[1/1] macintosh/macio-adb: replace of_node_put() with __free
  https://git.kernel.org/powerpc/c/84030aacf127d000180fa3cb4b589d8ab1b0d46b

cheers
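The kernel's `__free()` annotation builds on the GCC/Clang `cleanup` variable attribute. The userspace sketch below imitates it with a hypothetical reference-counted `fake_node` and an `AUTO_PUT` macro (both invented for illustration, not kernel API) to show exactly when the reference is dropped.

```c
/* Hypothetical, heavily simplified stand-in for struct device_node. */
struct fake_node {
    int refcount;
};

struct fake_node fake_root = { .refcount = 1 };

struct fake_node *fake_node_get(struct fake_node *np)
{
    if (np)
        np->refcount++;
    return np;
}

void fake_node_put(struct fake_node *np)
{
    if (np)
        np->refcount--;
}

/* The cleanup function receives a pointer to the annotated variable. */
static void fake_node_auto_put(struct fake_node **npp)
{
    fake_node_put(*npp);
}

/* Plays roughly the role of __free(device_node) in kernel code. */
#define AUTO_PUT __attribute__((cleanup(fake_node_auto_put)))

int use_node(int bail_early)
{
    struct fake_node *np AUTO_PUT = fake_node_get(&fake_root);

    if (bail_early)
        return -1;          /* reference dropped here, no goto needed */

    /* The return value is computed before the cleanup runs. */
    return np->refcount;
}
```

Every exit path releases the reference, which is what lets such conversions delete the explicit of_node_put() calls and the `goto out` labels.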


Re: [PATCH v10 0/3] powerpc: make fadump resilient with memory add/remove events

2024-05-03 Thread Michael Ellerman
On Tue, 23 Apr 2024 01:29:29 +0530, Sourabh Jain wrote:
> Problem:
> 
> Due to changes in memory resources caused by either memory hotplug or
> online/offline events, the elfcorehdr, which describes the cpus and
> memory of the crashed kernel to the kernel that collects the dump (known
> as second/fadump kernel), becomes outdated. Consequently, attempting
> dump collection with an outdated elfcorehdr can lead to failed or
> inaccurate dump collection.
> 
> [...]

Applied to powerpc/next.

[1/3] powerpc: make fadump resilient with memory add/remove events
  https://git.kernel.org/powerpc/c/c6c5b14dac0d1bd0da8b4d1d3b77f18eb9085fcb
[2/3] powerpc/fadump: add hotplug_ready sysfs interface
  https://git.kernel.org/powerpc/c/bc446c5acabadeb38b61b565535401c5dfdd1214
[3/3] Documentation/powerpc: update fadump implementation details
  https://git.kernel.org/powerpc/c/57e6700145c5d1f49c52137e9163f73ec5441256

cheers


Re: [PATCH v2 0/2] powerpc/pseries: Fixes for lparstat boot reports

2024-05-03 Thread Michael Ellerman
On Fri, 12 Apr 2024 14:50:45 +0530, Shrikanth Hegde wrote:
> Currently, the lparstat reports that show values since LPAR boot are wrong
> for some fields. The PIC (Pool Idle Count) needs to be stored at boot for
> accurate reporting. PATCH 1 does that.
> 
> While there, it was noticed that the hcall return value is long and that both
> h_get_ppp and h_get_mpp could store uninitialized values if the hcall
> fails. PATCH 2 fixes that.
> 
> [...]

Applied to powerpc/next.

[1/2] powerpc/pseries: Add pool idle time at LPAR boot
  https://git.kernel.org/powerpc/c/9c74ecfd0fc46e2eaf92c1b6169cc0c8a87f1dc2
[2/2] powerpc/pseries: Add failure related checks for h_get_mpp and h_get_ppp
  https://git.kernel.org/powerpc/c/6d4341638516bf97b9a34947e0bd95035a8230a5

cheers
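The second fix follows a common pattern: keep the hcall return code in a `long` and copy results out only when the call succeeded. A hypothetical sketch; every `fake_*` name and the error value are invented, the real h_get_ppp()/h_get_mpp() differ, and zeroing the output first is a choice made in this sketch only.

```c
#include <string.h>

#define FAKE_H_SUCCESS 0L

struct fake_ppp_data {
    unsigned long entitlement;
    unsigned long group_num;
};

/* Hypothetical hcall: writes retbuf only when it succeeds. */
long fake_hcall_get_ppp(unsigned long retbuf[4], int fail)
{
    if (fail)
        return -2L;  /* hypothetical error code; retbuf left untouched */
    retbuf[0] = 100;
    retbuf[1] = 7;
    return FAKE_H_SUCCESS;
}

long fake_get_ppp(struct fake_ppp_data *data, int fail)
{
    unsigned long retbuf[4];  /* indeterminate until the hcall writes it */
    long rc;                  /* long, matching the hcall return type */

    memset(data, 0, sizeof(*data));
    rc = fake_hcall_get_ppp(retbuf, fail);
    if (rc != FAKE_H_SUCCESS)
        return rc;            /* never copy garbage out of retbuf */

    data->entitlement = retbuf[0];
    data->group_num = retbuf[1];
    return rc;
}
```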


Re: [PATCH v2] powerpc/pseries/iommu: LPAR panics during boot up with a frozen PE

2024-05-03 Thread Michael Ellerman
On Mon, 22 Apr 2024 15:51:41 -0500, Gaurav Batra wrote:
> At the time of LPAR boot up, partition firmware provides Open Firmware
> property ibm,dma-window for the PE. This property is provided on the PCI
> bus the PE is attached to.
> 
> There are exceptions where the partition firmware might not provide this
> property for the PE at the time of LPAR boot up. One such scenario is
> where the firmware has frozen the PE due to some error condition. This
> PE is frozen for 24 hours or until the whole system is reinitialized.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/pseries/iommu: LPAR panics during boot up with a frozen PE
  https://git.kernel.org/powerpc/c/49a940dbdc3107fecd5e6d3063dc07128177e058

cheers


Re: [PATCH v4] powerpc/pseries: make max polling consistent for longer H_CALLs

2024-05-03 Thread Michael Ellerman
On Wed, 17 Apr 2024 23:12:30 -0400, Nayna Jain wrote:
> Currently, plpks_confirm_object_flushed() function polls for 5msec in
> total instead of 5sec.
> 
> Keep the max polling time consistent, at 5sec, for all the H_CALLs that
> take longer than expected. Also, make use of fsleep() everywhere to
> insert delay.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/pseries: make max polling consistent for longer H_CALLs
  https://git.kernel.org/powerpc/c/784354349d2c988590c63a5a001ca37b2a6d4da1

cheers
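The 5 msec vs 5 sec mix-up is a units bug. Below is a sketch of a bounded polling loop that keeps every quantity in microseconds, the unit fsleep() takes. The names and the 10 ms retry step are hypothetical, not taken from plpks.c, and the actual sleep is omitted so the sketch runs instantly.

```c
#include <stdbool.h>

/* Both limits in microseconds. */
#define POLL_MAX_USEC  (5UL * 1000 * 1000)  /* 5 s total */
#define POLL_STEP_USEC (10UL * 1000)        /* 10 ms per retry (hypothetical) */

/*
 * Retry until done() reports completion or the accumulated delay reaches
 * POLL_MAX_USEC, and return the total delay.
 */
unsigned long poll_until(bool (*done)(unsigned long elapsed_usec))
{
    unsigned long elapsed = 0;

    while (!done(elapsed) && elapsed < POLL_MAX_USEC)
        elapsed += POLL_STEP_USEC;  /* kernel code would fsleep(POLL_STEP_USEC) */

    return elapsed;
}

bool never_done(unsigned long elapsed_usec)
{
    (void)elapsed_usec;
    return false;
}

bool done_after_30ms(unsigned long elapsed_usec)
{
    return elapsed_usec >= 30UL * 1000;
}
```

Had POLL_MAX_USEC been written as 5000 on the assumption that it was in milliseconds, this loop would give up after a single retry instead of after 5 seconds.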


Re: [PATCH] Fix the address of the linuxppc-dev mailing list

2024-05-03 Thread Michael Ellerman
Stephen Rothwell  writes:
> This list was moved many years ago.
>
> Signed-off-by: Stephen Rothwell 
> ---
>  Documentation/ABI/testing/sysfs-devices-system-cpu | 14 +++---
>  .../ABI/testing/sysfs-firmware-opal-powercap   |  4 ++--
>  Documentation/ABI/testing/sysfs-firmware-opal-psr  |  4 ++--
>  .../ABI/testing/sysfs-firmware-opal-sensor-groups  |  4 ++--
>  .../testing/sysfs-firmware-papr-energy-scale-info  | 10 +-
>  5 files changed, 18 insertions(+), 18 deletions(-)

These are mostly powerpc specific files so I can take this.

cheers

> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index 710d47be11e0..e7e160954e79 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -423,7 +423,7 @@ What: /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats
>   /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/occ_reset
>  Date:March 2016
>  Contact: Linux kernel mailing list 
> - Linux for PowerPC mailing list 
> + Linux for PowerPC mailing list 
>  Description: POWERNV CPUFreq driver's frequency throttle stats directory and
>   attributes
>  
> @@ -473,7 +473,7 @@ What: /sys/devices/system/cpu/cpufreq/policyX/throttle_stats
>   /sys/devices/system/cpu/cpufreq/policyX/throttle_stats/occ_reset
>  Date:March 2016
>  Contact: Linux kernel mailing list 
> - Linux for PowerPC mailing list 
> + Linux for PowerPC mailing list 
>  Description: POWERNV CPUFreq driver's frequency throttle stats directory and
>   attributes
>  
> @@ -608,7 +608,7 @@ Description:  Umwait control
>  What:/sys/devices/system/cpu/svm
>  Date:August 2019
>  Contact: Linux kernel mailing list 
> - Linux for PowerPC mailing list 
> + Linux for PowerPC mailing list 
>  Description: Secure Virtual Machine
>  
>   If 1, it means the system is using the Protected Execution
> @@ -617,7 +617,7 @@ Description:  Secure Virtual Machine
>  
>  What:/sys/devices/system/cpu/cpuX/purr
>  Date:Apr 2005
> -Contact: Linux for PowerPC mailing list 
> +Contact: Linux for PowerPC mailing list 
>  Description: PURR ticks for this CPU since the system boot.
>  
>   The Processor Utilization Resources Register (PURR) is
> @@ -628,7 +628,7 @@ Description:  PURR ticks for this CPU since the system boot.
>  
>  What:/sys/devices/system/cpu/cpuX/spurr
>  Date:Dec 2006
> -Contact: Linux for PowerPC mailing list 
> +Contact: Linux for PowerPC mailing list 
>  Description: SPURR ticks for this CPU since the system boot.
>  
>   The Scaled Processor Utilization Resources Register
> @@ -640,7 +640,7 @@ Description:  SPURR ticks for this CPU since the system boot.
>  
>  What:/sys/devices/system/cpu/cpuX/idle_purr
>  Date:Apr 2020
> -Contact: Linux for PowerPC mailing list 
> +Contact: Linux for PowerPC mailing list 
>  Description: PURR ticks for cpuX when it was idle.
>  
>   This sysfs interface exposes the number of PURR ticks
> @@ -648,7 +648,7 @@ Description:  PURR ticks for cpuX when it was idle.
>  
>  What:/sys/devices/system/cpu/cpuX/idle_spurr
>  Date:Apr 2020
> -Contact: Linux for PowerPC mailing list 
> +Contact: Linux for PowerPC mailing list 
>  Description: SPURR ticks for cpuX when it was idle.
>  
>   This sysfs interface exposes the number of SPURR ticks
> diff --git a/Documentation/ABI/testing/sysfs-firmware-opal-powercap b/Documentation/ABI/testing/sysfs-firmware-opal-powercap
> index c9b66ec4f165..d2d12ee89288 100644
> --- a/Documentation/ABI/testing/sysfs-firmware-opal-powercap
> +++ b/Documentation/ABI/testing/sysfs-firmware-opal-powercap
> @@ -1,6 +1,6 @@
>  What:/sys/firmware/opal/powercap
>  Date:August 2017
> -Contact: Linux for PowerPC mailing list 
> +Contact: Linux for PowerPC mailing list 
>  Description: Powercap directory for Powernv (P8, P9) servers
>  
>   Each folder in this directory contains a
> @@ -11,7 +11,7 @@ What: /sys/firmware/opal/powercap/system-powercap
>   /sys/firmware/opal/powercap/system-powercap/powercap-max
>   /sys/firmware/opal/powercap/system-powercap/powercap-current
>  Date:August 2017
> -Contact: Linux for PowerPC mailing list 
> +Contact: Linux for PowerPC mailing list 
>  Description: System powercap directory and attributes applicable for
>   Powernv (P8, P9) servers
>  
> diff --git a/Documentation/ABI/testing/sysfs-firmware-opal-psr b/Documentation/ABI

RE: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that might be ANFE in aer_err_info

2024-05-03 Thread Duan, Zhenzhong


>-Original Message-
>From: Jonathan Cameron 
>Subject: Re: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that might
>be ANFE in aer_err_info
>
>On Sun, 28 Apr 2024 03:31:11 +
>"Duan, Zhenzhong"  wrote:
>
>> Hi Jonathan,
>>
>> >-Original Message-
>> >From: Jonathan Cameron 
>> >Subject: Re: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that might
>> >be ANFE in aer_err_info
>> >
>> >On Tue, 23 Apr 2024 02:25:05 +
>> >"Duan, Zhenzhong"  wrote:
>> >
>> >> >-Original Message-
>> >> >From: Jonathan Cameron 
>> >> >Subject: Re: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that might
>> >> >be ANFE in aer_err_info
>> >> >
>> >> >On Wed, 17 Apr 2024 14:14:05 +0800
>> >> >Zhenzhong Duan  wrote:
>> >> >
>> >> >> In some cases the detector of a Non-Fatal Error (NFE) is not the most
>> >> >> appropriate agent to determine the type of the error. For example,
>> >> >> when software performs a configuration read from a non-existent
>> >> >> device or Function, the completer will send an ERR_NONFATAL Message.
>> >> >> On some platforms, ERR_NONFATAL results in a System Error, which
>> >> >> breaks normal software probing.
>> >> >>
>> >> >> Advisory Non-Fatal Error (ANFE) is a special case that can be used
>> >> >> in the above scenario. It is predominantly determined by the role of
>> >> >> the detecting agent (Requester, Completer, or Receiver) and the
>> >> >> specific error. In such cases, an agent with AER signals the NFE (if
>> >> >> enabled) by sending an ERR_COR Message as an advisory to software,
>> >> >> instead of sending ERR_NONFATAL.
>> >> >>
>> >> >> When processing an ANFE, ideally both the correctable error (CE)
>> >> >> status and the uncorrectable error (UE) status should be cleared.
>> >> >> However, there is no way to fully identify the UE associated with an
>> >> >> ANFE. Even worse, a Fatal Error (FE) or Non-Fatal Error (NFE) may set
>> >> >> the same UE status bit as an ANFE. Treating an ANFE as an NFE will
>> >> >> reproduce the issue mentioned above, i.e., breaking software probing;
>> >> >> treating an NFE as an ANFE will make us ignore some UEs which need
>> >> >> active recovery. To avoid clearing UEs that are not ANFE by accident,
>> >> >> the most conservative route is taken here: if any of the FE/NFE
>> >> >> Detected bits is set in Device Status, do not touch UE status; it
>> >> >> should be cleared later by the UE handler. Otherwise, a specific set
>> >> >> of UEs that may be raised as ANFE according to the PCIe specification
>> >> >> will be cleared if their corresponding severity is Non-Fatal.
>> >> >>
>> >> >> To achieve the above, store the UNCOR_STATUS bits that might be ANFE
>> >> >> in aer_err_info.anfe_status, so that those bits can be printed and
>> >> >> processed later.
>> >> >>
>> >> >> Tested-by: Yudong Wang 
>> >> >> Co-developed-by: "Wang, Qingshun"
>> >> >> Signed-off-by: "Wang, Qingshun" 
>> >> >> Signed-off-by: Zhenzhong Duan 
>> >> >> ---
>> >> >>  drivers/pci/pci.h  |  1 +
>> >> >>  drivers/pci/pcie/aer.c | 45 ++
>> >> >>  2 files changed, 46 insertions(+)
>> >> >>
>> >> >> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> >> >> index 17fed1846847..3f9eb807f9fd 100644
>> >> >> --- a/drivers/pci/pci.h
>> >> >> +++ b/drivers/pci/pci.h
>> >> >> @@ -412,6 +412,7 @@ struct aer_err_info {
>> >> >>
>> >> >>    unsigned int status;        /* COR/UNCOR Error Status */
>> >> >>    unsigned int mask;          /* COR/UNCOR Error Mask */
>> >> >> +  unsigned int anfe_status;   /* UNCOR Error Status for ANFE */
>> >> >>    struct pcie_tlp_log tlp;    /* TLP Header */
>> >> >>  };
>> >> >>
>> >> >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> >> >> index ac6293c24976..27364ab4b148 100644
>> >> >> --- a/drivers/pci/pcie/aer.c
>> >> >> +++ b/drivers/pci/pcie/aer.c
>> >> >> @@ -107,6 +107,12 @@ struct aer_stats {
>> >> >>                                    PCI_ERR_ROOT_MULTI_COR_RCV |    \
>> >> >>                                    PCI_ERR_ROOT_MULTI_UNCOR_RCV)
>> >> >>
>> >> >> +#define AER_ERR_ANFE_UNC_MASK      (PCI_ERR_UNC_POISON_TLP |       \
>> >> >> +                                    PCI_ERR_UNC_COMP_TIME |        \
>> >> >> +                                    PCI_ERR_UNC_COMP_ABORT |       \
>> >> >> +                                    PCI_ERR_UNC_UNX_COMP |         \
>> >> >> +                                    PCI_ERR_UNC_UNSUP)
>> >> >> +
>> >> >>  static int pcie_aer_disable;
>> >> >>  static pci_ers_result_t aer_root_reset(struct pci_dev *dev);
>> >> >>
>> >> >> @@ -1196,6 +1202,41 @@ void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn,
>> >> >>  EXPORT_SYMBOL_GPL(aer_recover_queue);
>> >> >>  #endif
>> >> >>
>> >> >> +static void anfe_get_uc_status(struct pci_dev *dev, struct aer_err_info *info)
>> >> >> +{
>> >> >> +  u32 uncor_mask, uncor_status;
>> >> >> +  u16 device_status;
>> >> >> +  int aer = dev->aer
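The selection policy in the commit message above can be modelled as a pure function over the relevant register values. This is a sketch, not the patch's anfe_get_uc_status() (whose body is cut off here): the register values are passed in instead of being read from config space, and excluding masked and Fatal-severity bits via `~mask & ~severity` is an assumption based on the description. The bit values match include/uapi/linux/pci_regs.h.

```c
#include <stdint.h>

#define PCI_EXP_DEVSTA_NFED     0x0002      /* Non-Fatal Error Detected */
#define PCI_EXP_DEVSTA_FED      0x0004      /* Fatal Error Detected */
#define PCI_ERR_UNC_POISON_TLP  0x00001000
#define PCI_ERR_UNC_COMP_TIME   0x00004000
#define PCI_ERR_UNC_COMP_ABORT  0x00008000
#define PCI_ERR_UNC_UNX_COMP    0x00010000
#define PCI_ERR_UNC_UNSUP       0x00100000

#define AER_ERR_ANFE_UNC_MASK (PCI_ERR_UNC_POISON_TLP | PCI_ERR_UNC_COMP_TIME | \
                               PCI_ERR_UNC_COMP_ABORT | PCI_ERR_UNC_UNX_COMP | \
                               PCI_ERR_UNC_UNSUP)

/*
 * uncor_severity: a set bit means Fatal, a clear bit means Non-Fatal,
 * as in the AER Uncorrectable Error Severity register.
 */
uint32_t anfe_candidates(uint16_t device_status, uint32_t uncor_status,
                         uint32_t uncor_mask, uint32_t uncor_severity)
{
    /* A real FE/NFE is pending: leave UE status to the UE handler. */
    if (device_status & (PCI_EXP_DEVSTA_NFED | PCI_EXP_DEVSTA_FED))
        return 0;

    /* Unmasked, Non-Fatal bits from the set the spec allows as ANFE. */
    return uncor_status & ~uncor_mask & ~uncor_severity & AER_ERR_ANFE_UNC_MASK;
}
```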

Re: [PATCH] tty: hvc: hvc_opal: eliminate uses of of_node_put()

2024-05-03 Thread Javier Carrasco
On 5/3/24 13:43, Lu Dai wrote:
> Make use of the __free() cleanup handler to automatically free nodes
> when they get out of scope.
> 
> Removes the need for a 'goto' as an effect.
> 
> Signed-off-by: Lu Dai 
> ---
>  drivers/tty/hvc/hvc_opal.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/tty/hvc/hvc_opal.c b/drivers/tty/hvc/hvc_opal.c
> index 095c33ad10f8..67e90fa993a3 100644
> --- a/drivers/tty/hvc/hvc_opal.c
> +++ b/drivers/tty/hvc/hvc_opal.c
> @@ -327,14 +327,14 @@ static void udbg_init_opal_common(void)
>  
>  void __init hvc_opal_init_early(void)
>  {
> - struct device_node *stdout_node = of_node_get(of_stdout);
> + struct device_node *stdout_node __free(device_node) = of_node_get(of_stdout);
>   const __be32 *termno;
>   const struct hv_ops *ops;
>   u32 index;
>  
>   /* If the console wasn't in /chosen, try /ibm,opal */
>   if (!stdout_node) {
> - struct device_node *opal, *np;

Generally, you should always initialize the variable where it is
declared. What would happen if the variable went out of scope before it
was initialized? Right now that is not dangerous, but if new code is
added that returns early on some error, we might run into trouble.

In this particular case you can solve this easily by combining your
declaration with the assignment right after the comment.


> + struct device_node *opal __free(device_node), *np;
>  
>   /* Current OPAL takeover doesn't provide the stdout
>* path, so we hard wire it
> @@ -356,7 +356,6 @@ void __init hvc_opal_init_early(void)
>   break;
>   }
>   }
> - of_node_put(opal);
>   }
>   if (!stdout_node)
>   return;
> @@ -382,13 +381,11 @@ void __init hvc_opal_init_early(void)
>   hvsilib_establish(&hvc_opal_boot_priv.hvsi);
>   pr_devel("hvc_opal: Found HVSI console\n");
>   } else
> - goto out;
> + return;
>   hvc_opal_boot_termno = index;
>   udbg_init_opal_common();
>   add_preferred_console("hvc", index, NULL);
>   hvc_instantiate(index, index, ops);
> -out:
> - of_node_put(stdout_node);
>  }
>  
>  #ifdef CONFIG_PPC_EARLY_DEBUG_OPAL_RAW


Best regards,
Javier Carrasco
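The hazard Javier describes comes from the cleanup function running on whatever value the variable holds when it leaves scope. A userspace sketch using the GCC/Clang `cleanup` attribute; `fake_find()`, `fake_put()` and `AUTO_PUT` are hypothetical stand-ins for of_find_node_by_path(), of_node_put() and __free(device_node).

```c
#include <stddef.h>

int live_refs;  /* number of outstanding references */

/* Hypothetical lookup that returns a referenced node, or NULL. */
int *fake_find(int present)
{
    static int node;

    if (!present)
        return NULL;
    live_refs++;
    return &node;
}

/* Cleanup helper; NULL-safe like the device_node one. */
static void fake_put(int **pp)
{
    if (*pp)
        live_refs--;
}

#define AUTO_PUT __attribute__((cleanup(fake_put)))

int init_early(int present)
{
    /*
     * Declared and initialized in one statement, so there is no path on
     * which fake_put() could see an indeterminate pointer. Writing
     * "int *opal AUTO_PUT;" and assigning it later would leave a window
     * where an early return runs the cleanup on garbage.
     */
    int *opal AUTO_PUT = fake_find(present);

    if (!opal)
        return -1;  /* cleanup sees NULL and does nothing */

    return 0;       /* reference dropped automatically */
}
```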


[powerpc:next] BUILD SUCCESS 1fcd254733371cfa5a3602bab5ae2c9dc4bf69e6

2024-05-03 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
branch HEAD: 1fcd254733371cfa5a3602bab5ae2c9dc4bf69e6  MAINTAINERS: MMU GATHER: Update Aneesh's address

elapsed time: 1374m

configs tested: 159
configs skipped: 7

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha      allnoconfig                        gcc
alpha      allyesconfig                       gcc
alpha      defconfig                          gcc
arc        allmodconfig                       gcc
arc        allnoconfig                        gcc
arc        allyesconfig                       gcc
arc        defconfig                          gcc
arc        randconfig-001-20240503            gcc
arc        randconfig-002-20240503            gcc
arm        allmodconfig                       gcc
arm        allnoconfig                        clang
arm        allyesconfig                       gcc
arm        defconfig                          clang
arm        randconfig-001-20240503            clang
arm        randconfig-002-20240503            clang
arm        randconfig-003-20240503            clang
arm        randconfig-004-20240503            gcc
arm64      allmodconfig                       clang
arm64      allnoconfig                        gcc
arm64      defconfig                          gcc
arm64      randconfig-001-20240503            clang
arm64      randconfig-002-20240503            gcc
arm64      randconfig-003-20240503            clang
arm64      randconfig-004-20240503            clang
csky       allmodconfig                       gcc
csky       allnoconfig                        gcc
csky       allyesconfig                       gcc
csky       defconfig                          gcc
csky       randconfig-001-20240503            gcc
csky       randconfig-002-20240503            gcc
hexagon    allmodconfig                       clang
hexagon    allnoconfig                        clang
hexagon    allyesconfig                       clang
hexagon    defconfig                          clang
hexagon    randconfig-001-20240503            clang
hexagon    randconfig-002-20240503            clang
i386       allmodconfig                       gcc
i386       allnoconfig                        gcc
i386       allyesconfig                       gcc
i386       buildonly-randconfig-001-20240503  clang
i386       buildonly-randconfig-002-20240503  clang
i386       buildonly-randconfig-003-20240503  gcc
i386       buildonly-randconfig-004-20240503  gcc
i386       buildonly-randconfig-005-20240503  gcc
i386       buildonly-randconfig-006-20240503  clang
i386       defconfig                          clang
i386       randconfig-001-20240503            gcc
i386       randconfig-002-20240503            clang
i386       randconfig-004-20240503            gcc
i386       randconfig-011-20240503            clang
i386       randconfig-012-20240503            gcc
i386       randconfig-013-20240503            gcc
i386       randconfig-014-20240503            gcc
i386       randconfig-015-20240503            gcc
i386       randconfig-016-20240503            clang
loongarch  allmodconfig                       gcc
loongarch  allnoconfig                        gcc
loongarch  defconfig                          gcc
loongarch  randconfig-001-20240503            gcc
loongarch  randconfig-002-20240503            gcc
m68k       allmodconfig                       gcc
m68k       allnoconfig                        gcc
m68k       allyesconfig                       gcc
m68k       defconfig                          gcc
microblaze allmodconfig                       gcc
microblaze allnoconfig                        gcc
microblaze allyesconfig                       gcc
microblaze defconfig                          gcc
mips       allnoconfig                        gcc
mips       allyesconfig                       gcc
nios2      allmodconfig                       gcc
nios2      allnoconfig                        gcc
nios2      allyesconfig                       gcc
nios2      defconfig                          gcc
nios2      randconfig-001-20240503            gcc
nios2      randconfig-002-20240503            gcc
openrisc   allnoconfig                        gcc
openrisc   allyesconfig                       gcc
openrisc   defconfig                          gcc
parisc     allmodconfig                       gcc
parisc     allnoconfig                        gcc
parisc     allyesconfig                       gcc
parisc     defconfig                          gcc
parisc     randconfig-001-20240503            gcc
parisc     randconfig-002-20240503            gcc
par

Re: [PATCH] tty: hvc: hvc_opal: eliminate uses of of_node_put()

2024-05-03 Thread Greg KH
On Fri, May 03, 2024 at 02:43:30PM +0300, Lu Dai wrote:
> Make use of the __free() cleanup handler to automatically free nodes
> when they get out of scope.
> 
> Removes the need for a 'goto' as an effect.
> 
> Signed-off-by: Lu Dai 
> ---
>  drivers/tty/hvc/hvc_opal.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/tty/hvc/hvc_opal.c b/drivers/tty/hvc/hvc_opal.c
> index 095c33ad10f8..67e90fa993a3 100644
> --- a/drivers/tty/hvc/hvc_opal.c
> +++ b/drivers/tty/hvc/hvc_opal.c
> @@ -327,14 +327,14 @@ static void udbg_init_opal_common(void)
>  
>  void __init hvc_opal_init_early(void)
>  {
> - struct device_node *stdout_node = of_node_get(of_stdout);
> + struct device_node *stdout_node __free(device_node) = of_node_get(of_stdout);
>   const __be32 *termno;
>   const struct hv_ops *ops;
>   u32 index;
>  
>   /* If the console wasn't in /chosen, try /ibm,opal */
>   if (!stdout_node) {
> - struct device_node *opal, *np;
> + struct device_node *opal __free(device_node), *np;

*np needs to be on a separate line, right?

thanks,

greg k-h


Re: [PATCH V2] tty: hvc: hvc_opal: eliminate uses of of_node_put()

2024-05-03 Thread Greg KH
On Fri, May 03, 2024 at 04:52:15PM +0300, Lu Dai wrote:
> Make use of the __free() cleanup handler to automatically free nodes
> when they get out of scope.
> 
> Remove the need for a 'goto' as an effect.
> 
> Signed-off-by: Lu Dai 
> ---
> Changes since v1:
> Move the assignment of 'opal' to its declaration
> Separate the declaration of 'np'
> 
>  drivers/tty/hvc/hvc_opal.c | 13 +
>  1 file changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/tty/hvc/hvc_opal.c b/drivers/tty/hvc/hvc_opal.c
> index 095c33ad10f8..c17e8343ea60 100644
> --- a/drivers/tty/hvc/hvc_opal.c
> +++ b/drivers/tty/hvc/hvc_opal.c
> @@ -327,19 +327,18 @@ static void udbg_init_opal_common(void)
>  
>  void __init hvc_opal_init_early(void)
>  {
> - struct device_node *stdout_node = of_node_get(of_stdout);
> + struct device_node *stdout_node __free(device_node) = of_node_get(of_stdout);
>   const __be32 *termno;
>   const struct hv_ops *ops;
>   u32 index;
>  
>   /* If the console wasn't in /chosen, try /ibm,opal */
>   if (!stdout_node) {
> - struct device_node *opal, *np;
> -
>   /* Current OPAL takeover doesn't provide the stdout
>* path, so we hard wire it
>*/
> - opal = of_find_node_by_path("/ibm,opal/consoles");
> + struct device_node *opal __free(device_node) =
> + of_find_node_by_path("/ibm,opal/consoles");
>   if (opal) {

No blank line?

>   pr_devel("hvc_opal: Found consoles in new location\n");
>   } else {
> @@ -350,13 +349,13 @@ void __init hvc_opal_init_early(void)
>   }
>   if (!opal)
>   return;
> + struct device_node *np;
>   for_each_child_of_node(opal, np) {

Ick, no, don't do that please.  Take some time and become more familiar
with kernel coding style and issues, perhaps work in drivers/staging/
first, before attempting to do stuff like this that is not correct.

thanks,

greg k-h


Re: [PATCH v3 00/11] sysctl: treewide: constify ctl_table argument of sysctl handlers

2024-05-03 Thread Thomas Weißschuh
Hey Joel,

On 2024-05-03 11:03:32+, Joel Granados wrote:
> Here is my feedback for your outstanding constification patches [1] and [2].

Thanks!

> # You need to split the patch
> The answer that you got from Jakub in the network subsystem is very clear, and
> barring a change of heart from the network folks, this will go in, but as a
> split patchset. Please split it considering the following:
> 1. Create a different patchset for drivers/, fs/, kernel/, net/, and a
>    miscellaneous one that includes whatever does not fit into the others.
> 2. Consider that this might take several releases.
> 3. Consider the following suffix for the interim function name: "_const", as
>    in kfree_const. Please not "_new".

Ack. "_new" was an intentionally unacceptable placeholder.

> 4. Please publish the final result somewhere. This is important so someone can
>take over in case you need to stop.

Will do. Both for each single series and a combination of all of them.

> 5. Consistently mention the motivation in your cover letters. I go into more
>    detail further down in "#Motivation".
> 6. Also mention that this is part of a bigger effort (like you did in your
>    original cover letters). I would include [3,4,5,6].
> 7. Include a way to show what made it into .rodata. I go into more detail
>    further down in "#Show the move".
> 
> # Motivation
> As I read it, the motivation for these constification efforts is:
> 1. Increased safety: having things in the .rodata section reduces the
>    attack surface. This is especially relevant for structures that have
>    function pointers (like ctl_table); having these in .rodata means that
>    these pointers always point to the "intended" function and cannot be
>    changed.
> 2. Compiler optimizations: this was just a comment in the patchsets that I
>    have mentioned ([3,4,5]). Do you know what optimizations specifically?
>    Does it have to do with enhancing locality for the data in .rodata? Do
>    you have other specific optimizations in mind?

I don't know about anything that would make it faster.
It's more about safety and transmission of intent to API users,
especially callback implementers.

> 3. Readability: it is easier to know up-front that data is not supposed
>    to change or that a function is re-entrant. Actually, a lot of the
>    readability reasons are about knowing things "up-front".
> As we move forward with the constification in sysctl, please include a more
> detailed motivation in all your cover letters. This helps maintainers (who
> don't have the context) understand what you are trying to do. It does not need
> to be my three points, but it should be more than just "put things into
> .rodata". Please tell me if I have missed anything in the motivation.

Will do.

> # Show the move
> I created [8] because there is no easy way to validate which objects made it
> into .rodata. I ran [8] on your Dec 2nd patchset [7], and there are fewer
> objects in .rodata than I expected (the results are in [9]). Why is that? Is
> it something that has not been posted to the lists yet?

Constifying the APIs only *allows* the actual tables themselves to be
constified.
Then each table definition will have to be touched and "const" added.

See patches 17 and 18 in [7] for two examples.

Some tables in net/ are already "const" as the static definitions are
never registered themselves but only their copies are.

This seems to explain your findings.
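A minimal sketch of that two-step point, using a hypothetical cut-down `demo_table` rather than the real struct ctl_table (whose handler signature differs): once the handler takes a `const` table pointer, nothing requires the table to be writable, so the definition itself can then be declared `const`. Exact section placement depends on the toolchain and relocation model.

```c
/* Hypothetical, cut-down analogue of struct ctl_table. */
struct demo_table {
    const char *procname;
    int *data;
    int (*handler)(const struct demo_table *table, int write, int *val);
};

/*
 * Step 1 (the API change): the handler only gets a const view of the table.
 * It may read and write the variable table->data points to, but not the
 * table itself, including its function pointer.
 */
int demo_handler(const struct demo_table *table, int write, int *val)
{
    if (write)
        *table->data = *val;
    else
        *val = *table->data;
    return 0;
}

int demo_value = 42;

/*
 * Step 2 (per table): add "const" to the definition. Compilers typically
 * emit such objects into .rodata, or .data.rel.ro when the object needs
 * relocations, e.g. in position-independent code.
 */
const struct demo_table demo_tables[] = {
    { .procname = "demo", .data = &demo_value, .handler = demo_handler },
};

int demo_read(void)
{
    int v = 0;

    demo_tables[0].handler(&demo_tables[0], 0, &v);
    return v;
}
```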

> Best

Thanks!

> [1] https://lore.kernel.org/all/20240423-sysctl-const-handler-v3-0-e0beccb83...@weissschuh.net/
> [2] https://lore.kernel.org/all/20240418-sysctl-const-table-arg-v2-1-4012abc31...@weissschuh.net
> [3] [PATCH v2 00/14] ASoC: Constify local snd_sof_dsp_ops
>     https://lore.kernel.org/all/20240426-n-const-ops-var-v2-0-e553fe67a...@kernel.org
> [4] [PATCH v2 00/19] backlight: Constify lcd_ops
>     https://lore.kernel.org/all/20240424-video-backlight-lcd-ops-v2-0-1aaa82b07...@kernel.org
> [5] [PATCH 1/4] iommu: constify pointer to bus_type
>     https://lore.kernel.org/all/20240216144027.185959-1-krzysztof.kozlow...@linaro.org
> [6] [PATCH 00/29] const xattr tables
>     https://lore.kernel.org/all/20230930050033.41174-1-wedso...@gmail.com
> [7] https://lore.kernel.org/all/20231204-const-sysctl-v2-0-7a5060b11...@weissschuh.net/
> 
> [8]

[snip]

> [9]
> section: .rodata  obj_name : kern_table
> section: .rodata  obj_name : sysctl_mount_point
> section: .rodata  obj_name : addrconf_sysctl
> section: .rodata  obj_name : ax25_param_table
> section: .rodata  obj_name : mpls_table
> section: .rodata  obj_name : mpls_dev_table
> section: .data    obj_name : sld_sysctls
> section: .data    obj_name : kern_panic_table
> section: .data    obj_name : kern_exit_table
> section: .data    obj_name : vm_table
> section: .data 

Re: [PATCH v3 0/3] arch: Remove fbdev dependency from video helpers

2024-05-03 Thread Arnd Bergmann
On Fri, Apr 5, 2024, at 11:04, Thomas Zimmermann wrote:
> Hi,
>
> if there are no further comments, can this series be merged through 
> asm-generic?

Sorry for the delay, I've merged these for asm-generic now.

  Arnd


Re: [PATCH v3 0/3] arch: Remove fbdev dependency from video helpers

2024-05-03 Thread Thomas Zimmermann

On 03.05.24 at 17:29, Arnd Bergmann wrote:
> On Fri, Apr 5, 2024, at 11:04, Thomas Zimmermann wrote:
>> Hi,
>>
>> if there are no further comments, can this series be merged through
>> asm-generic?
> Sorry for the delay, I've merged these for asm-generic now.

Thank you so much!

>
>    Arnd

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)



Re: [PATCH] [RFC] scsi: Convert from tasklet to BH workqueue

2024-05-03 Thread Allen Pais



> On May 2, 2024, at 7:03 PM, Michael Ellerman  wrote:
> 
> Allen Pais  writes:
>> The only generic interface to execute asynchronously in the BH context is
>> tasklet; however, it's marked deprecated and has some design flaws. To
>> replace tasklets, BH workqueue support was recently added. A BH workqueue
>> behaves similarly to regular workqueues except that the queued work items
>> are executed in the BH context.
>> 
>> This patch converts drivers/scsi/* from tasklet to BH workqueue.
>> 
>> Based on the work done by Tejun Heo 
>> Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10
>> 
>> Signed-off-by: Allen Pais 
>> ---
>> drivers/scsi/aic7xxx/aic7xxx_osm.c  |  2 +-
>> drivers/scsi/aic94xx/aic94xx_hwi.c  | 14 ++--
>> drivers/scsi/aic94xx/aic94xx_hwi.h  |  5 +-
>> drivers/scsi/aic94xx/aic94xx_scb.c  | 36 +-
>> drivers/scsi/aic94xx/aic94xx_task.c | 14 ++--
>> drivers/scsi/aic94xx/aic94xx_tmf.c  | 34 -
>> drivers/scsi/esas2r/esas2r.h| 12 ++--
>> drivers/scsi/esas2r/esas2r_init.c   | 14 ++--
>> drivers/scsi/esas2r/esas2r_int.c| 18 ++---
>> drivers/scsi/esas2r/esas2r_io.c |  2 +-
>> drivers/scsi/esas2r/esas2r_main.c   | 16 ++---
>> drivers/scsi/ibmvscsi/ibmvfc.c  | 16 ++---
>> drivers/scsi/ibmvscsi/ibmvfc.h  |  3 +-
>> drivers/scsi/ibmvscsi/ibmvscsi.c| 16 ++---
>> drivers/scsi/ibmvscsi/ibmvscsi.h|  3 +-
>> drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c| 15 ++--
>> drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.h|  3 +-
> 
> Something there is giving me a build failure (ppc64le_guest_defconfig):
> 
>  + make -s 'CC=ccache powerpc64le-linux-gnu-gcc' -j 4
>  /linux/drivers/scsi/ibmvscsi/ibmvscsi.c: In function 'ibmvscsi_init_crq_queue':
>  Error: /linux/drivers/scsi/ibmvscsi/ibmvscsi.c:370:331: error: 'ibmvscsi_work' undeclared (first use in this function)
>  /linux/drivers/scsi/ibmvscsi/ibmvscsi.c:370:331: note: each undeclared identifier is reported only once for each function it appears in
>  /linux/scripts/Makefile.build:244: recipe for target 'drivers/scsi/ibmvscsi/ibmvscsi.o' failed
>  /linux/scripts/Makefile.build:485: recipe for target 'drivers/scsi/ibmvscsi' failed
>  /linux/scripts/Makefile.build:485: recipe for target 'drivers/scsi' failed
>  /linux/scripts/Makefile.build:485: recipe for target 'drivers' failed
>  /linux/drivers/scsi/ibmvscsi/ibmvscsi.c: In function 'ibmvscsi_probe':
>  Error: /linux/drivers/scsi/ibmvscsi/ibmvscsi.c:2255:78: error: passing argument 1 of 'kthread_create_on_node' from incompatible pointer type [-Werror=incompatible-pointer-types]
>  In file included from /linux/drivers/scsi/ibmvscsi/ibmvscsi.c:56:0:
>  /linux/include/linux/kthread.h:11:21: note: expected 'int (*)(void *)' but argument is of type 'int (*)(struct work_struct *)'
>   struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
>   ^
>  /linux/drivers/scsi/ibmvscsi/ibmvscsi.c: At top level:
>  Warning: /linux/drivers/scsi/ibmvscsi/ibmvscsi.c:212:13: warning: 'ibmvscsi_task' defined but not used [-Wunused-function]
>   static void ibmvscsi_task(void *data)
>   ^
>  Warning: cc1: warning: unrecognized command line option '-Wno-shift-negative-value'
>  Warning: cc1: warning: unrecognized command line option '-Wno-stringop-overflow'
>  cc1: some warnings being treated as errors
>  make[6]: *** [drivers/scsi/ibmvscsi/ibmvscsi.o] Error 1
>  make[5]: *** [drivers/scsi/ibmvscsi] Error 2
>  make[4]: *** [drivers/scsi] Error 2
>  make[3]: *** [drivers] Error 2
>  make[3]: *** Waiting for unfinished jobs
> 
> Full log here: https://github.com/linuxppc/linux-snowpatch/actions/runs/8930174372/job/24529645923

Thank you for testing it. Unfortunately, I did not cross-compile it.
Will fix this in v2.

- Allen

> 
> Cross compile instructions if you're keen: https://github.com/linuxppc/wiki/wiki/Building-powerpc-kernels
> 
> cheers
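The second error in the log is a type mismatch. A callback converted from a tasklet to a work item takes a `struct work_struct *`, so it can no longer be passed where a kthread entry point of type `int (*)(void *)` is expected. Work callbacks usually recover their driver structure with container_of(). The userspace sketch below assumes that pattern; `fake_host_data`, `srp_work`, the one-field `work_struct` and `fake_thread_fn` are hypothetical, not the ibmvscsi definitions.

```c
#include <stddef.h>

/* Userspace version of the kernel's container_of() (without its type checks). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins; the real struct work_struct has more fields. */
struct work_struct {
    int pending;
};

struct fake_host_data {
    int events_handled;
    struct work_struct srp_work;   /* replaces a struct tasklet_struct */
};

/* Work callbacks receive the work item, not an opaque pointer or cookie. */
void fake_host_work(struct work_struct *work)
{
    struct fake_host_data *hostdata =
        container_of(work, struct fake_host_data, srp_work);

    hostdata->events_handled++;
}

/*
 * A kthread entry point must keep the int (*)(void *) shape; handing it an
 * int (*)(struct work_struct *) instead is what
 * -Werror=incompatible-pointer-types rejects in the log above.
 */
int fake_thread_fn(void *data)
{
    struct fake_host_data *hostdata = data;

    return hostdata->events_handled;
}
```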



Re: [PATCH v4 02/29] x86/mm: add ARCH_PKEY_BITS to Kconfig

2024-05-03 Thread Dave Hansen
On 5/3/24 06:01, Joey Gouly wrote:
> The new config option specifies how many bits are in each PKEY.

Acked-by: Dave Hansen 


Re: [PATCH v4 03/29] mm: use ARCH_PKEY_BITS to define VM_PKEY_BITN

2024-05-03 Thread Dave Hansen
On 5/3/24 06:01, Joey Gouly wrote:
>  #ifdef CONFIG_ARCH_HAS_PKEYS
> -# define VM_PKEY_SHIFT   VM_HIGH_ARCH_BIT_0
> -# define VM_PKEY_BIT0    VM_HIGH_ARCH_0  /* A protection key is a 4-bit value */
> -# define VM_PKEY_BIT1    VM_HIGH_ARCH_1  /* on x86 and 5-bit value on ppc64   */
> -# define VM_PKEY_BIT2    VM_HIGH_ARCH_2
> -# define VM_PKEY_BIT3    VM_HIGH_ARCH_3
> -#ifdef CONFIG_PPC
> +# define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
> +# define VM_PKEY_BIT0  VM_HIGH_ARCH_0
> +# define VM_PKEY_BIT1  VM_HIGH_ARCH_1
> +# define VM_PKEY_BIT2  VM_HIGH_ARCH_2
> +#if CONFIG_ARCH_PKEY_BITS > 3
> +# define VM_PKEY_BIT3  VM_HIGH_ARCH_3
> +#else
> +# define VM_PKEY_BIT3  0
> +#endif
> +#if CONFIG_ARCH_PKEY_BITS > 4

It's certainly not pretty, but it does get the arch #ifdef out of
generic code.  We might need to rethink this if we get another
architecture or two, but this seems manageable for now.

Acked-by: Dave Hansen 


[PATCH v3 3/3] KVM: Mark a vCPU as preempted/ready iff it's scheduled out while running

2024-05-03 Thread David Matlack
Mark a vCPU as preempted/ready if-and-only-if it's scheduled out while
running, i.e. do not mark a vCPU preempted/ready if it's scheduled out
during a non-KVM_RUN ioctl() or when userspace is doing KVM_RUN with
immediate_exit=true.

Commit 54aa83c90198 ("KVM: x86: do not set st->preempted when going back
to user space") stopped marking a vCPU as preempted when returning to
userspace, but if userspace then invokes a KVM vCPU ioctl() that gets
preempted, the vCPU will be marked preempted/ready. This is arguably
incorrect behavior since the vCPU was not actually preempted while the
guest was running; it was preempted while doing something on behalf of
userspace.

This commit also avoids KVM dirtying guest memory after userspace has
paused vCPUs, e.g. for Live Migration, which allows userspace to collect
the final dirty bitmap before or in parallel with saving vCPU state
without having to worry about saving vCPU state triggering writes to
guest memory.
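The change boils down to one extra condition in kvm_sched_out(). The following is a minimal, hypothetical C model of that decision, not KVM code: `struct vcpu_model`, `task_on_rq`, and the helper functions are invented names standing in for `current->on_rq` and `vcpu->wants_to_run`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the state kvm_sched_out() looks at. */
struct vcpu_model {
	bool task_on_rq;   /* models current->on_rq: still runnable, i.e. involuntary switch */
	bool wants_to_run; /* models vcpu->wants_to_run: inside KVM_RUN without immediate_exit */
	bool preempted;
	bool ready;
};

/* Predicate before this patch: any involuntary schedule-out counts. */
static bool marks_preempted_before(const struct vcpu_model *v)
{
	return v->task_on_rq;
}

/* Predicate after this patch: only while the vCPU is in its run loop. */
static bool marks_preempted_after(const struct vcpu_model *v)
{
	return v->task_on_rq && v->wants_to_run;
}

/* Models the body of kvm_sched_out() using the new predicate. */
static void sched_out_model(struct vcpu_model *v)
{
	if (marks_preempted_after(v)) {
		v->preempted = true;
		v->ready = true;
	}
}
```

A vCPU task preempted during, say, a state-saving ioctl() has `task_on_rq` set but `wants_to_run` clear, so under the new predicate it is no longer marked preempted/ready.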

Suggested-by: Sean Christopherson 
Signed-off-by: David Matlack 
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2b29851a90bd..3973e62acc7c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6302,7 +6302,7 @@ static void kvm_sched_out(struct preempt_notifier *pn,
 {
struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
 
-   if (current->on_rq) {
+   if (current->on_rq && vcpu->wants_to_run) {
WRITE_ONCE(vcpu->preempted, true);
WRITE_ONCE(vcpu->ready, true);
}
-- 
2.45.0.rc1.225.g2a3ae87e7f-goog



[PATCH v3 2/3] KVM: Ensure new code that references immediate_exit gets extra scrutiny

2024-05-03 Thread David Matlack
Ensure that any new KVM code that references immediate_exit gets extra
scrutiny by renaming it to immediate_exit__unsafe in kernel code.

All fields in struct kvm_run are subject to TOCTOU races since they are
mapped into userspace, which may be malicious or buggy. To protect KVM,
this commit introduces a new macro that appends __unsafe to field names
in struct kvm_run, hinting to developers and reviewers that accessing
this field must be done carefully.

Apply the new macro to immediate_exit, since userspace can make
immediate_exit inconsistent with vcpu->wants_to_run, i.e. accessing
immediate_exit directly could lead to unexpected bugs in the future.
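Because one translation unit cannot be compiled both with and without `__KERNEL__`, the sketch below expands the two variants of HINT_UNSAFE_IN_KVM() side by side under hypothetical names (`HINT_UNSAFE_KERNEL`, `HINT_UNSAFE_USER`, and two cut-down views of the start of struct kvm_run). It is only meant to illustrate that token pasting renames the field for the kernel while leaving the ABI untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel-side expansion: what HINT_UNSAFE_IN_KVM() does under __KERNEL__. */
#define HINT_UNSAFE_KERNEL(_symbol) _symbol##__unsafe
/* Userspace expansion: the UAPI field name is unchanged. */
#define HINT_UNSAFE_USER(_symbol) _symbol

/* Hypothetical, cut-down copies of the first bytes of struct kvm_run. */
struct kvm_run_kernel_view {
	unsigned char request_interrupt_window;
	unsigned char HINT_UNSAFE_KERNEL(immediate_exit); /* -> immediate_exit__unsafe */
	unsigned char padding1[6];
};

struct kvm_run_user_view {
	unsigned char request_interrupt_window;
	unsigned char HINT_UNSAFE_USER(immediate_exit);   /* -> immediate_exit */
	unsigned char padding1[6];
};

/* The rename must not move the field or change the structure size. */
_Static_assert(offsetof(struct kvm_run_kernel_view, immediate_exit__unsafe) ==
	       offsetof(struct kvm_run_user_view, immediate_exit),
	       "rename must not move the field");
_Static_assert(sizeof(struct kvm_run_kernel_view) ==
	       sizeof(struct kvm_run_user_view),
	       "rename must not change the layout");
```

With this scheme, kernel code that spells `run->immediate_exit` simply fails to compile, so every remaining access has to name `immediate_exit__unsafe` explicitly.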

Signed-off-by: David Matlack 
---
 include/uapi/linux/kvm.h | 15 ++-
 virt/kvm/kvm_main.c      |  2 +-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..3611ad3b9c2a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -192,11 +192,24 @@ struct kvm_xen_exit {
 /* Flags that describe what fields in emulation_failure hold valid data. */
 #define KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES (1ULL << 0)
 
+/*
+ * struct kvm_run can be modified by userspace at any time, so KVM must be
+ * careful to avoid TOCTOU bugs. In order to protect KVM, HINT_UNSAFE_IN_KVM()
+ * renames fields in struct kvm_run from <symbol> to <symbol>__unsafe when
+ * compiled into the kernel, ensuring that any use within KVM is obvious and
+ * gets extra scrutiny.
+ */
+#ifdef __KERNEL__
+#define HINT_UNSAFE_IN_KVM(_symbol) _symbol##__unsafe
+#else
+#define HINT_UNSAFE_IN_KVM(_symbol) _symbol
+#endif
+
 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
 struct kvm_run {
/* in */
__u8 request_interrupt_window;
-   __u8 immediate_exit;
+   __u8 HINT_UNSAFE_IN_KVM(immediate_exit);
__u8 padding1[6];
 
/* out */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bdea5b978f80..2b29851a90bd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4425,7 +4425,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
synchronize_rcu();
put_pid(oldpid);
}
-   vcpu->wants_to_run = !READ_ONCE(vcpu->run->immediate_exit);
+   vcpu->wants_to_run = !READ_ONCE(vcpu->run->immediate_exit__unsafe);
r = kvm_arch_vcpu_ioctl_run(vcpu);
vcpu->wants_to_run = false;
 
-- 
2.45.0.rc1.225.g2a3ae87e7f-goog



[PATCH v3 1/3] KVM: Introduce vcpu->wants_to_run

2024-05-03 Thread David Matlack
Introduce vcpu->wants_to_run to indicate when a vCPU is in its core run
loop, i.e. when the vCPU is running the KVM_RUN ioctl and immediate_exit
was not set.

Replace all references to vcpu->run->immediate_exit with
!vcpu->wants_to_run to avoid TOCTOU races with userspace. For example, a
malicious userspace could invoke KVM_RUN with immediate_exit=true and
then, after KVM reads it to set wants_to_run=false, flip it to false.
This would result in the vCPU running in KVM_RUN with
wants_to_run=false. This wouldn't cause any real bugs today, but it is a
dangerous landmine.
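The fix follows a general "read shared memory once, then act only on a private copy" pattern. Below is a small, hypothetical userspace model of it; `struct run_page`, `struct vcpu_model`, `read_once_u8()`, `arch_run_model()` and `flip_to_false()` are invented for illustration, and `read_once_u8()` is only a rough stand-in for the kernel's READ_ONCE().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Models the mmap()ed struct kvm_run that userspace can write at any time. */
struct run_page {
	unsigned char immediate_exit;
};

/* Models struct kvm_vcpu; wants_to_run is private to the kernel. */
struct vcpu_model {
	struct run_page *run;
	bool wants_to_run;
};

/* Rough stand-in for READ_ONCE(): exactly one load via a volatile access. */
static unsigned char read_once_u8(const volatile unsigned char *p)
{
	return *p;
}

/* Hypothetical arch hook: consults only the snapshot, never run->immediate_exit. */
static int arch_run_model(const struct vcpu_model *vcpu)
{
	if (!vcpu->wants_to_run)
		return -EINTR;
	return 0; /* the real code would enter the guest here */
}

/* Test helper simulating a racing userspace write after KVM's read. */
static void flip_to_false(struct run_page *run)
{
	run->immediate_exit = 0;
}

static int ioctl_run_model(struct vcpu_model *vcpu,
			   void (*racing_userspace)(struct run_page *))
{
	int r;

	/* Read the shared field once; later decisions use the private copy. */
	vcpu->wants_to_run = !read_once_u8(&vcpu->run->immediate_exit);

	if (racing_userspace)
		racing_userspace(vcpu->run);

	r = arch_run_model(vcpu);
	vcpu->wants_to_run = false;
	return r;
}
```

Even when the simulated userspace flips immediate_exit between the read and the arch hook, the decision stays consistent with the snapshot (-EINTR), which is the landmine the patch removes.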

Signed-off-by: David Matlack 
---
 arch/arm64/kvm/arm.c       | 2 +-
 arch/loongarch/kvm/vcpu.c  | 2 +-
 arch/mips/kvm/mips.c       | 2 +-
 arch/powerpc/kvm/powerpc.c | 2 +-
 arch/riscv/kvm/vcpu.c      | 2 +-
 arch/s390/kvm/kvm-s390.c   | 2 +-
 arch/x86/kvm/x86.c         | 4 ++--
 include/linux/kvm_host.h   | 1 +
 virt/kvm/kvm_main.c        | 3 +++
 9 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c4a0a35e02c7..c587e5d9396e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -986,7 +986,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 
vcpu_load(vcpu);
 
-   if (run->immediate_exit) {
+   if (!vcpu->wants_to_run) {
ret = -EINTR;
goto out;
}
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index 3a8779065f73..847ef54f3a84 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -1163,7 +1163,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
kvm_complete_iocsr_read(vcpu, run);
}
 
-   if (run->immediate_exit)
+   if (!vcpu->wants_to_run)
return r;
 
/* Clear exit_reason */
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 231ac052b506..f1a99962027a 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -436,7 +436,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
vcpu->mmio_needed = 0;
}
 
-   if (vcpu->run->immediate_exit)
+   if (!vcpu->wants_to_run)
goto out;
 
lose_fpu(1);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index d32abe7fe6ab..961aadc71de2 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1852,7 +1852,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 
kvm_sigset_activate(vcpu);
 
-   if (run->immediate_exit)
+   if (!vcpu->wants_to_run)
r = -EINTR;
else
r = kvmppc_vcpu_run(vcpu);
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index b5ca9f2e98ac..3d8349470ee6 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -711,7 +711,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
return ret;
}
 
-   if (run->immediate_exit) {
+   if (!vcpu->wants_to_run) {
kvm_vcpu_srcu_read_unlock(vcpu);
return -EINTR;
}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 5147b943a864..b1ea25aacbf9 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -5033,7 +5033,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
if (vcpu->kvm->arch.pv.dumping)
return -EINVAL;
 
-   if (kvm_run->immediate_exit)
+   if (!vcpu->wants_to_run)
return -EINTR;
 
if (kvm_run->kvm_valid_regs & ~KVM_SYNC_S390_VALID_FIELDS ||
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2d2619d3eee4..f70ae1558684 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11396,7 +11396,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 
kvm_vcpu_srcu_read_lock(vcpu);
if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
-   if (kvm_run->immediate_exit) {
+   if (!vcpu->wants_to_run) {
r = -EINTR;
goto out;
}
@@ -11474,7 +11474,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
WARN_ON_ONCE(vcpu->mmio_needed);
}
 
-   if (kvm_run->immediate_exit) {
+   if (!vcpu->wants_to_run) {
r = -EINTR;
goto out;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index afbc99264ffa..f9b9ce0c3cd9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -380,6 +380,7 @@ struct kvm_vcpu {
bool dy_eligible;
} spin_loop;
 #endif
+   bool wants_to_run;
bool preempted;
bool ready;
struct kvm_vcpu_arch arch;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 38b498669ef9..bdea5b978f80 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4425,7 +4425,10 @@ static long kvm_vcpu_ioctl(struct file *filp,
synchronize_rcu();

[PATCH v3 0/3] KVM: Set vcpu->preempted/ready iff scheduled out while running

2024-05-03 Thread David Matlack
This series changes KVM to mark a vCPU as preempted/ready if-and-only-if
it's scheduled out while running, i.e. do not mark a vCPU
preempted/ready if it's scheduled out during a non-KVM_RUN ioctl() or
when userspace is doing KVM_RUN with immediate_exit=true.

This is a logical extension of commit 54aa83c90198 ("KVM: x86: do not
set st->preempted when going back to user space"), which stopped
marking a vCPU as preempted when returning to userspace. But if userspace
invokes a KVM vCPU ioctl() that gets preempted, the vCPU will be marked
preempted/ready. This is arguably incorrect behavior since the vCPU was
not actually preempted while the guest was running; it was preempted
while doing something on behalf of userspace.

In practice, this avoids KVM dirtying guest memory via the steal time
page after userspace has paused vCPUs, e.g. for Live Migration, which
allows userspace to collect the final dirty bitmap before or in parallel
with saving vCPU state without having to worry about saving vCPU state
triggering writes to guest memory.

Patch 1 introduces vcpu->wants_to_run to allow KVM to detect when a vCPU
is in its core run loop.

Patch 2 renames immediate_exit to immediate_exit__unsafe within KVM to
ensure that any new references get extra scrutiny.

Patch 3 leverages vcpu->wants_to_run to constrain when
vcpu->preempted and vcpu->ready are set.

v3:
 - Use READ_ONCE() to read immediate_exit [Sean]
 - Replace use of immediate_exit with !wants_to_run to avoid TOCTOU [Sean]
 - Hide/Rename immediate_exit in KVM to harden against TOCTOU bugs [Sean]

v2: https://lore.kernel.org/kvm/20240307163541.92138-1-dmatl...@google.com/
 - Drop Google-specific "PRODKERNEL: " shortlog prefix [me]

v1: https://lore.kernel.org/kvm/20231218185850.1659570-1-dmatl...@google.com/

David Matlack (3):
  KVM: Introduce vcpu->wants_to_run
  KVM: Ensure new code that references immediate_exit gets extra
scrutiny
  KVM: Mark a vCPU as preempted/ready iff it's scheduled out while
running

 arch/arm64/kvm/arm.c       |  2 +-
 arch/loongarch/kvm/vcpu.c  |  2 +-
 arch/mips/kvm/mips.c       |  2 +-
 arch/powerpc/kvm/powerpc.c |  2 +-
 arch/riscv/kvm/vcpu.c      |  2 +-
 arch/s390/kvm/kvm-s390.c   |  2 +-
 arch/x86/kvm/x86.c         |  4 ++--
 include/linux/kvm_host.h   |  1 +
 include/uapi/linux/kvm.h   | 15 ++-
 virt/kvm/kvm_main.c        |  5 -
 10 files changed, 27 insertions(+), 10 deletions(-)


base-commit: 296655d9bf272cfdd9d2211d099bcb8a61b93037
-- 
2.45.0.rc1.225.g2a3ae87e7f-goog



Re: [RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO

2024-05-03 Thread Shivaprasad G Bhat

On 5/2/24 06:59, Alexey Kardashevskiy wrote:
> On 2/5/24 00:09, Jason Gunthorpe wrote:
>> On Tue, Apr 30, 2024 at 03:05:34PM -0500, Shivaprasad G Bhat wrote:
>>> RFC v1 was posted here [1]. As I was testing more and fixing the
>>> issues, I realized it's cleaner to have the table_group_ops implemented
>>> the way it is done on PowerNV and stop 'borrowing' the DMA windows
>>> for pSeries.
>>>
>>> This patch-set implements the iommu table_group_ops for pSeries for
>>> VFIO SPAPR TCE sub-driver thereby enabling the VFIO support on POWER
>>> pSeries machines.
>>
>> Wait, did they previously not have any support?
>>
>> Again, this TCE stuff needs to go away, not grow. I can grudgingly
>> accept fixing it where it used to work, but not enabling more HW that
>> never worked before! :(
>
> This used to work when I tried last time 2+ years ago, not a new
> stuff. Thanks,

Thanks Alexey for pitching in.

Hi Jason,

As Alexey implied, this used to work in the past.

Support for pSeries VFIO has existed for a long time, and support
for VFIO_SPAPR_TCE_v2_IOMMU was also added with
9d67c9433509 ("powerpc/iommu: Add "borrowing" iommu_table_group_ops").

Commit 090bad39b237a ("powerpc/powernv: Add indirect levels to
it_userspace") broke the userspace view for pSeries, which Patch 6
here tries to bring back.

We found more issues with 9d67c9433509, and I felt it's better to stop
"borrowing" the DMA windows, as that would be cleaner; that is what
Patch 6 does.

In this process we discovered a few bugs upstream as well, and we have
already posted fixes for some of them, such as:
d2d00e15808 powerpc: iommu: Bring back table group release_ownership() call
83b3836bf83 iommu: Allow ops->default_domain to work when !CONFIG_IOMMU_DMA

So, this patch series fixes some more issues (patches 2, 4, and 6),
coupled with some code refactoring (patches 1, 3, 5, and 6) to stop
"borrowing" DMA windows.

We have legacy workloads using VFIO in userspace/KVM guests running
on downstream distro kernels. We want these workloads to be able to
continue running on our arch.

Going forward, we are planning to add IOMMUFD support for PPC64, and
I firmly believe the refactoring in this patch series is a step in
that direction.

Thanks,
Shivaprasad


Re: [PATCH 1/3] powerpc/mm: Align memory_limit value specified using mem= kernel parameter

2024-05-03 Thread Joel Savitz
On Thu, May 2, 2024 at 10:20 PM Michael Ellerman  wrote:
>
> Joel Savitz  writes:
> > On Wed, Apr 17, 2024 at 10:36 AM Joel Savitz  wrote:
> >>
> >> Acked-by: Joel Savitz 
> >>
> >
> > Hi,
> >
> > What is the status of this? This patch fixes a bug where a powerpc
> > machine hangs at boot when passed an unaligned value in the mem=
> > kernel parameter.
>
> It's in linux-next for v6.10
>
> cheers
>

Thanks!

Best,
Joel Savitz



Re: [PATCH v3 1/2] PCI: Add TLP Prefix reading into pcie_read_tlp_log()

2024-05-03 Thread Bjorn Helgaas
On Fri, Apr 12, 2024 at 04:36:34PM +0300, Ilpo Järvinen wrote:
> pcie_read_tlp_log() handles only 4 TLP Header Log DWORDs but TLP Prefix
> Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13) may also be present.
> 
> Generalize pcie_read_tlp_log() and struct pcie_tlp_log to handle also
> TLP Prefix Log. The layout of relevant registers in AER and DPC
> Capability is not identical because the offsets of TLP Header Log and
> TLP Prefix Log vary so the callers must pass the offsets to
> pcie_read_tlp_log().

I think the layouts of the Header Log and the TLP Prefix Log *are*
identical, but they are at different offsets in the AER Capability vs
the DPC Capability.  Lukas and I have both stumbled over this.
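To make that point concrete, here is a hypothetical sketch of a single reader that works for both capabilities because only the base offsets differ. `struct mock_dev`, `mock_read_dword()`, `struct tlp_log_model` and `read_tlp_log_model()` are invented for illustration; they mimic, but are not, pci_read_config_dword(), struct pcie_tlp_log and pcie_read_tlp_log(), and the offsets in the usage are arbitrary rather than the real AER/DPC register offsets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4 KiB config space, indexed in bytes; not the kernel's API. */
struct mock_dev {
	uint8_t cfg[4096];
};

static uint32_t mock_read_dword(const struct mock_dev *dev, int where)
{
	uint32_t v;

	memcpy(&v, &dev->cfg[where], sizeof(v)); /* host-endian, illustration only */
	return v;
}

struct tlp_log_model {
	uint32_t dw[4];     /* TLP Header Log */
	uint32_t prefix[4]; /* TLP Prefix Log */
};

/*
 * One reader for both capabilities: the caller supplies the offset of the
 * Header Log (where) and of the Prefix Log (where2), because those offsets
 * differ between AER and DPC while the register layouts themselves match.
 */
static void read_tlp_log_model(const struct mock_dev *dev, int where, int where2,
			       unsigned int nr_prefix, struct tlp_log_model *log)
{
	memset(log, 0, sizeof(*log));
	for (int i = 0; i < 4; i++)
		log->dw[i] = mock_read_dword(dev, where + i * 4);
	for (unsigned int i = 0; i < nr_prefix && i < 4; i++)
		log->prefix[i] = mock_read_dword(dev, where2 + i * 4);
}
```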

Similar and more comments at:
https://lore.kernel.org/r/20240322193011.GA701027@bhelgaas

> Convert eetlp_prefix_path into integer called eetlp_prefix_max and
> make is available also when CONFIG_PCI_PASID is not configured to
> be able to determine the number of E-E Prefixes.

s/make is/make it/

I think this could be a separate patch.

> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -20,6 +20,7 @@ struct pci_dev;
>  
>  struct pcie_tlp_log {
>   u32 dw[4];
> + u32 prefix[4];
>  };
>  
>  struct aer_capability_regs {
> @@ -37,7 +38,9 @@ struct aer_capability_regs {
>   u16 uncor_err_source;
>  };
>  
> -int pcie_read_tlp_log(struct pci_dev *dev, int where, struct pcie_tlp_log *log);
> +int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
> +   unsigned int tlp_len, struct pcie_tlp_log *log);
> +unsigned int aer_tlp_log_len(struct pci_dev *dev);

I think it was a mistake to expose pcie_read_tlp_log() outside
drivers/pci, and I don't think we should expose aer_tlp_log_len()
either.

We might be stuck with exposing struct pcie_tlp_log since it looks
like ras_event.h uses it.

Bjorn


[PATCH] tty: hvc: hvc_opal: eliminate uses of of_node_put()

2024-05-03 Thread Lu Dai
Make use of the __free() cleanup handler to automatically drop node
references when the variables go out of scope.

As a side effect, this removes the need for a 'goto'.

Signed-off-by: Lu Dai 
---
 drivers/tty/hvc/hvc_opal.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/hvc/hvc_opal.c b/drivers/tty/hvc/hvc_opal.c
index 095c33ad10f8..67e90fa993a3 100644
--- a/drivers/tty/hvc/hvc_opal.c
+++ b/drivers/tty/hvc/hvc_opal.c
@@ -327,14 +327,14 @@ static void udbg_init_opal_common(void)
 
 void __init hvc_opal_init_early(void)
 {
-   struct device_node *stdout_node = of_node_get(of_stdout);
+   struct device_node *stdout_node __free(device_node) = of_node_get(of_stdout);
const __be32 *termno;
const struct hv_ops *ops;
u32 index;
 
/* If the console wasn't in /chosen, try /ibm,opal */
if (!stdout_node) {
-   struct device_node *opal, *np;
+   struct device_node *opal __free(device_node), *np;
 
/* Current OPAL takeover doesn't provide the stdout
 * path, so we hard wire it
@@ -356,7 +356,6 @@ void __init hvc_opal_init_early(void)
break;
}
}
-   of_node_put(opal);
}
if (!stdout_node)
return;
@@ -382,13 +381,11 @@ void __init hvc_opal_init_early(void)
hvsilib_establish(&hvc_opal_boot_priv.hvsi);
pr_devel("hvc_opal: Found HVSI console\n");
} else
-   goto out;
+   return;
hvc_opal_boot_termno = index;
udbg_init_opal_common();
add_preferred_console("hvc", index, NULL);
hvc_instantiate(index, index, ops);
-out:
-   of_node_put(stdout_node);
 }
 
 #ifdef CONFIG_PPC_EARLY_DEBUG_OPAL_RAW
-- 
2.39.2
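The kernel's __free() from <linux/cleanup.h> is built on the GCC/Clang `cleanup` variable attribute. The sketch below shows that underlying mechanism in plain userspace C with a hypothetical refcounted `struct node` and an invented `AUTO_NODE_PUT` macro (loosely analogous to `__free(device_node)`); it is an illustration, not the kernel implementation, and needs GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted node standing in for struct device_node. */
struct node {
	int refcount;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* cleanup() passes a pointer to the variable that is going out of scope. */
static void node_put_cleanup(struct node **np)
{
	node_put(*np);
}

#define AUTO_NODE_PUT __attribute__((cleanup(node_put_cleanup)))

/* Every return path drops the reference, so no 'goto out' is needed. */
static int use_node(struct node *global, int bail_early)
{
	struct node *n AUTO_NODE_PUT = node_get(global);

	if (!n)
		return -1;
	if (bail_early)
		return 0;
	return n->refcount; /* evaluated before the cleanup handler runs */
}
```

Note that the cleanup handler also runs on the uninitialized-looking paths, so a variable carrying such an attribute should hold a valid value (or NULL) before any early return can reach the end of its scope.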



[PATCH v4 01/29] powerpc/mm: add ARCH_PKEY_BITS to Kconfig

2024-05-03 Thread Joey Gouly
The new config option specifies how many bits are in each PKEY.

Signed-off-by: Joey Gouly 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: "Aneesh Kumar K.V" 
Cc: "Naveen N. Rao" 
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/Kconfig | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..6e33e4726856 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -1020,6 +1020,10 @@ config PPC_MEM_KEYS
 
  If unsure, say y.
 
+config ARCH_PKEY_BITS
+   int
+   default 5
+
 config PPC_SECURE_BOOT
prompt "Enable secure boot support"
bool
-- 
2.25.1



[PATCH v4 00/29] arm64: Permission Overlay Extension

2024-05-03 Thread Joey Gouly
Hi all,

This series implements the Permission Overlay Extension introduced in 2022
VMSA enhancements [1]. It is based on v6.9-rc5.

One possible issue with this version: I took the last bit of HWCAP2.

Changes since v3[2]:
- Moved Kconfig to nearer the end of the series
- Reworked MMU fault path to check for POE faults earlier, under the mm lock
- Rework VM_FLAGS to use Kconfig option
- Don't check POR_EL0 in MTE sync tags function
- Reworked KVM to fit into VNCR/VM configuration changes
- Use new AT instruction in KVM
- Rebase onto v6.9-rc5

The Permission Overlay Extension allows constraining permissions on memory
regions. This can be done from userspace (EL0) without a system call or TLB
invalidation.

POE is used to implement the Memory Protection Keys [3] Linux syscall.

The first few patches add the basic framework, then the PKEYS interface is
implemented, and then the selftests are made to work on arm64.

I have tested the modified protection_keys test on x86_64, but not PPC.
I haven't build-tested the x86/ppc arch changes.

Thanks,
Joey

Joey Gouly (29):
  powerpc/mm: add ARCH_PKEY_BITS to Kconfig
  x86/mm: add ARCH_PKEY_BITS to Kconfig
  mm: use ARCH_PKEY_BITS to define VM_PKEY_BITN
  arm64: disable trapping of POR_EL0 to EL2
  arm64: cpufeature: add Permission Overlay Extension cpucap
  arm64: context switch POR_EL0 register
  KVM: arm64: Save/restore POE registers
  KVM: arm64: make kvm_at() take an OP_AT_*
  KVM: arm64: use `at s1e1a` for POE
  arm64: enable the Permission Overlay Extension for EL0
  arm64: re-order MTE VM_ flags
  arm64: add POIndex defines
  arm64: convert protection key into vm_flags and pgprot values
  arm64: mask out POIndex when modifying a PTE
  arm64: handle PKEY/POE faults
  arm64: add pte_access_permitted_no_overlay()
  arm64: implement PKEYS support
  arm64: add POE signal support
  arm64: enable PKEY support for CPUs with S1POE
  arm64: enable POE and PIE to coexist
  arm64/ptrace: add support for FEAT_POE
  arm64: add Permission Overlay Extension Kconfig
  kselftest/arm64: move get_header()
  selftests: mm: move fpregs printing
  selftests: mm: make protection_keys test work on arm64
  kselftest/arm64: add HWCAP test for FEAT_S1POE
  kselftest/arm64: parse POE_MAGIC in a signal frame
  kselftest/arm64: Add test case for POR_EL0 signal frame records
  KVM: selftests: get-reg-list: add Permission Overlay registers

 Documentation/arch/arm64/elf_hwcaps.rst   |   2 +
 arch/arm64/Kconfig|  22 +++
 arch/arm64/include/asm/cpufeature.h   |   6 +
 arch/arm64/include/asm/el2_setup.h|  10 +-
 arch/arm64/include/asm/hwcap.h|   1 +
 arch/arm64/include/asm/kvm_asm.h  |   3 +-
 arch/arm64/include/asm/kvm_host.h |   4 +
 arch/arm64/include/asm/mman.h |   8 +-
 arch/arm64/include/asm/mmu.h  |   1 +
 arch/arm64/include/asm/mmu_context.h  |  51 ++-
 arch/arm64/include/asm/pgtable-hwdef.h|  10 ++
 arch/arm64/include/asm/pgtable-prot.h |   8 +-
 arch/arm64/include/asm/pgtable.h  |  34 -
 arch/arm64/include/asm/pkeys.h| 110 ++
 arch/arm64/include/asm/por.h  |  33 +
 arch/arm64/include/asm/processor.h|   1 +
 arch/arm64/include/asm/sysreg.h   |   3 +
 arch/arm64/include/asm/traps.h|   1 +
 arch/arm64/include/asm/vncr_mapping.h |   1 +
 arch/arm64/include/uapi/asm/hwcap.h   |   1 +
 arch/arm64/include/uapi/asm/sigcontext.h  |   7 +
 arch/arm64/kernel/cpufeature.c|  23 +++
 arch/arm64/kernel/cpuinfo.c   |   1 +
 arch/arm64/kernel/process.c   |  28 
 arch/arm64/kernel/ptrace.c|  46 ++
 arch/arm64/kernel/signal.c|  52 +++
 arch/arm64/kernel/traps.c |  12 +-
 arch/arm64/kvm/hyp/include/hyp/fault.h|   5 +-
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h|  29 
 arch/arm64/kvm/sys_regs.c |   8 +-
 arch/arm64/mm/fault.c |  56 ++-
 arch/arm64/mm/mmap.c  |   9 ++
 arch/arm64/mm/mmu.c   |  40 +
 arch/arm64/tools/cpucaps  |   1 +
 arch/powerpc/Kconfig  |   4 +
 arch/x86/Kconfig  |   4 +
 fs/proc/task_mmu.c|   2 +
 include/linux/mm.h|  20 ++-
 include/uapi/linux/elf.h  |   1 +
 tools/testing/selftests/arm64/abi/hwcap.c |  14 ++
 .../testing/selftests/arm64/signal/.gitignore |   1 +
 .../arm64/signal/testcases/poe_siginfo.c  |  86 +++
 .../arm64/signal/testcases/testcases.c|  27 +---
 .../arm64/signal/testcases/testcases.h|  28 +++-
 .../selftests/kvm/aarch64/get-reg-li

[PATCH v4 02/29] x86/mm: add ARCH_PKEY_BITS to Kconfig

2024-05-03 Thread Joey Gouly
The new config option specifies how many bits are in each PKEY.

Signed-off-by: Joey Gouly 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: x...@kernel.org
---
 arch/x86/Kconfig | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 928820e61cb5..109e767d36e7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1879,6 +1879,10 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 
  If unsure, say y.
 
+config ARCH_PKEY_BITS
+   int
+   default 4
+
 choice
prompt "TSX enable mode"
depends on CPU_SUP_INTEL
-- 
2.25.1



[PATCH v4 03/29] mm: use ARCH_PKEY_BITS to define VM_PKEY_BITN

2024-05-03 Thread Joey Gouly
Use the new CONFIG_ARCH_PKEY_BITS to simplify setting these bits
for different architectures.

Signed-off-by: Joey Gouly 

Cc: Andrew Morton 
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
---
 fs/proc/task_mmu.c |  2 ++
 include/linux/mm.h | 16 ++--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 23fbab954c20..0d152f460dcc 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -692,7 +692,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
[ilog2(VM_PKEY_BIT0)]   = "",
[ilog2(VM_PKEY_BIT1)]   = "",
[ilog2(VM_PKEY_BIT2)]   = "",
+#if VM_PKEY_BIT3
[ilog2(VM_PKEY_BIT3)]   = "",
+#endif
 #if VM_PKEY_BIT4
[ilog2(VM_PKEY_BIT4)]   = "",
 #endif
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b6bdaa18b9e9..5605b938acce 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -329,12 +329,16 @@ extern unsigned int kobjsize(const void *objp);
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
-# define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
-# define VM_PKEY_BIT0  VM_HIGH_ARCH_0  /* A protection key is a 4-bit value */
-# define VM_PKEY_BIT1  VM_HIGH_ARCH_1  /* on x86 and 5-bit value on ppc64   */
-# define VM_PKEY_BIT2  VM_HIGH_ARCH_2
-# define VM_PKEY_BIT3  VM_HIGH_ARCH_3
-#ifdef CONFIG_PPC
+# define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
+# define VM_PKEY_BIT0  VM_HIGH_ARCH_0
+# define VM_PKEY_BIT1  VM_HIGH_ARCH_1
+# define VM_PKEY_BIT2  VM_HIGH_ARCH_2
+#if CONFIG_ARCH_PKEY_BITS > 3
+# define VM_PKEY_BIT3  VM_HIGH_ARCH_3
+#else
+# define VM_PKEY_BIT3  0
+#endif
+#if CONFIG_ARCH_PKEY_BITS > 4
 # define VM_PKEY_BIT4  VM_HIGH_ARCH_4
 #else
 # define VM_PKEY_BIT4  0
-- 
2.25.1
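A standalone sketch of how the new #if ladder resolves for a given CONFIG_ARCH_PKEY_BITS. It assumes the x86 value of 4 from the Kconfig patch and assumes the high VMA flag bits start at bit 32; `VM_PKEY_MASK` and `pkey_to_vm_flags()` are hypothetical helpers added only for the demonstration.

```c
#include <assert.h>
#include <stdint.h>

#ifndef CONFIG_ARCH_PKEY_BITS
#define CONFIG_ARCH_PKEY_BITS 4 /* assumed: x86 value from the Kconfig patch */
#endif

#define VM_HIGH_ARCH_BIT_0 32 /* assumed first high VMA flag bit */
#define VM_HIGH_ARCH_0 (UINT64_C(1) << (VM_HIGH_ARCH_BIT_0 + 0))
#define VM_HIGH_ARCH_1 (UINT64_C(1) << (VM_HIGH_ARCH_BIT_0 + 1))
#define VM_HIGH_ARCH_2 (UINT64_C(1) << (VM_HIGH_ARCH_BIT_0 + 2))
#define VM_HIGH_ARCH_3 (UINT64_C(1) << (VM_HIGH_ARCH_BIT_0 + 3))
#define VM_HIGH_ARCH_4 (UINT64_C(1) << (VM_HIGH_ARCH_BIT_0 + 4))

/* Same shape as the mm.h hunk above. */
#define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
#define VM_PKEY_BIT0  VM_HIGH_ARCH_0
#define VM_PKEY_BIT1  VM_HIGH_ARCH_1
#define VM_PKEY_BIT2  VM_HIGH_ARCH_2
#if CONFIG_ARCH_PKEY_BITS > 3
#define VM_PKEY_BIT3  VM_HIGH_ARCH_3
#else
#define VM_PKEY_BIT3  0
#endif
#if CONFIG_ARCH_PKEY_BITS > 4
#define VM_PKEY_BIT4  VM_HIGH_ARCH_4
#else
#define VM_PKEY_BIT4  0
#endif

/* Hypothetical helper: all key bits that exist on this configuration. */
#define VM_PKEY_MASK (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \
		      VM_PKEY_BIT3 | VM_PKEY_BIT4)

/* Hypothetical helper: encode a protection key into VMA flag bits. */
static uint64_t pkey_to_vm_flags(unsigned int pkey)
{
	return ((uint64_t)pkey << VM_PKEY_SHIFT) & VM_PKEY_MASK;
}
```

An absent bit collapses to 0, which is also why the task_mmu.c hunk guards the VM_PKEY_BIT3 table entry with `#if VM_PKEY_BIT3`, just like the existing VM_PKEY_BIT4 entry.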



[PATCH v4 04/29] arm64: disable trapping of POR_EL0 to EL2

2024-05-03 Thread Joey Gouly
Allow EL0 or EL1 to access POR_EL0 without being trapped to EL2.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
Acked-by: Catalin Marinas 
---
 arch/arm64/include/asm/el2_setup.h | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index b7afaa026842..df5614be4b70 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -184,12 +184,20 @@
 .Lset_pie_fgt_\@:
mrs_s   x1, SYS_ID_AA64MMFR3_EL1
ubfx    x1, x1, #ID_AA64MMFR3_EL1_S1PIE_SHIFT, #4
-   cbz x1, .Lset_fgt_\@
+   cbz x1, .Lset_poe_fgt_\@
 
/* Disable trapping of PIR_EL1 / PIRE0_EL1 */
orr x0, x0, #HFGxTR_EL2_nPIR_EL1
orr x0, x0, #HFGxTR_EL2_nPIRE0_EL1
 
+.Lset_poe_fgt_\@:
+   mrs_s   x1, SYS_ID_AA64MMFR3_EL1
+   ubfx    x1, x1, #ID_AA64MMFR3_EL1_S1POE_SHIFT, #4
+   cbz x1, .Lset_fgt_\@
+
+   /* Disable trapping of POR_EL0 */
+   orr x0, x0, #HFGxTR_EL2_nPOR_EL0
+
 .Lset_fgt_\@:
msr_s   SYS_HFGRTR_EL2, x0
msr_s   SYS_HFGWTR_EL2, x0
-- 
2.25.1
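The assembly above extracts a 4-bit ID register field with `ubfx` and, after this patch, checks POE even when PIE is absent (the `cbz` now branches to `.Lset_poe_fgt` instead of skipping straight to `.Lset_fgt`). Here is a rough C rendering of that control flow. The field shifts are assumptions to be checked against the Arm ARM or the kernel's sysreg definitions, and the `FGT_n*` bit values are hypothetical placeholders, not the real HFGxTR_EL2 bit positions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions in ID_AA64MMFR3_EL1; verify against the sysreg file. */
#define MMFR3_S1PIE_SHIFT 8
#define MMFR3_S1POE_SHIFT 16

/* Hypothetical trap-disable bits standing in for HFGxTR_EL2_n* values. */
#define FGT_nPIR_EL1   (UINT64_C(1) << 0)
#define FGT_nPIRE0_EL1 (UINT64_C(1) << 1)
#define FGT_nPOR_EL0   (UINT64_C(1) << 2)

/* C equivalent of "ubfx x1, x1, #shift, #4": extract an unsigned 4-bit field. */
static unsigned int id_field(uint64_t reg, unsigned int shift)
{
	return (unsigned int)((reg >> shift) & 0xf);
}

/* Each feature check is independent, matching the fixed branch target. */
static uint64_t fgt_untrap_mask(uint64_t mmfr3)
{
	uint64_t x0 = 0;

	if (id_field(mmfr3, MMFR3_S1PIE_SHIFT))
		x0 |= FGT_nPIR_EL1 | FGT_nPIRE0_EL1;
	if (id_field(mmfr3, MMFR3_S1POE_SHIFT))
		x0 |= FGT_nPOR_EL0;
	return x0;
}
```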



[PATCH v4 08/29] KVM: arm64: make kvm_at() take an OP_AT_*

2024-05-03 Thread Joey Gouly
To allow using newer instructions that current assemblers don't know about,
replace the `at` instruction with the underlying SYS instruction.

Signed-off-by: Joey Gouly 
Cc: Marc Zyngier 
Cc: Oliver Upton 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm64/include/asm/kvm_asm.h   | 3 ++-
 arch/arm64/kvm/hyp/include/hyp/fault.h | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 24b5e6b23417..ce65fd0f01b0 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define ARM_EXIT_WITH_SERROR_BIT  31
 #define ARM_EXCEPTION_CODE(x)((x) & ~(1U << ARM_EXIT_WITH_SERROR_BIT))
@@ -261,7 +262,7 @@ extern u64 __kvm_get_mdcr_el2(void);
asm volatile(   \
"   mrs %1, spsr_el2\n" \
"   mrs %2, elr_el2\n"  \
-   "1: at  "at_op", %3\n"  \
+   "1: " __msr_s(at_op, "%3") "\n" \
"   isb\n"  \
"   b   9f\n"   \
"2: msr spsr_el2, %1\n" \
diff --git a/arch/arm64/kvm/hyp/include/hyp/fault.h b/arch/arm64/kvm/hyp/include/hyp/fault.h
index 9e13c1bc2ad5..487c06099d6f 100644
--- a/arch/arm64/kvm/hyp/include/hyp/fault.h
+++ b/arch/arm64/kvm/hyp/include/hyp/fault.h
@@ -27,7 +27,7 @@ static inline bool __translate_far_to_hpfar(u64 far, u64 *hpfar)
 * saved the guest context yet, and we may return early...
 */
par = read_sysreg_par();
-   if (!__kvm_at("s1e1r", far))
+   if (!__kvm_at(OP_AT_S1E1R, far))
tmp = read_sysreg_par();
else
tmp = SYS_PAR_EL1_F; /* back to the guest */
-- 
2.25.1
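As far as we understand the sysreg helpers, an `at` operation is a SYS instruction with op0 = 1, so emitting it through `__msr_s()` yields the same machine code an assembler would, even if the assembler does not know the mnemonic. The sketch below reproduces that field packing in plain C; `sys_insn()` and `msr_s_insn()` are illustrative re-implementations, not the kernel macros themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Field packing used by the kernel's sys_reg()/sys_insn() style encodings. */
static uint32_t sys_insn(uint32_t op0, uint32_t op1, uint32_t crn,
			 uint32_t crm, uint32_t op2)
{
	return (op0 << 19) | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5);
}

/* MSR/SYS base opcode OR'd with the encoding and the GPR number (Rt). */
static uint32_t msr_s_insn(uint32_t sysreg, uint32_t rt)
{
	return 0xd5000000u | sysreg | rt;
}
```

For example, `at s1e1r, x0` is SYS #0, C7, C8, #0 with Rt = x0.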



[PATCH v4 06/29] arm64: context switch POR_EL0 register

2024-05-03 Thread Joey Gouly
POR_EL0 is a register that can be modified by userspace directly,
so it must be context switched.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm64/include/asm/cpufeature.h |  6 ++
 arch/arm64/include/asm/processor.h  |  1 +
 arch/arm64/include/asm/sysreg.h |  3 +++
 arch/arm64/kernel/process.c | 28 
 4 files changed, 38 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 8b904a757bd3..d46aab23e06e 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -832,6 +832,12 @@ static inline bool system_supports_lpa2(void)
return cpus_have_final_cap(ARM64_HAS_LPA2);
 }
 
+static inline bool system_supports_poe(void)
+{
+   return IS_ENABLED(CONFIG_ARM64_POE) &&
+   alternative_has_cap_unlikely(ARM64_HAS_S1POE);
+}
+
 int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
 bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
 
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index f77371232d8c..e6376f979273 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -184,6 +184,7 @@ struct thread_struct {
u64 sctlr_user;
u64 svcr;
u64 tpidr2_el0;
+   u64 por_el0;
 };
 
 static inline unsigned int thread_get_vl(struct thread_struct *thread,
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 9e8999592f3a..62c399811dbf 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -1064,6 +1064,9 @@
 #define POE_RXW    UL(0x7)
 #define POE_MASK   UL(0xf)
 
+/* Initial value for Permission Overlay Extension for EL0 */
+#define POR_EL0_INIT   POE_RXW
+
 #define ARM64_FEATURE_FIELD_BITS   4
 
 /* Defined for compatibility only, do not add new users. */
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 4ae31b7af6c3..0ffaca98bed6 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -271,12 +271,23 @@ static void flush_tagged_addr_state(void)
clear_thread_flag(TIF_TAGGED_ADDR);
 }
 
+static void flush_poe(void)
+{
+   if (!system_supports_poe())
+   return;
+
+   write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
+   /* ISB required for kernel uaccess routines when changing POR_EL0 */
+   isb();
+}
+
 void flush_thread(void)
 {
fpsimd_flush_thread();
tls_thread_flush();
flush_ptrace_hw_breakpoint(current);
flush_tagged_addr_state();
+   flush_poe();
 }
 
 void arch_release_task_struct(struct task_struct *tsk)
@@ -371,6 +382,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
if (system_supports_tpidr2())
p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
 
+   if (system_supports_poe())
+   p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
+
if (stack_start) {
if (is_compat_thread(task_thread_info(p)))
childregs->compat_sp = stack_start;
@@ -495,6 +509,19 @@ static void erratum_1418040_new_exec(void)
preempt_enable();
 }
 
+static void permission_overlay_switch(struct task_struct *next)
+{
+   if (!system_supports_poe())
+   return;
+
+   current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
+   if (current->thread.por_el0 != next->thread.por_el0) {
+   write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
+   /* ISB required for kernel uaccess routines when changing POR_EL0 */
+   isb();
+   }
+}
+
 /*
  * __switch_to() checks current->thread.sctlr_user as an optimisation. Therefore
  * this function must be called with preemption disabled and the update to
@@ -530,6 +557,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
ssbs_thread_switch(next);
erratum_1418040_thread_switch(next);
ptrauth_thread_switch_user(next);
+   permission_overlay_switch(next);
 
/*
 * Complete any pending TLB or cache maintenance on this CPU in case
-- 
2.25.1
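For reference, POR_EL0 holds one 4-bit permission field per protection key, so POR_EL0_INIT (POE_RXW) grants read/write/execute to key 0 and no access to any other key; that is the state flush_poe() restores on exec. The userspace sketch below decodes such a value. It assumes the field encoding R = 0x1, X = 0x2, W = 0x4, with key n at bits [4n+3:4n]; the POE_* names mirror constants used by the series but are redefined locally, and por_perm() and the por_allows_*() helpers are hypothetical stand-ins for the por_elx_allows_*() functions used later in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed POR_ELx encoding: one 4-bit permission field per protection key. */
#define POR_BITS_PER_PKEY 4
#define POE_NONE 0x0u
#define POE_R    0x1u
#define POE_X    0x2u
#define POE_W    0x4u
#define POE_RXW  (POE_R | POE_X | POE_W)

/* Hypothetical helper: return the permission field for @pkey (0..7). */
static unsigned int por_perm(uint64_t por, unsigned int pkey)
{
	assert(pkey < 8);
	return (unsigned int)((por >> (pkey * POR_BITS_PER_PKEY)) & 0xf);
}

static bool por_allows_read(uint64_t por, unsigned int pkey)
{
	return por_perm(por, pkey) & POE_R;
}

static bool por_allows_write(uint64_t por, unsigned int pkey)
{
	return por_perm(por, pkey) & POE_W;
}

static bool por_allows_exec(uint64_t por, unsigned int pkey)
{
	return por_perm(por, pkey) & POE_X;
}
```

Under these assumptions, a value of POE_R << 4 would make key 1 read-only while leaving key 0 (and every other key) with no access at all.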



[PATCH v4 05/29] arm64: cpufeature: add Permission Overlay Extension cpucap

2024-05-03 Thread Joey Gouly
This indicates if the system supports POE. This is a CPUCAP_BOOT_CPU_FEATURE
as the boot CPU will enable POE if it has it, so secondary CPUs must also
have this feature.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm64/kernel/cpufeature.c | 9 +
 arch/arm64/tools/cpucaps   | 1 +
 2 files changed, 10 insertions(+)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 56583677c1f2..2f3c2346e156 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2861,6 +2861,15 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.matches = has_nv1,
ARM64_CPUID_FIELDS_NEG(ID_AA64MMFR4_EL1, E2H0, NI_NV1)
},
+#ifdef CONFIG_ARM64_POE
+   {
+   .desc = "Stage-1 Permission Overlay Extension (S1POE)",
+   .capability = ARM64_HAS_S1POE,
+   .type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
+   .matches = has_cpuid_feature,
+   ARM64_CPUID_FIELDS(ID_AA64MMFR3_EL1, S1POE, IMP)
+   },
+#endif
{},
 };
 
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 62b2838a231a..45f558fc0d87 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -45,6 +45,7 @@ HAS_MOPS
 HAS_NESTED_VIRT
 HAS_PAN
 HAS_S1PIE
+HAS_S1POE
 HAS_RAS_EXTN
 HAS_RNG
 HAS_SB
-- 
2.25.1
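ARM64_CPUID_FIELDS(ID_AA64MMFR3_EL1, S1POE, IMP) asks has_cpuid_feature() to compare one 4-bit ID register field against a minimum value. The sketch below models that check in plain C. It assumes S1POE occupies bits [19:16] of ID_AA64MMFR3_EL1, that IMP is 0b0001 (the value the HWCAP documentation later in this series cites), and that the field is treated as unsigned so any value at or above IMP matches; id_field() and has_s1poe() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field position and "implemented" value for ID_AA64MMFR3_EL1.S1POE. */
#define ID_AA64MMFR3_EL1_S1POE_SHIFT 16
#define ID_AA64MMFR3_EL1_S1POE_IMP   0x1u

/* Extract an unsigned 4-bit ID register field starting at @shift. */
static unsigned int id_field(uint64_t reg, unsigned int shift)
{
	assert(shift <= 60);
	return (unsigned int)((reg >> shift) & 0xf);
}

/* Unsigned fields match when the value is at least the minimum. */
static bool has_s1poe(uint64_t id_aa64mmfr3)
{
	return id_field(id_aa64mmfr3, ID_AA64MMFR3_EL1_S1POE_SHIFT) >=
	       ID_AA64MMFR3_EL1_S1POE_IMP;
}
```

Because the capability is ARM64_CPUCAP_BOOT_CPU_FEATURE, the boot CPU's answer to this check decides whether POE is used, and late secondary CPUs lacking the field are then rejected.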



[PATCH v4 07/29] KVM: arm64: Save/restore POE registers

2024-05-03 Thread Joey Gouly
Define the new system registers that POE introduces and context switch them.

Signed-off-by: Joey Gouly 
Cc: Marc Zyngier 
Cc: Oliver Upton 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm64/include/asm/kvm_host.h  |  4 +++
 arch/arm64/include/asm/vncr_mapping.h  |  1 +
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 29 ++
 arch/arm64/kvm/sys_regs.c  |  8 --
 4 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 9e8a496fb284..28042da0befd 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -419,6 +419,8 @@ enum vcpu_sysreg {
GCR_EL1,/* Tag Control Register */
TFSRE0_EL1, /* Tag Fault Status Register (EL0) */
 
+   POR_EL0,/* Permission Overlay Register 0 (EL0) */
+
/* 32bit specific registers. */
DACR32_EL2, /* Domain Access Control Register */
IFSR32_EL2, /* Instruction Fault Status Register */
@@ -489,6 +491,8 @@ enum vcpu_sysreg {
VNCR(PIR_EL1),   /* Permission Indirection Register 1 (EL1) */
VNCR(PIRE0_EL1), /*  Permission Indirection Register 0 (EL1) */
 
+   VNCR(POR_EL1),  /* Permission Overlay Register 1 (EL1) */
+
VNCR(HFGRTR_EL2),
VNCR(HFGWTR_EL2),
VNCR(HFGITR_EL2),
diff --git a/arch/arm64/include/asm/vncr_mapping.h b/arch/arm64/include/asm/vncr_mapping.h
index df2c47c55972..06f8ec0906a6 100644
--- a/arch/arm64/include/asm/vncr_mapping.h
+++ b/arch/arm64/include/asm/vncr_mapping.h
@@ -52,6 +52,7 @@
 #define VNCR_PIRE0_EL1 0x290
 #define VNCR_PIRE0_EL2 0x298
 #define VNCR_PIR_EL1   0x2A0
+#define VNCR_POR_EL1   0x2A8
 #define VNCR_ICH_LR0_EL2   0x400
 #define VNCR_ICH_LR1_EL2   0x408
 #define VNCR_ICH_LR2_EL2   0x410
diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index 4be6a7fa0070..1c9536557bae 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -16,9 +16,15 @@
 #include 
 #include 
 
+static inline bool ctxt_has_s1poe(struct kvm_cpu_context *ctxt);
+
 static inline void __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
 {
ctxt_sys_reg(ctxt, MDSCR_EL1)   = read_sysreg(mdscr_el1);
+
+   // POR_EL0 can affect uaccess, so must be saved/restored early.
+   if (ctxt_has_s1poe(ctxt))
+   ctxt_sys_reg(ctxt, POR_EL0) = read_sysreg_s(SYS_POR_EL0);
 }
 
 static inline void __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
@@ -55,6 +61,17 @@ static inline bool ctxt_has_s1pie(struct kvm_cpu_context *ctxt)
return kvm_has_feat(kern_hyp_va(vcpu->kvm), ID_AA64MMFR3_EL1, S1PIE, IMP);
 }
 
+static inline bool ctxt_has_s1poe(struct kvm_cpu_context *ctxt)
+{
+   struct kvm_vcpu *vcpu;
+
+   if (!system_supports_poe())
+   return false;
+
+   vcpu = ctxt_to_vcpu(ctxt);
+   return kvm_has_feat(kern_hyp_va(vcpu->kvm), ID_AA64MMFR3_EL1, S1POE, IMP);
+}
+
 static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 {
ctxt_sys_reg(ctxt, SCTLR_EL1)   = read_sysreg_el1(SYS_SCTLR);
@@ -77,6 +94,10 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
ctxt_sys_reg(ctxt, PIR_EL1) = read_sysreg_el1(SYS_PIR);
ctxt_sys_reg(ctxt, PIRE0_EL1)   = read_sysreg_el1(SYS_PIRE0);
}
+
+   if (ctxt_has_s1poe(ctxt))
+   ctxt_sys_reg(ctxt, POR_EL1) = read_sysreg_el1(SYS_POR);
+
ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg_par();
ctxt_sys_reg(ctxt, TPIDR_EL1)   = read_sysreg(tpidr_el1);
 
@@ -107,6 +128,10 @@ static inline void __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
 static inline void __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
 {
write_sysreg(ctxt_sys_reg(ctxt, MDSCR_EL1),  mdscr_el1);
+
+   // POR_EL0 can affect uaccess, so must be saved/restored early.
+   if (ctxt_has_s1poe(ctxt))
+   write_sysreg_s(ctxt_sys_reg(ctxt, POR_EL0), SYS_POR_EL0);
 }
 
 static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
@@ -153,6 +178,10 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
write_sysreg_el1(ctxt_sys_reg(ctxt, PIR_EL1),   SYS_PIR);
write_sysreg_el1(ctxt_sys_reg(ctxt, PIRE0_EL1), SYS_PIRE0);
}
+
+   if (ctxt_has_s1poe(ctxt))
+   write_sysreg_el1(ctxt_sys_reg(ctxt, POR_EL1),   SYS_POR);
+
write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),   par_el1);
write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1), tpidr_el1);
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index c9f4f387155f..be04fae35afb 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2423,6 +2423,7 @@ static con

[PATCH v4 17/29] arm64: implement PKEYS support

2024-05-03 Thread Joey Gouly
Implement the PKEYS interface, using the Permission Overlay Extension.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm64/include/asm/mmu.h |   1 +
 arch/arm64/include/asm/mmu_context.h |  51 -
 arch/arm64/include/asm/pgtable.h |  22 +-
 arch/arm64/include/asm/pkeys.h   | 110 +++
 arch/arm64/include/asm/por.h |  33 
 arch/arm64/mm/mmu.c  |  40 ++
 6 files changed, 255 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/include/asm/pkeys.h
 create mode 100644 arch/arm64/include/asm/por.h

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 65977c7783c5..983afeb4eba5 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -25,6 +25,7 @@ typedef struct {
refcount_t  pinned;
void*vdso;
unsigned long   flags;
+   u8  pkey_allocation_map;
 } mm_context_t;
 
 /*
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index c768d16b81a4..cb499db7a97b 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -15,12 +15,12 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -175,9 +175,36 @@ init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 {
atomic64_set(&mm->context.id, 0);
refcount_set(&mm->context.pinned, 0);
+
+   /* pkey 0 is the default, so always reserve it. */
+   mm->context.pkey_allocation_map = 0x1;
+
+   return 0;
+}
+
+static inline void arch_dup_pkeys(struct mm_struct *oldmm,
+ struct mm_struct *mm)
+{
+   /* Duplicate the oldmm pkey state in mm: */
+   mm->context.pkey_allocation_map = oldmm->context.pkey_allocation_map;
+}
+
+static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+   arch_dup_pkeys(oldmm, mm);
+
return 0;
 }
 
+static inline void arch_exit_mmap(struct mm_struct *mm)
+{
+}
+
+static inline void arch_unmap(struct mm_struct *mm,
+   unsigned long start, unsigned long end)
+{
+}
+
 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
 static inline void update_saved_ttbr0(struct task_struct *tsk,
  struct mm_struct *mm)
@@ -267,6 +294,28 @@ static inline unsigned long mm_untag_mask(struct mm_struct *mm)
return -1UL >> 8;
 }
 
+/*
+ * We only want to enforce protection keys on the current process
+ * because we effectively have no access to POR_EL0 for other
+ * processes or any way to tell *which* POR_EL0 in a threaded
+ * process we could use.
+ *
+ * So do not enforce things if the VMA is not from the current
+ * mm, or if we are in a kernel thread.
+ */
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
+   bool write, bool execute, bool foreign)
+{
+   if (!arch_pkeys_enabled())
+   return true;
+
+   /* allow access if the VMA is not one from this process */
+   if (foreign || vma_is_foreign(vma))
+   return true;
+
+   return por_el0_allows_pkey(vma_pkey(vma), write, execute);
+}
+
 #include 
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 2449e4e27ea6..8ee68ff03016 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -34,6 +34,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -153,6 +154,24 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
 #define pte_accessible(mm, pte)\
(mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid(pte))
 
+static inline bool por_el0_allows_pkey(u8 pkey, bool write, bool execute)
+{
+   u64 por;
+
+   if (!system_supports_poe())
+   return true;
+
+   por = read_sysreg_s(SYS_POR_EL0);
+
+   if (write)
+   return por_elx_allows_write(por, pkey);
+
+   if (execute)
+   return por_elx_allows_exec(por, pkey);
+
+   return por_elx_allows_read(por, pkey);
+}
+
 /*
  * p??_access_permitted() is true for valid user mappings (PTE_USER
  * bit set, subject to the write permission check). For execute-only
@@ -163,7 +182,8 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
 #define pte_access_permitted_no_overlay(pte, write) \
(((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) && (!(write) || pte_write(pte)))
 #define pte_access_permitted(pte, write) \
-   pte_access_permitted_no_overlay(pte, write)
+   (pte_access_permitted_no_overlay(pte, write) && \
+   por_el0_allows_pkey(FIELD_GET(PTE_PO_IDX_MASK, pte_val(pte)), write, false))
 #define pmd_access_permitted(pmd, write) \
(pte_access_permitted(pmd_pte(pmd), (write)))
 #define pud_access_permitted(pud, 

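init_new_context() above seeds pkey_allocation_map with 0x1 so that key 0 is never handed out, arch_dup_pkeys() copies the map on fork, and the 3-bit POIndex gives eight keys that fit exactly in the u8. The allocator itself lives in the new pkeys.h, which this excerpt truncates, so the following is only a hypothetical model of a lowest-free-bit allocator over such a map; pkey_alloc_bit(), pkey_free_bit() and NR_PKEYS are illustrative names, not the series' code.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PKEYS 8 /* assumed: 3-bit POIndex -> 8 keys */

/* Hypothetical: allocate the lowest free key, or return -1 if none. */
static int pkey_alloc_bit(uint8_t *map)
{
	assert(map);
	for (int pkey = 0; pkey < NR_PKEYS; pkey++) {
		if (!(*map & (1u << pkey))) {
			*map |= (uint8_t)(1u << pkey);
			return pkey;
		}
	}
	return -1;
}

/* Hypothetical: release @pkey; key 0 is reserved and never freed. */
static int pkey_free_bit(uint8_t *map, int pkey)
{
	if (pkey <= 0 || pkey >= NR_PKEYS || !(*map & (1u << pkey)))
		return -1;
	*map &= (uint8_t)~(1u << pkey);
	return 0;
}
```

Starting from 0x1, the first two allocations in this model return keys 1 and 2, and once all eight bits are set (0xff) further allocations fail.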
[PATCH v4 18/29] arm64: add POE signal support

2024-05-03 Thread Joey Gouly
Add PKEY support to signals by saving POR_EL0 to, and restoring it from, the signal stack frame.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
Reviewed-by: Mark Brown 
Acked-by: Szabolcs Nagy 
---
 arch/arm64/include/uapi/asm/sigcontext.h |  7 
 arch/arm64/kernel/signal.c   | 52 
 2 files changed, 59 insertions(+)

diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h
index 8a45b7a411e0..e4cba8a6c9a2 100644
--- a/arch/arm64/include/uapi/asm/sigcontext.h
+++ b/arch/arm64/include/uapi/asm/sigcontext.h
@@ -98,6 +98,13 @@ struct esr_context {
__u64 esr;
 };
 
+#define POE_MAGIC  0x504f4530
+
+struct poe_context {
+   struct _aarch64_ctx head;
+   __u64 por_el0;
+};
+
 /*
  * extra_context: describes extra space in the signal frame for
  * additional structures that don't fit in sigcontext.__reserved[].
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 4a77f4976e11..077436a8bc10 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -63,6 +63,7 @@ struct rt_sigframe_user_layout {
unsigned long fpmr_offset;
unsigned long extra_offset;
unsigned long end_offset;
+   unsigned long poe_offset;
 };
 
 #define BASE_SIGFRAME_SIZE round_up(sizeof(struct rt_sigframe), 16)
@@ -185,6 +186,8 @@ struct user_ctxs {
u32 zt_size;
struct fpmr_context __user *fpmr;
u32 fpmr_size;
+   struct poe_context __user *poe;
+   u32 poe_size;
 };
 
 static int preserve_fpsimd_context(struct fpsimd_context __user *ctx)
@@ -258,6 +261,21 @@ static int restore_fpmr_context(struct user_ctxs *user)
return err;
 }
 
+static int restore_poe_context(struct user_ctxs *user)
+{
+   u64 por_el0;
+   int err = 0;
+
+   if (user->poe_size != sizeof(*user->poe))
+   return -EINVAL;
+
+   __get_user_error(por_el0, &(user->poe->por_el0), err);
+   if (!err)
+   write_sysreg_s(por_el0, SYS_POR_EL0);
+
+   return err;
+}
+
 #ifdef CONFIG_ARM64_SVE
 
 static int preserve_sve_context(struct sve_context __user *ctx)
@@ -621,6 +639,7 @@ static int parse_user_sigframe(struct user_ctxs *user,
user->za = NULL;
user->zt = NULL;
user->fpmr = NULL;
+   user->poe = NULL;
 
if (!IS_ALIGNED((unsigned long)base, 16))
goto invalid;
@@ -671,6 +690,17 @@ static int parse_user_sigframe(struct user_ctxs *user,
/* ignore */
break;
 
+   case POE_MAGIC:
+   if (!system_supports_poe())
+   goto invalid;
+
+   if (user->poe)
+   goto invalid;
+
+   user->poe = (struct poe_context __user *)head;
+   user->poe_size = size;
+   break;
+
case SVE_MAGIC:
if (!system_supports_sve() && !system_supports_sme())
goto invalid;
@@ -857,6 +887,9 @@ static int restore_sigframe(struct pt_regs *regs,
if (err == 0 && system_supports_sme2() && user.zt)
err = restore_zt_context(&user);
 
+   if (err == 0 && system_supports_poe() && user.poe)
+   err = restore_poe_context(&user);
+
return err;
 }
 
@@ -980,6 +1013,13 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
return err;
}
 
+   if (system_supports_poe()) {
+   err = sigframe_alloc(user, &user->poe_offset,
+sizeof(struct poe_context));
+   if (err)
+   return err;
+   }
+
return sigframe_alloc_end(user);
 }
 
@@ -1020,6 +1060,15 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user,
__put_user_error(current->thread.fault_code, &esr_ctx->esr, err);
}
 
+   if (system_supports_poe() && err == 0 && user->poe_offset) {
+   struct poe_context __user *poe_ctx =
+   apply_user_offset(user, user->poe_offset);
+
+   __put_user_error(POE_MAGIC, &poe_ctx->head.magic, err);
+   __put_user_error(sizeof(*poe_ctx), &poe_ctx->head.size, err);
+   __put_user_error(read_sysreg_s(SYS_POR_EL0), &poe_ctx->por_el0, err);
+   }
+
/* Scalable Vector Extension state (including streaming), if present */
if ((system_supports_sve() || system_supports_sme()) &&
err == 0 && user->sve_offset) {
@@ -1178,6 +1227,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
sme_smstop();
}
 
+   if (system_supports_poe())
+   write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
+
if (ka->sa.sa_flags & SA_RESTORER)
sigtramp = ka->sa.sa_restorer;
else
-- 
2.25.1
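The signal frame is a sequence of _aarch64_ctx records, each a 32-bit magic followed by a 32-bit size, ending in a zero-magic terminator; POE_MAGIC (0x504f4530) spells "POE0" in ASCII. The sketch below is a simplified userspace model of locating a poe_context in such a list, rejecting a duplicate record (as parse_user_sigframe() does) and a wrong-sized one (as restore_poe_context() does). The local struct definitions copy the UAPI layout shown above; find_poe() is a hypothetical helper that ignores alignment checks and the extra_context indirection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POE_MAGIC 0x504f4530u /* "POE0" */

/* Local copies of the UAPI record layout (struct _aarch64_ctx et al.). */
struct aarch64_ctx {
	uint32_t magic;
	uint32_t size;
};

struct poe_context {
	struct aarch64_ctx head;
	uint64_t por_el0;
};

/*
 * Hypothetical helper: walk @len bytes of records until the zero-magic
 * terminator and copy POR_EL0 out of the single POE record.
 * Returns 0 if found, 1 if absent, -1 if the frame is malformed.
 */
static int find_poe(const unsigned char *buf, size_t len, uint64_t *por)
{
	size_t off = 0, poe_off = 0;
	bool found = false;

	assert(buf && por);
	while (off + sizeof(struct aarch64_ctx) <= len) {
		struct aarch64_ctx head;

		memcpy(&head, buf + off, sizeof(head));
		if (head.magic == 0)
			break;			/* terminator record */
		if (head.size < sizeof(head) || head.size > len - off)
			return -1;		/* record overruns the buffer */
		if (head.magic == POE_MAGIC) {
			if (found || head.size != sizeof(struct poe_context))
				return -1;	/* duplicate or wrong size */
			found = true;
			poe_off = off;
		}
		off += head.size;
	}
	if (!found)
		return 1;
	memcpy(por, buf + poe_off + offsetof(struct poe_context, por_el0),
	       sizeof(*por));
	return 0;
}
```

Copying through memcpy() rather than casting keeps the model well-defined even if a record were misaligned in the buffer.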


[PATCH v4 20/29] arm64: enable POE and PIE to coexist

2024-05-03 Thread Joey Gouly
Set the EL0/userspace indirection encodings to be the overlay enabled
variants of the permissions.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm64/include/asm/pgtable-prot.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index dd9ee67d1d87..4f9f85437d3d 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -147,10 +147,10 @@ static inline bool __pure lpa2_is_enabled(void)
 
 #define PIE_E0 ( \
PIRx_ELx_PERM(pte_pi_index(_PAGE_EXECONLY),  PIE_X_O) | \
-   PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY_EXEC), PIE_RX)  | \
-   PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED_EXEC),   PIE_RWX) | \
-   PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY),  PIE_R)   | \
-   PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED),PIE_RW))
+   PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY_EXEC), PIE_RX_O)  | \
+   PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED_EXEC),   PIE_RWX_O) | \
+   PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY),  PIE_R_O)   | \
+   PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED),PIE_RW_O))
 
 #define PIE_E1 ( \
PIRx_ELx_PERM(pte_pi_index(_PAGE_EXECONLY),  PIE_NONE_O) | \
-- 
2.25.1



[PATCH v4 19/29] arm64: enable PKEY support for CPUs with S1POE

2024-05-03 Thread Joey Gouly
Now that PKEYs support has been implemented, enable it for CPUs that
support S1POE.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
Acked-by: Catalin Marinas 
---
 arch/arm64/include/asm/pkeys.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pkeys.h b/arch/arm64/include/asm/pkeys.h
index a284508a4d02..3ea928ec94c0 100644
--- a/arch/arm64/include/asm/pkeys.h
+++ b/arch/arm64/include/asm/pkeys.h
@@ -17,7 +17,7 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 
 static inline bool arch_pkeys_enabled(void)
 {
-   return false;
+   return system_supports_poe();
 }
 
 static inline int vma_pkey(struct vm_area_struct *vma)
-- 
2.25.1



[PATCH v4 09/29] KVM: arm64: use `at s1e1a` for POE

2024-05-03 Thread Joey Gouly
FEAT_ATS1E1A introduces a new instruction: `at s1e1a`.
This is an address translation, without permission checks.

POE allows read permissions to be removed from S1 by the guest. This means
that an `at s1e1r` instruction could fail and therefore not return the IPA.

Switch to using `at s1e1a` so that KVM can get the IPA regardless of S1
permissions.

Signed-off-by: Joey Gouly 
Cc: Marc Zyngier 
Cc: Oliver Upton 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm64/kvm/hyp/include/hyp/fault.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/hyp/fault.h b/arch/arm64/kvm/hyp/include/hyp/fault.h
index 487c06099d6f..17df94570f03 100644
--- a/arch/arm64/kvm/hyp/include/hyp/fault.h
+++ b/arch/arm64/kvm/hyp/include/hyp/fault.h
@@ -14,6 +14,7 @@
 
 static inline bool __translate_far_to_hpfar(u64 far, u64 *hpfar)
 {
+   int ret;
u64 par, tmp;
 
/*
@@ -27,7 +28,9 @@ static inline bool __translate_far_to_hpfar(u64 far, u64 *hpfar)
 * saved the guest context yet, and we may return early...
 */
par = read_sysreg_par();
-   if (!__kvm_at(OP_AT_S1E1R, far))
+   ret = system_supports_poe() ? __kvm_at(OP_AT_S1E1A, far) :
+ __kvm_at(OP_AT_S1E1R, far);
+   if (!ret)
tmp = read_sysreg_par();
else
tmp = SYS_PAR_EL1_F; /* back to the guest */
-- 
2.25.1



[PATCH v4 10/29] arm64: enable the Permission Overlay Extension for EL0

2024-05-03 Thread Joey Gouly
Expose a HWCAP and ID_AA64MMFR3_EL1_S1POE to userspace, so they can be used to
check if the CPU supports the feature.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
---

This takes the last bit of HWCAP2. Is this fine? What can we do about more
features in the future?


 Documentation/arch/arm64/elf_hwcaps.rst |  2 ++
 arch/arm64/include/asm/hwcap.h  |  1 +
 arch/arm64/include/uapi/asm/hwcap.h |  1 +
 arch/arm64/kernel/cpufeature.c  | 14 ++
 arch/arm64/kernel/cpuinfo.c |  1 +
 5 files changed, 19 insertions(+)

diff --git a/Documentation/arch/arm64/elf_hwcaps.rst b/Documentation/arch/arm64/elf_hwcaps.rst
index 448c1664879b..694f67fa07d1 100644
--- a/Documentation/arch/arm64/elf_hwcaps.rst
+++ b/Documentation/arch/arm64/elf_hwcaps.rst
@@ -365,6 +365,8 @@ HWCAP2_SME_SF8DP2
 HWCAP2_SME_SF8DP4
 Functionality implied by ID_AA64SMFR0_EL1.SF8DP4 == 0b1.
 
+HWCAP2_POE
+Functionality implied by ID_AA64MMFR3_EL1.S1POE == 0b0001.
 
 4. Unused AT_HWCAP bits
 ---
diff --git a/arch/arm64/include/asm/hwcap.h b/arch/arm64/include/asm/hwcap.h
index 4edd3b61df11..a775adddecf2 100644
--- a/arch/arm64/include/asm/hwcap.h
+++ b/arch/arm64/include/asm/hwcap.h
@@ -157,6 +157,7 @@
 #define KERNEL_HWCAP_SME_SF8FMA    __khwcap2_feature(SME_SF8FMA)
 #define KERNEL_HWCAP_SME_SF8DP4    __khwcap2_feature(SME_SF8DP4)
 #define KERNEL_HWCAP_SME_SF8DP2    __khwcap2_feature(SME_SF8DP2)
+#define KERNEL_HWCAP_POE   __khwcap2_feature(POE)
 
 /*
  * This yields a mask that user programs can use to figure out what
diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
index 285610e626f5..055381b2c615 100644
--- a/arch/arm64/include/uapi/asm/hwcap.h
+++ b/arch/arm64/include/uapi/asm/hwcap.h
@@ -122,5 +122,6 @@
 #define HWCAP2_SME_SF8FMA  (1UL << 60)
 #define HWCAP2_SME_SF8DP4  (1UL << 61)
 #define HWCAP2_SME_SF8DP2  (1UL << 62)
+#define HWCAP2_POE (1UL << 63)
 
 #endif /* _UAPI__ASM_HWCAP_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 2f3c2346e156..8c02aae9db11 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -465,6 +465,8 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr2[] = {
 };
 
 static const struct arm64_ftr_bits ftr_id_aa64mmfr3[] = {
+   ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_POE),
+  FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR3_EL1_S1POE_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR3_EL1_S1PIE_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR3_EL1_TCRX_SHIFT, 4, 0),
ARM64_FTR_END,
@@ -2339,6 +2341,14 @@ static void cpu_enable_mops(const struct arm64_cpu_capabilities *__unused)
sysreg_clear_set(sctlr_el1, 0, SCTLR_EL1_MSCEn);
 }
 
+#ifdef CONFIG_ARM64_POE
+static void cpu_enable_poe(const struct arm64_cpu_capabilities *__unused)
+{
+   sysreg_clear_set(REG_TCR2_EL1, 0, TCR2_EL1x_E0POE);
+   sysreg_clear_set(CPACR_EL1, 0, CPACR_ELx_E0POE);
+}
+#endif
+
 /* Internal helper functions to match cpu capability type */
 static bool
 cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap)
@@ -2867,6 +2877,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.capability = ARM64_HAS_S1POE,
.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
.matches = has_cpuid_feature,
+   .cpu_enable = cpu_enable_poe,
ARM64_CPUID_FIELDS(ID_AA64MMFR3_EL1, S1POE, IMP)
},
 #endif
@@ -3034,6 +3045,9 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
HWCAP_CAP(ID_AA64FPFR0_EL1, F8DP2, IMP, CAP_HWCAP, KERNEL_HWCAP_F8DP2),
HWCAP_CAP(ID_AA64FPFR0_EL1, F8E4M3, IMP, CAP_HWCAP, KERNEL_HWCAP_F8E4M3),
HWCAP_CAP(ID_AA64FPFR0_EL1, F8E5M2, IMP, CAP_HWCAP, KERNEL_HWCAP_F8E5M2),
+#ifdef CONFIG_ARM64_POE
+   HWCAP_CAP(ID_AA64MMFR3_EL1, S1POE, IMP, CAP_HWCAP, KERNEL_HWCAP_POE),
+#endif
{},
 };
 
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 09eeaa24d456..b9db812082b3 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -143,6 +143,7 @@ static const char *const hwcap_str[] = {
[KERNEL_HWCAP_SME_SF8FMA]   = "smesf8fma",
[KERNEL_HWCAP_SME_SF8DP4]   = "smesf8dp4",
[KERNEL_HWCAP_SME_SF8DP2]   = "smesf8dp2",
+   [KERNEL_HWCAP_POE]  = "poe",
 };
 
 #ifdef CONFIG_COMPAT
-- 
2.25.1
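From userspace, the new bit would be tested through the auxiliary vector. A minimal sketch using getauxval(AT_HWCAP2) from <sys/auxv.h> (Linux with glibc assumed); the fallback HWCAP2_POE definition copies the value this patch adds, for toolchains whose headers predate it, and hwcap2_has_poe() is a hypothetical helper split out so the bit test can be checked without POE hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP2 */

#ifndef HWCAP2_POE
#define HWCAP2_POE (UINT64_C(1) << 63) /* value from this patch */
#endif

/* Hypothetical helper: test the POE bit in an AT_HWCAP2 value. */
static bool hwcap2_has_poe(uint64_t hwcap2)
{
	return (hwcap2 & HWCAP2_POE) != 0;
}

/* Query the running kernel; false on kernels or CPUs without POE. */
static bool cpu_has_poe(void)
{
	return hwcap2_has_poe(getauxval(AT_HWCAP2));
}
```

The same information is exposed as the "poe" string in /proc/cpuinfo through the hwcap_str[] entry added above.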



[PATCH v4 21/29] arm64/ptrace: add support for FEAT_POE

2024-05-03 Thread Joey Gouly
Add a regset for POE containing POR_EL0.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
Reviewed-by: Mark Brown 
Reviewed-by: Catalin Marinas 
---
 arch/arm64/kernel/ptrace.c | 46 ++
 include/uapi/linux/elf.h   |  1 +
 2 files changed, 47 insertions(+)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 0d022599eb61..b756578aeaee 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1440,6 +1440,39 @@ static int tagged_addr_ctrl_set(struct task_struct *target, const struct
 }
 #endif
 
+#ifdef CONFIG_ARM64_POE
+static int poe_get(struct task_struct *target,
+  const struct user_regset *regset,
+  struct membuf to)
+{
+   if (!system_supports_poe())
+   return -EINVAL;
+
+   return membuf_write(&to, &target->thread.por_el0,
+   sizeof(target->thread.por_el0));
+}
+
+static int poe_set(struct task_struct *target, const struct
+  user_regset *regset, unsigned int pos,
+  unsigned int count, const void *kbuf, const
+  void __user *ubuf)
+{
+   int ret;
+   long ctrl;
+
+   if (!system_supports_poe())
+   return -EINVAL;
+
+   ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &ctrl, 0, -1);
+   if (ret)
+   return ret;
+
+   target->thread.por_el0 = ctrl;
+
+   return 0;
+}
+#endif
+
 enum aarch64_regset {
REGSET_GPR,
REGSET_FPR,
@@ -1469,6 +1502,9 @@ enum aarch64_regset {
 #ifdef CONFIG_ARM64_TAGGED_ADDR_ABI
REGSET_TAGGED_ADDR_CTRL,
 #endif
+#ifdef CONFIG_ARM64_POE
+   REGSET_POE
+#endif
 };
 
 static const struct user_regset aarch64_regsets[] = {
@@ -1628,6 +1664,16 @@ static const struct user_regset aarch64_regsets[] = {
.set = tagged_addr_ctrl_set,
},
 #endif
+#ifdef CONFIG_ARM64_POE
+   [REGSET_POE] = {
+   .core_note_type = NT_ARM_POE,
+   .n = 1,
+   .size = sizeof(long),
+   .align = sizeof(long),
+   .regset_get = poe_get,
+   .set = poe_set,
+   },
+#endif
 };
 
 static const struct user_regset_view user_aarch64_view = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index b54b313bcf07..81762ff3c99e 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -441,6 +441,7 @@ typedef struct elf64_shdr {
 #define NT_ARM_ZA  0x40c   /* ARM SME ZA registers */
 #define NT_ARM_ZT  0x40d   /* ARM SME ZT registers */
 #define NT_ARM_FPMR 0x40e   /* ARM floating point mode register */
+#define NT_ARM_POE 0x40f   /* ARM POE registers */
 #define NT_ARC_V2  0x600   /* ARCv2 accumulator/extra registers */
 #define NT_VMCOREDD 0x700   /* Vmcore Device Dump Note */
 #define NT_MIPS_DSP 0x800   /* MIPS DSP ASE registers */
-- 
2.25.1
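A debugger would read the new regset with PTRACE_GETREGSET and an iovec sized for one long, matching .n = 1 and .size = sizeof(long) above. This is a sketch only: read_tracee_por_el0() is a hypothetical helper, the fallback NT_ARM_POE value copies the one this patch adds to elf.h, and the caller must already be tracing a stopped tracee. On a system without POE, poe_get() makes the request fail with EINVAL.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef NT_ARM_POE
#define NT_ARM_POE 0x40f /* value from include/uapi/linux/elf.h in this patch */
#endif

/*
 * Hypothetical helper: read POR_EL0 of a ptrace-stopped tracee.
 * Returns 0 on success, or -1 with errno set by ptrace().
 */
static int read_tracee_por_el0(pid_t pid, uint64_t *por)
{
	long val = 0;
	struct iovec iov = { .iov_base = &val, .iov_len = sizeof(val) };

	if (ptrace(PTRACE_GETREGSET, pid, (void *)(uintptr_t)NT_ARM_POE, &iov) == -1)
		return -1;
	*por = (uint64_t)val;
	return 0;
}
```

Writing the register back would use PTRACE_SETREGSET with the same note type, which reaches poe_set().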



[PATCH v4 12/29] arm64: add POIndex defines

2024-05-03 Thread Joey Gouly
The 3-bit POIndex is stored in the PTE at bits 60..62.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm64/include/asm/pgtable-hwdef.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index ef207a0d4f0d..370a02922fe1 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -198,6 +198,16 @@
 #define PTE_PI_IDX_2   53  /* PXN */
 #define PTE_PI_IDX_3   54  /* UXN */
 
+/*
+ * POIndex[2:0] encoding (Permission Overlay Extension)
+ */
+#define PTE_PO_IDX_0   (_AT(pteval_t, 1) << 60)
+#define PTE_PO_IDX_1   (_AT(pteval_t, 1) << 61)
+#define PTE_PO_IDX_2   (_AT(pteval_t, 1) << 62)
+
+#define PTE_PO_IDX_MASK    GENMASK_ULL(62, 60)
+
+
 /*
  * Memory Attribute override for Stage-2 (MemAttr[3:0])
  */
-- 
2.25.1
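To make the bit layout concrete, the following userspace sketch extracts and replaces the POIndex in a raw descriptor value; extraction is the operation pte_access_permitted() performs with FIELD_GET(PTE_PO_IDX_MASK, ...) in the PKEYS patch. GENMASK_ULL() and FIELD_GET() are kernel-internal, so the sketch uses an explicit shift of 60, and po_index() and set_po_index() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* POIndex[2:0] lives in descriptor bits 62..60. */
#define PTE_PO_IDX_SHIFT 60
#define PTE_PO_IDX_MASK  (UINT64_C(0x7) << PTE_PO_IDX_SHIFT)

/* Hypothetical helper equivalent to FIELD_GET(PTE_PO_IDX_MASK, pte). */
static unsigned int po_index(uint64_t pte)
{
	return (unsigned int)((pte & PTE_PO_IDX_MASK) >> PTE_PO_IDX_SHIFT);
}

/* Hypothetical helper: replace the POIndex of @pte with @pkey (0..7). */
static uint64_t set_po_index(uint64_t pte, unsigned int pkey)
{
	assert(pkey < 8);
	return (pte & ~PTE_PO_IDX_MASK) |
	       ((uint64_t)pkey << PTE_PO_IDX_SHIFT);
}
```

Bit 63 sits just above the field and is not part of the index, so po_index(1ULL << 63) is 0.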



[PATCH v4 11/29] arm64: re-order MTE VM_ flags

2024-05-03 Thread Joey Gouly
To make it easier to share the generic PKEY VM_ flags, move the MTE flags from VM_HIGH_ARCH_0/1 to VM_HIGH_ARCH_4/5.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 include/linux/mm.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5605b938acce..2065727b3787 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -377,8 +377,8 @@ extern unsigned int kobjsize(const void *objp);
 #endif
 
 #if defined(CONFIG_ARM64_MTE)
-# define VM_MTE          VM_HIGH_ARCH_0  /* Use Tagged memory for access control */
-# define VM_MTE_ALLOWED  VM_HIGH_ARCH_1  /* Tagged memory permitted */
+# define VM_MTE          VM_HIGH_ARCH_4  /* Use Tagged memory for access control */
+# define VM_MTE_ALLOWED  VM_HIGH_ARCH_5  /* Tagged memory permitted */
 #else
 # define VM_MTE          VM_NONE
 # define VM_MTE_ALLOWED  VM_NONE
-- 
2.25.1



[PATCH v4 22/29] arm64: add Permission Overlay Extension Kconfig

2024-05-03 Thread Joey Gouly
Now that support for POE and Protection Keys has been implemented, add a
config to allow users to actually enable it.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm64/Kconfig | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..676ebe4bf9eb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2095,6 +2095,28 @@ config ARM64_EPAN
  if the cpu does not implement the feature.
 endmenu # "ARMv8.7 architectural features"
 
+menu "ARMv8.9 architectural features"
+config ARM64_POE
+   prompt "Permission Overlay Extension"
+   def_bool y
+   select ARCH_USES_HIGH_VMA_FLAGS
+   select ARCH_HAS_PKEYS
+   help
+ The Permission Overlay Extension is used to implement Memory
+ Protection Keys. Memory Protection Keys provides a mechanism for
+ enforcing page-based protections, but without requiring modification
+ of the page tables when an application changes protection domains.
+
+ For details, see Documentation/core-api/protection-keys.rst
+
+ If unsure, say y.
+
+config ARCH_PKEY_BITS
+   int
+   default 3
+
+endmenu # "ARMv8.9 architectural features"
+
 config ARM64_SVE
bool "ARM Scalable Vector Extension support"
default y
-- 
2.25.1



[PATCH v4 13/29] arm64: convert protection key into vm_flags and pgprot values

2024-05-03 Thread Joey Gouly
Modify arch_calc_vm_prot_bits() and vm_get_page_prot() such that the pkey
value is set in the vm_flags and then into the pgprot value.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm64/include/asm/mman.h | 8 +++-
 arch/arm64/mm/mmap.c  | 9 +
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index 5966ee4a6154..ecb2d18dc4d7 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -7,7 +7,7 @@
 #include 
 
 static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
-   unsigned long pkey __always_unused)
+   unsigned long pkey)
 {
unsigned long ret = 0;
 
@@ -17,6 +17,12 @@ static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
if (system_supports_mte() && (prot & PROT_MTE))
ret |= VM_MTE;
 
+#if defined(CONFIG_ARCH_HAS_PKEYS)
+   ret |= pkey & 0x1 ? VM_PKEY_BIT0 : 0;
+   ret |= pkey & 0x2 ? VM_PKEY_BIT1 : 0;
+   ret |= pkey & 0x4 ? VM_PKEY_BIT2 : 0;
+#endif
+
return ret;
 }
 #define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 642bdf908b22..86eda6bc7893 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -102,6 +102,15 @@ pgprot_t vm_get_page_prot(unsigned long vm_flags)
if (vm_flags & VM_MTE)
prot |= PTE_ATTRINDX(MT_NORMAL_TAGGED);
 
+#ifdef CONFIG_ARCH_HAS_PKEYS
+   if (vm_flags & VM_PKEY_BIT0)
+   prot |= PTE_PO_IDX_0;
+   if (vm_flags & VM_PKEY_BIT1)
+   prot |= PTE_PO_IDX_1;
+   if (vm_flags & VM_PKEY_BIT2)
+   prot |= PTE_PO_IDX_2;
+#endif
+
return __pgprot(prot);
 }
 EXPORT_SYMBOL(vm_get_page_prot);
-- 
2.25.1
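The two hunks above spread a key's three bits across VM_PKEY_BIT0..2 and then into PTE_PO_IDX_0..2. The sketch below models that round trip in userspace. The VM_PKEY_BIT* values are placeholders chosen for illustration only (in the kernel they are high VMA flags), and calc_vm_pkey_bits() and vm_pkey_to_prot() are hypothetical stand-ins for the pkey portions of arch_calc_vm_prot_bits() and vm_get_page_prot().

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder VMA flag bits; the kernel's real values may differ. */
#define VM_PKEY_BIT0 (UINT64_C(1) << 32)
#define VM_PKEY_BIT1 (UINT64_C(1) << 33)
#define VM_PKEY_BIT2 (UINT64_C(1) << 34)

/* PTE POIndex bits, as defined in pgtable-hwdef.h by this series. */
#define PTE_PO_IDX_0 (UINT64_C(1) << 60)
#define PTE_PO_IDX_1 (UINT64_C(1) << 61)
#define PTE_PO_IDX_2 (UINT64_C(1) << 62)

/* pkey -> vm_flags, mirroring the CONFIG_ARCH_HAS_PKEYS block. */
static uint64_t calc_vm_pkey_bits(unsigned long pkey)
{
	uint64_t ret = 0;

	ret |= pkey & 0x1 ? VM_PKEY_BIT0 : 0;
	ret |= pkey & 0x2 ? VM_PKEY_BIT1 : 0;
	ret |= pkey & 0x4 ? VM_PKEY_BIT2 : 0;
	return ret;
}

/* vm_flags -> pgprot bits, mirroring the vm_get_page_prot() hunk. */
static uint64_t vm_pkey_to_prot(uint64_t vm_flags)
{
	uint64_t prot = 0;

	if (vm_flags & VM_PKEY_BIT0)
		prot |= PTE_PO_IDX_0;
	if (vm_flags & VM_PKEY_BIT1)
		prot |= PTE_PO_IDX_1;
	if (vm_flags & VM_PKEY_BIT2)
		prot |= PTE_PO_IDX_2;
	return prot;
}
```

For every key 0..7, vm_pkey_to_prot(calc_vm_pkey_bits(k)) equals k << 60, which is the POIndex field that PTE_PO_IDX_MASK selects.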



[PATCH v4 14/29] arm64: mask out POIndex when modifying a PTE

2024-05-03 Thread Joey Gouly
When a PTE is modified, the POIndex bits must be masked off so that they can be
updated from the new protection value.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
Reviewed-by: Catalin Marinas 
---
 arch/arm64/include/asm/pgtable.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..5c970a9cca67 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1028,7 +1028,8 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 */
const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY |
  PTE_PROT_NONE | PTE_VALID | PTE_WRITE | PTE_GP |
- PTE_ATTRINDX_MASK;
+ PTE_ATTRINDX_MASK | PTE_PO_IDX_MASK;
+
/* preserve the hardware dirty information */
if (pte_hw_dirty(pte))
pte = set_pte_bit(pte, __pgprot(PTE_DIRTY));
-- 
2.25.1
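pte_modify() keeps every bit outside mask from the old PTE and takes every bit inside it from newprot, so adding PTE_PO_IDX_MASK to the mask is what lets a protection change carry a new key into an existing mapping. A reduced userspace model of that merge; MASK_OTHER is a hypothetical placeholder for the other protection bits in the real mask, and model_pte_modify() is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PO_IDX_MASK (UINT64_C(0x7) << 60)  /* bits 62..60 */
#define MASK_OTHER      UINT64_C(0xc0)         /* placeholder for PTE_USER etc. */

/* Reduced model of pte_modify(): bits in @mask come from @newprot. */
static uint64_t model_pte_modify(uint64_t pte, uint64_t newprot, uint64_t mask)
{
	return (pte & ~mask) | (newprot & mask);
}
```

With a PTE carrying key 3 and a newprot carrying key 5, the old key survives when the mask is only MASK_OTHER, while MASK_OTHER | PTE_PO_IDX_MASK yields key 5 and leaves the address bits untouched.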



[PATCH v4 16/29] arm64: add pte_access_permitted_no_overlay()

2024-05-03 Thread Joey Gouly
We do not want to take POE into account when clearing the MTE tags.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm64/include/asm/pgtable.h | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 5c970a9cca67..2449e4e27ea6 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -160,8 +160,10 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
  * not set) must return false. PROT_NONE mappings do not have the
  * PTE_VALID bit set.
  */
-#define pte_access_permitted(pte, write) \
+#define pte_access_permitted_no_overlay(pte, write) \
(((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) && (!(write) || pte_write(pte)))
+#define pte_access_permitted(pte, write) \
+   pte_access_permitted_no_overlay(pte, write)
 #define pmd_access_permitted(pmd, write) \
(pte_access_permitted(pmd_pte(pmd), (write)))
 #define pud_access_permitted(pud, write) \
@@ -348,10 +350,11 @@ static inline void __sync_cache_and_tags(pte_t pte, unsigned int nr_pages)
/*
 * If the PTE would provide user space access to the tags associated
 * with it then ensure that the MTE tags are synchronised.  Although
-* pte_access_permitted() returns false for exec only mappings, they
-* don't expose tags (instruction fetches don't check tags).
+* pte_access_permitted_no_overlay() returns false for exec only
+* mappings, they don't expose tags (instruction fetches don't check
+* tags).
 */
-   if (system_supports_mte() && pte_access_permitted(pte, false) &&
+   if (system_supports_mte() && pte_access_permitted_no_overlay(pte, false) &&
!pte_special(pte) && pte_tagged(pte))
mte_sync_tags(pte, nr_pages);
 }
-- 
2.25.1



[PATCH v4 15/29] arm64: handle PKEY/POE faults

2024-05-03 Thread Joey Gouly
If a memory fault occurs that is due to an overlay/pkey fault, report that to
userspace with a SEGV_PKUERR.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm64/include/asm/traps.h |  1 +
 arch/arm64/kernel/traps.c  | 12 ++--
 arch/arm64/mm/fault.c  | 56 --
 3 files changed, 64 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
index eefe766d6161..f6f6f2cb7f10 100644
--- a/arch/arm64/include/asm/traps.h
+++ b/arch/arm64/include/asm/traps.h
@@ -25,6 +25,7 @@ try_emulate_armv8_deprecated(struct pt_regs *regs, u32 insn)
 void force_signal_inject(int signal, int code, unsigned long address, unsigned long err);
 void arm64_notify_segfault(unsigned long addr);
 void arm64_force_sig_fault(int signo, int code, unsigned long far, const char *str);
+void arm64_force_sig_fault_pkey(int signo, int code, unsigned long far, const char *str, int pkey);
 void arm64_force_sig_mceerr(int code, unsigned long far, short lsb, const char *str);
 void arm64_force_sig_ptrace_errno_trap(int errno, unsigned long far, const char *str);
 
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 215e6d7f2df8..1bac6c84d3f5 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -263,16 +263,24 @@ static void arm64_show_signal(int signo, const char *str)
__show_regs(regs);
 }
 
-void arm64_force_sig_fault(int signo, int code, unsigned long far,
-  const char *str)
+void arm64_force_sig_fault_pkey(int signo, int code, unsigned long far,
+  const char *str, int pkey)
 {
arm64_show_signal(signo, str);
if (signo == SIGKILL)
force_sig(SIGKILL);
+   else if (code == SEGV_PKUERR)
+   force_sig_pkuerr((void __user *)far, pkey);
else
force_sig_fault(signo, code, (void __user *)far);
 }
 
+void arm64_force_sig_fault(int signo, int code, unsigned long far,
+  const char *str)
+{
+   arm64_force_sig_fault_pkey(signo, code, far, str, 0);
+}
+
 void arm64_force_sig_mceerr(int code, unsigned long far, short lsb,
const char *str)
 {
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 8251e2fea9c7..585295168918 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -489,6 +490,23 @@ static void do_bad_area(unsigned long far, unsigned long esr,
 #define VM_FAULT_BADMAP ((__force vm_fault_t)0x01)
 #define VM_FAULT_BADACCESS ((__force vm_fault_t)0x02)
 
+static bool fault_from_pkey(unsigned long esr, struct vm_area_struct *vma,
+   unsigned int mm_flags)
+{
+   unsigned long iss2 = ESR_ELx_ISS2(esr);
+
+   if (!arch_pkeys_enabled())
+   return false;
+
+   if (iss2 & ESR_ELx_Overlay)
+   return true;
+
+   return !arch_vma_access_permitted(vma,
+   mm_flags & FAULT_FLAG_WRITE,
+   mm_flags & FAULT_FLAG_INSTRUCTION,
+   mm_flags & FAULT_FLAG_REMOTE);
+}
+
 static vm_fault_t __do_page_fault(struct mm_struct *mm,
  struct vm_area_struct *vma, unsigned long addr,
  unsigned int mm_flags, unsigned long vm_flags,
@@ -529,6 +547,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
unsigned int mm_flags = FAULT_FLAG_DEFAULT;
unsigned long addr = untagged_addr(far);
struct vm_area_struct *vma;
+   bool pkey_fault = false;
+   int pkey = -1;
 
if (kprobe_page_fault(regs, esr))
return 0;
@@ -590,6 +610,12 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
vma_end_read(vma);
goto lock_mmap;
}
+
+   if (fault_from_pkey(esr, vma, mm_flags)) {
+   vma_end_read(vma);
+   goto lock_mmap;
+   }
+
fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, regs);
if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
vma_end_read(vma);
@@ -617,6 +643,11 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
goto done;
}
 
+   if (fault_from_pkey(esr, vma, mm_flags)) {
+   pkey_fault = true;
+   pkey = vma_pkey(vma);
+   }
+
fault = __do_page_fault(mm, vma, addr, mm_flags, vm_flags, regs);
 
/* Quick path to respond to signals */
@@ -682,9 +713,28 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 * Something tried to access memory that isn't in our memory
 * map.
 */
-   arm64_force_sig_fault(SIGSEGV,
- 

[PATCH v4 23/29] kselftest/arm64: move get_header()

2024-05-03 Thread Joey Gouly
Put this function in the header so that it can be used by other tests, without
needing to link to testcases.c.

This will be used by selftests/mm/protection_keys.c.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Andrew Morton 
Cc: Shuah Khan 
Cc: Dave Hansen 
Cc: Aneesh Kumar K.V 
Reviewed-by: Mark Brown 
---
 .../arm64/signal/testcases/testcases.c| 23 -
 .../arm64/signal/testcases/testcases.h| 25 +--
 2 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/tools/testing/selftests/arm64/signal/testcases/testcases.c b/tools/testing/selftests/arm64/signal/testcases/testcases.c
index 674b88cc8c39..e4331440fed0 100644
--- a/tools/testing/selftests/arm64/signal/testcases/testcases.c
+++ b/tools/testing/selftests/arm64/signal/testcases/testcases.c
@@ -6,29 +6,6 @@
 
 #include "testcases.h"
 
-struct _aarch64_ctx *get_header(struct _aarch64_ctx *head, uint32_t magic,
-   size_t resv_sz, size_t *offset)
-{
-   size_t offs = 0;
-   struct _aarch64_ctx *found = NULL;
-
-   if (!head || resv_sz < HDR_SZ)
-   return found;
-
-   while (offs <= resv_sz - HDR_SZ &&
-  head->magic != magic && head->magic) {
-   offs += head->size;
-   head = GET_RESV_NEXT_HEAD(head);
-   }
-   if (head->magic == magic) {
-   found = head;
-   if (offset)
-   *offset = offs;
-   }
-
-   return found;
-}
-
 bool validate_extra_context(struct extra_context *extra, char **err,
void **extra_data, size_t *extra_size)
 {
diff --git a/tools/testing/selftests/arm64/signal/testcases/testcases.h b/tools/testing/selftests/arm64/signal/testcases/testcases.h
index 7727126347e0..3185e6875694 100644
--- a/tools/testing/selftests/arm64/signal/testcases/testcases.h
+++ b/tools/testing/selftests/arm64/signal/testcases/testcases.h
@@ -88,8 +88,29 @@ struct fake_sigframe {
 
 bool validate_reserved(ucontext_t *uc, size_t resv_sz, char **err);
 
-struct _aarch64_ctx *get_header(struct _aarch64_ctx *head, uint32_t magic,
-   size_t resv_sz, size_t *offset);
+static inline struct _aarch64_ctx *get_header(struct _aarch64_ctx *head, uint32_t magic,
+   size_t resv_sz, size_t *offset)
+{
+   size_t offs = 0;
+   struct _aarch64_ctx *found = NULL;
+
+   if (!head || resv_sz < HDR_SZ)
+   return found;
+
+   while (offs <= resv_sz - HDR_SZ &&
+  head->magic != magic && head->magic) {
+   offs += head->size;
+   head = GET_RESV_NEXT_HEAD(head);
+   }
+   if (head->magic == magic) {
+   found = head;
+   if (offset)
+   *offset = offs;
+   }
+
+   return found;
+}
+
 
 static inline struct _aarch64_ctx *get_terminator(struct _aarch64_ctx *head,
  size_t resv_sz,
-- 
2.25.1



[PATCH v4 24/29] selftests: mm: move fpregs printing

2024-05-03 Thread Joey Gouly
arm64's fpregs are not at a constant offset from sigcontext. Since this is
not an important part of the test, don't print the fpregs pointer on arm64.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Andrew Morton 
Cc: Shuah Khan 
Cc: Dave Hansen 
Cc: Aneesh Kumar K.V 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/mm/pkey-powerpc.h| 1 +
 tools/testing/selftests/mm/pkey-x86.h| 2 ++
 tools/testing/selftests/mm/protection_keys.c | 6 ++
 3 files changed, 9 insertions(+)

diff --git a/tools/testing/selftests/mm/pkey-powerpc.h b/tools/testing/selftests/mm/pkey-powerpc.h
index ae5df26104e5..6275d0f474b3 100644
--- a/tools/testing/selftests/mm/pkey-powerpc.h
+++ b/tools/testing/selftests/mm/pkey-powerpc.h
@@ -9,6 +9,7 @@
 #endif
 #define REG_IP_IDX PT_NIP
 #define REG_TRAPNO PT_TRAP
+#define MCONTEXT_FPREGS
 #define gregs  gp_regs
 #define fpregs fp_regs
 #define si_pkey_offset 0x20
diff --git a/tools/testing/selftests/mm/pkey-x86.h b/tools/testing/selftests/mm/pkey-x86.h
index 814758e109c0..b9170a26bfcb 100644
--- a/tools/testing/selftests/mm/pkey-x86.h
+++ b/tools/testing/selftests/mm/pkey-x86.h
@@ -15,6 +15,8 @@
 
 #endif
 
+#define MCONTEXT_FPREGS
+
 #ifndef PKEY_DISABLE_ACCESS
 # define PKEY_DISABLE_ACCESS   0x1
 #endif
diff --git a/tools/testing/selftests/mm/protection_keys.c b/tools/testing/selftests/mm/protection_keys.c
index 48dc151f8fca..b3dbd76ea27c 100644
--- a/tools/testing/selftests/mm/protection_keys.c
+++ b/tools/testing/selftests/mm/protection_keys.c
@@ -314,7 +314,9 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
ucontext_t *uctxt = vucontext;
int trapno;
unsigned long ip;
+#ifdef MCONTEXT_FPREGS
char *fpregs;
+#endif
 #if defined(__i386__) || defined(__x86_64__) /* arch */
u32 *pkey_reg_ptr;
int pkey_reg_offset;
@@ -330,7 +332,9 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
 
trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
ip = uctxt->uc_mcontext.gregs[REG_IP_IDX];
+#ifdef MCONTEXT_FPREGS
fpregs = (char *) uctxt->uc_mcontext.fpregs;
+#endif
 
dprintf2("%s() trapno: %d ip: 0x%016lx info->si_code: %s/%d\n",
__func__, trapno, ip, si_code_str(si->si_code),
@@ -359,7 +363,9 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
 #endif /* arch */
 
dprintf1("siginfo: %p\n", si);
+#ifdef MCONTEXT_FPREGS
dprintf1(" fpregs: %p\n", fpregs);
+#endif
 
if ((si->si_code == SEGV_MAPERR) ||
(si->si_code == SEGV_ACCERR) ||
-- 
2.25.1



[PATCH v4 25/29] selftests: mm: make protection_keys test work on arm64

2024-05-03 Thread Joey Gouly
The encoding of the pkey register on arm64 differs from that on x86/ppc. On
those platforms, a set bit in the register disables a permission; on arm64, a
set bit indicates that the permission is allowed.

This drops two asserts of the form:
 assert(read_pkey_reg() <= orig_pkey_reg);
because, due to the encoding, this does not hold on arm64.

The pkey must be reset to allow both access and writes in the signal handler.
pkey_access_allow() currently works for PowerPC because PKEY_DISABLE_ACCESS
and PKEY_DISABLE_WRITE have overlapping bits set.

Access to the uc_mcontext is abstracted, as arm64 has a different structure.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Andrew Morton 
Cc: Shuah Khan 
Cc: Dave Hansen 
Cc: Aneesh Kumar K.V 
Acked-by: Dave Hansen 
---
 .../arm64/signal/testcases/testcases.h|   3 +
 tools/testing/selftests/mm/Makefile   |   2 +-
 tools/testing/selftests/mm/pkey-arm64.h   | 139 ++
 tools/testing/selftests/mm/pkey-helpers.h |   8 +
 tools/testing/selftests/mm/pkey-powerpc.h |   2 +
 tools/testing/selftests/mm/pkey-x86.h |   2 +
 tools/testing/selftests/mm/protection_keys.c  | 103 +++--
 7 files changed, 247 insertions(+), 12 deletions(-)
 create mode 100644 tools/testing/selftests/mm/pkey-arm64.h

diff --git a/tools/testing/selftests/arm64/signal/testcases/testcases.h b/tools/testing/selftests/arm64/signal/testcases/testcases.h
index 3185e6875694..9872b8912714 100644
--- a/tools/testing/selftests/arm64/signal/testcases/testcases.h
+++ b/tools/testing/selftests/arm64/signal/testcases/testcases.h
@@ -26,6 +26,9 @@
 #define HDR_SZ \
sizeof(struct _aarch64_ctx)
 
+#define GET_UC_RESV_HEAD(uc) \
+   (struct _aarch64_ctx *)(&(uc->uc_mcontext.__reserved))
+
 #define GET_SF_RESV_HEAD(sf) \
(struct _aarch64_ctx *)(&(sf).uc.uc_mcontext.__reserved)
 
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index eb5f39a2668b..18642fb4966f 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -98,7 +98,7 @@ TEST_GEN_FILES += $(BINARIES_64)
 endif
 else
 
-ifneq (,$(findstring $(ARCH),ppc64))
+ifneq (,$(filter $(ARCH),arm64 ppc64))
 TEST_GEN_FILES += protection_keys
 endif
 
diff --git a/tools/testing/selftests/mm/pkey-arm64.h b/tools/testing/selftests/mm/pkey-arm64.h
new file mode 100644
index ..d17cad022100
--- /dev/null
+++ b/tools/testing/selftests/mm/pkey-arm64.h
@@ -0,0 +1,139 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 Arm Ltd.
+ */
+
+#ifndef _PKEYS_ARM64_H
+#define _PKEYS_ARM64_H
+
+#include "vm_util.h"
+/* for signal frame parsing */
+#include "../arm64/signal/testcases/testcases.h"
+
+#ifndef SYS_mprotect_key
+# define SYS_mprotect_key  288
+#endif
+#ifndef SYS_pkey_alloc
+# define SYS_pkey_alloc 289
+# define SYS_pkey_free 290
+#endif
+#define MCONTEXT_IP(mc) mc.pc
+#define MCONTEXT_TRAPNO(mc) -1
+
+#define PKEY_MASK  0xf
+
+#define POE_NONE   0x0
+#define POE_X  0x2
+#define POE_RX 0x3
+#define POE_RWX 0x7
+
+#define NR_PKEYS   7
+#define NR_RESERVED_PKEYS  1 /* pkey-0 */
+
+#define PKEY_ALLOW_ALL 0x
+
+#define PKEY_BITS_PER_PKEY 4
+#define PAGE_SIZE  sysconf(_SC_PAGESIZE)
+#undef HPAGE_SIZE
+#define HPAGE_SIZE default_huge_page_size()
+
+/* 4-byte instructions * 16384 = 64K page */
+#define __page_o_noops() asm(".rept 16384 ; nop; .endr")
+
+static inline u64 __read_pkey_reg(void)
+{
+   u64 pkey_reg = 0;
+
+   // POR_EL0
+   asm volatile("mrs %0, S3_3_c10_c2_4" : "=r" (pkey_reg));
+
+   return pkey_reg;
+}
+
+static inline void __write_pkey_reg(u64 pkey_reg)
+{
+   u64 por = pkey_reg;
+
+   dprintf4("%s() changing %016llx to %016llx\n",
+__func__, __read_pkey_reg(), pkey_reg);
+
+   // POR_EL0
+   asm volatile("msr S3_3_c10_c2_4, %0\nisb" :: "r" (por) :);
+
+   dprintf4("%s() pkey register after changing %016llx to %016llx\n",
+   __func__, __read_pkey_reg(), pkey_reg);
+}
+
+static inline int cpu_has_pkeys(void)
+{
+   /* No simple way to determine this */
+   return 1;
+}
+
+static inline u32 pkey_bit_position(int pkey)
+{
+   return pkey * PKEY_BITS_PER_PKEY;
+}
+
+static inline int get_arch_reserved_keys(void)
+{
+   return NR_RESERVED_PKEYS;
+}
+
+void expect_fault_on_read_execonly_key(void *p1, int pkey)
+{
+}
+
+void *malloc_pkey_with_mprotect_subpage(long size, int prot, u16 pkey)
+{
+   return PTR_ERR_ENOTSUP;
+}
+
+#define set_pkey_bits  set_pkey_bits
+static inline u64 set_pkey_bits(u64 reg, int pkey, u64 flags)
+{
+   u32 shift = pkey_bit_position(pkey);
+   u64 new_val = POE_RWX;
+
+   /* mask out bits from pkey in old value */
+   re

[PATCH v4 26/29] kselftest/arm64: add HWCAP test for FEAT_S1POE

2024-05-03 Thread Joey Gouly
Check that when POE is enabled, the POR_EL0 register is accessible.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Mark Brown 
Cc: Shuah Khan 
Reviewed-by: Mark Brown 
---
 tools/testing/selftests/arm64/abi/hwcap.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/tools/testing/selftests/arm64/abi/hwcap.c b/tools/testing/selftests/arm64/abi/hwcap.c
index d8909b2b535a..f2d6007a2b98 100644
--- a/tools/testing/selftests/arm64/abi/hwcap.c
+++ b/tools/testing/selftests/arm64/abi/hwcap.c
@@ -156,6 +156,12 @@ static void pmull_sigill(void)
asm volatile(".inst 0x0ee0e000" : : : );
 }
 
+static void poe_sigill(void)
+{
+   /* mrs x0, POR_EL0 */
+   asm volatile("mrs x0, S3_3_C10_C2_4" : : : "x0");
+}
+
 static void rng_sigill(void)
 {
asm volatile("mrs x0, S3_3_C2_C4_0" : : : "x0");
@@ -601,6 +607,14 @@ static const struct hwcap_data {
.cpuinfo = "pmull",
.sigill_fn = pmull_sigill,
},
+   {
+   .name = "POE",
+   .at_hwcap = AT_HWCAP2,
+   .hwcap_bit = HWCAP2_POE,
+   .cpuinfo = "poe",
+   .sigill_fn = poe_sigill,
+   .sigill_reliable = true,
+   },
{
.name = "RNG",
.at_hwcap = AT_HWCAP2,
-- 
2.25.1



[PATCH v4 27/29] kselftest/arm64: parse POE_MAGIC in a signal frame

2024-05-03 Thread Joey Gouly
Teach the signal frame parsing about the new POE frame, to avoid a warning
when it is generated.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Mark Brown 
Cc: Shuah Khan 
Reviewed-by: Mark Brown 
---
 tools/testing/selftests/arm64/signal/testcases/testcases.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/arm64/signal/testcases/testcases.c b/tools/testing/selftests/arm64/signal/testcases/testcases.c
index e4331440fed0..e6daa94fcd2e 100644
--- a/tools/testing/selftests/arm64/signal/testcases/testcases.c
+++ b/tools/testing/selftests/arm64/signal/testcases/testcases.c
@@ -161,6 +161,10 @@ bool validate_reserved(ucontext_t *uc, size_t resv_sz, char **err)
if (head->size != sizeof(struct esr_context))
*err = "Bad size for esr_context";
break;
+   case POE_MAGIC:
+   if (head->size != sizeof(struct poe_context))
+   *err = "Bad size for poe_context";
+   break;
case TPIDR2_MAGIC:
if (head->size != sizeof(struct tpidr2_context))
*err = "Bad size for tpidr2_context";
-- 
2.25.1



[PATCH v4 28/29] kselftest/arm64: Add test case for POR_EL0 signal frame records

2024-05-03 Thread Joey Gouly
Ensure that we get signal context for POR_EL0 if and only if POE is present
on the system.

Copied from the TPIDR2 test.

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Mark Brown 
Cc: Shuah Khan 
---
 .../testing/selftests/arm64/signal/.gitignore |  1 +
 .../arm64/signal/testcases/poe_siginfo.c  | 86 +++
 2 files changed, 87 insertions(+)
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/poe_siginfo.c

diff --git a/tools/testing/selftests/arm64/signal/.gitignore b/tools/testing/selftests/arm64/signal/.gitignore
index 1ce5b5eac386..b2f2bfd5c6aa 100644
--- a/tools/testing/selftests/arm64/signal/.gitignore
+++ b/tools/testing/selftests/arm64/signal/.gitignore
@@ -2,6 +2,7 @@
 mangle_*
 fake_sigreturn_*
 fpmr_*
+poe_*
 sme_*
 ssve_*
 sve_*
diff --git a/tools/testing/selftests/arm64/signal/testcases/poe_siginfo.c b/tools/testing/selftests/arm64/signal/testcases/poe_siginfo.c
new file mode 100644
index ..d890029304c4
--- /dev/null
+++ b/tools/testing/selftests/arm64/signal/testcases/poe_siginfo.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023 Arm Limited
+ *
+ * Verify that the POR_EL0 register context in signal frames is set up as
+ * expected.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+static union {
+   ucontext_t uc;
+   char buf[1024 * 128];
+} context;
+
+#define SYS_POR_EL0 "S3_3_C10_C2_4"
+
+static uint64_t get_por_el0(void)
+{
+   uint64_t val;
+
+   asm volatile (
+   "mrs %0, " SYS_POR_EL0 "\n"
+   : "=r"(val)
+   :
+   : "cc");
+
+   return val;
+}
+
+int poe_present(struct tdescr *td, siginfo_t *si, ucontext_t *uc)
+{
+   struct _aarch64_ctx *head = GET_BUF_RESV_HEAD(context);
+   struct poe_context *poe_ctx;
+   size_t offset;
+   bool in_sigframe;
+   bool have_poe;
+   __u64 orig_poe;
+
+   have_poe = getauxval(AT_HWCAP2) & HWCAP2_POE;
+   if (have_poe)
+   orig_poe = get_por_el0();
+
+   if (!get_current_context(td, &context.uc, sizeof(context)))
+   return 1;
+
+   poe_ctx = (struct poe_context *)
+   get_header(head, POE_MAGIC, td->live_sz, &offset);
+
+   in_sigframe = poe_ctx != NULL;
+
+   fprintf(stderr, "POR_EL0 sigframe %s on system %s POE\n",
+   in_sigframe ? "present" : "absent",
+   have_poe ? "with" : "without");
+
+   td->pass = (in_sigframe == have_poe);
+
+   /*
+* Check that the value we read back was the one present at
+* the time that the signal was triggered.
+*/
+   if (have_poe && poe_ctx) {
+   if (poe_ctx->por_el0 != orig_poe) {
+   fprintf(stderr, "POR_EL0 in frame is %llx, was %llx\n",
+   poe_ctx->por_el0, orig_poe);
+   td->pass = false;
+   }
+   }
+
+   return 0;
+}
+
+struct tdescr tde = {
+   .name = "POR_EL0",
+   .descr = "Validate that POR_EL0 is present as expected",
+   .timeout = 3,
+   .run = poe_present,
+};
-- 
2.25.1



[PATCH v4 29/29] KVM: selftests: get-reg-list: add Permission Overlay registers

2024-05-03 Thread Joey Gouly
Add new system registers:
  - POR_EL1
  - POR_EL0

Signed-off-by: Joey Gouly 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: Oliver Upton 
Cc: Shuah Khan 
Reviewed-by: Mark Brown 
---
 tools/testing/selftests/kvm/aarch64/get-reg-list.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/tools/testing/selftests/kvm/aarch64/get-reg-list.c b/tools/testing/selftests/kvm/aarch64/get-reg-list.c
index 709d7d721760..ac661ebf6859 100644
--- a/tools/testing/selftests/kvm/aarch64/get-reg-list.c
+++ b/tools/testing/selftests/kvm/aarch64/get-reg-list.c
@@ -40,6 +40,18 @@ static struct feature_id_reg feat_id_regs[] = {
ARM64_SYS_REG(3, 0, 0, 7, 3),   /* ID_AA64MMFR3_EL1 */
4,
1
+   },
+   {
+   ARM64_SYS_REG(3, 0, 10, 2, 4),  /* POR_EL1 */
+   ARM64_SYS_REG(3, 0, 0, 7, 3),   /* ID_AA64MMFR3_EL1 */
+   16,
+   1
+   },
+   {
+   ARM64_SYS_REG(3, 3, 10, 2, 4),  /* POR_EL0 */
+   ARM64_SYS_REG(3, 0, 0, 7, 3),   /* ID_AA64MMFR3_EL1 */
+   16,
+   1
}
 };
 
@@ -468,6 +480,7 @@ static __u64 base_regs[] = {
ARM64_SYS_REG(3, 0, 10, 2, 0),  /* MAIR_EL1 */
ARM64_SYS_REG(3, 0, 10, 2, 2),  /* PIRE0_EL1 */
ARM64_SYS_REG(3, 0, 10, 2, 3),  /* PIR_EL1 */
+   ARM64_SYS_REG(3, 0, 10, 2, 4),  /* POR_EL1 */
ARM64_SYS_REG(3, 0, 10, 3, 0),  /* AMAIR_EL1 */
ARM64_SYS_REG(3, 0, 12, 0, 0),  /* VBAR_EL1 */
ARM64_SYS_REG(3, 0, 12, 1, 1),  /* DISR_EL1 */
@@ -475,6 +488,7 @@ static __u64 base_regs[] = {
ARM64_SYS_REG(3, 0, 13, 0, 4),  /* TPIDR_EL1 */
ARM64_SYS_REG(3, 0, 14, 1, 0),  /* CNTKCTL_EL1 */
ARM64_SYS_REG(3, 2, 0, 0, 0),   /* CSSELR_EL1 */
+   ARM64_SYS_REG(3, 3, 10, 2, 4),  /* POR_EL0 */
ARM64_SYS_REG(3, 3, 13, 0, 2),  /* TPIDR_EL0 */
ARM64_SYS_REG(3, 3, 13, 0, 3),  /* TPIDRRO_EL0 */
ARM64_SYS_REG(3, 3, 14, 0, 1),  /* CNTPCT_EL0 */
-- 
2.25.1



[PATCH V2] tty: hvc: hvc_opal: eliminate uses of of_node_put()

2024-05-03 Thread Lu Dai
Make use of the __free() cleanup handler to automatically release node
references when they go out of scope.

This also removes the need for a 'goto'.

Signed-off-by: Lu Dai 
---
Changes since v1:
Move the assignment of 'opal' to its declaration
Separate the declaration of 'np'

 drivers/tty/hvc/hvc_opal.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/tty/hvc/hvc_opal.c b/drivers/tty/hvc/hvc_opal.c
index 095c33ad10f8..c17e8343ea60 100644
--- a/drivers/tty/hvc/hvc_opal.c
+++ b/drivers/tty/hvc/hvc_opal.c
@@ -327,19 +327,18 @@ static void udbg_init_opal_common(void)
 
 void __init hvc_opal_init_early(void)
 {
-   struct device_node *stdout_node = of_node_get(of_stdout);
+   struct device_node *stdout_node __free(device_node) = of_node_get(of_stdout);
const __be32 *termno;
const struct hv_ops *ops;
u32 index;
 
/* If the console wasn't in /chosen, try /ibm,opal */
if (!stdout_node) {
-   struct device_node *opal, *np;
-
/* Current OPAL takeover doesn't provide the stdout
 * path, so we hard wire it
 */
-   opal = of_find_node_by_path("/ibm,opal/consoles");
+   struct device_node *opal __free(device_node) =
+   of_find_node_by_path("/ibm,opal/consoles");
if (opal) {
pr_devel("hvc_opal: Found consoles in new location\n");
} else {
@@ -350,13 +349,13 @@ void __init hvc_opal_init_early(void)
}
if (!opal)
return;
+   struct device_node *np;
for_each_child_of_node(opal, np) {
if (of_node_name_eq(np, "serial")) {
stdout_node = np;
break;
}
}
-   of_node_put(opal);
}
if (!stdout_node)
return;
@@ -382,13 +381,11 @@ void __init hvc_opal_init_early(void)
hvsilib_establish(&hvc_opal_boot_priv.hvsi);
pr_devel("hvc_opal: Found HVSI console\n");
} else
-   goto out;
+   return;
hvc_opal_boot_termno = index;
udbg_init_opal_common();
add_preferred_console("hvc", index, NULL);
hvc_instantiate(index, index, ops);
-out:
-   of_node_put(stdout_node);
 }
 
 #ifdef CONFIG_PPC_EARLY_DEBUG_OPAL_RAW
-- 
2.39.2



Re: [EXT] [PATCH v8 6/6] docs: trusted-encrypted: add DCP as new trust source

2024-05-03 Thread Jarkko Sakkinen
On Tue Apr 30, 2024 at 3:03 PM EEST, David Gstir wrote:
> Hi Jarkko,
>
> > On 30.04.2024, at 13:48, Kshitiz Varshney  wrote:
> > 
> > Hi David,
> > 
> >> -Original Message-
> >> From: David Gstir 
> >> Sent: Monday, April 29, 2024 5:05 PM
> >> To: Kshitiz Varshney 
>
>
> >> 
> >> Did you get around to testing this?
> >> I’d greatly appreciate a Tested-by for this. :-)
> >> 
> >> Thanks!
> >> BR, David
> > 
> > Currently, I am a bit busy with other priority activities. It will take
> > time to test this patch set.
>
> How should we proceed here?
> Do we have to miss another release cycle, because of a Tested-by?
>
> If any bugs pop up I’ll happily fix them, but at the moment it appears to be
> more of a formality.
> IMHO the patch set itself is rather small and has been thoroughly reviewed to
> ensure that any huge issues would already have been caught by now.

Actually, I don't mind picking this up, since unless you consume it,
it should not get in the way. I'll pick it up over the weekend.
Thanks for the reminder.

BR, Jarkko