Re: [PATCH v3] powerpc/mm: Support execute-only memory on the Radix MMU
On Tue, 2022-08-09 at 05:51 +, Christophe Leroy wrote: > Le 09/08/2022 à 04:44, Russell Currey a écrit : > > The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only) > > through the execute-only pkey. A PROT_EXEC-only mapping will > > actually > > map to RX, and then the pkey will be applied on top of it. > > I don't think XOM is a commonly understood accronym. Maybe the first > time you use it it'd be better to say something like: > > The Hash MMU already supports execute-only memory (XOM) Yes, that's better. > > When you say that Hash MMU supports it through the execute-only pkey, > does it mean that it is taken into account automatically at mmap > time, > or does the userspace app has to do something special to use the key > ? > If it is the second, it means that depending on whether you are radix > or > not, you must do something different ? Is that expected ? It happens at mmap time, see do_mmap() in mm/mmap.c (and similar for mprotect). That calls into execute_only_pkey() which can return something on x86 & Hash, and if it does that pkey gets used. The userspace process doesn't have to do anything, it's transparent. So there's no difference in program behaviour switching between Hash/Radix - at least in the basic cases I've tested. > > > > > Radix doesn't have pkeys, but it does have execute permissions > > built-in > > to the MMU, so all we have to do to support XOM is expose it. > > > > Signed-off-by: Russell Currey > > --- > > v3: Incorporate Aneesh's suggestions, leave protection_map > > untouched > > Basic test: > > https://github.com/ruscur/junkcode/blob/main/mmap_test.c > > > > arch/powerpc/include/asm/book3s/64/pgtable.h | 2 ++ > > arch/powerpc/mm/book3s64/pgtable.c | 11 +-- > > arch/powerpc/mm/fault.c | 6 +- > > 3 files changed, 16 insertions(+), 3 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h > > b/arch/powerpc/include/asm/book3s/64/pgtable.h > > index 392ff48f77df..486902aff040 100644 > > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h > > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h > > @@ -151,6 +151,8 @@ > > #define PAGE_COPY_X __pgprot(_PAGE_BASE | _PAGE_READ | > > _PAGE_EXEC) > > #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_READ) > > #define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_READ | > > _PAGE_EXEC) > > +/* Radix only, Hash uses PAGE_READONLY_X + execute-only pkey > > instead */ > > +#define PAGE_EXECONLY __pgprot(_PAGE_BASE | _PAGE_EXEC) > > > > /* Permission masks used for kernel mappings */ > > #define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW) > > diff --git a/arch/powerpc/mm/book3s64/pgtable.c > > b/arch/powerpc/mm/book3s64/pgtable.c > > index 7b9966402b25..62f63d344596 100644 > > --- a/arch/powerpc/mm/book3s64/pgtable.c > > +++ b/arch/powerpc/mm/book3s64/pgtable.c > > @@ -553,8 +553,15 @@ EXPORT_SYMBOL_GPL(memremap_compat_align); > > > > pgprot_t vm_get_page_prot(unsigned long vm_flags) > > { > > - unsigned long prot = pgprot_val(protection_map[vm_flags & > > - > > (VM_READ|VM_WRITE|VM_EXEC|VM_ > > SHARED)]); > > + unsigned long prot; > > + > > + /* Radix supports execute-only, but protection_map maps X - > > > RX */ > > + if (radix_enabled() && ((vm_flags & > > (VM_READ|VM_WRITE|VM_EXEC)) == VM_EXEC)) { > > Maybe use VM_ACCESS_FLAGS ? I was looking for something like that but only checked powerpc, thanks. > > > + prot = pgprot_val(PAGE_EXECONLY); > > + } else { > > + prot = pgprot_val(protection_map[vm_flags & > > + > > (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); > > + } > > > > if (vm_flags & VM_SAO) > > prot |= _PAGE_SAO; > > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c > > index 014005428687..59e4cbcf3109 100644 > > --- a/arch/powerpc/mm/fault.c > > +++ b/arch/powerpc/mm/fault.c > > @@ -270,7 +270,11 @@ static bool access_error(bool is_write, bool > > is_exec, struct vm_area_struct *vma > > return false; > > } > > > > - if (unlikely(!vma_is_accessible(vma))) > > + /* On Radix, a read fault could be from PROT_NONE or > > PROT_EXEC */ > > + if (unlikely(radix_enabled() && !(vma->vm_flags & > > VM_READ))) > > + return true; > > Why do you need the radix_enabled() here ? > Even if it doesn't fault directly, reading a non readable area is > still > an error and should be handled as such, even on hardware that will > not > generate a fault for it at the first place. So I'd just do: > > if (!(vma->vm_flags & VM_READ))) > return true; > I don't think we need it, just concerned I might break something. I can do that. > > + /* Check for a PROT_NONE fault on other MMUs */ > > + else if (unlikely(!vma_is_accessible(vma))) > >
Re: [PATCH v3] powerpc/mm: Support execute-only memory on the Radix MMU
On 8/9/22 11:21 AM, Christophe Leroy wrote: > Le 09/08/2022 à 04:44, Russell Currey a écrit : >> The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only) >> through the execute-only pkey. A PROT_EXEC-only mapping will actually >> map to RX, and then the pkey will be applied on top of it. > > I don't think XOM is a commonly understood accronym. Maybe the first > time you use it it'd be better to say something like: > > The Hash MMU already supports execute-only memory (XOM) > > When you say that Hash MMU supports it through the execute-only pkey, > does it mean that it is taken into account automatically at mmap time, > or does the userspace app has to do something special to use the key ? > If it is the second, it means that depending on whether you are radix or > not, you must do something different ? Is that expected ? > >> >> Radix doesn't have pkeys, but it does have execute permissions built-in >> to the MMU, so all we have to do to support XOM is expose it. >> >> Signed-off-by: Russell Currey >> --- >> v3: Incorporate Aneesh's suggestions, leave protection_map untouched >> Basic test: https://github.com/ruscur/junkcode/blob/main/mmap_test.c >> >> arch/powerpc/include/asm/book3s/64/pgtable.h | 2 ++ >> arch/powerpc/mm/book3s64/pgtable.c | 11 +-- >> arch/powerpc/mm/fault.c | 6 +- >> 3 files changed, 16 insertions(+), 3 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h >> b/arch/powerpc/include/asm/book3s/64/pgtable.h >> index 392ff48f77df..486902aff040 100644 >> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h >> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h >> @@ -151,6 +151,8 @@ >> #define PAGE_COPY_X__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC) >> #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_READ) >> #define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC) >> +/* Radix only, Hash uses PAGE_READONLY_X + execute-only pkey instead */ >> +#define PAGE_EXECONLY __pgprot(_PAGE_BASE | _PAGE_EXEC) >> >> /* Permission masks used for kernel mappings */ >> #define PAGE_KERNEL__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW) >> diff --git a/arch/powerpc/mm/book3s64/pgtable.c >> b/arch/powerpc/mm/book3s64/pgtable.c >> index 7b9966402b25..62f63d344596 100644 >> --- a/arch/powerpc/mm/book3s64/pgtable.c >> +++ b/arch/powerpc/mm/book3s64/pgtable.c >> @@ -553,8 +553,15 @@ EXPORT_SYMBOL_GPL(memremap_compat_align); >> >> pgprot_t vm_get_page_prot(unsigned long vm_flags) >> { >> -unsigned long prot = pgprot_val(protection_map[vm_flags & >> -(VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); >> +unsigned long prot; >> + >> +/* Radix supports execute-only, but protection_map maps X -> RX */ >> +if (radix_enabled() && ((vm_flags & (VM_READ|VM_WRITE|VM_EXEC)) == >> VM_EXEC)) { > > Maybe use VM_ACCESS_FLAGS ? > >> +prot = pgprot_val(PAGE_EXECONLY); >> +} else { >> +prot = pgprot_val(protection_map[vm_flags & >> + (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); >> +} >> >> if (vm_flags & VM_SAO) >> prot |= _PAGE_SAO; >> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c >> index 014005428687..59e4cbcf3109 100644 >> --- a/arch/powerpc/mm/fault.c >> +++ b/arch/powerpc/mm/fault.c >> @@ -270,7 +270,11 @@ static bool access_error(bool is_write, bool is_exec, >> struct vm_area_struct *vma >> return false; >> } >> >> -if (unlikely(!vma_is_accessible(vma))) >> +/* On Radix, a read fault could be from PROT_NONE or PROT_EXEC */ >> +if (unlikely(radix_enabled() && !(vma->vm_flags & VM_READ))) >> +return true; > > Why do you need the radix_enabled() here ? > Even if it doesn't fault directly, reading a non readable area is still > an error and should be handled as such, even on hardware that will not > generate a fault for it at the first place. So I'd just do: > > if (!(vma->vm_flags & VM_READ))) > return true; > >> +/* Check for a PROT_NONE fault on other MMUs */ >> +else if (unlikely(!vma_is_accessible(vma))) >> return true; >> /* >> * We should ideally do the vma pkey access check here. But in the > > Don't use an if/else construct, there is no other 'else' in that > function, or in similar functions like bad_kernel_fault() for instance. > > So leave the !vma_is_accessible(vma) untouched and add your check as a > standalone check before or after it. What does vma_is_accessible() check bring if we have the VM_READ check unconditional ? -aneesh
Re: [PATCH v3 11/25] powerpc/ftrace: Make __ftrace_make_{nop/call}() common to PPC32 and PPC64
Nick Desaulniers wrote: Christ...that's not what I meant to send; sorry, mutt is giving me bizarro errors today so I can't insert myself into threads using the usual incantation I have in the past. What I meant to send was: Christophe, Ondrej reported a regression with ftrace when building w/ clang bisected to this commit. Can you PTAL? https://github.com/ClangBuiltLinux/linux/issues/1682 Thanks for the report. That was my doing - I tend to miss the fact that clang doesn't support -mprofile-kernel. Can you check if the below patch fixes it for you? - Naveen --- diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c index cb158c32b50b99..7b85c3b460a3c0 100644 --- a/arch/powerpc/kernel/trace/ftrace.c +++ b/arch/powerpc/kernel/trace/ftrace.c @@ -393,11 +393,11 @@ int ftrace_make_nop(struct module *mod, */ static bool expected_nop_sequence(void *ip, ppc_inst_t op0, ppc_inst_t op1) { - if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V1)) + if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS)) + return ppc_inst_equal(op0, ppc_inst(PPC_RAW_NOP())); + else return ppc_inst_equal(op0, ppc_inst(PPC_RAW_BRANCH(8))) && ppc_inst_equal(op1, ppc_inst(PPC_INST_LD_TOC)); - else - return ppc_inst_equal(op0, ppc_inst(PPC_RAW_NOP())); } static int @@ -412,7 +412,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) if (copy_inst_from_kernel_nofault(op, ip)) return -EFAULT; - if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V1) && + if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) && copy_inst_from_kernel_nofault(op + 1, ip + 4)) return -EFAULT;
Re: [PATCH v3] powerpc/mm: Support execute-only memory on the Radix MMU
Le 09/08/2022 à 04:44, Russell Currey a écrit : > The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only) > through the execute-only pkey. A PROT_EXEC-only mapping will actually > map to RX, and then the pkey will be applied on top of it. I don't think XOM is a commonly understood accronym. Maybe the first time you use it it'd be better to say something like: The Hash MMU already supports execute-only memory (XOM) When you say that Hash MMU supports it through the execute-only pkey, does it mean that it is taken into account automatically at mmap time, or does the userspace app has to do something special to use the key ? If it is the second, it means that depending on whether you are radix or not, you must do something different ? Is that expected ? > > Radix doesn't have pkeys, but it does have execute permissions built-in > to the MMU, so all we have to do to support XOM is expose it. > > Signed-off-by: Russell Currey > --- > v3: Incorporate Aneesh's suggestions, leave protection_map untouched > Basic test: https://github.com/ruscur/junkcode/blob/main/mmap_test.c > > arch/powerpc/include/asm/book3s/64/pgtable.h | 2 ++ > arch/powerpc/mm/book3s64/pgtable.c | 11 +-- > arch/powerpc/mm/fault.c | 6 +- > 3 files changed, 16 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h > b/arch/powerpc/include/asm/book3s/64/pgtable.h > index 392ff48f77df..486902aff040 100644 > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h > @@ -151,6 +151,8 @@ > #define PAGE_COPY_X __pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC) > #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_READ) > #define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC) > +/* Radix only, Hash uses PAGE_READONLY_X + execute-only pkey instead */ > +#define PAGE_EXECONLY__pgprot(_PAGE_BASE | _PAGE_EXEC) > > /* Permission masks used for kernel mappings */ > #define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW) > diff --git a/arch/powerpc/mm/book3s64/pgtable.c > b/arch/powerpc/mm/book3s64/pgtable.c > index 7b9966402b25..62f63d344596 100644 > --- a/arch/powerpc/mm/book3s64/pgtable.c > +++ b/arch/powerpc/mm/book3s64/pgtable.c > @@ -553,8 +553,15 @@ EXPORT_SYMBOL_GPL(memremap_compat_align); > > pgprot_t vm_get_page_prot(unsigned long vm_flags) > { > - unsigned long prot = pgprot_val(protection_map[vm_flags & > - (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); > + unsigned long prot; > + > + /* Radix supports execute-only, but protection_map maps X -> RX */ > + if (radix_enabled() && ((vm_flags & (VM_READ|VM_WRITE|VM_EXEC)) == > VM_EXEC)) { Maybe use VM_ACCESS_FLAGS ? > + prot = pgprot_val(PAGE_EXECONLY); > + } else { > + prot = pgprot_val(protection_map[vm_flags & > + (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); > + } > > if (vm_flags & VM_SAO) > prot |= _PAGE_SAO; > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c > index 014005428687..59e4cbcf3109 100644 > --- a/arch/powerpc/mm/fault.c > +++ b/arch/powerpc/mm/fault.c > @@ -270,7 +270,11 @@ static bool access_error(bool is_write, bool is_exec, > struct vm_area_struct *vma > return false; > } > > - if (unlikely(!vma_is_accessible(vma))) > + /* On Radix, a read fault could be from PROT_NONE or PROT_EXEC */ > + if (unlikely(radix_enabled() && !(vma->vm_flags & VM_READ))) > + return true; Why do you need the radix_enabled() here ? Even if it doesn't fault directly, reading a non readable area is still an error and should be handled as such, even on hardware that will not generate a fault for it at the first place. So I'd just do: if (!(vma->vm_flags & VM_READ))) return true; > + /* Check for a PROT_NONE fault on other MMUs */ > + else if (unlikely(!vma_is_accessible(vma))) > return true; > /* >* We should ideally do the vma pkey access check here. But in the Don't use an if/else construct, there is no other 'else' in that function, or in similar functions like bad_kernel_fault() for instance. So leave the !vma_is_accessible(vma) untouched and add your check as a standalone check before or after it.
[PATCH] powerpc/kexec: Fix build failure from uninitialised variable
clang 14 won't build because ret is uninitialised and can be returned if both prop and fdtprop are NULL. Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window") Signed-off-by: Russell Currey --- Not sure what should be returned here, EINVAL seemed reasonable for a passed property not existing. Also, damn it Alexey, I mentioned this in my review: http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220616075901.835871-1-...@ozlabs.ru/ Consider yourself lucky I'm no longer your dictator (if you don't already) arch/powerpc/kexec/file_load_64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 683462e4556b..8fa2995e6fc7 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -1043,7 +1043,7 @@ static int copy_property(void *fdt, int node_offset, const struct device_node *d const char *propname) { const void *prop, *fdtprop; - int len = 0, fdtlen = 0, ret; + int len = 0, fdtlen = 0, ret = -EINVAL; prop = of_get_property(dn, propname, &len); fdtprop = fdt_getprop(fdt, node_offset, propname, &fdtlen); -- 2.37.1
Re: [PATCH v2 1/2] powerpc/mm: Move vm_get_page_prot() out of book3s64 code
Le 09/08/2022 à 02:55, Russell Currey a écrit : > On Mon, 2022-08-08 at 14:32 +, Christophe Leroy wrote: >> >> >> Le 08/08/2022 à 15:01, Russell Currey a écrit : >>> protection_map is about to be __ro_after_init instead of const, so >>> move >>> the only non-local function that consumes it to the same file so it >>> can >>> at least be static. >> >> What's the advantage of doing that ? Why does it need to be static ? >> >> Christophe > > It doesn't need to be, I didn't like having it exposed unnecessarily. > Aneesh's suggestion lets it stay const so I can drop this patch anyway. Yes I think Aneesh's approach is better as it keeps book3s/64 specific stuff in dedicated file. Also as I probably saw from the robots, including asm/pkeys.h in a non boo3s64 file was a problem, due to the following in that file: #ifdef CONFIG_PPC_BOOK3S_64 #include #else #error "Not supported" #endif Christophe
Re: [PATCH v4 3/8] dt-bindings: clock: Add ids for Lynx 10g PLLs
On 08/08/2022 18:16, Sean Anderson wrote: > >> This entry here is not >> parsed for any tools and only sometimes people look at it. The questions >> are directed via entry in maintainers file or via git history, so you >> can put company email just there. > > As I understand it, the email is simply informative. There are literally > hundreds of examples of mixing a "personal" copyright with a company email. > It is easy to find if you grep. If you are so opposed to it, then I will > remove the email and simply use my name. No, no problem for me. > >>> > + */ > + > +#ifndef __DT_BINDINGS_CLK_LYNX_10G_H > +#define __DT_BINDINGS_CLK_LYNX_10G_H > + > +#define LYNX10G_CLKS_PER_PLL 2 > + > +#define LYNX10G_PLLa(a) ((a) * LYNX10G_CLKS_PER_PLL) > +#define LYNX10G_PLLa_EX_DLY(a) ((a) * LYNX10G_CLKS_PER_PLL + 1) These do not look like proper IDs for clocks for bindings. Numbering starts from 0 or 1 and any "a" needs to be clearly explained. What do you bind here? >>> >>> This matches "a" is the index of the PLL. E.g. registers PLL1RSTCTL etc. >>> This matches the notation used in the reference manual. >> >> This is a file for bindings, not for storing register values. There is >> no single need to store register values (offsets, indexes) as bindings >> as it is not appropriate. Therefore if you do not use it as an ID, just >> remove the bindings header. > > This *is* just for IDs, as stated in the commit message. The above example > was only to illustrate that the clock controlled via the PLL1RSTCTL register > (among others) would have an ID of LYNX10G_PLLa(0). > > If you doubt it, review the driver. Indeed, thanks. Except the driver, where is the DTS user of these bindings? It's neither in bindings example, nor in the DTS patches. Best regards, Krzysztof
[PATCH v3] powerpc/mm: Support execute-only memory on the Radix MMU
The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only) through the execute-only pkey. A PROT_EXEC-only mapping will actually map to RX, and then the pkey will be applied on top of it. Radix doesn't have pkeys, but it does have execute permissions built-in to the MMU, so all we have to do to support XOM is expose it. Signed-off-by: Russell Currey --- v3: Incorporate Aneesh's suggestions, leave protection_map untouched Basic test: https://github.com/ruscur/junkcode/blob/main/mmap_test.c arch/powerpc/include/asm/book3s/64/pgtable.h | 2 ++ arch/powerpc/mm/book3s64/pgtable.c | 11 +-- arch/powerpc/mm/fault.c | 6 +- 3 files changed, 16 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index 392ff48f77df..486902aff040 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -151,6 +151,8 @@ #define PAGE_COPY_X__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC) #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_READ) #define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC) +/* Radix only, Hash uses PAGE_READONLY_X + execute-only pkey instead */ +#define PAGE_EXECONLY __pgprot(_PAGE_BASE | _PAGE_EXEC) /* Permission masks used for kernel mappings */ #define PAGE_KERNEL__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW) diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c index 7b9966402b25..62f63d344596 100644 --- a/arch/powerpc/mm/book3s64/pgtable.c +++ b/arch/powerpc/mm/book3s64/pgtable.c @@ -553,8 +553,15 @@ EXPORT_SYMBOL_GPL(memremap_compat_align); pgprot_t vm_get_page_prot(unsigned long vm_flags) { - unsigned long prot = pgprot_val(protection_map[vm_flags & - (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); + unsigned long prot; + + /* Radix supports execute-only, but protection_map maps X -> RX */ + if (radix_enabled() && ((vm_flags & (VM_READ|VM_WRITE|VM_EXEC)) == VM_EXEC)) { + prot = pgprot_val(PAGE_EXECONLY); + } else { + prot = pgprot_val(protection_map[vm_flags & + (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); + } if (vm_flags & VM_SAO) prot |= _PAGE_SAO; diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 014005428687..59e4cbcf3109 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -270,7 +270,11 @@ static bool access_error(bool is_write, bool is_exec, struct vm_area_struct *vma return false; } - if (unlikely(!vma_is_accessible(vma))) + /* On Radix, a read fault could be from PROT_NONE or PROT_EXEC */ + if (unlikely(radix_enabled() && !(vma->vm_flags & VM_READ))) + return true; + /* Check for a PROT_NONE fault on other MMUs */ + else if (unlikely(!vma_is_accessible(vma))) return true; /* * We should ideally do the vma pkey access check here. But in the -- 2.37.1
Re: [PATCH v2 2/2] powerpc/mm: Support execute-only memory on the Radix MMU
On Mon, 2022-08-08 at 18:54 +0530, Aneesh Kumar K V wrote: > On 8/8/22 6:31 PM, Russell Currey wrote: > > The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only) > > through the execute-only pkey. A PROT_EXEC-only mapping will > > actually > > map to RX, and then the pkey will be applied on top of it. > > > > Radix doesn't have pkeys, but it does have execute permissions > > built-in > > to the MMU, so all we have to do to support XOM is expose it. > > > > That's not possible with protection_map being const, so make it RO > > after > > init instead. > > > > Signed-off-by: Russell Currey > > --- > > v2: Make protection_map __ro_after_init and set it in an initcall > > (v1 didn't work, I tested before rebasing on Anshuman's patches) > > > > basic test: > > https://raw.githubusercontent.com/ruscur/junkcode/main/mmap_test.c > > > > arch/powerpc/include/asm/book3s/64/radix.h | 3 +++ > > arch/powerpc/include/asm/pgtable.h | 1 - > > arch/powerpc/mm/fault.c | 10 ++ > > arch/powerpc/mm/pgtable.c | 16 +++- > > 4 files changed, 28 insertions(+), 2 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/book3s/64/radix.h > > b/arch/powerpc/include/asm/book3s/64/radix.h > > index 686001eda936..bf316b773d73 100644 > > --- a/arch/powerpc/include/asm/book3s/64/radix.h > > +++ b/arch/powerpc/include/asm/book3s/64/radix.h > > @@ -19,6 +19,9 @@ > > #include > > #endif > > > > +/* Execute-only page protections, Hash can use RX + execute-only > > pkey */ > > +#define PAGE_EXECONLY __pgprot(_PAGE_BASE | _PAGE_EXEC) > > + > > /* An empty PTE can still have a R or C writeback */ > > #define RADIX_PTE_NONE_MASK(_PAGE_DIRTY | > > _PAGE_ACCESSED) > > > > diff --git a/arch/powerpc/include/asm/pgtable.h > > b/arch/powerpc/include/asm/pgtable.h > > index 33f4bf8d22b0..3cbb6de20f9d 100644 > > --- a/arch/powerpc/include/asm/pgtable.h > > +++ b/arch/powerpc/include/asm/pgtable.h > > @@ -60,7 +60,6 @@ extern void paging_init(void); > > void poking_init(void); > > > > extern unsigned long ioremap_bot; > > -extern const pgprot_t protection_map[16]; > > > > /* > > * kern_addr_valid is intended to indicate whether an address is a > > valid > > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c > > index 014005428687..887c0cc45ca6 100644 > > --- a/arch/powerpc/mm/fault.c > > +++ b/arch/powerpc/mm/fault.c > > @@ -270,6 +270,16 @@ static bool access_error(bool is_write, bool > > is_exec, struct vm_area_struct *vma > > return false; > > } > > > > + if (unlikely(!(vma->vm_flags & VM_READ))) { > > + /* > > + * If we're on Radix, then this could be a read > > attempt on > > + * execute-only memory. On other MMUs, an "exec- > > only" page > > + * will be given RX flags, so this might be > > redundant. > > + */ > > + if (radix_enabled()) > > + return true; > > + } > > + > > if (unlikely(!vma_is_accessible(vma))) > > return true; > > /* > > diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c > > index 0b2bbde5fb65..6e1a6a999c3c 100644 > > --- a/arch/powerpc/mm/pgtable.c > > +++ b/arch/powerpc/mm/pgtable.c > > @@ -475,7 +475,7 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned > > long ea, > > EXPORT_SYMBOL_GPL(__find_linux_pte); > > > > /* Note due to the way vm flags are laid out, the bits are XWR */ > > -const pgprot_t protection_map[16] = { > > +static pgprot_t protection_map[16] __ro_after_init = { > > [VM_NONE] = > > PAGE_NONE, > > [VM_READ] = > > PAGE_READONLY, > > [VM_WRITE] = > > PAGE_COPY, > > @@ -494,6 +494,20 @@ const pgprot_t protection_map[16] = { > > [VM_SHARED | VM_EXEC | VM_WRITE | VM_READ] = > > PAGE_SHARED_X > > }; > > > > +#ifdef CONFIG_PPC_RADIX_MMU > > +static int __init radix_update_protection_map(void) > > +{ > > + if (early_radix_enabled()) { > > + /* Radix directly supports execute-only page > > protections */ > > + protection_map[VM_EXEC] = PAGE_EXECONLY; > > + protection_map[VM_EXEC | VM_SHARED] = > > PAGE_EXECONLY; > > + } > > + > > + return 0; > > +} > > +arch_initcall(radix_update_protection_map); > > Instead of this can we do this in vm_get_page_prot() ? > > /* EXEC only shared or non shared mapping ? */ > if (radix_enabled() && ((vm_flags & (VM_READ | VM_WRITE | > VM_EXEC)) == VM_EXEC)) > prot = PAGE_EXECONLY; That is a lot simpler, thanks. - Russell > > > > +#endif /* CONFIG_PPC_RADIX_MMU */ > > + > > #ifdef CONFIG_PPC_BOOK3S_64 > > pgprot_t vm_get_page_prot(unsigned long vm_flags) > > { >
Re: [PATCH] powerpc/mm: Support execute-only memory on the Radix MMU
On Mon, 2022-08-08 at 18:28 +0530, Aneesh Kumar K V wrote: > On 8/8/22 5:28 PM, Russell Currey wrote: > > The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only) > > through the execute-only pkey. A PROT_ONLY mapping will actually > > map to > > RX, and then the pkey will be applied on top of it. > > > > Radix doesn't have pkeys, but it does have execute permissions > > built-in > > to the MMU, so all we have to do to support XOM is expose it. > > > > Signed-off-by: Russell Currey > > --- > > quick test: > > https://raw.githubusercontent.com/ruscur/junkcode/main/mmap_test.c > > I can make it a selftest. > > > > arch/powerpc/include/asm/book3s/64/radix.h | 3 +++ > > arch/powerpc/mm/book3s64/radix_pgtable.c | 4 > > arch/powerpc/mm/fault.c | 10 ++ > > 3 files changed, 17 insertions(+) > > > > diff --git a/arch/powerpc/include/asm/book3s/64/radix.h > > b/arch/powerpc/include/asm/book3s/64/radix.h > > index 686001eda936..bf316b773d73 100644 > > --- a/arch/powerpc/include/asm/book3s/64/radix.h > > +++ b/arch/powerpc/include/asm/book3s/64/radix.h > > @@ -19,6 +19,9 @@ > > #include > > #endif > > > > +/* Execute-only page protections, Hash can use RX + execute-only > > pkey */ > > +#define PAGE_EXECONLY __pgprot(_PAGE_BASE | _PAGE_EXEC) > > + > > /* An empty PTE can still have a R or C writeback */ > > #define RADIX_PTE_NONE_MASK(_PAGE_DIRTY | > > _PAGE_ACCESSED) > > > > diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c > > b/arch/powerpc/mm/book3s64/radix_pgtable.c > > index 698274109c91..2edb56169805 100644 > > --- a/arch/powerpc/mm/book3s64/radix_pgtable.c > > +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c > > @@ -617,6 +617,10 @@ void __init radix__early_init_mmu(void) > > __pmd_frag_nr = RADIX_PMD_FRAG_NR; > > __pmd_frag_size_shift = RADIX_PMD_FRAG_SIZE_SHIFT; > > > > + /* Radix directly supports execute-only page protections */ > > + protection_map[VM_EXEC] = PAGE_EXECONLY; > > + protection_map[VM_EXEC | VM_SHARED] = PAGE_EXECONLY; > > + > > radix_init_pgtable(); > > > > if (!firmware_has_feature(FW_FEATURE_LPAR)) { > > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c > > index 014005428687..887c0cc45ca6 100644 > > --- a/arch/powerpc/mm/fault.c > > +++ b/arch/powerpc/mm/fault.c > > @@ -270,6 +270,16 @@ static bool access_error(bool is_write, bool > > is_exec, struct vm_area_struct *vma > > return false; > > } > > > > + if (unlikely(!(vma->vm_flags & VM_READ))) { > > + /* > > + * If we're on Radix, then this could be a read > > attempt on > > + * execute-only memory. On other MMUs, an "exec- > > only" page > > + * will be given RX flags, so this might be > > redundant. > > + */ > > + if (radix_enabled()) > > + return true; > > + } > > + > > > should we do > > /* This cover both PROT_NONE (due to check above) and exec only > mapping */ > if (radix_enabled() && !(vma->vm_flags & VM_READ)) { > return true; > /* PROT_NONE check */ > else if (!vma_is_accessible(vma)) > return true; > > return false; That is better, thanks. - Russell > > > > > if (unlikely(!vma_is_accessible(vma))) > > return true; > > /* > > -aneesh
Re: [PATCH v2 1/2] powerpc/mm: Move vm_get_page_prot() out of book3s64 code
On Mon, 2022-08-08 at 14:32 +, Christophe Leroy wrote: > > > Le 08/08/2022 à 15:01, Russell Currey a écrit : > > protection_map is about to be __ro_after_init instead of const, so > > move > > the only non-local function that consumes it to the same file so it > > can > > at least be static. > > What's the advantage of doing that ? Why does it need to be static ? > > Christophe It doesn't need to be, I didn't like having it exposed unnecessarily. Aneesh's suggestion lets it stay const so I can drop this patch anyway. - Russell > > > > > Signed-off-by: Russell Currey > > --- > > v2: new > > > > arch/powerpc/mm/book3s64/pgtable.c | 16 > > arch/powerpc/mm/pgtable.c | 21 +++-- > > 2 files changed, 19 insertions(+), 18 deletions(-) > > > > diff --git a/arch/powerpc/mm/book3s64/pgtable.c > > b/arch/powerpc/mm/book3s64/pgtable.c > > index 7b9966402b25..e2a4ea5eb960 100644 > > --- a/arch/powerpc/mm/book3s64/pgtable.c > > +++ b/arch/powerpc/mm/book3s64/pgtable.c > > @@ -550,19 +550,3 @@ unsigned long memremap_compat_align(void) > > } > > EXPORT_SYMBOL_GPL(memremap_compat_align); > > #endif > > - > > -pgprot_t vm_get_page_prot(unsigned long vm_flags) > > -{ > > - unsigned long prot = pgprot_val(protection_map[vm_flags & > > - > > (VM_READ|VM_WRITE|VM_EXEC|VM_ > > SHARED)]); > > - > > - if (vm_flags & VM_SAO) > > - prot |= _PAGE_SAO; > > - > > -#ifdef CONFIG_PPC_MEM_KEYS > > - prot |= vmflag_to_pte_pkey_bits(vm_flags); > > -#endif > > - > > - return __pgprot(prot); > > -} > > -EXPORT_SYMBOL(vm_get_page_prot); > > diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c > > index cb2dcdb18f8e..0b2bbde5fb65 100644 > > --- a/arch/powerpc/mm/pgtable.c > > +++ b/arch/powerpc/mm/pgtable.c > > @@ -27,6 +27,7 @@ > > #include > > #include > > #include > > +#include > > > > #ifdef CONFIG_PPC64 > > #define PGD_ALIGN (sizeof(pgd_t) * MAX_PTRS_PER_PGD) > > @@ -493,6 +494,22 @@ const pgprot_t protection_map[16] = { > > [VM_SHARED | VM_EXEC | VM_WRITE | VM_READ] = > > PAGE_SHARED_X > > }; > > > > -#ifndef CONFIG_PPC_BOOK3S_64 > > -DECLARE_VM_GET_PAGE_PROT > > +#ifdef CONFIG_PPC_BOOK3S_64 > > +pgprot_t vm_get_page_prot(unsigned long vm_flags) > > +{ > > + unsigned long prot = pgprot_val(protection_map[vm_flags & > > + (VM_READ|VM_WRITE|VM_EXEC|V > > M_SHARED)]); > > + > > + if (vm_flags & VM_SAO) > > + prot |= _PAGE_SAO; > > + > > +#ifdef CONFIG_PPC_MEM_KEYS > > + prot |= vmflag_to_pte_pkey_bits(vm_flags); > > #endif > > + > > + return __pgprot(prot); > > +} > > +EXPORT_SYMBOL(vm_get_page_prot); > > +#else > > +DECLARE_VM_GET_PAGE_PROT > > +#endif /* CONFIG_PPC_BOOK3S_64 */
Re: [PATCH v2 01/14] powerpc: Adopt SYSCALL_DEFINE for arch-specific syscall handlers
> Well, of course we need the patch build, so SYSCALL_DEFINE and > COMPAT_SYSCALL_DEFINE have to go with the name changes, that's obvious. > > My comment was more related to changes like the renaming of > ppc64_personality() to do_ppc64_personality() and the creation of > sys_ppc64_personality() and compat_ > That could be a patch by itself. > > Also patch 4 could go up front in order to avoid renaming a function you > are removing in a follow-up patch. > > So as a summary, do all preparation up front of patch 1, in order to > keep it as minimal. Thanks. I’ll split up this commit accordingly.
Re: [PATCH 11/12] powerpc: wiiu: don't enforce flat memory
On Friday 10 June 2022 00:24:20 Pali Rohár wrote: > On Friday 20 May 2022 14:30:02 Pali Rohár wrote: > > + linux-mm > > > > Do you know what are requirements for kernel to support non-contiguous > > memory support and what is needed to enable it for 32-bit powerpc? > > Any hints? PING? > > Currently powerpc arch code does not support "memblock.memory.cnt > 1" > > except for WII which seems like a hack... See below. > > > > On Friday 20 May 2022 20:44:04 Ash Logan wrote: > > > On 20/5/22 18:04, Pali Rohár wrote: > > > > On Friday 20 May 2022 13:41:04 Ash Logan wrote: > > > >> On 14/5/22 08:43, Pali Rohár wrote: > > > >>> On Wednesday 02 March 2022 15:44:05 Ash Logan wrote: > > > pgtable_32.c:mapin_ram loops over each valid memory range, which > > > means > > > non-contiguous memory just works. > > > >>> > > > >>> Hello! Does it mean that non-contiguous memory works for any 32-bit > > > >>> powerpc platform, and not only for wiiu? If yes, should not be > > > >>> non-contiguous memory support enabled for all 32-bit ppc boards then? > > > >> > > > >> Hi! Sorry for my delayed response. As best I can tell, it does indeed > > > >> Just Work, but I have only been able to test on wiiu which is missing a > > > >> lot of features other boards have (like PCI) - so it's possible there's > > > >> still an assumption elsewhere in the kernel that I haven't hit. > > > >> > > > >> As best I can tell, the Wii and Wii U are the only 32-bit powerpc > > > >> boards > > > >> out there where it's even possible to have non-contiguous memory. > > > > > > > > What is the reason that those two boards are the **only**? Is there some > > > > specific requirement from bootloader or hardware to "enable" > > > > non-contiguous memory support? > > > > > > Not that I know of, I was just saying that I was only aware of those two > > > boards where the memory map isn't contiguous, and that is the only place > > > where it has been tested. Evidently you know of another board! > > > > > > > I'm interested in enabling non-contiguous memory support for P2020-based > > > > board as it has gaps in its 32-bit memory layout and which could be used > > > > for RAM mapping when 4GB DDR3 module is plugged in (default is 2GB). > > > > > > If it's like the Wii or Wii U (some memory at 0, a gap for MMIO or > > > whatever, then more memory at a higher address) then you should try a > > > patch along these lines, because barring the unknowns I mentioned before > > > it should work. At least as far as I'm aware ;) > > > > > > Signed-off-by: Ash Logan > > > --- > > > arch/powerpc/mm/init_32.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c > > > index 3d690be48e84..59a84629d9a0 100644 > > > --- a/arch/powerpc/mm/init_32.c > > > +++ b/arch/powerpc/mm/init_32.c > > > @@ -125,10 +125,10 @@ void __init MMU_init(void) > > > * lowmem_end_addr is initialized below. > > > */ > > > if (memblock.memory.cnt > 1) { > > > -#ifndef CONFIG_WII > > > +#if !defined(CONFIG_WII) && !defined(CONFIG_WIIU) > > > > > > memblock_enforce_memory_limit(memblock.memory.regions[0].size); > > > pr_warn("Only using first contiguous memory region\n"); > > > -#else > > > +#elif defined(CONFIG_WII) > > > wii_memory_fixups(); > > > #endif > > > } > > > -- > > > 2.35.1 > > >
Re: [PATCH kernel v2] pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window
Hi Alexey, This change is now in mainline as commit b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window"). > diff --git a/arch/powerpc/kexec/file_load_64.c > b/arch/powerpc/kexec/file_load_64.c > index b4981b651d9a..5d2c22aa34fb 100644 > --- a/arch/powerpc/kexec/file_load_64.c > +++ b/arch/powerpc/kexec/file_load_64.c > @@ -1038,6 +1038,48 @@ static int update_cpus_node(void *fdt) > return ret; > } > > +static int copy_property(void *fdt, int node_offset, const struct > device_node *dn, > + const char *propname) > +{ > + const void *prop, *fdtprop; > + int len = 0, fdtlen = 0, ret; > + > + prop = of_get_property(dn, propname, &len); > + fdtprop = fdt_getprop(fdt, node_offset, propname, &fdtlen); > + > + if (fdtprop && !prop) > + ret = fdt_delprop(fdt, node_offset, propname); > + else if (prop) > + ret = fdt_setprop(fdt, node_offset, propname, prop, len); > + > + return ret; > +} clang now warns/errors: arch/powerpc/kexec/file_load_64.c:1053:11: error: variable 'ret' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] else if (prop) ^~~~ arch/powerpc/kexec/file_load_64.c:1056:9: note: uninitialized use occurs here return ret; ^~~ arch/powerpc/kexec/file_load_64.c:1053:7: note: remove the 'if' if its condition is always true else if (prop) ^ arch/powerpc/kexec/file_load_64.c:1046:30: note: initialize the variable 'ret' to silence this warning int len = 0, fdtlen = 0, ret; ^ = 0 1 error generated. Is !fdtprop && !prop a concern? What should a sensible default for ret be? Cheers, Nathan
Re: [PATCH v3a 1/2] lib: generic accessor functions for arch keystore
On Mon, Aug 08, 2022 at 04:31:06PM +, Christophe Leroy wrote: > > > Le 08/08/2022 à 17:43, gjo...@linux.vnet.ibm.com a écrit : > > From: Greg Joyce > > > > Generic kernel subsystems may rely on platform specific persistent > > KeyStore to store objects containing sensitive key material. In such case, > > they need to access architecture specific functions to perform read/write > > operations on these variables. > > > > Define the generic variable read/write prototypes to be implemented by > > architecture specific versions. The default(weak) implementations of > > these prototypes return -EOPNOTSUPP unless overridden by architecture > > versions. > > > > Signed-off-by: Greg Joyce > > --- > > include/linux/arch_vars.h | 23 +++ > > lib/Makefile | 2 +- > > lib/arch_vars.c | 25 + > > 3 files changed, 49 insertions(+), 1 deletion(-) > > create mode 100644 include/linux/arch_vars.h > > create mode 100644 lib/arch_vars.c > > > > diff --git a/include/linux/arch_vars.h b/include/linux/arch_vars.h > > new file mode 100644 > > index ..9c280ff9432e > > --- /dev/null > > +++ b/include/linux/arch_vars.h > > @@ -0,0 +1,23 @@ > > +/* SPDX-License-Identifier: GPL-2.0 */ > > +/* > > + * Platform variable opearations. > > Is it platform specific or architecture specific ? > > > + * > > + * Copyright (C) 2022 IBM Corporation > > + * > > + * These are the accessor functions (read/write) for architecture specific > > + * variables. Specific architectures can provide overrides. > > "variables" is a very generic word which I think doesn't match what you > want to do. > > For me "variables" are local variables and global variables in a C file. > Here it seems to be something completely different hence the name is > really meaningfull and misleading. > > > + * > > + */ > > + > > +#include > > + > > +enum arch_variable_type { > > arch_variable_type ? What's that ? variable types are char, short, long, > long long, etc ... > > > + ARCH_VAR_OPAL_KEY = 0, /* SED Opal Authentication Key */ > > + ARCH_VAR_OTHER = 1, /* Other type of variable */ > > + ARCH_VAR_MAX = 1, /* Maximum type value */ > > +}; > > Why the hell do you need an enum for two values only ? > > > + > > +int arch_read_variable(enum arch_variable_type type, char *varname, > > + void *varbuf, u_int *varlen); > > +int arch_write_variable(enum arch_variable_type type, char *varname, > > + void *varbuf, u_int varlen); > > diff --git a/lib/Makefile b/lib/Makefile > > index f99bf61f8bbc..b90c4cb0dbbb 100644 > > --- a/lib/Makefile > > +++ b/lib/Makefile > > @@ -48,7 +48,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \ > > bsearch.o find_bit.o llist.o memweight.o kfifo.o \ > > percpu-refcount.o rhashtable.o \ > > once.o refcount.o usercopy.o errseq.o bucket_locks.o \ > > -generic-radix-tree.o > > +generic-radix-tree.o arch_vars.o > > obj-$(CONFIG_STRING_SELFTEST) += test_string.o > > obj-y += string_helpers.o > > obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o > > diff --git a/lib/arch_vars.c b/lib/arch_vars.c > > new file mode 100644 > > index ..e6f16d7d09c1 > > --- /dev/null > > +++ b/lib/arch_vars.c > > The name is meaningless, too generic. > > > > @@ -0,0 +1,25 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* > > + * Platform variable operations. > > platform versus architecture ? > > > + * > > + * Copyright (C) 2022 IBM Corporation > > + * > > + * These are the accessor functions (read/write) for architecture specific > > + * variables. Specific architectures can provide overrides. > > + * > > + */ > > + > > +#include > > +#include > > + > > +int __weak arch_read_variable(enum arch_variable_type type, char *varname, > > + void *varbuf, u_int *varlen) > > Sorry, to read a variable, I use READ_ONCE or I read it directly. This is supposed to be used for things like the EFI variables and the already existing powernv secure variables. Nonetheless, without adding the plumbing for the existing implementations it is not clear what it's doing, and the interface is agruably meaningless. Hence I would either suggest to provide the plumbing necessary for existing (secure) variable implementations to make use of the interface, or use private implementations like all the existing platforms do without exposing the values in any generic way, and leave that to somebody who is comfortable with designing a working general inteface for this. Thanks Michal
Re: [PATCH v3a 2/2] powerpc/pseries: Override lib/arch_vars.c functions
Le 08/08/2022 à 17:43, gjo...@linux.vnet.ibm.com a écrit : > From: Greg Joyce > > Self Encrypting Drives(SED) make use of POWER LPAR Platform KeyStore > for storing its variables. Thus the block subsystem needs to access > PowerPC specific functions to read/write objects in PLPKS. > > Override the default implementations in lib/arch_vars.c file with > PowerPC specific versions. > > Signed-off-by: Greg Joyce > --- > arch/powerpc/platforms/pseries/Makefile | 1 + > .../platforms/pseries/plpks_arch_ops.c| 167 ++ > 2 files changed, 168 insertions(+) > create mode 100644 arch/powerpc/platforms/pseries/plpks_arch_ops.c > > diff --git a/arch/powerpc/platforms/pseries/Makefile > b/arch/powerpc/platforms/pseries/Makefile > index 14e143b946a3..3a545422eae5 100644 > --- a/arch/powerpc/platforms/pseries/Makefile > +++ b/arch/powerpc/platforms/pseries/Makefile > @@ -29,6 +29,7 @@ obj-$(CONFIG_PPC_SPLPAR)+= vphn.o > obj-$(CONFIG_PPC_SVM) += svm.o > obj-$(CONFIG_FA_DUMP) += rtas-fadump.o > obj-$(CONFIG_PSERIES_PLPKS) += plpks.o > +obj-$(CONFIG_PSERIES_PLPKS) += plpks_arch_ops.o > > obj-$(CONFIG_SUSPEND) += suspend.o > obj-$(CONFIG_PPC_VAS) += vas.o vas-sysfs.o > diff --git a/arch/powerpc/platforms/pseries/plpks_arch_ops.c > b/arch/powerpc/platforms/pseries/plpks_arch_ops.c > new file mode 100644 > index ..fdea3322f696 > --- /dev/null > +++ b/arch/powerpc/platforms/pseries/plpks_arch_ops.c > @@ -0,0 +1,167 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * POWER Platform arch specific code for SED > + * Copyright (C) 2022 IBM Corporation > + * > + * Define operations for generic kernel subsystems to read/write keys > + * from POWER LPAR Platform KeyStore(PLPKS). > + * > + * List of subsystems/usecase using PLPKS: > + * - Self Encrypting Drives(SED) > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include "plpks.h" > + > +/* > + * variable structure that contains all SED data > + */ > +struct plpks_sed_object_data { > + u_char version; > + u_char pad1[7]; > + u_long authority; > + u_long range; > + u_int key_len; > + u_char key[32]; > +}; > + > +/* > + * ext_type values > + * 00no extension exists > + * 01-1F common > + * 20-3F AIX > + * 40-5F Linux > + * 60-7F IBMi > + */ > + > +/* > + * This extension is optional for version 1 sed_object_data > + */ > +struct sed_object_extension { > + u8 ext_type; > + u8 rsvd[3]; > + u8 ext_data[64]; > +}; > + > +#define PKS_SED_OBJECT_DATA_V1 1 > +#define PKS_SED_MANGLED_LABEL "/default/pri" > +#define PLPKS_SED_COMPONENT "sed-opal" > + > +#define PLPKS_ARCHVAR_POLICYWORLDREADABLE > +#define PLPKS_ARCHVAR_OS_COMMON 4 > + > +/* > + * Read the variable data from PKS given the label > + */ > +int arch_read_variable(enum arch_variable_type type, char *varname, > +void *varbuf, u_int *varlen) 'enum arch_variable_type' is pointlessly long. And it only has two possible values, so should be a 'bool' > +{ > + struct plpks_var var; > + struct plpks_sed_object_data *data; > + u_int offset = 0; Would be better to set it to 0 in the code that handles ARCH_VAR_OTHER and leave it unitialised here. > + char *buf = (char *)varbuf; > + int ret; > + > + var.name = varname; > + var.namelen = strlen(varname); > + var.policy = PLPKS_ARCHVAR_POLICY; > + var.os = PLPKS_ARCHVAR_OS_COMMON; > + var.data = NULL; > + var.datalen = 0; > + > + switch (type) { A switch for a boolean value is pointless. Just do if/else, it will be a lot more readable. > + case ARCH_VAR_OPAL_KEY: > + var.component = PLPKS_SED_COMPONENT; > +#ifdef OPAL_AUTH_KEY #ifdefs in C files should be avoided, refer https://docs.kernel.org/process/coding-style.html#conditional-compilation > + if (strcmp(OPAL_AUTH_KEY, varname) == 0) { > + var.name = PKS_SED_MANGLED_LABEL; > + var.namelen = strlen(varname); > + } > +#endif > + offset = offsetof(struct plpks_sed_object_data, key); > + break; > + case ARCH_VAR_OTHER: > + var.component = ""; > + break; > + } > + > + ret = plpks_read_os_var(&var); > + if (ret != 0) > + return ret; > + > + if (offset > var.datalen) > + offset = 0; > + > + switch (type) { > + case ARCH_VAR_OPAL_KEY: > + data = (struct plpks_sed_object_data *)var.data; > + *varlen = data->key_len; > + break; > + case ARCH_VAR_OTHER: > + *varlen = var.datalen; > + break; > + } > + > + if (var.data) { > + memcpy(varbuf, var.data + offset, var.datalen - offset); > + b
Re: [PATCH v3a 1/2] lib: generic accessor functions for arch keystore
Le 08/08/2022 à 17:43, gjo...@linux.vnet.ibm.com a écrit : > From: Greg Joyce > > Generic kernel subsystems may rely on platform specific persistent > KeyStore to store objects containing sensitive key material. In such case, > they need to access architecture specific functions to perform read/write > operations on these variables. > > Define the generic variable read/write prototypes to be implemented by > architecture specific versions. The default(weak) implementations of > these prototypes return -EOPNOTSUPP unless overridden by architecture > versions. > > Signed-off-by: Greg Joyce > --- > include/linux/arch_vars.h | 23 +++ > lib/Makefile | 2 +- > lib/arch_vars.c | 25 + > 3 files changed, 49 insertions(+), 1 deletion(-) > create mode 100644 include/linux/arch_vars.h > create mode 100644 lib/arch_vars.c > > diff --git a/include/linux/arch_vars.h b/include/linux/arch_vars.h > new file mode 100644 > index ..9c280ff9432e > --- /dev/null > +++ b/include/linux/arch_vars.h > @@ -0,0 +1,23 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* > + * Platform variable opearations. Is it platform specific or architecture specific ? > + * > + * Copyright (C) 2022 IBM Corporation > + * > + * These are the accessor functions (read/write) for architecture specific > + * variables. Specific architectures can provide overrides. "variables" is a very generic word which I think doesn't match what you want to do. For me "variables" are local variables and global variables in a C file. Here it seems to be something completely different hence the name is really meaningfull and misleading. > + * > + */ > + > +#include > + > +enum arch_variable_type { arch_variable_type ? What's that ? variable types are char, short, long, long long, etc ... > + ARCH_VAR_OPAL_KEY = 0, /* SED Opal Authentication Key */ > + ARCH_VAR_OTHER = 1, /* Other type of variable */ > + ARCH_VAR_MAX = 1, /* Maximum type value */ > +}; Why the hell do you need an enum for two values only ? > + > +int arch_read_variable(enum arch_variable_type type, char *varname, > +void *varbuf, u_int *varlen); > +int arch_write_variable(enum arch_variable_type type, char *varname, > + void *varbuf, u_int varlen); > diff --git a/lib/Makefile b/lib/Makefile > index f99bf61f8bbc..b90c4cb0dbbb 100644 > --- a/lib/Makefile > +++ b/lib/Makefile > @@ -48,7 +48,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \ >bsearch.o find_bit.o llist.o memweight.o kfifo.o \ >percpu-refcount.o rhashtable.o \ >once.o refcount.o usercopy.o errseq.o bucket_locks.o \ > - generic-radix-tree.o > + generic-radix-tree.o arch_vars.o > obj-$(CONFIG_STRING_SELFTEST) += test_string.o > obj-y += string_helpers.o > obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o > diff --git a/lib/arch_vars.c b/lib/arch_vars.c > new file mode 100644 > index ..e6f16d7d09c1 > --- /dev/null > +++ b/lib/arch_vars.c The name is meaningless, too generic. > @@ -0,0 +1,25 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * Platform variable operations. platform versus architecture ? > + * > + * Copyright (C) 2022 IBM Corporation > + * > + * These are the accessor functions (read/write) for architecture specific > + * variables. Specific architectures can provide overrides. > + * > + */ > + > +#include > +#include > + > +int __weak arch_read_variable(enum arch_variable_type type, char *varname, > + void *varbuf, u_int *varlen) Sorry, to read a variable, I use READ_ONCE or I read it directly. > +{ > + return -EOPNOTSUPP; > +} > + > +int __weak arch_write_variable(enum arch_variable_type type, char *varname, > +void *varbuf, u_int varlen) > +{ > + return -EOPNOTSUPP; > +}
[PATCH v3a 1/2] lib: generic accessor functions for arch keystore
From: Greg Joyce Generic kernel subsystems may rely on platform specific persistent KeyStore to store objects containing sensitive key material. In such case, they need to access architecture specific functions to perform read/write operations on these variables. Define the generic variable read/write prototypes to be implemented by architecture specific versions. The default(weak) implementations of these prototypes return -EOPNOTSUPP unless overridden by architecture versions. Signed-off-by: Greg Joyce --- include/linux/arch_vars.h | 23 +++ lib/Makefile | 2 +- lib/arch_vars.c | 25 + 3 files changed, 49 insertions(+), 1 deletion(-) create mode 100644 include/linux/arch_vars.h create mode 100644 lib/arch_vars.c diff --git a/include/linux/arch_vars.h b/include/linux/arch_vars.h new file mode 100644 index ..9c280ff9432e --- /dev/null +++ b/include/linux/arch_vars.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Platform variable opearations. + * + * Copyright (C) 2022 IBM Corporation + * + * These are the accessor functions (read/write) for architecture specific + * variables. Specific architectures can provide overrides. + * + */ + +#include + +enum arch_variable_type { + ARCH_VAR_OPAL_KEY = 0, /* SED Opal Authentication Key */ + ARCH_VAR_OTHER = 1, /* Other type of variable */ + ARCH_VAR_MAX = 1, /* Maximum type value */ +}; + +int arch_read_variable(enum arch_variable_type type, char *varname, + void *varbuf, u_int *varlen); +int arch_write_variable(enum arch_variable_type type, char *varname, + void *varbuf, u_int varlen); diff --git a/lib/Makefile b/lib/Makefile index f99bf61f8bbc..b90c4cb0dbbb 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -48,7 +48,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \ bsearch.o find_bit.o llist.o memweight.o kfifo.o \ percpu-refcount.o rhashtable.o \ once.o refcount.o usercopy.o errseq.o bucket_locks.o \ -generic-radix-tree.o +generic-radix-tree.o arch_vars.o obj-$(CONFIG_STRING_SELFTEST) += test_string.o obj-y += string_helpers.o obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o diff --git a/lib/arch_vars.c b/lib/arch_vars.c new file mode 100644 index ..e6f16d7d09c1 --- /dev/null +++ b/lib/arch_vars.c @@ -0,0 +1,25 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Platform variable operations. + * + * Copyright (C) 2022 IBM Corporation + * + * These are the accessor functions (read/write) for architecture specific + * variables. Specific architectures can provide overrides. + * + */ + +#include +#include + +int __weak arch_read_variable(enum arch_variable_type type, char *varname, + void *varbuf, u_int *varlen) +{ + return -EOPNOTSUPP; +} + +int __weak arch_write_variable(enum arch_variable_type type, char *varname, + void *varbuf, u_int varlen) +{ + return -EOPNOTSUPP; +} -- 2.27.0
[PATCH v3a 2/2] powerpc/pseries: Override lib/arch_vars.c functions
From: Greg Joyce Self Encrypting Drives(SED) make use of POWER LPAR Platform KeyStore for storing its variables. Thus the block subsystem needs to access PowerPC specific functions to read/write objects in PLPKS. Override the default implementations in lib/arch_vars.c file with PowerPC specific versions. Signed-off-by: Greg Joyce --- arch/powerpc/platforms/pseries/Makefile | 1 + .../platforms/pseries/plpks_arch_ops.c| 167 ++ 2 files changed, 168 insertions(+) create mode 100644 arch/powerpc/platforms/pseries/plpks_arch_ops.c diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile index 14e143b946a3..3a545422eae5 100644 --- a/arch/powerpc/platforms/pseries/Makefile +++ b/arch/powerpc/platforms/pseries/Makefile @@ -29,6 +29,7 @@ obj-$(CONFIG_PPC_SPLPAR) += vphn.o obj-$(CONFIG_PPC_SVM) += svm.o obj-$(CONFIG_FA_DUMP) += rtas-fadump.o obj-$(CONFIG_PSERIES_PLPKS) += plpks.o +obj-$(CONFIG_PSERIES_PLPKS) += plpks_arch_ops.o obj-$(CONFIG_SUSPEND) += suspend.o obj-$(CONFIG_PPC_VAS) += vas.o vas-sysfs.o diff --git a/arch/powerpc/platforms/pseries/plpks_arch_ops.c b/arch/powerpc/platforms/pseries/plpks_arch_ops.c new file mode 100644 index ..fdea3322f696 --- /dev/null +++ b/arch/powerpc/platforms/pseries/plpks_arch_ops.c @@ -0,0 +1,167 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * POWER Platform arch specific code for SED + * Copyright (C) 2022 IBM Corporation + * + * Define operations for generic kernel subsystems to read/write keys + * from POWER LPAR Platform KeyStore(PLPKS). + * + * List of subsystems/usecase using PLPKS: + * - Self Encrypting Drives(SED) + */ + +#include +#include +#include +#include +#include +#include +#include +#include "plpks.h" + +/* + * variable structure that contains all SED data + */ +struct plpks_sed_object_data { + u_char version; + u_char pad1[7]; + u_long authority; + u_long range; + u_int key_len; + u_char key[32]; +}; + +/* + * ext_type values + * 00no extension exists + * 01-1F common + * 20-3F AIX + * 40-5F Linux + * 60-7F IBMi + */ + +/* + * This extension is optional for version 1 sed_object_data + */ +struct sed_object_extension { + u8 ext_type; + u8 rsvd[3]; + u8 ext_data[64]; +}; + +#define PKS_SED_OBJECT_DATA_V1 1 +#define PKS_SED_MANGLED_LABEL "/default/pri" +#define PLPKS_SED_COMPONENT "sed-opal" + +#define PLPKS_ARCHVAR_POLICYWORLDREADABLE +#define PLPKS_ARCHVAR_OS_COMMON 4 + +/* + * Read the variable data from PKS given the label + */ +int arch_read_variable(enum arch_variable_type type, char *varname, + void *varbuf, u_int *varlen) +{ + struct plpks_var var; + struct plpks_sed_object_data *data; + u_int offset = 0; + char *buf = (char *)varbuf; + int ret; + + var.name = varname; + var.namelen = strlen(varname); + var.policy = PLPKS_ARCHVAR_POLICY; + var.os = PLPKS_ARCHVAR_OS_COMMON; + var.data = NULL; + var.datalen = 0; + + switch (type) { + case ARCH_VAR_OPAL_KEY: + var.component = PLPKS_SED_COMPONENT; +#ifdef OPAL_AUTH_KEY + if (strcmp(OPAL_AUTH_KEY, varname) == 0) { + var.name = PKS_SED_MANGLED_LABEL; + var.namelen = strlen(varname); + } +#endif + offset = offsetof(struct plpks_sed_object_data, key); + break; + case ARCH_VAR_OTHER: + var.component = ""; + break; + } + + ret = plpks_read_os_var(&var); + if (ret != 0) + return ret; + + if (offset > var.datalen) + offset = 0; + + switch (type) { + case ARCH_VAR_OPAL_KEY: + data = (struct plpks_sed_object_data *)var.data; + *varlen = data->key_len; + break; + case ARCH_VAR_OTHER: + *varlen = var.datalen; + break; + } + + if (var.data) { + memcpy(varbuf, var.data + offset, var.datalen - offset); + buf[*varlen] = '\0'; + kfree(var.data); + } + + return 0; +} + +/* + * Write the variable data to PKS given the label + */ +int arch_write_variable(enum arch_variable_type type, char *varname, + void *varbuf, u_int varlen) +{ + struct plpks_var var; + struct plpks_sed_object_data data; + struct plpks_var_name vname; + + var.name = varname; + var.namelen = strlen(varname); + var.policy = PLPKS_ARCHVAR_POLICY; + var.os = PLPKS_ARCHVAR_OS_COMMON; + var.datalen = varlen; + var.data = varbuf; + + switch (type) { + case ARCH_VAR_OPAL_KEY: + var.component = PLPKS_SED_COMPONENT; +#ifdef OPA
[PATCH v3a 0/2] generic and PowerPC accessor functions for arch keystore
From: Greg Joyce Changelog v3a: - No code changes, but per reviewer requests, adding additional mailing lists(keyring, EFI) for wider review. Architectural neutral functions have been defined for accessing architecture specific variable store. The neutral functions are defined as weak so that they may be superseded by platform specific versions. The functions have been desigined so that they can support a large range of platforms/architectures. PowerPC/pseries versions of these functions provide read/write access to the non-volatile PLPKS data store. This functionality allows kernel code such as the block SED opal driver to store authentication keys in a secure permanent store. Greg Joyce (2): lib: define generic accessor functions for arch specific keystore powerpc/pseries: Override lib/arch_vars.c functions arch/powerpc/platforms/pseries/Makefile | 1 + .../platforms/pseries/plpks_arch_ops.c| 167 ++ include/linux/arch_vars.h | 23 +++ lib/Makefile | 2 +- lib/arch_vars.c | 25 +++ 5 files changed, 217 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/platforms/pseries/plpks_arch_ops.c create mode 100644 include/linux/arch_vars.h create mode 100644 lib/arch_vars.c Signed-off-by: Greg Joyce base-commit: ff6992735ade75aae3e35d16b17da1008d753d28 -- 2.27.0
Re: [PATCH v4 3/8] dt-bindings: clock: Add ids for Lynx 10g PLLs
On 8/8/22 1:46 AM, Krzysztof Kozlowski wrote: > On 05/08/2022 17:17, Sean Anderson wrote: >> >> >> On 8/5/22 2:53 AM, Krzysztof Kozlowski wrote: >>> On 05/08/2022 00:05, Sean Anderson wrote: This adds ids for the Lynx 10g SerDes's internal PLLs. These may be used witn assigned-clock* to specify a particular frequency to use. Signed-off-by: Sean Anderson --- Changes in v4: - New include/dt-bindings/clock/fsl,lynx-10g.h | 14 ++ 1 file changed, 14 insertions(+) create mode 100644 include/dt-bindings/clock/fsl,lynx-10g.h diff --git a/include/dt-bindings/clock/fsl,lynx-10g.h b/include/dt-bindings/clock/fsl,lynx-10g.h new file mode 100644 index ..f5b955658106 --- /dev/null +++ b/include/dt-bindings/clock/fsl,lynx-10g.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ >>> >>> This should be dual license. >> >> This is just matching what the majority (263 out of 326) clock dt-bindings >> headers do. > > Then please license it just like bindings, so dual license with BSD. OK >> +/* + * Copyright (C) 2022 Sean Anderson >>> >>> It's confusing to see personal copyrights with company email. Either the >>> copyright is attributed to your employer or to you. If to you, use >>> private email. >> >> I hold the copyright, and I would like inquiries to be directed to my work >> email (as I don't have this hardware at home). > > OK, I guess I won't be the only one confused :). You're the first person to comment on this. > This entry here is not > parsed for any tools and only sometimes people look at it. The questions > are directed via entry in maintainers file or via git history, so you > can put company email just there. As I understand it, the email is simply informative. There are literally hundreds of examples of mixing a "personal" copyright with a company email. It is easy to find if you grep. If you are so opposed to it, then I will remove the email and simply use my name. >> + */ + +#ifndef __DT_BINDINGS_CLK_LYNX_10G_H +#define __DT_BINDINGS_CLK_LYNX_10G_H + +#define LYNX10G_CLKS_PER_PLL 2 + +#define LYNX10G_PLLa(a) ((a) * LYNX10G_CLKS_PER_PLL) +#define LYNX10G_PLLa_EX_DLY(a)((a) * LYNX10G_CLKS_PER_PLL + 1) >>> >>> These do not look like proper IDs for clocks for bindings. Numbering >>> starts from 0 or 1 and any "a" needs to be clearly explained. What do >>> you bind here? >> >> This matches "a" is the index of the PLL. E.g. registers PLL1RSTCTL etc. >> This matches the notation used in the reference manual. > > This is a file for bindings, not for storing register values. There is > no single need to store register values (offsets, indexes) as bindings > as it is not appropriate. Therefore if you do not use it as an ID, just > remove the bindings header. This *is* just for IDs, as stated in the commit message. The above example was only to illustrate that the clock controlled via the PLL1RSTCTL register (among others) would have an ID of LYNX10G_PLLa(0). If you doubt it, review the driver. --Sean
Re: [PATCH v2 1/2] powerpc/mm: Move vm_get_page_prot() out of book3s64 code
Le 08/08/2022 à 15:01, Russell Currey a écrit : > protection_map is about to be __ro_after_init instead of const, so move > the only non-local function that consumes it to the same file so it can > at least be static. What's the advantage of doing that ? Why does it need to be static ? Christophe > > Signed-off-by: Russell Currey > --- > v2: new > > arch/powerpc/mm/book3s64/pgtable.c | 16 > arch/powerpc/mm/pgtable.c | 21 +++-- > 2 files changed, 19 insertions(+), 18 deletions(-) > > diff --git a/arch/powerpc/mm/book3s64/pgtable.c > b/arch/powerpc/mm/book3s64/pgtable.c > index 7b9966402b25..e2a4ea5eb960 100644 > --- a/arch/powerpc/mm/book3s64/pgtable.c > +++ b/arch/powerpc/mm/book3s64/pgtable.c > @@ -550,19 +550,3 @@ unsigned long memremap_compat_align(void) > } > EXPORT_SYMBOL_GPL(memremap_compat_align); > #endif > - > -pgprot_t vm_get_page_prot(unsigned long vm_flags) > -{ > - unsigned long prot = pgprot_val(protection_map[vm_flags & > - (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); > - > - if (vm_flags & VM_SAO) > - prot |= _PAGE_SAO; > - > -#ifdef CONFIG_PPC_MEM_KEYS > - prot |= vmflag_to_pte_pkey_bits(vm_flags); > -#endif > - > - return __pgprot(prot); > -} > -EXPORT_SYMBOL(vm_get_page_prot); > diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c > index cb2dcdb18f8e..0b2bbde5fb65 100644 > --- a/arch/powerpc/mm/pgtable.c > +++ b/arch/powerpc/mm/pgtable.c > @@ -27,6 +27,7 @@ > #include > #include > #include > +#include > > #ifdef CONFIG_PPC64 > #define PGD_ALIGN (sizeof(pgd_t) * MAX_PTRS_PER_PGD) > @@ -493,6 +494,22 @@ const pgprot_t protection_map[16] = { > [VM_SHARED | VM_EXEC | VM_WRITE | VM_READ] = PAGE_SHARED_X > }; > > -#ifndef CONFIG_PPC_BOOK3S_64 > -DECLARE_VM_GET_PAGE_PROT > +#ifdef CONFIG_PPC_BOOK3S_64 > +pgprot_t vm_get_page_prot(unsigned long vm_flags) > +{ > + unsigned long prot = pgprot_val(protection_map[vm_flags & > + (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); > + > + if (vm_flags & VM_SAO) > + prot |= _PAGE_SAO; > + > +#ifdef CONFIG_PPC_MEM_KEYS > + prot |= vmflag_to_pte_pkey_bits(vm_flags); > #endif > + > + return __pgprot(prot); > +} > +EXPORT_SYMBOL(vm_get_page_prot); > +#else > +DECLARE_VM_GET_PAGE_PROT > +#endif /* CONFIG_PPC_BOOK3S_64 */
Re: [PATCH v2 2/2] powerpc/mm: Support execute-only memory on the Radix MMU
On 8/8/22 6:31 PM, Russell Currey wrote: > The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only) > through the execute-only pkey. A PROT_EXEC-only mapping will actually > map to RX, and then the pkey will be applied on top of it. > > Radix doesn't have pkeys, but it does have execute permissions built-in > to the MMU, so all we have to do to support XOM is expose it. > > That's not possible with protection_map being const, so make it RO after > init instead. > > Signed-off-by: Russell Currey > --- > v2: Make protection_map __ro_after_init and set it in an initcall > (v1 didn't work, I tested before rebasing on Anshuman's patches) > > basic test: https://raw.githubusercontent.com/ruscur/junkcode/main/mmap_test.c > > arch/powerpc/include/asm/book3s/64/radix.h | 3 +++ > arch/powerpc/include/asm/pgtable.h | 1 - > arch/powerpc/mm/fault.c| 10 ++ > arch/powerpc/mm/pgtable.c | 16 +++- > 4 files changed, 28 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/include/asm/book3s/64/radix.h > b/arch/powerpc/include/asm/book3s/64/radix.h > index 686001eda936..bf316b773d73 100644 > --- a/arch/powerpc/include/asm/book3s/64/radix.h > +++ b/arch/powerpc/include/asm/book3s/64/radix.h > @@ -19,6 +19,9 @@ > #include > #endif > > +/* Execute-only page protections, Hash can use RX + execute-only pkey */ > +#define PAGE_EXECONLY__pgprot(_PAGE_BASE | _PAGE_EXEC) > + > /* An empty PTE can still have a R or C writeback */ > #define RADIX_PTE_NONE_MASK (_PAGE_DIRTY | _PAGE_ACCESSED) > > diff --git a/arch/powerpc/include/asm/pgtable.h > b/arch/powerpc/include/asm/pgtable.h > index 33f4bf8d22b0..3cbb6de20f9d 100644 > --- a/arch/powerpc/include/asm/pgtable.h > +++ b/arch/powerpc/include/asm/pgtable.h > @@ -60,7 +60,6 @@ extern void paging_init(void); > void poking_init(void); > > extern unsigned long ioremap_bot; > -extern const pgprot_t protection_map[16]; > > /* > * kern_addr_valid is intended to indicate whether an address is a valid > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c > index 014005428687..887c0cc45ca6 100644 > --- a/arch/powerpc/mm/fault.c > +++ b/arch/powerpc/mm/fault.c > @@ -270,6 +270,16 @@ static bool access_error(bool is_write, bool is_exec, > struct vm_area_struct *vma > return false; > } > > + if (unlikely(!(vma->vm_flags & VM_READ))) { > + /* > + * If we're on Radix, then this could be a read attempt on > + * execute-only memory. On other MMUs, an "exec-only" page > + * will be given RX flags, so this might be redundant. > + */ > + if (radix_enabled()) > + return true; > + } > + > if (unlikely(!vma_is_accessible(vma))) > return true; > /* > diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c > index 0b2bbde5fb65..6e1a6a999c3c 100644 > --- a/arch/powerpc/mm/pgtable.c > +++ b/arch/powerpc/mm/pgtable.c > @@ -475,7 +475,7 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea, > EXPORT_SYMBOL_GPL(__find_linux_pte); > > /* Note due to the way vm flags are laid out, the bits are XWR */ > -const pgprot_t protection_map[16] = { > +static pgprot_t protection_map[16] __ro_after_init = { > [VM_NONE] = PAGE_NONE, > [VM_READ] = PAGE_READONLY, > [VM_WRITE] = PAGE_COPY, > @@ -494,6 +494,20 @@ const pgprot_t protection_map[16] = { > [VM_SHARED | VM_EXEC | VM_WRITE | VM_READ] = PAGE_SHARED_X > }; > > +#ifdef CONFIG_PPC_RADIX_MMU > +static int __init radix_update_protection_map(void) > +{ > + if (early_radix_enabled()) { > + /* Radix directly supports execute-only page protections */ > + protection_map[VM_EXEC] = PAGE_EXECONLY; > + protection_map[VM_EXEC | VM_SHARED] = PAGE_EXECONLY; > + } > + > + return 0; > +} > +arch_initcall(radix_update_protection_map); Instead of this can we do this in vm_get_page_prot() ? /* EXEC only shared or non shared mapping ? */ if (radix_enabled() && ((vm_flags & (VM_READ | VM_WRITE | VM_EXEC)) == VM_EXEC)) prot = PAGE_EXECONLY; > +#endif /* CONFIG_PPC_RADIX_MMU */ > + > #ifdef CONFIG_PPC_BOOK3S_64 > pgprot_t vm_get_page_prot(unsigned long vm_flags) > {
[PATCH v3 1/4] proc: Add get_task_cmdline_kernel() function
Add a new function get_task_cmdline_kernel() which reads the command line of a process into a kernel buffer. This command line can then be dumped by arch code to give additional debug info via the parameters with which a faulting process was started. The new function re-uses the existing code which provides the cmdline for the procfs. For that the existing functions were modified so that the buffer page is allocated outside of get_mm_proctitle() and get_mm_cmdline() and instead provided as parameter. Signed-off-by: Helge Deller -- Changes in v3: - add parameter names in header files, noticed by: kernel test robot - require task to be locked by caller --- fs/proc/base.c | 74 - include/linux/proc_fs.h | 5 +++ 2 files changed, 56 insertions(+), 23 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index 8dfa36a99c74..e2d4152aed34 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -217,20 +217,17 @@ static int proc_root_link(struct dentry *dentry, struct path *path) */ static ssize_t get_mm_proctitle(struct mm_struct *mm, char __user *buf, size_t count, unsigned long pos, - unsigned long arg_start) + unsigned long arg_start, char *page) { - char *page; int ret, got; + size_t size; - if (pos >= PAGE_SIZE) + size = min_t(size_t, PAGE_SIZE, count); + if (pos >= size) return 0; - page = (char *)__get_free_page(GFP_KERNEL); - if (!page) - return -ENOMEM; - ret = 0; - got = access_remote_vm(mm, arg_start, page, PAGE_SIZE, FOLL_ANON); + got = access_remote_vm(mm, arg_start, page, size, FOLL_ANON); if (got > 0) { int len = strnlen(page, got); @@ -238,7 +235,9 @@ static ssize_t get_mm_proctitle(struct mm_struct *mm, char __user *buf, if (len < got) len++; - if (len > pos) { + if (!buf) + ret = len; + else if (len > pos) { len -= pos; if (len > count) len = count; @@ -248,16 +247,15 @@ static ssize_t get_mm_proctitle(struct mm_struct *mm, char __user *buf, ret = len; } } - free_page((unsigned long)page); return ret; } static ssize_t get_mm_cmdline(struct mm_struct *mm, char __user *buf, - size_t count, loff_t *ppos) + size_t count, loff_t *ppos, char *page) { unsigned long arg_start, arg_end, env_start, env_end; unsigned long pos, len; - char *page, c; + char c; /* Check if process spawned far enough to have cmdline. */ if (!mm->env_end) @@ -283,7 +281,7 @@ static ssize_t get_mm_cmdline(struct mm_struct *mm, char __user *buf, len = env_end - arg_start; /* We're not going to care if "*ppos" has high bits set */ - pos = *ppos; + pos = ppos ? *ppos : 0; if (pos >= len) return 0; if (count > len - pos) @@ -299,7 +297,7 @@ static ssize_t get_mm_cmdline(struct mm_struct *mm, char __user *buf, * pos is 0, and set a flag in the 'struct file'. */ if (access_remote_vm(mm, arg_end-1, &c, 1, FOLL_ANON) == 1 && c) - return get_mm_proctitle(mm, buf, count, pos, arg_start); + return get_mm_proctitle(mm, buf, count, pos, arg_start, page); /* * For the non-setproctitle() case we limit things strictly @@ -311,10 +309,6 @@ static ssize_t get_mm_cmdline(struct mm_struct *mm, char __user *buf, if (count > arg_end - pos) count = arg_end - pos; - page = (char *)__get_free_page(GFP_KERNEL); - if (!page) - return -ENOMEM; - len = 0; while (count) { int got; @@ -323,7 +317,8 @@ static ssize_t get_mm_cmdline(struct mm_struct *mm, char __user *buf, got = access_remote_vm(mm, pos, page, size, FOLL_ANON); if (got <= 0) break; - got -= copy_to_user(buf, page, got); + if (buf) + got -= copy_to_user(buf, page, got); if (unlikely(!got)) { if (!len) len = -EFAULT; @@ -335,12 +330,11 @@ static ssize_t get_mm_cmdline(struct mm_struct *mm, char __user *buf, count -= got; } - free_page((unsigned long)page); return len; } static ssize_t get_task_cmdline(struct task_struct *tsk, char __user *buf, - size_t count, loff_t *pos) + size_t count, loff_t *pos, char *page) { struct mm_struct *mm; ssize_t ret; @@ -349,23 +343,57
[PATCH v3 0/4] Dump command line of faulting process to syslog
This patch series dumps the command line (including the program parameters) of a faulting process to the syslog. The motivation for this patch is that it's sometimes quite hard to find out and annoying to not know which program *exactly* faulted when looking at the syslog. For example, a dump on parisc shows: do_page_fault() command='cc1' type=15 address=0x in libc-2.33.so[f6abb000+184000] -> We see the "cc1" compiler crashed, but it would be useful to know which file was compiled. With this patch you will see that cc1 crashed while compiling some haskell code: cc1[13472] cmdline: /usr/lib/gcc/hppa-linux-gnu/12/cc1 -quiet @/tmp/ccRkFSfY -imultilib . -imultiarch hppa-linux-gnu -D USE_MINIINTERPRETER -D NO_REGS -D _HPUX_SOURCE -D NOSMP -D THREADED_RTS -include /build/ghc/ghc-9.0.2/includes/dist-install/build/ghcversion.h -iquote compiler/GHC/Iface -quiet -dumpdir /tmp/ghc13413_0/ -dumpbase ghc_5.hc -dumpbase-ext .hc -O -Wimplicit -fno-PIC -fwrapv -fno-builtin -fno-strict-aliasing -o /tmp/ghc13413_0/ghc_5.s Another example are the glibc testcases which always segfault in "ld.so.1" with no other info: do_page_fault() command='ld.so.1' type=15 address=0x565921d8 in libc.so[f7339000+1bb000] -> With the patch you can see it was the "tst-safe-linking-malloc-hugetlb1" testcase: ld.so.1[1151] cmdline: /home/gnu/glibc/objdir/elf/ld.so.1 --library-path /home/gnu/glibc/objdir:/home/gnu/glibc/objdir/math:/home/gnu/ /home/gnu/glibc/objdir/malloc/tst-safe-linking-malloc-hugetlb1 An example of a typical x86 fault shows up as: crash[2326]: segfault at 0 ip 561a7969c12e sp 7ffe97a05630 error 6 in crash[561a7969c000+1000] Code: 68 ff ff ff c6 05 19 2f 00 00 01 5d c3 0f 1f 80 00 00 00 00 c3 0f 1f 80 00 00 ... -> with this patch you now see the whole command line: crash[2326] cmdline: ./crash test_write_to_page_0 The patches are relatively small, and reuse functions which are used to create the output for the /proc//cmdline files. The relevant changes are in patches #1 and #2. Patch #3 adds the cmdline dump on x86. Patch #4 drops code from arc which now becomes unnecessary as this is done by generic code. Helge --- Changes in v3: - require task to be locked by caller, noticed by kernel test robot - add parameter names in header files, noticed by kernel test robot Changes in v2: - Don't dump all or parts of the commandline depending on the kptr_restrict sysctl value (suggested by Josh Triplett). - Patch sent to more arch mailing lists Helge Deller (4): proc: Add get_task_cmdline_kernel() function lib/dump_stack: Add dump_stack_print_cmdline() and wire up in dump_stack_print_info() x86/fault: Dump command line of faulting process to syslog arc: Use generic dump_stack_print_cmdline() implementation arch/arc/kernel/troubleshoot.c | 24 --- arch/x86/mm/fault.c| 2 + fs/proc/base.c | 74 +++--- include/linux/printk.h | 5 +++ include/linux/proc_fs.h| 5 +++ lib/dump_stack.c | 34 6 files changed, 97 insertions(+), 47 deletions(-) -- 2.37.1
[PATCH v3 2/4] lib/dump_stack: Add dump_stack_print_cmdline() and wire up in dump_stack_print_info()
Add the function dump_stack_print_cmdline() which can be used by arch code to print the command line of the current processs. This function is useful in arch code when dumping information for a faulting process. Wire this function up in the dump_stack_print_info() function to include the dumping of the command line for architectures which use dump_stack_print_info(). As an example, with this patch a failing glibc testcase (which uses ld.so.1 as starting program) up to now reported just "ld.so.1" failing: do_page_fault() command='ld.so.1' type=15 address=0x565921d8 in libc.so[f7339000+1bb000] trap #15: Data TLB miss fault, vm_start = 0x0001a000, vm_end = 0x0001b000 and now it reports in addition: ld.so.1[1151] cmdline: /home/gnu/glibc/objdir/elf/ld.so.1 --library-path /home/gnu/glibc/objdir:/home/gnu/glibc/objdir/math:/home/gnu/ /home/gnu/glibc/objdir/malloc/tst-safe-linking-malloc-hugetlb1 Josh Triplett noted that dumping such command line parameters into syslog may theoretically lead to information disclosure. That's why this patch checks the value of the kptr_restrict sysctl variable and will not print any information if kptr_restrict==2, and will not show the program parameters if kptr_restrict==1. Signed-off-by: Helge Deller --- include/linux/printk.h | 5 + lib/dump_stack.c | 34 ++ 2 files changed, 39 insertions(+) diff --git a/include/linux/printk.h b/include/linux/printk.h index cf7d666ab1f8..5290a32a197d 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -191,6 +191,7 @@ u32 log_buf_len_get(void); void log_buf_vmcoreinfo_setup(void); void __init setup_log_buf(int early); __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...); +void dump_stack_print_cmdline(const char *log_lvl); void dump_stack_print_info(const char *log_lvl); void show_regs_print_info(const char *log_lvl); extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold; @@ -262,6 +263,10 @@ static inline __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...) { } +static inline void dump_stack_print_cmdline(const char *log_lvl) +{ +} + static inline void dump_stack_print_info(const char *log_lvl) { } diff --git a/lib/dump_stack.c b/lib/dump_stack.c index 83471e81501a..38ef1067c7eb 100644 --- a/lib/dump_stack.c +++ b/lib/dump_stack.c @@ -14,6 +14,7 @@ #include #include #include +#include static char dump_stack_arch_desc_str[128]; @@ -45,6 +46,37 @@ void __init dump_stack_set_arch_desc(const char *fmt, ...) #define BUILD_ID_VAL "" #endif +/** + * dump_stack_print_cmdline - print the command line of current process + * @log_lvl: log level + */ +void dump_stack_print_cmdline(const char *log_lvl) +{ + char cmdline[256]; + + if (kptr_restrict >= 2) + return; /* never show command line */ + + /* get command line */ + get_task_cmdline_kernel(current, cmdline, sizeof(cmdline)); + + if (kptr_restrict == 1) { + char *p; + + /* if restricted show program path only */ + p = strchr(cmdline, ' '); + if (p) { + *p = 0; + strlcat(cmdline, + " ... [parameters hidden due to kptr_restrict]", + sizeof(cmdline)); + } + } + + printk("%s%s[%d] cmdline: %s\n", log_lvl, current->comm, + current->pid, cmdline); +} + /** * dump_stack_print_info - print generic debug info for dump_stack() * @log_lvl: log level @@ -62,6 +94,8 @@ void dump_stack_print_info(const char *log_lvl) (int)strcspn(init_utsname()->version, " "), init_utsname()->version, BUILD_ID_VAL); + dump_stack_print_cmdline(log_lvl); + if (dump_stack_arch_desc_str[0] != '\0') printk("%sHardware name: %s\n", log_lvl, dump_stack_arch_desc_str); -- 2.37.1
[PATCH v3 3/4] x86/fault: Dump command line of faulting process to syslog
If a process segfaults, include the command line of the faulting process in the syslog. In the example below, the "crash" program (which simply writes zero to address 0) was called with the parameters "this is a test": crash[2326]: segfault at 0 ip 561a7969c12e sp 7ffe97a05630 error 6 in crash[561a7969c000+1000] crash[2326] cmdline: ./crash this is a test Code: 68 ff ff ff c6 05 19 2f 00 00 01 5d c3 0f 1f 80 00 00 00 00 c3 0f 1f ... Signed-off-by: Helge Deller --- arch/x86/mm/fault.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index fad8faa29d04..d4e21c402e29 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -784,6 +784,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code, printk(KERN_CONT "\n"); + dump_stack_print_cmdline(loglvl); + show_opcodes(regs, loglvl); } -- 2.37.1
[PATCH v3 4/4] arc: Use generic dump_stack_print_cmdline() implementation
The process program name and command line is now shown in generic code in dump_stack_print_info(), so drop the arc-specific implementation. Signed-off-by: Helge Deller --- arch/arc/kernel/troubleshoot.c | 24 1 file changed, 24 deletions(-) diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c index 7654c2e42dc0..9807e590ee55 100644 --- a/arch/arc/kernel/troubleshoot.c +++ b/arch/arc/kernel/troubleshoot.c @@ -51,29 +51,6 @@ static void print_regs_callee(struct callee_regs *regs) regs->r24, regs->r25); } -static void print_task_path_n_nm(struct task_struct *tsk) -{ - char *path_nm = NULL; - struct mm_struct *mm; - struct file *exe_file; - char buf[ARC_PATH_MAX]; - - mm = get_task_mm(tsk); - if (!mm) - goto done; - - exe_file = get_mm_exe_file(mm); - mmput(mm); - - if (exe_file) { - path_nm = file_path(exe_file, buf, ARC_PATH_MAX-1); - fput(exe_file); - } - -done: - pr_info("Path: %s\n", !IS_ERR(path_nm) ? path_nm : "?"); -} - static void show_faulting_vma(unsigned long address) { struct vm_area_struct *vma; @@ -176,7 +153,6 @@ void show_regs(struct pt_regs *regs) */ preempt_enable(); - print_task_path_n_nm(tsk); show_regs_print_info(KERN_INFO); show_ecr_verbose(regs); -- 2.37.1
[PATCH v2 2/2] powerpc/mm: Support execute-only memory on the Radix MMU
The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only) through the execute-only pkey. A PROT_EXEC-only mapping will actually map to RX, and then the pkey will be applied on top of it. Radix doesn't have pkeys, but it does have execute permissions built-in to the MMU, so all we have to do to support XOM is expose it. That's not possible with protection_map being const, so make it RO after init instead. Signed-off-by: Russell Currey --- v2: Make protection_map __ro_after_init and set it in an initcall (v1 didn't work, I tested before rebasing on Anshuman's patches) basic test: https://raw.githubusercontent.com/ruscur/junkcode/main/mmap_test.c arch/powerpc/include/asm/book3s/64/radix.h | 3 +++ arch/powerpc/include/asm/pgtable.h | 1 - arch/powerpc/mm/fault.c| 10 ++ arch/powerpc/mm/pgtable.c | 16 +++- 4 files changed, 28 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h index 686001eda936..bf316b773d73 100644 --- a/arch/powerpc/include/asm/book3s/64/radix.h +++ b/arch/powerpc/include/asm/book3s/64/radix.h @@ -19,6 +19,9 @@ #include #endif +/* Execute-only page protections, Hash can use RX + execute-only pkey */ +#define PAGE_EXECONLY __pgprot(_PAGE_BASE | _PAGE_EXEC) + /* An empty PTE can still have a R or C writeback */ #define RADIX_PTE_NONE_MASK(_PAGE_DIRTY | _PAGE_ACCESSED) diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h index 33f4bf8d22b0..3cbb6de20f9d 100644 --- a/arch/powerpc/include/asm/pgtable.h +++ b/arch/powerpc/include/asm/pgtable.h @@ -60,7 +60,6 @@ extern void paging_init(void); void poking_init(void); extern unsigned long ioremap_bot; -extern const pgprot_t protection_map[16]; /* * kern_addr_valid is intended to indicate whether an address is a valid diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 014005428687..887c0cc45ca6 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -270,6 +270,16 @@ static bool access_error(bool is_write, bool is_exec, struct vm_area_struct *vma return false; } + if (unlikely(!(vma->vm_flags & VM_READ))) { + /* +* If we're on Radix, then this could be a read attempt on +* execute-only memory. On other MMUs, an "exec-only" page +* will be given RX flags, so this might be redundant. +*/ + if (radix_enabled()) + return true; + } + if (unlikely(!vma_is_accessible(vma))) return true; /* diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index 0b2bbde5fb65..6e1a6a999c3c 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -475,7 +475,7 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea, EXPORT_SYMBOL_GPL(__find_linux_pte); /* Note due to the way vm flags are laid out, the bits are XWR */ -const pgprot_t protection_map[16] = { +static pgprot_t protection_map[16] __ro_after_init = { [VM_NONE] = PAGE_NONE, [VM_READ] = PAGE_READONLY, [VM_WRITE] = PAGE_COPY, @@ -494,6 +494,20 @@ const pgprot_t protection_map[16] = { [VM_SHARED | VM_EXEC | VM_WRITE | VM_READ] = PAGE_SHARED_X }; +#ifdef CONFIG_PPC_RADIX_MMU +static int __init radix_update_protection_map(void) +{ + if (early_radix_enabled()) { + /* Radix directly supports execute-only page protections */ + protection_map[VM_EXEC] = PAGE_EXECONLY; + protection_map[VM_EXEC | VM_SHARED] = PAGE_EXECONLY; + } + + return 0; +} +arch_initcall(radix_update_protection_map); +#endif /* CONFIG_PPC_RADIX_MMU */ + #ifdef CONFIG_PPC_BOOK3S_64 pgprot_t vm_get_page_prot(unsigned long vm_flags) { -- 2.37.1
[PATCH v2 1/2] powerpc/mm: Move vm_get_page_prot() out of book3s64 code
protection_map is about to be __ro_after_init instead of const, so move the only non-local function that consumes it to the same file so it can at least be static. Signed-off-by: Russell Currey --- v2: new arch/powerpc/mm/book3s64/pgtable.c | 16 arch/powerpc/mm/pgtable.c | 21 +++-- 2 files changed, 19 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c index 7b9966402b25..e2a4ea5eb960 100644 --- a/arch/powerpc/mm/book3s64/pgtable.c +++ b/arch/powerpc/mm/book3s64/pgtable.c @@ -550,19 +550,3 @@ unsigned long memremap_compat_align(void) } EXPORT_SYMBOL_GPL(memremap_compat_align); #endif - -pgprot_t vm_get_page_prot(unsigned long vm_flags) -{ - unsigned long prot = pgprot_val(protection_map[vm_flags & - (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); - - if (vm_flags & VM_SAO) - prot |= _PAGE_SAO; - -#ifdef CONFIG_PPC_MEM_KEYS - prot |= vmflag_to_pte_pkey_bits(vm_flags); -#endif - - return __pgprot(prot); -} -EXPORT_SYMBOL(vm_get_page_prot); diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index cb2dcdb18f8e..0b2bbde5fb65 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -27,6 +27,7 @@ #include #include #include +#include #ifdef CONFIG_PPC64 #define PGD_ALIGN (sizeof(pgd_t) * MAX_PTRS_PER_PGD) @@ -493,6 +494,22 @@ const pgprot_t protection_map[16] = { [VM_SHARED | VM_EXEC | VM_WRITE | VM_READ] = PAGE_SHARED_X }; -#ifndef CONFIG_PPC_BOOK3S_64 -DECLARE_VM_GET_PAGE_PROT +#ifdef CONFIG_PPC_BOOK3S_64 +pgprot_t vm_get_page_prot(unsigned long vm_flags) +{ + unsigned long prot = pgprot_val(protection_map[vm_flags & + (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); + + if (vm_flags & VM_SAO) + prot |= _PAGE_SAO; + +#ifdef CONFIG_PPC_MEM_KEYS + prot |= vmflag_to_pte_pkey_bits(vm_flags); #endif + + return __pgprot(prot); +} +EXPORT_SYMBOL(vm_get_page_prot); +#else +DECLARE_VM_GET_PAGE_PROT +#endif /* CONFIG_PPC_BOOK3S_64 */ -- 2.37.1
Re: [PATCH] powerpc/mm: Support execute-only memory on the Radix MMU
On 8/8/22 5:28 PM, Russell Currey wrote: > The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only) > through the execute-only pkey. A PROT_ONLY mapping will actually map to > RX, and then the pkey will be applied on top of it. > > Radix doesn't have pkeys, but it does have execute permissions built-in > to the MMU, so all we have to do to support XOM is expose it. > > Signed-off-by: Russell Currey > --- > quick test: https://raw.githubusercontent.com/ruscur/junkcode/main/mmap_test.c > I can make it a selftest. > > arch/powerpc/include/asm/book3s/64/radix.h | 3 +++ > arch/powerpc/mm/book3s64/radix_pgtable.c | 4 > arch/powerpc/mm/fault.c| 10 ++ > 3 files changed, 17 insertions(+) > > diff --git a/arch/powerpc/include/asm/book3s/64/radix.h > b/arch/powerpc/include/asm/book3s/64/radix.h > index 686001eda936..bf316b773d73 100644 > --- a/arch/powerpc/include/asm/book3s/64/radix.h > +++ b/arch/powerpc/include/asm/book3s/64/radix.h > @@ -19,6 +19,9 @@ > #include > #endif > > +/* Execute-only page protections, Hash can use RX + execute-only pkey */ > +#define PAGE_EXECONLY__pgprot(_PAGE_BASE | _PAGE_EXEC) > + > /* An empty PTE can still have a R or C writeback */ > #define RADIX_PTE_NONE_MASK (_PAGE_DIRTY | _PAGE_ACCESSED) > > diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c > b/arch/powerpc/mm/book3s64/radix_pgtable.c > index 698274109c91..2edb56169805 100644 > --- a/arch/powerpc/mm/book3s64/radix_pgtable.c > +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c > @@ -617,6 +617,10 @@ void __init radix__early_init_mmu(void) > __pmd_frag_nr = RADIX_PMD_FRAG_NR; > __pmd_frag_size_shift = RADIX_PMD_FRAG_SIZE_SHIFT; > > + /* Radix directly supports execute-only page protections */ > + protection_map[VM_EXEC] = PAGE_EXECONLY; > + protection_map[VM_EXEC | VM_SHARED] = PAGE_EXECONLY; > + > radix_init_pgtable(); > > if (!firmware_has_feature(FW_FEATURE_LPAR)) { > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c > index 014005428687..887c0cc45ca6 100644 > --- a/arch/powerpc/mm/fault.c > +++ b/arch/powerpc/mm/fault.c > @@ -270,6 +270,16 @@ static bool access_error(bool is_write, bool is_exec, > struct vm_area_struct *vma > return false; > } > > + if (unlikely(!(vma->vm_flags & VM_READ))) { > + /* > + * If we're on Radix, then this could be a read attempt on > + * execute-only memory. On other MMUs, an "exec-only" page > + * will be given RX flags, so this might be redundant. > + */ > + if (radix_enabled()) > + return true; > + } > + should we do /* This cover both PROT_NONE (due to check above) and exec only mapping */ if (radix_enabled() && !(vma->vm_flags & VM_READ)) { return true; /* PROT_NONE check */ else if (!vma_is_accessible(vma)) return true; return false; > if (unlikely(!vma_is_accessible(vma))) > return true; > /* -aneesh
[PATCH 07/16] powerpc: Skip objtool from running on VDSO files
Do not run objtool on VDSO files, by using OBJECT_FILES_NON_STANDARD Suggested-by: Christophe Leroy Signed-off-by: Sathvika Vasireddy --- arch/powerpc/kernel/vdso/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/vdso/Makefile b/arch/powerpc/kernel/vdso/Makefile index 096b0bf1335f..a49a0d6a1c53 100644 --- a/arch/powerpc/kernel/vdso/Makefile +++ b/arch/powerpc/kernel/vdso/Makefile @@ -102,3 +102,5 @@ quiet_cmd_vdso64ld_and_check = VDSO64L $@ cmd_vdso64ld_and_check = $(VDSOCC) $(c_flags) $(CC64FLAGS) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) ; $(cmd_vdso_check) quiet_cmd_vdso64as = VDSO64A $@ cmd_vdso64as = $(VDSOCC) $(a_flags) $(CC64FLAGS) $(AS64FLAGS) -c -o $@ $< + +OBJECT_FILES_NON_STANDARD := y -- 2.31.1
[PATCH] powerpc/mm: Support execute-only memory on the Radix MMU
The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only) through the execute-only pkey. A PROT_ONLY mapping will actually map to RX, and then the pkey will be applied on top of it. Radix doesn't have pkeys, but it does have execute permissions built-in to the MMU, so all we have to do to support XOM is expose it. Signed-off-by: Russell Currey --- quick test: https://raw.githubusercontent.com/ruscur/junkcode/main/mmap_test.c I can make it a selftest. arch/powerpc/include/asm/book3s/64/radix.h | 3 +++ arch/powerpc/mm/book3s64/radix_pgtable.c | 4 arch/powerpc/mm/fault.c| 10 ++ 3 files changed, 17 insertions(+) diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h index 686001eda936..bf316b773d73 100644 --- a/arch/powerpc/include/asm/book3s/64/radix.h +++ b/arch/powerpc/include/asm/book3s/64/radix.h @@ -19,6 +19,9 @@ #include #endif +/* Execute-only page protections, Hash can use RX + execute-only pkey */ +#define PAGE_EXECONLY __pgprot(_PAGE_BASE | _PAGE_EXEC) + /* An empty PTE can still have a R or C writeback */ #define RADIX_PTE_NONE_MASK(_PAGE_DIRTY | _PAGE_ACCESSED) diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c index 698274109c91..2edb56169805 100644 --- a/arch/powerpc/mm/book3s64/radix_pgtable.c +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c @@ -617,6 +617,10 @@ void __init radix__early_init_mmu(void) __pmd_frag_nr = RADIX_PMD_FRAG_NR; __pmd_frag_size_shift = RADIX_PMD_FRAG_SIZE_SHIFT; + /* Radix directly supports execute-only page protections */ + protection_map[VM_EXEC] = PAGE_EXECONLY; + protection_map[VM_EXEC | VM_SHARED] = PAGE_EXECONLY; + radix_init_pgtable(); if (!firmware_has_feature(FW_FEATURE_LPAR)) { diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 014005428687..887c0cc45ca6 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -270,6 +270,16 @@ static bool access_error(bool is_write, bool is_exec, struct vm_area_struct *vma return false; } + if (unlikely(!(vma->vm_flags & VM_READ))) { + /* +* If we're on Radix, then this could be a read attempt on +* execute-only memory. On other MMUs, an "exec-only" page +* will be given RX flags, so this might be redundant. +*/ + if (radix_enabled()) + return true; + } + if (unlikely(!vma_is_accessible(vma))) return true; /* -- 2.37.1
[PATCH 16/16] objtool/powerpc: Add --mcount specific implementation
This patch enables objtool --mcount on powerpc, and adds implementation specific to powerpc. Signed-off-by: Sathvika Vasireddy --- arch/powerpc/Kconfig | 1 + tools/objtool/arch/powerpc/decode.c | 22 +++ tools/objtool/arch/powerpc/include/arch/elf.h | 2 ++ 3 files changed, 25 insertions(+) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index dc05cd23c233..6be2e68fa9eb 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -238,6 +238,7 @@ config PPC select HAVE_NMI if PERF_EVENTS || (PPC64 && PPC_BOOK3S) select HAVE_OPTPROBES select HAVE_OBJTOOL if PPC32 || MPROFILE_KERNEL + select HAVE_OBJTOOL_MCOUNT if HAVE_OBJTOOL select HAVE_PERF_EVENTS select HAVE_PERF_EVENTS_NMI if PPC64 select HAVE_PERF_REGS diff --git a/tools/objtool/arch/powerpc/decode.c b/tools/objtool/arch/powerpc/decode.c index 8b6a14680da7..b71c265ed503 100644 --- a/tools/objtool/arch/powerpc/decode.c +++ b/tools/objtool/arch/powerpc/decode.c @@ -9,6 +9,14 @@ #include #include +bool arch_ftrace_match(char *name) +{ + if (!strcmp(name, "_mcount")) + return true; + + return false; +} + unsigned long arch_dest_reloc_offset(int addend) { return addend; @@ -41,12 +49,26 @@ int arch_decode_instruction(struct objtool_file *file, const struct section *sec struct list_head *ops_list) { u32 insn; + unsigned int opcode; *immediate = 0; insn = bswap_if_needed(file->elf, *(u32 *)(sec->data->d_buf + offset)); *len = 4; *type = INSN_OTHER; + opcode = insn >> 26; + + switch (opcode) { + case 18: /* bl */ + if ((insn & 3) == 1) { + *type = INSN_CALL; + *immediate = insn & 0x3fc; + if (*immediate & 0x200) + *immediate -= 0x400; + } + break; + } + return 0; } diff --git a/tools/objtool/arch/powerpc/include/arch/elf.h b/tools/objtool/arch/powerpc/include/arch/elf.h index 3c8ebb7d2a6b..73f9ae172fe5 100644 --- a/tools/objtool/arch/powerpc/include/arch/elf.h +++ b/tools/objtool/arch/powerpc/include/arch/elf.h @@ -4,5 +4,7 @@ #define _OBJTOOL_ARCH_ELF #define R_NONE R_PPC_NONE +#define R_ABS64 R_PPC64_ADDR64 +#define R_ABS32 R_PPC_ADDR32 #endif /* _OBJTOOL_ARCH_ELF */ -- 2.31.1
[PATCH 15/16] objtool/powerpc: Enable objtool to be built on ppc
This patch adds [stub] implementations for required functions, inorder to enable objtool build on powerpc. Signed-off-by: Sathvika Vasireddy [Christophe Leroy: powerpc: Add missing asm/asm.h for objtool] Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/asm.h| 7 ++ tools/objtool/arch/powerpc/Build | 2 + tools/objtool/arch/powerpc/decode.c | 74 +++ .../arch/powerpc/include/arch/cfi_regs.h | 11 +++ tools/objtool/arch/powerpc/include/arch/elf.h | 8 ++ .../arch/powerpc/include/arch/special.h | 21 ++ tools/objtool/arch/powerpc/special.c | 19 + 8 files changed, 143 insertions(+) create mode 100644 arch/powerpc/include/asm/asm.h create mode 100644 tools/objtool/arch/powerpc/Build create mode 100644 tools/objtool/arch/powerpc/decode.c create mode 100644 tools/objtool/arch/powerpc/include/arch/cfi_regs.h create mode 100644 tools/objtool/arch/powerpc/include/arch/elf.h create mode 100644 tools/objtool/arch/powerpc/include/arch/special.h create mode 100644 tools/objtool/arch/powerpc/special.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 4c466acdc70d..dc05cd23c233 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -237,6 +237,7 @@ config PPC select HAVE_MOD_ARCH_SPECIFIC select HAVE_NMI if PERF_EVENTS || (PPC64 && PPC_BOOK3S) select HAVE_OPTPROBES + select HAVE_OBJTOOL if PPC32 || MPROFILE_KERNEL select HAVE_PERF_EVENTS select HAVE_PERF_EVENTS_NMI if PPC64 select HAVE_PERF_REGS diff --git a/arch/powerpc/include/asm/asm.h b/arch/powerpc/include/asm/asm.h new file mode 100644 index ..86f46b604e9a --- /dev/null +++ b/arch/powerpc/include/asm/asm.h @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_ASM_H +#define _ASM_POWERPC_ASM_H + +#define _ASM_PTR " .long " + +#endif /* _ASM_POWERPC_ASM_H */ diff --git a/tools/objtool/arch/powerpc/Build b/tools/objtool/arch/powerpc/Build new file mode 100644 index ..d24d5636a5b8 --- /dev/null +++ b/tools/objtool/arch/powerpc/Build @@ -0,0 +1,2 @@ +objtool-y += decode.o +objtool-y += special.o diff --git a/tools/objtool/arch/powerpc/decode.c b/tools/objtool/arch/powerpc/decode.c new file mode 100644 index ..8b6a14680da7 --- /dev/null +++ b/tools/objtool/arch/powerpc/decode.c @@ -0,0 +1,74 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include +#include +#include +#include +#include +#include +#include +#include + +unsigned long arch_dest_reloc_offset(int addend) +{ + return addend; +} + +bool arch_callee_saved_reg(unsigned char reg) +{ + return false; +} + +int arch_decode_hint_reg(u8 sp_reg, int *base) +{ + exit(-1); +} + +const char *arch_nop_insn(int len) +{ + exit(-1); +} + +const char *arch_ret_insn(int len) +{ + exit(-1); +} + +int arch_decode_instruction(struct objtool_file *file, const struct section *sec, + unsigned long offset, unsigned int maxlen, + unsigned int *len, enum insn_type *type, + unsigned long *immediate, + struct list_head *ops_list) +{ + u32 insn; + + *immediate = 0; + insn = bswap_if_needed(file->elf, *(u32 *)(sec->data->d_buf + offset)); + *len = 4; + *type = INSN_OTHER; + + return 0; +} + +unsigned long arch_jump_destination(struct instruction *insn) +{ + return insn->offset + insn->immediate; +} + +void arch_initial_func_cfi_state(struct cfi_init_state *state) +{ + int i; + + for (i = 0; i < CFI_NUM_REGS; i++) { + state->regs[i].base = CFI_UNDEFINED; + state->regs[i].offset = 0; + } + + /* initial CFA (call frame address) */ + state->cfa.base = CFI_SP; + state->cfa.offset = 0; + + /* initial LR (return address) */ + state->regs[CFI_RA].base = CFI_CFA; + state->regs[CFI_RA].offset = 0; +} diff --git a/tools/objtool/arch/powerpc/include/arch/cfi_regs.h b/tools/objtool/arch/powerpc/include/arch/cfi_regs.h new file mode 100644 index ..59638ebeafc8 --- /dev/null +++ b/tools/objtool/arch/powerpc/include/arch/cfi_regs.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ + +#ifndef _OBJTOOL_CFI_REGS_H +#define _OBJTOOL_CFI_REGS_H + +#define CFI_BP 1 +#define CFI_SP CFI_BP +#define CFI_RA 32 +#define CFI_NUM_REGS 33 + +#endif diff --git a/tools/objtool/arch/powerpc/include/arch/elf.h b/tools/objtool/arch/powerpc/include/arch/elf.h new file mode 100644 index ..3c8ebb7d2a6b --- /dev/null +++ b/tools/objtool/arch/powerpc/include/arch/elf.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ + +#ifndef _OBJTOOL_ARCH_ELF +#define _OBJTOOL_ARCH_ELF + +#define R_NONE R_PPC_NON
[PATCH 14/16] objtool: Add arch specific function arch_ftrace_match()
Add architecture specific function to look for relocation records pointing to arch specific symbols. Suggested-by: Christophe Leroy Signed-off-by: Sathvika Vasireddy --- tools/objtool/arch/x86/decode.c | 8 tools/objtool/check.c| 2 +- tools/objtool/include/objtool/arch.h | 2 ++ 3 files changed, 11 insertions(+), 1 deletion(-) diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c index c260006106be..0e91eb1d6386 100644 --- a/tools/objtool/arch/x86/decode.c +++ b/tools/objtool/arch/x86/decode.c @@ -23,6 +23,14 @@ #include #include +bool arch_ftrace_match(char *name) +{ + if (!strcmp(func->name, "__fentry__")) + return true; + + return false; +} + static int is_x86_64(const struct elf *elf) { switch (elf->ehdr.e_machine) { diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 2f1da8dbfce9..dda163c1e5a3 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -2294,7 +2294,7 @@ static int classify_symbols(struct objtool_file *file) if (arch_is_rethunk(func)) func->return_thunk = true; - if (!strcmp(func->name, "__fentry__")) + if (arch_ftrace_match(func->name)) func->fentry = true; if (is_profiling_func(func->name)) diff --git a/tools/objtool/include/objtool/arch.h b/tools/objtool/include/objtool/arch.h index beb2f3aa94ff..2ba4b9897285 100644 --- a/tools/objtool/include/objtool/arch.h +++ b/tools/objtool/include/objtool/arch.h @@ -69,6 +69,8 @@ struct stack_op { struct instruction; +bool arch_ftrace_match(char *name); + void arch_initial_func_cfi_state(struct cfi_init_state *state); int arch_decode_instruction(struct objtool_file *file, const struct section *sec, -- 2.31.1
[PATCH 13/16] objtool: Use macros to define arch specific reloc types
Make relocation types architecture specific. Signed-off-by: Sathvika Vasireddy --- tools/objtool/arch/x86/include/arch/elf.h | 2 ++ tools/objtool/check.c | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/objtool/arch/x86/include/arch/elf.h b/tools/objtool/arch/x86/include/arch/elf.h index 69cc4264b28a..ac14987cf687 100644 --- a/tools/objtool/arch/x86/include/arch/elf.h +++ b/tools/objtool/arch/x86/include/arch/elf.h @@ -2,5 +2,7 @@ #define _OBJTOOL_ARCH_ELF #define R_NONE R_X86_64_NONE +#define R_ABS64 R_X86_64_64 +#define R_ABS32 R_X86_64_32 #endif /* _OBJTOOL_ARCH_ELF */ diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 36f367da4752..2f1da8dbfce9 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -883,7 +883,7 @@ static int create_mcount_loc_sections(struct objtool_file *file) memset(loc, 0, addrsize); if (elf_add_reloc_to_insn(file->elf, sec, idx, - R_X86_64_64, + addrsize == sizeof(u64) ? R_ABS64 : R_ABS32, insn->sec, insn->offset)) return -1; -- 2.31.1
[PATCH 12/16] objtool: Read special sections with alts only when specific options are selected
This patch reads special sections which have alternate instructions, only when stackval or orc or uaccess or noinstr options are passed to objtool. Signed-off-by: Sathvika Vasireddy --- tools/objtool/check.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 1216a0fceaba..36f367da4752 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -2370,9 +2370,11 @@ static int decode_sections(struct objtool_file *file) * Must be before add_jump_destinations(), which depends on 'func' * being set for alternatives, to enable proper sibling call detection. */ - ret = add_special_section_alts(file); - if (ret) - return ret; + if (opts.stackval || opts.orc || opts.uaccess || opts.noinstr) { + ret = add_special_section_alts(file); + if (ret) + return ret; + } ret = add_jump_destinations(file); if (ret) -- 2.31.1
[PATCH 11/16] objtool: Add --mnop as an option to --mcount
Architectures can select HAVE_NOP_MCOUNT if they choose to nop out mcount call sites. If that config option is selected, then --mnop is passed as an option to objtool, along with --mcount. Also, make sure that --mnop can be passed as an option to objtool only when --mcount is passed. Signed-off-by: Sathvika Vasireddy --- Makefile| 4 +++- arch/x86/Kconfig| 1 + scripts/Makefile.lib| 1 + tools/objtool/builtin-check.c | 14 ++ tools/objtool/check.c | 19 ++- tools/objtool/include/objtool/builtin.h | 1 + 6 files changed, 30 insertions(+), 10 deletions(-) diff --git a/Makefile b/Makefile index dc6295f91263..c8f19c4bbc87 100644 --- a/Makefile +++ b/Makefile @@ -856,7 +856,9 @@ ifdef CONFIG_FTRACE_MCOUNT_USE_CC endif endif ifdef CONFIG_FTRACE_MCOUNT_USE_OBJTOOL - CC_FLAGS_USING += -DCC_USING_NOP_MCOUNT + ifdef CONFIG_HAVE_NOP_MCOUNT +CC_FLAGS_USING += -DCC_USING_NOP_MCOUNT + endif endif ifdef CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT ifdef CONFIG_HAVE_C_RECORDMCOUNT diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index f9920f1341c8..a8dd138df637 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -189,6 +189,7 @@ config X86 select HAVE_CONTEXT_TRACKING_USER_OFFSTACK if HAVE_CONTEXT_TRACKING_USER select HAVE_C_RECORDMCOUNT select HAVE_OBJTOOL_MCOUNT if HAVE_OBJTOOL + select HAVE_NOP_MCOUNT if HAVE_OBJTOOL_MCOUNT select HAVE_BUILDTIME_MCOUNT_SORT select HAVE_DEBUG_KMEMLEAK select HAVE_DMA_CONTIGUOUS diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib index 3fb6a99e78c4..0610078e057a 100644 --- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -234,6 +234,7 @@ objtool_args = \ $(if $(CONFIG_HAVE_NOINSTR_HACK), --hacks=noinstr) \ $(if $(CONFIG_X86_KERNEL_IBT), --ibt) \ $(if $(CONFIG_FTRACE_MCOUNT_USE_OBJTOOL), --mcount) \ + $(if $(CONFIG_HAVE_NOP_MCOUNT), --mnop) \ $(if $(CONFIG_UNWINDER_ORC), --orc) \ $(if $(CONFIG_RETPOLINE), --retpoline) \ $(if $(CONFIG_RETHUNK), --rethunk) \ diff --git a/tools/objtool/builtin-check.c b/tools/objtool/builtin-check.c index 24fbe803a0d3..9bd347d3c244 100644 --- a/tools/objtool/builtin-check.c +++ b/tools/objtool/builtin-check.c @@ -82,6 +82,7 @@ const struct option check_options[] = { OPT_BOOLEAN(0, "dry-run", &opts.dryrun, "don't write modifications"), OPT_BOOLEAN(0, "link", &opts.link, "object is a linked object"), OPT_BOOLEAN(0, "module", &opts.module, "object is part of a kernel module"), + OPT_BOOLEAN(0, "mnop", &opts.mnop, "nop out mcount call sites"), OPT_BOOLEAN(0, "no-unreachable", &opts.no_unreachable, "skip 'unreachable instruction' warnings"), OPT_BOOLEAN(0, "sec-address", &opts.sec_address, "print section addresses in warnings"), OPT_BOOLEAN(0, "stats", &opts.stats, "print statistics"), @@ -150,6 +151,16 @@ static bool opts_valid(void) return false; } +static bool mnop_opts_valid(void) +{ + if (opts.mnop && !opts.mcount) { + ERROR("--mnop requires --mcount"); + return false; + } + + return true; +} + static bool link_opts_valid(struct objtool_file *file) { if (opts.link) @@ -198,6 +209,9 @@ int objtool_run(int argc, const char **argv) if (!file) return 1; + if (!mnop_opts_valid()) + return 1; + if (!link_opts_valid(file)) return 1; diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 503a7961b9be..1216a0fceaba 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -1231,17 +1231,18 @@ static void annotate_call_site(struct objtool_file *file, if (opts.mcount && sym->fentry) { if (sibling) WARN_FUNC("Tail call to __fentry__ !?!?", insn->sec, insn->offset); + if (opts.mnop) { + if (reloc) { + reloc->type = R_NONE; + elf_write_reloc(file->elf, reloc); + } - if (reloc) { - reloc->type = R_NONE; - elf_write_reloc(file->elf, reloc); - } - - elf_write_insn(file->elf, insn->sec, - insn->offset, insn->len, - arch_nop_insn(insn->len)); + elf_write_insn(file->elf, insn->sec, + insn->offset, insn->len, + arch_nop_insn(insn->len));
[PATCH 10/16] objtool: Use target file class size instead of a compiled constant
From: Christophe Leroy In order to allow using objtool on cross-built kernels, determine size of long from elf data instead of using sizeof(long) at build time. For the time being this covers only mcount. Signed-off-by: Christophe Leroy --- tools/objtool/check.c | 16 +--- tools/objtool/elf.c | 8 ++-- tools/objtool/include/objtool/elf.h | 8 3 files changed, 23 insertions(+), 9 deletions(-) diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 45fbc80ea7f9..503a7961b9be 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -851,9 +851,9 @@ static int create_ibt_endbr_seal_sections(struct objtool_file *file) static int create_mcount_loc_sections(struct objtool_file *file) { struct section *sec; - unsigned long *loc; struct instruction *insn; int idx; + int addrsize = elf_class_addrsize(file->elf); sec = find_section_by_name(file->elf, "__mcount_loc"); if (sec) { @@ -869,23 +869,25 @@ static int create_mcount_loc_sections(struct objtool_file *file) list_for_each_entry(insn, &file->mcount_loc_list, call_node) idx++; - sec = elf_create_section(file->elf, "__mcount_loc", 0, sizeof(unsigned long), idx); + sec = elf_create_section(file->elf, "__mcount_loc", 0, addrsize, idx); if (!sec) return -1; + sec->sh.sh_addralign = addrsize; + idx = 0; list_for_each_entry(insn, &file->mcount_loc_list, call_node) { + void *loc; - loc = (unsigned long *)sec->data->d_buf + idx; - memset(loc, 0, sizeof(unsigned long)); + loc = sec->data->d_buf + idx; + memset(loc, 0, addrsize); - if (elf_add_reloc_to_insn(file->elf, sec, - idx * sizeof(unsigned long), + if (elf_add_reloc_to_insn(file->elf, sec, idx, R_X86_64_64, insn->sec, insn->offset)) return -1; - idx++; + idx += addrsize; } return 0; diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c index c25e957c1e52..40c6d53b081f 100644 --- a/tools/objtool/elf.c +++ b/tools/objtool/elf.c @@ -1124,6 +1124,7 @@ static struct section *elf_create_rela_reloc_section(struct elf *elf, struct sec { char *relocname; struct section *sec; + int addrsize = elf_class_addrsize(elf); relocname = malloc(strlen(base->name) + strlen(".rela") + 1); if (!relocname) { @@ -1133,7 +1134,10 @@ static struct section *elf_create_rela_reloc_section(struct elf *elf, struct sec strcpy(relocname, ".rela"); strcat(relocname, base->name); - sec = elf_create_section(elf, relocname, 0, sizeof(GElf_Rela), 0); + if (addrsize == sizeof(u32)) + sec = elf_create_section(elf, relocname, 0, sizeof(Elf32_Rela), 0); + else + sec = elf_create_section(elf, relocname, 0, sizeof(GElf_Rela), 0); free(relocname); if (!sec) return NULL; @@ -1142,7 +1146,7 @@ static struct section *elf_create_rela_reloc_section(struct elf *elf, struct sec sec->base = base; sec->sh.sh_type = SHT_RELA; - sec->sh.sh_addralign = 8; + sec->sh.sh_addralign = addrsize; sec->sh.sh_link = find_section_by_name(elf, ".symtab")->idx; sec->sh.sh_info = base->idx; sec->sh.sh_flags = SHF_INFO_LINK; diff --git a/tools/objtool/include/objtool/elf.h b/tools/objtool/include/objtool/elf.h index 16f4067b82ae..78b3aa2e546d 100644 --- a/tools/objtool/include/objtool/elf.h +++ b/tools/objtool/include/objtool/elf.h @@ -142,6 +142,14 @@ static inline bool has_multiple_files(struct elf *elf) return elf->num_files > 1; } +static inline int elf_class_addrsize(struct elf *elf) +{ + if (elf->ehdr.e_ident[EI_CLASS] == ELFCLASS32) + return sizeof(u32); + else + return sizeof(u64); +} + struct elf *elf_open_read(const char *name, int flags); struct section *elf_create_section(struct elf *elf, const char *name, unsigned int sh_flags, size_t entsize, int nr); -- 2.31.1
[PATCH 09/16] objtool: Use target file endianness instead of a compiled constant
From: Christophe Leroy Some architectures like powerpc support both endianness, it's therefore not possible to fix the endianness via arch/endianness.h because there is no easy way to get the target endianness at build time. Use the endianness recorded in the file objtool is working on. Signed-off-by: Christophe Leroy --- .../arch/x86/include/arch/endianness.h| 9 -- tools/objtool/check.c | 2 +- tools/objtool/include/objtool/endianness.h| 32 +-- tools/objtool/orc_dump.c | 11 +-- tools/objtool/orc_gen.c | 4 +-- tools/objtool/special.c | 3 +- 6 files changed, 30 insertions(+), 31 deletions(-) delete mode 100644 tools/objtool/arch/x86/include/arch/endianness.h diff --git a/tools/objtool/arch/x86/include/arch/endianness.h b/tools/objtool/arch/x86/include/arch/endianness.h deleted file mode 100644 index 7c362527da20.. --- a/tools/objtool/arch/x86/include/arch/endianness.h +++ /dev/null @@ -1,9 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-or-later */ -#ifndef _ARCH_ENDIANNESS_H -#define _ARCH_ENDIANNESS_H - -#include - -#define __TARGET_BYTE_ORDER __LITTLE_ENDIAN - -#endif /* _ARCH_ENDIANNESS_H */ diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 0ef805843d4f..45fbc80ea7f9 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -2075,7 +2075,7 @@ static int read_unwind_hints(struct objtool_file *file) return -1; } - cfi.cfa.offset = bswap_if_needed(hint->sp_offset); + cfi.cfa.offset = bswap_if_needed(file->elf, hint->sp_offset); cfi.type = hint->type; cfi.end = hint->end; diff --git a/tools/objtool/include/objtool/endianness.h b/tools/objtool/include/objtool/endianness.h index 10241341eff3..4d2aa9b0fe2f 100644 --- a/tools/objtool/include/objtool/endianness.h +++ b/tools/objtool/include/objtool/endianness.h @@ -2,33 +2,33 @@ #ifndef _OBJTOOL_ENDIANNESS_H #define _OBJTOOL_ENDIANNESS_H -#include #include #include - -#ifndef __TARGET_BYTE_ORDER -#error undefined arch __TARGET_BYTE_ORDER -#endif - -#if __BYTE_ORDER != __TARGET_BYTE_ORDER -#define __NEED_BSWAP 1 -#else -#define __NEED_BSWAP 0 -#endif +#include /* - * Does a byte swap if target endianness doesn't match the host, i.e. cross + * Does a byte swap if target file endianness doesn't match the host, i.e. cross * compilation for little endian on big endian and vice versa. * To be used for multi-byte values conversion, which are read from / about * to be written to a target native endianness ELF file. */ -#define bswap_if_needed(val) \ +static inline bool need_bswap(struct elf *elf) +{ + return (__BYTE_ORDER == __LITTLE_ENDIAN) ^ + (elf->ehdr.e_ident[EI_DATA] == ELFDATA2LSB); +} + +#define bswap_if_needed(elf, val) \ ({ \ __typeof__(val) __ret; \ + bool __need_bswap = need_bswap(elf);\ switch (sizeof(val)) { \ - case 8: __ret = __NEED_BSWAP ? bswap_64(val) : (val); break;\ - case 4: __ret = __NEED_BSWAP ? bswap_32(val) : (val); break;\ - case 2: __ret = __NEED_BSWAP ? bswap_16(val) : (val); break;\ + case 8: \ + __ret = __need_bswap ? bswap_64(val) : (val); break;\ + case 4: \ + __ret = __need_bswap ? bswap_32(val) : (val); break;\ + case 2: \ + __ret = __need_bswap ? bswap_16(val) : (val); break;\ default:\ BUILD_BUG(); break; \ } \ diff --git a/tools/objtool/orc_dump.c b/tools/objtool/orc_dump.c index f5a8508c42d6..4f1211fec82c 100644 --- a/tools/objtool/orc_dump.c +++ b/tools/objtool/orc_dump.c @@ -76,6 +76,7 @@ int orc_dump(const char *_objname) GElf_Rela rela; GElf_Sym sym; Elf_Data *data, *symtab = NULL, *rela_orc_ip = NULL; + struct elf dummy_elf = {}; objname = _objname; @@ -94,6 +95,12 @@ int orc_dump(const char *_objname) return -1; } + if (!elf64_getehdr(elf)) { + WARN_ELF("elf64_getehdr"); + return -1; + } + memcpy(&dummy_elf.ehdr, elf64_getehdr(elf), sizeof(dummy_elf.ehdr)); + if (elf_getshdrnum(elf, &nr_sections)) { WARN_ELF("elf_getshdrnum"); return -1;
[PATCH 08/16] objtool: Fix SEGFAULT
From: Christophe Leroy find_insn() will return NULL in case of failure. Check insn in order to avoid a kernel Oops for NULL pointer dereference. Signed-off-by: Christophe Leroy --- tools/objtool/check.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 0cec74da7ffe..0ef805843d4f 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -205,7 +205,7 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func, return false; insn = find_insn(file, func->sec, func->offset); - if (!insn->func) + if (!insn || !insn->func) return false; func_for_each_insn(file, func, insn) { -- 2.31.1
[PATCH 06/16] powerpc: Fix objtool unannotated intra-function call warnings on PPC32
From: Christophe Leroy Fix several annotations in assembly files on PPC32. Signed-off-by: Christophe Leroy [Sathvika Vasireddy: Changed subject line from objtool/powerpc: Activate objtool on PPC32 to powerpc: Fix objtool unannotated intra-function call warnings on PPC32, and removed Kconfig change to enable objtool, as it is part of objtool/powerpc: Enable objtool to be built on ppc patch in this series.] Signed-off-by: Sathvika Vasireddy --- arch/powerpc/kernel/cpu_setup_6xx.S | 26 -- arch/powerpc/kernel/cpu_setup_fsl_booke.S| 8 -- arch/powerpc/kernel/entry_32.S | 8 -- arch/powerpc/kernel/head_40x.S | 5 +++- arch/powerpc/kernel/head_8xx.S | 5 +++- arch/powerpc/kernel/head_book3s_32.S | 29 ++-- arch/powerpc/kernel/head_fsl_booke.S | 5 +++- arch/powerpc/kernel/swsusp_32.S | 5 +++- arch/powerpc/kvm/fpu.S | 17 arch/powerpc/platforms/52xx/lite5200_sleep.S | 15 +++--- 10 files changed, 89 insertions(+), 34 deletions(-) diff --git a/arch/powerpc/kernel/cpu_setup_6xx.S b/arch/powerpc/kernel/cpu_setup_6xx.S index f8b5ff64b604..f29ce3dd6140 100644 --- a/arch/powerpc/kernel/cpu_setup_6xx.S +++ b/arch/powerpc/kernel/cpu_setup_6xx.S @@ -4,6 +4,8 @@ *Copyright (C) 2003 Benjamin Herrenschmidt (b...@kernel.crashing.org) */ +#include + #include #include #include @@ -81,7 +83,7 @@ _GLOBAL(__setup_cpu_745x) blr /* Enable caches for 603's, 604, 750 & 7400 */ -setup_common_caches: +SYM_FUNC_START_LOCAL(setup_common_caches) mfspr r11,SPRN_HID0 andi. r0,r11,HID0_DCE ori r11,r11,HID0_ICE|HID0_DCE @@ -95,11 +97,12 @@ setup_common_caches: sync isync blr +SYM_FUNC_END(setup_common_caches) /* 604, 604e, 604ev, ... * Enable superscalar execution & branch history table */ -setup_604_hid0: +SYM_FUNC_START_LOCAL(setup_604_hid0) mfspr r11,SPRN_HID0 ori r11,r11,HID0_SIED|HID0_BHTE ori r8,r11,HID0_BTCD @@ -110,6 +113,7 @@ setup_604_hid0: sync isync blr +SYM_FUNC_END(setup_604_hid0) /* 7400 <= rev 2.7 and 7410 rev = 1.0 suffer from some * erratas we work around here. @@ -125,13 +129,14 @@ setup_604_hid0: * needed once we have applied workaround #5 (though it's * not set by Apple's firmware at least). */ -setup_7400_workarounds: +SYM_FUNC_START_LOCAL(setup_7400_workarounds) mfpvr r3 rlwinm r3,r3,0,20,31 cmpwi 0,r3,0x0207 ble 1f blr -setup_7410_workarounds: +SYM_FUNC_END(setup_7400_workarounds) +SYM_FUNC_START_LOCAL(setup_7410_workarounds) mfpvr r3 rlwinm r3,r3,0,20,31 cmpwi 0,r3,0x0100 @@ -151,6 +156,7 @@ setup_7410_workarounds: sync isync blr +SYM_FUNC_END(setup_7410_workarounds) /* 740/750/7400/7410 * Enable Store Gathering (SGE), Address Broadcast (ABE), @@ -158,7 +164,7 @@ setup_7410_workarounds: * Dynamic Power Management (DPM), Speculative (SPD) * Clear Instruction cache throttling (ICTC) */ -setup_750_7400_hid0: +SYM_FUNC_START_LOCAL(setup_750_7400_hid0) mfspr r11,SPRN_HID0 ori r11,r11,HID0_SGE | HID0_ABE | HID0_BHTE | HID0_BTIC orisr11,r11,HID0_DPM@h @@ -177,12 +183,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_NO_DPM) sync isync blr +SYM_FUNC_END(setup_750_7400_hid0) /* 750cx specific * Looks like we have to disable NAP feature for some PLL settings... * (waiting for confirmation) */ -setup_750cx: +SYM_FUNC_START_LOCAL(setup_750cx) mfspr r10, SPRN_HID1 rlwinm r10,r10,4,28,31 cmpwi cr0,r10,7 @@ -196,11 +203,13 @@ setup_750cx: andcr6,r6,r7 stw r6,CPU_SPEC_FEATURES(r4) blr +SYM_FUNC_END(setup_750cx) /* 750fx specific */ -setup_750fx: +SYM_FUNC_START_LOCAL(setup_750fx) blr +SYM_FUNC_END(setup_750fx) /* MPC 745x * Enable Store Gathering (SGE), Branch Folding (FOLD) @@ -212,7 +221,7 @@ setup_750fx: * Clear Instruction cache throttling (ICTC) * Enable L2 HW prefetch */ -setup_745x_specifics: +SYM_FUNC_START_LOCAL(setup_745x_specifics) /* We check for the presence of an L3 cache setup by * the firmware. If any, we disable NAP capability as * it's known to be bogus on rev 2.1 and earlier @@ -270,6 +279,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NO_DPM) sync isync blr +SYM_FUNC_END(setup_745x_specifics) /* * Initialize the FPU registers. This is needed to work around an errata diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S b/arch/powerpc/kernel/cpu_setup_fsl_booke.S index 4bf33f1b4193..f573a4f3bbe6 100644 --- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S +++ b/arch/powerpc/kernel/cpu_setup_fsl_booke.S @@ -8,6 +8,8 @@ * Benjamin Herrenschmidt */ +#include + #include #include #include
[PATCH 05/16] powerpc: Skip objtool from running on drivers/crypto/vmx/aesp8-ppc.o
With objtool enabled, below warnings are seen when trying to build: drivers/crypto/vmx/aesp8-ppc.o: warning: objtool: aes_p8_set_encrypt_key+0x44: unannotated intra-function call drivers/crypto/vmx/aesp8-ppc.o: warning: objtool: .text+0x2448: unannotated intra-function call drivers/crypto/vmx/aesp8-ppc.o: warning: objtool: .text+0x2d68: unannotated intra-function call Skip objtool from running on this file, as there are no calls to _mcount. Signed-off-by: Sathvika Vasireddy --- drivers/crypto/vmx/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/crypto/vmx/Makefile b/drivers/crypto/vmx/Makefile index 2560cfea1dec..7b41f0da6807 100644 --- a/drivers/crypto/vmx/Makefile +++ b/drivers/crypto/vmx/Makefile @@ -9,3 +9,5 @@ targets += aesp8-ppc.S ghashp8-ppc.S $(obj)/aesp8-ppc.S $(obj)/ghashp8-ppc.S: $(obj)/%.S: $(src)/%.pl FORCE $(call if_changed,perl) + +OBJECT_FILES_NON_STANDARD_aesp8-ppc.o := y -- 2.31.1
[PATCH 03/16] powerpc: Fix objtool unannotated intra-function call warnings
objtool throws unannotated intra-function call warnings in the following assembly files. arch/powerpc/kernel/vector.o: warning: objtool: .text+0x53c: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x60: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x124: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x5d4: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x5dc: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xcb8: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xd0c: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x1030: unannotated intra-function call arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x358: unannotated intra-function call arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x728: unannotated intra-function call arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x4d94: unannotated intra-function call arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x4ec4: unannotated intra-function call arch/powerpc/kvm/book3s_hv_interrupts.o: warning: objtool: .text+0x6c: unannotated intra-function call arch/powerpc/kernel/misc_64.o: warning: objtool: .text+0x64: unannotated intra-function call Fix these warnings by annotating those functions with SYM_FUNC_START_LOCAL() and SYM_FUNC_END() macros. Signed-off-by: Sathvika Vasireddy --- arch/powerpc/kernel/exceptions-64s.S| 7 +-- arch/powerpc/kernel/head_64.S | 7 +-- arch/powerpc/kernel/misc_64.S | 4 +++- arch/powerpc/kernel/vector.S| 4 +++- arch/powerpc/kvm/book3s_hv_interrupts.S | 4 +++- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 22 +++--- 6 files changed, 34 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 3d0dc133a9ae..4242c1a20bcd 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -20,6 +20,7 @@ #include #include #include +#include /* * Following are fixed section helper macros. @@ -3075,7 +3076,7 @@ CLOSE_FIXED_SECTION(virt_trampolines); USE_TEXT_SECTION() /* MSR[RI] should be clear because this uses SRR[01] */ -enable_machine_check: +SYM_FUNC_START_LOCAL(enable_machine_check) mflrr0 bcl 20,31,$+4 0: mflrr3 @@ -3087,9 +3088,10 @@ enable_machine_check: RFI_TO_KERNEL 1: mtlrr0 blr +SYM_FUNC_END(enable_machine_check) /* MSR[RI] should be clear because this uses SRR[01] */ -disable_machine_check: +SYM_FUNC_START_LOCAL(disable_machine_check) mflrr0 bcl 20,31,$+4 0: mflrr3 @@ -3102,3 +3104,4 @@ disable_machine_check: RFI_TO_KERNEL 1: mtlrr0 blr +SYM_FUNC_END(disable_machine_check) diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index cf2c08902c05..10e2d43420d0 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -18,6 +18,7 @@ * variants. */ +#include #include #include #include @@ -465,7 +466,7 @@ generic_secondary_common_init: * Assumes we're mapped EA == RA if the MMU is on. */ #ifdef CONFIG_PPC_BOOK3S -__mmu_off: +SYM_FUNC_START_LOCAL(__mmu_off) mfmsr r3 andi. r0,r3,MSR_IR|MSR_DR beqlr @@ -476,6 +477,7 @@ __mmu_off: sync rfid b . /* prevent speculative execution */ +SYM_FUNC_END(__mmu_off) #endif @@ -869,7 +871,7 @@ _GLOBAL(start_secondary_resume) /* * This subroutine clobbers r11 and r12 */ -enable_64b_mode: +SYM_FUNC_START_LOCAL(enable_64b_mode) mfmsr r11 /* grab the current MSR */ #ifdef CONFIG_PPC_BOOK3E orisr11,r11,0x8000 /* CM bit set, we'll set ICM later */ @@ -881,6 +883,7 @@ enable_64b_mode: isync #endif blr +SYM_FUNC_END(enable_64b_mode) /* * This puts the TOC pointer into r2, offset by 0x8000 (as expected diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S index fd6d8d3a548e..b36fb89ff718 100644 --- a/arch/powerpc/kernel/misc_64.S +++ b/arch/powerpc/kernel/misc_64.S @@ -9,6 +9,7 @@ * PPC64 updates by Dave Engebretsen (engeb...@us.ibm.com) */ +#include #include #include #include @@ -353,7 +354,7 @@ _GLOBAL(kexec_smp_wait) * * don't overwrite r3 here, it is live for kexec_wait above. */ -real_mode: /* assume normal blr return */ +SYM_FUNC_START_LOCAL(real_mode)/* assume normal blr return */ #ifdef CONFIG_PPC_BOOK3E /* Create an identity mapping. */ b kexec_create_tlb @@ -370,6 +371,7 @@ real_mode: /* assume normal blr return */ mtspr
[PATCH 04/16] powerpc: curb objtool unannotated intra-function call warnings
objtool throws the following unannotated intra-function call warnings: arch/powerpc/kernel/entry_64.o: warning: objtool: .text+0x4: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xe64: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xee4: unannotated intra-function call Fix these warnings by annotating intra-function call, using ANNOTATE_INTRA_FUNCTION_CALL macro, to indicate that the branch targets are valid. Signed-off-by: Sathvika Vasireddy --- arch/powerpc/kernel/entry_64.S | 2 ++ arch/powerpc/kvm/book3s_hv_rmhandlers.S | 3 +++ 2 files changed, 5 insertions(+) diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 01ace4c56104..fb444bc64f3f 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -14,6 +14,7 @@ * code, and exception/interrupt return code for PowerPC. */ +#include #include #include #include @@ -73,6 +74,7 @@ flush_branch_caches: // Flush the link stack .rept 64 + ANNOTATE_INTRA_FUNCTION_CALL bl .+4 .endr b 1f diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index de91118df0c5..ea39a0cf591a 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -30,6 +30,7 @@ #include #include #include +#include /* Values in HSTATE_NAPPING(r13) */ #define NAPPING_CEDE 1 @@ -1523,12 +1524,14 @@ kvm_flush_link_stack: /* Flush the link stack. On Power8 it's up to 32 entries in size. */ .rept 32 + ANNOTATE_INTRA_FUNCTION_CALL bl .+4 .endr /* And on Power9 it's up to 64. */ BEGIN_FTR_SECTION .rept 32 + ANNOTATE_INTRA_FUNCTION_CALL bl .+4 .endr END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) -- 2.31.1
[PATCH 02/16] powerpc: override __ALIGN() and __ALIGN_STR() macros
Since we need an alignment of 4 bytes, override __ALIGN() and __ALIGN_STR() accordingly. Signed-off-by: Sathvika Vasireddy --- arch/powerpc/include/asm/linkage.h | 4 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/include/asm/linkage.h b/arch/powerpc/include/asm/linkage.h index b71b9582e754..8df88fe61438 100644 --- a/arch/powerpc/include/asm/linkage.h +++ b/arch/powerpc/include/asm/linkage.h @@ -2,8 +2,12 @@ #ifndef _ASM_POWERPC_LINKAGE_H #define _ASM_POWERPC_LINKAGE_H +#include #include +#define __ALIGN.align 2 +#define __ALIGN_STR__stringify(__ALIGN) + #ifdef CONFIG_PPC64_ELF_ABI_V1 #define cond_syscall(x) \ asm ("\t.weak " #x "\n\t.set " #x ", sys_ni_syscall\n" \ -- 2.31.1
[PATCH 01/16] powerpc: Replace unreachable() with it's builtin variant in WARN_ON()
objtool is throwing *unannotated intra-function call* warnings with a few instructions that are marked unreachable. Replace unreachable() with __builtin_unreachable() to fix these warnings, as the codegen remains same with unreachable() and __builtin_unreachable(). Signed-off-by: Sathvika Vasireddy --- arch/powerpc/include/asm/bug.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h index 61a4736355c2..074be1a78c56 100644 --- a/arch/powerpc/include/asm/bug.h +++ b/arch/powerpc/include/asm/bug.h @@ -99,7 +99,7 @@ __label__ __label_warn_on; \ \ WARN_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags), __label_warn_on); \ - unreachable(); \ + __builtin_unreachable();\ \ __label_warn_on: \ break; \ -- 2.31.1
[PATCH 00/16] objtool: Enable and implement --mcount option on powerpc
This patchset enables and implements objtool --mcount option on powerpc. This applies atop powerpc/merge branch. Christophe Leroy (4): objtool: Fix SEGFAULT objtool: Use target file endianness instead of a compiled constant objtool: Use target file class size instead of a compiled constant powerpc: Fix objtool unannotated intra-function call warnings on PPC32 Sathvika Vasireddy (12): powerpc: Replace unreachable() with it's builtin variant in WARN_ON() powerpc: override __ALIGN() and __ALIGN_STR() macros powerpc: Fix objtool unannotated intra-function call warnings powerpc: curb objtool unannotated intra-function call warnings powerpc: Skip objtool from running on drivers/crypto/vmx/aesp8-ppc.o powerpc: Skip objtool from running on VDSO files objtool: Add --mnop as an option to --mcount objtool: Read special sections with alts only when specific options are selected objtool: Use macros to define arch specific reloc types objtool: Add arch specific function arch_ftrace_match() objtool/powerpc: Enable objtool to be built on ppc objtool/powerpc: Add --mcount specific implementation Makefile | 4 +- arch/powerpc/Kconfig | 2 + arch/powerpc/include/asm/asm.h| 7 ++ arch/powerpc/include/asm/bug.h| 2 +- arch/powerpc/include/asm/linkage.h| 4 + arch/powerpc/kernel/cpu_setup_6xx.S | 26 +++-- arch/powerpc/kernel/cpu_setup_fsl_booke.S | 8 +- arch/powerpc/kernel/entry_32.S| 8 +- arch/powerpc/kernel/entry_64.S| 2 + arch/powerpc/kernel/exceptions-64s.S | 7 +- arch/powerpc/kernel/head_40x.S| 5 +- arch/powerpc/kernel/head_64.S | 7 +- arch/powerpc/kernel/head_8xx.S| 5 +- arch/powerpc/kernel/head_book3s_32.S | 29 -- arch/powerpc/kernel/head_fsl_booke.S | 5 +- arch/powerpc/kernel/misc_64.S | 4 +- arch/powerpc/kernel/swsusp_32.S | 5 +- arch/powerpc/kernel/vdso/Makefile | 2 + arch/powerpc/kernel/vector.S | 4 +- arch/powerpc/kvm/book3s_hv_interrupts.S | 4 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 25 +++-- arch/powerpc/kvm/fpu.S| 17 +++- arch/powerpc/platforms/52xx/lite5200_sleep.S | 15 ++- arch/x86/Kconfig | 1 + drivers/crypto/vmx/Makefile | 2 + scripts/Makefile.lib | 1 + tools/objtool/arch/powerpc/Build | 2 + tools/objtool/arch/powerpc/decode.c | 96 +++ .../arch/powerpc/include/arch/cfi_regs.h | 11 +++ tools/objtool/arch/powerpc/include/arch/elf.h | 10 ++ .../arch/powerpc/include/arch/special.h | 21 tools/objtool/arch/powerpc/special.c | 19 tools/objtool/arch/x86/decode.c | 8 ++ tools/objtool/arch/x86/include/arch/elf.h | 2 + .../arch/x86/include/arch/endianness.h| 9 -- tools/objtool/builtin-check.c | 14 +++ tools/objtool/check.c | 51 +- tools/objtool/elf.c | 8 +- tools/objtool/include/objtool/arch.h | 2 + tools/objtool/include/objtool/builtin.h | 1 + tools/objtool/include/objtool/elf.h | 8 ++ tools/objtool/include/objtool/endianness.h| 32 +++ tools/objtool/orc_dump.c | 11 ++- tools/objtool/orc_gen.c | 4 +- tools/objtool/special.c | 3 +- 45 files changed, 408 insertions(+), 105 deletions(-) create mode 100644 arch/powerpc/include/asm/asm.h create mode 100644 tools/objtool/arch/powerpc/Build create mode 100644 tools/objtool/arch/powerpc/decode.c create mode 100644 tools/objtool/arch/powerpc/include/arch/cfi_regs.h create mode 100644 tools/objtool/arch/powerpc/include/arch/elf.h create mode 100644 tools/objtool/arch/powerpc/include/arch/special.h create mode 100644 tools/objtool/arch/powerpc/special.c delete mode 100644 tools/objtool/arch/x86/include/arch/endianness.h -- 2.31.1
Re: [PATCH v2 13/14] powerpc/64s: Fix comment on interrupt handler prologue
Le 25/07/2022 à 08:31, Rohan McLure a écrit : > Interrupt handlers on 64s systems will often need to save register state > from the interrupted process to make space for loading special purpose > registers or for internal state. > > Fix a comment documenting a common code path macro in the beginning of > interrupt handlers where r10 is saved to the PACA to afford space for > the value of the CFAR. Comment is currently written as if r10-r12 are > saved to PACA, but in fact only r10 is saved. Maybe it would be interesting to know from which patch the error comes. > > Signed-off-by: Rohan McLure > --- > V1 -> V2: Given its own commit > --- > arch/powerpc/kernel/exceptions-64s.S | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index b66dd6f775a4..102896fc6a86 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -281,7 +281,7 @@ BEGIN_FTR_SECTION > mfspr r9,SPRN_PPR > END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) > HMT_MEDIUM > - std r10,IAREA+EX_R10(r13) /* save r10 - r12 */ > + std r10,IAREA+EX_R10(r13) /* save r10 */ > .if ICFAR > BEGIN_FTR_SECTION > mfspr r10,SPRN_CFAR
Re: [PATCH v2 12/14] powerpc/64s: Use {NULLIFY,SAVE,REST}_GPRS macros in sc, scv 0 handlers
Le 25/07/2022 à 08:31, Rohan McLure a écrit : > Use the convenience macros for saving/clearing/restoring gprs in keeping > with syscall calling conventions. The plural variants of these macros > can store a range of registers for concision. > > This works well when the save-to-stack logic is simple (a gpr is saved > to its corresponding offset in the struct pt_regs on the stack). > Wherever a different gpr is storing the initial value of another gpr at > the interrupt location, care must be taken to issue the correct save > instruction and so the macros are not applied in neighbouring stores. Some cleaning could also be done in entry_32.S > > Signed-off-by: Rohan McLure > --- > V1 -> V2: Update summary > --- > arch/powerpc/kernel/interrupt_64.S | 29 ++-- > 1 file changed, 6 insertions(+), 23 deletions(-) > > diff --git a/arch/powerpc/kernel/interrupt_64.S > b/arch/powerpc/kernel/interrupt_64.S > index 34167cfa5d60..efaf162fa772 100644 > --- a/arch/powerpc/kernel/interrupt_64.S > +++ b/arch/powerpc/kernel/interrupt_64.S > @@ -71,12 +71,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name) > mfcrr12 > li r11,0 > /* Save syscall parameters in r3-r8 */ > - std r3,GPR3(r1) > - std r4,GPR4(r1) > - std r5,GPR5(r1) > - std r6,GPR6(r1) > - std r7,GPR7(r1) > - std r8,GPR8(r1) > + SAVE_GPRS(3, 8, r1) > /* Zero r9-r12, this should only be required when restoring all GPRs */ > std r11,GPR9(r1) > std r11,GPR10(r1) > @@ -156,17 +151,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) > /* Could zero these as per ABI, but we may consider a stricter ABI >* which preserves these if libc implementations can benefit, so >* restore them for now until further measurement is done. */ > - ld r0,GPR0(r1) > - ld r4,GPR4(r1) > - ld r5,GPR5(r1) > - ld r6,GPR6(r1) > - ld r7,GPR7(r1) > - ld r8,GPR8(r1) > + REST_GPR(0, r1) > + REST_GPRS(4, 8, r1) > /* Zero volatile regs that may contain sensitive kernel data */ > - li r9,0 > - li r10,0 > - li r11,0 > - li r12,0 > + NULLIFY_GPRS(9, 12) > mtspr SPRN_XER,r0 > > /* > @@ -188,7 +176,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) > ld r4,_LINK(r1) > ld r5,_XER(r1) > > - ld r0,GPR0(r1) > + REST_GPR(0, r1) > mtcrr2 > mtctr r3 > mtlrr4 > @@ -256,12 +244,7 @@ END_BTB_FLUSH_SECTION > mfcrr12 > li r11,0 > /* Save syscall parameters in r3-r8 */ > - std r3,GPR3(r1) > - std r4,GPR4(r1) > - std r5,GPR5(r1) > - std r6,GPR6(r1) > - std r7,GPR7(r1) > - std r8,GPR8(r1) > + SAVE_GPRS(3, 8, r1) > /* Zero r9-r12, this should only be required when restoring all GPRs */ > std r11,GPR9(r1) > std r11,GPR10(r1)
Re: [PATCH v2 09/14] powerpc: Add NULLIFY_GPRS macros for register clears
Le 08/08/2022 à 09:42, Andrew Donnellan a écrit : > On Mon, 2022-07-25 at 16:29 +1000, Rohan McLure wrote: >> Macros for restoring and saving registers to and from the stack >> exist. >> Provide macros with the same interface for clearing a range of gprs >> by >> setting each register's value in that range to zero. >> >> The resulting macros are called NULLIFY_GPRS and NULLIFY_NVGPRS, >> keeping >> with the naming of the accompanying restore and save macros, and >> usage >> of nullify to describe this operation elsewhere in the kernel. >> >> Signed-off-by: Rohan McLure >> --- >> V1 -> V2: Change 'ZERO' usage in naming to 'NULLIFY', a more obvious >> verb > > And I see you've addressed my comment re 32 v 64 bit from V1 as well. > > Reviewed-by: Andrew Donnellan > >> --- >> arch/powerpc/include/asm/ppc_asm.h | 22 ++ >> 1 file changed, 22 insertions(+) >> >> diff --git a/arch/powerpc/include/asm/ppc_asm.h >> b/arch/powerpc/include/asm/ppc_asm.h >> index 83c02f5a7f2a..d6c46082bf7f 100644 >> --- a/arch/powerpc/include/asm/ppc_asm.h >> +++ b/arch/powerpc/include/asm/ppc_asm.h >> @@ -33,6 +33,20 @@ >> .endr >> .endm >> >> +/* >> + * This expands to a sequence of register clears for regs start to >> end >> + * inclusive, of the form: >> + * >> + * li rN, 0 >> + */ >> +.macro NULLIFY_REGS start, end >> + .Lreg=\start >> + .rept (\end - \start + 1) >> + li .Lreg, 0 >> + .Lreg=.Lreg+1 >> + .endr >> +.endm >> + > > I suppose this could be done using the existing OP_REGS macro, > something like OP_REGS li, 0, start, end, 0, 0 - but a load immediate > is semantically a bit different from the existing uses of OP_REGS so I > don't mind either way Not sure it would work. You'd get: li .Lreg, 0 + 0 * .Lreg(0) Even if it works that would be odd. > >> /* >> * Macros for storing registers into and loading registers from >> * exception frames. >> @@ -49,6 +63,14 @@ >> #define REST_NVGPRS(base) REST_GPRS(13, 31, base) >> #endif >> >> +#defineNULLIFY_GPRS(start, end)NULLIFY_REGS start, >> end >> +#ifdef __powerpc64__ >> +#defineNULLIFY_NVGPRS()NULLIFY_GPRS(14, 31) >> +#else >> +#defineNULLIFY_NVGPRS()NULLIFY_GPRS(13, 31) >> +#endif >> +#defineNULLIFY_GPR(n) NULLIFY_GPRS(n, n) >> + >> #define SAVE_GPR(n, base) SAVE_GPRS(n, n, base) >> #define REST_GPR(n, base) REST_GPRS(n, n, base) >> >
Re: [PATCH v2 04/14] powerpc/32: Remove powerpc select specialisation
Le 25/07/2022 à 08:26, Rohan McLure a écrit : > Syscall #82 has been implemented for 32-bit platforms in a unique way on > powerpc systems. This hack will in effect guess whether the caller is > expecting new select semantics or old select semantics. It does so via a > guess, based off the first parameter. In new select, this parameter > represents the length of a user-memory array of file descriptors, and in > old select this is a pointer to an arguments structure. > > The heuristic simply interprets sufficiently large values of its first > parameter as being a call to old select. The following is a discussion > on how this syscall should be handled. > > Link: > https://lore.kernel.org/lkml/13737de5-0eb7-e881-9af0-163b0d29a...@csgroup.eu/ > > As discussed in this thread, the existence of such a hack suggests that for > whatever powerpc binaries may predate glibc, it is most likely that they > would have taken use of the old select semantics. x86 and arm64 both > implement this syscall with oldselect semantics. > > Remove the powerpc implementation, and update syscall.tbl to refer to emit > a reference to sys_old_select for 32-bit binaries, in keeping with how > other architectures support syscall #82. > > Signed-off-by: Rohan McLure Move this before patch 1, it will remove one change in patch 1. > --- > V1 -> V2: Remove arch-specific select handler > --- > arch/powerpc/kernel/syscalls.c | 18 -- > arch/powerpc/kernel/syscalls/syscall.tbl | 2 +- > .../arch/powerpc/entry/syscalls/syscall.tbl | 2 +- > 3 files changed, 2 insertions(+), 20 deletions(-) > > diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c > index 9f339bcb433d..0afbcbd50433 100644 > --- a/arch/powerpc/kernel/syscalls.c > +++ b/arch/powerpc/kernel/syscalls.c > @@ -74,24 +74,6 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len, > return do_mmap2(addr, len, prot, flags, fd, offset, PAGE_SHIFT); > } > > -#ifdef CONFIG_PPC32 > -/* > - * Due to some executables calling the wrong select we sometimes > - * get wrong args. This determines how the args are being passed > - * (a single ptr to them all args passed) then calls > - * sys_select() with the appropriate args. -- Cort > - */ > -SYSCALL_DEFINE5(ppc_select, int, n, fd_set __user *, inp, > - fd_set __user *, outp, fd_set __user *, exp, > - struct __kernel_old_timeval __user *, tvp) > -{ > - if ((unsigned long)n >= 4096) > - return sys_old_select((void __user *)n); > - > - return sys_select(n, inp, outp, exp, tvp); > -} > -#endif > - > #ifdef CONFIG_PPC64 > static inline long do_ppc64_personality(unsigned long personality) > { > diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl > b/arch/powerpc/kernel/syscalls/syscall.tbl > index 59d9259dfbb5..c6cfcdf52c57 100644 > --- a/arch/powerpc/kernel/syscalls/syscall.tbl > +++ b/arch/powerpc/kernel/syscalls/syscall.tbl > @@ -110,7 +110,7 @@ > 79 common settimeofdaysys_settimeofday > compat_sys_settimeofday > 80 common getgroups sys_getgroups > 81 common setgroups sys_setgroups > -82 32 select sys_ppc_select > sys_ni_syscall > +82 32 select sys_old_select > sys_ni_syscall > 82 64 select sys_ni_syscall > 82 spu select sys_ni_syscall > 83 common symlink sys_symlink > diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl > b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl > index 437066f5c4b2..b4c970c9c6b1 100644 > --- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl > +++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl > @@ -110,7 +110,7 @@ > 79 common settimeofdaysys_settimeofday > compat_sys_settimeofday > 80 common getgroups sys_getgroups > 81 common setgroups sys_setgroups > -82 32 select sys_ppc_select > sys_ni_syscall > +82 32 select sys_old_select > sys_ni_syscall > 82 64 select sys_ni_syscall > 82 spu select sys_ni_syscall > 83 common symlink sys_symlink
Re: [PATCH v2 06/14] powerpc: Include all arch-specific syscall prototypes
Le 25/07/2022 à 08:27, Rohan McLure a écrit : > Forward declare all syscall handler prototypes where a generic prototype > is not provided in either linux/syscalls.h or linux/compat.h in > asm/syscalls.h. This is required for compile-time type-checking for > syscall handlers, which is implemented later in this series. > > 32-bit compatibility syscall handlers are expressed in terms of types in > ppc32.h. Expose this header globally. > > Signed-off-by: Rohan McLure > --- > V1 -> V2: Explicitly include prototypes. > --- > arch/powerpc/{kernel => include/asm}/ppc32.h | 0 I think a new name would be welcome, for instance syscalls_32.h ? > arch/powerpc/include/asm/syscalls.h | 104 - > arch/powerpc/kernel/signal_32.c | 2 +- > arch/powerpc/perf/callchain_32.c | 2 +- > 4 files changed, 76 insertions(+), 32 deletions(-) > > diff --git a/arch/powerpc/kernel/ppc32.h b/arch/powerpc/include/asm/ppc32.h > similarity index 100% > rename from arch/powerpc/kernel/ppc32.h > rename to arch/powerpc/include/asm/ppc32.h > diff --git a/arch/powerpc/include/asm/syscalls.h > b/arch/powerpc/include/asm/syscalls.h > index 025d4b877161..8b2757d7f423 100644 > --- a/arch/powerpc/include/asm/syscalls.h > +++ b/arch/powerpc/include/asm/syscalls.h > @@ -8,49 +8,93 @@ > #include > #include > > +#ifdef CONFIG_PPC64 > +#include > +#endif > +#include > +#include > + > struct rtas_args; > > -asmlinkage long sys_mmap(unsigned long addr, size_t len, > - unsigned long prot, unsigned long flags, > - unsigned long fd, off_t offset); > -asmlinkage long sys_mmap2(unsigned long addr, size_t len, > - unsigned long prot, unsigned long flags, > - unsigned long fd, unsigned long pgoff); > -asmlinkage long sys_ppc64_personality(unsigned long personality); > +#ifndef CONFIG_ARCH_HAS_SYSCALL_WRAPPER > + > +/* > + * PowerPC architecture-specific syscalls > + */ > + > asmlinkage long sys_rtas(struct rtas_args __user *uargs); > -int sys_ppc_select(int n, fd_set __user *inp, fd_set __user *outp, > -fd_set __user *exp, struct __kernel_old_timeval __user *tvp); > -long sys_ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 > offset_low, > - u32 len_high, u32 len_low); > +asmlinkage long sys_ni_syscall(void); > > +#ifdef CONFIG_PPC64 > +asmlinkage long sys_ppc64_personality(unsigned long personality); > #ifdef CONFIG_COMPAT > -unsigned long compat_sys_mmap2(unsigned long addr, size_t len, > -unsigned long prot, unsigned long flags, > -unsigned long fd, unsigned long pgoff); > +asmlinkage long compat_sys_ppc64_personality(unsigned long personality); > +#endif /* CONFIG_COMPAT */ > +#endif /* CONFIG_PPC64 */ > > -compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, > compat_size_t count, > - u32 reg6, u32 pos1, u32 pos2); > +/* Parameters are reordered for powerpc to avoid padding */ > +asmlinkage long sys_ppc_fadvise64_64(int fd, int advice, > + u32 offset_high, u32 offset_low, > + u32 len_high, u32 len_low); > +asmlinkage long sys_swapcontext(struct ucontext __user *old_ctx, > + struct ucontext __user *new_ctx, long ctx_size); > +asmlinkage long sys_mmap(unsigned long addr, size_t len, > + unsigned long prot, unsigned long flags, > + unsigned long fd, off_t offset); > +asmlinkage long sys_mmap2(unsigned long addr, size_t len, > + unsigned long prot, unsigned long flags, > + unsigned long fd, unsigned long pgoff); > +asmlinkage long sys_switch_endian(void); > > -compat_ssize_t compat_sys_pwrite64(unsigned int fd, const char __user *ubuf, > compat_size_t count, > -u32 reg6, u32 pos1, u32 pos2); > +#ifdef CONFIG_PPC32 > +asmlinkage long sys_sigreturn(void); > +asmlinkage long sys_debug_setcontext(struct ucontext __user *ctx, int ndbg, > + struct sig_dbg_op __user *dbg); > +#endif > > -compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 offset1, u32 > offset2, u32 count); > +asmlinkage long sys_rt_sigreturn(void); > > -int compat_sys_truncate64(const char __user *path, u32 reg4, > - unsigned long len1, unsigned long len2); > +asmlinkage long sys_subpage_prot(unsigned long addr, > + unsigned long len, u32 __user *map); > > -long compat_sys_fallocate(int fd, int mode, u32 offset1, u32 offset2, u32 > len1, u32 len2); > +#ifdef CONFIG_COMPAT > +asmlinkage long compat_sys_swapcontext(struct ucontext32 __user *old_ctx, > +struct ucontext32 __user *new_ctx, > +int ctx_size); > +asmlinkage long compat_s
Re: [PATCH v2 03/14] powerpc: Remove direct call to mmap2 syscall handlers
Le 25/07/2022 à 08:25, Rohan McLure a écrit : > Syscall handlers should not be invoked internally by their symbol names, > as these symbols defined by the architecture-defined SYSCALL_DEFINE > macro. Move the compatibility syscall definition for mmap2 to > syscalls.c, so that all mmap implementations can share an inline helper > function, as is done with the personality handlers. > > Signed-off-by: Rohan McLure > --- > V1 -> V2: Move mmap2 compat implementation to asm/kernel/syscalls.c. > --- > arch/powerpc/kernel/sys_ppc32.c| 10 -- > arch/powerpc/kernel/syscalls.c | 11 +++ > arch/powerpc/kernel/syscalls/syscall.tbl | 2 +- > tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 4 ++-- > 4 files changed, 14 insertions(+), 13 deletions(-) > > diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c > index c3490adcb158..89e8c478fd6a 100644 > --- a/arch/powerpc/kernel/sys_ppc32.c > +++ b/arch/powerpc/kernel/sys_ppc32.c > @@ -25,7 +25,6 @@ > #include > #include > #include > -#include > #include > #include > #include > @@ -48,15 +47,6 @@ > #include > #include > > -COMPAT_SYSCALL_DEFINE6(mmap2, > -unsigned long, addr, size_t, len, > -unsigned long, prot, unsigned long, flags, > -unsigned long, fd, unsigned long, pgoff) > -{ > - /* This should remain 12 even if PAGE_SIZE changes */ > - return sys_mmap(addr, len, prot, flags, fd, pgoff << 12); > -} > - > /* >* long long munging: >* The 32 bit ABI passes long longs in an odd even register pair. > diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c > index 22755502afc0..9f339bcb433d 100644 > --- a/arch/powerpc/kernel/syscalls.c > +++ b/arch/powerpc/kernel/syscalls.c > @@ -56,6 +56,17 @@ SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len, > return do_mmap2(addr, len, prot, flags, fd, pgoff, PAGE_SHIFT-12); > } > > +#ifdef CONFIG_COMPAT > +COMPAT_SYSCALL_DEFINE6(mmap2, > +unsigned long, addr, size_t, len, > +unsigned long, prot, unsigned long, flags, > +unsigned long, fd, unsigned long, pgoff) > +{ > + /* This should remain 12 even if PAGE_SIZE changes */ > + return do_mmap2(addr, len, prot, flags, fd, pgoff << 12, PAGE_SHIFT-12); > +} > +#endif > + > SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len, > unsigned long, prot, unsigned long, flags, > unsigned long, fd, off_t, offset) > diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl > b/arch/powerpc/kernel/syscalls/syscall.tbl > index 54bb5834785f..59d9259dfbb5 100644 > --- a/arch/powerpc/kernel/syscalls/syscall.tbl > +++ b/arch/powerpc/kernel/syscalls/syscall.tbl > @@ -243,7 +243,7 @@ > 189 nospu vfork sys_vfork > 190 common ugetrlimit sys_getrlimit > compat_sys_getrlimit > 191 common readahead sys_readahead > compat_sys_ppc_readahead > -192 32 mmap2 sys_mmap2 > compat_sys_ppc_mmap2 > +192 32 mmap2 sys_mmap2 > compat_sys_mmap2 Why a name change here when the function is just move from one C file to another without any name change ? This look like the reverse of what was done in patch 1. Could that change go up front so that patch 1 doesn't need a change at all in the table ? > 193 32 truncate64 sys_truncate64 > compat_sys_ppc_truncate64 > 194 32 ftruncate64 sys_ftruncate64 > compat_sys_ppc_ftruncate64 > 195 32 stat64 sys_stat64 > diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl > b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl > index 54bb5834785f..437066f5c4b2 100644 > --- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl > +++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl > @@ -243,7 +243,7 @@ > 189 nospu vfork sys_vfork > 190 common ugetrlimit sys_getrlimit > compat_sys_getrlimit > 191 common readahead sys_readahead > compat_sys_ppc_readahead > -192 32 mmap2 sys_mmap2 > compat_sys_ppc_mmap2 > +192 32 mmap2 sys_mmap2 > compat_sys_mmap2 > 193 32 truncate64 sys_truncate64 > compat_sys_ppc_truncate64 > 194 32 ftruncate64 sys_ftruncate64 > compat_sys_ppc_ftruncate64 > 195 32 stat64 sys_stat64 > @@ -391,7 +391,7 @@ > 306 common timerfd_create
Re: [PATCH v2 01/14] powerpc: Adopt SYSCALL_DEFINE for arch-specific syscall handlers
Le 08/08/2022 à 08:04, Rohan McLure a écrit : > Thanks for reviewing my patches. > >> I think this patch should be split in two patches. One where you just >> change to using SYSCALL_DEFINE and COMPAT_SYSCALL_DEFINE, and a second >> patch for everything else. >> >> The first patch could then be linked to >> https://github.com/linuxppc/issues/issues/146 >> >> Or in the reverse order if it makes more sense maybe. > > My reasoning for completing all of these changes in a single patch is for > retaining bisectability. Adopting SYSCALL_DEFINE and COMPAT_SYSCALL_DEFINE > prepends {sys_, compat_sys_} to the symbol names, and so this must be > reflected immediately in the symbol tables. Well, of course we need the patch build, so SYSCALL_DEFINE and COMPAT_SYSCALL_DEFINE have to go with the name changes, that's obvious. My comment was more related to changes like the renaming of ppc64_personality() to do_ppc64_personality() and the creation of sys_ppc64_personality() and compat_ That could be a patch by itself. Also patch 4 could go up front in order to avoid renaming a function you are removing in a follow-up patch. So as a summary, do all preparation up front of patch 1, in order to keep it as minimal. > >> Is that 'asmlinkage' still needed ? Not all syscalls have it seems, and >> as far as I can see that macro voids for powerpc. > > Sorry, you commented on this in the previous revision, and I should have > acted on it then. I in fact touch all but one usages of ‘asmlinkage’ in > arch/powerpc, and so am in a good position to get rid of all of them. I’ll > do that in next revision. >
Re: [PATCH v2 12/14] powerpc/64s: Use {NULLIFY,SAVE,REST}_GPRS macros in sc, scv 0 handlers
On Mon, 2022-07-25 at 16:31 +1000, Rohan McLure wrote: > Use the convenience macros for saving/clearing/restoring gprs in > keeping > with syscall calling conventions. The plural variants of these macros > can store a range of registers for concision. > > This works well when the save-to-stack logic is simple (a gpr is > saved > to its corresponding offset in the struct pt_regs on the stack). > Wherever a different gpr is storing the initial value of another gpr > at > the interrupt location, care must be taken to issue the correct save > instruction and so the macros are not applied in neighbouring stores. This second paragraph is a bit unclear, I think I understand what you're saying but could you rephrase it for clarity? > > Signed-off-by: Rohan McLure > --- > V1 -> V2: Update summary > --- > arch/powerpc/kernel/interrupt_64.S | 29 ++-- > 1 file changed, 6 insertions(+), 23 deletions(-) > > diff --git a/arch/powerpc/kernel/interrupt_64.S > b/arch/powerpc/kernel/interrupt_64.S > index 34167cfa5d60..efaf162fa772 100644 > --- a/arch/powerpc/kernel/interrupt_64.S > +++ b/arch/powerpc/kernel/interrupt_64.S > @@ -71,12 +71,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name) > mfcrr12 > li r11,0 > /* Save syscall parameters in r3-r8 */ > - std r3,GPR3(r1) > - std r4,GPR4(r1) > - std r5,GPR5(r1) > - std r6,GPR6(r1) > - std r7,GPR7(r1) > - std r8,GPR8(r1) > + SAVE_GPRS(3, 8, r1) > /* Zero r9-r12, this should only be required when restoring > all GPRs */ > std r11,GPR9(r1) > std r11,GPR10(r1) > @@ -156,17 +151,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) > /* Could zero these as per ABI, but we may consider a > stricter ABI > * which preserves these if libc implementations can benefit, > so > * restore them for now until further measurement is done. */ > - ld r0,GPR0(r1) > - ld r4,GPR4(r1) > - ld r5,GPR5(r1) > - ld r6,GPR6(r1) > - ld r7,GPR7(r1) > - ld r8,GPR8(r1) > + REST_GPR(0, r1) > + REST_GPRS(4, 8, r1) > /* Zero volatile regs that may contain sensitive kernel data > */ > - li r9,0 > - li r10,0 > - li r11,0 > - li r12,0 > + NULLIFY_GPRS(9, 12) > mtspr SPRN_XER,r0 > > /* > @@ -188,7 +176,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) > ld r4,_LINK(r1) > ld r5,_XER(r1) > > - ld r0,GPR0(r1) > + REST_GPR(0, r1) > mtcrr2 > mtctr r3 > mtlrr4 > @@ -256,12 +244,7 @@ END_BTB_FLUSH_SECTION > mfcrr12 > li r11,0 > /* Save syscall parameters in r3-r8 */ > - std r3,GPR3(r1) > - std r4,GPR4(r1) > - std r5,GPR5(r1) > - std r6,GPR6(r1) > - std r7,GPR7(r1) > - std r8,GPR8(r1) > + SAVE_GPRS(3, 8, r1) > /* Zero r9-r12, this should only be required when restoring > all GPRs */ > std r11,GPR9(r1) > std r11,GPR10(r1) -- Andrew DonnellanOzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH v2 10/14] powerpc: Provide syscall wrapper
Le 25/07/2022 à 08:30, Rohan McLure a écrit : > Implement syscall wrapper as per s390, x86, arm64. When enabled > cause handlers to accept parameters from a stack frame rather than > from user scratch register state. This allows for user registers to be > safely cleared in order to reduce caller influence on speculation > within syscall routine. The wrapper is a macro that emits syscall > handler symbols that call into the target handler, obtaining its > parameters from a struct pt_regs on the stack. > > As registers are already saved to the stack prior to calling > system_call_exception, it appears that this function is executed more > efficiently with the new stack-pointer convention than with parameters > passed by registers, avoiding the allocation of a stack frame for this > method. On a 32-bit system, we see >20% performance increases on the > null_syscall microbenchmark, and on a Power 8 the performance gains > amortise the cost of clearing and restoring registers which is > implemented at the end of this series, seeing final result of ~5.6% > performance improvement on null_syscall. > > Syscalls are wrapped in this fashion on all platforms except for the > Cell processor as this commit does not provide SPU support. This can be > quickly fixed in a successive patch, but requires spu_sys_callback to > allocate a pt_regs structure to satisfy the wrapped calling convention. > > Co-developed-by: Andrew Donnellan > Signed-off-by: Andrew Donnellan > Signed-off-by: Rohan McLure > --- > V1 -> V2: Generate prototypes for symbols produced by the wrapper. > --- > arch/powerpc/Kconfig | 1 + > arch/powerpc/include/asm/interrupt.h | 3 +- > arch/powerpc/include/asm/syscall.h | 4 + > arch/powerpc/include/asm/syscall_wrapper.h | 94 > arch/powerpc/include/asm/syscalls.h| 25 +- > arch/powerpc/kernel/entry_32.S | 6 +- > arch/powerpc/kernel/interrupt.c| 31 +++ > arch/powerpc/kernel/interrupt_64.S | 30 +++ > arch/powerpc/kernel/systbl.c | 2 + > arch/powerpc/kernel/vdso.c | 2 + > 10 files changed, 156 insertions(+), 42 deletions(-) > This patch doesn't apply on powerpc tree, conflicts with 1547db7d1f44 ("powerpc: Move system_call_exception() to syscall.c") Christophe
Re: [PATCH v2 09/14] powerpc: Add NULLIFY_GPRS macros for register clears
On Mon, 2022-07-25 at 16:29 +1000, Rohan McLure wrote: > Macros for restoring and saving registers to and from the stack > exist. > Provide macros with the same interface for clearing a range of gprs > by > setting each register's value in that range to zero. > > The resulting macros are called NULLIFY_GPRS and NULLIFY_NVGPRS, > keeping > with the naming of the accompanying restore and save macros, and > usage > of nullify to describe this operation elsewhere in the kernel. > > Signed-off-by: Rohan McLure > --- > V1 -> V2: Change 'ZERO' usage in naming to 'NULLIFY', a more obvious > verb And I see you've addressed my comment re 32 v 64 bit from V1 as well. Reviewed-by: Andrew Donnellan > --- > arch/powerpc/include/asm/ppc_asm.h | 22 ++ > 1 file changed, 22 insertions(+) > > diff --git a/arch/powerpc/include/asm/ppc_asm.h > b/arch/powerpc/include/asm/ppc_asm.h > index 83c02f5a7f2a..d6c46082bf7f 100644 > --- a/arch/powerpc/include/asm/ppc_asm.h > +++ b/arch/powerpc/include/asm/ppc_asm.h > @@ -33,6 +33,20 @@ > .endr > .endm > > +/* > + * This expands to a sequence of register clears for regs start to > end > + * inclusive, of the form: > + * > + * li rN, 0 > + */ > +.macro NULLIFY_REGS start, end > + .Lreg=\start > + .rept (\end - \start + 1) > + li .Lreg, 0 > + .Lreg=.Lreg+1 > + .endr > +.endm > + I suppose this could be done using the existing OP_REGS macro, something like OP_REGS li, 0, start, end, 0, 0 - but a load immediate is semantically a bit different from the existing uses of OP_REGS so I don't mind either way > /* > * Macros for storing registers into and loading registers from > * exception frames. > @@ -49,6 +63,14 @@ > #define REST_NVGPRS(base) REST_GPRS(13, 31, base) > #endif > > +#defineNULLIFY_GPRS(start, end)NULLIFY_REGS start, > end > +#ifdef __powerpc64__ > +#defineNULLIFY_NVGPRS()NULLIFY_GPRS(14, 31) > +#else > +#defineNULLIFY_NVGPRS()NULLIFY_GPRS(13, 31) > +#endif > +#defineNULLIFY_GPR(n) NULLIFY_GPRS(n, n) > + > #define SAVE_GPR(n, base) SAVE_GPRS(n, n, base) > #define REST_GPR(n, base) REST_GPRS(n, n, base) > -- Andrew DonnellanOzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH v2 08/14] powerpc: Use common syscall handler type
On Mon, 2022-07-25 at 16:28 +1000, Rohan McLure wrote: > Cause syscall handlers to be typed as follows when called indirectly > throughout the kernel. > > typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned > long, > unsigned long, unsigned long, unsigned > long); > > Since both 32 and 64-bit abis allow for at least the first six > machine-word length parameters to a function to be passed by > registers, > even handlers which admit fewer than six parameters may be viewed as > having the above type. > > Fixup comparisons in VDSO to avoid pointer-integer comparison. > Introduce > explicit cast on systems with SPUs. > > Signed-off-by: Rohan McLure This patch needs rebasing on top of commit 1547db7d1f44 ("powerpc: Move system_call_exception() to syscall.c"). > > diff --git a/arch/powerpc/platforms/cell/spu_callbacks.c > b/arch/powerpc/platforms/cell/spu_callbacks.c > index fe0d8797a00a..114bbe2eeacc 100644 > --- a/arch/powerpc/platforms/cell/spu_callbacks.c > +++ b/arch/powerpc/platforms/cell/spu_callbacks.c > @@ -34,22 +34,22 @@ > * mbind, mq_open, ipc, ... > */ > > -static void *spu_syscall_table[] = { > +static const syscall_fn spu_syscall_table[] = { > #define __SYSCALL_WITH_COMPAT(nr, entry, compat) __SYSCALL(nr, > entry) > -#define __SYSCALL(nr, entry) [nr] = entry, > +#define __SYSCALL(nr, entry) [nr] = (void *) entry, > #include > }; > > long spu_sys_callback(struct spu_syscall_block *s) > { > - long (*syscall)(u64 a1, u64 a2, u64 a3, u64 a4, u64 a5, u64 > a6); > + syscall_fn syscall; > > if (s->nr_ret >= ARRAY_SIZE(spu_syscall_table)) { > pr_debug("%s: invalid syscall #%lld", __func__, s- > >nr_ret); > return -ENOSYS; > } > > - syscall = spu_syscall_table[s->nr_ret]; > + syscall = (syscall_fn) spu_syscall_table[s->nr_ret]; I don't think this cast is necessary? > > pr_debug("SPU-syscall " > "%pSR:syscall%lld(%llx, %llx, %llx, %llx, %llx, > %llx)\n", -- Andrew DonnellanOzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited