date:20240312

Re: [PATCH v3 07/12] powerpc: Use initializer for struct vm_unmapped_area_info

2024-03-12 Thread Christophe Leroy



Le 12/03/2024 à 23:28, Rick Edgecombe a écrit :
> Future changes will need to add a new member to struct
> vm_unmapped_area_info. This would cause trouble for any call site that
> doesn't initialize the struct. Currently every caller sets each member
> manually, so if new members are added they will be uninitialized and the
> core code parsing the struct will see garbage in the new member.
> 
> It could be possible to initialize the new member manually to 0 at each
> call site. This and a couple other options were discussed, and a working
> consensus (see links) was that in general the best way to accomplish this
> would be via static initialization with designated member initiators.
> Having some struct vm_unmapped_area_info instances not zero initialized
> will put those sites at risk of feeding garbage into vm_unmapped_area() if
> the convention is to zero initialize the struct and any new member addition
> misses a call site that initializes each member manually.
> 
> It could be possible to leave the code mostly untouched, and just change
> the line:
> struct vm_unmapped_area_info info
> to:
> struct vm_unmapped_area_info info = {};
> 
> However, that would leave cleanup for the members that are manually set
> to zero, as it would no longer be required.
> 
> So to be reduce the chance of bugs via uninitialized members, instead
> simply continue the process to initialize the struct this way tree wide.
> This will zero any unspecified members. Move the member initializers to the
> struct declaration when they are known at that time. Leave the members out
> that were manually initialized to zero, as this would be redundant for
> designated initializers.

I understand from this text that, as agreed, this patch removes the 
pointless/redundant zero-init of individual members. But it is not what 
is done, see below ?

> 
> Signed-off-by: Rick Edgecombe 
> Acked-by: Michael Ellerman 
> Cc: Michael Ellerman 
> Cc: Nicholas Piggin 
> Cc: Christophe Leroy 
> Cc: Aneesh Kumar K.V 
> Cc: Naveen N. Rao 
> Cc: linuxppc-dev@lists.ozlabs.org
> Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t
> Link: 
> https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/
> ---
> v3:
>   - Fixed spelling errors in log
>   - Be consistent about field vs member in log
> 
> Hi,
> 
> This patch was split and refactored out of a tree-wide change [0] to just
> zero-init each struct vm_unmapped_area_info. The overall goal of the
> series is to help shadow stack guard gaps. Currently, there is only one
> arch with shadow stacks, but two more are in progress. It is compile tested
> only.
> 
> There was further discussion that this method of initializing the structs
> while nice in some ways has a greater risk of introducing bugs in some of
> the more complicated callers. Since this version was reviewed my arch
> maintainers already, leave it as was already acknowledged.
> 
> Thanks,
> 
> Rick
> 
> [0] 
> https://lore.kernel.org/lkml/20240226190951.3240433-6-rick.p.edgeco...@intel.com/
> ---
>   arch/powerpc/mm/book3s64/slice.c | 23 ---
>   1 file changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/slice.c 
> b/arch/powerpc/mm/book3s64/slice.c
> index c0b58afb9a47..6c7ac8c73a6c 100644
> --- a/arch/powerpc/mm/book3s64/slice.c
> +++ b/arch/powerpc/mm/book3s64/slice.c
> @@ -282,12 +282,12 @@ static unsigned long slice_find_area_bottomup(struct 
> mm_struct *mm,
>   {
>   int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
>   unsigned long found, next_end;
> - struct vm_unmapped_area_info info;
> -
> - info.flags = 0;
> - info.length = len;
> - info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
> - info.align_offset = 0;
> + struct vm_unmapped_area_info info = {
> + .flags = 0,

Please remove zero-init as agreed and explained in the commit message

> + .length = len,
> + .align_mask = PAGE_MASK & ((1ul << pshift) - 1),
> + .align_offset = 0

Same here.

> + };
>   /*
>* Check till the allow max value for this mmap request
>*/
> @@ -326,13 +326,14 @@ static unsigned long slice_find_area_topdown(struct 
> mm_struct *mm,
>   {
>   int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
>   unsigned long found, prev;
> - struct vm_unmapped_area_info info;
> + struct vm_unmapped_area_info info = {
> + .flags = VM_UNMAPPED_AREA_TOPDOWN,
> + .length = len,
> + .align_mask = PAGE_MASK & ((1ul << pshift) - 1),
> + .align_offset = 0

Same here.

> + };
>   unsigned long min_addr = max(PAGE_SIZE, mmap_min_addr);
>   
> - info.flags = VM_UNMAPPED_AREA_TOPDOWN;
> - info.length = len;
> - info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
> - info.align_offset = 0;
>   /*
>* If we are trying to allocate above DEFAULT

Re: kexec verbose dumps with 6.8 [was: [PATCH v4 1/7] kexec_file: add kexec_file flag to control debug printing]

2024-03-12 Thread Jiri Slaby


Hi,


On 13. 03. 24, 1:48, Baoquan He wrote:

Hi Jiri,

On 03/12/24 at 10:58am, Jiri Slaby wrote:

On 13. 12. 23, 6:57, Baoquan He wrote:

  ... snip...

--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h

...

@@ -500,6 +500,13 @@ static inline int crash_hotplug_memory_support(void) { 
return 0; }
   static inline unsigned int crash_get_elfcorehdr_size(void) { return 0; }
   #endif
+extern bool kexec_file_dbg_print;
+
+#define kexec_dprintk(fmt, ...)\
+   printk("%s" fmt,  \
+  kexec_file_dbg_print ? KERN_INFO : KERN_DEBUG,   \
+  ##__VA_ARGS__)


This means you dump it _always_. Only with different levels.


It dumped always too with pr_debug() before, I just add a switch to
control it's pr_info() or pr_debug().


Not really, see below.



And without any prefix whatsoever, so people see bloat like this in their
log now:
[  +0.01] 1000-0009 (1)
[  +0.02] 7f96d000-7f97efff (3)
[  +0.02] 0080-00807fff (4)
[  +0.01] 0080b000-0080bfff (4)
[  +0.02] 0081-008f (4)
[  +0.01] 7f97f000-7f9fefff (4)
[  +0.01] 7ff0-7fff (4)
[  +0.02] -0fff (2)


On which arch are you seeing this? There should be one line above these
range printing to tell what they are, like:

E820 memmap:


Ah this is there too. It's a lot of output, so I took it out of context, 
apparently.



-0009a3ff (1)
0009a400-0009 (2)
000e-000f (2)
0010-6ff83fff (1)
6ff84000-7ac50fff (2)


It should all be prefixed like kdump: or kexec: in any way.


without actually knowing what that is.

There should be nothing logged if that is not asked for and especially if
kexec load went fine, right?


Right. Before this patch, those pr_debug() were already there. You need
enable them to print out like add '#define DEBUG' in *.c file, or enable
the dynamic debugging of the file or function.


I think it's perfectly fine for DEBUG builds to print this out. And many 
(all major?) distros use dyndbg, so it used to print nothing by default.



With this patch applied,
you only need specify '-d' when you execute kexec command with
kexec_file load interface, like:

kexec -s -l -d /boot/vmlinuz-.img --initrd xxx.img --reuse-cmdline


Perhaps our (SUSE) tooling passes -d? But I am seeing this every time I 
boot.


No, it does not seem so:
load.sh[915]: Starting kdump kernel load; kexec cmdline: /sbin/kexec -p 
/var/lib/kdump/kernel --append=" loglevel=7 console=tty0 console=ttyS0 
video=1920x1080,1024x768,800x600 oops=panic 
lsm=lockdown,capability,integrity,selinux sysrq=yes reset_devices 
acpi_no_memhotplug cgroup_disable=memory nokaslr numa=off irqpoll 
nr_cpus=1 root=kdump rootflags=bind rd.udev.children-max=8 
disable_cpu_apicid=0   panic=1" --initrd=/var/lib/kdump/initrd  -a



For kexec_file load, it is not logging if not specifying '-d', unless
you take way to make pr_debug() work in that file.


So is -d detection malfunctioning under some circumstances?


Can this be redesigned, please?


Sure, after making clear what's going on with this, I will try.



Actually what was wrong on the pr_debug()s? Can you simply turn them on from
the kernel when -d is passed to kexec instead of all this?


Joe suggested this during v1 reviewing:
https://lore.kernel.org/all/1e7863ec4e4ab10b84fd0e64f30f8464d2e484a3.ca...@perches.com/T/#u



...

--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -52,6 +52,8 @@ atomic_t __kexec_lock = ATOMIC_INIT(0);
   /* Flag to indicate we are going to kexec a new kernel */
   bool kexec_in_progress = false;
+bool kexec_file_dbg_print;


Ugh, and a global flag for this?


Yeah, kexec_file_dbg_print records if '-d' is specified when 'kexec'
command executed. Anything wrong with the global flag?


Global variables are frowned upon. To cite coding style: unless you 
**really** need them. Here, it looks like you do not.


thanks,
--
js
suse labs

[PATCH v10 07/12] mm: Provide address parameter to p{te,md,ud}_user_accessible_page()

2024-03-12 Thread Rohan McLure

On several powerpc platforms, a page table entry may not imply whether
the relevant mapping is for userspace or kernelspace. Instead, such
platforms infer this by the address which is being accessed.

Add an additional address argument to each of these routines in order to
provide support for page table check on powerpc.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  6 +++---
 arch/riscv/include/asm/pgtable.h |  6 +++---
 arch/x86/include/asm/pgtable.h   |  6 +++---
 mm/page_table_check.c| 12 ++--
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 74bb81744df2..8ea22deff9a3 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -874,17 +874,17 @@ static inline int pgd_devmap(pgd_t pgd)
 #endif
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
return pte_present(pte) && (pte_user(pte) || pte_user_exec(pte));
 }
 
-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
return pmd_leaf(pmd) && !pmd_present_invalid(pmd) && (pmd_user(pmd) || 
pmd_user_exec(pmd));
 }
 
-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
return pud_leaf(pud) && (pud_user(pud) || pud_user_exec(pud));
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 2fa6625f591a..f5c937007590 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -722,17 +722,17 @@ static inline void set_pud_at(struct mm_struct *mm, 
unsigned long addr,
 }
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
return pte_present(pte) && pte_user(pte);
 }
 
-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
return pmd_leaf(pmd) && pmd_user(pmd);
 }
 
-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
return pud_leaf(pud) && pud_user(pud);
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e8fd625de280..514374a27124 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1695,17 +1695,17 @@ static inline bool arch_has_hw_nonleaf_pmd_young(void)
 #endif
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
return (pte_val(pte) & _PAGE_PRESENT) && (pte_val(pte) & _PAGE_USER);
 }
 
-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
return pmd_leaf(pmd) && (pmd_val(pmd) & _PAGE_PRESENT) && (pmd_val(pmd) 
& _PAGE_USER);
 }
 
-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
return pud_leaf(pud) && (pud_val(pud) & _PAGE_PRESENT) && (pud_val(pud) 
& _PAGE_USER);
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 98cccee74b02..aa5e16c8328e 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -155,7 +155,7 @@ void __page_table_check_pte_clear(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   if (pte_user_accessible_page(pte)) {
+   if (pte_user_accessible_page(pte, addr)) {
page_table_check_clear(pte_pfn(pte), PAGE_SIZE >> PAGE_SHIFT);
}
 }
@@ -167,7 +167,7 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   if (pmd_user_accessible_page(pmd)) {
+   if (pmd_user_accessible_page(pmd, addr)) {
page_table_check_clear(pmd_pfn(pmd), PMD_SIZE >> PAGE_SHIFT);
}
 }
@@ -179,7 +179,7 @@ void __page_table_check_pud_clear(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   if (pud_user_accessible_page(pud)) {
+   if (pud_user_accessible_page(pud, addr)) {
page_table_check_clear(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT);
}
 }
@@ -195,7 +195,7 @@ void __page_table_check_ptes_set(struct mm_struct *mm, 
unsigned long addr,
 
for (i = 0; i < nr; i++)
__page_table_check_pte_clear(mm, addr, ptep_get(ptep + i));
-   if (pte_user_accessible_page(pte))
+   if (pte_user_accessible_page(pte, addr))
page_table_check_set(pte_pfn(pte), nr, pte_write(pte))

[PATCH v10 03/12] mm: Provide addr parameter to page_table_check_pte_set()

2024-03-12 Thread Rohan McLure

To provide support for powerpc platforms, provide an address parameter
to the page_table_check_pte_set() routine. This parameter is needed on
some powerpc platforms which do not encode whether a mapping is for
user or kernel in the pte. On such platforms, this can be inferred
from the addr parameter.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 include/linux/page_table_check.h | 12 +++-
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  4 ++--
 5 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index e97c1b7e3ee1..965e35adb206 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -345,7 +345,7 @@ static inline void set_ptes(struct mm_struct *mm,
unsigned long __always_unused addr,
pte_t *ptep, pte_t pte, unsigned int nr)
 {
-   page_table_check_ptes_set(mm, ptep, pte, nr);
+   page_table_check_ptes_set(mm, addr, ptep, pte, nr);
__sync_cache_and_tags(pte, nr);
 
for (;;) {
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 4e1ef3a77879..a4b5da7f0704 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -530,7 +530,7 @@ static inline void __set_pte_at(pte_t *ptep, pte_t pteval)
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pteval, unsigned int nr)
 {
-   page_table_check_ptes_set(mm, ptep, pteval, nr);
+   page_table_check_ptes_set(mm, addr, ptep, pteval, nr);
 
for (;;) {
__set_pte_at(ptep, pteval);
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 5855d690c48a..9243c920ed02 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -17,8 +17,8 @@ void __page_table_check_zero(struct page *page, unsigned int 
order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
-void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
-   unsigned int nr);
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+pte_t *ptep, pte_t pte, unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd);
 void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
@@ -68,12 +68,13 @@ static inline void page_table_check_pud_clear(struct 
mm_struct *mm, pud_t pud)
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
-   pte_t *ptep, pte_t pte, unsigned int nr)
+unsigned long addr, pte_t *ptep,
+pte_t pte, unsigned int nr)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_ptes_set(mm, ptep, pte, nr);
+   __page_table_check_ptes_set(mm, addr, ptep, pte, nr);
 }
 
 static inline void page_table_check_pmd_set(struct mm_struct *mm,
@@ -129,7 +130,8 @@ static inline void page_table_check_pud_clear(struct 
mm_struct *mm, pud_t pud)
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
-   pte_t *ptep, pte_t pte, unsigned int nr)
+unsigned long addr, pte_t *ptep,
+pte_t pte, unsigned int nr)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index f6d0e3513948..5da04d056bc3 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -238,7 +238,7 @@ static inline pte_t pte_next_pfn(pte_t pte)
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned int nr)
 {
-   page_table_check_ptes_set(mm, ptep, pte, nr);
+   page_table_check_ptes_set(mm, addr, ptep, pte, nr);
 
arch_enter_lazy_mmu_mode();
for (;;) {
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 7b9d7b45505d..3a338fee6d00 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -182,8 +182,8 @@ void __page_table_check_pud_clear(struct mm_struct *mm, 
pud_t pud)
 }
 EXPORT_SYMBOL(__page_table_check_pud_clear);
 
-void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
-   unsigned int nr)
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+pte_t *ptep, pte_t pte, unsigned int nr)
 {
unsigned int i;
 
-- 
2.44.0

[PATCH v10 05/12] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_clear"

2024-03-12 Thread Rohan McLure

This reverts commit 1831414cd729a34af937d56ad684a66599de6344.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  5 +++--
 6 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 965e35adb206..4210d3b071ec 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -965,7 +965,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
 {
pmd_t pmd = __pmd(xchg_relaxed(&pmd_val(*pmdp), 0));
 
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, address, pmd);
 
return pmd;
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index a4b5da7f0704..cf8e18f27649 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -765,7 +765,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
 {
pmd_t pmd = __pmd(atomic_long_xchg((atomic_long_t *)pmdp, 0));
 
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, address, pmd);
 
return pmd;
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 3e7003b01c9d..26722f553c43 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1352,7 +1352,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm, unsigned long
 {
pmd_t pmd = native_pmdp_get_and_clear(pmdp);
 
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, addr, pmd);
 
return pmd;
 }
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index d01a00ffc1f9..0a6ebfa46a31 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -15,7 +15,8 @@ extern struct page_ext_operations page_table_check_ops;
 
 void __page_table_check_zero(struct page *page, unsigned int order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
-void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
+void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
+ pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
  pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
@@ -52,12 +53,13 @@ static inline void page_table_check_pte_clear(struct 
mm_struct *mm, pte_t pte)
__page_table_check_pte_clear(mm, pte);
 }
 
-static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+static inline void page_table_check_pmd_clear(struct mm_struct *mm,
+ unsigned long addr, pmd_t pmd)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pmd_clear(mm, pmd);
+   __page_table_check_pmd_clear(mm, addr, pmd);
 }
 
 static inline void page_table_check_pud_clear(struct mm_struct *mm,
@@ -123,7 +125,8 @@ static inline void page_table_check_pte_clear(struct 
mm_struct *mm, pte_t pte)
 {
 }
 
-static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+static inline void page_table_check_pmd_clear(struct mm_struct *mm,
+ unsigned long addr, pmd_t pmd)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 77877ae8abef..d0d1a0bbf905 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -531,7 +531,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
pmd_t pmd = *pmdp;
 
pmd_clear(pmdp);
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, address, pmd);
 
return pmd;
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index a8c8fd7f06f8..7afaad9c6e6f 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -160,7 +160,8 @@ void __page_table_check_pte_clear(struct mm_struct *mm, 
pte_t pte)
 }
 EXPORT_SYMBOL(__page_table_check_pte_clear);
 
-void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
+ pmd_t pmd)
 {
if (&init_mm == mm)
return;
@@ -204,7 +205,7 @@ void __page_table_check_pmd_set(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   __page_table_check_pmd_clear(mm, *pmdp

[PATCH v10 00/12] Support page table check PowerPC

2024-03-12 Thread Rohan McLure

Support page table check on all PowerPC platforms. This works by
serialising assignments, reassignments and clears of page table
entries at each level in order to ensure that anonymous mappings
have at most one writable consumer, and likewise that file-backed
mappings are not simultaneously also anonymous mappings.

In order to support this infrastructure, a number of stubs must be
defined for all powerpc platforms. Additionally, seperate set_pte_at()
and set_pte_at_unchecked(), to allow for internal, uninstrumented mappings.

v10:
 * Revert patches that removed address and mm parameters from page table
   check routines, including consuming code from arm64, x86_64 and
   riscv.
 * Implement *_user_accessible_page() routines in terms of pte_user()
   where available (64-bit, book3s) but otherwise by checking the
   address (on platforms where the pte does not imply whether the
   mapping is for user or kernel) 
 * Internal set_pte_at() calls replaced with set_pte_at_unchecked(), which
   is identical, but prevents double instrumentation.

v9:
 * Adapt to using the set_ptes() API, using __set_pte_at() where we need
   must avoid instrumentation.
 * Use the logic of *_access_permitted() for implementing
   *_user_accessible_page(), which are required routines for page table
   check.
 * Even though we no longer need p{m,u,4}d_leaf(), still default
   implement these to assist in refactoring out extant
   p{m,u,4}_is_leaf().
 * Add p{m,u}_pte() stubs where asm-generic does not provide them, as
   page table check wants all *user_accessible_page() variants, and we
   would like to default implement the variants in terms of
   pte_user_accessible_page().
 * Avoid the ugly pmdp_collapse_flush() macro nonsense! Just instrument
   its constituent calls instead for radix and hash.
Link: 
https://lore.kernel.org/linuxppc-dev/20231130025404.37179-2-rmcl...@linux.ibm.com/

v8:
 * Fix linux/page_table_check.h include in asm/pgtable.h breaking
   32-bit.
Link: 
https://lore.kernel.org/linuxppc-dev/20230215231153.2147454-1-rmcl...@linux.ibm.com/

v7:
 * Remove use of extern in set_pte prototypes
 * Clean up pmdp_collapse_flush macro
 * Replace set_pte_at with static inline function
 * Fix commit message for patch 7
Link: 
https://lore.kernel.org/linuxppc-dev/20230215020155.1969194-1-rmcl...@linux.ibm.com/

v6:
 * Support huge pages and p{m,u}d accounting.
 * Remove instrumentation from set_pte from kernel internal pages.
 * 64s: Implement pmdp_collapse_flush in terms of __pmdp_collapse_flush
   as access to the mm_struct * is required.
Link: 
https://lore.kernel.org/linuxppc-dev/20230214015939.1853438-1-rmcl...@linux.ibm.com/

v5:
Link: 
https://lore.kernel.org/linuxppc-dev/20221118002146.25979-1-rmcl...@linux.ibm.com/

Rohan McLure (12):
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pud_set"
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pmd_set"
  mm: Provide addr parameter to page_table_check_pte_set()
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pud_clear"
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pmd_clear"
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pte_clear"
  mm: Provide address parameter to p{te,md,ud}_user_accessible_page()
  powerpc: mm: Replace p{u,m,4}d_is_leaf with p{u,m,4}_leaf
  powerpc: mm: Add common pud_pfn stub for all platforms
  poweprc: mm: Implement *_user_accessible_page() for ptes
  powerpc: mm: Use set_pte_at_unchecked() for early-boot / internal
usages
  powerpc: mm: Support page table check

 arch/arm64/include/asm/pgtable.h | 18 ++---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h |  7 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 74 +++-
 arch/powerpc/include/asm/pgtable.h   | 66 ++---
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 12 ++--
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  6 +-
 arch/powerpc/mm/book3s64/pgtable.c   | 17 +++--
 arch/powerpc/mm/book3s64/radix_pgtable.c | 25 ---
 arch/powerpc/mm/nohash/book3e_pgtable.c  |  2 +-
 arch/powerpc/mm/pgtable.c| 17 -
 arch/powerpc/mm/pgtable_32.c |  2 +-
 arch/powerpc/mm/pgtable_64.c |  6 +-
 arch/powerpc/xmon/xmon.c |  6 +-
 arch/riscv/include/asm/pgtable.h | 18 ++---
 arch/x86/include/asm/pgtable.h   | 20 +++---
 include/linux/page_table_check.h | 67 +++---
 include/linux/pgtable.h  |  8 +--
 mm/page_table_check.c| 39 ++-
 19 files changed, 261 insertions(+), 150 deletions(-)

-- 
2.44.0

[PATCH v10 08/12] powerpc: mm: Replace p{u,m,4}d_is_leaf with p{u,m,4}_leaf

2024-03-12 Thread Rohan McLure

Replace occurrences of p{u,m,4}d_is_leaf with p{u,m,4}_leaf, as the
latter is the name given to checking that a higher-level entry in
multi-level paging contains a page translation entry (pte) throughout
all other archs.

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
v9: No longer required in order to implement page table check, just a
refactor.
v10: Fix more occurances, and just delete p{u,m,4}_is_leaf() stubs as
equivalent p{u,m,4}_leaf() stubs already exist.
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 10 
 arch/powerpc/include/asm/pgtable.h   | 24 
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 12 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c | 14 ++--
 arch/powerpc/mm/pgtable.c|  6 ++---
 arch/powerpc/mm/pgtable_64.c |  6 ++---
 arch/powerpc/xmon/xmon.c |  6 ++---
 7 files changed, 26 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 62c43d3d80ec..382724c5e872 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1443,16 +1443,14 @@ static inline bool is_pte_rw_upgrade(unsigned long 
old_val, unsigned long new_va
 /*
  * Like pmd_huge() and pmd_large(), but works regardless of config options
  */
-#define pmd_is_leaf pmd_is_leaf
-#define pmd_leaf pmd_is_leaf
-static inline bool pmd_is_leaf(pmd_t pmd)
+#define pmd_leaf pmd_leaf
+static inline bool pmd_leaf(pmd_t pmd)
 {
return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
 }
 
-#define pud_is_leaf pud_is_leaf
-#define pud_leaf pud_is_leaf
-static inline bool pud_is_leaf(pud_t pud)
+#define pud_leaf pud_leaf
+static inline bool pud_leaf(pud_t pud)
 {
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
 }
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 9224f23065ff..0c0ffbe7a3b5 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -180,30 +180,6 @@ static inline void pte_frag_set(mm_context_t *ctx, void *p)
 }
 #endif
 
-#ifndef pmd_is_leaf
-#define pmd_is_leaf pmd_is_leaf
-static inline bool pmd_is_leaf(pmd_t pmd)
-{
-   return false;
-}
-#endif
-
-#ifndef pud_is_leaf
-#define pud_is_leaf pud_is_leaf
-static inline bool pud_is_leaf(pud_t pud)
-{
-   return false;
-}
-#endif
-
-#ifndef p4d_is_leaf
-#define p4d_is_leaf p4d_is_leaf
-static inline bool p4d_is_leaf(p4d_t p4d)
-{
-   return false;
-}
-#endif
-
 #define pmd_pgtable pmd_pgtable
 static inline pgtable_t pmd_pgtable(pmd_t pmd)
 {
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 4a1abb9f7c05..408d98f8a514 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -503,7 +503,7 @@ static void kvmppc_unmap_free_pmd(struct kvm *kvm, pmd_t 
*pmd, bool full,
for (im = 0; im < PTRS_PER_PMD; ++im, ++p) {
if (!pmd_present(*p))
continue;
-   if (pmd_is_leaf(*p)) {
+   if (pmd_leaf(*p)) {
if (full) {
pmd_clear(p);
} else {
@@ -532,7 +532,7 @@ static void kvmppc_unmap_free_pud(struct kvm *kvm, pud_t 
*pud,
for (iu = 0; iu < PTRS_PER_PUD; ++iu, ++p) {
if (!pud_present(*p))
continue;
-   if (pud_is_leaf(*p)) {
+   if (pud_leaf(*p)) {
pud_clear(p);
} else {
pmd_t *pmd;
@@ -635,12 +635,12 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pud = pud_alloc_one(kvm->mm, gpa);
 
pmd = NULL;
-   if (pud && pud_present(*pud) && !pud_is_leaf(*pud))
+   if (pud && pud_present(*pud) && !pud_leaf(*pud))
pmd = pmd_offset(pud, gpa);
else if (level <= 1)
new_pmd = kvmppc_pmd_alloc();
 
-   if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_is_leaf(*pmd)))
+   if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_leaf(*pmd)))
new_ptep = kvmppc_pte_alloc();
 
/* Check if we might have been invalidated; let the guest retry if so */
@@ -658,7 +658,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pud = NULL;
}
pud = pud_offset(p4d, gpa);
-   if (pud_is_leaf(*pud)) {
+   if (pud_leaf(*pud)) {
unsigned long hgpa = gpa & PUD_MASK;
 
/* Check if we raced and someone else has set the same thing */
@@ -709,7 +709,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pmd = NULL;
}
pmd = pmd_offset(pud, gpa);
-   if (pmd_is_leaf(*pmd)) {
+   if (pmd_leaf(*pmd)) {
unsigned long

[PATCH v10 12/12] powerpc: mm: Support page table check

2024-03-12 Thread Rohan McLure

On creation and clearing of a page table mapping, instrument such calls
by invoking page_table_check_pte_set and page_table_check_pte_clear
respectively. These calls serve as a sanity check against illegal
mappings.

Enable ARCH_SUPPORTS_PAGE_TABLE_CHECK for all platforms.

See also:

riscv support in commit 3fee229a8eb9 ("riscv/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
arm64 in commit 42b2547137f5 ("arm64/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
x86_64 in commit d283d422c6c4 ("x86: mm: add x86_64 support for page table
check")

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
v9: Updated for new API. Instrument pmdp_collapse_flush's two
constituent calls to avoid header hell
v10: Cause p{u,m}dp_huge_get_and_clear() to resemble one another
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h |  7 ++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 45 +++-
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  4 ++
 arch/powerpc/mm/book3s64/pgtable.c   | 11 +++--
 arch/powerpc/mm/book3s64/radix_pgtable.c |  3 ++
 arch/powerpc/mm/pgtable.c|  4 ++
 7 files changed, 61 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index b9fc064d38d2..2dfa5ccb25cc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -166,6 +166,7 @@ config PPC
select ARCH_STACKWALK
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_DEBUG_PAGEALLOCif PPC_BOOK3S || PPC_8xx || 40x
+   select ARCH_SUPPORTS_PAGE_TABLE_CHECK
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_USE_MEMTEST
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 52971ee30717..a97edbc09984 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -201,6 +201,7 @@ void unmap_kernel_page(unsigned long va);
 #ifndef __ASSEMBLY__
 #include 
 #include 
+#include 
 
 /* Bits to mask out from a PGD to get to the PUD page */
 #define PGD_MASKED_BITS0
@@ -314,7 +315,11 @@ static inline int __ptep_test_and_clear_young(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long 
addr,
   pte_t *ptep)
 {
-   return __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index ca765331e21d..4ad88d4ede88 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -145,6 +145,8 @@
 #define PAGE_KERNEL_ROX__pgprot(_PAGE_BASE | _PAGE_KERNEL_ROX)
 
 #ifndef __ASSEMBLY__
+#include 
+
 /*
  * page table defines
  */
@@ -415,8 +417,11 @@ static inline void huge_ptep_set_wrprotect(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
-   unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
-   return __pte(old);
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~0UL, 0, 0));
+
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
@@ -425,11 +430,16 @@ static inline pte_t ptep_get_and_clear_full(struct 
mm_struct *mm,
pte_t *ptep, int full)
 {
if (full && radix_enabled()) {
+   pte_t old_pte;
+
/*
 * We know that this is a full mm pte clear and
 * hence can be sure there is no parallel set_pte.
 */
-   return radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   old_pte = radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
}
return ptep_get_and_clear(mm, addr, ptep);
 }
@@ -1334,19 +1344,34 @@ extern int pudp_test_and_clear_young(struct 
vm_area_struct *vma,
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp)
 {
-   if (radix_enabled())
-   return radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
-   return hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
+   pmd_t old_pmd;
+
+   if (radix_enabled()) {
+   old_pmd = radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
+   } else {
+   old_pmd = hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
+   }
+
+   pa

[PATCH v10 02/12] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_set"

2024-03-12 Thread Rohan McLure

This reverts commit a3b837130b5865521fa8662aceaa6ebc8d29389a.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

riscv: Respect change to delete mm, addr parameters from __set_pte_at()

This commit also changed calls to __set_pte_at() to use fewer parameters
on riscv. Keep that change rather than reverting it, as the signature of
__set_pte_at() is changed in a different commit.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  4 ++--
 arch/riscv/include/asm/pgtable.h |  4 ++--
 arch/x86/include/asm/pgtable.h   |  4 ++--
 include/linux/page_table_check.h | 11 +++
 mm/page_table_check.c|  3 ++-
 5 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index a965b59401b3..e97c1b7e3ee1 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -540,7 +540,7 @@ static inline void __set_pte_at(struct mm_struct *mm,
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
  pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(mm, pmdp, pmd);
+   page_table_check_pmd_set(mm, addr, pmdp, pmd);
return __set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd),
PMD_SIZE >> PAGE_SHIFT);
 }
@@ -1001,7 +1001,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct 
*mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+   page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
return __pmd(xchg_relaxed(&pmd_val(*pmdp), pmd_val(pmd)));
 }
 #endif
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 4fc99dd3bb42..4e1ef3a77879 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -710,7 +710,7 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(mm, pmdp, pmd);
+   page_table_check_pmd_set(mm, addr, pmdp, pmd);
return __set_pte_at((pte_t *)pmdp, pmd_pte(pmd));
 }
 
@@ -781,7 +781,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+   page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
 }
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a9b1e8e6d4b9..6a7dc2524344 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1245,7 +1245,7 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t 
*pudp)
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
  pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(mm, pmdp, pmd);
+   page_table_check_pmd_set(mm, addr, pmdp, pmd);
set_pmd(pmdp, pmd);
 }
 
@@ -1390,7 +1390,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct 
*mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+   page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
if (IS_ENABLED(CONFIG_SMP)) {
return xchg(pmdp, pmd);
} else {
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index d188428512f5..5855d690c48a 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -19,7 +19,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t 
pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
unsigned int nr);
-void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd);
+void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
+   pmd_t *pmdp, pmd_t pmd);
 void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
pud_t *pudp, pud_t pud);
 void __page_table_check_pte_clear_range(struct mm_struct *mm,
@@ -75,13 +76,14 @@ static inline void page_table_check_ptes_set(struct 
mm_struct *mm,
__page_table_check_ptes_set(mm, ptep, pte, nr);
 }
 
-static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp,
+static inline voi

[PATCH v10 10/12] poweprc: mm: Implement *_user_accessible_page() for ptes

2024-03-12 Thread Rohan McLure

Page table checking depends on architectures providing an
implementation of p{te,md,ud}_user_accessible_page. With
refactorisations made on powerpc/mm, the pte_access_permitted() and
similar methods verify whether a userland page is accessible with the
required permissions.

Since page table checking is the only user of
p{te,md,ud}_user_accessible_page(), implement these for all platforms,
using some of the same preliminay checks taken by pte_access_permitted()
on that platform.

Since Commit 8e9bd41e4ce1 ("powerpc/nohash: Replace pte_user() by pte_read()")
pte_user() is no longer required to be present on all platforms as it
may be equivalent to or implied by pte_read(). Hence implementations are
specialised.

Signed-off-by: Rohan McLure 
---
v9: New implementation
v10: Let book3s/64 use pte_user(), but otherwise default other platforms
to using the address provided with the call to infer whether it is a
user page or not. pmd/pud variants will warn on all other platforms, as
they should not be used for user page mappings
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 19 ++
 arch/powerpc/include/asm/pgtable.h   | 26 
 2 files changed, 45 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 382724c5e872..ca765331e21d 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -538,6 +538,12 @@ static inline bool pte_access_permitted(pte_t pte, bool 
write)
return arch_pte_access_permitted(pte_val(pte), write, 0);
 }
 
+#define pte_user_accessible_page pte_user_accessible_page
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+   return pte_present(pte) && pte_user(pte);
+}
+
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
@@ -881,6 +887,7 @@ static inline int pud_present(pud_t pud)
 
 extern struct page *pud_page(pud_t pud);
 extern struct page *pmd_page(pmd_t pmd);
+
 static inline pte_t pud_pte(pud_t pud)
 {
return __pte_raw(pud_raw(pud));
@@ -926,6 +933,12 @@ static inline bool pud_access_permitted(pud_t pud, bool 
write)
return pte_access_permitted(pud_pte(pud), write);
 }
 
+#define pud_user_accessible_page pud_user_accessible_page
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
+{
+   return pte_user_accessible_page(pud_pte(pud), addr);
+}
+
 #define __p4d_raw(x)   ((p4d_t) { __pgd_raw(x) })
 static inline __be64 p4d_raw(p4d_t x)
 {
@@ -1091,6 +1104,12 @@ static inline bool pmd_access_permitted(pmd_t pmd, bool 
write)
return pte_access_permitted(pmd_pte(pmd), write);
 }
 
+#define pmd_user_accessible_page pmd_user_accessible_page
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
+{
+   return pte_user_accessible_page(pmd_pte(pmd), addr);
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
 extern pud_t pfn_pud(unsigned long pfn, pgprot_t pgprot);
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 13f661831333..3741a63fb82e 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -227,6 +227,32 @@ static inline int pud_pfn(pud_t pud)
 }
 #endif
 
+#ifndef pte_user_accessible_page
+#define pte_user_accessible_page pte_user_accessible_page
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+   return pte_present(pte) && !is_kernel_addr(addr);
+}
+#endif
+
+#ifndef pmd_user_accessible_page
+#define pmd_user_accessible_page pmd_user_accessible_page
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
+{
+   WARN_ONCE(1, "pmd: platform does not use pmd entries directly");
+   return false;
+}
+#endif
+
+#ifndef pud_user_accessible_page
+#define pud_user_accessible_page pud_user_accessible_page
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
+{
+   WARN_ONCE(1, "pud: platform does not use pud entries directly");
+   return false;
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.44.0

[PATCH v10 11/12] powerpc: mm: Use set_pte_at_unchecked() for early-boot / internal usages

2024-03-12 Thread Rohan McLure

In the new set_ptes() API, set_pte_at() (a special case of set_ptes())
is intended to be instrumented by the page table check facility. There
are however several other routines that constitute the API for setting
page table entries, including set_pmd_at() among others. Such routines
are themselves implemented in terms of set_ptes_at().

A future patch providing support for page table checking on powerpc
must take care to avoid duplicate calls to
page_table_check_p{te,md,ud}_set(). Allow for assignment of pte entries
without instrumentation through the set_pte_at_unchecked() routine
introduced in this patch.

Cause API-facing routines that call set_pte_at() to instead call
set_pte_at_unchecked(), which will remain uninstrumented by page
table check. set_ptes() is itself implemented by calls to
__set_pte_at(), so this eliminates redundant code.

Also prefer set_pte_at_unchecked() in early-boot usages which should not be
instrumented.

Signed-off-by: Rohan McLure 
---
v9: New patch
v10: don't reuse __set_pte_at(), as that will not apply filters. Instead
use new set_pte_at_unchecked().
---
 arch/powerpc/include/asm/pgtable.h   | 2 ++
 arch/powerpc/mm/book3s64/hash_pgtable.c  | 2 +-
 arch/powerpc/mm/book3s64/pgtable.c   | 6 +++---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 8 
 arch/powerpc/mm/nohash/book3e_pgtable.c  | 2 +-
 arch/powerpc/mm/pgtable.c| 7 +++
 arch/powerpc/mm/pgtable_32.c | 2 +-
 7 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 3741a63fb82e..6ff1d8cfa216 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -44,6 +44,8 @@ struct mm_struct;
 void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
pte_t pte, unsigned int nr);
 #define set_ptes set_ptes
+void set_pte_at_unchecked(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte);
 #define update_mmu_cache(vma, addr, ptep) \
update_mmu_cache_range(NULL, vma, addr, ptep, 1)
 
diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c 
b/arch/powerpc/mm/book3s64/hash_pgtable.c
index 988948d69bc1..871472f99a01 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -165,7 +165,7 @@ int hash__map_kernel_page(unsigned long ea, unsigned long 
pa, pgprot_t prot)
ptep = pte_alloc_kernel(pmdp, ea);
if (!ptep)
return -ENOMEM;
-   set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, prot));
+   set_pte_at_unchecked(&init_mm, ea, ptep, pfn_pte(pa >> 
PAGE_SHIFT, prot));
} else {
/*
 * If the mm subsystem is not fully up, we cannot create a
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 3438ab72c346..25082ab6018b 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -116,7 +116,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(!(pmd_large(pmd)));
 #endif
trace_hugepage_set_pmd(addr, pmd_val(pmd));
-   return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
+   return set_pte_at_unchecked(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
 }
 
 void set_pud_at(struct mm_struct *mm, unsigned long addr,
@@ -133,7 +133,7 @@ void set_pud_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(!(pud_large(pud)));
 #endif
trace_hugepage_set_pud(addr, pud_val(pud));
-   return set_pte_at(mm, addr, pudp_ptep(pudp), pud_pte(pud));
+   return set_pte_at_unchecked(mm, addr, pudp_ptep(pudp), pud_pte(pud));
 }
 
 static void do_serialize(void *arg)
@@ -539,7 +539,7 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, 
unsigned long addr,
if (radix_enabled())
return radix__ptep_modify_prot_commit(vma, addr,
  ptep, old_pte, pte);
-   set_pte_at(vma->vm_mm, addr, ptep, pte);
+   set_pte_at_unchecked(vma->vm_mm, addr, ptep, pte);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 46fa46ce6526..c661e42bb2f1 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -109,7 +109,7 @@ static int early_map_kernel_page(unsigned long ea, unsigned 
long pa,
ptep = pte_offset_kernel(pmdp, ea);
 
 set_the_pte:
-   set_pte_at(&init_mm, ea, ptep, pfn_pte(pfn, flags));
+   set_pte_at_unchecked(&init_mm, ea, ptep, pfn_pte(pfn, flags));
asm volatile("ptesync": : :"memory");
return 0;
 }
@@ -1522,7 +1522,7 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct 
*vma,
(atomic_read(&mm->context.copros) > 0))
radix__flush_tlb_page(vma, addr);
 
-

[PATCH v10 06/12] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pte_clear"

2024-03-12 Thread Rohan McLure

This reverts commit aa232204c4689427cefa55fe975692b57291523a.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  4 ++--
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  7 ---
 6 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 4210d3b071ec..74bb81744df2 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -953,7 +953,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
pte_t pte = __pte(xchg_relaxed(&pte_val(*ptep), 0));
 
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, address, pte);
 
return pte;
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index cf8e18f27649..2fa6625f591a 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -561,7 +561,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
pte_t pte = __pte(atomic_long_xchg((atomic_long_t *)ptep, 0));
 
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, address, pte);
 
return pte;
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 26722f553c43..e8fd625de280 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1283,7 +1283,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct 
*mm, unsigned long addr,
   pte_t *ptep)
 {
pte_t pte = native_ptep_get_and_clear(ptep);
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, addr, pte);
return pte;
 }
 
@@ -1299,7 +1299,7 @@ static inline pte_t ptep_get_and_clear_full(struct 
mm_struct *mm,
 * care about updates and native needs no locking
 */
pte = native_local_ptep_get_and_clear(ptep);
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, addr, pte);
} else {
pte = ptep_get_and_clear(mm, addr, ptep);
}
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 0a6ebfa46a31..48721a4a2b84 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -14,7 +14,8 @@ extern struct static_key_true page_table_check_disabled;
 extern struct page_ext_operations page_table_check_ops;
 
 void __page_table_check_zero(struct page *page, unsigned int order);
-void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
+void __page_table_check_pte_clear(struct mm_struct *mm, unsigned long addr,
+ pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
  pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
@@ -45,12 +46,13 @@ static inline void page_table_check_free(struct page *page, 
unsigned int order)
__page_table_check_zero(page, order);
 }
 
-static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t pte)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pte_clear(mm, pte);
+   __page_table_check_pte_clear(mm, addr, pte);
 }
 
 static inline void page_table_check_pmd_clear(struct mm_struct *mm,
@@ -121,7 +123,8 @@ static inline void page_table_check_free(struct page *page, 
unsigned int order)
 {
 }
 
-static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t pte)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index d0d1a0bbf905..89af325129f2 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -428,7 +428,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
pte_t pte = ptep_get(ptep);
pte_clear(mm, address, ptep);
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, address, pte);
return pte;
 }
 #endif
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 7afaad9c6e6f..98cccee74b02 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -149,7 +149,8 @@ void __page_table_check_zero(struct page *page, unsigned 
int order)

[PATCH v10 04/12] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pud_clear"

2024-03-12 Thread Rohan McLure

This reverts commit 931c38e16499a057e30a3033f4d6a9c242f0f156.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure 
---
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  5 +++--
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 6a7dc2524344..3e7003b01c9d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1363,7 +1363,7 @@ static inline pud_t pudp_huge_get_and_clear(struct 
mm_struct *mm,
 {
pud_t pud = native_pudp_get_and_clear(pudp);
 
-   page_table_check_pud_clear(mm, pud);
+   page_table_check_pud_clear(mm, addr, pud);
 
return pud;
 }
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 9243c920ed02..d01a00ffc1f9 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -16,7 +16,8 @@ extern struct page_ext_operations page_table_check_ops;
 void __page_table_check_zero(struct page *page, unsigned int order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
-void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+ pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep, pte_t pte, unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
@@ -59,12 +60,13 @@ static inline void page_table_check_pmd_clear(struct 
mm_struct *mm, pmd_t pmd)
__page_table_check_pmd_clear(mm, pmd);
 }
 
-static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+ unsigned long addr, pud_t pud)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pud_clear(mm, pud);
+   __page_table_check_pud_clear(mm, addr, pud);
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
@@ -125,7 +127,8 @@ static inline void page_table_check_pmd_clear(struct 
mm_struct *mm, pmd_t pmd)
 {
 }
 
-static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+ unsigned long addr, pud_t pud)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5da04d056bc3..77877ae8abef 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -544,7 +544,7 @@ static inline pud_t pudp_huge_get_and_clear(struct 
mm_struct *mm,
pud_t pud = *pudp;
 
pud_clear(pudp);
-   page_table_check_pud_clear(mm, pud);
+   page_table_check_pud_clear(mm, address, pud);
 
return pud;
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 3a338fee6d00..a8c8fd7f06f8 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -171,7 +171,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, 
pmd_t pmd)
 }
 EXPORT_SYMBOL(__page_table_check_pmd_clear);
 
-void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+ pud_t pud)
 {
if (&init_mm == mm)
return;
@@ -217,7 +218,7 @@ void __page_table_check_pud_set(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   __page_table_check_pud_clear(mm, *pudp);
+   __page_table_check_pud_clear(mm, addr, *pudp);
if (pud_user_accessible_page(pud)) {
page_table_check_set(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT,
 pud_write(pud));
-- 
2.44.0

[PATCH v10 01/12] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pud_set"

2024-03-12 Thread Rohan McLure

This reverts commit 6d144436d954311f2dbacb5bf7b084042448d83e.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

riscv: Respect change to delete mm, addr parameters from __set_pte_at()

This commit also changed calls to __set_pte_at() to use fewer parameters
on riscv. Keep that change rather than reverting it, as the signature of
__set_pte_at() is changed in a different commit.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 mm/page_table_check.c|  3 ++-
 5 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 79ce70fbb751..a965b59401b3 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -548,7 +548,7 @@ static inline void set_pmd_at(struct mm_struct *mm, 
unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
  pud_t *pudp, pud_t pud)
 {
-   page_table_check_pud_set(mm, pudp, pud);
+   page_table_check_pud_set(mm, addr, pudp, pud);
return __set_pte_at(mm, addr, (pte_t *)pudp, pud_pte(pud),
PUD_SIZE >> PAGE_SHIFT);
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 6066822e7396..4fc99dd3bb42 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -717,7 +717,7 @@ static inline void set_pmd_at(struct mm_struct *mm, 
unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
pud_t *pudp, pud_t pud)
 {
-   page_table_check_pud_set(mm, pudp, pud);
+   page_table_check_pud_set(mm, addr, pudp, pud);
return __set_pte_at((pte_t *)pudp, pud_pte(pud));
 }
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 9d077bca6a10..a9b1e8e6d4b9 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1252,7 +1252,7 @@ static inline void set_pmd_at(struct mm_struct *mm, 
unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
  pud_t *pudp, pud_t pud)
 {
-   page_table_check_pud_set(mm, pudp, pud);
+   page_table_check_pud_set(mm, addr, pudp, pud);
native_set_pud(pudp, pud);
 }
 
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 6722941c7cb8..d188428512f5 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -20,7 +20,8 @@ void __page_table_check_pud_clear(struct mm_struct *mm, pud_t 
pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd);
-void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, pud_t pud);
+void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
+   pud_t *pudp, pud_t pud);
 void __page_table_check_pte_clear_range(struct mm_struct *mm,
unsigned long addr,
pmd_t pmd);
@@ -83,13 +84,14 @@ static inline void page_table_check_pmd_set(struct 
mm_struct *mm, pmd_t *pmdp,
__page_table_check_pmd_set(mm, pmdp, pmd);
 }
 
-static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp,
+static inline void page_table_check_pud_set(struct mm_struct *mm,
+   unsigned long addr, pud_t *pudp,
pud_t pud)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pud_set(mm, pudp, pud);
+   __page_table_check_pud_set(mm, addr, pudp, pud);
 }
 
 static inline void page_table_check_pte_clear_range(struct mm_struct *mm,
@@ -134,7 +136,8 @@ static inline void page_table_check_pmd_set(struct 
mm_struct *mm, pmd_t *pmdp,
 {
 }
 
-static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp,
+static inline void page_table_check_pud_set(struct mm_struct *mm,
+   unsigned long addr, pud_t *pudp,
pud_t pud)
 {
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index af69c3c8f7c2..75167537ebd7 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -210,7 +210,8 @@ void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t 
*pmdp, pmd_t pmd)
 }
 EXPORT_SYMBOL(__page_table_check_pmd_set);
 
-void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp

[PATCH v10 09/12] powerpc: mm: Add common pud_pfn stub for all platforms

2024-03-12 Thread Rohan McLure

Prior to this commit, pud_pfn was implemented with BUILD_BUG as the inline
function for 64-bit Book3S systems but is never included, as its
invocations in generic code are guarded by calls to pud_devmap which return
zero on such systems. A future patch will provide support for page table
checks, the generic code for which depends on a pud_pfn stub being
implemented, even while the patch will not interact with puds directly.

Remove the 64-bit Book3S stub and define pud_pfn to warn on all
platforms. pud_pfn may be defined properly on a per-platform basis
should it grow real usages in future.

Signed-off-by: Rohan McLure 
---
 arch/powerpc/include/asm/pgtable.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 0c0ffbe7a3b5..13f661831333 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -213,6 +213,20 @@ static inline bool arch_supports_memmap_on_memory(unsigned 
long vmemmap_size)
 
 #endif /* CONFIG_PPC64 */
 
+/*
+ * Currently only consumed by page_table_check_pud_{set,clear}. Since clears
+ * and sets to page table entries at any level are done through
+ * page_table_check_pte_{set,clear}, provide stub implementation.
+ */
+#ifndef pud_pfn
+#define pud_pfn pud_pfn
+static inline int pud_pfn(pud_t pud)
+{
+   WARN_ONCE(1, "pud: platform does not use pud entries directly");
+   return 0;
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.44.0

Re: kexec verbose dumps with 6.8 [was: [PATCH v4 1/7] kexec_file: add kexec_file flag to control debug printing]

2024-03-12 Thread Baoquan He

Hi Jiri,

On 03/12/24 at 10:58am, Jiri Slaby wrote:
> On 13. 12. 23, 6:57, Baoquan He wrote:
 ... snip...
> > --- a/include/linux/kexec.h
> > +++ b/include/linux/kexec.h
> ...
> > @@ -500,6 +500,13 @@ static inline int crash_hotplug_memory_support(void) { 
> > return 0; }
> >   static inline unsigned int crash_get_elfcorehdr_size(void) { return 0; }
> >   #endif
> > +extern bool kexec_file_dbg_print;
> > +
> > +#define kexec_dprintk(fmt, ...)\
> > +   printk("%s" fmt,\
> > +  kexec_file_dbg_print ? KERN_INFO : KERN_DEBUG,   \
> > +  ##__VA_ARGS__)
> 
> This means you dump it _always_. Only with different levels.

It dumped always too with pr_debug() before, I just add a switch to
control it's pr_info() or pr_debug().

> 
> And without any prefix whatsoever, so people see bloat like this in their
> log now:
> [  +0.01] 1000-0009 (1)
> [  +0.02] 7f96d000-7f97efff (3)
> [  +0.02] 0080-00807fff (4)
> [  +0.01] 0080b000-0080bfff (4)
> [  +0.02] 0081-008f (4)
> [  +0.01] 7f97f000-7f9fefff (4)
> [  +0.01] 7ff0-7fff (4)
> [  +0.02] -0fff (2)

On which arch are you seeing this? There should be one line above these
range printing to tell what they are, like:

E820 memmap:
-0009a3ff (1)
0009a400-0009 (2)
000e-000f (2)
0010-6ff83fff (1)
6ff84000-7ac50fff (2)

> 
> without actually knowing what that is.
> 
> There should be nothing logged if that is not asked for and especially if
> kexec load went fine, right?

Right. Before this patch, those pr_debug() were already there. You need
enable them to print out like add '#define DEBUG' in *.c file, or enable
the dynamic debugging of the file or function. With this patch applied,
you only need specify '-d' when you execute kexec command with
kexec_file load interface, like:

kexec -s -l -d /boot/vmlinuz-.img --initrd xxx.img --reuse-cmdline

For kexec_file load, it is not logging if not specifying '-d', unless
you take way to make pr_debug() work in that file.

> 
> Can this be redesigned, please?

Sure, after making clear what's going on with this, I will try.

> 
> Actually what was wrong on the pr_debug()s? Can you simply turn them on from
> the kernel when -d is passed to kexec instead of all this?

Joe suggested this during v1 reviewing:
https://lore.kernel.org/all/1e7863ec4e4ab10b84fd0e64f30f8464d2e484a3.ca...@perches.com/T/#u

> 
> ...
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -52,6 +52,8 @@ atomic_t __kexec_lock = ATOMIC_INIT(0);
> >   /* Flag to indicate we are going to kexec a new kernel */
> >   bool kexec_in_progress = false;
> > +bool kexec_file_dbg_print;
> 
> Ugh, and a global flag for this?

Yeah, kexec_file_dbg_print records if '-d' is specified when 'kexec'
command executed. Anything wrong with the global flag?

Thanks
Baoquan

[PATCH v3 07/12] powerpc: Use initializer for struct vm_unmapped_area_info

2024-03-12 Thread Rick Edgecombe

Future changes will need to add a new member to struct
vm_unmapped_area_info. This would cause trouble for any call site that
doesn't initialize the struct. Currently every caller sets each member
manually, so if new members are added they will be uninitialized and the
core code parsing the struct will see garbage in the new member.

It could be possible to initialize the new member manually to 0 at each
call site. This and a couple other options were discussed, and a working
consensus (see links) was that in general the best way to accomplish this
would be via static initialization with designated member initiators.
Having some struct vm_unmapped_area_info instances not zero initialized
will put those sites at risk of feeding garbage into vm_unmapped_area() if
the convention is to zero initialize the struct and any new member addition
misses a call site that initializes each member manually.

It could be possible to leave the code mostly untouched, and just change
the line:
struct vm_unmapped_area_info info
to:
struct vm_unmapped_area_info info = {};

However, that would leave cleanup for the members that are manually set
to zero, as it would no longer be required.

So to be reduce the chance of bugs via uninitialized members, instead
simply continue the process to initialize the struct this way tree wide.
This will zero any unspecified members. Move the member initializers to the
struct declaration when they are known at that time. Leave the members out
that were manually initialized to zero, as this would be redundant for
designated initializers.

Signed-off-by: Rick Edgecombe 
Acked-by: Michael Ellerman 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Aneesh Kumar K.V 
Cc: Naveen N. Rao 
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t
Link: 
https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/
---
v3:
 - Fixed spelling errors in log
 - Be consistent about field vs member in log

Hi,

This patch was split and refactored out of a tree-wide change [0] to just
zero-init each struct vm_unmapped_area_info. The overall goal of the
series is to help shadow stack guard gaps. Currently, there is only one
arch with shadow stacks, but two more are in progress. It is compile tested
only.

There was further discussion that this method of initializing the structs
while nice in some ways has a greater risk of introducing bugs in some of
the more complicated callers. Since this version was reviewed my arch
maintainers already, leave it as was already acknowledged.

Thanks,

Rick

[0] 
https://lore.kernel.org/lkml/20240226190951.3240433-6-rick.p.edgeco...@intel.com/
---
 arch/powerpc/mm/book3s64/slice.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/slice.c b/arch/powerpc/mm/book3s64/slice.c
index c0b58afb9a47..6c7ac8c73a6c 100644
--- a/arch/powerpc/mm/book3s64/slice.c
+++ b/arch/powerpc/mm/book3s64/slice.c
@@ -282,12 +282,12 @@ static unsigned long slice_find_area_bottomup(struct 
mm_struct *mm,
 {
int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
unsigned long found, next_end;
-   struct vm_unmapped_area_info info;
-
-   info.flags = 0;
-   info.length = len;
-   info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
-   info.align_offset = 0;
+   struct vm_unmapped_area_info info = {
+   .flags = 0,
+   .length = len,
+   .align_mask = PAGE_MASK & ((1ul << pshift) - 1),
+   .align_offset = 0
+   };
/*
 * Check till the allow max value for this mmap request
 */
@@ -326,13 +326,14 @@ static unsigned long slice_find_area_topdown(struct 
mm_struct *mm,
 {
int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
unsigned long found, prev;
-   struct vm_unmapped_area_info info;
+   struct vm_unmapped_area_info info = {
+   .flags = VM_UNMAPPED_AREA_TOPDOWN,
+   .length = len,
+   .align_mask = PAGE_MASK & ((1ul << pshift) - 1),
+   .align_offset = 0
+   };
unsigned long min_addr = max(PAGE_SIZE, mmap_min_addr);
 
-   info.flags = VM_UNMAPPED_AREA_TOPDOWN;
-   info.length = len;
-   info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
-   info.align_offset = 0;
/*
 * If we are trying to allocate above DEFAULT_MAP_WINDOW
 * Add the different to the mmap_base.
-- 
2.34.1

Re: [PATCH 04/14] kunit: Add documentation for warning backtrace suppression API

2024-03-12 Thread Kees Cook

On Tue, Mar 12, 2024 at 10:02:59AM -0700, Guenter Roeck wrote:
> Document API functions for suppressing warning backtraces.
> 
> Signed-off-by: Guenter Roeck 

Reviewed-by: Kees Cook 

-- 
Kees Cook

Re: [PATCH 03/14] kunit: Add test cases for backtrace warning suppression

2024-03-12 Thread Kees Cook

On Tue, Mar 12, 2024 at 10:02:58AM -0700, Guenter Roeck wrote:
> Add unit tests to verify that warning backtrace suppression works.
> 
> If backtrace suppression does _not_ work, the unit tests will likely
> trigger unsuppressed backtraces, which should actually help to get
> the affected architectures / platforms fixed.
> 
> Signed-off-by: Guenter Roeck 

Reviewed-by: Kees Cook 

-- 
Kees Cook

Re: [PATCH 02/14] kunit: bug: Count suppressed warning backtraces

2024-03-12 Thread Kees Cook

On Tue, Mar 12, 2024 at 10:02:57AM -0700, Guenter Roeck wrote:
> Count suppressed warning backtraces to enable code which suppresses
> warning backtraces to check if the expected backtrace(s) have been
> observed.
> 
> Using atomics for the backtrace count resulted in build errors on some
> architectures due to include file recursion, so use a plain integer
> for now.
> 
> Signed-off-by: Guenter Roeck 
> ---
>  include/kunit/bug.h | 7 ++-
>  lib/kunit/bug.c | 4 +++-
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/include/kunit/bug.h b/include/kunit/bug.h
> index 1e34da961599..2097a854ac8c 100644
> --- a/include/kunit/bug.h
> +++ b/include/kunit/bug.h
> @@ -20,6 +20,7 @@
>  struct __suppressed_warning {
>   struct list_head node;
>   const char *function;
> + int counter;

Thanks for adding a counter!

>  };
>  
>  void __start_suppress_warning(struct __suppressed_warning *warning);
> @@ -28,7 +29,7 @@ bool __is_suppressed_warning(const char *function);
>  
>  #define DEFINE_SUPPRESSED_WARNING(func)  \
>   struct __suppressed_warning __kunit_suppress_##func = \
> - { .function = __stringify(func) }
> + { .function = __stringify(func), .counter = 0 }
>  
>  #define START_SUPPRESSED_WARNING(func) \
>   __start_suppress_warning(&__kunit_suppress_##func)
> @@ -39,12 +40,16 @@ bool __is_suppressed_warning(const char *function);
>  #define IS_SUPPRESSED_WARNING(func) \
>   __is_suppressed_warning(func)
>  
> +#define SUPPRESSED_WARNING_COUNT(func) \
> + (__kunit_suppress_##func.counter)
> +
>  #else /* CONFIG_KUNIT */
>  
>  #define DEFINE_SUPPRESSED_WARNING(func)
>  #define START_SUPPRESSED_WARNING(func)
>  #define END_SUPPRESSED_WARNING(func)
>  #define IS_SUPPRESSED_WARNING(func) (false)
> +#define SUPPRESSED_WARNING_COUNT(func) (0)
>  
>  #endif /* CONFIG_KUNIT */
>  #endif /* __ASSEMBLY__ */
> diff --git a/lib/kunit/bug.c b/lib/kunit/bug.c
> index f93544d7a9d1..13b3d896c114 100644
> --- a/lib/kunit/bug.c
> +++ b/lib/kunit/bug.c
> @@ -32,8 +32,10 @@ bool __is_suppressed_warning(const char *function)
>   return false;
>  
>   list_for_each_entry(warning, &suppressed_warnings, node) {
> - if (!strcmp(function, warning->function))
> + if (!strcmp(function, warning->function)) {
> + warning->counter++;
>   return true;
> + }
>   }
>   return false;
>  }
> -- 
> 2.39.2
> 

Reviewed-by: Kees Cook 

-- 
Kees Cook

Re: [PATCH 01/14] bug/kunit: Core support for suppressing warning backtraces

2024-03-12 Thread Kees Cook

On Tue, Mar 12, 2024 at 10:02:56AM -0700, Guenter Roeck wrote:
> Some unit tests intentionally trigger warning backtraces by passing
> bad parameters to API functions. Such unit tests typically check the
> return value from those calls, not the existence of the warning backtrace.
> 
> Such intentionally generated warning backtraces are neither desirable
> nor useful for a number of reasons.
> - They can result in overlooked real problems.
> - A warning that suddenly starts to show up in unit tests needs to be
>   investigated and has to be marked to be ignored, for example by
>   adjusting filter scripts. Such filters are ad-hoc because there is
>   no real standard format for warnings. On top of that, such filter
>   scripts would require constant maintenance.
> 
> One option to address problem would be to add messages such as "expected
> warning backtraces start / end here" to the kernel log.  However, that
> would again require filter scripts, it might result in missing real
> problematic warning backtraces triggered while the test is running, and
> the irrelevant backtrace(s) would still clog the kernel log.
> 
> Solve the problem by providing a means to identify and suppress specific
> warning backtraces while executing test code.
> 
> Cc: Dan Carpenter 
> Cc: Daniel Diaz 
> Cc: Naresh Kamboju 
> Cc: Kees Cook 
> Signed-off-by: Guenter Roeck 

Yup, this looks fine to me.

Reviewed-by: Kees Cook 

-- 
Kees Cook

Re: [PATCH RFC 00/13] mm/treewide: Remove pXd_huge() API

2024-03-12 Thread Peter Xu

Hi, Christophe,

On Mon, Mar 11, 2024 at 09:58:47AM +, Christophe Leroy wrote:
> Hi Peter, and nice job you are doing in cleaning up things around _huge 
> stuff.

Thanks.  I appreciate your help along the way on Power.

> 
> One thing that might be worth looking at also at some point is the mess 
> around pmd_clear_huge() and pud_clear_huge().
> 
> I tried to clean things up with commit c742199a014d ("mm/pgtable: add 
> stubs for {pmd/pub}_{set/clear}_huge") but it was reverted because of 
> arm64 by commit d8a719059b9d ("Revert "mm/pgtable: add stubs for 
> {pmd/pub}_{set/clear}_huge"")
> 
> So now powerpc/8xx has to implement pmd_clear_huge() and 
> pud_clear_huge() allthough 8xx page hierarchy only has 2 levels.

Those are so far out of my radar, as my focus right now is still more on
hugetlbfs relevant side of things, while kernel mappings are not yet
directly involved in hugetlbfs, even though they're still huge mappings.

It's a pity to know that broke arm and got reverted, as that looks like a
good thing to clean it up if ever possible.  I tend to agree with you that
it seems for 3lvl we should define pgd_huge*() instead of pud_huge*(), so
that it looks like the only way to provide such a treewide clean API is to
properly define those APIs for aarch64, and define different pud helpers
for either 3/4 levels.  But I confess I don't think I fully digested all
the bits.

Thanks,

-- 
Peter Xu

Re: [RFC PATCH v2 3/3] tpm: of: If available use linux,sml-log to get the log and its size

2024-03-12 Thread Stefan Berger





On 3/12/24 11:43, Jarkko Sakkinen wrote:

On Mon Mar 11, 2024 at 10:33 PM EET, Stefan Berger wrote:



On 3/11/24 16:25, Jarkko Sakkinen wrote:

On Mon Mar 11, 2024 at 3:20 PM EET, Stefan Berger wrote:

If linux,sml-log is available use it to get the TPM log rather than the
pointer found in linux,sml-base. This resolves an issue on PowerVM and KVM
on Power where after a kexec the memory pointed to by linux,sml-base may
have become inaccessible or corrupted. Also, linux,sml-log has replaced
linux,sml-base and linux,sml-size on these two platforms.

Keep the handling of linux,sml-base/sml-size for powernv platforms that
provide the two properties via skiboot.

Fixes: c5df39262dd5 ("drivers/char/tpm: Add securityfs support for event log")
Signed-off-by: Stefan Berger 


I'm worried about not being up to date and instead using "cached" values
when verifying anything from a security chip. Does this guarantee that
TPM log is corrupted and will not get updated somehow?



What do you mean 'guarantee that TPM log is corrupted'?


I presume that this is for power architecture but I have no idea what


Yes it is for Power. From commit message above: "This resolves an issue 
on PowerVM and KVM on Power where after a kexec the memory pointed to by 
linux,sml-base may have become inaccessible or corrupted."



leads log being corrupted, and is the scope all of that that arch or
some subset of CPUs.


Every CPU will see a corrupted log.



The commit message is not very detailed on kexec scenario. It more like


I guess what is missing in the message that the buffer was not properly 
protected during the kexec and may have been overwritten for example 
since it was mistakenly assumed to be free memory?



assumes that reader knows all the detail beforehand. So probably this
will start to make sense once the backing story is improved, that's
all.


The TPM was handed over from the firmware to Linux and the firmware then
freed all memory associated with the log and will then not create a new
log or touch the TPM or do anything that would require an update to the
handed-over log. Linux also does not append to the firmware log. So
whatever we now find in linux,sml-log would be the latest firmware log
and the state of the 'firmware PCRs' computed from it should correspond
to the current state of the 'firmware PCRs'.


So on what CPU this happens and is there any bigger picture for that
design choice in the firmware?


The firmware provides a call sml-handover, which hands over the TPM log 
to the caller and at the same time frees the log. You cannot call the 
firmware a 2nd time for the log.




If it is a firmware bug, this should emit FW_BUG log message. If not,
then this commit message should provide the necessary context.


It's not a firmware bug. The issue is that the buffer holding the TPM 
log is not properly carried across a kexec soft reboot and may for 
example have been overwritten since it was assumed to be free memory.




BR, Jarkko

Re: [PATCH 1/2] powerpc/prom_init: Replace linux,sml-base/sml-size with linux,sml-log

2024-03-12 Thread Stefan Berger





On 3/12/24 12:22, Rob Herring wrote:

On Tue, Mar 12, 2024 at 09:32:50PM +1100, Michael Ellerman wrote:

Rob Herring  writes:

On Fri, Mar 08, 2024 at 07:23:35AM -0500, Stefan Berger wrote:

On 3/7/24 16:52, Rob Herring wrote:

On Thu, Mar 07, 2024 at 09:41:31PM +1100, Michael Ellerman wrote:

Stefan Berger  writes:

linux,sml-base holds the address of a buffer with the TPM log. This
buffer may become invalid after a kexec and therefore embed the whole TPM
log in linux,sml-log. This helps to protect the log since it is properly
carried across a kexec with both of the kexec syscalls.

Signed-off-by: Stefan Berger 
---
   arch/powerpc/kernel/prom_init.c | 8 ++--
   1 file changed, 2 insertions(+), 6 deletions(-)







Also adding the new linux,sml-log property should be accompanied by a
change to the device tree binding.

The syntax is not very obvious to me, but possibly something like?

diff --git a/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml 
b/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml
index 50a3fd31241c..cd75037948bc 100644
--- a/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml
+++ b/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml
@@ -74,8 +74,6 @@ required:
 - ibm,my-dma-window
 - ibm,my-drc-index
 - ibm,loc-code
-  - linux,sml-base
-  - linux,sml-size


Dropping required properties is an ABI break. If you drop them, an older
OS version won't work.


1) On PowerVM and KVM on Power these two properties were added in the Linux
code. I replaced the creation of these properties with creation of
linux,sml-log (1/2 in this series). I also replaced the handling of
these two (2/2 in this series) for these two platforms but leaving it for
powernv systems where the firmware creates these.


Okay, I guess your case is not a ABI break if the kernel is populating
it and the same kernel consumes it.

You failed to answer my question on using /reserved-memory. Again, why
can't that be used? That is the standard way we prevent chunks of memory
from being clobbered.


Yes I think that would mostly work. I don't see support for
/reserved-memory in kexec-tools, so that would need fixing I think.

My logic was that the memory is not special. It's just a buffer we
allocated during early boot to store the log. There isn't anything else
in the system that relies on that memory remaining untouched. So it
seemed cleaner to just put the log in the device tree, rather than a
pointer to it.


My issue is we already have 2 ways to describe the log to the OS. I
don't see a good reason to add a 3rd way. (Though it might actually be a
4th way, because the chosen property for the last attempt was accepted
to dtschema yet the code has been abandoned.)


I have a revert for this in my tree...



If you put the log into the DT, then the memory for the log remains
untouched too because the FDT remains untouched. For reserved-memory
regions, the OS is free to free them if it knows what the region is and
that it is no longer needed. IOW, if freeing the log memory is desired,
then the suggested approach doesn't work.


The log, once embedded in the FDT, would stay there forever. The TPM 
subsystem looks at it when initializing and will look at it again when 
initializing after a kexec soft reboot.






Having the log external to the device tree creates several problems,
like the crash kernel region colliding with it, it being clobbered by
kexec, etc.


We have multiple regions to pass/maintain thru kexec, so how does having
one less really matter?


I had tried it with the newer kexec syscall. Yes, there was a way of 
doing it but the older kexec call is still around and since that one is 
also still being used the problems with corrupted memory for example 
still persisted. So that's why we now embed it in the FDT because both 
syscalls carry that across unharmed.




Rob

Re: [PATCH 2/2] tpm: of: If available Use linux,sml-log to get the log and its size

2024-03-12 Thread Stefan Berger





On 3/12/24 11:50, Jarkko Sakkinen wrote:

On Tue Mar 12, 2024 at 12:35 PM EET, Michael Ellerman wrote:

Stefan Berger  writes:

On 3/7/24 15:00, Jarkko Sakkinen wrote:

On Thu Mar 7, 2024 at 9:57 PM EET, Jarkko Sakkinen wrote:

in short summary: s/Use/use/

On Wed Mar 6, 2024 at 5:55 PM EET, Stefan Berger wrote:

If linux,sml-log is available use it to get the TPM log rather than the
pointer found in linux,sml-base. This resolves an issue on PowerVM and KVM
on Power where after a kexec the memory pointed to by linux,sml-base may
have been corrupted. Also, linux,sml-log has replaced linux,sml-base and
linux,sml-size on these two platforms.

Signed-off-by: Stefan Berger 


So shouldn't this have a fixed tag, or not?


In English: do we want this to be backported to stable kernel releases or not?


Ideally, yes. v3 will have 3 patches and all 3 of them will have to be
backported *together* and not applied otherwise if any one of them
fails. Can this be 'guaranteed'?


You can use Depends-on:  to indicate the relationship.

cheers


Thanks, I've missed depends-on tag.

Stefan, please add also "Cc: sta...@vger.kernel.org" just to make sure
that I don't forget to add it.


Yeah, once we know whether this is the way forward or not... I posted v2 
as RFC to figure this out.


v2's 2/3 patch will only apply to 6.8. To avoid any inconsistencies 
between code and bindings we cannot even go further back with this 
series (IFF it's the way forward at all). So I am inclined to remove the 
Fixes tags. I also find little under Documentation about the Depends-on 
tag and what it's supposed to be formatted like -- a commit hash of 1/3 
appearing in 2/3 for example? The commit hash is not stable at this 
point so I couldn't created it.



Right, and since these are so small scoped commits, and bug fixes in
particular, it is also possible to do PR during the release cycle.

BR, Jarkko

Re: [PATCH v5 02/10] locking/mutex: introduce devm_mutex_init

2024-03-12 Thread Christophe Leroy



Le 12/03/2024 à 16:30, George Stark a écrit :
> [Vous ne recevez pas souvent de courriers de gnst...@salutedevices.com. 
> Découvrez pourquoi ceci est important à 
> https://aka.ms/LearnAboutSenderIdentification ]
> 
> Hello Christophe
> 
> On 3/12/24 14:51, Christophe Leroy wrote:
>>
>>
>> Le 12/03/2024 à 12:39, George Stark a écrit :
>>> [Vous ne recevez pas souvent de courriers de gnst...@salutedevices.com.
>>> Découvrez pourquoi ceci est important à
>>> https://aka.ms/LearnAboutSenderIdentification ]
> 
> ...
> 
>> You don't need that inline function, just change debug_devm_mutex_init()
>> to __devm_mutex_init().
> 
> I stuck to debug_* name because mutex-debug.c already exports a set
> of debug_ calls so...

Ah yes you are right I didn't see that. On the other hand all those 
debug_mutex_* are used by kernel/locking/mutex.c.
Here we really don't want our new function to be called by anything else 
than devm_mutex_init so by calling it __devm_mutex_init() you kind of 
tie them together.

> Well it's not essential anyway. Here's the next try:

Looks good to me.

> 
> diff --git a/include/linux/mutex.h b/include/linux/mutex.h
> index 67edc4ca2bee..537b5ea18ceb 100644
> --- a/include/linux/mutex.h
> +++ b/include/linux/mutex.h
> @@ -22,6 +22,8 @@
>   #include 
>   #include 
> 
> +struct device;
> +
>   #ifdef CONFIG_DEBUG_LOCK_ALLOC
>   # define __DEP_MAP_MUTEX_INITIALIZER(lockname)    \
>     , .dep_map = {  \
> @@ -117,6 +119,29 @@ do 
> {   \
>   } while (0)
>   #endif /* CONFIG_PREEMPT_RT */
> 
> +#ifdef CONFIG_DEBUG_MUTEXES
> +
> +int __devm_mutex_init(struct device *dev, struct mutex *lock);
> +
> +#else
> +
> +static inline int __devm_mutex_init(struct device *dev, struct mutex 
> *lock)
> +{
> +   /*
> +    * When CONFIG_DEBUG_MUTEXES is off mutex_destroy is just a nop so
> +    * no really need to register it in devm subsystem.
> +    */
> +   return 0;
> +}
> +
> +#endif
> +
> +#define devm_mutex_init(dev, mutex)    \
> +({ \
> +   mutex_init(mutex);  \
> +   __devm_mutex_init(dev, mutex);  \
> +})
> +
>   /*
>    * See kernel/locking/mutex.c for detailed documentation of these APIs.
>    * Also see Documentation/locking/mutex-design.rst.
> diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
> index bc8abb8549d2..6aa77e3dc82e 100644
> --- a/kernel/locking/mutex-debug.c
> +++ b/kernel/locking/mutex-debug.c
> @@ -19,6 +19,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
> 
>   #include "mutex.h"
> 
> @@ -89,6 +90,16 @@ void debug_mutex_init(struct mutex *lock, const char
> *name,
>     lock->magic = lock;
>   }
> 
> +static void devm_mutex_release(void *res)
> +{
> +   mutex_destroy(res);
> +}
> +
> +int __devm_mutex_init(struct device *dev, struct mutex *lock)
> +{
> +   return devm_add_action_or_reset(dev, devm_mutex_release, lock);
> +}
> +
>   /***
>    * mutex_destroy - mark a mutex unusable
>    * @lock: the mutex to be destroyed
> -- 
> 2.25.1
> 
> 
> 
>>> +
>>> +#else
>>> +
>>> +static inline int __devm_mutex_init(struct device *dev, struct mutex
>>> *lock)
>>> +{
>>> +   /*
>>> +   * When CONFIG_DEBUG_MUTEXES is off mutex_destroy is just a 
>>> nop so
>>> +   * no really need to register it in devm subsystem.
>>> +   */
>>
>> Don't know if it is because tabs are replaced by blanks in you email,
>> but the stars should be aligned
> 
> Ack
> 
> 
> -- 
> Best regards
> George

[RFC PATCH 3/3] pseries/iommu: Enable DDW for VFIO TCE create

2024-03-12 Thread Shivaprasad G Bhat

The commit 9d67c9433509 ("powerpc/iommu: Add \"borrowing\"
iommu_table_group_ops") implemented the "borrow" mechanism for
the pSeries SPAPR TCE. It did implement this support partially
that it left out creating the DDW if not present already.

The patch here attempts to fix the missing gaps.
 - Expose the DDW info to user by collecting it during probe.
 - Create the window and the iommu table if not present during
   VFIO_SPAPR_TCE_CREATE.
 - Remove and recreate the window if the pageshift and window sizes
   do not match.
 - Restore the original window in enable_ddw() if the user had
   created/modified the DDW. As there is preference for DIRECT mapping
   on the host driver side, the user created window is removed.

The changes work only for the non-SRIOV-VF scenarios for PEs having
2 DMA windows.

Signed-off-by: Shivaprasad G Bhat 
---
 arch/powerpc/include/asm/iommu.h   |3 
 arch/powerpc/kernel/iommu.c|7 -
 arch/powerpc/platforms/pseries/iommu.c |  362 +++-
 3 files changed, 360 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 744cc5fc22d3..fde174122844 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -110,6 +110,7 @@ struct iommu_table {
unsigned long  it_page_shift;/* table iommu page size */
struct list_head it_group_list;/* List of iommu_table_group_link */
__be64 *it_userspace; /* userspace view of the table */
+   bool reset_ddw;
struct iommu_table_ops *it_ops;
struct krefit_kref;
int it_nid;
@@ -169,6 +170,8 @@ struct iommu_table_group_ops {
__u32 page_shift,
__u64 window_size,
__u32 levels);
+   void (*init_group)(struct iommu_table_group *table_group,
+   struct device *dev);
long (*create_table)(struct iommu_table_group *table_group,
int num,
__u32 page_shift,
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index aa11b2acf24f..1cce2b8b8f2c 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -740,6 +740,7 @@ struct iommu_table *iommu_init_table(struct iommu_table 
*tbl, int nid,
return NULL;
}
 
+   tbl->it_nid = nid;
iommu_table_reserve_pages(tbl, res_start, res_end);
 
/* We only split the IOMMU table if we have 1GB or more of space */
@@ -1141,7 +1142,10 @@ spapr_tce_platform_iommu_attach_dev(struct iommu_domain 
*platform_domain,
 {
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
struct iommu_group *grp = iommu_group_get(dev);
-   struct iommu_table_group *table_group;
+   struct iommu_table_group *table_group = iommu_group_get_iommudata(grp);
+
+   /* This should have been in spapr_tce_iommu_probe_device() ?*/
+   table_group->ops->init_group(table_group, dev);
 
/* At first attach the ownership is already set */
if (!domain) {
@@ -1149,7 +1153,6 @@ spapr_tce_platform_iommu_attach_dev(struct iommu_domain 
*platform_domain,
return 0;
}
 
-   table_group = iommu_group_get_iommudata(grp);
/*
 * The domain being set to PLATFORM from earlier
 * BLOCKED. The table_group ownership has to be released.
diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 3d9865dadf73..7224107a0f60 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -630,6 +630,62 @@ static void iommu_table_setparms(struct pci_controller 
*phb,
phb->dma_window_base_cur += phb->dma_window_size;
 }
 
+static int iommu_table_reset(struct iommu_table *tbl, unsigned long busno,
+  unsigned long liobn, unsigned long win_addr,
+  unsigned long window_size, unsigned long 
page_shift,
+  void *base, struct iommu_table_ops 
*table_ops)
+{
+   unsigned long sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
+   unsigned int i, oldsize = tbl->it_size;
+   struct iommu_pool *p;
+
+   WARN_ON(!tbl->it_ops);
+
+   if (oldsize != (window_size >> page_shift)) {
+   vfree(tbl->it_map);
+
+   tbl->it_map = vzalloc_node(sz, tbl->it_nid);
+   if (!tbl->it_map)
+   return -ENOMEM;
+
+   tbl->it_size = window_size >> page_shift;
+   if (oldsize < (window_size >> page_shift))
+   iommu_table_clear(tbl);
+   }
+   tbl->it_busno = busno;
+   tbl->it_index = liobn;
+   tbl->it_offset = win_addr >> page_shift;
+   tbl->it_blocksize = 16;
+   tbl->it_type = TCE_PCI;
+   tbl->it_ops = table_ops;
+   tbl->it_page_shift = page_shift;
+   tbl->i

[RFC PATCH 2/3] powerpc/iommu: Move pSeries specific functions to pseries/iommu.c

2024-03-12 Thread Shivaprasad G Bhat

The PowerNV specific table_group_ops are defined in powernv/pci-ioda.c.
The pSeries specific table_group_ops are sitting in the generic powerpc
file. Move it to where it actually belong(pseries/iommu.c).

Only code movement, no functional changes intended.

Signed-off-by: Shivaprasad G Bhat 
---
 arch/powerpc/include/asm/iommu.h   |4 +
 arch/powerpc/kernel/iommu.c|  149 
 arch/powerpc/platforms/pseries/iommu.c |  145 +++
 3 files changed, 150 insertions(+), 148 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 026695943550..744cc5fc22d3 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -156,6 +156,9 @@ extern int iommu_tce_table_put(struct iommu_table *tbl);
 extern struct iommu_table *iommu_init_table(struct iommu_table *tbl,
int nid, unsigned long res_start, unsigned long res_end);
 bool iommu_table_in_use(struct iommu_table *tbl);
+extern void iommu_table_reserve_pages(struct iommu_table *tbl,
+   unsigned long res_start, unsigned long res_end);
+extern void iommu_table_clear(struct iommu_table *tbl);
 
 #define IOMMU_TABLE_GROUP_MAX_TABLES   2
 
@@ -218,7 +221,6 @@ extern long iommu_tce_xchg_no_kill(struct mm_struct *mm,
 extern void iommu_tce_kill(struct iommu_table *tbl,
unsigned long entry, unsigned long pages);
 
-extern struct iommu_table_group_ops spapr_tce_table_group_ops;
 #else
 static inline void iommu_register_group(struct iommu_table_group *table_group,
int pci_domain_number,
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 1185efebf032..aa11b2acf24f 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -642,7 +642,7 @@ void ppc_iommu_unmap_sg(struct iommu_table *tbl, struct 
scatterlist *sglist,
tbl->it_ops->flush(tbl);
 }
 
-static void iommu_table_clear(struct iommu_table *tbl)
+void iommu_table_clear(struct iommu_table *tbl)
 {
/*
 * In case of firmware assisted dump system goes through clean
@@ -683,7 +683,7 @@ static void iommu_table_clear(struct iommu_table *tbl)
 #endif
 }
 
-static void iommu_table_reserve_pages(struct iommu_table *tbl,
+void iommu_table_reserve_pages(struct iommu_table *tbl,
unsigned long res_start, unsigned long res_end)
 {
int i;
@@ -1101,59 +1101,6 @@ void iommu_tce_kill(struct iommu_table *tbl,
 }
 EXPORT_SYMBOL_GPL(iommu_tce_kill);
 
-#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
-static int iommu_take_ownership(struct iommu_table *tbl)
-{
-   unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
-   int ret = 0;
-
-   /*
-* VFIO does not control TCE entries allocation and the guest
-* can write new TCEs on top of existing ones so iommu_tce_build()
-* must be able to release old pages. This functionality
-* requires exchange() callback defined so if it is not
-* implemented, we disallow taking ownership over the table.
-*/
-   if (!tbl->it_ops->xchg_no_kill)
-   return -EINVAL;
-
-   spin_lock_irqsave(&tbl->large_pool.lock, flags);
-   for (i = 0; i < tbl->nr_pools; i++)
-   spin_lock_nest_lock(&tbl->pools[i].lock, &tbl->large_pool.lock);
-
-   if (iommu_table_in_use(tbl)) {
-   pr_err("iommu_tce: it_map is not empty");
-   ret = -EBUSY;
-   } else {
-   memset(tbl->it_map, 0xff, sz);
-   }
-
-   for (i = 0; i < tbl->nr_pools; i++)
-   spin_unlock(&tbl->pools[i].lock);
-   spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
-
-   return ret;
-}
-
-static void iommu_release_ownership(struct iommu_table *tbl)
-{
-   unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
-
-   spin_lock_irqsave(&tbl->large_pool.lock, flags);
-   for (i = 0; i < tbl->nr_pools; i++)
-   spin_lock_nest_lock(&tbl->pools[i].lock, &tbl->large_pool.lock);
-
-   memset(tbl->it_map, 0, sz);
-
-   iommu_table_reserve_pages(tbl, tbl->it_reserved_start,
-   tbl->it_reserved_end);
-
-   for (i = 0; i < tbl->nr_pools; i++)
-   spin_unlock(&tbl->pools[i].lock);
-   spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
-}
-#endif
-
 int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
 {
/*
@@ -1185,98 +1132,6 @@ int iommu_add_device(struct iommu_table_group 
*table_group, struct device *dev)
 EXPORT_SYMBOL_GPL(iommu_add_device);
 
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
-/*
- * A simple iommu_table_group_ops which only allows reusing the existing
- * iommu_table. This handles VFIO for POWER7 or the nested KVM.
- * The ops does not allow creating windows and only allows reusing the existing
- * one if it matches table_group->tce3

[RFC PATCH 1/3] powerpc/pseries/iommu: Bring back userspace view for single level TCE tables

2024-03-12 Thread Shivaprasad G Bhat

The commit 090bad39b237a ("powerpc/powernv: Add indirect levels to
it_userspace") which implemented the tce indirect levels
support for PowerNV ended up removing the single level support
which existed by default(generic tce_iommu_userspace_view_alloc/free()
calls). On pSeries the TCEs are single level, and the allocation
of userspace view is lost with the removal of generic code.

The patch attempts to bring it back for pseries on the refactored
code base.

On pSeries, the windows/tables are "borrowed", so the it_ops->free()
is not called during the container detach or the tce release call paths
as the table is not really freed. So, decoupling the userspace view
array free and alloc from table's it_ops just the way it was before.

Signed-off-by: Shivaprasad G Bhat 
---
 arch/powerpc/platforms/pseries/iommu.c |   19 ++--
 drivers/vfio/vfio_iommu_spapr_tce.c|   51 
 2 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index e8c4129697b1..40de8d55faef 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -143,7 +143,7 @@ static int tce_build_pSeries(struct iommu_table *tbl, long 
index,
 }
 
 
-static void tce_free_pSeries(struct iommu_table *tbl, long index, long npages)
+static void tce_clear_pSeries(struct iommu_table *tbl, long index, long npages)
 {
__be64 *tcep;
 
@@ -162,6 +162,11 @@ static unsigned long tce_get_pseries(struct iommu_table 
*tbl, long index)
return be64_to_cpu(*tcep);
 }
 
+static void tce_free_pSeries(struct iommu_table *tbl)
+{
+   /* Do nothing. */
+}
+
 static void tce_free_pSeriesLP(unsigned long liobn, long, long, long);
 static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long);
 
@@ -576,7 +581,7 @@ struct iommu_table_ops iommu_table_lpar_multi_ops;
 
 struct iommu_table_ops iommu_table_pseries_ops = {
.set = tce_build_pSeries,
-   .clear = tce_free_pSeries,
+   .clear = tce_clear_pSeries,
.get = tce_get_pseries
 };
 
@@ -685,15 +690,23 @@ static int tce_exchange_pseries(struct iommu_table *tbl, 
long index, unsigned
 
return rc;
 }
+
+static __be64 *tce_useraddr_pSeriesLP(struct iommu_table *tbl, long index,
+ bool __always_unused alloc)
+{
+   return tbl->it_userspace ? &tbl->it_userspace[index - tbl->it_offset] : 
NULL;
+}
 #endif
 
 struct iommu_table_ops iommu_table_lpar_multi_ops = {
.set = tce_buildmulti_pSeriesLP,
 #ifdef CONFIG_IOMMU_API
.xchg_no_kill = tce_exchange_pseries,
+   .useraddrptr = tce_useraddr_pSeriesLP,
 #endif
.clear = tce_freemulti_pSeriesLP,
-   .get = tce_get_pSeriesLP
+   .get = tce_get_pSeriesLP,
+   .free = tce_free_pSeries
 };
 
 /*
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index a94ec6225d31..1cf36d687559 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -177,6 +177,50 @@ static long tce_iommu_register_pages(struct tce_container 
*container,
return ret;
 }
 
+static long tce_iommu_userspace_view_alloc(struct iommu_table *tbl,
+   struct mm_struct *mm)
+{
+   unsigned long cb = ALIGN(sizeof(tbl->it_userspace[0]) *
+   tbl->it_size, PAGE_SIZE);
+   unsigned long *uas;
+   long ret;
+
+   if (tbl->it_indirect_levels)
+   return 0;
+
+   WARN_ON(tbl->it_userspace);
+
+   ret = account_locked_vm(mm, cb >> PAGE_SHIFT, true);
+   if (ret)
+   return ret;
+
+   uas = vzalloc(cb);
+   if (!uas) {
+   account_locked_vm(mm, cb >> PAGE_SHIFT, false);
+   return -ENOMEM;
+   }
+   tbl->it_userspace = (__be64 *) uas;
+
+   return 0;
+}
+
+static void tce_iommu_userspace_view_free(struct iommu_table *tbl,
+   struct mm_struct *mm)
+{
+   unsigned long cb = ALIGN(sizeof(tbl->it_userspace[0]) *
+   tbl->it_size, PAGE_SIZE);
+
+   if (!tbl->it_userspace)
+   return;
+
+   if (tbl->it_indirect_levels)
+   return;
+
+   vfree(tbl->it_userspace);
+   tbl->it_userspace = NULL;
+   account_locked_vm(mm, cb >> PAGE_SHIFT, false);
+}
+
 static bool tce_page_is_contained(struct mm_struct *mm, unsigned long hpa,
unsigned int it_page_shift)
 {
@@ -554,6 +598,12 @@ static long tce_iommu_build_v2(struct tce_container 
*container,
unsigned long hpa;
enum dma_data_direction dirtmp;
 
+   if (!tbl->it_userspace) {
+   ret = tce_iommu_userspace_view_alloc(tbl, container->mm);
+   if (ret)
+   return ret;
+   }
+
for (i = 0; i < pages; ++i) {
struct mm_iommu_table_group_mem_t *mem = NULL;
__be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry

[RFC PATCH 0/3] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO

2024-03-12 Thread Shivaprasad G Bhat

This patch-set fills the missing gaps on pSeries for VFIO SPAPR TCE
sub-driver thereby re-enabling the VFIO support on POWER pSeries
machines.

Structure of the patchset
=

* Currently, due to [1] VFIO_IOMMU_UNMAP_{DMA,UNMAP} ioctls are broken
on pSeries as it only supports single level TCEs. This is addressed in
the first patch as the lost functionality is restored.

On pSeries, the DMA windows (default 32-bit and the 64-bit DDW) are
"borrowed" between the host and the vfio driver. The necessary
mechanism for this is already been in place since [2]. However,
the VFIO SPAPR-TCE sub-driver doesn't open the 64-bit window if
it wasn't already done by a host driver like NVME. So the user-space
only gets access to the default 32-bit DMA window alone. This poses a
challenge for devices having no host kernel drivers and completely
depend on VFIO user-space interface to request DMA. The
VFIO_SPAPR_TCE_CREATE ioctl currently just returns EPERM without
attempting to open the second window for such devices.

* The second patch is just code movement for pSeries specific
functions from arch iommu to the pSeries platform iommu file.
This is needed as the DMA window manipulation operations
introduced in the third patch depend on these functions and are
entirely pSeries specific.

* The third patch adds necessary support to open up the 64-bit DMA
window on VFIO_SPAPR_TCE_CREATE ioctl from the user-space. It also
collects the DDW information from the platform for exposing it to
user through 'struct vfio_iommu_spapr_tce_ddw_info'.

Testing

These patches are tested with by attaching a nvme disk to a nested
kvm guest running a pSeries lpar. Also vfio-test [3] by Alex Willamson,
was forked and updated to add support for pSeries guest and used to
test these patches[4].

Limitations/Known Issues

* Does not work for SRIOV VFs, as they have only one DMA window.
* Does not work for multi-function cards.
* Bugs
  - mmdrop() in tce_iommu_release() when container detached
with pending unmaps.

[1] Commit: 090bad39b237 ("powerpc/powernv: Add indirect levels to 
it_userspace")
[2] Commit: 9d67c9433509 ("powerpc/iommu: Add \"borrowing\" 
iommu_table_group_ops")
[3] https://github.com/awilliam/tests
[4] https://github.com/nnmwebmin/vfio-ppc-tests/tree/vfio-ppc-ex

---

Shivaprasad G Bhat (3):
  powerpc/pseries/iommu: Bring back userspace view for single level TCE 
tables
  powerpc/iommu: Move pSeries specific functions to pseries/iommu.c
  pseries/iommu: Enable DDW for VFIO TCE create


 arch/powerpc/include/asm/iommu.h   |   7 +-
 arch/powerpc/kernel/iommu.c| 156 +---
 arch/powerpc/platforms/pseries/iommu.c | 514 -
 drivers/vfio/vfio_iommu_spapr_tce.c|  51 +++
 4 files changed, 571 insertions(+), 157 deletions(-)

[PATCH 14/14] powerpc: Add support for suppressing warning backtraces

2024-03-12 Thread Guenter Roeck

Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.

To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE
are enabled. Otherwise, the __func__ assembly parameter is replaced with a
(dummy) NULL parameter to avoid an image size increase due to unused
__func__ entries (this is necessary because __func__ is not a define but a
virtual variable).

Signed-off-by: Guenter Roeck 
---
 arch/powerpc/include/asm/bug.h | 37 +-
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 1db485aacbd9..330d4983f90e 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -14,6 +14,9 @@
 .section __bug_table,"aw"
 5001:   .4byte \addr - .
 .4byte 5002f - .
+#if IS_ENABLED(CONFIG_KUNIT)
+.4byte 0
+#endif
 .short \line, \flags
 .org 5001b+BUG_ENTRY_SIZE
 .previous
@@ -32,30 +35,46 @@
 #endif /* verbose */
 
 #else /* !__ASSEMBLY__ */
-/* _EMIT_BUG_ENTRY expects args %0,%1,%2,%3 to be FILE, LINE, flags and
-   sizeof(struct bug_entry), respectively */
+/* _EMIT_BUG_ENTRY expects args %0,%1,%2,%3,%4 to be FILE, __func__, LINE, 
flags
+   and sizeof(struct bug_entry), respectively */
 #ifdef CONFIG_DEBUG_BUGVERBOSE
+
+#if IS_ENABLED(CONFIG_KUNIT)
+# define HAVE_BUG_FUNCTION
+# define __BUG_FUNC_PTR"   .4byte %1 - .\n"
+#else
+# define __BUG_FUNC_PTR
+#endif /* IS_ENABLED(CONFIG_KUNIT) */
+
 #define _EMIT_BUG_ENTRY\
".section __bug_table,\"aw\"\n" \
"2: .4byte 1b - .\n"\
"   .4byte %0 - .\n"\
-   "   .short %1, %2\n"\
-   ".org 2b+%3\n"  \
+   __BUG_FUNC_PTR  \
+   "   .short %2, %3\n"\
+   ".org 2b+%4\n"  \
".previous\n"
 #else
 #define _EMIT_BUG_ENTRY\
".section __bug_table,\"aw\"\n" \
"2: .4byte 1b - .\n"\
-   "   .short %2\n"\
-   ".org 2b+%3\n"  \
+   "   .short %3\n"\
+   ".org 2b+%4\n"  \
".previous\n"
 #endif
 
+#ifdef HAVE_BUG_FUNCTION
+# define __BUG_FUNC__func__
+#else
+# define __BUG_FUNCNULL
+#endif
+
 #define BUG_ENTRY(insn, flags, ...)\
__asm__ __volatile__(   \
"1: " insn "\n" \
_EMIT_BUG_ENTRY \
-   : : "i" (__FILE__), "i" (__LINE__), \
+   : : "i" (__FILE__), "i" (__BUG_FUNC),   \
+ "i" (__LINE__),   \
  "i" (flags),  \
  "i" (sizeof(struct bug_entry)),   \
  ##__VA_ARGS__)
@@ -80,7 +99,7 @@
if (x)  \
BUG();  \
} else {\
-   BUG_ENTRY(PPC_TLNEI " %4, 0", 0, "r" ((__force long)(x)));  
\
+   BUG_ENTRY(PPC_TLNEI " %5, 0", 0, "r" ((__force long)(x)));  
\
}   \
 } while (0)
 
@@ -90,7 +109,7 @@
if (__ret_warn_on)  \
__WARN();   \
} else {\
-   BUG_ENTRY(PPC_TLNEI " %4, 0",   \
+   BUG_ENTRY(PPC_TLNEI " %5, 0",   \
  BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),  \
  "r" (__ret_warn_on)); \
}   \
-- 
2.39.2

[PATCH 13/14] riscv: Add support for suppressing warning backtraces

2024-03-12 Thread Guenter Roeck

Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.

To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE
are enabled. Otherwise, the __func__ assembly parameter is replaced with a
(dummy) NULL parameter to avoid an image size increase due to unused
__func__ entries (this is necessary because __func__ is not a define but a
virtual variable).

To simplify the implementation, unify the __BUG_ENTRY_ADDR and
__BUG_ENTRY_FILE macros into a single macro named __BUG_REL() which takes
the address, file, or function reference as parameter.

Signed-off-by: Guenter Roeck 
---
 arch/riscv/include/asm/bug.h | 38 
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/arch/riscv/include/asm/bug.h b/arch/riscv/include/asm/bug.h
index 1aaea81fb141..8955ee5c1c27 100644
--- a/arch/riscv/include/asm/bug.h
+++ b/arch/riscv/include/asm/bug.h
@@ -30,26 +30,39 @@
 typedef u32 bug_insn_t;
 
 #ifdef CONFIG_GENERIC_BUG_RELATIVE_POINTERS
-#define __BUG_ENTRY_ADDR   RISCV_INT " 1b - ."
-#define __BUG_ENTRY_FILE   RISCV_INT " %0 - ."
+#define __BUG_REL(val) RISCV_INT " " __stringify(val) " - ."
 #else
-#define __BUG_ENTRY_ADDR   RISCV_PTR " 1b"
-#define __BUG_ENTRY_FILE   RISCV_PTR " %0"
+#define __BUG_REL(val) RISCV_PTR " " __stringify(val)
 #endif
 
 #ifdef CONFIG_DEBUG_BUGVERBOSE
+
+#if IS_ENABLED(CONFIG_KUNIT)
+# define HAVE_BUG_FUNCTION
+# define __BUG_FUNC_PTR__BUG_REL(%1)
+#else
+# define __BUG_FUNC_PTR
+#endif /* IS_ENABLED(CONFIG_KUNIT) */
+
 #define __BUG_ENTRY\
-   __BUG_ENTRY_ADDR "\n\t" \
-   __BUG_ENTRY_FILE "\n\t" \
-   RISCV_SHORT " %1\n\t"   \
-   RISCV_SHORT " %2"
+   __BUG_REL(1b) "\n\t"\
+   __BUG_REL(%0) "\n\t"\
+   __BUG_FUNC_PTR "\n\t"   \
+   RISCV_SHORT " %2\n\t"   \
+   RISCV_SHORT " %3"
 #else
 #define __BUG_ENTRY\
-   __BUG_ENTRY_ADDR "\n\t" \
-   RISCV_SHORT " %2"
+   __BUG_REL(1b) "\n\t"\
+   RISCV_SHORT " %3"
 #endif
 
 #ifdef CONFIG_GENERIC_BUG
+#ifdef HAVE_BUG_FUNCTION
+# define __BUG_FUNC__func__
+#else
+# define __BUG_FUNCNULL
+#endif
+
 #define __BUG_FLAGS(flags) \
 do {   \
__asm__ __volatile__ (  \
@@ -58,10 +71,11 @@ do {
\
".pushsection __bug_table,\"aw\"\n\t"   \
"2:\n\t"\
__BUG_ENTRY "\n\t"  \
-   ".org 2b + %3\n\t"  \
+   ".org 2b + %4\n\t"  \
".popsection"   \
:   \
-   : "i" (__FILE__), "i" (__LINE__),   \
+   : "i" (__FILE__), "i" (__BUG_FUNC), \
+ "i" (__LINE__),   \
  "i" (flags),  \
  "i" (sizeof(struct bug_entry)));  \
 } while (0)
-- 
2.39.2

[PATCH 12/14] sh: Add support for suppressing warning backtraces

2024-03-12 Thread Guenter Roeck

Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.

To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE
are enabled. Otherwise, the __func__ assembly parameter is replaced with a
(dummy) NULL parameter to avoid an image size increase due to unused
__func__ entries (this is necessary because __func__ is not a define but a
virtual variable).

Signed-off-by: Guenter Roeck 
---
 arch/sh/include/asm/bug.h | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/sh/include/asm/bug.h b/arch/sh/include/asm/bug.h
index 05a485c4fabc..cadc335eb759 100644
--- a/arch/sh/include/asm/bug.h
+++ b/arch/sh/include/asm/bug.h
@@ -24,21 +24,36 @@
  * The offending file and line are encoded in the __bug_table section.
  */
 #ifdef CONFIG_DEBUG_BUGVERBOSE
+
+#if IS_ENABLED(CONFIG_KUNIT)
+# define HAVE_BUG_FUNCTION
+# define __BUG_FUNC_PTR"\t.long %O2\n"
+#else
+# define __BUG_FUNC_PTR
+#endif /* IS_ENABLED(CONFIG_KUNIT) */
+
 #define _EMIT_BUG_ENTRY\
"\t.pushsection __bug_table,\"aw\"\n"   \
"2:\t.long 1b, %O1\n"   \
-   "\t.short %O2, %O3\n"   \
-   "\t.org 2b+%O4\n"   \
+   __BUG_FUNC_PTR  \
+   "\t.short %O3, %O4\n"   \
+   "\t.org 2b+%O5\n"   \
"\t.popsection\n"
 #else
 #define _EMIT_BUG_ENTRY\
"\t.pushsection __bug_table,\"aw\"\n"   \
"2:\t.long 1b\n"\
-   "\t.short %O3\n"\
-   "\t.org 2b+%O4\n"   \
+   "\t.short %O4\n"\
+   "\t.org 2b+%O5\n"   \
"\t.popsection\n"
 #endif
 
+#ifdef HAVE_BUG_FUNCTION
+# define __BUG_FUNC__func__
+#else
+# define __BUG_FUNCNULL
+#endif
+
 #define BUG()  \
 do {   \
__asm__ __volatile__ (  \
@@ -47,6 +62,7 @@ do {  \
 :  \
 : "n" (TRAPA_BUG_OPCODE),  \
   "i" (__FILE__),  \
+  "i" (__BUG_FUNC),\
   "i" (__LINE__), "i" (0), \
   "i" (sizeof(struct bug_entry))); \
unreachable();  \
@@ -60,6 +76,7 @@ do {  \
 :  \
 : "n" (TRAPA_BUG_OPCODE),  \
   "i" (__FILE__),  \
+  "i" (__BUG_FUNC),\
   "i" (__LINE__),  \
   "i" (BUGFLAG_WARNING|(flags)),   \
   "i" (sizeof(struct bug_entry))); \
@@ -85,6 +102,7 @@ do { \
 :  \
 : "n" (TRAPA_BUG_OPCODE),  \
   "i" (__FILE__),  \
+  "i" (__BUG_FUNC),\
   "i" (__LINE__),  \
   "i" (BUGFLAG_UNWINDER),  \
   "i" (sizeof(struct bug_entry))); \
-- 
2.39.2

[PATCH 11/14] s390: Add support for suppressing warning backtraces

2024-03-12 Thread Guenter Roeck

Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.

To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE
are enabled. Otherwise, the __func__ assembly parameter is replaced with a
(dummy) NULL parameter to avoid an image size increase due to unused
__func__ entries (this is necessary because __func__ is not a define but a
virtual variable).

Signed-off-by: Guenter Roeck 
---
 arch/s390/include/asm/bug.h | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/bug.h b/arch/s390/include/asm/bug.h
index aebe1e22c7be..01e2aa4069d7 100644
--- a/arch/s390/include/asm/bug.h
+++ b/arch/s390/include/asm/bug.h
@@ -8,19 +8,30 @@
 
 #ifdef CONFIG_DEBUG_BUGVERBOSE
 
+#if IS_ENABLED(CONFIG_KUNIT)
+# define HAVE_BUG_FUNCTION
+# define __BUG_FUNC_PTR"   .long   %0-.\n"
+# define __BUG_FUNC__func__
+#else
+# define __BUG_FUNC_PTR
+# define __BUG_FUNCNULL
+#endif /* IS_ENABLED(CONFIG_KUNIT) */
+
 #define __EMIT_BUG(x) do { \
asm_inline volatile(\
"0: mc  0,0\n"  \
".section .rodata.str,\"aMS\",@progbits,1\n"\
"1: .asciz  \""__FILE__"\"\n"   \
".previous\n"   \
-   ".section __bug_table,\"awM\",@progbits,%2\n"   \
+   ".section __bug_table,\"awM\",@progbits,%3\n"   \
"2: .long   0b-.\n" \
"   .long   1b-.\n" \
-   "   .short  %0,%1\n"\
-   "   .org2b+%2\n"\
+   __BUG_FUNC_PTR  \
+   "   .short  %1,%2\n"\
+   "   .org2b+%3\n"\
".previous\n"   \
-   : : "i" (__LINE__), \
+   : : "i" (__BUG_FUNC),   \
+   "i" (__LINE__), \
"i" (x),\
"i" (sizeof(struct bug_entry)));\
 } while (0)
-- 
2.39.2

[PATCH 10/14] parisc: Add support for suppressing warning backtraces

2024-03-12 Thread Guenter Roeck

Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.

To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE
are enabled. Otherwise, the __func__ assembly parameter is replaced with a
(dummy) NULL parameter to avoid an image size increase due to unused
__func__ entries (this is necessary because __func__ is not a define but a
virtual variable).

While at it, declare assembler parameters as constants where possible.
Refine .blockz instructions to calculate the necessary padding instead
of using fixed values.

Signed-off-by: Guenter Roeck 
---
 arch/parisc/include/asm/bug.h | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/arch/parisc/include/asm/bug.h b/arch/parisc/include/asm/bug.h
index 833555f74ffa..792dacc2a653 100644
--- a/arch/parisc/include/asm/bug.h
+++ b/arch/parisc/include/asm/bug.h
@@ -23,8 +23,17 @@
 # define __BUG_REL(val) ".word " __stringify(val)
 #endif
 
-
 #ifdef CONFIG_DEBUG_BUGVERBOSE
+
+#if IS_ENABLED(CONFIG_KUNIT)
+# define HAVE_BUG_FUNCTION
+# define __BUG_FUNC_PTR__BUG_REL(%c1)
+# define __BUG_FUNC__func__
+#else
+# define __BUG_FUNC_PTR
+# define __BUG_FUNCNULL
+#endif /* IS_ENABLED(CONFIG_KUNIT) */
+
 #define BUG()  \
do {\
asm volatile("\n"   \
@@ -33,10 +42,12 @@
 "\t.align 4\n" \
 "2:\t" __BUG_REL(1b) "\n"  \
 "\t" __BUG_REL(%c0)  "\n"  \
-"\t.short %1, %2\n"\
-"\t.blockz %3-2*4-2*2\n"   \
+"\t" __BUG_FUNC_PTR  "\n"  \
+"\t.short %c2, %c3\n"  \
+"\t.blockz %c4-(.-2b)\n"   \
 "\t.popsection"\
-: : "i" (__FILE__), "i" (__LINE__),\
+: : "i" (__FILE__), "i" (__BUG_FUNC),  \
+"i" (__LINE__),\
 "i" (0), "i" (sizeof(struct bug_entry)) ); \
unreachable();  \
} while(0)
@@ -58,10 +69,12 @@
 "\t.align 4\n" \
 "2:\t" __BUG_REL(1b) "\n"  \
 "\t" __BUG_REL(%c0)  "\n"  \
-"\t.short %1, %2\n"\
-"\t.blockz %3-2*4-2*2\n"   \
+"\t" __BUG_FUNC_PTR  "\n"  \
+"\t.short %c2, %3\n"   \
+"\t.blockz %c4-(.-2b)\n"   \
 "\t.popsection"\
-: : "i" (__FILE__), "i" (__LINE__),\
+: : "i" (__FILE__), "i" (__BUG_FUNC),  \
+"i" (__LINE__),\
 "i" (BUGFLAG_WARNING|(flags)), \
 "i" (sizeof(struct bug_entry)) );  \
} while(0)
@@ -74,7 +87,7 @@
 "\t.align 4\n" \
 "2:\t" __BUG_REL(1b) "\n"  \
 "\t.short %0\n"\
-"\t.blockz %1-4-2\n"   \
+"\t.blockz %c1-(.-2b)\n"   \
 "\t.popsection"\
 : : "i" (BUGFLAG_WARNING|(flags)), \
 "i" (sizeof(struct bug_entry)) );  \
-- 
2.39.2

[PATCH 09/14] loongarch: Add support for suppressing warning backtraces

2024-03-12 Thread Guenter Roeck

Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.

To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE
are enabled. Otherwise, the __func__ assembly parameter is replaced with a
(dummy) NULL parameter to avoid an image size increase due to unused
__func__ entries (this is necessary because __func__ is not a define but a
virtual variable).

Signed-off-by: Guenter Roeck 
---
 arch/loongarch/include/asm/bug.h | 38 +++-
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/arch/loongarch/include/asm/bug.h b/arch/loongarch/include/asm/bug.h
index d4ca3ba25418..25f2b5ae7702 100644
--- a/arch/loongarch/include/asm/bug.h
+++ b/arch/loongarch/include/asm/bug.h
@@ -3,47 +3,63 @@
 #define __ASM_BUG_H
 
 #include 
+#include 
 #include 
 
 #ifndef CONFIG_DEBUG_BUGVERBOSE
-#define _BUGVERBOSE_LOCATION(file, line)
+#define _BUGVERBOSE_LOCATION(file, func, line)
 #else
-#define __BUGVERBOSE_LOCATION(file, line)  \
+#if IS_ENABLED(CONFIG_KUNIT)
+# define HAVE_BUG_FUNCTION
+# define __BUG_FUNC_PTR(func)  .long func - .;
+#else
+# define __BUG_FUNC_PTR(func)
+#endif
+
+#define __BUGVERBOSE_LOCATION(file, func, line)\
.pushsection .rodata.str, "aMS", @progbits, 1;  \
10002:  .string file;   \
.popsection;\
\
.long 10002b - .;   \
+   __BUG_FUNC_PTR(func)\
.short line;
-#define _BUGVERBOSE_LOCATION(file, line) __BUGVERBOSE_LOCATION(file, line)
+#define _BUGVERBOSE_LOCATION(file, func, line) __BUGVERBOSE_LOCATION(file, 
func, line)
 #endif
 
 #ifndef CONFIG_GENERIC_BUG
-#define __BUG_ENTRY(flags)
+#define __BUG_ENTRY(flags, func)
 #else
-#define __BUG_ENTRY(flags) \
+#define __BUG_ENTRY(flags, func)   \
.pushsection __bug_table, "aw"; \
.align 2;   \
1:  .long 10001f - .;   \
-   _BUGVERBOSE_LOCATION(__FILE__, __LINE__)\
+   _BUGVERBOSE_LOCATION(__FILE__, func, __LINE__)  \
.short flags;   \
.popsection;\
10001:
 #endif
 
-#define ASM_BUG_FLAGS(flags)   \
-   __BUG_ENTRY(flags)  \
+#define ASM_BUG_FLAGS(flags, func) \
+   __BUG_ENTRY(flags, func)\
break   BRK_BUG
 
-#define ASM_BUG()  ASM_BUG_FLAGS(0)
+#define ASM_BUG()  ASM_BUG_FLAGS(0, .)
+
+#ifdef HAVE_BUG_FUNCTION
+# define __BUG_FUNC__func__
+#else
+# define __BUG_FUNCNULL
+#endif
 
 #define __BUG_FLAGS(flags) \
-   asm_inline volatile (__stringify(ASM_BUG_FLAGS(flags)));
+   asm_inline volatile (__stringify(ASM_BUG_FLAGS(flags, %0)) : : "i" 
(__BUG_FUNC));
 
 #define __WARN_FLAGS(flags)\
 do {   \
instrumentation_begin();\
-   __BUG_FLAGS(BUGFLAG_WARNING|(flags));   \
+   if (!IS_SUPPRESSED_WARNING(__func__))   \
+   __BUG_FLAGS(BUGFLAG_WARNING|(flags));   \
instrumentation_end();  \
 } while (0)
 
-- 
2.39.2

[PATCH 08/14] arm64: Add support for suppressing warning backtraces

2024-03-12 Thread Guenter Roeck

Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.

To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE
are enabled. Otherwise, the __func__ assembly parameter is replaced with a
(dummy) NULL parameter to avoid an image size increase due to unused
__func__ entries (this is necessary because __func__ is not a define but a
virtual variable).

Signed-off-by: Guenter Roeck 
---
 arch/arm64/include/asm/asm-bug.h | 29 +++--
 arch/arm64/include/asm/bug.h |  8 +++-
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/asm-bug.h b/arch/arm64/include/asm/asm-bug.h
index c762038ba400..6884089a7191 100644
--- a/arch/arm64/include/asm/asm-bug.h
+++ b/arch/arm64/include/asm/asm-bug.h
@@ -8,36 +8,45 @@
 #include 
 
 #ifdef CONFIG_DEBUG_BUGVERBOSE
-#define _BUGVERBOSE_LOCATION(file, line) __BUGVERBOSE_LOCATION(file, line)
-#define __BUGVERBOSE_LOCATION(file, line)  \
+
+#if IS_ENABLED(CONFIG_KUNIT)
+# define HAVE_BUG_FUNCTION
+# define __BUG_FUNC_PTR(func)  .long func - .;
+#else
+# define __BUG_FUNC_PTR(func)
+#endif
+
+#define _BUGVERBOSE_LOCATION(file, func, line) __BUGVERBOSE_LOCATION(file, 
func, line)
+#define __BUGVERBOSE_LOCATION(file, func, line)\
.pushsection .rodata.str,"aMS",@progbits,1; \
14472:  .string file;   \
.popsection;\
\
.long 14472b - .;   \
+   __BUG_FUNC_PTR(func)\
.short line;
 #else
-#define _BUGVERBOSE_LOCATION(file, line)
+#define _BUGVERBOSE_LOCATION(file, func, line)
 #endif
 
 #ifdef CONFIG_GENERIC_BUG
 
-#define __BUG_ENTRY(flags) \
+#define __BUG_ENTRY(flags, func)   \
.pushsection __bug_table,"aw";  \
.align 2;   \
14470:  .long 14471f - .;   \
-_BUGVERBOSE_LOCATION(__FILE__, __LINE__)   \
-   .short flags;   \
+_BUGVERBOSE_LOCATION(__FILE__, func, __LINE__) \
+   .short flags;   \
.popsection;\
14471:
 #else
-#define __BUG_ENTRY(flags)
+#define __BUG_ENTRY(flags, func)
 #endif
 
-#define ASM_BUG_FLAGS(flags)   \
-   __BUG_ENTRY(flags)  \
+#define ASM_BUG_FLAGS(flags, func) \
+   __BUG_ENTRY(flags, func)\
brk BUG_BRK_IMM
 
-#define ASM_BUG()  ASM_BUG_FLAGS(0)
+#define ASM_BUG()  ASM_BUG_FLAGS(0, .)
 
 #endif /* __ASM_ASM_BUG_H */
diff --git a/arch/arm64/include/asm/bug.h b/arch/arm64/include/asm/bug.h
index 28be048db3f6..044c5e24a17d 100644
--- a/arch/arm64/include/asm/bug.h
+++ b/arch/arm64/include/asm/bug.h
@@ -11,8 +11,14 @@
 
 #include 
 
+#ifdef HAVE_BUG_FUNCTION
+# define __BUG_FUNC__func__
+#else
+# define __BUG_FUNCNULL
+#endif
+
 #define __BUG_FLAGS(flags) \
-   asm volatile (__stringify(ASM_BUG_FLAGS(flags)));
+   asm volatile (__stringify(ASM_BUG_FLAGS(flags, %c0)) : : "i" 
(__BUG_FUNC));
 
 #define BUG() do { \
__BUG_FLAGS(0); \
-- 
2.39.2

[PATCH 07/14] x86: Add support for suppressing warning backtraces

2024-03-12 Thread Guenter Roeck

Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.

To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE
are enabled. Otherwise, the __func__ assembly parameter is replaced with a
(dummy) NULL parameter to avoid an image size increase due to unused
__func__ entries (this is necessary because __func__ is not a define but a
virtual variable).

Signed-off-by: Guenter Roeck 
---
 arch/x86/include/asm/bug.h | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index a3ec87d198ac..d87bf8bab238 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -23,18 +23,28 @@
 
 #ifdef CONFIG_DEBUG_BUGVERBOSE
 
+#if IS_ENABLED(CONFIG_KUNIT)
+# define HAVE_BUG_FUNCTION
+# define __BUG_FUNC_PTR__BUG_REL(%c1)
+# define __BUG_FUNC__func__
+#else
+# define __BUG_FUNC_PTR
+# define __BUG_FUNCNULL
+#endif /* IS_ENABLED(CONFIG_KUNIT) */
+
 #define _BUG_FLAGS(ins, flags, extra)  \
 do {   \
asm_inline volatile("1:\t" ins "\n" \
 ".pushsection __bug_table,\"aw\"\n"\
 "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n"   \
 "\t"  __BUG_REL(%c0) "\t# bug_entry::file\n"   \
-"\t.word %c1""\t# bug_entry::line\n"   \
-"\t.word %c2""\t# bug_entry::flags\n"  \
-"\t.org 2b+%c3\n"  \
+"\t"  __BUG_FUNC_PTR "\t# bug_entry::function\n"   \
+"\t.word %c2""\t# bug_entry::line\n"   \
+"\t.word %c3""\t# bug_entry::flags\n"  \
+"\t.org 2b+%c4\n"  \
 ".popsection\n"\
 extra  \
-: : "i" (__FILE__), "i" (__LINE__),\
+: : "i" (__FILE__), "i" (__BUG_FUNC), "i" (__LINE__),\
 "i" (flags),   \
 "i" (sizeof(struct bug_entry)));   \
 } while (0)
@@ -80,7 +90,8 @@ do {  
\
 do {   \
__auto_type __flags = BUGFLAG_WARNING|(flags);  \
instrumentation_begin();\
-   _BUG_FLAGS(ASM_UD2, __flags, ASM_REACHABLE);\
+   if (!IS_SUPPRESSED_WARNING(__func__))   \
+   _BUG_FLAGS(ASM_UD2, __flags, ASM_REACHABLE);\
instrumentation_end();  \
 } while (0)
 
-- 
2.39.2

[PATCH 06/14] net: kunit: Suppress lock warning noise at end of dev_addr_lists tests

2024-03-12 Thread Guenter Roeck

dev_addr_lists_test generates lock warning noise at the end of tests
if lock debugging is enabled. There are two sets of warnings.

WARNING: CPU: 0 PID: 689 at kernel/locking/mutex.c:923 
__mutex_unlock_slowpath.constprop.0+0x13c/0x368
DEBUG_LOCKS_WARN_ON(__owner_task(owner) != __get_current())

WARNING: kunit_try_catch/1336 still has locks held!

KUnit test cleanup is not guaranteed to run in the same thread as the test
itself. For this test, this means that rtnl_lock() and rtnl_unlock() may
be called from different threads. This triggers the warnings.
Suppress the warnings because they are irrelevant for the test and just
confusing.

The first warning can be suppressed by using START_SUPPRESSED_WARNING()
and END_SUPPRESSED_WARNING() around the call to rtnl_unlock(). To suppress
the second warning, it is necessary to set debug_locks_silent while the
rtnl lock is held.

Cc: David Gow 
Cc: Jakub Kicinski 
Signed-off-by: Guenter Roeck 
---
 net/core/dev_addr_lists_test.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/core/dev_addr_lists_test.c b/net/core/dev_addr_lists_test.c
index 4dbd0dc6aea2..b427dd1a3c93 100644
--- a/net/core/dev_addr_lists_test.c
+++ b/net/core/dev_addr_lists_test.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -49,6 +50,7 @@ static int dev_addr_test_init(struct kunit *test)
KUNIT_FAIL(test, "Can't register netdev %d", err);
}
 
+   debug_locks_silent = 1;
rtnl_lock();
return 0;
 }
@@ -56,8 +58,12 @@ static int dev_addr_test_init(struct kunit *test)
 static void dev_addr_test_exit(struct kunit *test)
 {
struct net_device *netdev = test->priv;
+   DEFINE_SUPPRESSED_WARNING(__mutex_unlock_slowpath);
 
+   START_SUPPRESSED_WARNING(__mutex_unlock_slowpath);
rtnl_unlock();
+   END_SUPPRESSED_WARNING(__mutex_unlock_slowpath);
+   debug_locks_silent = 0;
unregister_netdev(netdev);
free_netdev(netdev);
 }
-- 
2.39.2

[PATCH 05/14] drm: Suppress intentional warning backtraces in scaling unit tests

2024-03-12 Thread Guenter Roeck

The drm_test_rect_calc_hscale and drm_test_rect_calc_vscale unit tests
intentionally trigger warning backtraces by providing bad parameters to
the tested functions. What is tested is the return value, not the existence
of a warning backtrace. Suppress the backtraces to avoid clogging the
kernel log.

Signed-off-by: Guenter Roeck 
---
 drivers/gpu/drm/tests/drm_rect_test.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/tests/drm_rect_test.c 
b/drivers/gpu/drm/tests/drm_rect_test.c
index 76332cd2ead8..75614cb4deb5 100644
--- a/drivers/gpu/drm/tests/drm_rect_test.c
+++ b/drivers/gpu/drm/tests/drm_rect_test.c
@@ -406,22 +406,28 @@ KUNIT_ARRAY_PARAM(drm_rect_scale, drm_rect_scale_cases, 
drm_rect_scale_case_desc
 
 static void drm_test_rect_calc_hscale(struct kunit *test)
 {
+   DEFINE_SUPPRESSED_WARNING(drm_calc_scale);
const struct drm_rect_scale_case *params = test->param_value;
int scaling_factor;
 
+   START_SUPPRESSED_WARNING(drm_calc_scale);
scaling_factor = drm_rect_calc_hscale(¶ms->src, ¶ms->dst,
  params->min_range, 
params->max_range);
+   END_SUPPRESSED_WARNING(drm_calc_scale);
 
KUNIT_EXPECT_EQ(test, scaling_factor, params->expected_scaling_factor);
 }
 
 static void drm_test_rect_calc_vscale(struct kunit *test)
 {
+   DEFINE_SUPPRESSED_WARNING(drm_calc_scale);
const struct drm_rect_scale_case *params = test->param_value;
int scaling_factor;
 
+   START_SUPPRESSED_WARNING(drm_calc_scale);
scaling_factor = drm_rect_calc_vscale(¶ms->src, ¶ms->dst,
  params->min_range, 
params->max_range);
+   END_SUPPRESSED_WARNING(drm_calc_scale);
 
KUNIT_EXPECT_EQ(test, scaling_factor, params->expected_scaling_factor);
 }
-- 
2.39.2

[PATCH 04/14] kunit: Add documentation for warning backtrace suppression API

2024-03-12 Thread Guenter Roeck

Document API functions for suppressing warning backtraces.

Signed-off-by: Guenter Roeck 
---
 Documentation/dev-tools/kunit/usage.rst | 30 -
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/dev-tools/kunit/usage.rst 
b/Documentation/dev-tools/kunit/usage.rst
index 22955d56b379..8d3d36d4103d 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -157,6 +157,34 @@ Alternatively, one can take full control over the error 
message by using
if (some_setup_function())
KUNIT_FAIL(test, "Failed to setup thing for testing");
 
+Suppressing warning backtraces
+--
+
+Some unit tests trigger warning backtraces either intentionally or as side
+effect. Such backtraces are normally undesirable since they distract from
+the actual test and may result in the impression that there is a problem.
+
+Such backtraces can be suppressed. To suppress a backtrace in some_function(),
+use the following code.
+
+.. code-block:: c
+
+   static void some_test(struct kunit *test)
+   {
+   DEFINE_SUPPRESSED_WARNING(some_function);
+
+   START_SUPPRESSED_WARNING(some_function);
+   trigger_backtrace();
+   END_SUPPRESSED_WARNING(some_function);
+   }
+
+SUPPRESSED_WARNING_COUNT() returns the number of suppressed backtraces. If the
+suppressed backtrace was triggered on purpose, this can be used to check if
+the backtrace was actually triggered.
+
+.. code-block:: c
+
+   KUNIT_EXPECT_EQ(test, SUPPRESSED_WARNING_COUNT(some_function), 1);
 
 Test Suites
 ~~~
@@ -857,4 +885,4 @@ For example:
dev_managed_string = devm_kstrdup(fake_device, "Hello, World!");
 
// Everything is cleaned up automatically when the test ends.
-   }
\ No newline at end of file
+   }
-- 
2.39.2

[PATCH 03/14] kunit: Add test cases for backtrace warning suppression

2024-03-12 Thread Guenter Roeck

Add unit tests to verify that warning backtrace suppression works.

If backtrace suppression does _not_ work, the unit tests will likely
trigger unsuppressed backtraces, which should actually help to get
the affected architectures / platforms fixed.

Signed-off-by: Guenter Roeck 
---
 lib/kunit/Makefile |   3 +-
 lib/kunit/backtrace-suppression-test.c | 104 +
 2 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100644 lib/kunit/backtrace-suppression-test.c

diff --git a/lib/kunit/Makefile b/lib/kunit/Makefile
index 545b57c3be48..6a44d2b54ea9 100644
--- a/lib/kunit/Makefile
+++ b/lib/kunit/Makefile
@@ -19,7 +19,8 @@ endif
 obj-y +=   hooks.o \
bug.o
 
-obj-$(CONFIG_KUNIT_TEST) +=kunit-test.o
+obj-$(CONFIG_KUNIT_TEST) +=kunit-test.o \
+   backtrace-suppression-test.o
 
 # string-stream-test compiles built-in only.
 ifeq ($(CONFIG_KUNIT_TEST),y)
diff --git a/lib/kunit/backtrace-suppression-test.c 
b/lib/kunit/backtrace-suppression-test.c
new file mode 100644
index ..47c619283802
--- /dev/null
+++ b/lib/kunit/backtrace-suppression-test.c
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit test for suppressing warning tracebacks
+ *
+ * Copyright (C) 2024, Guenter Roeck
+ * Author: Guenter Roeck 
+ */
+
+#include 
+#include 
+
+static void backtrace_suppression_test_warn_direct(struct kunit *test)
+{
+   DEFINE_SUPPRESSED_WARNING(backtrace_suppression_test_warn_direct);
+
+   START_SUPPRESSED_WARNING(backtrace_suppression_test_warn_direct);
+   WARN(1, "This backtrace should be suppressed");
+   END_SUPPRESSED_WARNING(backtrace_suppression_test_warn_direct);
+
+   KUNIT_EXPECT_EQ(test, 
SUPPRESSED_WARNING_COUNT(backtrace_suppression_test_warn_direct), 1);
+}
+
+static void trigger_backtrace_warn(void)
+{
+   WARN(1, "This backtrace should be suppressed");
+}
+
+static void backtrace_suppression_test_warn_indirect(struct kunit *test)
+{
+   DEFINE_SUPPRESSED_WARNING(trigger_backtrace_warn);
+
+   START_SUPPRESSED_WARNING(trigger_backtrace_warn);
+   trigger_backtrace_warn();
+   END_SUPPRESSED_WARNING(trigger_backtrace_warn);
+
+   KUNIT_EXPECT_EQ(test, SUPPRESSED_WARNING_COUNT(trigger_backtrace_warn), 
1);
+}
+
+static void backtrace_suppression_test_warn_multi(struct kunit *test)
+{
+   DEFINE_SUPPRESSED_WARNING(trigger_backtrace_warn);
+   DEFINE_SUPPRESSED_WARNING(backtrace_suppression_test_warn_multi);
+
+   START_SUPPRESSED_WARNING(backtrace_suppression_test_warn_multi);
+   START_SUPPRESSED_WARNING(trigger_backtrace_warn);
+   WARN(1, "This backtrace should be suppressed");
+   trigger_backtrace_warn();
+   END_SUPPRESSED_WARNING(trigger_backtrace_warn);
+   END_SUPPRESSED_WARNING(backtrace_suppression_test_warn_multi);
+
+   KUNIT_EXPECT_EQ(test, 
SUPPRESSED_WARNING_COUNT(backtrace_suppression_test_warn_multi), 1);
+   KUNIT_EXPECT_EQ(test, SUPPRESSED_WARNING_COUNT(trigger_backtrace_warn), 
1);
+}
+
+static void backtrace_suppression_test_warn_on_direct(struct kunit *test)
+{
+   DEFINE_SUPPRESSED_WARNING(backtrace_suppression_test_warn_on_direct);
+
+   if (!IS_ENABLED(CONFIG_DEBUG_BUGVERBOSE) && 
!IS_ENABLED(CONFIG_KALLSYMS))
+   kunit_skip(test, "requires CONFIG_DEBUG_BUGVERBOSE or 
CONFIG_KALLSYMS");
+
+   START_SUPPRESSED_WARNING(backtrace_suppression_test_warn_on_direct);
+   WARN_ON(1);
+   END_SUPPRESSED_WARNING(backtrace_suppression_test_warn_on_direct);
+
+   KUNIT_EXPECT_EQ(test,
+   
SUPPRESSED_WARNING_COUNT(backtrace_suppression_test_warn_on_direct), 1);
+}
+
+static void trigger_backtrace_warn_on(void)
+{
+   WARN_ON(1);
+}
+
+static void backtrace_suppression_test_warn_on_indirect(struct kunit *test)
+{
+   DEFINE_SUPPRESSED_WARNING(trigger_backtrace_warn_on);
+
+   if (!IS_ENABLED(CONFIG_DEBUG_BUGVERBOSE))
+   kunit_skip(test, "requires CONFIG_DEBUG_BUGVERBOSE");
+
+   START_SUPPRESSED_WARNING(trigger_backtrace_warn_on);
+   trigger_backtrace_warn_on();
+   END_SUPPRESSED_WARNING(trigger_backtrace_warn_on);
+
+   KUNIT_EXPECT_EQ(test, 
SUPPRESSED_WARNING_COUNT(trigger_backtrace_warn_on), 1);
+}
+
+static struct kunit_case backtrace_suppression_test_cases[] = {
+   KUNIT_CASE(backtrace_suppression_test_warn_direct),
+   KUNIT_CASE(backtrace_suppression_test_warn_indirect),
+   KUNIT_CASE(backtrace_suppression_test_warn_multi),
+   KUNIT_CASE(backtrace_suppression_test_warn_on_direct),
+   KUNIT_CASE(backtrace_suppression_test_warn_on_indirect),
+   {}
+};
+
+static struct kunit_suite backtrace_suppression_test_suite = {
+   .name = "backtrace-suppression-test",
+   .test_cases = backtrace_suppression_test_cases,
+};
+kunit_test_suites(&backtrace_suppression_test_suite);
+
+

[PATCH 02/14] kunit: bug: Count suppressed warning backtraces

2024-03-12 Thread Guenter Roeck

Count suppressed warning backtraces to enable code which suppresses
warning backtraces to check if the expected backtrace(s) have been
observed.

Using atomics for the backtrace count resulted in build errors on some
architectures due to include file recursion, so use a plain integer
for now.

Signed-off-by: Guenter Roeck 
---
 include/kunit/bug.h | 7 ++-
 lib/kunit/bug.c | 4 +++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/kunit/bug.h b/include/kunit/bug.h
index 1e34da961599..2097a854ac8c 100644
--- a/include/kunit/bug.h
+++ b/include/kunit/bug.h
@@ -20,6 +20,7 @@
 struct __suppressed_warning {
struct list_head node;
const char *function;
+   int counter;
 };
 
 void __start_suppress_warning(struct __suppressed_warning *warning);
@@ -28,7 +29,7 @@ bool __is_suppressed_warning(const char *function);
 
 #define DEFINE_SUPPRESSED_WARNING(func)\
struct __suppressed_warning __kunit_suppress_##func = \
-   { .function = __stringify(func) }
+   { .function = __stringify(func), .counter = 0 }
 
 #define START_SUPPRESSED_WARNING(func) \
__start_suppress_warning(&__kunit_suppress_##func)
@@ -39,12 +40,16 @@ bool __is_suppressed_warning(const char *function);
 #define IS_SUPPRESSED_WARNING(func) \
__is_suppressed_warning(func)
 
+#define SUPPRESSED_WARNING_COUNT(func) \
+   (__kunit_suppress_##func.counter)
+
 #else /* CONFIG_KUNIT */
 
 #define DEFINE_SUPPRESSED_WARNING(func)
 #define START_SUPPRESSED_WARNING(func)
 #define END_SUPPRESSED_WARNING(func)
 #define IS_SUPPRESSED_WARNING(func) (false)
+#define SUPPRESSED_WARNING_COUNT(func) (0)
 
 #endif /* CONFIG_KUNIT */
 #endif /* __ASSEMBLY__ */
diff --git a/lib/kunit/bug.c b/lib/kunit/bug.c
index f93544d7a9d1..13b3d896c114 100644
--- a/lib/kunit/bug.c
+++ b/lib/kunit/bug.c
@@ -32,8 +32,10 @@ bool __is_suppressed_warning(const char *function)
return false;
 
list_for_each_entry(warning, &suppressed_warnings, node) {
-   if (!strcmp(function, warning->function))
+   if (!strcmp(function, warning->function)) {
+   warning->counter++;
return true;
+   }
}
return false;
 }
-- 
2.39.2

[PATCH 01/14] bug/kunit: Core support for suppressing warning backtraces

2024-03-12 Thread Guenter Roeck

Some unit tests intentionally trigger warning backtraces by passing
bad parameters to API functions. Such unit tests typically check the
return value from those calls, not the existence of the warning backtrace.

Such intentionally generated warning backtraces are neither desirable
nor useful for a number of reasons.
- They can result in overlooked real problems.
- A warning that suddenly starts to show up in unit tests needs to be
  investigated and has to be marked to be ignored, for example by
  adjusting filter scripts. Such filters are ad-hoc because there is
  no real standard format for warnings. On top of that, such filter
  scripts would require constant maintenance.

One option to address problem would be to add messages such as "expected
warning backtraces start / end here" to the kernel log.  However, that
would again require filter scripts, it might result in missing real
problematic warning backtraces triggered while the test is running, and
the irrelevant backtrace(s) would still clog the kernel log.

Solve the problem by providing a means to identify and suppress specific
warning backtraces while executing test code.

Cc: Dan Carpenter 
Cc: Daniel Diaz 
Cc: Naresh Kamboju 
Cc: Kees Cook 
Signed-off-by: Guenter Roeck 
---
 include/asm-generic/bug.h | 16 +---
 include/kunit/bug.h   | 51 +++
 include/kunit/test.h  |  1 +
 include/linux/bug.h   | 13 ++
 lib/bug.c | 51 ---
 lib/kunit/Makefile|  6 +++--
 lib/kunit/bug.c   | 40 ++
 7 files changed, 169 insertions(+), 9 deletions(-)
 create mode 100644 include/kunit/bug.h
 create mode 100644 lib/kunit/bug.c

diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 6e794420bd39..c170b6477689 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -18,6 +18,7 @@
 #endif
 
 #ifndef __ASSEMBLY__
+#include 
 #include 
 #include 
 
@@ -39,8 +40,14 @@ struct bug_entry {
 #ifdef CONFIG_DEBUG_BUGVERBOSE
 #ifndef CONFIG_GENERIC_BUG_RELATIVE_POINTERS
const char  *file;
+#ifdef HAVE_BUG_FUNCTION
+   const char  *function;
+#endif
 #else
signed int  file_disp;
+#ifdef HAVE_BUG_FUNCTION
+   signed int  function_disp;
+#endif
 #endif
unsigned short  line;
 #endif
@@ -96,15 +103,18 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, 
...);
 #define __WARN()   __WARN_printf(TAINT_WARN, NULL)
 #define __WARN_printf(taint, arg...) do {  \
instrumentation_begin();\
-   warn_slowpath_fmt(__FILE__, __LINE__, taint, arg);  \
+   if (!IS_SUPPRESSED_WARNING(__func__))   \
+   warn_slowpath_fmt(__FILE__, __LINE__, taint, arg);\
instrumentation_end();  \
} while (0)
 #else
 #define __WARN()   __WARN_FLAGS(BUGFLAG_TAINT(TAINT_WARN))
 #define __WARN_printf(taint, arg...) do {  \
instrumentation_begin();\
-   __warn_printk(arg); \
-   __WARN_FLAGS(BUGFLAG_NO_CUT_HERE | BUGFLAG_TAINT(taint));\
+   if (!IS_SUPPRESSED_WARNING(__func__)) { \
+   __warn_printk(arg); \
+   __WARN_FLAGS(BUGFLAG_NO_CUT_HERE | 
BUGFLAG_TAINT(taint));\
+   }   \
instrumentation_end();  \
} while (0)
 #define WARN_ON_ONCE(condition) ({ \
diff --git a/include/kunit/bug.h b/include/kunit/bug.h
new file mode 100644
index ..1e34da961599
--- /dev/null
+++ b/include/kunit/bug.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KUnit helpers for backtrace suppression
+ *
+ * Copyright (c) 2024 Guenter Roeck 
+ */
+
+#ifndef _KUNIT_BUG_H
+#define _KUNIT_BUG_H
+
+#ifndef __ASSEMBLY__
+
+#include 
+
+#if IS_ENABLED(CONFIG_KUNIT)
+
+#include 
+#include 
+
+struct __suppressed_warning {
+   struct list_head node;
+   const char *function;
+};
+
+void __start_suppress_warning(struct __suppressed_warning *warning);
+void __end_suppress_warning(struct __suppressed_warning *warning);
+bool __is_suppressed_warning(const char *function);
+
+#define DEFINE_SUPPRESSED_WARNING(func)\
+   struct __suppressed_warning __kunit_suppress_##func = \
+   { .function = __stringify(func) }
+
+#define START_SUPPRESSED_WARNING(func) \
+   __start_suppress_warning(&__kunit_suppress_##func)
+
+#define END_SUPPRESSED_WARNING(func) \
+   __end_suppress_warning(&__kunit_suppress_##func)
+
+#define IS_SUPPRESSED_WARNING(func) \
+   __is_su

[PATCH 00/14] Add support for suppressing warning backtraces

2024-03-12 Thread Guenter Roeck

Some unit tests intentionally trigger warning backtraces by passing bad
parameters to kernel API functions. Such unit tests typically check the
return value from such calls, not the existence of the warning backtrace.

Such intentionally generated warning backtraces are neither desirable
nor useful for a number of reasons.
- They can result in overlooked real problems.
- A warning that suddenly starts to show up in unit tests needs to be
  investigated and has to be marked to be ignored, for example by
  adjusting filter scripts. Such filters are ad-hoc because there is
  no real standard format for warnings. On top of that, such filter
  scripts would require constant maintenance.

One option to address problem would be to add messages such as "expected
warning backtraces start / end here" to the kernel log.  However, that
would again require filter scripts, it might result in missing real
problematic warning backtraces triggered while the test is running, and
the irrelevant backtrace(s) would still clog the kernel log.

Solve the problem by providing a means to identify and suppress specific
warning backtraces while executing test code. Support suppressing multiple
backtraces while at the same time limiting changes to generic code to the
absolute minimum. Architecture specific changes are kept at minimum by
retaining function names only if both CONFIG_DEBUG_BUGVERBOSE and
CONFIG_KUNIT are enabled.

The first patch of the series introduces the necessary infrastructure.
The second patch introduces support for counting suppressed backtraces.
This capability is used in patch three to implement unit tests.
Patch four documents the new API.
The next two patches add support for suppressing backtraces in drm_rect
and dev_addr_lists unit tests. These patches are intended to serve as
examples for the use of the functionality introduced with this series.
The remaining patches implement the necessary changes for all
architectures with GENERIC_BUG support.

This series is based on the RFC patch and subsequent discussion at
https://patchwork.kernel.org/project/linux-kselftest/patch/02546e59-1afe-4b08-ba81-d94f3b691c9a@moroto.mountain/
and offers a more comprehensive solution of the problem discussed there.

Design note:
  Function pointers are only added to the __bug_table section if both
  CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE are enabled to avoid image
  size increases if CONFIG_KUNIT=n. There would be some benefits to
  adding those pointers all the time (reduced complexity, ability to
  display function names in BUG/WARNING messages). That change, if
  desired, can be made later.

Checkpatch note:
  Remaining checkpatch errors and warnings were deliberately ignored.
  Some are triggered by matching coding style or by comments interpreted
  as code, others by assembler macros which are disliked by checkpatch.
  Suggestions for improvements are welcome.

Changes since RFC:
- Minor cleanups and bug fixes
- Added support for all affected architectures
- Added support for counting suppressed warnings
- Added unit tests using those counters
- Added patch to suppress warning backtraces in dev_addr_lists tests

Re: [PATCH 1/2] powerpc/prom_init: Replace linux,sml-base/sml-size with linux,sml-log

2024-03-12 Thread Rob Herring

On Tue, Mar 12, 2024 at 09:32:50PM +1100, Michael Ellerman wrote:
> Rob Herring  writes:
> > On Fri, Mar 08, 2024 at 07:23:35AM -0500, Stefan Berger wrote:
> >> On 3/7/24 16:52, Rob Herring wrote:
> >> > On Thu, Mar 07, 2024 at 09:41:31PM +1100, Michael Ellerman wrote:
> >> > > Stefan Berger  writes:
> >> > > > linux,sml-base holds the address of a buffer with the TPM log. This
> >> > > > buffer may become invalid after a kexec and therefore embed the 
> >> > > > whole TPM
> >> > > > log in linux,sml-log. This helps to protect the log since it is 
> >> > > > properly
> >> > > > carried across a kexec with both of the kexec syscalls.
> >> > > > 
> >> > > > Signed-off-by: Stefan Berger 
> >> > > > ---
> >> > > >   arch/powerpc/kernel/prom_init.c | 8 ++--
> >> > > >   1 file changed, 2 insertions(+), 6 deletions(-)
> >> > > > 
> >> 
> >> > 
> >> > 
> >> > > Also adding the new linux,sml-log property should be accompanied by a
> >> > > change to the device tree binding.
> >> > > 
> >> > > The syntax is not very obvious to me, but possibly something like?
> >> > > 
> >> > > diff --git a/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml 
> >> > > b/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml
> >> > > index 50a3fd31241c..cd75037948bc 100644
> >> > > --- a/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml
> >> > > +++ b/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml
> >> > > @@ -74,8 +74,6 @@ required:
> >> > > - ibm,my-dma-window
> >> > > - ibm,my-drc-index
> >> > > - ibm,loc-code
> >> > > -  - linux,sml-base
> >> > > -  - linux,sml-size
> >> > 
> >> > Dropping required properties is an ABI break. If you drop them, an older
> >> > OS version won't work.
> >> 
> >> 1) On PowerVM and KVM on Power these two properties were added in the Linux
> >> code. I replaced the creation of these properties with creation of
> >> linux,sml-log (1/2 in this series). I also replaced the handling of
> >> these two (2/2 in this series) for these two platforms but leaving it for
> >> powernv systems where the firmware creates these.
> >
> > Okay, I guess your case is not a ABI break if the kernel is populating 
> > it and the same kernel consumes it. 
> >
> > You failed to answer my question on using /reserved-memory. Again, why 
> > can't that be used? That is the standard way we prevent chunks of memory 
> > from being clobbered.
> 
> Yes I think that would mostly work. I don't see support for
> /reserved-memory in kexec-tools, so that would need fixing I think.
> 
> My logic was that the memory is not special. It's just a buffer we
> allocated during early boot to store the log. There isn't anything else
> in the system that relies on that memory remaining untouched. So it
> seemed cleaner to just put the log in the device tree, rather than a
> pointer to it.

My issue is we already have 2 ways to describe the log to the OS. I 
don't see a good reason to add a 3rd way. (Though it might actually be a 
4th way, because the chosen property for the last attempt was accepted 
to dtschema yet the code has been abandoned.)

If you put the log into the DT, then the memory for the log remains 
untouched too because the FDT remains untouched. For reserved-memory 
regions, the OS is free to free them if it knows what the region is and 
that it is no longer needed. IOW, if freeing the log memory is desired, 
then the suggested approach doesn't work.

> 
> Having the log external to the device tree creates several problems,
> like the crash kernel region colliding with it, it being clobbered by
> kexec, etc.

We have multiple regions to pass/maintain thru kexec, so how does having 
one less really matter?

Rob

Re: [RFC PATCH v2 2/3] dt-bindings: tpm: Add linux,sml-log to ibm,vtpm.yaml

2024-03-12 Thread Jarkko Sakkinen

On Tue Mar 12, 2024 at 1:11 PM EET, Lukas Wunner wrote:
> On Mon, Mar 11, 2024 at 09:20:29AM -0400, Stefan Berger wrote:
> > Add linux,sml-log, which carries the firmware TPM log in a uint8-array, to
> > the properties. Either this property is required or both linux,sml-base and
> > linux,sml-size are required. Add a test case for verification.
> > 
> > Fixes: 82003e0487fb ("Documentation: tpm: add the IBM Virtual TPM device 
> > tree binding documentation")
>
> The Fixes tag is confusing.  The patch won't even apply cleanly to the
> v4.10 commit referenced here as the conversion to yaml happened only
> recently with v6.8.
>
> Why is the Fixes tag necessary in the first place?  Same question for
> the other patches in the series.  This looks like feature work rather
> than a fix.  Not sure whether it satisfies the "obviously correct"
> rule per Documentation/process/stable-kernel-rules.rst.

I'm not yet sure whether these are bug fixes and or improvements because
I did not fully understand the scenario where TPM corrupts the event log
so that part reminds to be seen.

Probably once I fully understand what is going on, it is possible to
argue on that.

BR, Jarkko

Re: [PATCH 2/2] tpm: of: If available Use linux,sml-log to get the log and its size

2024-03-12 Thread Jarkko Sakkinen

On Tue Mar 12, 2024 at 12:35 PM EET, Michael Ellerman wrote:
> Stefan Berger  writes:
> > On 3/7/24 15:00, Jarkko Sakkinen wrote:
> >> On Thu Mar 7, 2024 at 9:57 PM EET, Jarkko Sakkinen wrote:
> >>> in short summary: s/Use/use/
> >>>
> >>> On Wed Mar 6, 2024 at 5:55 PM EET, Stefan Berger wrote:
>  If linux,sml-log is available use it to get the TPM log rather than the
>  pointer found in linux,sml-base. This resolves an issue on PowerVM and 
>  KVM
>  on Power where after a kexec the memory pointed to by linux,sml-base may
>  have been corrupted. Also, linux,sml-log has replaced linux,sml-base and
>  linux,sml-size on these two platforms.
> 
>  Signed-off-by: Stefan Berger 
> >>>
> >>> So shouldn't this have a fixed tag, or not?
> >> 
> >> In English: do we want this to be backported to stable kernel releases or 
> >> not?
> >
> > Ideally, yes. v3 will have 3 patches and all 3 of them will have to be 
> > backported *together* and not applied otherwise if any one of them 
> > fails. Can this be 'guaranteed'?
>
> You can use Depends-on:  to indicate the relationship.
>
> cheers

Thanks, I've missed depends-on tag.

Stefan, please add also "Cc: sta...@vger.kernel.org" just to make sure
that I don't forget to add it.

Right, and since these are so small scoped commits, and bug fixes in
particular, it is also possible to do PR during the release cycle.

BR, Jarkko

Re: [RFC PATCH v2 3/3] tpm: of: If available use linux,sml-log to get the log and its size

2024-03-12 Thread Jarkko Sakkinen

On Mon Mar 11, 2024 at 10:33 PM EET, Stefan Berger wrote:
>
>
> On 3/11/24 16:25, Jarkko Sakkinen wrote:
> > On Mon Mar 11, 2024 at 3:20 PM EET, Stefan Berger wrote:
> >> If linux,sml-log is available use it to get the TPM log rather than the
> >> pointer found in linux,sml-base. This resolves an issue on PowerVM and KVM
> >> on Power where after a kexec the memory pointed to by linux,sml-base may
> >> have become inaccessible or corrupted. Also, linux,sml-log has replaced
> >> linux,sml-base and linux,sml-size on these two platforms.
> >>
> >> Keep the handling of linux,sml-base/sml-size for powernv platforms that
> >> provide the two properties via skiboot.
> >>
> >> Fixes: c5df39262dd5 ("drivers/char/tpm: Add securityfs support for event 
> >> log")
> >> Signed-off-by: Stefan Berger 
> > 
> > I'm worried about not being up to date and instead using "cached" values
> > when verifying anything from a security chip. Does this guarantee that
> > TPM log is corrupted and will not get updated somehow?
>
>
> What do you mean 'guarantee that TPM log is corrupted'?

I presume that this is for power architecture but I have no idea what
leads log being corrupted, and is the scope all of that that arch or
some subset of CPUs.

The commit message is not very detailed on kexec scenario. It more like
assumes that reader knows all the detail beforehand. So probably this
will start to make sense once the backing story is improved, that's
all.

> The TPM was handed over from the firmware to Linux and the firmware then 
> freed all memory associated with the log and will then not create a new 
> log or touch the TPM or do anything that would require an update to the 
> handed-over log. Linux also does not append to the firmware log. So 
> whatever we now find in linux,sml-log would be the latest firmware log 
> and the state of the 'firmware PCRs' computed from it should correspond 
> to the current state of the 'firmware PCRs'.

So on what CPU this happens and is there any bigger picture for that
design choice in the firmware?

If it is a firmware bug, this should emit FW_BUG log message. If not,
then this commit message should provide the necessary context.

BR, Jarkko

Re: [PATCH v5 02/10] locking/mutex: introduce devm_mutex_init

2024-03-12 Thread George Stark


Hello Christophe

On 3/12/24 14:51, Christophe Leroy wrote:



Le 12/03/2024 à 12:39, George Stark a écrit :

[Vous ne recevez pas souvent de courriers de gnst...@salutedevices.com.
Découvrez pourquoi ceci est important à
https://aka.ms/LearnAboutSenderIdentification ]


...


You don't need that inline function, just change debug_devm_mutex_init()
to __devm_mutex_init().


I stuck to debug_* name because mutex-debug.c already exports a set
of debug_ calls so...
Well it's not essential anyway. Here's the next try:

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 67edc4ca2bee..537b5ea18ceb 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -22,6 +22,8 @@
 #include 
 #include 

+struct device;
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 # define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
, .dep_map = {  \
@@ -117,6 +119,29 @@ do {   
\
 } while (0)
 #endif /* CONFIG_PREEMPT_RT */

+#ifdef CONFIG_DEBUG_MUTEXES
+
+int __devm_mutex_init(struct device *dev, struct mutex *lock);
+
+#else
+
+static inline int __devm_mutex_init(struct device *dev, struct mutex *lock)
+{
+   /*
+* When CONFIG_DEBUG_MUTEXES is off mutex_destroy is just a nop so
+* no really need to register it in devm subsystem.
+*/
+   return 0;
+}
+
+#endif
+
+#define devm_mutex_init(dev, mutex)\
+({ \
+   mutex_init(mutex);  \
+   __devm_mutex_init(dev, mutex);  \
+})
+
 /*
  * See kernel/locking/mutex.c for detailed documentation of these APIs.
  * Also see Documentation/locking/mutex-design.rst.
diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index bc8abb8549d2..6aa77e3dc82e 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "mutex.h"

@@ -89,6 +90,16 @@ void debug_mutex_init(struct mutex *lock, const char 
*name,

lock->magic = lock;
 }

+static void devm_mutex_release(void *res)
+{
+   mutex_destroy(res);
+}
+
+int __devm_mutex_init(struct device *dev, struct mutex *lock)
+{
+   return devm_add_action_or_reset(dev, devm_mutex_release, lock);
+}
+
 /***
  * mutex_destroy - mark a mutex unusable
  * @lock: the mutex to be destroyed
--
2.25.1




+
+#else
+
+static inline int __devm_mutex_init(struct device *dev, struct mutex
*lock)
+{
+   /*
+   * When CONFIG_DEBUG_MUTEXES is off mutex_destroy is just a nop so
+   * no really need to register it in devm subsystem.
+   */


Don't know if it is because tabs are replaced by blanks in you email,
but the stars should be aligned


Ack


--
Best regards
George

Re: [RFC PATCH v2 2/3] dt-bindings: tpm: Add linux,sml-log to ibm,vtpm.yaml

2024-03-12 Thread Stefan Berger





On 3/12/24 07:11, Lukas Wunner wrote:

On Mon, Mar 11, 2024 at 09:20:29AM -0400, Stefan Berger wrote:

Add linux,sml-log, which carries the firmware TPM log in a uint8-array, to
the properties. Either this property is required or both linux,sml-base and
linux,sml-size are required. Add a test case for verification.

Fixes: 82003e0487fb ("Documentation: tpm: add the IBM Virtual TPM device tree 
binding documentation")


The Fixes tag is confusing.  The patch won't even apply cleanly to the
v4.10 commit referenced here as the conversion to yaml happened only
recently with v6.8.


Then that's as far back (6.8) as the series may be applied. I put the 
Fixes tag on the first appearance of sml-base/sml-size since for kexec 
this was never correct.




Why is the Fixes tag necessary in the first place?  Same question for
the other patches in the series.  This looks like feature work rather
than a fix.  Not sure whether it satisfies the "obviously correct"
rule per Documentation/process/stable-kernel-rules.rst.



It is a fix for the interaction of the TPM firmware log with kexec. The 
sml-base buffer pointer was never protected across a kexec.






--- a/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml
+++ b/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml
@@ -74,8 +74,6 @@ required:
- ibm,my-dma-window
- ibm,my-drc-index
- ibm,loc-code
-  - linux,sml-base
-  - linux,sml-size


I assume that either these two or the new "linux,sml-log" property
are (still) required?  If so, a quick grep through the bindings
(e.g. auxdisplay/img,ascii-lcd.yaml) shows that the following
might work:

required:
   - ...

oneOf:
   - required:
   - linux,sml-base
   - required:
   - linux,sml-log

You're right, they need to be here since examples could now omit 
sml-base or sml-log. I added them. Thanks.





--- a/Documentation/devicetree/bindings/tpm/tpm-common.yaml
+++ b/Documentation/devicetree/bindings/tpm/tpm-common.yaml
@@ -30,6 +30,11 @@ properties:
size of reserved memory allocated for firmware event log
  $ref: /schemas/types.yaml#/definitions/uint32
  
+  linux,sml-log:

+description:
+  Content of firmware event log


Please add one or two sentences of context so that readers don't
need to use git blame + git log to find out what this is for.
(Mention at least that the property may be used to pass the log
to a kexec kernel.)


Ok, will do:

Content of firmware event log embedded in device tree to be safely 
carried across a kexec soft reboot.








-# must only have either memory-region or linux,sml-base
+# must only have either memory-region or linux,sml-base/size or linux,sml-log
  # as well as either resets or reset-gpios
  dependentSchemas:
memory-region:
  properties:
linux,sml-base: false
+  linux,sml-log: false
linux,sml-base:
  properties:
memory-region: false
+  linux,sml-log: false
+  linux,sml-log:
+properties:
+  memory-region: false
+  linux,sml-base: false
+  linux,sml-size: false


Could you add "linux,sml-size: false" to "memory-region" as well
while at it for consistency?


Done. Thanks.

   Stefan


Thanks,

Lukas

Re: [PATCH] powerpc/kernel: Fix potential spectre v1 in syscall

2024-03-12 Thread Breno Leitao

On Tue, Mar 12, 2024 at 10:07:54PM +1100, Michael Ellerman wrote:
> Breno Leitao  writes:
> > On Tue, Mar 12, 2024 at 08:17:42AM +, Christophe Leroy wrote:
> >> +Nathan as this is RTAS related.
> >> 
> >> Le 21/08/2018 à 20:42, Breno Leitao a écrit :
> >> > The rtas syscall reads a value from a user-provided structure and uses it
> >> > to index an array, being a possible area for a potential spectre v1 
> >> > attack.
> >> > This is the code that exposes this problem.
> >> > 
> >> >  args.rets = &args.args[nargs];
> >> > 
> >> > The nargs is an user provided value, and the below code is an example 
> >> > where
> >> > the 'nargs' value would be set to XX.
> >> > 
> >> >  struct rtas_args ra;
> >> >  ra.nargs = htobe32(XX);
> >> >  syscall(__NR_rtas, &ra);
> >> 
> >> 
> >> This patch has been hanging around in patchwork since 2018 and doesn't 
> >> apply anymore. Is it still relevant ? If so, can you rebase et resubmit ?
> >
> > This seems to be important, since nargs is a user-provided value. I can
> > submit it if the maintainers are willing to accept. I do not want to
> > spend my time if no one is willing to review it.
> 
> My memory is that I didn't think it was actually a problem, because all
> we do is memset args.rets to zero. I thought I'd talked to you on Slack
> about it, but maybe I didn't.
> 
> Anyway we should probably just fix it to be safe and keep the static
> checkers happy.
> 
> I'll rebase it and apply it, I'm sure you've got better things to do :)

Awesome. Thanks Michael!

Re: [PATCH v5 02/10] locking/mutex: introduce devm_mutex_init

2024-03-12 Thread Christophe Leroy



Le 12/03/2024 à 12:39, George Stark a écrit :
> [Vous ne recevez pas souvent de courriers de gnst...@salutedevices.com. 
> Découvrez pourquoi ceci est important à 
> https://aka.ms/LearnAboutSenderIdentification ]
> 
> Hello Christophe
> 
> Thanks for the review
> You were right about typecheck - it was meant to check errors even if
> CONFIG_DEBUG_MUTEXES was off.

Yes that's current practice in order to catch problems as soon as possible.

> 
> Here's new version based on the comments:
> 
> diff --git a/include/linux/mutex.h b/include/linux/mutex.h
> index 67edc4ca2bee..9193b163038f 100644
> --- a/include/linux/mutex.h
> +++ b/include/linux/mutex.h
> @@ -22,6 +22,8 @@
>   #include 
>   #include 
> 
> +struct device;
> +
>   #ifdef CONFIG_DEBUG_LOCK_ALLOC
>   # define __DEP_MAP_MUTEX_INITIALIZER(lockname)    \
>     , .dep_map = {  \
> @@ -117,6 +119,34 @@ do 
> {   \
>   } while (0)
>   #endif /* CONFIG_PREEMPT_RT */
> 
> +#ifdef CONFIG_DEBUG_MUTEXES
> +
> +int debug_devm_mutex_init(struct device *dev, struct mutex *lock);
> +
> +static inline int __devm_mutex_init(struct device *dev, struct mutex 
> *lock)
> +{
> +   return debug_devm_mutex_init(dev, lock);
> +}

You don't need that inline function, just change debug_devm_mutex_init() 
to __devm_mutex_init().

> +
> +#else
> +
> +static inline int __devm_mutex_init(struct device *dev, struct mutex 
> *lock)
> +{
> +   /*
> +   * When CONFIG_DEBUG_MUTEXES is off mutex_destroy is just a nop so
> +   * no really need to register it in devm subsystem.
> +   */

Don't know if it is because tabs are replaced by blanks in you email, 
but the stars should be aligned

/* ...
  * ...
  */

> +   return 0;
> +}
> +
> +#endif
> +
> +#define devm_mutex_init(dev, mutex)    \
> +({ \
> +   mutex_init(mutex);  \
> +   __devm_mutex_init(dev, mutex);  \
> +})
> +
>   /*
>    * See kernel/locking/mutex.c for detailed documentation of these APIs.
>    * Also see Documentation/locking/mutex-design.rst.
> diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
> index bc8abb8549d2..967a5367c79a 100644
> --- a/kernel/locking/mutex-debug.c
> +++ b/kernel/locking/mutex-debug.c
> @@ -19,6 +19,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
> 
>   #include "mutex.h"
> 
> @@ -89,6 +90,16 @@ void debug_mutex_init(struct mutex *lock, const char
> *name,
>     lock->magic = lock;
>   }
> 
> +static void devm_mutex_release(void *res)
> +{
> +   mutex_destroy(res);
> +}
> +
> +int debug_devm_mutex_init(struct device *dev, struct mutex *lock)

Rename __devm_mutex_init();

It makes it more clear that nobody is expected to call it directly.

> +{
> +   return devm_add_action_or_reset(dev, devm_mutex_release, lock);
> +}
> +
>   /***
>    * mutex_destroy - mark a mutex unusable
>    * @lock: the mutex to be destroyed
> -- 
> 2.25.1
> 
>

Re: [PATCH v5 02/10] locking/mutex: introduce devm_mutex_init

2024-03-12 Thread George Stark


Hello Christophe

Thanks for the review
You were right about typecheck - it was meant to check errors even if 
CONFIG_DEBUG_MUTEXES was off.


Here's new version based on the comments:

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 67edc4ca2bee..9193b163038f 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -22,6 +22,8 @@
 #include 
 #include 

+struct device;
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 # define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
, .dep_map = {  \
@@ -117,6 +119,34 @@ do {   
\
 } while (0)
 #endif /* CONFIG_PREEMPT_RT */

+#ifdef CONFIG_DEBUG_MUTEXES
+
+int debug_devm_mutex_init(struct device *dev, struct mutex *lock);
+
+static inline int __devm_mutex_init(struct device *dev, struct mutex *lock)
+{
+   return debug_devm_mutex_init(dev, lock);
+}
+
+#else
+
+static inline int __devm_mutex_init(struct device *dev, struct mutex *lock)
+{
+   /*
+   * When CONFIG_DEBUG_MUTEXES is off mutex_destroy is just a nop so
+   * no really need to register it in devm subsystem.
+   */
+   return 0;
+}
+
+#endif
+
+#define devm_mutex_init(dev, mutex)\
+({ \
+   mutex_init(mutex);  \
+   __devm_mutex_init(dev, mutex);  \
+})
+
 /*
  * See kernel/locking/mutex.c for detailed documentation of these APIs.
  * Also see Documentation/locking/mutex-design.rst.
diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index bc8abb8549d2..967a5367c79a 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "mutex.h"

@@ -89,6 +90,16 @@ void debug_mutex_init(struct mutex *lock, const char 
*name,

lock->magic = lock;
 }

+static void devm_mutex_release(void *res)
+{
+   mutex_destroy(res);
+}
+
+int debug_devm_mutex_init(struct device *dev, struct mutex *lock)
+{
+   return devm_add_action_or_reset(dev, devm_mutex_release, lock);
+}
+
 /***
  * mutex_destroy - mark a mutex unusable
  * @lock: the mutex to be destroyed
--
2.25.1



On 3/12/24 09:04, Christophe Leroy wrote:

...



I think it would be preferable to minimise the number of macros.

If I were you I would keep your devm_mutex_init() as is but rename it
__devm_mutex_init() and just remove the mutex_init() from it, then add
only one macro that works independant of CONFIG_DEBUG_MUTEXES:

#define devm_mutex_init(dev, mutex) \
({  \
mutex_init(mutex);  \
__devm_mutex_init(dev, mutex);  \
})

With that, no need of a second version of the macro and no need for the
typecheck either.

Note the __ which is a clear indication that allthough that function is
declared in public mutex.h, it is not meant to be used outside of it.






--
Best regards
George

Re: [RFC PATCH v2 2/3] dt-bindings: tpm: Add linux,sml-log to ibm,vtpm.yaml

2024-03-12 Thread Lukas Wunner

On Mon, Mar 11, 2024 at 09:20:29AM -0400, Stefan Berger wrote:
> Add linux,sml-log, which carries the firmware TPM log in a uint8-array, to
> the properties. Either this property is required or both linux,sml-base and
> linux,sml-size are required. Add a test case for verification.
> 
> Fixes: 82003e0487fb ("Documentation: tpm: add the IBM Virtual TPM device tree 
> binding documentation")

The Fixes tag is confusing.  The patch won't even apply cleanly to the
v4.10 commit referenced here as the conversion to yaml happened only
recently with v6.8.

Why is the Fixes tag necessary in the first place?  Same question for
the other patches in the series.  This looks like feature work rather
than a fix.  Not sure whether it satisfies the "obviously correct"
rule per Documentation/process/stable-kernel-rules.rst.

> --- a/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml
> +++ b/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml
> @@ -74,8 +74,6 @@ required:
>- ibm,my-dma-window
>- ibm,my-drc-index
>- ibm,loc-code
> -  - linux,sml-base
> -  - linux,sml-size

I assume that either these two or the new "linux,sml-log" property
are (still) required?  If so, a quick grep through the bindings
(e.g. auxdisplay/img,ascii-lcd.yaml) shows that the following
might work:

required:
  - ...

oneOf:
  - required:
  - linux,sml-base
  - required:
  - linux,sml-log

> --- a/Documentation/devicetree/bindings/tpm/tpm-common.yaml
> +++ b/Documentation/devicetree/bindings/tpm/tpm-common.yaml
> @@ -30,6 +30,11 @@ properties:
>size of reserved memory allocated for firmware event log
>  $ref: /schemas/types.yaml#/definitions/uint32
>  
> +  linux,sml-log:
> +description:
> +  Content of firmware event log

Please add one or two sentences of context so that readers don't
need to use git blame + git log to find out what this is for.
(Mention at least that the property may be used to pass the log
to a kexec kernel.)

> -# must only have either memory-region or linux,sml-base
> +# must only have either memory-region or linux,sml-base/size or linux,sml-log
>  # as well as either resets or reset-gpios
>  dependentSchemas:
>memory-region:
>  properties:
>linux,sml-base: false
> +  linux,sml-log: false
>linux,sml-base:
>  properties:
>memory-region: false
> +  linux,sml-log: false
> +  linux,sml-log:
> +properties:
> +  memory-region: false
> +  linux,sml-base: false
> +  linux,sml-size: false

Could you add "linux,sml-size: false" to "memory-region" as well
while at it for consistency?

Thanks,

Lukas

Re: [PATCH] powerpc/kernel: Fix potential spectre v1 in syscall

2024-03-12 Thread Michael Ellerman

Breno Leitao  writes:
> On Tue, Mar 12, 2024 at 08:17:42AM +, Christophe Leroy wrote:
>> +Nathan as this is RTAS related.
>> 
>> Le 21/08/2018 à 20:42, Breno Leitao a écrit :
>> > The rtas syscall reads a value from a user-provided structure and uses it
>> > to index an array, being a possible area for a potential spectre v1 attack.
>> > This is the code that exposes this problem.
>> > 
>> >args.rets = &args.args[nargs];
>> > 
>> > The nargs is an user provided value, and the below code is an example where
>> > the 'nargs' value would be set to XX.
>> > 
>> >struct rtas_args ra;
>> >ra.nargs = htobe32(XX);
>> >syscall(__NR_rtas, &ra);
>> 
>> 
>> This patch has been hanging around in patchwork since 2018 and doesn't 
>> apply anymore. Is it still relevant ? If so, can you rebase et resubmit ?
>
> This seems to be important, since nargs is a user-provided value. I can
> submit it if the maintainers are willing to accept. I do not want to
> spend my time if no one is willing to review it.

My memory is that I didn't think it was actually a problem, because all
we do is memset args.rets to zero. I thought I'd talked to you on Slack
about it, but maybe I didn't.

Anyway we should probably just fix it to be safe and keep the static
checkers happy.

I'll rebase it and apply it, I'm sure you've got better things to do :)

cheers

Re: [PATCH 2/2] tpm: of: If available Use linux,sml-log to get the log and its size

2024-03-12 Thread Michael Ellerman

Stefan Berger  writes:
> On 3/7/24 15:00, Jarkko Sakkinen wrote:
>> On Thu Mar 7, 2024 at 9:57 PM EET, Jarkko Sakkinen wrote:
>>> in short summary: s/Use/use/
>>>
>>> On Wed Mar 6, 2024 at 5:55 PM EET, Stefan Berger wrote:
 If linux,sml-log is available use it to get the TPM log rather than the
 pointer found in linux,sml-base. This resolves an issue on PowerVM and KVM
 on Power where after a kexec the memory pointed to by linux,sml-base may
 have been corrupted. Also, linux,sml-log has replaced linux,sml-base and
 linux,sml-size on these two platforms.

 Signed-off-by: Stefan Berger 
>>>
>>> So shouldn't this have a fixed tag, or not?
>> 
>> In English: do we want this to be backported to stable kernel releases or 
>> not?
>
> Ideally, yes. v3 will have 3 patches and all 3 of them will have to be 
> backported *together* and not applied otherwise if any one of them 
> fails. Can this be 'guaranteed'?

You can use Depends-on:  to indicate the relationship.

cheers

Re: [PATCH 1/2] powerpc/prom_init: Replace linux,sml-base/sml-size with linux,sml-log

2024-03-12 Thread Michael Ellerman

Rob Herring  writes:
> On Fri, Mar 08, 2024 at 07:23:35AM -0500, Stefan Berger wrote:
>> On 3/7/24 16:52, Rob Herring wrote:
>> > On Thu, Mar 07, 2024 at 09:41:31PM +1100, Michael Ellerman wrote:
>> > > Stefan Berger  writes:
>> > > > linux,sml-base holds the address of a buffer with the TPM log. This
>> > > > buffer may become invalid after a kexec and therefore embed the whole 
>> > > > TPM
>> > > > log in linux,sml-log. This helps to protect the log since it is 
>> > > > properly
>> > > > carried across a kexec with both of the kexec syscalls.
>> > > > 
>> > > > Signed-off-by: Stefan Berger 
>> > > > ---
>> > > >   arch/powerpc/kernel/prom_init.c | 8 ++--
>> > > >   1 file changed, 2 insertions(+), 6 deletions(-)
>> > > > 
>> 
>> > 
>> > 
>> > > Also adding the new linux,sml-log property should be accompanied by a
>> > > change to the device tree binding.
>> > > 
>> > > The syntax is not very obvious to me, but possibly something like?
>> > > 
>> > > diff --git a/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml 
>> > > b/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml
>> > > index 50a3fd31241c..cd75037948bc 100644
>> > > --- a/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml
>> > > +++ b/Documentation/devicetree/bindings/tpm/ibm,vtpm.yaml
>> > > @@ -74,8 +74,6 @@ required:
>> > > - ibm,my-dma-window
>> > > - ibm,my-drc-index
>> > > - ibm,loc-code
>> > > -  - linux,sml-base
>> > > -  - linux,sml-size
>> > 
>> > Dropping required properties is an ABI break. If you drop them, an older
>> > OS version won't work.
>> 
>> 1) On PowerVM and KVM on Power these two properties were added in the Linux
>> code. I replaced the creation of these properties with creation of
>> linux,sml-log (1/2 in this series). I also replaced the handling of
>> these two (2/2 in this series) for these two platforms but leaving it for
>> powernv systems where the firmware creates these.
>
> Okay, I guess your case is not a ABI break if the kernel is populating 
> it and the same kernel consumes it. 
>
> You failed to answer my question on using /reserved-memory. Again, why 
> can't that be used? That is the standard way we prevent chunks of memory 
> from being clobbered.

Yes I think that would mostly work. I don't see support for
/reserved-memory in kexec-tools, so that would need fixing I think.

My logic was that the memory is not special. It's just a buffer we
allocated during early boot to store the log. There isn't anything else
in the system that relies on that memory remaining untouched. So it
seemed cleaner to just put the log in the device tree, rather than a
pointer to it.

Having the log external to the device tree creates several problems,
like the crash kernel region colliding with it, it being clobbered by
kexec, etc.

cheers

kexec verbose dumps with 6.8 [was: [PATCH v4 1/7] kexec_file: add kexec_file flag to control debug printing]

2024-03-12 Thread Jiri Slaby


On 13. 12. 23, 6:57, Baoquan He wrote:

When specifying 'kexec -c -d', kexec_load interface will print loading
information, e.g the regions where kernel/initrd/purgatory/cmdline
are put, the memmap passed to 2nd kernel taken as system RAM ranges,
and printing all contents of struct kexec_segment, etc. These are
very helpful for analyzing or positioning what's happening when
kexec/kdump itself failed. The debugging printing for kexec_load
interface is made in user space utility kexec-tools.

Whereas, with kexec_file_load interface, 'kexec -s -d' print nothing.
Because kexec_file code is mostly implemented in kernel space, and the
debugging printing functionality is missed. It's not convenient when
debugging kexec/kdump loading and jumping with kexec_file_load
interface.

Now add KEXEC_FILE_DEBUG to kexec_file flag to control the debugging
message printing. And add global variable kexec_file_dbg_print and
macro kexec_dprintk() to facilitate the printing.

This is a preparation, later kexec_dprintk() will be used to replace the
existing pr_debug(). Once 'kexec -s -d' is specified, it will print out
kexec/kdump loading information. If '-d' is not specified, it regresses
to pr_debug().

Signed-off-by: Baoquan He 

...

--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h

...

@@ -500,6 +500,13 @@ static inline int crash_hotplug_memory_support(void) { 
return 0; }
  static inline unsigned int crash_get_elfcorehdr_size(void) { return 0; }
  #endif
  
+extern bool kexec_file_dbg_print;

+
+#define kexec_dprintk(fmt, ...)\
+   printk("%s" fmt,  \
+  kexec_file_dbg_print ? KERN_INFO : KERN_DEBUG,   \
+  ##__VA_ARGS__)


This means you dump it _always_. Only with different levels.

And without any prefix whatsoever, so people see bloat like this in 
their log now:

[  +0.01] 1000-0009 (1)
[  +0.02] 7f96d000-7f97efff (3)
[  +0.02] 0080-00807fff (4)
[  +0.01] 0080b000-0080bfff (4)
[  +0.02] 0081-008f (4)
[  +0.01] 7f97f000-7f9fefff (4)
[  +0.01] 7ff0-7fff (4)
[  +0.02] -0fff (2)

without actually knowing what that is.

There should be nothing logged if that is not asked for and especially 
if kexec load went fine, right?


Can this be redesigned, please?

Actually what was wrong on the pr_debug()s? Can you simply turn them on 
from the kernel when -d is passed to kexec instead of all this?


...

--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -52,6 +52,8 @@ atomic_t __kexec_lock = ATOMIC_INIT(0);
  /* Flag to indicate we are going to kexec a new kernel */
  bool kexec_in_progress = false;
  
+bool kexec_file_dbg_print;


Ugh, and a global flag for this?

thanks,
--
js
suse labs

Re: [PATCH] powerpc/kernel: Fix potential spectre v1 in syscall

2024-03-12 Thread Breno Leitao

On Tue, Mar 12, 2024 at 08:17:42AM +, Christophe Leroy wrote:
> +Nathan as this is RTAS related.
> 
> Le 21/08/2018 à 20:42, Breno Leitao a écrit :
> > The rtas syscall reads a value from a user-provided structure and uses it
> > to index an array, being a possible area for a potential spectre v1 attack.
> > This is the code that exposes this problem.
> > 
> > args.rets = &args.args[nargs];
> > 
> > The nargs is an user provided value, and the below code is an example where
> > the 'nargs' value would be set to XX.
> > 
> > struct rtas_args ra;
> > ra.nargs = htobe32(XX);
> > syscall(__NR_rtas, &ra);
> 
> 
> This patch has been hanging around in patchwork since 2018 and doesn't 
> apply anymore. Is it still relevant ? If so, can you rebase et resubmit ?

This seems to be important, since nargs is a user-provided value. I can
submit it if the maintainers are willing to accept. I do not want to
spend my time if no one is willing to review it.

Thanks for revamping this one.

Re: [PATCH v5 02/10] locking/mutex: introduce devm_mutex_init

2024-03-12 Thread Andy Shevchenko

On Tue, Mar 12, 2024 at 7:41 AM Christophe Leroy
 wrote:
> Le 12/03/2024 à 01:01, George Stark a écrit :
> > [Vous ne recevez pas souvent de courriers de gnst...@salutedevices.com.
> > Découvrez pourquoi ceci est important à
> > https://aka.ms/LearnAboutSenderIdentification ]
> > On 3/7/24 13:34, Andy Shevchenko wrote:
> >> On Thu, Mar 7, 2024 at 4:40 AM George Stark
> >>  wrote:

...

> >>> Signed-off-by: George Stark 
> >>> Signed-off-by: Christophe Leroy 
> >>
> >>>   Hello Christophe. Hope you don't mind I put you SoB tag because you
> >>> helped alot
> >>>   to make this patch happen.
> >>
> >> You also need to figure out who should be the author of the patch and
> >> probably add a (missing) Co-developed-by. After all you should also
> >> follow the correct order of SoBs.
> >
> > Thanks for the review.
> > I explained in the other letter as I see it. So I'd leave myself
> > as author and add appropriate tag with Christophe's name.
> > BTW what do you mean by correct SoB order?
> > Is it alphabetical order or order of importance?

> The correct order is to first have the Author's SoB.

At the last one is submitters. So, if it's the same person, which one
should go first?

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH] powerpc/kernel: Fix potential spectre v1 in syscall

2024-03-12 Thread Christophe Leroy

+Nathan as this is RTAS related.

Le 21/08/2018 à 20:42, Breno Leitao a écrit :
> The rtas syscall reads a value from a user-provided structure and uses it
> to index an array, being a possible area for a potential spectre v1 attack.
> This is the code that exposes this problem.
> 
>   args.rets = &args.args[nargs];
> 
> The nargs is an user provided value, and the below code is an example where
> the 'nargs' value would be set to XX.
> 
>   struct rtas_args ra;
>   ra.nargs = htobe32(XX);
>   syscall(__NR_rtas, &ra);


This patch has been hanging around in patchwork since 2018 and doesn't 
apply anymore. Is it still relevant ? If so, can you rebase et resubmit ?

Thanks
Christophe


> 
> Signed-off-by: Breno Leitao 
> ---
>   arch/powerpc/kernel/rtas.c | 6 --
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 8afd146bc9c7..5ef3c863003d 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -27,6 +27,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   
>   #include 
>   #include 
> @@ -1056,7 +1057,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>   struct rtas_args args;
>   unsigned long flags;
>   char *buff_copy, *errbuf = NULL;
> - int nargs, nret, token;
> + int index, nargs, nret, token;
>   
>   if (!capable(CAP_SYS_ADMIN))
>   return -EPERM;
> @@ -1084,7 +1085,8 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>   if (token == RTAS_UNKNOWN_SERVICE)
>   return -EINVAL;
>   
> - args.rets = &args.args[nargs];
> + index = array_index_nospec(nargs, ARRAY_SIZE(args.args));
> + args.rets = &args.args[index];
>   memset(args.rets, 0, nret * sizeof(rtas_arg_t));
>   
>   /* Need to handle ibm,suspend_me call specially */

Re: [PATCH 1/2] powerpc: Flush checkpointed gpr state for 32-bit processes in ptrace

2024-03-12 Thread Christophe Leroy



Le 19/06/2018 à 21:54, Pedro Franco de Carvalho a écrit :
> Would something like this be ok?
> 
> I factored out the calls to flush_fp_to_thread and flush_altivec_to_thread,
> although I don't really understand why these are necessary. The 
> tm_cvsx_get/set
> functions also calls flush_vsx_to_thread (outside of the helper function).
> 
> I also noticed that tm_ppr/dscr/tar_get/set functions don't flush the tm
> state. Should they do it, and if so, should they also flush the fp and altivec
> state?
> 
> Thanks!
> Pedro
> 
> -- >8 --
> Currently ptrace doesn't flush the register state when the
> checkpointed GPRs of a 32-bit thread are accessed. This can cause core
> dumps to have stale data in the checkpointed GPR note.
> 
> This patch adds a helper function to flush the TM, fpu and altivec
> state and calls it from the tm_cgpr32_get/set functions.

This patch is almost 6 yr old and doesn't apply anymore.

If someone thinks it is still relevant, please rebase and resubmit.

Thanks
Christophe

> 
> Signed-off-by: Pedro Franco de Carvalho 
> ---
>   arch/powerpc/kernel/ptrace.c | 33 +
>   1 file changed, 33 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index 9667666eb18e..0d56857e1e89 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -778,6 +778,29 @@ static int evr_set(struct task_struct *target, const 
> struct user_regset *regset,
>   
>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   /**
> + * tm_flush_if_active - flush TM, fpu and altivec state if TM active
> + * @target:  The target task.
> + *
> + * This function flushes the TM, fpu and altivec state to the target
> + * task and returns 0 if TM is available and active in the target, and
> + * returns an error code suitable for ptrace otherwise.
> + */
> +static int tm_flush_if_active (struct task_struct *target)
> +{
> + if (!cpu_has_feature(CPU_FTR_TM))
> + return -ENODEV;
> +
> + if (!MSR_TM_ACTIVE(target->thread.regs->msr))
> + return -ENODATA;
> +
> + flush_tmregs_to_thread(target);
> + flush_fp_to_thread(target);
> + flush_altivec_to_thread(target);
> +
> + return 0;
> +}
> +
> +/**
>* tm_cgpr_active - get active number of registers in CGPR
>* @target: The target task.
>* @regset: The user regset structure.
> @@ -2124,6 +2147,11 @@ static int tm_cgpr32_get(struct task_struct *target,
>unsigned int pos, unsigned int count,
>void *kbuf, void __user *ubuf)
>   {
> + int ret = tm_flush_if_active(target);
> +
> + if (ret)
> + return ret;
> +
>   return gpr32_get_common(target, regset, pos, count, kbuf, ubuf,
>   &target->thread.ckpt_regs.gpr[0]);
>   }
> @@ -2133,6 +2161,11 @@ static int tm_cgpr32_set(struct task_struct *target,
>unsigned int pos, unsigned int count,
>const void *kbuf, const void __user *ubuf)
>   {
> + int ret = tm_flush_if_active(target);
> +
> + if (ret)
> + return ret;
> +
>   return gpr32_set_common(target, regset, pos, count, kbuf, ubuf,
>   &target->thread.ckpt_regs.gpr[0]);
>   }

Re: [PATCH] powerpc: build-time fixup alternate feature relative addresses

2024-03-12 Thread Christophe Leroy



Le 29/01/2024 à 07:25, Sathvika Vasireddy a écrit :
> Hi Christophe, Nick
> 
> On 1/26/24 12:32 AM, Christophe Leroy wrote:
>> Hi Nic,
>>
>> Le 21/05/2017 à 03:01, Nicholas Piggin a écrit :
>>> Implement build-time fixup of alternate feature relative addresses for
>>> the out-of-line ("else") patch code. This is done post-link with a new
>>> powerpc build tool that parses relocations and fixup structures, and
>>> adjusts branch instructions.
>>>
>>> This gives us the ability to link patch code anywhere in the kernel,
>>> without branches to targets outside the patch code having to be
>>> reached directly (without a linker stub). This allows patch code to be
>>> moved out from the head section, and avoids build failures with
>>> unresolvable branche.
>>
>> Is it worth keeping this hanging in patchwork ? It seems outdated and 
>> doesn't apply. Could this me done with objtool instead ?
>>
>> Christophe
> 
> Yes, this can be done with objtool. I am working on this and will post 
> an RFC this week.
> 

Nice.

I've open an issue to track his at 
https://github.com/linuxppc/issues/issues/479 and have retired Nic's 
patch in patchwork.

Christophe

64 matches

Mail list logo