Re: [powerpc]Kernel crash while running xfstests (generic/250) [next-20220404]

2022-05-09 Thread Sachin Sant



> On 07-Apr-2022, at 10:19 AM, Sachin Sant  wrote:
> 
> 
>> On 04-Apr-2022, at 5:04 PM, Sachin Sant  wrote:
>> 
>> While running xfstests(ext4 or XFS as fs) on a Power10 LPAR booted with 
>> today’s
>> next (5.18.0-rc1-next-20220404) following crash is seen. 
>> 
>> This problem was possibly introduced with 5.17.0-next-20220330. 
>> Git bisect leads me to following patch
>> commit 1d158814db8e7b3cbca0f2c8d9242fbec4fbc57e
>>   dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset
>> 
> 
> Continue to see this problem with latest next. 

I can still recreate this issue against latest linux-next build.

[ 1536.883400] Buffer I/O error on dev dm-0, logical block 10485497, async page 
read
[ 1536.936018] XFS (dm-0): Unmounting Filesystem
[ 1536.938849] XFS (dm-0): Mounting V5 Filesystem
[ 1536.946007] XFS (dm-0): Ending clean mount
[ 1536.947926] xfs filesystem being mounted at /mnt/scratch supports timestamps 
until 2038 (0x7fff)
[ 1537.052850] XFS (dm-0): Unmounting Filesystem
[ 1537.083979] BUG: Unable to handle kernel data access at 0x5deadbeef122
[ 1537.083982] Faulting instruction address: 0xc015b0bc
[ 1537.083984] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1537.084000] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 1537.084006] Modules linked in: dm_snapshot(E) dm_bufio(E) loop(E) 
dm_flakey(E) xfs(E) dm_mod(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) 
nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) 
nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) 
nf_defrag_ipv4(E) rfkill(E) ip_set(E) nf_tables(E) bonding(E) tls(E) 
libcrc32c(E) nfnetlink(E) sunrpc(E) nd_pmem(E) nd_btt(E) dax_pmem(E) 
pseries_rng(E) papr_scm(E) libnvdimm(E) vmx_crypto(E) ext4(E) mbcache(E) 
jbd2(E) sd_mod(E) t10_pi(E) crc64_rocksoft(E) crc64(E) sg(E) ibmvscsi(E) 
ibmveth(E) scsi_transport_srp(E) fuse(E) [last unloaded: scsi_debug]
[ 1537.084056] CPU: 10 PID: 970489 Comm: dmsetup Tainted: GE 
5.18.0-rc6-next-20220509 #2
[ 1537.084061] NIP:  c015b0bc LR: c015afe8 CTR: c0753bb0
[ 1537.084064] REGS: c000211fb610 TRAP: 0380   Tainted: GE  
(5.18.0-rc6-next-20220509)
[ 1537.084068] MSR:  8280b033   CR: 
24024824  XER: 2004
[ 1537.084078] CFAR: c015aff0 IRQMASK: 0 
[ 1537.084078] GPR00: c015afe8 c000211fb8b0 c2a7cf00 
 
[ 1537.084078] GPR04: c000f98a1378  c000f5043b50 
c0043463e280 
[ 1537.084078] GPR08: c0043463e280 5deadbeef100 5deadbeef122 
c0080214dcb0 
[ 1537.084078] GPR12: c0753bb0 c00abfff1700 000155ee0b60 
7fffa7c29da8 
[ 1537.084078] GPR16: 7fffa7c29da8 7fffa7c29da8 7fffa7c63670 
 
[ 1537.084078] GPR20: 7fffa7c33388 7fffa7c62040 000155ee0b90 
0131 
[ 1537.084078] GPR24: c25adb68  c25adb30 
c00103a5e000 
[ 1537.084078] GPR28: c2a23ce8 c000f98a1378 0017 
 
[ 1537.084117] NIP [c015b0bc] __cpuhp_state_remove_instance+0x19c/0x2c0
[ 1537.084125] LR [c015afe8] __cpuhp_state_remove_instance+0xc8/0x2c0
[ 1537.084130] Call Trace:
[ 1537.084131] [c000211fb8b0] [c015afe8] 
__cpuhp_state_remove_instance+0xc8/0x2c0 (unreliable)
[ 1537.084138] [c000211fb920] [c0753c14] bioset_exit+0x64/0x280
[ 1537.084144] [c000211fb9c0] [c00802137744] 
cleanup_mapped_device+0x4c/0x1c0 [dm_mod]
[ 1537.084155] [c000211fba00] [c00802137a60] __dm_destroy+0x1a8/0x360 
[dm_mod]
[ 1537.084163] [c000211fbaa0] [c008021445c0] dev_remove+0x1b8/0x290 
[dm_mod]
[ 1537.084172] [c000211fbb30] [c0080214488c] ctl_ioctl+0x1f4/0x7d0 
[dm_mod]
[ 1537.084181] [c000211fbd40] [c00802144e88] dm_ctl_ioctl+0x20/0x40 
[dm_mod]
[ 1537.084190] [c000211fbd60] [c055ff28] sys_ioctl+0xf8/0x190
[ 1537.084195] [c000211fbdb0] [c003377c] 
system_call_exception+0x17c/0x350
[ 1537.084200] [c000211fbe10] [c000c54c] 
system_call_common+0xec/0x270
[ 1537.084205] --- interrupt: c00 at 0x7fffa7529210
[ 1537.084208] NIP:  7fffa7529210 LR: 7fffa7c26824 CTR: 
[ 1537.084211] REGS: c000211fbe80 TRAP: 0c00   Tainted: GE  
(5.18.0-rc6-next-20220509)
[ 1537.084215] MSR:  8280f033   CR: 
24004484  XER: 
[ 1537.084224] IRQMASK: 0 
[ 1537.084224] GPR00: 0036 7fffc7448c30 7fffa7607300 
0003 
[ 1537.084224] GPR04: c138fd04 000155ee0b60 0004 
7fffa7c33f98 
[ 1537.084224] GPR08: 0003   
 
[ 1537.084224] GPR12:  7fffa7d0fa80 000155ee0b60 
7fffa7c29da8 
[ 1537.084224] GPR16: 7fffa7c29da8 7fffa7c29da8 7fffa7

Re: [PATCH v3 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating

2022-05-09 Thread Baolin Wang




On 5/10/2022 12:04 PM, Andrew Morton wrote:

On Tue, 10 May 2022 11:45:57 +0800 Baolin Wang  
wrote:


Hi,

Now migrating a hugetlb page or unmapping a poisoned hugetlb page, we'll
use ptep_clear_flush() and set_pte_at() to nuke the page table entry
and remap it, and this is incorrect for CONT-PTE or CONT-PMD size hugetlb
page,


It would be helpful to describe why it's wrong.  Something like "should
use huge_ptep_clear_flush() and huge_ptep_clear_flush() for this
purpose"?


Sorry for the confusing description. I described the problem explicitly 
in each patch's commit message.


https://lore.kernel.org/all/ea5abf529f0997b5430961012bfda6166c1efc8c.1652147571.git.baolin.w...@linux.alibaba.com/
https://lore.kernel.org/all/730ea4b6d292f32fb10b7a4e87dad49b0eb30474.1652147571.git.baolin.w...@linux.alibaba.com/




which will cause potential data consistent issue. This patch set
will change to use hugetlb related APIs to fix this issue, please find
details in each patch. Thanks.


Is a cc:stable needed here?  And are we able to identify a target for a
Fixes: tag?


I think need a cc:stable tag, however I am not sure the target fixes 
tag, since we should trace back to the introduction of CONT-PTE/PMD 
hugetlb? 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")


Re: [PATCH v3 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating

2022-05-09 Thread Andrew Morton
On Tue, 10 May 2022 11:45:57 +0800 Baolin Wang  
wrote:

> Hi,
> 
> Now migrating a hugetlb page or unmapping a poisoned hugetlb page, we'll
> use ptep_clear_flush() and set_pte_at() to nuke the page table entry
> and remap it, and this is incorrect for CONT-PTE or CONT-PMD size hugetlb
> page,

It would be helpful to describe why it's wrong.  Something like "should
use huge_ptep_clear_flush() and huge_ptep_clear_flush() for this
purpose"?

> which will cause potential data consistent issue. This patch set
> will change to use hugetlb related APIs to fix this issue, please find
> details in each patch. Thanks.

Is a cc:stable needed here?  And are we able to identify a target for a
Fixes: tag?




[PATCH v3 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-05-09 Thread Baolin Wang
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb:
2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
size specified.

When unmapping a hugetlb page, we will get the relevant page table
entry by huge_pte_offset() only once to nuke it. This is correct
for PMD or PUD size hugetlb, since they always contain only one
pmd entry or pud entry in the page table.

However this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since they can contain several continuous pte or pmd entry with
same page table attributes, so we will nuke only one pte or pmd
entry for this CONT-PTE/PMD size hugetlb page.

And now try_to_unmap() is only passed a hugetlb page in the case
where the hugetlb page is poisoned. Which means now we will unmap
only one pte entry for a CONT-PTE or CONT-PMD size poisoned hugetlb
page, and we can still access other subpages of a CONT-PTE or CONT-PMD
size poisoned hugetlb page, which will cause serious issues possibly.

So we should change to use huge_ptep_clear_flush() to nuke the
hugetlb page table to fix this issue, which already considered
CONT-PTE and CONT-PMD size hugetlb.

We've already used set_huge_swap_pte_at() to set a poisoned
swap entry for a poisoned hugetlb page. Meanwhile adding a VM_BUG_ON()
to make sure the passed hugetlb page is poisoned in try_to_unmap().

Signed-off-by: Baolin Wang 
Reviewed-by: Muchun Song 
Reviewed-by: Mike Kravetz 
---
 mm/rmap.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 4e96daf..219e287 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1528,6 +1528,11 @@ static bool try_to_unmap_one(struct folio *folio, struct 
vm_area_struct *vma,
 
if (folio_test_hugetlb(folio)) {
/*
+* The try_to_unmap() is only passed a hugetlb page
+* in the case where the hugetlb page is poisoned.
+*/
+   VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
+   /*
 * huge_pmd_unshare may unmap an entire PMD page.
 * There is no way of knowing exactly which PMDs may
 * be cached for this mm, so we must flush them all.
@@ -1562,28 +1567,28 @@ static bool try_to_unmap_one(struct folio *folio, 
struct vm_area_struct *vma,
break;
}
}
+   pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
} else {
flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-   }
-
-   /*
-* Nuke the page table entry. When having to clear
-* PageAnonExclusive(), we always have to flush.
-*/
-   if (should_defer_flush(mm, flags) && !anon_exclusive) {
/*
-* We clear the PTE but do not flush so potentially
-* a remote CPU could still be writing to the folio.
-* If the entry was previously clean then the
-* architecture must guarantee that a clear->dirty
-* transition on a cached TLB entry is written through
-* and traps if the PTE is unmapped.
+* Nuke the page table entry. When having to clear
+* PageAnonExclusive(), we always have to flush.
 */
-   pteval = ptep_get_and_clear(mm, address, pvmw.pte);
+   if (should_defer_flush(mm, flags) && !anon_exclusive) {
+   /*
+* We clear the PTE but do not flush so 
potentially
+* a remote CPU could still be writing to the 
folio.
+* If the entry was previously clean then the
+* architecture must guarantee that a 
clear->dirty
+* transition on a cached TLB entry is written 
through
+* and traps if the PTE is unmapped.
+*/
+   pteval = ptep_get_and_clear(mm, address, 
pvmw.pte);
 
-   set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-   } else {
-   pteval = ptep_clear_flush(vma, address, pvmw.pte);
+   set_tlb_ubc_flush_pending(mm, 
pte_dirty(pteval));
+   } else {
+   pteval = ptep_clear_flush(vma, address, 
pvmw.pte);
+   }
}
 
/*
-- 
1.8.3.1



[PATCH v3 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-09 Thread Baolin Wang
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb:
2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
size specified.

When migrating a hugetlb page, we will get the relevant page table
entry by huge_pte_offset() only once to nuke it and remap it with
a migration pte entry. This is correct for PMD or PUD size hugetlb,
since they always contain only one pmd entry or pud entry in the
page table.

However this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since they can contain several continuous pte or pmd entry with
same page table attributes. So we will nuke or remap only one pte
or pmd entry for this CONT-PTE/PMD size hugetlb page, which is
not expected for hugetlb migration. The problem is we can still
continue to modify the subpages' data of a hugetlb page during
migrating a hugetlb page, which can cause a serious data consistent
issue, since we did not nuke the page table entry and set a
migration pte for the subpages of a hugetlb page.

To fix this issue, we should change to use huge_ptep_clear_flush()
to nuke a hugetlb page table, and remap it with set_huge_pte_at()
and set_huge_swap_pte_at() when migrating a hugetlb page, which
already considered the CONT-PTE or CONT-PMD size hugetlb.

Signed-off-by: Baolin Wang 
Reviewed-by: Muchun Song 
Reviewed-by: Mike Kravetz 
---
 include/linux/hugetlb.h | 11 +++
 mm/rmap.c   | 24 ++--
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 306d6ef..9f71043 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct mm_struct 
*mm, unsigned long addr
pte_t *ptep, pte_t pte, unsigned long 
sz)
 {
 }
+
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+   return ptep_get(ptep);
+}
+
+static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+  pte_t *ptep, pte_t pte)
+{
+}
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/rmap.c b/mm/rmap.c
index 94d6b24..4e96daf 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1926,13 +1926,15 @@ static bool try_to_migrate_one(struct folio *folio, 
struct vm_area_struct *vma,
break;
}
}
+
+   /* Nuke the hugetlb page table entry */
+   pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
} else {
flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+   /* Nuke the page table entry. */
+   pteval = ptep_clear_flush(vma, address, pvmw.pte);
}
 
-   /* Nuke the page table entry. */
-   pteval = ptep_clear_flush(vma, address, pvmw.pte);
-
/* Set the dirty flag on the folio now the pte is gone. */
if (pte_dirty(pteval))
folio_mark_dirty(folio);
@@ -2017,7 +2019,10 @@ static bool try_to_migrate_one(struct folio *folio, 
struct vm_area_struct *vma,
pte_t swp_pte;
 
if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-   set_pte_at(mm, address, pvmw.pte, pteval);
+   if (folio_test_hugetlb(folio))
+   set_huge_pte_at(mm, address, pvmw.pte, 
pteval);
+   else
+   set_pte_at(mm, address, pvmw.pte, 
pteval);
ret = false;
page_vma_mapped_walk_done();
break;
@@ -2026,7 +2031,10 @@ static bool try_to_migrate_one(struct folio *folio, 
struct vm_area_struct *vma,
   !anon_exclusive, subpage);
if (anon_exclusive &&
page_try_share_anon_rmap(subpage)) {
-   set_pte_at(mm, address, pvmw.pte, pteval);
+   if (folio_test_hugetlb(folio))
+   set_huge_pte_at(mm, address, pvmw.pte, 
pteval);
+   else
+   set_pte_at(mm, address, pvmw.pte, 
pteval);
ret = false;
page_vma_mapped_walk_done();
break;
@@ -2052,7 +2060,11 @@ static bool try_to_migrate_one(struct folio *folio, 
struct vm_area_struct *vma,
swp_pte = pte_swp_mksoft_dirty(swp_pte);
  

[PATCH v3 1/3] mm: change huge_ptep_clear_flush() to return the original pte

2022-05-09 Thread Baolin Wang
It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
table when unmapping or migrating a hugetlb page, and will change
to use huge_ptep_clear_flush() instead in the following patches.

So this is a preparation patch, which changes the huge_ptep_clear_flush()
to return the original pte to help to nuke a hugetlb page table.

Signed-off-by: Baolin Wang 
Acked-by: Mike Kravetz 
Reviewed-by: Muchun Song 
---
 arch/arm64/include/asm/hugetlb.h   |  4 ++--
 arch/arm64/mm/hugetlbpage.c| 12 +---
 arch/ia64/include/asm/hugetlb.h|  4 ++--
 arch/mips/include/asm/hugetlb.h|  9 ++---
 arch/parisc/include/asm/hugetlb.h  |  4 ++--
 arch/powerpc/include/asm/hugetlb.h |  9 ++---
 arch/s390/include/asm/hugetlb.h|  6 +++---
 arch/sh/include/asm/hugetlb.h  |  4 ++--
 arch/sparc/include/asm/hugetlb.h   |  4 ++--
 include/asm-generic/hugetlb.h  |  4 ++--
 10 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 1242f71..616b2ca 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -39,8 +39,8 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep);
+extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+  unsigned long addr, pte_t *ptep);
 #define __HAVE_ARCH_HUGE_PTE_CLEAR
 extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
   pte_t *ptep, unsigned long sz);
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index cbace1c..ca8e65c 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -486,19 +486,17 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
 }
 
-void huge_ptep_clear_flush(struct vm_area_struct *vma,
-  unsigned long addr, pte_t *ptep)
+pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *ptep)
 {
size_t pgsize;
int ncontig;
 
-   if (!pte_cont(READ_ONCE(*ptep))) {
-   ptep_clear_flush(vma, addr, ptep);
-   return;
-   }
+   if (!pte_cont(READ_ONCE(*ptep)))
+   return ptep_clear_flush(vma, addr, ptep);
 
ncontig = find_num_contig(vma->vm_mm, addr, ptep, );
-   clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
+   return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
 }
 
 static int __init hugetlbpage_init(void)
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 7e46ebd..65d3811 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -23,8 +23,8 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
 #define is_hugepage_only_range is_hugepage_only_range
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
-unsigned long addr, pte_t *ptep)
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
 {
 }
 
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index c214440..fd69c88 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -43,16 +43,19 @@ static inline pte_t huge_ptep_get_and_clear(struct 
mm_struct *mm,
 }
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
-unsigned long addr, pte_t *ptep)
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
 {
+   pte_t pte;
+
/*
 * clear the huge pte entry firstly, so that the other smp threads will
 * not get old pte entry after finishing flush_tlb_page and before
 * setting new huge pte entry
 */
-   huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
+   pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
flush_tlb_page(vma, addr);
+   return pte;
 }
 
 #define __HAVE_ARCH_HUGE_PTE_NONE
diff --git a/arch/parisc/include/asm/hugetlb.h 
b/arch/parisc/include/asm/hugetlb.h
index a69cf9e..25bc560 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -28,8 +28,8 @@ static inline int prepare_hugepage_range(struct file *file,
 }
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-static inline void huge_ptep_clear_flush(struct 

[PATCH v3 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating

2022-05-09 Thread Baolin Wang
Hi,

Now migrating a hugetlb page or unmapping a poisoned hugetlb page, we'll
use ptep_clear_flush() and set_pte_at() to nuke the page table entry
and remap it, and this is incorrect for CONT-PTE or CONT-PMD size hugetlb
page, which will cause potential data consistent issue. This patch set
will change to use hugetlb related APIs to fix this issue, please find
details in each patch. Thanks.

Note: Mike pointed out the huge_ptep_get() will only return the one specific
value, and it would not take into account the dirty or young bits of 
CONT-PTE/PMDs
like the huge_ptep_get_and_clear() [1]. This inconsistent issue is not 
introduced
by this patch set, and will address this issue in another thread [2]. Meanwhile
the uffd for hugetlb case [3] pointed by Gerald also need another patch to 
address.

[1] 
https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f...@oracle.com/
[2] 
https://lore.kernel.org/all/cover.1651998586.git.baolin.w...@linux.alibaba.com/
[3] https://lore.kernel.org/linux-mm/20220503120343.6264e126@thinkpad/

Changes from v2:
 - Collect reviewed tags from Muchun and Mike.
 - Drop the unnecessary casting in hugetlb.c.
 - Fix building errors with adding dummy functions for !CONFIG_HUGETLB_PAGE.

Changes from v1:
 - Add acked tag from Mike.
 - Update some commit message.
 - Add VM_BUG_ON in try_to_unmap() for hugetlb case.
 - Add an explict void casting for huge_ptep_clear_flush() in hugetlb.c.

Baolin Wang (3):
  mm: change huge_ptep_clear_flush() to return the original pte
  mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
  mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

 arch/arm64/include/asm/hugetlb.h   |  4 +--
 arch/arm64/mm/hugetlbpage.c| 12 +++-
 arch/ia64/include/asm/hugetlb.h|  4 +--
 arch/mips/include/asm/hugetlb.h|  9 --
 arch/parisc/include/asm/hugetlb.h  |  4 +--
 arch/powerpc/include/asm/hugetlb.h |  9 --
 arch/s390/include/asm/hugetlb.h|  6 ++--
 arch/sh/include/asm/hugetlb.h  |  4 +--
 arch/sparc/include/asm/hugetlb.h   |  4 +--
 include/asm-generic/hugetlb.h  |  4 +--
 include/linux/hugetlb.h| 11 +++
 mm/rmap.c  | 63 --
 12 files changed, 83 insertions(+), 51 deletions(-)

-- 
1.8.3.1



Re: [PATCH] crypto: vmx - Fix build error

2022-05-09 Thread Herbert Xu
On Sat, May 07, 2022 at 02:22:43PM +0900, Masahiro Yamada wrote:
> When I refactored this Makefile, I accidentally changed the CONFIG
> option.
> 
> Fixes: b52455a73db9 ("crypto: vmx - Align the short log with Makefile 
> cleanups")
> Reported-by: kernel test robot 
> Signed-off-by: Masahiro Yamada 
> ---
> 
>  drivers/crypto/vmx/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte

2022-05-09 Thread Baolin Wang




On 5/10/2022 4:02 AM, Mike Kravetz wrote:

On 5/9/22 01:46, Baolin Wang wrote:



On 5/9/2022 1:46 PM, Christophe Leroy wrote:



Le 08/05/2022 à 15:09, Baolin Wang a écrit :



On 5/8/2022 7:09 PM, Muchun Song wrote:

On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote:

It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
table when unmapping or migrating a hugetlb page, and will change
to use huge_ptep_clear_flush() instead in the following patches.

So this is a preparation patch, which changes the
huge_ptep_clear_flush()
to return the original pte to help to nuke a hugetlb page table.

Signed-off-by: Baolin Wang 
Acked-by: Mike Kravetz 


Reviewed-by: Muchun Song 


Thanks for reviewing.



But one nit below:

[...]

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8605d7e..61a21af 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct
*mm, struct vm_area_struct *vma,
    ClearHPageRestoreReserve(new_page);
    /* Break COW or unshare */
-    huge_ptep_clear_flush(vma, haddr, ptep);
+    (void)huge_ptep_clear_flush(vma, haddr, ptep);


Why add a "(void)" here? Is there any warning if no "(void)"?
IIUC, I think we can remove this, right?


I did not meet any warning without the casting, but this is per Mike's
comment[1] to make the code consistent with other functions casting to
void type explicitly in hugetlb.c file.

[1]
https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/



As far as I understand, Mike said that you should be accompagnied with a
big fat comment explaining why we ignore the returned value from
huge_ptep_clear_flush(). >
By the way huge_ptep_clear_flush() is not declared 'must_check' so this
cast is just visual polution and should be removed.

In the meantime the comment suggested by Mike should be added instead.

Sorry for my misunderstanding. I just follow the explicit void casting like 
other places in hugetlb.c file. And I am not sure if it is useful adding some 
comments like below, since we did not need the original pte value in the COW 
case mapping with a new page, and the code is more readable already I think.

Mike, could you help to clarify what useful comments would you like? and remove 
the explicit void casting? Thanks.



Sorry for the confusion.

In the original commit, it seemed odd to me that the signature of the
function was changing and there was not an associated change to the only
caller of the function.  I did suggest casting to void or adding a comment.
As Christophe mentions, the cast to void is not necessary.  In addition,
there really isn't a need for a comment as the calling code is not changed.


OK. Will drop the casting in next version.



The original version of the commit without either is actually preferable.
The commit message does say this is a preparation patch and the return
value will be used in later patches.


OK. Thanks Mike for making me clear. Also thanks to Muchun and Christophe.


Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-05-09 Thread Baolin Wang




On 5/10/2022 12:41 AM, Peter Xu wrote:

On Fri, May 06, 2022 at 12:07:13PM -0700, Mike Kravetz wrote:

On 5/3/22 03:03, Gerald Schaefer wrote:

On Tue, 3 May 2022 10:19:46 +0800
Baolin Wang  wrote:

On 5/2/2022 10:02 PM, Gerald Schaefer wrote:


[...]


Please see previous code, we'll use the original pte value to check if
it is uffd-wp armed, and if need to mark it dirty though the hugetlbfs
is set noop_dirty_folio().

pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);


Uh, ok, that wouldn't work on s390, but we also don't have
CONFIG_PTE_MARKER_UFFD_WP / HAVE_ARCH_USERFAULTFD_WP set, so
I guess we will be fine (for now).

Still, I find it a bit unsettling that pte_install_uffd_wp_if_needed()
would work on a potential hugetlb *pte, directly de-referencing it
instead of using huge_ptep_get().

The !pte_none(*pte) check at the beginning would be broken in the
hugetlb case for s390 (not sure about other archs, but I think s390
might be the only exception strictly requiring huge_ptep_get()
for de-referencing hugetlb *pte pointers).


We could have used is_vm_hugetlb_page(vma) within the helper so as to
properly use either generic pte or hugetlb version of pte fetching.  We may
want to conditionally do set_[huge_]pte_at() too at the end.

I could prepare a patch for that even if it's not really anything urgently
needed. I assume that won't need to block this patchset since we need the
pteval for pte_dirty() check anyway and uffd-wp definitely needs it too.


OK. Thanks Peter.


Re: [PATCH] bug: Use normal relative pointers in 'struct bug_entry'

2022-05-09 Thread Josh Poimboeuf
On Mon, May 09, 2022 at 10:31:14PM +1000, Michael Ellerman wrote:
> Embarrassingly, we have another copy of the logic, used in the C
> versions, they need updating too:

Oops, thanks for finding that.

> With that added it seems to be working correctly for me.
> 
> Acked-by: Michael Ellerman  (powerpc)

Thanks!

-- 
Josh


Re: [PATCH 24/30] panic: Refactor the panic path

2022-05-09 Thread Guilherme G. Piccoli
Hey Hatayma, thanks for your great analysis and no need for apologies!

I'll comment/respond properly inline below, just noticing here that I've
CCed Mark and Marc (from the ARM64 perspective), Michael (Hyper-V
perspective) and Hari (PowerPC perspective), besides the usual suspects
as Petr, Baoquan, etc.

On 09/05/2022 12:16, d.hatay...@fujitsu.com wrote:
> Sorry for the delayed response. Unfortunately, I had 10 days holidays
> until yesterday...
> [...] 
>> +   We currently have 4 lists of panic notifiers; based
>> +   on the functionality and risk (for panic success) the
>> +   callbacks are added in a given list. The lists are:
>> +   - hypervisor/FW notification list (low risk);
>> +   - informational list (low/medium risk);
>> +   - pre_reboot list (higher risk);
>> +   - post_reboot list (only run late in panic and after
>> +   kdump, not configurable for now).
>> +   This parameter defines the ordering of the first 3
>> +   lists with regards to kdump; the levels determine
>> +   which set of notifiers execute before kdump. The
>> +   accepted levels are:
>> +   0: kdump is the first thing to run, NO list is
>> +   executed before kdump.
>> +   1: only the hypervisor list is executed before kdump.
>> +   2 (default level): the hypervisor list and (*if*
> 
> Hmmm, why are you trying to change default setting?
> 
> Based on the current design of kdump, it's natural to put what the
> handlers for these level 1 and level 2 handlers do in
> machine_crash_shutdown(), as these are necessary by default, right?
> 
> Or have you already tried that and figured out it's difficult in some
> reason and reached the current design? If so, why is that difficult?
> Could you point to if there is already such discussion online?
> 
> kdump is designed to perform as little things as possible before
> transferring the execution to the 2nd kernel in order to increase
> reliability. Just detour to panic() increases risks of kdump failure
> in the sense of increasing the executed codes in the abnormal
> situation, which is very the note in the explanation of
> crash_kexec_post_notifiers.
> 
> Also, the current implementation of crash_kexec_post_notifiers uses
> the panic notifier, but this is not from the technical
> reason. Ideally, it should have been implemented in the context of
> crash_kexec() independently of panic().
> 
> That is, it looks to me that, in addition to changing design of panic
> notifier, you are trying to integrate shutdown code of the crash kexec
> and the panic paths. If so, this is a big design change for kdump.
> I'm concerned about increase of reliability. I'd like you to discuss
> them carefully.

>From my understanding (specially based on both these threads [0] and
[1]), 3 facts are clear and quite complex in nature:

(a) Currently, the panic notifier list is a "no man's land" - it's a
mess, all sort of callbacks are added there, some of them are extremely
risk for breaking kdump, others are quite safe (like setting a
variable). Petr's details in thread [0] are really clear and express in
great way how confusing and conflicting the panic notifiers goals are.

(b) In order to "address" problems in the antagonistic goals of
notifiers (see point (a) above and thread [0]), we have this
quirk/workaround called "crash_kexec_post_notifiers". This is useful,
but (almost as for attesting how this is working as band-aid over
complex and fundamental issues) see the following commits:

a11589563e96 ("x86/Hyper-V: Report crash register data or kmsg before
running crash kernel")

06e629c25daa ("powerpc/fadump: Fix inaccurate CPU state info in vmcore
generated with panic")

They hardcode such workaround, because they *need* some notifiers'
callbacks. But notice they *don't need all of them*, only some important
ones (that usually are good considering the risk, it's a good
cost/benefit). Since we currently have an all-or-nothing behavior for
the panic notifiers, both PowerPC and Hyper-V end-up bringing all of
them to run before kdump due to the lack of flexibility, increasing a
lot the risk of failure for kdump.

(c) To add on top of all such complexity, we don't have a custom
machine_crash_shutdown() handler for some architectures like ARM64, and
the feeling is that's not right to add a bunch of callbacks / complexity
in such architecture code, specially since we have the notifiers
infrastructure in the kernel. I've recently started a discussion about
that with ARM64 community, please take a look in [1].

With that said, we can use (a) + (b) + (c) to justify our argument here:
the panic notifiers should be refactored! We need to try to encompass
the antagonistic goals of kdump (wants to be the 

Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier

2022-05-09 Thread Guilherme G. Piccoli
On 09/05/2022 13:14, Suzuki K Poulose wrote:
> [...]> 
> I have queued this to coresight/next.
> 
> Thanks
> Suzuki


Thanks a lot Suzuki!


Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier

2022-05-09 Thread Suzuki K Poulose

Hi

On 09/05/2022 14:09, Guilherme G. Piccoli wrote:

On 28/04/2022 05:11, Suzuki K Poulose wrote:

Hi Guilherme,

On 27/04/2022 23:49, Guilherme G. Piccoli wrote:

The panic notifier infrastructure executes registered callbacks when
a panic event happens - such callbacks are executed in atomic context,
with interrupts and preemption disabled in the running CPU and all other
CPUs disabled. That said, mutexes in such context are not a good idea.

This patch replaces a regular mutex with a mutex_trylock safer approach;
given the nature of the mutex used in the driver, it should be pretty
uncommon being unable to acquire such mutex in the panic path, hence
no functional change should be observed (and if it is, that would be
likely a deadlock with the regular mutex).

Fixes: 2227b7c74634 ("coresight: add support for CPU debug module")
Cc: Leo Yan 
Cc: Mathieu Poirier 
Cc: Mike Leach 
Cc: Suzuki K Poulose 
Signed-off-by: Guilherme G. Piccoli 


How would you like to proceed with queuing this ? I am happy
either way. In case you plan to push this as part of this
series (I don't see any potential conflicts) :

Reviewed-by: Suzuki K Poulose 


Hi Suzuki, some other maintainers are taking the patches to their next
branches for example. I'm working on V2, and I guess in the end would be
nice to reduce the size of the series a bit.

So, do you think you could pick this one for your coresight/next branch
(or even for rc cycle, your call - this is really a fix)?
This way, I won't re-submit this one in V2, since it's gonna be merged
already in your branch.


I have queued this to coresight/next.

Thanks
Suzuki


Re: [PATCH 24/30] panic: Refactor the panic path

2022-05-09 Thread d.hatay...@fujitsu.com
Sorry for the delayed response. Unfortunately, I had 10 days holidays
until yesterday...

>  .../admin-guide/kernel-parameters.txt |  42 ++-
>  include/linux/panic_notifier.h|   1 +
>  kernel/kexec_core.c   |   8 +-
>  kernel/panic.c| 292 +-
>  .../selftests/pstore/pstore_crash_test|   5 +-
>  5 files changed, 252 insertions(+), 96 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 3f1cc5e317ed..8d3524060ce3 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
...snip...
> @@ -3784,6 +3791,33 @@
> timeout < 0: reboot immediately
> Format: 
> 
> +   panic_notifiers_level=
> +   [KNL] Set the panic notifiers execution order.
> +   Format: 
> +   We currently have 4 lists of panic notifiers; based
> +   on the functionality and risk (for panic success) the
> +   callbacks are added in a given list. The lists are:
> +   - hypervisor/FW notification list (low risk);
> +   - informational list (low/medium risk);
> +   - pre_reboot list (higher risk);
> +   - post_reboot list (only run late in panic and after
> +   kdump, not configurable for now).
> +   This parameter defines the ordering of the first 3
> +   lists with regards to kdump; the levels determine
> +   which set of notifiers execute before kdump. The
> +   accepted levels are:
> +   0: kdump is the first thing to run, NO list is
> +   executed before kdump.
> +   1: only the hypervisor list is executed before kdump.
> +   2 (default level): the hypervisor list and (*if*

Hmmm, why are you trying to change default setting?

Based on the current design of kdump, it's natural to put what the
handlers for these level 1 and level 2 handlers do in
machine_crash_shutdown(), as these are necessary by default, right?

Or have you already tried that and figured out it's difficult in some
reason and reached the current design? If so, why is that difficult?
Could you point to if there is already such discussion online?

kdump is designed to perform as little things as possible before
transferring the execution to the 2nd kernel in order to increase
reliability. Just detour to panic() increases risks of kdump failure
in the sense of increasing the executed codes in the abnormal
situation, which is very the note in the explanation of
crash_kexec_post_notifiers.

Also, the current implementation of crash_kexec_post_notifiers uses
the panic notifier, but this is not from the technical
reason. Ideally, it should have been implemented in the context of
crash_kexec() independently of panic().

That is, it looks to me that, in addition to changing design of panic
notifier, you are trying to integrate shutdown code of the crash kexec
and the panic paths. If so, this is a big design change for kdump.
I'm concerned about increase of reliability. I'd like you to discuss
them carefully.

Thanks.
HATAYAMA, Daisuke



Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list

2022-05-09 Thread Guilherme G. Piccoli
On 27/04/2022 19:49, Guilherme G. Piccoli wrote:
> Currently we have 3 notifier lists in the panic path, which will
> be wired in a way to allow the notifier callbacks to run in
> different moments at panic time, in a subsequent patch.
> 
> But there is also an odd set of architecture calls hardcoded in
> the end of panic path, after the restart machinery. They're
> responsible for late time tunings / events, like enabling a stop
> button (Sparc) or effectively stopping the machine (s390).
> 
> This patch introduces yet another notifier list to offer the
> architectures a way to add callbacks in such late moment on
> panic path without the need of ifdefs / hardcoded approaches.
> 
> Cc: Alexander Gordeev 
> Cc: Christian Borntraeger 
> Cc: "David S. Miller" 
> Cc: Heiko Carstens 
> Cc: Sven Schnelle 
> Cc: Vasily Gorbik 
> Signed-off-by: Guilherme G. Piccoli 

Hey S390/SPARC folks, sorry for the ping!

Any reviews on this V1 would be greatly appreciated, I'm working on V2
and seeking feedback in the non-reviewed patches.

Thanks in advance,


Guilherme


Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier

2022-05-09 Thread Guilherme G. Piccoli
On 28/04/2022 05:11, Suzuki K Poulose wrote:
> Hi Guilherme,
> 
> On 27/04/2022 23:49, Guilherme G. Piccoli wrote:
>> The panic notifier infrastructure executes registered callbacks when
>> a panic event happens - such callbacks are executed in atomic context,
>> with interrupts and preemption disabled in the running CPU and all other
>> CPUs disabled. That said, mutexes in such context are not a good idea.
>>
>> This patch replaces a regular mutex with a mutex_trylock safer approach;
>> given the nature of the mutex used in the driver, it should be pretty
>> uncommon being unable to acquire such mutex in the panic path, hence
>> no functional change should be observed (and if it is, that would be
>> likely a deadlock with the regular mutex).
>>
>> Fixes: 2227b7c74634 ("coresight: add support for CPU debug module")
>> Cc: Leo Yan 
>> Cc: Mathieu Poirier 
>> Cc: Mike Leach 
>> Cc: Suzuki K Poulose 
>> Signed-off-by: Guilherme G. Piccoli 
> 
> How would you like to proceed with queuing this ? I am happy
> either way. In case you plan to push this as part of this
> series (I don't see any potential conflicts) :
> 
> Reviewed-by: Suzuki K Poulose 

Hi Suzuki, some other maintainers are taking the patches to their next
branches for example. I'm working on V2, and I guess in the end would be
nice to reduce the size of the series a bit.

So, do you think you could pick this one for your coresight/next branch
(or even for rc cycle, your call - this is really a fix)?
This way, I won't re-submit this one in V2, since it's gonna be merged
already in your branch.

Thanks in advance,


Guilherme


Re: [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers

2022-05-09 Thread Guilherme G. Piccoli
On 05/05/2022 15:55, Hari Bathini wrote:
> [...] 
> The change looks good. I have tested it on an LPAR (ppc64).
> 
> Reviewed-by: Hari Bathini 
> 

Hi Michael. do you think it's possible to add this one to powerpc/next
(or something like that), or do you prefer a V2 with his tag?
Thanks,


Guilherme


Re: [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart

2022-05-09 Thread Guilherme G. Piccoli
On 27/04/2022 19:48, Guilherme G. Piccoli wrote:
> In the panic path we have a list of functions to be called, the panic
> notifiers - such callbacks perform various actions in the machine's
> last breath, and sometimes users want them to run before kdump. We
> have the parameter "crash_kexec_post_notifiers" for that. When such
> parameter is used, the function "crash_smp_send_stop()" is executed
> to poweroff all secondary CPUs through the NMI-shootdown mechanism;
> part of this process involves disabling virtualization features in
> all CPUs (except the main one).
> 
> Now, in the emergency restart procedure we have also a way of
> disabling VMX in all CPUs, using the same NMI-shootdown mechanism;
> what happens though is that in case we already NMI-disabled all CPUs,
> the emergency restart fails due to a second addition of the same items
> in the NMI list, as per the following log output:
> 
> sysrq: Trigger a crash
> Kernel panic - not syncing: sysrq triggered crash
> [...]
> Rebooting in 2 seconds..
> list_add double add: new=, prev=, next=.
> [ cut here ]
> kernel BUG at lib/list_debug.c:29!
> invalid opcode:  [#1] PREEMPT SMP PTI
> 
> In order to reproduce the problem, users just need to set the kernel
> parameter "crash_kexec_post_notifiers" *without* kdump set in any
> system with the VMX feature present.
> 
> Since there is no benefit in re-disabling VMX in all CPUs in case
> it was already done, this patch prevents that by guarding the restart
> routine against doubly issuing NMIs unnecessarily. Notice we still
> need to disable VMX locally in the emergency restart.
> 
> Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is 
> supported)
> Fixes: 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump friendly 
> version in panic path")
> Cc: David P. Reed 
> Cc: Hidehiro Kawai 
> Cc: Paolo Bonzini 
> Cc: Sean Christopherson 
> Signed-off-by: Guilherme G. Piccoli 
> ---
>  arch/x86/include/asm/cpu.h |  1 +
>  arch/x86/kernel/crash.c|  8 
>  arch/x86/kernel/reboot.c   | 14 --
>  3 files changed, 17 insertions(+), 6 deletions(-)
> 

Hi Paolo / Sean / Vitaly, sorry for the ping.
But do you think this fix is OK from the VMX point-of-view?

I'd like to send a V2 of this set soon, so any review here is highly
appreciated!

Cheers,


Guilherme



Re: [PATCH 2/2] powerpc/vdso: Link with ld.lld when requested

2022-05-09 Thread Nathan Chancellor
On Mon, May 09, 2022 at 02:58:09PM -0700, Nick Desaulniers wrote:
> On Mon, May 9, 2022 at 2:47 PM Nathan Chancellor  wrote:
> >
> > On Mon, May 09, 2022 at 02:24:40PM -0700, Nick Desaulniers wrote:
> > > On Mon, May 9, 2022 at 1:47 PM Nathan Chancellor  
> > > wrote:
> > > >
> > > > The PowerPC vDSO is linked with $(CC) instead of $(LD), which means the
> > > > default linker of the compiler is used instead of the linker requested
> > > > by the builder.
> > > >
> > > >   $ make ARCH=powerpc LLVM=1 mrproper defconfig 
> > > > arch/powerpc/kernel/vdso/
> > > >   ...
> > > >
> > > >   $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg
> > > >
> > > >   File: arch/powerpc/kernel/vdso/vdso32.so.dbg
> > > >   String dump of section '.comment':
> > > >   [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
> > > >
> > > >   File: arch/powerpc/kernel/vdso/vdso64.so.dbg
> > > >   String dump of section '.comment':
> > > >   [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
> > > >
> > > > The compiler option '-fuse-ld' tells the compiler which linker to use
> > > > when it is invoked as both the compiler and linker. Use '-fuse-ld=lld'
> > > > when LD=ld.lld has been specified (CONFIG_LD_IS_LLD) so that the vDSO is
> > > > linked with the same linker as the rest of the kernel.
> > > >
> > > >   $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg
> > > >
> > > >   File: arch/powerpc/kernel/vdso/vdso32.so.dbg
> > > >   String dump of section '.comment':
> > > >   [ 0] Linker: LLD 14.0.0
> > > >   [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
> > > >
> > > >   File: arch/powerpc/kernel/vdso/vdso64.so.dbg
> > > >   String dump of section '.comment':
> > > >   [ 0] Linker: LLD 14.0.0
> > > >   [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
> > > >
> > > > LD can be a full path to ld.lld, which will not be handled properly by
> > > > '-fuse-ld=lld' if the full path to ld.lld is outside of the compiler's
> > > > search path. '-fuse-ld' can take a path to the linker but it is
> > > > deprecated in clang 12.0.0; '--ld-path' is preferred for this scenario.
> > > >
> > > > Use '--ld-path' if it is supported, as it will handle a full path or
> > > > just 'ld.lld' properly. See the LLVM commit below for the full details
> > > > of '--ld-path'.
> > >
> > > Perhaps worth adding some additional background from the cover letter
> > > to the commit message that will actually go into the kernel,
> > > particularly:
> > > 1. Kbuild mostly invokes the compiler and linker distinctly; the ppc
> > > vdso code uses the compiler as the linker driver though.
> > > 2. When doing so, depending on how the compiler was configured, the
> > > implicit default linker the compiler invokes might not match $LD.
> >
> > Sure, I think I can clear up these two points with something like:
> >
> > "The PowerPC vDSO uses $(CC) to link, which differs from the rest of the
> > kernel, which uses $(LD) directly. As a result, the default linker of
> > the compiler is used, which may differ from the linker requested by the
> > builder. For example:
> >
> > 
> >
> > LLVM=1 sets LD=ld.lld but ld.lld is not used to link the vDSO; GNU ld is
> > because "ld" is the default linker for clang on most Linux platforms."
> >
> > Thoughts?
> 
> SGTM
> 
> >
> > > 3. This is a problem for LTO since clang may try to invoke ld.gold,
> > > which is not supported as of
> > > commit 75959d44f9dc ("kbuild: Fail if gold linker is detected")
> >
> > Technically, it seemed like ld.bfd was being invoked but the LLVMgold
> > plugin did not exist. Regardless, moving to ld.lld will resolve that,
> > since the LLVMgold plugin won't be needed.
> 
> Oh indeed, invoking clang with `-flto -###` shows it does invoke the
> system's linker with `-plugin path/to/LLVMgold.so`, not `ld.gold`
> itself.  I don't think we should use or depend on the LLVMgold.so
> plugin either (I suspect it will invoke ld.gold, but as you noticed, I
> don't bother to build LLVMgold.so).  So, perhaps reworded (feel free
> to reword further):
> 
> 3. This is a problem for LTO since clang may try to invoke ld.bfd with
> the LLVMgold.so plugin.
> https://llvm.org/docs/GoldPlugin.html states that "usage of the LLVM
> gold plugin with ld.bfd is not tested and therefore not officially
> supported or recommended." Users should instead use ld.lld to drive
> linking for LTO with clang.

I'll stick this blurb above "The compiler option":

"This is a problem for Clang's Link Time Optimization as implemented in
the kernel because use of GNU ld with LTO requires the LLVMgold plugin,
which is not technically supported for ld.bfd per
https://llvm.org/docs/GoldPlugin.html. Furthermore, if LLVMgold.so is
missing from a user's system, the build will fail, even though LTO as it
is implemented in the kernel requires ld.lld to avoid this dependency in
the first place."

Yell if there is something I should change!

> > > 4. Using the linker as the driver can cause ld.bfd 2.26 to 

Re: [PATCH v2 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-05-09 Thread Mike Kravetz
On 5/8/22 02:36, Baolin Wang wrote:
> On some architectures (like ARM64), it can support CONT-PTE/PMD size
> hugetlb, which means it can support not only PMD/PUD size hugetlb:
> 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
> size specified.
> 
> When unmapping a hugetlb page, we will get the relevant page table
> entry by huge_pte_offset() only once to nuke it. This is correct
> for PMD or PUD size hugetlb, since they always contain only one
> pmd entry or pud entry in the page table.
> 
> However this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
> since they can contain several continuous pte or pmd entry with
> same page table attributes, so we will nuke only one pte or pmd
> entry for this CONT-PTE/PMD size hugetlb page.
> 
> And now try_to_unmap() is only passed a hugetlb page in the case
> where the hugetlb page is poisoned. Which means now we will unmap
> only one pte entry for a CONT-PTE or CONT-PMD size poisoned hugetlb
> page, and we can still access other subpages of a CONT-PTE or CONT-PMD
> size poisoned hugetlb page, which will cause serious issues possibly.
> 
> So we should change to use huge_ptep_clear_flush() to nuke the
> hugetlb page table to fix this issue, which already considered
> CONT-PTE and CONT-PMD size hugetlb.
> 
> We've already used set_huge_swap_pte_at() to set a poisoned
> swap entry for a poisoned hugetlb page. Meanwhile adding a VM_BUG_ON()
> to make sure the passed hugetlb page is poisoned in try_to_unmap().
> 
> Signed-off-by: Baolin Wang 
> ---
>  mm/rmap.c | 39 ++-
>  1 file changed, 22 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 7cf2408..37c8fd2 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1530,6 +1530,11 @@ static bool try_to_unmap_one(struct folio *folio, 
> struct vm_area_struct *vma,
>  
>   if (folio_test_hugetlb(folio)) {
>   /*
> +  * The try_to_unmap() is only passed a hugetlb page
> +  * in the case where the hugetlb page is poisoned.
> +  */
> + VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
> + /*

It is unfortunate that this could not easily be added to the first
if (folio_test_hugetlb(folio)) block in this routine.  However, it
is fine to add here.

Looks good.  Thanks for all these changes,

Reviewed-by: Mike Kravetz 

-- 
Mike Kravetz


Re: [PATCH v4 00/14] kbuild: yet another series of cleanups (modpost, LTO, MODULE_REL_CRCS, export.h)

2022-05-09 Thread Nathan Chancellor
[resending due to mailing list bounces from a large plain text attachment...]

On Mon, May 09, 2022 at 01:24:33PM +0900, Masahiro Yamada wrote:
> On Mon, May 9, 2022 at 4:09 AM Masahiro Yamada  wrote:
> >
> > This is the third batch of cleanups in this development cycle.
> >
> > Major changes in v4:
> >  - Move static EXPORT_SYMBOL check to a script
> >  - Some refactoring
> >
> > Major changes in v3:
> >
> >  - Generate symbol CRCs as C code, and remove CONFIG_MODULE_REL_CRCS.
> >
> > Major changes in v2:
> >
> >  - V1 did not work with CONFIG_MODULE_REL_CRCS.
> >I fixed this for v2.
> >
> >  - Reflect some review comments in v1
> >
> >  - Refactor the code more
> >
> >  - Avoid too long argument error
> 
> This series is available at
> git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git
> lto-cleanup-v4

Hi Masahiro,

I checked this out and went to run it through my QEMU tests but I see
two new errors.

Failure #1:

In file included from scripts/mod/section-check.c:3:
scripts/mod/modpost.h:15:10: fatal error: 'elfconfig.h' file not found
#include "elfconfig.h"
 ^
1 error generated.

I was able to get past that with

diff --git a/scripts/mod/Makefile b/scripts/mod/Makefile
index ca739c6c68a1..c33b83bfbcad 100644
--- a/scripts/mod/Makefile
+++ b/scripts/mod/Makefile
@@ -16,7 +16,7 @@ targets += $(devicetable-offsets-file) devicetable-offsets.s

 # dependencies on generated files need to be listed explicitly

-$(obj)/modpost.o $(obj)/file2alias.o $(obj)/sumversion.o: $(obj)/elfconfig.h
+$(obj)/modpost.o $(obj)/file2alias.o $(obj)/sumversion.o 
$(obj)/section-check.o: $(obj)/elfconfig.h
 $(obj)/file2alias.o: $(obj)/$(devicetable-offsets-file)

 quiet_cmd_elfconfig = MKELF   $@

Failure #2:

  GEN .version
  CHK include/generated/compile.h
  GEN .tmp_initcalls.lds
  LTO vmlinux.o
  OBJTOOL vmlinux.o
  MODPOST vmlinux.symvers
  MODINFO modules.builtin.modinfo
  GEN modules.builtin
  LD  .tmp_vmlinux.btf
ld.lld: error: cannot open .vmlinux.export.o: No such file or directory
  BTF .btf.vmlinux.bin.o
pahole: .tmp_vmlinux.btf: No such file or directory
  CC  .vmlinux.export.c
  LD  .tmp_vmlinux.kallsyms1
ld.lld: error: .btf.vmlinux.bin.o: unknown file type
make[1]: *** [Makefile:1159: vmlinux] Error 1

I was not really able to see what is going wrong here. Attached is the
configuration that I ran into this with. If you need any other
information, please let me know!

Cheers,
Nathan


config.gz
Description: application/gzip


Re: [PATCH 2/2] powerpc/vdso: Link with ld.lld when requested

2022-05-09 Thread Nathan Chancellor
On Mon, May 09, 2022 at 02:24:40PM -0700, Nick Desaulniers wrote:
> On Mon, May 9, 2022 at 1:47 PM Nathan Chancellor  wrote:
> >
> > The PowerPC vDSO is linked with $(CC) instead of $(LD), which means the
> > default linker of the compiler is used instead of the linker requested
> > by the builder.
> >
> >   $ make ARCH=powerpc LLVM=1 mrproper defconfig arch/powerpc/kernel/vdso/
> >   ...
> >
> >   $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg
> >
> >   File: arch/powerpc/kernel/vdso/vdso32.so.dbg
> >   String dump of section '.comment':
> >   [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
> >
> >   File: arch/powerpc/kernel/vdso/vdso64.so.dbg
> >   String dump of section '.comment':
> >   [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
> >
> > The compiler option '-fuse-ld' tells the compiler which linker to use
> > when it is invoked as both the compiler and linker. Use '-fuse-ld=lld'
> > when LD=ld.lld has been specified (CONFIG_LD_IS_LLD) so that the vDSO is
> > linked with the same linker as the rest of the kernel.
> >
> >   $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg
> >
> >   File: arch/powerpc/kernel/vdso/vdso32.so.dbg
> >   String dump of section '.comment':
> >   [ 0] Linker: LLD 14.0.0
> >   [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
> >
> >   File: arch/powerpc/kernel/vdso/vdso64.so.dbg
> >   String dump of section '.comment':
> >   [ 0] Linker: LLD 14.0.0
> >   [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
> >
> > LD can be a full path to ld.lld, which will not be handled properly by
> > '-fuse-ld=lld' if the full path to ld.lld is outside of the compiler's
> > search path. '-fuse-ld' can take a path to the linker but it is
> > deprecated in clang 12.0.0; '--ld-path' is preferred for this scenario.
> >
> > Use '--ld-path' if it is supported, as it will handle a full path or
> > just 'ld.lld' properly. See the LLVM commit below for the full details
> > of '--ld-path'.
> 
> Perhaps worth adding some additional background from the cover letter
> to the commit message that will actually go into the kernel,
> particularly:
> 1. Kbuild mostly invokes the compiler and linker distinctly; the ppc
> vdso code uses the compiler as the linker driver though.
> 2. When doing so, depending on how the compiler was configured, the
> implicit default linker the compiler invokes might not match $LD.

Sure, I think I can clear up these two points with something like:

"The PowerPC vDSO uses $(CC) to link, which differs from the rest of the
kernel, which uses $(LD) directly. As a result, the default linker of
the compiler is used, which may differ from the linker requested by the
builder. For example:



LLVM=1 sets LD=ld.lld but ld.lld is not used to link the vDSO; GNU ld is
because "ld" is the default linker for clang on most Linux platforms."

Thoughts?

> 3. This is a problem for LTO since clang may try to invoke ld.gold,
> which is not supported as of
> commit 75959d44f9dc ("kbuild: Fail if gold linker is detected")

Technically, it seemed like ld.bfd was being invoked but the LLVMgold
plugin did not exist. Regardless, moving to ld.lld will resolve that,
since the LLVMgold plugin won't be needed.

> 4. Using the linker as the driver can cause ld.bfd 2.26 to crash.
> https://lore.kernel.org/all/b2066ccd-2b81-6032-08e3-41105b400...@csgroup.eu/
> (Though, I wonder if that's because I was trying to add
> --orphan-handling=warn, which we're not yet doing for the ppc vdso
> AFAICT).

I can add this if necessary but it seemed like there might have been
other problems reported? I could just add a blanket "linker driver had
issues, we'll try again later" or something of that effect?

> Reviewed-by: Nick Desaulniers 

Thank you for the review as always!

> >
> > Link: https://github.com/ClangBuiltLinux/linux/issues/774
> > Link: 
> > https://github.com/llvm/llvm-project/commit/1bc5c84710a8c73ef21295e63c19d10a8c71f2f5
> > Signed-off-by: Nathan Chancellor 
> > ---
> >  arch/powerpc/kernel/vdso/Makefile | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/powerpc/kernel/vdso/Makefile 
> > b/arch/powerpc/kernel/vdso/Makefile
> > index 954974287ee7..096b0bf1335f 100644
> > --- a/arch/powerpc/kernel/vdso/Makefile
> > +++ b/arch/powerpc/kernel/vdso/Makefile
> > @@ -48,6 +48,7 @@ UBSAN_SANITIZE := n
> >  KASAN_SANITIZE := n
> >
> >  ccflags-y := -shared -fno-common -fno-builtin -nostdlib 
> > -Wl,--hash-style=both
> > +ccflags-$(CONFIG_LD_IS_LLD) += $(call 
> > cc-option,--ld-path=$(LD),-fuse-ld=lld)
> >
> >  CC32FLAGS := -Wl,-soname=linux-vdso32.so.1 -m32
> >  AS32FLAGS := -D__VDSO32__ -s
> > --
> > 2.36.1
> >
> 
> 
> -- 
> Thanks,
> ~Nick Desaulniers


Re: [PATCH v2 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-09 Thread Mike Kravetz
On 5/8/22 02:36, Baolin Wang wrote:
> On some architectures (like ARM64), it can support CONT-PTE/PMD size
> hugetlb, which means it can support not only PMD/PUD size hugetlb:
> 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
> size specified.
> 
> When migrating a hugetlb page, we will get the relevant page table
> entry by huge_pte_offset() only once to nuke it and remap it with
> a migration pte entry. This is correct for PMD or PUD size hugetlb,
> since they always contain only one pmd entry or pud entry in the
> page table.
> 
> However this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
> since they can contain several continuous pte or pmd entry with
> same page table attributes. So we will nuke or remap only one pte
> or pmd entry for this CONT-PTE/PMD size hugetlb page, which is
> not expected for hugetlb migration. The problem is we can still
> continue to modify the subpages' data of a hugetlb page during
> migrating a hugetlb page, which can cause a serious data consistent
> issue, since we did not nuke the page table entry and set a
> migration pte for the subpages of a hugetlb page.
> 
> To fix this issue, we should change to use huge_ptep_clear_flush()
> to nuke a hugetlb page table, and remap it with set_huge_pte_at()
> and set_huge_swap_pte_at() when migrating a hugetlb page, which
> already considered the CONT-PTE or CONT-PMD size hugetlb.
> 
> Signed-off-by: Baolin Wang 
> ---
>  mm/rmap.c | 24 ++--
>  1 file changed, 18 insertions(+), 6 deletions(-)

With the addition of !CONFIG_HUGETLB_PAGE stubs,

Reviewed-by: Mike Kravetz 
-- 
Mike Kravetz


[PATCH 2/2] powerpc/vdso: Link with ld.lld when requested

2022-05-09 Thread Nathan Chancellor
The PowerPC vDSO is linked with $(CC) instead of $(LD), which means the
default linker of the compiler is used instead of the linker requested
by the builder.

  $ make ARCH=powerpc LLVM=1 mrproper defconfig arch/powerpc/kernel/vdso/
  ...

  $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg

  File: arch/powerpc/kernel/vdso/vdso32.so.dbg
  String dump of section '.comment':
  [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  File: arch/powerpc/kernel/vdso/vdso64.so.dbg
  String dump of section '.comment':
  [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

The compiler option '-fuse-ld' tells the compiler which linker to use
when it is invoked as both the compiler and linker. Use '-fuse-ld=lld'
when LD=ld.lld has been specified (CONFIG_LD_IS_LLD) so that the vDSO is
linked with the same linker as the rest of the kernel.

  $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg

  File: arch/powerpc/kernel/vdso/vdso32.so.dbg
  String dump of section '.comment':
  [ 0] Linker: LLD 14.0.0
  [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  File: arch/powerpc/kernel/vdso/vdso64.so.dbg
  String dump of section '.comment':
  [ 0] Linker: LLD 14.0.0
  [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

LD can be a full path to ld.lld, which will not be handled properly by
'-fuse-ld=lld' if the full path to ld.lld is outside of the compiler's
search path. '-fuse-ld' can take a path to the linker but it is
deprecated in clang 12.0.0; '--ld-path' is preferred for this scenario.

Use '--ld-path' if it is supported, as it will handle a full path or
just 'ld.lld' properly. See the LLVM commit below for the full details
of '--ld-path'.

Link: https://github.com/ClangBuiltLinux/linux/issues/774
Link: 
https://github.com/llvm/llvm-project/commit/1bc5c84710a8c73ef21295e63c19d10a8c71f2f5
Signed-off-by: Nathan Chancellor 
---
 arch/powerpc/kernel/vdso/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/vdso/Makefile 
b/arch/powerpc/kernel/vdso/Makefile
index 954974287ee7..096b0bf1335f 100644
--- a/arch/powerpc/kernel/vdso/Makefile
+++ b/arch/powerpc/kernel/vdso/Makefile
@@ -48,6 +48,7 @@ UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 
 ccflags-y := -shared -fno-common -fno-builtin -nostdlib -Wl,--hash-style=both
+ccflags-$(CONFIG_LD_IS_LLD) += $(call cc-option,--ld-path=$(LD),-fuse-ld=lld)
 
 CC32FLAGS := -Wl,-soname=linux-vdso32.so.1 -m32
 AS32FLAGS := -D__VDSO32__ -s
-- 
2.36.1



[PATCH 1/2] powerpc/vdso: Remove unused ENTRY in linker scripts

2022-05-09 Thread Nathan Chancellor
When linking vdso{32,64}.so.dbg with ld.lld, there is a warning about
not finding _start for the starting address:

  ld.lld: warning: cannot find entry symbol _start; not setting start address
  ld.lld: warning: cannot find entry symbol _start; not setting start address

Looking at GCC + GNU ld, the entry point address is 0x0:

  $ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):"
  File: vdso32.so.dbg
Entry point address:   0x0
  File: vdso64.so.dbg
Entry point address:   0x0

This matches what ld.lld emits:

  $ powerpc64le-linux-gnu-readelf -p .comment vdso{32,64}.so.dbg

  File: vdso32.so.dbg

  String dump of section '.comment':
[ 0]  Linker: LLD 14.0.0
[14]  clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  File: vdso64.so.dbg

  String dump of section '.comment':
[ 0]  Linker: LLD 14.0.0
[14]  clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  $ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):"
  File: vdso32.so.dbg
Entry point address:   0x0
  File: vdso64.so.dbg
Entry point address:   0x0

Remove ENTRY to remove the warning, as it is unnecessary for the vDSO to
function correctly.

Signed-off-by: Nathan Chancellor 
---
 arch/powerpc/kernel/vdso/vdso32.lds.S | 1 -
 arch/powerpc/kernel/vdso/vdso64.lds.S | 1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/vdso/vdso32.lds.S 
b/arch/powerpc/kernel/vdso/vdso32.lds.S
index 58e0099f70f4..e0d19d74455f 100644
--- a/arch/powerpc/kernel/vdso/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso/vdso32.lds.S
@@ -13,7 +13,6 @@ OUTPUT_FORMAT("elf32-powerpcle", "elf32-powerpcle", 
"elf32-powerpcle")
 OUTPUT_FORMAT("elf32-powerpc", "elf32-powerpc", "elf32-powerpc")
 #endif
 OUTPUT_ARCH(powerpc:common)
-ENTRY(_start)
 
 SECTIONS
 {
diff --git a/arch/powerpc/kernel/vdso/vdso64.lds.S 
b/arch/powerpc/kernel/vdso/vdso64.lds.S
index 0288cad428b0..1a4a7bc4c815 100644
--- a/arch/powerpc/kernel/vdso/vdso64.lds.S
+++ b/arch/powerpc/kernel/vdso/vdso64.lds.S
@@ -13,7 +13,6 @@ OUTPUT_FORMAT("elf64-powerpcle", "elf64-powerpcle", 
"elf64-powerpcle")
 OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc")
 #endif
 OUTPUT_ARCH(powerpc:common64)
-ENTRY(_start)
 
 SECTIONS
 {
-- 
2.36.1



[PATCH 0/2] Link the PowerPC vDSO with ld.lld

2022-05-09 Thread Nathan Chancellor
Hi all,

This series is an alternative to the one proposed by Nick before the
PowerPC vDSO unification in commit fd1feade75fb ("powerpc/vdso: Merge
vdso64 and vdso32 into a single directory"):

https://lore.kernel.org/20200901222523.1941988-1-ndesaulni...@google.com/

Normally, we try to make compiling and linking two separate stages so
that they can be done by $(CC) and $(LD) respectively, which is more in
line with what the user expects, versus using the compiler as a linker
driver and relying on the implicit default linker value. However, as
shown in the above thread, getting this right for the PowerPC vDSO is a
little tricky due to the linker emulation values.

The unification might make this easier but that needs further
investigation. To avoid regressing ld.bfd while enabling support for
linking the vDSO with ld.lld, we can tell the compiler to use ld.lld via
either '--ld-path=' (clang 12.0.0) or '-fuse-ld=lld'.

The first patch avoids a warning from ld.lld when linking both vDSO
objects and the second patch adds the flags.

This should help avoid the issue noticed during Alexey's LTO bring up:

https://lore.kernel.org/cakwvodmumhqhqhdcpwjmniqqpvwojb9mbukf3rr0bl+h+da...@mail.gmail.com/

Nathan Chancellor (2):
  powerpc/vdso: Remove unused ENTRY in linker scripts
  powerpc/vdso: Link with ld.lld when requested

 arch/powerpc/kernel/vdso/Makefile | 1 +
 arch/powerpc/kernel/vdso/vdso32.lds.S | 1 -
 arch/powerpc/kernel/vdso/vdso64.lds.S | 1 -
 3 files changed, 1 insertion(+), 2 deletions(-)


base-commit: f06351f8c0c85e2d53e73c53a33b4ef55b4ad6de
-- 
2.36.1



Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte

2022-05-09 Thread Mike Kravetz
On 5/9/22 01:46, Baolin Wang wrote:
> 
> 
> On 5/9/2022 1:46 PM, Christophe Leroy wrote:
>>
>>
>> Le 08/05/2022 à 15:09, Baolin Wang a écrit :
>>>
>>>
>>> On 5/8/2022 7:09 PM, Muchun Song wrote:
 On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote:
> It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
> table when unmapping or migrating a hugetlb page, and will change
> to use huge_ptep_clear_flush() instead in the following patches.
>
> So this is a preparation patch, which changes the
> huge_ptep_clear_flush()
> to return the original pte to help to nuke a hugetlb page table.
>
> Signed-off-by: Baolin Wang 
> Acked-by: Mike Kravetz 

 Reviewed-by: Muchun Song 
>>>
>>> Thanks for reviewing.
>>>

 But one nit below:

 [...]
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8605d7e..61a21af 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct
> *mm, struct vm_area_struct *vma,
>    ClearHPageRestoreReserve(new_page);
>    /* Break COW or unshare */
> -    huge_ptep_clear_flush(vma, haddr, ptep);
> +    (void)huge_ptep_clear_flush(vma, haddr, ptep);

 Why add a "(void)" here? Is there any warning if no "(void)"?
 IIUC, I think we can remove this, right?
>>>
>>> I did not meet any warning without the casting, but this is per Mike's
>>> comment[1] to make the code consistent with other functions casting to
>>> void type explicitly in hugetlb.c file.
>>>
>>> [1]
>>> https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/
>>>
>>
>> As far as I understand, Mike said that you should be accompagnied with a
>> big fat comment explaining why we ignore the returned value from
>> huge_ptep_clear_flush(). >
>> By the way huge_ptep_clear_flush() is not declared 'must_check' so this
>> cast is just visual polution and should be removed.
>>
>> In the meantime the comment suggested by Mike should be added instead.
> Sorry for my misunderstanding. I just follow the explicit void casting like 
> other places in hugetlb.c file. And I am not sure if it is useful adding some 
> comments like below, since we did not need the original pte value in the COW 
> case mapping with a new page, and the code is more readable already I think.
> 
> Mike, could you help to clarify what useful comments would you like? and 
> remove the explicit void casting? Thanks.
> 

Sorry for the confusion.

In the original commit, it seemed odd to me that the signature of the
function was changing and there was not an associated change to the only
caller of the function.  I did suggest casting to void or adding a comment.
As Christophe mentions, the cast to void is not necessary.  In addition,
there really isn't a need for a comment as the calling code is not changed.

The original version of the commit without either is actually preferable.
The commit message does say this is a preparation patch and the return
value will be used in later patches.
Again, sorry for the confusion.
-- 
Mike Kravetz


Re: [PATCH kernel] powerpc/llvm/lto: Allow LLVM LTO builds

2022-05-09 Thread Nathan Chancellor
Hi Alexey,

On Mon, May 09, 2022 at 05:42:59PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 5/9/22 15:18, Alexey Kardashevskiy wrote:
> > 
> > 
> > On 5/4/22 07:21, Nick Desaulniers wrote:
> > > On Thu, Apr 28, 2022 at 11:46 PM Alexey Kardashevskiy
> > >  wrote:
> > > > 
> > > > This enables LTO_CLANG builds on POWER with the upstream version of
> > > > LLVM.
> > > > 
> > > > LTO optimizes the output vmlinux binary and this may affect the FTP
> > > > alternative section if alt branches use "bc" (Branch Conditional) which
> > > > is limited by 16 bit offsets. This shows up in errors like:
> > > > 
> > > > ld.lld: error: InputSection too large for range extension thunk
> > > > vmlinux.o:(__ftr_alt_97+0xF0)
> > > > 
> > > > This works around the issue by replacing "bc" in FTR_SECTION_ELSE with
> > > > "b" which allows 26 bit offsets.
> > > > 
> > > > This catches the problem instructions in vmlinux.o before it LTO'ed:
> > > > 
> > > > $ objdump -d -M raw -j __ftr_alt_97 vmlinux.o | egrep '\S+\s*\'
> > > >    30:   00 00 82 40 bc  4,eq,30 <__ftr_alt_97+0x30>
> > > >    f0:   00 00 82 40 bc  4,eq,f0 <__ftr_alt_97+0xf0>
> > > > 
> > > > This allows LTO builds for ppc64le_defconfig plus LTO options.
> > > > Note that DYNAMIC_FTRACE/FUNCTION_TRACER is not supported by LTO builds
> > > > but this is not POWERPC-specific.
> > > 
> > > $ ARCH=powerpc make LLVM=1 -j72 ppc64le_defconfig
> > > $ ARCH=powerpc make LLVM=1 -j72 menuconfig
> > > 
> > > $ ARCH=powerpc make LLVM=1 -j72
> > > ...
> > >    VDSO64L arch/powerpc/kernel/vdso/vdso64.so.dbg
> > > /usr/bin/powerpc64le-linux-gnu-ld:
> > > /android0/llvm-project/llvm/build/bin/../lib/LLVMgold.so: error
> > > loading plugin:
> > > /android0/llvm-project/llvm/build/bin/../lib/LLVMgold.so: cannot open
> > > shared object file: No such file or directory
> > > clang-15: error: linker command failed with exit code 1 (use -v to see
> > > invocation)
> > > make[1]: *** [arch/powerpc/kernel/vdso/Makefile:67:
> > > arch/powerpc/kernel/vdso/vdso64.so.dbg] Error 1
> > > 
> > > Looks like LLD isn't being invoked correctly to link the vdso.
> > > Probably need to revisit
> > > https://lore.kernel.org/lkml/20200901222523.1941988-1-ndesaulni...@google.com/
> > > 
> > > How were you working around this issue? Perhaps you built clang to
> > > default to LLD? (there's a cmake option for that)
> > 
> > 
> > What option is that? I only add  -DLLVM_ENABLE_LLD=ON  which (I think)

The option Nick is referring to here is CLANG_DEFAULT_LINKER, which sets
the default linker when clang is invoked as the linker driver, which the
PowerPC vDSO Makefile does. We have been trying to move all parts of the
kernel to compile with $(CC) and link with $(LD) so that the default
linker invoked by the compiler is taken out of the equation.

For what it's worth, I think that the error Nick is seeing is due to a
lack of the LLVMgold.so plugin on his system, which is built when
-DLLVM_BINUTILS_INCDIR=... is included in the list of CMake variables,
which you have. The LLVMgold.so plugin is needed when doing LTO with
GNU ld, which is the case with the vDSO currently for the reason I
mentioned above. We could work around this with Nick's proposed
'-fuse-ld=lld' patch or just disable LTO for the vDSO.

https://llvm.org/docs/GoldPlugin.html

My version of the hack would probably be:

ccflags-$(CONFIG_LD_IS_LLD) += -fuse-ld=lld
asflags-$(CONFIG_LD_IS_LLD) += -fuse-ld=lld

> > tells cmake to use lld to link the LLVM being built but does not seem to
> > tell what the built clang should do.

Right, -DLLVM_ENABLE_LLD=ON is equivalent to -DLLVM_USE_LINKER=lld,
except when doing a multi-stage build:

https://llvm.org/docs/CMake.html#llvm-related-variables

> > 
> > Without -DLLVM_ENABLE_LLD=ON, building just fails:
> > 
> > [fstn1-p1 ~/pbuild/llvm/llvm-lto-latest-cleanbuild]$ ninja -j 100
> > [619/3501] Linking CXX executable bin/not
> > FAILED: bin/not
> > : && /usr/bin/clang++ -fPIC -fvisibility-inlines-hidden
> > -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra
> > -Wno-unused-parameter -Wwrite-strings -Wcast-qual
> > -Wmissing-field-initializers -pedantic -Wno-long-long
> > -Wc++98-compat-extra-semi -Wimplicit-fallthrough
> > -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor
> > -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion
> > -Wmisleading-indentation -fdiagnostics-color -ffunction-sections
> > -fdata-sections -flto -O3 -DNDEBUG -flto
> > -Wl,-rpath-link,/home/aik/pbuild/llvm/llvm-lto-latest-cleanbuild/./lib
> > -Wl,--gc-sections utils/not/CMakeFiles/not.dir/not.cpp.o -o bin/not
> > -Wl,-rpath,"\$ORIGIN/../lib"  -lpthread  lib/libLLVMSupport.a  -lrt
> > -ldl  -lpthread  -lm  /usr/lib/powerpc64le-linux-gnu/libz.so
> > /usr/lib/powerpc64le-linux-gnu/libtinfo.so  lib/libLLVMDemangle.a && :
> > /usr/bin/ld: lib/libLLVMSupport.a: error adding symbols: archive has no
> > index; run ranlib to add one
> > clang: error: linker command failed with exit code 1 

Re: [PATCH 2/3] of: dynamic: add of_node_alloc() and of_node_free()

2022-05-09 Thread Rob Herring
On Fri, May 6, 2022 at 5:45 AM Clément Léger  wrote:
>
> Le Thu, 5 May 2022 14:43:54 -0500,
> Rob Herring  a écrit :
>
> > On Wed, May 04, 2022 at 05:40:32PM +0200, Clément Léger wrote:
> > > Add functions which allows to create and free nodes.
> > >
> > > Signed-off-by: Clément Léger 
> > > ---
> > >  drivers/of/dynamic.c | 59 
> > >  include/linux/of.h   |  9 +++
> > >  2 files changed, 58 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> > > index e8700e509d2e..ec28e5ba2969 100644
> > > --- a/drivers/of/dynamic.c
> > > +++ b/drivers/of/dynamic.c
> > > @@ -455,6 +455,54 @@ struct property *__of_prop_dup(const struct property 
> > > *prop, gfp_t allocflags)
> > >  prop->length, allocflags);
> > >  }
> > >
> > > +/**
> > > + * of_node_free - Free a node allocated dynamically.
> > > + * @node:  Node to be freed
> > > + */
> > > +void of_node_free(const struct device_node *node)
> > > +{
> > > +   kfree(node->full_name);
> > > +   kfree(node->data);
> > > +   kfree(node);
> > > +}
> > > +EXPORT_SYMBOL(of_node_free);
> >
> > This shouldn't be needed. Nodes are refcounted, so any caller should
> > just do a put.
>
> Acked. Do you want the name to be allocated as part of the node
> allocation also ?

Yeah, I think that would be fine.

Rob


Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-05-09 Thread Peter Xu
On Fri, May 06, 2022 at 12:07:13PM -0700, Mike Kravetz wrote:
> On 5/3/22 03:03, Gerald Schaefer wrote:
> > On Tue, 3 May 2022 10:19:46 +0800
> > Baolin Wang  wrote:
> >> On 5/2/2022 10:02 PM, Gerald Schaefer wrote:

[...]

> >> Please see previous code, we'll use the original pte value to check if 
> >> it is uffd-wp armed, and if need to mark it dirty though the hugetlbfs 
> >> is set noop_dirty_folio().
> >>
> >> pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
> > 
> > Uh, ok, that wouldn't work on s390, but we also don't have
> > CONFIG_PTE_MARKER_UFFD_WP / HAVE_ARCH_USERFAULTFD_WP set, so
> > I guess we will be fine (for now).
> > 
> > Still, I find it a bit unsettling that pte_install_uffd_wp_if_needed()
> > would work on a potential hugetlb *pte, directly de-referencing it
> > instead of using huge_ptep_get().
> > 
> > The !pte_none(*pte) check at the beginning would be broken in the
> > hugetlb case for s390 (not sure about other archs, but I think s390
> > might be the only exception strictly requiring huge_ptep_get()
> > for de-referencing hugetlb *pte pointers).

We could have used is_vm_hugetlb_page(vma) within the helper so as to
properly use either generic pte or hugetlb version of pte fetching.  We may
want to conditionally do set_[huge_]pte_at() too at the end.

I could prepare a patch for that even if it's not really anything urgently
needed. I assume that won't need to block this patchset since we need the
pteval for pte_dirty() check anyway and uffd-wp definitely needs it too.

Thanks,

-- 
Peter Xu



Re: request_module DoS

2022-05-09 Thread Luis Chamberlain
On Mon, May 09, 2022 at 09:23:39PM +1000, Michael Ellerman wrote:
> Herbert Xu  writes:
> > Hi:
> >
> > There are some code paths in the kernel where you can reliably
> > trigger a request_module of a non-existant module.  For example,
> > if you attempt to load a non-existent crypto algorithm, or create
> > a socket of a non-existent network family, it will result in a
> > request_module call that is guaranteed to fail.
> >
> > As user-space can do this repeatedly, it can quickly overwhelm
> > the concurrency limit in kmod.  This in itself is expected,
> > however, at least on some platforms this appears to result in
> > a live-lock.  Here is an example triggered by stress-ng on ppc64:
> >
> > [  529.853264] request_module: kmod_concurrent_max (0) close to 0 
> > (max_modprobes: 50), for module crypto-aegis128l, throttling...
> ...
> > [  580.414590] __request_module: 25 callbacks suppressed
> > [  580.414597] request_module: kmod_concurrent_max (0) close to 0 
> > (max_modprobes: 50), for module crypto-aegis256-all, throttling...
> > [  580.423082] watchdog: CPU 784 self-detected hard LOCKUP @ 
> > plpar_hcall_norets_notrace+0x18/0x2c
> > [  580.423097] watchdog: CPU 784 TB:1297691958559475, last heartbeat 
> > TB:1297686321743840 (11009ms ago)
> > [  580.423099] Modules linked in: cast6_generic cast5_generic cast_common 
> > camellia_generic blowfish_generic blowfish_common tun nft_fib_inet 
> > nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 
> > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack 
> > nf_defrag_ipv6 nf_defrag_ipv4 rfkill bonding tls ip_set nf_tables nfnetlink 
> > pseries_rng binfmt_misc drm drm_panel_orientation_quirks xfs libcrc32c 
> > sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto dm_mirror 
> > dm_region_hash dm_log dm_mod fuse
> > [  580.423136] CPU: 784 PID: 77071 Comm: stress-ng Kdump: loaded Not 
> > tainted 5.14.0-55.el9.ppc64le #1
> > [  580.423139] NIP:  c00f8ff4 LR: c01f7c38 CTR: 
> > 
> > [  580.423140] REGS: c043fdd7bd60 TRAP: 0900   Not tainted  
> > (5.14.0-55.el9.ppc64le)
> > [  580.423142] MSR:  8280b033   
> > CR: 28008202  XER: 2004
> > [  580.423148] CFAR: 0c00 IRQMASK: 1 
> >GPR00: 28008202 c044c46b3850 c2a46f00 
> >  
> >GPR04:   0010 
> > c2a83060 
> >GPR08:  0001 0001 
> >  
> >GPR12: c01b9530 c043ffe16700 00020117 
> > 10185ea8 
> >GPR16: 10212150 10186198 101863a0 
> > 1021b3c0 
> >GPR20: 0001  0001 
> > 00ff 
> >GPR24: c043f4a00e14 c043fafe0e00 0c44 
> >  
> >GPR28: c043f4a00e00 c043f4a00e00 c21e0e00 
> > c2561aa0 
> > [  580.423166] NIP [c00f8ff4] plpar_hcall_norets_notrace+0x18/0x2c
> > [  580.423168] LR [c01f7c38] 
> > __pv_queued_spin_lock_slowpath+0x528/0x530
> > [  580.423173] Call Trace:
> > [  580.423174] [c044c46b3850] [00016b60] 0x16b60 
> > (unreliable)
> > [  580.423177] [c044c46b3910] [c0ea6948] 
> > _raw_spin_lock_irqsave+0xa8/0xc0
> > [  580.423182] [c044c46b3940] [c01dd7c0] 
> > prepare_to_wait_event+0x40/0x200
> > [  580.423185] [c044c46b39a0] [c019e9e0] 
> > __request_module+0x320/0x510
> > [  580.423188] [c044c46b3ac0] [c06f1a14] 
> > crypto_alg_mod_lookup+0x1e4/0x2e0
> > [  580.423192] [c044c46b3b60] [c06f2178] 
> > crypto_alloc_tfm_node+0xa8/0x1a0
> > [  580.423194] [c044c46b3be0] [c06f84f8] 
> > crypto_alloc_aead+0x38/0x50
> > [  580.423196] [c044c46b3c00] [c072cba0] aead_bind+0x70/0x140
> > [  580.423199] [c044c46b3c40] [c0727824] alg_bind+0xb4/0x210
> > [  580.423201] [c044c46b3cc0] [c0bc2ad4] __sys_bind+0x114/0x160
> > [  580.423205] [c044c46b3d90] [c0bc2b48] sys_bind+0x28/0x40
> > [  580.423207] [c044c46b3db0] [c0030880] 
> > system_call_exception+0x160/0x300
> > [  580.423209] [c044c46b3e10] [c000c168] 
> > system_call_vectored_common+0xe8/0x278
> > [  580.423213] --- interrupt: 3000 at 0x7fff9b824464
> > [  580.423214] NIP:  7fff9b824464 LR:  CTR: 
> > 
> > [  580.423215] REGS: c044c46b3e80 TRAP: 3000   Not tainted  
> > (5.14.0-55.el9.ppc64le)
> > [  580.423216] MSR:  8280f033   
> > CR: 42004802  XER: 
> > [  580.423221] IRQMASK: 0 
> >GPR00: 0147 7fffdcff2780 7fff9b917100 
> > 0004 
> >GPR04: 7fffdcff27e0 0058  
> >  
> >GPR08: 

Re: [PATCH 24/30] panic: Refactor the panic path

2022-05-09 Thread Guilherme G. Piccoli
On 29/04/2022 13:04, Guilherme G. Piccoli wrote:
> On 27/04/2022 21:28, Randy Dunlap wrote:
>>
>>
>> On 4/27/22 15:49, Guilherme G. Piccoli wrote:
>>> +   crash_kexec_post_notifiers
>>> +   This was DEPRECATED - users should always prefer the
>>
>>  This is DEPRECATED - users should always prefer the
>>
>>> +   parameter "panic_notifiers_level" - check its entry
>>> +   in this documentation for details on how it works.
>>> +   Setting this parameter is exactly the same as setting
>>> +   "panic_notifiers_level=4".
>>
> 
> Thanks Randy, for your suggestion - but I confess I couldn't understand
> it properly. It's related to spaces/tabs, right? What you suggest me to
> change in this formatting? Just by looking the email I can't parse.
> 
> Cheers,
> 
> 
> Guilherme

Complete lack of attention from me, apologies!
The suggestions was s/was/is - already fixed for V2, thanks Randy.


Re: [PATCH 10/30] alpha: Clean-up the panic notifier code

2022-05-09 Thread Guilherme G. Piccoli
On 27/04/2022 19:49, Guilherme G. Piccoli wrote:
> The alpha panic notifier has some code issues, not following
> the conventions of other notifiers. Also, it might halt the
> machine but still it is set to run as early as possible, which
> doesn't seem to be a good idea.
> 
> This patch cleans the code, and set the notifier to run as the
> latest, following the same approach other architectures are doing.
> Also, we remove the unnecessary include of a header already
> included indirectly.
> 
> Cc: Ivan Kokshaysky 
> Cc: Matt Turner 
> Cc: Richard Henderson 
> Signed-off-by: Guilherme G. Piccoli 
> ---
>  arch/alpha/kernel/setup.c | 36 +++-
>  1 file changed, 15 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
> index b4fbbba30aa2..d88bdf852753 100644
> --- a/arch/alpha/kernel/setup.c
> +++ b/arch/alpha/kernel/setup.c
> @@ -41,19 +41,11 @@
>  #include 
>  #include 
>  #endif
> -#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  
> -static int alpha_panic_event(struct notifier_block *, unsigned long, void *);
> -static struct notifier_block alpha_panic_block = {
> - alpha_panic_event,
> -NULL,
> -INT_MAX /* try to do it first */
> -};
> -
>  #include 
>  #include 
>  #include 
> @@ -435,6 +427,21 @@ static const struct sysrq_key_op srm_sysrq_reboot_op = {
>  };
>  #endif
>  
> +static int alpha_panic_event(struct notifier_block *this,
> +  unsigned long event, void *ptr)
> +{
> + /* If we are using SRM and serial console, just hard halt here. */
> + if (alpha_using_srm && srmcons_output)
> + __halt();
> +
> + return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block alpha_panic_block = {
> + .notifier_call = alpha_panic_event,
> + .priority = INT_MIN, /* may not return, do it last */
> +};
> +
>  void __init
>  setup_arch(char **cmdline_p)
>  {
> @@ -1427,19 +1434,6 @@ const struct seq_operations cpuinfo_op = {
>   .show   = show_cpuinfo,
>  };
>  
> -
> -static int
> -alpha_panic_event(struct notifier_block *this, unsigned long event, void 
> *ptr)
> -{
> -#if 1
> - /* FIXME FIXME FIXME */
> - /* If we are using SRM and serial console, just hard halt here. */
> - if (alpha_using_srm && srmcons_output)
> - __halt();
> -#endif
> -return NOTIFY_DONE;
> -}
> -
>  static __init int add_pcspkr(void)
>  {
>   struct platform_device *pd;


Hi folks, I'm updating Richard's email and re-sending the V1, any
reviews are greatly appreciated!

Cheers,


Guilherme


Re: [PATCH v6 22/29] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector

2022-05-09 Thread Thomas Gleixner
On Thu, May 05 2022 at 17:00, Ricardo Neri wrote:
> + if (is_hpet_hld_interrupt(hdata)) {
> + /*
> +  * Kick the timer first. If the HPET channel is periodic, it
> +  * helps to reduce the delta between the expected TSC value and
> +  * its actual value the next time the HPET channel fires.
> +  */
> + kick_timer(hdata, !(hdata->has_periodic));
> +
> + if (cpumask_weight(hld_data->monitored_cpumask) > 1) {
> + /*
> +  * Since we cannot know the source of an NMI, the best
> +  * we can do is to use a flag to indicate to all online
> +  * CPUs that they will get an NMI and that the source of
> +  * that NMI is the hardlockup detector. Offline CPUs
> +  * also receive the NMI but they ignore it.
> +  *
> +  * Even though we are in NMI context, we have concluded
> +  * that the NMI came from the HPET channel assigned to
> +  * the detector, an event that is infrequent and only
> +  * occurs in the handling CPU. There should not be races
> +  * with other NMIs.
> +  */
> + cpumask_copy(hld_data->inspect_cpumask,
> +  cpu_online_mask);
> +
> + /* If we are here, IPI shorthands are enabled. */
> + apic->send_IPI_allbutself(NMI_VECTOR);

So if the monitored cpumask is a subset of online CPUs, which is the
case when isolation features are enabled, then you still send NMIs to
those isolated CPUs. I'm sure the isolation folks will be enthused.

Thanks,

tglx


Re: [PATCH v6 21/29] x86/nmi: Add an NMI_WATCHDOG NMI handler category

2022-05-09 Thread Thomas Gleixner
On Thu, May 05 2022 at 17:00, Ricardo Neri wrote:
> Add a NMI_WATCHDOG as a new category of NMI handler. This new category
> is to be used with the HPET-based hardlockup detector. This detector
> does not have a direct way of checking if the HPET timer is the source of
> the NMI. Instead, it indirectly estimates it using the time-stamp counter.
>
> Therefore, we may have false-positives in case another NMI occurs within
> the estimated time window. For this reason, we want the handler of the
> detector to be called after all the NMI_LOCAL handlers. A simple way
> of achieving this with a new NMI handler category.
>
> @@ -379,6 +385,10 @@ static noinstr void default_do_nmi(struct pt_regs *regs)
>   }
>   raw_spin_unlock(_reason_lock);
>  
> + handled = nmi_handle(NMI_WATCHDOG, regs);
> + if (handled == NMI_HANDLED)
> + goto out;
> +

How is this supposed to work reliably?

If perf is active and the HPET NMI and the perf NMI come in around the
same time, then nmi_handle(LOCAL) can swallow the NMI and the watchdog
won't be checked. Because MSI is strictly edge and the message is only
sent once, this can result in a stale watchdog, no?

Thanks,

tglx





Re: [PATCH] bug: Use normal relative pointers in 'struct bug_entry'

2022-05-09 Thread Michael Ellerman
Josh Poimboeuf  writes:
> With CONFIG_GENERIC_BUG_RELATIVE_POINTERS, the addr/file relative
> pointers are calculated weirdly: based on the beginning of the bug_entry
> struct address, rather than their respective pointer addresses.
>
> Make the relative pointers less surprising to both humans and tools by
> calculating them the normal way.
>
> Signed-off-by: Josh Poimboeuf 
> ---
...
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index ecbae1832de3..76252576d889 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -13,7 +13,8 @@
>  #ifdef CONFIG_DEBUG_BUGVERBOSE
>  .macro __EMIT_BUG_ENTRY addr,file,line,flags
>.section __bug_table,"aw"
> -5001: .4byte \addr - 5001b, 5002f - 5001b
> +5001: .4byte \addr - .
> +  .4byte 5002f - .
>.short \line, \flags
>.org 5001b+BUG_ENTRY_SIZE
>.previous
> @@ -24,7 +25,7 @@
>  #else
>  .macro __EMIT_BUG_ENTRY addr,file,line,flags
>.section __bug_table,"aw"
> -5001: .4byte \addr - 5001b
> +5001: .4byte \addr - .
>.short \flags
>.org 5001b+BUG_ENTRY_SIZE
>.previous

Embarrassingly, we have another copy of the logic, used in the C
versions, they need updating too:

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index ecbae1832de3..3fde35fd67f8 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -49,14 +49,14 @@
 #ifdef CONFIG_DEBUG_BUGVERBOSE
 #define _EMIT_BUG_ENTRY\
".section __bug_table,\"aw\"\n" \
-   "2:\t.4byte 1b - 2b, %0 - 2b\n" \
+   "2:\t.4byte 1b - ., %0 - .\n"   \
"\t.short %1, %2\n" \
".org 2b+%3\n"  \
".previous\n"
 #else
 #define _EMIT_BUG_ENTRY\
".section __bug_table,\"aw\"\n" \
-   "2:\t.4byte 1b - 2b\n"  \
+   "2:\t.4byte 1b - .\n"   \
"\t.short %2\n" \
".org 2b+%3\n"  \
".previous\n"


With that added it seems to be working correctly for me.

Acked-by: Michael Ellerman  (powerpc)


cheers


Re: [PATCH 1/3] termbits.h: create termbits-common.h for identical bits

2022-05-09 Thread Ilpo Järvinen
On Mon, 9 May 2022, Helge Deller wrote:

> Hello Ilpo,
> 
> On 5/9/22 11:34, Ilpo Järvinen wrote:
> > Some defines are the same across all archs. Move the most obvious
> > intersection to termbits-common.h.
> 
> I like your cleanup patches, but in this specific one, does it makes sense
> to split up together-belonging constants, e.g.
> 
> > diff --git a/arch/parisc/include/uapi/asm/termbits.h 
> > b/arch/parisc/include/uapi/asm/termbits.h
> > index 6017ee08f099..7f74a822b7ea 100644
> > --- a/arch/parisc/include/uapi/asm/termbits.h
> > +++ b/arch/parisc/include/uapi/asm/termbits.h
> > @@ -61,31 +61,15 @@ struct ktermios {
> >
> >
> >  /* c_iflag bits */
> > -#define IGNBRK 0x1
> > -#define BRKINT 0x2
> > -#define IGNPAR 0x4
> > -#define PARMRK 0x8
> > -#define INPCK  0x00010
> > -#define ISTRIP 0x00020
> > -#define INLCR  0x00040
> > -#define IGNCR  0x00080
> > -#define ICRNL  0x00100
> >  #define IUCLC  0x00200
> >  #define IXON   0x00400
> > -#define IXANY  0x00800
> >  #define IXOFF  0x01000
> >  #define IMAXBEL0x04000
> >  #define IUTF8  0x08000
> 
> In the hunk above you leave IUCLC, IXON, IXOFF... because they seem unique to 
> parisc.
> The other defines are then taken from generic header.
> Although this is correct, it leaves single values alone, which make it hard 
> to verify
> because you don't see the full list of values in one place.

While I too am fine either way, I don't think these are as strongly 
grouped as you seem to imply. There's no big advantage in having as much 
as possible within the same file. If somebody is looking for the meaning 
of these, these headers are no match when compared e.g. with stty manpage.

For c_iflag, the break, parity and cr related "groups" within c_iflag are 
moving completely to common header.

IXANY is probably only one close to borderline whether it kind of belongs 
to the same group as IXON/IXOFF (which both by chance both remained on the 
same side in the split). I don't think it does strongly enough to warrant 
keeping them next to each other but I'm open what opinions others have on 
it.

The rest in c_iflag don't seem to be strongly tied/grouped to the other 
defines within c_iflag. They're just bits that appear next/close to each 
other but are not tied by any significant meaning-based connection.

C_oflag is more messy. I exercised grouping based judgement with c_oflag 
where only the defines with all bits as zero would have moved to the 
common header breaking the groups very badly. That is, only CR0 would have 
moved and CR1-3 remained in arch headers, etc. which made no sense to do. 
One could argue, that since ONLCR (and perhaps CRDLY) are not moving, no 
other cr related defines should move either.

> > @@ -112,24 +96,6 @@ struct ktermios {
> >
> >  /* c_cflag bit meaning */
> >  #define CBAUD  0x100f
> > -#define  B00x  /* hang up */
> > -#define  B50   0x0001
> > -#define  B75   0x0002
> > -#define  B110  0x0003
> > -#define  B134  0x0004
> > -#define  B150  0x0005
> > -#define  B200  0x0006
> > -#define  B300  0x0007
> > -#define  B600  0x0008
> > -#define  B1200 0x0009
> > -#define  B1800 0x000a
> > -#define  B2400 0x000b
> > -#define  B4800 0x000c
> > -#define  B9600 0x000d
> > -#define  B192000x000e
> > -#define  B384000x000f
> > -#define EXTA B19200
> > -#define EXTB B38400
> 
> Here all baud values are dropped and will be taken from generic header, 
> which is good. 
> 
> That said, I think it's good to move away the second hunk,
> but maybe we should keep the first as is?
> 
> It's just a thought. Either way, I'm fine your patch if that's the
> way which is decided to go for all platforms.

Yes, lets wait and see what the others think.

Thanks for taking a look!

-- 
 i.


Re: [PATCH 1/3] termbits.h: create termbits-common.h for identical bits

2022-05-09 Thread Helge Deller
Hello Ilpo,

On 5/9/22 11:34, Ilpo Järvinen wrote:
> Some defines are the same across all archs. Move the most obvious
> intersection to termbits-common.h.

I like your cleanup patches, but in this specific one, does it makes sense
to split up together-belonging constants, e.g.

> diff --git a/arch/parisc/include/uapi/asm/termbits.h 
> b/arch/parisc/include/uapi/asm/termbits.h
> index 6017ee08f099..7f74a822b7ea 100644
> --- a/arch/parisc/include/uapi/asm/termbits.h
> +++ b/arch/parisc/include/uapi/asm/termbits.h
> @@ -61,31 +61,15 @@ struct ktermios {
>
>
>  /* c_iflag bits */
> -#define IGNBRK   0x1
> -#define BRKINT   0x2
> -#define IGNPAR   0x4
> -#define PARMRK   0x8
> -#define INPCK0x00010
> -#define ISTRIP   0x00020
> -#define INLCR0x00040
> -#define IGNCR0x00080
> -#define ICRNL0x00100
>  #define IUCLC0x00200
>  #define IXON 0x00400
> -#define IXANY0x00800
>  #define IXOFF0x01000
>  #define IMAXBEL  0x04000
>  #define IUTF80x08000

In the hunk above you leave IUCLC, IXON, IXOFF... because they seem unique to 
parisc.
The other defines are then taken from generic header.
Although this is correct, it leaves single values alone, which make it hard to 
verify
because you don't see the full list of values in one place.

> @@ -112,24 +96,6 @@ struct ktermios {
>
>  /* c_cflag bit meaning */
>  #define CBAUD0x100f
> -#define  B0  0x  /* hang up */
> -#define  B50 0x0001
> -#define  B75 0x0002
> -#define  B1100x0003
> -#define  B1340x0004
> -#define  B1500x0005
> -#define  B2000x0006
> -#define  B3000x0007
> -#define  B6000x0008
> -#define  B1200   0x0009
> -#define  B1800   0x000a
> -#define  B2400   0x000b
> -#define  B4800   0x000c
> -#define  B9600   0x000d
> -#define  B19200  0x000e
> -#define  B38400  0x000f
> -#define EXTA B19200
> -#define EXTB B38400

Here all baud values are dropped and will be taken from generic header, which 
is good.

That said, I think it's good to move away the second hunk,
but maybe we should keep the first as is?

It's just a thought. Either way, I'm fine your patch if that's the
way which is decided to go for all platforms.

Helge


Re: [PATCH v6 00/23] Rust support

2022-05-09 Thread Wei Liu
On Sat, May 07, 2022 at 01:06:18AM -0700, Kees Cook wrote:
> On Sat, May 07, 2022 at 07:23:58AM +0200, Miguel Ojeda wrote:
> > ## Patch series status
> > 
> > The Rust support is still to be considered experimental. However,
> > support is good enough that kernel developers can start working on the
> > Rust abstractions for subsystems and write drivers and other modules.
> 
> I'd really like to see this landed for a few reasons:
> 
> - It's under active development, and I'd rather review the changes
>   "normally", incrementally, etc. Right now it can be hard to re-review
>   some of the "mostly the same each version" patches in the series.
> 
> - I'd like to break the catch-22 of "ask for a new driver to be
>   written in rust but the rust support isn't landed" vs "the rust
>   support isn't landed because there aren't enough drivers". It
>   really feels like "release early, release often" is needed here;
>   it's hard to develop against -next. :)

+1 to both points. :-)


[PATCH 3/3] termbits.h: Remove posix_types.h include

2022-05-09 Thread Ilpo Järvinen
Nothing in termbits seems to require anything from linux/posix_types.h.

Signed-off-by: Ilpo Järvinen 
---
 arch/alpha/include/uapi/asm/termbits.h  | 2 --
 arch/mips/include/uapi/asm/termbits.h   | 2 --
 arch/parisc/include/uapi/asm/termbits.h | 2 --
 arch/sparc/include/uapi/asm/termbits.h  | 2 --
 include/uapi/asm-generic/termbits.h | 2 --
 5 files changed, 10 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/termbits.h 
b/arch/alpha/include/uapi/asm/termbits.h
index 735e9ffe2795..f1290b22072b 100644
--- a/arch/alpha/include/uapi/asm/termbits.h
+++ b/arch/alpha/include/uapi/asm/termbits.h
@@ -2,8 +2,6 @@
 #ifndef _ALPHA_TERMBITS_H
 #define _ALPHA_TERMBITS_H
 
-#include 
-
 #include 
 
 typedef unsigned int   tcflag_t;
diff --git a/arch/mips/include/uapi/asm/termbits.h 
b/arch/mips/include/uapi/asm/termbits.h
index 8fa3e79d4f94..1eb60903d6f0 100644
--- a/arch/mips/include/uapi/asm/termbits.h
+++ b/arch/mips/include/uapi/asm/termbits.h
@@ -11,8 +11,6 @@
 #ifndef _ASM_TERMBITS_H
 #define _ASM_TERMBITS_H
 
-#include 
-
 #include 
 
 typedef unsigned int   tcflag_t;
diff --git a/arch/parisc/include/uapi/asm/termbits.h 
b/arch/parisc/include/uapi/asm/termbits.h
index d72c5ebf3a3a..3a8938d26fb4 100644
--- a/arch/parisc/include/uapi/asm/termbits.h
+++ b/arch/parisc/include/uapi/asm/termbits.h
@@ -2,8 +2,6 @@
 #ifndef __ARCH_PARISC_TERMBITS_H__
 #define __ARCH_PARISC_TERMBITS_H__
 
-#include 
-
 #include 
 
 typedef unsigned int   tcflag_t;
diff --git a/arch/sparc/include/uapi/asm/termbits.h 
b/arch/sparc/include/uapi/asm/termbits.h
index cfcc4e07ce51..4321322701fc 100644
--- a/arch/sparc/include/uapi/asm/termbits.h
+++ b/arch/sparc/include/uapi/asm/termbits.h
@@ -2,8 +2,6 @@
 #ifndef _UAPI_SPARC_TERMBITS_H
 #define _UAPI_SPARC_TERMBITS_H
 
-#include 
-
 #include 
 
 #if defined(__sparc__) && defined(__arch64__)
diff --git a/include/uapi/asm-generic/termbits.h 
b/include/uapi/asm-generic/termbits.h
index c92179563289..890ef29053e2 100644
--- a/include/uapi/asm-generic/termbits.h
+++ b/include/uapi/asm-generic/termbits.h
@@ -2,8 +2,6 @@
 #ifndef __ASM_GENERIC_TERMBITS_H
 #define __ASM_GENERIC_TERMBITS_H
 
-#include 
-
 #include 
 
 typedef unsigned int   tcflag_t;
-- 
2.30.2



[PATCH 2/3] termbits.h: Align lines & format

2022-05-09 Thread Ilpo Järvinen
- Align c_cc defines.
- Remove extra newlines.
- Realign & adjust number of leading zeros.
- Reorder c_cflag defines to ascending order
- Make comment ending shorted (=remove period and one extra space from
  the comments in mips).

Co-developed-by: Arnd Bergmann 
Signed-off-by: Arnd Bergmann 
Signed-off-by: Ilpo Järvinen 
---
 arch/alpha/include/uapi/asm/termbits.h   | 130 +-
 arch/mips/include/uapi/asm/termbits.h| 156 ++---
 arch/parisc/include/uapi/asm/termbits.h  |  77 +--
 arch/powerpc/include/uapi/asm/termbits.h | 100 +++---
 arch/sparc/include/uapi/asm/termbits.h   | 168 +++
 include/uapi/asm-generic/termbits.h  |  76 +-
 6 files changed, 346 insertions(+), 361 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/termbits.h 
b/arch/alpha/include/uapi/asm/termbits.h
index 3c7a9afc4333..735e9ffe2795 100644
--- a/arch/alpha/include/uapi/asm/termbits.h
+++ b/arch/alpha/include/uapi/asm/termbits.h
@@ -53,60 +53,58 @@ struct ktermios {
 };
 
 /* c_cc characters */
-#define VEOF 0
-#define VEOL 1
-#define VEOL2 2
-#define VERASE 3
-#define VWERASE 4
-#define VKILL 5
-#define VREPRINT 6
-#define VSWTC 7
-#define VINTR 8
-#define VQUIT 9
-#define VSUSP 10
-#define VSTART 12
-#define VSTOP 13
-#define VLNEXT 14
-#define VDISCARD 15
-#define VMIN 16
-#define VTIME 17
+#define VEOF0
+#define VEOL1
+#define VEOL2   2
+#define VERASE  3
+#define VWERASE 4
+#define VKILL   5
+#define VREPRINT6
+#define VSWTC   7
+#define VINTR   8
+#define VQUIT   9
+#define VSUSP  10
+#define VSTART 12
+#define VSTOP  13
+#define VLNEXT 14
+#define VDISCARD   15
+#define VMIN   16
+#define VTIME  17
 
 /* c_iflag bits */
-#define IXON   0x00200
-#define IXOFF  0x00400
-#define IUCLC  0x01000
-#define IMAXBEL0x02000
-#define IUTF8  0x04000
+#define IXON   0x0200
+#define IXOFF  0x0400
+#define IUCLC  0x1000
+#define IMAXBEL0x2000
+#define IUTF8  0x4000
 
 /* c_oflag bits */
 #define ONLCR  0x2
 #define OLCUC  0x4
-
-
-#define NLDLY  0x000300
-#define   NL0  0x00
-#define   NL1  0x000100
-#define   NL2  0x000200
-#define   NL3  0x000300
-#define TABDLY 0x000c00
-#define   TAB0 0x00
-#define   TAB1 0x000400
-#define   TAB2 0x000800
-#define   TAB3 0x000c00
-#define CRDLY  0x003000
-#define   CR0  0x00
-#define   CR1  0x001000
-#define   CR2  0x002000
-#define   CR3  0x003000
-#define FFDLY  0x004000
-#define   FF0  0x00
-#define   FF1  0x004000
-#define BSDLY  0x008000
-#define   BS0  0x00
-#define   BS1  0x008000
-#define VTDLY  0x01
-#define   VT0  0x00
-#define   VT1  0x01
+#define NLDLY  0x00300
+#define   NL0  0x0
+#define   NL1  0x00100
+#define   NL2  0x00200
+#define   NL3  0x00300
+#define TABDLY 0x00c00
+#define   TAB0 0x0
+#define   TAB1 0x00400
+#define   TAB2 0x00800
+#define   TAB3 0x00c00
+#define CRDLY  0x03000
+#define   CR0  0x0
+#define   CR1  0x01000
+#define   CR2  0x02000
+#define   CR3  0x03000
+#define FFDLY  0x04000
+#define   FF0  0x0
+#define   FF1  0x04000
+#define BSDLY  0x08000
+#define   BS0  0x0
+#define   BS1  0x08000
+#define VTDLY  0x1
+#define   VT0  0x0
+#define   VT1  0x1
 /*
  * Should be equivalent to TAB3, see description of TAB3 in
  * POSIX.1-2008, Ch. 11.2.3 "Output Modes"
@@ -116,38 +114,34 @@ struct ktermios {
 /* c_cflag bit meaning */
 #define CBAUD  0x001f
 #define CBAUDEX0x
-#define  B576000x0010
-#define  B115200   0x0011
-#define  B230400   0x0012
-#define  B460800   0x0013
-#define  B50   0x0014
-#define  B576000   0x0015
-#define  B921600   0x0016
-#define B100   0x0017
-#define B1152000   0x0018
-#define B150   0x0019
-#define B200   0x001a
-#define B250   0x001b
-#define B300   0x001c
-#define B350   0x001d
-#define B400   0x001e
 #define BOTHER 0x001f
-
+#define B57600 0x0010
+#defineB115200 0x0011
+#defineB230400 0x0012
+#defineB460800 0x0013
+#defineB50 0x0014
+#defineB576000 0x0015
+#defineB921600 0x0016
+#define   B100 0x0017
+#define   B1152000 0x0018
+#define   B150 0x0019
+#define   B200 0x001a
+#define   B250 0x001b
+#define   B300 0x001c
+#define   B350 0x001d
+#define   B400 0x001e
 #define CSIZE  0x0300
 #define   CS5  0x
 #define   CS6  0x0100
 #define   CS7  0x0200
 #define   CS8  0x0300
-
 #define CSTOPB 0x0400
 #define CREAD  0x0800
 #define PARENB 0x1000
 

[PATCH 1/3] termbits.h: create termbits-common.h for identical bits

2022-05-09 Thread Ilpo Järvinen
Some defines are the same across all archs. Move the most obvious
intersection to termbits-common.h.

Signed-off-by: Ilpo Järvinen 
---
 arch/alpha/include/uapi/asm/termbits.h | 52 +
 arch/mips/include/uapi/asm/termbits.h  | 53 +-
 arch/parisc/include/uapi/asm/termbits.h| 54 +-
 arch/powerpc/include/uapi/asm/termbits.h   | 52 +
 arch/sparc/include/uapi/asm/termbits.h | 53 +-
 include/uapi/asm-generic/termbits-common.h | 65 ++
 include/uapi/asm-generic/termbits.h| 53 +-
 7 files changed, 76 insertions(+), 306 deletions(-)
 create mode 100644 include/uapi/asm-generic/termbits-common.h

diff --git a/arch/alpha/include/uapi/asm/termbits.h 
b/arch/alpha/include/uapi/asm/termbits.h
index 30dc7ff777b8..3c7a9afc4333 100644
--- a/arch/alpha/include/uapi/asm/termbits.h
+++ b/arch/alpha/include/uapi/asm/termbits.h
@@ -4,8 +4,8 @@
 
 #include 
 
-typedef unsigned char  cc_t;
-typedef unsigned int   speed_t;
+#include 
+
 typedef unsigned int   tcflag_t;
 
 /*
@@ -72,33 +72,17 @@ struct ktermios {
 #define VTIME 17
 
 /* c_iflag bits */
-#define IGNBRK 0x1
-#define BRKINT 0x2
-#define IGNPAR 0x4
-#define PARMRK 0x8
-#define INPCK  0x00010
-#define ISTRIP 0x00020
-#define INLCR  0x00040
-#define IGNCR  0x00080
-#define ICRNL  0x00100
 #define IXON   0x00200
 #define IXOFF  0x00400
-#define IXANY  0x00800
 #define IUCLC  0x01000
 #define IMAXBEL0x02000
 #define IUTF8  0x04000
 
 /* c_oflag bits */
-#define OPOST  0x1
 #define ONLCR  0x2
 #define OLCUC  0x4
 
-#define OCRNL  0x8
-#define ONOCR  0x00010
-#define ONLRET 0x00020
 
-#define OFILL  0x40
-#define OFDEL  0x80
 #define NLDLY  0x000300
 #define   NL0  0x00
 #define   NL1  0x000100
@@ -131,24 +115,6 @@ struct ktermios {
 
 /* c_cflag bit meaning */
 #define CBAUD  0x001f
-#define  B00x  /* hang up */
-#define  B50   0x0001
-#define  B75   0x0002
-#define  B110  0x0003
-#define  B134  0x0004
-#define  B150  0x0005
-#define  B200  0x0006
-#define  B300  0x0007
-#define  B600  0x0008
-#define  B1200 0x0009
-#define  B1800 0x000a
-#define  B2400 0x000b
-#define  B4800 0x000c
-#define  B9600 0x000d
-#define  B192000x000e
-#define  B384000x000f
-#define EXTA B19200
-#define EXTB B38400
 #define CBAUDEX0x
 #define  B576000x0010
 #define  B115200   0x0011
@@ -180,11 +146,8 @@ struct ktermios {
 #define HUPCL  0x4000
 
 #define CLOCAL 0x8000
-#define CMSPAR 0x4000  /* mark or space (stick) parity */
-#define CRTSCTS0x8000  /* flow control */
 
 #define CIBAUD 0x1f
-#define IBSHIFT16
 
 /* c_lflag bits */
 #define ISIG   0x0080
@@ -204,17 +167,6 @@ struct ktermios {
 #define IEXTEN 0x0400
 #define EXTPROC0x1000
 
-/* Values for the ACTION argument to `tcflow'.  */
-#defineTCOOFF  0
-#defineTCOON   1
-#defineTCIOFF  2
-#defineTCION   3
-
-/* Values for the QUEUE_SELECTOR argument to `tcflush'.  */
-#defineTCIFLUSH0
-#defineTCOFLUSH1
-#defineTCIOFLUSH   2
-
 /* Values for the OPTIONAL_ACTIONS argument to `tcsetattr'.  */
 #defineTCSANOW 0
 #defineTCSADRAIN   1
diff --git a/arch/mips/include/uapi/asm/termbits.h 
b/arch/mips/include/uapi/asm/termbits.h
index d1315ccdb39a..b33f514315ce 100644
--- a/arch/mips/include/uapi/asm/termbits.h
+++ b/arch/mips/include/uapi/asm/termbits.h
@@ -13,8 +13,8 @@
 
 #include 
 
-typedef unsigned char cc_t;
-typedef unsigned int speed_t;
+#include 
+
 typedef unsigned int tcflag_t;
 
 /*
@@ -80,31 +80,15 @@ struct ktermios {
 #define VEOL   17  /* End-of-line character [ICANON].  */
 
 /* c_iflag bits */
-#define IGNBRK 0x1 /* Ignore break condition.  */
-#define BRKINT 0x2 /* Signal interrupt on break.  */
-#define IGNPAR 0x4 /* Ignore characters with parity errors.  */
-#define PARMRK 0x8 /* Mark parity and framing errors.  */
-#define INPCK  0x00010 /* Enable input parity check.  */
-#define ISTRIP 0x00020 /* Strip 8th bit off characters.  */
-#define INLCR  0x00040 /* Map NL to CR on input.  */
-#define IGNCR  0x00080 /* Ignore CR.  */
-#define ICRNL  0x00100 /* Map CR to NL on input.  */
 #define IUCLC  0x00200 /* Map upper case to lower case on input.  */
 #define IXON   0x00400 /* Enable start/stop output control.  */
-#define IXANY  0x00800 /* Any character will restart after stop.  */
 #define IXOFF  

[PATCH 0/3] termbits.h: Further improvements

2022-05-09 Thread Ilpo Järvinen
Again, I prefer Greg to take these through his tty tree.

Changes done by this serie:

1) Create termbits-common.h for the most obvious termbits.h intersection.
2) Reformat some lines that remain in termbits.h files.
3) Don't include posix_types.h unnecessarily.

Please do check I got the uapi include things done correctly! That is,
that including using asm-generic/termbits-common.h path is the preferred
approach for a header file that is not supposed to be overridden by the
arch specific header files and that mandatory-y is not required for
termbits-common.h.

Unfortunately I couldn't move also tcflag_t into termbits-common.h due
to the way it is being defined for sparc. However, by the looks of how
the type for tcflag_t is being chosen there, having just unsigned int
might work also for sparc?

Ilpo Järvinen (3):
  termbits.h: create termbits-common.h for identical bits
  termbits.h: Align lines & format
  termbits.h: Remove posix_types.h include

 arch/alpha/include/uapi/asm/termbits.h | 182 ++---
 arch/mips/include/uapi/asm/termbits.h  | 209 ---
 arch/parisc/include/uapi/asm/termbits.h| 131 
 arch/powerpc/include/uapi/asm/termbits.h   | 152 +-
 arch/sparc/include/uapi/asm/termbits.h | 223 -
 include/uapi/asm-generic/termbits-common.h |  65 ++
 include/uapi/asm-generic/termbits.h| 129 
 7 files changed, 418 insertions(+), 673 deletions(-)
 create mode 100644 include/uapi/asm-generic/termbits-common.h

-- 
2.30.2



Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte

2022-05-09 Thread Baolin Wang




On 5/9/2022 1:46 PM, Christophe Leroy wrote:



Le 08/05/2022 à 15:09, Baolin Wang a écrit :



On 5/8/2022 7:09 PM, Muchun Song wrote:

On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote:

It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
table when unmapping or migrating a hugetlb page, and will change
to use huge_ptep_clear_flush() instead in the following patches.

So this is a preparation patch, which changes the
huge_ptep_clear_flush()
to return the original pte to help to nuke a hugetlb page table.

Signed-off-by: Baolin Wang 
Acked-by: Mike Kravetz 


Reviewed-by: Muchun Song 


Thanks for reviewing.



But one nit below:

[...]

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8605d7e..61a21af 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct
*mm, struct vm_area_struct *vma,
   ClearHPageRestoreReserve(new_page);
   /* Break COW or unshare */
-    huge_ptep_clear_flush(vma, haddr, ptep);
+    (void)huge_ptep_clear_flush(vma, haddr, ptep);


Why add a "(void)" here? Is there any warning if no "(void)"?
IIUC, I think we can remove this, right?


I did not meet any warning without the casting, but this is per Mike's
comment[1] to make the code consistent with other functions casting to
void type explicitly in hugetlb.c file.

[1]
https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/



As far as I understand, Mike said that you should be accompagnied with a
big fat comment explaining why we ignore the returned value from
huge_ptep_clear_flush(). >
By the way huge_ptep_clear_flush() is not declared 'must_check' so this
cast is just visual polution and should be removed.

In the meantime the comment suggested by Mike should be added instead.
Sorry for my misunderstanding. I just follow the explicit void casting 
like other places in hugetlb.c file. And I am not sure if it is useful 
adding some comments like below, since we did not need the original pte 
value in the COW case mapping with a new page, and the code is more 
readable already I think.


Mike, could you help to clarify what useful comments would you like? and 
remove the explicit void casting? Thanks.


/*
 * Just ignore the return value with new page mapped.
 */


Re: [PATCH kernel] powerpc/llvm/lto: Allow LLVM LTO builds

2022-05-09 Thread Alexey Kardashevskiy




On 5/9/22 15:18, Alexey Kardashevskiy wrote:



On 5/4/22 07:21, Nick Desaulniers wrote:
On Thu, Apr 28, 2022 at 11:46 PM Alexey Kardashevskiy  
wrote:


This enables LTO_CLANG builds on POWER with the upstream version of
LLVM.

LTO optimizes the output vmlinux binary and this may affect the FTP
alternative section if alt branches use "bc" (Branch Conditional) which
is limited by 16 bit offsets. This shows up in errors like:

ld.lld: error: InputSection too large for range extension thunk 
vmlinux.o:(__ftr_alt_97+0xF0)


This works around the issue by replacing "bc" in FTR_SECTION_ELSE with
"b" which allows 26 bit offsets.

This catches the problem instructions in vmlinux.o before it LTO'ed:

$ objdump -d -M raw -j __ftr_alt_97 vmlinux.o | egrep '\S+\s*\'
   30:   00 00 82 40 bc  4,eq,30 <__ftr_alt_97+0x30>
   f0:   00 00 82 40 bc  4,eq,f0 <__ftr_alt_97+0xf0>

This allows LTO builds for ppc64le_defconfig plus LTO options.
Note that DYNAMIC_FTRACE/FUNCTION_TRACER is not supported by LTO builds
but this is not POWERPC-specific.


$ ARCH=powerpc make LLVM=1 -j72 ppc64le_defconfig
$ ARCH=powerpc make LLVM=1 -j72 menuconfig

$ ARCH=powerpc make LLVM=1 -j72
...
   VDSO64L arch/powerpc/kernel/vdso/vdso64.so.dbg
/usr/bin/powerpc64le-linux-gnu-ld:
/android0/llvm-project/llvm/build/bin/../lib/LLVMgold.so: error
loading plugin:
/android0/llvm-project/llvm/build/bin/../lib/LLVMgold.so: cannot open
shared object file: No such file or directory
clang-15: error: linker command failed with exit code 1 (use -v to see
invocation)
make[1]: *** [arch/powerpc/kernel/vdso/Makefile:67:
arch/powerpc/kernel/vdso/vdso64.so.dbg] Error 1

Looks like LLD isn't being invoked correctly to link the vdso.
Probably need to revisit
https://lore.kernel.org/lkml/20200901222523.1941988-1-ndesaulni...@google.com/

How were you working around this issue? Perhaps you built clang to
default to LLD? (there's a cmake option for that)



What option is that? I only add  -DLLVM_ENABLE_LLD=ON  which (I think) 
tells cmake to use lld to link the LLVM being built but does not seem to 
tell what the built clang should do.


Without -DLLVM_ENABLE_LLD=ON, building just fails:

[fstn1-p1 ~/pbuild/llvm/llvm-lto-latest-cleanbuild]$ ninja -j 100
[619/3501] Linking CXX executable bin/not
FAILED: bin/not
: && /usr/bin/clang++ -fPIC -fvisibility-inlines-hidden 
-Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra 
-Wno-unused-parameter -Wwrite-strings -Wcast-qual 
-Wmissing-field-initializers -pedantic -Wno-long-long 
-Wc++98-compat-extra-semi -Wimplicit-fallthrough 
-Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor 
-Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion 
-Wmisleading-indentation -fdiagnostics-color -ffunction-sections 
-fdata-sections -flto -O3 -DNDEBUG -flto 
-Wl,-rpath-link,/home/aik/pbuild/llvm/llvm-lto-latest-cleanbuild/./lib 
-Wl,--gc-sections utils/not/CMakeFiles/not.dir/not.cpp.o -o bin/not 
-Wl,-rpath,"\$ORIGIN/../lib"  -lpthread  lib/libLLVMSupport.a  -lrt 
-ldl  -lpthread  -lm  /usr/lib/powerpc64le-linux-gnu/libz.so 
/usr/lib/powerpc64le-linux-gnu/libtinfo.so  lib/libLLVMDemangle.a && :
/usr/bin/ld: lib/libLLVMSupport.a: error adding symbols: archive has no 
index; run ranlib to add one
clang: error: linker command failed with exit code 1 (use -v to see 
invocation)
[701/3501] Building CXX object 
utils/TableGen/CMakeFiles/llvm-tblgen.dir/GlobalISelEmitter.cpp.o

ninja: build stopped: subcommand failed.



My head hurts :(
The above example is running on PPC. Now I am trying x86 box:



A bit of progress.

cmake -G Ninja -DLLVM_ENABLE_PROJECTS="clang;lld" 
-DLLVM_TARGET_ARCH=PowerPC -DLLVM_TARGETS_TO_BUILD=PowerPC 
~/llvm-project//llvm -DLLVM_ENABLE_LTO=ON 
-DLLVM_BINUTILS_INCDIR=/usr/lib/gcc/powerpc64le-linux-gnu/11/plugin/include/ 
-DCMAKE_BUILD_TYPE=Release


produces:

-- Native target architecture is PowerPC 



-- LLVM host triple: x86_64-unknown-linux-gnu
-- LLVM default target triple: x86_64-unknown-linux-gnu


and the resulting "clang" can only to "Target: 
x86_64-unknown-linux-gnu", how do you build LLVM exactly? Thanks,





[PATCH kernel] KVM: PPC: Book3s: Remove real mode interrupt controller hcalls handlers

2022-05-09 Thread Alexey Kardashevskiy
Currently we have 2 sets of interrupt controller hypercalls handlers
for real and virtual modes, this is from POWER8 times when switching
MMU on was considered an expensive operation.

POWER9 however does not have dependent threads and MMU is enabled for
handling hcalls so the XIVE native or XICS-on-XIVE real mode handlers
never execute on real P9 and later CPUs.

This untemplate the handlers and only keeps the real mode handlers for
XICS native (up to POWER8) and remove the rest of dead code. Changes
in functions are mechanical except few missing empty lines to make
checkpatch.pl happy.

The default implemented hcalls list already contains XICS hcalls so
no change there.

This should not cause any behavioral change.

Signed-off-by: Alexey Kardashevskiy 
---

Minus 153 lines nevertheless.


---
 arch/powerpc/kvm/Makefile   |   2 +-
 arch/powerpc/include/asm/kvm_ppc.h  |   7 -
 arch/powerpc/kvm/book3s_xive.h  |   7 -
 arch/powerpc/kvm/book3s_hv_builtin.c|  64 ---
 arch/powerpc/kvm/book3s_hv_rm_xics.c|   5 +
 arch/powerpc/kvm/book3s_hv_rm_xive.c|  46 --
 arch/powerpc/kvm/book3s_xive.c  | 638 +++-
 arch/powerpc/kvm/book3s_xive_template.c | 636 ---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  12 +-
 9 files changed, 632 insertions(+), 785 deletions(-)
 delete mode 100644 arch/powerpc/kvm/book3s_hv_rm_xive.c
 delete mode 100644 arch/powerpc/kvm/book3s_xive_template.c

diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 8e3681a86074..f17379b0f161 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -73,7 +73,7 @@ kvm-hv-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
book3s_hv_tm.o
 
 kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \
-   book3s_hv_rm_xics.o book3s_hv_rm_xive.o
+   book3s_hv_rm_xics.o
 
 kvm-book3s_64-builtin-tm-objs-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
book3s_hv_tm_builtin.o
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 44200a27371b..a775377a570e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -787,13 +787,6 @@ long kvmppc_rm_h_page_init(struct kvm_vcpu *vcpu, unsigned 
long flags,
   unsigned long dest, unsigned long src);
 long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
   unsigned long slb_v, unsigned int status, bool data);
-unsigned long kvmppc_rm_h_xirr(struct kvm_vcpu *vcpu);
-unsigned long kvmppc_rm_h_xirr_x(struct kvm_vcpu *vcpu);
-unsigned long kvmppc_rm_h_ipoll(struct kvm_vcpu *vcpu, unsigned long server);
-int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
-unsigned long mfrr);
-int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr);
-int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr);
 void kvmppc_guest_entry_inject_int(struct kvm_vcpu *vcpu);
 
 /*
diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
index 09d0657596c3..1e48f72e8aa5 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -285,13 +285,6 @@ static inline u32 __xive_read_eq(__be32 *qpage, u32 msk, 
u32 *idx, u32 *toggle)
return cur & 0x7fff;
 }
 
-extern unsigned long xive_rm_h_xirr(struct kvm_vcpu *vcpu);
-extern unsigned long xive_rm_h_ipoll(struct kvm_vcpu *vcpu, unsigned long 
server);
-extern int xive_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
-unsigned long mfrr);
-extern int xive_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr);
-extern int xive_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr);
-
 /*
  * Common Xive routines for XICS-over-XIVE and XIVE native
  */
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 7e52d0beee77..88a8f6473c4e 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -489,70 +489,6 @@ static long kvmppc_read_one_intr(bool *again)
return kvmppc_check_passthru(xisr, xirr, again);
 }
 
-#ifdef CONFIG_KVM_XICS
-unsigned long kvmppc_rm_h_xirr(struct kvm_vcpu *vcpu)
-{
-   if (!kvmppc_xics_enabled(vcpu))
-   return H_TOO_HARD;
-   if (xics_on_xive())
-   return xive_rm_h_xirr(vcpu);
-   else
-   return xics_rm_h_xirr(vcpu);
-}
-
-unsigned long kvmppc_rm_h_xirr_x(struct kvm_vcpu *vcpu)
-{
-   if (!kvmppc_xics_enabled(vcpu))
-   return H_TOO_HARD;
-   vcpu->arch.regs.gpr[5] = get_tb();
-   if (xics_on_xive())
-   return xive_rm_h_xirr(vcpu);
-   else
-   return xics_rm_h_xirr(vcpu);
-}
-
-unsigned long kvmppc_rm_h_ipoll(struct kvm_vcpu *vcpu, unsigned long server)
-{
-   if (!kvmppc_xics_enabled(vcpu))
-   return H_TOO_HARD;
-   if (xics_on_xive())
-   return xive_rm_h_ipoll(vcpu, server);
-   else
-   

Re: [PATCH v2 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-05-09 Thread Muchun Song
On Sun, May 08, 2022 at 05:36:41PM +0800, Baolin Wang wrote:
> On some architectures (like ARM64), it can support CONT-PTE/PMD size
> hugetlb, which means it can support not only PMD/PUD size hugetlb:
> 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
> size specified.
> 
> When unmapping a hugetlb page, we will get the relevant page table
> entry by huge_pte_offset() only once to nuke it. This is correct
> for PMD or PUD size hugetlb, since they always contain only one
> pmd entry or pud entry in the page table.
> 
> However this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
> since they can contain several continuous pte or pmd entry with
> same page table attributes, so we will nuke only one pte or pmd
> entry for this CONT-PTE/PMD size hugetlb page.
> 
> And now try_to_unmap() is only passed a hugetlb page in the case
> where the hugetlb page is poisoned. Which means now we will unmap
> only one pte entry for a CONT-PTE or CONT-PMD size poisoned hugetlb
> page, and we can still access other subpages of a CONT-PTE or CONT-PMD
> size poisoned hugetlb page, which will cause serious issues possibly.
> 
> So we should change to use huge_ptep_clear_flush() to nuke the
> hugetlb page table to fix this issue, which already considered
> CONT-PTE and CONT-PMD size hugetlb.
> 
> We've already used set_huge_swap_pte_at() to set a poisoned
> swap entry for a poisoned hugetlb page. Meanwhile adding a VM_BUG_ON()
> to make sure the passed hugetlb page is poisoned in try_to_unmap().
> 
> Signed-off-by: Baolin Wang 

Reviewed-by: Muchun Song 

Thanks.


[PATCH] powerpc/papr_scm: Fix leaking nvdimm_events_map elements

2022-05-09 Thread Vaibhav Jain
Right now 'char *' elements allocated individual 'stat_id' in
'papr_scm_priv.nvdimm_events_map' during papr_scm_pmu_check_events() leak in
papr_scm_remove() and papr_scm_pmu_register(), papr_scm_pmu_check_events() error
paths.

Also individual 'stat_id' arent NULL terminated 'char *' instead they are fixed
8-byte sized identifiers. However papr_scm_pmu_register() assumes it to be a
NULL terminated 'char *' and at other places it assumes it to be a
'papr_scm_perf_stat.stat_id' sized string which is 8-byes in size.

Fix this by allocating the memory for papr_scm_priv.nvdimm_events_map to also
include space for 'stat_id' entries. This is possible since number of available
events/stat_ids are known upfront. This saves some memory and one extra level of
indirection from 'nvdimm_events_map' to 'stat_id'. Also rest of the code
can continue to call 'kfree(papr_scm_priv.nvdimm_events_map)' without needing to
iterate over the array and free up individual elements.

Also introduce a new typedef called 'state_id_t' thats a 'u8[8]' and can be used
across papr_scm to deal with stat_ids.

Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support")
Signed-off-by: Vaibhav Jain 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 48 +++
 1 file changed, 22 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 39962c905542..f33a865ad5fb 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -70,8 +70,10 @@
 #define PAPR_SCM_PERF_STATS_VERSION 0x1
 
 /* Struct holding a single performance metric */
+typedef u8 stat_id_t[8];
+
 struct papr_scm_perf_stat {
-   u8 stat_id[8];
+   stat_id_t stat_id;
__be64 stat_val;
 } __packed;
 
@@ -126,7 +128,7 @@ struct papr_scm_priv {
u64 health_bitmap_inject_mask;
 
 /* array to have event_code and stat_id mappings */
-   char **nvdimm_events_map;
+   stat_id_t *nvdimm_events_map;
 };
 
 static int papr_scm_pmem_flush(struct nd_region *nd_region,
@@ -462,7 +464,7 @@ static int papr_scm_pmu_check_events(struct papr_scm_priv 
*p, struct nvdimm_pmu
 {
struct papr_scm_perf_stat *stat;
struct papr_scm_perf_stats *stats;
-   int index, rc, count;
+   int index, rc = 0;
u32 available_events;
 
if (!p->stat_buffer_len)
@@ -478,35 +480,29 @@ static int papr_scm_pmu_check_events(struct papr_scm_priv 
*p, struct nvdimm_pmu
return rc;
}
 
-   /* Allocate memory to nvdimm_event_map */
-   p->nvdimm_events_map = kcalloc(available_events, sizeof(char *), 
GFP_KERNEL);
-   if (!p->nvdimm_events_map) {
-   rc = -ENOMEM;
-   goto out_stats;
-   }
-
/* Called to get list of events supported */
rc = drc_pmem_query_stats(p, stats, 0);
if (rc)
-   goto out_nvdimm_events_map;
-
-   for (index = 0, stat = stats->scm_statistic, count = 0;
-index < available_events; index++, ++stat) {
-   p->nvdimm_events_map[count] = kmemdup_nul(stat->stat_id, 8, 
GFP_KERNEL);
-   if (!p->nvdimm_events_map[count]) {
-   rc = -ENOMEM;
-   goto out_nvdimm_events_map;
-   }
+   goto out;
 
-   count++;
+   /*
+* Allocate memory and populate nvdimm_event_map.
+* Allocate an extra element for NULL entry
+*/
+   p->nvdimm_events_map = kcalloc(available_events + 1,
+  sizeof(stat_id_t), GFP_KERNEL);
+   if (!p->nvdimm_events_map) {
+   rc = -ENOMEM;
+   goto out;
}
-   p->nvdimm_events_map[count] = NULL;
-   kfree(stats);
-   return 0;
 
-out_nvdimm_events_map:
-   kfree(p->nvdimm_events_map);
-out_stats:
+   /* Copy all stat_ids to event map */
+   for (index = 0, stat = stats->scm_statistic;
+index < available_events; index++, ++stat) {
+   memcpy(>nvdimm_events_map[index], >stat_id,
+  sizeof(stat_id_t));
+   }
+out:
kfree(stats);
return rc;
 }

base-commit: 348c71344111d7a48892e3e52264ff11956fc196
-- 
2.35.1