Re: [powerpc]Kernel crash while running xfstests (generic/250) [next-20220404]
> On 07-Apr-2022, at 10:19 AM, Sachin Sant wrote: > > >> On 04-Apr-2022, at 5:04 PM, Sachin Sant wrote: >> >> While running xfstests(ext4 or XFS as fs) on a Power10 LPAR booted with >> today’s >> next (5.18.0-rc1-next-20220404) following crash is seen. >> >> This problem was possibly introduced with 5.17.0-next-20220330. >> Git bisect leads me to following patch >> commit 1d158814db8e7b3cbca0f2c8d9242fbec4fbc57e >> dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset >> > > Continue to see this problem with latest next. I can still recreate this issue against latest linux-next build. [ 1536.883400] Buffer I/O error on dev dm-0, logical block 10485497, async page read [ 1536.936018] XFS (dm-0): Unmounting Filesystem [ 1536.938849] XFS (dm-0): Mounting V5 Filesystem [ 1536.946007] XFS (dm-0): Ending clean mount [ 1536.947926] xfs filesystem being mounted at /mnt/scratch supports timestamps until 2038 (0x7fff) [ 1537.052850] XFS (dm-0): Unmounting Filesystem [ 1537.083979] BUG: Unable to handle kernel data access at 0x5deadbeef122 [ 1537.083982] Faulting instruction address: 0xc015b0bc [ 1537.083984] Oops: Kernel access of bad area, sig: 11 [#1] [ 1537.084000] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries [ 1537.084006] Modules linked in: dm_snapshot(E) dm_bufio(E) loop(E) dm_flakey(E) xfs(E) dm_mod(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) ip_set(E) nf_tables(E) bonding(E) tls(E) libcrc32c(E) nfnetlink(E) sunrpc(E) nd_pmem(E) nd_btt(E) dax_pmem(E) pseries_rng(E) papr_scm(E) libnvdimm(E) vmx_crypto(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) crc64_rocksoft(E) crc64(E) sg(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) fuse(E) [last unloaded: scsi_debug] [ 1537.084056] CPU: 10 PID: 970489 Comm: dmsetup Tainted: GE 5.18.0-rc6-next-20220509 #2 [ 1537.084061] NIP: c015b0bc LR: c015afe8 CTR: c0753bb0 [ 1537.084064] REGS: c000211fb610 TRAP: 0380 Tainted: GE (5.18.0-rc6-next-20220509) [ 1537.084068] MSR: 8280b033 CR: 24024824 XER: 2004 [ 1537.084078] CFAR: c015aff0 IRQMASK: 0 [ 1537.084078] GPR00: c015afe8 c000211fb8b0 c2a7cf00 [ 1537.084078] GPR04: c000f98a1378 c000f5043b50 c0043463e280 [ 1537.084078] GPR08: c0043463e280 5deadbeef100 5deadbeef122 c0080214dcb0 [ 1537.084078] GPR12: c0753bb0 c00abfff1700 000155ee0b60 7fffa7c29da8 [ 1537.084078] GPR16: 7fffa7c29da8 7fffa7c29da8 7fffa7c63670 [ 1537.084078] GPR20: 7fffa7c33388 7fffa7c62040 000155ee0b90 0131 [ 1537.084078] GPR24: c25adb68 c25adb30 c00103a5e000 [ 1537.084078] GPR28: c2a23ce8 c000f98a1378 0017 [ 1537.084117] NIP [c015b0bc] __cpuhp_state_remove_instance+0x19c/0x2c0 [ 1537.084125] LR [c015afe8] __cpuhp_state_remove_instance+0xc8/0x2c0 [ 1537.084130] Call Trace: [ 1537.084131] [c000211fb8b0] [c015afe8] __cpuhp_state_remove_instance+0xc8/0x2c0 (unreliable) [ 1537.084138] [c000211fb920] [c0753c14] bioset_exit+0x64/0x280 [ 1537.084144] [c000211fb9c0] [c00802137744] cleanup_mapped_device+0x4c/0x1c0 [dm_mod] [ 1537.084155] [c000211fba00] [c00802137a60] __dm_destroy+0x1a8/0x360 [dm_mod] [ 1537.084163] [c000211fbaa0] [c008021445c0] dev_remove+0x1b8/0x290 [dm_mod] [ 1537.084172] [c000211fbb30] [c0080214488c] ctl_ioctl+0x1f4/0x7d0 [dm_mod] [ 1537.084181] [c000211fbd40] [c00802144e88] dm_ctl_ioctl+0x20/0x40 [dm_mod] [ 1537.084190] [c000211fbd60] [c055ff28] sys_ioctl+0xf8/0x190 [ 1537.084195] [c000211fbdb0] [c003377c] system_call_exception+0x17c/0x350 [ 1537.084200] [c000211fbe10] [c000c54c] system_call_common+0xec/0x270 [ 1537.084205] --- interrupt: c00 at 0x7fffa7529210 [ 1537.084208] NIP: 7fffa7529210 LR: 7fffa7c26824 CTR: [ 1537.084211] REGS: c000211fbe80 TRAP: 0c00 Tainted: GE (5.18.0-rc6-next-20220509) [ 1537.084215] MSR: 8280f033 CR: 24004484 XER: [ 1537.084224] IRQMASK: 0 [ 1537.084224] GPR00: 0036 7fffc7448c30 7fffa7607300 0003 [ 1537.084224] GPR04: c138fd04 000155ee0b60 0004 7fffa7c33f98 [ 1537.084224] GPR08: 0003 [ 1537.084224] GPR12: 7fffa7d0fa80 000155ee0b60 7fffa7c29da8 [ 1537.084224] GPR16: 7fffa7c29da8 7fffa7c29da8 7fffa7
Re: [PATCH v3 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating
On 5/10/2022 12:04 PM, Andrew Morton wrote: On Tue, 10 May 2022 11:45:57 +0800 Baolin Wang wrote: Hi, Now migrating a hugetlb page or unmapping a poisoned hugetlb page, we'll use ptep_clear_flush() and set_pte_at() to nuke the page table entry and remap it, and this is incorrect for CONT-PTE or CONT-PMD size hugetlb page, It would be helpful to describe why it's wrong. Something like "should use huge_ptep_clear_flush() and huge_ptep_clear_flush() for this purpose"? Sorry for the confusing description. I described the problem explicitly in each patch's commit message. https://lore.kernel.org/all/ea5abf529f0997b5430961012bfda6166c1efc8c.1652147571.git.baolin.w...@linux.alibaba.com/ https://lore.kernel.org/all/730ea4b6d292f32fb10b7a4e87dad49b0eb30474.1652147571.git.baolin.w...@linux.alibaba.com/ which will cause potential data consistent issue. This patch set will change to use hugetlb related APIs to fix this issue, please find details in each patch. Thanks. Is a cc:stable needed here? And are we able to identify a target for a Fixes: tag? I think need a cc:stable tag, however I am not sure the target fixes tag, since we should trace back to the introduction of CONT-PTE/PMD hugetlb? 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
Re: [PATCH v3 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating
On Tue, 10 May 2022 11:45:57 +0800 Baolin Wang wrote: > Hi, > > Now migrating a hugetlb page or unmapping a poisoned hugetlb page, we'll > use ptep_clear_flush() and set_pte_at() to nuke the page table entry > and remap it, and this is incorrect for CONT-PTE or CONT-PMD size hugetlb > page, It would be helpful to describe why it's wrong. Something like "should use huge_ptep_clear_flush() and huge_ptep_clear_flush() for this purpose"? > which will cause potential data consistent issue. This patch set > will change to use hugetlb related APIs to fix this issue, please find > details in each patch. Thanks. Is a cc:stable needed here? And are we able to identify a target for a Fixes: tag?
[PATCH v3 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping
On some architectures (like ARM64), it can support CONT-PTE/PMD size hugetlb, which means it can support not only PMD/PUD size hugetlb: 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page size specified. When unmapping a hugetlb page, we will get the relevant page table entry by huge_pte_offset() only once to nuke it. This is correct for PMD or PUD size hugetlb, since they always contain only one pmd entry or pud entry in the page table. However this is incorrect for CONT-PTE and CONT-PMD size hugetlb, since they can contain several continuous pte or pmd entry with same page table attributes, so we will nuke only one pte or pmd entry for this CONT-PTE/PMD size hugetlb page. And now try_to_unmap() is only passed a hugetlb page in the case where the hugetlb page is poisoned. Which means now we will unmap only one pte entry for a CONT-PTE or CONT-PMD size poisoned hugetlb page, and we can still access other subpages of a CONT-PTE or CONT-PMD size poisoned hugetlb page, which will cause serious issues possibly. So we should change to use huge_ptep_clear_flush() to nuke the hugetlb page table to fix this issue, which already considered CONT-PTE and CONT-PMD size hugetlb. We've already used set_huge_swap_pte_at() to set a poisoned swap entry for a poisoned hugetlb page. Meanwhile adding a VM_BUG_ON() to make sure the passed hugetlb page is poisoned in try_to_unmap(). Signed-off-by: Baolin Wang Reviewed-by: Muchun Song Reviewed-by: Mike Kravetz --- mm/rmap.c | 39 ++- 1 file changed, 22 insertions(+), 17 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 4e96daf..219e287 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1528,6 +1528,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, if (folio_test_hugetlb(folio)) { /* +* The try_to_unmap() is only passed a hugetlb page +* in the case where the hugetlb page is poisoned. +*/ + VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage); + /* * huge_pmd_unshare may unmap an entire PMD page. * There is no way of knowing exactly which PMDs may * be cached for this mm, so we must flush them all. @@ -1562,28 +1567,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, break; } } + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); - } - - /* -* Nuke the page table entry. When having to clear -* PageAnonExclusive(), we always have to flush. -*/ - if (should_defer_flush(mm, flags) && !anon_exclusive) { /* -* We clear the PTE but do not flush so potentially -* a remote CPU could still be writing to the folio. -* If the entry was previously clean then the -* architecture must guarantee that a clear->dirty -* transition on a cached TLB entry is written through -* and traps if the PTE is unmapped. +* Nuke the page table entry. When having to clear +* PageAnonExclusive(), we always have to flush. */ - pteval = ptep_get_and_clear(mm, address, pvmw.pte); + if (should_defer_flush(mm, flags) && !anon_exclusive) { + /* +* We clear the PTE but do not flush so potentially +* a remote CPU could still be writing to the folio. +* If the entry was previously clean then the +* architecture must guarantee that a clear->dirty +* transition on a cached TLB entry is written through +* and traps if the PTE is unmapped. +*/ + pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pte_dirty(pteval)); - } else { - pteval = ptep_clear_flush(vma, address, pvmw.pte); + set_tlb_ubc_flush_pending(mm, pte_dirty(pteval)); + } else { + pteval = ptep_clear_flush(vma, address, pvmw.pte); + } } /* -- 1.8.3.1
[PATCH v3 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
On some architectures (like ARM64), it can support CONT-PTE/PMD size hugetlb, which means it can support not only PMD/PUD size hugetlb: 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page size specified. When migrating a hugetlb page, we will get the relevant page table entry by huge_pte_offset() only once to nuke it and remap it with a migration pte entry. This is correct for PMD or PUD size hugetlb, since they always contain only one pmd entry or pud entry in the page table. However this is incorrect for CONT-PTE and CONT-PMD size hugetlb, since they can contain several continuous pte or pmd entry with same page table attributes. So we will nuke or remap only one pte or pmd entry for this CONT-PTE/PMD size hugetlb page, which is not expected for hugetlb migration. The problem is we can still continue to modify the subpages' data of a hugetlb page during migrating a hugetlb page, which can cause a serious data consistent issue, since we did not nuke the page table entry and set a migration pte for the subpages of a hugetlb page. To fix this issue, we should change to use huge_ptep_clear_flush() to nuke a hugetlb page table, and remap it with set_huge_pte_at() and set_huge_swap_pte_at() when migrating a hugetlb page, which already considered the CONT-PTE or CONT-PMD size hugetlb. Signed-off-by: Baolin Wang Reviewed-by: Muchun Song Reviewed-by: Mike Kravetz --- include/linux/hugetlb.h | 11 +++ mm/rmap.c | 24 ++-- 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 306d6ef..9f71043 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr pte_t *ptep, pte_t pte, unsigned long sz) { } + +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + return ptep_get(ptep); +} + +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ +} #endif /* CONFIG_HUGETLB_PAGE */ static inline spinlock_t *huge_pte_lock(struct hstate *h, diff --git a/mm/rmap.c b/mm/rmap.c index 94d6b24..4e96daf 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1926,13 +1926,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, break; } } + + /* Nuke the hugetlb page table entry */ + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); + /* Nuke the page table entry. */ + pteval = ptep_clear_flush(vma, address, pvmw.pte); } - /* Nuke the page table entry. */ - pteval = ptep_clear_flush(vma, address, pvmw.pte); - /* Set the dirty flag on the folio now the pte is gone. */ if (pte_dirty(pteval)) folio_mark_dirty(folio); @@ -2017,7 +2019,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, pte_t swp_pte; if (arch_unmap_one(mm, vma, address, pteval) < 0) { - set_pte_at(mm, address, pvmw.pte, pteval); + if (folio_test_hugetlb(folio)) + set_huge_pte_at(mm, address, pvmw.pte, pteval); + else + set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(); break; @@ -2026,7 +2031,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, !anon_exclusive, subpage); if (anon_exclusive && page_try_share_anon_rmap(subpage)) { - set_pte_at(mm, address, pvmw.pte, pteval); + if (folio_test_hugetlb(folio)) + set_huge_pte_at(mm, address, pvmw.pte, pteval); + else + set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(); break; @@ -2052,7 +2060,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, swp_pte = pte_swp_mksoft_dirty(swp_pte);
[PATCH v3 1/3] mm: change huge_ptep_clear_flush() to return the original pte
It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table when unmapping or migrating a hugetlb page, and will change to use huge_ptep_clear_flush() instead in the following patches. So this is a preparation patch, which changes the huge_ptep_clear_flush() to return the original pte to help to nuke a hugetlb page table. Signed-off-by: Baolin Wang Acked-by: Mike Kravetz Reviewed-by: Muchun Song --- arch/arm64/include/asm/hugetlb.h | 4 ++-- arch/arm64/mm/hugetlbpage.c| 12 +--- arch/ia64/include/asm/hugetlb.h| 4 ++-- arch/mips/include/asm/hugetlb.h| 9 ++--- arch/parisc/include/asm/hugetlb.h | 4 ++-- arch/powerpc/include/asm/hugetlb.h | 9 ++--- arch/s390/include/asm/hugetlb.h| 6 +++--- arch/sh/include/asm/hugetlb.h | 4 ++-- arch/sparc/include/asm/hugetlb.h | 4 ++-- include/asm-generic/hugetlb.h | 4 ++-- 10 files changed, 32 insertions(+), 28 deletions(-) diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h index 1242f71..616b2ca 100644 --- a/arch/arm64/include/asm/hugetlb.h +++ b/arch/arm64/include/asm/hugetlb.h @@ -39,8 +39,8 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm, extern void huge_ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep); #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -extern void huge_ptep_clear_flush(struct vm_area_struct *vma, - unsigned long addr, pte_t *ptep); +extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep); #define __HAVE_ARCH_HUGE_PTE_CLEAR extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep, unsigned long sz); diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index cbace1c..ca8e65c 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -486,19 +486,17 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot)); } -void huge_ptep_clear_flush(struct vm_area_struct *vma, - unsigned long addr, pte_t *ptep) +pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { size_t pgsize; int ncontig; - if (!pte_cont(READ_ONCE(*ptep))) { - ptep_clear_flush(vma, addr, ptep); - return; - } + if (!pte_cont(READ_ONCE(*ptep))) + return ptep_clear_flush(vma, addr, ptep); ncontig = find_num_contig(vma->vm_mm, addr, ptep, ); - clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig); + return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig); } static int __init hugetlbpage_init(void) diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h index 7e46ebd..65d3811 100644 --- a/arch/ia64/include/asm/hugetlb.h +++ b/arch/ia64/include/asm/hugetlb.h @@ -23,8 +23,8 @@ static inline int is_hugepage_only_range(struct mm_struct *mm, #define is_hugepage_only_range is_hugepage_only_range #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, -unsigned long addr, pte_t *ptep) +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { } diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h index c214440..fd69c88 100644 --- a/arch/mips/include/asm/hugetlb.h +++ b/arch/mips/include/asm/hugetlb.h @@ -43,16 +43,19 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, } #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, -unsigned long addr, pte_t *ptep) +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { + pte_t pte; + /* * clear the huge pte entry firstly, so that the other smp threads will * not get old pte entry after finishing flush_tlb_page and before * setting new huge pte entry */ - huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); + pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); flush_tlb_page(vma, addr); + return pte; } #define __HAVE_ARCH_HUGE_PTE_NONE diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h index a69cf9e..25bc560 100644 --- a/arch/parisc/include/asm/hugetlb.h +++ b/arch/parisc/include/asm/hugetlb.h @@ -28,8 +28,8 @@ static inline int prepare_hugepage_range(struct file *file, } #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_ptep_clear_flush(struct
[PATCH v3 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating
Hi, Now migrating a hugetlb page or unmapping a poisoned hugetlb page, we'll use ptep_clear_flush() and set_pte_at() to nuke the page table entry and remap it, and this is incorrect for CONT-PTE or CONT-PMD size hugetlb page, which will cause potential data consistent issue. This patch set will change to use hugetlb related APIs to fix this issue, please find details in each patch. Thanks. Note: Mike pointed out the huge_ptep_get() will only return the one specific value, and it would not take into account the dirty or young bits of CONT-PTE/PMDs like the huge_ptep_get_and_clear() [1]. This inconsistent issue is not introduced by this patch set, and will address this issue in another thread [2]. Meanwhile the uffd for hugetlb case [3] pointed by Gerald also need another patch to address. [1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f...@oracle.com/ [2] https://lore.kernel.org/all/cover.1651998586.git.baolin.w...@linux.alibaba.com/ [3] https://lore.kernel.org/linux-mm/20220503120343.6264e126@thinkpad/ Changes from v2: - Collect reviewed tags from Muchun and Mike. - Drop the unnecessary casting in hugetlb.c. - Fix building errors with adding dummy functions for !CONFIG_HUGETLB_PAGE. Changes from v1: - Add acked tag from Mike. - Update some commit message. - Add VM_BUG_ON in try_to_unmap() for hugetlb case. - Add an explict void casting for huge_ptep_clear_flush() in hugetlb.c. Baolin Wang (3): mm: change huge_ptep_clear_flush() to return the original pte mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping arch/arm64/include/asm/hugetlb.h | 4 +-- arch/arm64/mm/hugetlbpage.c| 12 +++- arch/ia64/include/asm/hugetlb.h| 4 +-- arch/mips/include/asm/hugetlb.h| 9 -- arch/parisc/include/asm/hugetlb.h | 4 +-- arch/powerpc/include/asm/hugetlb.h | 9 -- arch/s390/include/asm/hugetlb.h| 6 ++-- arch/sh/include/asm/hugetlb.h | 4 +-- arch/sparc/include/asm/hugetlb.h | 4 +-- include/asm-generic/hugetlb.h | 4 +-- include/linux/hugetlb.h| 11 +++ mm/rmap.c | 63 -- 12 files changed, 83 insertions(+), 51 deletions(-) -- 1.8.3.1
Re: [PATCH] crypto: vmx - Fix build error
On Sat, May 07, 2022 at 02:22:43PM +0900, Masahiro Yamada wrote: > When I refactored this Makefile, I accidentally changed the CONFIG > option. > > Fixes: b52455a73db9 ("crypto: vmx - Align the short log with Makefile > cleanups") > Reported-by: kernel test robot > Signed-off-by: Masahiro Yamada > --- > > drivers/crypto/vmx/Makefile | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte
On 5/10/2022 4:02 AM, Mike Kravetz wrote: On 5/9/22 01:46, Baolin Wang wrote: On 5/9/2022 1:46 PM, Christophe Leroy wrote: Le 08/05/2022 à 15:09, Baolin Wang a écrit : On 5/8/2022 7:09 PM, Muchun Song wrote: On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote: It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table when unmapping or migrating a hugetlb page, and will change to use huge_ptep_clear_flush() instead in the following patches. So this is a preparation patch, which changes the huge_ptep_clear_flush() to return the original pte to help to nuke a hugetlb page table. Signed-off-by: Baolin Wang Acked-by: Mike Kravetz Reviewed-by: Muchun Song Thanks for reviewing. But one nit below: [...] diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8605d7e..61a21af 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, ClearHPageRestoreReserve(new_page); /* Break COW or unshare */ - huge_ptep_clear_flush(vma, haddr, ptep); + (void)huge_ptep_clear_flush(vma, haddr, ptep); Why add a "(void)" here? Is there any warning if no "(void)"? IIUC, I think we can remove this, right? I did not meet any warning without the casting, but this is per Mike's comment[1] to make the code consistent with other functions casting to void type explicitly in hugetlb.c file. [1] https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/ As far as I understand, Mike said that you should be accompagnied with a big fat comment explaining why we ignore the returned value from huge_ptep_clear_flush(). > By the way huge_ptep_clear_flush() is not declared 'must_check' so this cast is just visual polution and should be removed. In the meantime the comment suggested by Mike should be added instead. Sorry for my misunderstanding. I just follow the explicit void casting like other places in hugetlb.c file. And I am not sure if it is useful adding some comments like below, since we did not need the original pte value in the COW case mapping with a new page, and the code is more readable already I think. Mike, could you help to clarify what useful comments would you like? and remove the explicit void casting? Thanks. Sorry for the confusion. In the original commit, it seemed odd to me that the signature of the function was changing and there was not an associated change to the only caller of the function. I did suggest casting to void or adding a comment. As Christophe mentions, the cast to void is not necessary. In addition, there really isn't a need for a comment as the calling code is not changed. OK. Will drop the casting in next version. The original version of the commit without either is actually preferable. The commit message does say this is a preparation patch and the return value will be used in later patches. OK. Thanks Mike for making me clear. Also thanks to Muchun and Christophe.
Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping
On 5/10/2022 12:41 AM, Peter Xu wrote: On Fri, May 06, 2022 at 12:07:13PM -0700, Mike Kravetz wrote: On 5/3/22 03:03, Gerald Schaefer wrote: On Tue, 3 May 2022 10:19:46 +0800 Baolin Wang wrote: On 5/2/2022 10:02 PM, Gerald Schaefer wrote: [...] Please see previous code, we'll use the original pte value to check if it is uffd-wp armed, and if need to mark it dirty though the hugetlbfs is set noop_dirty_folio(). pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval); Uh, ok, that wouldn't work on s390, but we also don't have CONFIG_PTE_MARKER_UFFD_WP / HAVE_ARCH_USERFAULTFD_WP set, so I guess we will be fine (for now). Still, I find it a bit unsettling that pte_install_uffd_wp_if_needed() would work on a potential hugetlb *pte, directly de-referencing it instead of using huge_ptep_get(). The !pte_none(*pte) check at the beginning would be broken in the hugetlb case for s390 (not sure about other archs, but I think s390 might be the only exception strictly requiring huge_ptep_get() for de-referencing hugetlb *pte pointers). We could have used is_vm_hugetlb_page(vma) within the helper so as to properly use either generic pte or hugetlb version of pte fetching. We may want to conditionally do set_[huge_]pte_at() too at the end. I could prepare a patch for that even if it's not really anything urgently needed. I assume that won't need to block this patchset since we need the pteval for pte_dirty() check anyway and uffd-wp definitely needs it too. OK. Thanks Peter.
Re: [PATCH] bug: Use normal relative pointers in 'struct bug_entry'
On Mon, May 09, 2022 at 10:31:14PM +1000, Michael Ellerman wrote: > Embarrassingly, we have another copy of the logic, used in the C > versions, they need updating too: Oops, thanks for finding that. > With that added it seems to be working correctly for me. > > Acked-by: Michael Ellerman (powerpc) Thanks! -- Josh
Re: [PATCH 24/30] panic: Refactor the panic path
Hey Hatayma, thanks for your great analysis and no need for apologies! I'll comment/respond properly inline below, just noticing here that I've CCed Mark and Marc (from the ARM64 perspective), Michael (Hyper-V perspective) and Hari (PowerPC perspective), besides the usual suspects as Petr, Baoquan, etc. On 09/05/2022 12:16, d.hatay...@fujitsu.com wrote: > Sorry for the delayed response. Unfortunately, I had 10 days holidays > until yesterday... > [...] >> + We currently have 4 lists of panic notifiers; based >> + on the functionality and risk (for panic success) the >> + callbacks are added in a given list. The lists are: >> + - hypervisor/FW notification list (low risk); >> + - informational list (low/medium risk); >> + - pre_reboot list (higher risk); >> + - post_reboot list (only run late in panic and after >> + kdump, not configurable for now). >> + This parameter defines the ordering of the first 3 >> + lists with regards to kdump; the levels determine >> + which set of notifiers execute before kdump. The >> + accepted levels are: >> + 0: kdump is the first thing to run, NO list is >> + executed before kdump. >> + 1: only the hypervisor list is executed before kdump. >> + 2 (default level): the hypervisor list and (*if* > > Hmmm, why are you trying to change default setting? > > Based on the current design of kdump, it's natural to put what the > handlers for these level 1 and level 2 handlers do in > machine_crash_shutdown(), as these are necessary by default, right? > > Or have you already tried that and figured out it's difficult in some > reason and reached the current design? If so, why is that difficult? > Could you point to if there is already such discussion online? > > kdump is designed to perform as little things as possible before > transferring the execution to the 2nd kernel in order to increase > reliability. Just detour to panic() increases risks of kdump failure > in the sense of increasing the executed codes in the abnormal > situation, which is very the note in the explanation of > crash_kexec_post_notifiers. > > Also, the current implementation of crash_kexec_post_notifiers uses > the panic notifier, but this is not from the technical > reason. Ideally, it should have been implemented in the context of > crash_kexec() independently of panic(). > > That is, it looks to me that, in addition to changing design of panic > notifier, you are trying to integrate shutdown code of the crash kexec > and the panic paths. If so, this is a big design change for kdump. > I'm concerned about increase of reliability. I'd like you to discuss > them carefully. >From my understanding (specially based on both these threads [0] and [1]), 3 facts are clear and quite complex in nature: (a) Currently, the panic notifier list is a "no man's land" - it's a mess, all sort of callbacks are added there, some of them are extremely risk for breaking kdump, others are quite safe (like setting a variable). Petr's details in thread [0] are really clear and express in great way how confusing and conflicting the panic notifiers goals are. (b) In order to "address" problems in the antagonistic goals of notifiers (see point (a) above and thread [0]), we have this quirk/workaround called "crash_kexec_post_notifiers". This is useful, but (almost as for attesting how this is working as band-aid over complex and fundamental issues) see the following commits: a11589563e96 ("x86/Hyper-V: Report crash register data or kmsg before running crash kernel") 06e629c25daa ("powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic") They hardcode such workaround, because they *need* some notifiers' callbacks. But notice they *don't need all of them*, only some important ones (that usually are good considering the risk, it's a good cost/benefit). Since we currently have an all-or-nothing behavior for the panic notifiers, both PowerPC and Hyper-V end-up bringing all of them to run before kdump due to the lack of flexibility, increasing a lot the risk of failure for kdump. (c) To add on top of all such complexity, we don't have a custom machine_crash_shutdown() handler for some architectures like ARM64, and the feeling is that's not right to add a bunch of callbacks / complexity in such architecture code, specially since we have the notifiers infrastructure in the kernel. I've recently started a discussion about that with ARM64 community, please take a look in [1]. With that said, we can use (a) + (b) + (c) to justify our argument here: the panic notifiers should be refactored! We need to try to encompass the antagonistic goals of kdump (wants to be the
Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
On 09/05/2022 13:14, Suzuki K Poulose wrote: > [...]> > I have queued this to coresight/next. > > Thanks > Suzuki Thanks a lot Suzuki!
Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
Hi On 09/05/2022 14:09, Guilherme G. Piccoli wrote: On 28/04/2022 05:11, Suzuki K Poulose wrote: Hi Guilherme, On 27/04/2022 23:49, Guilherme G. Piccoli wrote: The panic notifier infrastructure executes registered callbacks when a panic event happens - such callbacks are executed in atomic context, with interrupts and preemption disabled in the running CPU and all other CPUs disabled. That said, mutexes in such context are not a good idea. This patch replaces a regular mutex with a mutex_trylock safer approach; given the nature of the mutex used in the driver, it should be pretty uncommon being unable to acquire such mutex in the panic path, hence no functional change should be observed (and if it is, that would be likely a deadlock with the regular mutex). Fixes: 2227b7c74634 ("coresight: add support for CPU debug module") Cc: Leo Yan Cc: Mathieu Poirier Cc: Mike Leach Cc: Suzuki K Poulose Signed-off-by: Guilherme G. Piccoli How would you like to proceed with queuing this ? I am happy either way. In case you plan to push this as part of this series (I don't see any potential conflicts) : Reviewed-by: Suzuki K Poulose Hi Suzuki, some other maintainers are taking the patches to their next branches for example. I'm working on V2, and I guess in the end would be nice to reduce the size of the series a bit. So, do you think you could pick this one for your coresight/next branch (or even for rc cycle, your call - this is really a fix)? This way, I won't re-submit this one in V2, since it's gonna be merged already in your branch. I have queued this to coresight/next. Thanks Suzuki
Re: [PATCH 24/30] panic: Refactor the panic path
Sorry for the delayed response. Unfortunately, I had 10 days holidays until yesterday... > .../admin-guide/kernel-parameters.txt | 42 ++- > include/linux/panic_notifier.h| 1 + > kernel/kexec_core.c | 8 +- > kernel/panic.c| 292 +- > .../selftests/pstore/pstore_crash_test| 5 +- > 5 files changed, 252 insertions(+), 96 deletions(-) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt > b/Documentation/admin-guide/kernel-parameters.txt > index 3f1cc5e317ed..8d3524060ce3 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt ...snip... > @@ -3784,6 +3791,33 @@ > timeout < 0: reboot immediately > Format: > > + panic_notifiers_level= > + [KNL] Set the panic notifiers execution order. > + Format: > + We currently have 4 lists of panic notifiers; based > + on the functionality and risk (for panic success) the > + callbacks are added in a given list. The lists are: > + - hypervisor/FW notification list (low risk); > + - informational list (low/medium risk); > + - pre_reboot list (higher risk); > + - post_reboot list (only run late in panic and after > + kdump, not configurable for now). > + This parameter defines the ordering of the first 3 > + lists with regards to kdump; the levels determine > + which set of notifiers execute before kdump. The > + accepted levels are: > + 0: kdump is the first thing to run, NO list is > + executed before kdump. > + 1: only the hypervisor list is executed before kdump. > + 2 (default level): the hypervisor list and (*if* Hmmm, why are you trying to change default setting? Based on the current design of kdump, it's natural to put what the handlers for these level 1 and level 2 handlers do in machine_crash_shutdown(), as these are necessary by default, right? Or have you already tried that and figured out it's difficult in some reason and reached the current design? If so, why is that difficult? Could you point to if there is already such discussion online? kdump is designed to perform as little things as possible before transferring the execution to the 2nd kernel in order to increase reliability. Just detour to panic() increases risks of kdump failure in the sense of increasing the executed codes in the abnormal situation, which is very the note in the explanation of crash_kexec_post_notifiers. Also, the current implementation of crash_kexec_post_notifiers uses the panic notifier, but this is not from the technical reason. Ideally, it should have been implemented in the context of crash_kexec() independently of panic(). That is, it looks to me that, in addition to changing design of panic notifier, you are trying to integrate shutdown code of the crash kexec and the panic paths. If so, this is a big design change for kdump. I'm concerned about increase of reliability. I'd like you to discuss them carefully. Thanks. HATAYAMA, Daisuke
Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list
On 27/04/2022 19:49, Guilherme G. Piccoli wrote: > Currently we have 3 notifier lists in the panic path, which will > be wired in a way to allow the notifier callbacks to run in > different moments at panic time, in a subsequent patch. > > But there is also an odd set of architecture calls hardcoded in > the end of panic path, after the restart machinery. They're > responsible for late time tunings / events, like enabling a stop > button (Sparc) or effectively stopping the machine (s390). > > This patch introduces yet another notifier list to offer the > architectures a way to add callbacks in such late moment on > panic path without the need of ifdefs / hardcoded approaches. > > Cc: Alexander Gordeev > Cc: Christian Borntraeger > Cc: "David S. Miller" > Cc: Heiko Carstens > Cc: Sven Schnelle > Cc: Vasily Gorbik > Signed-off-by: Guilherme G. Piccoli Hey S390/SPARC folks, sorry for the ping! Any reviews on this V1 would be greatly appreciated, I'm working on V2 and seeking feedback in the non-reviewed patches. Thanks in advance, Guilherme
Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
On 28/04/2022 05:11, Suzuki K Poulose wrote: > Hi Guilherme, > > On 27/04/2022 23:49, Guilherme G. Piccoli wrote: >> The panic notifier infrastructure executes registered callbacks when >> a panic event happens - such callbacks are executed in atomic context, >> with interrupts and preemption disabled in the running CPU and all other >> CPUs disabled. That said, mutexes in such context are not a good idea. >> >> This patch replaces a regular mutex with a mutex_trylock safer approach; >> given the nature of the mutex used in the driver, it should be pretty >> uncommon being unable to acquire such mutex in the panic path, hence >> no functional change should be observed (and if it is, that would be >> likely a deadlock with the regular mutex). >> >> Fixes: 2227b7c74634 ("coresight: add support for CPU debug module") >> Cc: Leo Yan >> Cc: Mathieu Poirier >> Cc: Mike Leach >> Cc: Suzuki K Poulose >> Signed-off-by: Guilherme G. Piccoli > > How would you like to proceed with queuing this ? I am happy > either way. In case you plan to push this as part of this > series (I don't see any potential conflicts) : > > Reviewed-by: Suzuki K Poulose Hi Suzuki, some other maintainers are taking the patches to their next branches for example. I'm working on V2, and I guess in the end would be nice to reduce the size of the series a bit. So, do you think you could pick this one for your coresight/next branch (or even for rc cycle, your call - this is really a fix)? This way, I won't re-submit this one in V2, since it's gonna be merged already in your branch. Thanks in advance, Guilherme
Re: [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers
On 05/05/2022 15:55, Hari Bathini wrote: > [...] > The change looks good. I have tested it on an LPAR (ppc64). > > Reviewed-by: Hari Bathini > Hi Michael. do you think it's possible to add this one to powerpc/next (or something like that), or do you prefer a V2 with his tag? Thanks, Guilherme
Re: [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart
On 27/04/2022 19:48, Guilherme G. Piccoli wrote: > In the panic path we have a list of functions to be called, the panic > notifiers - such callbacks perform various actions in the machine's > last breath, and sometimes users want them to run before kdump. We > have the parameter "crash_kexec_post_notifiers" for that. When such > parameter is used, the function "crash_smp_send_stop()" is executed > to poweroff all secondary CPUs through the NMI-shootdown mechanism; > part of this process involves disabling virtualization features in > all CPUs (except the main one). > > Now, in the emergency restart procedure we have also a way of > disabling VMX in all CPUs, using the same NMI-shootdown mechanism; > what happens though is that in case we already NMI-disabled all CPUs, > the emergency restart fails due to a second addition of the same items > in the NMI list, as per the following log output: > > sysrq: Trigger a crash > Kernel panic - not syncing: sysrq triggered crash > [...] > Rebooting in 2 seconds.. > list_add double add: new=, prev=, next=. > [ cut here ] > kernel BUG at lib/list_debug.c:29! > invalid opcode: [#1] PREEMPT SMP PTI > > In order to reproduce the problem, users just need to set the kernel > parameter "crash_kexec_post_notifiers" *without* kdump set in any > system with the VMX feature present. > > Since there is no benefit in re-disabling VMX in all CPUs in case > it was already done, this patch prevents that by guarding the restart > routine against doubly issuing NMIs unnecessarily. Notice we still > need to disable VMX locally in the emergency restart. > > Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is > supported) > Fixes: 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump friendly > version in panic path") > Cc: David P. Reed > Cc: Hidehiro Kawai > Cc: Paolo Bonzini > Cc: Sean Christopherson > Signed-off-by: Guilherme G. Piccoli > --- > arch/x86/include/asm/cpu.h | 1 + > arch/x86/kernel/crash.c| 8 > arch/x86/kernel/reboot.c | 14 -- > 3 files changed, 17 insertions(+), 6 deletions(-) > Hi Paolo / Sean / Vitaly, sorry for the ping. But do you think this fix is OK from the VMX point-of-view? I'd like to send a V2 of this set soon, so any review here is highly appreciated! Cheers, Guilherme
Re: [PATCH 2/2] powerpc/vdso: Link with ld.lld when requested
On Mon, May 09, 2022 at 02:58:09PM -0700, Nick Desaulniers wrote: > On Mon, May 9, 2022 at 2:47 PM Nathan Chancellor wrote: > > > > On Mon, May 09, 2022 at 02:24:40PM -0700, Nick Desaulniers wrote: > > > On Mon, May 9, 2022 at 1:47 PM Nathan Chancellor > > > wrote: > > > > > > > > The PowerPC vDSO is linked with $(CC) instead of $(LD), which means the > > > > default linker of the compiler is used instead of the linker requested > > > > by the builder. > > > > > > > > $ make ARCH=powerpc LLVM=1 mrproper defconfig > > > > arch/powerpc/kernel/vdso/ > > > > ... > > > > > > > > $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg > > > > > > > > File: arch/powerpc/kernel/vdso/vdso32.so.dbg > > > > String dump of section '.comment': > > > > [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37) > > > > > > > > File: arch/powerpc/kernel/vdso/vdso64.so.dbg > > > > String dump of section '.comment': > > > > [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37) > > > > > > > > The compiler option '-fuse-ld' tells the compiler which linker to use > > > > when it is invoked as both the compiler and linker. Use '-fuse-ld=lld' > > > > when LD=ld.lld has been specified (CONFIG_LD_IS_LLD) so that the vDSO is > > > > linked with the same linker as the rest of the kernel. > > > > > > > > $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg > > > > > > > > File: arch/powerpc/kernel/vdso/vdso32.so.dbg > > > > String dump of section '.comment': > > > > [ 0] Linker: LLD 14.0.0 > > > > [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37) > > > > > > > > File: arch/powerpc/kernel/vdso/vdso64.so.dbg > > > > String dump of section '.comment': > > > > [ 0] Linker: LLD 14.0.0 > > > > [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37) > > > > > > > > LD can be a full path to ld.lld, which will not be handled properly by > > > > '-fuse-ld=lld' if the full path to ld.lld is outside of the compiler's > > > > search path. '-fuse-ld' can take a path to the linker but it is > > > > deprecated in clang 12.0.0; '--ld-path' is preferred for this scenario. > > > > > > > > Use '--ld-path' if it is supported, as it will handle a full path or > > > > just 'ld.lld' properly. See the LLVM commit below for the full details > > > > of '--ld-path'. > > > > > > Perhaps worth adding some additional background from the cover letter > > > to the commit message that will actually go into the kernel, > > > particularly: > > > 1. Kbuild mostly invokes the compiler and linker distinctly; the ppc > > > vdso code uses the compiler as the linker driver though. > > > 2. When doing so, depending on how the compiler was configured, the > > > implicit default linker the compiler invokes might not match $LD. > > > > Sure, I think I can clear up these two points with something like: > > > > "The PowerPC vDSO uses $(CC) to link, which differs from the rest of the > > kernel, which uses $(LD) directly. As a result, the default linker of > > the compiler is used, which may differ from the linker requested by the > > builder. For example: > > > > > > > > LLVM=1 sets LD=ld.lld but ld.lld is not used to link the vDSO; GNU ld is > > because "ld" is the default linker for clang on most Linux platforms." > > > > Thoughts? > > SGTM > > > > > > 3. This is a problem for LTO since clang may try to invoke ld.gold, > > > which is not supported as of > > > commit 75959d44f9dc ("kbuild: Fail if gold linker is detected") > > > > Technically, it seemed like ld.bfd was being invoked but the LLVMgold > > plugin did not exist. Regardless, moving to ld.lld will resolve that, > > since the LLVMgold plugin won't be needed. > > Oh indeed, invoking clang with `-flto -###` shows it does invoke the > system's linker with `-plugin path/to/LLVMgold.so`, not `ld.gold` > itself. I don't think we should use or depend on the LLVMgold.so > plugin either (I suspect it will invoke ld.gold, but as you noticed, I > don't bother to build LLVMgold.so). So, perhaps reworded (feel free > to reword further): > > 3. This is a problem for LTO since clang may try to invoke ld.bfd with > the LLVMgold.so plugin. > https://llvm.org/docs/GoldPlugin.html states that "usage of the LLVM > gold plugin with ld.bfd is not tested and therefore not officially > supported or recommended." Users should instead use ld.lld to drive > linking for LTO with clang. I'll stick this blurb above "The compiler option": "This is a problem for Clang's Link Time Optimization as implemented in the kernel because use of GNU ld with LTO requires the LLVMgold plugin, which is not technically supported for ld.bfd per https://llvm.org/docs/GoldPlugin.html. Furthermore, if LLVMgold.so is missing from a user's system, the build will fail, even though LTO as it is implemented in the kernel requires ld.lld to avoid this dependency in the first place." Yell if there is something I should change! > > > 4. Using the linker as the driver can cause ld.bfd 2.26 to
Re: [PATCH v2 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping
On 5/8/22 02:36, Baolin Wang wrote: > On some architectures (like ARM64), it can support CONT-PTE/PMD size > hugetlb, which means it can support not only PMD/PUD size hugetlb: > 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page > size specified. > > When unmapping a hugetlb page, we will get the relevant page table > entry by huge_pte_offset() only once to nuke it. This is correct > for PMD or PUD size hugetlb, since they always contain only one > pmd entry or pud entry in the page table. > > However this is incorrect for CONT-PTE and CONT-PMD size hugetlb, > since they can contain several continuous pte or pmd entry with > same page table attributes, so we will nuke only one pte or pmd > entry for this CONT-PTE/PMD size hugetlb page. > > And now try_to_unmap() is only passed a hugetlb page in the case > where the hugetlb page is poisoned. Which means now we will unmap > only one pte entry for a CONT-PTE or CONT-PMD size poisoned hugetlb > page, and we can still access other subpages of a CONT-PTE or CONT-PMD > size poisoned hugetlb page, which will cause serious issues possibly. > > So we should change to use huge_ptep_clear_flush() to nuke the > hugetlb page table to fix this issue, which already considered > CONT-PTE and CONT-PMD size hugetlb. > > We've already used set_huge_swap_pte_at() to set a poisoned > swap entry for a poisoned hugetlb page. Meanwhile adding a VM_BUG_ON() > to make sure the passed hugetlb page is poisoned in try_to_unmap(). > > Signed-off-by: Baolin Wang > --- > mm/rmap.c | 39 ++- > 1 file changed, 22 insertions(+), 17 deletions(-) > > diff --git a/mm/rmap.c b/mm/rmap.c > index 7cf2408..37c8fd2 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1530,6 +1530,11 @@ static bool try_to_unmap_one(struct folio *folio, > struct vm_area_struct *vma, > > if (folio_test_hugetlb(folio)) { > /* > + * The try_to_unmap() is only passed a hugetlb page > + * in the case where the hugetlb page is poisoned. > + */ > + VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage); > + /* It is unfortunate that this could not easily be added to the first if (folio_test_hugetlb(folio)) block in this routine. However, it is fine to add here. Looks good. Thanks for all these changes, Reviewed-by: Mike Kravetz -- Mike Kravetz
Re: [PATCH v4 00/14] kbuild: yet another series of cleanups (modpost, LTO, MODULE_REL_CRCS, export.h)
[resending due to mailing list bounces from a large plain text attachment...] On Mon, May 09, 2022 at 01:24:33PM +0900, Masahiro Yamada wrote: > On Mon, May 9, 2022 at 4:09 AM Masahiro Yamada wrote: > > > > This is the third batch of cleanups in this development cycle. > > > > Major changes in v4: > > - Move static EXPORT_SYMBOL check to a script > > - Some refactoring > > > > Major changes in v3: > > > > - Generate symbol CRCs as C code, and remove CONFIG_MODULE_REL_CRCS. > > > > Major changes in v2: > > > > - V1 did not work with CONFIG_MODULE_REL_CRCS. > >I fixed this for v2. > > > > - Reflect some review comments in v1 > > > > - Refactor the code more > > > > - Avoid too long argument error > > This series is available at > git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git > lto-cleanup-v4 Hi Masahiro, I checked this out and went to run it through my QEMU tests but I see two new errors. Failure #1: In file included from scripts/mod/section-check.c:3: scripts/mod/modpost.h:15:10: fatal error: 'elfconfig.h' file not found #include "elfconfig.h" ^ 1 error generated. I was able to get past that with diff --git a/scripts/mod/Makefile b/scripts/mod/Makefile index ca739c6c68a1..c33b83bfbcad 100644 --- a/scripts/mod/Makefile +++ b/scripts/mod/Makefile @@ -16,7 +16,7 @@ targets += $(devicetable-offsets-file) devicetable-offsets.s # dependencies on generated files need to be listed explicitly -$(obj)/modpost.o $(obj)/file2alias.o $(obj)/sumversion.o: $(obj)/elfconfig.h +$(obj)/modpost.o $(obj)/file2alias.o $(obj)/sumversion.o $(obj)/section-check.o: $(obj)/elfconfig.h $(obj)/file2alias.o: $(obj)/$(devicetable-offsets-file) quiet_cmd_elfconfig = MKELF $@ Failure #2: GEN .version CHK include/generated/compile.h GEN .tmp_initcalls.lds LTO vmlinux.o OBJTOOL vmlinux.o MODPOST vmlinux.symvers MODINFO modules.builtin.modinfo GEN modules.builtin LD .tmp_vmlinux.btf ld.lld: error: cannot open .vmlinux.export.o: No such file or directory BTF .btf.vmlinux.bin.o pahole: .tmp_vmlinux.btf: No such file or directory CC .vmlinux.export.c LD .tmp_vmlinux.kallsyms1 ld.lld: error: .btf.vmlinux.bin.o: unknown file type make[1]: *** [Makefile:1159: vmlinux] Error 1 I was not really able to see what is going wrong here. Attached is the configuration that I ran into this with. If you need any other information, please let me know! Cheers, Nathan config.gz Description: application/gzip
Re: [PATCH 2/2] powerpc/vdso: Link with ld.lld when requested
On Mon, May 09, 2022 at 02:24:40PM -0700, Nick Desaulniers wrote: > On Mon, May 9, 2022 at 1:47 PM Nathan Chancellor wrote: > > > > The PowerPC vDSO is linked with $(CC) instead of $(LD), which means the > > default linker of the compiler is used instead of the linker requested > > by the builder. > > > > $ make ARCH=powerpc LLVM=1 mrproper defconfig arch/powerpc/kernel/vdso/ > > ... > > > > $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg > > > > File: arch/powerpc/kernel/vdso/vdso32.so.dbg > > String dump of section '.comment': > > [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37) > > > > File: arch/powerpc/kernel/vdso/vdso64.so.dbg > > String dump of section '.comment': > > [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37) > > > > The compiler option '-fuse-ld' tells the compiler which linker to use > > when it is invoked as both the compiler and linker. Use '-fuse-ld=lld' > > when LD=ld.lld has been specified (CONFIG_LD_IS_LLD) so that the vDSO is > > linked with the same linker as the rest of the kernel. > > > > $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg > > > > File: arch/powerpc/kernel/vdso/vdso32.so.dbg > > String dump of section '.comment': > > [ 0] Linker: LLD 14.0.0 > > [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37) > > > > File: arch/powerpc/kernel/vdso/vdso64.so.dbg > > String dump of section '.comment': > > [ 0] Linker: LLD 14.0.0 > > [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37) > > > > LD can be a full path to ld.lld, which will not be handled properly by > > '-fuse-ld=lld' if the full path to ld.lld is outside of the compiler's > > search path. '-fuse-ld' can take a path to the linker but it is > > deprecated in clang 12.0.0; '--ld-path' is preferred for this scenario. > > > > Use '--ld-path' if it is supported, as it will handle a full path or > > just 'ld.lld' properly. See the LLVM commit below for the full details > > of '--ld-path'. > > Perhaps worth adding some additional background from the cover letter > to the commit message that will actually go into the kernel, > particularly: > 1. Kbuild mostly invokes the compiler and linker distinctly; the ppc > vdso code uses the compiler as the linker driver though. > 2. When doing so, depending on how the compiler was configured, the > implicit default linker the compiler invokes might not match $LD. Sure, I think I can clear up these two points with something like: "The PowerPC vDSO uses $(CC) to link, which differs from the rest of the kernel, which uses $(LD) directly. As a result, the default linker of the compiler is used, which may differ from the linker requested by the builder. For example: LLVM=1 sets LD=ld.lld but ld.lld is not used to link the vDSO; GNU ld is because "ld" is the default linker for clang on most Linux platforms." Thoughts? > 3. This is a problem for LTO since clang may try to invoke ld.gold, > which is not supported as of > commit 75959d44f9dc ("kbuild: Fail if gold linker is detected") Technically, it seemed like ld.bfd was being invoked but the LLVMgold plugin did not exist. Regardless, moving to ld.lld will resolve that, since the LLVMgold plugin won't be needed. > 4. Using the linker as the driver can cause ld.bfd 2.26 to crash. > https://lore.kernel.org/all/b2066ccd-2b81-6032-08e3-41105b400...@csgroup.eu/ > (Though, I wonder if that's because I was trying to add > --orphan-handling=warn, which we're not yet doing for the ppc vdso > AFAICT). I can add this if necessary but it seemed like there might have been other problems reported? I could just add a blanket "linker driver had issues, we'll try again later" or something of that effect? > Reviewed-by: Nick Desaulniers Thank you for the review as always! > > > > Link: https://github.com/ClangBuiltLinux/linux/issues/774 > > Link: > > https://github.com/llvm/llvm-project/commit/1bc5c84710a8c73ef21295e63c19d10a8c71f2f5 > > Signed-off-by: Nathan Chancellor > > --- > > arch/powerpc/kernel/vdso/Makefile | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/arch/powerpc/kernel/vdso/Makefile > > b/arch/powerpc/kernel/vdso/Makefile > > index 954974287ee7..096b0bf1335f 100644 > > --- a/arch/powerpc/kernel/vdso/Makefile > > +++ b/arch/powerpc/kernel/vdso/Makefile > > @@ -48,6 +48,7 @@ UBSAN_SANITIZE := n > > KASAN_SANITIZE := n > > > > ccflags-y := -shared -fno-common -fno-builtin -nostdlib > > -Wl,--hash-style=both > > +ccflags-$(CONFIG_LD_IS_LLD) += $(call > > cc-option,--ld-path=$(LD),-fuse-ld=lld) > > > > CC32FLAGS := -Wl,-soname=linux-vdso32.so.1 -m32 > > AS32FLAGS := -D__VDSO32__ -s > > -- > > 2.36.1 > > > > > -- > Thanks, > ~Nick Desaulniers
Re: [PATCH v2 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
On 5/8/22 02:36, Baolin Wang wrote: > On some architectures (like ARM64), it can support CONT-PTE/PMD size > hugetlb, which means it can support not only PMD/PUD size hugetlb: > 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page > size specified. > > When migrating a hugetlb page, we will get the relevant page table > entry by huge_pte_offset() only once to nuke it and remap it with > a migration pte entry. This is correct for PMD or PUD size hugetlb, > since they always contain only one pmd entry or pud entry in the > page table. > > However this is incorrect for CONT-PTE and CONT-PMD size hugetlb, > since they can contain several continuous pte or pmd entry with > same page table attributes. So we will nuke or remap only one pte > or pmd entry for this CONT-PTE/PMD size hugetlb page, which is > not expected for hugetlb migration. The problem is we can still > continue to modify the subpages' data of a hugetlb page during > migrating a hugetlb page, which can cause a serious data consistent > issue, since we did not nuke the page table entry and set a > migration pte for the subpages of a hugetlb page. > > To fix this issue, we should change to use huge_ptep_clear_flush() > to nuke a hugetlb page table, and remap it with set_huge_pte_at() > and set_huge_swap_pte_at() when migrating a hugetlb page, which > already considered the CONT-PTE or CONT-PMD size hugetlb. > > Signed-off-by: Baolin Wang > --- > mm/rmap.c | 24 ++-- > 1 file changed, 18 insertions(+), 6 deletions(-) With the addition of !CONFIG_HUGETLB_PAGE stubs, Reviewed-by: Mike Kravetz -- Mike Kravetz
[PATCH 2/2] powerpc/vdso: Link with ld.lld when requested
The PowerPC vDSO is linked with $(CC) instead of $(LD), which means the default linker of the compiler is used instead of the linker requested by the builder. $ make ARCH=powerpc LLVM=1 mrproper defconfig arch/powerpc/kernel/vdso/ ... $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg File: arch/powerpc/kernel/vdso/vdso32.so.dbg String dump of section '.comment': [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37) File: arch/powerpc/kernel/vdso/vdso64.so.dbg String dump of section '.comment': [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37) The compiler option '-fuse-ld' tells the compiler which linker to use when it is invoked as both the compiler and linker. Use '-fuse-ld=lld' when LD=ld.lld has been specified (CONFIG_LD_IS_LLD) so that the vDSO is linked with the same linker as the rest of the kernel. $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg File: arch/powerpc/kernel/vdso/vdso32.so.dbg String dump of section '.comment': [ 0] Linker: LLD 14.0.0 [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37) File: arch/powerpc/kernel/vdso/vdso64.so.dbg String dump of section '.comment': [ 0] Linker: LLD 14.0.0 [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37) LD can be a full path to ld.lld, which will not be handled properly by '-fuse-ld=lld' if the full path to ld.lld is outside of the compiler's search path. '-fuse-ld' can take a path to the linker but it is deprecated in clang 12.0.0; '--ld-path' is preferred for this scenario. Use '--ld-path' if it is supported, as it will handle a full path or just 'ld.lld' properly. See the LLVM commit below for the full details of '--ld-path'. Link: https://github.com/ClangBuiltLinux/linux/issues/774 Link: https://github.com/llvm/llvm-project/commit/1bc5c84710a8c73ef21295e63c19d10a8c71f2f5 Signed-off-by: Nathan Chancellor --- arch/powerpc/kernel/vdso/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/vdso/Makefile b/arch/powerpc/kernel/vdso/Makefile index 954974287ee7..096b0bf1335f 100644 --- a/arch/powerpc/kernel/vdso/Makefile +++ b/arch/powerpc/kernel/vdso/Makefile @@ -48,6 +48,7 @@ UBSAN_SANITIZE := n KASAN_SANITIZE := n ccflags-y := -shared -fno-common -fno-builtin -nostdlib -Wl,--hash-style=both +ccflags-$(CONFIG_LD_IS_LLD) += $(call cc-option,--ld-path=$(LD),-fuse-ld=lld) CC32FLAGS := -Wl,-soname=linux-vdso32.so.1 -m32 AS32FLAGS := -D__VDSO32__ -s -- 2.36.1
[PATCH 1/2] powerpc/vdso: Remove unused ENTRY in linker scripts
When linking vdso{32,64}.so.dbg with ld.lld, there is a warning about not finding _start for the starting address: ld.lld: warning: cannot find entry symbol _start; not setting start address ld.lld: warning: cannot find entry symbol _start; not setting start address Looking at GCC + GNU ld, the entry point address is 0x0: $ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):" File: vdso32.so.dbg Entry point address: 0x0 File: vdso64.so.dbg Entry point address: 0x0 This matches what ld.lld emits: $ powerpc64le-linux-gnu-readelf -p .comment vdso{32,64}.so.dbg File: vdso32.so.dbg String dump of section '.comment': [ 0] Linker: LLD 14.0.0 [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37) File: vdso64.so.dbg String dump of section '.comment': [ 0] Linker: LLD 14.0.0 [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37) $ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):" File: vdso32.so.dbg Entry point address: 0x0 File: vdso64.so.dbg Entry point address: 0x0 Remove ENTRY to remove the warning, as it is unnecessary for the vDSO to function correctly. Signed-off-by: Nathan Chancellor --- arch/powerpc/kernel/vdso/vdso32.lds.S | 1 - arch/powerpc/kernel/vdso/vdso64.lds.S | 1 - 2 files changed, 2 deletions(-) diff --git a/arch/powerpc/kernel/vdso/vdso32.lds.S b/arch/powerpc/kernel/vdso/vdso32.lds.S index 58e0099f70f4..e0d19d74455f 100644 --- a/arch/powerpc/kernel/vdso/vdso32.lds.S +++ b/arch/powerpc/kernel/vdso/vdso32.lds.S @@ -13,7 +13,6 @@ OUTPUT_FORMAT("elf32-powerpcle", "elf32-powerpcle", "elf32-powerpcle") OUTPUT_FORMAT("elf32-powerpc", "elf32-powerpc", "elf32-powerpc") #endif OUTPUT_ARCH(powerpc:common) -ENTRY(_start) SECTIONS { diff --git a/arch/powerpc/kernel/vdso/vdso64.lds.S b/arch/powerpc/kernel/vdso/vdso64.lds.S index 0288cad428b0..1a4a7bc4c815 100644 --- a/arch/powerpc/kernel/vdso/vdso64.lds.S +++ b/arch/powerpc/kernel/vdso/vdso64.lds.S @@ -13,7 +13,6 @@ OUTPUT_FORMAT("elf64-powerpcle", "elf64-powerpcle", "elf64-powerpcle") OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc") #endif OUTPUT_ARCH(powerpc:common64) -ENTRY(_start) SECTIONS { -- 2.36.1
[PATCH 0/2] Link the PowerPC vDSO with ld.lld
Hi all, This series is an alternative to the one proposed by Nick before the PowerPC vDSO unification in commit fd1feade75fb ("powerpc/vdso: Merge vdso64 and vdso32 into a single directory"): https://lore.kernel.org/20200901222523.1941988-1-ndesaulni...@google.com/ Normally, we try to make compiling and linking two separate stages so that they can be done by $(CC) and $(LD) respectively, which is more in line with what the user expects, versus using the compiler as a linker driver and relying on the implicit default linker value. However, as shown in the above thread, getting this right for the PowerPC vDSO is a little tricky due to the linker emulation values. The unification might make this easier but that needs further investigation. To avoid regressing ld.bfd while enabling support for linking the vDSO with ld.lld, we can tell the compiler to use ld.lld via either '--ld-path=' (clang 12.0.0) or '-fuse-ld=lld'. The first patch avoids a warning from ld.lld when linking both vDSO objects and the second patch adds the flags. This should help avoid the issue noticed during Alexey's LTO bring up: https://lore.kernel.org/cakwvodmumhqhqhdcpwjmniqqpvwojb9mbukf3rr0bl+h+da...@mail.gmail.com/ Nathan Chancellor (2): powerpc/vdso: Remove unused ENTRY in linker scripts powerpc/vdso: Link with ld.lld when requested arch/powerpc/kernel/vdso/Makefile | 1 + arch/powerpc/kernel/vdso/vdso32.lds.S | 1 - arch/powerpc/kernel/vdso/vdso64.lds.S | 1 - 3 files changed, 1 insertion(+), 2 deletions(-) base-commit: f06351f8c0c85e2d53e73c53a33b4ef55b4ad6de -- 2.36.1
Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte
On 5/9/22 01:46, Baolin Wang wrote: > > > On 5/9/2022 1:46 PM, Christophe Leroy wrote: >> >> >> Le 08/05/2022 à 15:09, Baolin Wang a écrit : >>> >>> >>> On 5/8/2022 7:09 PM, Muchun Song wrote: On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote: > It is incorrect to use ptep_clear_flush() to nuke a hugetlb page > table when unmapping or migrating a hugetlb page, and will change > to use huge_ptep_clear_flush() instead in the following patches. > > So this is a preparation patch, which changes the > huge_ptep_clear_flush() > to return the original pte to help to nuke a hugetlb page table. > > Signed-off-by: Baolin Wang > Acked-by: Mike Kravetz Reviewed-by: Muchun Song >>> >>> Thanks for reviewing. >>> But one nit below: [...] > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 8605d7e..61a21af 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct > *mm, struct vm_area_struct *vma, > ClearHPageRestoreReserve(new_page); > /* Break COW or unshare */ > - huge_ptep_clear_flush(vma, haddr, ptep); > + (void)huge_ptep_clear_flush(vma, haddr, ptep); Why add a "(void)" here? Is there any warning if no "(void)"? IIUC, I think we can remove this, right? >>> >>> I did not meet any warning without the casting, but this is per Mike's >>> comment[1] to make the code consistent with other functions casting to >>> void type explicitly in hugetlb.c file. >>> >>> [1] >>> https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/ >>> >> >> As far as I understand, Mike said that you should be accompagnied with a >> big fat comment explaining why we ignore the returned value from >> huge_ptep_clear_flush(). > >> By the way huge_ptep_clear_flush() is not declared 'must_check' so this >> cast is just visual polution and should be removed. >> >> In the meantime the comment suggested by Mike should be added instead. > Sorry for my misunderstanding. I just follow the explicit void casting like > other places in hugetlb.c file. And I am not sure if it is useful adding some > comments like below, since we did not need the original pte value in the COW > case mapping with a new page, and the code is more readable already I think. > > Mike, could you help to clarify what useful comments would you like? and > remove the explicit void casting? Thanks. > Sorry for the confusion. In the original commit, it seemed odd to me that the signature of the function was changing and there was not an associated change to the only caller of the function. I did suggest casting to void or adding a comment. As Christophe mentions, the cast to void is not necessary. In addition, there really isn't a need for a comment as the calling code is not changed. The original version of the commit without either is actually preferable. The commit message does say this is a preparation patch and the return value will be used in later patches. Again, sorry for the confusion. -- Mike Kravetz
Re: [PATCH kernel] powerpc/llvm/lto: Allow LLVM LTO builds
Hi Alexey, On Mon, May 09, 2022 at 05:42:59PM +1000, Alexey Kardashevskiy wrote: > > > On 5/9/22 15:18, Alexey Kardashevskiy wrote: > > > > > > On 5/4/22 07:21, Nick Desaulniers wrote: > > > On Thu, Apr 28, 2022 at 11:46 PM Alexey Kardashevskiy > > > wrote: > > > > > > > > This enables LTO_CLANG builds on POWER with the upstream version of > > > > LLVM. > > > > > > > > LTO optimizes the output vmlinux binary and this may affect the FTP > > > > alternative section if alt branches use "bc" (Branch Conditional) which > > > > is limited by 16 bit offsets. This shows up in errors like: > > > > > > > > ld.lld: error: InputSection too large for range extension thunk > > > > vmlinux.o:(__ftr_alt_97+0xF0) > > > > > > > > This works around the issue by replacing "bc" in FTR_SECTION_ELSE with > > > > "b" which allows 26 bit offsets. > > > > > > > > This catches the problem instructions in vmlinux.o before it LTO'ed: > > > > > > > > $ objdump -d -M raw -j __ftr_alt_97 vmlinux.o | egrep '\S+\s*\' > > > > 30: 00 00 82 40 bc 4,eq,30 <__ftr_alt_97+0x30> > > > > f0: 00 00 82 40 bc 4,eq,f0 <__ftr_alt_97+0xf0> > > > > > > > > This allows LTO builds for ppc64le_defconfig plus LTO options. > > > > Note that DYNAMIC_FTRACE/FUNCTION_TRACER is not supported by LTO builds > > > > but this is not POWERPC-specific. > > > > > > $ ARCH=powerpc make LLVM=1 -j72 ppc64le_defconfig > > > $ ARCH=powerpc make LLVM=1 -j72 menuconfig > > > > > > $ ARCH=powerpc make LLVM=1 -j72 > > > ... > > > VDSO64L arch/powerpc/kernel/vdso/vdso64.so.dbg > > > /usr/bin/powerpc64le-linux-gnu-ld: > > > /android0/llvm-project/llvm/build/bin/../lib/LLVMgold.so: error > > > loading plugin: > > > /android0/llvm-project/llvm/build/bin/../lib/LLVMgold.so: cannot open > > > shared object file: No such file or directory > > > clang-15: error: linker command failed with exit code 1 (use -v to see > > > invocation) > > > make[1]: *** [arch/powerpc/kernel/vdso/Makefile:67: > > > arch/powerpc/kernel/vdso/vdso64.so.dbg] Error 1 > > > > > > Looks like LLD isn't being invoked correctly to link the vdso. > > > Probably need to revisit > > > https://lore.kernel.org/lkml/20200901222523.1941988-1-ndesaulni...@google.com/ > > > > > > How were you working around this issue? Perhaps you built clang to > > > default to LLD? (there's a cmake option for that) > > > > > > What option is that? I only add -DLLVM_ENABLE_LLD=ON which (I think) The option Nick is referring to here is CLANG_DEFAULT_LINKER, which sets the default linker when clang is invoked as the linker driver, which the PowerPC vDSO Makefile does. We have been trying to move all parts of the kernel to compile with $(CC) and link with $(LD) so that the default linker invoked by the compiler is taken out of the equation. For what it's worth, I think that the error Nick is seeing is due to a lack of the LLVMgold.so plugin on his system, which is built when -DLLVM_BINUTILS_INCDIR=... is included in the list of CMake variables, which you have. The LLVMgold.so plugin is needed when doing LTO with GNU ld, which is the case with the vDSO currently for the reason I mentioned above. We could work around this with Nick's proposed '-fuse-ld=lld' patch or just disable LTO for the vDSO. https://llvm.org/docs/GoldPlugin.html My version of the hack would probably be: ccflags-$(CONFIG_LD_IS_LLD) += -fuse-ld=lld asflags-$(CONFIG_LD_IS_LLD) += -fuse-ld=lld > > tells cmake to use lld to link the LLVM being built but does not seem to > > tell what the built clang should do. Right, -DLLVM_ENABLE_LLD=ON is equivalent to -DLLVM_USE_LINKER=lld, except when doing a multi-stage build: https://llvm.org/docs/CMake.html#llvm-related-variables > > > > Without -DLLVM_ENABLE_LLD=ON, building just fails: > > > > [fstn1-p1 ~/pbuild/llvm/llvm-lto-latest-cleanbuild]$ ninja -j 100 > > [619/3501] Linking CXX executable bin/not > > FAILED: bin/not > > : && /usr/bin/clang++ -fPIC -fvisibility-inlines-hidden > > -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra > > -Wno-unused-parameter -Wwrite-strings -Wcast-qual > > -Wmissing-field-initializers -pedantic -Wno-long-long > > -Wc++98-compat-extra-semi -Wimplicit-fallthrough > > -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor > > -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion > > -Wmisleading-indentation -fdiagnostics-color -ffunction-sections > > -fdata-sections -flto -O3 -DNDEBUG -flto > > -Wl,-rpath-link,/home/aik/pbuild/llvm/llvm-lto-latest-cleanbuild/./lib > > -Wl,--gc-sections utils/not/CMakeFiles/not.dir/not.cpp.o -o bin/not > > -Wl,-rpath,"\$ORIGIN/../lib" -lpthread lib/libLLVMSupport.a -lrt > > -ldl -lpthread -lm /usr/lib/powerpc64le-linux-gnu/libz.so > > /usr/lib/powerpc64le-linux-gnu/libtinfo.so lib/libLLVMDemangle.a && : > > /usr/bin/ld: lib/libLLVMSupport.a: error adding symbols: archive has no > > index; run ranlib to add one > > clang: error: linker command failed with exit code 1
Re: [PATCH 2/3] of: dynamic: add of_node_alloc() and of_node_free()
On Fri, May 6, 2022 at 5:45 AM Clément Léger wrote: > > Le Thu, 5 May 2022 14:43:54 -0500, > Rob Herring a écrit : > > > On Wed, May 04, 2022 at 05:40:32PM +0200, Clément Léger wrote: > > > Add functions which allows to create and free nodes. > > > > > > Signed-off-by: Clément Léger > > > --- > > > drivers/of/dynamic.c | 59 > > > include/linux/of.h | 9 +++ > > > 2 files changed, 58 insertions(+), 10 deletions(-) > > > > > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c > > > index e8700e509d2e..ec28e5ba2969 100644 > > > --- a/drivers/of/dynamic.c > > > +++ b/drivers/of/dynamic.c > > > @@ -455,6 +455,54 @@ struct property *__of_prop_dup(const struct property > > > *prop, gfp_t allocflags) > > > prop->length, allocflags); > > > } > > > > > > +/** > > > + * of_node_free - Free a node allocated dynamically. > > > + * @node: Node to be freed > > > + */ > > > +void of_node_free(const struct device_node *node) > > > +{ > > > + kfree(node->full_name); > > > + kfree(node->data); > > > + kfree(node); > > > +} > > > +EXPORT_SYMBOL(of_node_free); > > > > This shouldn't be needed. Nodes are refcounted, so any caller should > > just do a put. > > Acked. Do you want the name to be allocated as part of the node > allocation also ? Yeah, I think that would be fine. Rob
Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping
On Fri, May 06, 2022 at 12:07:13PM -0700, Mike Kravetz wrote: > On 5/3/22 03:03, Gerald Schaefer wrote: > > On Tue, 3 May 2022 10:19:46 +0800 > > Baolin Wang wrote: > >> On 5/2/2022 10:02 PM, Gerald Schaefer wrote: [...] > >> Please see previous code, we'll use the original pte value to check if > >> it is uffd-wp armed, and if need to mark it dirty though the hugetlbfs > >> is set noop_dirty_folio(). > >> > >> pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval); > > > > Uh, ok, that wouldn't work on s390, but we also don't have > > CONFIG_PTE_MARKER_UFFD_WP / HAVE_ARCH_USERFAULTFD_WP set, so > > I guess we will be fine (for now). > > > > Still, I find it a bit unsettling that pte_install_uffd_wp_if_needed() > > would work on a potential hugetlb *pte, directly de-referencing it > > instead of using huge_ptep_get(). > > > > The !pte_none(*pte) check at the beginning would be broken in the > > hugetlb case for s390 (not sure about other archs, but I think s390 > > might be the only exception strictly requiring huge_ptep_get() > > for de-referencing hugetlb *pte pointers). We could have used is_vm_hugetlb_page(vma) within the helper so as to properly use either generic pte or hugetlb version of pte fetching. We may want to conditionally do set_[huge_]pte_at() too at the end. I could prepare a patch for that even if it's not really anything urgently needed. I assume that won't need to block this patchset since we need the pteval for pte_dirty() check anyway and uffd-wp definitely needs it too. Thanks, -- Peter Xu
Re: request_module DoS
On Mon, May 09, 2022 at 09:23:39PM +1000, Michael Ellerman wrote: > Herbert Xu writes: > > Hi: > > > > There are some code paths in the kernel where you can reliably > > trigger a request_module of a non-existant module. For example, > > if you attempt to load a non-existent crypto algorithm, or create > > a socket of a non-existent network family, it will result in a > > request_module call that is guaranteed to fail. > > > > As user-space can do this repeatedly, it can quickly overwhelm > > the concurrency limit in kmod. This in itself is expected, > > however, at least on some platforms this appears to result in > > a live-lock. Here is an example triggered by stress-ng on ppc64: > > > > [ 529.853264] request_module: kmod_concurrent_max (0) close to 0 > > (max_modprobes: 50), for module crypto-aegis128l, throttling... > ... > > [ 580.414590] __request_module: 25 callbacks suppressed > > [ 580.414597] request_module: kmod_concurrent_max (0) close to 0 > > (max_modprobes: 50), for module crypto-aegis256-all, throttling... > > [ 580.423082] watchdog: CPU 784 self-detected hard LOCKUP @ > > plpar_hcall_norets_notrace+0x18/0x2c > > [ 580.423097] watchdog: CPU 784 TB:1297691958559475, last heartbeat > > TB:1297686321743840 (11009ms ago) > > [ 580.423099] Modules linked in: cast6_generic cast5_generic cast_common > > camellia_generic blowfish_generic blowfish_common tun nft_fib_inet > > nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 > > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack > > nf_defrag_ipv6 nf_defrag_ipv4 rfkill bonding tls ip_set nf_tables nfnetlink > > pseries_rng binfmt_misc drm drm_panel_orientation_quirks xfs libcrc32c > > sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto dm_mirror > > dm_region_hash dm_log dm_mod fuse > > [ 580.423136] CPU: 784 PID: 77071 Comm: stress-ng Kdump: loaded Not > > tainted 5.14.0-55.el9.ppc64le #1 > > [ 580.423139] NIP: c00f8ff4 LR: c01f7c38 CTR: > > > > [ 580.423140] REGS: c043fdd7bd60 TRAP: 0900 Not tainted > > (5.14.0-55.el9.ppc64le) > > [ 580.423142] MSR: 8280b033 > > CR: 28008202 XER: 2004 > > [ 580.423148] CFAR: 0c00 IRQMASK: 1 > >GPR00: 28008202 c044c46b3850 c2a46f00 > > > >GPR04: 0010 > > c2a83060 > >GPR08: 0001 0001 > > > >GPR12: c01b9530 c043ffe16700 00020117 > > 10185ea8 > >GPR16: 10212150 10186198 101863a0 > > 1021b3c0 > >GPR20: 0001 0001 > > 00ff > >GPR24: c043f4a00e14 c043fafe0e00 0c44 > > > >GPR28: c043f4a00e00 c043f4a00e00 c21e0e00 > > c2561aa0 > > [ 580.423166] NIP [c00f8ff4] plpar_hcall_norets_notrace+0x18/0x2c > > [ 580.423168] LR [c01f7c38] > > __pv_queued_spin_lock_slowpath+0x528/0x530 > > [ 580.423173] Call Trace: > > [ 580.423174] [c044c46b3850] [00016b60] 0x16b60 > > (unreliable) > > [ 580.423177] [c044c46b3910] [c0ea6948] > > _raw_spin_lock_irqsave+0xa8/0xc0 > > [ 580.423182] [c044c46b3940] [c01dd7c0] > > prepare_to_wait_event+0x40/0x200 > > [ 580.423185] [c044c46b39a0] [c019e9e0] > > __request_module+0x320/0x510 > > [ 580.423188] [c044c46b3ac0] [c06f1a14] > > crypto_alg_mod_lookup+0x1e4/0x2e0 > > [ 580.423192] [c044c46b3b60] [c06f2178] > > crypto_alloc_tfm_node+0xa8/0x1a0 > > [ 580.423194] [c044c46b3be0] [c06f84f8] > > crypto_alloc_aead+0x38/0x50 > > [ 580.423196] [c044c46b3c00] [c072cba0] aead_bind+0x70/0x140 > > [ 580.423199] [c044c46b3c40] [c0727824] alg_bind+0xb4/0x210 > > [ 580.423201] [c044c46b3cc0] [c0bc2ad4] __sys_bind+0x114/0x160 > > [ 580.423205] [c044c46b3d90] [c0bc2b48] sys_bind+0x28/0x40 > > [ 580.423207] [c044c46b3db0] [c0030880] > > system_call_exception+0x160/0x300 > > [ 580.423209] [c044c46b3e10] [c000c168] > > system_call_vectored_common+0xe8/0x278 > > [ 580.423213] --- interrupt: 3000 at 0x7fff9b824464 > > [ 580.423214] NIP: 7fff9b824464 LR: CTR: > > > > [ 580.423215] REGS: c044c46b3e80 TRAP: 3000 Not tainted > > (5.14.0-55.el9.ppc64le) > > [ 580.423216] MSR: 8280f033 > > CR: 42004802 XER: > > [ 580.423221] IRQMASK: 0 > >GPR00: 0147 7fffdcff2780 7fff9b917100 > > 0004 > >GPR04: 7fffdcff27e0 0058 > > > >GPR08:
Re: [PATCH 24/30] panic: Refactor the panic path
On 29/04/2022 13:04, Guilherme G. Piccoli wrote: > On 27/04/2022 21:28, Randy Dunlap wrote: >> >> >> On 4/27/22 15:49, Guilherme G. Piccoli wrote: >>> + crash_kexec_post_notifiers >>> + This was DEPRECATED - users should always prefer the >> >> This is DEPRECATED - users should always prefer the >> >>> + parameter "panic_notifiers_level" - check its entry >>> + in this documentation for details on how it works. >>> + Setting this parameter is exactly the same as setting >>> + "panic_notifiers_level=4". >> > > Thanks Randy, for your suggestion - but I confess I couldn't understand > it properly. It's related to spaces/tabs, right? What you suggest me to > change in this formatting? Just by looking the email I can't parse. > > Cheers, > > > Guilherme Complete lack of attention from me, apologies! The suggestions was s/was/is - already fixed for V2, thanks Randy.
Re: [PATCH 10/30] alpha: Clean-up the panic notifier code
On 27/04/2022 19:49, Guilherme G. Piccoli wrote: > The alpha panic notifier has some code issues, not following > the conventions of other notifiers. Also, it might halt the > machine but still it is set to run as early as possible, which > doesn't seem to be a good idea. > > This patch cleans the code, and set the notifier to run as the > latest, following the same approach other architectures are doing. > Also, we remove the unnecessary include of a header already > included indirectly. > > Cc: Ivan Kokshaysky > Cc: Matt Turner > Cc: Richard Henderson > Signed-off-by: Guilherme G. Piccoli > --- > arch/alpha/kernel/setup.c | 36 +++- > 1 file changed, 15 insertions(+), 21 deletions(-) > > diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c > index b4fbbba30aa2..d88bdf852753 100644 > --- a/arch/alpha/kernel/setup.c > +++ b/arch/alpha/kernel/setup.c > @@ -41,19 +41,11 @@ > #include > #include > #endif > -#include > #include > #include > #include > #include > > -static int alpha_panic_event(struct notifier_block *, unsigned long, void *); > -static struct notifier_block alpha_panic_block = { > - alpha_panic_event, > -NULL, > -INT_MAX /* try to do it first */ > -}; > - > #include > #include > #include > @@ -435,6 +427,21 @@ static const struct sysrq_key_op srm_sysrq_reboot_op = { > }; > #endif > > +static int alpha_panic_event(struct notifier_block *this, > + unsigned long event, void *ptr) > +{ > + /* If we are using SRM and serial console, just hard halt here. */ > + if (alpha_using_srm && srmcons_output) > + __halt(); > + > + return NOTIFY_DONE; > +} > + > +static struct notifier_block alpha_panic_block = { > + .notifier_call = alpha_panic_event, > + .priority = INT_MIN, /* may not return, do it last */ > +}; > + > void __init > setup_arch(char **cmdline_p) > { > @@ -1427,19 +1434,6 @@ const struct seq_operations cpuinfo_op = { > .show = show_cpuinfo, > }; > > - > -static int > -alpha_panic_event(struct notifier_block *this, unsigned long event, void > *ptr) > -{ > -#if 1 > - /* FIXME FIXME FIXME */ > - /* If we are using SRM and serial console, just hard halt here. */ > - if (alpha_using_srm && srmcons_output) > - __halt(); > -#endif > -return NOTIFY_DONE; > -} > - > static __init int add_pcspkr(void) > { > struct platform_device *pd; Hi folks, I'm updating Richard's email and re-sending the V1, any reviews are greatly appreciated! Cheers, Guilherme
Re: [PATCH v6 22/29] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
On Thu, May 05 2022 at 17:00, Ricardo Neri wrote: > + if (is_hpet_hld_interrupt(hdata)) { > + /* > + * Kick the timer first. If the HPET channel is periodic, it > + * helps to reduce the delta between the expected TSC value and > + * its actual value the next time the HPET channel fires. > + */ > + kick_timer(hdata, !(hdata->has_periodic)); > + > + if (cpumask_weight(hld_data->monitored_cpumask) > 1) { > + /* > + * Since we cannot know the source of an NMI, the best > + * we can do is to use a flag to indicate to all online > + * CPUs that they will get an NMI and that the source of > + * that NMI is the hardlockup detector. Offline CPUs > + * also receive the NMI but they ignore it. > + * > + * Even though we are in NMI context, we have concluded > + * that the NMI came from the HPET channel assigned to > + * the detector, an event that is infrequent and only > + * occurs in the handling CPU. There should not be races > + * with other NMIs. > + */ > + cpumask_copy(hld_data->inspect_cpumask, > + cpu_online_mask); > + > + /* If we are here, IPI shorthands are enabled. */ > + apic->send_IPI_allbutself(NMI_VECTOR); So if the monitored cpumask is a subset of online CPUs, which is the case when isolation features are enabled, then you still send NMIs to those isolated CPUs. I'm sure the isolation folks will be enthused. Thanks, tglx
Re: [PATCH v6 21/29] x86/nmi: Add an NMI_WATCHDOG NMI handler category
On Thu, May 05 2022 at 17:00, Ricardo Neri wrote: > Add a NMI_WATCHDOG as a new category of NMI handler. This new category > is to be used with the HPET-based hardlockup detector. This detector > does not have a direct way of checking if the HPET timer is the source of > the NMI. Instead, it indirectly estimates it using the time-stamp counter. > > Therefore, we may have false-positives in case another NMI occurs within > the estimated time window. For this reason, we want the handler of the > detector to be called after all the NMI_LOCAL handlers. A simple way > of achieving this with a new NMI handler category. > > @@ -379,6 +385,10 @@ static noinstr void default_do_nmi(struct pt_regs *regs) > } > raw_spin_unlock(_reason_lock); > > + handled = nmi_handle(NMI_WATCHDOG, regs); > + if (handled == NMI_HANDLED) > + goto out; > + How is this supposed to work reliably? If perf is active and the HPET NMI and the perf NMI come in around the same time, then nmi_handle(LOCAL) can swallow the NMI and the watchdog won't be checked. Because MSI is strictly edge and the message is only sent once, this can result in a stale watchdog, no? Thanks, tglx
Re: [PATCH] bug: Use normal relative pointers in 'struct bug_entry'
Josh Poimboeuf writes: > With CONFIG_GENERIC_BUG_RELATIVE_POINTERS, the addr/file relative > pointers are calculated weirdly: based on the beginning of the bug_entry > struct address, rather than their respective pointer addresses. > > Make the relative pointers less surprising to both humans and tools by > calculating them the normal way. > > Signed-off-by: Josh Poimboeuf > --- ... > diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h > index ecbae1832de3..76252576d889 100644 > --- a/arch/powerpc/include/asm/bug.h > +++ b/arch/powerpc/include/asm/bug.h > @@ -13,7 +13,8 @@ > #ifdef CONFIG_DEBUG_BUGVERBOSE > .macro __EMIT_BUG_ENTRY addr,file,line,flags >.section __bug_table,"aw" > -5001: .4byte \addr - 5001b, 5002f - 5001b > +5001: .4byte \addr - . > + .4byte 5002f - . >.short \line, \flags >.org 5001b+BUG_ENTRY_SIZE >.previous > @@ -24,7 +25,7 @@ > #else > .macro __EMIT_BUG_ENTRY addr,file,line,flags >.section __bug_table,"aw" > -5001: .4byte \addr - 5001b > +5001: .4byte \addr - . >.short \flags >.org 5001b+BUG_ENTRY_SIZE >.previous Embarrassingly, we have another copy of the logic, used in the C versions, they need updating too: diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h index ecbae1832de3..3fde35fd67f8 100644 --- a/arch/powerpc/include/asm/bug.h +++ b/arch/powerpc/include/asm/bug.h @@ -49,14 +49,14 @@ #ifdef CONFIG_DEBUG_BUGVERBOSE #define _EMIT_BUG_ENTRY\ ".section __bug_table,\"aw\"\n" \ - "2:\t.4byte 1b - 2b, %0 - 2b\n" \ + "2:\t.4byte 1b - ., %0 - .\n" \ "\t.short %1, %2\n" \ ".org 2b+%3\n" \ ".previous\n" #else #define _EMIT_BUG_ENTRY\ ".section __bug_table,\"aw\"\n" \ - "2:\t.4byte 1b - 2b\n" \ + "2:\t.4byte 1b - .\n" \ "\t.short %2\n" \ ".org 2b+%3\n" \ ".previous\n" With that added it seems to be working correctly for me. Acked-by: Michael Ellerman (powerpc) cheers
Re: [PATCH 1/3] termbits.h: create termbits-common.h for identical bits
On Mon, 9 May 2022, Helge Deller wrote: > Hello Ilpo, > > On 5/9/22 11:34, Ilpo Järvinen wrote: > > Some defines are the same across all archs. Move the most obvious > > intersection to termbits-common.h. > > I like your cleanup patches, but in this specific one, does it makes sense > to split up together-belonging constants, e.g. > > > diff --git a/arch/parisc/include/uapi/asm/termbits.h > > b/arch/parisc/include/uapi/asm/termbits.h > > index 6017ee08f099..7f74a822b7ea 100644 > > --- a/arch/parisc/include/uapi/asm/termbits.h > > +++ b/arch/parisc/include/uapi/asm/termbits.h > > @@ -61,31 +61,15 @@ struct ktermios { > > > > > > /* c_iflag bits */ > > -#define IGNBRK 0x1 > > -#define BRKINT 0x2 > > -#define IGNPAR 0x4 > > -#define PARMRK 0x8 > > -#define INPCK 0x00010 > > -#define ISTRIP 0x00020 > > -#define INLCR 0x00040 > > -#define IGNCR 0x00080 > > -#define ICRNL 0x00100 > > #define IUCLC 0x00200 > > #define IXON 0x00400 > > -#define IXANY 0x00800 > > #define IXOFF 0x01000 > > #define IMAXBEL0x04000 > > #define IUTF8 0x08000 > > In the hunk above you leave IUCLC, IXON, IXOFF... because they seem unique to > parisc. > The other defines are then taken from generic header. > Although this is correct, it leaves single values alone, which make it hard > to verify > because you don't see the full list of values in one place. While I too am fine either way, I don't think these are as strongly grouped as you seem to imply. There's no big advantage in having as much as possible within the same file. If somebody is looking for the meaning of these, these headers are no match when compared e.g. with stty manpage. For c_iflag, the break, parity and cr related "groups" within c_iflag are moving completely to common header. IXANY is probably only one close to borderline whether it kind of belongs to the same group as IXON/IXOFF (which both by chance both remained on the same side in the split). I don't think it does strongly enough to warrant keeping them next to each other but I'm open what opinions others have on it. The rest in c_iflag don't seem to be strongly tied/grouped to the other defines within c_iflag. They're just bits that appear next/close to each other but are not tied by any significant meaning-based connection. C_oflag is more messy. I exercised grouping based judgement with c_oflag where only the defines with all bits as zero would have moved to the common header breaking the groups very badly. That is, only CR0 would have moved and CR1-3 remained in arch headers, etc. which made no sense to do. One could argue, that since ONLCR (and perhaps CRDLY) are not moving, no other cr related defines should move either. > > @@ -112,24 +96,6 @@ struct ktermios { > > > > /* c_cflag bit meaning */ > > #define CBAUD 0x100f > > -#define B00x /* hang up */ > > -#define B50 0x0001 > > -#define B75 0x0002 > > -#define B110 0x0003 > > -#define B134 0x0004 > > -#define B150 0x0005 > > -#define B200 0x0006 > > -#define B300 0x0007 > > -#define B600 0x0008 > > -#define B1200 0x0009 > > -#define B1800 0x000a > > -#define B2400 0x000b > > -#define B4800 0x000c > > -#define B9600 0x000d > > -#define B192000x000e > > -#define B384000x000f > > -#define EXTA B19200 > > -#define EXTB B38400 > > Here all baud values are dropped and will be taken from generic header, > which is good. > > That said, I think it's good to move away the second hunk, > but maybe we should keep the first as is? > > It's just a thought. Either way, I'm fine your patch if that's the > way which is decided to go for all platforms. Yes, lets wait and see what the others think. Thanks for taking a look! -- i.
Re: [PATCH 1/3] termbits.h: create termbits-common.h for identical bits
Hello Ilpo, On 5/9/22 11:34, Ilpo Järvinen wrote: > Some defines are the same across all archs. Move the most obvious > intersection to termbits-common.h. I like your cleanup patches, but in this specific one, does it makes sense to split up together-belonging constants, e.g. > diff --git a/arch/parisc/include/uapi/asm/termbits.h > b/arch/parisc/include/uapi/asm/termbits.h > index 6017ee08f099..7f74a822b7ea 100644 > --- a/arch/parisc/include/uapi/asm/termbits.h > +++ b/arch/parisc/include/uapi/asm/termbits.h > @@ -61,31 +61,15 @@ struct ktermios { > > > /* c_iflag bits */ > -#define IGNBRK 0x1 > -#define BRKINT 0x2 > -#define IGNPAR 0x4 > -#define PARMRK 0x8 > -#define INPCK0x00010 > -#define ISTRIP 0x00020 > -#define INLCR0x00040 > -#define IGNCR0x00080 > -#define ICRNL0x00100 > #define IUCLC0x00200 > #define IXON 0x00400 > -#define IXANY0x00800 > #define IXOFF0x01000 > #define IMAXBEL 0x04000 > #define IUTF80x08000 In the hunk above you leave IUCLC, IXON, IXOFF... because they seem unique to parisc. The other defines are then taken from generic header. Although this is correct, it leaves single values alone, which make it hard to verify because you don't see the full list of values in one place. > @@ -112,24 +96,6 @@ struct ktermios { > > /* c_cflag bit meaning */ > #define CBAUD0x100f > -#define B0 0x /* hang up */ > -#define B50 0x0001 > -#define B75 0x0002 > -#define B1100x0003 > -#define B1340x0004 > -#define B1500x0005 > -#define B2000x0006 > -#define B3000x0007 > -#define B6000x0008 > -#define B1200 0x0009 > -#define B1800 0x000a > -#define B2400 0x000b > -#define B4800 0x000c > -#define B9600 0x000d > -#define B19200 0x000e > -#define B38400 0x000f > -#define EXTA B19200 > -#define EXTB B38400 Here all baud values are dropped and will be taken from generic header, which is good. That said, I think it's good to move away the second hunk, but maybe we should keep the first as is? It's just a thought. Either way, I'm fine your patch if that's the way which is decided to go for all platforms. Helge
Re: [PATCH v6 00/23] Rust support
On Sat, May 07, 2022 at 01:06:18AM -0700, Kees Cook wrote: > On Sat, May 07, 2022 at 07:23:58AM +0200, Miguel Ojeda wrote: > > ## Patch series status > > > > The Rust support is still to be considered experimental. However, > > support is good enough that kernel developers can start working on the > > Rust abstractions for subsystems and write drivers and other modules. > > I'd really like to see this landed for a few reasons: > > - It's under active development, and I'd rather review the changes > "normally", incrementally, etc. Right now it can be hard to re-review > some of the "mostly the same each version" patches in the series. > > - I'd like to break the catch-22 of "ask for a new driver to be > written in rust but the rust support isn't landed" vs "the rust > support isn't landed because there aren't enough drivers". It > really feels like "release early, release often" is needed here; > it's hard to develop against -next. :) +1 to both points. :-)
[PATCH 3/3] termbits.h: Remove posix_types.h include
Nothing in termbits seems to require anything from linux/posix_types.h. Signed-off-by: Ilpo Järvinen --- arch/alpha/include/uapi/asm/termbits.h | 2 -- arch/mips/include/uapi/asm/termbits.h | 2 -- arch/parisc/include/uapi/asm/termbits.h | 2 -- arch/sparc/include/uapi/asm/termbits.h | 2 -- include/uapi/asm-generic/termbits.h | 2 -- 5 files changed, 10 deletions(-) diff --git a/arch/alpha/include/uapi/asm/termbits.h b/arch/alpha/include/uapi/asm/termbits.h index 735e9ffe2795..f1290b22072b 100644 --- a/arch/alpha/include/uapi/asm/termbits.h +++ b/arch/alpha/include/uapi/asm/termbits.h @@ -2,8 +2,6 @@ #ifndef _ALPHA_TERMBITS_H #define _ALPHA_TERMBITS_H -#include - #include typedef unsigned int tcflag_t; diff --git a/arch/mips/include/uapi/asm/termbits.h b/arch/mips/include/uapi/asm/termbits.h index 8fa3e79d4f94..1eb60903d6f0 100644 --- a/arch/mips/include/uapi/asm/termbits.h +++ b/arch/mips/include/uapi/asm/termbits.h @@ -11,8 +11,6 @@ #ifndef _ASM_TERMBITS_H #define _ASM_TERMBITS_H -#include - #include typedef unsigned int tcflag_t; diff --git a/arch/parisc/include/uapi/asm/termbits.h b/arch/parisc/include/uapi/asm/termbits.h index d72c5ebf3a3a..3a8938d26fb4 100644 --- a/arch/parisc/include/uapi/asm/termbits.h +++ b/arch/parisc/include/uapi/asm/termbits.h @@ -2,8 +2,6 @@ #ifndef __ARCH_PARISC_TERMBITS_H__ #define __ARCH_PARISC_TERMBITS_H__ -#include - #include typedef unsigned int tcflag_t; diff --git a/arch/sparc/include/uapi/asm/termbits.h b/arch/sparc/include/uapi/asm/termbits.h index cfcc4e07ce51..4321322701fc 100644 --- a/arch/sparc/include/uapi/asm/termbits.h +++ b/arch/sparc/include/uapi/asm/termbits.h @@ -2,8 +2,6 @@ #ifndef _UAPI_SPARC_TERMBITS_H #define _UAPI_SPARC_TERMBITS_H -#include - #include #if defined(__sparc__) && defined(__arch64__) diff --git a/include/uapi/asm-generic/termbits.h b/include/uapi/asm-generic/termbits.h index c92179563289..890ef29053e2 100644 --- a/include/uapi/asm-generic/termbits.h +++ b/include/uapi/asm-generic/termbits.h @@ -2,8 +2,6 @@ #ifndef __ASM_GENERIC_TERMBITS_H #define __ASM_GENERIC_TERMBITS_H -#include - #include typedef unsigned int tcflag_t; -- 2.30.2
[PATCH 2/3] termbits.h: Align lines & format
- Align c_cc defines. - Remove extra newlines. - Realign & adjust number of leading zeros. - Reorder c_cflag defines to ascending order - Make comment ending shorted (=remove period and one extra space from the comments in mips). Co-developed-by: Arnd Bergmann Signed-off-by: Arnd Bergmann Signed-off-by: Ilpo Järvinen --- arch/alpha/include/uapi/asm/termbits.h | 130 +- arch/mips/include/uapi/asm/termbits.h| 156 ++--- arch/parisc/include/uapi/asm/termbits.h | 77 +-- arch/powerpc/include/uapi/asm/termbits.h | 100 +++--- arch/sparc/include/uapi/asm/termbits.h | 168 +++ include/uapi/asm-generic/termbits.h | 76 +- 6 files changed, 346 insertions(+), 361 deletions(-) diff --git a/arch/alpha/include/uapi/asm/termbits.h b/arch/alpha/include/uapi/asm/termbits.h index 3c7a9afc4333..735e9ffe2795 100644 --- a/arch/alpha/include/uapi/asm/termbits.h +++ b/arch/alpha/include/uapi/asm/termbits.h @@ -53,60 +53,58 @@ struct ktermios { }; /* c_cc characters */ -#define VEOF 0 -#define VEOL 1 -#define VEOL2 2 -#define VERASE 3 -#define VWERASE 4 -#define VKILL 5 -#define VREPRINT 6 -#define VSWTC 7 -#define VINTR 8 -#define VQUIT 9 -#define VSUSP 10 -#define VSTART 12 -#define VSTOP 13 -#define VLNEXT 14 -#define VDISCARD 15 -#define VMIN 16 -#define VTIME 17 +#define VEOF0 +#define VEOL1 +#define VEOL2 2 +#define VERASE 3 +#define VWERASE 4 +#define VKILL 5 +#define VREPRINT6 +#define VSWTC 7 +#define VINTR 8 +#define VQUIT 9 +#define VSUSP 10 +#define VSTART 12 +#define VSTOP 13 +#define VLNEXT 14 +#define VDISCARD 15 +#define VMIN 16 +#define VTIME 17 /* c_iflag bits */ -#define IXON 0x00200 -#define IXOFF 0x00400 -#define IUCLC 0x01000 -#define IMAXBEL0x02000 -#define IUTF8 0x04000 +#define IXON 0x0200 +#define IXOFF 0x0400 +#define IUCLC 0x1000 +#define IMAXBEL0x2000 +#define IUTF8 0x4000 /* c_oflag bits */ #define ONLCR 0x2 #define OLCUC 0x4 - - -#define NLDLY 0x000300 -#define NL0 0x00 -#define NL1 0x000100 -#define NL2 0x000200 -#define NL3 0x000300 -#define TABDLY 0x000c00 -#define TAB0 0x00 -#define TAB1 0x000400 -#define TAB2 0x000800 -#define TAB3 0x000c00 -#define CRDLY 0x003000 -#define CR0 0x00 -#define CR1 0x001000 -#define CR2 0x002000 -#define CR3 0x003000 -#define FFDLY 0x004000 -#define FF0 0x00 -#define FF1 0x004000 -#define BSDLY 0x008000 -#define BS0 0x00 -#define BS1 0x008000 -#define VTDLY 0x01 -#define VT0 0x00 -#define VT1 0x01 +#define NLDLY 0x00300 +#define NL0 0x0 +#define NL1 0x00100 +#define NL2 0x00200 +#define NL3 0x00300 +#define TABDLY 0x00c00 +#define TAB0 0x0 +#define TAB1 0x00400 +#define TAB2 0x00800 +#define TAB3 0x00c00 +#define CRDLY 0x03000 +#define CR0 0x0 +#define CR1 0x01000 +#define CR2 0x02000 +#define CR3 0x03000 +#define FFDLY 0x04000 +#define FF0 0x0 +#define FF1 0x04000 +#define BSDLY 0x08000 +#define BS0 0x0 +#define BS1 0x08000 +#define VTDLY 0x1 +#define VT0 0x0 +#define VT1 0x1 /* * Should be equivalent to TAB3, see description of TAB3 in * POSIX.1-2008, Ch. 11.2.3 "Output Modes" @@ -116,38 +114,34 @@ struct ktermios { /* c_cflag bit meaning */ #define CBAUD 0x001f #define CBAUDEX0x -#define B576000x0010 -#define B115200 0x0011 -#define B230400 0x0012 -#define B460800 0x0013 -#define B50 0x0014 -#define B576000 0x0015 -#define B921600 0x0016 -#define B100 0x0017 -#define B1152000 0x0018 -#define B150 0x0019 -#define B200 0x001a -#define B250 0x001b -#define B300 0x001c -#define B350 0x001d -#define B400 0x001e #define BOTHER 0x001f - +#define B57600 0x0010 +#defineB115200 0x0011 +#defineB230400 0x0012 +#defineB460800 0x0013 +#defineB50 0x0014 +#defineB576000 0x0015 +#defineB921600 0x0016 +#define B100 0x0017 +#define B1152000 0x0018 +#define B150 0x0019 +#define B200 0x001a +#define B250 0x001b +#define B300 0x001c +#define B350 0x001d +#define B400 0x001e #define CSIZE 0x0300 #define CS5 0x #define CS6 0x0100 #define CS7 0x0200 #define CS8 0x0300 - #define CSTOPB 0x0400 #define CREAD 0x0800 #define PARENB 0x1000
[PATCH 1/3] termbits.h: create termbits-common.h for identical bits
Some defines are the same across all archs. Move the most obvious intersection to termbits-common.h. Signed-off-by: Ilpo Järvinen --- arch/alpha/include/uapi/asm/termbits.h | 52 + arch/mips/include/uapi/asm/termbits.h | 53 +- arch/parisc/include/uapi/asm/termbits.h| 54 +- arch/powerpc/include/uapi/asm/termbits.h | 52 + arch/sparc/include/uapi/asm/termbits.h | 53 +- include/uapi/asm-generic/termbits-common.h | 65 ++ include/uapi/asm-generic/termbits.h| 53 +- 7 files changed, 76 insertions(+), 306 deletions(-) create mode 100644 include/uapi/asm-generic/termbits-common.h diff --git a/arch/alpha/include/uapi/asm/termbits.h b/arch/alpha/include/uapi/asm/termbits.h index 30dc7ff777b8..3c7a9afc4333 100644 --- a/arch/alpha/include/uapi/asm/termbits.h +++ b/arch/alpha/include/uapi/asm/termbits.h @@ -4,8 +4,8 @@ #include -typedef unsigned char cc_t; -typedef unsigned int speed_t; +#include + typedef unsigned int tcflag_t; /* @@ -72,33 +72,17 @@ struct ktermios { #define VTIME 17 /* c_iflag bits */ -#define IGNBRK 0x1 -#define BRKINT 0x2 -#define IGNPAR 0x4 -#define PARMRK 0x8 -#define INPCK 0x00010 -#define ISTRIP 0x00020 -#define INLCR 0x00040 -#define IGNCR 0x00080 -#define ICRNL 0x00100 #define IXON 0x00200 #define IXOFF 0x00400 -#define IXANY 0x00800 #define IUCLC 0x01000 #define IMAXBEL0x02000 #define IUTF8 0x04000 /* c_oflag bits */ -#define OPOST 0x1 #define ONLCR 0x2 #define OLCUC 0x4 -#define OCRNL 0x8 -#define ONOCR 0x00010 -#define ONLRET 0x00020 -#define OFILL 0x40 -#define OFDEL 0x80 #define NLDLY 0x000300 #define NL0 0x00 #define NL1 0x000100 @@ -131,24 +115,6 @@ struct ktermios { /* c_cflag bit meaning */ #define CBAUD 0x001f -#define B00x /* hang up */ -#define B50 0x0001 -#define B75 0x0002 -#define B110 0x0003 -#define B134 0x0004 -#define B150 0x0005 -#define B200 0x0006 -#define B300 0x0007 -#define B600 0x0008 -#define B1200 0x0009 -#define B1800 0x000a -#define B2400 0x000b -#define B4800 0x000c -#define B9600 0x000d -#define B192000x000e -#define B384000x000f -#define EXTA B19200 -#define EXTB B38400 #define CBAUDEX0x #define B576000x0010 #define B115200 0x0011 @@ -180,11 +146,8 @@ struct ktermios { #define HUPCL 0x4000 #define CLOCAL 0x8000 -#define CMSPAR 0x4000 /* mark or space (stick) parity */ -#define CRTSCTS0x8000 /* flow control */ #define CIBAUD 0x1f -#define IBSHIFT16 /* c_lflag bits */ #define ISIG 0x0080 @@ -204,17 +167,6 @@ struct ktermios { #define IEXTEN 0x0400 #define EXTPROC0x1000 -/* Values for the ACTION argument to `tcflow'. */ -#defineTCOOFF 0 -#defineTCOON 1 -#defineTCIOFF 2 -#defineTCION 3 - -/* Values for the QUEUE_SELECTOR argument to `tcflush'. */ -#defineTCIFLUSH0 -#defineTCOFLUSH1 -#defineTCIOFLUSH 2 - /* Values for the OPTIONAL_ACTIONS argument to `tcsetattr'. */ #defineTCSANOW 0 #defineTCSADRAIN 1 diff --git a/arch/mips/include/uapi/asm/termbits.h b/arch/mips/include/uapi/asm/termbits.h index d1315ccdb39a..b33f514315ce 100644 --- a/arch/mips/include/uapi/asm/termbits.h +++ b/arch/mips/include/uapi/asm/termbits.h @@ -13,8 +13,8 @@ #include -typedef unsigned char cc_t; -typedef unsigned int speed_t; +#include + typedef unsigned int tcflag_t; /* @@ -80,31 +80,15 @@ struct ktermios { #define VEOL 17 /* End-of-line character [ICANON]. */ /* c_iflag bits */ -#define IGNBRK 0x1 /* Ignore break condition. */ -#define BRKINT 0x2 /* Signal interrupt on break. */ -#define IGNPAR 0x4 /* Ignore characters with parity errors. */ -#define PARMRK 0x8 /* Mark parity and framing errors. */ -#define INPCK 0x00010 /* Enable input parity check. */ -#define ISTRIP 0x00020 /* Strip 8th bit off characters. */ -#define INLCR 0x00040 /* Map NL to CR on input. */ -#define IGNCR 0x00080 /* Ignore CR. */ -#define ICRNL 0x00100 /* Map CR to NL on input. */ #define IUCLC 0x00200 /* Map upper case to lower case on input. */ #define IXON 0x00400 /* Enable start/stop output control. */ -#define IXANY 0x00800 /* Any character will restart after stop. */ #define IXOFF
[PATCH 0/3] termbits.h: Further improvements
Again, I prefer Greg to take these through his tty tree. Changes done by this serie: 1) Create termbits-common.h for the most obvious termbits.h intersection. 2) Reformat some lines that remain in termbits.h files. 3) Don't include posix_types.h unnecessarily. Please do check I got the uapi include things done correctly! That is, that including using asm-generic/termbits-common.h path is the preferred approach for a header file that is not supposed to be overridden by the arch specific header files and that mandatory-y is not required for termbits-common.h. Unfortunately I couldn't move also tcflag_t into termbits-common.h due to the way it is being defined for sparc. However, by the looks of how the type for tcflag_t is being chosen there, having just unsigned int might work also for sparc? Ilpo Järvinen (3): termbits.h: create termbits-common.h for identical bits termbits.h: Align lines & format termbits.h: Remove posix_types.h include arch/alpha/include/uapi/asm/termbits.h | 182 ++--- arch/mips/include/uapi/asm/termbits.h | 209 --- arch/parisc/include/uapi/asm/termbits.h| 131 arch/powerpc/include/uapi/asm/termbits.h | 152 +- arch/sparc/include/uapi/asm/termbits.h | 223 - include/uapi/asm-generic/termbits-common.h | 65 ++ include/uapi/asm-generic/termbits.h| 129 7 files changed, 418 insertions(+), 673 deletions(-) create mode 100644 include/uapi/asm-generic/termbits-common.h -- 2.30.2
Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte
On 5/9/2022 1:46 PM, Christophe Leroy wrote: Le 08/05/2022 à 15:09, Baolin Wang a écrit : On 5/8/2022 7:09 PM, Muchun Song wrote: On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote: It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table when unmapping or migrating a hugetlb page, and will change to use huge_ptep_clear_flush() instead in the following patches. So this is a preparation patch, which changes the huge_ptep_clear_flush() to return the original pte to help to nuke a hugetlb page table. Signed-off-by: Baolin Wang Acked-by: Mike Kravetz Reviewed-by: Muchun Song Thanks for reviewing. But one nit below: [...] diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8605d7e..61a21af 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, ClearHPageRestoreReserve(new_page); /* Break COW or unshare */ - huge_ptep_clear_flush(vma, haddr, ptep); + (void)huge_ptep_clear_flush(vma, haddr, ptep); Why add a "(void)" here? Is there any warning if no "(void)"? IIUC, I think we can remove this, right? I did not meet any warning without the casting, but this is per Mike's comment[1] to make the code consistent with other functions casting to void type explicitly in hugetlb.c file. [1] https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/ As far as I understand, Mike said that you should be accompagnied with a big fat comment explaining why we ignore the returned value from huge_ptep_clear_flush(). > By the way huge_ptep_clear_flush() is not declared 'must_check' so this cast is just visual polution and should be removed. In the meantime the comment suggested by Mike should be added instead. Sorry for my misunderstanding. I just follow the explicit void casting like other places in hugetlb.c file. And I am not sure if it is useful adding some comments like below, since we did not need the original pte value in the COW case mapping with a new page, and the code is more readable already I think. Mike, could you help to clarify what useful comments would you like? and remove the explicit void casting? Thanks. /* * Just ignore the return value with new page mapped. */
Re: [PATCH kernel] powerpc/llvm/lto: Allow LLVM LTO builds
On 5/9/22 15:18, Alexey Kardashevskiy wrote: On 5/4/22 07:21, Nick Desaulniers wrote: On Thu, Apr 28, 2022 at 11:46 PM Alexey Kardashevskiy wrote: This enables LTO_CLANG builds on POWER with the upstream version of LLVM. LTO optimizes the output vmlinux binary and this may affect the FTP alternative section if alt branches use "bc" (Branch Conditional) which is limited by 16 bit offsets. This shows up in errors like: ld.lld: error: InputSection too large for range extension thunk vmlinux.o:(__ftr_alt_97+0xF0) This works around the issue by replacing "bc" in FTR_SECTION_ELSE with "b" which allows 26 bit offsets. This catches the problem instructions in vmlinux.o before it LTO'ed: $ objdump -d -M raw -j __ftr_alt_97 vmlinux.o | egrep '\S+\s*\' 30: 00 00 82 40 bc 4,eq,30 <__ftr_alt_97+0x30> f0: 00 00 82 40 bc 4,eq,f0 <__ftr_alt_97+0xf0> This allows LTO builds for ppc64le_defconfig plus LTO options. Note that DYNAMIC_FTRACE/FUNCTION_TRACER is not supported by LTO builds but this is not POWERPC-specific. $ ARCH=powerpc make LLVM=1 -j72 ppc64le_defconfig $ ARCH=powerpc make LLVM=1 -j72 menuconfig $ ARCH=powerpc make LLVM=1 -j72 ... VDSO64L arch/powerpc/kernel/vdso/vdso64.so.dbg /usr/bin/powerpc64le-linux-gnu-ld: /android0/llvm-project/llvm/build/bin/../lib/LLVMgold.so: error loading plugin: /android0/llvm-project/llvm/build/bin/../lib/LLVMgold.so: cannot open shared object file: No such file or directory clang-15: error: linker command failed with exit code 1 (use -v to see invocation) make[1]: *** [arch/powerpc/kernel/vdso/Makefile:67: arch/powerpc/kernel/vdso/vdso64.so.dbg] Error 1 Looks like LLD isn't being invoked correctly to link the vdso. Probably need to revisit https://lore.kernel.org/lkml/20200901222523.1941988-1-ndesaulni...@google.com/ How were you working around this issue? Perhaps you built clang to default to LLD? (there's a cmake option for that) What option is that? I only add -DLLVM_ENABLE_LLD=ON which (I think) tells cmake to use lld to link the LLVM being built but does not seem to tell what the built clang should do. Without -DLLVM_ENABLE_LLD=ON, building just fails: [fstn1-p1 ~/pbuild/llvm/llvm-lto-latest-cleanbuild]$ ninja -j 100 [619/3501] Linking CXX executable bin/not FAILED: bin/not : && /usr/bin/clang++ -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -flto -O3 -DNDEBUG -flto -Wl,-rpath-link,/home/aik/pbuild/llvm/llvm-lto-latest-cleanbuild/./lib -Wl,--gc-sections utils/not/CMakeFiles/not.dir/not.cpp.o -o bin/not -Wl,-rpath,"\$ORIGIN/../lib" -lpthread lib/libLLVMSupport.a -lrt -ldl -lpthread -lm /usr/lib/powerpc64le-linux-gnu/libz.so /usr/lib/powerpc64le-linux-gnu/libtinfo.so lib/libLLVMDemangle.a && : /usr/bin/ld: lib/libLLVMSupport.a: error adding symbols: archive has no index; run ranlib to add one clang: error: linker command failed with exit code 1 (use -v to see invocation) [701/3501] Building CXX object utils/TableGen/CMakeFiles/llvm-tblgen.dir/GlobalISelEmitter.cpp.o ninja: build stopped: subcommand failed. My head hurts :( The above example is running on PPC. Now I am trying x86 box: A bit of progress. cmake -G Ninja -DLLVM_ENABLE_PROJECTS="clang;lld" -DLLVM_TARGET_ARCH=PowerPC -DLLVM_TARGETS_TO_BUILD=PowerPC ~/llvm-project//llvm -DLLVM_ENABLE_LTO=ON -DLLVM_BINUTILS_INCDIR=/usr/lib/gcc/powerpc64le-linux-gnu/11/plugin/include/ -DCMAKE_BUILD_TYPE=Release produces: -- Native target architecture is PowerPC -- LLVM host triple: x86_64-unknown-linux-gnu -- LLVM default target triple: x86_64-unknown-linux-gnu and the resulting "clang" can only to "Target: x86_64-unknown-linux-gnu", how do you build LLVM exactly? Thanks,
[PATCH kernel] KVM: PPC: Book3s: Remove real mode interrupt controller hcalls handlers
Currently we have 2 sets of interrupt controller hypercalls handlers for real and virtual modes, this is from POWER8 times when switching MMU on was considered an expensive operation. POWER9 however does not have dependent threads and MMU is enabled for handling hcalls so the XIVE native or XICS-on-XIVE real mode handlers never execute on real P9 and later CPUs. This untemplate the handlers and only keeps the real mode handlers for XICS native (up to POWER8) and remove the rest of dead code. Changes in functions are mechanical except few missing empty lines to make checkpatch.pl happy. The default implemented hcalls list already contains XICS hcalls so no change there. This should not cause any behavioral change. Signed-off-by: Alexey Kardashevskiy --- Minus 153 lines nevertheless. --- arch/powerpc/kvm/Makefile | 2 +- arch/powerpc/include/asm/kvm_ppc.h | 7 - arch/powerpc/kvm/book3s_xive.h | 7 - arch/powerpc/kvm/book3s_hv_builtin.c| 64 --- arch/powerpc/kvm/book3s_hv_rm_xics.c| 5 + arch/powerpc/kvm/book3s_hv_rm_xive.c| 46 -- arch/powerpc/kvm/book3s_xive.c | 638 +++- arch/powerpc/kvm/book3s_xive_template.c | 636 --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 12 +- 9 files changed, 632 insertions(+), 785 deletions(-) delete mode 100644 arch/powerpc/kvm/book3s_hv_rm_xive.c delete mode 100644 arch/powerpc/kvm/book3s_xive_template.c diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile index 8e3681a86074..f17379b0f161 100644 --- a/arch/powerpc/kvm/Makefile +++ b/arch/powerpc/kvm/Makefile @@ -73,7 +73,7 @@ kvm-hv-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \ book3s_hv_tm.o kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \ - book3s_hv_rm_xics.o book3s_hv_rm_xive.o + book3s_hv_rm_xics.o kvm-book3s_64-builtin-tm-objs-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \ book3s_hv_tm_builtin.o diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 44200a27371b..a775377a570e 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -787,13 +787,6 @@ long kvmppc_rm_h_page_init(struct kvm_vcpu *vcpu, unsigned long flags, unsigned long dest, unsigned long src); long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr, unsigned long slb_v, unsigned int status, bool data); -unsigned long kvmppc_rm_h_xirr(struct kvm_vcpu *vcpu); -unsigned long kvmppc_rm_h_xirr_x(struct kvm_vcpu *vcpu); -unsigned long kvmppc_rm_h_ipoll(struct kvm_vcpu *vcpu, unsigned long server); -int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server, -unsigned long mfrr); -int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr); -int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr); void kvmppc_guest_entry_inject_int(struct kvm_vcpu *vcpu); /* diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h index 09d0657596c3..1e48f72e8aa5 100644 --- a/arch/powerpc/kvm/book3s_xive.h +++ b/arch/powerpc/kvm/book3s_xive.h @@ -285,13 +285,6 @@ static inline u32 __xive_read_eq(__be32 *qpage, u32 msk, u32 *idx, u32 *toggle) return cur & 0x7fff; } -extern unsigned long xive_rm_h_xirr(struct kvm_vcpu *vcpu); -extern unsigned long xive_rm_h_ipoll(struct kvm_vcpu *vcpu, unsigned long server); -extern int xive_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server, -unsigned long mfrr); -extern int xive_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr); -extern int xive_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr); - /* * Common Xive routines for XICS-over-XIVE and XIVE native */ diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 7e52d0beee77..88a8f6473c4e 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -489,70 +489,6 @@ static long kvmppc_read_one_intr(bool *again) return kvmppc_check_passthru(xisr, xirr, again); } -#ifdef CONFIG_KVM_XICS -unsigned long kvmppc_rm_h_xirr(struct kvm_vcpu *vcpu) -{ - if (!kvmppc_xics_enabled(vcpu)) - return H_TOO_HARD; - if (xics_on_xive()) - return xive_rm_h_xirr(vcpu); - else - return xics_rm_h_xirr(vcpu); -} - -unsigned long kvmppc_rm_h_xirr_x(struct kvm_vcpu *vcpu) -{ - if (!kvmppc_xics_enabled(vcpu)) - return H_TOO_HARD; - vcpu->arch.regs.gpr[5] = get_tb(); - if (xics_on_xive()) - return xive_rm_h_xirr(vcpu); - else - return xics_rm_h_xirr(vcpu); -} - -unsigned long kvmppc_rm_h_ipoll(struct kvm_vcpu *vcpu, unsigned long server) -{ - if (!kvmppc_xics_enabled(vcpu)) - return H_TOO_HARD; - if (xics_on_xive()) - return xive_rm_h_ipoll(vcpu, server); - else -
Re: [PATCH v2 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping
On Sun, May 08, 2022 at 05:36:41PM +0800, Baolin Wang wrote: > On some architectures (like ARM64), it can support CONT-PTE/PMD size > hugetlb, which means it can support not only PMD/PUD size hugetlb: > 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page > size specified. > > When unmapping a hugetlb page, we will get the relevant page table > entry by huge_pte_offset() only once to nuke it. This is correct > for PMD or PUD size hugetlb, since they always contain only one > pmd entry or pud entry in the page table. > > However this is incorrect for CONT-PTE and CONT-PMD size hugetlb, > since they can contain several continuous pte or pmd entry with > same page table attributes, so we will nuke only one pte or pmd > entry for this CONT-PTE/PMD size hugetlb page. > > And now try_to_unmap() is only passed a hugetlb page in the case > where the hugetlb page is poisoned. Which means now we will unmap > only one pte entry for a CONT-PTE or CONT-PMD size poisoned hugetlb > page, and we can still access other subpages of a CONT-PTE or CONT-PMD > size poisoned hugetlb page, which will cause serious issues possibly. > > So we should change to use huge_ptep_clear_flush() to nuke the > hugetlb page table to fix this issue, which already considered > CONT-PTE and CONT-PMD size hugetlb. > > We've already used set_huge_swap_pte_at() to set a poisoned > swap entry for a poisoned hugetlb page. Meanwhile adding a VM_BUG_ON() > to make sure the passed hugetlb page is poisoned in try_to_unmap(). > > Signed-off-by: Baolin Wang Reviewed-by: Muchun Song Thanks.
[PATCH] powerpc/papr_scm: Fix leaking nvdimm_events_map elements
Right now 'char *' elements allocated individual 'stat_id' in 'papr_scm_priv.nvdimm_events_map' during papr_scm_pmu_check_events() leak in papr_scm_remove() and papr_scm_pmu_register(), papr_scm_pmu_check_events() error paths. Also individual 'stat_id' arent NULL terminated 'char *' instead they are fixed 8-byte sized identifiers. However papr_scm_pmu_register() assumes it to be a NULL terminated 'char *' and at other places it assumes it to be a 'papr_scm_perf_stat.stat_id' sized string which is 8-byes in size. Fix this by allocating the memory for papr_scm_priv.nvdimm_events_map to also include space for 'stat_id' entries. This is possible since number of available events/stat_ids are known upfront. This saves some memory and one extra level of indirection from 'nvdimm_events_map' to 'stat_id'. Also rest of the code can continue to call 'kfree(papr_scm_priv.nvdimm_events_map)' without needing to iterate over the array and free up individual elements. Also introduce a new typedef called 'state_id_t' thats a 'u8[8]' and can be used across papr_scm to deal with stat_ids. Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support") Signed-off-by: Vaibhav Jain --- arch/powerpc/platforms/pseries/papr_scm.c | 48 +++ 1 file changed, 22 insertions(+), 26 deletions(-) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index 39962c905542..f33a865ad5fb 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -70,8 +70,10 @@ #define PAPR_SCM_PERF_STATS_VERSION 0x1 /* Struct holding a single performance metric */ +typedef u8 stat_id_t[8]; + struct papr_scm_perf_stat { - u8 stat_id[8]; + stat_id_t stat_id; __be64 stat_val; } __packed; @@ -126,7 +128,7 @@ struct papr_scm_priv { u64 health_bitmap_inject_mask; /* array to have event_code and stat_id mappings */ - char **nvdimm_events_map; + stat_id_t *nvdimm_events_map; }; static int papr_scm_pmem_flush(struct nd_region *nd_region, @@ -462,7 +464,7 @@ static int papr_scm_pmu_check_events(struct papr_scm_priv *p, struct nvdimm_pmu { struct papr_scm_perf_stat *stat; struct papr_scm_perf_stats *stats; - int index, rc, count; + int index, rc = 0; u32 available_events; if (!p->stat_buffer_len) @@ -478,35 +480,29 @@ static int papr_scm_pmu_check_events(struct papr_scm_priv *p, struct nvdimm_pmu return rc; } - /* Allocate memory to nvdimm_event_map */ - p->nvdimm_events_map = kcalloc(available_events, sizeof(char *), GFP_KERNEL); - if (!p->nvdimm_events_map) { - rc = -ENOMEM; - goto out_stats; - } - /* Called to get list of events supported */ rc = drc_pmem_query_stats(p, stats, 0); if (rc) - goto out_nvdimm_events_map; - - for (index = 0, stat = stats->scm_statistic, count = 0; -index < available_events; index++, ++stat) { - p->nvdimm_events_map[count] = kmemdup_nul(stat->stat_id, 8, GFP_KERNEL); - if (!p->nvdimm_events_map[count]) { - rc = -ENOMEM; - goto out_nvdimm_events_map; - } + goto out; - count++; + /* +* Allocate memory and populate nvdimm_event_map. +* Allocate an extra element for NULL entry +*/ + p->nvdimm_events_map = kcalloc(available_events + 1, + sizeof(stat_id_t), GFP_KERNEL); + if (!p->nvdimm_events_map) { + rc = -ENOMEM; + goto out; } - p->nvdimm_events_map[count] = NULL; - kfree(stats); - return 0; -out_nvdimm_events_map: - kfree(p->nvdimm_events_map); -out_stats: + /* Copy all stat_ids to event map */ + for (index = 0, stat = stats->scm_statistic; +index < available_events; index++, ++stat) { + memcpy(>nvdimm_events_map[index], >stat_id, + sizeof(stat_id_t)); + } +out: kfree(stats); return rc; } base-commit: 348c71344111d7a48892e3e52264ff11956fc196 -- 2.35.1