[PATCH] arch/powerpc: Remove duplicate ifdefs

2024-02-15 Thread Shrikanth Hegde
When an ifdef is used in the manner below, the second one can be considered a duplicate.
    ifdef DEFINE_A
    ...code block...
    ifdef DEFINE_A <-- This is a duplicate.
    ...code block...
    endif
    else
    ifndef DEFINE_A <-- This is also a duplicate.
    ...code block...
    endif
    endif
More details about the
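As a rough illustration of the pattern this patch describes, the following C sketch shows a function before and after the redundant checks are dropped. The macro value and function names are hypothetical and chosen only for the demonstration; they do not come from the patch itself.

```c
#include <assert.h>

#define DEFINE_A 1	/* hypothetical feature macro, defined for this demo */

/* Before cleanup: the inner checks repeat what the outer #ifdef/#else decided. */
static int value_with_duplicates(void)
{
#ifdef DEFINE_A
	int v = 1;
#ifdef DEFINE_A		/* duplicate: always true inside the outer #ifdef */
	v += 10;
#endif
	return v;
#else
#ifndef DEFINE_A	/* duplicate: always true inside the outer #else */
	return 0;
#endif
#endif
}

/* After cleanup: same behaviour with the redundant checks removed. */
static int value_without_duplicates(void)
{
#ifdef DEFINE_A
	int v = 1;
	v += 10;
	return v;
#else
	return 0;
#endif
}

/* Both variants must agree; returns 1 when they do. */
static int variants_agree(void)
{
	assert(value_with_duplicates() == value_without_duplicates());
	return 1;
}
```

Because the preprocessor has already settled the outer condition, removing the inner checks leaves the compiled code unchanged, whether or not DEFINE_A is defined.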

Re: [PATCH v2] uapi/auxvec: Define AT_HWCAP3 and AT_HWCAP4 aux vector, entries

2024-02-15 Thread Peter Bergner
On 2/15/24 7:49 PM, Michael Ellerman wrote: > Peter Bergner writes: >> On 2/15/24 2:16 AM, Arnd Bergmann wrote: >>> On Wed, Feb 14, 2024, at 23:34, Peter Bergner wrote: Arnd, we seem to have consensus on the patch below. Is this something you could take and apply to your tree?

Re: [RFC PATCH 1/5] powerpc/smp: Adjust nr_cpu_ids to cover all threads of a core

2024-02-15 Thread Pingfan Liu
On Thu, Feb 15, 2024 at 9:09 PM Michael Ellerman wrote: > > On Fri, 29 Dec 2023 23:01:03 +1100, Michael Ellerman wrote: > > If nr_cpu_ids is too low to include at least all the threads of a single > > core adjust nr_cpu_ids upwards. This avoids triggering odd bugs in code > > that assumes all

Re: [PATCH v2] uapi/auxvec: Define AT_HWCAP3 and AT_HWCAP4 aux vector, entries

2024-02-15 Thread Michael Ellerman
Peter Bergner writes: > On 2/15/24 2:16 AM, Arnd Bergmann wrote: >> On Wed, Feb 14, 2024, at 23:34, Peter Bergner wrote: >>> The powerpc toolchain keeps a copy of the HWCAP bit masks in our TCB for >>> fast >>> access by the __builtin_cpu_supports built-in function. The TCB space for >>> the

Re: [PATCH 0/7] macintosh: Convert to platform remove callback returning void

2024-02-15 Thread Michael Ellerman
Uwe Kleine-König writes: > Hello, > > On Wed, Jan 10, 2024 at 04:42:47PM +0100, Uwe Kleine-König wrote: >> Hello, >> >> this series converts all drivers below drivers/macintosh to use >> .remove_new(). See commit 5c5a7680e67b ("platform: Provide a remove >> callback that returns no value") for

[PATCH v3] powerpc/pseries/iommu: DLPAR ADD of pci device doesn't completely initialize pci_controller structure

2024-02-15 Thread Gaurav Batra
When a PCI device is dynamically added, the LPAR oopses with a NULL pointer exception. The complete stack is below: [ 211.239206] BUG: Kernel NULL pointer dereference on read at 0x0030 [ 211.239210] Faulting instruction address: 0xc06bbe5c [ 211.239214] Oops: Kernel access of bad area, sig:

Re: [PATCH] selftests: powerpc: Add header symlinks for building papr character device tests

2024-02-15 Thread Michal Suchánek
On Thu, Feb 15, 2024 at 01:39:27PM -0600, Nathan Lynch wrote: > Michal Suchánek writes: > > On Thu, Feb 15, 2024 at 01:13:34PM -0600, Nathan Lynch wrote: > >> Michal Suchanek writes: > >> > > >> > Without the headers the tests don't build. > >> > > >> > Fixes: 9118c5d32bdd ("powerpc/selftests:

Re: [PATCH 0/7] macintosh: Convert to platform remove callback returning void

2024-02-15 Thread Uwe Kleine-König
Hello, On Wed, Jan 10, 2024 at 04:42:47PM +0100, Uwe Kleine-König wrote: > Hello, > > this series converts all drivers below drivers/macintosh to use > .remove_new(). See commit 5c5a7680e67b ("platform: Provide a remove > callback that returns no value") for an extended explanation and the >

Re: [PATCH] selftests: powerpc: Add header symlinks for building papr character device tests

2024-02-15 Thread Nathan Lynch
Michal Suchánek writes: > On Thu, Feb 15, 2024 at 01:13:34PM -0600, Nathan Lynch wrote: >> Michal Suchanek writes: >> > >> > Without the headers the tests don't build. >> > >> > Fixes: 9118c5d32bdd ("powerpc/selftests: Add test for papr-vpd") >> > Fixes: 76b2ec3faeaa ("powerpc/selftests: Add

Re: [PATCH] selftests: powerpc: Add header symlinks for building papr character device tests

2024-02-15 Thread Michal Suchánek
On Thu, Feb 15, 2024 at 01:13:34PM -0600, Nathan Lynch wrote: > Michal Suchanek writes: > > > > Without the headers the tests don't build. > > > > Fixes: 9118c5d32bdd ("powerpc/selftests: Add test for papr-vpd") > > Fixes: 76b2ec3faeaa ("powerpc/selftests: Add test for papr-sysparm") > >

Re: [PATCH v6 11/18] arm64/mm: Split __flush_tlb_range() to elide trailing DSB

2024-02-15 Thread Catalin Marinas
On Thu, Feb 15, 2024 at 10:31:58AM +, Ryan Roberts wrote: > Split __flush_tlb_range() into __flush_tlb_range_nosync() + > __flush_tlb_range(), in the same way as the existing flush_tlb_page() > arrangement. This allows calling __flush_tlb_range_nosync() to elide the > trailing DSB. Forthcoming

Re: [PATCH v6 10/18] arm64/mm: New ptep layer to manage contig bit

2024-02-15 Thread Catalin Marinas
On Thu, Feb 15, 2024 at 10:31:57AM +, Ryan Roberts wrote: > Create a new layer for the in-table PTE manipulation APIs. For now, the > existing API is prefixed with double underscore to become the > arch-private API and the public API is just a simple wrapper that calls > the private API. > >

Re: [PATCH v3 RESEND 3/6] bitmap: Make bitmap_onto() available to users

2024-02-15 Thread Andy Shevchenko
On Thu, Feb 15, 2024 at 06:46:12PM +0100, Herve Codina wrote: > On Mon, 12 Feb 2024 11:13:13 -0800 > Yury Norov wrote: ... > > That's I agree. Scatter/gather from your last approach sounds better. > > Do you plan to send a v2? See below. ... > > I think your scatter/gather is better than this

Re: [kvm-unit-tests PATCH v1 01/18] Makefile: Define __ASSEMBLY__ for assembly files

2024-02-15 Thread Andrew Jones
On Thu, Feb 15, 2024 at 05:16:01PM +, Alexandru Elisei wrote: > Hi Drew, > > On Thu, Feb 15, 2024 at 05:32:22PM +0100, Andrew Jones wrote: > > On Thu, Feb 15, 2024 at 04:05:56PM +, Alexandru Elisei wrote: > > > Hi Drew, > > > > > > On Mon, Jan 15, 2024 at 01:44:17PM +0100, Andrew Jones

Re: [PATCH] selftests: powerpc: Add header symlinks for building papr character device tests

2024-02-15 Thread Nathan Lynch
Michal Suchanek writes: > > Without the headers the tests don't build. > > Fixes: 9118c5d32bdd ("powerpc/selftests: Add test for papr-vpd") > Fixes: 76b2ec3faeaa ("powerpc/selftests: Add test for papr-sysparm") > Signed-off-by: Michal Suchanek > --- >

Re: [PATCH v6 09/18] arm64/mm: Convert ptep_clear() to ptep_get_and_clear()

2024-02-15 Thread Catalin Marinas
On Thu, Feb 15, 2024 at 10:31:56AM +, Ryan Roberts wrote: > ptep_clear() is a generic wrapper around the arch-implemented > ptep_get_and_clear(). We are about to convert ptep_get_and_clear() into > a public version and private version (__ptep_get_and_clear()) to support > the transparent

Re: [PATCH v6 08/18] arm64/mm: Convert set_pte_at() to set_ptes(..., 1)

2024-02-15 Thread Catalin Marinas
On Thu, Feb 15, 2024 at 10:31:55AM +, Ryan Roberts wrote: > Since set_ptes() was introduced, set_pte_at() has been implemented as a > generic macro around set_ptes(..., 1). So this change should continue to > generate the same code. However, making this change prepares us for the > transparent

Re: [PATCH v6 07/18] arm64/mm: Convert READ_ONCE(*ptep) to ptep_get(ptep)

2024-02-15 Thread Catalin Marinas
On Thu, Feb 15, 2024 at 10:31:54AM +, Ryan Roberts wrote: > There are a number of places in the arch code that read a pte by using > the READ_ONCE() macro. Refactor these call sites to instead use the > ptep_get() helper, which itself is a READ_ONCE(). Generated code should > be the same. > >

Re: [PATCH v6 04/18] arm64/mm: Convert pte_next_pfn() to pte_advance_pfn()

2024-02-15 Thread Catalin Marinas
On Thu, Feb 15, 2024 at 10:31:51AM +, Ryan Roberts wrote: > Core-mm needs to be able to advance the pfn by an arbitrary amount, so > override the new pte_advance_pfn() API to do so. > > Signed-off-by: Ryan Roberts Acked-by: Catalin Marinas

Re: [PATCH v3 RESEND 3/6] bitmap: Make bitmap_onto() available to users

2024-02-15 Thread Herve Codina
Hi Andy, Yury, On Mon, 12 Feb 2024 11:13:13 -0800 Yury Norov wrote: ... > > That's I agree. Scatter/gather from your last approach sounds better. > Do you plan to send a v2? > ... > > I think your scatter/gather is better than this onto/off by naming and > implementation. If you'll send a

Re: [kvm-unit-tests PATCH v1 01/18] Makefile: Define __ASSEMBLY__ for assembly files

2024-02-15 Thread Alexandru Elisei
Hi Drew, On Thu, Feb 15, 2024 at 05:32:22PM +0100, Andrew Jones wrote: > On Thu, Feb 15, 2024 at 04:05:56PM +, Alexandru Elisei wrote: > > Hi Drew, > > > > On Mon, Jan 15, 2024 at 01:44:17PM +0100, Andrew Jones wrote: > > > On Thu, Nov 30, 2023 at 04:07:03AM -0500, Shaoqin Huang wrote: > > >

[PATCH] selftests: powerpc: Add header symlinks for building papr character device tests

2024-02-15 Thread Michal Suchanek
From: root Without the headers the tests don't build. Fixes: 9118c5d32bdd ("powerpc/selftests: Add test for papr-vpd") Fixes: 76b2ec3faeaa ("powerpc/selftests: Add test for papr-sysparm") Signed-off-by: Michal Suchanek --- tools/testing/selftests/powerpc/include/asm/papr-miscdev.h | 1 +

Re: [kvm-unit-tests PATCH v1 01/18] Makefile: Define __ASSEMBLY__ for assembly files

2024-02-15 Thread Andrew Jones
On Thu, Feb 15, 2024 at 04:05:56PM +, Alexandru Elisei wrote: > Hi Drew, > > On Mon, Jan 15, 2024 at 01:44:17PM +0100, Andrew Jones wrote: > > On Thu, Nov 30, 2023 at 04:07:03AM -0500, Shaoqin Huang wrote: > > > From: Alexandru Elisei > > > > > > There are 25 header files today (found with

[RESEND PATCH net v4 1/2] soc: fsl: qbman: Always disable interrupts when taking cgr_lock

2024-02-15 Thread Sean Anderson
smp_call_function_single disables IRQs when executing the callback. To prevent deadlocks, we must disable IRQs when taking cgr_lock elsewhere. This is already done by qman_update_cgr and qman_delete_cgr; fix the other lockers. Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in

[RESEND PATCH net v4 2/2] soc: fsl: qbman: Use raw spinlock for cgr_lock

2024-02-15 Thread Sean Anderson
cgr_lock may be locked with interrupts already disabled by smp_call_function_single. As such, we must use a raw spinlock to avoid problems on PREEMPT_RT kernels. Although this bug has existed for a while, it was not apparent until commit ef2a8d5478b9 ("net: dpaa: Adjust queue depth on rate

Re: [kvm-unit-tests PATCH v1 01/18] Makefile: Define __ASSEMBLY__ for assembly files

2024-02-15 Thread Alexandru Elisei
Hi Drew, On Mon, Jan 15, 2024 at 01:44:17PM +0100, Andrew Jones wrote: > On Thu, Nov 30, 2023 at 04:07:03AM -0500, Shaoqin Huang wrote: > > From: Alexandru Elisei > > > > There are 25 header files today (found with grep -r "#ifndef __ASSEMBLY__") > > whose functionality relies on the __ASSEMBLY__

Re: [PATCH v2] uapi/auxvec: Define AT_HWCAP3 and AT_HWCAP4 aux vector, entries

2024-02-15 Thread Peter Bergner
On 2/15/24 2:16 AM, Arnd Bergmann wrote: > On Wed, Feb 14, 2024, at 23:34, Peter Bergner wrote: >> The powerpc toolchain keeps a copy of the HWCAP bit masks in our TCB for fast >> access by the __builtin_cpu_supports built-in function. The TCB space for >> the HWCAP entries - which are created in

[powerpc:fixes-test] BUILD SUCCESS 0846dd77c8349ec92ca0079c9c71d130f34cb192

2024-02-15 Thread kernel test robot
-20240215 clang i386 buildonly-randconfig-002-20240215 clang i386 buildonly-randconfig-003-20240215 clang i386 buildonly-randconfig-004-20240215 clang i386 buildonly-randconfig-005-20240215 clang i386 buildonly-randconfig-006-20240215 clang i386

[powerpc:next] BUILD SUCCESS 14ce0dbb562713bc058ad16d281db355757e6ec0

2024-02-15 Thread kernel test robot
allmodconfig gcc i386 allnoconfig gcc i386 allyesconfig gcc i386 buildonly-randconfig-001-20240215 clang i386 buildonly-randconfig-002-20240215 clang i386 buildonly-randconfig-003-20240215 clang

Re: [PATCH v2] powerpc/iommu: Fix the iommu group reference leak during platform domain attach

2024-02-15 Thread Shivaprasad G Bhat
On 2/15/24 08:01, Michael Ellerman wrote: Shivaprasad G Bhat writes: The function spapr_tce_platform_iommu_attach_dev() fails to call iommu_group_put() when the domain is already set. This refcount leak shows up with BUG_ON() during DLPAR remove operation as, [c013aed5fd10]

[PATCH] powerpc/iommu: Refactor spapr_tce_platform_iommu_attach_dev()

2024-02-15 Thread Shivaprasad G Bhat
The patch makes the iommu_group_get() call only when the group is actually used, thereby avoiding an unnecessary get and put when the domain is already set. Reviewed-by: Jason Gunthorpe Signed-off-by: Shivaprasad G Bhat --- Changelog: v2:

Re: [PATCH v2 0/5] powerpc: struct bus_type cleanup

2024-02-15 Thread Michael Ellerman
On Mon, 12 Feb 2024 17:04:58 -0300, Ricardo B. Marliere wrote: > This series is part of an effort to cleanup the users of the driver > core, as can be seen in many recent patches authored by Greg across the > tree (e.g. [1]). Patch 1/5 is a prerequisite to 2/5, but the others have > no dependency.

Re: [RFC PATCH 1/5] powerpc/smp: Adjust nr_cpu_ids to cover all threads of a core

2024-02-15 Thread Michael Ellerman
On Fri, 29 Dec 2023 23:01:03 +1100, Michael Ellerman wrote: > If nr_cpu_ids is too low to include at least all the threads of a single > core adjust nr_cpu_ids upwards. This avoids triggering odd bugs in code > that assumes all threads of a core are available. > > Applied to powerpc/next.
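Read literally, the description above amounts to raising a too-small CPU id limit until it covers every thread of one core. The sketch below models only that literal reading; the function name and parameters are hypothetical, and the applied patch may well use a different rule (for example, rounding up to a multiple of the thread count).

```c
#include <assert.h>

/*
 * Toy model of the adjustment described in the message: if the limit is
 * too low to cover all threads of a single core, raise it to that count.
 * Names are illustrative and are not kernel API.
 */
static unsigned int adjust_nr_cpu_ids(unsigned int nr_cpu_ids,
				      unsigned int threads_per_core)
{
	assert(threads_per_core > 0);

	if (nr_cpu_ids < threads_per_core)
		return threads_per_core;	/* cover the whole core */

	return nr_cpu_ids;			/* already large enough */
}
```

For instance, under this model a limit of 2 on a machine with 8 threads per core would become 8, while a limit of 16 would be left alone.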

Re: [PATCH] powerpc: Force inlining of arch_vmap_p{u/m}d_supported()

2024-02-15 Thread Michael Ellerman
On Tue, 13 Feb 2024 14:58:37 +0100, Christophe Leroy wrote: > arch_vmap_pud_supported() and arch_vmap_pmd_supported() are > expected to constant-fold to false when RADIX is not enabled. > > Force inlining in order to avoid following failure which > leads to unexpected call of non-existing

Re: [PING PATCH] powerpc/kasan: Fix addr error caused by page alignment

2024-02-15 Thread Michael Ellerman
On Tue, 23 Jan 2024 09:45:59 +0800, Jiangfeng Xiao wrote: > In kasan_init_region, when k_start is not page aligned, > at the beginning of the for loop, k_cur = k_start & PAGE_MASK > is less than k_start, and then va = block + k_cur - k_start > is less than block, so the address va is invalid, because the >
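The arithmetic in this report can be checked with a small user-space sketch. It only reproduces the two quoted expressions (k_cur = k_start & PAGE_MASK and va = block + k_cur - k_start); the 4096-byte page size, the helper names, and the sample addresses are assumptions made for the demonstration, and the sketch does not show the fix itself.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE ((uintptr_t)4096)		/* assumed page size */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/* First loop address: k_start rounded down to a page boundary. */
static uintptr_t first_k_cur(uintptr_t k_start)
{
	return k_start & DEMO_PAGE_MASK;
}

/* Address computed as in the quoted expression: block + k_cur - k_start. */
static uintptr_t va_for(uintptr_t block, uintptr_t k_cur, uintptr_t k_start)
{
	return block + k_cur - k_start;
}

/* Returns 1 if the first va falls below the start of block, else 0. */
static int first_va_below_block(uintptr_t block, uintptr_t k_start)
{
	return va_for(block, first_k_cur(k_start), k_start) < block;
}

/* Self-check with made-up addresses: unaligned start underflows the block. */
static void kasan_demo_selfcheck(void)
{
	assert(first_va_below_block(0x80000, 0x1234));
	assert(!first_va_below_block(0x80000, 0x2000));
}
```

With an unaligned k_start such as 0x1234, the first k_cur is 0x1000, so va ends up 0x234 bytes below block, which matches the invalid address described in the message.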

Re: [PATCH] powerpc/pseries: fix accuracy of stolen time

2024-02-15 Thread Michael Ellerman
On Tue, 13 Feb 2024 10:56:35 +0530, Shrikanth Hegde wrote: > PowerVM hypervisor updates the VPA fields with stolen time data. > It currently reports enqueue_dispatch_tb and ready_enqueue_tb for > this purpose. In Linux these two fields are used to report the stolen time. > > The VPA fields are

Re: [PATCH] powerpc/iommu: Fix the missing iommu_group_put() during platform domain attach

2024-02-15 Thread Michael Ellerman
On Tue, 13 Feb 2024 10:05:22 -0600, Shivaprasad G Bhat wrote: > The function spapr_tce_platform_iommu_attach_dev() fails to call > iommu_group_put() when the domain is already set. This refcount leak > shows up with BUG_ON() during DLPAR remove operation as, > > KernelBug: Kernel bug in

Re: [PATCH] papr_vpd.c: calling devfd before get_system_loc_code

2024-02-15 Thread Michael Ellerman
On Wed, 31 Jan 2024 18:38:59 +0530, R Nageswara Sastry wrote: > Calling get_system_loc_code before checking devfd and errno fails the test > when the device is not available, where a SKIP is expected. > Change the order of 'SKIP_IF_MSG' to correctly SKIP when the /dev/papr-vpd device > is not available. >

Re: [PATCH v2] powerpc/ftrace: Ignore ftrace locations in exit text sections

2024-02-15 Thread Michael Ellerman
On Tue, 13 Feb 2024 23:24:10 +0530, Naveen N Rao wrote: > Michael reported that we are seeing ftrace bug on bootup when KASAN is > enabled, and if we are using -fpatchable-function-entry: > > ftrace: allocating 47780 entries in 18 pages > ftrace-powerpc: 0xc20b3d5c: No module

Re: [PATCH v2] powerpc/64: Set task pt_regs->link to the LR value on scv entry

2024-02-15 Thread Michael Ellerman
On Fri, 02 Feb 2024 21:13:16 +0530, Naveen N Rao wrote: > Nysal reported that userspace backtraces are missing in offcputime bcc > tool. As an example: > $ sudo ./bcc/tools/offcputime.py -uU > Tracing off-CPU time (us) of user threads by user stack... Hit Ctrl-C to > end. > > ^C >

Re: [PATCH] powerpc/pseries/papr-sysparm: use u8 arrays for payloads

2024-02-15 Thread Michael Ellerman
On Fri, 02 Feb 2024 18:26:46 -0600, Nathan Lynch wrote: > Some PAPR system parameter values are formatted by firmware as > nul-terminated strings (e.g. LPAR name, shared processor attributes). > But the values returned for other parameters, such as processor module > info and TLB block invalidate

Re: [PATCH 1/2] powerpc: udbg_memcons: mark functions static

2024-02-15 Thread Michael Ellerman
On Tue, 23 Jan 2024 13:51:41 +0100, Arnd Bergmann wrote: > ppc64_book3e_allmodconfig has one more driver that triggers a > few missing-prototypes warnings: > > arch/powerpc/sysdev/udbg_memcons.c:44:6: error: no previous prototype for > 'memcons_putc' [-Werror=missing-prototypes] >

Re: [PATCH] powerpc/kasan: Limit KASAN thread size increase to 32KB

2024-02-15 Thread Michael Ellerman
On Mon, 12 Feb 2024 17:42:44 +1100, Michael Ellerman wrote: > KASAN is seen to increase stack usage, to the point that it was reported > to lead to stack overflow on some 32-bit machines (see link). > > To avoid overflows the stack size was doubled for KASAN builds in > commit 3e8635fb2e07

Re: [PATCH v2] powerpc/6xx: set High BAT Enable flag on G2_LE cores

2024-02-15 Thread Michael Ellerman
On Wed, 24 Jan 2024 11:38:38 +0100, Matthias Schiffer wrote: > MMU_FTR_USE_HIGH_BATS is set for G2_LE cores and derivatives like e300cX, > but the high BATs need to be enabled in HID2 to work. Add register > definitions and add the needed setup to __setup_cpu_603. > > This fixes boot on CPUs like

Re: [PATCH] powerpc/cputable: Add missing PPC_FEATURE_BOOKE on PPC64 Book-E

2024-02-15 Thread Michael Ellerman
On Wed, 07 Feb 2024 10:27:58 +0100, David Engraf wrote: > Commit e320a76db4b0 ("powerpc/cputable: Split cpu_specs[] out of cputable.h") > moved the cpu_specs to separate header files. Previously PPC_FEATURE_BOOKE > was enabled by CONFIG_PPC_BOOK3E_64. The definition in cpu_specs_e500mc.h > for

Re: [PATCH 0/2] ALSA: struct bus_type cleanup

2024-02-15 Thread Takashi Iwai
On Wed, 14 Feb 2024 20:28:27 +0100, Ricardo B. Marliere wrote: > > This series is part of an effort to cleanup the users of the driver > core, as can be seen in many recent patches authored by Greg across the > tree (e.g. [1]). > > --- > [1]: >

Re: [PATCH v6 00/18] Transparent Contiguous PTEs for User Mappings

2024-02-15 Thread Mark Rutland
On Thu, Feb 15, 2024 at 10:31:47AM +, Ryan Roberts wrote: > Hi All, > > This is a series to opportunistically and transparently use contpte mappings > (set the contiguous bit in ptes) for user memory when those mappings meet the > requirements. The change benefits arm64, but there is some

Re: [PATCH v6 18/18] arm64/mm: Automatically fold contpte mappings

2024-02-15 Thread Mark Rutland
On Thu, Feb 15, 2024 at 10:32:05AM +, Ryan Roberts wrote: > There are situations where a change to a single PTE could cause the > contpte block in which it resides to become foldable (i.e. could be > repainted with the contiguous bit). Such situations arise, for example, > when user space

Re: [PATCH v6 14/18] arm64/mm: Implement new [get_and_]clear_full_ptes() batch APIs

2024-02-15 Thread Mark Rutland
On Thu, Feb 15, 2024 at 10:32:01AM +, Ryan Roberts wrote: > Optimize the contpte implementation to fix some of the > exit/munmap/dontneed performance regression introduced by the initial > contpte commit. Subsequent patches will solve it entirely. > > During exit(), munmap() or

Re: [PATCH v6 13/18] arm64/mm: Implement new wrprotect_ptes() batch API

2024-02-15 Thread Mark Rutland
On Thu, Feb 15, 2024 at 10:32:00AM +, Ryan Roberts wrote: > Optimize the contpte implementation to fix some of the fork performance > regression introduced by the initial contpte commit. Subsequent patches > will solve it entirely. > > During fork(), any private memory in the parent must be

Re: [PATCH v6 12/18] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-15 Thread Mark Rutland
On Thu, Feb 15, 2024 at 10:31:59AM +, Ryan Roberts wrote: > With the ptep API sufficiently refactored, we can now introduce a new > "contpte" API layer, which transparently manages the PTE_CONT bit for > user mappings. > > In this initial implementation, only suitable batches of PTEs, set via

Re: [PATCH v6 11/18] arm64/mm: Split __flush_tlb_range() to elide trailing DSB

2024-02-15 Thread Mark Rutland
On Thu, Feb 15, 2024 at 10:31:58AM +, Ryan Roberts wrote: > Split __flush_tlb_range() into __flush_tlb_range_nosync() + > __flush_tlb_range(), in the same way as the existing flush_tlb_page() > arrangement. This allows calling __flush_tlb_range_nosync() to elide the > trailing DSB. Forthcoming

Re: [PATCH v6 10/18] arm64/mm: New ptep layer to manage contig bit

2024-02-15 Thread Mark Rutland
On Thu, Feb 15, 2024 at 10:31:57AM +, Ryan Roberts wrote: > Create a new layer for the in-table PTE manipulation APIs. For now, the > existing API is prefixed with double underscore to become the > arch-private API and the public API is just a simple wrapper that calls > the private API. > >

Re: [PATCH v6 09/18] arm64/mm: Convert ptep_clear() to ptep_get_and_clear()

2024-02-15 Thread Mark Rutland
On Thu, Feb 15, 2024 at 10:31:56AM +, Ryan Roberts wrote: > ptep_clear() is a generic wrapper around the arch-implemented > ptep_get_and_clear(). We are about to convert ptep_get_and_clear() into > a public version and private version (__ptep_get_and_clear()) to support > the transparent

Re: [PATCH v6 08/18] arm64/mm: Convert set_pte_at() to set_ptes(..., 1)

2024-02-15 Thread Mark Rutland
On Thu, Feb 15, 2024 at 10:31:55AM +, Ryan Roberts wrote: > Since set_ptes() was introduced, set_pte_at() has been implemented as a > generic macro around set_ptes(..., 1). So this change should continue to > generate the same code. However, making this change prepares us for the > transparent

Re: [PATCH v6 07/18] arm64/mm: Convert READ_ONCE(*ptep) to ptep_get(ptep)

2024-02-15 Thread Mark Rutland
On Thu, Feb 15, 2024 at 10:31:54AM +, Ryan Roberts wrote: > There are a number of places in the arch code that read a pte by using > the READ_ONCE() macro. Refactor these call sites to instead use the > ptep_get() helper, which itself is a READ_ONCE(). Generated code should > be the same. > >

Re: [PATCH v6 04/18] arm64/mm: Convert pte_next_pfn() to pte_advance_pfn()

2024-02-15 Thread Mark Rutland
On Thu, Feb 15, 2024 at 10:31:51AM +, Ryan Roberts wrote: > Core-mm needs to be able to advance the pfn by an arbitrary amount, so > override the new pte_advance_pfn() API to do so. > > Signed-off-by: Ryan Roberts Acked-by: Mark Rutland Mark. > --- > arch/arm64/include/asm/pgtable.h | 8

Re: [PATCH v2 2/2] powerpc/bpf: enable kfunc call

2024-02-15 Thread Naveen N Rao
On Tue, Feb 13, 2024 at 07:54:27AM +, Christophe Leroy wrote: > > > Le 01/02/2024 à 18:12, Hari Bathini a écrit : > > With module addresses supported, override bpf_jit_supports_kfunc_call() > > to enable kfunc support. Module address offsets can be more than 32-bit > > long, so override

Re: [PATCH v6 06/18] mm: Tidy up pte_next_pfn() definition

2024-02-15 Thread David Hildenbrand
On 15.02.24 11:31, Ryan Roberts wrote: Now that all the architecture overrides of pte_next_pfn() have been replaced with pte_advance_pfn(), we can simplify the definition of the generic pte_next_pfn() macro so that it is unconditionally defined. Signed-off-by: Ryan Roberts ---

Re: [PATCH v6 05/18] x86/mm: Convert pte_next_pfn() to pte_advance_pfn()

2024-02-15 Thread David Hildenbrand
On 15.02.24 11:31, Ryan Roberts wrote: Core-mm needs to be able to advance the pfn by an arbitrary amount, so override the new pte_advance_pfn() API to do so. Signed-off-by: Ryan Roberts --- arch/x86/include/asm/pgtable.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff

Re: [PATCH v6 04/18] arm64/mm: Convert pte_next_pfn() to pte_advance_pfn()

2024-02-15 Thread David Hildenbrand
On 15.02.24 11:31, Ryan Roberts wrote: Core-mm needs to be able to advance the pfn by an arbitrary amount, so override the new pte_advance_pfn() API to do so. Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff

Re: [PATCH v6 03/18] mm: Introduce pte_advance_pfn() and use for pte_next_pfn()

2024-02-15 Thread David Hildenbrand
On 15.02.24 11:31, Ryan Roberts wrote: The goal is to be able to advance a PTE by an arbitrary number of PFNs. So introduce a new API that takes a nr param. Define the default implementation here and allow for architectures to override. pte_next_pfn() becomes a wrapper around pte_advance_pfn().

Re: [PATCH v2 1/2] powerpc/bpf: ensure module addresses are supported

2024-02-15 Thread Naveen N Rao
On Thu, Feb 01, 2024 at 10:42:48PM +0530, Hari Bathini wrote: > Currently, bpf jit code on powerpc assumes all the bpf functions and > helpers to be kernel text. This is false for kfunc case, as function > addresses are mostly module addresses in that case. Ensure module > addresses are supported

Re: [PATCH v2 1/2] powerpc/bpf: ensure module addresses are supported

2024-02-15 Thread Hari Bathini
On 13/02/24 1:23 pm, Christophe Leroy wrote: Le 01/02/2024 à 18:12, Hari Bathini a écrit : Currently, bpf jit code on powerpc assumes all the bpf functions and helpers to be kernel text. This is false for kfunc case, as function addresses are mostly module addresses in that case. Ensure

Re: [PATCH v2 2/2] powerpc/bpf: enable kfunc call

2024-02-15 Thread Hari Bathini
On 13/02/24 1:24 pm, Christophe Leroy wrote: Le 01/02/2024 à 18:12, Hari Bathini a écrit : With module addresses supported, override bpf_jit_supports_kfunc_call() to enable kfunc support. Module address offsets can be more than 32-bit long, so override bpf_jit_supports_far_kfunc_call() to

[PATCH v6 14/18] arm64/mm: Implement new [get_and_]clear_full_ptes() batch APIs

2024-02-15 Thread Ryan Roberts
Optimize the contpte implementation to fix some of the exit/munmap/dontneed performance regression introduced by the initial contpte commit. Subsequent patches will solve it entirely. During exit(), munmap() or madvise(MADV_DONTNEED), mappings must be cleared. Previously this was done 1 PTE at a

[PATCH v6 13/18] arm64/mm: Implement new wrprotect_ptes() batch API

2024-02-15 Thread Ryan Roberts
Optimize the contpte implementation to fix some of the fork performance regression introduced by the initial contpte commit. Subsequent patches will solve it entirely. During fork(), any private memory in the parent must be write-protected. Previously this was done 1 PTE at a time. But the

[PATCH v6 12/18] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-15 Thread Ryan Roberts
With the ptep API sufficiently refactored, we can now introduce a new "contpte" API layer, which transparently manages the PTE_CONT bit for user mappings. In this initial implementation, only suitable batches of PTEs, set via set_ptes(), are mapped with the PTE_CONT bit. Any subsequent

[PATCH v6 09/18] arm64/mm: Convert ptep_clear() to ptep_get_and_clear()

2024-02-15 Thread Ryan Roberts
ptep_clear() is a generic wrapper around the arch-implemented ptep_get_and_clear(). We are about to convert ptep_get_and_clear() into a public version and private version (__ptep_get_and_clear()) to support the transparent contpte work. We won't have a private version of ptep_clear() so let's

[PATCH v6 11/18] arm64/mm: Split __flush_tlb_range() to elide trailing DSB

2024-02-15 Thread Ryan Roberts
Split __flush_tlb_range() into __flush_tlb_range_nosync() + __flush_tlb_range(), in the same way as the existing flush_tlb_page() arrangement. This allows calling __flush_tlb_range_nosync() to elide the trailing DSB. Forthcoming "contpte" code will take advantage of this when clearing the young

[PATCH v6 10/18] arm64/mm: New ptep layer to manage contig bit

2024-02-15 Thread Ryan Roberts
Create a new layer for the in-table PTE manipulation APIs. For now, the existing API is prefixed with double underscore to become the arch-private API and the public API is just a simple wrapper that calls the private API. The public API implementation will subsequently be used to transparently

[PATCH v6 15/18] mm: Add pte_batch_hint() to reduce scanning in folio_pte_batch()

2024-02-15 Thread Ryan Roberts
Some architectures (e.g. arm64) can tell from looking at a pte, if some follow-on ptes also map contiguous physical memory with the same pgprot. (for arm64, these are contpte mappings). Take advantage of this knowledge to optimize folio_pte_batch() so that it can skip these ptes when scanning to

[PATCH v6 17/18] arm64/mm: __always_inline to improve fork() perf

2024-02-15 Thread Ryan Roberts
As set_ptes() and wrprotect_ptes() become a bit more complex, the compiler may choose not to inline them. But this is critical for fork() performance. So mark the functions, along with contpte_try_unfold() which is called by them, as __always_inline. This is worth ~1% on the fork() microbenchmark

[PATCH v6 18/18] arm64/mm: Automatically fold contpte mappings

2024-02-15 Thread Ryan Roberts
There are situations where a change to a single PTE could cause the contpte block in which it resides to become foldable (i.e. could be repainted with the contiguous bit). Such situations arise, for example, when user space temporarily changes protections, via mprotect, for individual pages, such

[PATCH v6 16/18] arm64/mm: Implement pte_batch_hint()

2024-02-15 Thread Ryan Roberts
When core code iterates over a range of ptes and calls ptep_get() for each of them, if the range happens to cover contpte mappings, the number of pte reads becomes amplified by a factor of the number of PTEs in a contpte block. This is because for each call to ptep_get(), the implementation must

[PATCH v6 08/18] arm64/mm: Convert set_pte_at() to set_ptes(..., 1)

2024-02-15 Thread Ryan Roberts
Since set_ptes() was introduced, set_pte_at() has been implemented as a generic macro around set_ptes(..., 1). So this change should continue to generate the same code. However, making this change prepares us for the transparent contpte support. It means we can reroute set_ptes() to __set_ptes().

[PATCH v6 07/18] arm64/mm: Convert READ_ONCE(*ptep) to ptep_get(ptep)

2024-02-15 Thread Ryan Roberts
There are a number of places in the arch code that read a pte by using the READ_ONCE() macro. Refactor these call sites to instead use the ptep_get() helper, which itself is a READ_ONCE(). Generated code should be the same. This will benefit us when we shortly introduce the transparent contpte

[PATCH v6 05/18] x86/mm: Convert pte_next_pfn() to pte_advance_pfn()

2024-02-15 Thread Ryan Roberts
Core-mm needs to be able to advance the pfn by an arbitrary amount, so override the new pte_advance_pfn() API to do so. Signed-off-by: Ryan Roberts --- arch/x86/include/asm/pgtable.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h

[PATCH v6 06/18] mm: Tidy up pte_next_pfn() definition

2024-02-15 Thread Ryan Roberts
Now that all the architecture overrides of pte_next_pfn() have been replaced with pte_advance_pfn(), we can simplify the definition of the generic pte_next_pfn() macro so that it is unconditionally defined. Signed-off-by: Ryan Roberts --- include/linux/pgtable.h | 2 -- 1 file changed, 2

[PATCH v6 04/18] arm64/mm: Convert pte_next_pfn() to pte_advance_pfn()

2024-02-15 Thread Ryan Roberts
Core-mm needs to be able to advance the pfn by an arbitrary amount, so override the new pte_advance_pfn() API to do so. Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h

[PATCH v6 03/18] mm: Introduce pte_advance_pfn() and use for pte_next_pfn()

2024-02-15 Thread Ryan Roberts
The goal is to be able to advance a PTE by an arbitrary number of PFNs. So introduce a new API that takes a nr param. Define the default implementation here and allow for architectures to override. pte_next_pfn() becomes a wrapper around pte_advance_pfn(). Follow up commits will convert each
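The relationship described here, where the single-step helper becomes a wrapper around a helper taking a count, can be sketched with a toy PTE type. Everything below is hypothetical: the toy_ names, the struct layout, and the 12-bit shift are assumptions for illustration and do not reflect the kernel's real pte_t or any architecture's encoding.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PFN_SHIFT 12	/* assumed position of the PFN field in this toy layout */
#define TOY_FLAG_MASK ((UINT64_C(1) << TOY_PFN_SHIFT) - 1)

typedef struct { uint64_t val; } toy_pte_t;

/* Build a toy PTE from a page frame number and low flag bits. */
static toy_pte_t toy_mk_pte(uint64_t pfn, uint64_t flags)
{
	toy_pte_t pte;

	assert((flags & ~TOY_FLAG_MASK) == 0);
	pte.val = (pfn << TOY_PFN_SHIFT) | flags;
	return pte;
}

/* Extract the page frame number. */
static uint64_t toy_pte_pfn(toy_pte_t pte)
{
	return pte.val >> TOY_PFN_SHIFT;
}

/* Advance the PFN by an arbitrary count, leaving the flag bits untouched. */
static toy_pte_t toy_pte_advance_pfn(toy_pte_t pte, unsigned long nr)
{
	pte.val += (uint64_t)nr << TOY_PFN_SHIFT;
	return pte;
}

/* The single-step helper is just the general one with nr == 1. */
static toy_pte_t toy_pte_next_pfn(toy_pte_t pte)
{
	return toy_pte_advance_pfn(pte, 1);
}
```

Keeping the single-step helper as a thin wrapper means only the count-taking function would need an architecture-specific version in a design like this one.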

[PATCH v6 02/18] mm: thp: Batch-collapse PMD with set_ptes()

2024-02-15 Thread Ryan Roberts
Refactor __split_huge_pmd_locked() so that a present PMD can be collapsed to PTEs in a single batch using set_ptes(). This should improve performance a little bit, but the real motivation is to remove the need for the arm64 backend to have to fold the contpte entries. Instead, since the ptes are

[PATCH v6 00/18] Transparent Contiguous PTEs for User Mappings

2024-02-15 Thread Ryan Roberts
Hi All, This is a series to opportunistically and transparently use contpte mappings (set the contiguous bit in ptes) for user memory when those mappings meet the requirements. The change benefits arm64, but there is some (very) minor refactoring for x86 to enable its integration with core-mm.

[PATCH v6 01/18] mm: Clarify the spec for set_ptes()

2024-02-15 Thread Ryan Roberts
set_ptes() spec implies that it can only be used to set a present pte because it interprets the PFN field to increment it. However, set_pte_at() has been implemented on top of set_ptes() since set_ptes() was introduced, and set_pte_at() allows setting a pte to a not-present state. So clarify the

Re: [PATCH v2] uapi/auxvec: Define AT_HWCAP3 and AT_HWCAP4 aux vector, entries

2024-02-15 Thread Arnd Bergmann
On Wed, Feb 14, 2024, at 23:34, Peter Bergner wrote: > The powerpc toolchain keeps a copy of the HWCAP bit masks in our TCB for fast > access by the __builtin_cpu_supports built-in function. The TCB space for > the HWCAP entries - which are created in pairs - is an ABI extension, so > waiting to