On 18/08/2025 19:01, Edgecombe, Rick P wrote:
> On Mon, 2025-08-18 at 18:02 +0200, Kevin Brodsky wrote:
>> The benchmarking results (see cover letter) don't seem to point to a
>> major performance hit from setting the pkey on arm64 (worth noting that
>> the linear mapping is PTE-mapped on arm64 today so no splitting should
>> occur when setting the pkey). The overhead may well be substantially
>> higher on x86.
> It's surprising to me. The batching seems to be about switching the pkey, not
> the conversion of the direct map.

Correct, there is still a set_memory_pkey() for each PTP.

> And with batching you measured a fork
> benchmark actually sped up a tiny bit. Shouldn't it involve a pile of page
> table allocations and so extra direct map work?

It should indeed...

> I don't know if it's possible the mock implementation skipped some
> set_memory() work somehow?

You're absolutely right: in the mock implementation I benchmarked,
set_memory_pkey() is in fact a no-op :( This is because patch 6 gates
set_memory_pkey() on system_supports_poe(), but the mock implementation
[1] only modifies arch_kpkeys_enabled(). In other words, the numbers in
the cover letter correspond to the added pkey register switches, without
touching the page tables.

I am now re-running the benchmarks with set_memory_pkey() actually
modifying the page tables. I'll reply to the cover letter with the
updated numbers.

- Kevin

[1] https://gitlab.arm.com/linux-arm/linux-kb/-/commit/fd75b43abb354e84d06f3dfb05ce839e9fb13e08
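
To illustrate the gating mismatch described above, here is a minimal
user-space model in C. It is only a sketch and not the code from the
series or from [1]: the function bodies, the two flag variables, the
pte_updates counter and protect_pgtable_page() are all hypothetical
stand-ins. Only the names system_supports_poe(), arch_kpkeys_enabled()
and set_memory_pkey() come from the discussion, and the signature given
to set_memory_pkey() here is an assumption.

```c
#include <stdbool.h>

/* Hypothetical model state, not kernel variables. */
static bool hw_has_poe = false;   /* hardware without POE, as in the mock setup */
static bool mock_kpkeys = true;   /* the mock forces kpkeys to look enabled */
static int pte_updates = 0;       /* counts modelled page table modifications */

/* Simplified stand-ins for the two predicates named in the thread. */
static bool system_supports_poe(void) { return hw_has_poe; }
static bool arch_kpkeys_enabled(void) { return mock_kpkeys; }

/*
 * Modelled set_memory_pkey(): gated on the hardware feature check, so it
 * silently does nothing when POE is absent. The signature is assumed.
 */
static int set_memory_pkey(unsigned long addr, int numpages, int pkey)
{
	(void)addr;
	(void)pkey;

	if (!system_supports_poe())
		return 0;	/* no-op: page tables are left untouched */

	pte_updates += numpages;
	return 0;
}

/* Hypothetical caller that protects one newly allocated page table page. */
static void protect_pgtable_page(unsigned long addr)
{
	if (!arch_kpkeys_enabled())
		return;

	/* The pkey value 1 is arbitrary in this model. */
	set_memory_pkey(addr, 1, 1);
}
```

With hw_has_poe left false and mock_kpkeys true, calling
protect_pgtable_page() takes the kpkeys path but leaves pte_updates at
0, which is the situation the first set of numbers measured. Setting
hw_has_poe to true makes the same call count one update.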
