From: Jack Thomson <[email protected]>

Hi,

This series adds arm64 support for KVM_PRE_FAULT_MEMORY, which was added
for x86 in [1]. The ioctl allows userspace to populate stage-2 mappings
before running a vCPU, reducing the number of stage-2 faults taken in
the run path. This is useful for post-copy migration, where stage-2
fault latency shows up directly in memory-intensive workloads.

On arm64, the GPA supplied to the ioctl is treated as an IPA in the
userspace-owned VM's memslot address space. If the vCPU most recently
ran a nested guest, KVM still targets the VM's canonical stage-2. It
does not interpret the GPA as an L2 IPA, and does not try to populate
the nested/shadow stage-2 selected by the vCPU's last run state.

The patches are:

- Allow callers of kvm_pgtable_get_leaf() to pass walk flags, so the
  prefault path can walk stage-2 under the MMU read lock.
- Add arm64 support for KVM_PRE_FAULT_MEMORY.
- Enable pre_fault_memory_test on arm64.
- Add a backing-source option to pre_fault_memory_test.
- Add a nested (NV) selftest that prefaults on a vCPU whose last-run
  context is backed by a shadow stage-2 MMU with an empty nested
  stage-2 root.

The prefault flag and page_size output in the stage-2 fault descriptor
remain in this series so the arm64 implementation can advance by the
mapping granule installed by the fault path and report poison without
queueing a SIGBUS.

Tested with pre_fault_memory_test under an arm64 QEMU setup with
anonymous, shmem, anonymous_thp, anonymous_hugetlb and shared_hugetlb
backings, including 64K, 2M and 32M hugetlb pools, and with the new
nv_pre_fault_memory_test on an NV-capable setup.

=== Changes since v4 [2] ===

- Reworked nested virt semantics: arm64 now treats the ioctl GPA as the
  VM/memslot IPA and always targets the canonical stage-2. It no longer
  translates an L2 IPA through L1's stage-2.
- Documented the arm64 nested behavior in the KVM API text.
- Switch to the canonical stage-2 with the vCPU put/load helpers when
  the vCPU last ran with a nested/shadow MMU, keeping VMID, VNCR and
  shadow-MMU refcount state consistent.
- Split the kvm_pgtable_get_leaf() walk-flag plumbing into a prep patch
  and walk existing mappings with KVM_PGTABLE_WALK_SHARED under the MMU
  read lock.
- Tightened prefault fault handling: preserve fault info, set IL in the
  synthetic ESR, handle existing mappings, return -EAGAIN for invalid
  memslot races, and report -EHWPOISON without queueing SIGBUS.
- Avoid directly walking stage-2 page tables when pKVM is enabled.
  Protected VMs remain unsupported via -EOPNOTSUPP.
- Preserve the selected selftest memory backing when recreating the
  racing memslot.
- Add the nested (NV) prefault selftest, including an empty nested
  stage-2 root to catch accidental L2-IPA interpretation.

=== Changes since v3 [3] ===

- Return -EOPNOTSUPP for protected VMs.
- Reworked nested-vCPU handling to translate an L2 IPA through L1's
  stage-2. This has been superseded by the canonical VM-IPA semantics
  described above.
- Make page_size unsigned and keep local declarations ordered at the
  top of kvm_arch_vcpu_pre_fault_memory().

=== Changes since v2 [4] ===

- Update the synthetic fault info. Thanks Suzuki.
- Remove the selftest change for unaligned mmap allocations. Thanks
  Sean.

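For anyone wanting to exercise the ioctl from userspace, the sketch below
shows one way a caller might drive it to completion. It is illustrative
only and not taken from the series: prefault_all(), prefault_fn,
struct prefault_range, the retry budget and the fake_prefault() stub are
hypothetical names. The struct mirrors the layout of
struct kvm_pre_fault_memory in the KVM API text, and the callback stands
in for ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, &range) so the loop can be
run without a VM. Retrying on EAGAIN is an assumption based on the
memslot-race behaviour described in the v4 changelog.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Mirrors the layout of struct kvm_pre_fault_memory in the KVM API text. */
struct prefault_range {
	uint64_t gpa;		/* in/out: next IPA to populate */
	uint64_t size;		/* in/out: bytes still to populate */
	uint64_t flags;		/* in: must currently be zero */
	uint64_t padding[5];
};

/*
 * Hypothetical callback standing in for
 * ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, range): returns 0 on progress,
 * -1 with errno set on failure.
 */
typedef int (*prefault_fn)(void *ctx, struct prefault_range *range);

/* Reissue the call until the range is populated or a hard error occurs. */
int prefault_all(prefault_fn fn, void *ctx, uint64_t gpa, uint64_t size,
		 unsigned int max_retries)
{
	struct prefault_range range = { .gpa = gpa, .size = size };

	while (range.size) {
		/* On success the kernel updates gpa/size to the remainder. */
		if (fn(ctx, &range) == 0)
			continue;
		/* Assumption: EINTR and EAGAIN are transient; retry a few times. */
		if ((errno != EINTR && errno != EAGAIN) || !max_retries--)
			return -errno;
	}
	return 0;
}

/* Stub for experimentation: one 4K granule per call, optional EAGAIN. */
struct fake_vcpu {
	unsigned int calls;
	int fail_once;
};

int fake_prefault(void *ctx, struct prefault_range *range)
{
	struct fake_vcpu *v = ctx;

	assert(range->size >= 4096);
	v->calls++;
	if (v->fail_once) {
		v->fail_once = 0;
		errno = EAGAIN;
		return -1;
	}
	range->gpa += 4096;
	range->size -= 4096;
	return 0;
}
```

A real caller would replace fake_prefault() with a wrapper around the
vCPU ioctl and treat -ENOENT, -EOPNOTSUPP and -EHWPOISON as hard errors.
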
[1]: https://lore.kernel.org/kvm/[email protected]/
[2]: https://lore.kernel.org/linux-arm-kernel/[email protected]/
[3]: https://lore.kernel.org/linux-arm-kernel/[email protected]/
[4]: https://lore.kernel.org/linux-arm-kernel/[email protected]/

Jack Thomson (5):
  KVM: arm64: Pass walk flags to kvm_pgtable_get_leaf()
  KVM: arm64: Add pre_fault_memory implementation
  KVM: selftests: Enable pre_fault_memory_test for arm64
  KVM: selftests: Add option for different backing in pre-fault tests
  KVM: selftests: Add nested pre-fault test for arm64

 Documentation/virt/kvm/api.rst                |  18 +-
 arch/arm64/include/asm/kvm_pgtable.h          |   5 +-
 arch/arm64/kvm/Kconfig                        |   1 +
 arch/arm64/kvm/arm.c                          |   1 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  10 +-
 arch/arm64/kvm/hyp/pgtable.c                  |   5 +-
 arch/arm64/kvm/mmu.c                          | 164 +++++++++++++-
 arch/arm64/kvm/nested.c                       |   2 +-
 tools/testing/selftests/kvm/Makefile.kvm      |   2 +
 .../kvm/arm64/nv_pre_fault_memory_test.c      | 200 ++++++++++++++++++
 .../selftests/kvm/pre_fault_memory_test.c     | 150 ++++++++++---
 11 files changed, 513 insertions(+), 45 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/arm64/nv_pre_fault_memory_test.c

base-commit: 98f826f3c500fda08d51fca434b7aefa6a2f7076
-- 
2.43.0

