On Thu, 2025-10-16 at 17:32 -0700, Sean Christopherson wrote:
> From: Yan Zhao <[email protected]>
>
> Don't explicitly pin pages when mapping pages into the S-EPT, guest_memfd
> doesn't support page migration in any capacity, i.e. there are no migrate
> callbacks because guest_memfd pages *can't* be migrated. See the WARN in
> kvm_gmem_migrate_folio().
>
> Eliminating TDX's explicit pinning will also enable guest_memfd to support
> in-place conversion between shared and private memory[1][2]. Because KVM
> cannot distinguish between speculative/transient refcounts and the
> intentional refcount for TDX on private pages[3], failing to release
> private page refcount in TDX could cause guest_memfd to indefinitely wait
> on decreasing the refcount for the splitting.
>
> Under normal conditions, not holding an extra page refcount in TDX is safe
> because guest_memfd ensures pages are retained until its invalidation
> notification to KVM MMU is completed. However, if there're bugs in KVM/TDX
> module, not holding an extra refcount when a page is mapped in S-EPT could
> result in a page being released from guest_memfd while still mapped in the
> S-EPT. But, doing work to make a fatal error slightly less fatal is a net
> negative when that extra work adds complexity and confusion.
>
> Several approaches were considered to address the refcount issue, including
> - Attempting to modify the KVM unmap operation to return a failure,
>   which was deemed too complex and potentially incorrect[4].
> - Increasing the folio reference count only upon S-EPT zapping failure[5].
> - Use page flags or page_ext to indicate a page is still used by TDX[6],
>   which does not work for HVO (HugeTLB Vmemmap Optimization).
> - Setting HWPOISON bit or leveraging folio_set_hugetlb_hwpoison()[7].

Some white space issues above. But in any case:

Reviewed-by: Rick Edgecombe <[email protected]>
