On Mon, 11 May 2026 at 21:25, Sean Christopherson <[email protected]> wrote:
>
> On Mon, May 11, 2026, Fuad Tabba wrote:
> > On Mon, 11 May 2026 at 15:58, Sean Christopherson <[email protected]> wrote:
> > >
> > > On Mon, May 11, 2026, Fuad Tabba wrote:
> > > > kvm_vm_release() closes vmp->fd and vmp->kvm_fd unconditionally, and
> > > > kvm_vm_free() calls kvm_vm_release() at teardown.  A test that calls
> > > > kvm_vm_release() and then kvm_vm_free() without a
> > > > vm_recreate_with_one_vcpu() in between double-closes both FDs.
> > >
> > > That's a test bug, no?
> >
> > Fair point.  Let me describe the actual use case and then I'd like
> > your guidance on which way to fix it.
> >
> > The pKVM selftests I'm preparing need to verify that protected guest
> > memory is zeroed when the VM is destroyed before the host regains
> > visibility of those pages.  The test pattern is:
> >
> >   vm_userspace_mem_region_add(vm, ..., GPA_BASE, 1, GPA_PAGES, 0);
> >   /* run protected guest, which writes/shares/unshares its memory */
> >   ...
> >   TEST_ASSERT(get_proc_locked_vm_size() > GPA_SIZE);   /* still pinned */
> >
> >   kvm_vm_release(vm);                                  /* trigger destroy */
> >
> >   /* host_mem is still mmap'd in the test process; pKVM should have
> >    * zeroed it before unpinning */
> >   TEST_ASSERT(is_zero(region->host_mem, GPA_PAGES * PAGE_SIZE));
> >
> >   kvm_vm_free(vm);                                     /* tidy up */
> >   TEST_ASSERT(get_proc_locked_vm_size() == 0);
> >
> > I used kvm_vm_release() because it's the only public API that closes
> > vm->fd to trigger kernel-side destruction.  But the existing callers
> > follow it with vm_recreate_with_one_vcpu(), so the "release + later
> > kvm_vm_free()" path isn't exercised today.
> >
> > I see three ways to make this clean:
> >   a) This patch: kvm_vm_release() becomes idempotent for its three
> >      FDs, matching the kvm_stats_release() idiom it already invokes.
> >   b) Leave kvm_vm_release() as-is and add a dedicated helper, e.g.
> >      kvm_vm_destroy_kernel(), that closes vm->fd to trigger kernel
> >      destruction while leaving the kvm_vm struct intact for
> >      post-destruction inspection.  kvm_vm_free() learns to handle the
> >      half-released state.
> >   c) Something else entirely, e.g., the test should manage vm->fd
> >      directly and not rely on library helpers for this pattern.
>
>     d) Fully kill the VM; validate the semantics with an explict mmap().
>
> The entire point of the test you are writing is to verfiy that a guest_memfd 
> VMA
> doesn't somehow cause KVM to leak state.  So, make that obvious instead of 
> abusing
> APIs that kinda sorta do what you want, but not really.
>
>         mem = kvm_mmap(region->mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED,
>                        region->guest_memfd);
>
>         ...
>
>         kvm_vm_free(vm);
>
>         TEST_ASSERT(is_zero(mem, ...));

The test isn't about guest_memfd.  The pKVM support that just landed
via Will's series [1] is built on anonymous host memory, and that's
the only backing this selftest exercises (VM_MEM_SRC_ANONYMOUS and
VM_MEM_SRC_ANONYMOUS_THP).  The property under test is the one
documented by the same series in Documentation/virt/kvm/arm/pkvm.rst:

  Status: Isolation of anonymous memory and metadata pages.
  ...
  Reclaimed memory is zeroed by the hypervisor ...

The test mirrors that contract: assert VmLck > donated size before
teardown (pages pinned), trigger kernel-side destruction, assert the
host VMA reads as zero (hypervisor scrubbed on reclaim), then assert
VmLck == 0 (pages unpinned).

kvm_mmap() + kvm_vm_free() + is_zero() doesn't translate here.  The only
host view of the donated pages is the memslot mmap, and kvm_vm_free()
munmaps it on the way out, so inspection has to happen between
kernel-side destruction and userspace free.  kvm_vm_release() is the
only library primitive that does that today.

What do you suggest?

Cheers,
/fuad

[1] https://lore.kernel.org/all/[email protected]/

Reply via email to