On Thu, Jan 30, 2025 at 11:28:11AM -0500, Peter Xu wrote:
> On Sun, Jan 26, 2025 at 11:34:29AM +0800, Xu Yilun wrote:
> > > Definitely not suggesting to install an invalid pointer anywhere. The
> > > mapped pointer will still be valid for gmem for example, but the fault
> > > isn't. We need to differentiate two things: (1) the virtual address
> > > mapping, and (2) permissions and accesses on the folios / pages of the
> > > mapping.
> > > Here I think it's okay if the host pointer is correctly mapped.
> > >
> > > For your private MMIO use case, my question is: if there's no host
> > > pointer to be mapped anyway, what's the benefit of making the MR ram=on?
> > > Can we simply make it a normal IO memory region? The only benefit of a
> >
> > Guest access to a normal IO memory region would be emulated by QEMU,
> > while privately assigned MMIO requires direct guest access via Secure EPT.
> >
> > It seems the existing code doesn't support direct guest access when
> > mr->ram == false:
>
> Ah it's about this, ok.
>
> I am not sure what's the best approach, but IMHO it's still better if we
> stick with the host pointer always being available when ram=on. OTOH, VFIO
> private regions may be able to provide a special mark somewhere, just like
> what was done for romd_mode previously (qemu commit 235e8982ad39), so that
> KVM can still apply these MRs even if they're not RAM.
That also sounds good to me.
>
> >
> > static void kvm_set_phys_mem(KVMMemoryListener *kml,
> >                              MemoryRegionSection *section, bool add)
> > {
> >     [...]
> >
> >     if (!memory_region_is_ram(mr)) {
> >         if (writable || !kvm_readonly_mem_allowed) {
> >             return;
> >         } else if (!mr->romd_mode) {
> >             /* If the memory device is not in romd_mode, then we actually
> >              * want to remove the kvm memory slot so all accesses will
> >              * trap. */
> >             add = false;
> >         }
> >     }
> >
> >     [...]
> >
> >     /* register the new slot */
> >     do {
> >         [...]
> >
> >         err = kvm_set_user_memory_region(kml, mem, true);
> >     }
> > }
> >
> > > ram=on MR is, IMHO, being able to be accessed as RAM-like. If there's
> > > no host pointer at all, I don't yet understand how that helps private
> > > MMIO work.
> >
> > I expect private MMIO to be inaccessible from the host but accessible
> > from the guest, so it has kvm_userspace_memory_region2 set. That means
> > resolving its PFN during an EPT fault cannot depend on a host pointer.
> >
> > https://lore.kernel.org/all/[email protected]/
>
> I'll leave this to KVM experts, but I actually didn't follow exactly why
> the mmu notifier is an issue here, as I thought that was per-mm anyway,
> and KVM should logically be able to skip all VFIO private MMIO regions if
> affected.
I think this creates a logical inconsistency. You build the private MMIO
EPT mapping on fault based on the HVA<->HPA mapping, but don't follow the
HVA<->HPA mapping when it changes. Why should KVM trust the mapping at
fault time but not at mmu-notifier time?
> This is a comment to this part of your commit message:
>
> Rely on userspace mapping also means private MMIO mapping should
> follow userspace mapping change via mmu_notifier. This conflicts
> with the current design that mmu_notifier never impacts private
> mapping. It also makes no sense to support mmu_notifier just for
> private MMIO, private MMIO mapping should be fixed when CoCo-VM
> accepts the private MMIO, any following mapping change without
> guest permission should be invalid.
>
> So I don't yet see a hard no to reusing the userspace mapping even if it's
> not faultable as of now - what if it becomes faultable in the future? I
The first commit of guest_memfd emphasized the benefit of decoupling the
KVM mapping from the host mapping. My understanding is that even if
guest_memfd becomes faultable later, KVM should still work without the
userspace mapping.
> am not sure..
>
> OTOH, I also don't think we need KVM_SET_USER_MEMORY_REGION3 anyway.. The
> _REGION2 API is already smart enough to leave some reserved fields:
>
> /* for KVM_SET_USER_MEMORY_REGION2 */
> struct kvm_userspace_memory_region2 {
>         __u32 slot;
>         __u32 flags;
>         __u64 guest_phys_addr;
>         __u64 memory_size;
>         __u64 userspace_addr;
>         __u64 guest_memfd_offset;
>         __u32 guest_memfd;
>         __u32 pad1;
>         __u64 pad2[14];
> };
>
> I think we _could_ reuse some of the pad* fields? Reusing the guest_memfd
> field sounds error-prone to me.
It truly is. I'm expecting some suggestions here.
Thanks,
Yilun
>
> Not sure whether it would have been easier if it had been fd + fd_offset
> rather than guest_memfd* from the start. But I guess when introducing
> _REGION2 we didn't expect private MMIO regions to come so soon..
>
> Thanks,
>
> --
> Peter Xu
>