On Wed, Jun 19, 2024, Fuad Tabba wrote:
> Hi Jason,
>
> On Wed, Jun 19, 2024 at 12:51 PM Jason Gunthorpe <j...@nvidia.com> wrote:
> >
> > On Wed, Jun 19, 2024 at 10:11:35AM +0100, Fuad Tabba wrote:
> >
> > > To be honest, personally (speaking only for myself, not necessarily
> > > for Elliot and not for anyone else in the pKVM team), I still would
> > > prefer to use guest_memfd(). I think that having one solution for
> > > confidential computing that rules them all would be best. But we do
> > > need to be able to share memory in place, have a plan for supporting
> > > huge pages in the near future, and migration in the not-too-distant
> > > future.
> >
> > I think using a FD to control this special lifetime stuff is
> > dramatically better than trying to force the MM to do it with struct
> > page hacks.
> >
> > If you can't agree with the guest_memfd people on how to get there
> > then maybe you need a guest_memfd2 for this slightly different special
> > stuff instead of intruding on the core mm so much. (though that would
> > be sad)
> >
> > We really need to be thinking more about containing these special
> > things and not just sprinkling them everywhere.
>
> I agree that we need to agree :) This discussion has been going on
> since before LPC last year, and the consensus from the guest_memfd()
> folks (if I understood it correctly) is that guest_memfd() is what it
> is: designed for a specific type of confidential computing, in the
> style of TDX and CCA perhaps, and that it cannot (or will not) perform
> the role of being a general solution for all confidential computing.
That isn't remotely accurate. I have stated multiple times that I want
guest_memfd to be a vehicle for all VM types, i.e. not just CoCo VMs, and
most definitely not just TDX/SNP/CCA VMs.

What I am staunchly against is piling features onto guest_memfd that will
cause it to eventually become virtually indistinguishable from any other
file-based backing store. I.e. while I want to make guest_memfd usable
for all VM *types*, making guest_memfd the preferred backing store for
all *VMs* and use cases is very much a non-goal.

From an earlier conversation[1]:

 : In other words, ditch the complexity for features that are well served by existing
 : general purpose solutions, so that guest_memfd can take on a bit of complexity to
 : serve use cases that are unique to KVM guests, without becoming an unmaintainable
 : mess due to cross-products.

> > > Also, since pin is already overloading the refcount, having the
> > > exclusive pin there helps in ensuring atomic accesses and avoiding
> > > races.
> >
> > Yeah, but every time someone does this and then links it to a uAPI it
> > becomes utterly baked in concrete for the MM forever.
>
> I agree. But if we can't modify guest_memfd() to fit our needs (pKVM,
> Gunyah), then we don't really have that many other options.

What _are_ your needs? There are multiple unanswered questions from our
last conversation[2]. And by "needs" I don't mean "what changes do you
want to make to guest_memfd?", I mean "what are the use cases, patterns,
and scenarios that you want to support?".

 : What's "hypervisor-assisted page migration"? More specifically, what's the
 : mechanism that drives it?

 : Do you happen to have a list of exactly what you mean by "normal mm stuff"? I
 : am not at all opposed to supporting .mmap(), because long term I also want to
 : use guest_memfd for non-CoCo VMs. But I want to be very conservative with respect
 : to what is allowed for guest_memfd. E.g. host userspace can map guest_memfd,
 : and do operations that are directly related to its mapping, but that's about it.

That distinction matters, because as I have stated in that thread, I am
not opposed to page migration itself:

 : I am not opposed to page migration itself, what I am opposed to is adding deep
 : integration with core MM to do some of the fancy/complex things that lead to page
 : migration.

I am generally aware of the core pKVM use cases, but AFAIK I haven't seen
a complete picture of everything you want to do, and _why_.

E.g. if one of your requirements is that guest memory is managed by
core-mm the same as all other memory in the system, then yeah,
guest_memfd isn't for you. Integrating guest_memfd deeply into core-mm
simply isn't realistic, at least not without *massive* changes to
core-mm, as the whole point of guest_memfd is that it is guest-first
memory, i.e. it is NOT memory that is managed by core-mm (primary MMU)
and optionally mapped into KVM (secondary MMU).

Again from that thread, one of the most important aspects of guest_memfd
is that VMAs are not required. Stating the obvious, the lack of VMAs
makes it really hard to drive swap, reclaim, migration, etc. from code
that fundamentally operates on VMAs.

 : More broadly, no VMAs are required. The lack of stage-1 page tables is nice to
 : have; the lack of VMAs means that guest_memfd isn't playing second fiddle, e.g.
 : it's not subject to VMA protections, isn't restricted to host mapping size, etc.

[1] https://lore.kernel.org/all/zfmpby6i3pfbe...@google.com
[2] https://lore.kernel.org/all/zg3xf7dttx6hb...@google.com
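
To make the "guest-first memory" point above concrete, here is a minimal,
hypothetical userspace sketch (not code from this thread) against the
guest_memfd uAPI that landed in Linux 6.8: KVM_CREATE_GUEST_MEMFD hands
back an fd with no VMA behind it, and KVM_SET_USER_MEMORY_REGION2 binds
that fd to a memslot. The KVM_X86_SW_PROTECTED_VM type and the 2 MiB size
are illustrative assumptions, and error handling is pared down.

/*
 * Hypothetical sketch: allocate guest-first memory via guest_memfd and
 * bind it to a memslot (Linux 6.8+ uAPI).  Not code from this thread.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

#define MEM_SIZE (2UL * 1024 * 1024)	/* 2 MiB, arbitrary for the example */

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);

	/* guest_memfd is gated on VM types that support private memory. */
	int vm = ioctl(kvm, KVM_CREATE_VM, KVM_X86_SW_PROTECTED_VM);
	if (kvm < 0 || vm < 0) {
		perror("kvm setup");
		return 1;
	}

	/*
	 * The returned fd is guest-first memory: it has no VMA, and core-mm
	 * (the primary MMU) never manages it like regular user memory.
	 */
	struct kvm_create_guest_memfd gmem = { .size = MEM_SIZE };
	int gmem_fd = ioctl(vm, KVM_CREATE_GUEST_MEMFD, &gmem);
	if (gmem_fd < 0) {
		perror("KVM_CREATE_GUEST_MEMFD");
		return 1;
	}

	/* Ordinary anonymous memory still backs the *shared* half of the slot. */
	void *shared = mmap(NULL, MEM_SIZE, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	struct kvm_userspace_memory_region2 region = {
		.slot = 0,
		.flags = KVM_MEM_GUEST_MEMFD,
		.guest_phys_addr = 0,
		.memory_size = MEM_SIZE,
		.userspace_addr = (__u64)(unsigned long)shared,
		.guest_memfd = gmem_fd,
		.guest_memfd_offset = 0,
	};
	if (ioctl(vm, KVM_SET_USER_MEMORY_REGION2, &region))
		perror("KVM_SET_USER_MEMORY_REGION2");
	return 0;
}

Note that the mmap()ed buffer only backs the shared half of the slot; the
private half is reachable solely through the fd, and flipping ranges
between shared and private is driven by KVM_SET_MEMORY_ATTRIBUTES rather
than by any VMA-level operation.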