On Thu, Aug 10, 2023 at 10:58:09AM -0500, Michael Roth via <qemu-devel@nongnu.org> wrote:
> On Tue, Aug 01, 2023 at 09:45:41AM +0800, Xiaoyao Li wrote:
> > On 8/1/2023 12:51 AM, Daniel P. Berrangé wrote:
> > > On Mon, Jul 31, 2023 at 12:21:42PM -0400, Xiaoyao Li wrote:
> > > > This is the first RFC version of enabling KVM gmem[1] as the backend for
> > > > private memory of KVM_X86_PROTECTED_VM.
> > > >
> > > > It adds the support to create a specific KVM_X86_PROTECTED_VM type VM,
> > > > and introduces 'private' property for memory backend. When the vm type
> > > > is KVM_X86_PROTECTED_VM and memory backend has private enabled as below,
> > > > it will call KVM gmem ioctl to allocate private memory for the backend.
> > > >
> > > >     $qemu -object memory-backend-ram,id=mem0,size=1G,private=on \
> > > >           -machine q35,kvm-type=sw-protected-vm,memory-backend=mem0 \
> > > >           ...
> > > >
> > > > Unfortunately this patch series fails the boot of OVMF at a very early
> > > > stage due to triple fault because KVM doesn't support emulating string IO
> > > > to private memory. We leave it as an open to be discussed.
> > > >
> > > > There are following design opens that need to be discussed:
> > > >
> > > > 1. how to determine the vm type?
> > > >
> > > >    a. like this series, specify the vm type via machine property
> > > >       'kvm-type'
> > > >    b. check the memory backend, if any backend has 'private' property
> > > >       set, the vm-type is set to KVM_X86_PROTECTED_VM.
> > > >
> > > > 2. whether 'private' property is needed if we choose 1.b as design
> > > >
> > > >    with 1.b, QEMU can decide whether the memory region needs to be
> > > >    private (allocates gmem fd for it) or not, on its own.
> > > >
> > > > 3. What is KVM_X86_SW_PROTECTED_VM going to look like? What's the
> > > >    purpose of it and what's the requirement on it. I think it's the
> > > >    questions for KVM folks than QEMU folks.
> > > >
> > > > Any other idea/open/question is welcomed.
> > > >
> > > > Beside, TDX QEMU implementation is based on this series to provide
> > > > private gmem for TD private memory, which can be found at [2].
> > > > And it can work with corresponding KVM [3] to boot TDX guest.
> > >
> > > We already have a general purpose configuration mechanism for
> > > confidential guests. The -machine argument has a property
> > > confidential-guest-support=$OBJECT-ID, for pointing to an
> > > object that implements the TYPE_CONFIDENTIAL_GUEST_SUPPORT
> > > interface in QEMU. This is implemented with SEV, PPC PEF
> > > mode, and s390 protvirt.
> > >
> > > I would expect TDX to follow this same design ie
> > >
> > >     qemu-system-x86_64 \
> > >       -object tdx-guest,id=tdx0,..... \
> > >       -machine q35,confidential-guest-support=tdx0 \
> > >       ...
> > >
> > > and not require inventing the new 'kvm-type' attribute at least.
> >
> > yes.
> >
> > TDX is initialized exactly as the above.
> >
> > This RFC series introduces the 'kvm-type' for KVM_X86_SW_PROTECTED_VM. It's
> > my fault that I forgot to list the option of introducing sw_protected_vm
> > object with CONFIDENTIAL_GUEST_SUPPORT interface.
> > Thanks for Isaku to raise it
> > https://lore.kernel.org/qemu-devel/20230731171041.gb1807...@ls.amr.corp.intel.com/
> >
> > we can specify KVM_X86_SW_PROTECTED_VM this way:
> >
> >     qemu \
> >       -object sw-protected,id=swp0,... \
> >       -machine confidential-guest-support=swp0 \
> >       ...
> >
> > > For the memory backend though, I'm not so sure - possibly that
> > > might be something that still wants an extra property to identify
> > > the type of memory to allocate, since we use memory-backend-ram
> > > for a variety of use cases. Or it could be an entirely new object
> > > type such as "memory-backend-gmem"
> >
> > What I want to discuss is whether providing the interface to users to allow
> > them configuring which memory is/can be private. For example, QEMU can do it
> > internally.
> > If users want a confidential guest, QEMU allocates private gmem
> > for normal RAM automatically.
>
> I think handling it automatically simplifies things a good deal on the
> QEMU side. I think it's still worthwhile to allow:
>
>   -object memory-backend-memfd-private,...
>
> because it provides a nice mechanism to set up a pair of shared/private
> memfd's to enable hole-punching via fallocate() to avoid doubling memory
> allocations for shared/private. It's also a nice place to control
> potentially-configurable things like:
>
>  - whether or not to enable discard/hole-punching
>  - if discard is enabled, whether or not to register the range via
>    RamDiscardManager interface so that VFIO/IOMMU mappings get updated
>    when doing PCI passthrough. SNP relies on this for PCI passthrough
>    when discard is enabled, otherwise DMA occurs to stale mappings of
>    discarded bounce-buffer pages:
>
>      https://github.com/AMDESE/qemu/blob/snp-latest/backends/hostmem-memfd-private.c#L449
>
> But for other memory ranges, it doesn't do a lot of good to rely on
> users to control those via -object memory-backend-memfd-private, since
> QEMU will set up some regions internally, like the UEFI ROM.
>
> It also isn't ideal for QEMU itself to internally control what
> should/shouldn't be set up with a backing guest_memfd, because some
> guest kernels do weird stuff, like scan for ROM regions in areas that
> guest kernels might have mapped as encrypted in guest page table. You
> can consider them to be guest bugs, but even current SNP-capable
> kernels exhibit this behavior and if the guest wants to do dumb stuff
> QEMU should let it.
>
> But for these latter 2 cases, it doesn't make sense to attempt to do
> any sort of discarding of backing pages since it doesn't make sense to
> discard ROM pages.
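
To make the hole-punching point above concrete, here is a minimal sketch
of releasing part of a memfd with fallocate(), assuming a Linux host with
memfd_create(). The helper names discard_range() and demo_hole_punch() are
made up for illustration and are not taken from the QEMU or SNP trees.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>     /* fallocate(), FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <sys/mman.h>  /* memfd_create() */
#include <sys/stat.h>  /* fstat() */
#include <unistd.h>    /* ftruncate(), close() */

/* Hypothetical helper: release the pages backing [offset, offset + len)
 * while keeping the file size, so the range reads back as zeroes. */
static int discard_range(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len > 0);
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, len);
}

/* Hypothetical demo: returns 0 if punching a hole freed pages without
 * changing the file size, -1 otherwise. */
static int demo_hole_punch(void)
{
    const off_t size = 4 * 1024 * 1024;
    struct stat before, after;
    int ret = -1;

    int fd = memfd_create("shared-ram-sketch", MFD_CLOEXEC);
    if (fd < 0) {
        return -1;
    }
    /* Size the file and allocate all of its pages up front. */
    if (ftruncate(fd, size) < 0 || fallocate(fd, 0, 0, size) < 0 ||
        fstat(fd, &before) < 0) {
        goto out;
    }
    /* Pretend the first half became private: drop its shared pages. */
    if (discard_range(fd, 0, size / 2) < 0 || fstat(fd, &after) < 0) {
        goto out;
    }
    if (after.st_size == size && after.st_blocks < before.st_blocks) {
        ret = 0;
    }
out:
    close(fd);
    return ret;
}
```

The sketch only covers the shared side. How the private guest_memfd is
paired with it, and when a range is converted, is left out on purpose.
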
>
> So I think it makes sense to just set up the gmemfd automatically across
> the board internally, and keep memory-backend-memfd-private around
> purely as a way to control/configure discardable memory.

I'm looking at the repo and
31a7c7e36684 ("*hostmem-memfd-private: Initial discard manager support")

Do we have to implement RAM_DISCARD_MANAGER at memory-backend-memfd-private?
Can't we implement it at host_mem? The interface callbacks can check
"if (!private) return". Then we can support any host-mem backend.

-- 
Isaku Yamahata <isaku.yamah...@gmail.com>
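
As a rough illustration of the "if (!private) return" idea, the sketch
below shows one discard callback shared by every backend type that does
nothing for non-private backends. The struct and function names are
hypothetical stand-ins, not QEMU's real HostMemoryBackend or
RamDiscardManager definitions, and the listener handling is reduced to a
counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a generic host memory backend. */
typedef struct HostMemSketch {
    bool is_private;        /* true when the backend also has private (gmem) backing */
    uint64_t size;          /* backend size in bytes */
    unsigned notifications; /* how many discard notifications were delivered */
} HostMemSketch;

/* Hypothetical discard callback shared by every backend type.
 * Returns 0 on success or when there is nothing to do, -1 on a bad range. */
static int host_mem_sketch_notify_discard(HostMemSketch *be,
                                          uint64_t offset, uint64_t len)
{
    if (!be->is_private) {
        return 0; /* the "if (!private) return" check: plain backends are a no-op */
    }
    if (len == 0 || offset > be->size || len > be->size - offset) {
        return -1; /* range falls outside the backend */
    }
    be->notifications++; /* a real implementation would walk its listeners here */
    return 0;
}
```

With the check in the common callback, a backend that is not private pays
only for one branch, which is why this could sit in host_mem rather than
in a dedicated backend type.
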