On Tue, Jun 09, 2020 at 11:25:41PM -0700, John G Johnson wrote:
> > On Jun 2, 2020, at 8:06 AM, Alex Williamson <alex.william...@redhat.com>
> > wrote:
> >
> > On Wed, 20 May 2020 17:45:13 -0700
> > John G Johnson <john.g.john...@oracle.com> wrote:
> >
> >>> I'm confused by VFIO_USER_ADD_MEMORY_REGION vs VFIO_USER_IOMMU_MAP_DMA.
> >>> The former seems intended to provide the server with access to the
> >>> entire GPA space, while the latter indicates an IOVA to GPA mapping of
> >>> those regions. Doesn't this break the basic isolation of a vIOMMU?
> >>> This essentially says to me "here's all the guest memory, but please
> >>> only access these regions for which we're providing DMA mappings".
> >>> That invites abuse.
> >>>
> >>
> >> The purpose behind separating QEMU into multiple processes is
> >> to provide an additional layer protection for the infrastructure against
> >> a malign guest, not for the guest against itself, so preventing a server
> >> that has been compromised by a guest from accessing all of guest memory
> >> adds no additional benefit. We don’t even have an IOMMU in our current
> >> guest model for this reason.
> >
> > One of the use cases we see a lot with vfio is nested assignment, ie.
> > we assign a device to a VM where the VM includes a vIOMMU, such that
> > the guest OS can then assign the device to userspace within the guest.
> > This is safe to do AND provides isolation within the guest exactly
> > because the device only has access to memory mapped to the device, not
> > the entire guest address space. I don't think it's just the hypervisor
> > you're trying to protect, we can't assume there are always trusted
> > drivers managing the device.
> >
>
> We intend to support an IOMMU. The question seems to be whether
> it’s implemented in the server or client. The current proposal has it
> in the server, ala vhost-user, but we are fine with moving it.

It's challenging to implement a fast and secure IOMMU. The simplest
approach is secure but not fast: add protocol messages for
DMA_READ(iova, length) and DMA_WRITE(iova, buffer, length).

An issue with file descriptor passing is that it's hard to revoke access
once the file descriptor has been passed. memfd supports sealing with
fcntl(F_ADD_SEALS), but sealing doesn't revoke writable mappings
(mmap(PROT_WRITE)) that other processes already hold.

Memory Protection Keys don't seem to be useful here either, and their
availability is limited (see pkeys(7)).

One crazy idea is to use KVM as a sandbox for running the device and let
the vIOMMU control the page tables instead of the device (guest). That
way the hardware MMU provides memory translation, but I think this is
impractical because the guest environment is too different from the
Linux userspace environment.

As a starting point, adding DMA_READ/DMA_WRITE messages would provide
the functionality and security. Unfortunately, it makes DMA expensive
and performance will suffer.

Stefan
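
For illustration only, here is a minimal sketch of the DMA_READ/DMA_WRITE
idea, assuming the IOMMU lives in the client. The wire format, the opcode
values, and the names Client, map_dma, unmap_dma, dma_read_msg and
dma_write_msg are all hypothetical and not taken from any proposal. The
point is that the device process never holds a mapping of guest memory,
so unmapping an IOVA range revokes access on the very next request.

```python
import struct

# Hypothetical wire format (an assumption, not from any spec):
# 1-byte opcode, 8-byte IOVA, 4-byte length, little-endian, no padding.
HDR = struct.Struct("<BQI")
DMA_READ, DMA_WRITE = 1, 2


def dma_read_msg(iova, length):
    """Build a DMA_READ(iova, length) request as the device would send it."""
    return HDR.pack(DMA_READ, iova, length)


def dma_write_msg(iova, buf):
    """Build a DMA_WRITE(iova, buffer, length) request."""
    return HDR.pack(DMA_WRITE, iova, len(buf)) + bytes(buf)


class Client:
    """Client (VMM) side: owns guest memory and the IOVA -> GPA mappings."""

    def __init__(self, mem_size):
        self.mem = bytearray(mem_size)  # stand-in for guest RAM
        self.maps = []                  # list of (iova, gpa, size)

    def map_dma(self, iova, gpa, size):
        if gpa < 0 or gpa + size > len(self.mem):
            raise ValueError("mapping falls outside guest memory")
        self.maps.append((iova, gpa, size))

    def unmap_dma(self, iova):
        # No shared mapping exists in the device process, so dropping the
        # entry here is enough to revoke access.
        self.maps = [m for m in self.maps if m[0] != iova]

    def _translate(self, iova, length):
        # The whole access must fit inside a single mapped range.
        for base, gpa, size in self.maps:
            if base <= iova and iova + length <= base + size:
                return gpa + (iova - base)
        raise PermissionError(f"IOVA {iova:#x} (+{length}) is not mapped")

    def handle(self, msg):
        """Process one request from the device and return the reply payload."""
        op, iova, length = HDR.unpack_from(msg)
        gpa = self._translate(iova, length)
        if op == DMA_READ:
            return bytes(self.mem[gpa:gpa + length])
        if op == DMA_WRITE:
            payload = msg[HDR.size:HDR.size + length]
            if len(payload) != length:
                raise ValueError("short DMA_WRITE payload")
            self.mem[gpa:gpa + length] = payload
            return b""
        raise ValueError(f"unknown opcode {op}")
```

Every access costs a message round trip and a copy, which is the
performance problem mentioned above.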