Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Tue, 2013-06-18 at 08:48 -0600, Alex Williamson wrote:
> On Tue, 2013-06-18 at 14:38 +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote:
> > > Right, we don't want to create dependencies across modules.  I don't
> > > have a vision for how this should work.  This is effectively a complete
> > > side-band to vfio, so we're really just dealing in the iommu group
> > > space.  Maybe there needs to be some kind of registration of ownership
> > > for the group using some kind of token.  It would need to include some
> > > kind of notification when that ownership ends.  That might also be a
> > > convenient tag to toggle driver probing off for devices in the group.
> > > Other ideas?  Thanks,
> >
> > All of that smells nasty, like it will need a pile of bloody
> > infrastructure, which makes me think it's too complicated and not the
> > right approach.
> >
> > How does access control work today on x86/VFIO?  Can you give me a bit
> > more details?  I didn't get a good grasp from your previous email.
>
> The current model is not x86 specific, but it only covers doing iommu
> and device access through vfio.  The kink here is that we're trying to
> do device access and setup through vfio, but iommu manipulation through
> kvm.  We may want to revisit whether we can do the in-kernel iommu
> manipulation through vfio rather than kvm.
>
> For vfio in general, the group is the unit of ownership.  A user is
> granted access to /dev/vfio/$GROUP through file permissions.  The user
> opens the group and a container (/dev/vfio/vfio) and calls
> SET_CONTAINER on the group.  If supported by the platform, multiple
> groups can be set to the same container, which allows for iommu domain
> sharing.  Once a group is associated with a container, an iommu backend
> can be initialized for the container.  Only then can a device be
> accessed through the group.  So even if we were to pass a vfio group
> file descriptor into kvm and it matched as some kind of ownership token
> on the iommu group, it's not clear that's sufficient to assume we can
> start programming the iommu.  Thanks,
>
> Alex

From the look of it, the VFIO file descriptor is what has the access
control to the underlying iommu, is this right?  So we somewhat need to
transfer (or copy) that ownership from the VFIO fd to the KVM VM.  I
don't see a way to do that without some cross-layering here...

Rusty, are you aware of some kernel mechanism we can use for that?

Cheers,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
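[For readers following the thread: the group/container sequence Alex describes maps onto the user-space ioctl flow in the kernel's Documentation/vfio.txt. A rough sketch — the group number "26" and the device address are placeholders, and on powerpc the iommu type would be VFIO_SPAPR_TCE_IOMMU rather than VFIO_TYPE1_IOMMU shown here:

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <linux/vfio.h>

	int container, group, device;

	/* Open a container, then the group the user has permission for */
	container = open("/dev/vfio/vfio", O_RDWR);
	group = open("/dev/vfio/26", O_RDWR);	/* "26" is a placeholder */

	/* Tie the group to the container */
	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);

	/* Only now can an iommu backend be initialized for the container */
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

	/* Only then can a device be accessed through the group */
	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");

Each step fails unless the previous one succeeded, which is what makes the group fd usable as an ownership token.]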
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Tue, 2013-06-18 at 08:48 -0600, Alex Williamson wrote:
> On Tue, 2013-06-18 at 14:38 +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote:
> > > Right, we don't want to create dependencies across modules.  I don't
> > > have a vision for how this should work.  This is effectively a complete
> > > side-band to vfio, so we're really just dealing in the iommu group
> > > space.  Maybe there needs to be some kind of registration of ownership
> > > for the group using some kind of token.  It would need to include some
> > > kind of notification when that ownership ends.  That might also be a
> > > convenient tag to toggle driver probing off for devices in the group.
> > > Other ideas?  Thanks,
> >
> > All of that smells nasty, like it will need a pile of bloody
> > infrastructure, which makes me think it's too complicated and not the
> > right approach.
> >
> > How does access control work today on x86/VFIO?  Can you give me a bit
> > more details?  I didn't get a good grasp from your previous email.
>
> The current model is not x86 specific, but it only covers doing iommu
> and device access through vfio.  The kink here is that we're trying to
> do device access and setup through vfio, but iommu manipulation through
> kvm.  We may want to revisit whether we can do the in-kernel iommu
> manipulation through vfio rather than kvm.

How would that be possible?  The hypercalls from the guest arrive in
KVM... in a very, very specific restricted environment which we call
real mode (MMU off but still running in guest context), where we try to
do as much as possible, or in virtual mode, where they get handled as
normal KVM exits.

The only way we could handle them in VFIO would be if somehow VFIO
registered callbacks with KVM... and if we have that sort of
cross-dependency, then we may as well have a simpler one where VFIO
tells KVM what iommu is available for the VM.

> For vfio in general, the group is the unit of ownership.  A user is
> granted access to /dev/vfio/$GROUP through file permissions.  The user
> opens the group and a container (/dev/vfio/vfio) and calls
> SET_CONTAINER on the group.  If supported by the platform, multiple
> groups can be set to the same container, which allows for iommu domain
> sharing.  Once a group is associated with a container, an iommu backend
> can be initialized for the container.  Only then can a device be
> accessed through the group.  So even if we were to pass a vfio group
> file descriptor into kvm and it matched as some kind of ownership token
> on the iommu group, it's not clear that's sufficient to assume we can
> start programming the iommu.  Thanks,
>
> Alex

Your scheme seems to me that it would have the same problem if you
wanted to do a virtualized iommu.

In any case, this is a big deal.  We have a requirement for
pass-through.  It cannot work with any remotely usable performance
level if we don't implement the calls in KVM, so it needs to be sorted
one way or another, and I'm at a loss here...

Ben.
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 06/16/2013 02:39 PM, Benjamin Herrenschmidt wrote:
> >  static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing,
> > -			       unsigned long *pte_sizep)
> > +			       unsigned long *pte_sizep, bool do_get_page)
> >  {
> >  	pte_t *ptep;
> >  	unsigned int shift = 0;
> > @@ -135,6 +136,14 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing,
> >  	if (!pte_present(*ptep))
> >  		return __pte(0);
> > +	/*
> > +	 * Put huge pages handling to the virtual mode.
> > +	 * The only exception is for TCE list pages which we
> > +	 * do need to call get_page() for.
> > +	 */
> > +	if ((*pte_sizep > PAGE_SIZE) && do_get_page)
> > +		return __pte(0);
> > +
> >  	/* wait until _PAGE_BUSY is clear then set it atomically */
> >  	__asm__ __volatile__ (
> >  		"1:	ldarx	%0,0,%3\n"
> > @@ -148,6 +157,18 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing,
> >  		: "cc");
> >  	ret = pte;
> > +	if (do_get_page && pte_present(pte) && (!writing || pte_write(pte))) {
> > +		struct page *pg = NULL;
> > +		pg = realmode_pfn_to_page(pte_pfn(pte));
> > +		if (realmode_get_page(pg)) {
> > +			ret = __pte(0);
> > +		} else {
> > +			pte = pte_mkyoung(pte);
> > +			if (writing)
> > +				pte = pte_mkdirty(pte);
> > +		}
> > +	}
> > +	*ptep = pte;	/* clears _PAGE_BUSY */
> >  	return ret;
> >  }
>
> So now you are adding the clearing of _PAGE_BUSY that was missing from
> your first patch, except that this is not enough, since that means that
> in the emulated case (ie, !do_get_page) you will in essence return and
> then use a PTE that is not locked, without any synchronization to
> ensure that the underlying page doesn't go away... then you'll
> dereference that page.
>
> So either make everything use a speculative get_page, or make the
> emulated case use the MMU notifier to drop the operation in case of
> collision.  The former looks easier.
>
> Also, any specific reason why you do:
>
>   - Lock the PTE
>   - get_page()
>   - Unlock the PTE
>
> instead of:
>
>   - Read the PTE
>   - get_page_unless_zero
>   - re-check the PTE
>
> like get_user_pages_fast() does?  The former will be two atomic ops,
> the latter only one (faster), but maybe you have a good reason why that
> can't work...
If we want to set the dirty and young bits for the pte then I do not
know how to avoid _PAGE_BUSY.

--
Alexey
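[The "read, get_page_unless_zero, re-check" scheme Ben points at can be illustrated with a small user-space analogue. This is a sketch only, not kernel code: `struct page_ref` and `get_ref_unless_zero` are made-up names standing in for `struct page` and `get_page_unless_zero()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* User-space analogue (not kernel code) of the speculative
 * get_page_unless_zero(): take a reference only if the refcount
 * has not already dropped to zero, using a single atomic op. */
struct page_ref { atomic_int refcount; };

static bool get_ref_unless_zero(struct page_ref *p)
{
	int old = atomic_load(&p->refcount);
	while (old != 0) {
		/* one compare-and-swap replaces the lock + unlock pair;
		 * on failure, `old` is reloaded and the zero test redone */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;	/* reference taken */
	}
	return false;			/* object already being freed */
}
```

Against a live object (refcount 1) this returns true and bumps the count to 2; against one already at zero it refuses without touching the count. That refusal is why the PTE must be re-checked afterwards: the page may have been freed and reused between the read and the reference grab. It does not, however, give you a locked PTE to set the dirty/young bits in, which is exactly Alexey's objection below.]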
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
Alex Williamson <alex.william...@redhat.com> writes:
> On Mon, 2013-06-17 at 13:56 +1000, Benjamin Herrenschmidt wrote:
> > On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote:
> > > IOMMU groups themselves don't provide security, they're accessed by
> > > interfaces like VFIO, which provide the security.  Given a brief
> > > look, I agree, this looks like a possible backdoor.  The typical
> > > VFIO way to handle this would be to pass a VFIO file descriptor here
> > > to prove that the process has access to the IOMMU group.  This is
> > > how /dev/vfio/vfio gains the ability to setup an IOMMU domain and do
> > > mappings with the SET_CONTAINER ioctl using a group fd.  Thanks,
> >
> > How do you envision that in the kernel?  IE. I'm in KVM code, I get
> > that vfio fd, what do I do with it?
> >
> > Basically, KVM needs to know that the user is allowed to use that
> > iommu group.  I don't think we want KVM however to call into VFIO
> > directly, right?
>
> Right, we don't want to create dependencies across modules.  I don't
> have a vision for how this should work.  This is effectively a complete
> side-band to vfio, so we're really just dealing in the iommu group
> space.  Maybe there needs to be some kind of registration of ownership
> for the group using some kind of token.  It would need to include some
> kind of notification when that ownership ends.  That might also be a
> convenient tag to toggle driver probing off for devices in the group.
> Other ideas?  Thanks,

It's actually not that bad.  eg:

	struct vfio_container *vfio_container_from_file(struct file *filp)
	{
		if (filp->f_op != &vfio_device_fops)
			return ERR_PTR(-EINVAL);

		/* OK it really is a vfio fd, return the data. */
		...
	}
	EXPORT_SYMBOL_GPL(vfio_container_from_file);
inside KVM_CREATE_SPAPR_TCE_IOMMU:

	struct file *vfio_filp;
	struct vfio_container *(*lookup)(struct file *filp);

	vfio_filp = fget(create_tce_iommu.fd);
	if (!vfio_filp)
		ret = -EBADF;
	lookup = symbol_get(vfio_container_from_file);
	if (!lookup)
		ret = -EINVAL;
	else {
		container = lookup(vfio_filp);
		if (IS_ERR(container))
			ret = PTR_ERR(container);
		else
			...
		symbol_put(vfio_container_from_file);
	}

symbol_get() won't try to load a module; it'll just fail.  This is what
you want, since they must have vfio in the kernel to get a valid fd...

Hope that helps,
Rusty.
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Wed, 2013-06-19 at 13:05 +0930, Rusty Russell wrote:
> symbol_get() won't try to load a module; it'll just fail.  This is what
> you want, since they must have vfio in the kernel to get a valid fd...

Ok, cool.  I suppose what we want here, Alexey, is something slightly
higher level, like:

	vfio_validate_iommu_id(file, iommu_id)

which verifies that the file that was passed in is allowed to use that
iommu_id.

That's a simple and flexible interface (ie, it will work even if we
support multiple iommu IDs in the future for a vfio, for example for
DDW windows etc...), the logic to know about the ID remains in qemu,
and this is strictly a validation call.  That way we also don't have to
expose the containing vfio struct etc... just that simple function.

Alex, any objection?  Do we need to make it a get/put interface
instead?

	vfio_validate_and_use_iommu(file, iommu_id);
	vfio_release_iommu(file, iommu_id);

To ensure that the resource remains owned by the process until KVM is
closed as well?  Or do we want to register with VFIO with a callback so
that VFIO can call us if it needs us to give it up?

Cheers,
Ben.
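[The get/put variant Ben proposes might look roughly like the sketch below on the vfio side. This interface is only a proposal in this thread: nothing here exists in the kernel, and the function names, struct fields, and refcounting scheme are all illustrative.

	/* Hypothetical vfio-side sketch of the proposed get/put interface. */
	int vfio_validate_and_use_iommu(struct file *filp, int iommu_id)
	{
		struct vfio_container *container;

		if (filp->f_op != &vfio_fops)
			return -EINVAL;		/* not a vfio container fd */

		container = filp->private_data;
		if (container->iommu_id != iommu_id)
			return -EPERM;		/* fd does not own this iommu */

		/* pin the iommu so it cannot go away while KVM uses it */
		atomic_inc(&container->external_users);
		return 0;
	}

	void vfio_release_iommu(struct file *filp, int iommu_id)
	{
		struct vfio_container *container = filp->private_data;

		atomic_dec(&container->external_users);
	}

KVM would call the first at KVM_CREATE_SPAPR_TCE_IOMMU time (via symbol_get, as Rusty showed) and the second when the VM is destroyed; the callback variant would additionally let vfio revoke the iommu from under KVM.]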