Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Sun, Jun 23, 2013 at 10:41:24PM -0600, Alex Williamson wrote: On Mon, 2013-06-24 at 13:52 +1000, David Gibson wrote: On Sat, Jun 22, 2013 at 08:28:06AM -0600, Alex Williamson wrote: On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote: On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote: On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote: On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero How so? Holding the file pointer means the file won't go away, which means the group release function won't be called. That means the group won't go away, but that doesn't mean it's attached to an IOMMU. A user could call UNSET_CONTAINER. Uhh... *thinks*. Ah, I see. I think the interface should not take the group fd, but the container fd. Holding a reference to *that* would keep the necessary things around. But more to the point, it's the right thing semantically: The container is essentially the handle on a host iommu address space, and so that's what should be bound by the KVM call to a particular guest iommu address space. e.g. it would make no sense to bind two different groups to different guest iommu address spaces, if they were in the same container - the guest thinks they are different spaces, but if they're in the same container they must be the same space. While the container is the gateway to the iommu, what empowers the container to maintain an iommu is the group. What happens to a container when all the groups are disconnected or closed? Groups are the unit that indicates hardware access, not containers. Thanks, Uh... huh? I'm really not sure what you're getting at. The operation we're doing for KVM here is binding a guest iommu address space to a particular host iommu address space. Why would we not want to use the obvious handle on the host iommu address space, which is the container fd? AIUI, the request isn't for an interface through which to do iommu mappings. The request is for an interface to show that the user has sufficient privileges to do mappings. Groups are what gives the user that ability. The iommu is also possibly associated with multiple iommu groups and I believe what is being asked for here is a way to hold and lock a single iommu group with iommu protection. From a practical point of view, the iommu interface is de-privileged once the groups are disconnected or closed. Holding a reference count on the iommu fd won't prevent that. That means we'd have to use a notifier to have KVM stop the side-channel iommu access. Meanwhile holding the file descriptor for the group and adding an interface that bumps use counter allows KVM to lock itself in, just as if it had a device opened itself. Thanks, Ah, good point. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson pgpCSZuQp1350.pgp Description: PGP signature
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Sun, Jun 23, 2013 at 10:41:24PM -0600, Alex Williamson wrote: On Mon, 2013-06-24 at 13:52 +1000, David Gibson wrote: On Sat, Jun 22, 2013 at 08:28:06AM -0600, Alex Williamson wrote: On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote: On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote: On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote: On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero How so? Holding the file pointer means the file won't go away, which means the group release function won't be called. That means the group won't go away, but that doesn't mean it's attached to an IOMMU. A user could call UNSET_CONTAINER. Uhh... *thinks*. Ah, I see. I think the interface should not take the group fd, but the container fd. Holding a reference to *that* would keep the necessary things around. But more to the point, it's the right thing semantically: The container is essentially the handle on a host iommu address space, and so that's what should be bound by the KVM call to a particular guest iommu address space. e.g. it would make no sense to bind two different groups to different guest iommu address spaces, if they were in the same container - the guest thinks they are different spaces, but if they're in the same container they must be the same space. While the container is the gateway to the iommu, what empowers the container to maintain an iommu is the group. What happens to a container when all the groups are disconnected or closed? Groups are the unit that indicates hardware access, not containers. Thanks, Uh... huh? I'm really not sure what you're getting at. The operation we're doing for KVM here is binding a guest iommu address space to a particular host iommu address space. Why would we not want to use the obvious handle on the host iommu address space, which is the container fd? AIUI, the request isn't for an interface through which to do iommu mappings. The request is for an interface to show that the user has sufficient privileges to do mappings. Groups are what gives the user that ability. The iommu is also possibly associated with multiple iommu groups and I believe what is being asked for here is a way to hold and lock a single iommu group with iommu protection. From a practical point of view, the iommu interface is de-privileged once the groups are disconnected or closed. Holding a reference count on the iommu fd won't prevent that. That means we'd have to use a notifier to have KVM stop the side-channel iommu access. Meanwhile holding the file descriptor for the group and adding an interface that bumps use counter allows KVM to lock itself in, just as if it had a device opened itself. Thanks, Ah, good point. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson pgpAjDK6J9rb1.pgp Description: PGP signature
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Sun, Jun 23, 2013 at 09:28:13AM +1000, Benjamin Herrenschmidt wrote: On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote: I think the interface should not take the group fd, but the container fd. Holding a reference to *that* would keep the necessary things around. But more to the point, it's the right thing semantically: The container is essentially the handle on a host iommu address space, and so that's what should be bound by the KVM call to a particular guest iommu address space. e.g. it would make no sense to bind two different groups to different guest iommu address spaces, if they were in the same container - the guest thinks they are different spaces, but if they're in the same container they must be the same space. Interestingly, how are we going to extend that when/if we implement DDW ? DDW means an API by which the guest can request the creation of additional iommus for a given device (typically, in addition to the default smallish 32-bit one using 4k pages, the guest can request a larger window in 64-bit space using a larger page size). So, would a PAPR gest requesting this expect the new window to have a new liobn, or an existing liobn? -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson pgpaNZNxL_Bip.pgp Description: PGP signature
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Sat, Jun 22, 2013 at 08:28:06AM -0600, Alex Williamson wrote: On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote: On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote: On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote: On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero How so? Holding the file pointer means the file won't go away, which means the group release function won't be called. That means the group won't go away, but that doesn't mean it's attached to an IOMMU. A user could call UNSET_CONTAINER. Uhh... *thinks*. Ah, I see. I think the interface should not take the group fd, but the container fd. Holding a reference to *that* would keep the necessary things around. But more to the point, it's the right thing semantically: The container is essentially the handle on a host iommu address space, and so that's what should be bound by the KVM call to a particular guest iommu address space. e.g. it would make no sense to bind two different groups to different guest iommu address spaces, if they were in the same container - the guest thinks they are different spaces, but if they're in the same container they must be the same space. While the container is the gateway to the iommu, what empowers the container to maintain an iommu is the group. What happens to a container when all the groups are disconnected or closed? Groups are the unit that indicates hardware access, not containers. Thanks, Uh... huh? I'm really not sure what you're getting at. The operation we're doing for KVM here is binding a guest iommu address space to a particular host iommu address space. Why would we not want to use the obvious handle on the host iommu address space, which is the container fd? -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson pgp5yjWbp1NZI.pgp Description: PGP signature
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Mon, 2013-06-24 at 13:54 +1000, David Gibson wrote: DDW means an API by which the guest can request the creation of additional iommus for a given device (typically, in addition to the default smallish 32-bit one using 4k pages, the guest can request a larger window in 64-bit space using a larger page size). So, would a PAPR gest requesting this expect the new window to have a new liobn, or an existing liobn? New liobn or there is no way to H_PUT_TCE it (it exists in addition to the legacy window). Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Mon, 2013-06-24 at 13:52 +1000, David Gibson wrote: On Sat, Jun 22, 2013 at 08:28:06AM -0600, Alex Williamson wrote: On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote: On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote: On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote: On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero How so? Holding the file pointer means the file won't go away, which means the group release function won't be called. That means the group won't go away, but that doesn't mean it's attached to an IOMMU. A user could call UNSET_CONTAINER. Uhh... *thinks*. Ah, I see. I think the interface should not take the group fd, but the container fd. Holding a reference to *that* would keep the necessary things around. But more to the point, it's the right thing semantically: The container is essentially the handle on a host iommu address space, and so that's what should be bound by the KVM call to a particular guest iommu address space. e.g. it would make no sense to bind two different groups to different guest iommu address spaces, if they were in the same container - the guest thinks they are different spaces, but if they're in the same container they must be the same space. While the container is the gateway to the iommu, what empowers the container to maintain an iommu is the group. What happens to a container when all the groups are disconnected or closed? Groups are the unit that indicates hardware access, not containers. Thanks, Uh... huh? I'm really not sure what you're getting at. The operation we're doing for KVM here is binding a guest iommu address space to a particular host iommu address space. Why would we not want to use the obvious handle on the host iommu address space, which is the container fd? AIUI, the request isn't for an interface through which to do iommu mappings. The request is for an interface to show that the user has sufficient privileges to do mappings. Groups are what gives the user that ability. The iommu is also possibly associated with multiple iommu groups and I believe what is being asked for here is a way to hold and lock a single iommu group with iommu protection. From a practical point of view, the iommu interface is de-privileged once the groups are disconnected or closed. Holding a reference count on the iommu fd won't prevent that. That means we'd have to use a notifier to have KVM stop the side-channel iommu access. Meanwhile holding the file descriptor for the group and adding an interface that bumps use counter allows KVM to lock itself in, just as if it had a device opened itself. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Sun, Jun 23, 2013 at 09:28:13AM +1000, Benjamin Herrenschmidt wrote: On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote: I think the interface should not take the group fd, but the container fd. Holding a reference to *that* would keep the necessary things around. But more to the point, it's the right thing semantically: The container is essentially the handle on a host iommu address space, and so that's what should be bound by the KVM call to a particular guest iommu address space. e.g. it would make no sense to bind two different groups to different guest iommu address spaces, if they were in the same container - the guest thinks they are different spaces, but if they're in the same container they must be the same space. Interestingly, how are we going to extend that when/if we implement DDW ? DDW means an API by which the guest can request the creation of additional iommus for a given device (typically, in addition to the default smallish 32-bit one using 4k pages, the guest can request a larger window in 64-bit space using a larger page size). So, would a PAPR gest requesting this expect the new window to have a new liobn, or an existing liobn? -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson pgp5jkS8m1XZl.pgp Description: PGP signature
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Sat, Jun 22, 2013 at 08:28:06AM -0600, Alex Williamson wrote: On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote: On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote: On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote: On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero How so? Holding the file pointer means the file won't go away, which means the group release function won't be called. That means the group won't go away, but that doesn't mean it's attached to an IOMMU. A user could call UNSET_CONTAINER. Uhh... *thinks*. Ah, I see. I think the interface should not take the group fd, but the container fd. Holding a reference to *that* would keep the necessary things around. But more to the point, it's the right thing semantically: The container is essentially the handle on a host iommu address space, and so that's what should be bound by the KVM call to a particular guest iommu address space. e.g. it would make no sense to bind two different groups to different guest iommu address spaces, if they were in the same container - the guest thinks they are different spaces, but if they're in the same container they must be the same space. While the container is the gateway to the iommu, what empowers the container to maintain an iommu is the group. What happens to a container when all the groups are disconnected or closed? Groups are the unit that indicates hardware access, not containers. Thanks, Uh... huh? I'm really not sure what you're getting at. The operation we're doing for KVM here is binding a guest iommu address space to a particular host iommu address space. Why would we not want to use the obvious handle on the host iommu address space, which is the container fd? -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson pgpfZESO3eXiq.pgp Description: PGP signature
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Mon, 2013-06-24 at 13:54 +1000, David Gibson wrote: DDW means an API by which the guest can request the creation of additional iommus for a given device (typically, in addition to the default smallish 32-bit one using 4k pages, the guest can request a larger window in 64-bit space using a larger page size). So, would a PAPR gest requesting this expect the new window to have a new liobn, or an existing liobn? New liobn or there is no way to H_PUT_TCE it (it exists in addition to the legacy window). Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Mon, 2013-06-24 at 13:52 +1000, David Gibson wrote: On Sat, Jun 22, 2013 at 08:28:06AM -0600, Alex Williamson wrote: On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote: On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote: On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote: On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero How so? Holding the file pointer means the file won't go away, which means the group release function won't be called. That means the group won't go away, but that doesn't mean it's attached to an IOMMU. A user could call UNSET_CONTAINER. Uhh... *thinks*. Ah, I see. I think the interface should not take the group fd, but the container fd. Holding a reference to *that* would keep the necessary things around. But more to the point, it's the right thing semantically: The container is essentially the handle on a host iommu address space, and so that's what should be bound by the KVM call to a particular guest iommu address space. e.g. it would make no sense to bind two different groups to different guest iommu address spaces, if they were in the same container - the guest thinks they are different spaces, but if they're in the same container they must be the same space. While the container is the gateway to the iommu, what empowers the container to maintain an iommu is the group. What happens to a container when all the groups are disconnected or closed? Groups are the unit that indicates hardware access, not containers. Thanks, Uh... huh? I'm really not sure what you're getting at. The operation we're doing for KVM here is binding a guest iommu address space to a particular host iommu address space. Why would we not want to use the obvious handle on the host iommu address space, which is the container fd? AIUI, the request isn't for an interface through which to do iommu mappings. The request is for an interface to show that the user has sufficient privileges to do mappings. Groups are what gives the user that ability. The iommu is also possibly associated with multiple iommu groups and I believe what is being asked for here is a way to hold and lock a single iommu group with iommu protection. From a practical point of view, the iommu interface is de-privileged once the groups are disconnected or closed. Holding a reference count on the iommu fd won't prevent that. That means we'd have to use a notifier to have KVM stop the side-channel iommu access. Meanwhile holding the file descriptor for the group and adding an interface that bumps use counter allows KVM to lock itself in, just as if it had a device opened itself. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 06/21/2013 12:55 AM, Alex Williamson wrote: On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote: On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero How so? Holding the file pointer means the file won't go away, which means the group release function won't be called. That means the group won't go away, but that doesn't mean it's attached to an IOMMU. A user could call UNSET_CONTAINER. and this is exactly what vfio_group_add_external_user() and vfio_group_del_external_user() do. The difference is only in absolute value - 2 vs. 3. No change in behaviour whether I use new vfio API or simply hold file* till KVM closes fd created when IOMMU was connected to LIOBN. By that notion you could open(/dev/vfio/$GROUP) and you're safe, right? But what about SET_CONTAINER SET_IOMMU? All that you guarantee holding the file pointer is that the vfio_group exists. And while this counter is not zero, QEMU cannot take ownership over the group. I am definitely still missing the bigger picture... The bigger picture is that the group needs to exist AND it needs to be setup and maintained to have IOMMU protection. Actually, my first stab at add_external_user doesn't look sufficient, it needs to look more like vfio_group_get_device_fd, checking group-container-iommu and group_viable(). This makes sense. If you did this, that would be great. Without it, I really cannot see how the proposed inc/dec of container_users is better than simple holding file*. Thanks. As written it would allow an external user after SET_CONTAINER without SET_IOMMU. It should also be part of the API that the external user must hold the file reference between add_external_use and del_external_user and do cleanup on any exit paths. Thanks, -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote: On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote: On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero How so? Holding the file pointer means the file won't go away, which means the group release function won't be called. That means the group won't go away, but that doesn't mean it's attached to an IOMMU. A user could call UNSET_CONTAINER. Uhh... *thinks*. Ah, I see. I think the interface should not take the group fd, but the container fd. Holding a reference to *that* would keep the necessary things around. But more to the point, it's the right thing semantically: The container is essentially the handle on a host iommu address space, and so that's what should be bound by the KVM call to a particular guest iommu address space. e.g. it would make no sense to bind two different groups to different guest iommu address spaces, if they were in the same container - the guest thinks they are different spaces, but if they're in the same container they must be the same space. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson pgpV98XDgZ24l.pgp Description: PGP signature
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote: On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote: On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote: On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero How so? Holding the file pointer means the file won't go away, which means the group release function won't be called. That means the group won't go away, but that doesn't mean it's attached to an IOMMU. A user could call UNSET_CONTAINER. Uhh... *thinks*. Ah, I see. I think the interface should not take the group fd, but the container fd. Holding a reference to *that* would keep the necessary things around. But more to the point, it's the right thing semantically: The container is essentially the handle on a host iommu address space, and so that's what should be bound by the KVM call to a particular guest iommu address space. e.g. it would make no sense to bind two different groups to different guest iommu address spaces, if they were in the same container - the guest thinks they are different spaces, but if they're in the same container they must be the same space. While the container is the gateway to the iommu, what empowers the container to maintain an iommu is the group. What happens to a container when all the groups are disconnected or closed? Groups are the unit that indicates hardware access, not containers. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote: I think the interface should not take the group fd, but the container fd. Holding a reference to *that* would keep the necessary things around. But more to the point, it's the right thing semantically: The container is essentially the handle on a host iommu address space, and so that's what should be bound by the KVM call to a particular guest iommu address space. e.g. it would make no sense to bind two different groups to different guest iommu address spaces, if they were in the same container - the guest thinks they are different spaces, but if they're in the same container they must be the same space. Interestingly, how are we going to extend that when/if we implement DDW ? DDW means an API by which the guest can request the creation of additional iommus for a given device (typically, in addition to the default smallish 32-bit one using 4k pages, the guest can request a larger window in 64-bit space using a larger page size). Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 06/21/2013 12:55 AM, Alex Williamson wrote: On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote: On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero How so? Holding the file pointer means the file won't go away, which means the group release function won't be called. That means the group won't go away, but that doesn't mean it's attached to an IOMMU. A user could call UNSET_CONTAINER. and this is exactly what vfio_group_add_external_user() and vfio_group_del_external_user() do. The difference is only in absolute value - 2 vs. 3. No change in behaviour whether I use new vfio API or simply hold file* till KVM closes fd created when IOMMU was connected to LIOBN. By that notion you could open(/dev/vfio/$GROUP) and you're safe, right? But what about SET_CONTAINER SET_IOMMU? All that you guarantee holding the file pointer is that the vfio_group exists. And while this counter is not zero, QEMU cannot take ownership over the group. I am definitely still missing the bigger picture... The bigger picture is that the group needs to exist AND it needs to be setup and maintained to have IOMMU protection. Actually, my first stab at add_external_user doesn't look sufficient, it needs to look more like vfio_group_get_device_fd, checking group-container-iommu and group_viable(). This makes sense. If you did this, that would be great. Without it, I really cannot see how the proposed inc/dec of container_users is better than simple holding file*. Thanks. As written it would allow an external user after SET_CONTAINER without SET_IOMMU. It should also be part of the API that the external user must hold the file reference between add_external_use and del_external_user and do cleanup on any exit paths. Thanks, -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote: On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote: On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero How so? Holding the file pointer means the file won't go away, which means the group release function won't be called. That means the group won't go away, but that doesn't mean it's attached to an IOMMU. A user could call UNSET_CONTAINER. Uhh... *thinks*. Ah, I see. I think the interface should not take the group fd, but the container fd. Holding a reference to *that* would keep the necessary things around. But more to the point, it's the right thing semantically: The container is essentially the handle on a host iommu address space, and so that's what should be bound by the KVM call to a particular guest iommu address space. e.g. it would make no sense to bind two different groups to different guest iommu address spaces, if they were in the same container - the guest thinks they are different spaces, but if they're in the same container they must be the same space. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson pgp9ekBfTPFTm.pgp Description: PGP signature
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote: On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote: On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote: On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero How so? Holding the file pointer means the file won't go away, which means the group release function won't be called. That means the group won't go away, but that doesn't mean it's attached to an IOMMU. A user could call UNSET_CONTAINER. Uhh... *thinks*. Ah, I see. I think the interface should not take the group fd, but the container fd. Holding a reference to *that* would keep the necessary things around. But more to the point, it's the right thing semantically: The container is essentially the handle on a host iommu address space, and so that's what should be bound by the KVM call to a particular guest iommu address space. e.g. it would make no sense to bind two different groups to different guest iommu address spaces, if they were in the same container - the guest thinks they are different spaces, but if they're in the same container they must be the same space. While the container is the gateway to the iommu, what empowers the container to maintain an iommu is the group. What happens to a container when all the groups are disconnected or closed? Groups are the unit that indicates hardware access, not containers. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote: I think the interface should not take the group fd, but the container fd. Holding a reference to *that* would keep the necessary things around. But more to the point, it's the right thing semantically: The container is essentially the handle on a host iommu address space, and so that's what should be bound by the KVM call to a particular guest iommu address space. e.g. it would make no sense to bind two different groups to different guest iommu address spaces, if they were in the same container - the guest thinks they are different spaces, but if they're in the same container they must be the same space. Interestingly, how are we going to extend that when/if we implement DDW ? DDW means an API by which the guest can request the creation of additional iommus for a given device (typically, in addition to the default smallish 32-bit one using 4k pages, the guest can request a larger window in 64-bit space using a larger page size). Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Thu, Jun 20, 2013 at 02:58:18PM +1000, Alexey Kardashevskiy wrote: On 06/20/2013 01:49 AM, Alex Williamson wrote: On Thu, 2013-06-20 at 00:50 +1000, Benjamin Herrenschmidt wrote: On Wed, 2013-06-19 at 11:58 +0200, Alexander Graf wrote: Alex, any objection ? Which Alex? :) Heh, mostly Williamson in this specific case but your input is still welcome :-) I think validate works, it keeps iteration logic out of the kernel which is a good thing. There still needs to be an interface for getting the iommu id in VFIO, but I suppose that one's for the other Alex and Jörg to comment on. I think getting the iommu fd is already covered by separate patches from Alexey. Do we need to make it a get/put interface instead ? vfio_validate_and_use_iommu(file, iommu_id); vfio_release_iommu(file, iommu_id); To ensure that the resource remains owned by the process until KVM is closed as well ? Or do we want to register with VFIO with a callback so that VFIO can call us if it needs us to give it up ? Can't we just register a handler on the fd and get notified when it closes? Can you kill VFIO access without closing the fd? That sounds actually harder :-) The question is basically: When we validate that relationship between a specific VFIO struct file with an iommu, what is the lifetime of that and how do we handle this lifetime properly. There's two ways for that sort of situation: The notification model where we get notified when the relationship is broken, and the refcount model where we become a user and thus delay the breaking of the relationship until we have been disposed of as well. In this specific case, it's hard to tell what is the right model from my perspective, which is why I would welcome Alex (W.) input. In the end, the solution will end up being in the form of APIs exposed by VFIO for use by KVM (via that symbol lookup mechanism) so Alex (W), as owner of VFIO at this stage, what do you want those to look like ? :-) My first thought is that we should use the same reference counting as we have for vfio devices (group-container_users). An interface for that might look like: int vfio_group_add_external_user(struct file *filep) { struct vfio_group *group = filep-private_data; if (filep-f_op != vfio_group_fops) return -EINVAL; if (!atomic_inc_not_zero(group-container_users)) return -EINVAL; return 0; } void vfio_group_del_external_user(struct file *filep) { struct vfio_group *group = filep-private_data; BUG_ON(filep-f_op != vfio_group_fops); vfio_group_try_dissolve_container(group); } int vfio_group_iommu_id_from_file(struct file *filep) { struct vfio_group *group = filep-private_data; BUG_ON(filep-f_op != vfio_group_fops); return iommu_group_id(group-iommu_group); } Would that work? Thanks, Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson pgpREe0KCyiw8.pgp Description: PGP signature
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Ben. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero and this is exactly what vfio_group_add_external_user() and vfio_group_del_external_user() do. The difference is only in absolute value - 2 vs. 3. No change in behaviour whether I use new vfio API or simply hold file* till KVM closes fd created when IOMMU was connected to LIOBN. And while this counter is not zero, QEMU cannot take ownership over the group. I am definitely still missing the bigger picture... -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote: On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero How so? Holding the file pointer means the file won't go away, which means the group release function won't be called. That means the group won't go away, but that doesn't mean it's attached to an IOMMU. A user could call UNSET_CONTAINER. and this is exactly what vfio_group_add_external_user() and vfio_group_del_external_user() do. The difference is only in absolute value - 2 vs. 3. No change in behaviour whether I use new vfio API or simply hold file* till KVM closes fd created when IOMMU was connected to LIOBN. By that notion you could open(/dev/vfio/$GROUP) and you're safe, right? But what about SET_CONTAINER SET_IOMMU? All that you guarantee holding the file pointer is that the vfio_group exists. And while this counter is not zero, QEMU cannot take ownership over the group. I am definitely still missing the bigger picture... The bigger picture is that the group needs to exist AND it needs to be setup and maintained to have IOMMU protection. Actually, my first stab at add_external_user doesn't look sufficient, it needs to look more like vfio_group_get_device_fd, checking group-container-iommu and group_viable(). As written it would allow an external user after SET_CONTAINER without SET_IOMMU. It should also be part of the API that the external user must hold the file reference between add_external_use and del_external_user and do cleanup on any exit paths. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Ben. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero and this is exactly what vfio_group_add_external_user() and vfio_group_del_external_user() do. The difference is only in absolute value - 2 vs. 3. No change in behaviour whether I use new vfio API or simply hold file* till KVM closes fd created when IOMMU was connected to LIOBN. And while this counter is not zero, QEMU cannot take ownership over the group. I am definitely still missing the bigger picture... -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote: On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote: Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? I was thinking that too. Grabbing a file reference would certainly be the usual way of handling this sort of thing. But that wouldn't prevent the group ownership to be returned to the kernel or another user would it ? Holding the file pointer does not let the group-container_users counter go to zero How so? Holding the file pointer means the file won't go away, which means the group release function won't be called. That means the group won't go away, but that doesn't mean it's attached to an IOMMU. A user could call UNSET_CONTAINER. and this is exactly what vfio_group_add_external_user() and vfio_group_del_external_user() do. The difference is only in absolute value - 2 vs. 3. No change in behaviour whether I use new vfio API or simply hold file* till KVM closes fd created when IOMMU was connected to LIOBN. By that notion you could open(/dev/vfio/$GROUP) and you're safe, right? But what about SET_CONTAINER SET_IOMMU? All that you guarantee holding the file pointer is that the vfio_group exists. And while this counter is not zero, QEMU cannot take ownership over the group. I am definitely still missing the bigger picture... The bigger picture is that the group needs to exist AND it needs to be setup and maintained to have IOMMU protection. Actually, my first stab at add_external_user doesn't look sufficient, it needs to look more like vfio_group_get_device_fd, checking group-container-iommu and group_viable(). As written it would allow an external user after SET_CONTAINER without SET_IOMMU. It should also be part of the API that the external user must hold the file reference between add_external_use and del_external_user and do cleanup on any exit paths. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 19.06.2013, at 06:59, Benjamin Herrenschmidt wrote: On Wed, 2013-06-19 at 13:05 +0930, Rusty Russell wrote: symbol_get() won't try to load a module; it'll just fail. This is what you want, since they must have vfio in the kernel to get a valid fd... Ok, cool. I suppose what we want here Alexey is slightly higher level, something like: vfio_validate_iommu_id(file, iommu_id) Which verifies that the file that was passed in is allowed to use that iommu_id. That's a simple and flexible interface (ie, it will work even if we support multiple iommu IDs in the future for a vfio, for example for DDW windows etc...), the logic to know about the ID remains in qemu, this is strictly a validation call. That way we also don't have to expose the containing vfio struct etc... just that simple function. Alex, any objection ? Which Alex? :) I think validate works, it keeps iteration logic out of the kernel which is a good thing. There still needs to be an interface for getting the iommu id in VFIO, but I suppose that one's for the other Alex and Jörg to comment on. Do we need to make it a get/put interface instead ? vfio_validate_and_use_iommu(file, iommu_id); vfio_release_iommu(file, iommu_id); To ensure that the resource remains owned by the process until KVM is closed as well ? Or do we want to register with VFIO with a callback so that VFIO can call us if it needs us to give it up ? Can't we just register a handler on the fd and get notified when it closes? Can you kill VFIO access without closing the fd? Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Wed, 2013-06-19 at 11:58 +0200, Alexander Graf wrote: Alex, any objection ? Which Alex? :) Heh, mostly Williamson in this specific case but your input is still welcome :-) I think validate works, it keeps iteration logic out of the kernel which is a good thing. There still needs to be an interface for getting the iommu id in VFIO, but I suppose that one's for the other Alex and Jörg to comment on. I think getting the iommu fd is already covered by separate patches from Alexey. Do we need to make it a get/put interface instead ? vfio_validate_and_use_iommu(file, iommu_id); vfio_release_iommu(file, iommu_id); To ensure that the resource remains owned by the process until KVM is closed as well ? Or do we want to register with VFIO with a callback so that VFIO can call us if it needs us to give it up ? Can't we just register a handler on the fd and get notified when it closes? Can you kill VFIO access without closing the fd? That sounds actually harder :-) The question is basically: When we validate that relationship between a specific VFIO struct file with an iommu, what is the lifetime of that and how do we handle this lifetime properly. There's two ways for that sort of situation: The notification model where we get notified when the relationship is broken, and the refcount model where we become a user and thus delay the breaking of the relationship until we have been disposed of as well. In this specific case, it's hard to tell what is the right model from my perspective, which is why I would welcome Alex (W.) input. In the end, the solution will end up being in the form of APIs exposed by VFIO for use by KVM (via that symbol lookup mechanism) so Alex (W), as owner of VFIO at this stage, what do you want those to look like ? :-) Cheers, Ben. Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Thu, 2013-06-20 at 00:50 +1000, Benjamin Herrenschmidt wrote: On Wed, 2013-06-19 at 11:58 +0200, Alexander Graf wrote: Alex, any objection ? Which Alex? :) Heh, mostly Williamson in this specific case but your input is still welcome :-) I think validate works, it keeps iteration logic out of the kernel which is a good thing. There still needs to be an interface for getting the iommu id in VFIO, but I suppose that one's for the other Alex and Jörg to comment on. I think getting the iommu fd is already covered by separate patches from Alexey. Do we need to make it a get/put interface instead ? vfio_validate_and_use_iommu(file, iommu_id); vfio_release_iommu(file, iommu_id); To ensure that the resource remains owned by the process until KVM is closed as well ? Or do we want to register with VFIO with a callback so that VFIO can call us if it needs us to give it up ? Can't we just register a handler on the fd and get notified when it closes? Can you kill VFIO access without closing the fd? That sounds actually harder :-) The question is basically: When we validate that relationship between a specific VFIO struct file with an iommu, what is the lifetime of that and how do we handle this lifetime properly. There's two ways for that sort of situation: The notification model where we get notified when the relationship is broken, and the refcount model where we become a user and thus delay the breaking of the relationship until we have been disposed of as well. In this specific case, it's hard to tell what is the right model from my perspective, which is why I would welcome Alex (W.) input. In the end, the solution will end up being in the form of APIs exposed by VFIO for use by KVM (via that symbol lookup mechanism) so Alex (W), as owner of VFIO at this stage, what do you want those to look like ? :-) My first thought is that we should use the same reference counting as we have for vfio devices (group-container_users). An interface for that might look like: int vfio_group_add_external_user(struct file *filep) { struct vfio_group *group = filep-private_data; if (filep-f_op != vfio_group_fops) return -EINVAL; if (!atomic_inc_not_zero(group-container_users)) return -EINVAL; return 0; } void vfio_group_del_external_user(struct file *filep) { struct vfio_group *group = filep-private_data; BUG_ON(filep-f_op != vfio_group_fops); vfio_group_try_dissolve_container(group); } int vfio_group_iommu_id_from_file(struct file *filep) { struct vfio_group *group = filep-private_data; BUG_ON(filep-f_op != vfio_group_fops); return iommu_group_id(group-iommu_group); } Would that work? Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 06/20/2013 01:49 AM, Alex Williamson wrote: On Thu, 2013-06-20 at 00:50 +1000, Benjamin Herrenschmidt wrote: On Wed, 2013-06-19 at 11:58 +0200, Alexander Graf wrote: Alex, any objection ? Which Alex? :) Heh, mostly Williamson in this specific case but your input is still welcome :-) I think validate works, it keeps iteration logic out of the kernel which is a good thing. There still needs to be an interface for getting the iommu id in VFIO, but I suppose that one's for the other Alex and Jörg to comment on. I think getting the iommu fd is already covered by separate patches from Alexey. Do we need to make it a get/put interface instead ? vfio_validate_and_use_iommu(file, iommu_id); vfio_release_iommu(file, iommu_id); To ensure that the resource remains owned by the process until KVM is closed as well ? Or do we want to register with VFIO with a callback so that VFIO can call us if it needs us to give it up ? Can't we just register a handler on the fd and get notified when it closes? Can you kill VFIO access without closing the fd? That sounds actually harder :-) The question is basically: When we validate that relationship between a specific VFIO struct file with an iommu, what is the lifetime of that and how do we handle this lifetime properly. There's two ways for that sort of situation: The notification model where we get notified when the relationship is broken, and the refcount model where we become a user and thus delay the breaking of the relationship until we have been disposed of as well. In this specific case, it's hard to tell what is the right model from my perspective, which is why I would welcome Alex (W.) input. In the end, the solution will end up being in the form of APIs exposed by VFIO for use by KVM (via that symbol lookup mechanism) so Alex (W), as owner of VFIO at this stage, what do you want those to look like ? :-) My first thought is that we should use the same reference counting as we have for vfio devices (group-container_users). An interface for that might look like: int vfio_group_add_external_user(struct file *filep) { struct vfio_group *group = filep-private_data; if (filep-f_op != vfio_group_fops) return -EINVAL; if (!atomic_inc_not_zero(group-container_users)) return -EINVAL; return 0; } void vfio_group_del_external_user(struct file *filep) { struct vfio_group *group = filep-private_data; BUG_ON(filep-f_op != vfio_group_fops); vfio_group_try_dissolve_container(group); } int vfio_group_iommu_id_from_file(struct file *filep) { struct vfio_group *group = filep-private_data; BUG_ON(filep-f_op != vfio_group_fops); return iommu_group_id(group-iommu_group); } Would that work? Thanks, Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 19.06.2013, at 06:59, Benjamin Herrenschmidt wrote: On Wed, 2013-06-19 at 13:05 +0930, Rusty Russell wrote: symbol_get() won't try to load a module; it'll just fail. This is what you want, since they must have vfio in the kernel to get a valid fd... Ok, cool. I suppose what we want here Alexey is slightly higher level, something like: vfio_validate_iommu_id(file, iommu_id) Which verifies that the file that was passed in is allowed to use that iommu_id. That's a simple and flexible interface (ie, it will work even if we support multiple iommu IDs in the future for a vfio, for example for DDW windows etc...), the logic to know about the ID remains in qemu, this is strictly a validation call. That way we also don't have to expose the containing vfio struct etc... just that simple function. Alex, any objection ? Which Alex? :) I think validate works, it keeps iteration logic out of the kernel which is a good thing. There still needs to be an interface for getting the iommu id in VFIO, but I suppose that one's for the other Alex and Jörg to comment on. Do we need to make it a get/put interface instead ? vfio_validate_and_use_iommu(file, iommu_id); vfio_release_iommu(file, iommu_id); To ensure that the resource remains owned by the process until KVM is closed as well ? Or do we want to register with VFIO with a callback so that VFIO can call us if it needs us to give it up ? Can't we just register a handler on the fd and get notified when it closes? Can you kill VFIO access without closing the fd? Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Wed, 2013-06-19 at 11:58 +0200, Alexander Graf wrote: Alex, any objection ? Which Alex? :) Heh, mostly Williamson in this specific case but your input is still welcome :-) I think validate works, it keeps iteration logic out of the kernel which is a good thing. There still needs to be an interface for getting the iommu id in VFIO, but I suppose that one's for the other Alex and Jörg to comment on. I think getting the iommu fd is already covered by separate patches from Alexey. Do we need to make it a get/put interface instead ? vfio_validate_and_use_iommu(file, iommu_id); vfio_release_iommu(file, iommu_id); To ensure that the resource remains owned by the process until KVM is closed as well ? Or do we want to register with VFIO with a callback so that VFIO can call us if it needs us to give it up ? Can't we just register a handler on the fd and get notified when it closes? Can you kill VFIO access without closing the fd? That sounds actually harder :-) The question is basically: When we validate that relationship between a specific VFIO struct file with an iommu, what is the lifetime of that and how do we handle this lifetime properly. There's two ways for that sort of situation: The notification model where we get notified when the relationship is broken, and the refcount model where we become a user and thus delay the breaking of the relationship until we have been disposed of as well. In this specific case, it's hard to tell what is the right model from my perspective, which is why I would welcome Alex (W.) input. In the end, the solution will end up being in the form of APIs exposed by VFIO for use by KVM (via that symbol lookup mechanism) so Alex (W), as owner of VFIO at this stage, what do you want those to look like ? :-) Cheers, Ben. Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Thu, 2013-06-20 at 00:50 +1000, Benjamin Herrenschmidt wrote: On Wed, 2013-06-19 at 11:58 +0200, Alexander Graf wrote: Alex, any objection ? Which Alex? :) Heh, mostly Williamson in this specific case but your input is still welcome :-) I think validate works, it keeps iteration logic out of the kernel which is a good thing. There still needs to be an interface for getting the iommu id in VFIO, but I suppose that one's for the other Alex and Jörg to comment on. I think getting the iommu fd is already covered by separate patches from Alexey. Do we need to make it a get/put interface instead ? vfio_validate_and_use_iommu(file, iommu_id); vfio_release_iommu(file, iommu_id); To ensure that the resource remains owned by the process until KVM is closed as well ? Or do we want to register with VFIO with a callback so that VFIO can call us if it needs us to give it up ? Can't we just register a handler on the fd and get notified when it closes? Can you kill VFIO access without closing the fd? That sounds actually harder :-) The question is basically: When we validate that relationship between a specific VFIO struct file with an iommu, what is the lifetime of that and how do we handle this lifetime properly. There's two ways for that sort of situation: The notification model where we get notified when the relationship is broken, and the refcount model where we become a user and thus delay the breaking of the relationship until we have been disposed of as well. In this specific case, it's hard to tell what is the right model from my perspective, which is why I would welcome Alex (W.) input. In the end, the solution will end up being in the form of APIs exposed by VFIO for use by KVM (via that symbol lookup mechanism) so Alex (W), as owner of VFIO at this stage, what do you want those to look like ? :-) My first thought is that we should use the same reference counting as we have for vfio devices (group-container_users). An interface for that might look like: int vfio_group_add_external_user(struct file *filep) { struct vfio_group *group = filep-private_data; if (filep-f_op != vfio_group_fops) return -EINVAL; if (!atomic_inc_not_zero(group-container_users)) return -EINVAL; return 0; } void vfio_group_del_external_user(struct file *filep) { struct vfio_group *group = filep-private_data; BUG_ON(filep-f_op != vfio_group_fops); vfio_group_try_dissolve_container(group); } int vfio_group_iommu_id_from_file(struct file *filep) { struct vfio_group *group = filep-private_data; BUG_ON(filep-f_op != vfio_group_fops); return iommu_group_id(group-iommu_group); } Would that work? Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 06/20/2013 01:49 AM, Alex Williamson wrote: On Thu, 2013-06-20 at 00:50 +1000, Benjamin Herrenschmidt wrote: On Wed, 2013-06-19 at 11:58 +0200, Alexander Graf wrote: Alex, any objection ? Which Alex? :) Heh, mostly Williamson in this specific case but your input is still welcome :-) I think validate works, it keeps iteration logic out of the kernel which is a good thing. There still needs to be an interface for getting the iommu id in VFIO, but I suppose that one's for the other Alex and Jörg to comment on. I think getting the iommu fd is already covered by separate patches from Alexey. Do we need to make it a get/put interface instead ? vfio_validate_and_use_iommu(file, iommu_id); vfio_release_iommu(file, iommu_id); To ensure that the resource remains owned by the process until KVM is closed as well ? Or do we want to register with VFIO with a callback so that VFIO can call us if it needs us to give it up ? Can't we just register a handler on the fd and get notified when it closes? Can you kill VFIO access without closing the fd? That sounds actually harder :-) The question is basically: When we validate that relationship between a specific VFIO struct file with an iommu, what is the lifetime of that and how do we handle this lifetime properly. There's two ways for that sort of situation: The notification model where we get notified when the relationship is broken, and the refcount model where we become a user and thus delay the breaking of the relationship until we have been disposed of as well. In this specific case, it's hard to tell what is the right model from my perspective, which is why I would welcome Alex (W.) input. In the end, the solution will end up being in the form of APIs exposed by VFIO for use by KVM (via that symbol lookup mechanism) so Alex (W), as owner of VFIO at this stage, what do you want those to look like ? :-) My first thought is that we should use the same reference counting as we have for vfio devices (group-container_users). An interface for that might look like: int vfio_group_add_external_user(struct file *filep) { struct vfio_group *group = filep-private_data; if (filep-f_op != vfio_group_fops) return -EINVAL; if (!atomic_inc_not_zero(group-container_users)) return -EINVAL; return 0; } void vfio_group_del_external_user(struct file *filep) { struct vfio_group *group = filep-private_data; BUG_ON(filep-f_op != vfio_group_fops); vfio_group_try_dissolve_container(group); } int vfio_group_iommu_id_from_file(struct file *filep) { struct vfio_group *group = filep-private_data; BUG_ON(filep-f_op != vfio_group_fops); return iommu_group_id(group-iommu_group); } Would that work? Thanks, Just out of curiosity - would not get_file() and fput_atomic() on a group's file* do the right job instead of vfio_group_add_external_user() and vfio_group_del_external_user()? -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Tue, 2013-06-18 at 14:38 +1000, Benjamin Herrenschmidt wrote: On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote: Right, we don't want to create dependencies across modules. I don't have a vision for how this should work. This is effectively a complete side-band to vfio, so we're really just dealing in the iommu group space. Maybe there needs to be some kind of registration of ownership for the group using some kind of token. It would need to include some kind of notification when that ownership ends. That might also be a convenient tag to toggle driver probing off for devices in the group. Other ideas? Thanks, All of that smells nasty like it will need a pile of bloody infrastructure which makes me think it's too complicated and not the right approach. How does access control work today on x86/VFIO ? Can you give me a bit more details ? I didn't get a good grasp in your previous email The current model is not x86 specific, but it only covers doing iommu and device access through vfio. The kink here is that we're trying to do device access and setup through vfio, but iommu manipulation through kvm. We may want to revisit whether we can do the in-kernel iommu manipulation through vfio rather than kvm. For vfio in general, the group is the unit of ownership. A user is granted access to /dev/vfio/$GROUP through file permissions. The user opens the group and a container (/dev/vfio/vfio) and calls SET_CONTAINER on the group. If supported by the platform, multiple groups can be set to the same container, which allows for iommu domain sharing. Once a group is associated with a container, an iommu backend can be initialized for the container. Only then can a device be accessed through the group. So even if we were to pass a vfio group file descriptor into kvm and it matched as some kind of ownership token on the iommu group, it's not clear that's sufficient to assume we can start programming the iommu. Thanks, Alex From the look of it, the VFIO file descriptor is what has the access control to the underlying iommu, is this right ? So we somewhat need to transfer (or copy) that ownership from the VFIO fd to the KVM VM. I don't see a way to do that without some cross-layering here... Rusty, are you aware of some kernel mechanism we can use for that ? Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Tue, 2013-06-18 at 08:48 -0600, Alex Williamson wrote: On Tue, 2013-06-18 at 14:38 +1000, Benjamin Herrenschmidt wrote: On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote: Right, we don't want to create dependencies across modules. I don't have a vision for how this should work. This is effectively a complete side-band to vfio, so we're really just dealing in the iommu group space. Maybe there needs to be some kind of registration of ownership for the group using some kind of token. It would need to include some kind of notification when that ownership ends. That might also be a convenient tag to toggle driver probing off for devices in the group. Other ideas? Thanks, All of that smells nasty like it will need a pile of bloody infrastructure which makes me think it's too complicated and not the right approach. How does access control work today on x86/VFIO ? Can you give me a bit more details ? I didn't get a good grasp in your previous email The current model is not x86 specific, but it only covers doing iommu and device access through vfio. The kink here is that we're trying to do device access and setup through vfio, but iommu manipulation through kvm. We may want to revisit whether we can do the in-kernel iommu manipulation through vfio rather than kvm. How would that be possible ? The hypercalls from the guest arrive in KVM... in a very very specific restricted environment which we call real mode (MMU off but still running in guest context), where we try to do as much as possible, or in virtual mode, where they get handled as normal KVM exits. The only way we could handle them in VFIO would be if somewhat VFIO registered callbacks with KVM... if we have that sort of cross-dependency, then we may as well have a simpler one where VFIO tells KVM what iommu is available for the VM For vfio in general, the group is the unit of ownership. A user is granted access to /dev/vfio/$GROUP through file permissions. The user opens the group and a container (/dev/vfio/vfio) and calls SET_CONTAINER on the group. If supported by the platform, multiple groups can be set to the same container, which allows for iommu domain sharing. Once a group is associated with a container, an iommu backend can be initialized for the container. Only then can a device be accessed through the group. So even if we were to pass a vfio group file descriptor into kvm and it matched as some kind of ownership token on the iommu group, it's not clear that's sufficient to assume we can start programming the iommu. Thanks, Your scheme seems to me that it would have the same problem if you wanted to do virtualized iommu In any case, this is a big deal. We have a requirement for pass-through. It cannot work with any remotely usable performance level if we don't implement the calls in KVM, so it needs to be sorted one way or another and I'm at a loss how here... Ben. Alex From the look of it, the VFIO file descriptor is what has the access control to the underlying iommu, is this right ? So we somewhat need to transfer (or copy) that ownership from the VFIO fd to the KVM VM. I don't see a way to do that without some cross-layering here... Rusty, are you aware of some kernel mechanism we can use for that ? Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 06/16/2013 02:39 PM, Benjamin Herrenschmidt wrote: static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing, -unsigned long *pte_sizep) +unsigned long *pte_sizep, bool do_get_page) { pte_t *ptep; unsigned int shift = 0; @@ -135,6 +136,14 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing, if (!pte_present(*ptep)) return __pte(0); +/* + * Put huge pages handling to the virtual mode. + * The only exception is for TCE list pages which we + * do need to call get_page() for. + */ +if ((*pte_sizep PAGE_SIZE) do_get_page) +return __pte(0); + /* wait until _PAGE_BUSY is clear then set it atomically */ __asm__ __volatile__ ( 1: ldarx %0,0,%3\n @@ -148,6 +157,18 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing, : cc); ret = pte; +if (do_get_page pte_present(pte) (!writing || pte_write(pte))) { +struct page *pg = NULL; +pg = realmode_pfn_to_page(pte_pfn(pte)); +if (realmode_get_page(pg)) { +ret = __pte(0); +} else { +pte = pte_mkyoung(pte); +if (writing) +pte = pte_mkdirty(pte); +} +} +*ptep = pte;/* clears _PAGE_BUSY */ return ret; } So now you are adding the clearing of _PAGE_BUSY that was missing for your first patch, except that this is not enough since that means that in the emulated case (ie, !do_get_page) you will in essence return and then use a PTE that is not locked without any synchronization to ensure that the underlying page doesn't go away... then you'll dereference that page. So either make everything use speculative get_page, or make the emulated case use the MMU notifier to drop the operation in case of collision. The former looks easier. Also, any specific reason why you do: - Lock the PTE - get_page() - Unlock the PTE Instead of - Read the PTE - get_page_unless_zero - re-check PTE Like get_user_pages_fast() does ? The former will be two atomic ops, the latter only one (faster), but maybe you have a good reason why that can't work... If we want to set dirty and young bits for pte then I do not know how to avoid _PAGE_BUSY. -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
Alex Williamson alex.william...@redhat.com writes: On Mon, 2013-06-17 at 13:56 +1000, Benjamin Herrenschmidt wrote: On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote: IOMMU groups themselves don't provide security, they're accessed by interfaces like VFIO, which provide the security. Given a brief look, I agree, this looks like a possible backdoor. The typical VFIO way to handle this would be to pass a VFIO file descriptor here to prove that the process has access to the IOMMU group. This is how /dev/vfio/vfio gains the ability to setup an IOMMU domain an do mappings with the SET_CONTAINER ioctl using a group fd. Thanks, How do you envision that in the kernel ? IE. I'm in KVM code, gets that vfio fd, what do I do with it ? Basically, KVM needs to know that the user is allowed to use that iommu group. I don't think we want KVM however to call into VFIO directly right ? Right, we don't want to create dependencies across modules. I don't have a vision for how this should work. This is effectively a complete side-band to vfio, so we're really just dealing in the iommu group space. Maybe there needs to be some kind of registration of ownership for the group using some kind of token. It would need to include some kind of notification when that ownership ends. That might also be a convenient tag to toggle driver probing off for devices in the group. Other ideas? Thanks, It's actually not that bad. eg. struct vfio_container *vfio_container_from_file(struct file *filp) { if (filp-f_op != vfio_device_fops) return ERR_PTR(-EINVAL); /* OK it really is a vfio fd, return the data. */ } EXPORT_SYMBOL_GPL(vfio_container_from_file); ... inside KVM_CREATE_SPAPR_TCE_IOMMU: struct file *vfio_filp; struct vfio_container *(lookup)(struct file *filp); vfio_filp = fget(create_tce_iommu.fd); if (!vfio) ret = -EBADF; lookup = symbol_get(vfio_container_from_file); if (!lookup) ret = -EINVAL; else { container = lookup(vfio_filp); if (IS_ERR(container)) ret = PTR_ERR(container); else ... symbol_put(vfio_container_from_file); } symbol_get() won't try to load a module; it'll just fail. This is what you want, since they must have vfio in the kernel to get a valid fd... Hope that helps, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Wed, 2013-06-19 at 13:05 +0930, Rusty Russell wrote: symbol_get() won't try to load a module; it'll just fail. This is what you want, since they must have vfio in the kernel to get a valid fd... Ok, cool. I suppose what we want here Alexey is slightly higher level, something like: vfio_validate_iommu_id(file, iommu_id) Which verifies that the file that was passed in is allowed to use that iommu_id. That's a simple and flexible interface (ie, it will work even if we support multiple iommu IDs in the future for a vfio, for example for DDW windows etc...), the logic to know about the ID remains in qemu, this is strictly a validation call. That way we also don't have to expose the containing vfio struct etc... just that simple function. Alex, any objection ? Do we need to make it a get/put interface instead ? vfio_validate_and_use_iommu(file, iommu_id); vfio_release_iommu(file, iommu_id); To ensure that the resource remains owned by the process until KVM is closed as well ? Or do we want to register with VFIO with a callback so that VFIO can call us if it needs us to give it up ? Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Tue, 2013-06-18 at 14:38 +1000, Benjamin Herrenschmidt wrote: On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote: Right, we don't want to create dependencies across modules. I don't have a vision for how this should work. This is effectively a complete side-band to vfio, so we're really just dealing in the iommu group space. Maybe there needs to be some kind of registration of ownership for the group using some kind of token. It would need to include some kind of notification when that ownership ends. That might also be a convenient tag to toggle driver probing off for devices in the group. Other ideas? Thanks, All of that smells nasty like it will need a pile of bloody infrastructure which makes me think it's too complicated and not the right approach. How does access control work today on x86/VFIO ? Can you give me a bit more details ? I didn't get a good grasp in your previous email The current model is not x86 specific, but it only covers doing iommu and device access through vfio. The kink here is that we're trying to do device access and setup through vfio, but iommu manipulation through kvm. We may want to revisit whether we can do the in-kernel iommu manipulation through vfio rather than kvm. For vfio in general, the group is the unit of ownership. A user is granted access to /dev/vfio/$GROUP through file permissions. The user opens the group and a container (/dev/vfio/vfio) and calls SET_CONTAINER on the group. If supported by the platform, multiple groups can be set to the same container, which allows for iommu domain sharing. Once a group is associated with a container, an iommu backend can be initialized for the container. Only then can a device be accessed through the group. So even if we were to pass a vfio group file descriptor into kvm and it matched as some kind of ownership token on the iommu group, it's not clear that's sufficient to assume we can start programming the iommu. Thanks, Alex From the look of it, the VFIO file descriptor is what has the access control to the underlying iommu, is this right ? So we somewhat need to transfer (or copy) that ownership from the VFIO fd to the KVM VM. I don't see a way to do that without some cross-layering here... Rusty, are you aware of some kernel mechanism we can use for that ? Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Tue, 2013-06-18 at 08:48 -0600, Alex Williamson wrote: On Tue, 2013-06-18 at 14:38 +1000, Benjamin Herrenschmidt wrote: On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote: Right, we don't want to create dependencies across modules. I don't have a vision for how this should work. This is effectively a complete side-band to vfio, so we're really just dealing in the iommu group space. Maybe there needs to be some kind of registration of ownership for the group using some kind of token. It would need to include some kind of notification when that ownership ends. That might also be a convenient tag to toggle driver probing off for devices in the group. Other ideas? Thanks, All of that smells nasty like it will need a pile of bloody infrastructure which makes me think it's too complicated and not the right approach. How does access control work today on x86/VFIO ? Can you give me a bit more details ? I didn't get a good grasp in your previous email The current model is not x86 specific, but it only covers doing iommu and device access through vfio. The kink here is that we're trying to do device access and setup through vfio, but iommu manipulation through kvm. We may want to revisit whether we can do the in-kernel iommu manipulation through vfio rather than kvm. How would that be possible ? The hypercalls from the guest arrive in KVM... in a very very specific restricted environment which we call real mode (MMU off but still running in guest context), where we try to do as much as possible, or in virtual mode, where they get handled as normal KVM exits. The only way we could handle them in VFIO would be if somewhat VFIO registered callbacks with KVM... if we have that sort of cross-dependency, then we may as well have a simpler one where VFIO tells KVM what iommu is available for the VM For vfio in general, the group is the unit of ownership. A user is granted access to /dev/vfio/$GROUP through file permissions. The user opens the group and a container (/dev/vfio/vfio) and calls SET_CONTAINER on the group. If supported by the platform, multiple groups can be set to the same container, which allows for iommu domain sharing. Once a group is associated with a container, an iommu backend can be initialized for the container. Only then can a device be accessed through the group. So even if we were to pass a vfio group file descriptor into kvm and it matched as some kind of ownership token on the iommu group, it's not clear that's sufficient to assume we can start programming the iommu. Thanks, Your scheme seems to me that it would have the same problem if you wanted to do virtualized iommu In any case, this is a big deal. We have a requirement for pass-through. It cannot work with any remotely usable performance level if we don't implement the calls in KVM, so it needs to be sorted one way or another and I'm at a loss how here... Ben. Alex From the look of it, the VFIO file descriptor is what has the access control to the underlying iommu, is this right ? So we somewhat need to transfer (or copy) that ownership from the VFIO fd to the KVM VM. I don't see a way to do that without some cross-layering here... Rusty, are you aware of some kernel mechanism we can use for that ? Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 06/16/2013 02:39 PM, Benjamin Herrenschmidt wrote: static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing, -unsigned long *pte_sizep) +unsigned long *pte_sizep, bool do_get_page) { pte_t *ptep; unsigned int shift = 0; @@ -135,6 +136,14 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing, if (!pte_present(*ptep)) return __pte(0); +/* + * Put huge pages handling to the virtual mode. + * The only exception is for TCE list pages which we + * do need to call get_page() for. + */ +if ((*pte_sizep PAGE_SIZE) do_get_page) +return __pte(0); + /* wait until _PAGE_BUSY is clear then set it atomically */ __asm__ __volatile__ ( 1: ldarx %0,0,%3\n @@ -148,6 +157,18 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing, : cc); ret = pte; +if (do_get_page pte_present(pte) (!writing || pte_write(pte))) { +struct page *pg = NULL; +pg = realmode_pfn_to_page(pte_pfn(pte)); +if (realmode_get_page(pg)) { +ret = __pte(0); +} else { +pte = pte_mkyoung(pte); +if (writing) +pte = pte_mkdirty(pte); +} +} +*ptep = pte;/* clears _PAGE_BUSY */ return ret; } So now you are adding the clearing of _PAGE_BUSY that was missing for your first patch, except that this is not enough since that means that in the emulated case (ie, !do_get_page) you will in essence return and then use a PTE that is not locked without any synchronization to ensure that the underlying page doesn't go away... then you'll dereference that page. So either make everything use speculative get_page, or make the emulated case use the MMU notifier to drop the operation in case of collision. The former looks easier. Also, any specific reason why you do: - Lock the PTE - get_page() - Unlock the PTE Instead of - Read the PTE - get_page_unless_zero - re-check PTE Like get_user_pages_fast() does ? The former will be two atomic ops, the latter only one (faster), but maybe you have a good reason why that can't work... If we want to set dirty and young bits for pte then I do not know how to avoid _PAGE_BUSY. -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
Alex Williamson alex.william...@redhat.com writes: On Mon, 2013-06-17 at 13:56 +1000, Benjamin Herrenschmidt wrote: On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote: IOMMU groups themselves don't provide security, they're accessed by interfaces like VFIO, which provide the security. Given a brief look, I agree, this looks like a possible backdoor. The typical VFIO way to handle this would be to pass a VFIO file descriptor here to prove that the process has access to the IOMMU group. This is how /dev/vfio/vfio gains the ability to setup an IOMMU domain an do mappings with the SET_CONTAINER ioctl using a group fd. Thanks, How do you envision that in the kernel ? IE. I'm in KVM code, gets that vfio fd, what do I do with it ? Basically, KVM needs to know that the user is allowed to use that iommu group. I don't think we want KVM however to call into VFIO directly right ? Right, we don't want to create dependencies across modules. I don't have a vision for how this should work. This is effectively a complete side-band to vfio, so we're really just dealing in the iommu group space. Maybe there needs to be some kind of registration of ownership for the group using some kind of token. It would need to include some kind of notification when that ownership ends. That might also be a convenient tag to toggle driver probing off for devices in the group. Other ideas? Thanks, It's actually not that bad. eg. struct vfio_container *vfio_container_from_file(struct file *filp) { if (filp-f_op != vfio_device_fops) return ERR_PTR(-EINVAL); /* OK it really is a vfio fd, return the data. */ } EXPORT_SYMBOL_GPL(vfio_container_from_file); ... inside KVM_CREATE_SPAPR_TCE_IOMMU: struct file *vfio_filp; struct vfio_container *(lookup)(struct file *filp); vfio_filp = fget(create_tce_iommu.fd); if (!vfio) ret = -EBADF; lookup = symbol_get(vfio_container_from_file); if (!lookup) ret = -EINVAL; else { container = lookup(vfio_filp); if (IS_ERR(container)) ret = PTR_ERR(container); else ... symbol_put(vfio_container_from_file); } symbol_get() won't try to load a module; it'll just fail. This is what you want, since they must have vfio in the kernel to get a valid fd... Hope that helps, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Wed, 2013-06-19 at 13:05 +0930, Rusty Russell wrote: symbol_get() won't try to load a module; it'll just fail. This is what you want, since they must have vfio in the kernel to get a valid fd... Ok, cool. I suppose what we want here Alexey is slightly higher level, something like: vfio_validate_iommu_id(file, iommu_id) Which verifies that the file that was passed in is allowed to use that iommu_id. That's a simple and flexible interface (ie, it will work even if we support multiple iommu IDs in the future for a vfio, for example for DDW windows etc...), the logic to know about the ID remains in qemu, this is strictly a validation call. That way we also don't have to expose the containing vfio struct etc... just that simple function. Alex, any objection ? Do we need to make it a get/put interface instead ? vfio_validate_and_use_iommu(file, iommu_id); vfio_release_iommu(file, iommu_id); To ensure that the resource remains owned by the process until KVM is closed as well ? Or do we want to register with VFIO with a callback so that VFIO can call us if it needs us to give it up ? Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Mon, 2013-06-17 at 13:56 +1000, Benjamin Herrenschmidt wrote: On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote: IOMMU groups themselves don't provide security, they're accessed by interfaces like VFIO, which provide the security. Given a brief look, I agree, this looks like a possible backdoor. The typical VFIO way to handle this would be to pass a VFIO file descriptor here to prove that the process has access to the IOMMU group. This is how /dev/vfio/vfio gains the ability to setup an IOMMU domain an do mappings with the SET_CONTAINER ioctl using a group fd. Thanks, How do you envision that in the kernel ? IE. I'm in KVM code, gets that vfio fd, what do I do with it ? Basically, KVM needs to know that the user is allowed to use that iommu group. I don't think we want KVM however to call into VFIO directly right ? Right, we don't want to create dependencies across modules. I don't have a vision for how this should work. This is effectively a complete side-band to vfio, so we're really just dealing in the iommu group space. Maybe there needs to be some kind of registration of ownership for the group using some kind of token. It would need to include some kind of notification when that ownership ends. That might also be a convenient tag to toggle driver probing off for devices in the group. Other ideas? Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote: Right, we don't want to create dependencies across modules. I don't have a vision for how this should work. This is effectively a complete side-band to vfio, so we're really just dealing in the iommu group space. Maybe there needs to be some kind of registration of ownership for the group using some kind of token. It would need to include some kind of notification when that ownership ends. That might also be a convenient tag to toggle driver probing off for devices in the group. Other ideas? Thanks, All of that smells nasty like it will need a pile of bloody infrastructure which makes me think it's too complicated and not the right approach. How does access control work today on x86/VFIO ? Can you give me a bit more details ? I didn't get a good grasp in your previous email From the look of it, the VFIO file descriptor is what has the access control to the underlying iommu, is this right ? So we somewhat need to transfer (or copy) that ownership from the VFIO fd to the KVM VM. I don't see a way to do that without some cross-layering here... Rusty, are you aware of some kernel mechanism we can use for that ? Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Mon, 2013-06-17 at 13:56 +1000, Benjamin Herrenschmidt wrote: On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote: IOMMU groups themselves don't provide security, they're accessed by interfaces like VFIO, which provide the security. Given a brief look, I agree, this looks like a possible backdoor. The typical VFIO way to handle this would be to pass a VFIO file descriptor here to prove that the process has access to the IOMMU group. This is how /dev/vfio/vfio gains the ability to setup an IOMMU domain an do mappings with the SET_CONTAINER ioctl using a group fd. Thanks, How do you envision that in the kernel ? IE. I'm in KVM code, gets that vfio fd, what do I do with it ? Basically, KVM needs to know that the user is allowed to use that iommu group. I don't think we want KVM however to call into VFIO directly right ? Right, we don't want to create dependencies across modules. I don't have a vision for how this should work. This is effectively a complete side-band to vfio, so we're really just dealing in the iommu group space. Maybe there needs to be some kind of registration of ownership for the group using some kind of token. It would need to include some kind of notification when that ownership ends. That might also be a convenient tag to toggle driver probing off for devices in the group. Other ideas? Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote: Right, we don't want to create dependencies across modules. I don't have a vision for how this should work. This is effectively a complete side-band to vfio, so we're really just dealing in the iommu group space. Maybe there needs to be some kind of registration of ownership for the group using some kind of token. It would need to include some kind of notification when that ownership ends. That might also be a convenient tag to toggle driver probing off for devices in the group. Other ideas? Thanks, All of that smells nasty like it will need a pile of bloody infrastructure which makes me think it's too complicated and not the right approach. How does access control work today on x86/VFIO ? Can you give me a bit more details ? I didn't get a good grasp in your previous email From the look of it, the VFIO file descriptor is what has the access control to the underlying iommu, is this right ? So we somewhat need to transfer (or copy) that ownership from the VFIO fd to the KVM VM. I don't see a way to do that without some cross-layering here... Rusty, are you aware of some kernel mechanism we can use for that ? Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote: This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT and H_STUFF_TCE requests without passing them to QEMU, which should save time on switching to QEMU and back. Both real and virtual modes are supported - whenever the kernel fails to handle TCE request, it passes it to the virtual mode. If it the virtual mode handlers fail, then the request is passed to the user mode, for example, to QEMU. This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables in-kernel handling of IOMMU map/unmap. Tests show that this patch increases transmission speed from 220MB/s to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card). Cc: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Signed-off-by: Paul Mackerras pau...@samba.org --- Changes: 2013/06/05: * changed capability number * changed ioctl number * update the doc article number 2013/05/20: * removed get_user() from real mode handlers * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there translated TCEs, tries realmode_get_page() on those and if it fails, it passes control over the virtual mode handler which tries to finish the request handling * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit on a page * The only reason to pass the request to user mode now is when the user mode did not register TCE table in the kernel, in all other cases the virtual mode handler is expected to do the job --- Documentation/virtual/kvm/api.txt | 28 + arch/powerpc/include/asm/kvm_host.h |3 + arch/powerpc/include/asm/kvm_ppc.h |2 + arch/powerpc/include/uapi/asm/kvm.h |7 ++ arch/powerpc/kvm/book3s_64_vio.c| 198 ++- arch/powerpc/kvm/book3s_64_vio_hv.c | 193 +- arch/powerpc/kvm/powerpc.c | 12 +++ include/uapi/linux/kvm.h|2 + 8 files changed, 439 insertions(+), 6 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 6c082ff..e962e3b 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2379,6 +2379,34 @@ the guest. Othwerwise it might be better for the guest to continue using H_PUT_T hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present). +4.84 KVM_CREATE_SPAPR_TCE_IOMMU + +Capability: KVM_CAP_SPAPR_TCE_IOMMU +Architectures: powerpc +Type: vm ioctl +Parameters: struct kvm_create_spapr_tce_iommu (in) +Returns: 0 on success, -1 on error + +This creates a link between IOMMU group and a hardware TCE (translation +control entry) table. This link lets the host kernel know what IOMMU +group (i.e. TCE table) to use for the LIOBN number passed with +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls. + +/* for KVM_CAP_SPAPR_TCE_IOMMU */ +struct kvm_create_spapr_tce_iommu { + __u64 liobn; + __u32 iommu_id; + __u32 flags; +}; + +No flag is supported at the moment. + +When the guest issues TCE call on a liobn for which a TCE table has been +registered, the kernel will handle it in real mode, updating the hardware +TCE table. TCE table calls for other liobns will cause a vm exit and must +be handled by userspace. Ok, please walk me through the security model you have in mind here. Basically what this ioctl does is that it creates a guest TCE table that reflects its changes into a host TCE table whenever it gets modified. So far so good. Now I don't see any checks that verify whether iommu_id is actually good to use from that user's access rights. Just because I have access to /dev/kvm I don't necessarily have access to an iommu control device. So the least I can see would be a local DoS attack where one user space program with only access to /dev/kvm can simply kill any access to another process's device by overflowing a host iommu TCE table with junk entries. There's even a certain chance of an information disclosure exploit here where a malicious user space program could get itself all network traffic DMA'd from another VM. How does this work on the host level? What is the security token to take control of a host TCE table? Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Mon, 2013-06-17 at 08:39 +1000, Benjamin Herrenschmidt wrote: On Wed, 2013-06-05 at 16:11 +1000, Alexey Kardashevskiy wrote: +long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm, + struct kvm_create_spapr_tce_iommu *args) +{ + struct kvmppc_spapr_tce_table *tt = NULL; + struct iommu_group *grp; + struct iommu_table *tbl; + + /* Find an IOMMU table for the given ID */ + grp = iommu_group_get_by_id(args-iommu_id); + if (!grp) + return -ENXIO; + + tbl = iommu_group_get_iommudata(grp); + if (!tbl) + return -ENXIO; So Alex Graf pointed out here, there is a security issue here, or are we missing something ? What prevents a malicious program that has access to /dev/kvm from taking over random iommu groups (including host used ones) that way? What is the security model of that whole iommu stuff to begin with ? IOMMU groups themselves don't provide security, they're accessed by interfaces like VFIO, which provide the security. Given a brief look, I agree, this looks like a possible backdoor. The typical VFIO way to handle this would be to pass a VFIO file descriptor here to prove that the process has access to the IOMMU group. This is how /dev/vfio/vfio gains the ability to setup an IOMMU domain an do mappings with the SET_CONTAINER ioctl using a group fd. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote: IOMMU groups themselves don't provide security, they're accessed by interfaces like VFIO, which provide the security. Given a brief look, I agree, this looks like a possible backdoor. The typical VFIO way to handle this would be to pass a VFIO file descriptor here to prove that the process has access to the IOMMU group. This is how /dev/vfio/vfio gains the ability to setup an IOMMU domain an do mappings with the SET_CONTAINER ioctl using a group fd. Thanks, How do you envision that in the kernel ? IE. I'm in KVM code, gets that vfio fd, what do I do with it ? Basically, KVM needs to know that the user is allowed to use that iommu group. I don't think we want KVM however to call into VFIO directly right ? Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote: This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT and H_STUFF_TCE requests without passing them to QEMU, which should save time on switching to QEMU and back. Both real and virtual modes are supported - whenever the kernel fails to handle TCE request, it passes it to the virtual mode. If it the virtual mode handlers fail, then the request is passed to the user mode, for example, to QEMU. This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables in-kernel handling of IOMMU map/unmap. Tests show that this patch increases transmission speed from 220MB/s to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card). Cc: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Signed-off-by: Paul Mackerras pau...@samba.org --- Changes: 2013/06/05: * changed capability number * changed ioctl number * update the doc article number 2013/05/20: * removed get_user() from real mode handlers * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there translated TCEs, tries realmode_get_page() on those and if it fails, it passes control over the virtual mode handler which tries to finish the request handling * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit on a page * The only reason to pass the request to user mode now is when the user mode did not register TCE table in the kernel, in all other cases the virtual mode handler is expected to do the job --- Documentation/virtual/kvm/api.txt | 28 + arch/powerpc/include/asm/kvm_host.h |3 + arch/powerpc/include/asm/kvm_ppc.h |2 + arch/powerpc/include/uapi/asm/kvm.h |7 ++ arch/powerpc/kvm/book3s_64_vio.c| 198 ++- arch/powerpc/kvm/book3s_64_vio_hv.c | 193 +- arch/powerpc/kvm/powerpc.c | 12 +++ include/uapi/linux/kvm.h|2 + 8 files changed, 439 insertions(+), 6 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 6c082ff..e962e3b 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2379,6 +2379,34 @@ the guest. Othwerwise it might be better for the guest to continue using H_PUT_T hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present). +4.84 KVM_CREATE_SPAPR_TCE_IOMMU + +Capability: KVM_CAP_SPAPR_TCE_IOMMU +Architectures: powerpc +Type: vm ioctl +Parameters: struct kvm_create_spapr_tce_iommu (in) +Returns: 0 on success, -1 on error + +This creates a link between IOMMU group and a hardware TCE (translation +control entry) table. This link lets the host kernel know what IOMMU +group (i.e. TCE table) to use for the LIOBN number passed with +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls. + +/* for KVM_CAP_SPAPR_TCE_IOMMU */ +struct kvm_create_spapr_tce_iommu { + __u64 liobn; + __u32 iommu_id; + __u32 flags; +}; + +No flag is supported at the moment. + +When the guest issues TCE call on a liobn for which a TCE table has been +registered, the kernel will handle it in real mode, updating the hardware +TCE table. TCE table calls for other liobns will cause a vm exit and must +be handled by userspace. Ok, please walk me through the security model you have in mind here. Basically what this ioctl does is that it creates a guest TCE table that reflects its changes into a host TCE table whenever it gets modified. So far so good. Now I don't see any checks that verify whether iommu_id is actually good to use from that user's access rights. Just because I have access to /dev/kvm I don't necessarily have access to an iommu control device. So the least I can see would be a local DoS attack where one user space program with only access to /dev/kvm can simply kill any access to another process's device by overflowing a host iommu TCE table with junk entries. There's even a certain chance of an information disclosure exploit here where a malicious user space program could get itself all network traffic DMA'd from another VM. How does this work on the host level? What is the security token to take control of a host TCE table? Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Wed, 2013-06-05 at 16:11 +1000, Alexey Kardashevskiy wrote: +long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm, + struct kvm_create_spapr_tce_iommu *args) +{ + struct kvmppc_spapr_tce_table *tt = NULL; + struct iommu_group *grp; + struct iommu_table *tbl; + + /* Find an IOMMU table for the given ID */ + grp = iommu_group_get_by_id(args-iommu_id); + if (!grp) + return -ENXIO; + + tbl = iommu_group_get_iommudata(grp); + if (!tbl) + return -ENXIO; So Alex Graf pointed out here, there is a security issue here, or are we missing something ? What prevents a malicious program that has access to /dev/kvm from taking over random iommu groups (including host used ones) that way? What is the security model of that whole iommu stuff to begin with ? Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Mon, 2013-06-17 at 08:39 +1000, Benjamin Herrenschmidt wrote: On Wed, 2013-06-05 at 16:11 +1000, Alexey Kardashevskiy wrote: +long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm, + struct kvm_create_spapr_tce_iommu *args) +{ + struct kvmppc_spapr_tce_table *tt = NULL; + struct iommu_group *grp; + struct iommu_table *tbl; + + /* Find an IOMMU table for the given ID */ + grp = iommu_group_get_by_id(args-iommu_id); + if (!grp) + return -ENXIO; + + tbl = iommu_group_get_iommudata(grp); + if (!tbl) + return -ENXIO; So Alex Graf pointed out here, there is a security issue here, or are we missing something ? What prevents a malicious program that has access to /dev/kvm from taking over random iommu groups (including host used ones) that way? What is the security model of that whole iommu stuff to begin with ? IOMMU groups themselves don't provide security, they're accessed by interfaces like VFIO, which provide the security. Given a brief look, I agree, this looks like a possible backdoor. The typical VFIO way to handle this would be to pass a VFIO file descriptor here to prove that the process has access to the IOMMU group. This is how /dev/vfio/vfio gains the ability to setup an IOMMU domain an do mappings with the SET_CONTAINER ioctl using a group fd. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote: IOMMU groups themselves don't provide security, they're accessed by interfaces like VFIO, which provide the security. Given a brief look, I agree, this looks like a possible backdoor. The typical VFIO way to handle this would be to pass a VFIO file descriptor here to prove that the process has access to the IOMMU group. This is how /dev/vfio/vfio gains the ability to setup an IOMMU domain an do mappings with the SET_CONTAINER ioctl using a group fd. Thanks, How do you envision that in the kernel ? IE. I'm in KVM code, gets that vfio fd, what do I do with it ? Basically, KVM needs to know that the user is allowed to use that iommu group. I don't think we want KVM however to call into VFIO directly right ? Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing, - unsigned long *pte_sizep) + unsigned long *pte_sizep, bool do_get_page) { pte_t *ptep; unsigned int shift = 0; @@ -135,6 +136,14 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing, if (!pte_present(*ptep)) return __pte(0); + /* + * Put huge pages handling to the virtual mode. + * The only exception is for TCE list pages which we + * do need to call get_page() for. + */ + if ((*pte_sizep PAGE_SIZE) do_get_page) + return __pte(0); + /* wait until _PAGE_BUSY is clear then set it atomically */ __asm__ __volatile__ ( 1: ldarx %0,0,%3\n @@ -148,6 +157,18 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing, : cc); ret = pte; + if (do_get_page pte_present(pte) (!writing || pte_write(pte))) { + struct page *pg = NULL; + pg = realmode_pfn_to_page(pte_pfn(pte)); + if (realmode_get_page(pg)) { + ret = __pte(0); + } else { + pte = pte_mkyoung(pte); + if (writing) + pte = pte_mkdirty(pte); + } + } + *ptep = pte;/* clears _PAGE_BUSY */ return ret; } So now you are adding the clearing of _PAGE_BUSY that was missing for your first patch, except that this is not enough since that means that in the emulated case (ie, !do_get_page) you will in essence return and then use a PTE that is not locked without any synchronization to ensure that the underlying page doesn't go away... then you'll dereference that page. So either make everything use speculative get_page, or make the emulated case use the MMU notifier to drop the operation in case of collision. The former looks easier. Also, any specific reason why you do: - Lock the PTE - get_page() - Unlock the PTE Instead of - Read the PTE - get_page_unless_zero - re-check PTE Like get_user_pages_fast() does ? The former will be two atomic ops, the latter only one (faster), but maybe you have a good reason why that can't work... Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing, - unsigned long *pte_sizep) + unsigned long *pte_sizep, bool do_get_page) { pte_t *ptep; unsigned int shift = 0; @@ -135,6 +136,14 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing, if (!pte_present(*ptep)) return __pte(0); + /* + * Put huge pages handling to the virtual mode. + * The only exception is for TCE list pages which we + * do need to call get_page() for. + */ + if ((*pte_sizep PAGE_SIZE) do_get_page) + return __pte(0); + /* wait until _PAGE_BUSY is clear then set it atomically */ __asm__ __volatile__ ( 1: ldarx %0,0,%3\n @@ -148,6 +157,18 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing, : cc); ret = pte; + if (do_get_page pte_present(pte) (!writing || pte_write(pte))) { + struct page *pg = NULL; + pg = realmode_pfn_to_page(pte_pfn(pte)); + if (realmode_get_page(pg)) { + ret = __pte(0); + } else { + pte = pte_mkyoung(pte); + if (writing) + pte = pte_mkdirty(pte); + } + } + *ptep = pte;/* clears _PAGE_BUSY */ return ret; } So now you are adding the clearing of _PAGE_BUSY that was missing for your first patch, except that this is not enough since that means that in the emulated case (ie, !do_get_page) you will in essence return and then use a PTE that is not locked without any synchronization to ensure that the underlying page doesn't go away... then you'll dereference that page. So either make everything use speculative get_page, or make the emulated case use the MMU notifier to drop the operation in case of collision. The former looks easier. Also, any specific reason why you do: - Lock the PTE - get_page() - Unlock the PTE Instead of - Read the PTE - get_page_unless_zero - re-check PTE Like get_user_pages_fast() does ? The former will be two atomic ops, the latter only one (faster), but maybe you have a good reason why that can't work... Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT and H_STUFF_TCE requests without passing them to QEMU, which should save time on switching to QEMU and back. Both real and virtual modes are supported - whenever the kernel fails to handle TCE request, it passes it to the virtual mode. If it the virtual mode handlers fail, then the request is passed to the user mode, for example, to QEMU. This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables in-kernel handling of IOMMU map/unmap. Tests show that this patch increases transmission speed from 220MB/s to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card). Cc: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Signed-off-by: Paul Mackerras pau...@samba.org --- Changes: 2013/06/05: * changed capability number * changed ioctl number * update the doc article number 2013/05/20: * removed get_user() from real mode handlers * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there translated TCEs, tries realmode_get_page() on those and if it fails, it passes control over the virtual mode handler which tries to finish the request handling * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit on a page * The only reason to pass the request to user mode now is when the user mode did not register TCE table in the kernel, in all other cases the virtual mode handler is expected to do the job --- Documentation/virtual/kvm/api.txt | 28 + arch/powerpc/include/asm/kvm_host.h |3 + arch/powerpc/include/asm/kvm_ppc.h |2 + arch/powerpc/include/uapi/asm/kvm.h |7 ++ arch/powerpc/kvm/book3s_64_vio.c| 198 ++- arch/powerpc/kvm/book3s_64_vio_hv.c | 193 +- arch/powerpc/kvm/powerpc.c | 12 +++ include/uapi/linux/kvm.h|2 + 8 files changed, 439 insertions(+), 6 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 6c082ff..e962e3b 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2379,6 +2379,34 @@ the guest. Othwerwise it might be better for the guest to continue using H_PUT_T hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present). +4.84 KVM_CREATE_SPAPR_TCE_IOMMU + +Capability: KVM_CAP_SPAPR_TCE_IOMMU +Architectures: powerpc +Type: vm ioctl +Parameters: struct kvm_create_spapr_tce_iommu (in) +Returns: 0 on success, -1 on error + +This creates a link between IOMMU group and a hardware TCE (translation +control entry) table. This link lets the host kernel know what IOMMU +group (i.e. TCE table) to use for the LIOBN number passed with +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls. + +/* for KVM_CAP_SPAPR_TCE_IOMMU */ +struct kvm_create_spapr_tce_iommu { + __u64 liobn; + __u32 iommu_id; + __u32 flags; +}; + +No flag is supported at the moment. + +When the guest issues TCE call on a liobn for which a TCE table has been +registered, the kernel will handle it in real mode, updating the hardware +TCE table. TCE table calls for other liobns will cause a vm exit and must +be handled by userspace. + + 5. The kvm_run structure diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 85d8f26..ac0e2fe 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -180,6 +180,7 @@ struct kvmppc_spapr_tce_table { struct kvm *kvm; u64 liobn; u32 window_size; + struct iommu_group *grp;/* used for IOMMU groups */ struct page *pages[0]; }; @@ -611,6 +612,8 @@ struct kvm_vcpu_arch { u64 busy_preempt; unsigned long *tce_tmp;/* TCE cache for TCE_PUT_INDIRECT hall */ + unsigned long tce_tmp_num; /* Number of handled TCEs in the cache */ + unsigned long tce_reason; /* The reason of switching to the virtmode */ #endif }; diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index e852921b..934e01d 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -133,6 +133,8 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu); extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, struct kvm_create_spapr_tce *args); +extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm, + struct kvm_create_spapr_tce_iommu *args); extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table( struct kvm_vcpu *vcpu, unsigned long liobn); extern long kvmppc_emulated_validate_tce(unsigned long tce); diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 0fb1a6e..cf82af4 100644 ---
[PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT and H_STUFF_TCE requests without passing them to QEMU, which should save time on switching to QEMU and back. Both real and virtual modes are supported - whenever the kernel fails to handle TCE request, it passes it to the virtual mode. If it the virtual mode handlers fail, then the request is passed to the user mode, for example, to QEMU. This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables in-kernel handling of IOMMU map/unmap. Tests show that this patch increases transmission speed from 220MB/s to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card). Cc: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Signed-off-by: Paul Mackerras pau...@samba.org --- Changes: 2013/06/05: * changed capability number * changed ioctl number * update the doc article number 2013/05/20: * removed get_user() from real mode handlers * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there translated TCEs, tries realmode_get_page() on those and if it fails, it passes control over the virtual mode handler which tries to finish the request handling * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit on a page * The only reason to pass the request to user mode now is when the user mode did not register TCE table in the kernel, in all other cases the virtual mode handler is expected to do the job --- Documentation/virtual/kvm/api.txt | 28 + arch/powerpc/include/asm/kvm_host.h |3 + arch/powerpc/include/asm/kvm_ppc.h |2 + arch/powerpc/include/uapi/asm/kvm.h |7 ++ arch/powerpc/kvm/book3s_64_vio.c| 198 ++- arch/powerpc/kvm/book3s_64_vio_hv.c | 193 +- arch/powerpc/kvm/powerpc.c | 12 +++ include/uapi/linux/kvm.h|2 + 8 files changed, 439 insertions(+), 6 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 6c082ff..e962e3b 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2379,6 +2379,34 @@ the guest. Othwerwise it might be better for the guest to continue using H_PUT_T hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present). +4.84 KVM_CREATE_SPAPR_TCE_IOMMU + +Capability: KVM_CAP_SPAPR_TCE_IOMMU +Architectures: powerpc +Type: vm ioctl +Parameters: struct kvm_create_spapr_tce_iommu (in) +Returns: 0 on success, -1 on error + +This creates a link between IOMMU group and a hardware TCE (translation +control entry) table. This link lets the host kernel know what IOMMU +group (i.e. TCE table) to use for the LIOBN number passed with +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls. + +/* for KVM_CAP_SPAPR_TCE_IOMMU */ +struct kvm_create_spapr_tce_iommu { + __u64 liobn; + __u32 iommu_id; + __u32 flags; +}; + +No flag is supported at the moment. + +When the guest issues TCE call on a liobn for which a TCE table has been +registered, the kernel will handle it in real mode, updating the hardware +TCE table. TCE table calls for other liobns will cause a vm exit and must +be handled by userspace. + + 5. The kvm_run structure diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 85d8f26..ac0e2fe 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -180,6 +180,7 @@ struct kvmppc_spapr_tce_table { struct kvm *kvm; u64 liobn; u32 window_size; + struct iommu_group *grp;/* used for IOMMU groups */ struct page *pages[0]; }; @@ -611,6 +612,8 @@ struct kvm_vcpu_arch { u64 busy_preempt; unsigned long *tce_tmp;/* TCE cache for TCE_PUT_INDIRECT hall */ + unsigned long tce_tmp_num; /* Number of handled TCEs in the cache */ + unsigned long tce_reason; /* The reason of switching to the virtmode */ #endif }; diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index e852921b..934e01d 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -133,6 +133,8 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu); extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, struct kvm_create_spapr_tce *args); +extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm, + struct kvm_create_spapr_tce_iommu *args); extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table( struct kvm_vcpu *vcpu, unsigned long liobn); extern long kvmppc_emulated_validate_tce(unsigned long tce); diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 0fb1a6e..cf82af4 100644 ---
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote: On 05/29/2013 09:35 AM, Scott Wood wrote: On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote: @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with KVM_CREATE_SPAPR_TCE_IOMMU? Yes. Sigh. That's the same thing repeated. There's only one IOCTL. Nothing is being kept together. Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE. But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE. 0xe0 begins a different section. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/30/2013 06:05 AM, Scott Wood wrote: On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote: On 05/29/2013 09:35 AM, Scott Wood wrote: On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote: @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with KVM_CREATE_SPAPR_TCE_IOMMU? Yes. Sigh. That's the same thing repeated. There's only one IOCTL. Nothing is being kept together. Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE. But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE. 0xe0 begins a different section. It is not really obvious that there are sections as no comment defines those :) But yes, makes sense to move it up a bit and change the code to 0xad. -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/29/2013 06:10:33 PM, Alexey Kardashevskiy wrote: On 05/30/2013 06:05 AM, Scott Wood wrote: On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote: On 05/29/2013 09:35 AM, Scott Wood wrote: On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote: @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with KVM_CREATE_SPAPR_TCE_IOMMU? Yes. Sigh. That's the same thing repeated. There's only one IOCTL. Nothing is being kept together. Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE. But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE. 0xe0 begins a different section. It is not really obvious that there are sections as no comment defines those :) There is a comment /* ioctls for fds returned by KVM_CREATE_DEVICE */ Putting KVM_CREATE_DEVICE in there was mainly to avoid dealing with the ioctl number conflict mess in the vm-ioctl section, but at least that one is related to the device control API. :-) But yes, makes sense to move it up a bit and change the code to 0xad. 0xad is KVM_KVMCLOCK_CTRL -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/30/2013 09:14 AM, Scott Wood wrote: On 05/29/2013 06:10:33 PM, Alexey Kardashevskiy wrote: On 05/30/2013 06:05 AM, Scott Wood wrote: On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote: On 05/29/2013 09:35 AM, Scott Wood wrote: On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote: @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with KVM_CREATE_SPAPR_TCE_IOMMU? Yes. Sigh. That's the same thing repeated. There's only one IOCTL. Nothing is being kept together. Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE. But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE. 0xe0 begins a different section. It is not really obvious that there are sections as no comment defines those :) There is a comment /* ioctls for fds returned by KVM_CREATE_DEVICE */ Putting KVM_CREATE_DEVICE in there was mainly to avoid dealing with the ioctl number conflict mess in the vm-ioctl section, but at least that one is related to the device control API. :-) But yes, makes sense to move it up a bit and change the code to 0xad. 0xad is KVM_KVMCLOCK_CTRL That's it. I am _completely_ confused now. No system whatsoever :( What rule should I use in order to choose the number for my new ioctl? :) -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/29/2013 06:29:13 PM, Alexey Kardashevskiy wrote: On 05/30/2013 09:14 AM, Scott Wood wrote: On 05/29/2013 06:10:33 PM, Alexey Kardashevskiy wrote: On 05/30/2013 06:05 AM, Scott Wood wrote: But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE. 0xe0 begins a different section. It is not really obvious that there are sections as no comment defines those :) There is a comment /* ioctls for fds returned by KVM_CREATE_DEVICE */ Putting KVM_CREATE_DEVICE in there was mainly to avoid dealing with the ioctl number conflict mess in the vm-ioctl section, but at least that one is related to the device control API. :-) But yes, makes sense to move it up a bit and change the code to 0xad. 0xad is KVM_KVMCLOCK_CTRL That's it. I am _completely_ confused now. No system whatsoever :( What rule should I use in order to choose the number for my new ioctl? :) Yeah, it's a mess. 0xaf seems to be free. :-) -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote: On 05/29/2013 09:35 AM, Scott Wood wrote: On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote: @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with KVM_CREATE_SPAPR_TCE_IOMMU? Yes. Sigh. That's the same thing repeated. There's only one IOCTL. Nothing is being kept together. Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE. But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE. 0xe0 begins a different section. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/30/2013 06:05 AM, Scott Wood wrote: On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote: On 05/29/2013 09:35 AM, Scott Wood wrote: On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote: @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with KVM_CREATE_SPAPR_TCE_IOMMU? Yes. Sigh. That's the same thing repeated. There's only one IOCTL. Nothing is being kept together. Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE. But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE. 0xe0 begins a different section. It is not really obvious that there are sections as no comment defines those :) But yes, makes sense to move it up a bit and change the code to 0xad. -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/29/2013 06:10:33 PM, Alexey Kardashevskiy wrote: On 05/30/2013 06:05 AM, Scott Wood wrote: On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote: On 05/29/2013 09:35 AM, Scott Wood wrote: On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote: @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with KVM_CREATE_SPAPR_TCE_IOMMU? Yes. Sigh. That's the same thing repeated. There's only one IOCTL. Nothing is being kept together. Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE. But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE. 0xe0 begins a different section. It is not really obvious that there are sections as no comment defines those :) There is a comment /* ioctls for fds returned by KVM_CREATE_DEVICE */ Putting KVM_CREATE_DEVICE in there was mainly to avoid dealing with the ioctl number conflict mess in the vm-ioctl section, but at least that one is related to the device control API. :-) But yes, makes sense to move it up a bit and change the code to 0xad. 0xad is KVM_KVMCLOCK_CTRL -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/30/2013 09:14 AM, Scott Wood wrote: On 05/29/2013 06:10:33 PM, Alexey Kardashevskiy wrote: On 05/30/2013 06:05 AM, Scott Wood wrote: On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote: On 05/29/2013 09:35 AM, Scott Wood wrote: On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote: @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with KVM_CREATE_SPAPR_TCE_IOMMU? Yes. Sigh. That's the same thing repeated. There's only one IOCTL. Nothing is being kept together. Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE. But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE. 0xe0 begins a different section. It is not really obvious that there are sections as no comment defines those :) There is a comment /* ioctls for fds returned by KVM_CREATE_DEVICE */ Putting KVM_CREATE_DEVICE in there was mainly to avoid dealing with the ioctl number conflict mess in the vm-ioctl section, but at least that one is related to the device control API. :-) But yes, makes sense to move it up a bit and change the code to 0xad. 0xad is KVM_KVMCLOCK_CTRL That's it. I am _completely_ confused now. No system whatsoever :( What rule should I use in order to choose the number for my new ioctl? :) -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/24/2013 09:45:24 PM, David Gibson wrote: On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote: On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote: diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 8465c2a..da6bf61 100644 --- a/arch/powerpc/kvm/powerpc.c @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext) +++ b/arch/powerpc/kvm/powerpc.c break; #endif case KVM_CAP_SPAPR_MULTITCE: + case KVM_CAP_SPAPR_TCE_IOMMU: r = 1; break; default: Don't advertise SPAPR capabilities if it's not book3s -- and probably there's some additional limitation that would be appropriate. So, in the case of MULTITCE, that's not quite right. PR KVM can emulate a PAPR system on a BookE machine, and there's no reason not to allow TCE acceleration as well. That might (or might not; consider things like Altivec versus SPE opcode conflict, whether unimplemented SPRs trap, behavior of unprivileged SPRs/instructions, etc) be true in theory, but it's not currently a supported configuration. BookE KVM does not support emulating a different CPU than the host. In the unlikely case that ever changes to the point of allowing PAPR guests on a BookE host, then we can revisit how to properly determine whether the capability is supported, but for now the capability will never be valid in the CONFIG_BOOKE case (though I'd rather see it depend on an appropriate book3s symbol than depend on !BOOKE). Or we could just leave it as is, and let it indicate whether the host kernel supports the feature in general, with the user needing to understand when it's applicable... I'm a bit confused by the documentation, however -- the MULTITCE capability was documented in the capabilities that can be enabled section, but I don't see where it can be enabled. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/26/2013 09:44:24 PM, Alexey Kardashevskiy wrote: On 05/25/2013 12:45 PM, David Gibson wrote: On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote: On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote: diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 8465c2a..da6bf61 100644 --- a/arch/powerpc/kvm/powerpc.c @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext) +++ b/arch/powerpc/kvm/powerpc.c break; #endif case KVM_CAP_SPAPR_MULTITCE: + case KVM_CAP_SPAPR_TCE_IOMMU: r = 1; break; default: Don't advertise SPAPR capabilities if it's not book3s -- and probably there's some additional limitation that would be appropriate. So, in the case of MULTITCE, that's not quite right. PR KVM can emulate a PAPR system on a BookE machine, and there's no reason not to allow TCE acceleration as well. We can't make it dependent on PAPR mode being selected, because that's enabled per-vcpu, whereas these capabilities are queried on the VM before the vcpus are created. CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable host side hardware (i.e. a PAPR style IOMMU), though. The capability says that the ioctl is supported. If there is no IOMMU group registered, than it will fail with a reasonable error and nobody gets hurt. What is the problem? You could say that about a lot of the capabilities that just advertise the existence of new ioctls. :-) Sometimes it's nice to know in advance whether it's supported, before actually requesting that something happen. @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with KVM_CREATE_SPAPR_TCE_IOMMU? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/29/2013 03:45 AM, Scott Wood wrote: On 05/26/2013 09:44:24 PM, Alexey Kardashevskiy wrote: On 05/25/2013 12:45 PM, David Gibson wrote: On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote: On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote: diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 8465c2a..da6bf61 100644 --- a/arch/powerpc/kvm/powerpc.c @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext) +++ b/arch/powerpc/kvm/powerpc.c break; #endif case KVM_CAP_SPAPR_MULTITCE: +case KVM_CAP_SPAPR_TCE_IOMMU: r = 1; break; default: Don't advertise SPAPR capabilities if it's not book3s -- and probably there's some additional limitation that would be appropriate. So, in the case of MULTITCE, that's not quite right. PR KVM can emulate a PAPR system on a BookE machine, and there's no reason not to allow TCE acceleration as well. We can't make it dependent on PAPR mode being selected, because that's enabled per-vcpu, whereas these capabilities are queried on the VM before the vcpus are created. CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable host side hardware (i.e. a PAPR style IOMMU), though. The capability says that the ioctl is supported. If there is no IOMMU group registered, than it will fail with a reasonable error and nobody gets hurt. What is the problem? You could say that about a lot of the capabilities that just advertise the existence of new ioctls. :-) Sometimes it's nice to know in advance whether it's supported, before actually requesting that something happen. Yes, would be nice. There is just no quick way to know if this real system supports IOMMU groups. I could add another helper to generic IOMMU code which would return the number of registered IOMMU groups but it is a bit too much :) @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with KVM_CREATE_SPAPR_TCE_IOMMU? Yes. -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote: @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with KVM_CREATE_SPAPR_TCE_IOMMU? Yes. Sigh. That's the same thing repeated. There's only one IOCTL. Nothing is being kept together. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/29/2013 09:35 AM, Scott Wood wrote: On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote: @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with KVM_CREATE_SPAPR_TCE_IOMMU? Yes. Sigh. That's the same thing repeated. There's only one IOCTL. Nothing is being kept together. Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE. -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/29/2013 02:32 AM, Scott Wood wrote: On 05/24/2013 09:45:24 PM, David Gibson wrote: On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote: On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote: diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 8465c2a..da6bf61 100644 --- a/arch/powerpc/kvm/powerpc.c @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext) +++ b/arch/powerpc/kvm/powerpc.c break; #endif case KVM_CAP_SPAPR_MULTITCE: +case KVM_CAP_SPAPR_TCE_IOMMU: r = 1; break; default: Don't advertise SPAPR capabilities if it's not book3s -- and probably there's some additional limitation that would be appropriate. So, in the case of MULTITCE, that's not quite right. PR KVM can emulate a PAPR system on a BookE machine, and there's no reason not to allow TCE acceleration as well. That might (or might not; consider things like Altivec versus SPE opcode conflict, whether unimplemented SPRs trap, behavior of unprivileged SPRs/instructions, etc) be true in theory, but it's not currently a supported configuration. BookE KVM does not support emulating a different CPU than the host. In the unlikely case that ever changes to the point of allowing PAPR guests on a BookE host, then we can revisit how to properly determine whether the capability is supported, but for now the capability will never be valid in the CONFIG_BOOKE case (though I'd rather see it depend on an appropriate book3s symbol than depend on !BOOKE). Or we could just leave it as is, and let it indicate whether the host kernel supports the feature in general, with the user needing to understand when it's applicable... I'm a bit confused by the documentation, however -- the MULTITCE capability was documented in the capabilities that can be enabled section, but I don't see where it can be enabled. True, it cannot be enabled (but it could be enabled a long time ago), it is either supported or not, I'll fix the documentation. Thanks! -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/24/2013 09:45:24 PM, David Gibson wrote: On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote: On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote: diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 8465c2a..da6bf61 100644 --- a/arch/powerpc/kvm/powerpc.c @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext) +++ b/arch/powerpc/kvm/powerpc.c break; #endif case KVM_CAP_SPAPR_MULTITCE: + case KVM_CAP_SPAPR_TCE_IOMMU: r = 1; break; default: Don't advertise SPAPR capabilities if it's not book3s -- and probably there's some additional limitation that would be appropriate. So, in the case of MULTITCE, that's not quite right. PR KVM can emulate a PAPR system on a BookE machine, and there's no reason not to allow TCE acceleration as well. That might (or might not; consider things like Altivec versus SPE opcode conflict, whether unimplemented SPRs trap, behavior of unprivileged SPRs/instructions, etc) be true in theory, but it's not currently a supported configuration. BookE KVM does not support emulating a different CPU than the host. In the unlikely case that ever changes to the point of allowing PAPR guests on a BookE host, then we can revisit how to properly determine whether the capability is supported, but for now the capability will never be valid in the CONFIG_BOOKE case (though I'd rather see it depend on an appropriate book3s symbol than depend on !BOOKE). Or we could just leave it as is, and let it indicate whether the host kernel supports the feature in general, with the user needing to understand when it's applicable... I'm a bit confused by the documentation, however -- the MULTITCE capability was documented in the capabilities that can be enabled section, but I don't see where it can be enabled. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/26/2013 09:44:24 PM, Alexey Kardashevskiy wrote: On 05/25/2013 12:45 PM, David Gibson wrote: On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote: On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote: diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 8465c2a..da6bf61 100644 --- a/arch/powerpc/kvm/powerpc.c @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext) +++ b/arch/powerpc/kvm/powerpc.c break; #endif case KVM_CAP_SPAPR_MULTITCE: + case KVM_CAP_SPAPR_TCE_IOMMU: r = 1; break; default: Don't advertise SPAPR capabilities if it's not book3s -- and probably there's some additional limitation that would be appropriate. So, in the case of MULTITCE, that's not quite right. PR KVM can emulate a PAPR system on a BookE machine, and there's no reason not to allow TCE acceleration as well. We can't make it dependent on PAPR mode being selected, because that's enabled per-vcpu, whereas these capabilities are queried on the VM before the vcpus are created. CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable host side hardware (i.e. a PAPR style IOMMU), though. The capability says that the ioctl is supported. If there is no IOMMU group registered, than it will fail with a reasonable error and nobody gets hurt. What is the problem? You could say that about a lot of the capabilities that just advertise the existence of new ioctls. :-) Sometimes it's nice to know in advance whether it's supported, before actually requesting that something happen. @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with KVM_CREATE_SPAPR_TCE_IOMMU? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/29/2013 03:45 AM, Scott Wood wrote: On 05/26/2013 09:44:24 PM, Alexey Kardashevskiy wrote: On 05/25/2013 12:45 PM, David Gibson wrote: On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote: On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote: diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 8465c2a..da6bf61 100644 --- a/arch/powerpc/kvm/powerpc.c @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext) +++ b/arch/powerpc/kvm/powerpc.c break; #endif case KVM_CAP_SPAPR_MULTITCE: +case KVM_CAP_SPAPR_TCE_IOMMU: r = 1; break; default: Don't advertise SPAPR capabilities if it's not book3s -- and probably there's some additional limitation that would be appropriate. So, in the case of MULTITCE, that's not quite right. PR KVM can emulate a PAPR system on a BookE machine, and there's no reason not to allow TCE acceleration as well. We can't make it dependent on PAPR mode being selected, because that's enabled per-vcpu, whereas these capabilities are queried on the VM before the vcpus are created. CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable host side hardware (i.e. a PAPR style IOMMU), though. The capability says that the ioctl is supported. If there is no IOMMU group registered, than it will fail with a reasonable error and nobody gets hurt. What is the problem? You could say that about a lot of the capabilities that just advertise the existence of new ioctls. :-) Sometimes it's nice to know in advance whether it's supported, before actually requesting that something happen. Yes, would be nice. There is just no quick way to know if this real system supports IOMMU groups. I could add another helper to generic IOMMU code which would return the number of registered IOMMU groups but it is a bit too much :) @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with KVM_CREATE_SPAPR_TCE_IOMMU? Yes. -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/29/2013 09:35 AM, Scott Wood wrote: On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote: @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with KVM_CREATE_SPAPR_TCE_IOMMU? Yes. Sigh. That's the same thing repeated. There's only one IOCTL. Nothing is being kept together. Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE. -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/29/2013 02:32 AM, Scott Wood wrote: On 05/24/2013 09:45:24 PM, David Gibson wrote: On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote: On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote: diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 8465c2a..da6bf61 100644 --- a/arch/powerpc/kvm/powerpc.c @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext) +++ b/arch/powerpc/kvm/powerpc.c break; #endif case KVM_CAP_SPAPR_MULTITCE: +case KVM_CAP_SPAPR_TCE_IOMMU: r = 1; break; default: Don't advertise SPAPR capabilities if it's not book3s -- and probably there's some additional limitation that would be appropriate. So, in the case of MULTITCE, that's not quite right. PR KVM can emulate a PAPR system on a BookE machine, and there's no reason not to allow TCE acceleration as well. That might (or might not; consider things like Altivec versus SPE opcode conflict, whether unimplemented SPRs trap, behavior of unprivileged SPRs/instructions, etc) be true in theory, but it's not currently a supported configuration. BookE KVM does not support emulating a different CPU than the host. In the unlikely case that ever changes to the point of allowing PAPR guests on a BookE host, then we can revisit how to properly determine whether the capability is supported, but for now the capability will never be valid in the CONFIG_BOOKE case (though I'd rather see it depend on an appropriate book3s symbol than depend on !BOOKE). Or we could just leave it as is, and let it indicate whether the host kernel supports the feature in general, with the user needing to understand when it's applicable... I'm a bit confused by the documentation, however -- the MULTITCE capability was documented in the capabilities that can be enabled section, but I don't see where it can be enabled. True, it cannot be enabled (but it could be enabled a long time ago), it is either supported or not, I'll fix the documentation. Thanks! -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
Il 25/05/2013 04:45, David Gibson ha scritto: + case KVM_CREATE_SPAPR_TCE_IOMMU: { + struct kvm_create_spapr_tce_iommu create_tce_iommu; + struct kvm *kvm = filp-private_data; + + r = -EFAULT; + if (copy_from_user(create_tce_iommu, argp, + sizeof(create_tce_iommu))) + goto out; + r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm, create_tce_iommu); + goto out; + } Would it make sense to make this the only interface for creating TCEs? That is, pass both a window_size and an IOMMU group id (or e.g. -1 for no hardware IOMMU usage), and have a single ioctl for both cases? There's some duplicated code between kvm_vm_ioctl_create_spapr_tce and kvm_vm_ioctl_create_spapr_tce_iommu. KVM_CREATE_SPAPR_TCE could stay for backwards-compatibility, or you could just use a new capability and drop the old ioctl. I'm not sure whether you're already considering the ABI to be stable for kvmppc. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/27/2013 08:23 PM, Paolo Bonzini wrote: Il 25/05/2013 04:45, David Gibson ha scritto: + case KVM_CREATE_SPAPR_TCE_IOMMU: { + struct kvm_create_spapr_tce_iommu create_tce_iommu; + struct kvm *kvm = filp-private_data; + + r = -EFAULT; + if (copy_from_user(create_tce_iommu, argp, + sizeof(create_tce_iommu))) + goto out; + r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm, create_tce_iommu); + goto out; + } Would it make sense to make this the only interface for creating TCEs? That is, pass both a window_size and an IOMMU group id (or e.g. -1 for no hardware IOMMU usage), and have a single ioctl for both cases? There's some duplicated code between kvm_vm_ioctl_create_spapr_tce and kvm_vm_ioctl_create_spapr_tce_iommu. Just few bits. Is there really much sense in making one function from those two? I tried, looked a bit messy. KVM_CREATE_SPAPR_TCE could stay for backwards-compatibility, or you could just use a new capability and drop the old ioctl. The old capability+ioctl already exist for quite a while and few QEMU versions supporting it were released so we do not want just drop it. So then what is the benefit of having a new interface with support of both types? I'm not sure whether you're already considering the ABI to be stable for kvmppc. Is any bit of KVM using it? Cannot see from Documentation/ABI. -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
Il 27/05/2013 16:26, Alexey Kardashevskiy ha scritto: On 05/27/2013 08:23 PM, Paolo Bonzini wrote: Il 25/05/2013 04:45, David Gibson ha scritto: + case KVM_CREATE_SPAPR_TCE_IOMMU: { + struct kvm_create_spapr_tce_iommu create_tce_iommu; + struct kvm *kvm = filp-private_data; + + r = -EFAULT; + if (copy_from_user(create_tce_iommu, argp, + sizeof(create_tce_iommu))) + goto out; + r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm, create_tce_iommu); + goto out; + } Would it make sense to make this the only interface for creating TCEs? That is, pass both a window_size and an IOMMU group id (or e.g. -1 for no hardware IOMMU usage), and have a single ioctl for both cases? There's some duplicated code between kvm_vm_ioctl_create_spapr_tce and kvm_vm_ioctl_create_spapr_tce_iommu. Just few bits. Is there really much sense in making one function from those two? I tried, looked a bit messy. Cannot really tell without the userspace bits. But ioctl proliferation is what the device and one_reg APIs were supposed to avoid... KVM_CREATE_SPAPR_TCE could stay for backwards-compatibility, or you could just use a new capability and drop the old ioctl. The old capability+ioctl already exist for quite a while and few QEMU versions supporting it were released so we do not want just drop it. So then what is the benefit of having a new interface with support of both types? I'm not sure whether you're already considering the ABI to be stable for kvmppc. Is any bit of KVM using it? Cannot see from Documentation/ABI. I mean the userspace ABI (ioctls). Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
Il 25/05/2013 04:45, David Gibson ha scritto: + case KVM_CREATE_SPAPR_TCE_IOMMU: { + struct kvm_create_spapr_tce_iommu create_tce_iommu; + struct kvm *kvm = filp-private_data; + + r = -EFAULT; + if (copy_from_user(create_tce_iommu, argp, + sizeof(create_tce_iommu))) + goto out; + r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm, create_tce_iommu); + goto out; + } Would it make sense to make this the only interface for creating TCEs? That is, pass both a window_size and an IOMMU group id (or e.g. -1 for no hardware IOMMU usage), and have a single ioctl for both cases? There's some duplicated code between kvm_vm_ioctl_create_spapr_tce and kvm_vm_ioctl_create_spapr_tce_iommu. KVM_CREATE_SPAPR_TCE could stay for backwards-compatibility, or you could just use a new capability and drop the old ioctl. I'm not sure whether you're already considering the ABI to be stable for kvmppc. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/27/2013 08:23 PM, Paolo Bonzini wrote: Il 25/05/2013 04:45, David Gibson ha scritto: + case KVM_CREATE_SPAPR_TCE_IOMMU: { + struct kvm_create_spapr_tce_iommu create_tce_iommu; + struct kvm *kvm = filp-private_data; + + r = -EFAULT; + if (copy_from_user(create_tce_iommu, argp, + sizeof(create_tce_iommu))) + goto out; + r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm, create_tce_iommu); + goto out; + } Would it make sense to make this the only interface for creating TCEs? That is, pass both a window_size and an IOMMU group id (or e.g. -1 for no hardware IOMMU usage), and have a single ioctl for both cases? There's some duplicated code between kvm_vm_ioctl_create_spapr_tce and kvm_vm_ioctl_create_spapr_tce_iommu. Just few bits. Is there really much sense in making one function from those two? I tried, looked a bit messy. KVM_CREATE_SPAPR_TCE could stay for backwards-compatibility, or you could just use a new capability and drop the old ioctl. The old capability+ioctl already exist for quite a while and few QEMU versions supporting it were released so we do not want just drop it. So then what is the benefit of having a new interface with support of both types? I'm not sure whether you're already considering the ABI to be stable for kvmppc. Is any bit of KVM using it? Cannot see from Documentation/ABI. -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/25/2013 12:45 PM, David Gibson wrote: On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote: On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote: diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 8465c2a..da6bf61 100644 --- a/arch/powerpc/kvm/powerpc.c @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext) +++ b/arch/powerpc/kvm/powerpc.c break; #endif case KVM_CAP_SPAPR_MULTITCE: + case KVM_CAP_SPAPR_TCE_IOMMU: r = 1; break; default: Don't advertise SPAPR capabilities if it's not book3s -- and probably there's some additional limitation that would be appropriate. So, in the case of MULTITCE, that's not quite right. PR KVM can emulate a PAPR system on a BookE machine, and there's no reason not to allow TCE acceleration as well. We can't make it dependent on PAPR mode being selected, because that's enabled per-vcpu, whereas these capabilities are queried on the VM before the vcpus are created. CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable host side hardware (i.e. a PAPR style IOMMU), though. The capability says that the ioctl is supported. If there is no IOMMU group registered, than it will fail with a reasonable error and nobody gets hurt. What is the problem? @@ -1025,6 +1026,17 @@ long kvm_arch_vm_ioctl(struct file *filp, r = kvm_vm_ioctl_create_spapr_tce(kvm, create_tce); goto out; } + case KVM_CREATE_SPAPR_TCE_IOMMU: { + struct kvm_create_spapr_tce_iommu create_tce_iommu; + struct kvm *kvm = filp-private_data; + + r = -EFAULT; + if (copy_from_user(create_tce_iommu, argp, + sizeof(create_tce_iommu))) + goto out; + r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm, create_tce_iommu); + goto out; + } #endif /* CONFIG_PPC_BOOK3S_64 */ #ifdef CONFIG_KVM_BOOK3S_64_HV diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 5a2afda..450c82a 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_PPC_RTAS 91 #define KVM_CAP_IRQ_XICS 92 #define KVM_CAP_SPAPR_MULTITCE (0x11 + 89) +#define KVM_CAP_SPAPR_TCE_IOMMU (0x11 + 90) Hmm... Ah, yeah, that needs to be fixed. Those were interim numbers so that we didn't have to keep changing our internal trees as new upstream ioctls got added to the list. We need to get a proper number for the merge, though. @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/25/2013 12:45 PM, David Gibson wrote: On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote: On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote: diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 8465c2a..da6bf61 100644 --- a/arch/powerpc/kvm/powerpc.c @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext) +++ b/arch/powerpc/kvm/powerpc.c break; #endif case KVM_CAP_SPAPR_MULTITCE: + case KVM_CAP_SPAPR_TCE_IOMMU: r = 1; break; default: Don't advertise SPAPR capabilities if it's not book3s -- and probably there's some additional limitation that would be appropriate. So, in the case of MULTITCE, that's not quite right. PR KVM can emulate a PAPR system on a BookE machine, and there's no reason not to allow TCE acceleration as well. We can't make it dependent on PAPR mode being selected, because that's enabled per-vcpu, whereas these capabilities are queried on the VM before the vcpus are created. CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable host side hardware (i.e. a PAPR style IOMMU), though. The capability says that the ioctl is supported. If there is no IOMMU group registered, than it will fail with a reasonable error and nobody gets hurt. What is the problem? @@ -1025,6 +1026,17 @@ long kvm_arch_vm_ioctl(struct file *filp, r = kvm_vm_ioctl_create_spapr_tce(kvm, create_tce); goto out; } + case KVM_CREATE_SPAPR_TCE_IOMMU: { + struct kvm_create_spapr_tce_iommu create_tce_iommu; + struct kvm *kvm = filp-private_data; + + r = -EFAULT; + if (copy_from_user(create_tce_iommu, argp, + sizeof(create_tce_iommu))) + goto out; + r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm, create_tce_iommu); + goto out; + } #endif /* CONFIG_PPC_BOOK3S_64 */ #ifdef CONFIG_KVM_BOOK3S_64_HV diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 5a2afda..450c82a 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_PPC_RTAS 91 #define KVM_CAP_IRQ_XICS 92 #define KVM_CAP_SPAPR_MULTITCE (0x11 + 89) +#define KVM_CAP_SPAPR_TCE_IOMMU (0x11 + 90) Hmm... Ah, yeah, that needs to be fixed. Those were interim numbers so that we didn't have to keep changing our internal trees as new upstream ioctls got added to the list. We need to get a proper number for the merge, though. @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is in this section so I decided to keep them together. Wrong? -- Alexey -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote: On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote: diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 8465c2a..da6bf61 100644 --- a/arch/powerpc/kvm/powerpc.c @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext) +++ b/arch/powerpc/kvm/powerpc.c break; #endif case KVM_CAP_SPAPR_MULTITCE: +case KVM_CAP_SPAPR_TCE_IOMMU: r = 1; break; default: Don't advertise SPAPR capabilities if it's not book3s -- and probably there's some additional limitation that would be appropriate. So, in the case of MULTITCE, that's not quite right. PR KVM can emulate a PAPR system on a BookE machine, and there's no reason not to allow TCE acceleration as well. We can't make it dependent on PAPR mode being selected, because that's enabled per-vcpu, whereas these capabilities are queried on the VM before the vcpus are created. CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable host side hardware (i.e. a PAPR style IOMMU), though. @@ -1025,6 +1026,17 @@ long kvm_arch_vm_ioctl(struct file *filp, r = kvm_vm_ioctl_create_spapr_tce(kvm, create_tce); goto out; } +case KVM_CREATE_SPAPR_TCE_IOMMU: { +struct kvm_create_spapr_tce_iommu create_tce_iommu; +struct kvm *kvm = filp-private_data; + +r = -EFAULT; +if (copy_from_user(create_tce_iommu, argp, +sizeof(create_tce_iommu))) +goto out; +r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm, create_tce_iommu); +goto out; +} #endif /* CONFIG_PPC_BOOK3S_64 */ #ifdef CONFIG_KVM_BOOK3S_64_HV diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 5a2afda..450c82a 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_PPC_RTAS 91 #define KVM_CAP_IRQ_XICS 92 #define KVM_CAP_SPAPR_MULTITCE (0x11 + 89) +#define KVM_CAP_SPAPR_TCE_IOMMU (0x11 + 90) Hmm... Ah, yeah, that needs to be fixed. Those were interim numbers so that we didn't have to keep changing our internal trees as new upstream ioctls got added to the list. We need to get a proper number for the merge, though. @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? -Scott ___ Linuxppc-dev mailing list linuxppc-...@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: Digital signature
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote: On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote: diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 8465c2a..da6bf61 100644 --- a/arch/powerpc/kvm/powerpc.c @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext) +++ b/arch/powerpc/kvm/powerpc.c break; #endif case KVM_CAP_SPAPR_MULTITCE: +case KVM_CAP_SPAPR_TCE_IOMMU: r = 1; break; default: Don't advertise SPAPR capabilities if it's not book3s -- and probably there's some additional limitation that would be appropriate. So, in the case of MULTITCE, that's not quite right. PR KVM can emulate a PAPR system on a BookE machine, and there's no reason not to allow TCE acceleration as well. We can't make it dependent on PAPR mode being selected, because that's enabled per-vcpu, whereas these capabilities are queried on the VM before the vcpus are created. CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable host side hardware (i.e. a PAPR style IOMMU), though. @@ -1025,6 +1026,17 @@ long kvm_arch_vm_ioctl(struct file *filp, r = kvm_vm_ioctl_create_spapr_tce(kvm, create_tce); goto out; } +case KVM_CREATE_SPAPR_TCE_IOMMU: { +struct kvm_create_spapr_tce_iommu create_tce_iommu; +struct kvm *kvm = filp-private_data; + +r = -EFAULT; +if (copy_from_user(create_tce_iommu, argp, +sizeof(create_tce_iommu))) +goto out; +r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm, create_tce_iommu); +goto out; +} #endif /* CONFIG_PPC_BOOK3S_64 */ #ifdef CONFIG_KVM_BOOK3S_64_HV diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 5a2afda..450c82a 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_PPC_RTAS 91 #define KVM_CAP_IRQ_XICS 92 #define KVM_CAP_SPAPR_MULTITCE (0x11 + 89) +#define KVM_CAP_SPAPR_TCE_IOMMU (0x11 + 90) Hmm... Ah, yeah, that needs to be fixed. Those were interim numbers so that we didn't have to keep changing our internal trees as new upstream ioctls got added to the list. We need to get a proper number for the merge, though. @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? -Scott ___ Linuxppc-dev mailing list linuxppc-...@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: Digital signature
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote: diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 8465c2a..da6bf61 100644 --- a/arch/powerpc/kvm/powerpc.c @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext) +++ b/arch/powerpc/kvm/powerpc.c break; #endif case KVM_CAP_SPAPR_MULTITCE: + case KVM_CAP_SPAPR_TCE_IOMMU: r = 1; break; default: Don't advertise SPAPR capabilities if it's not book3s -- and probably there's some additional limitation that would be appropriate. @@ -1025,6 +1026,17 @@ long kvm_arch_vm_ioctl(struct file *filp, r = kvm_vm_ioctl_create_spapr_tce(kvm, create_tce); goto out; } + case KVM_CREATE_SPAPR_TCE_IOMMU: { + struct kvm_create_spapr_tce_iommu create_tce_iommu; + struct kvm *kvm = filp-private_data; + + r = -EFAULT; + if (copy_from_user(create_tce_iommu, argp, + sizeof(create_tce_iommu))) + goto out; + r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm, create_tce_iommu); + goto out; + } #endif /* CONFIG_PPC_BOOK3S_64 */ #ifdef CONFIG_KVM_BOOK3S_64_HV diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 5a2afda..450c82a 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_PPC_RTAS 91 #define KVM_CAP_IRQ_XICS 92 #define KVM_CAP_SPAPR_MULTITCE (0x11 + 89) +#define KVM_CAP_SPAPR_TCE_IOMMU (0x11 + 90) Hmm... @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote: diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 8465c2a..da6bf61 100644 --- a/arch/powerpc/kvm/powerpc.c @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext) +++ b/arch/powerpc/kvm/powerpc.c break; #endif case KVM_CAP_SPAPR_MULTITCE: + case KVM_CAP_SPAPR_TCE_IOMMU: r = 1; break; default: Don't advertise SPAPR capabilities if it's not book3s -- and probably there's some additional limitation that would be appropriate. @@ -1025,6 +1026,17 @@ long kvm_arch_vm_ioctl(struct file *filp, r = kvm_vm_ioctl_create_spapr_tce(kvm, create_tce); goto out; } + case KVM_CREATE_SPAPR_TCE_IOMMU: { + struct kvm_create_spapr_tce_iommu create_tce_iommu; + struct kvm *kvm = filp-private_data; + + r = -EFAULT; + if (copy_from_user(create_tce_iommu, argp, + sizeof(create_tce_iommu))) + goto out; + r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm, create_tce_iommu); + goto out; + } #endif /* CONFIG_PPC_BOOK3S_64 */ #ifdef CONFIG_KVM_BOOK3S_64_HV diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 5a2afda..450c82a 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_PPC_RTAS 91 #define KVM_CAP_IRQ_XICS 92 #define KVM_CAP_SPAPR_MULTITCE (0x11 + 89) +#define KVM_CAP_SPAPR_TCE_IOMMU (0x11 + 90) Hmm... @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for SPAPR TCE IOMMU */ +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xe4, struct kvm_create_spapr_tce_iommu) Shouldn't this go under the vm ioctl section? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT and H_STUFF_TCE requests without passing them to QEMU, which should save time on switching to QEMU and back. Both real and virtual modes are supported - whenever the kernel fails to handle TCE request, it passes it to the virtual mode. If it the virtual mode handlers fail, then the request is passed to the user mode, for example, to QEMU. This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables in-kernel handling of IOMMU map/unmap. Tests show that this patch increases transmission speed from 220MB/s to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card). Cc: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Signed-off-by: Paul Mackerras pau...@samba.org --- Changes: 2013-05-20: * removed get_user() from real mode handlers * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there translated TCEs, tries realmode_get_page() on those and if it fails, it passes control over the virtual mode handler which tries to finish the request handling * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit on a page * The only reason to pass the request to user mode now is when the user mode did not register TCE table in the kernel, in all other cases the virtual mode handler is expected to do the job --- Documentation/virtual/kvm/api.txt | 28 + arch/powerpc/include/asm/kvm_host.h |3 + arch/powerpc/include/asm/kvm_ppc.h |2 + arch/powerpc/include/uapi/asm/kvm.h |7 ++ arch/powerpc/kvm/book3s_64_vio.c| 198 ++- arch/powerpc/kvm/book3s_64_vio_hv.c | 193 +- arch/powerpc/kvm/powerpc.c | 12 +++ include/uapi/linux/kvm.h|4 + 8 files changed, 441 insertions(+), 6 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 3c7c7ea..3c8e9fe 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2362,6 +2362,34 @@ calls by the guest for that service will be passed to userspace to be handled. +4.79 KVM_CREATE_SPAPR_TCE_IOMMU + +Capability: KVM_CAP_SPAPR_TCE_IOMMU +Architectures: powerpc +Type: vm ioctl +Parameters: struct kvm_create_spapr_tce_iommu (in) +Returns: 0 on success, -1 on error + +This creates a link between IOMMU group and a hardware TCE (translation +control entry) table. This link lets the host kernel know what IOMMU +group (i.e. TCE table) to use for the LIOBN number passed with +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls. + +/* for KVM_CAP_SPAPR_TCE_IOMMU */ +struct kvm_create_spapr_tce_iommu { + __u64 liobn; + __u32 iommu_id; + __u32 flags; +}; + +No flag is supported at the moment. + +When the guest issues TCE call on a liobn for which a TCE table has been +registered, the kernel will handle it in real mode, updating the hardware +TCE table. TCE table calls for other liobns will cause a vm exit and must +be handled by userspace. + + 5. The kvm_run structure diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 85d8f26..ac0e2fe 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -180,6 +180,7 @@ struct kvmppc_spapr_tce_table { struct kvm *kvm; u64 liobn; u32 window_size; + struct iommu_group *grp;/* used for IOMMU groups */ struct page *pages[0]; }; @@ -611,6 +612,8 @@ struct kvm_vcpu_arch { u64 busy_preempt; unsigned long *tce_tmp;/* TCE cache for TCE_PUT_INDIRECT hall */ + unsigned long tce_tmp_num; /* Number of handled TCEs in the cache */ + unsigned long tce_reason; /* The reason of switching to the virtmode */ #endif }; diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index e852921b..934e01d 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -133,6 +133,8 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu); extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, struct kvm_create_spapr_tce *args); +extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm, + struct kvm_create_spapr_tce_iommu *args); extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table( struct kvm_vcpu *vcpu, unsigned long liobn); extern long kvmppc_emulated_validate_tce(unsigned long tce); diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 0fb1a6e..cf82af4 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -319,6 +319,13 @@ struct kvm_create_spapr_tce { __u32 window_size; }; +/* for KVM_CAP_SPAPR_TCE_IOMMU */
[PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT and H_STUFF_TCE requests without passing them to QEMU, which should save time on switching to QEMU and back. Both real and virtual modes are supported - whenever the kernel fails to handle TCE request, it passes it to the virtual mode. If it the virtual mode handlers fail, then the request is passed to the user mode, for example, to QEMU. This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables in-kernel handling of IOMMU map/unmap. Tests show that this patch increases transmission speed from 220MB/s to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card). Cc: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Signed-off-by: Paul Mackerras pau...@samba.org --- Changes: 2013-05-20: * removed get_user() from real mode handlers * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there translated TCEs, tries realmode_get_page() on those and if it fails, it passes control over the virtual mode handler which tries to finish the request handling * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit on a page * The only reason to pass the request to user mode now is when the user mode did not register TCE table in the kernel, in all other cases the virtual mode handler is expected to do the job --- Documentation/virtual/kvm/api.txt | 28 + arch/powerpc/include/asm/kvm_host.h |3 + arch/powerpc/include/asm/kvm_ppc.h |2 + arch/powerpc/include/uapi/asm/kvm.h |7 ++ arch/powerpc/kvm/book3s_64_vio.c| 198 ++- arch/powerpc/kvm/book3s_64_vio_hv.c | 193 +- arch/powerpc/kvm/powerpc.c | 12 +++ include/uapi/linux/kvm.h|4 + 8 files changed, 441 insertions(+), 6 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 3c7c7ea..3c8e9fe 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2362,6 +2362,34 @@ calls by the guest for that service will be passed to userspace to be handled. +4.79 KVM_CREATE_SPAPR_TCE_IOMMU + +Capability: KVM_CAP_SPAPR_TCE_IOMMU +Architectures: powerpc +Type: vm ioctl +Parameters: struct kvm_create_spapr_tce_iommu (in) +Returns: 0 on success, -1 on error + +This creates a link between IOMMU group and a hardware TCE (translation +control entry) table. This link lets the host kernel know what IOMMU +group (i.e. TCE table) to use for the LIOBN number passed with +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls. + +/* for KVM_CAP_SPAPR_TCE_IOMMU */ +struct kvm_create_spapr_tce_iommu { + __u64 liobn; + __u32 iommu_id; + __u32 flags; +}; + +No flag is supported at the moment. + +When the guest issues TCE call on a liobn for which a TCE table has been +registered, the kernel will handle it in real mode, updating the hardware +TCE table. TCE table calls for other liobns will cause a vm exit and must +be handled by userspace. + + 5. The kvm_run structure diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 85d8f26..ac0e2fe 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -180,6 +180,7 @@ struct kvmppc_spapr_tce_table { struct kvm *kvm; u64 liobn; u32 window_size; + struct iommu_group *grp;/* used for IOMMU groups */ struct page *pages[0]; }; @@ -611,6 +612,8 @@ struct kvm_vcpu_arch { u64 busy_preempt; unsigned long *tce_tmp;/* TCE cache for TCE_PUT_INDIRECT hall */ + unsigned long tce_tmp_num; /* Number of handled TCEs in the cache */ + unsigned long tce_reason; /* The reason of switching to the virtmode */ #endif }; diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index e852921b..934e01d 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -133,6 +133,8 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu); extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, struct kvm_create_spapr_tce *args); +extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm, + struct kvm_create_spapr_tce_iommu *args); extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table( struct kvm_vcpu *vcpu, unsigned long liobn); extern long kvmppc_emulated_validate_tce(unsigned long tce); diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 0fb1a6e..cf82af4 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -319,6 +319,13 @@ struct kvm_create_spapr_tce { __u32 window_size; }; +/* for KVM_CAP_SPAPR_TCE_IOMMU */