Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-27 Thread David Gibson
On Sun, Jun 23, 2013 at 10:41:24PM -0600, Alex Williamson wrote:
 On Mon, 2013-06-24 at 13:52 +1000, David Gibson wrote:
  On Sat, Jun 22, 2013 at 08:28:06AM -0600, Alex Williamson wrote:
   On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote:
On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote:
 On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
  On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
   On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
   Just out of curiosity - would not get_file() and fput_atomic() 
   on a
   group's
   file* do the right job instead of 
   vfio_group_add_external_user() and
   vfio_group_del_external_user()?
  
   I was thinking that too.  Grabbing a file reference would 
   certainly be
   the usual way of handling this sort of thing.
   
   But that wouldn't prevent the group ownership to be returned to
   the kernel or another user would it ?
  
  
  Holding the file pointer does not let the group-container_users 
  counter go
  to zero
 
 How so?  Holding the file pointer means the file won't go away, which
 means the group release function won't be called.  That means the 
 group
 won't go away, but that doesn't mean it's attached to an IOMMU.  A 
 user
 could call UNSET_CONTAINER.

Uhh... *thinks*.  Ah, I see.

I think the interface should not take the group fd, but the container
fd.  Holding a reference to *that* would keep the necessary things
around.  But more to the point, it's the right thing semantically:

The container is essentially the handle on a host iommu address space,
and so that's what should be bound by the KVM call to a particular
guest iommu address space.  e.g. it would make no sense to bind two
different groups to different guest iommu address spaces, if they were
in the same container - the guest thinks they are different spaces,
but if they're in the same container they must be the same space.
   
   While the container is the gateway to the iommu, what empowers the
   container to maintain an iommu is the group.  What happens to a
   container when all the groups are disconnected or closed?  Groups are
   the unit that indicates hardware access, not containers.  Thanks,
  
  Uh... huh?  I'm really not sure what you're getting at.
  
  The operation we're doing for KVM here is binding a guest iommu
  address space to a particular host iommu address space.  Why would we
  not want to use the obvious handle on the host iommu address space,
  which is the container fd?
 
 AIUI, the request isn't for an interface through which to do iommu
 mappings.  The request is for an interface to show that the user has
 sufficient privileges to do mappings.  Groups are what gives the user
 that ability.  The iommu is also possibly associated with multiple iommu
 groups and I believe what is being asked for here is a way to hold and
 lock a single iommu group with iommu protection.
 
 From a practical point of view, the iommu interface is de-privileged
 once the groups are disconnected or closed.  Holding a reference count
 on the iommu fd won't prevent that.  That means we'd have to use a
 notifier to have KVM stop the side-channel iommu access.  Meanwhile
 holding the file descriptor for the group and adding an interface that
 bumps use counter allows KVM to lock itself in, just as if it had a
 device opened itself.  Thanks,

Ah, good point.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpCSZuQp1350.pgp
Description: PGP signature


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-27 Thread David Gibson
On Sun, Jun 23, 2013 at 10:41:24PM -0600, Alex Williamson wrote:
 On Mon, 2013-06-24 at 13:52 +1000, David Gibson wrote:
  On Sat, Jun 22, 2013 at 08:28:06AM -0600, Alex Williamson wrote:
   On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote:
On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote:
 On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
  On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
   On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
   Just out of curiosity - would not get_file() and fput_atomic() 
   on a
   group's
   file* do the right job instead of 
   vfio_group_add_external_user() and
   vfio_group_del_external_user()?
  
   I was thinking that too.  Grabbing a file reference would 
   certainly be
   the usual way of handling this sort of thing.
   
   But that wouldn't prevent the group ownership to be returned to
   the kernel or another user would it ?
  
  
  Holding the file pointer does not let the group-container_users 
  counter go
  to zero
 
 How so?  Holding the file pointer means the file won't go away, which
 means the group release function won't be called.  That means the 
 group
 won't go away, but that doesn't mean it's attached to an IOMMU.  A 
 user
 could call UNSET_CONTAINER.

Uhh... *thinks*.  Ah, I see.

I think the interface should not take the group fd, but the container
fd.  Holding a reference to *that* would keep the necessary things
around.  But more to the point, it's the right thing semantically:

The container is essentially the handle on a host iommu address space,
and so that's what should be bound by the KVM call to a particular
guest iommu address space.  e.g. it would make no sense to bind two
different groups to different guest iommu address spaces, if they were
in the same container - the guest thinks they are different spaces,
but if they're in the same container they must be the same space.
   
   While the container is the gateway to the iommu, what empowers the
   container to maintain an iommu is the group.  What happens to a
   container when all the groups are disconnected or closed?  Groups are
   the unit that indicates hardware access, not containers.  Thanks,
  
  Uh... huh?  I'm really not sure what you're getting at.
  
  The operation we're doing for KVM here is binding a guest iommu
  address space to a particular host iommu address space.  Why would we
  not want to use the obvious handle on the host iommu address space,
  which is the container fd?
 
 AIUI, the request isn't for an interface through which to do iommu
 mappings.  The request is for an interface to show that the user has
 sufficient privileges to do mappings.  Groups are what gives the user
 that ability.  The iommu is also possibly associated with multiple iommu
 groups and I believe what is being asked for here is a way to hold and
 lock a single iommu group with iommu protection.
 
 From a practical point of view, the iommu interface is de-privileged
 once the groups are disconnected or closed.  Holding a reference count
 on the iommu fd won't prevent that.  That means we'd have to use a
 notifier to have KVM stop the side-channel iommu access.  Meanwhile
 holding the file descriptor for the group and adding an interface that
 bumps use counter allows KVM to lock itself in, just as if it had a
 device opened itself.  Thanks,

Ah, good point.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpAjDK6J9rb1.pgp
Description: PGP signature


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-23 Thread David Gibson
On Sun, Jun 23, 2013 at 09:28:13AM +1000, Benjamin Herrenschmidt wrote:
 On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote:
  I think the interface should not take the group fd, but the container
  fd.  Holding a reference to *that* would keep the necessary things
  around.  But more to the point, it's the right thing semantically:
  
  The container is essentially the handle on a host iommu address space,
  and so that's what should be bound by the KVM call to a particular
  guest iommu address space.  e.g. it would make no sense to bind two
  different groups to different guest iommu address spaces, if they were
  in the same container - the guest thinks they are different spaces,
  but if they're in the same container they must be the same space.
 
 Interestingly, how are we going to extend that when/if we implement
 DDW ?
 
 DDW means an API by which the guest can request the creation of
 additional iommus for a given device (typically, in addition to the
 default smallish 32-bit one using 4k pages, the guest can request
 a larger window in 64-bit space using a larger page size).

So, would a PAPR gest requesting this expect the new window to have
a new liobn, or an existing liobn?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpaNZNxL_Bip.pgp
Description: PGP signature


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-23 Thread David Gibson
On Sat, Jun 22, 2013 at 08:28:06AM -0600, Alex Williamson wrote:
 On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote:
  On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote:
   On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
 On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
 Just out of curiosity - would not get_file() and fput_atomic() on a
 group's
 file* do the right job instead of vfio_group_add_external_user() and
 vfio_group_del_external_user()?

 I was thinking that too.  Grabbing a file reference would certainly 
 be
 the usual way of handling this sort of thing.
 
 But that wouldn't prevent the group ownership to be returned to
 the kernel or another user would it ?


Holding the file pointer does not let the group-container_users 
counter go
to zero
   
   How so?  Holding the file pointer means the file won't go away, which
   means the group release function won't be called.  That means the group
   won't go away, but that doesn't mean it's attached to an IOMMU.  A user
   could call UNSET_CONTAINER.
  
  Uhh... *thinks*.  Ah, I see.
  
  I think the interface should not take the group fd, but the container
  fd.  Holding a reference to *that* would keep the necessary things
  around.  But more to the point, it's the right thing semantically:
  
  The container is essentially the handle on a host iommu address space,
  and so that's what should be bound by the KVM call to a particular
  guest iommu address space.  e.g. it would make no sense to bind two
  different groups to different guest iommu address spaces, if they were
  in the same container - the guest thinks they are different spaces,
  but if they're in the same container they must be the same space.
 
 While the container is the gateway to the iommu, what empowers the
 container to maintain an iommu is the group.  What happens to a
 container when all the groups are disconnected or closed?  Groups are
 the unit that indicates hardware access, not containers.  Thanks,

Uh... huh?  I'm really not sure what you're getting at.

The operation we're doing for KVM here is binding a guest iommu
address space to a particular host iommu address space.  Why would we
not want to use the obvious handle on the host iommu address space,
which is the container fd?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgp5yjWbp1NZI.pgp
Description: PGP signature


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-23 Thread Benjamin Herrenschmidt
On Mon, 2013-06-24 at 13:54 +1000, David Gibson wrote:
  DDW means an API by which the guest can request the creation of
  additional iommus for a given device (typically, in addition to the
  default smallish 32-bit one using 4k pages, the guest can request
  a larger window in 64-bit space using a larger page size).
 
 So, would a PAPR gest requesting this expect the new window to have
 a new liobn, or an existing liobn?

New liobn or there is no way to H_PUT_TCE it (it exists in addition
to the legacy window).

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-23 Thread Alex Williamson
On Mon, 2013-06-24 at 13:52 +1000, David Gibson wrote:
 On Sat, Jun 22, 2013 at 08:28:06AM -0600, Alex Williamson wrote:
  On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote:
   On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote:
On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
 On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
  On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
  Just out of curiosity - would not get_file() and fput_atomic() on 
  a
  group's
  file* do the right job instead of vfio_group_add_external_user() 
  and
  vfio_group_del_external_user()?
 
  I was thinking that too.  Grabbing a file reference would 
  certainly be
  the usual way of handling this sort of thing.
  
  But that wouldn't prevent the group ownership to be returned to
  the kernel or another user would it ?
 
 
 Holding the file pointer does not let the group-container_users 
 counter go
 to zero

How so?  Holding the file pointer means the file won't go away, which
means the group release function won't be called.  That means the group
won't go away, but that doesn't mean it's attached to an IOMMU.  A user
could call UNSET_CONTAINER.
   
   Uhh... *thinks*.  Ah, I see.
   
   I think the interface should not take the group fd, but the container
   fd.  Holding a reference to *that* would keep the necessary things
   around.  But more to the point, it's the right thing semantically:
   
   The container is essentially the handle on a host iommu address space,
   and so that's what should be bound by the KVM call to a particular
   guest iommu address space.  e.g. it would make no sense to bind two
   different groups to different guest iommu address spaces, if they were
   in the same container - the guest thinks they are different spaces,
   but if they're in the same container they must be the same space.
  
  While the container is the gateway to the iommu, what empowers the
  container to maintain an iommu is the group.  What happens to a
  container when all the groups are disconnected or closed?  Groups are
  the unit that indicates hardware access, not containers.  Thanks,
 
 Uh... huh?  I'm really not sure what you're getting at.
 
 The operation we're doing for KVM here is binding a guest iommu
 address space to a particular host iommu address space.  Why would we
 not want to use the obvious handle on the host iommu address space,
 which is the container fd?

AIUI, the request isn't for an interface through which to do iommu
mappings.  The request is for an interface to show that the user has
sufficient privileges to do mappings.  Groups are what gives the user
that ability.  The iommu is also possibly associated with multiple iommu
groups and I believe what is being asked for here is a way to hold and
lock a single iommu group with iommu protection.

From a practical point of view, the iommu interface is de-privileged
once the groups are disconnected or closed.  Holding a reference count
on the iommu fd won't prevent that.  That means we'd have to use a
notifier to have KVM stop the side-channel iommu access.  Meanwhile
holding the file descriptor for the group and adding an interface that
bumps use counter allows KVM to lock itself in, just as if it had a
device opened itself.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-23 Thread David Gibson
On Sun, Jun 23, 2013 at 09:28:13AM +1000, Benjamin Herrenschmidt wrote:
 On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote:
  I think the interface should not take the group fd, but the container
  fd.  Holding a reference to *that* would keep the necessary things
  around.  But more to the point, it's the right thing semantically:
  
  The container is essentially the handle on a host iommu address space,
  and so that's what should be bound by the KVM call to a particular
  guest iommu address space.  e.g. it would make no sense to bind two
  different groups to different guest iommu address spaces, if they were
  in the same container - the guest thinks they are different spaces,
  but if they're in the same container they must be the same space.
 
 Interestingly, how are we going to extend that when/if we implement
 DDW ?
 
 DDW means an API by which the guest can request the creation of
 additional iommus for a given device (typically, in addition to the
 default smallish 32-bit one using 4k pages, the guest can request
 a larger window in 64-bit space using a larger page size).

So, would a PAPR gest requesting this expect the new window to have
a new liobn, or an existing liobn?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgp5jkS8m1XZl.pgp
Description: PGP signature


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-23 Thread David Gibson
On Sat, Jun 22, 2013 at 08:28:06AM -0600, Alex Williamson wrote:
 On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote:
  On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote:
   On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
 On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
 Just out of curiosity - would not get_file() and fput_atomic() on a
 group's
 file* do the right job instead of vfio_group_add_external_user() and
 vfio_group_del_external_user()?

 I was thinking that too.  Grabbing a file reference would certainly 
 be
 the usual way of handling this sort of thing.
 
 But that wouldn't prevent the group ownership to be returned to
 the kernel or another user would it ?


Holding the file pointer does not let the group-container_users 
counter go
to zero
   
   How so?  Holding the file pointer means the file won't go away, which
   means the group release function won't be called.  That means the group
   won't go away, but that doesn't mean it's attached to an IOMMU.  A user
   could call UNSET_CONTAINER.
  
  Uhh... *thinks*.  Ah, I see.
  
  I think the interface should not take the group fd, but the container
  fd.  Holding a reference to *that* would keep the necessary things
  around.  But more to the point, it's the right thing semantically:
  
  The container is essentially the handle on a host iommu address space,
  and so that's what should be bound by the KVM call to a particular
  guest iommu address space.  e.g. it would make no sense to bind two
  different groups to different guest iommu address spaces, if they were
  in the same container - the guest thinks they are different spaces,
  but if they're in the same container they must be the same space.
 
 While the container is the gateway to the iommu, what empowers the
 container to maintain an iommu is the group.  What happens to a
 container when all the groups are disconnected or closed?  Groups are
 the unit that indicates hardware access, not containers.  Thanks,

Uh... huh?  I'm really not sure what you're getting at.

The operation we're doing for KVM here is binding a guest iommu
address space to a particular host iommu address space.  Why would we
not want to use the obvious handle on the host iommu address space,
which is the container fd?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpfZESO3eXiq.pgp
Description: PGP signature


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-23 Thread Benjamin Herrenschmidt
On Mon, 2013-06-24 at 13:54 +1000, David Gibson wrote:
  DDW means an API by which the guest can request the creation of
  additional iommus for a given device (typically, in addition to the
  default smallish 32-bit one using 4k pages, the guest can request
  a larger window in 64-bit space using a larger page size).
 
 So, would a PAPR gest requesting this expect the new window to have
 a new liobn, or an existing liobn?

New liobn or there is no way to H_PUT_TCE it (it exists in addition
to the legacy window).

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-23 Thread Alex Williamson
On Mon, 2013-06-24 at 13:52 +1000, David Gibson wrote:
 On Sat, Jun 22, 2013 at 08:28:06AM -0600, Alex Williamson wrote:
  On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote:
   On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote:
On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
 On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
  On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
  Just out of curiosity - would not get_file() and fput_atomic() on 
  a
  group's
  file* do the right job instead of vfio_group_add_external_user() 
  and
  vfio_group_del_external_user()?
 
  I was thinking that too.  Grabbing a file reference would 
  certainly be
  the usual way of handling this sort of thing.
  
  But that wouldn't prevent the group ownership to be returned to
  the kernel or another user would it ?
 
 
 Holding the file pointer does not let the group-container_users 
 counter go
 to zero

How so?  Holding the file pointer means the file won't go away, which
means the group release function won't be called.  That means the group
won't go away, but that doesn't mean it's attached to an IOMMU.  A user
could call UNSET_CONTAINER.
   
   Uhh... *thinks*.  Ah, I see.
   
   I think the interface should not take the group fd, but the container
   fd.  Holding a reference to *that* would keep the necessary things
   around.  But more to the point, it's the right thing semantically:
   
   The container is essentially the handle on a host iommu address space,
   and so that's what should be bound by the KVM call to a particular
   guest iommu address space.  e.g. it would make no sense to bind two
   different groups to different guest iommu address spaces, if they were
   in the same container - the guest thinks they are different spaces,
   but if they're in the same container they must be the same space.
  
  While the container is the gateway to the iommu, what empowers the
  container to maintain an iommu is the group.  What happens to a
  container when all the groups are disconnected or closed?  Groups are
  the unit that indicates hardware access, not containers.  Thanks,
 
 Uh... huh?  I'm really not sure what you're getting at.
 
 The operation we're doing for KVM here is binding a guest iommu
 address space to a particular host iommu address space.  Why would we
 not want to use the obvious handle on the host iommu address space,
 which is the container fd?

AIUI, the request isn't for an interface through which to do iommu
mappings.  The request is for an interface to show that the user has
sufficient privileges to do mappings.  Groups are what gives the user
that ability.  The iommu is also possibly associated with multiple iommu
groups and I believe what is being asked for here is a way to hold and
lock a single iommu group with iommu protection.

From a practical point of view, the iommu interface is de-privileged
once the groups are disconnected or closed.  Holding a reference count
on the iommu fd won't prevent that.  That means we'd have to use a
notifier to have KVM stop the side-channel iommu access.  Meanwhile
holding the file descriptor for the group and adding an interface that
bumps use counter allows KVM to lock itself in, just as if it had a
device opened itself.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-22 Thread Alexey Kardashevskiy
On 06/21/2013 12:55 AM, Alex Williamson wrote:
 On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
 On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
 On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
 Just out of curiosity - would not get_file() and fput_atomic() on a
 group's
 file* do the right job instead of vfio_group_add_external_user() and
 vfio_group_del_external_user()?

 I was thinking that too.  Grabbing a file reference would certainly be
 the usual way of handling this sort of thing.

 But that wouldn't prevent the group ownership to be returned to
 the kernel or another user would it ?


 Holding the file pointer does not let the group-container_users counter go
 to zero
 
 How so?  Holding the file pointer means the file won't go away, which
 means the group release function won't be called.  That means the group
 won't go away, but that doesn't mean it's attached to an IOMMU.  A user
 could call UNSET_CONTAINER.
 
  and this is exactly what vfio_group_add_external_user() and
 vfio_group_del_external_user() do. The difference is only in absolute value
 - 2 vs. 3.

 No change in behaviour whether I use new vfio API or simply hold file* till
 KVM closes fd created when IOMMU was connected to LIOBN.
 
 By that notion you could open(/dev/vfio/$GROUP) and you're safe, right?
 But what about SET_CONTAINER  SET_IOMMU?  All that you guarantee
 holding the file pointer is that the vfio_group exists.
 
 And while this counter is not zero, QEMU cannot take ownership over the 
 group.

 I am definitely still missing the bigger picture...
 
 The bigger picture is that the group needs to exist AND it needs to be
 setup and maintained to have IOMMU protection.  Actually, my first stab
 at add_external_user doesn't look sufficient, it needs to look more like
 vfio_group_get_device_fd, checking group-container-iommu and
 group_viable().


This makes sense. If you did this, that would be great. Without it, I
really cannot see how the proposed inc/dec of container_users is better
than simple holding file*. Thanks.


 As written it would allow an external user after
 SET_CONTAINER without SET_IOMMU.  It should also be part of the API that
 the external user must hold the file reference between add_external_use
 and del_external_user and do cleanup on any exit paths.  Thanks,


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-22 Thread David Gibson
On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote:
 On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
  On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
   On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
   Just out of curiosity - would not get_file() and fput_atomic() on a
   group's
   file* do the right job instead of vfio_group_add_external_user() and
   vfio_group_del_external_user()?
  
   I was thinking that too.  Grabbing a file reference would certainly be
   the usual way of handling this sort of thing.
   
   But that wouldn't prevent the group ownership to be returned to
   the kernel or another user would it ?
  
  
  Holding the file pointer does not let the group-container_users counter go
  to zero
 
 How so?  Holding the file pointer means the file won't go away, which
 means the group release function won't be called.  That means the group
 won't go away, but that doesn't mean it's attached to an IOMMU.  A user
 could call UNSET_CONTAINER.

Uhh... *thinks*.  Ah, I see.

I think the interface should not take the group fd, but the container
fd.  Holding a reference to *that* would keep the necessary things
around.  But more to the point, it's the right thing semantically:

The container is essentially the handle on a host iommu address space,
and so that's what should be bound by the KVM call to a particular
guest iommu address space.  e.g. it would make no sense to bind two
different groups to different guest iommu address spaces, if they were
in the same container - the guest thinks they are different spaces,
but if they're in the same container they must be the same space.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpV98XDgZ24l.pgp
Description: PGP signature


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-22 Thread Alex Williamson
On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote:
 On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote:
  On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
   On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
Just out of curiosity - would not get_file() and fput_atomic() on a
group's
file* do the right job instead of vfio_group_add_external_user() and
vfio_group_del_external_user()?
   
I was thinking that too.  Grabbing a file reference would certainly be
the usual way of handling this sort of thing.

But that wouldn't prevent the group ownership to be returned to
the kernel or another user would it ?
   
   
   Holding the file pointer does not let the group-container_users counter 
   go
   to zero
  
  How so?  Holding the file pointer means the file won't go away, which
  means the group release function won't be called.  That means the group
  won't go away, but that doesn't mean it's attached to an IOMMU.  A user
  could call UNSET_CONTAINER.
 
 Uhh... *thinks*.  Ah, I see.
 
 I think the interface should not take the group fd, but the container
 fd.  Holding a reference to *that* would keep the necessary things
 around.  But more to the point, it's the right thing semantically:
 
 The container is essentially the handle on a host iommu address space,
 and so that's what should be bound by the KVM call to a particular
 guest iommu address space.  e.g. it would make no sense to bind two
 different groups to different guest iommu address spaces, if they were
 in the same container - the guest thinks they are different spaces,
 but if they're in the same container they must be the same space.

While the container is the gateway to the iommu, what empowers the
container to maintain an iommu is the group.  What happens to a
container when all the groups are disconnected or closed?  Groups are
the unit that indicates hardware access, not containers.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-22 Thread Benjamin Herrenschmidt
On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote:
 I think the interface should not take the group fd, but the container
 fd.  Holding a reference to *that* would keep the necessary things
 around.  But more to the point, it's the right thing semantically:
 
 The container is essentially the handle on a host iommu address space,
 and so that's what should be bound by the KVM call to a particular
 guest iommu address space.  e.g. it would make no sense to bind two
 different groups to different guest iommu address spaces, if they were
 in the same container - the guest thinks they are different spaces,
 but if they're in the same container they must be the same space.

Interestingly, how are we going to extend that when/if we implement
DDW ?

DDW means an API by which the guest can request the creation of
additional iommus for a given device (typically, in addition to the
default smallish 32-bit one using 4k pages, the guest can request
a larger window in 64-bit space using a larger page size).

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-22 Thread Alexey Kardashevskiy
On 06/21/2013 12:55 AM, Alex Williamson wrote:
 On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
 On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
 On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
 Just out of curiosity - would not get_file() and fput_atomic() on a
 group's
 file* do the right job instead of vfio_group_add_external_user() and
 vfio_group_del_external_user()?

 I was thinking that too.  Grabbing a file reference would certainly be
 the usual way of handling this sort of thing.

 But that wouldn't prevent the group ownership to be returned to
 the kernel or another user would it ?


 Holding the file pointer does not let the group-container_users counter go
 to zero
 
 How so?  Holding the file pointer means the file won't go away, which
 means the group release function won't be called.  That means the group
 won't go away, but that doesn't mean it's attached to an IOMMU.  A user
 could call UNSET_CONTAINER.
 
  and this is exactly what vfio_group_add_external_user() and
 vfio_group_del_external_user() do. The difference is only in absolute value
 - 2 vs. 3.

 No change in behaviour whether I use new vfio API or simply hold file* till
 KVM closes fd created when IOMMU was connected to LIOBN.
 
 By that notion you could open(/dev/vfio/$GROUP) and you're safe, right?
 But what about SET_CONTAINER  SET_IOMMU?  All that you guarantee
 holding the file pointer is that the vfio_group exists.
 
 And while this counter is not zero, QEMU cannot take ownership over the 
 group.

 I am definitely still missing the bigger picture...
 
 The bigger picture is that the group needs to exist AND it needs to be
 setup and maintained to have IOMMU protection.  Actually, my first stab
 at add_external_user doesn't look sufficient, it needs to look more like
 vfio_group_get_device_fd, checking group-container-iommu and
 group_viable().


This makes sense. If you did this, that would be great. Without it, I
really cannot see how the proposed inc/dec of container_users is better
than simple holding file*. Thanks.


 As written it would allow an external user after
 SET_CONTAINER without SET_IOMMU.  It should also be part of the API that
 the external user must hold the file reference between add_external_use
 and del_external_user and do cleanup on any exit paths.  Thanks,


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-22 Thread David Gibson
On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote:
 On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
  On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
   On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
   Just out of curiosity - would not get_file() and fput_atomic() on a
   group's
   file* do the right job instead of vfio_group_add_external_user() and
   vfio_group_del_external_user()?
  
   I was thinking that too.  Grabbing a file reference would certainly be
   the usual way of handling this sort of thing.
   
   But that wouldn't prevent the group ownership to be returned to
   the kernel or another user would it ?
  
  
  Holding the file pointer does not let the group-container_users counter go
  to zero
 
 How so?  Holding the file pointer means the file won't go away, which
 means the group release function won't be called.  That means the group
 won't go away, but that doesn't mean it's attached to an IOMMU.  A user
 could call UNSET_CONTAINER.

Uhh... *thinks*.  Ah, I see.

I think the interface should not take the group fd, but the container
fd.  Holding a reference to *that* would keep the necessary things
around.  But more to the point, it's the right thing semantically:

The container is essentially the handle on a host iommu address space,
and so that's what should be bound by the KVM call to a particular
guest iommu address space.  e.g. it would make no sense to bind two
different groups to different guest iommu address spaces, if they were
in the same container - the guest thinks they are different spaces,
but if they're in the same container they must be the same space.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgp9ekBfTPFTm.pgp
Description: PGP signature


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-22 Thread Alex Williamson
On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote:
 On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote:
  On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
   On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
Just out of curiosity - would not get_file() and fput_atomic() on a
group's
file* do the right job instead of vfio_group_add_external_user() and
vfio_group_del_external_user()?
   
I was thinking that too.  Grabbing a file reference would certainly be
the usual way of handling this sort of thing.

But that wouldn't prevent the group ownership to be returned to
the kernel or another user would it ?
   
   
   Holding the file pointer does not let the group-container_users counter 
   go
   to zero
  
  How so?  Holding the file pointer means the file won't go away, which
  means the group release function won't be called.  That means the group
  won't go away, but that doesn't mean it's attached to an IOMMU.  A user
  could call UNSET_CONTAINER.
 
 Uhh... *thinks*.  Ah, I see.
 
 I think the interface should not take the group fd, but the container
 fd.  Holding a reference to *that* would keep the necessary things
 around.  But more to the point, it's the right thing semantically:
 
 The container is essentially the handle on a host iommu address space,
 and so that's what should be bound by the KVM call to a particular
 guest iommu address space.  e.g. it would make no sense to bind two
 different groups to different guest iommu address spaces, if they were
 in the same container - the guest thinks they are different spaces,
 but if they're in the same container they must be the same space.

While the container is the gateway to the iommu, what empowers the
container to maintain an iommu is the group.  What happens to a
container when all the groups are disconnected or closed?  Groups are
the unit that indicates hardware access, not containers.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-22 Thread Benjamin Herrenschmidt
On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote:
 I think the interface should not take the group fd, but the container
 fd.  Holding a reference to *that* would keep the necessary things
 around.  But more to the point, it's the right thing semantically:
 
 The container is essentially the handle on a host iommu address space,
 and so that's what should be bound by the KVM call to a particular
 guest iommu address space.  e.g. it would make no sense to bind two
 different groups to different guest iommu address spaces, if they were
 in the same container - the guest thinks they are different spaces,
 but if they're in the same container they must be the same space.

Interestingly, how are we going to extend that when/if we implement
DDW ?

DDW means an API by which the guest can request the creation of
additional iommus for a given device (typically, in addition to the
default smallish 32-bit one using 4k pages, the guest can request
a larger window in 64-bit space using a larger page size).

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-20 Thread David Gibson
On Thu, Jun 20, 2013 at 02:58:18PM +1000, Alexey Kardashevskiy wrote:
 On 06/20/2013 01:49 AM, Alex Williamson wrote:
  On Thu, 2013-06-20 at 00:50 +1000, Benjamin Herrenschmidt wrote:
  On Wed, 2013-06-19 at 11:58 +0200, Alexander Graf wrote:
 
  Alex, any objection ?
 
  Which Alex? :)
 
  Heh, mostly Williamson in this specific case but your input is still
  welcome :-)
 
  I think validate works, it keeps iteration logic out of the kernel
  which is a good thing. There still needs to be an interface for
  getting the iommu id in VFIO, but I suppose that one's for the other
  Alex and Jörg to comment on.
 
  I think getting the iommu fd is already covered by separate patches from
  Alexey.
 
 
  Do we need to make it a get/put interface instead ?
 
   vfio_validate_and_use_iommu(file, iommu_id);
 
   vfio_release_iommu(file, iommu_id);
 
  To ensure that the resource remains owned by the process until KVM
  is closed as well ?
 
  Or do we want to register with VFIO with a callback so that VFIO can
  call us if it needs us to give it up ?
 
  Can't we just register a handler on the fd and get notified when it
  closes? Can you kill VFIO access without closing the fd?
 
  That sounds actually harder :-)
 
  The question is basically: When we validate that relationship between a
  specific VFIO struct file with an iommu, what is the lifetime of that
  and how do we handle this lifetime properly.
 
  There's two ways for that sort of situation: The notification model
  where we get notified when the relationship is broken, and the refcount
  model where we become a user and thus delay the breaking of the
  relationship until we have been disposed of as well.
 
  In this specific case, it's hard to tell what is the right model from my
  perspective, which is why I would welcome Alex (W.) input.
 
  In the end, the solution will end up being in the form of APIs exposed
  by VFIO for use by KVM (via that symbol lookup mechanism) so Alex (W),
  as owner of VFIO at this stage, what do you want those to look
  like ? :-)
  
  My first thought is that we should use the same reference counting as we
  have for vfio devices (group-container_users).  An interface for that
  might look like:
  
  int vfio_group_add_external_user(struct file *filep)
  {
  struct vfio_group *group = filep-private_data;
  
  if (filep-f_op != vfio_group_fops)
  return -EINVAL;
  
  
  if (!atomic_inc_not_zero(group-container_users))
  return -EINVAL;
  
  return 0;
  }
  
  void vfio_group_del_external_user(struct file *filep)
  {
  struct vfio_group *group = filep-private_data;
  
  BUG_ON(filep-f_op != vfio_group_fops);
  
  vfio_group_try_dissolve_container(group);
  }
  
  int vfio_group_iommu_id_from_file(struct file *filep)
  {
  struct vfio_group *group = filep-private_data;
  
  BUG_ON(filep-f_op != vfio_group_fops);
  
  return iommu_group_id(group-iommu_group);
  }
  
  Would that work?  Thanks,
 
 
 Just out of curiosity - would not get_file() and fput_atomic() on a group's
 file* do the right job instead of vfio_group_add_external_user() and
 vfio_group_del_external_user()?

I was thinking that too.  Grabbing a file reference would certainly be
the usual way of handling this sort of thing.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpREe0KCyiw8.pgp
Description: PGP signature


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-20 Thread Benjamin Herrenschmidt
On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
  Just out of curiosity - would not get_file() and fput_atomic() on a
 group's
  file* do the right job instead of vfio_group_add_external_user() and
  vfio_group_del_external_user()?
 
 I was thinking that too.  Grabbing a file reference would certainly be
 the usual way of handling this sort of thing.

But that wouldn't prevent the group ownership to be returned to
the kernel or another user would it ?

Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-20 Thread Alexey Kardashevskiy
On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
 On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
 Just out of curiosity - would not get_file() and fput_atomic() on a
 group's
 file* do the right job instead of vfio_group_add_external_user() and
 vfio_group_del_external_user()?

 I was thinking that too.  Grabbing a file reference would certainly be
 the usual way of handling this sort of thing.
 
 But that wouldn't prevent the group ownership to be returned to
 the kernel or another user would it ?


Holding the file pointer does not let the group-container_users counter go
to zero and this is exactly what vfio_group_add_external_user() and
vfio_group_del_external_user() do. The difference is only in absolute value
- 2 vs. 3.

No change in behaviour whether I use new vfio API or simply hold file* till
KVM closes fd created when IOMMU was connected to LIOBN.

And while this counter is not zero, QEMU cannot take ownership over the group.

I am definitely still missing the bigger picture...


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-20 Thread Alex Williamson
On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
 On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
  On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
  Just out of curiosity - would not get_file() and fput_atomic() on a
  group's
  file* do the right job instead of vfio_group_add_external_user() and
  vfio_group_del_external_user()?
 
  I was thinking that too.  Grabbing a file reference would certainly be
  the usual way of handling this sort of thing.
  
  But that wouldn't prevent the group ownership to be returned to
  the kernel or another user would it ?
 
 
 Holding the file pointer does not let the group-container_users counter go
 to zero

How so?  Holding the file pointer means the file won't go away, which
means the group release function won't be called.  That means the group
won't go away, but that doesn't mean it's attached to an IOMMU.  A user
could call UNSET_CONTAINER.

  and this is exactly what vfio_group_add_external_user() and
 vfio_group_del_external_user() do. The difference is only in absolute value
 - 2 vs. 3.
 
 No change in behaviour whether I use new vfio API or simply hold file* till
 KVM closes fd created when IOMMU was connected to LIOBN.

By that notion you could open(/dev/vfio/$GROUP) and you're safe, right?
But what about SET_CONTAINER  SET_IOMMU?  All that you guarantee
holding the file pointer is that the vfio_group exists.

 And while this counter is not zero, QEMU cannot take ownership over the group.

 I am definitely still missing the bigger picture...

The bigger picture is that the group needs to exist AND it needs to be
setup and maintained to have IOMMU protection.  Actually, my first stab
at add_external_user doesn't look sufficient, it needs to look more like
vfio_group_get_device_fd, checking group-container-iommu and
group_viable().  As written it would allow an external user after
SET_CONTAINER without SET_IOMMU.  It should also be part of the API that
the external user must hold the file reference between add_external_use
and del_external_user and do cleanup on any exit paths.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-20 Thread Benjamin Herrenschmidt
On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
  Just out of curiosity - would not get_file() and fput_atomic() on a
 group's
  file* do the right job instead of vfio_group_add_external_user() and
  vfio_group_del_external_user()?
 
 I was thinking that too.  Grabbing a file reference would certainly be
 the usual way of handling this sort of thing.

But that wouldn't prevent the group ownership to be returned to
the kernel or another user would it ?

Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-20 Thread Alexey Kardashevskiy
On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
 On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
 Just out of curiosity - would not get_file() and fput_atomic() on a
 group's
 file* do the right job instead of vfio_group_add_external_user() and
 vfio_group_del_external_user()?

 I was thinking that too.  Grabbing a file reference would certainly be
 the usual way of handling this sort of thing.
 
 But that wouldn't prevent the group ownership to be returned to
 the kernel or another user would it ?


Holding the file pointer does not let the group-container_users counter go
to zero and this is exactly what vfio_group_add_external_user() and
vfio_group_del_external_user() do. The difference is only in absolute value
- 2 vs. 3.

No change in behaviour whether I use new vfio API or simply hold file* till
KVM closes fd created when IOMMU was connected to LIOBN.

And while this counter is not zero, QEMU cannot take ownership over the group.

I am definitely still missing the bigger picture...


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-20 Thread Alex Williamson
On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
 On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
  On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
  Just out of curiosity - would not get_file() and fput_atomic() on a
  group's
  file* do the right job instead of vfio_group_add_external_user() and
  vfio_group_del_external_user()?
 
  I was thinking that too.  Grabbing a file reference would certainly be
  the usual way of handling this sort of thing.
  
  But that wouldn't prevent the group ownership to be returned to
  the kernel or another user would it ?
 
 
 Holding the file pointer does not let the group-container_users counter go
 to zero

How so?  Holding the file pointer means the file won't go away, which
means the group release function won't be called.  That means the group
won't go away, but that doesn't mean it's attached to an IOMMU.  A user
could call UNSET_CONTAINER.

  and this is exactly what vfio_group_add_external_user() and
 vfio_group_del_external_user() do. The difference is only in absolute value
 - 2 vs. 3.
 
 No change in behaviour whether I use new vfio API or simply hold file* till
 KVM closes fd created when IOMMU was connected to LIOBN.

By that notion you could open(/dev/vfio/$GROUP) and you're safe, right?
But what about SET_CONTAINER  SET_IOMMU?  All that you guarantee
holding the file pointer is that the vfio_group exists.

 And while this counter is not zero, QEMU cannot take ownership over the group.

 I am definitely still missing the bigger picture...

The bigger picture is that the group needs to exist AND it needs to be
setup and maintained to have IOMMU protection.  Actually, my first stab
at add_external_user doesn't look sufficient, it needs to look more like
vfio_group_get_device_fd, checking group-container-iommu and
group_viable().  As written it would allow an external user after
SET_CONTAINER without SET_IOMMU.  It should also be part of the API that
the external user must hold the file reference between add_external_use
and del_external_user and do cleanup on any exit paths.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-19 Thread Alexander Graf

On 19.06.2013, at 06:59, Benjamin Herrenschmidt wrote:

 On Wed, 2013-06-19 at 13:05 +0930, Rusty Russell wrote:
 symbol_get() won't try to load a module; it'll just fail.  This is what
 you want, since they must have vfio in the kernel to get a valid fd...
 
 Ok, cool. I suppose what we want here Alexey is slightly higher level,
 something like:
 
   vfio_validate_iommu_id(file, iommu_id)
 
 Which verifies that the file that was passed in is allowed to use
 that iommu_id.
 
 That's a simple and flexible interface (ie, it will work even if we
 support multiple iommu IDs in the future for a vfio, for example
 for DDW windows etc...), the logic to know about the ID remains
 in qemu, this is strictly a validation call.
 
 That way we also don't have to expose the containing vfio struct etc...
 just that simple function.
 
 Alex, any objection ?

Which Alex? :)

I think validate works, it keeps iteration logic out of the kernel which is a 
good thing. There still needs to be an interface for getting the iommu id in 
VFIO, but I suppose that one's for the other Alex and Jörg to comment on.

 
 Do we need to make it a get/put interface instead ?
 
   vfio_validate_and_use_iommu(file, iommu_id);
 
   vfio_release_iommu(file, iommu_id);
 
 To ensure that the resource remains owned by the process until KVM
 is closed as well ?
 
 Or do we want to register with VFIO with a callback so that VFIO can
 call us if it needs us to give it up ?

Can't we just register a handler on the fd and get notified when it closes? Can 
you kill VFIO access without closing the fd?


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-19 Thread Benjamin Herrenschmidt
On Wed, 2013-06-19 at 11:58 +0200, Alexander Graf wrote:

  Alex, any objection ?
 
 Which Alex? :)

Heh, mostly Williamson in this specific case but your input is still
welcome :-)

 I think validate works, it keeps iteration logic out of the kernel
 which is a good thing. There still needs to be an interface for
 getting the iommu id in VFIO, but I suppose that one's for the other
 Alex and Jörg to comment on.

I think getting the iommu fd is already covered by separate patches from
Alexey.

  
  Do we need to make it a get/put interface instead ?
  
  vfio_validate_and_use_iommu(file, iommu_id);
  
  vfio_release_iommu(file, iommu_id);
  
  To ensure that the resource remains owned by the process until KVM
  is closed as well ?
  
  Or do we want to register with VFIO with a callback so that VFIO can
  call us if it needs us to give it up ?
 
 Can't we just register a handler on the fd and get notified when it
 closes? Can you kill VFIO access without closing the fd?

That sounds actually harder :-)

The question is basically: When we validate that relationship between a
specific VFIO struct file with an iommu, what is the lifetime of that
and how do we handle this lifetime properly.

There's two ways for that sort of situation: The notification model
where we get notified when the relationship is broken, and the refcount
model where we become a user and thus delay the breaking of the
relationship until we have been disposed of as well.

In this specific case, it's hard to tell what is the right model from my
perspective, which is why I would welcome Alex (W.) input.

In the end, the solution will end up being in the form of APIs exposed
by VFIO for use by KVM (via that symbol lookup mechanism) so Alex (W),
as owner of VFIO at this stage, what do you want those to look
like ? :-)

Cheers,
Ben.
 
 
 Alex
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm-ppc in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-19 Thread Alex Williamson
On Thu, 2013-06-20 at 00:50 +1000, Benjamin Herrenschmidt wrote:
 On Wed, 2013-06-19 at 11:58 +0200, Alexander Graf wrote:
 
   Alex, any objection ?
  
  Which Alex? :)
 
 Heh, mostly Williamson in this specific case but your input is still
 welcome :-)
 
  I think validate works, it keeps iteration logic out of the kernel
  which is a good thing. There still needs to be an interface for
  getting the iommu id in VFIO, but I suppose that one's for the other
  Alex and Jörg to comment on.
 
 I think getting the iommu fd is already covered by separate patches from
 Alexey.
 
   
   Do we need to make it a get/put interface instead ?
   
 vfio_validate_and_use_iommu(file, iommu_id);
   
 vfio_release_iommu(file, iommu_id);
   
   To ensure that the resource remains owned by the process until KVM
   is closed as well ?
   
   Or do we want to register with VFIO with a callback so that VFIO can
   call us if it needs us to give it up ?
  
  Can't we just register a handler on the fd and get notified when it
  closes? Can you kill VFIO access without closing the fd?
 
 That sounds actually harder :-)
 
 The question is basically: When we validate that relationship between a
 specific VFIO struct file with an iommu, what is the lifetime of that
 and how do we handle this lifetime properly.
 
 There's two ways for that sort of situation: The notification model
 where we get notified when the relationship is broken, and the refcount
 model where we become a user and thus delay the breaking of the
 relationship until we have been disposed of as well.
 
 In this specific case, it's hard to tell what is the right model from my
 perspective, which is why I would welcome Alex (W.) input.
 
 In the end, the solution will end up being in the form of APIs exposed
 by VFIO for use by KVM (via that symbol lookup mechanism) so Alex (W),
 as owner of VFIO at this stage, what do you want those to look
 like ? :-)

My first thought is that we should use the same reference counting as we
have for vfio devices (group-container_users).  An interface for that
might look like:

int vfio_group_add_external_user(struct file *filep)
{
struct vfio_group *group = filep-private_data;

if (filep-f_op != vfio_group_fops)
return -EINVAL;


if (!atomic_inc_not_zero(group-container_users))
return -EINVAL;

return 0;
}

void vfio_group_del_external_user(struct file *filep)
{
struct vfio_group *group = filep-private_data;

BUG_ON(filep-f_op != vfio_group_fops);

vfio_group_try_dissolve_container(group);
}

int vfio_group_iommu_id_from_file(struct file *filep)
{
struct vfio_group *group = filep-private_data;

BUG_ON(filep-f_op != vfio_group_fops);

return iommu_group_id(group-iommu_group);
}

Would that work?  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-19 Thread Alexey Kardashevskiy
On 06/20/2013 01:49 AM, Alex Williamson wrote:
 On Thu, 2013-06-20 at 00:50 +1000, Benjamin Herrenschmidt wrote:
 On Wed, 2013-06-19 at 11:58 +0200, Alexander Graf wrote:

 Alex, any objection ?

 Which Alex? :)

 Heh, mostly Williamson in this specific case but your input is still
 welcome :-)

 I think validate works, it keeps iteration logic out of the kernel
 which is a good thing. There still needs to be an interface for
 getting the iommu id in VFIO, but I suppose that one's for the other
 Alex and Jörg to comment on.

 I think getting the iommu fd is already covered by separate patches from
 Alexey.


 Do we need to make it a get/put interface instead ?

vfio_validate_and_use_iommu(file, iommu_id);

vfio_release_iommu(file, iommu_id);

 To ensure that the resource remains owned by the process until KVM
 is closed as well ?

 Or do we want to register with VFIO with a callback so that VFIO can
 call us if it needs us to give it up ?

 Can't we just register a handler on the fd and get notified when it
 closes? Can you kill VFIO access without closing the fd?

 That sounds actually harder :-)

 The question is basically: When we validate that relationship between a
 specific VFIO struct file with an iommu, what is the lifetime of that
 and how do we handle this lifetime properly.

 There's two ways for that sort of situation: The notification model
 where we get notified when the relationship is broken, and the refcount
 model where we become a user and thus delay the breaking of the
 relationship until we have been disposed of as well.

 In this specific case, it's hard to tell what is the right model from my
 perspective, which is why I would welcome Alex (W.) input.

 In the end, the solution will end up being in the form of APIs exposed
 by VFIO for use by KVM (via that symbol lookup mechanism) so Alex (W),
 as owner of VFIO at this stage, what do you want those to look
 like ? :-)
 
 My first thought is that we should use the same reference counting as we
 have for vfio devices (group-container_users).  An interface for that
 might look like:
 
 int vfio_group_add_external_user(struct file *filep)
 {
   struct vfio_group *group = filep-private_data;
 
   if (filep-f_op != vfio_group_fops)
   return -EINVAL;
 
 
   if (!atomic_inc_not_zero(group-container_users))
   return -EINVAL;
 
   return 0;
 }
 
 void vfio_group_del_external_user(struct file *filep)
 {
   struct vfio_group *group = filep-private_data;
 
   BUG_ON(filep-f_op != vfio_group_fops);
 
   vfio_group_try_dissolve_container(group);
 }
 
 int vfio_group_iommu_id_from_file(struct file *filep)
 {
   struct vfio_group *group = filep-private_data;
 
   BUG_ON(filep-f_op != vfio_group_fops);
 
   return iommu_group_id(group-iommu_group);
 }
 
 Would that work?  Thanks,


Just out of curiosity - would not get_file() and fput_atomic() on a group's
file* do the right job instead of vfio_group_add_external_user() and
vfio_group_del_external_user()?



-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-19 Thread Alexander Graf

On 19.06.2013, at 06:59, Benjamin Herrenschmidt wrote:

 On Wed, 2013-06-19 at 13:05 +0930, Rusty Russell wrote:
 symbol_get() won't try to load a module; it'll just fail.  This is what
 you want, since they must have vfio in the kernel to get a valid fd...
 
 Ok, cool. I suppose what we want here Alexey is slightly higher level,
 something like:
 
   vfio_validate_iommu_id(file, iommu_id)
 
 Which verifies that the file that was passed in is allowed to use
 that iommu_id.
 
 That's a simple and flexible interface (ie, it will work even if we
 support multiple iommu IDs in the future for a vfio, for example
 for DDW windows etc...), the logic to know about the ID remains
 in qemu, this is strictly a validation call.
 
 That way we also don't have to expose the containing vfio struct etc...
 just that simple function.
 
 Alex, any objection ?

Which Alex? :)

I think validate works, it keeps iteration logic out of the kernel which is a 
good thing. There still needs to be an interface for getting the iommu id in 
VFIO, but I suppose that one's for the other Alex and Jörg to comment on.

 
 Do we need to make it a get/put interface instead ?
 
   vfio_validate_and_use_iommu(file, iommu_id);
 
   vfio_release_iommu(file, iommu_id);
 
 To ensure that the resource remains owned by the process until KVM
 is closed as well ?
 
 Or do we want to register with VFIO with a callback so that VFIO can
 call us if it needs us to give it up ?

Can't we just register a handler on the fd and get notified when it closes? Can 
you kill VFIO access without closing the fd?


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-19 Thread Benjamin Herrenschmidt
On Wed, 2013-06-19 at 11:58 +0200, Alexander Graf wrote:

  Alex, any objection ?
 
 Which Alex? :)

Heh, mostly Williamson in this specific case but your input is still
welcome :-)

 I think validate works, it keeps iteration logic out of the kernel
 which is a good thing. There still needs to be an interface for
 getting the iommu id in VFIO, but I suppose that one's for the other
 Alex and Jörg to comment on.

I think getting the iommu fd is already covered by separate patches from
Alexey.

  
  Do we need to make it a get/put interface instead ?
  
  vfio_validate_and_use_iommu(file, iommu_id);
  
  vfio_release_iommu(file, iommu_id);
  
  To ensure that the resource remains owned by the process until KVM
  is closed as well ?
  
  Or do we want to register with VFIO with a callback so that VFIO can
  call us if it needs us to give it up ?
 
 Can't we just register a handler on the fd and get notified when it
 closes? Can you kill VFIO access without closing the fd?

That sounds actually harder :-)

The question is basically: When we validate that relationship between a
specific VFIO struct file with an iommu, what is the lifetime of that
and how do we handle this lifetime properly.

There's two ways for that sort of situation: The notification model
where we get notified when the relationship is broken, and the refcount
model where we become a user and thus delay the breaking of the
relationship until we have been disposed of as well.

In this specific case, it's hard to tell what is the right model from my
perspective, which is why I would welcome Alex (W.) input.

In the end, the solution will end up being in the form of APIs exposed
by VFIO for use by KVM (via that symbol lookup mechanism) so Alex (W),
as owner of VFIO at this stage, what do you want those to look
like ? :-)

Cheers,
Ben.
 
 
 Alex
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm-ppc in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-19 Thread Alex Williamson
On Thu, 2013-06-20 at 00:50 +1000, Benjamin Herrenschmidt wrote:
 On Wed, 2013-06-19 at 11:58 +0200, Alexander Graf wrote:
 
   Alex, any objection ?
  
  Which Alex? :)
 
 Heh, mostly Williamson in this specific case but your input is still
 welcome :-)
 
  I think validate works, it keeps iteration logic out of the kernel
  which is a good thing. There still needs to be an interface for
  getting the iommu id in VFIO, but I suppose that one's for the other
  Alex and Jörg to comment on.
 
 I think getting the iommu fd is already covered by separate patches from
 Alexey.
 
   
   Do we need to make it a get/put interface instead ?
   
 vfio_validate_and_use_iommu(file, iommu_id);
   
 vfio_release_iommu(file, iommu_id);
   
   To ensure that the resource remains owned by the process until KVM
   is closed as well ?
   
   Or do we want to register with VFIO with a callback so that VFIO can
   call us if it needs us to give it up ?
  
  Can't we just register a handler on the fd and get notified when it
  closes? Can you kill VFIO access without closing the fd?
 
 That sounds actually harder :-)
 
 The question is basically: When we validate that relationship between a
 specific VFIO struct file with an iommu, what is the lifetime of that
 and how do we handle this lifetime properly.
 
 There's two ways for that sort of situation: The notification model
 where we get notified when the relationship is broken, and the refcount
 model where we become a user and thus delay the breaking of the
 relationship until we have been disposed of as well.
 
 In this specific case, it's hard to tell what is the right model from my
 perspective, which is why I would welcome Alex (W.) input.
 
 In the end, the solution will end up being in the form of APIs exposed
 by VFIO for use by KVM (via that symbol lookup mechanism) so Alex (W),
 as owner of VFIO at this stage, what do you want those to look
 like ? :-)

My first thought is that we should use the same reference counting as we
have for vfio devices (group-container_users).  An interface for that
might look like:

int vfio_group_add_external_user(struct file *filep)
{
struct vfio_group *group = filep-private_data;

if (filep-f_op != vfio_group_fops)
return -EINVAL;


if (!atomic_inc_not_zero(group-container_users))
return -EINVAL;

return 0;
}

void vfio_group_del_external_user(struct file *filep)
{
struct vfio_group *group = filep-private_data;

BUG_ON(filep-f_op != vfio_group_fops);

vfio_group_try_dissolve_container(group);
}

int vfio_group_iommu_id_from_file(struct file *filep)
{
struct vfio_group *group = filep-private_data;

BUG_ON(filep-f_op != vfio_group_fops);

return iommu_group_id(group-iommu_group);
}

Would that work?  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-19 Thread Alexey Kardashevskiy
On 06/20/2013 01:49 AM, Alex Williamson wrote:
 On Thu, 2013-06-20 at 00:50 +1000, Benjamin Herrenschmidt wrote:
 On Wed, 2013-06-19 at 11:58 +0200, Alexander Graf wrote:

 Alex, any objection ?

 Which Alex? :)

 Heh, mostly Williamson in this specific case but your input is still
 welcome :-)

 I think validate works, it keeps iteration logic out of the kernel
 which is a good thing. There still needs to be an interface for
 getting the iommu id in VFIO, but I suppose that one's for the other
 Alex and Jörg to comment on.

 I think getting the iommu fd is already covered by separate patches from
 Alexey.


 Do we need to make it a get/put interface instead ?

vfio_validate_and_use_iommu(file, iommu_id);

vfio_release_iommu(file, iommu_id);

 To ensure that the resource remains owned by the process until KVM
 is closed as well ?

 Or do we want to register with VFIO with a callback so that VFIO can
 call us if it needs us to give it up ?

 Can't we just register a handler on the fd and get notified when it
 closes? Can you kill VFIO access without closing the fd?

 That sounds actually harder :-)

 The question is basically: When we validate that relationship between a
 specific VFIO struct file with an iommu, what is the lifetime of that
 and how do we handle this lifetime properly.

 There's two ways for that sort of situation: The notification model
 where we get notified when the relationship is broken, and the refcount
 model where we become a user and thus delay the breaking of the
 relationship until we have been disposed of as well.

 In this specific case, it's hard to tell what is the right model from my
 perspective, which is why I would welcome Alex (W.) input.

 In the end, the solution will end up being in the form of APIs exposed
 by VFIO for use by KVM (via that symbol lookup mechanism) so Alex (W),
 as owner of VFIO at this stage, what do you want those to look
 like ? :-)
 
 My first thought is that we should use the same reference counting as we
 have for vfio devices (group-container_users).  An interface for that
 might look like:
 
 int vfio_group_add_external_user(struct file *filep)
 {
   struct vfio_group *group = filep-private_data;
 
   if (filep-f_op != vfio_group_fops)
   return -EINVAL;
 
 
   if (!atomic_inc_not_zero(group-container_users))
   return -EINVAL;
 
   return 0;
 }
 
 void vfio_group_del_external_user(struct file *filep)
 {
   struct vfio_group *group = filep-private_data;
 
   BUG_ON(filep-f_op != vfio_group_fops);
 
   vfio_group_try_dissolve_container(group);
 }
 
 int vfio_group_iommu_id_from_file(struct file *filep)
 {
   struct vfio_group *group = filep-private_data;
 
   BUG_ON(filep-f_op != vfio_group_fops);
 
   return iommu_group_id(group-iommu_group);
 }
 
 Would that work?  Thanks,


Just out of curiosity - would not get_file() and fput_atomic() on a group's
file* do the right job instead of vfio_group_add_external_user() and
vfio_group_del_external_user()?



-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Alex Williamson
On Tue, 2013-06-18 at 14:38 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote:
 
  Right, we don't want to create dependencies across modules.  I don't
  have a vision for how this should work.  This is effectively a complete
  side-band to vfio, so we're really just dealing in the iommu group
  space.  Maybe there needs to be some kind of registration of ownership
  for the group using some kind of token.  It would need to include some
  kind of notification when that ownership ends.  That might also be a
  convenient tag to toggle driver probing off for devices in the group.
  Other ideas?  Thanks,
 
 All of that smells nasty like it will need a pile of bloody
 infrastructure which makes me think it's too complicated and not the
 right approach.
 
 How does access control work today on x86/VFIO ? Can you give me a bit
 more details ? I didn't get a good grasp in your previous email

The current model is not x86 specific, but it only covers doing iommu
and device access through vfio.  The kink here is that we're trying to
do device access and setup through vfio, but iommu manipulation through
kvm.  We may want to revisit whether we can do the in-kernel iommu
manipulation through vfio rather than kvm.

For vfio in general, the group is the unit of ownership.  A user is
granted access to /dev/vfio/$GROUP through file permissions.  The user
opens the group and a container (/dev/vfio/vfio) and calls SET_CONTAINER
on the group.  If supported by the platform, multiple groups can be set
to the same container, which allows for iommu domain sharing.  Once a
group is associated with a container, an iommu backend can be
initialized for the container.  Only then can a device be accessed
through the group.

So even if we were to pass a vfio group file descriptor into kvm and it
matched as some kind of ownership token on the iommu group, it's not
clear that's sufficient to assume we can start programming the iommu.
Thanks,

Alex

 From the look of it, the VFIO file descriptor is what has the access
 control to the underlying iommu, is this right ? So we somewhat need to
 transfer (or copy) that ownership from the VFIO fd to the KVM VM.
 
 I don't see a way to do that without some cross-layering here...
 
 Rusty, are you aware of some kernel mechanism we can use for that ?
 
 Cheers,
 Ben.
 
 



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Benjamin Herrenschmidt
On Tue, 2013-06-18 at 08:48 -0600, Alex Williamson wrote:
 On Tue, 2013-06-18 at 14:38 +1000, Benjamin Herrenschmidt wrote:
  On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote:
  
   Right, we don't want to create dependencies across modules.  I don't
   have a vision for how this should work.  This is effectively a complete
   side-band to vfio, so we're really just dealing in the iommu group
   space.  Maybe there needs to be some kind of registration of ownership
   for the group using some kind of token.  It would need to include some
   kind of notification when that ownership ends.  That might also be a
   convenient tag to toggle driver probing off for devices in the group.
   Other ideas?  Thanks,
  
  All of that smells nasty like it will need a pile of bloody
  infrastructure which makes me think it's too complicated and not the
  right approach.
  
  How does access control work today on x86/VFIO ? Can you give me a bit
  more details ? I didn't get a good grasp in your previous email
 
 The current model is not x86 specific, but it only covers doing iommu
 and device access through vfio.  The kink here is that we're trying to
 do device access and setup through vfio, but iommu manipulation through
 kvm.  We may want to revisit whether we can do the in-kernel iommu
 manipulation through vfio rather than kvm.

How would that be possible ?

The hypercalls from the guest arrive in KVM... in a very very specific 
restricted environment which we call real mode (MMU off but still
running in guest context), where we try to do as much as possible, or in
virtual mode, where they get handled as normal KVM exits.

The only way we could handle them in VFIO would be if somewhat VFIO
registered callbacks with KVM... if we have that sort of
cross-dependency, then we may as well have a simpler one where VFIO
tells KVM what iommu is available for the VM

 For vfio in general, the group is the unit of ownership.  A user is
 granted access to /dev/vfio/$GROUP through file permissions.  The user
 opens the group and a container (/dev/vfio/vfio) and calls SET_CONTAINER
 on the group.  If supported by the platform, multiple groups can be set
 to the same container, which allows for iommu domain sharing.  Once a
 group is associated with a container, an iommu backend can be
 initialized for the container.  Only then can a device be accessed
 through the group.
 
 So even if we were to pass a vfio group file descriptor into kvm and it
 matched as some kind of ownership token on the iommu group, it's not
 clear that's sufficient to assume we can start programming the iommu.
 Thanks,

Your scheme seems to me that it would have the same problem if you
wanted to do virtualized iommu

In any case, this is a big deal. We have a requirement for pass-through.
It cannot work with any remotely usable performance level if we don't
implement the calls in KVM, so it needs to be sorted one way or another
and I'm at a loss how here...

Ben.

 Alex
 
  From the look of it, the VFIO file descriptor is what has the access
  control to the underlying iommu, is this right ? So we somewhat need to
  transfer (or copy) that ownership from the VFIO fd to the KVM VM.
  
  I don't see a way to do that without some cross-layering here...
  
  Rusty, are you aware of some kernel mechanism we can use for that ?
  
  Cheers,
  Ben.
  
  
 
 
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm-ppc in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Alexey Kardashevskiy
On 06/16/2013 02:39 PM, Benjamin Herrenschmidt wrote:
  static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool 
 writing,
 -unsigned long *pte_sizep)
 +unsigned long *pte_sizep, bool do_get_page)
  {
  pte_t *ptep;
  unsigned int shift = 0;
 @@ -135,6 +136,14 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned 
 long hva, bool writing,
  if (!pte_present(*ptep))
  return __pte(0);
  
 +/*
 + * Put huge pages handling to the virtual mode.
 + * The only exception is for TCE list pages which we
 + * do need to call get_page() for.
 + */
 +if ((*pte_sizep  PAGE_SIZE)  do_get_page)
 +return __pte(0);
 +
  /* wait until _PAGE_BUSY is clear then set it atomically */
  __asm__ __volatile__ (
  1: ldarx   %0,0,%3\n
 @@ -148,6 +157,18 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned 
 long hva, bool writing,
  : cc);
  
  ret = pte;
 +if (do_get_page  pte_present(pte)  (!writing || pte_write(pte))) {
 +struct page *pg = NULL;
 +pg = realmode_pfn_to_page(pte_pfn(pte));
 +if (realmode_get_page(pg)) {
 +ret = __pte(0);
 +} else {
 +pte = pte_mkyoung(pte);
 +if (writing)
 +pte = pte_mkdirty(pte);
 +}
 +}
 +*ptep = pte;/* clears _PAGE_BUSY */
  
  return ret;
  }
 
 So now you are adding the clearing of _PAGE_BUSY that was missing for
 your first patch, except that this is not enough since that means that
 in the emulated case (ie, !do_get_page) you will in essence return
 and then use a PTE that is not locked without any synchronization to
 ensure that the underlying page doesn't go away... then you'll
 dereference that page.
 
 So either make everything use speculative get_page, or make the emulated
 case use the MMU notifier to drop the operation in case of collision.
 
 The former looks easier.
 
 Also, any specific reason why you do:
 
   - Lock the PTE
   - get_page()
   - Unlock the PTE
 
 Instead of
 
   - Read the PTE
   - get_page_unless_zero
   - re-check PTE
 
 Like get_user_pages_fast() does ?
 
 The former will be two atomic ops, the latter only one (faster), but
 maybe you have a good reason why that can't work...



If we want to set dirty and young bits for pte then I do not know how
to avoid _PAGE_BUSY.



-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Rusty Russell
Alex Williamson alex.william...@redhat.com writes:
 On Mon, 2013-06-17 at 13:56 +1000, Benjamin Herrenschmidt wrote:
 On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote:
 
  IOMMU groups themselves don't provide security, they're accessed by
  interfaces like VFIO, which provide the security.  Given a brief look, I
  agree, this looks like a possible backdoor.  The typical VFIO way to
  handle this would be to pass a VFIO file descriptor here to prove that
  the process has access to the IOMMU group.  This is how /dev/vfio/vfio
  gains the ability to setup an IOMMU domain an do mappings with the
  SET_CONTAINER ioctl using a group fd.  Thanks,
 
 How do you envision that in the kernel ? IE. I'm in KVM code, gets that
 vfio fd, what do I do with it ?
 
 Basically, KVM needs to know that the user is allowed to use that iommu
 group. I don't think we want KVM however to call into VFIO directly
 right ?

 Right, we don't want to create dependencies across modules.  I don't
 have a vision for how this should work.  This is effectively a complete
 side-band to vfio, so we're really just dealing in the iommu group
 space.  Maybe there needs to be some kind of registration of ownership
 for the group using some kind of token.  It would need to include some
 kind of notification when that ownership ends.  That might also be a
 convenient tag to toggle driver probing off for devices in the group.
 Other ideas?  Thanks,

It's actually not that bad.

eg. 

struct vfio_container *vfio_container_from_file(struct file *filp)
{
if (filp-f_op != vfio_device_fops)
return ERR_PTR(-EINVAL);

/* OK it really is a vfio fd, return the data. */

}
EXPORT_SYMBOL_GPL(vfio_container_from_file);

...

inside KVM_CREATE_SPAPR_TCE_IOMMU:

struct file *vfio_filp;
struct vfio_container *(lookup)(struct file *filp);

vfio_filp = fget(create_tce_iommu.fd);
if (!vfio)
ret = -EBADF;
lookup = symbol_get(vfio_container_from_file);
if (!lookup)
ret = -EINVAL;
else {
container = lookup(vfio_filp);
if (IS_ERR(container))
ret = PTR_ERR(container);
else
...
symbol_put(vfio_container_from_file);
}

symbol_get() won't try to load a module; it'll just fail.  This is what
you want, since they must have vfio in the kernel to get a valid fd...

Hope that helps,
Rusty.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Benjamin Herrenschmidt
On Wed, 2013-06-19 at 13:05 +0930, Rusty Russell wrote:
 symbol_get() won't try to load a module; it'll just fail.  This is what
 you want, since they must have vfio in the kernel to get a valid fd...

Ok, cool. I suppose what we want here Alexey is slightly higher level,
something like:

vfio_validate_iommu_id(file, iommu_id)

Which verifies that the file that was passed in is allowed to use
that iommu_id.

That's a simple and flexible interface (ie, it will work even if we
support multiple iommu IDs in the future for a vfio, for example
for DDW windows etc...), the logic to know about the ID remains
in qemu, this is strictly a validation call.

That way we also don't have to expose the containing vfio struct etc...
just that simple function.

Alex, any objection ?

Do we need to make it a get/put interface instead ?

vfio_validate_and_use_iommu(file, iommu_id);

vfio_release_iommu(file, iommu_id);

To ensure that the resource remains owned by the process until KVM
is closed as well ?

Or do we want to register with VFIO with a callback so that VFIO can
call us if it needs us to give it up ?

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Alex Williamson
On Tue, 2013-06-18 at 14:38 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote:
 
  Right, we don't want to create dependencies across modules.  I don't
  have a vision for how this should work.  This is effectively a complete
  side-band to vfio, so we're really just dealing in the iommu group
  space.  Maybe there needs to be some kind of registration of ownership
  for the group using some kind of token.  It would need to include some
  kind of notification when that ownership ends.  That might also be a
  convenient tag to toggle driver probing off for devices in the group.
  Other ideas?  Thanks,
 
 All of that smells nasty like it will need a pile of bloody
 infrastructure which makes me think it's too complicated and not the
 right approach.
 
 How does access control work today on x86/VFIO ? Can you give me a bit
 more details ? I didn't get a good grasp in your previous email

The current model is not x86 specific, but it only covers doing iommu
and device access through vfio.  The kink here is that we're trying to
do device access and setup through vfio, but iommu manipulation through
kvm.  We may want to revisit whether we can do the in-kernel iommu
manipulation through vfio rather than kvm.

For vfio in general, the group is the unit of ownership.  A user is
granted access to /dev/vfio/$GROUP through file permissions.  The user
opens the group and a container (/dev/vfio/vfio) and calls SET_CONTAINER
on the group.  If supported by the platform, multiple groups can be set
to the same container, which allows for iommu domain sharing.  Once a
group is associated with a container, an iommu backend can be
initialized for the container.  Only then can a device be accessed
through the group.

So even if we were to pass a vfio group file descriptor into kvm and it
matched as some kind of ownership token on the iommu group, it's not
clear that's sufficient to assume we can start programming the iommu.
Thanks,

Alex

 From the look of it, the VFIO file descriptor is what has the access
 control to the underlying iommu, is this right ? So we somewhat need to
 transfer (or copy) that ownership from the VFIO fd to the KVM VM.
 
 I don't see a way to do that without some cross-layering here...
 
 Rusty, are you aware of some kernel mechanism we can use for that ?
 
 Cheers,
 Ben.
 
 



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Benjamin Herrenschmidt
On Tue, 2013-06-18 at 08:48 -0600, Alex Williamson wrote:
 On Tue, 2013-06-18 at 14:38 +1000, Benjamin Herrenschmidt wrote:
  On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote:
  
   Right, we don't want to create dependencies across modules.  I don't
   have a vision for how this should work.  This is effectively a complete
   side-band to vfio, so we're really just dealing in the iommu group
   space.  Maybe there needs to be some kind of registration of ownership
   for the group using some kind of token.  It would need to include some
   kind of notification when that ownership ends.  That might also be a
   convenient tag to toggle driver probing off for devices in the group.
   Other ideas?  Thanks,
  
  All of that smells nasty like it will need a pile of bloody
  infrastructure which makes me think it's too complicated and not the
  right approach.
  
  How does access control work today on x86/VFIO ? Can you give me a bit
  more details ? I didn't get a good grasp in your previous email
 
 The current model is not x86 specific, but it only covers doing iommu
 and device access through vfio.  The kink here is that we're trying to
 do device access and setup through vfio, but iommu manipulation through
 kvm.  We may want to revisit whether we can do the in-kernel iommu
 manipulation through vfio rather than kvm.

How would that be possible ?

The hypercalls from the guest arrive in KVM... in a very very specific 
restricted environment which we call real mode (MMU off but still
running in guest context), where we try to do as much as possible, or in
virtual mode, where they get handled as normal KVM exits.

The only way we could handle them in VFIO would be if somewhat VFIO
registered callbacks with KVM... if we have that sort of
cross-dependency, then we may as well have a simpler one where VFIO
tells KVM what iommu is available for the VM

 For vfio in general, the group is the unit of ownership.  A user is
 granted access to /dev/vfio/$GROUP through file permissions.  The user
 opens the group and a container (/dev/vfio/vfio) and calls SET_CONTAINER
 on the group.  If supported by the platform, multiple groups can be set
 to the same container, which allows for iommu domain sharing.  Once a
 group is associated with a container, an iommu backend can be
 initialized for the container.  Only then can a device be accessed
 through the group.
 
 So even if we were to pass a vfio group file descriptor into kvm and it
 matched as some kind of ownership token on the iommu group, it's not
 clear that's sufficient to assume we can start programming the iommu.
 Thanks,

Your scheme seems to me that it would have the same problem if you
wanted to do virtualized iommu

In any case, this is a big deal. We have a requirement for pass-through.
It cannot work with any remotely usable performance level if we don't
implement the calls in KVM, so it needs to be sorted one way or another
and I'm at a loss how here...

Ben.

 Alex
 
  From the look of it, the VFIO file descriptor is what has the access
  control to the underlying iommu, is this right ? So we somewhat need to
  transfer (or copy) that ownership from the VFIO fd to the KVM VM.
  
  I don't see a way to do that without some cross-layering here...
  
  Rusty, are you aware of some kernel mechanism we can use for that ?
  
  Cheers,
  Ben.
  
  
 
 
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm-ppc in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Alexey Kardashevskiy
On 06/16/2013 02:39 PM, Benjamin Herrenschmidt wrote:
  static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool 
 writing,
 -unsigned long *pte_sizep)
 +unsigned long *pte_sizep, bool do_get_page)
  {
  pte_t *ptep;
  unsigned int shift = 0;
 @@ -135,6 +136,14 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned 
 long hva, bool writing,
  if (!pte_present(*ptep))
  return __pte(0);
  
 +/*
 + * Put huge pages handling to the virtual mode.
 + * The only exception is for TCE list pages which we
 + * do need to call get_page() for.
 + */
 +if ((*pte_sizep  PAGE_SIZE)  do_get_page)
 +return __pte(0);
 +
  /* wait until _PAGE_BUSY is clear then set it atomically */
  __asm__ __volatile__ (
  1: ldarx   %0,0,%3\n
 @@ -148,6 +157,18 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned 
 long hva, bool writing,
  : cc);
  
  ret = pte;
 +if (do_get_page  pte_present(pte)  (!writing || pte_write(pte))) {
 +struct page *pg = NULL;
 +pg = realmode_pfn_to_page(pte_pfn(pte));
 +if (realmode_get_page(pg)) {
 +ret = __pte(0);
 +} else {
 +pte = pte_mkyoung(pte);
 +if (writing)
 +pte = pte_mkdirty(pte);
 +}
 +}
 +*ptep = pte;/* clears _PAGE_BUSY */
  
  return ret;
  }
 
 So now you are adding the clearing of _PAGE_BUSY that was missing for
 your first patch, except that this is not enough since that means that
 in the emulated case (ie, !do_get_page) you will in essence return
 and then use a PTE that is not locked without any synchronization to
 ensure that the underlying page doesn't go away... then you'll
 dereference that page.
 
 So either make everything use speculative get_page, or make the emulated
 case use the MMU notifier to drop the operation in case of collision.
 
 The former looks easier.
 
 Also, any specific reason why you do:
 
   - Lock the PTE
   - get_page()
   - Unlock the PTE
 
 Instead of
 
   - Read the PTE
   - get_page_unless_zero
   - re-check PTE
 
 Like get_user_pages_fast() does ?
 
 The former will be two atomic ops, the latter only one (faster), but
 maybe you have a good reason why that can't work...



If we want to set dirty and young bits for pte then I do not know how
to avoid _PAGE_BUSY.



-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Rusty Russell
Alex Williamson alex.william...@redhat.com writes:
 On Mon, 2013-06-17 at 13:56 +1000, Benjamin Herrenschmidt wrote:
 On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote:
 
  IOMMU groups themselves don't provide security, they're accessed by
  interfaces like VFIO, which provide the security.  Given a brief look, I
  agree, this looks like a possible backdoor.  The typical VFIO way to
  handle this would be to pass a VFIO file descriptor here to prove that
  the process has access to the IOMMU group.  This is how /dev/vfio/vfio
  gains the ability to setup an IOMMU domain an do mappings with the
  SET_CONTAINER ioctl using a group fd.  Thanks,
 
 How do you envision that in the kernel ? IE. I'm in KVM code, gets that
 vfio fd, what do I do with it ?
 
 Basically, KVM needs to know that the user is allowed to use that iommu
 group. I don't think we want KVM however to call into VFIO directly
 right ?

 Right, we don't want to create dependencies across modules.  I don't
 have a vision for how this should work.  This is effectively a complete
 side-band to vfio, so we're really just dealing in the iommu group
 space.  Maybe there needs to be some kind of registration of ownership
 for the group using some kind of token.  It would need to include some
 kind of notification when that ownership ends.  That might also be a
 convenient tag to toggle driver probing off for devices in the group.
 Other ideas?  Thanks,

It's actually not that bad.

eg. 

struct vfio_container *vfio_container_from_file(struct file *filp)
{
if (filp-f_op != vfio_device_fops)
return ERR_PTR(-EINVAL);

/* OK it really is a vfio fd, return the data. */

}
EXPORT_SYMBOL_GPL(vfio_container_from_file);

...

inside KVM_CREATE_SPAPR_TCE_IOMMU:

struct file *vfio_filp;
struct vfio_container *(lookup)(struct file *filp);

vfio_filp = fget(create_tce_iommu.fd);
if (!vfio)
ret = -EBADF;
lookup = symbol_get(vfio_container_from_file);
if (!lookup)
ret = -EINVAL;
else {
container = lookup(vfio_filp);
if (IS_ERR(container))
ret = PTR_ERR(container);
else
...
symbol_put(vfio_container_from_file);
}

symbol_get() won't try to load a module; it'll just fail.  This is what
you want, since they must have vfio in the kernel to get a valid fd...

Hope that helps,
Rusty.

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Benjamin Herrenschmidt
On Wed, 2013-06-19 at 13:05 +0930, Rusty Russell wrote:
 symbol_get() won't try to load a module; it'll just fail.  This is what
 you want, since they must have vfio in the kernel to get a valid fd...

Ok, cool. I suppose what we want here Alexey is slightly higher level,
something like:

vfio_validate_iommu_id(file, iommu_id)

Which verifies that the file that was passed in is allowed to use
that iommu_id.

That's a simple and flexible interface (ie, it will work even if we
support multiple iommu IDs in the future for a vfio, for example
for DDW windows etc...), the logic to know about the ID remains
in qemu, this is strictly a validation call.

That way we also don't have to expose the containing vfio struct etc...
just that simple function.

Alex, any objection ?

Do we need to make it a get/put interface instead ?

vfio_validate_and_use_iommu(file, iommu_id);

vfio_release_iommu(file, iommu_id);

To ensure that the resource remains owned by the process until KVM
is closed as well ?

Or do we want to register with VFIO with a callback so that VFIO can
call us if it needs us to give it up ?

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-17 Thread Alex Williamson
On Mon, 2013-06-17 at 13:56 +1000, Benjamin Herrenschmidt wrote:
 On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote:
 
  IOMMU groups themselves don't provide security, they're accessed by
  interfaces like VFIO, which provide the security.  Given a brief look, I
  agree, this looks like a possible backdoor.  The typical VFIO way to
  handle this would be to pass a VFIO file descriptor here to prove that
  the process has access to the IOMMU group.  This is how /dev/vfio/vfio
  gains the ability to setup an IOMMU domain an do mappings with the
  SET_CONTAINER ioctl using a group fd.  Thanks,
 
 How do you envision that in the kernel ? IE. I'm in KVM code, gets that
 vfio fd, what do I do with it ?
 
 Basically, KVM needs to know that the user is allowed to use that iommu
 group. I don't think we want KVM however to call into VFIO directly
 right ?

Right, we don't want to create dependencies across modules.  I don't
have a vision for how this should work.  This is effectively a complete
side-band to vfio, so we're really just dealing in the iommu group
space.  Maybe there needs to be some kind of registration of ownership
for the group using some kind of token.  It would need to include some
kind of notification when that ownership ends.  That might also be a
convenient tag to toggle driver probing off for devices in the group.
Other ideas?  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-17 Thread Benjamin Herrenschmidt
On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote:

 Right, we don't want to create dependencies across modules.  I don't
 have a vision for how this should work.  This is effectively a complete
 side-band to vfio, so we're really just dealing in the iommu group
 space.  Maybe there needs to be some kind of registration of ownership
 for the group using some kind of token.  It would need to include some
 kind of notification when that ownership ends.  That might also be a
 convenient tag to toggle driver probing off for devices in the group.
 Other ideas?  Thanks,

All of that smells nasty like it will need a pile of bloody
infrastructure which makes me think it's too complicated and not the
right approach.

How does access control work today on x86/VFIO ? Can you give me a bit
more details ? I didn't get a good grasp in your previous email

From the look of it, the VFIO file descriptor is what has the access
control to the underlying iommu, is this right ? So we somewhat need to
transfer (or copy) that ownership from the VFIO fd to the KVM VM.

I don't see a way to do that without some cross-layering here...

Rusty, are you aware of some kernel mechanism we can use for that ?

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-17 Thread Alex Williamson
On Mon, 2013-06-17 at 13:56 +1000, Benjamin Herrenschmidt wrote:
 On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote:
 
  IOMMU groups themselves don't provide security, they're accessed by
  interfaces like VFIO, which provide the security.  Given a brief look, I
  agree, this looks like a possible backdoor.  The typical VFIO way to
  handle this would be to pass a VFIO file descriptor here to prove that
  the process has access to the IOMMU group.  This is how /dev/vfio/vfio
  gains the ability to setup an IOMMU domain an do mappings with the
  SET_CONTAINER ioctl using a group fd.  Thanks,
 
 How do you envision that in the kernel ? IE. I'm in KVM code, gets that
 vfio fd, what do I do with it ?
 
 Basically, KVM needs to know that the user is allowed to use that iommu
 group. I don't think we want KVM however to call into VFIO directly
 right ?

Right, we don't want to create dependencies across modules.  I don't
have a vision for how this should work.  This is effectively a complete
side-band to vfio, so we're really just dealing in the iommu group
space.  Maybe there needs to be some kind of registration of ownership
for the group using some kind of token.  It would need to include some
kind of notification when that ownership ends.  That might also be a
convenient tag to toggle driver probing off for devices in the group.
Other ideas?  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-17 Thread Benjamin Herrenschmidt
On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote:

 Right, we don't want to create dependencies across modules.  I don't
 have a vision for how this should work.  This is effectively a complete
 side-band to vfio, so we're really just dealing in the iommu group
 space.  Maybe there needs to be some kind of registration of ownership
 for the group using some kind of token.  It would need to include some
 kind of notification when that ownership ends.  That might also be a
 convenient tag to toggle driver probing off for devices in the group.
 Other ideas?  Thanks,

All of that smells nasty like it will need a pile of bloody
infrastructure which makes me think it's too complicated and not the
right approach.

How does access control work today on x86/VFIO ? Can you give me a bit
more details ? I didn't get a good grasp in your previous email

From the look of it, the VFIO file descriptor is what has the access
control to the underlying iommu, is this right ? So we somewhat need to
transfer (or copy) that ownership from the VFIO fd to the KVM VM.

I don't see a way to do that without some cross-layering here...

Rusty, are you aware of some kernel mechanism we can use for that ?

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-16 Thread Alexander Graf

On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote:

 This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
 and H_STUFF_TCE requests without passing them to QEMU, which should
 save time on switching to QEMU and back.
 
 Both real and virtual modes are supported - whenever the kernel
 fails to handle TCE request, it passes it to the virtual mode.
 If it the virtual mode handlers fail, then the request is passed
 to the user mode, for example, to QEMU.
 
 This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate
 a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables
 in-kernel handling of IOMMU map/unmap.
 
 Tests show that this patch increases transmission speed from 220MB/s
 to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
 
 Cc: David Gibson da...@gibson.dropbear.id.au
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 Signed-off-by: Paul Mackerras pau...@samba.org
 
 ---
 
 Changes:
 2013/06/05:
 * changed capability number
 * changed ioctl number
 * update the doc article number
 
 2013/05/20:
 * removed get_user() from real mode handlers
 * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
 translated TCEs, tries realmode_get_page() on those and if it fails, it
 passes control over the virtual mode handler which tries to finish
 the request handling
 * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
 on a page
 * The only reason to pass the request to user mode now is when the user mode
 did not register TCE table in the kernel, in all other cases the virtual mode
 handler is expected to do the job
 ---
 Documentation/virtual/kvm/api.txt   |   28 +
 arch/powerpc/include/asm/kvm_host.h |3 +
 arch/powerpc/include/asm/kvm_ppc.h  |2 +
 arch/powerpc/include/uapi/asm/kvm.h |7 ++
 arch/powerpc/kvm/book3s_64_vio.c|  198 ++-
 arch/powerpc/kvm/book3s_64_vio_hv.c |  193 +-
 arch/powerpc/kvm/powerpc.c  |   12 +++
 include/uapi/linux/kvm.h|2 +
 8 files changed, 439 insertions(+), 6 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 6c082ff..e962e3b 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2379,6 +2379,34 @@ the guest. Othwerwise it might be better for the guest 
 to continue using H_PUT_T
 hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present).
 
 
 +4.84 KVM_CREATE_SPAPR_TCE_IOMMU
 +
 +Capability: KVM_CAP_SPAPR_TCE_IOMMU
 +Architectures: powerpc
 +Type: vm ioctl
 +Parameters: struct kvm_create_spapr_tce_iommu (in)
 +Returns: 0 on success, -1 on error
 +
 +This creates a link between IOMMU group and a hardware TCE (translation
 +control entry) table. This link lets the host kernel know what IOMMU
 +group (i.e. TCE table) to use for the LIOBN number passed with
 +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls.
 +
 +/* for KVM_CAP_SPAPR_TCE_IOMMU */
 +struct kvm_create_spapr_tce_iommu {
 + __u64 liobn;
 + __u32 iommu_id;
 + __u32 flags;
 +};
 +
 +No flag is supported at the moment.
 +
 +When the guest issues TCE call on a liobn for which a TCE table has been
 +registered, the kernel will handle it in real mode, updating the hardware
 +TCE table. TCE table calls for other liobns will cause a vm exit and must
 +be handled by userspace.

Ok, please walk me through the security model you have in mind here.

Basically what this ioctl does is that it creates a guest TCE table that 
reflects its changes into a host TCE table whenever it gets modified. So far so 
good.

Now I don't see any checks that verify whether iommu_id is actually good to use 
from that user's access rights. Just because I have access to /dev/kvm I don't 
necessarily have access to an iommu control device.

So the least I can see would be a local DoS attack where one user space program 
with only access to /dev/kvm can simply kill any access to another process's 
device by overflowing a host iommu TCE table with junk entries.

There's even a certain chance of an information disclosure exploit here where a 
malicious user space program could get itself all network traffic DMA'd from 
another VM.

How does this work on the host level? What is the security token to take 
control of a host TCE table?


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-16 Thread Alex Williamson
On Mon, 2013-06-17 at 08:39 +1000, Benjamin Herrenschmidt wrote:
 On Wed, 2013-06-05 at 16:11 +1000, Alexey Kardashevskiy wrote:
  +long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
  +   struct kvm_create_spapr_tce_iommu *args)
  +{
  +   struct kvmppc_spapr_tce_table *tt = NULL;
  +   struct iommu_group *grp;
  +   struct iommu_table *tbl;
  +
  +   /* Find an IOMMU table for the given ID */
  +   grp = iommu_group_get_by_id(args-iommu_id);
  +   if (!grp)
  +   return -ENXIO;
  +
  +   tbl = iommu_group_get_iommudata(grp);
  +   if (!tbl)
  +   return -ENXIO;
 
 So Alex Graf pointed out here, there is a security issue here, or are we
 missing something ?
 
 What prevents a malicious program that has access to /dev/kvm from
 taking over random iommu groups (including host used ones) that way?
 
 What is the security model of that whole iommu stuff to begin with ?

IOMMU groups themselves don't provide security, they're accessed by
interfaces like VFIO, which provide the security.  Given a brief look, I
agree, this looks like a possible backdoor.  The typical VFIO way to
handle this would be to pass a VFIO file descriptor here to prove that
the process has access to the IOMMU group.  This is how /dev/vfio/vfio
gains the ability to setup an IOMMU domain an do mappings with the
SET_CONTAINER ioctl using a group fd.  Thanks,

Alex



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-16 Thread Benjamin Herrenschmidt
On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote:

 IOMMU groups themselves don't provide security, they're accessed by
 interfaces like VFIO, which provide the security.  Given a brief look, I
 agree, this looks like a possible backdoor.  The typical VFIO way to
 handle this would be to pass a VFIO file descriptor here to prove that
 the process has access to the IOMMU group.  This is how /dev/vfio/vfio
 gains the ability to setup an IOMMU domain an do mappings with the
 SET_CONTAINER ioctl using a group fd.  Thanks,

How do you envision that in the kernel ? IE. I'm in KVM code, gets that
vfio fd, what do I do with it ?

Basically, KVM needs to know that the user is allowed to use that iommu
group. I don't think we want KVM however to call into VFIO directly
right ?

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-16 Thread Alexander Graf

On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote:

 This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
 and H_STUFF_TCE requests without passing them to QEMU, which should
 save time on switching to QEMU and back.
 
 Both real and virtual modes are supported - whenever the kernel
 fails to handle TCE request, it passes it to the virtual mode.
 If it the virtual mode handlers fail, then the request is passed
 to the user mode, for example, to QEMU.
 
 This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate
 a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables
 in-kernel handling of IOMMU map/unmap.
 
 Tests show that this patch increases transmission speed from 220MB/s
 to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
 
 Cc: David Gibson da...@gibson.dropbear.id.au
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 Signed-off-by: Paul Mackerras pau...@samba.org
 
 ---
 
 Changes:
 2013/06/05:
 * changed capability number
 * changed ioctl number
 * update the doc article number
 
 2013/05/20:
 * removed get_user() from real mode handlers
 * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
 translated TCEs, tries realmode_get_page() on those and if it fails, it
 passes control over the virtual mode handler which tries to finish
 the request handling
 * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
 on a page
 * The only reason to pass the request to user mode now is when the user mode
 did not register TCE table in the kernel, in all other cases the virtual mode
 handler is expected to do the job
 ---
 Documentation/virtual/kvm/api.txt   |   28 +
 arch/powerpc/include/asm/kvm_host.h |3 +
 arch/powerpc/include/asm/kvm_ppc.h  |2 +
 arch/powerpc/include/uapi/asm/kvm.h |7 ++
 arch/powerpc/kvm/book3s_64_vio.c|  198 ++-
 arch/powerpc/kvm/book3s_64_vio_hv.c |  193 +-
 arch/powerpc/kvm/powerpc.c  |   12 +++
 include/uapi/linux/kvm.h|2 +
 8 files changed, 439 insertions(+), 6 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 6c082ff..e962e3b 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2379,6 +2379,34 @@ the guest. Othwerwise it might be better for the guest 
 to continue using H_PUT_T
 hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present).
 
 
 +4.84 KVM_CREATE_SPAPR_TCE_IOMMU
 +
 +Capability: KVM_CAP_SPAPR_TCE_IOMMU
 +Architectures: powerpc
 +Type: vm ioctl
 +Parameters: struct kvm_create_spapr_tce_iommu (in)
 +Returns: 0 on success, -1 on error
 +
 +This creates a link between IOMMU group and a hardware TCE (translation
 +control entry) table. This link lets the host kernel know what IOMMU
 +group (i.e. TCE table) to use for the LIOBN number passed with
 +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls.
 +
 +/* for KVM_CAP_SPAPR_TCE_IOMMU */
 +struct kvm_create_spapr_tce_iommu {
 + __u64 liobn;
 + __u32 iommu_id;
 + __u32 flags;
 +};
 +
 +No flag is supported at the moment.
 +
 +When the guest issues TCE call on a liobn for which a TCE table has been
 +registered, the kernel will handle it in real mode, updating the hardware
 +TCE table. TCE table calls for other liobns will cause a vm exit and must
 +be handled by userspace.

Ok, please walk me through the security model you have in mind here.

Basically what this ioctl does is that it creates a guest TCE table that 
reflects its changes into a host TCE table whenever it gets modified. So far so 
good.

Now I don't see any checks that verify whether iommu_id is actually good to use 
from that user's access rights. Just because I have access to /dev/kvm I don't 
necessarily have access to an iommu control device.

So the least I can see would be a local DoS attack where one user space program 
with only access to /dev/kvm can simply kill any access to another process's 
device by overflowing a host iommu TCE table with junk entries.

There's even a certain chance of an information disclosure exploit here where a 
malicious user space program could get itself all network traffic DMA'd from 
another VM.

How does this work on the host level? What is the security token to take 
control of a host TCE table?


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-16 Thread Benjamin Herrenschmidt
On Wed, 2013-06-05 at 16:11 +1000, Alexey Kardashevskiy wrote:
 +long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
 +   struct kvm_create_spapr_tce_iommu *args)
 +{
 +   struct kvmppc_spapr_tce_table *tt = NULL;
 +   struct iommu_group *grp;
 +   struct iommu_table *tbl;
 +
 +   /* Find an IOMMU table for the given ID */
 +   grp = iommu_group_get_by_id(args-iommu_id);
 +   if (!grp)
 +   return -ENXIO;
 +
 +   tbl = iommu_group_get_iommudata(grp);
 +   if (!tbl)
 +   return -ENXIO;

So Alex Graf pointed out here, there is a security issue here, or are we
missing something ?

What prevents a malicious program that has access to /dev/kvm from
taking over random iommu groups (including host used ones) that way?

What is the security model of that whole iommu stuff to begin with ?

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-16 Thread Alex Williamson
On Mon, 2013-06-17 at 08:39 +1000, Benjamin Herrenschmidt wrote:
 On Wed, 2013-06-05 at 16:11 +1000, Alexey Kardashevskiy wrote:
  +long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
  +   struct kvm_create_spapr_tce_iommu *args)
  +{
  +   struct kvmppc_spapr_tce_table *tt = NULL;
  +   struct iommu_group *grp;
  +   struct iommu_table *tbl;
  +
  +   /* Find an IOMMU table for the given ID */
  +   grp = iommu_group_get_by_id(args-iommu_id);
  +   if (!grp)
  +   return -ENXIO;
  +
  +   tbl = iommu_group_get_iommudata(grp);
  +   if (!tbl)
  +   return -ENXIO;
 
 So Alex Graf pointed out here, there is a security issue here, or are we
 missing something ?
 
 What prevents a malicious program that has access to /dev/kvm from
 taking over random iommu groups (including host used ones) that way?
 
 What is the security model of that whole iommu stuff to begin with ?

IOMMU groups themselves don't provide security, they're accessed by
interfaces like VFIO, which provide the security.  Given a brief look, I
agree, this looks like a possible backdoor.  The typical VFIO way to
handle this would be to pass a VFIO file descriptor here to prove that
the process has access to the IOMMU group.  This is how /dev/vfio/vfio
gains the ability to setup an IOMMU domain an do mappings with the
SET_CONTAINER ioctl using a group fd.  Thanks,

Alex



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-16 Thread Benjamin Herrenschmidt
On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote:

 IOMMU groups themselves don't provide security, they're accessed by
 interfaces like VFIO, which provide the security.  Given a brief look, I
 agree, this looks like a possible backdoor.  The typical VFIO way to
 handle this would be to pass a VFIO file descriptor here to prove that
 the process has access to the IOMMU group.  This is how /dev/vfio/vfio
 gains the ability to setup an IOMMU domain an do mappings with the
 SET_CONTAINER ioctl using a group fd.  Thanks,

How do you envision that in the kernel ? IE. I'm in KVM code, gets that
vfio fd, what do I do with it ?

Basically, KVM needs to know that the user is allowed to use that iommu
group. I don't think we want KVM however to call into VFIO directly
right ?

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-15 Thread Benjamin Herrenschmidt
  static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing,
 - unsigned long *pte_sizep)
 + unsigned long *pte_sizep, bool do_get_page)
  {
   pte_t *ptep;
   unsigned int shift = 0;
 @@ -135,6 +136,14 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned 
 long hva, bool writing,
   if (!pte_present(*ptep))
   return __pte(0);
  
 + /*
 +  * Put huge pages handling to the virtual mode.
 +  * The only exception is for TCE list pages which we
 +  * do need to call get_page() for.
 +  */
 + if ((*pte_sizep  PAGE_SIZE)  do_get_page)
 + return __pte(0);
 +
   /* wait until _PAGE_BUSY is clear then set it atomically */
   __asm__ __volatile__ (
   1: ldarx   %0,0,%3\n
 @@ -148,6 +157,18 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned 
 long hva, bool writing,
   : cc);
  
   ret = pte;
 + if (do_get_page  pte_present(pte)  (!writing || pte_write(pte))) {
 + struct page *pg = NULL;
 + pg = realmode_pfn_to_page(pte_pfn(pte));
 + if (realmode_get_page(pg)) {
 + ret = __pte(0);
 + } else {
 + pte = pte_mkyoung(pte);
 + if (writing)
 + pte = pte_mkdirty(pte);
 + }
 + }
 + *ptep = pte;/* clears _PAGE_BUSY */
  
   return ret;
  }

So now you are adding the clearing of _PAGE_BUSY that was missing for
your first patch, except that this is not enough since that means that
in the emulated case (ie, !do_get_page) you will in essence return
and then use a PTE that is not locked without any synchronization to
ensure that the underlying page doesn't go away... then you'll
dereference that page.

So either make everything use speculative get_page, or make the emulated
case use the MMU notifier to drop the operation in case of collision.

The former looks easier.

Also, any specific reason why you do:

  - Lock the PTE
  - get_page()
  - Unlock the PTE

Instead of

  - Read the PTE
  - get_page_unless_zero
  - re-check PTE

Like get_user_pages_fast() does ?

The former will be two atomic ops, the latter only one (faster), but
maybe you have a good reason why that can't work...

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-15 Thread Benjamin Herrenschmidt
  static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing,
 - unsigned long *pte_sizep)
 + unsigned long *pte_sizep, bool do_get_page)
  {
   pte_t *ptep;
   unsigned int shift = 0;
 @@ -135,6 +136,14 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned 
 long hva, bool writing,
   if (!pte_present(*ptep))
   return __pte(0);
  
 + /*
 +  * Put huge pages handling to the virtual mode.
 +  * The only exception is for TCE list pages which we
 +  * do need to call get_page() for.
 +  */
 + if ((*pte_sizep  PAGE_SIZE)  do_get_page)
 + return __pte(0);
 +
   /* wait until _PAGE_BUSY is clear then set it atomically */
   __asm__ __volatile__ (
   1: ldarx   %0,0,%3\n
 @@ -148,6 +157,18 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned 
 long hva, bool writing,
   : cc);
  
   ret = pte;
 + if (do_get_page  pte_present(pte)  (!writing || pte_write(pte))) {
 + struct page *pg = NULL;
 + pg = realmode_pfn_to_page(pte_pfn(pte));
 + if (realmode_get_page(pg)) {
 + ret = __pte(0);
 + } else {
 + pte = pte_mkyoung(pte);
 + if (writing)
 + pte = pte_mkdirty(pte);
 + }
 + }
 + *ptep = pte;/* clears _PAGE_BUSY */
  
   return ret;
  }

So now you are adding the clearing of _PAGE_BUSY that was missing for
your first patch, except that this is not enough since that means that
in the emulated case (ie, !do_get_page) you will in essence return
and then use a PTE that is not locked without any synchronization to
ensure that the underlying page doesn't go away... then you'll
dereference that page.

So either make everything use speculative get_page, or make the emulated
case use the MMU notifier to drop the operation in case of collision.

The former looks easier.

Also, any specific reason why you do:

  - Lock the PTE
  - get_page()
  - Unlock the PTE

Instead of

  - Read the PTE
  - get_page_unless_zero
  - re-check PTE

Like get_user_pages_fast() does ?

The former will be two atomic ops, the latter only one (faster), but
maybe you have a good reason why that can't work...

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-05 Thread Alexey Kardashevskiy
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests without passing them to QEMU, which should
save time on switching to QEMU and back.

Both real and virtual modes are supported - whenever the kernel
fails to handle TCE request, it passes it to the virtual mode.
If it the virtual mode handlers fail, then the request is passed
to the user mode, for example, to QEMU.

This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate
a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables
in-kernel handling of IOMMU map/unmap.

Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).

Cc: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Signed-off-by: Paul Mackerras pau...@samba.org

---

Changes:
2013/06/05:
* changed capability number
* changed ioctl number
* update the doc article number

2013/05/20:
* removed get_user() from real mode handlers
* kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
translated TCEs, tries realmode_get_page() on those and if it fails, it
passes control over the virtual mode handler which tries to finish
the request handling
* kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
on a page
* The only reason to pass the request to user mode now is when the user mode
did not register TCE table in the kernel, in all other cases the virtual mode
handler is expected to do the job
---
 Documentation/virtual/kvm/api.txt   |   28 +
 arch/powerpc/include/asm/kvm_host.h |3 +
 arch/powerpc/include/asm/kvm_ppc.h  |2 +
 arch/powerpc/include/uapi/asm/kvm.h |7 ++
 arch/powerpc/kvm/book3s_64_vio.c|  198 ++-
 arch/powerpc/kvm/book3s_64_vio_hv.c |  193 +-
 arch/powerpc/kvm/powerpc.c  |   12 +++
 include/uapi/linux/kvm.h|2 +
 8 files changed, 439 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 6c082ff..e962e3b 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2379,6 +2379,34 @@ the guest. Othwerwise it might be better for the guest 
to continue using H_PUT_T
 hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present).
 
 
+4.84 KVM_CREATE_SPAPR_TCE_IOMMU
+
+Capability: KVM_CAP_SPAPR_TCE_IOMMU
+Architectures: powerpc
+Type: vm ioctl
+Parameters: struct kvm_create_spapr_tce_iommu (in)
+Returns: 0 on success, -1 on error
+
+This creates a link between IOMMU group and a hardware TCE (translation
+control entry) table. This link lets the host kernel know what IOMMU
+group (i.e. TCE table) to use for the LIOBN number passed with
+H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls.
+
+/* for KVM_CAP_SPAPR_TCE_IOMMU */
+struct kvm_create_spapr_tce_iommu {
+   __u64 liobn;
+   __u32 iommu_id;
+   __u32 flags;
+};
+
+No flag is supported at the moment.
+
+When the guest issues TCE call on a liobn for which a TCE table has been
+registered, the kernel will handle it in real mode, updating the hardware
+TCE table. TCE table calls for other liobns will cause a vm exit and must
+be handled by userspace.
+
+
 5. The kvm_run structure
 
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 85d8f26..ac0e2fe 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -180,6 +180,7 @@ struct kvmppc_spapr_tce_table {
struct kvm *kvm;
u64 liobn;
u32 window_size;
+   struct iommu_group *grp;/* used for IOMMU groups */
struct page *pages[0];
 };
 
@@ -611,6 +612,8 @@ struct kvm_vcpu_arch {
u64 busy_preempt;
 
unsigned long *tce_tmp;/* TCE cache for TCE_PUT_INDIRECT hall */
+   unsigned long tce_tmp_num; /* Number of handled TCEs in the cache */
+   unsigned long tce_reason;  /* The reason of switching to the virtmode */
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index e852921b..934e01d 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -133,6 +133,8 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
+extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
+   struct kvm_create_spapr_tce_iommu *args);
 extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
struct kvm_vcpu *vcpu, unsigned long liobn);
 extern long kvmppc_emulated_validate_tce(unsigned long tce);
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index 0fb1a6e..cf82af4 100644
--- 

[PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-05 Thread Alexey Kardashevskiy
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests without passing them to QEMU, which should
save time on switching to QEMU and back.

Both real and virtual modes are supported - whenever the kernel
fails to handle TCE request, it passes it to the virtual mode.
If it the virtual mode handlers fail, then the request is passed
to the user mode, for example, to QEMU.

This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate
a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables
in-kernel handling of IOMMU map/unmap.

Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).

Cc: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Signed-off-by: Paul Mackerras pau...@samba.org

---

Changes:
2013/06/05:
* changed capability number
* changed ioctl number
* update the doc article number

2013/05/20:
* removed get_user() from real mode handlers
* kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
translated TCEs, tries realmode_get_page() on those and if it fails, it
passes control over the virtual mode handler which tries to finish
the request handling
* kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
on a page
* The only reason to pass the request to user mode now is when the user mode
did not register TCE table in the kernel, in all other cases the virtual mode
handler is expected to do the job
---
 Documentation/virtual/kvm/api.txt   |   28 +
 arch/powerpc/include/asm/kvm_host.h |3 +
 arch/powerpc/include/asm/kvm_ppc.h  |2 +
 arch/powerpc/include/uapi/asm/kvm.h |7 ++
 arch/powerpc/kvm/book3s_64_vio.c|  198 ++-
 arch/powerpc/kvm/book3s_64_vio_hv.c |  193 +-
 arch/powerpc/kvm/powerpc.c  |   12 +++
 include/uapi/linux/kvm.h|2 +
 8 files changed, 439 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 6c082ff..e962e3b 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2379,6 +2379,34 @@ the guest. Othwerwise it might be better for the guest 
to continue using H_PUT_T
 hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present).
 
 
+4.84 KVM_CREATE_SPAPR_TCE_IOMMU
+
+Capability: KVM_CAP_SPAPR_TCE_IOMMU
+Architectures: powerpc
+Type: vm ioctl
+Parameters: struct kvm_create_spapr_tce_iommu (in)
+Returns: 0 on success, -1 on error
+
+This creates a link between IOMMU group and a hardware TCE (translation
+control entry) table. This link lets the host kernel know what IOMMU
+group (i.e. TCE table) to use for the LIOBN number passed with
+H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls.
+
+/* for KVM_CAP_SPAPR_TCE_IOMMU */
+struct kvm_create_spapr_tce_iommu {
+   __u64 liobn;
+   __u32 iommu_id;
+   __u32 flags;
+};
+
+No flag is supported at the moment.
+
+When the guest issues TCE call on a liobn for which a TCE table has been
+registered, the kernel will handle it in real mode, updating the hardware
+TCE table. TCE table calls for other liobns will cause a vm exit and must
+be handled by userspace.
+
+
 5. The kvm_run structure
 
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 85d8f26..ac0e2fe 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -180,6 +180,7 @@ struct kvmppc_spapr_tce_table {
struct kvm *kvm;
u64 liobn;
u32 window_size;
+   struct iommu_group *grp;/* used for IOMMU groups */
struct page *pages[0];
 };
 
@@ -611,6 +612,8 @@ struct kvm_vcpu_arch {
u64 busy_preempt;
 
unsigned long *tce_tmp;/* TCE cache for TCE_PUT_INDIRECT hall */
+   unsigned long tce_tmp_num; /* Number of handled TCEs in the cache */
+   unsigned long tce_reason;  /* The reason of switching to the virtmode */
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index e852921b..934e01d 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -133,6 +133,8 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
+extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
+   struct kvm_create_spapr_tce_iommu *args);
 extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
struct kvm_vcpu *vcpu, unsigned long liobn);
 extern long kvmppc_emulated_validate_tce(unsigned long tce);
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index 0fb1a6e..cf82af4 100644
--- 

Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-29 Thread Scott Wood

On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote:

On 05/29/2013 09:35 AM, Scott Wood wrote:
 On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote:
   @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
   #define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2, struct
   kvm_device_attr)
   #define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3, struct
   kvm_device_attr)
  
   +/* ioctl for SPAPR TCE IOMMU */
   +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4,  
struct

   kvm_create_spapr_tce_iommu)
  
   Shouldn't this go under the vm ioctl section?
 
 
  The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated
 devices) is
  in this section so I decided to keep them together. Wrong?
 
  You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with
  KVM_CREATE_SPAPR_TCE_IOMMU?

 Yes.

 Sigh.  That's the same thing repeated.  There's only one IOCTL.   
Nothing is

 being kept together.

Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE.


But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE.   
0xe0 begins a different section.


-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-29 Thread Alexey Kardashevskiy
On 05/30/2013 06:05 AM, Scott Wood wrote:
 On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote:
 On 05/29/2013 09:35 AM, Scott Wood wrote:
  On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote:
@@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
#define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2, struct
kvm_device_attr)
#define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3, struct
kvm_device_attr)
   
+/* ioctl for SPAPR TCE IOMMU */
+#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
kvm_create_spapr_tce_iommu)
   
Shouldn't this go under the vm ioctl section?
  
  
   The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated
  devices) is
   in this section so I decided to keep them together. Wrong?
  
   You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with
   KVM_CREATE_SPAPR_TCE_IOMMU?
 
  Yes.
 
  Sigh.  That's the same thing repeated.  There's only one IOCTL. 
 Nothing is
  being kept together.

 Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE.
 
 But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE.  0xe0
 begins a different section.

It is not really obvious that there are sections as no comment defines
those :) But yes, makes sense to move it up a bit and change the code to 0xad.



-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-29 Thread Scott Wood

On 05/29/2013 06:10:33 PM, Alexey Kardashevskiy wrote:

On 05/30/2013 06:05 AM, Scott Wood wrote:
 On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote:
 On 05/29/2013 09:35 AM, Scott Wood wrote:
  On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote:
@@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
#define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2,  
struct

kvm_device_attr)
#define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3,  
struct

kvm_device_attr)
   
+/* ioctl for SPAPR TCE IOMMU */
+#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4,  
struct

kvm_create_spapr_tce_iommu)
   
Shouldn't this go under the vm ioctl section?
  
  
   The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for  
emulated

  devices) is
   in this section so I decided to keep them together. Wrong?
  
   You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with
   KVM_CREATE_SPAPR_TCE_IOMMU?
 
  Yes.
 
  Sigh.  That's the same thing repeated.  There's only one IOCTL.
 Nothing is
  being kept together.

 Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE.

 But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE.   
0xe0

 begins a different section.

It is not really obvious that there are sections as no comment defines
those :)


There is a comment /* ioctls for fds returned by KVM_CREATE_DEVICE */

Putting KVM_CREATE_DEVICE in there was mainly to avoid dealing with the  
ioctl number conflict mess in the vm-ioctl section, but at least that  
one is related to the device control API. :-)



But yes, makes sense to move it up a bit and change the code to 0xad.


0xad is KVM_KVMCLOCK_CTRL

-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-29 Thread Alexey Kardashevskiy
On 05/30/2013 09:14 AM, Scott Wood wrote:
 On 05/29/2013 06:10:33 PM, Alexey Kardashevskiy wrote:
 On 05/30/2013 06:05 AM, Scott Wood wrote:
  On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote:
  On 05/29/2013 09:35 AM, Scott Wood wrote:
   On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote:
 @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
 #define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2, struct
 kvm_device_attr)
 #define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3, struct
 kvm_device_attr)

 +/* ioctl for SPAPR TCE IOMMU */
 +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
 kvm_create_spapr_tce_iommu)

 Shouldn't this go under the vm ioctl section?
   
   
The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated
   devices) is
in this section so I decided to keep them together. Wrong?
   
You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with
KVM_CREATE_SPAPR_TCE_IOMMU?
  
   Yes.
  
   Sigh.  That's the same thing repeated.  There's only one IOCTL.
  Nothing is
   being kept together.
 
  Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE.
 
  But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE.  0xe0
  begins a different section.

 It is not really obvious that there are sections as no comment defines
 those :)
 
 There is a comment /* ioctls for fds returned by KVM_CREATE_DEVICE */
 
 Putting KVM_CREATE_DEVICE in there was mainly to avoid dealing with the
 ioctl number conflict mess in the vm-ioctl section, but at least that one
 is related to the device control API. :-)
 
 But yes, makes sense to move it up a bit and change the code to 0xad.
 
 0xad is KVM_KVMCLOCK_CTRL

That's it. I am _completely_ confused now. No system whatsoever :(
What rule should I use in order to choose the number for my new ioctl? :)



-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-29 Thread Scott Wood

On 05/29/2013 06:29:13 PM, Alexey Kardashevskiy wrote:

On 05/30/2013 09:14 AM, Scott Wood wrote:
 On 05/29/2013 06:10:33 PM, Alexey Kardashevskiy wrote:
 On 05/30/2013 06:05 AM, Scott Wood wrote:
  But you didn't put it in the same section as  
KVM_CREATE_SPAPR_TCE.  0xe0

  begins a different section.

 It is not really obvious that there are sections as no comment  
defines

 those :)

 There is a comment /* ioctls for fds returned by KVM_CREATE_DEVICE  
*/


 Putting KVM_CREATE_DEVICE in there was mainly to avoid dealing with  
the
 ioctl number conflict mess in the vm-ioctl section, but at least  
that one

 is related to the device control API. :-)

 But yes, makes sense to move it up a bit and change the code to  
0xad.


 0xad is KVM_KVMCLOCK_CTRL

That's it. I am _completely_ confused now. No system whatsoever :(
What rule should I use in order to choose the number for my new  
ioctl? :)


Yeah, it's a mess.  0xaf seems to be free. :-)

-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-29 Thread Scott Wood

On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote:

On 05/29/2013 09:35 AM, Scott Wood wrote:
 On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote:
   @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
   #define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2, struct
   kvm_device_attr)
   #define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3, struct
   kvm_device_attr)
  
   +/* ioctl for SPAPR TCE IOMMU */
   +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4,  
struct

   kvm_create_spapr_tce_iommu)
  
   Shouldn't this go under the vm ioctl section?
 
 
  The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated
 devices) is
  in this section so I decided to keep them together. Wrong?
 
  You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with
  KVM_CREATE_SPAPR_TCE_IOMMU?

 Yes.

 Sigh.  That's the same thing repeated.  There's only one IOCTL.   
Nothing is

 being kept together.

Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE.


But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE.   
0xe0 begins a different section.


-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-29 Thread Alexey Kardashevskiy
On 05/30/2013 06:05 AM, Scott Wood wrote:
 On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote:
 On 05/29/2013 09:35 AM, Scott Wood wrote:
  On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote:
@@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
#define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2, struct
kvm_device_attr)
#define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3, struct
kvm_device_attr)
   
+/* ioctl for SPAPR TCE IOMMU */
+#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
kvm_create_spapr_tce_iommu)
   
Shouldn't this go under the vm ioctl section?
  
  
   The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated
  devices) is
   in this section so I decided to keep them together. Wrong?
  
   You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with
   KVM_CREATE_SPAPR_TCE_IOMMU?
 
  Yes.
 
  Sigh.  That's the same thing repeated.  There's only one IOCTL. 
 Nothing is
  being kept together.

 Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE.
 
 But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE.  0xe0
 begins a different section.

It is not really obvious that there are sections as no comment defines
those :) But yes, makes sense to move it up a bit and change the code to 0xad.



-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-29 Thread Scott Wood

On 05/29/2013 06:10:33 PM, Alexey Kardashevskiy wrote:

On 05/30/2013 06:05 AM, Scott Wood wrote:
 On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote:
 On 05/29/2013 09:35 AM, Scott Wood wrote:
  On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote:
@@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
#define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2,  
struct

kvm_device_attr)
#define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3,  
struct

kvm_device_attr)
   
+/* ioctl for SPAPR TCE IOMMU */
+#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4,  
struct

kvm_create_spapr_tce_iommu)
   
Shouldn't this go under the vm ioctl section?
  
  
   The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for  
emulated

  devices) is
   in this section so I decided to keep them together. Wrong?
  
   You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with
   KVM_CREATE_SPAPR_TCE_IOMMU?
 
  Yes.
 
  Sigh.  That's the same thing repeated.  There's only one IOCTL.
 Nothing is
  being kept together.

 Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE.

 But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE.   
0xe0

 begins a different section.

It is not really obvious that there are sections as no comment defines
those :)


There is a comment /* ioctls for fds returned by KVM_CREATE_DEVICE */

Putting KVM_CREATE_DEVICE in there was mainly to avoid dealing with the  
ioctl number conflict mess in the vm-ioctl section, but at least that  
one is related to the device control API. :-)



But yes, makes sense to move it up a bit and change the code to 0xad.


0xad is KVM_KVMCLOCK_CTRL

-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-29 Thread Alexey Kardashevskiy
On 05/30/2013 09:14 AM, Scott Wood wrote:
 On 05/29/2013 06:10:33 PM, Alexey Kardashevskiy wrote:
 On 05/30/2013 06:05 AM, Scott Wood wrote:
  On 05/28/2013 07:12:32 PM, Alexey Kardashevskiy wrote:
  On 05/29/2013 09:35 AM, Scott Wood wrote:
   On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote:
 @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
 #define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2, struct
 kvm_device_attr)
 #define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3, struct
 kvm_device_attr)

 +/* ioctl for SPAPR TCE IOMMU */
 +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
 kvm_create_spapr_tce_iommu)

 Shouldn't this go under the vm ioctl section?
   
   
The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated
   devices) is
in this section so I decided to keep them together. Wrong?
   
You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with
KVM_CREATE_SPAPR_TCE_IOMMU?
  
   Yes.
  
   Sigh.  That's the same thing repeated.  There's only one IOCTL.
  Nothing is
   being kept together.
 
  Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE.
 
  But you didn't put it in the same section as KVM_CREATE_SPAPR_TCE.  0xe0
  begins a different section.

 It is not really obvious that there are sections as no comment defines
 those :)
 
 There is a comment /* ioctls for fds returned by KVM_CREATE_DEVICE */
 
 Putting KVM_CREATE_DEVICE in there was mainly to avoid dealing with the
 ioctl number conflict mess in the vm-ioctl section, but at least that one
 is related to the device control API. :-)
 
 But yes, makes sense to move it up a bit and change the code to 0xad.
 
 0xad is KVM_KVMCLOCK_CTRL

That's it. I am _completely_ confused now. No system whatsoever :(
What rule should I use in order to choose the number for my new ioctl? :)



-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-28 Thread Scott Wood

On 05/24/2013 09:45:24 PM, David Gibson wrote:

On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote:
 On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote:
 diff --git a/arch/powerpc/kvm/powerpc.c  
b/arch/powerpc/kvm/powerpc.c

 index 8465c2a..da6bf61 100644
 --- a/arch/powerpc/kvm/powerpc.c
 @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 +++ b/arch/powerpc/kvm/powerpc.c
break;
  #endif
case KVM_CAP_SPAPR_MULTITCE:
 +  case KVM_CAP_SPAPR_TCE_IOMMU:
r = 1;
break;
default:

 Don't advertise SPAPR capabilities if it's not book3s -- and
 probably there's some additional limitation that would be
 appropriate.

So, in the case of MULTITCE, that's not quite right.  PR KVM can
emulate a PAPR system on a BookE machine, and there's no reason not to
allow TCE acceleration as well.


That might (or might not; consider things like Altivec versus SPE  
opcode conflict, whether unimplemented SPRs trap, behavior of  
unprivileged SPRs/instructions, etc) be true in theory, but it's not  
currently a supported configuration.  BookE KVM does not support  
emulating a different CPU than the host.  In the unlikely case that  
ever changes to the point of allowing PAPR guests on a BookE host, then  
we can revisit how to properly determine whether the capability is  
supported, but for now the capability will never be valid in the  
CONFIG_BOOKE case (though I'd rather see it depend on an appropriate  
book3s symbol than depend on !BOOKE).


Or we could just leave it as is, and let it indicate whether the host  
kernel supports the feature in general, with the user needing to  
understand when it's applicable...  I'm a bit confused by the  
documentation, however -- the MULTITCE capability was documented in the  
capabilities that can be enabled section, but I don't see where it  
can be enabled.


-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-28 Thread Scott Wood

On 05/26/2013 09:44:24 PM, Alexey Kardashevskiy wrote:

On 05/25/2013 12:45 PM, David Gibson wrote:
 On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote:
 On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote:
 diff --git a/arch/powerpc/kvm/powerpc.c  
b/arch/powerpc/kvm/powerpc.c

 index 8465c2a..da6bf61 100644
 --- a/arch/powerpc/kvm/powerpc.c
 @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 +++ b/arch/powerpc/kvm/powerpc.c
break;
 #endif
case KVM_CAP_SPAPR_MULTITCE:
 +  case KVM_CAP_SPAPR_TCE_IOMMU:
r = 1;
break;
default:

 Don't advertise SPAPR capabilities if it's not book3s -- and
 probably there's some additional limitation that would be
 appropriate.

 So, in the case of MULTITCE, that's not quite right.  PR KVM can
 emulate a PAPR system on a BookE machine, and there's no reason not  
to

 allow TCE acceleration as well.  We can't make it dependent on PAPR
 mode being selected, because that's enabled per-vcpu, whereas these
 capabilities are queried on the VM before the vcpus are created.

 CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable
 host side hardware (i.e. a PAPR style IOMMU), though.


The capability says that the ioctl is supported. If there is no IOMMU  
group
registered, than it will fail with a reasonable error and nobody gets  
hurt.

What is the problem?


You could say that about a lot of the capabilities that just advertise  
the existence of new ioctls. :-)


Sometimes it's nice to know in advance whether it's supported, before  
actually requesting that something happen.



 @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
 #define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2, struct
 kvm_device_attr)
 #define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3, struct
 kvm_device_attr)

 +/* ioctl for SPAPR TCE IOMMU */
 +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
 kvm_create_spapr_tce_iommu)

 Shouldn't this go under the vm ioctl section?


The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated  
devices) is

in this section so I decided to keep them together. Wrong?


You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with  
KVM_CREATE_SPAPR_TCE_IOMMU?


-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-28 Thread Alexey Kardashevskiy
On 05/29/2013 03:45 AM, Scott Wood wrote:
 On 05/26/2013 09:44:24 PM, Alexey Kardashevskiy wrote:
 On 05/25/2013 12:45 PM, David Gibson wrote:
  On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote:
  On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote:
  diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
  index 8465c2a..da6bf61 100644
  --- a/arch/powerpc/kvm/powerpc.c
  @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  +++ b/arch/powerpc/kvm/powerpc.c
  break;
  #endif
  case KVM_CAP_SPAPR_MULTITCE:
  +case KVM_CAP_SPAPR_TCE_IOMMU:
  r = 1;
  break;
  default:
 
  Don't advertise SPAPR capabilities if it's not book3s -- and
  probably there's some additional limitation that would be
  appropriate.
 
  So, in the case of MULTITCE, that's not quite right.  PR KVM can
  emulate a PAPR system on a BookE machine, and there's no reason not to
  allow TCE acceleration as well.  We can't make it dependent on PAPR
  mode being selected, because that's enabled per-vcpu, whereas these
  capabilities are queried on the VM before the vcpus are created.
 
  CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable
  host side hardware (i.e. a PAPR style IOMMU), though.


 The capability says that the ioctl is supported. If there is no IOMMU group
 registered, than it will fail with a reasonable error and nobody gets hurt.
 What is the problem?
 
 You could say that about a lot of the capabilities that just advertise the
 existence of new ioctls. :-)
 
 Sometimes it's nice to know in advance whether it's supported, before
 actually requesting that something happen.

Yes, would be nice. There is just no quick way to know if this real system
supports IOMMU groups. I could add another helper to generic IOMMU code
which would return the number of registered IOMMU groups but it is a bit
too much :)


  @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
  #define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2, struct
  kvm_device_attr)
  #define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3, struct
  kvm_device_attr)
 
  +/* ioctl for SPAPR TCE IOMMU */
  +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
  kvm_create_spapr_tce_iommu)
 
  Shouldn't this go under the vm ioctl section?


 The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is
 in this section so I decided to keep them together. Wrong?
 
 You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with
 KVM_CREATE_SPAPR_TCE_IOMMU?

Yes.


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-28 Thread Scott Wood

On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote:

  @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
  #define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2, struct
  kvm_device_attr)
  #define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3, struct
  kvm_device_attr)
 
  +/* ioctl for SPAPR TCE IOMMU */
  +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
  kvm_create_spapr_tce_iommu)
 
  Shouldn't this go under the vm ioctl section?


 The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated  
devices) is

 in this section so I decided to keep them together. Wrong?

 You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with
 KVM_CREATE_SPAPR_TCE_IOMMU?

Yes.


Sigh.  That's the same thing repeated.  There's only one IOCTL.   
Nothing is being kept together.


-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-28 Thread Alexey Kardashevskiy
On 05/29/2013 09:35 AM, Scott Wood wrote:
 On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote:
   @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
   #define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2, struct
   kvm_device_attr)
   #define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3, struct
   kvm_device_attr)
  
   +/* ioctl for SPAPR TCE IOMMU */
   +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
   kvm_create_spapr_tce_iommu)
  
   Shouldn't this go under the vm ioctl section?
 
 
  The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated
 devices) is
  in this section so I decided to keep them together. Wrong?
 
  You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with
  KVM_CREATE_SPAPR_TCE_IOMMU?

 Yes.
 
 Sigh.  That's the same thing repeated.  There's only one IOCTL.  Nothing is
 being kept together.

Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE.


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-28 Thread Alexey Kardashevskiy
On 05/29/2013 02:32 AM, Scott Wood wrote:
 On 05/24/2013 09:45:24 PM, David Gibson wrote:
 On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote:
  On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote:
  diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
  index 8465c2a..da6bf61 100644
  --- a/arch/powerpc/kvm/powerpc.c
  @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  +++ b/arch/powerpc/kvm/powerpc.c
   break;
   #endif
   case KVM_CAP_SPAPR_MULTITCE:
  +case KVM_CAP_SPAPR_TCE_IOMMU:
   r = 1;
   break;
   default:
 
  Don't advertise SPAPR capabilities if it's not book3s -- and
  probably there's some additional limitation that would be
  appropriate.

 So, in the case of MULTITCE, that's not quite right.  PR KVM can
 emulate a PAPR system on a BookE machine, and there's no reason not to
 allow TCE acceleration as well.
 
 That might (or might not; consider things like Altivec versus SPE opcode
 conflict, whether unimplemented SPRs trap, behavior of unprivileged
 SPRs/instructions, etc) be true in theory, but it's not currently a
 supported configuration.  BookE KVM does not support emulating a different
 CPU than the host.  In the unlikely case that ever changes to the point of
 allowing PAPR guests on a BookE host, then we can revisit how to properly
 determine whether the capability is supported, but for now the capability
 will never be valid in the CONFIG_BOOKE case (though I'd rather see it
 depend on an appropriate book3s symbol than depend on !BOOKE).
 
 Or we could just leave it as is, and let it indicate whether the host
 kernel supports the feature in general, with the user needing to understand
 when it's applicable...  I'm a bit confused by the documentation, however
 -- the MULTITCE capability was documented in the capabilities that can be
 enabled section, but I don't see where it can be enabled.


True, it cannot be enabled (but it could be enabled a long time ago), it is
either supported or not, I'll fix the documentation. Thanks!


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-28 Thread Scott Wood

On 05/24/2013 09:45:24 PM, David Gibson wrote:

On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote:
 On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote:
 diff --git a/arch/powerpc/kvm/powerpc.c  
b/arch/powerpc/kvm/powerpc.c

 index 8465c2a..da6bf61 100644
 --- a/arch/powerpc/kvm/powerpc.c
 @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 +++ b/arch/powerpc/kvm/powerpc.c
break;
  #endif
case KVM_CAP_SPAPR_MULTITCE:
 +  case KVM_CAP_SPAPR_TCE_IOMMU:
r = 1;
break;
default:

 Don't advertise SPAPR capabilities if it's not book3s -- and
 probably there's some additional limitation that would be
 appropriate.

So, in the case of MULTITCE, that's not quite right.  PR KVM can
emulate a PAPR system on a BookE machine, and there's no reason not to
allow TCE acceleration as well.


That might (or might not; consider things like Altivec versus SPE  
opcode conflict, whether unimplemented SPRs trap, behavior of  
unprivileged SPRs/instructions, etc) be true in theory, but it's not  
currently a supported configuration.  BookE KVM does not support  
emulating a different CPU than the host.  In the unlikely case that  
ever changes to the point of allowing PAPR guests on a BookE host, then  
we can revisit how to properly determine whether the capability is  
supported, but for now the capability will never be valid in the  
CONFIG_BOOKE case (though I'd rather see it depend on an appropriate  
book3s symbol than depend on !BOOKE).


Or we could just leave it as is, and let it indicate whether the host  
kernel supports the feature in general, with the user needing to  
understand when it's applicable...  I'm a bit confused by the  
documentation, however -- the MULTITCE capability was documented in the  
capabilities that can be enabled section, but I don't see where it  
can be enabled.


-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-28 Thread Scott Wood

On 05/26/2013 09:44:24 PM, Alexey Kardashevskiy wrote:

On 05/25/2013 12:45 PM, David Gibson wrote:
 On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote:
 On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote:
 diff --git a/arch/powerpc/kvm/powerpc.c  
b/arch/powerpc/kvm/powerpc.c

 index 8465c2a..da6bf61 100644
 --- a/arch/powerpc/kvm/powerpc.c
 @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 +++ b/arch/powerpc/kvm/powerpc.c
break;
 #endif
case KVM_CAP_SPAPR_MULTITCE:
 +  case KVM_CAP_SPAPR_TCE_IOMMU:
r = 1;
break;
default:

 Don't advertise SPAPR capabilities if it's not book3s -- and
 probably there's some additional limitation that would be
 appropriate.

 So, in the case of MULTITCE, that's not quite right.  PR KVM can
 emulate a PAPR system on a BookE machine, and there's no reason not  
to

 allow TCE acceleration as well.  We can't make it dependent on PAPR
 mode being selected, because that's enabled per-vcpu, whereas these
 capabilities are queried on the VM before the vcpus are created.

 CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable
 host side hardware (i.e. a PAPR style IOMMU), though.


The capability says that the ioctl is supported. If there is no IOMMU  
group
registered, than it will fail with a reasonable error and nobody gets  
hurt.

What is the problem?


You could say that about a lot of the capabilities that just advertise  
the existence of new ioctls. :-)


Sometimes it's nice to know in advance whether it's supported, before  
actually requesting that something happen.



 @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
 #define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2, struct
 kvm_device_attr)
 #define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3, struct
 kvm_device_attr)

 +/* ioctl for SPAPR TCE IOMMU */
 +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
 kvm_create_spapr_tce_iommu)

 Shouldn't this go under the vm ioctl section?


The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated  
devices) is

in this section so I decided to keep them together. Wrong?


You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with  
KVM_CREATE_SPAPR_TCE_IOMMU?


-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-28 Thread Alexey Kardashevskiy
On 05/29/2013 03:45 AM, Scott Wood wrote:
 On 05/26/2013 09:44:24 PM, Alexey Kardashevskiy wrote:
 On 05/25/2013 12:45 PM, David Gibson wrote:
  On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote:
  On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote:
  diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
  index 8465c2a..da6bf61 100644
  --- a/arch/powerpc/kvm/powerpc.c
  @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  +++ b/arch/powerpc/kvm/powerpc.c
  break;
  #endif
  case KVM_CAP_SPAPR_MULTITCE:
  +case KVM_CAP_SPAPR_TCE_IOMMU:
  r = 1;
  break;
  default:
 
  Don't advertise SPAPR capabilities if it's not book3s -- and
  probably there's some additional limitation that would be
  appropriate.
 
  So, in the case of MULTITCE, that's not quite right.  PR KVM can
  emulate a PAPR system on a BookE machine, and there's no reason not to
  allow TCE acceleration as well.  We can't make it dependent on PAPR
  mode being selected, because that's enabled per-vcpu, whereas these
  capabilities are queried on the VM before the vcpus are created.
 
  CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable
  host side hardware (i.e. a PAPR style IOMMU), though.


 The capability says that the ioctl is supported. If there is no IOMMU group
 registered, than it will fail with a reasonable error and nobody gets hurt.
 What is the problem?
 
 You could say that about a lot of the capabilities that just advertise the
 existence of new ioctls. :-)
 
 Sometimes it's nice to know in advance whether it's supported, before
 actually requesting that something happen.

Yes, would be nice. There is just no quick way to know if this real system
supports IOMMU groups. I could add another helper to generic IOMMU code
which would return the number of registered IOMMU groups but it is a bit
too much :)


  @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
  #define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2, struct
  kvm_device_attr)
  #define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3, struct
  kvm_device_attr)
 
  +/* ioctl for SPAPR TCE IOMMU */
  +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
  kvm_create_spapr_tce_iommu)
 
  Shouldn't this go under the vm ioctl section?


 The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is
 in this section so I decided to keep them together. Wrong?
 
 You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with
 KVM_CREATE_SPAPR_TCE_IOMMU?

Yes.


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-28 Thread Alexey Kardashevskiy
On 05/29/2013 09:35 AM, Scott Wood wrote:
 On 05/28/2013 06:30:40 PM, Alexey Kardashevskiy wrote:
   @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
   #define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2, struct
   kvm_device_attr)
   #define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3, struct
   kvm_device_attr)
  
   +/* ioctl for SPAPR TCE IOMMU */
   +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
   kvm_create_spapr_tce_iommu)
  
   Shouldn't this go under the vm ioctl section?
 
 
  The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated
 devices) is
  in this section so I decided to keep them together. Wrong?
 
  You decided to keep KVM_CREATE_SPAPR_TCE_IOMMU together with
  KVM_CREATE_SPAPR_TCE_IOMMU?

 Yes.
 
 Sigh.  That's the same thing repeated.  There's only one IOCTL.  Nothing is
 being kept together.

Sorry, I meant this ioctl - KVM_CREATE_SPAPR_TCE.


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-28 Thread Alexey Kardashevskiy
On 05/29/2013 02:32 AM, Scott Wood wrote:
 On 05/24/2013 09:45:24 PM, David Gibson wrote:
 On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote:
  On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote:
  diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
  index 8465c2a..da6bf61 100644
  --- a/arch/powerpc/kvm/powerpc.c
  @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  +++ b/arch/powerpc/kvm/powerpc.c
   break;
   #endif
   case KVM_CAP_SPAPR_MULTITCE:
  +case KVM_CAP_SPAPR_TCE_IOMMU:
   r = 1;
   break;
   default:
 
  Don't advertise SPAPR capabilities if it's not book3s -- and
  probably there's some additional limitation that would be
  appropriate.

 So, in the case of MULTITCE, that's not quite right.  PR KVM can
 emulate a PAPR system on a BookE machine, and there's no reason not to
 allow TCE acceleration as well.
 
 That might (or might not; consider things like Altivec versus SPE opcode
 conflict, whether unimplemented SPRs trap, behavior of unprivileged
 SPRs/instructions, etc) be true in theory, but it's not currently a
 supported configuration.  BookE KVM does not support emulating a different
 CPU than the host.  In the unlikely case that ever changes to the point of
 allowing PAPR guests on a BookE host, then we can revisit how to properly
 determine whether the capability is supported, but for now the capability
 will never be valid in the CONFIG_BOOKE case (though I'd rather see it
 depend on an appropriate book3s symbol than depend on !BOOKE).
 
 Or we could just leave it as is, and let it indicate whether the host
 kernel supports the feature in general, with the user needing to understand
 when it's applicable...  I'm a bit confused by the documentation, however
 -- the MULTITCE capability was documented in the capabilities that can be
 enabled section, but I don't see where it can be enabled.


True, it cannot be enabled (but it could be enabled a long time ago), it is
either supported or not, I'll fix the documentation. Thanks!


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-27 Thread Paolo Bonzini
Il 25/05/2013 04:45, David Gibson ha scritto:
 +   case KVM_CREATE_SPAPR_TCE_IOMMU: {
 +   struct kvm_create_spapr_tce_iommu create_tce_iommu;
 +   struct kvm *kvm = filp-private_data;
 +
 +   r = -EFAULT;
 +   if (copy_from_user(create_tce_iommu, argp,
 +   sizeof(create_tce_iommu)))
 +   goto out;
 +   r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm,
 create_tce_iommu);
 +   goto out;
 +   }

Would it make sense to make this the only interface for creating TCEs?
That is, pass both a window_size and an IOMMU group id (or e.g. -1 for
no hardware IOMMU usage), and have a single ioctl for both cases?
There's some duplicated code between kvm_vm_ioctl_create_spapr_tce and
kvm_vm_ioctl_create_spapr_tce_iommu.

KVM_CREATE_SPAPR_TCE could stay for backwards-compatibility, or you
could just use a new capability and drop the old ioctl.  I'm not sure
whether you're already considering the ABI to be stable for kvmppc.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-27 Thread Alexey Kardashevskiy
On 05/27/2013 08:23 PM, Paolo Bonzini wrote:
 Il 25/05/2013 04:45, David Gibson ha scritto:
 +  case KVM_CREATE_SPAPR_TCE_IOMMU: {
 +  struct kvm_create_spapr_tce_iommu create_tce_iommu;
 +  struct kvm *kvm = filp-private_data;
 +
 +  r = -EFAULT;
 +  if (copy_from_user(create_tce_iommu, argp,
 +  sizeof(create_tce_iommu)))
 +  goto out;
 +  r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm,
 create_tce_iommu);
 +  goto out;
 +  }
 
 Would it make sense to make this the only interface for creating TCEs?
 That is, pass both a window_size and an IOMMU group id (or e.g. -1 for
 no hardware IOMMU usage), and have a single ioctl for both cases?
 There's some duplicated code between kvm_vm_ioctl_create_spapr_tce and
 kvm_vm_ioctl_create_spapr_tce_iommu.

Just few bits. Is there really much sense in making one function from those
two? I tried, looked a bit messy.

 KVM_CREATE_SPAPR_TCE could stay for backwards-compatibility, or you
 could just use a new capability and drop the old ioctl.

The old capability+ioctl already exist for quite a while and few QEMU
versions supporting it were released so we do not want just drop it. So
then what is the benefit of having a new interface with support of both types?

  I'm not sure
 whether you're already considering the ABI to be stable for kvmppc.

Is any bit of KVM using it? Cannot see from Documentation/ABI.


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-27 Thread Paolo Bonzini
Il 27/05/2013 16:26, Alexey Kardashevskiy ha scritto:
 On 05/27/2013 08:23 PM, Paolo Bonzini wrote:
 Il 25/05/2013 04:45, David Gibson ha scritto:
 + case KVM_CREATE_SPAPR_TCE_IOMMU: {
 + struct kvm_create_spapr_tce_iommu create_tce_iommu;
 + struct kvm *kvm = filp-private_data;
 +
 + r = -EFAULT;
 + if (copy_from_user(create_tce_iommu, argp,
 + sizeof(create_tce_iommu)))
 + goto out;
 + r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm,
 create_tce_iommu);
 + goto out;
 + }

 Would it make sense to make this the only interface for creating TCEs?
 That is, pass both a window_size and an IOMMU group id (or e.g. -1 for
 no hardware IOMMU usage), and have a single ioctl for both cases?
 There's some duplicated code between kvm_vm_ioctl_create_spapr_tce and
 kvm_vm_ioctl_create_spapr_tce_iommu.
 
 Just few bits. Is there really much sense in making one function from those
 two? I tried, looked a bit messy.

Cannot really tell without the userspace bits.  But ioctl proliferation
is what the device and one_reg APIs were supposed to avoid...

 KVM_CREATE_SPAPR_TCE could stay for backwards-compatibility, or you
 could just use a new capability and drop the old ioctl.
 
 The old capability+ioctl already exist for quite a while and few QEMU
 versions supporting it were released so we do not want just drop it. So
 then what is the benefit of having a new interface with support of both types?
 
  I'm not sure
 whether you're already considering the ABI to be stable for kvmppc.
 
 Is any bit of KVM using it? Cannot see from Documentation/ABI.

I mean the userspace ABI (ioctls).

Paolo

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-27 Thread Paolo Bonzini
Il 25/05/2013 04:45, David Gibson ha scritto:
 +   case KVM_CREATE_SPAPR_TCE_IOMMU: {
 +   struct kvm_create_spapr_tce_iommu create_tce_iommu;
 +   struct kvm *kvm = filp-private_data;
 +
 +   r = -EFAULT;
 +   if (copy_from_user(create_tce_iommu, argp,
 +   sizeof(create_tce_iommu)))
 +   goto out;
 +   r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm,
 create_tce_iommu);
 +   goto out;
 +   }

Would it make sense to make this the only interface for creating TCEs?
That is, pass both a window_size and an IOMMU group id (or e.g. -1 for
no hardware IOMMU usage), and have a single ioctl for both cases?
There's some duplicated code between kvm_vm_ioctl_create_spapr_tce and
kvm_vm_ioctl_create_spapr_tce_iommu.

KVM_CREATE_SPAPR_TCE could stay for backwards-compatibility, or you
could just use a new capability and drop the old ioctl.  I'm not sure
whether you're already considering the ABI to be stable for kvmppc.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-27 Thread Alexey Kardashevskiy
On 05/27/2013 08:23 PM, Paolo Bonzini wrote:
 Il 25/05/2013 04:45, David Gibson ha scritto:
 +  case KVM_CREATE_SPAPR_TCE_IOMMU: {
 +  struct kvm_create_spapr_tce_iommu create_tce_iommu;
 +  struct kvm *kvm = filp-private_data;
 +
 +  r = -EFAULT;
 +  if (copy_from_user(create_tce_iommu, argp,
 +  sizeof(create_tce_iommu)))
 +  goto out;
 +  r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm,
 create_tce_iommu);
 +  goto out;
 +  }
 
 Would it make sense to make this the only interface for creating TCEs?
 That is, pass both a window_size and an IOMMU group id (or e.g. -1 for
 no hardware IOMMU usage), and have a single ioctl for both cases?
 There's some duplicated code between kvm_vm_ioctl_create_spapr_tce and
 kvm_vm_ioctl_create_spapr_tce_iommu.

Just few bits. Is there really much sense in making one function from those
two? I tried, looked a bit messy.

 KVM_CREATE_SPAPR_TCE could stay for backwards-compatibility, or you
 could just use a new capability and drop the old ioctl.

The old capability+ioctl already exist for quite a while and few QEMU
versions supporting it were released so we do not want just drop it. So
then what is the benefit of having a new interface with support of both types?

  I'm not sure
 whether you're already considering the ABI to be stable for kvmppc.

Is any bit of KVM using it? Cannot see from Documentation/ABI.


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-26 Thread Alexey Kardashevskiy
On 05/25/2013 12:45 PM, David Gibson wrote:
 On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote:
 On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote:
 diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
 index 8465c2a..da6bf61 100644
 --- a/arch/powerpc/kvm/powerpc.c
 @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 +++ b/arch/powerpc/kvm/powerpc.c
 break;
 #endif
 case KVM_CAP_SPAPR_MULTITCE:
 +   case KVM_CAP_SPAPR_TCE_IOMMU:
 r = 1;
 break;
 default:

 Don't advertise SPAPR capabilities if it's not book3s -- and
 probably there's some additional limitation that would be
 appropriate.
 
 So, in the case of MULTITCE, that's not quite right.  PR KVM can
 emulate a PAPR system on a BookE machine, and there's no reason not to
 allow TCE acceleration as well.  We can't make it dependent on PAPR
 mode being selected, because that's enabled per-vcpu, whereas these
 capabilities are queried on the VM before the vcpus are created.
 
 CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable
 host side hardware (i.e. a PAPR style IOMMU), though.


The capability says that the ioctl is supported. If there is no IOMMU group
registered, than it will fail with a reasonable error and nobody gets hurt.
What is the problem?



 @@ -1025,6 +1026,17 @@ long kvm_arch_vm_ioctl(struct file *filp,
 r = kvm_vm_ioctl_create_spapr_tce(kvm, create_tce);
 goto out;
 }
 +   case KVM_CREATE_SPAPR_TCE_IOMMU: {
 +   struct kvm_create_spapr_tce_iommu create_tce_iommu;
 +   struct kvm *kvm = filp-private_data;
 +
 +   r = -EFAULT;
 +   if (copy_from_user(create_tce_iommu, argp,
 +   sizeof(create_tce_iommu)))
 +   goto out;
 +   r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm,
 create_tce_iommu);
 +   goto out;
 +   }
 #endif /* CONFIG_PPC_BOOK3S_64 */

 #ifdef CONFIG_KVM_BOOK3S_64_HV
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 5a2afda..450c82a 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_RTAS 91
 #define KVM_CAP_IRQ_XICS 92
 #define KVM_CAP_SPAPR_MULTITCE (0x11 + 89)
 +#define KVM_CAP_SPAPR_TCE_IOMMU (0x11 + 90)

 Hmm...
 
 Ah, yeah, that needs to be fixed.  Those were interim numbers so that
 we didn't have to keep changing our internal trees as new upstream
 ioctls got added to the list.  We need to get a proper number for the
 merge, though.

 @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
 #define KVM_GET_DEVICE_ATTR   _IOW(KVMIO,  0xe2, struct
 kvm_device_attr)
 #define KVM_HAS_DEVICE_ATTR   _IOW(KVMIO,  0xe3, struct
 kvm_device_attr)

 +/* ioctl for SPAPR TCE IOMMU */
 +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
 kvm_create_spapr_tce_iommu)

 Shouldn't this go under the vm ioctl section?


The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is
in this section so I decided to keep them together. Wrong?


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-26 Thread Alexey Kardashevskiy
On 05/25/2013 12:45 PM, David Gibson wrote:
 On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote:
 On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote:
 diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
 index 8465c2a..da6bf61 100644
 --- a/arch/powerpc/kvm/powerpc.c
 @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 +++ b/arch/powerpc/kvm/powerpc.c
 break;
 #endif
 case KVM_CAP_SPAPR_MULTITCE:
 +   case KVM_CAP_SPAPR_TCE_IOMMU:
 r = 1;
 break;
 default:

 Don't advertise SPAPR capabilities if it's not book3s -- and
 probably there's some additional limitation that would be
 appropriate.
 
 So, in the case of MULTITCE, that's not quite right.  PR KVM can
 emulate a PAPR system on a BookE machine, and there's no reason not to
 allow TCE acceleration as well.  We can't make it dependent on PAPR
 mode being selected, because that's enabled per-vcpu, whereas these
 capabilities are queried on the VM before the vcpus are created.
 
 CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable
 host side hardware (i.e. a PAPR style IOMMU), though.


The capability says that the ioctl is supported. If there is no IOMMU group
registered, than it will fail with a reasonable error and nobody gets hurt.
What is the problem?



 @@ -1025,6 +1026,17 @@ long kvm_arch_vm_ioctl(struct file *filp,
 r = kvm_vm_ioctl_create_spapr_tce(kvm, create_tce);
 goto out;
 }
 +   case KVM_CREATE_SPAPR_TCE_IOMMU: {
 +   struct kvm_create_spapr_tce_iommu create_tce_iommu;
 +   struct kvm *kvm = filp-private_data;
 +
 +   r = -EFAULT;
 +   if (copy_from_user(create_tce_iommu, argp,
 +   sizeof(create_tce_iommu)))
 +   goto out;
 +   r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm,
 create_tce_iommu);
 +   goto out;
 +   }
 #endif /* CONFIG_PPC_BOOK3S_64 */

 #ifdef CONFIG_KVM_BOOK3S_64_HV
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 5a2afda..450c82a 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_RTAS 91
 #define KVM_CAP_IRQ_XICS 92
 #define KVM_CAP_SPAPR_MULTITCE (0x11 + 89)
 +#define KVM_CAP_SPAPR_TCE_IOMMU (0x11 + 90)

 Hmm...
 
 Ah, yeah, that needs to be fixed.  Those were interim numbers so that
 we didn't have to keep changing our internal trees as new upstream
 ioctls got added to the list.  We need to get a proper number for the
 merge, though.

 @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
 #define KVM_GET_DEVICE_ATTR   _IOW(KVMIO,  0xe2, struct
 kvm_device_attr)
 #define KVM_HAS_DEVICE_ATTR   _IOW(KVMIO,  0xe3, struct
 kvm_device_attr)

 +/* ioctl for SPAPR TCE IOMMU */
 +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
 kvm_create_spapr_tce_iommu)

 Shouldn't this go under the vm ioctl section?


The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is
in this section so I decided to keep them together. Wrong?


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-24 Thread David Gibson
On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote:
 On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote:
 diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
 index 8465c2a..da6bf61 100644
 --- a/arch/powerpc/kvm/powerpc.c
 @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 +++ b/arch/powerpc/kvm/powerpc.c
  break;
  #endif
  case KVM_CAP_SPAPR_MULTITCE:
 +case KVM_CAP_SPAPR_TCE_IOMMU:
  r = 1;
  break;
  default:
 
 Don't advertise SPAPR capabilities if it's not book3s -- and
 probably there's some additional limitation that would be
 appropriate.

So, in the case of MULTITCE, that's not quite right.  PR KVM can
emulate a PAPR system on a BookE machine, and there's no reason not to
allow TCE acceleration as well.  We can't make it dependent on PAPR
mode being selected, because that's enabled per-vcpu, whereas these
capabilities are queried on the VM before the vcpus are created.

CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable
host side hardware (i.e. a PAPR style IOMMU), though.

 
 @@ -1025,6 +1026,17 @@ long kvm_arch_vm_ioctl(struct file *filp,
  r = kvm_vm_ioctl_create_spapr_tce(kvm, create_tce);
  goto out;
  }
 +case KVM_CREATE_SPAPR_TCE_IOMMU: {
 +struct kvm_create_spapr_tce_iommu create_tce_iommu;
 +struct kvm *kvm = filp-private_data;
 +
 +r = -EFAULT;
 +if (copy_from_user(create_tce_iommu, argp,
 +sizeof(create_tce_iommu)))
 +goto out;
 +r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm,
 create_tce_iommu);
 +goto out;
 +}
  #endif /* CONFIG_PPC_BOOK3S_64 */
 
  #ifdef CONFIG_KVM_BOOK3S_64_HV
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 5a2afda..450c82a 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_PPC_RTAS 91
  #define KVM_CAP_IRQ_XICS 92
  #define KVM_CAP_SPAPR_MULTITCE (0x11 + 89)
 +#define KVM_CAP_SPAPR_TCE_IOMMU (0x11 + 90)
 
 Hmm...

Ah, yeah, that needs to be fixed.  Those were interim numbers so that
we didn't have to keep changing our internal trees as new upstream
ioctls got added to the list.  We need to get a proper number for the
merge, though.

 @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
  #define KVM_GET_DEVICE_ATTR   _IOW(KVMIO,  0xe2, struct
 kvm_device_attr)
  #define KVM_HAS_DEVICE_ATTR   _IOW(KVMIO,  0xe3, struct
 kvm_device_attr)
 
 +/* ioctl for SPAPR TCE IOMMU */
 +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
 kvm_create_spapr_tce_iommu)
 
 Shouldn't this go under the vm ioctl section?
 
 -Scott
 ___
 Linuxppc-dev mailing list
 linuxppc-...@lists.ozlabs.org
 https://lists.ozlabs.org/listinfo/linuxppc-dev
 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: Digital signature


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-24 Thread David Gibson
On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote:
 On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote:
 diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
 index 8465c2a..da6bf61 100644
 --- a/arch/powerpc/kvm/powerpc.c
 @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 +++ b/arch/powerpc/kvm/powerpc.c
  break;
  #endif
  case KVM_CAP_SPAPR_MULTITCE:
 +case KVM_CAP_SPAPR_TCE_IOMMU:
  r = 1;
  break;
  default:
 
 Don't advertise SPAPR capabilities if it's not book3s -- and
 probably there's some additional limitation that would be
 appropriate.

So, in the case of MULTITCE, that's not quite right.  PR KVM can
emulate a PAPR system on a BookE machine, and there's no reason not to
allow TCE acceleration as well.  We can't make it dependent on PAPR
mode being selected, because that's enabled per-vcpu, whereas these
capabilities are queried on the VM before the vcpus are created.

CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable
host side hardware (i.e. a PAPR style IOMMU), though.

 
 @@ -1025,6 +1026,17 @@ long kvm_arch_vm_ioctl(struct file *filp,
  r = kvm_vm_ioctl_create_spapr_tce(kvm, create_tce);
  goto out;
  }
 +case KVM_CREATE_SPAPR_TCE_IOMMU: {
 +struct kvm_create_spapr_tce_iommu create_tce_iommu;
 +struct kvm *kvm = filp-private_data;
 +
 +r = -EFAULT;
 +if (copy_from_user(create_tce_iommu, argp,
 +sizeof(create_tce_iommu)))
 +goto out;
 +r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm,
 create_tce_iommu);
 +goto out;
 +}
  #endif /* CONFIG_PPC_BOOK3S_64 */
 
  #ifdef CONFIG_KVM_BOOK3S_64_HV
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 5a2afda..450c82a 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_PPC_RTAS 91
  #define KVM_CAP_IRQ_XICS 92
  #define KVM_CAP_SPAPR_MULTITCE (0x11 + 89)
 +#define KVM_CAP_SPAPR_TCE_IOMMU (0x11 + 90)
 
 Hmm...

Ah, yeah, that needs to be fixed.  Those were interim numbers so that
we didn't have to keep changing our internal trees as new upstream
ioctls got added to the list.  We need to get a proper number for the
merge, though.

 @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
  #define KVM_GET_DEVICE_ATTR   _IOW(KVMIO,  0xe2, struct
 kvm_device_attr)
  #define KVM_HAS_DEVICE_ATTR   _IOW(KVMIO,  0xe3, struct
 kvm_device_attr)
 
 +/* ioctl for SPAPR TCE IOMMU */
 +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
 kvm_create_spapr_tce_iommu)
 
 Shouldn't this go under the vm ioctl section?
 
 -Scott
 ___
 Linuxppc-dev mailing list
 linuxppc-...@lists.ozlabs.org
 https://lists.ozlabs.org/listinfo/linuxppc-dev
 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: Digital signature


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-22 Thread Scott Wood

On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote:

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 8465c2a..da6bf61 100644
--- a/arch/powerpc/kvm/powerpc.c
@@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext)
+++ b/arch/powerpc/kvm/powerpc.c
break;
 #endif
case KVM_CAP_SPAPR_MULTITCE:
+   case KVM_CAP_SPAPR_TCE_IOMMU:
r = 1;
break;
default:


Don't advertise SPAPR capabilities if it's not book3s -- and probably  
there's some additional limitation that would be appropriate.



@@ -1025,6 +1026,17 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = kvm_vm_ioctl_create_spapr_tce(kvm, create_tce);
goto out;
}
+   case KVM_CREATE_SPAPR_TCE_IOMMU: {
+   struct kvm_create_spapr_tce_iommu create_tce_iommu;
+   struct kvm *kvm = filp-private_data;
+
+   r = -EFAULT;
+   if (copy_from_user(create_tce_iommu, argp,
+   sizeof(create_tce_iommu)))
+   goto out;
+		r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm,  
create_tce_iommu);

+   goto out;
+   }
 #endif /* CONFIG_PPC_BOOK3S_64 */

 #ifdef CONFIG_KVM_BOOK3S_64_HV
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5a2afda..450c82a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_RTAS 91
 #define KVM_CAP_IRQ_XICS 92
 #define KVM_CAP_SPAPR_MULTITCE (0x11 + 89)
+#define KVM_CAP_SPAPR_TCE_IOMMU (0x11 + 90)


Hmm...


@@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
 #define KVM_GET_DEVICE_ATTR	  _IOW(KVMIO,  0xe2, struct  
kvm_device_attr)
 #define KVM_HAS_DEVICE_ATTR	  _IOW(KVMIO,  0xe3, struct  
kvm_device_attr)


+/* ioctl for SPAPR TCE IOMMU */
+#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct  
kvm_create_spapr_tce_iommu)


Shouldn't this go under the vm ioctl section?

-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-22 Thread Scott Wood

On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote:

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 8465c2a..da6bf61 100644
--- a/arch/powerpc/kvm/powerpc.c
@@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext)
+++ b/arch/powerpc/kvm/powerpc.c
break;
 #endif
case KVM_CAP_SPAPR_MULTITCE:
+   case KVM_CAP_SPAPR_TCE_IOMMU:
r = 1;
break;
default:


Don't advertise SPAPR capabilities if it's not book3s -- and probably  
there's some additional limitation that would be appropriate.



@@ -1025,6 +1026,17 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = kvm_vm_ioctl_create_spapr_tce(kvm, create_tce);
goto out;
}
+   case KVM_CREATE_SPAPR_TCE_IOMMU: {
+   struct kvm_create_spapr_tce_iommu create_tce_iommu;
+   struct kvm *kvm = filp-private_data;
+
+   r = -EFAULT;
+   if (copy_from_user(create_tce_iommu, argp,
+   sizeof(create_tce_iommu)))
+   goto out;
+		r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm,  
create_tce_iommu);

+   goto out;
+   }
 #endif /* CONFIG_PPC_BOOK3S_64 */

 #ifdef CONFIG_KVM_BOOK3S_64_HV
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5a2afda..450c82a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_RTAS 91
 #define KVM_CAP_IRQ_XICS 92
 #define KVM_CAP_SPAPR_MULTITCE (0x11 + 89)
+#define KVM_CAP_SPAPR_TCE_IOMMU (0x11 + 90)


Hmm...


@@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
 #define KVM_GET_DEVICE_ATTR	  _IOW(KVMIO,  0xe2, struct  
kvm_device_attr)
 #define KVM_HAS_DEVICE_ATTR	  _IOW(KVMIO,  0xe3, struct  
kvm_device_attr)


+/* ioctl for SPAPR TCE IOMMU */
+#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct  
kvm_create_spapr_tce_iommu)


Shouldn't this go under the vm ioctl section?

-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-20 Thread Alexey Kardashevskiy
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests without passing them to QEMU, which should
save time on switching to QEMU and back.

Both real and virtual modes are supported - whenever the kernel
fails to handle TCE request, it passes it to the virtual mode.
If it the virtual mode handlers fail, then the request is passed
to the user mode, for example, to QEMU.

This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate
a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables
in-kernel handling of IOMMU map/unmap.

Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).

Cc: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Signed-off-by: Paul Mackerras pau...@samba.org

---

Changes:
2013-05-20:
* removed get_user() from real mode handlers
* kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
translated TCEs, tries realmode_get_page() on those and if it fails, it
passes control over the virtual mode handler which tries to finish
the request handling
* kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
on a page
* The only reason to pass the request to user mode now is when the user mode
did not register TCE table in the kernel, in all other cases the virtual mode
handler is expected to do the job
---
 Documentation/virtual/kvm/api.txt   |   28 +
 arch/powerpc/include/asm/kvm_host.h |3 +
 arch/powerpc/include/asm/kvm_ppc.h  |2 +
 arch/powerpc/include/uapi/asm/kvm.h |7 ++
 arch/powerpc/kvm/book3s_64_vio.c|  198 ++-
 arch/powerpc/kvm/book3s_64_vio_hv.c |  193 +-
 arch/powerpc/kvm/powerpc.c  |   12 +++
 include/uapi/linux/kvm.h|4 +
 8 files changed, 441 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 3c7c7ea..3c8e9fe 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2362,6 +2362,34 @@ calls by the guest for that service will be passed to 
userspace to be
 handled.
 
 
+4.79 KVM_CREATE_SPAPR_TCE_IOMMU
+
+Capability: KVM_CAP_SPAPR_TCE_IOMMU
+Architectures: powerpc
+Type: vm ioctl
+Parameters: struct kvm_create_spapr_tce_iommu (in)
+Returns: 0 on success, -1 on error
+
+This creates a link between IOMMU group and a hardware TCE (translation
+control entry) table. This link lets the host kernel know what IOMMU
+group (i.e. TCE table) to use for the LIOBN number passed with
+H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls.
+
+/* for KVM_CAP_SPAPR_TCE_IOMMU */
+struct kvm_create_spapr_tce_iommu {
+   __u64 liobn;
+   __u32 iommu_id;
+   __u32 flags;
+};
+
+No flag is supported at the moment.
+
+When the guest issues TCE call on a liobn for which a TCE table has been
+registered, the kernel will handle it in real mode, updating the hardware
+TCE table. TCE table calls for other liobns will cause a vm exit and must
+be handled by userspace.
+
+
 5. The kvm_run structure
 
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 85d8f26..ac0e2fe 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -180,6 +180,7 @@ struct kvmppc_spapr_tce_table {
struct kvm *kvm;
u64 liobn;
u32 window_size;
+   struct iommu_group *grp;/* used for IOMMU groups */
struct page *pages[0];
 };
 
@@ -611,6 +612,8 @@ struct kvm_vcpu_arch {
u64 busy_preempt;
 
unsigned long *tce_tmp;/* TCE cache for TCE_PUT_INDIRECT hall */
+   unsigned long tce_tmp_num; /* Number of handled TCEs in the cache */
+   unsigned long tce_reason;  /* The reason of switching to the virtmode */
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index e852921b..934e01d 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -133,6 +133,8 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
+extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
+   struct kvm_create_spapr_tce_iommu *args);
 extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
struct kvm_vcpu *vcpu, unsigned long liobn);
 extern long kvmppc_emulated_validate_tce(unsigned long tce);
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index 0fb1a6e..cf82af4 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -319,6 +319,13 @@ struct kvm_create_spapr_tce {
__u32 window_size;
 };
 
+/* for KVM_CAP_SPAPR_TCE_IOMMU */

[PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-20 Thread Alexey Kardashevskiy
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests without passing them to QEMU, which should
save time on switching to QEMU and back.

Both real and virtual modes are supported - whenever the kernel
fails to handle TCE request, it passes it to the virtual mode.
If it the virtual mode handlers fail, then the request is passed
to the user mode, for example, to QEMU.

This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate
a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables
in-kernel handling of IOMMU map/unmap.

Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).

Cc: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Signed-off-by: Paul Mackerras pau...@samba.org

---

Changes:
2013-05-20:
* removed get_user() from real mode handlers
* kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
translated TCEs, tries realmode_get_page() on those and if it fails, it
passes control over the virtual mode handler which tries to finish
the request handling
* kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
on a page
* The only reason to pass the request to user mode now is when the user mode
did not register TCE table in the kernel, in all other cases the virtual mode
handler is expected to do the job
---
 Documentation/virtual/kvm/api.txt   |   28 +
 arch/powerpc/include/asm/kvm_host.h |3 +
 arch/powerpc/include/asm/kvm_ppc.h  |2 +
 arch/powerpc/include/uapi/asm/kvm.h |7 ++
 arch/powerpc/kvm/book3s_64_vio.c|  198 ++-
 arch/powerpc/kvm/book3s_64_vio_hv.c |  193 +-
 arch/powerpc/kvm/powerpc.c  |   12 +++
 include/uapi/linux/kvm.h|4 +
 8 files changed, 441 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 3c7c7ea..3c8e9fe 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2362,6 +2362,34 @@ calls by the guest for that service will be passed to 
userspace to be
 handled.
 
 
+4.79 KVM_CREATE_SPAPR_TCE_IOMMU
+
+Capability: KVM_CAP_SPAPR_TCE_IOMMU
+Architectures: powerpc
+Type: vm ioctl
+Parameters: struct kvm_create_spapr_tce_iommu (in)
+Returns: 0 on success, -1 on error
+
+This creates a link between IOMMU group and a hardware TCE (translation
+control entry) table. This link lets the host kernel know what IOMMU
+group (i.e. TCE table) to use for the LIOBN number passed with
+H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls.
+
+/* for KVM_CAP_SPAPR_TCE_IOMMU */
+struct kvm_create_spapr_tce_iommu {
+   __u64 liobn;
+   __u32 iommu_id;
+   __u32 flags;
+};
+
+No flag is supported at the moment.
+
+When the guest issues TCE call on a liobn for which a TCE table has been
+registered, the kernel will handle it in real mode, updating the hardware
+TCE table. TCE table calls for other liobns will cause a vm exit and must
+be handled by userspace.
+
+
 5. The kvm_run structure
 
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 85d8f26..ac0e2fe 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -180,6 +180,7 @@ struct kvmppc_spapr_tce_table {
struct kvm *kvm;
u64 liobn;
u32 window_size;
+   struct iommu_group *grp;/* used for IOMMU groups */
struct page *pages[0];
 };
 
@@ -611,6 +612,8 @@ struct kvm_vcpu_arch {
u64 busy_preempt;
 
unsigned long *tce_tmp;/* TCE cache for TCE_PUT_INDIRECT hall */
+   unsigned long tce_tmp_num; /* Number of handled TCEs in the cache */
+   unsigned long tce_reason;  /* The reason of switching to the virtmode */
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index e852921b..934e01d 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -133,6 +133,8 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
+extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
+   struct kvm_create_spapr_tce_iommu *args);
 extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
struct kvm_vcpu *vcpu, unsigned long liobn);
 extern long kvmppc_emulated_validate_tce(unsigned long tce);
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index 0fb1a6e..cf82af4 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -319,6 +319,13 @@ struct kvm_create_spapr_tce {
__u32 window_size;
 };
 
+/* for KVM_CAP_SPAPR_TCE_IOMMU */