Re: RFC: vfio API changes needed for powerpc (v2)

2013-04-04 Thread Alex Williamson
On Thu, 2013-04-04 at 17:32 +, Yoder Stuart-B08248 wrote:
 Based on the email thread over the last couple of days, I have
 below an more concrete proposal (v2) for new ioctls supporting vfio-pci
 on SoCs with the Freescale PAMU.
 
 Example usage is as described by Scott:
 
 count = VFIO_IOMMU_GET_MSI_BANK_COUNT
 VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY)
 VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS)
 // do other DMA maps now, or later, or not at all, doesn't matter
 for (i = 0; i  count; i++)
 VFIO_IOMMU_MAP_MSI_BANK(iova, i);
 // The kernel now knows where each bank has been mapped, and can
 //   update PCI config space appropriately.
 
 Thanks,
 Stuart
 
 
 
 The Freescale PAMU is an aperture-based IOMMU with the following 
 characteristics.  Each device has an entry in a table in memory
 describing the iova-phys mapping. The mapping has:
-an overall aperture that is power of 2 sized, and has a start iova that
 is naturally aligned
-has 1 or more windows within the aperture
   -number of windows must be power of 2, max is 256
   -size of each window is determined by aperture size / # of windows
   -iova of each window is determined by aperture start iova / # of windows
   -the mapped region in each window can be different than
the window size...mapping must power of 2 
   -physical address of the mapping must be naturally aligned
with the mapping size
 
 /*
  * VFIO_PAMU_GET_ATTR
  *
  * Gets the iommu attributes for the current vfio container.  This 
  * ioctl is applicable to an iommu type of VFIO_PAMU only.
  * Caller sets argsz and attribute.  The ioctl fills in
  * the provided struct vfio_pamu_attr based on the attribute
  * value that was set.
  * Operates on VFIO file descriptor (/dev/vfio/vfio).
  * Return: 0 on success, -errno on failure
  */
 struct vfio_pamu_attr {
 __u32   argsz;
 __u32   attribute;
 #define VFIO_ATTR_GEOMETRY 1
 #define VFIO_ATTR_WINDOWS  2
 #define VFIO_ATTR_PAMU_STASH   3
 
 /* fowlling fields are for VFIO_ATTR_GEOMETRY */
 __u64 aperture_start; /* first address that can be mapped*/
 __u64 aperture_end;   /* last address that can be mapped */
 
 /* follwing fields are for VFIO_ATTR_WINDOWS */
 __u32 windows;/* number of windows in the aperture */
   /* initially this will be the max number
* of windows that can be set
*/
 
 /* following fields are for VFIO_ATTR_PAMU_STASH */
 __u32 cpu;/* CPU number for stashing */
 __u32 cache;  /* cache ID for stashing */

Can multiple attribute bits be set at once?  If not then the above could
be a union and the attribute could be an enum.  A flags field is
probably still a good idea.

 };
 #define VFIO_PAMU_GET_ATTR  _IO(VFIO_TYPE, VFIO_BASE + x,
 struct vfio_pamu_attr)
 
 /*
  * VFIO_PAMU_SET_ATTR
  *
  * Sets the iommu attributes for the current vfio container.  This 
  * ioctl is applicable to an iommu type of VFIO_PAMU only.
  * Caller sets struct vfio_pamu attr, including argsz and attribute and
  * setting any fields that are valid for the attribute.
  * Operates on VFIO file descriptor (/dev/vfio/vfio).
  * Return: 0 on success, -errno on failure
  */
 #define VFIO_PAMU_SET_ATTR  _IO(VFIO_TYPE, VFIO_BASE + x,
 struct vfio_pamu_attr)
 
 /*
  * VFIO_PAMU_GET_MSI_BANK_COUNT
  *
  * Returns the number of MSI banks for this platform.  This tells user space
  * how many aperture windows should be reserved for MSI banks when setting
  * the PAMU geometry and window count.
  * Fills in provided struct vfio_pamu_msi_banks. Caller sets argsz. 
  * Operates on VFIO file descriptor (/dev/vfio/vfio).
  * Return: 0 on success, -errno on failure
  */
 struct vfio_pamu_msi_banks {
 __u32   argsz;
 __u32   bank_count;  /* the number of MSI
 };
 #define VFIO_PAMU_GET_MSI_BANK_COUNT  _IO(VFIO_TYPE, VFIO_BASE + x,
 struct vfio_pamu_msi_banks)

argsz w/o flags doesn't really buy us much.

 
 /*
  * VFIO_PAMU_MAP_MSI_BANK
  *
  * Maps the MSI bank at the specified index and iova.  User space must
  * call this ioctl once for each MSI bank (count of banks is returned by
  * VFIO_IOMMU_GET_MSI_BANK_COUNT).
  * Caller provides struct vfio_pamu_msi_bank_map with all fields set.
  * Operates on VFIO file descriptor (/dev/vfio/vfio).
  * Return: 0 on success, -errno on failure
  */
 
 struct vfio_pamu_msi_bank_map {
 __u32   argsz;
 __u32   msi_bank_index;  /* the index of the MSI bank */
 __u64   iova;  /* the iova the bank is to be mapped to */
 };

Again, flags.  If we dynamically add or remove devices from a container
the bank count can change, right?  If bank count goes from 2 to 3, does
userspace know assume the new bank is [2]?  If bank count goes from 3 

RE: RFC: vfio API changes needed for powerpc (v2)

2013-04-04 Thread Yoder Stuart-B08248


 -Original Message-
 From: Wood Scott-B07421
 Sent: Thursday, April 04, 2013 5:52 PM
 To: Yoder Stuart-B08248
 Cc: Alex Williamson; Wood Scott-B07421; ag...@suse.de; Bhushan Bharat-R65777; 
 Sethi Varun-B16395;
 k...@vger.kernel.org; qemu-de...@nongnu.org; iommu@lists.linux-foundation.org
 Subject: Re: RFC: vfio API changes needed for powerpc (v2)
 
 On 04/04/2013 04:38:44 PM, Yoder Stuart-B08248 wrote:
 
/*
 * VFIO_PAMU_MAP_MSI_BANK
 *
 * Maps the MSI bank at the specified index and iova.  User space
  must
 * call this ioctl once for each MSI bank (count of banks is
  returned by
 * VFIO_IOMMU_GET_MSI_BANK_COUNT).
 * Caller provides struct vfio_pamu_msi_bank_map with all fields
  set.
 * Operates on VFIO file descriptor (/dev/vfio/vfio).
 * Return: 0 on success, -errno on failure
 */
   
struct vfio_pamu_msi_bank_map {
__u32   argsz;
__u32   msi_bank_index;  /* the index of the MSI bank */
__u64   iova;  /* the iova the bank is to be mapped
  to */
};
  
   Again, flags.  If we dynamically add or remove devices from a
  container
   the bank count can change, right?  If bank count goes from 2 to 3,
  does
   userspace know assume the new bank is [2]?  If bank count goes from
  3 to
   2, which index was that?  If removing a device removes an MSI bank
  then
   vfio-pamu will automatically do the unmap, right?
 
  My assumption is that the bank count returned by
  VFIO_IOMMU_GET_MSI_BANK_COUNT
  is the max bank count for a platform.  (number will mostly likely
  always be
  3 or 4).  So it won't change as devices are added or removed.
 
 It should be the actual number of banks used.  This is required if
 we're going to have userspace do the iteration and specify the exact
 iovas to use -- and even if we aren't going to do that, it would be
 more restrictive on available iova-space than is necessary.  Usually
 there will be only one bank in the group.
 
 Actually mapping all of the MSI banks, all the time, would completely
 negate the point of using the separate alias pages.

The geometry, windows, DMA mappings, etc is set on a 'container'
which may have multiple groups in it.  So user space needs to
determine the total number of MSI windows needed when setting
the geometry and window count.

In the flow you proposed:

count = VFIO_IOMMU_GET_MSI_BANK_COUNT
VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY)
VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS)
// do other DMA maps now, or later, or not at all, doesn't matter
for (i = 0; i  count; i++)
VFIO_IOMMU_MAP_MSI_BANK(iova, i);

...that count has to be the total, not the count for 1 of
N possible groups.   So the get count ioctl is not done on
a group.

However, like you pointed out we don't want to negate isolation 
of the separate alias pages.  All this API is doing is telling
the kernel which windows to use for which MSI banks.   It's up
to the kernel to actually enable them as needed.

Say 3 MSI banks exist.  If there are no groups added to the
vfio container and all 3 MAP_MSI_BANK calls occurred
the picture may look like this (based on my earlier example):

   win gphys/
#   enabled iovaphys  size
--- ---   
5 N 0x1000  0xf_fe044000  4KB// msi bank 1
6 N 0x1400  0xf_fe045000  4KB// msi bank 2
7 N 0x1800  0xf_fe046000  4KB// msi bank 3

User space adds 2 groups that use bank 1:

   win  gphys/
#   enabled iovaphys  size
--- ---   
5 Y 0x1000  0xf_fe044000  4KB// msi bank 1
6 N 0x1400  0xf_fe045000  4KB// msi bank 2
7 N 0x1800  0xf_fe046000  4KB// msi bank 3

User space adds another group that uses bank 3:

   win  gphys/
#   enabled iovaphys  size
--- ---   
5 Y 0x1000  0xf_fe044000  4KB// msi bank 1
6 N 0x1400  0xf_fe045000  4KB// msi bank 2
7 Y 0x1800  0xf_fe046000  4KB// msi bank 3

User space doesn't need to care what is actually enabled,
it just needs to the the kernel which windows to use
and the kernel can take care of the rest.

Stuart


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu