Re: RFC: vfio API changes needed for powerpc (v2)
On Thu, 2013-04-04 at 17:32 +, Yoder Stuart-B08248 wrote: Based on the email thread over the last couple of days, I have below an more concrete proposal (v2) for new ioctls supporting vfio-pci on SoCs with the Freescale PAMU. Example usage is as described by Scott: count = VFIO_IOMMU_GET_MSI_BANK_COUNT VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY) VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS) // do other DMA maps now, or later, or not at all, doesn't matter for (i = 0; i count; i++) VFIO_IOMMU_MAP_MSI_BANK(iova, i); // The kernel now knows where each bank has been mapped, and can // update PCI config space appropriately. Thanks, Stuart The Freescale PAMU is an aperture-based IOMMU with the following characteristics. Each device has an entry in a table in memory describing the iova-phys mapping. The mapping has: -an overall aperture that is power of 2 sized, and has a start iova that is naturally aligned -has 1 or more windows within the aperture -number of windows must be power of 2, max is 256 -size of each window is determined by aperture size / # of windows -iova of each window is determined by aperture start iova / # of windows -the mapped region in each window can be different than the window size...mapping must power of 2 -physical address of the mapping must be naturally aligned with the mapping size /* * VFIO_PAMU_GET_ATTR * * Gets the iommu attributes for the current vfio container. This * ioctl is applicable to an iommu type of VFIO_PAMU only. * Caller sets argsz and attribute. The ioctl fills in * the provided struct vfio_pamu_attr based on the attribute * value that was set. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_attr { __u32 argsz; __u32 attribute; #define VFIO_ATTR_GEOMETRY 1 #define VFIO_ATTR_WINDOWS 2 #define VFIO_ATTR_PAMU_STASH 3 /* fowlling fields are for VFIO_ATTR_GEOMETRY */ __u64 aperture_start; /* first address that can be mapped*/ __u64 aperture_end; /* last address that can be mapped */ /* follwing fields are for VFIO_ATTR_WINDOWS */ __u32 windows;/* number of windows in the aperture */ /* initially this will be the max number * of windows that can be set */ /* following fields are for VFIO_ATTR_PAMU_STASH */ __u32 cpu;/* CPU number for stashing */ __u32 cache; /* cache ID for stashing */ Can multiple attribute bits be set at once? If not then the above could be a union and the attribute could be an enum. A flags field is probably still a good idea. }; #define VFIO_PAMU_GET_ATTR _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_attr) /* * VFIO_PAMU_SET_ATTR * * Sets the iommu attributes for the current vfio container. This * ioctl is applicable to an iommu type of VFIO_PAMU only. * Caller sets struct vfio_pamu attr, including argsz and attribute and * setting any fields that are valid for the attribute. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ #define VFIO_PAMU_SET_ATTR _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_attr) /* * VFIO_PAMU_GET_MSI_BANK_COUNT * * Returns the number of MSI banks for this platform. This tells user space * how many aperture windows should be reserved for MSI banks when setting * the PAMU geometry and window count. * Fills in provided struct vfio_pamu_msi_banks. Caller sets argsz. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_msi_banks { __u32 argsz; __u32 bank_count; /* the number of MSI }; #define VFIO_PAMU_GET_MSI_BANK_COUNT _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_msi_banks) argsz w/o flags doesn't really buy us much. /* * VFIO_PAMU_MAP_MSI_BANK * * Maps the MSI bank at the specified index and iova. User space must * call this ioctl once for each MSI bank (count of banks is returned by * VFIO_IOMMU_GET_MSI_BANK_COUNT). * Caller provides struct vfio_pamu_msi_bank_map with all fields set. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_msi_bank_map { __u32 argsz; __u32 msi_bank_index; /* the index of the MSI bank */ __u64 iova; /* the iova the bank is to be mapped to */ }; Again, flags. If we dynamically add or remove devices from a container the bank count can change, right? If bank count goes from 2 to 3, does userspace know assume the new bank is [2]? If bank count goes from 3
RE: RFC: vfio API changes needed for powerpc (v2)
/* * VFIO_PAMU_MAP_MSI_BANK * * Maps the MSI bank at the specified index and iova. User space must * call this ioctl once for each MSI bank (count of banks is returned by * VFIO_IOMMU_GET_MSI_BANK_COUNT). * Caller provides struct vfio_pamu_msi_bank_map with all fields set. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_msi_bank_map { __u32 argsz; __u32 msi_bank_index; /* the index of the MSI bank */ __u64 iova; /* the iova the bank is to be mapped to */ }; Again, flags. If we dynamically add or remove devices from a container the bank count can change, right? If bank count goes from 2 to 3, does userspace know assume the new bank is [2]? If bank count goes from 3 to 2, which index was that? If removing a device removes an MSI bank then vfio-pamu will automatically do the unmap, right? My assumption is that the bank count returned by VFIO_IOMMU_GET_MSI_BANK_COUNT is the max bank count for a platform. (number will mostly likely always be 3 or 4). So it won't change as devices are added or removed. If devices are added or removed, the kernel side can enable or disable the corresponding MSI windows. But that is hidden from user space. Stuart
Re: RFC: vfio API changes needed for powerpc (v2)
On 04/04/2013 04:38:44 PM, Yoder Stuart-B08248 wrote: /* * VFIO_PAMU_MAP_MSI_BANK * * Maps the MSI bank at the specified index and iova. User space must * call this ioctl once for each MSI bank (count of banks is returned by * VFIO_IOMMU_GET_MSI_BANK_COUNT). * Caller provides struct vfio_pamu_msi_bank_map with all fields set. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_msi_bank_map { __u32 argsz; __u32 msi_bank_index; /* the index of the MSI bank */ __u64 iova; /* the iova the bank is to be mapped to */ }; Again, flags. If we dynamically add or remove devices from a container the bank count can change, right? If bank count goes from 2 to 3, does userspace know assume the new bank is [2]? If bank count goes from 3 to 2, which index was that? If removing a device removes an MSI bank then vfio-pamu will automatically do the unmap, right? My assumption is that the bank count returned by VFIO_IOMMU_GET_MSI_BANK_COUNT is the max bank count for a platform. (number will mostly likely always be 3 or 4). So it won't change as devices are added or removed. It should be the actual number of banks used. This is required if we're going to have userspace do the iteration and specify the exact iovas to use -- and even if we aren't going to do that, it would be more restrictive on available iova-space than is necessary. Usually there will be only one bank in the group. Actually mapping all of the MSI banks, all the time, would completely negate the point of using the separate alias pages. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: RFC: vfio API changes needed for powerpc (v2)
-Original Message- From: Wood Scott-B07421 Sent: Thursday, April 04, 2013 5:52 PM To: Yoder Stuart-B08248 Cc: Alex Williamson; Wood Scott-B07421; ag...@suse.de; Bhushan Bharat-R65777; Sethi Varun-B16395; kvm@vger.kernel.org; qemu-de...@nongnu.org; io...@lists.linux-foundation.org Subject: Re: RFC: vfio API changes needed for powerpc (v2) On 04/04/2013 04:38:44 PM, Yoder Stuart-B08248 wrote: /* * VFIO_PAMU_MAP_MSI_BANK * * Maps the MSI bank at the specified index and iova. User space must * call this ioctl once for each MSI bank (count of banks is returned by * VFIO_IOMMU_GET_MSI_BANK_COUNT). * Caller provides struct vfio_pamu_msi_bank_map with all fields set. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_msi_bank_map { __u32 argsz; __u32 msi_bank_index; /* the index of the MSI bank */ __u64 iova; /* the iova the bank is to be mapped to */ }; Again, flags. If we dynamically add or remove devices from a container the bank count can change, right? If bank count goes from 2 to 3, does userspace know assume the new bank is [2]? If bank count goes from 3 to 2, which index was that? If removing a device removes an MSI bank then vfio-pamu will automatically do the unmap, right? My assumption is that the bank count returned by VFIO_IOMMU_GET_MSI_BANK_COUNT is the max bank count for a platform. (number will mostly likely always be 3 or 4). So it won't change as devices are added or removed. It should be the actual number of banks used. This is required if we're going to have userspace do the iteration and specify the exact iovas to use -- and even if we aren't going to do that, it would be more restrictive on available iova-space than is necessary. Usually there will be only one bank in the group. Actually mapping all of the MSI banks, all the time, would completely negate the point of using the separate alias pages. The geometry, windows, DMA mappings, etc is set on a 'container' which may have multiple groups in it. So user space needs to determine the total number of MSI windows needed when setting the geometry and window count. In the flow you proposed: count = VFIO_IOMMU_GET_MSI_BANK_COUNT VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY) VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS) // do other DMA maps now, or later, or not at all, doesn't matter for (i = 0; i count; i++) VFIO_IOMMU_MAP_MSI_BANK(iova, i); ...that count has to be the total, not the count for 1 of N possible groups. So the get count ioctl is not done on a group. However, like you pointed out we don't want to negate isolation of the separate alias pages. All this API is doing is telling the kernel which windows to use for which MSI banks. It's up to the kernel to actually enable them as needed. Say 3 MSI banks exist. If there are no groups added to the vfio container and all 3 MAP_MSI_BANK calls occurred the picture may look like this (based on my earlier example): win gphys/ # enabled iovaphys size --- --- 5 N 0x1000 0xf_fe044000 4KB// msi bank 1 6 N 0x1400 0xf_fe045000 4KB// msi bank 2 7 N 0x1800 0xf_fe046000 4KB// msi bank 3 User space adds 2 groups that use bank 1: win gphys/ # enabled iovaphys size --- --- 5 Y 0x1000 0xf_fe044000 4KB// msi bank 1 6 N 0x1400 0xf_fe045000 4KB// msi bank 2 7 N 0x1800 0xf_fe046000 4KB// msi bank 3 User space adds another group that uses bank 3: win gphys/ # enabled iovaphys size --- --- 5 Y 0x1000 0xf_fe044000 4KB// msi bank 1 6 N 0x1400 0xf_fe045000 4KB// msi bank 2 7 Y 0x1800 0xf_fe046000 4KB// msi bank 3 User space doesn't need to care what is actually enabled, it just needs to the the kernel which windows to use and the kernel can take care of the rest. Stuart -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html