RE: RFC: vfio interface for platform devices (v2)

2013-07-16 Thread Yoder Stuart-B08248


 -Original Message-
 From: Mario Smarduch [mailto:mario.smard...@huawei.com]
 Sent: Thursday, July 04, 2013 9:45 AM
 To: Yoder Stuart-B08248
 Cc: Alex Williamson; Alexander Graf; Wood Scott-B07421; k...@vger.kernel.org 
 list; Bhushan Bharat-R65777;
 kvm-ppc@vger.kernel.org; virtualizat...@lists.linux-foundation.org; Sethi 
 Varun-B16395;
 kvm...@lists.cs.columbia.edu
 Subject: Re: RFC: vfio interface for platform devices (v2)
 
 
 I'm having trouble understanding how this works where
 the Guest Device Model != Host. How do you inform the guest
 where the device is mapped in its physical address space,
 and handle GPA faults?

The vfio mechanisms just expose hardware to user space
and the user space app may or may not be QEMU.  So there
may be no 'guest' at all.

The intent of this RFC is to provide enough info to user space so
an application can use the device, or in the case of QEMU expose
the device to a VM.  Platform devices are typically exposed via
the device tree and that is how I envision them being presented
to a guest.

Are there real cases you see where guest device model != host?
I don't envision ever presenting a platform device as a PCI device
or vice versa.

Stuart

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: RFC: vfio interface for platform devices

2013-07-16 Thread Yoder Stuart-B08248

 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, July 03, 2013 5:32 PM
 To: Yoder Stuart-B08248
 Cc: Alex Williamson; Alexander Graf; Wood Scott-B07421; Bhushan 
 Bharat-R65777; Sethi Varun-B16395;
 virtualizat...@lists.linux-foundation.org; Antonios Motakis; 
 k...@vger.kernel.org list; kvm-
 p...@vger.kernel.org; kvm...@lists.cs.columbia.edu
 Subject: Re: RFC: vfio interface for platform devices
 
 On 07/02/2013 06:25:59 PM, Yoder Stuart-B08248 wrote:
  The write-up below is the first draft of a proposal for how the
  kernel can expose
  platform devices to user space using vfio.
 
  In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
  allows user space to correlate regions and interrupts to the
  corresponding
  device tree node structure that is defined for most platform devices.
 
  Regards,
  Stuart Yoder
 
  --
  VFIO for Platform Devices
 
  The existing infrastructure for vfio-pci is pretty close to what we
  need:
 -mechanism to create a container
 -add groups/devices to a container
 -set the IOMMU model
 -map DMA regions
 -get an fd for a specific device, which allows user space to
  determine
  info about device regions (e.g. registers) and interrupt info
 -support for mmapping device regions
 -mechanism to set how interrupts are signaled
 
  Platform devices can get complicated-- potentially with a tree
  hierarchy
  of nodes, and links/phandles pointing to other platform
  devices.   The kernel doesn't expose relationships between
  devices.  The kernel just exposes mappable register regions and
  interrupts.
  It's up to user space to work out relationships between devices
  if it needs to-- this can be determined in the device tree exposed in
  /proc/device-tree.
 
  I think the changes needed for vfio are around some of the device tree
  related info that needs to be available with the device fd.
 
  1.  VFIO_GROUP_GET_DEVICE_FD
 
User space has to know which device it is accessing and will call
VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
get the device information:
 
fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
  "/soc@ffe00/usb@21");
 
(whether the path is a device tree path or a sysfs path is up for
discussion, e.g. /sys/bus/platform/devices/ffe21.usb)
 
 Doesn't VFIO need to operate on an actual Linux device, rather than
 just an OF node?
 
 Are we going to have a fixed assumption that you always want all the
 children of the node corresponding to the assigned device, or will it
 be possible to exclude some?

My assumption is that you always get all the children of the
node corresponding to the assigned device.

  2.  VFIO_DEVICE_GET_INFO
 
 Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
 than adding a new flag identifying a device as a 'platform'
 device.
 
 This ioctl simply returns the number of regions and number of irqs.
 
 The number of regions corresponds to the number of regions
 that can be mapped for the device-- corresponds to the regions
  defined
 in reg and ranges in the device tree.
 
  3.  VFIO_DEVICE_GET_REGION_INFO
 
 No changes needed, except perhaps adding a new flag.  Freescale
  has some
 devices with regions that must be mapped cacheable.
 
 While I don't object to making the information available to the user
 just in case, the main thing we need here is to influence what the
 kernel does when the user tries to map it.  At least on PPC it's not up
 to userspace to select whether a mmap is cacheable.

If user space really can't do anything with the 'cacheable'
flag, do you think there is good reason to keep it?   Will it
help any decision that user space makes?  Maybe we should just
drop it.
 
  4. VFIO_DEVICE_GET_DEVTREE_INFO
 
 The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
 expose device regions and interrupts, but it's not enough to know
 that there are X regions and Y interrupts.  User space needs to
 know what the resources are for-- to correlate those
  regions/interrupts
 to the device tree structure that drivers use.  The device tree
 structure could consist of multiple nodes and it is necessary to
 identify the node corresponding to the region/interrupt exposed
 by VFIO.
 
 The following information is needed:
-the device tree path to the node corresponding to the
 region or interrupt
-for a region, whether it corresponds to a reg or ranges
 property
-there could be multiple sub-regions per reg or ranges and
 the sub-index within the reg/ranges is needed
 
 The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
 
 ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
 
 struct vfio_path_info {
  __u32   argsz;
  __u32   flags;
 #define VFIO_DEVTREE_INFO_RANGES  (1 << 3) /* the region is a ranges property */

RE: RFC: vfio interface for platform devices (v2)

2013-07-16 Thread Yoder Stuart-B08248
(sorry for the delayed response, but I've been on PTO)

  1.  VFIO_GROUP_GET_DEVICE_FD
 
User space knows by out-of-band means which device it is accessing
and will call VFIO_GROUP_GET_DEVICE_FD passing a specific sysfs path
to get the device information:
 
fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
   "/sys/bus/platform/devices/ffe21.usb");
 
 FWIW, I'm in favor of whichever way works out cleaner in the code for
 pre-pending /sys/bus or not.  It sort of seems like it's unnecessary.
 It's also a little inconsistent that the returned path doesn't
 pre-pend /sys in the examples below.

Ok.  For the returned path in the examples I have the actual device tree
path which is slightly different from the path in /sys.  The device
tree path is what user space would need to interpret /proc/device-tree.

  2.  VFIO_DEVICE_GET_INFO
 
 The number of regions corresponds to the regions defined
 in reg and ranges in the device tree.
 
 Two new flags are added to struct vfio_device_info:
 
 #define VFIO_DEVICE_FLAGS_PLATFORM (1 << ?) /* A platform bus device */
 #define VFIO_DEVICE_FLAGS_DEVTREE  (1 << ?) /* device tree info available */
 
 It is possible that there could be platform bus devices
 that are not in the device tree, so we use 2 flags to
 allow for that.
 
 If just VFIO_DEVICE_FLAGS_PLATFORM is set, it means
 that there are regions and IRQs but no device tree info
 available.
 
 If just VFIO_DEVICE_FLAGS_DEVTREE is set, it means
 there is device tree info available.
 
 But it would be invalid to only have DEVTREE w/o PLATFORM for now,
 right?

Right.  The way I stated it is incorrect. DEVTREE would never
be set by itself.
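
To make the corrected rule concrete, here is a minimal sketch of how user
space might classify the flags word returned by VFIO_DEVICE_GET_INFO.  The
bit positions are assumptions (the RFC leaves them as "?"), and the names
DEV_FLAG_*, classify_device() and has_devtree_info() are hypothetical
helpers, not part of any proposed API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions; the RFC has not assigned them yet. */
#define DEV_FLAG_PLATFORM (1u << 2)
#define DEV_FLAG_DEVTREE  (1u << 3)

enum dev_kind {
    DEV_OTHER,            /* neither flag set, e.g. a PCI device */
    DEV_PLATFORM_ONLY,    /* regions and IRQs, no device tree info */
    DEV_PLATFORM_DEVTREE, /* regions and IRQs plus device tree info */
    DEV_INVALID           /* DEVTREE without PLATFORM: never valid */
};

static enum dev_kind classify_device(uint32_t flags)
{
    int platform = (flags & DEV_FLAG_PLATFORM) != 0;
    int devtree  = (flags & DEV_FLAG_DEVTREE) != 0;

    if (devtree && !platform)
        return DEV_INVALID;
    if (platform && devtree)
        return DEV_PLATFORM_DEVTREE;
    if (platform)
        return DEV_PLATFORM_ONLY;
    return DEV_OTHER;
}

/* Only meaningful for a flags word the kernel could legally return. */
static int has_devtree_info(uint32_t flags)
{
    assert(classify_device(flags) != DEV_INVALID);
    return (flags & DEV_FLAG_DEVTREE) != 0;
}
```

Treating DEVTREE-without-PLATFORM as an explicit error case keeps the
"DEVTREE would never be set by itself" rule checkable in one place.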

  3. VFIO_DEVICE_GET_REGION_INFO
 
 For platform devices with multiple regions, information
 is needed to correlate the regions with the device
 tree structure that drivers use to determine the meaning
 of device resources.
 
 The VFIO_DEVICE_GET_REGION_INFO is extended to provide
 device tree information.
 
 The following information is needed:
-the device tree path to the node corresponding to the
 region
-whether it corresponds to a reg or ranges property
-there could be multiple sub-regions per reg or ranges and
 the sub-index within the reg/ranges is needed
 
 There are 5 new flags added to vfio_region_info :
 
 struct vfio_region_info {
  __u32   argsz;
  __u32   flags;
 #define VFIO_REGION_INFO_FLAG_CACHEABLE (1 << ?)
 #define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
 #define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
 #define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
 #define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
  __u32   index;  /* Region index */
  __u32   resv;   /* Reserved for alignment */
  __u64   size;   /* Region size (bytes) */
  __u64   offset; /* Region offset from start of device fd */
 };
 
 VFIO_REGION_INFO_FLAG_CACHEABLE
 -if set indicates that the region must be mapped as cacheable
 
 VFIO_DEVTREE_REGION_INFO_FLAG_REG
 -if set indicates that the region corresponds to a reg property
  in the device tree representation of the device
 
 VFIO_DEVTREE_REGION_INFO_FLAG_RANGE
 -if set indicates that the region corresponds to a ranges property
  in the device tree representation of the device
 
 VFIO_DEVTREE_REGION_INFO_FLAG_INDEX
 -if set indicates that there is a dword-aligned
  struct vfio_devtree_region_info_index appended to the
  end of vfio_region_info:
 
  struct vfio_devtree_region_info_index
  {
      u32 index;
  }
 
  A reg or ranges property may have multiple regions.  The index
  specifies the index within the reg or ranges
  that this region corresponds to.
 
 VFIO_DEVTREE_REGION_INFO_FLAG_PATH
 -if set indicates that there is a dword-aligned
  struct vfio_devtree_info_path appended to the
  end of vfio_region_info:
 
  struct vfio_devtree_info_path
  {
  u32 len;
  u8 path[];
  }
 
  The path is the full path to the corresponding device
  tree node.  The len field specifies the length of the
  path string.
 
 If multiple flags are set that indicate that there is
 an appended struct, the order of the flags indicates
 the order of the structs.
 
 argsz is set by the kernel specifying the total size of
 struct vfio_region_info and all appended structs.
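
As an illustration of the "order of the flags indicates the order of the
structs" rule, the sketch below walks the bytes that follow the fixed
struct.  It is only a sketch under stated assumptions: the flag bit values
are invented (the RFC leaves them as "?"), "dword aligned" is taken to mean
4-byte alignment, the fixed struct is taken to be 32 bytes as laid out
above, and parse_appended() and struct appended_info are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FIXED_SIZE 32u            /* sizeof(struct vfio_region_info) as drafted */
#define FLAG_INDEX (1u << 6)      /* assumed bit; lower bit means earlier struct */
#define FLAG_PATH  (1u << 7)      /* assumed bit */

struct appended_info {
    int         has_index;
    uint32_t    sub_index;        /* index within the reg or ranges property */
    int         has_path;
    uint32_t    path_len;
    const char *path;             /* points into the caller's buffer */
};

/* Returns 0 on success, -1 if the buffer is too short for what flags claim. */
static int parse_appended(const unsigned char *buf, uint32_t argsz,
                          uint32_t flags, struct appended_info *out)
{
    uint32_t off = FIXED_SIZE;

    assert(buf != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    if (argsz < FIXED_SIZE)
        return -1;

    if (flags & FLAG_INDEX) {
        if (argsz - off < 4u)
            return -1;
        memcpy(&out->sub_index, buf + off, 4u);
        out->has_index = 1;
        off += 4u;
    }
    if (flags & FLAG_PATH) {
        uint32_t len;

        if (argsz - off < 4u)
            return -1;
        memcpy(&len, buf + off, 4u);
        off += 4u;
        if (len > argsz - off)
            return -1;
        out->path = (const char *)(buf + off);
        out->path_len = len;
        out->has_path = 1;
        /* The next struct, if any, would start at off + ((len + 3) & ~3). */
    }
    return 0;
}
```

Every length is checked against argsz before it is used, since the path
struct is variable-sized and a short buffer must be rejected rather than
read past.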
 
 Suggested usage:
-call VFIO_DEVICE_GET_REGION_INFO with argsz =
 sizeof(struct vfio_region_info)
-realloc the buffer
-call VFIO_DEVICE_GET_REGION_INFO again, and the appended
 structs will be returned
 
  4.  VFIO_DEVICE_GET_IRQ_INFO
 
 

Re: RFC: vfio interface for platform devices

2013-07-16 Thread Scott Wood

On 07/16/2013 04:51:12 PM, Yoder Stuart-B08248 wrote:

  3.  VFIO_DEVICE_GET_REGION_INFO
 
 No changes needed, except perhaps adding a new flag.  Freescale
  has some
 devices with regions that must be mapped cacheable.

 While I don't object to making the information available to the user
 just in case, the main thing we need here is to influence what the
 kernel does when the user tries to map it.  At least on PPC it's not up
 to userspace to select whether a mmap is cacheable.

If user space really can't do anything with the 'cacheable'
flag, do you think there is good reason to keep it?   Will it
help any decision that user space makes?  Maybe we should just
drop it.


As long as we can be sure all architectures will map things correctly  
without any flags needing to be specified, that's fine.



 struct vfio_path_info {
  __u32   argsz;
  __u32   flags;
 #define VFIO_DEVTREE_INFO_RANGES  (1 << 3) /* the region is a
                                               ranges property */

 What about distinguishing a normal interrupt from one found in an
 interrupt-map?

I'm not sure we need that.  The kernel needs to use the interrupt
map to get interrupts hooked up right, but all user space needs to
know is that there are N interrupts and possibly device tree
paths to help user space interpret which interrupt is which.


What if the interrupt map is for devices without explicit nodes, such
as with a PCI controller (ignore the fact that we would normally use
vfio_pci for the individual PCI devices instead)?


You could say the same thing about ranges -- why expose ranges instead  
of the individual child node regs after translation?



 In the case of both ranges and interrupt-maps, we'll also want to
 decide what the policy is for when to expose them directly, versus just
 using them to translate regs and interrupts of child nodes

Yes, not sure the best approach there...but guess we can cross
that bridge when we implement this.  It doesn't affect this
interface.


It does affect the interface, because if you allow either of them to be  
mapped directly (rather than implicitly used when mapping a child  
node), you need a way to indicate which type of resource it is you're  
describing (as you already do for reg/ranges).


It also affects how vfio device binding is done, even if only to the  
point of specifying default behavior in the absence of knobs which  
change whether interrupt maps and/or ranges are mapped.



  __u8    path[]; /* output: Full path to associated
  device tree node */

 How does the caller know what size buffer to supply for this?


Ping

-Scott


RE: RFC: vfio interface for platform devices

2013-07-16 Thread Yoder Stuart-B08248


 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, July 16, 2013 5:01 PM
 To: Yoder Stuart-B08248
 Cc: Wood Scott-B07421; Alex Williamson; Alexander Graf; Bhushan 
 Bharat-R65777; Sethi Varun-B16395;
 virtualizat...@lists.linux-foundation.org; Antonios Motakis; 
 k...@vger.kernel.org list; kvm-
 p...@vger.kernel.org; kvm...@lists.cs.columbia.edu
 Subject: Re: RFC: vfio interface for platform devices
 
 On 07/16/2013 04:51:12 PM, Yoder Stuart-B08248 wrote:
3.  VFIO_DEVICE_GET_REGION_INFO
   
   No changes needed, except perhaps adding a new flag.  Freescale
has some
   devices with regions that must be mapped cacheable.
  
   While I don't object to making the information available to the user
   just in case, the main thing we need here is to influence what the
   kernel does when the user tries to map it.  At least on PPC it's
  not up
   to userspace to select whether a mmap is cacheable.
 
  If user space really can't do anything with the 'cacheable'
  flag, do you think there is good reason to keep it?   Will it
  help any decision that user space makes?  Maybe we should just
  drop it.
 
 As long as we can be sure all architectures will map things correctly
 without any flags needing to be specified, that's fine.
 
   struct vfio_path_info {
__u32   argsz;
__u32   flags;
   #define VFIO_DEVTREE_INFO_RANGES  (1 << 3) /* the region is a
   ranges property */
  
   What about distinguishing a normal interrupt from one found in an
   interrupt-map?
 
  I'm not sure we need that.  The kernel needs to use the interrupt
  map to get interrupts hooked up right, but all user space needs to
  know is that there are N interrupts and possibly device tree
  paths to help user space interpret which interrupt is which.
 
 What if the interrupt map is for devices without explicit nodes, such
 as with a PCI controller (ignore the fact that we would normally use
 vfio_pci for the individual PCI devices instead)?
 
 You could say the same thing about ranges -- why expose ranges instead
 of the individual child node regs after translation?

Hmm...yes, I guess ranges and interrupt-map fall into the same
basic type of resource category.  I'm not sure it's realistic
to pass entire bus controllers through to user space vs
just individual devices on a bus, but I guess it's theoretically
possible.

So the question is whether we future proof by adding flags 
for both ranges and interrupt-map, or wait until there is
an actual need for it.

   In the case of both ranges and interrupt-maps, we'll also want to
   decide what the policy is for when to expose them directly, versus
  just
   using them to translate regs and interrupts of child nodes
 
  Yes, not sure the best approach there...but guess we can cross
  that bridge when we implement this.  It doesn't affect this
  interface.
 
 It does affect the interface, because if you allow either of them to be
 mapped directly (rather than implicitly used when mapping a child
 node), you need a way to indicate which type of resource it is you're
 describing (as you already do for reg/ranges).

 It also affects how vfio device binding is done, even if only to the
 point of specifying default behavior in the absence of knobs which
 change whether interrupt maps and/or ranges are mapped.

My opinion is that we want to expose the regs and interrupts for
individual nodes by default, not ranges (or interrupt maps).   When someone
needs ranges/interrupt-map in the future they'll need to figure out some
means for the vfio layer to do the right thing.  It's complicated
and I would be surprised to see someone need it.
 
__u8    path[]; /* output: Full path to associated
device tree node */
  
   How does the caller know what size buffer to supply for this?
 
 Ping

This is in the v2 RFC... the caller invokes the ioctl which returns
the complete/full size, then re-allocs the buffer and calls the
ioctl again.  Or, as Alex suggested, just use a sufficiently large
buffer to start with.
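
For what it's worth, the two-call pattern looks roughly like this from user
space.  This is a sketch only: mock_get_region_info() is a hypothetical
stand-in for ioctl(fd, VFIO_DEVICE_GET_REGION_INFO, buf) so the logic can be
followed without a device, and the flag bit, the node path and the helper
names are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Fixed part of the reply, laid out as in the RFC. */
struct region_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t resv;
    uint64_t size;
    uint64_t offset;
};

#define MOCK_FLAG_PATH (1u << 4)            /* assumed bit position */
static const char mock_path[] = "/soc/usb"; /* made-up node path */

/* Hypothetical stand-in for the ioctl: fills what fits, reports full size. */
static int mock_get_region_info(void *buf)
{
    struct region_info *info = buf;
    uint32_t have = info->argsz;
    uint32_t len  = (uint32_t)sizeof(mock_path);  /* includes the NUL */
    uint32_t need = (uint32_t)sizeof(*info) + 4u + ((len + 3u) & ~3u);

    if (have < sizeof(*info))
        return -1;
    info->flags  = 0;
    info->size   = 0x1000;
    info->offset = 0;
    if (have >= need) {
        unsigned char *p = (unsigned char *)buf + sizeof(*info);

        memset(p, 0, need - sizeof(*info));
        memcpy(p, &len, sizeof(len));
        memcpy(p + sizeof(len), mock_path, len);
        info->flags |= MOCK_FLAG_PATH;
    }
    info->argsz = need;
    return 0;
}

/* First call learns the size, second call fetches the appended structs. */
static struct region_info *get_region_info(uint32_t index)
{
    struct region_info *info = calloc(1, sizeof(*info));

    if (!info)
        return NULL;
    info->argsz = (uint32_t)sizeof(*info);
    info->index = index;
    if (mock_get_region_info(info) != 0) {
        free(info);
        return NULL;
    }
    if (info->argsz > sizeof(*info)) {
        uint32_t need = info->argsz;
        struct region_info *bigger = realloc(info, need);

        if (!bigger) {
            free(info);
            return NULL;
        }
        info = bigger;
        assert(need >= sizeof(*info));
        info->argsz = need;
        if (mock_get_region_info(info) != 0) {
            free(info);
            return NULL;
        }
    }
    return info;
}
```

A caller that prefers the large-buffer approach would simply allocate more
than sizeof(struct region_info) up front and skip the second call.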

Stuart



Re: RFC: vfio interface for platform devices

2013-07-16 Thread Scott Wood

On 07/16/2013 05:41:04 PM, Yoder Stuart-B08248 wrote:



 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, July 16, 2013 5:01 PM
 To: Yoder Stuart-B08248
 Cc: Wood Scott-B07421; Alex Williamson; Alexander Graf; Bhushan Bharat-R65777;
 Sethi Varun-B16395; virtualizat...@lists.linux-foundation.org; Antonios Motakis;
 k...@vger.kernel.org list; kvm-p...@vger.kernel.org; kvm...@lists.cs.columbia.edu
 Subject: Re: RFC: vfio interface for platform devices

 What if the interrupt map is for devices without explicit nodes, such
 as with a PCI controller (ignore the fact that we would normally use
 vfio_pci for the individual PCI devices instead)?

 You could say the same thing about ranges -- why expose ranges instead
 of the individual child node regs after translation?

Hmm...yes, I guess ranges and interrupt-map fall into the same
basic type of resource category.  I'm not sure it's realistic
to pass entire bus controllers through to user space vs
just individual devices on a bus, but I guess it's theoretically
possible.


Where theoretically possible means we've done it before in other  
contexts. :-)



So the question is whether we future proof by adding flags
for both ranges and interrupt-map, or wait until there is
an actual need for it.


We don't need to actually add a flag for it, but we should have a  
flag/type for the resources we do support, so that code written to the  
current API would recognize that it doesn't recognize an interrupt-map  
entry if it's added later.


__u8    path[]; /* output: Full path to associated
device tree node */
  
   How does the caller know what size buffer to supply for this?

 Ping

This is in the v2 RFC... the caller invokes the ioctl which returns
the complete/full size, then re-allocs the buffer and calls the
ioctl again.


OK.

Or, as Alex suggested, just use a sufficiently large buffer to start with.


It's fine for a user of the API to simplify things by using a large  
fixed buffer, but the API shouldn't force that approach.


-Scott


Re: RFC: vfio interface for platform devices (v2)

2013-07-04 Thread Alexander Graf

On 04.07.2013, at 16:44, Mario Smarduch wrote:

 
 I'm having trouble understanding how this works where
 the Guest Device Model != Host. How do you inform the guest
 where the device is mapped in its physical address space,
 and handle GPA faults?

The same way as you would for emulated devices.


Alex



Re: RFC: vfio interface for platform devices (v2)

2013-07-04 Thread Mario Smarduch

I'm having trouble understanding how this works where
the Guest Device Model != Host. How do you inform the guest
where the device is mapped in its physical address space,
and handle GPA faults?

- Mario

On 7/3/2013 11:40 PM, Yoder Stuart-B08248 wrote:
 Version 2
   -VFIO_GROUP_GET_DEVICE_FD-- specified that the path is a sysfs path
   -VFIO_DEVICE_GET_INFO-- defined 2 flags instead of 1
   -deleted VFIO_DEVICE_GET_DEVTREE_INFO ioctl
   -VFIO_DEVICE_GET_REGION_INFO-- updated as per AlexW's suggestion,
defined 5 new flags and associated structs
   -VFIO_DEVICE_GET_IRQ_INFO-- updated as per AlexW's suggestion,
defined 1 new flag and associated struct
   -removed redundant example
 
 --
 VFIO for Platform Devices
 
 The existing kernel interface for vfio-pci is pretty close to what is needed
 for platform devices:
-mechanism to create a container
-add groups/devices to a container
-set the IOMMU model
-map DMA regions
-get an fd for a specific device, which allows user space to determine
 info about device regions (e.g. registers) and interrupt info
-support for mmapping device regions
-mechanism to set how interrupts are signaled
 
 Many platform devices are simple and consist of a single register
 region and a single interrupt.  For these types of devices the
 existing vfio interfaces should be sufficient.
 
 However, platform devices can get complicated-- logically represented
 as a device tree hierarchy of nodes.  For devices with multiple regions
 and interrupts, new mechanisms are needed in vfio to correlate the
 regions/interrupts with the device tree structure that drivers use
 to determine the meaning of device resources.
 
 In some cases there are relationships between devices, and devices
 reference other devices using phandle links.  The kernel won't expose
 relationships between devices, but just exposes mappable register
 regions and interrupts.
 
 The changes needed for vfio are around some of the device tree
 related info that needs to be available with the device fd.
 
 1.  VFIO_GROUP_GET_DEVICE_FD
 
   User space knows by out-of-band means which device it is accessing
   and will call VFIO_GROUP_GET_DEVICE_FD passing a specific sysfs path
   to get the device information:
 
   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
  "/sys/bus/platform/devices/ffe21.usb");
 
 2.  VFIO_DEVICE_GET_INFO
 
The number of regions corresponds to the regions defined
in reg and ranges in the device tree.  
 
Two new flags are added to struct vfio_device_info:
 
#define VFIO_DEVICE_FLAGS_PLATFORM (1 << ?) /* A platform bus device */
#define VFIO_DEVICE_FLAGS_DEVTREE  (1 << ?) /* device tree info available */
 
It is possible that there could be platform bus devices 
that are not in the device tree, so we use 2 flags to
allow for that.
 
If just VFIO_DEVICE_FLAGS_PLATFORM is set, it means
that there are regions and IRQs but no device tree info
available.
 
If just VFIO_DEVICE_FLAGS_DEVTREE is set, it means
there is device tree info available.
 
 3. VFIO_DEVICE_GET_REGION_INFO
 
For platform devices with multiple regions, information
is needed to correlate the regions with the device 
tree structure that drivers use to determine the meaning
of device resources.

The VFIO_DEVICE_GET_REGION_INFO is extended to provide
device tree information.
 
The following information is needed:
   -the device tree path to the node corresponding to the
region
   -whether it corresponds to a reg or ranges property
   -there could be multiple sub-regions per reg or ranges and
the sub-index within the reg/ranges is needed
 
There are 5 new flags added to vfio_region_info :
 
struct vfio_region_info {
 __u32   argsz;
 __u32   flags;
#define VFIO_REGION_INFO_FLAG_CACHEABLE (1 << ?)
#define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
#define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
#define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
#define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
 __u32   index;  /* Region index */
 __u32   resv;   /* Reserved for alignment */
 __u64   size;   /* Region size (bytes) */
 __u64   offset; /* Region offset from start of device fd */
};
  
VFIO_REGION_INFO_FLAG_CACHEABLE
-if set indicates that the region must be mapped as cacheable
 
VFIO_DEVTREE_REGION_INFO_FLAG_REG
-if set indicates that the region corresponds to a reg property
 in the device tree representation of the device
 
VFIO_DEVTREE_REGION_INFO_FLAG_RANGE
-if set indicates that the region corresponds to a ranges property
 in the device tree representation of the device
 
VFIO_DEVTREE_REGION_INFO_FLAG_INDEX
-if set indicates that there is a 

Re: RFC: vfio interface for platform devices

2013-07-03 Thread Antonios Motakis
On Wed, Jul 3, 2013 at 5:07 AM, Alex Williamson
alex.william...@redhat.com wrote:
 On Tue, 2013-07-02 at 23:25 +, Yoder Stuart-B08248 wrote:
 The write-up below is the first draft of a proposal for how the kernel can 
 expose
 platform devices to user space using vfio.

 In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
 allows user space to correlate regions and interrupts to the corresponding
 device tree node structure that is defined for most platform devices.

 Regards,
 Stuart Yoder

 --
 VFIO for Platform Devices

 The existing infrastructure for vfio-pci is pretty close to what we need:
-mechanism to create a container
-add groups/devices to a container
-set the IOMMU model
-map DMA regions
-get an fd for a specific device, which allows user space to determine
 info about device regions (e.g. registers) and interrupt info
-support for mmapping device regions
-mechanism to set how interrupts are signaled

 Platform devices can get complicated-- potentially with a tree hierarchy
 of nodes, and links/phandles pointing to other platform
 devices.   The kernel doesn't expose relationships between
 devices.  The kernel just exposes mappable register regions and interrupts.
 It's up to user space to work out relationships between devices
 if it needs to-- this can be determined in the device tree exposed in
 /proc/device-tree.

 I think the changes needed for vfio are around some of the device tree
 related info that needs to be available with the device fd.

 1.  VFIO_GROUP_GET_DEVICE_FD

   User space has to know which device it is accessing and will call
   VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
   get the device information:

   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe00/usb@21");

   (whether the path is a device tree path or a sysfs path is up for
   discussion, e.g. /sys/bus/platform/devices/ffe21.usb)

 2.  VFIO_DEVICE_GET_INFO

Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
than adding a new flag identifying a devices as a 'platform'
device.

This ioctl simply returns the number of regions and number of irqs.

The number of regions corresponds to the number of regions
that can be mapped for the device-- corresponds to the regions defined
in reg and ranges in the device tree.

 3.  VFIO_DEVICE_GET_REGION_INFO

No changes needed, except perhaps adding a new flag.  Freescale has some
devices with regions that must be mapped cacheable.

 3.  VFIO_DEVICE_GET_IRQ_INFO

No changes needed.

 4. VFIO_DEVICE_GET_DEVTREE_INFO

The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
expose device regions and interrupts, but it's not enough to know
that there are X regions and Y interrupts.  User space needs to
know what the resources are for-- to correlate those regions/interrupts
to the device tree structure that drivers use.  The device tree
structure could consist of multiple nodes and it is necessary to
identify the node corresponding to the region/interrupt exposed
by VFIO.

The following information is needed:
   -the device tree path to the node corresponding to the
region or interrupt
   -for a region, whether it corresponds to a reg or ranges
property
   -there could be multiple sub-regions per reg or ranges and
the sub-index within the reg/ranges is needed

The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.

ioctl: VFIO_DEVICE_GET_DEVTREE_INFO

struct vfio_path_info {
 __u32   argsz;
 __u32   flags;
#define VFIO_DEVTREE_INFO_RANGES  (1 << 3) /* the region is a 
 ranges property */

 (1 << 0)?

 Having flags = 0x0 for regs and 0x1 for ranges is a bit awkward.  I'd
 suggest a bit for each.  Otherwise, what does it mean when this returns
 flags = 0x0 for an irq?

 __u32   index;  /* input: index of region or irq for which 
 we are getting info */
 __u32   type;   /* input: 0 - get devtree info for a region
   1 - get devtree info for an irq
  */
 __u32   start;  /* output: identifies the index within the 
 reg/ranges */
 __u8    path[]; /* output: Full path to associated device 
 tree node */
};

User space allocates enough space for the device tree path, sets
the type field identifying whether this is a region, or irq,
and sets argsz appropriately.

 5.  EXAMPLE 1

 Example, Freescale SATA controller:

  sata@22 {
      compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
      reg = <0x22 0x1000>;
      interrupts = <0x44 0x2 0x0 0x0>;
  };

 request to get device FD would look like:
   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, 
 "/soc@ffe00/sata@22");


RE: RFC: vfio interface for platform devices

2013-07-03 Thread Yoder Stuart-B08248
[cut]

 So overall the interface and extension makes sense.  My only question is
 whether it's better to get complete reuse out of GET_REGION_INFO and
 GET_IRQ_INFO and then add another device tree specific ioctl or is it
 better to add a device tree index and path to the existing GET_*_INFO
 ioctls?  Getting some information from one ioctl and passing pieces of
 it back to another ioctl feels a little clunky.

Heh...extending region/irq info is the direction I started with, but 
because of the variable nature of the device tree data thought maybe
it was better to not add complexity to those APIs and leave them
alone.

Many or most platform devices will have 1 region and 1 interrupt, and
so it wouldn't be necessary in most cases to need device tree info at
all since there is no ambiguity.  So, was thinking that for the more
rare, complicated devices that a bit would advertise the existence of
the device tree info and the separate ioctl would be used to access it.

But, I'm completely open to extending the get region/irq info
ioctls if that direction is what you prefer...which seems to be
the case.

 DEVICE_GET_INFO will identify the device as device tree, which gives you
 the opportunity to extend or replace vfio_region_info and vfio_irq_info.
 It seems like it could even be done in a compatible way.  For example,
 if you were to call VFIO_DEVICE_GET_REGION_INFO with argsz =
 sizeof(struct vfio_region_info), the kernel could fill in all the info
 up to that size and fill argsz with the size needed for the remaining
 info.  You could then realloc the buffer and the kernel would add the
 extra info on the next call, setting a flag for each additional field
 returned.  Userspace could also just be sloppy and call it with a lot of
 padding and get everything in one shot.
 
 We'd need to define which flags have associated structures and define
 those structures.  For instance, some require no space:
 
 #define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
 #define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
 
 Others imply a structure added to the end:
 
 #define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
 
 struct vfio_devtree_region_info_index
 {
   u32 index;
 }
 
 #define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
 
 struct vfio_devtree_region_info_path
 {
   u32 len;
   u8  path[];
 }
 
 The order of the flags indicates the order of the structures at the end.
 We'd need to have some rules about alignment, probably always dword
 aligned.  I'm not sure if it would be necessary for each structure to have a
 length.  It would only be needed if we want to let userspace skip over
 structures they don't understand how to parse.
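A rough sketch of how user space might walk structures appended in flag order with 4-byte alignment. The bit positions are left open ("?") in the proposal, so the values below are placeholders, and parse_extras() and struct devtree_extras are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder bit positions; the proposal does not assign them. */
#define FLAG_REG    (1u << 0)   /* no payload */
#define FLAG_RANGE  (1u << 1)   /* no payload */
#define FLAG_INDEX  (1u << 2)   /* followed by a u32 index */
#define FLAG_PATH   (1u << 3)   /* followed by u32 len + path bytes */

struct devtree_extras {
    uint32_t index;
    uint32_t path_len;
    const uint8_t *path;   /* points into the caller's buffer, no NUL */
};

static size_t align4(size_t n) { return (n + 3u) & ~(size_t)3u; }

/*
 * Walk the area that follows the fixed struct.  Payloads appear in
 * ascending flag-bit order, each starting on a 4-byte boundary.
 * Returns 0 on success, -1 if the buffer is too short.
 */
static int parse_extras(uint32_t flags, const uint8_t *buf, size_t len,
                        struct devtree_extras *out)
{
    size_t off = 0;

    assert(out != NULL && (buf != NULL || len == 0));
    memset(out, 0, sizeof(*out));

    if (flags & FLAG_INDEX) {
        if (off > len || len - off < sizeof(uint32_t))
            return -1;
        memcpy(&out->index, buf + off, sizeof(uint32_t));
        off = align4(off + sizeof(uint32_t));
    }
    if (flags & FLAG_PATH) {
        if (off > len || len - off < sizeof(uint32_t))
            return -1;
        memcpy(&out->path_len, buf + off, sizeof(uint32_t));
        off += sizeof(uint32_t);
        if (len - off < out->path_len)
            return -1;
        out->path = buf + off;
        off = align4(off + out->path_len);
    }
    return 0;
}
```

Note that this parser has to know the size of every payload that precedes the one it wants, which is exactly the case where a per-structure length would help.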
 
 Another idea is that the space after struct vfio_region/irq_info could
 be a self describing capabilities area, much like PCI config space.
 Starting immediately after the static structure we'd have:
 
 struct vfio_info_cap_header
 {
   u16 type;
   u16 next;
 };
 
 Where type defines the structure that follows and next indicates the
 offset of the next header (could also be len of current cap).
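A minimal sketch of walking such a capability chain. It assumes that next is an offset from the start of the buffer and that next == 0 terminates the chain, as in PCI config space; neither detail is settled in this thread, and find_cap() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct info_cap_header {
    uint16_t type;   /* identifies the structure that follows */
    uint16_t next;   /* offset of the next header; 0 ends the chain (assumed) */
};

/*
 * Find the capability of the given type.  `first` is the offset of the
 * first header, i.e. the size of the static struct.  Returns the offset
 * of the matching header, or 0 if the type is absent or the chain is
 * malformed.
 */
static size_t find_cap(const uint8_t *buf, size_t len, size_t first,
                       uint16_t type)
{
    size_t off = first;

    assert(buf != NULL);
    while (off != 0) {
        struct info_cap_header hdr;

        if (off > len || len - off < sizeof(hdr))
            return 0;                   /* header would run off the buffer */
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.type == type)
            return off;
        if (hdr.next != 0 && hdr.next <= off)
            return 0;                   /* must move forward: rejects loops */
        off = hdr.next;
    }
    return 0;
}
```

With this layout, user space can skip capability types it does not understand without knowing their size.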
 
 Anyway, it seems like there are possibilities that would allow us to
 extend the info ioctls in ways that would be generic for any device
 type.  Thanks,

I think I like the approach using the flags and struct
extensions.

Stuart


Re: RFC: vfio interface for platform devices

2013-07-03 Thread Scott Wood

On 07/02/2013 08:07:53 PM, Alexander Graf wrote:


On 03.07.2013, at 01:25, Yoder Stuart-B08248 wrote:

 8.  Open Issues

   -how to handle cases where VFIO is requested to handle
a device where the valid, mappable range for a region
is less than a page size.   See example above where an
advertised region in the DMA node is 4 bytes.  If exposed
to a guest VM, the guest has to be able to map a full page
of I/O space which opens a potential security issue.

The way we solved this for legacy PCI device assignment was by going  
through QEMU for emulation and falling back to legacy read/write  
IIRC. We could probably do the same here. IIRC there was a way for a  
normal Linux mmap'ed device region to trap individual accesses too,  
so we could just use that one too.


The slow path emulation would then happen magically in QEMU, since  
MMIO writes will get reinjected into the normal QEMU MMIO handling  
path which will just issue a read/write on the mmap'ed region if it's  
not declared as emulated.


I agree that's what should happen by default, but there should be a way  
for root to tell vfio that a device is allowed to overmap, in order to  
get the performance benefit of direct access in cases where root knows  
(or explicitly doesn't care) that it is safe.
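The policy discussed here (trap by default, map directly only when safe or explicitly allowed) could be sketched as below. choose_access() and its parameters are hypothetical; the thread does not define such a function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum access_path { ACCESS_MMAP, ACCESS_TRAP };

/*
 * Decide how a region is reached.  A region that does not cover whole
 * pages can only be mapped by exposing neighbouring registers too, so
 * it is trapped unless the administrator has allowed overmapping.
 * page_size must be a power of two.
 */
static enum access_path choose_access(uint64_t base, uint64_t size,
                                      uint64_t page_size, bool allow_overmap)
{
    uint64_t mask = page_size - 1;
    bool whole_pages = size != 0 && (base & mask) == 0 && (size & mask) == 0;

    assert(page_size != 0 && (page_size & mask) == 0);
    return (whole_pages || allow_overmap) ? ACCESS_MMAP : ACCESS_TRAP;
}
```

For the 4-byte DMA region in the example, this yields ACCESS_TRAP unless overmapping has been allowed.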


-Scott


RE: RFC: vfio interface for platform devices

2013-07-03 Thread Yoder Stuart-B08248


 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, July 03, 2013 1:52 PM
 To: Alexander Graf
 Cc: Yoder Stuart-B08248; Alex Williamson; Wood Scott-B07421; Bhushan 
 Bharat-R65777; Sethi Varun-B16395;
 virtualizat...@lists.linux-foundation.org; Antonios Motakis; 
 k...@vger.kernel.org list; kvm-
 p...@vger.kernel.org; kvm...@lists.cs.columbia.edu
 Subject: Re: RFC: vfio interface for platform devices
 
 On 07/02/2013 08:07:53 PM, Alexander Graf wrote:
 
  On 03.07.2013, at 01:25, Yoder Stuart-B08248 wrote:
 
   8.  Open Issues
  
 -how to handle cases where VFIO is requested to handle
  a device where the valid, mappable range for a region
  is less than a page size.   See example above where an
  advertised region in the DMA node is 4 bytes.  If exposed
  to a guest VM, the guest has to be able to map a full page
  of I/O space which opens a potential security issue.
 
  The way we solved this for legacy PCI device assignment was by going
  through QEMU for emulation and falling back to legacy read/write
  IIRC. We could probably do the same here. IIRC there was a way for a
  normal Linux mmap'ed device region to trap individual accesses too,
  so we could just use that one too.
 
  The slow path emulation would then happen magically in QEMU, since
  MMIO writes will get reinjected into the normal QEMU MMIO handling
  path which will just issue a read/write on the mmap'ed region if it's
  not declared as emulated.
 
 I agree that's what should happen by default, but there should be a way
 for root to tell vfio that a device is allowed to overmap, in order to
 get the performance benefit of direct access in cases where root knows
 (or explicitly doesn't care) that it is safe.

Perhaps a sysfs mechanism like this:

echo /sys/bus/platform/devices/ffe21.usb > \
    /sys/bus/platform/drivers/vfio-platform/allow_overmap

Stuart






RE: RFC: vfio interface for platform devices

2013-07-03 Thread Yoder Stuart-B08248
[cut]
  So overall the interface and extension makes sense.  My only question is
  whether it's better to get complete reuse out of GET_REGION_INFO and
  GET_IRQ_INFO and then add another device tree specific ioctl or is it
  better to add a device tree index and path to the existing GET_*_INFO
  ioctls?  Getting some information from one ioctl and passing pieces of
  it back to another ioctl feels a little clunky.
 
 
 I think at this point we should clearly separate the info we need to
 pass for the core functionality (assigning the device's resources),
 and the information we want to pass in order to generate a guest DT.
 For ARM a DT is not generated by QEMU yet, but instead a proper DTB
 needs to be passed by the user (granted, this will not be the case
 forever). So I think even if we treat them the same in code, we should be
 discussing them separately.

We do need to keep core resources separate from what it takes
to generate a guest DT, but note the purpose of the devtree info
is not primarily to help generate a guest DT.

User space (not just QEMU) needs to know what the regions
and interrupts advertised by DEVICE_GET_INFO correspond to.
If there are 4 interrupts and 2 register regions, how does user
space know the purpose/function of each?
Apart from something like the devtree info I don't see
how a user space driver can know how to use the regions
and interrupts.   The kernel is not guaranteeing any
particular ordering of resources.

So in the DMA engine example I gave, the devtree info
lets user space know which interrupt corresponds to
which DMA channel.

QEMU is a special case in that it is going to expose
the device to a virtual machine and needs to generate
a normal device tree node...but that is a separate problem
that needs to be solved in QEMU.

Stuart



Re: RFC: vfio interface for platform devices

2013-07-03 Thread Scott Wood

On 07/02/2013 06:25:59 PM, Yoder Stuart-B08248 wrote:
The write-up below is the first draft of a proposal for how the
kernel can expose platform devices to user space using vfio.

In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
allows user space to correlate regions and interrupts to the
corresponding device tree node structure that is defined for most
platform devices.

Regards,
Stuart Yoder

--
VFIO for Platform Devices

The existing infrastructure for vfio-pci is pretty close to what we
need:

   -mechanism to create a container
   -add groups/devices to a container
   -set the IOMMU model
   -map DMA regions
   -get an fd for a specific device, which allows user space to
    determine info about device regions (e.g. registers) and
    interrupt info
   -support for mmapping device regions
   -mechanism to set how interrupts are signaled

Platform devices can get complicated-- potentially with a tree
hierarchy of nodes, and links/phandles pointing to other platform
devices.   The kernel doesn't expose relationships between
devices.  The kernel just exposes mappable register regions and
interrupts.  It's up to user space to work out relationships between
devices if it needs to-- this can be determined in the device tree
exposed in /proc/device-tree.

I think the changes needed for vfio are around some of the device tree
related info that needs to be available with the device fd.

1.  VFIO_GROUP_GET_DEVICE_FD

  User space has to know which device it is accessing and will call
  VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
  get the device information:

  fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
             "/soc@ffe00/usb@21");


  (whether the path is a device tree path or a sysfs path is up for
  discussion, e.g. /sys/bus/platform/devices/ffe21.usb)


Doesn't VFIO need to operate on an actual Linux device, rather than  
just an OF node?


Are we going to have a fixed assumption that you always want all the  
children of the node corresponding to the assigned device, or will it  
be possible to exclude some?



2.  VFIO_DEVICE_GET_INFO

   Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
   than adding a new flag identifying a device as a 'platform'
   device.

   This ioctl simply returns the number of regions and number of irqs.

   The number of regions corresponds to the number of regions
   that can be mapped for the device-- corresponds to the regions
   defined in reg and ranges in the device tree.

3.  VFIO_DEVICE_GET_REGION_INFO

   No changes needed, except perhaps adding a new flag.  Freescale
   has some devices with regions that must be mapped cacheable.


While I don't object to making the information available to the user  
just in case, the main thing we need here is to influence what the  
kernel does when the user tries to map it.  At least on PPC it's not up  
to userspace to select whether a mmap is cacheable.



4. VFIO_DEVICE_GET_DEVTREE_INFO

   The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
   expose device regions and interrupts, but it's not enough to know
   that there are X regions and Y interrupts.  User space needs to
   know what the resources are for-- to correlate those
   regions/interrupts to the device tree structure that drivers use.
   The device tree structure could consist of multiple nodes and it is
   necessary to identify the node corresponding to the region/interrupt
   exposed by VFIO.

   The following information is needed:
  -the device tree path to the node corresponding to the
   region or interrupt
  -for a region, whether it corresponds to a reg or ranges
   property
  -there could be multiple sub-regions per reg or ranges and
   the sub-index within the reg/ranges is needed

   The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.

   ioctl: VFIO_DEVICE_GET_DEVTREE_INFO

   struct vfio_path_info {
        __u32   argsz;
        __u32   flags;
   #define VFIO_DEVTREE_INFO_RANGES  (1 << 3) /* the region is a
                                                 ranges property */


What about distinguishing a normal interrupt from one found in an  
interrupt-map?


In the case of both ranges and interrupt-maps, we'll also want to  
decide what the policy is for when to expose them directly, versus just  
using them to translate regs and interrupts of child nodes.


        __u32   index;  /* input: index of region or irq for
                           which we are getting info */
        __u32   type;   /* input: 0 - get devtree info for a region
                                  1 - get devtree info for an irq */
        __u32   start;  /* output: identifies the index
                           within the reg/ranges */


"start" is an odd name for this.  I'd rename "index" to "vfio_index"
and this to "dt_index".


        __u8    path[]; /* output: Full path to

Re: RFC: vfio interface for platform devices (v2)

2013-07-03 Thread Alex Williamson
On Wed, 2013-07-03 at 21:40 +, Yoder Stuart-B08248 wrote:
 Version 2
   -VFIO_GROUP_GET_DEVICE_FD-- specified that the path is a sysfs path
   -VFIO_DEVICE_GET_INFO-- defined 2 flags instead of 1
   -deleted VFIO_DEVICE_GET_DEVTREE_INFO ioctl
   -VFIO_DEVICE_GET_REGION_INFO-- updated as per AlexW's suggestion,
defined 5 new flags and associated structs
   -VFIO_DEVICE_GET_IRQ_INFO-- updated as per AlexW's suggestion,
defined 1 new flag and associated struct
   -removed redundant example
 
 --
 VFIO for Platform Devices
 
 The existing kernel interface for vfio-pci is pretty close to what is needed
 for platform devices:
-mechanism to create a container
-add groups/devices to a container
-set the IOMMU model
-map DMA regions
-get an fd for a specific device, which allows user space to determine
 info about device regions (e.g. registers) and interrupt info
-support for mmapping device regions
-mechanism to set how interrupts are signaled
 
 Many platform devices are simple and consist of a single register
 region and a single interrupt.  For these types of devices the
 existing vfio interfaces should be sufficient.
 
 However, platform devices can get complicated-- logically represented
 as a device tree hierarchy of nodes.  For devices with multiple regions
 and interrupts, new mechanisms are needed in vfio to correlate the
 regions/interrupts with the device tree structure that drivers use
 to determine the meaning of device resources.
 
 In some cases there are relationships between devices, and devices
 reference other devices using phandle links.  The kernel won't expose
 relationships between devices, but just exposes mappable register
 regions and interrupts.
 
 The changes needed for vfio are around some of the device tree
 related info that needs to be available with the device fd.
 
 1.  VFIO_GROUP_GET_DEVICE_FD
 
   User space knows by out-of-band means which device it is accessing
   and will call VFIO_GROUP_GET_DEVICE_FD passing a specific sysfs path
   to get the device information:
 
   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
              "/sys/bus/platform/devices/ffe21.usb");

FWIW, I'm in favor of whichever way works out cleaner in the code for
prepending /sys/bus or not.  It sort of seems like it's unnecessary.
It's also a little inconsistent that the returned path doesn't
prepend /sys in the examples below.

 2.  VFIO_DEVICE_GET_INFO
 
The number of regions corresponds to the regions defined
in reg and ranges in the device tree.  
 
Two new flags are added to struct vfio_device_info:
 
#define VFIO_DEVICE_FLAGS_PLATFORM (1 << ?) /* A platform bus device */
#define VFIO_DEVICE_FLAGS_DEVTREE  (1 << ?) /* device tree info available */
 
It is possible that there could be platform bus devices 
that are not in the device tree, so we use 2 flags to
allow for that.
 
If just VFIO_DEVICE_FLAGS_PLATFORM is set, it means
that there are regions and IRQs but no device tree info
available.
 
If just VFIO_DEVICE_FLAGS_DEVTREE is set, it means
there is device tree info available.

But it would be invalid to only have DEVTREE w/o PLATFORM for now,
right?

 3. VFIO_DEVICE_GET_REGION_INFO
 
For platform devices with multiple regions, information
is needed to correlate the regions with the device 
tree structure that drivers use to determine the meaning
of device resources.

The VFIO_DEVICE_GET_REGION_INFO is extended to provide
device tree information.
 
The following information is needed:
   -the device tree path to the node corresponding to the
region
   -whether it corresponds to a reg or ranges property
   -there could be multiple sub-regions per reg or ranges and
the sub-index within the reg/ranges is needed
 
There are 5 new flags added to vfio_region_info :
 
struct vfio_region_info {
 __u32   argsz;
 __u32   flags;
#define VFIO_REGION_INFO_FLAG_CACHEABLE (1 << ?)
#define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
#define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
#define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
#define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
 __u32   index;  /* Region index */
 __u32   resv;   /* Reserved for alignment */
 __u64   size;   /* Region size (bytes) */
 __u64   offset; /* Region offset from start of device fd */
};
  
VFIO_REGION_INFO_FLAG_CACHEABLE
-if set indicates that the region must be mapped as cacheable
 
VFIO_DEVTREE_REGION_INFO_FLAG_REG
-if set indicates that the region corresponds to a reg property
 in the device tree representation of the device
 
VFIO_DEVTREE_REGION_INFO_FLAG_RANGE
-if set indicates that the region corresponds to a ranges property
 in the 

Re: RFC: vfio interface for platform devices (v2)

2013-07-03 Thread Scott Wood

On 07/03/2013 05:53:09 PM, Alex Williamson wrote:

Seems like it should work.  My only API concern with this model of
appending structs is that a user needs to know the size of each struct
even if they don't otherwise care about it in order to step over it.


In that case, it might be better to make the struct grow linearly  
rather than with options, and just have a version number on the struct  
indicating how far the caller thinks the struct has grown.  The kernel  
could respond back with a lower version to reflect that it only filled  
in the fields it knows about.  Flags could still be used to indicate  
which portions of the struct are relevant, but not the physical layout  
of the struct.


In some cases, like the path, the size is variable and the user needs
to look into it.


For things like path, maybe the caller should just pass in a string  
buffer that is separate from the struct buffer?


-Scott


Re: RFC: vfio interface for platform devices

2013-07-02 Thread Alexander Graf

On 03.07.2013, at 01:25, Yoder Stuart-B08248 wrote:

 The write-up below is the first draft of a proposal for how the kernel can
 expose platform devices to user space using vfio.
 
 In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
 allows user space to correlate regions and interrupts to the corresponding
 device tree node structure that is defined for most platform devices.
 
 Regards,
 Stuart Yoder
 
 --
 VFIO for Platform Devices
 
 The existing infrastructure for vfio-pci is pretty close to what we need:
   -mechanism to create a container
   -add groups/devices to a container
   -set the IOMMU model
   -map DMA regions
   -get an fd for a specific device, which allows user space to determine
info about device regions (e.g. registers) and interrupt info
   -support for mmapping device regions
   -mechanism to set how interrupts are signaled
 
 Platform devices can get complicated-- potentially with a tree hierarchy
 of nodes, and links/phandles pointing to other platform 
 devices.   The kernel doesn't expose relationships between
 devices.  The kernel just exposes mappable register regions and interrupts.
 It's up to user space to work out relationships between devices
 if it needs to-- this can be determined in the device tree exposed in
 /proc/device-tree.
 
 I think the changes needed for vfio are around some of the device tree
 related info that needs to be available with the device fd.
 
 1.  VFIO_GROUP_GET_DEVICE_FD
 
  User space has to know which device it is accessing and will call
  VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
  get the device information:
 
  fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe00/usb@21");
 
  (whether the path is a device tree path or a sysfs path is up for
  discussion, e.g. /sys/bus/platform/devices/ffe21.usb)
 
 2.  VFIO_DEVICE_GET_INFO
 
   Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
   than adding a new flag identifying a device as a 'platform'
   device.
 
   This ioctl simply returns the number of regions and number of irqs.
 
   The number of regions corresponds to the number of regions
   that can be mapped for the device-- corresponds to the regions defined
   in reg and ranges in the device tree.  
 
 3.  VFIO_DEVICE_GET_REGION_INFO
 
   No changes needed, except perhaps adding a new flag.  Freescale has some
   devices with regions that must be mapped cacheable.
 
 3.  VFIO_DEVICE_GET_IRQ_INFO
 
   No changes needed.
 
 4. VFIO_DEVICE_GET_DEVTREE_INFO
 
   The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
   expose device regions and interrupts, but it's not enough to know
   that there are X regions and Y interrupts.  User space needs to
   know what the resources are for-- to correlate those regions/interrupts
   to the device tree structure that drivers use.  The device tree
   structure could consist of multiple nodes and it is necessary to
   identify the node corresponding to the region/interrupt exposed
   by VFIO.
 
   The following information is needed:
  -the device tree path to the node corresponding to the
   region or interrupt
  -for a region, whether it corresponds to a reg or ranges
   property
  -there could be multiple sub-regions per reg or ranges and
   the sub-index within the reg/ranges is needed
 
   The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
 
   ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
 
   struct vfio_path_info {
        __u32   argsz;
        __u32   flags;
   #define VFIO_DEVTREE_INFO_RANGES  (1 << 3) /* the region is a ranges
                                                 property */
        __u32   index;  /* input: index of region or irq for which we
                           are getting info */
        __u32   type;   /* input: 0 - get devtree info for a region
                                  1 - get devtree info for an irq */
        __u32   start;  /* output: identifies the index within the
                           reg/ranges */
        __u8    path[]; /* output: Full path to associated device tree
                           node */
   };
 
   User space allocates enough space for the device tree path, sets
   the type field identifying whether this is a region, or irq,
   and sets argsz appropriately.
 
 5.  EXAMPLE 1
 
Example, Freescale SATA controller:
 
 sata@22 {
         compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
         reg = <0x22 0x1000>;
         interrupts = <0x44 0x2 0x0 0x0>;
 };
 
request to get device FD would look like:
  fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
             "/soc@ffe00/sata@22");
 
The VFIO_DEVICE_GET_INFO ioctl would return:
  -1 region
  -1 interrupts
 
The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
  -for index 0:
   offset=0, size=0x1 -- allows mmap of physical 0xffe22
 
The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate 

Re: RFC: vfio interface for platform devices

2013-07-02 Thread Alex Williamson
On Tue, 2013-07-02 at 23:25 +, Yoder Stuart-B08248 wrote:
 The write-up below is the first draft of a proposal for how the kernel can
 expose platform devices to user space using vfio.
 
 In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
 allows user space to correlate regions and interrupts to the corresponding
 device tree node structure that is defined for most platform devices.
 
 Regards,
 Stuart Yoder
 
 --
 VFIO for Platform Devices
 
 The existing infrastructure for vfio-pci is pretty close to what we need:
-mechanism to create a container
-add groups/devices to a container
-set the IOMMU model
-map DMA regions
-get an fd for a specific device, which allows user space to determine
 info about device regions (e.g. registers) and interrupt info
-support for mmapping device regions
-mechanism to set how interrupts are signaled
 
 Platform devices can get complicated-- potentially with a tree hierarchy
 of nodes, and links/phandles pointing to other platform 
 devices.   The kernel doesn't expose relationships between
 devices.  The kernel just exposes mappable register regions and interrupts.
 It's up to user space to work out relationships between devices
 if it needs to-- this can be determined in the device tree exposed in
 /proc/device-tree.
 
 I think the changes needed for vfio are around some of the device tree
 related info that needs to be available with the device fd.
 
 1.  VFIO_GROUP_GET_DEVICE_FD
 
   User space has to know which device it is accessing and will call
   VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
   get the device information:
 
   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe00/usb@21");
 
   (whether the path is a device tree path or a sysfs path is up for
   discussion, e.g. /sys/bus/platform/devices/ffe21.usb)
 
 2.  VFIO_DEVICE_GET_INFO
 
Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
than adding a new flag identifying a device as a 'platform'
device.
 
This ioctl simply returns the number of regions and number of irqs.
 
The number of regions corresponds to the number of regions
that can be mapped for the device-- corresponds to the regions defined
in reg and ranges in the device tree.  
 
 3.  VFIO_DEVICE_GET_REGION_INFO
 
No changes needed, except perhaps adding a new flag.  Freescale has some
devices with regions that must be mapped cacheable.
 
 3.  VFIO_DEVICE_GET_IRQ_INFO
 
No changes needed.
 
 4. VFIO_DEVICE_GET_DEVTREE_INFO
 
The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
expose device regions and interrupts, but it's not enough to know
that there are X regions and Y interrupts.  User space needs to
know what the resources are for-- to correlate those regions/interrupts
to the device tree structure that drivers use.  The device tree
structure could consist of multiple nodes and it is necessary to
identify the node corresponding to the region/interrupt exposed
by VFIO.
 
The following information is needed:
   -the device tree path to the node corresponding to the
region or interrupt
   -for a region, whether it corresponds to a reg or ranges
property
   -there could be multiple sub-regions per reg or ranges and
the sub-index within the reg/ranges is needed
 
The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
 
ioctl: VFIO_DEVICE_GET_DEVTREE_INFO

struct vfio_path_info {
 __u32   argsz;
 __u32   flags;
#define VFIO_DEVTREE_INFO_RANGES  (1 << 3) /* the region is a ranges
                                              property */

(1 << 0)?

Having flags = 0x0 for regs and 0x1 for ranges is a bit awkward.  I'd
suggest a bit for each.  Otherwise, what does it mean when this returns
flags = 0x0 for an irq?

 __u32   index;  /* input: index of region or irq for which we
                    are getting info */
 __u32   type;   /* input: 0 - get devtree info for a region
                           1 - get devtree info for an irq */
 __u32   start;  /* output: identifies the index within the
                    reg/ranges */
 __u8    path[]; /* output: Full path to associated device
                    tree node */
};
 
User space allocates enough space for the device tree path, sets
the type field identifying whether this is a region, or irq,
and sets argsz appropriately.
 
 5.  EXAMPLE 1
 
 Example, Freescale SATA controller:
 
  sata@22 {
          compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
          reg = <0x22 0x1000>;
          interrupts = <0x44 0x2 0x0 0x0>;
  };
 
 request to get device FD would look like:
   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
              "/soc@ffe00/sata@22");
 
 The VFIO_DEVICE_GET_INFO ioctl would return: