Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-30 Thread Alex Williamson
On Thu, 2010-07-29 at 17:27 +0300, Michael S. Tsirkin wrote: 
> On Wed, Jul 28, 2010 at 04:13:19PM -0600, Alex Williamson wrote:
> > On Thu, 2010-07-29 at 00:57 +0300, Michael S. Tsirkin wrote:
> > > On Wed, Jul 28, 2010 at 03:57:02PM -0600, Alex Williamson wrote:
> > > > 
> > > > Something like GET_MSIX_VECTORS seems like a user library routine to me.
> > > > The PCI config space is well specified and if we try to do more than
> > > > shortcut trivial operations (like getting the BAR length), we risk
> > > > losing functionality.  And for my purposes, translating to and from a
> > > > made up API to PCI for the guest seems like a pain.
> > > 
> > > Won't a userspace library do just as well for you?
> > 
> > You mean aside from qemu's reluctance to add dependencies for more
> > libraries?
> 
> Main reason is portability. So as long as it's kvm-only stuff, they
> likely won't care.

I'd like the qemu vfio driver to be available for both qemu and kvm.  It
may have limitations in qemu mode from non-atomic memory writes via tcg,
but I really don't want to have it only live in qemu-kvm like the
current device assignment code.

> > My only concern is that I want enough virtualized/raw config
> > space that I'm not always translating PCI config accesses from the guest
> > into some userspace API.  If it makes sense to do this for things like
> > MSI, since I need someone to figure out what resources can actually be
> > allocated on the host, then maybe the library makes sense for that.
> > Then again, if every user needs to do this, let the vfio kernel driver
> > check what it can get and virtualize the available MSIs in exposed
> > config space, and my driver would be even happier.
> > 
> > Alex
> 
> It would?  The guest driver might or might not work if you reduce the number
> of vectors for the device.  So I think you need an API to find out whether
> all vectors can be allocated.
>
> And these are examples of why virtualizing is wrong:
> 1. hides real hardware
> 2. no way to report errors

So I think you're proposing something like:

ioctl(GET_MSIX_VECTORS)
ioctl(VFIO_EVENTFDS_MSIX)

which I don't see as having any value over:

Get # of MSI-X vectors supported via the exposed PCI config space
ioctl(VFIO_EVENTFDS_MSIX)
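
For concreteness, a minimal sketch of that second flow from a userspace driver
(the argument layout for VFIO_EVENTFDS_MSIX below is purely illustrative, not
the actual V3 ABI, and nvec is whatever was read out of the device's MSI-X
capability in config space):

#include <sys/eventfd.h>
#include <sys/ioctl.h>

int setup_msix(int vfio_fd, int nvec)
{
    /* hypothetical layout: vector count followed by one eventfd per vector */
    int args[nvec + 1];

    args[0] = nvec;
    for (int i = 0; i < nvec; i++)
        args[i + 1] = eventfd(0, 0);

    /* the kernel returns an error here if it cannot allocate all vectors */
    return ioctl(vfio_fd, VFIO_EVENTFDS_MSIX, args);
}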

The GET_MSIX_VECTORS can't do any better job of predicting how many
interrupts can be allocated than looking at what the hardware supports.
Someone else could have exhausted the interrupt vectors by the time you
make the second call.  You'll note here that enabling the MSIX
interrupts actually requires an ioctl to associate eventfds with each
interrupt, so yes, we do have a way to report errors if the system can't
support the number of interrupts requested.  Thanks,

Alex



Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-29 Thread Michael S. Tsirkin
On Wed, Jul 28, 2010 at 04:13:19PM -0600, Alex Williamson wrote:
> On Thu, 2010-07-29 at 00:57 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jul 28, 2010 at 03:57:02PM -0600, Alex Williamson wrote:
> > > 
> > > Something like GET_MSIX_VECTORS seems like a user library routine to me.
> > > The PCI config space is well specified and if we try to do more than
> > > shortcut trivial operations (like getting the BAR length), we risk
> > > losing functionality.  And for my purposes, translating to and from a
> > > made up API to PCI for the guest seems like a pain.
> > 
> > Won't a userspace library do just as well for you?
> 
> You mean aside from qemu's reluctance to add dependencies for more
> libraries?

Main reason is portability. So as long as it's kvm-only stuff, they
likely won't care.

> My only concern is that I want enough virtualized/raw config
> space that I'm not always translating PCI config accesses from the guest
> into some userspace API.  If it makes sense to do this for things like
> MSI, since I need someone to figure out what resources can actually be
> allocated on the host, then maybe the library makes sense for that.
> Then again, if every user needs to do this, let the vfio kernel driver
> check what it can get and virtualize the available MSIs in exposed
> config space, and my driver would be even happier.
> 
> Alex

It would?  The guest driver might or might not work if you reduce the number
of vectors for the device.  So I think you need an API to find out whether
all vectors can be allocated.

And these are examples of why virtualizing is wrong:
1. hides real hardware
2. no way to report errors

-- 
MST


Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-28 Thread Alex Williamson
On Thu, 2010-07-29 at 00:57 +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 28, 2010 at 03:57:02PM -0600, Alex Williamson wrote:
> > 
> > Something like GET_MSIX_VECTORS seems like a user library routine to me.
> > The PCI config space is well specified and if we try to do more than
> > shortcut trivial operations (like getting the BAR length), we risk
> > losing functionality.  And for my purposes, translating to and from a
> > made up API to PCI for the guest seems like a pain.
> 
> Won't a userspace library do just as well for you?

You mean aside from qemu's reluctance to add dependencies for more
libraries?  My only concern is that I want enough virtualized/raw config
space that I'm not always translating PCI config accesses from the guest
into some userspace API.  If it makes sense to do this for things like
MSI, since I need someone to figure out what resources can actually be
allocated on the host, then maybe the library makes sense for that.
Then again, if every user needs to do this, let the vfio kernel driver
check what it can get and virtualize the available MSIs in exposed
config space, and my driver would be even happier.

Alex



Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-28 Thread Michael S. Tsirkin
On Wed, Jul 28, 2010 at 03:57:02PM -0600, Alex Williamson wrote:
> 
> Something like GET_MSIX_VECTORS seems like a user library routine to me.
> The PCI config space is well specified and if we try to do more than
> shortcut trivial operations (like getting the BAR length), we risk
> losing functionality.  And for my purposes, translating to and from a
> made up API to PCI for the guest seems like a pain.

Won't a userspace library do just as well for you?



Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-28 Thread Alex Williamson
On Wed, 2010-07-28 at 14:14 -0700, Tom Lyon wrote:
> On Tuesday, July 27, 2010 04:53:22 pm Michael S. Tsirkin wrote:
> > On Tue, Jul 27, 2010 at 03:13:14PM -0700, Tom Lyon wrote:
> > > [ Sorry for the long hiatus, I've been wrapped up in other issues.]
> > > 
> > > I think the fundamental issue to resolve is to decide on the model which
> > > the VFIO driver presents to its users.
> > > 
> > > Fundamentally, VFIO as part of the OS must protect the system from its
> > > users and also protect the users from each other.  No disagreement here.
> > > 
> > > But another fundamental purpose of an OS is to present an abstract model
> > > of the underlying hardware to its users, so that the users don't have to
> > > deal with the full complexity of the hardware.
> > > 
> > > So I think VFIO should present a 'virtual', abstracted PCI device to its
> > > users whereas Michael has argued for a simpler model of presenting the
> > > real PCI device config registers while preventing writes only to the
> > > registers which would clearly disrupt the system.
> > 
> > In fact, there is no contradiction. I am all for an abstracted
> > API *and* I think the virtualization concept is a bad way
> > to build this API.
> > 
> > The 'virtual' interface you present is very complex and hardware specific:
> > you do not hide literally *anything*. Deciding which functionality
> > userspace needs, and exposing it to userspace as a set of APIs would be
> > abstract. Instead you ask people to go read the PCI spec, the device spec,
> > and bang on PCI registers, little-endian-ness and all, then try to
> > > interpret what the virtual values mean.
> 
> Exactly! The PCI bus is far better *specified*, *documented*, and widely 
> implemented than a Linux driver could ever hope to be.  And there are lots of 
> current Linux drivers which bang around in PCI config space simply because the 
> authors were not aware of some API call buried deep in Linux which would do 
> the work for them - or - got tired of using OS-specific APIs when porting a 
> driver and decided to just ask the hardware.
> 
> > Example:
> > 
> > How do I find # of MSI-X vectors? Sure, scan the capability list,
> > find the capability, read the value, convert from little endian
> > at each step.
> > A page or two of code, and let's hope I have a ppc to test on.
> > And note no driver has this code - they all use OS routines.
> > 
> > So why wouldn't
> > ioctl(dev, VFIO_GET_MSIX_VECTORS, &n);
> > better serve the declared goal of presenting an abstracted PCI device to
> > users?
> 
> By and large, the user drivers just know how many because the hardware is 
> constant.

Something like GET_MSIX_VECTORS seems like a user library routine to me.
The PCI config space is well specified and if we try to do more than
shortcut trivial operations (like getting the BAR length), we risk
losing functionality.  And for my purposes, translating to and from a
made up API to PCI for the guest seems like a pain.

> And inventing 20 or 30 ioctls to do a bunch of random stuff is gross when you 
> can instead use normal read and write calls to a well defined structure.

Yep, this sounds like a job for libvfio.

> > > Now, the virtual model *could* look little like the real hardware, and
> > > use bunches of ioctls for everything it needs,
> > 
> > Or reads/writes at special offsets, or sysfs attributes.
> > 
> > > or it could look a lot like PCI and
> > > use reads and writes of the virtual PCI config registers to trigger its
> > > actions.  The latter makes things more amenable to those porting drivers
> > > from other environments.
> > 
> > I really doubt this helps at all. Drivers typically use OS-specific
> > APIs. It is very uncommon for them to touch standard registers,
> > which is 100% of what your patch seems to be dealing with.
> > 
> > And again, how about a small userspace library that would wrap vfio and
> > add the abstractions for drivers that do need them?
> 
> Yes, there will be userspace libraries - I already have a vfio backend for 
> libpci.
> > 
> > > I realize that to date the VFIO driver has been a  bit of a mish-mash
> > > between the ioctl and config based techniques; I intend to clean that
> > > up.  And, yes, the abstract model presented by VFIO will need plenty of
> > > documentation.
> > 
> > And, it will need to be maintained forever, bugs and all.
> > For example, if you change some register you emulated
> > to fix a bug, to the driver this looks like a hardware change,
> > and it will crash.
> 
> The changes will come only to allow for a more-perfect emulation, so I doubt 
> that will cause driver problems.  No different than discovering and fixing 
> bugs in the ioctls needed in your scenario.
> 
> > 
> > The PCI spec has some weak versioning support, but it
> > is mostly not a problem in that space: a specific driver needs to
> > only deal with a specific device.  We have a generic driver so PCI
> > configuration space is a bad interface to use.
> 
> PCI h

Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-28 Thread Michael S. Tsirkin
On Wed, Jul 28, 2010 at 02:14:21PM -0700, Tom Lyon wrote:
> On Tuesday, July 27, 2010 04:53:22 pm Michael S. Tsirkin wrote:
> > On Tue, Jul 27, 2010 at 03:13:14PM -0700, Tom Lyon wrote:
> > > [ Sorry for the long hiatus, I've been wrapped up in other issues.]
> > > 
> > > I think the fundamental issue to resolve is to decide on the model which
> > > the VFIO driver presents to its users.
> > > 
> > > Fundamentally, VFIO as part of the OS must protect the system from its
> > > users and also protect the users from each other.  No disagreement here.
> > > 
> > > But another fundamental purpose of an OS is to present an abstract model
> > > of the underlying hardware to its users, so that the users don't have to
> > > deal with the full complexity of the hardware.
> > > 
> > > So I think VFIO should present a 'virtual', abstracted PCI device to its
> > > users whereas Michael has argued for a simpler model of presenting the
> > > real PCI device config registers while preventing writes only to the
> > > registers which would clearly disrupt the system.
> > 
> > In fact, there is no contradiction. I am all for an abstracted
> > API *and* I think the virtualization concept is a bad way
> > to build this API.
> > 
> > The 'virtual' interface you present is very complex and hardware specific:
> > you do not hide literally *anything*. Deciding which functionality
> > userspace needs, and exposing it to userspace as a set of APIs would be
> > abstract. Instead you ask people to go read the PCI spec, the device spec,
> > and bang on PCI registers, little-endian-ness and all, then try to
> > interpret what the virtual values mean.
> 
> Exactly! The PCI bus is far better *specified*, *documented*, and widely 
> implemented than a Linux driver could ever hope to be.

Yes but it does not map all that well to what you need to do.
We need a sane backward compatibility plan, cross-platform support,
error reporting, atomicity ... PCI config has support for none of this.
So you implement a "kind of" PCI config, where accesses might fail
or not go through to the device, where there are some atomicity guarantees
but not others ...
And there won't even be a header file to look at to say "aha,
this driver has this functionality".
How does an application know whether you support capability X?
Reading the driver source seems to be shaping up as the only way.

>  And there are lots of 
> current Linux drivers which bang around in PCI config space simply because the 
> authors were not aware of some API call buried deep in Linux which would do 
> the work for them - or - got tired of using OS-specific APIs when porting a 
> driver and decided to just ask the hardware.

Really? Example? Drivers either use proper APIs or are broken in some way.
You can not even size the BARs without using the OS API.
So what's safe to do directly? Maybe reading out device/vendor/revision ID ...
looks like small change to me.
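
BAR sizing is a good example of why: the classic probe sequence below has to
write the register, so it can only safely be done by the OS at enumeration
time while the device is quiescent (a simplified sketch in kernel style,
32-bit memory BAR only):

        u32 bar, mask, size;

        pci_read_config_dword(pdev, PCI_BASE_ADDRESS_0, &bar);
        pci_write_config_dword(pdev, PCI_BASE_ADDRESS_0, ~0);  /* write all 1s */
        pci_read_config_dword(pdev, PCI_BASE_ADDRESS_0, &mask);
        pci_write_config_dword(pdev, PCI_BASE_ADDRESS_0, bar); /* restore */

        /* decode the size from the read-back address mask */
        size = ~(mask & (u32)PCI_BASE_ADDRESS_MEM_MASK) + 1;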

> 
> > Example:
> > 
> > How do I find # of MSI-X vectors? Sure, scan the capability list,
> > find the capability, read the value, convert from little endian
> > at each step.
> > A page or two of code, and let's hope I have a ppc to test on.
> > And note no driver has this code - they all use OS routines.
> > 
> > So why wouldn't
> > ioctl(dev, VFIO_GET_MSIX_VECTORS, &n);
> > better serve the declared goal of presenting an abstracted PCI device to
> > users?
> 
> By and large, the user drivers just know how many because the hardware is 
> constant.

But you might not have CPU resources to allocate all vectors.
And the same will apply to any register you spend code virtualizing.

> And inventing 20 or 30 ioctls to do a bunch of random stuff is gross


If you dislike ioctls, use read/write at a defined offset,
or sysfs. Just don't pretend you can say "look at the PCI spec"
and avoid the need to document your interface this way.

> when you 
> can instead use normal read and write calls to a well defined structure.

It is not all that well defined.
What if the hardware supports MSI-X but the host controller does not?
Do you return an error from the write enabling MSI-X?
Virtualize it and pretend there is no capability?
PCI has no provision for this, and deciding what to do
here is policy which the kernel should not dictate.


> > 
> > > Now, the virtual model *could* look little like the real hardware, and
> > > use bunches of ioctls for everything it needs,
> > 
> > Or reads/writes at special offsets, or sysfs attributes.
> > 
> > > or it could look a lot like PCI and
> > > use reads and writes of the virtual PCI config registers to trigger its
> > > actions.  The latter makes things more amenable to those porting drivers
> > > from other environments.
> > 
> > I really doubt this helps at all. Drivers typically use OS-specific
> > APIs. It is very uncommon for them to touch standard registers,
> > which is 100% of what your patch seems to be dealing with.
> > 
> > And again, how about a small userspace library that would wrap vfio and
> > add the abstrac

Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-28 Thread Tom Lyon
On Tuesday, July 27, 2010 04:53:22 pm Michael S. Tsirkin wrote:
> On Tue, Jul 27, 2010 at 03:13:14PM -0700, Tom Lyon wrote:
> > [ Sorry for the long hiatus, I've been wrapped up in other issues.]
> > 
> > I think the fundamental issue to resolve is to decide on the model which
> > the VFIO driver presents to its users.
> > 
> > Fundamentally, VFIO as part of the OS must protect the system from its
> > users and also protect the users from each other.  No disagreement here.
> > 
> > But another fundamental purpose of an OS is to present an abstract model
> > of the underlying hardware to its users, so that the users don't have to
> > deal with the full complexity of the hardware.
> > 
> > So I think VFIO should present a 'virtual', abstracted PCI device to its
> > users whereas Michael has argued for a simpler model of presenting the
> > real PCI device config registers while preventing writes only to the
> > registers which would clearly disrupt the system.
> 
> In fact, there is no contradiction. I am all for an abstracted
> API *and* I think the virtualization concept is a bad way
> to build this API.
> 
> The 'virtual' interface you present is very complex and hardware specific:
> you do not hide literally *anything*. Deciding which functionality
> userspace needs, and exposing it to userspace as a set of APIs would be
> abstract. Instead you ask people to go read the PCI spec, the device spec,
> and bang on PCI registers, little-endian-ness and all, then try to
> interpret what the virtual values mean.

Exactly! The PCI bus is far better *specified*, *documented*, and widely 
implemented than a Linux driver could ever hope to be.  And there are lots of 
current Linux drivers which bang around in PCI config space simply because the 
authors were not aware of some API call buried deep in Linux which would do 
the work for them - or - got tired of using OS-specific APIs when porting a 
driver and decided to just ask the hardware.

> Example:
> 
> How do I find # of MSI-X vectors? Sure, scan the capability list,
> find the capability, read the value, convert from little endian
> at each step.
> A page or two of code, and let's hope I have a ppc to test on.
> And note no driver has this code - they all use OS routines.
> 
> So why wouldn't
>   ioctl(dev, VFIO_GET_MSIX_VECTORS, &n);
> better serve the declared goal of presenting an abstracted PCI device to
> users?

By and large, the user drivers just know how many because the hardware is 
constant.

And inventing 20 or 30 ioctls to do a bunch of random stuff is gross when you 
can instead use normal read and write calls to a well defined structure.
> 
> > Now, the virtual model *could* look little like the real hardware, and
> > use bunches of ioctls for everything it needs,
> 
> Or reads/writes at special offsets, or sysfs attributes.
> 
> > or it could look a lot like PCI and
> > use reads and writes of the virtual PCI config registers to trigger its
> > actions.  The latter makes things more amenable to those porting drivers
> > from other environments.
> 
> I really doubt this helps at all. Drivers typically use OS-specific
> APIs. It is very uncommon for them to touch standard registers,
> which is 100% of what your patch seems to be dealing with.
> 
> And again, how about a small userspace library that would wrap vfio and
> add the abstractions for drivers that do need them?

Yes, there will be userspace libraries - I already have a vfio backend for 
libpci.
> 
> > I realize that to date the VFIO driver has been a  bit of a mish-mash
> > between the ioctl and config based techniques; I intend to clean that
> > up.  And, yes, the abstract model presented by VFIO will need plenty of
> > documentation.
> 
> And, it will need to be maintained forever, bugs and all.
> For example, if you change some register you emulated
> to fix a bug, to the driver this looks like a hardware change,
> and it will crash.

The changes will come only to allow for a more-perfect emulation, so I doubt 
that will cause driver problems.  No different than discovering and fixing 
bugs in the ioctls needed in your scenario.

> 
> The PCI spec has some weak versioning support, but it
> is mostly not a problem in that space: a specific driver needs to
> only deal with a specific device.  We have a generic driver so PCI
> configuration space is a bad interface to use.

PCI has great versioning. Damn near every change made in 16+ years has been 
upwards compatible.  BIOS and OS writers don't have trouble with generic PCI, 
why should vfio?

> 
> > Since KVM/qemu already has its own notion of a virtual PCI device which
> > it presents to the guest OS, we either need to reconcile VFIO and qemu,
> > or provide a bypass of the VFIO virtual model.  This could be direct
> > access through sysfs, or else an ioctl to VFIO.  Since I have no
> > internals knowledge of qemu, I look to others to choose.
> 
> Ah, so there will be 2 APIs, one for qemu, one for userspace drivers?

I hope not,

Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-27 Thread Michael S. Tsirkin
On Tue, Jul 27, 2010 at 03:13:14PM -0700, Tom Lyon wrote:
> [ Sorry for the long hiatus, I've been wrapped up in other issues.]
> 
> I think the fundamental issue to resolve is to decide on the model which the 
> VFIO driver presents to its users.
> 
> Fundamentally, VFIO as part of the OS must protect the system from its users 
> and also protect the users from each other.  No disagreement here.
> 
> But another fundamental purpose of an OS is to present an abstract model of 
> the underlying hardware to its users, so that the users don't have to deal 
> with the full complexity of the hardware.
> 
> So I think VFIO should present a 'virtual', abstracted PCI device to its 
> users 
> whereas Michael has argued for a simpler model of presenting the real PCI
> device config registers while preventing writes only to the registers which 
> would clearly disrupt the system.

In fact, there is no contradiction. I am all for an abstracted
API *and* I think the virtualization concept is a bad way
to build this API.

The 'virtual' interface you present is very complex and hardware specific:
you do not hide literally *anything*. Deciding which functionality userspace
needs, and exposing it to userspace as a set of APIs would be abstract.
Instead you ask people to go read the PCI spec, the device spec, and bang
on PCI registers, little-endian-ness and all, then try to interpret
what the virtual values mean.

Example:

How do I find # of MSI-X vectors? Sure, scan the capability list,
find the capability, read the value, convert from little endian
at each step.
A page or two of code, and let's hope I have a ppc to test on.
And note no driver has this code - they all use OS routines.

So why wouldn't
ioctl(dev, VFIO_GET_MSIX_VECTORS, &n);
better serve the declared goal of presenting an abstracted PCI device to
users?
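
For comparison, the manual walk described above is roughly the following from
userspace (a sketch only; it assumes config space is readable with pread() on
a config-space file descriptor, e.g. the sysfs config file, and omits error
and status-register checks):

#include <endian.h>
#include <stdint.h>
#include <unistd.h>

#define PCI_CAPABILITY_LIST   0x34
#define PCI_CAP_ID_MSIX       0x11
#define PCI_MSIX_FLAGS        2
#define PCI_MSIX_FLAGS_QSIZE  0x07ff

static int get_msix_vectors(int config_fd)
{
    uint8_t pos = 0;
    uint16_t flags;

    /* follow the capability list pointer through config space */
    pread(config_fd, &pos, 1, PCI_CAPABILITY_LIST);
    while (pos) {
        uint8_t id, next;

        pread(config_fd, &id, 1, pos);
        pread(config_fd, &next, 1, pos + 1);
        if (id == PCI_CAP_ID_MSIX) {
            /* table size is little-endian in the message control word */
            pread(config_fd, &flags, 2, pos + PCI_MSIX_FLAGS);
            return (le16toh(flags) & PCI_MSIX_FLAGS_QSIZE) + 1;
        }
        pos = next & ~3;    /* low bits of the next pointer are reserved */
    }
    return 0;               /* no MSI-X capability found */
}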


> Now, the virtual model *could* look little like the real hardware, and use 
> bunches of ioctls for everything it needs,

Or reads/writes at special offsets, or sysfs attributes.

> or it could look a lot like PCI and 
> use reads and writes of the virtual PCI config registers to trigger its 
> actions.  The latter makes things more amenable to those porting drivers from 
> other environments.

I really doubt this helps at all. Drivers typically use OS-specific
APIs. It is very uncommon for them to touch standard registers,
which is 100% of what your patch seems to be dealing with.

And again, how about a small userspace library that would wrap vfio and
add the abstractions for drivers that do need them?

> I realize that to date the VFIO driver has been a  bit of a mish-mash between 
> the ioctl and config based techniques; I intend to clean that up.  And, yes, 
> the abstract model presented by VFIO will need plenty of documentation.

And, it will need to be maintained forever, bugs and all.
For example, if you change some register you emulated
to fix a bug, to the driver this looks like a hardware change,
and it will crash.

The PCI spec has some weak versioning support, but it
is mostly not a problem in that space: a specific driver needs to
only deal with a specific device.  We have a generic driver so PCI
configuration space is a bad interface to use.


> Since KVM/qemu already has its own notion of a virtual PCI device which it 
> presents to the guest OS, we either need to reconcile VFIO and qemu, or 
> provide a bypass of the VFIO virtual model.  This could be direct access 
> through sysfs, or else an ioctl to VFIO.  Since I have no internals knowledge 
> of qemu, I look to others to choose.

Ah, so there will be 2 APIs, one for qemu, one for userspace drivers?

> Other little things:
> 1. Yes, I can share some code with sysfs if I can get the right EXPORTs there.
> 2. I'll add multiple MSI support, but I wish to point out that even though 
> the 
> PCI MSI API supports it, none of the architectures do.
> 3. FLR needs work.  I was foolish enough to assume that FLR wouldn't reset 
> BARs; now I know better.

And as I said separately, drivers might reset BARs without FLR as well.
As long as io/memory is disabled, we really should allow userspace to
write anything in the BARs. And once we let it do that, most of the problem goes
away.
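
A sketch of the check being suggested (a hypothetical helper, not code from
the V3 patch): BAR writes from userspace are only let through while the
command register shows both I/O and memory decoding disabled.

static bool vfio_bar_write_allowed(struct pci_dev *pdev)
{
        u16 cmd;

        /* safe to let userspace store new BAR values only while the
         * device is not decoding them */
        pci_read_config_word(pdev, PCI_COMMAND, &cmd);
        return !(cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY));
}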

> 4. I'll get rid of the vfio config_map in sysfs; it was there for debugging.
> 5. I'm still looking to support hotplug/unplug and power management stuff via 
> generic netlink notifications.


Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-27 Thread Tom Lyon
[ Sorry for the long hiatus, I've been wrapped up in other issues.]

I think the fundamental issue to resolve is to decide on the model which the 
VFIO driver presents to its users.

Fundamentally, VFIO as part of the OS must protect the system from its users 
and also protect the users from each other.  No disagreement here.

But another fundamental purpose of an OS is to present an abstract model of 
the underlying hardware to its users, so that the users don't have to deal 
with the full complexity of the hardware.

So I think VFIO should present a 'virtual', abstracted PCI device to its users 
whereas Michael has argued for a simpler model of presenting the real PCI
device config registers while preventing writes only to the registers which 
would clearly disrupt the system.

Now, the virtual model *could* look little like the real hardware, and use 
bunches of ioctls for everything it needs, or it could look a lot like PCI and 
use reads and writes of the virtual PCI config registers to trigger its 
actions.  The latter makes things more amenable to those porting drivers from 
other environments.

I realize that to date the VFIO driver has been a  bit of a mish-mash between 
the ioctl and config based techniques; I intend to clean that up.  And, yes, 
the abstract model presented by VFIO will need plenty of documentation.

Since KVM/qemu already has its own notion of a virtual PCI device which it 
presents to the guest OS, we either need to reconcile VFIO and qemu, or 
provide a bypass of the VFIO virtual model.  This could be direct access 
through sysfs, or else an ioctl to VFIO.  Since I have no internals knowledge 
of qemu, I look to others to choose.

Other little things:
1. Yes, I can share some code with sysfs if I can get the right EXPORTs there.
2. I'll add multiple MSI support, but I wish to point out that even though the 
PCI MSI API supports it, none of the architectures do.
3. FLR needs work.  I was foolish enough to assume that FLR wouldn't reset 
BARs; now I know better.
4. I'll get rid of the vfio config_map in sysfs; it was there for debugging.
5. I'm still looking to support hotplug/unplug and power management stuff via 
generic netlink notifications.


Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-20 Thread Greg KH
On Sat, Jul 17, 2010 at 10:45:23AM +0200, Piotr Jaroszyński wrote:
> On 16 July 2010 23:58, Tom Lyon  wrote:
> > The VFIO "driver" is used to allow privileged AND non-privileged processes 
> > to
> > implement user-level device drivers for any well-behaved PCI, PCI-X, and 
> > PCIe
> > devices.
> 
> Thanks for working on that! I wonder whether it's possible to say what
> are the chances of it being merged to mainline and which version we
> might be talking about?

We still have a long way to go before you need to worry about what
kernel version it's going to show up in...

thanks,

greg k-h


Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-19 Thread Michael S. Tsirkin
On Sun, Jul 18, 2010 at 10:56:47PM -0600, Alex Williamson wrote:
> Hi Tom, Michael,
> 
> Comments for both of you below.  Tom, what does this build against?  Are
> we still on 2.6.34?
> 
> On Sun, 2010-07-18 at 12:39 +0300, Michael S. Tsirkin wrote:
> > Hi Tom,
> > > ---
> > > In this version:
> > > 
> > > There are lots of bug fixes and cleanups in this version, but the main
> > > change is to check to make sure that the IOMMU has interrupt remapping
> > > enabled, which is necessary to prevent user level code from triggering
> > > spurious interrupts for other devices.  Since most platforms today
> > > do not have the necessary hardware and/or software for this, a module
> > > option can override this check, thus making vfio useful (but not safe)
> > > on many more platforms.
> > > 
> > > In the next version I plan to add kernel to user messaging using the
> > > generic netlink mechanism to allow the user driver to react to hot add
> > > and remove, and power management requests.
> > > 
> > > Blurb from version 2:
> > > 
> > > This version now requires an IOMMU domain to be set before any access to
> > > device registers is granted (except that config space may be read).  In
> > > addition, the VFIO_DMA_MAP_ANYWHERE is dropped - it used the dma_map_sg 
> > > API
> > > which does not have sufficient controls around IOMMU usage.  The IOMMU 
> > > domain
> > > is obtained from the 'uiommu' driver which is included in this patch.
> > > 
> > > Various locking, security, and documentation issues have also been fixed.
> > > 
> > 
> > I think this is making nice progress, especially good to see
> > some effort to address the interrupt remapping issue.
> > I just realized we also have an issue with determining
> > the MSI capability and allocating entries. Some ideas
> > on addressing this posted below.
> > I also think it might make sense to involve the pci crowd here.
> > 
> > It might be nice to see a bit of documentation for the interface
> > presented to userspace. It is not always obvious.
> > For example, passing CONFIG as offset to
> > write or read will not always trigger access to config space.
> > 
> > More comments below.
> > 
> > 
> > On Fri, Jul 16, 2010 at 02:58:48PM -0700, Tom Lyon wrote:
> > > +static int vfio_msix_check(struct vfio_dev *vdev, u64 start, u32 len)
> > > +{
> > > + struct pci_dev *pdev = vdev->pdev;
> > > + u16 pos;
> > > + u32 table_offset;
> > > + u16 table_size;
> > > + u8 bir;
> > > + u32 lo, hi, startp, endp;
> > > +
> > > + pos = pci_find_capability(pdev, PCI_CAP_ID_MSIX);
> > > + if (!pos)
> > > + return 0;
> > > +
> > > + pci_read_config_word(pdev, pos + PCI_MSIX_FLAGS, &table_size);
> > > + table_size = (table_size & PCI_MSIX_FLAGS_QSIZE) + 1;
> > > + pci_read_config_dword(pdev, pos + 4, &table_offset);
> > > + bir = table_offset & PCI_MSIX_FLAGS_BIRMASK;
> > > + lo = table_offset >> PAGE_SHIFT;
> > > + hi = (table_offset + PCI_MSIX_ENTRY_SIZE * table_size + PAGE_SIZE - 1)
> > > + >> PAGE_SHIFT;
> > > + startp = start >> PAGE_SHIFT;
> > > + endp = (start + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > + if (bir == vfio_offset_to_pci_space(start) &&
> > > + overlap(lo, hi, startp, endp)) {
> > > + printk(KERN_WARNING "%s: cannot write msi-x vectors\n",
> > > + __func__);
> > > + return -EINVAL;
> > > + }
> > > + return 0;
> > > +}
> > 
> > 
> > You can't just check MSI capability to figure out whether MSI
> > will work: this also depends on the host system in many ways.
> > The only way to really know is to request vectors in host. So
> > what we really need is an API that will reserve MSI vectors,
> > but not assign them in the device until guest asks us to
> > do it.
> > 
> > > +
> > > + if (pci_resource_flags(pdev, pci_space) & IORESOURCE_MEM) {
> > > + if (allow_unsafe_intrs) {
> > > + /* don't allow writes to msi-x vectors */
> > > + ret = vfio_msix_check(vdev, *ppos, count);
> > 
> > I thought we had an agreement here: it's useless to protect
> > msix vector space and at the same time allow DMA from device,
> > since device can do DMA write to trigger MSI.
> > So why do you keep this code around?
> > 
> > 
> > > + case VFIO_EVENTFD_MSI:
> > > + if (copy_from_user(&fd, uarg, sizeof fd))
> > > + return -EFAULT;
> > > + mutex_lock(&vdev->igate);
> > > + if (fd >= 0 && vdev->ev_msi == NULL && vdev->ev_msix == NULL)
> > > + ret = vfio_enable_msi(vdev, fd);
> > > + else if (fd < 0 && vdev->ev_msi)
> > > + vfio_disable_msi(vdev);
> > > + else
> > > + ret = -EINVAL;
> > > + mutex_unlock(&vdev->igate);
> > > + break;
> > 
> > I think that for virt we'll need multivector support for MSI.
> > 
> > 
> > > diff --git a/drivers/vfio/vfio_pci_config.c 
> > > b/drivers/vfio/vfio_pci_config.c
> > > index e69de29..8bd5c00 100644
> > > --- a/drivers/vfio/vfio_pci_config.c
> > > +++

Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-18 Thread Alex Williamson
Hi Tom, Michael,

Comments for both of you below.  Tom, what does this build against?  Are
we still on 2.6.34?

On Sun, 2010-07-18 at 12:39 +0300, Michael S. Tsirkin wrote:
> Hi Tom,
> > ---
> > In this version:
> > 
> > There are lots of bug fixes and cleanups in this version, but the main
> > change is to check to make sure that the IOMMU has interrupt remapping
> > enabled, which is necessary to prevent user level code from triggering
> > spurious interrupts for other devices.  Since most platforms today
> > do not have the necessary hardware and/or software for this, a module
> > option can override this check, thus making vfio useful (but not safe)
> > on many more platforms.
> > 
> > In the next version I plan to add kernel to user messaging using the
> > generic netlink mechanism to allow the user driver to react to hot add
> > and remove, and power management requests.
> > 
> > Blurb from version 2:
> > 
> > This version now requires an IOMMU domain to be set before any access to
> > device registers is granted (except that config space may be read).  In
> > addition, the VFIO_DMA_MAP_ANYWHERE is dropped - it used the dma_map_sg API
> > which does not have sufficient controls around IOMMU usage.  The IOMMU 
> > domain
> > is obtained from the 'uiommu' driver which is included in this patch.
> > 
> > Various locking, security, and documentation issues have also been fixed.
> > 
> 
> I think this is making nice progress, especially good to see
> some effort to address the interrupt remapping issue.
> I just realized we also have an issue with determining
> the MSI capability and allocating entries. Some ideas
> on addressing this posted below.
> I also think it might make sense to involve the pci crowd here.
> 
> It might be nice to see a bit of documentation for the interface
> presented to userspace. It is not always obvious.
> For example, passing CONFIG as offset to
> write or read will not always trigger access to config space.
> 
> More comments below.
> 
> 
> On Fri, Jul 16, 2010 at 02:58:48PM -0700, Tom Lyon wrote:
> > +static int vfio_msix_check(struct vfio_dev *vdev, u64 start, u32 len)
> > +{
> > +   struct pci_dev *pdev = vdev->pdev;
> > +   u16 pos;
> > +   u32 table_offset;
> > +   u16 table_size;
> > +   u8 bir;
> > +   u32 lo, hi, startp, endp;
> > +
> > +   pos = pci_find_capability(pdev, PCI_CAP_ID_MSIX);
> > +   if (!pos)
> > +   return 0;
> > +
> > +   pci_read_config_word(pdev, pos + PCI_MSIX_FLAGS, &table_size);
> > +   table_size = (table_size & PCI_MSIX_FLAGS_QSIZE) + 1;
> > +   pci_read_config_dword(pdev, pos + 4, &table_offset);
> > +   bir = table_offset & PCI_MSIX_FLAGS_BIRMASK;
> > +   lo = table_offset >> PAGE_SHIFT;
> > +   hi = (table_offset + PCI_MSIX_ENTRY_SIZE * table_size + PAGE_SIZE - 1)
> > +   >> PAGE_SHIFT;
> > +   startp = start >> PAGE_SHIFT;
> > +   endp = (start + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > +   if (bir == vfio_offset_to_pci_space(start) &&
> > +   overlap(lo, hi, startp, endp)) {
> > +   printk(KERN_WARNING "%s: cannot write msi-x vectors\n",
> > +   __func__);
> > +   return -EINVAL;
> > +   }
> > +   return 0;
> > +}
> 
> 
> You can't just check MSI capability to figure out whether MSI
> will work: this also depends on the host system in many ways.
> The only way to really know is to request vectors in host. So
> what we really need is an API that will reserve MSI vectors,
> but not assign them in the device until guest asks us to
> do it.
> 
> > +
> > +   if (pci_resource_flags(pdev, pci_space) & IORESOURCE_MEM) {
> > +   if (allow_unsafe_intrs) {
> > +   /* don't allow writes to msi-x vectors */
> > +   ret = vfio_msix_check(vdev, *ppos, count);
> 
> I thought we had an agreement here: it's useless to protect
> msix vector space and at the same time allow DMA from device,
> since device can do DMA write to trigger MSI.
> So why do you keep this code around?
> 
> 
> > +   case VFIO_EVENTFD_MSI:
> > +   if (copy_from_user(&fd, uarg, sizeof fd))
> > +   return -EFAULT;
> > +   mutex_lock(&vdev->igate);
> > +   if (fd >= 0 && vdev->ev_msi == NULL && vdev->ev_msix == NULL)
> > +   ret = vfio_enable_msi(vdev, fd);
> > +   else if (fd < 0 && vdev->ev_msi)
> > +   vfio_disable_msi(vdev);
> > +   else
> > +   ret = -EINVAL;
> > +   mutex_unlock(&vdev->igate);
> > +   break;
> 
> I think that for virt we'll need multivector support for MSI.
> 
> 
> > diff --git a/drivers/vfio/vfio_pci_config.c b/drivers/vfio/vfio_pci_config.c
> > index e69de29..8bd5c00 100644
> > --- a/drivers/vfio/vfio_pci_config.c
> > +++ b/drivers/vfio/vfio_pci_config.c
> > @@ -0,0 +1,605 @@
> > +/*
> > + * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
> > + * Author: Tom Lyon, p...@cisco.com
> 
> Any hints on what this file does?
> 
> > + *
> > + * This progra

Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-18 Thread Michael S. Tsirkin
Hi Tom,
> ---
> In this version:
> 
> There are lots of bug fixes and cleanups in this version, but the main
> change is to check to make sure that the IOMMU has interrupt remapping
> enabled, which is necessary to prevent user level code from triggering
> spurious interrupts for other devices.  Since most platforms today
> do not have the necessary hardware and/or software for this, a module
> option can override this check, thus making vfio useful (but not safe)
> on many more platforms.
> 
> In the next version I plan to add kernel to user messaging using the
> generic netlink mechanism to allow the user driver to react to hot add
> and remove, and power management requests.
> 
> Blurb from version 2:
> 
> This version now requires an IOMMU domain to be set before any access to
> device registers is granted (except that config space may be read).  In
> addition, the VFIO_DMA_MAP_ANYWHERE is dropped - it used the dma_map_sg API
> which does not have sufficient controls around IOMMU usage.  The IOMMU domain
> is obtained from the 'uiommu' driver which is included in this patch.
> 
> Various locking, security, and documentation issues have also been fixed.
> 

I think this is making nice progress, especially good to see
some effort to address the interrupt remapping issue.
I just realized we also have an issue with determining
the MSI capability and allocating entries. Some ideas
on addressing this posted below.
I also think it might make sense to involve the pci crowd here.

It might be nice to see a bit of documentation for the interface
presented to userspace. It is not always obvious.
For example, passing CONFIG as offset to
write or read will not always trigger access to config space.

More comments below.


On Fri, Jul 16, 2010 at 02:58:48PM -0700, Tom Lyon wrote:
> +static int vfio_msix_check(struct vfio_dev *vdev, u64 start, u32 len)
> +{
> + struct pci_dev *pdev = vdev->pdev;
> + u16 pos;
> + u32 table_offset;
> + u16 table_size;
> + u8 bir;
> + u32 lo, hi, startp, endp;
> +
> + pos = pci_find_capability(pdev, PCI_CAP_ID_MSIX);
> + if (!pos)
> + return 0;
> +
> + pci_read_config_word(pdev, pos + PCI_MSIX_FLAGS, &table_size);
> + table_size = (table_size & PCI_MSIX_FLAGS_QSIZE) + 1;
> + pci_read_config_dword(pdev, pos + 4, &table_offset);
> + bir = table_offset & PCI_MSIX_FLAGS_BIRMASK;
> + lo = table_offset >> PAGE_SHIFT;
> + hi = (table_offset + PCI_MSIX_ENTRY_SIZE * table_size + PAGE_SIZE - 1)
> + >> PAGE_SHIFT;
> + startp = start >> PAGE_SHIFT;
> + endp = (start + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
> + if (bir == vfio_offset_to_pci_space(start) &&
> + overlap(lo, hi, startp, endp)) {
> + printk(KERN_WARNING "%s: cannot write msi-x vectors\n",
> + __func__);
> + return -EINVAL;
> + }
> + return 0;
> +}


You can't just check MSI capability to figure out whether MSI
will work: this also depends on the host system in many ways.
The only way to really know is to request vectors in the host. So
what we really need is an API that will reserve MSI vectors,
but not assign them in the device until the guest asks us to
do it.
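
Something along these lines, purely as a sketch (none of the names or
structures below exist in the V3 patch; they only illustrate the
reserve-then-enable split):

struct vfio_msix_reserve {
    __u32 count;    /* in: vectors wanted; out: vectors actually reserved */
};

/* Step 1, at assignment time: reserve host vectors up front, so a shortage
 * is reported to userspace before the guest ever touches the device. */
ret = ioctl(vfio_fd, VFIO_MSIX_RESERVE, &reserve);

/* Step 2, only when the guest sets the MSI-X enable bit: program the
 * already-reserved vectors into the device and hand over the eventfds. */
ret = ioctl(vfio_fd, VFIO_MSIX_ENABLE, eventfds);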

> +
> + if (pci_resource_flags(pdev, pci_space) & IORESOURCE_MEM) {
> + if (allow_unsafe_intrs) {
> + /* don't allow writes to msi-x vectors */
> + ret = vfio_msix_check(vdev, *ppos, count);

I thought we had an agreement here: it's useless to protect
msix vector space and at the same time allow DMA from device,
since device can do DMA write to trigger MSI.
So why do you keep this code around?


> + case VFIO_EVENTFD_MSI:
> + if (copy_from_user(&fd, uarg, sizeof fd))
> + return -EFAULT;
> + mutex_lock(&vdev->igate);
> + if (fd >= 0 && vdev->ev_msi == NULL && vdev->ev_msix == NULL)
> + ret = vfio_enable_msi(vdev, fd);
> + else if (fd < 0 && vdev->ev_msi)
> + vfio_disable_msi(vdev);
> + else
> + ret = -EINVAL;
> + mutex_unlock(&vdev->igate);
> + break;

I think that for virt we'll need multivector support for MSI.


> diff --git a/drivers/vfio/vfio_pci_config.c b/drivers/vfio/vfio_pci_config.c
> index e69de29..8bd5c00 100644
> --- a/drivers/vfio/vfio_pci_config.c
> +++ b/drivers/vfio/vfio_pci_config.c
> @@ -0,0 +1,605 @@
> +/*
> + * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
> + * Author: Tom Lyon, p...@cisco.com

Any hints on what this file does?

> + *
> + * This program is free software; you may redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; version 2 of the License.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES O

Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-17 Thread Piotr Jaroszyński
On 16 July 2010 23:58, Tom Lyon  wrote:
> The VFIO "driver" is used to allow privileged AND non-privileged processes to
> implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe
> devices.

Thanks for working on that! I wonder whether it's possible to say what
are the chances of it being merged to mainline and which version we
might be talking about?

-- 
Best Regards
Piotr Jaroszyński