Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread Martin Polednik

On 22/06/17 14:05 +0200, Erik Skultety wrote:

On Thu, Jun 22, 2017 at 10:41:13AM +0200, Martin Polednik wrote:

On 16/06/17 18:14 +0100, Daniel P. Berrange wrote:
> On Fri, Jun 16, 2017 at 06:11:17PM +0100, Daniel P. Berrange wrote:
> > On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote:
> > > On Fri, 16 Jun 2017 11:32:04 -0400
> > > Laine Stump <la...@redhat.com> wrote:
> > >
> > > > On 06/15/2017 02:42 PM, Alex Williamson wrote:
> > > > > On Thu, 15 Jun 2017 09:33:01 +0100
> > > > > "Daniel P. Berrange" <berra...@redhat.com> wrote:
> > > > >
> > > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
> > > > >>> Hi all,
> > > > >>>
> > > > >>> so there's been an off-list discussion about finally implementing 
creation of
> > > > >>> mediated devices with libvirt and it's more than desired to get as 
many opinions
> > > > >>> on that as possible, so please do share your ideas. This did come 
up already as
> > > > >>> part of some older threads ([1] for example), so this will be a 
respin of the
> > > > >>> discussions. Long story short, we decided to put device creation 
off and focus
> > > > >>> on the introduction of the framework as such first and build upon 
that later,
> > > > >>> i.e. now.
> > > > >>>
> > > > >>> [1] 
https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> > > > >>>
> > > > >>> 
> > > > >>> PART 1: NODEDEV-DRIVER
> > > > >>> 
> > > > >>>
> > > > >>> API-wise, device creation through the nodedev driver should be 
pretty
> > > > >>> straightforward and without any issues, since virNodeDevCreateXML 
takes an XML
> > > > >>> and does support flags. Looking at the current device XML:
> > > > >>>
> > > > >>> 
> > > > >>>   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
> > > > >>>   
/sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
> > > > >>>   pci__03_00_0
> > > > >>>   
> > > > >>> vfio_mdev
> > > > >>>   
> > > > >>>   
> > > > >>> 
> > > > >>> 
> > > > >>> UUID 
> > > > >>>   
> > > > >>> 
> > > > >>>
> > > > >>> We can ignore ,, elements, since these 
are useless
> > > > >>> during creation. We also cannot use  since we don't support 
arbitrary
> > > > >>> names and we also can't rely on users providing a name in correct 
form which we
> > > > >>> would need to further parse in order to get the UUID.
> > > > >>> So since the only thing missing to successfully use create an mdev 
using XML is
> > > > >>> the UUID (if user doesn't want it to be generated automatically), 
how about
> > > > >>> having a  subelement under  just like PCIs have 
 and
> > > > >>> friends, USBs have  & , interfaces have  to 
uniquely
> > > > >>> identify the device even if the name itself is unique.
> > > > >>> Removal of a device should work as well, although we might want to
> > > > >>> consider creating a *Flags version of the API.
> > > > >>>
> > > > >>> =
> > > > >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> > > > >>> =
> > > > >>>
> > > > >>> There were some doubts about auto-creation mentioned in [1], 
although they
> > > > >>> weren't specified further. So hopefully, we'll get further in the 
discussion
> > > > >>> this time.
> > > > >>>
> > > > >>> From my perspective there are two main reasons/benefits to that:
> > > > >>>
> > > > >>> 1) Convenience
> > > > >>> For apps like virt-manager, user will want to add a host device 
transparently,
> > > > >

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread Martin Polednik

On 16/06/17 18:14 +0100, Daniel P. Berrange wrote:

On Fri, Jun 16, 2017 at 06:11:17PM +0100, Daniel P. Berrange wrote:

On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote:
> On Fri, 16 Jun 2017 11:32:04 -0400
> Laine Stump  wrote:
>
> > On 06/15/2017 02:42 PM, Alex Williamson wrote:
> > > On Thu, 15 Jun 2017 09:33:01 +0100
> > > "Daniel P. Berrange"  wrote:
> > >
> > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
> > >>> Hi all,
> > >>>
> > >>> so there's been an off-list discussion about finally implementing 
creation of
> > >>> mediated devices with libvirt and it's more than desired to get as many 
opinions
> > >>> on that as possible, so please do share your ideas. This did come up 
already as
> > >>> part of some older threads ([1] for example), so this will be a respin 
of the
> > >>> discussions. Long story short, we decided to put device creation off 
and focus
> > >>> on the introduction of the framework as such first and build upon that 
later,
> > >>> i.e. now.
> > >>>
> > >>> [1] 
https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> > >>>
> > >>> 
> > >>> PART 1: NODEDEV-DRIVER
> > >>> 
> > >>>
> > >>> API-wise, device creation through the nodedev driver should be pretty
> > >>> straightforward and without any issues, since virNodeDevCreateXML takes 
an XML
> > >>> and does support flags. Looking at the current device XML:
> > >>>
> > >>> 
> > >>>   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
> > >>>   
/sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
> > >>>   pci__03_00_0
> > >>>   
> > >>> vfio_mdev
> > >>>   
> > >>>   
> > >>> 
> > >>> 
> > >>> UUID 
> > >>>   
> > >>> 
> > >>>
> > >>> We can ignore ,, elements, since these are 
useless
> > >>> during creation. We also cannot use  since we don't support 
arbitrary
> > >>> names and we also can't rely on users providing a name in correct form 
which we
> > >>> would need to further parse in order to get the UUID.
> > >>> So since the only thing missing to successfully use create an mdev 
using XML is
> > >>> the UUID (if user doesn't want it to be generated automatically), how 
about
> > >>> having a  subelement under  just like PCIs have 
 and
> > >>> friends, USBs have  & , interfaces have  to 
uniquely
> > >>> identify the device even if the name itself is unique.
> > >>> Removal of a device should work as well, although we might want to
> > >>> consider creating a *Flags version of the API.
> > >>>
> > >>> =
> > >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> > >>> =
> > >>>
> > >>> There were some doubts about auto-creation mentioned in [1], although 
they
> > >>> weren't specified further. So hopefully, we'll get further in the 
discussion
> > >>> this time.
> > >>>
> > >>> From my perspective there are two main reasons/benefits to that:
> > >>>
> > >>> 1) Convenience
> > >>> For apps like virt-manager, user will want to add a host device 
transparently,
> > >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". Even 
for
> > >>> higher management apps, like oVirt, even they might not care about the 
parent
> > >>> device at all times and considering that they would need to enumerate 
the
> > >>> parents, pick one, create the device XML and pass it to the nodedev 
driver, IMHO
> > >>> it would actually be easier and faster to just do it directly 
through sysfs,
> > >>> bypassing libvirt once again
> > >>
> > >> The convenience only works if the policy we've provided in libvirt 
actually
> > >> matches the policy the application wants. I think it is quite likely 
that with
> > >> cloud the mdevs will be created out of band from the domain startup 
process.
> > >> It is possible the app will just have a fixed set of mdevs pre-created 
when
> > >> the host starts up. Or that the mgmt app wants the domain startup 
process to
> > >> be a two phase setup, where it first allocates the resources needed, and 
later
> > >> then tries to start the guest. This is why I keep saying that putting 
this kind
> > >> of "convenient" policy in libvirt is a bad idea - it is essentially just 
putting
> > >> a bit of virt-manager code into libvirt - more advanced apps will need 
more
> > >> flexibility in this area.
> > >>
> > >>> 2) Future domain migration
> > >>> Suppose now that the mdev backing physical devices support state dump 
and
> > >>> reload. Chances are, that the corresponding mdev doesn't even exist or 
has a
> > >>> different UUID on the destination, so libvirt would do its best to 
handle this
> > >>> before the domain could be resumed.
> > >>
> > >> This is not an unusual scenario - there are already many other parts of 
the
> > >> device backend config that need to 

Re: [libvirt] Add mdev reporting capability to the nodedev driver

2017-04-18 Thread Martin Polednik

On 12/04/17 16:19 -0600, Alex Williamson wrote:

On Wed, 5 Apr 2017 09:21:17 +0100
"Daniel P. Berrange"  wrote:


On Wed, Apr 05, 2017 at 08:12:31AM +0200, Erik Skultety wrote:
> On Tue, Apr 04, 2017 at 04:23:18PM +0100, Daniel P. Berrange wrote:
> > On Wed, Mar 29, 2017 at 02:51:10PM +0200, Erik Skultety wrote:
> > > This series enables the node device driver to report information about the
> > > existing mediated devices on the host. There is no device creation 
involved
> > > yet. The information reported by the node device driver is split into two
> > > parts, one that is reported within the physical parent's capabilities
> > > (the generic stuff that comes from the mdev types' sysfs attributes, note 
the
> > >  'description' attribute which is verbatim - raw,unstructured string) and 
the
> > > other that is reported within the mdev child device and merely contains 
the
> > > mdev type id, which the device was instantiated from, and the iommu group
> > > number.
> > >
> > > Basically, the format of the XML I went for is as follows:
> > >
> > > PCI parent:
> > > 
> > >   pci__06_00_0
> > >   /sys/devices/.../:06:00.0
> > >   pci__05_08_0
> > >   ...
> > >   
> > > ...
> > > 
> > >   
> > > GRID M60-0B
> > > num_heads=2, frl_config=45, framebuffer=512M, 
max_resolution=2560x1600, max_instance=16
> >
> > This 'description' field is pretty horrific.
> >
> > We were quite clear that we were not going to expose arbitrary attributes
> > in the XML without modelling them explicitly as XML elements. Using the
> > description field in this way is just doing arbitrary attribute passthrough
> > via the backdoor - it is clear that applications are doing to end up parsing
> > this 'description' data and will them complain if we later change it.
> >
>
> I remember us stating that, but this is the case with NVIDIA (who at least
> reports some useful information in it) and Intel - what if other vendor comes
> and creates a valid, human readable description of a device without specifying
> any attributes like in the case above? Just because some vendors "abused" the
> attribute doesn't mean we should stop reporting it completely. IMHO the name
> "description" should mean that it's something raw, possibly unstructured, and
> thus the applications should treat it that way. Now, this is my bad and I need
> to revisit the last patch with documentation and add a paragraph for the
>  element as an optional element with raw data.
>
> Until all the attributes are properly unified/standardized under sysfs ABI and
> respected by vendors, I think we should report everything we're able to gather
> about a device, description included. If properly documented, nobody can
> complain about us breaking anything, since how do you guarantee format-less 
raw
> string anyway? After all, applications will end up parsing it anyway, be it 
from
> our XML or not.

I understand your point, but I'm still not in favour of exposing this info
because it is a clear cut attempt to do arbitrary attribute passthrough to
libvirt.

All the attributes show there can be determined by a simple lookup based on
the name field "GRID M60-0B". So if apps want to know the number of heads,
framebuffer size, etc, etc I think they should just maintain a lookup map
based on name in their own code, until we explicitly model this stuff in
the XML.

Once we model the attributes in the XML, this description adds no useful
info that we wouldn't already be reported, and before we model it in the
XML, this is just clearly an abuse of our design statement that we are not
doing generic attribute passthrough.


I told Erik I'd jump in here, but I think I might agree with Dan.  On
the kernel side, the description attribute was an attempt to pull
ourselves out of a rat hole of trying to figure out how to classify
devices and then figure out what attributes for each class might be
common across vendors.  Clearly it'd be great to somehow know that a
device is a GPU and has a resolution and graphics memory attribute, but
getting there is hard.  A free-form description field is sort of a
catch-all where the vendor can provide a stop-gap and it's useful for
human consumption.  It would be inadvisable to parse it with machine
code though.  Including it in the XML sort of invites that possibility.

A given mdev type is supposed to be stable.  It should always point to
a software equivalent device, including capabilities and resources.  It
should therefore be valid for upper level software to use the type to
lookup from a human generated table the properties for that device.  Of



From an upper level software's perspective, this is super inefficient

unless database like pci IDs exist. Each management software
maintaining their possibly outdated lookup table may lead to
inconsistence in the user experience and data.

If such database existed, there is no reason for libvirt not to use it
and report the data in some sane 

Re: [libvirt] docs: Document the new hostdev and address type 'mdev'

2017-02-27 Thread Martin Polednik

On 20/02/17 15:28 +0100, Erik Skultety wrote:

Signed-off-by: Erik Skultety 
---
docs/formatdomain.html.in | 48 +++
1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index b69bd4c..13cb767 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -3277,8 +3277,20 @@
attributes: iobase and irq.
Since 1.2.1
  
+  mdev
+  Mediated devices' addresses have so far only one mandatory attribute
+uuid (since 3.1.0) which
+uniquely identifies a mediated device under the syfs file system.
+  


+
+Note: Due to nature of mediated devices, being only software devices
+defining an allocation of resources on the physical parent device, the
+address type mdev is supposed to be used to identify a
+device on the host only, rather than identifying it in the guest.
+
+
Controllers


@@ -3774,6 +3786,19 @@
  /devices
  ...

+or:
+
+
+  ...
+  devices
+hostdev mode='subsystem' type='mdev' model='vfio-pci'
+source
+  address type='mdev' uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'
+/source
+/hostdev
+  /devices
+  ...


Can't really test it yet, but from the docs/code seems to be OK for
oVirt.


+

  hostdev
  The hostdev element is the main container for describing
@@ -3818,12 +3843,23 @@
type passes all LUNs presented by a single HBA to
the guest.
  
+  mdev
+  For mediated devices (Since 3.1.0)
+  the model attribute specifies the device API which
+  determines how the host's vfio driver will expose the device to the
+  guest. Currently, only vfio-pci model is supported.
+  The model also has implications on the guest's address type, i.e.
+  for vfio-pci device API any address type other than PCI
+  will result in an error.
+  


-  Note: The managed attribute is only used with PCI 
devices
-  and is ignored by all the other device types, thus setting
-  managed explicitly with other than PCI device has the 
same
-  effect as omitting it.
+  Note: The managed attribute is only used with PCI and is
+  ignored by all the other device types, thus setting
+  managed explicitly with other than a PCI device has the
+  same effect as omitting it. Similarly, model attribute 
is
+  only supported by mediated devices and ignored by all other device
+  types.

  
  source
@@ -3888,6 +3924,10 @@
is the vhost_scsi wwpn (16 hexadecimal digits with a prefix of
"naa.") established in the host configfs.
  
+  mdev
+  Mediated devices (Since 3.1.0) are
+  described by the address element.
+  

  
  vendor, product
--
2.10.2

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Introduce vGPU mdev framework to libvirt

2017-02-15 Thread Martin Polednik

On 14/02/17 09:58 -0700, Alex Williamson wrote:

On Tue, 14 Feb 2017 16:50:14 +0100
Martin Polednik <mpoled...@redhat.com> wrote:


On 07/02/17 12:29 -0700, Alex Williamson wrote:
>On Tue, 7 Feb 2017 17:26:51 +0100
>Erik Skultety <eskul...@redhat.com> wrote:
>
>> On Mon, Feb 06, 2017 at 09:33:14AM -0700, Alex Williamson wrote:
>> > On Mon,  6 Feb 2017 13:19:42 +0100
>> > Erik Skultety <eskul...@redhat.com> wrote:
>> >
>> > > Finally. It's here. This is the initial suggestion on how libvirt might
>> > > interract with the mdev framework, currently only focussing on the 
non-managed
>> > > devices, i.e. those pre-created by the user, since that will be 
revisited once
>> > > we all settled on how the XML should look like, given we might not want 
to use
>> > > the sysfs path directly as an attribute in the domain XML. My proposal 
on the
>> > > XML is the following:
>> > >
>> > > 
>> > > 
>> > > 
>> > > 
>> > > vGPU_UUID
>> > > 
>> > > 
>> > > 
>> > >
>> > > So the mediated device is identified by the physical parent device 
visible on
>> > > the host and a UUID which allows us to construct the sysfs path by 
ourselves,
>> > > which we then put on the QEMU's command line.
>> >
>> > Based on your test code, I think you're creating something like this:
>> >
>> > -device 
vfio-pci,sysfsdev=/sys/class/mdev_bus/:00:03.0/53764d0e-85a0-42b4-af5c-2046b460b1dc
>> >
>> > That would explain the need for the parent device address, but that's
>> > an entirely self inflicted requirement.  For a managed="no" scenarios,
>> > we shouldn't need the parent, we can get to the mdev device
>> > via /sys/bus/mdev/devices/53764d0e-85a0-42b4-af5c-2046b460b1dc.  So it
>>
>> True, for managed="no" would this path be a nice optimization.
>>
>> > seems that the UUID should be the only required source element for
>> > managed="no".
>> >
>> > For managed="yes", it seems like the parent device is still an optional
>>
>> The reason I went with the parent address element (and purposely neglecting 
the
>> sample mtty driver) was that I assumed any modern mdev capable HW would be
>> accessible through the PCI bus on the host. Also I wanted to explicitly hint
>> libvirt as much as possible which parent device a vGPU device instance should
>> be created on in case there are more than one of them, rather then scanning
>> sysfs for a suitable parent which actually supports the given vGPU type.
>>
>> > field.  The most important thing that libvirt needs to know when
>> > creating a mdev device for a VM is the mdev type name.  The parent
>> > device should be an optional field to help higher level management
>> > tools deal with placement of the device for locality or load balancing.
>> > Also, we can't assume that the parent device is a PCI device, the
>> > sample mtty driver already breaks this assumption.
>>
>> Since we need to assume non-PCI devices and we still need to enable 
management
>> to hint libvirt about the parent to utilize load balancing and stuff, I've 
come
>> up with the following adjustments/ideas on how to reflect that in the XML:
>> - still use the address element but use it with the 'type' attribute [1] 
(still
>>   breaks the sample mtty driver though) while making the element truly 
optional
>>   if I'm going to be outvoted in favor of scanning the directory for a 
suitable
>>   parent device on our own, rather than requiring the user to provide that
>>
>> - providing either an attribute or a standalone element for the parent device
>>   name, like a string version of the PCI address or whatever form the parent
>>   device comes in (doesn't break the mtty driver but I don't quite like this)
>>
>> - providing a path element/attribute to sysfs pointing to the parent device
>>   which I'm afraid is what Daniel is not in favor of libvirt doing
>>
>> So, this is what I've so far come up with in terms of hinting libvirt about 
the
>> parent device, do you have any input on this, maybe some more ideas on how we
>> should identify the parent device?
>
>IMO, if we cannot account for the mtty sample driver, we're doing it
>wrong.  I suppose we can leave it unspecified how one selects a parent
>device for the mtty driver, but it should be possible to expand the
>syntax to include it.  So I think that means that wh

Re: [libvirt] Introduce vGPU mdev framework to libvirt

2017-02-14 Thread Martin Polednik

On 07/02/17 12:29 -0700, Alex Williamson wrote:

On Tue, 7 Feb 2017 17:26:51 +0100
Erik Skultety  wrote:


On Mon, Feb 06, 2017 at 09:33:14AM -0700, Alex Williamson wrote:
> On Mon,  6 Feb 2017 13:19:42 +0100
> Erik Skultety  wrote:
>
> > Finally. It's here. This is the initial suggestion on how libvirt might
> > interract with the mdev framework, currently only focussing on the 
non-managed
> > devices, i.e. those pre-created by the user, since that will be revisited 
once
> > we all settled on how the XML should look like, given we might not want to 
use
> > the sysfs path directly as an attribute in the domain XML. My proposal on 
the
> > XML is the following:
> >
> > 
> > 
> > 
> > 
> > vGPU_UUID
> > 
> > 
> > 
> >
> > So the mediated device is identified by the physical parent device visible 
on
> > the host and a UUID which allows us to construct the sysfs path by 
ourselves,
> > which we then put on the QEMU's command line.
>
> Based on your test code, I think you're creating something like this:
>
> -device 
vfio-pci,sysfsdev=/sys/class/mdev_bus/:00:03.0/53764d0e-85a0-42b4-af5c-2046b460b1dc
>
> That would explain the need for the parent device address, but that's
> an entirely self inflicted requirement.  For a managed="no" scenarios,
> we shouldn't need the parent, we can get to the mdev device
> via /sys/bus/mdev/devices/53764d0e-85a0-42b4-af5c-2046b460b1dc.  So it

True, for managed="no" would this path be a nice optimization.

> seems that the UUID should be the only required source element for
> managed="no".
>
> For managed="yes", it seems like the parent device is still an optional

The reason I went with the parent address element (and purposely neglecting the
sample mtty driver) was that I assumed any modern mdev capable HW would be
accessible through the PCI bus on the host. Also I wanted to explicitly hint
libvirt as much as possible which parent device a vGPU device instance should
be created on in case there are more than one of them, rather then scanning
sysfs for a suitable parent which actually supports the given vGPU type.

> field.  The most important thing that libvirt needs to know when
> creating a mdev device for a VM is the mdev type name.  The parent
> device should be an optional field to help higher level management
> tools deal with placement of the device for locality or load balancing.
> Also, we can't assume that the parent device is a PCI device, the
> sample mtty driver already breaks this assumption.

Since we need to assume non-PCI devices and we still need to enable management
to hint libvirt about the parent to utilize load balancing and stuff, I've come
up with the following adjustments/ideas on how to reflect that in the XML:
- still use the address element but use it with the 'type' attribute [1] (still
  breaks the sample mtty driver though) while making the element truly optional
  if I'm going to be outvoted in favor of scanning the directory for a suitable
  parent device on our own, rather than requiring the user to provide that

- providing either an attribute or a standalone element for the parent device
  name, like a string version of the PCI address or whatever form the parent
  device comes in (doesn't break the mtty driver but I don't quite like this)

- providing a path element/attribute to sysfs pointing to the parent device
  which I'm afraid is what Daniel is not in favor of libvirt doing

So, this is what I've so far come up with in terms of hinting libvirt about the
parent device, do you have any input on this, maybe some more ideas on how we
should identify the parent device?


IMO, if we cannot account for the mtty sample driver, we're doing it
wrong.  I suppose we can leave it unspecified how one selects a parent
device for the mtty driver, but it should be possible to expand the
syntax to include it.  So I think that means that when the parent
address is provided, the parent address type needs to be specified as
PCI.  So...



This needs to encompass the device API or else the optional VM address
cannot be resolved.  Perhaps model='vfio-pci' here?  Seems similar to
how we specify the device type for PCI controllers where we have
multiple options:



  

For managed='no', I don't see that anything other than the mdev UUID is
useful.

MDEV_UUID

If libvirt gets into the business of creating mdev devices and we call
that managed='yes', then the mdev type to create is required.  I don't
know whether there's anything similar we can steal syntax from:

"nvidia-11"

That's pretty horrible, needs some xml guru love.

We need to provide for specifying a parent, but we can't assume the



From higher level perspective, I believe it would be "good

enough" for most of the cases to only specify the type. Libvirt will
anyway have to be able to enumerate the devices for listAllDevices
afaik.

My wish would be specifying

   nvidia-11

unless the user has 

[libvirt] [PATCH v2 3/5] virDomainFormatSchedDef: factor out subset code

2016-11-21 Thread Martin Polednik
The code within the function is too specific for priority attribute of
RT schedulers. To allow addition of schedulers that group by different
properties, we factor out the logic to calculate cpu subset. Instead
of comparing by priority, the new code accepts comparator for the 2
sched structs.
---
 src/conf/domain_conf.c | 66 --
 1 file changed, 48 insertions(+), 18 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 2dd5e6f..fe5a754 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -23065,6 +23065,44 @@ virDomainDefHasCapabilitiesFeatures(virDomainDefPtr 
def)
 return false;
 }
 
+static bool
+virDomainSchedPriorityComparator(virDomainThreadSchedParamPtr baseSched,
+ virDomainThreadSchedParamPtr sched)
+{
+return (baseSched->priority == sched->priority);
+}
+
+static virDomainThreadSchedParamPtr
+virDomainSchedSubsetCharacteristic(virDomainDefPtr def,
+   virBitmapPtr schedMap,
+   virBitmapPtr prioMap,
+   virDomainThreadSchedParamPtr 
(*func)(virDomainDefPtr, unsigned int),
+   bool 
(*comparator)(virDomainThreadSchedParamPtr,
+  
virDomainThreadSchedParamPtr))
+{
+ssize_t nextprio;
+virDomainThreadSchedParamPtr sched;
+virDomainThreadSchedParamPtr baseSched = NULL;
+
+virBitmapClearAll(prioMap);
+
+/* we need to find a subset of vCPUs with the given scheduler
+* that share the priority */
+nextprio = virBitmapNextSetBit(schedMap, -1);
+if (!(sched = func(def, nextprio)))
+return NULL;
+
+baseSched = sched;
+ignore_value(virBitmapSetBit(prioMap, nextprio));
+
+while ((nextprio = virBitmapNextSetBit(schedMap, nextprio)) > -1) {
+sched = func(def, nextprio);
+if (sched && comparator(baseSched, sched))
+ignore_value(virBitmapSetBit(prioMap, nextprio));
+}
+
+return baseSched;
+}
 
 /**
  * virDomainFormatSchedDef:
@@ -23091,6 +23129,7 @@ virDomainFormatSchedDef(virDomainDefPtr def,
 virBitmapPtr schedMap = NULL;
 virBitmapPtr prioMap = NULL;
 virDomainThreadSchedParamPtr sched;
+virDomainThreadSchedParamPtr baseSched;
 char *tmp = NULL;
 ssize_t next;
 size_t i;
@@ -23124,9 +23163,8 @@ virDomainFormatSchedDef(virDomainDefPtr def,
  * have them */
 while (!virBitmapIsAllClear(schedMap)) {
 virBitmapPtr currentMap = NULL;
-ssize_t nextprio;
 bool hasPriority = false;
-int priority = 0;
+baseSched = NULL;
 
 switch ((virProcessSchedPolicy) i) {
 case VIR_PROC_POLICY_NONE:
@@ -23139,24 +23177,16 @@ virDomainFormatSchedDef(virDomainDefPtr def,
 
 case VIR_PROC_POLICY_FIFO:
 case VIR_PROC_POLICY_RR:
-virBitmapClearAll(prioMap);
 hasPriority = true;
 
-/* we need to find a subset of vCPUs with the given scheduler
- * that share the priority */
-nextprio = virBitmapNextSetBit(schedMap, -1);
-if (!(sched = func(def, nextprio)))
+baseSched = virDomainSchedSubsetCharacteristic(def,
+   schedMap,
+   prioMap,
+   func,
+   
virDomainSchedPriorityComparator);
+if (baseSched == NULL)
 goto cleanup;
 
-priority = sched->priority;
-ignore_value(virBitmapSetBit(prioMap, nextprio));
-
-while ((nextprio = virBitmapNextSetBit(schedMap, nextprio)) > 
-1) {
-sched = func(def, nextprio);
-if (sched && sched->priority == priority)
-ignore_value(virBitmapSetBit(prioMap, nextprio));
-}
-
 currentMap = prioMap;
 break;
 }
@@ -23171,8 +23201,8 @@ virDomainFormatSchedDef(virDomainDefPtr def,
   virProcessSchedPolicyTypeToString(i));
 VIR_FREE(tmp);
 
-if (hasPriority)
-virBufferAsprintf(buf, " priority='%d'", priority);
+if (hasPriority && baseSched != NULL)
+virBufferAsprintf(buf, " priority='%d'", baseSched->priority);
 
 virBufferAddLit(buf, "/>\n");
 
-- 
2.8.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH v2 4/5] virDomainFormatSchedDef: rename prioMap and nextprio

2016-11-21 Thread Martin Polednik
Since the code is now generalized beyond just vcpusched priority, it's
better to rename the variables that indicated direct relationship with
priority: prioMap and nextprio.
---
 src/conf/domain_conf.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index fe5a754..9ec23be 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -23075,30 +23075,30 @@ 
virDomainSchedPriorityComparator(virDomainThreadSchedParamPtr baseSched,
 static virDomainThreadSchedParamPtr
 virDomainSchedSubsetCharacteristic(virDomainDefPtr def,
virBitmapPtr schedMap,
-   virBitmapPtr prioMap,
+   virBitmapPtr subsetMap,
virDomainThreadSchedParamPtr 
(*func)(virDomainDefPtr, unsigned int),
bool 
(*comparator)(virDomainThreadSchedParamPtr,
   
virDomainThreadSchedParamPtr))
 {
-ssize_t nextprio;
+ssize_t nextidx;
 virDomainThreadSchedParamPtr sched;
 virDomainThreadSchedParamPtr baseSched = NULL;
 
-virBitmapClearAll(prioMap);
+virBitmapClearAll(subsetMap);
 
 /* we need to find a subset of vCPUs with the given scheduler
 * that share the priority */
-nextprio = virBitmapNextSetBit(schedMap, -1);
-if (!(sched = func(def, nextprio)))
+nextidx = virBitmapNextSetBit(schedMap, -1);
+if (!(sched = func(def, nextidx)))
 return NULL;
 
 baseSched = sched;
-ignore_value(virBitmapSetBit(prioMap, nextprio));
+ignore_value(virBitmapSetBit(subsetMap, nextidx));
 
-while ((nextprio = virBitmapNextSetBit(schedMap, nextprio)) > -1) {
-sched = func(def, nextprio);
+while ((nextidx = virBitmapNextSetBit(schedMap, nextidx)) > -1) {
+sched = func(def, nextidx);
 if (sched && comparator(baseSched, sched))
-ignore_value(virBitmapSetBit(prioMap, nextprio));
+ignore_value(virBitmapSetBit(subsetMap, nextidx));
 }
 
 return baseSched;
@@ -23127,7 +23127,7 @@ virDomainFormatSchedDef(virDomainDefPtr def,
 virBitmapPtr resourceMap)
 {
 virBitmapPtr schedMap = NULL;
-virBitmapPtr prioMap = NULL;
+virBitmapPtr subsetMap = NULL;
 virDomainThreadSchedParamPtr sched;
 virDomainThreadSchedParamPtr baseSched;
 char *tmp = NULL;
@@ -23144,7 +23144,7 @@ virDomainFormatSchedDef(virDomainDefPtr def,
  */
 
 if (!(schedMap = virBitmapNew(VIR_DOMAIN_CPUMASK_LEN)) ||
-!(prioMap = virBitmapNew(VIR_DOMAIN_CPUMASK_LEN)))
+!(subsetMap = virBitmapNew(VIR_DOMAIN_CPUMASK_LEN)))
 goto cleanup;
 
 for (i = VIR_PROC_POLICY_NONE + 1; i < VIR_PROC_POLICY_LAST; i++) {
@@ -23181,13 +23181,13 @@ virDomainFormatSchedDef(virDomainDefPtr def,
 
 baseSched = virDomainSchedSubsetCharacteristic(def,
schedMap,
-   prioMap,
+   subsetMap,
func,

virDomainSchedPriorityComparator);
 if (baseSched == NULL)
 goto cleanup;
 
-currentMap = prioMap;
+currentMap = subsetMap;
 break;
 }
 
@@ -23215,7 +23215,7 @@ virDomainFormatSchedDef(virDomainDefPtr def,
 
  cleanup:
 virBitmapFree(schedMap);
-virBitmapFree(prioMap);
+virBitmapFree(subsetMap);
 return ret;
 }
 
-- 
2.8.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH v2 2/5] util: allow virProcessSetScheduler to set SCHED_DEADLINE

2016-11-21 Thread Martin Polednik
As sched_deadline is linux specific, it is not set through
sched_setscheduler but rather the sched_setattr syscall. Additionally,
the scheduler has new set of parameters: runtime, deadline and period.

In this part of the series, we extend virProcessSetScheduler to
accommodate the additional parameters and use sched_setattr syscall to
set SCHED_DEADLINE. Another small addition is sched_attr struct, which
is required for sched_setattr and not exposed from the kernel or
libraries.
---
 src/qemu/qemu_process.c |  3 ++-
 src/util/virprocess.c   | 37 +
 src/util/virprocess.h   | 25 -
 3 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 3552a31..e984ccd 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2395,7 +2395,8 @@ qemuProcessSetupPid(virDomainObjPtr vm,
 
 /* Set scheduler type and priority. */
 if (sched &&
-virProcessSetScheduler(pid, sched->policy, sched->priority) < 0)
+virProcessSetScheduler(pid, sched->policy, sched->priority,
+   sched->runtime, sched->deadline, sched->period) 
< 0)
 goto cleanup;
 
 ret = 0;
diff --git a/src/util/virprocess.c b/src/util/virprocess.c
index 9b4a555..acf4d6b 100644
--- a/src/util/virprocess.c
+++ b/src/util/virprocess.c
@@ -67,6 +67,7 @@
 VIR_LOG_INIT("util.process");
 
 #ifdef __linux__
+# include 
 /*
  * Workaround older glibc. While kernel may support the setns
  * syscall, the glibc wrapper might not exist. If that's the
@@ -87,10 +88,8 @@ VIR_LOG_INIT("util.process");
 #   define __NR_setns 339
 #  endif
 # endif
-
 # ifndef HAVE_SETNS
 #  if defined(__NR_setns)
-#   include 
 
 static inline int setns(int fd, int nstype)
 {
@@ -100,6 +99,24 @@ static inline int setns(int fd, int nstype)
 #   error Please determine the syscall number for setns on your architecture
 #  endif
 # endif
+
+# if defined(SCHED_DEADLINE) && defined(__NR_sched_setattr)
+static inline int sched_setattr(pid_t pid,
+const struct sched_attr *attr,
+unsigned int flags)
+{
+return syscall(__NR_sched_setattr, pid, attr, flags);
+}
+# else
+static inline int sched_setattr(pid_t pid,
+const struct sched_attr *attr,
+unsigned int flags)
+{
+virReportSystemError(ENOSYS, "%s",
+ _("Deadline scheduler is not supported on this 
platform."));
+return -1;
+}
+# endif
 #else /* !__linux__ */
 static inline int setns(int fd ATTRIBUTE_UNUSED, int nstype ATTRIBUTE_UNUSED)
 {
@@ -1234,10 +1251,16 @@ virProcessSchedTranslatePolicy(virProcessSchedPolicy 
policy)
 int
 virProcessSetScheduler(pid_t pid,
virProcessSchedPolicy policy,
-   int priority)
+   int priority,
+   unsigned long long runtime,
+   unsigned long long deadline,
+   unsigned long long period)
 {
 struct sched_param param = {0};
+struct sched_attr attrs = { sizeof(attrs), SCHED_DEADLINE, 0, 0, 0,
+runtime, deadline, period };
 int pol = virProcessSchedTranslatePolicy(policy);
+int ret = 0;
 
 VIR_DEBUG("pid=%lld, policy=%d, priority=%u",
   (long long) pid, policy, priority);
@@ -1280,7 +1303,13 @@ virProcessSetScheduler(pid_t pid,
 param.sched_priority = priority;
 }
 
-if (sched_setscheduler(pid, pol, ) < 0) {
+if (pol == SCHED_DEADLINE) {
+ret = sched_setattr(pid, , 0);
+} else {
+ret = sched_setscheduler(pid, pol, );
+}
+
+if (ret < 0) {
 virReportSystemError(errno,
  _("Cannot set scheduler parameters for pid %lld"),
  (long long) pid);
diff --git a/src/util/virprocess.h b/src/util/virprocess.h
index 1eac3e6..02522b4 100644
--- a/src/util/virprocess.h
+++ b/src/util/virprocess.h
@@ -41,6 +41,26 @@ typedef enum {
 
 VIR_ENUM_DECL(virProcessSchedPolicy);
 
+# ifdef SCHED_DEADLINE
+struct sched_attr {
+uint32_t size;
+
+uint32_t sched_policy;
+uint64_t sched_flags;
+
+/* SCHED_NORMAL, SCHED_BATCH */
+uint32_t sched_nice;
+
+/* SCHED_FIFO, SCHED_RR */
+uint32_t sched_priority;
+
+/* SCHED_DEADLINE (nsec) */
+uint64_t sched_runtime;
+uint64_t sched_deadline;
+uint64_t sched_period;
+};
+# endif
+
 char *
 virProcessTranslateStatus(int status);
 
@@ -93,6 +113,9 @@ int virProcessRunInMountNamespace(pid_t pid,
 
 int virProcessSetScheduler(pid_t pid,
virProcessSchedPolicy policy,
-   int priority);
+   int priority,
+   unsigned long long runtime,
+   unsigned long long deadline,
+   

[libvirt] [PATCH v2 5/5] conf: add deadline scheduler

2016-11-21 Thread Martin Polednik
As the code for changing task scheduler is now able to choose deadline
scheduler, we can update domain configuration to parse the scheduler.
---
 docs/formatdomain.html.in |  15 +++---
 docs/schemas/domaincommon.rng |  16 +++
 src/conf/domain_conf.c| 108 --
 3 files changed, 130 insertions(+), 9 deletions(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 4e40aa1..2f654f9 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -799,12 +799,12 @@
   
 The optional vcpusched elements specifies the scheduler
 type (values batch, idle, fifo,
-rr) for particular vCPU/IOThread threads (based on
-vcpus and iothreads, leaving out
-vcpus/iothreads sets the default). Valid
-vcpus values start at 0 through one less than the
-number of vCPU's defined for the domain. Valid iothreads
-values are described in the iothreadids
+rr, deadline) for particular vCPU/IOThread
+threads (based on vcpus and iothreads,
+leaving out vcpus/iothreads sets the
+default). Valid vcpus values start at 0 through one less
+than the number of vCPU's defined for the domain. Valid
+iothreads values are described in the 
iothreadids
 description.
 If no iothreadids are defined, then libvirt numbers
 IOThreads from 1 to the number of iothreads available
@@ -812,6 +812,9 @@
 rr), priority must be specified as
 well (and is ignored for non-real-time ones). The value range
 for the priority depends on the host kernel (usually 1-99).
+For deadline real-time scheduler, additional parameters runtime,
+deadline and period must be specified. The value of these parameters is
+specified in nanoseconds, where minimum is 1024 (1 usec).
 Since 1.2.13
   
 
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 19d45fd..1461d34 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -851,6 +851,22 @@
   
 
   
+  
+
+  
+deadline
+  
+
+
+  
+
+
+  
+
+
+  
+
+  
 
   
 
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 9ec23be..342745e 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -4518,11 +4518,13 @@ static int
 virDomainVcpuDefPostParse(virDomainDefPtr def)
 {
 virDomainVcpuDefPtr vcpu;
+virDomainThreadSchedParamPtr sched;
 size_t maxvcpus = virDomainDefGetVcpusMax(def);
 size_t i;
 
 for (i = 0; i < maxvcpus; i++) {
 vcpu = virDomainDefGetVcpu(def, i);
+sched = >sched;
 
 /* impossible but some compilers don't like it */
 if (!vcpu)
@@ -4549,6 +4551,35 @@ virDomainVcpuDefPostParse(virDomainDefPtr def)
 case VIR_TRISTATE_BOOL_LAST:
 break;
 }
+
+switch (sched->policy) {
+case VIR_PROC_POLICY_NONE:
+case VIR_PROC_POLICY_BATCH:
+case VIR_PROC_POLICY_IDLE:
+case VIR_PROC_POLICY_FIFO:
+case VIR_PROC_POLICY_RR:
+break;
+case VIR_PROC_POLICY_DEADLINE:
+if (sched->runtime < 1024 || sched->deadline < 1024 || 
sched->period < 1024) {
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+_("Scheduler runtime, deadline and period must be "
+"higher or equal to 1024 (1 us) (runtime(%llu), "
+"deadline(%llu), period(%llu))"),
+sched->runtime, sched->deadline, sched->period);
+return -1;
+}
+
+if (!(sched->runtime <= sched->deadline && sched->deadline <= 
sched->period)) {
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+_("Scheduler configuration does not satisfy "
+"(runtime(%llu) <= deadline(%llu) <= 
period(%llu))"),
+sched->runtime, sched->deadline, sched->period);
+return -1;
+}
+break;
+case VIR_PROC_POLICY_LAST:
+break;
+}
 }
 
 return 0;
@@ -15717,7 +15748,10 @@ static virBitmapPtr
 virDomainSchedulerParse(xmlNodePtr node,
 const char *name,
 virProcessSchedPolicy *policy,
-int *priority)
+int *priority,
+unsigned long long *runtime,
+unsigned long long *deadline,
+unsigned long long *period)
 {
 virBitmapPtr ret = NULL;
 char *tmp = NULL;
@@ -15773,6 +15807,42 @@ virDomainSchedulerParse(xmlNodePtr node,
 VIR_FREE(tmp);
 }
 
+if (pol == 

[libvirt] [PATCH v2 0/5] Add deadline scheduler

2016-11-21 Thread Martin Polednik
The policy SCHED_DEADLINE is available since kernel 3.14 (and most likely
backported to older RT_PREEMPT kernels). It is safer to use than fifo or round
robin policies due to only limiting part of cpu time for the RT process,
leading to lack of lockups of the host.

The series adds new vcpusched/iothreadsched called 'deadline' that activates
SCHED_DEADLINE for given process. As the scheduler is linux specific, extra
implementation was required - it is not possible to use sched_setscheduler,
sched_setattr syscall must be used.

Changes in v2:
* using newly merged SCHED_* facilities
* split renaming and refactoring in virDomainFormatSchedDef
* squashed conf additions
* moved error handling to *PostParse

Martin Polednik (5):
  conf: add entries for deadline scheduler
  util: allow virProcessSetScheduler to set SCHED_DEADLINE
  virDomainFormatSchedDef: factor out subset code
  virDomainFormatSchedDef: rename prioMap and nextprio
  conf: add deadline scheduler

 docs/formatdomain.html.in |  15 ++--
 docs/schemas/domaincommon.rng |  16 
 src/conf/domain_conf.c| 179 --
 src/conf/domain_conf.h|   3 +
 src/qemu/qemu_process.c   |   3 +-
 src/util/virprocess.c |  50 ++--
 src/util/virprocess.h |  26 +-
 7 files changed, 256 insertions(+), 36 deletions(-)

-- 
2.8.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH v2 1/5] conf: add entries for deadline scheduler

2016-11-21 Thread Martin Polednik
Deadline scheduler, or SCHED_DEADLINE, is a new realtime scheduler
added to Linux 3.14. In order to support it as a possible value in
vcpusched/iothreadsched, we have to add it to scheduler related
structures.
---
 src/conf/domain_conf.c |  1 +
 src/conf/domain_conf.h |  3 +++
 src/util/virprocess.c  | 13 -
 src/util/virprocess.h  |  1 +
 4 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 6e008e2..2dd5e6f 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -23132,6 +23132,7 @@ virDomainFormatSchedDef(virDomainDefPtr def,
 case VIR_PROC_POLICY_NONE:
 case VIR_PROC_POLICY_BATCH:
 case VIR_PROC_POLICY_IDLE:
+case VIR_PROC_POLICY_DEADLINE:
 case VIR_PROC_POLICY_LAST:
 currentMap = schedMap;
 break;
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 541b600..67b9e3e 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -1840,6 +1840,9 @@ typedef virDomainThreadSchedParam 
*virDomainThreadSchedParamPtr;
 struct _virDomainThreadSchedParam {
 virProcessSchedPolicy policy;
 int priority;
+unsigned long long runtime;
+unsigned long long deadline;
+unsigned long long period;
 };
 
 typedef struct _virDomainTimerCatchupDef virDomainTimerCatchupDef;
diff --git a/src/util/virprocess.c b/src/util/virprocess.c
index 3cacc89..9b4a555 100644
--- a/src/util/virprocess.c
+++ b/src/util/virprocess.c
@@ -34,6 +34,9 @@
 #endif
 #if HAVE_SCHED_SETSCHEDULER
 # include 
+# ifdef __linux__
+#  include 
+# endif
 #endif
 
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__) || 
HAVE_BSD_CPU_AFFINITY
@@ -111,7 +114,8 @@ VIR_ENUM_IMPL(virProcessSchedPolicy, VIR_PROC_POLICY_LAST,
   "batch",
   "idle",
   "fifo",
-  "rr");
+  "rr",
+  "deadline");
 
 /**
  * virProcessTranslateStatus:
@@ -1212,6 +1216,13 @@ virProcessSchedTranslatePolicy(virProcessSchedPolicy 
policy)
 case VIR_PROC_POLICY_RR:
 return SCHED_RR;
 
+case VIR_PROC_POLICY_DEADLINE:
+# ifdef SCHED_DEADLINE
+return SCHED_DEADLINE;
+# else
+return -1;
+# endif
+
 case VIR_PROC_POLICY_LAST:
 /* nada */
 break;
diff --git a/src/util/virprocess.h b/src/util/virprocess.h
index 04e9802..1eac3e6 100644
--- a/src/util/virprocess.h
+++ b/src/util/virprocess.h
@@ -34,6 +34,7 @@ typedef enum {
 VIR_PROC_POLICY_IDLE,
 VIR_PROC_POLICY_FIFO,
 VIR_PROC_POLICY_RR,
+VIR_PROC_POLICY_DEADLINE,
 
 VIR_PROC_POLICY_LAST
 } virProcessSchedPolicy;
-- 
2.8.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] util: allow virProcessSetScheduler to set SCHED_DEADLINE

2016-11-15 Thread Martin Polednik

On 15/11/16 15:02 +, Daniel P. Berrange wrote:

On Tue, Nov 15, 2016 at 03:51:22PM +0100, Martin Polednik wrote:

On 10/11/16 16:21 +0100, Martin Kletzander wrote:
> On Mon, Nov 07, 2016 at 10:01:13AM +0100, Martin Polednik wrote:
> > As sched_deadline is linux specific, it is not set through
> > sched_setscheduler but rather the sched_setattr syscall. Additionally,
> > the scheduler has new set of parameters: runtime, deadline and period.
> >
> > In this part of the series, we extend virProcessSetScheduler to
> > accommodate the additional parameters and use sched_setattr syscall to
> > set SCHED_DEADLINE. Another small addition is sched_attr struct, which
> > is required for sched_setattr and not exposed from the kernel or
> > libraries.
> > ---
> > src/qemu/qemu_process.c |  3 ++-
> > src/util/virprocess.c   | 70 
+++--
> > src/util/virprocess.h   | 28 +++-
> > 3 files changed, 97 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> > index 1b67aee..91a635c 100644
> > --- a/src/qemu/qemu_process.c
> > +++ b/src/qemu/qemu_process.c
> > @@ -2395,7 +2395,8 @@ qemuProcessSetupPid(virDomainObjPtr vm,
> >
> >/* Set scheduler type and priority. */
> >if (sched &&
> > -virProcessSetScheduler(pid, sched->policy, sched->priority) < 0)
> > +virProcessSetScheduler(pid, sched->policy, sched->priority,
> > +   sched->runtime, sched->deadline, 
sched->period) < 0)
> >goto cleanup;
> >
> >ret = 0;
> > diff --git a/src/util/virprocess.c b/src/util/virprocess.c
> > index d68800b..cd1cc9b 100644
> > --- a/src/util/virprocess.c
> > +++ b/src/util/virprocess.c
> > @@ -70,6 +70,7 @@
> > VIR_LOG_INIT("util.process");
> >
> > #ifdef __linux__
> > +# include 
> > /*
> > * Workaround older glibc. While kernel may support the setns
> > * syscall, the glibc wrapper might not exist. If that's the
> > @@ -91,9 +92,14 @@ VIR_LOG_INIT("util.process");
> > #  endif
> > # endif
> >
> > +# ifndef __NR_sched_setattr
> > +#  if defined(__x86_64__)
> > +#   define __NR_sched_setattr 314
> > +#  endif
> > +# endif
> > +
>
> At least ARMs, POWERs and i386 should be defined from the start so that
> tests in CI don't fail right away.  And other platforms should error out
> right away.  But honestly, I'd drop this compatibility code.  What
> you're adding here is not a bleeding-edge feature.

It maybe makes sense to virReportSystemError in case it's not defined
-- although the feature isn't cutting edge, older kernels (such as
3.10 on CentOS) do not have the syscall (or the number)
unless they're rebases with RT_PREEMPT patch.

> > # ifndef HAVE_SETNS
> > #  if defined(__NR_setns)
> > -#   include 
> >
> > static inline int setns(int fd, int nstype)
> > {
> > @@ -103,6 +109,13 @@ static inline int setns(int fd, int nstype)
> > #   error Please determine the syscall number for setns on your architecture
> > #  endif
> > # endif
> > +
> > +static inline int sched_setattr(pid_t pid,
> > +const struct sched_attr *attr,
> > +unsigned int flags)
> > +{
> > +return syscall(__NR_sched_setattr, pid, attr, flags);
> > +}
> > #else /* !__linux__ */
> > static inline int setns(int fd ATTRIBUTE_UNUSED, int nstype 
ATTRIBUTE_UNUSED)
> > {
> > @@ -110,6 +123,15 @@ static inline int setns(int fd ATTRIBUTE_UNUSED, int 
nstype ATTRIBUTE_UNUSED)
> > _("Namespaces are not supported on this 
platform."));
> >return -1;
> > }
> > +
> > +static inline int sched_setattr(pid_t pid,
> > +const struct sched_attr *attr,
> > +unsigned int flags)
> > +{
> > +virReportSystemError(ENOSYS, "%s",
> > + _("Deadline scheduler is not supported on this 
platform."));
> > +return -1;
> > +}
> > #endif
> >
> > VIR_ENUM_IMPL(virProcessSchedPolicy, VIR_PROC_POLICY_LAST,
> > @@ -1225,7 +1247,10 @@ virProcessSchedTranslatePolicy(virProcessSchedPolicy 
policy)
> > int
> > virProcessSetScheduler(pid_t pid,
> >   virProcessSchedPolicy policy,
> > -   int priority)
> > +   int priority,
> > +  

Re: [libvirt] util: allow virProcessSetScheduler to set SCHED_DEADLINE

2016-11-15 Thread Martin Polednik

On 10/11/16 16:21 +0100, Martin Kletzander wrote:

On Mon, Nov 07, 2016 at 10:01:13AM +0100, Martin Polednik wrote:

As sched_deadline is linux specific, it is not set through
sched_setscheduler but rather the sched_setattr syscall. Additionally,
the scheduler has new set of parameters: runtime, deadline and period.

In this part of the series, we extend virProcessSetScheduler to
accommodate the additional parameters and use sched_setattr syscall to
set SCHED_DEADLINE. Another small addition is sched_attr struct, which
is required for sched_setattr and not exposed from the kernel or
libraries.
---
src/qemu/qemu_process.c |  3 ++-
src/util/virprocess.c   | 70 +++--
src/util/virprocess.h   | 28 +++-
3 files changed, 97 insertions(+), 4 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 1b67aee..91a635c 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2395,7 +2395,8 @@ qemuProcessSetupPid(virDomainObjPtr vm,

   /* Set scheduler type and priority. */
   if (sched &&
-virProcessSetScheduler(pid, sched->policy, sched->priority) < 0)
+virProcessSetScheduler(pid, sched->policy, sched->priority,
+   sched->runtime, sched->deadline, sched->period) 
< 0)
   goto cleanup;

   ret = 0;
diff --git a/src/util/virprocess.c b/src/util/virprocess.c
index d68800b..cd1cc9b 100644
--- a/src/util/virprocess.c
+++ b/src/util/virprocess.c
@@ -70,6 +70,7 @@
VIR_LOG_INIT("util.process");

#ifdef __linux__
+# include 
/*
* Workaround older glibc. While kernel may support the setns
* syscall, the glibc wrapper might not exist. If that's the
@@ -91,9 +92,14 @@ VIR_LOG_INIT("util.process");
#  endif
# endif

+# ifndef __NR_sched_setattr
+#  if defined(__x86_64__)
+#   define __NR_sched_setattr 314
+#  endif
+# endif
+


At least ARMs, POWERs and i386 should be defined from the start so that
tests in CI don't fail right away.  And other platforms should error out
right away.  But honestly, I'd drop this compatibility code.  What
you're adding here is not a bleeding-edge feature.


It maybe makes sense to virReportSystemError in case it's not defined
-- although the feature isn't cutting edge, older kernels (such as
3.10 on CentOS) do not have the syscall (or the number)
unless they're rebases with RT_PREEMPT patch.


# ifndef HAVE_SETNS
#  if defined(__NR_setns)
-#   include 

static inline int setns(int fd, int nstype)
{
@@ -103,6 +109,13 @@ static inline int setns(int fd, int nstype)
#   error Please determine the syscall number for setns on your architecture
#  endif
# endif
+
+static inline int sched_setattr(pid_t pid,
+const struct sched_attr *attr,
+unsigned int flags)
+{
+return syscall(__NR_sched_setattr, pid, attr, flags);
+}
#else /* !__linux__ */
static inline int setns(int fd ATTRIBUTE_UNUSED, int nstype ATTRIBUTE_UNUSED)
{
@@ -110,6 +123,15 @@ static inline int setns(int fd ATTRIBUTE_UNUSED, int 
nstype ATTRIBUTE_UNUSED)
_("Namespaces are not supported on this platform."));
   return -1;
}
+
+static inline int sched_setattr(pid_t pid,
+const struct sched_attr *attr,
+unsigned int flags)
+{
+virReportSystemError(ENOSYS, "%s",
+ _("Deadline scheduler is not supported on this 
platform."));
+return -1;
+}
#endif

VIR_ENUM_IMPL(virProcessSchedPolicy, VIR_PROC_POLICY_LAST,
@@ -1225,7 +1247,10 @@ virProcessSchedTranslatePolicy(virProcessSchedPolicy 
policy)
int
virProcessSetScheduler(pid_t pid,
  virProcessSchedPolicy policy,
-   int priority)
+   int priority,
+   unsigned long long runtime,
+   unsigned long long deadline,
+   unsigned long long period)
{
   struct sched_param param = {0};
   int pol = virProcessSchedTranslatePolicy(policy);
@@ -1235,6 +1260,47 @@ virProcessSetScheduler(pid_t pid,
   if (!policy)
   return 0;

+if (pol == SCHED_DEADLINE) {
+int ret = 0;
+
+if (runtime < 1024 || deadline < 1024 || period < 24) {
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+_("Scheduler runtime, deadline and period must be "
+  "higher or equal to 1024 (1 us) (runtime(%llu), "
+  "deadline(%llu), period(%llu))"), runtime, deadline, 
period);
+return -1;
+}
+
+if (!(runtime <= deadline && deadline <= period)) {
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+   _("Scheduler configuration does not satisfy "
+ "(runtime(

[libvirt] [PATCH 3/6] virDomainFormatSchedDef: factor out subset code

2016-11-07 Thread Martin Polednik
The code within the function is too specific for priority attribute of
RT schedulers. To allow addition of schedulers that group by different
properties, we factor out the logic to calculate cpu subset. Instead
of comparing by priority, the new code accepts comparator for the 2
sched structs.
---
 src/conf/domain_conf.c | 77 +++---
 1 file changed, 55 insertions(+), 22 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index ca0fdfd..0ed755c 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -23058,6 +23058,47 @@ virDomainDefHasCapabilitiesFeatures(virDomainDefPtr 
def)
 return false;
 }
 
+static bool
+virDomainSchedPriorityComparator(virDomainThreadSchedParamPtr baseSched,
+ virDomainThreadSchedParamPtr sched)
+{
+bool ret = false;
+ret = (baseSched->priority == sched->priority);
+
+return ret;
+}
+
+static virDomainThreadSchedParamPtr
+virDomainSchedSubsetCharacteristic(virDomainDefPtr def,
+   virBitmapPtr schedMap,
+   virBitmapPtr subsetMap,
+   virDomainThreadSchedParamPtr 
(*func)(virDomainDefPtr, unsigned int),
+   bool 
(*comparator)(virDomainThreadSchedParamPtr,
+  
virDomainThreadSchedParamPtr))
+{
+ssize_t nextidx;
+virDomainThreadSchedParamPtr sched;
+virDomainThreadSchedParamPtr baseSched = NULL;
+
+virBitmapClearAll(subsetMap);
+
+/* we need to find a subset of vCPUs with the given scheduler
+* that share the priority */
+nextidx = virBitmapNextSetBit(schedMap, -1);
+if (!(sched = func(def, nextidx)))
+return NULL;
+
+baseSched = sched;
+ignore_value(virBitmapSetBit(subsetMap, nextidx));
+
+while ((nextidx = virBitmapNextSetBit(schedMap, nextidx)) > -1) {
+sched = func(def, nextidx);
+if (sched && comparator(baseSched, sched))
+ignore_value(virBitmapSetBit(subsetMap, nextidx));
+}
+
+return baseSched;
+}
 
 /**
  * virDomainFormatSchedDef:
@@ -23082,8 +23123,9 @@ virDomainFormatSchedDef(virDomainDefPtr def,
 virBitmapPtr resourceMap)
 {
 virBitmapPtr schedMap = NULL;
-virBitmapPtr prioMap = NULL;
+virBitmapPtr subsetMap = NULL;
 virDomainThreadSchedParamPtr sched;
+virDomainThreadSchedParamPtr baseSched;
 char *tmp = NULL;
 ssize_t next;
 size_t i;
@@ -23098,7 +23140,7 @@ virDomainFormatSchedDef(virDomainDefPtr def,
  */
 
 if (!(schedMap = virBitmapNew(VIR_DOMAIN_CPUMASK_LEN)) ||
-!(prioMap = virBitmapNew(VIR_DOMAIN_CPUMASK_LEN)))
+!(subsetMap = virBitmapNew(VIR_DOMAIN_CPUMASK_LEN)))
 goto cleanup;
 
 for (i = VIR_PROC_POLICY_NONE + 1; i < VIR_PROC_POLICY_LAST; i++) {
@@ -23117,9 +23159,8 @@ virDomainFormatSchedDef(virDomainDefPtr def,
  * have them */
 while (!virBitmapIsAllClear(schedMap)) {
 virBitmapPtr currentMap = NULL;
-ssize_t nextprio;
 bool hasPriority = false;
-int priority = 0;
+baseSched = NULL;
 
 switch ((virProcessSchedPolicy) i) {
 case VIR_PROC_POLICY_NONE:
@@ -23132,25 +23173,17 @@ virDomainFormatSchedDef(virDomainDefPtr def,
 
 case VIR_PROC_POLICY_FIFO:
 case VIR_PROC_POLICY_RR:
-virBitmapClearAll(prioMap);
 hasPriority = true;
 
-/* we need to find a subset of vCPUs with the given scheduler
- * that share the priority */
-nextprio = virBitmapNextSetBit(schedMap, -1);
-if (!(sched = func(def, nextprio)))
+baseSched = virDomainSchedSubsetCharacteristic(def,
+   schedMap,
+   subsetMap,
+   func,
+   
virDomainSchedPriorityComparator);
+if (baseSched == NULL)
 goto cleanup;
 
-priority = sched->priority;
-ignore_value(virBitmapSetBit(prioMap, nextprio));
-
-while ((nextprio = virBitmapNextSetBit(schedMap, nextprio)) > 
-1) {
-sched = func(def, nextprio);
-if (sched && sched->priority == priority)
-ignore_value(virBitmapSetBit(prioMap, nextprio));
-}
-
-currentMap = prioMap;
+currentMap = subsetMap;
 break;
 }
 
@@ -23164,8 +23197,8 @@ virDomainFormatSchedDef(virDomainDefPtr def,
   virProcessSchedPolicyTypeToString(i));
 VIR_FREE(tmp);
 
-if 

[libvirt] [PATCH 5/6] schema: add deadline scheduler

2016-11-07 Thread Martin Polednik
Can be used in the same way as other schedulers, only requiring
additional parameters.
---
 docs/schemas/domaincommon.rng | 16 
 1 file changed, 16 insertions(+)

diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 19d45fd..1461d34 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -851,6 +851,22 @@
   
 
   
+  
+
+  
+deadline
+  
+
+
+  
+
+
+  
+
+
+  
+
+  
 
   
 
-- 
2.8.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 4/6] conf: add deadline scheduler

2016-11-07 Thread Martin Polednik
As the code for changing task scheduler is now able to choose deadline
scheduler, we can update domain configuration to parse the scheduler.

The subset of vcpus or iothreads where the scheduler is configured can
be calculated by simply adding comparator based on SCHED_DEADLINE
attributes (runtime, deadline, period) and correctly passing it within
virDomainFormatSchedDef.
---
 src/conf/domain_conf.c | 80 --
 1 file changed, 77 insertions(+), 3 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 0ed755c..a4eb2e1 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -15710,7 +15710,10 @@ static virBitmapPtr
 virDomainSchedulerParse(xmlNodePtr node,
 const char *name,
 virProcessSchedPolicy *policy,
-int *priority)
+int *priority,
+unsigned long long *runtime,
+unsigned long long *deadline,
+unsigned long long *period)
 {
 virBitmapPtr ret = NULL;
 char *tmp = NULL;
@@ -15766,6 +15769,42 @@ virDomainSchedulerParse(xmlNodePtr node,
 VIR_FREE(tmp);
 }
 
+if (pol == VIR_PROC_POLICY_DEADLINE) {
+if (!(tmp = virXMLPropString(node, "runtime"))) {
+virReportError(VIR_ERR_XML_ERROR, "%s",
+   _("Missing scheduler runtime"));
+goto error;
+}
+if (virStrToLong_ull(tmp, NULL, 10, runtime) < 0) {
+virReportError(VIR_ERR_XML_ERROR, "%s",
+   _("Invalid value for element runtime"));
+goto error;
+}
+
+if (!(tmp = virXMLPropString(node, "deadline"))) {
+virReportError(VIR_ERR_XML_ERROR, "%s",
+   _("Missing scheduler deadline"));
+goto error;
+}
+if (virStrToLong_ull(tmp, NULL, 10, deadline) < 0) {
+virReportError(VIR_ERR_XML_ERROR, "%s",
+   _("Invalid value for element deadline"));
+goto error;
+}
+
+if (!(tmp = virXMLPropString(node, "period"))) {
+virReportError(VIR_ERR_XML_ERROR, "%s",
+   _("Missing scheduler period"));
+goto error;
+}
+if (virStrToLong_ull(tmp, NULL, 10, period) < 0) {
+virReportError(VIR_ERR_XML_ERROR, "%s",
+   _("Invalid value for element period"));
+goto error;
+}
+VIR_FREE(tmp);
+}
+
 return ret;
 
  error:
@@ -15786,9 +15825,13 @@ virDomainThreadSchedParseHelper(xmlNodePtr node,
 virDomainThreadSchedParamPtr sched;
 virProcessSchedPolicy policy;
 int priority;
+unsigned long long runtime;
+unsigned long long deadline;
+unsigned long long period;
 int ret = -1;
 
-if (!(map = virDomainSchedulerParse(node, name, , )))
+if (!(map = virDomainSchedulerParse(node, name, , ,
+, , )))
 goto cleanup;
 
 while ((next = virBitmapNextSetBit(map, next)) > -1) {
@@ -15804,6 +15847,9 @@ virDomainThreadSchedParseHelper(xmlNodePtr node,
 
 sched->policy = policy;
 sched->priority = priority;
+sched->runtime = runtime;
+sched->deadline = deadline;
+sched->period = period;
 }
 
 ret = 0;
@@ -23068,6 +23114,18 @@ 
virDomainSchedPriorityComparator(virDomainThreadSchedParamPtr baseSched,
 return ret;
 }
 
+static bool
+virDomainSchedDeadlineComparator(virDomainThreadSchedParamPtr baseSched,
+ virDomainThreadSchedParamPtr sched)
+{
+bool ret = false;
+ret = (baseSched->runtime == sched->priority &&
+   baseSched->deadline == sched->deadline &&
+   baseSched->period == sched->period);
+
+return ret;
+}
+
 static virDomainThreadSchedParamPtr
 virDomainSchedSubsetCharacteristic(virDomainDefPtr def,
virBitmapPtr schedMap,
@@ -23160,13 +23218,13 @@ virDomainFormatSchedDef(virDomainDefPtr def,
 while (!virBitmapIsAllClear(schedMap)) {
 virBitmapPtr currentMap = NULL;
 bool hasPriority = false;
+bool isDeadline = false;
 baseSched = NULL;
 
 switch ((virProcessSchedPolicy) i) {
 case VIR_PROC_POLICY_NONE:
 case VIR_PROC_POLICY_BATCH:
 case VIR_PROC_POLICY_IDLE:
-case VIR_PROC_POLICY_DEADLINE:
 case VIR_PROC_POLICY_LAST:
 currentMap = schedMap;
 break;
@@ -23185,6 +23243,19 @@ virDomainFormatSchedDef(virDomainDefPtr def,
 
 currentMap = subsetMap;
 break;
+case VIR_PROC_POLICY_DEADLINE:
+isDeadline = true;
+
+baseSched = virDomainSchedSubsetCharacteristic(def,
+   

[libvirt] [PATCH 6/6] docs: mention deadline scheduler in vcpusched

2016-11-07 Thread Martin Polednik
---
 docs/formatdomain.html.in | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 11b3330..1442190 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -799,12 +799,12 @@
   
 The optional vcpusched elements specifies the scheduler
 type (values batch, idle, fifo,
-rr) for particular vCPU/IOThread threads (based on
-vcpus and iothreads, leaving out
-vcpus/iothreads sets the default). Valid
-vcpus values start at 0 through one less than the
-number of vCPU's defined for the domain. Valid iothreads
-values are described in the iothreadids
+rr, deadline) for particular vCPU/IOThread
+threads (based on vcpus and iothreads,
+leaving out vcpus/iothreads sets the
+default). Valid vcpus values start at 0 through one less
+than the number of vCPU's defined for the domain. Valid
+iothreads values are described in the 
iothreadids
 description.
 If no iothreadids are defined, then libvirt numbers
 IOThreads from 1 to the number of iothreads available
@@ -812,6 +812,9 @@
 rr), priority must be specified as
 well (and is ignored for non-real-time ones). The value range
 for the priority depends on the host kernel (usually 1-99).
+For deadline real-time scheduler, additional parameters runtime,
+deadline and period must be specified. The value of these parameters is
+specified in nanoseconds, where minimum is 1024 (1 usec).
 Since 1.2.13
   
 
-- 
2.8.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 0/6] Add deadline scheduler

2016-11-07 Thread Martin Polednik
The policy SCHED_DEADLINE is available since kernel 3.14 (and most likely
backported to older RT_PREEMPT kernels). It is safer to use than fifo or round
robin policies due to only limiting part of cpu time for the RT process,
leading to lack of lockups of the host.

The series adds new vcpusched/iothreadsched called 'deadline' that activates
SCHED_DEADLINE for given process. As the scheduler is linux specific, extra
implementation was required - it is not possible to use sched_setscheduler,
sched_setattr syscall must be used.

Martin Polednik (6):
  conf: add entries for deadline scheduler
  util: allow virProcessSetScheduler to set SCHED_DEADLINE
  virDomainFormatSchedDef: factor out subset code
  conf: add deadline scheduler
  schema: add deadline scheduler
  docs: mention deadline scheduler in vcpusched

 docs/formatdomain.html.in |  15 ++--
 docs/schemas/domaincommon.rng |  16 +
 src/conf/domain_conf.c| 154 +++---
 src/conf/domain_conf.h|   3 +
 src/qemu/qemu_process.c   |   3 +-
 src/util/virprocess.c |  82 +-
 src/util/virprocess.h |  29 +++-
 7 files changed, 268 insertions(+), 34 deletions(-)

-- 
2.8.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 1/6] conf: add entries for deadline scheduler

2016-11-07 Thread Martin Polednik
Deadline scheduler, or SCHED_DEADLINE, is a new realtime scheduler
added to Linux 3.14. In order to support it as a possible value in
vcpusched/iothreadsched, we have to add it to scheduler related
structures.
---
 src/conf/domain_conf.c |  1 +
 src/conf/domain_conf.h |  3 +++
 src/util/virprocess.c  | 12 +++-
 src/util/virprocess.h  |  1 +
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index a233c0c..ca0fdfd 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -23125,6 +23125,7 @@ virDomainFormatSchedDef(virDomainDefPtr def,
 case VIR_PROC_POLICY_NONE:
 case VIR_PROC_POLICY_BATCH:
 case VIR_PROC_POLICY_IDLE:
+case VIR_PROC_POLICY_DEADLINE:
 case VIR_PROC_POLICY_LAST:
 currentMap = schedMap;
 break;
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 541b600..67b9e3e 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -1840,6 +1840,9 @@ typedef virDomainThreadSchedParam 
*virDomainThreadSchedParamPtr;
 struct _virDomainThreadSchedParam {
 virProcessSchedPolicy policy;
 int priority;
+unsigned long long runtime;
+unsigned long long deadline;
+unsigned long long period;
 };
 
 typedef struct _virDomainTimerCatchupDef virDomainTimerCatchupDef;
diff --git a/src/util/virprocess.c b/src/util/virprocess.c
index 718c4a2..d68800b 100644
--- a/src/util/virprocess.c
+++ b/src/util/virprocess.c
@@ -34,6 +34,12 @@
 #endif
 #if HAVE_SCHED_SETSCHEDULER
 # include 
+# ifdef __linux__
+#  include 
+# endif
+# ifndef SCHED_DEADLINE
+#  define SCHED_DEADLINE 6
+# endif
 #endif
 
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__) || 
HAVE_BSD_CPU_AFFINITY
@@ -111,7 +117,8 @@ VIR_ENUM_IMPL(virProcessSchedPolicy, VIR_PROC_POLICY_LAST,
   "batch",
   "idle",
   "fifo",
-  "rr");
+  "rr",
+  "deadline");
 
 /**
  * virProcessTranslateStatus:
@@ -1204,6 +1211,9 @@ virProcessSchedTranslatePolicy(virProcessSchedPolicy 
policy)
 case VIR_PROC_POLICY_RR:
 return SCHED_RR;
 
+case VIR_PROC_POLICY_DEADLINE:
+return SCHED_DEADLINE;
+
 case VIR_PROC_POLICY_LAST:
 /* nada */
 break;
diff --git a/src/util/virprocess.h b/src/util/virprocess.h
index 04e9802..1eac3e6 100644
--- a/src/util/virprocess.h
+++ b/src/util/virprocess.h
@@ -34,6 +34,7 @@ typedef enum {
 VIR_PROC_POLICY_IDLE,
 VIR_PROC_POLICY_FIFO,
 VIR_PROC_POLICY_RR,
+VIR_PROC_POLICY_DEADLINE,
 
 VIR_PROC_POLICY_LAST
 } virProcessSchedPolicy;
-- 
2.8.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] hostdev: add support for "managed='detach'"

2016-03-18 Thread Martin Polednik

On 18/03/16 10:15 +, Daniel P. Berrange wrote:

On Thu, Mar 17, 2016 at 05:37:28PM -0400, Laine Stump wrote:

On 03/17/2016 02:32 PM, Daniel P. Berrange wrote:
>On Thu, Mar 17, 2016 at 12:18:49PM -0600, Alex Williamson wrote:
>>On Thu, 17 Mar 2016 17:59:53 +
>>"Daniel P. Berrange"  wrote:
>>
>>>I don't think it is a significant burden really. Apps which want this
>>>blacklisted forever likely want to setup the modprobe blacklist anyway
>>>to stop the initial bind at boot up and instead permanently reserve
>>>the device. This stops the device being used at startup - eg if we
>>>have a bunch of NICs to be given to guests, you don't want the host
>>>OS to automatically configure them and give them IP addresses on the
>>>host before we start guests. So pre-reserving devices at the host OS
>>>level is really want you want todo with data center / cloud management
>>>apps like oVirt / OpenStack at least. They could easily use the
>>>virNodeDeviceDetach API at the time they decide to assign a device
>>>to a guest though.
>>modprobe blacklist assumes that all devices managed by a given driver
>>are reserved for VM use.  That's very often not the case.  Even with
>>SR-IOV VFs, several vendors use the same driver for PF and VF, so
>>that's just a poor solution.  For GPU assignment we often recommend
>>using pci-stub.ids on the kernel commandline to pre-load the pci-stub
>>driver with PCI vendor and device IDs to claim to prevent host drivers
>>from attaching, but that also assumes that you want to use everything
>>matching those IDs for a VM, which users will quickly find fault with.
>>Additionally, using either solution assumes that the device will be
>>left entirely alone otherwise, which is also not true.  If I blacklist
>>i915 or using pci-stub.ids to make pci-stub claim it, then efifb or
>>vesafb is more than happy to make use of it, so it's actually cleaner
>>to let i915 grab the device and unbind it when ready.  And of course
>>the issue of assuming that the device can go without drivers, which may
>>make your user run a headless system.  This is really not the
>>simplistic issue that it may seem.  Thanks,
>The issues you describe point towards the need for a better blacklisting
>facility at the OS level IMHO. eg a way to tell the kernel module to
>not automatically bind drivers for devices with a particular device ID.

To paraphrase an old saying: this is a nail, and libvirt is my hammer :-) My
first response to Alex when he asked about this feature was to ask if he
couldn't configure it somehow outside libvirt, and his response (similar to
above) made it fairly plain that you really can't, at least not without
adding custom startup scripts. Since the barrier to entry for the kernel is
much higher than for libvirt, this seems like a way to actually get
something that works.


Looking at this from the POV of openstack, we would want to mark devices as
detached from host independantly of assigning them to guests, since we build
up a whitelist of assignable devices at initial startup.

So I'd expect OpenStack would simply call virNodeDeviceDetach() for each
device in its whitelist at startup. If we consider how we deal with guest
domain hotplug, we have the concept of VIR_DOMAIN_MODIFY_LIVE vs CONFIG.
If there were a way to blacklist devices based on address, then it would
be natural to extend libvirt so that you could do

 virNodeDeviceDetach(dev, VIR_DOMAIN_MODIFY_CONFIG);

to ensure it was permanently disabled from attachment to host drivers,
instead of just for the current point in time. This is using libvirt as
a hammer, but we still need underlying OS support in some manner to
actually implement it.


Joining this discussion from oVirt's perspective. oVirt uses managed="no"
mode. What we do now is using virNodeDeviceDetachFlags() when VM is
started, and virNodeDeviceReAttach() when it's destroyed. This is because
we handle permissions of /dev/vfio/* ourselves and allows for finer
granularity (e.g. skipping PCI_HEADER_TYPE 0 devices or
hot(un)plugging a device).

To deal with GPUs or any kind of device with problematic driver, we
currently advise users to use pci-stub.ids and plan to make this
functionality available from our UI. We have to modify kernel
command line anyway, as user's modifications of it are not supported
and we need to enable IOMMU and possibly expose workarounds for
machines with issues (allow_unsafe_interrupts, pci=realloc for
SR-IOV).

After reading this thread, we are considering not explicitly reattaching
the devices. If user wants to reclaim a device, it will be possible to
reattach it explicitly from the UI.

Due to system configuration mentioned above, I don't believe there is
a need for managed="detach" in management applications. It should be
management application's developer that decides whether the devices
will be detached on boot or at VM start, and users decision if they require
reattaching back later.

That being said, I am not sure if people 

[libvirt] USB hostdev de/reattaching

2015-03-19 Thread Martin Polednik
Hello,

according to libvirt documentation, when using hostdev with USB device
...the user is responsible to call virNodeDeviceDettach (or virsh 
nodedev-detach)
before starting the guest In libvirt-1.2.13-1.el7.x86_64, this leads to 

virsh nodedev-detach usb_usb1

error: Failed to detach device usb_usb1
error: invalid argument: device usb_usb1 is not a PCI device

The failure is most probably caused in qemuNodeDeviceDeta where 
qemuNodeDeviceGetPCIInfo
is called. Is it a bug, or usb hostdev can be done without 
detaching/reattaching?

Thanks,
mpolednik 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list