Re: [systemd-devel] persisting sriov_numvfs

2015-01-27 Thread Martin Polednik


- Original Message -
> From: "Lennart Poettering" 
> To: "Martin Polednik" 
> Cc: "Andrei Borzenkov" , 
> systemd-devel@lists.freedesktop.org, ibar...@redhat.com
> Sent: Tuesday, January 27, 2015 2:21:21 PM
> Subject: Re: [systemd-devel] persisting sriov_numvfs
> 
> On Tue, 27.01.15 07:35, Martin Polednik (mpoled...@redhat.com) wrote:
> 
> > > > > Hmm, I see. In many ways this feels like VLAN setup from a
> > > > > configuration PoV, right? i.e. you have one hw device the driver
> > > > > creates, and then you configure a couple of additional interfaces on
> > > > > top of it.
> > > > > 
> > > > > This of course then raises the question: shouldn't this functionality
> > > > > be exposed by the kernel the same way as VLANs? i.e. with a
> > > > > rtnetlink-based API to create additional interfaces, instead of /sys?
> > > > > 
> > > > > In systemd I figure the right way to expose this to the user would be
> > > > > via
> > > > > .netdev files, the same way as we expose VLAN devices. Note however
> > > > > that that would be networkd territory,
> > > > 
> > > > No, this is not limited to NICs. It is a generic feature that can in
> > > > principle be used with any hardware; there are e.g. FC or FCoE HBAs
> > > > with SR-IOV support. It is true that today it mostly comes with NICs,
> > > > though.
> > > > 
> > > > Any general framework for setting it up should not be tied to specific
> > > > card type.
> > > 
> > > Well, I doubt that there will be graphics cards that support this
> > > right? I mean, it's really only network connectivity that can support
> > > a concept like this easily, since you can easily merge packet streams
> > > from multiple VMs on one connection. However, I am not sure how you
> > > want to physically merge VGA streams onto a single VGA connector...
> > > 
> > > If this is about ethernet, FC, FCOE, then I still think that the
> > > network management solution should consider this as something you can
> > > configure on physical links like VLANs. Hence networkd or
> > > NetworkManager and so on should cover it.
> > > 
> > > Lennart
> > 
> > AFAIK some storage cards support this; for GPUs it's possibly for
> > GPGPU applications and such - where you don't care about the physical
> > output, but about the processing core of the GPU itself (I'm not aware
> > of such an implementation yet; nvidia seems to be doing something, but
> > the details are nowhere to be found).
> 
> Hmm, so there are three options I think.
> 
> a) Expose this in networkd .netdev files, as I suggested
>    originally. This would be appropriate if we can add and remove VFs
>    freely any time, without the other VFs being affected. Can you
>    clarify whether going from let's say 4 to 5 VFs requires removing
>    all VFs and recreating them? This would be the nicest exposure I
>    think, but be specific to networkd.

Removing and recreating the VFs is unfortunately required when changing
their number (both ways - increasing and decreasing the count).

https://www.kernel.org/doc/Documentation/PCI/pci-iov-howto.txt
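For illustration, the constraint looks like this in shell; a temporary
directory stands in for the device's sysfs node, since the real path
(/sys/bus/pci/devices/<address>) needs actual SR-IOV hardware:

```shell
# Hypothetical stand-in for /sys/bus/pci/devices/<address>/
dev=$(mktemp -d)
echo 4 > "$dev/sriov_numvfs"     # 4 VFs currently configured

# Going from 4 to 5 VFs: the kernel rejects a direct nonzero-to-nonzero
# write, so the count has to pass through 0 (destroying all VFs) first.
echo 0 > "$dev/sriov_numvfs"
echo 5 > "$dev/sriov_numvfs"
cat "$dev/sriov_numvfs"          # prints 5
```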

> b) Expose this via udev .link files. This would be appropriate if
>    adding/removing VFs is a one-time thing, when a device pops
>    up. This would be networking specific, not cover anything else like
>    GPU or storage or so. Would still be quite nice. Would probably be
>    the best option, after a), if VFs cannot be added/removed dynamically
>    all the time without affecting the other VFs.
> 
> c) Expose this via udev rules files. This would be generic, would work
>    for networking as well as GPUs or storage. This would entail
>    writing out rules files when you want to configure the number of
>    VFs. Care needs to be taken to use the right way to identify
>    devices as they come and go, so that you can apply configuration to
>    them in a stable way. This is somewhat uglier, as we don't really
>    think that udev rules should be used that much for configuration,
>    especially not for configuration written out by programs rather
>    than manually. However, logind already does this, to assign seat
>    identifiers to udev devices to enable multi-seat support.
> 
> A combination of b) for networking and c) for the rest might be an
> option too.

I myself would vote for b) + c), since we want to cover most of the
possible use cases for SR-IOV and MR-IOV, which will hopefully share
the interface; adding Dan back to CC as he is the one to speak for
networking.
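For what option c) might look like, here is a rules-file sketch (the
vendor/device IDs and the VF count are hypothetical; udev's ATTR{...}="..."
assignment writes the value into the matching sysfs attribute):

```
# /etc/udev/rules.d/80-sriov.rules -- hypothetical example
# Set the VF count as soon as a matching PF appears.
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x8086", ATTR{device}=="0x10fb", ATTR{sriov_numvfs}="7"
```

Matching by vendor/device ID hits every card of that model; applying
per-device configuration in a stable way would need a more specific match
(e.g. on the PCI address), which is exactly the identification problem
Lennart mentions.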

> Lennart
> 
> --
> Lennart Poettering, Red Hat
> 
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] persisting sriov_numvfs

2015-01-27 Thread Martin Polednik


- Original Message -
> From: "Lennart Poettering" 
> To: "Andrei Borzenkov" 
> Cc: "Martin Polednik" , 
> systemd-devel@lists.freedesktop.org, ibar...@redhat.com
> Sent: Tuesday, January 27, 2015 1:21:32 PM
> Subject: Re: [systemd-devel] persisting sriov_numvfs
> 
> On Tue, 27.01.15 06:47, Andrei Borzenkov (arvidj...@gmail.com) wrote:
> 
> > > Hmm, I see. In many ways this feels like VLAN setup from a
> > > configuration PoV, right? i.e. you have one hw device the driver
> > > creates, and then you configure a couple of additional interfaces on
> > > top of it.
> > > 
> > > This of course then raises the question: shouldn't this functionality
> > > be exposed by the kernel the same way as VLANs? i.e. with a
> > > rtnetlink-based API to create additional interfaces, instead of /sys?
> > > 
> > > In systemd I figure the right way to expose this to the user would be via
> .netdev files, the same way as we expose VLAN devices. Note however
> that that would be networkd territory,
> > 
> No, this is not limited to NICs. It is a generic feature that can in
> principle be used with any hardware; there are e.g. FC or FCoE HBAs
> with SR-IOV support. It is true that today it mostly comes with NICs,
> though.
> > 
> > Any general framework for setting it up should not be tied to specific
> > card type.
> 
> Well, I doubt that there will be graphics cards that support this
> right? I mean, it's really only network connectivity that can support
> a concept like this easily, since you can easily merge packet streams
> from multiple VMs on one connection. However, I am not sure how you
> want to physically merge VGA streams onto a single VGA connector...
> 
> If this is about ethernet, FC, FCOE, then I still think that the
> network management solution should consider this as something you can
> configure on physical links like VLANs. Hence networkd or
> NetworkManager and so on should cover it.
> 
> Lennart

AFAIK some storage cards support this; for GPUs it's possibly for
GPGPU applications and such - where you don't care about the physical
output, but about the processing core of the GPU itself (I'm not aware
of such an implementation yet; nvidia seems to be doing something, but
the details are nowhere to be found).
 
> --
> Lennart Poettering, Red Hat
> 


Re: [systemd-devel] persisting sriov_numvfs

2015-01-23 Thread Martin Polednik


- Original Message -
> From: "Lennart Poettering" 
> To: "Dan Kenigsberg" 
> Cc: systemd-devel@lists.freedesktop.org, mpole...@redhat.com, 
> ibar...@redhat.com
> Sent: Friday, January 23, 2015 3:49:59 AM
> Subject: Re: [systemd-devel] persisting sriov_numvfs
> 
> On Mon, 19.01.15 14:18, Dan Kenigsberg (dan...@redhat.com) wrote:
> 
> > Hello, list.
> > 
> > I'm an http://oVirt.org developer, and we plan to (finally) support
> > SR-IOV cards natively. Working on this feature, we've noticed that
> > something is missing in the platform OS.
> > 
> > If I maintain a host with sr-iov cards, I'd like to use the new kernel
> > method of defining how many virtual functions (VFs) are to be exposed by
> > each physical function:
> 
> Quite frankly, I cannot make sense of these sentences. I have no clue
> what a "SR-IOV", "virtual function", "physical function" is supposed
> to be.
> 
> Please explain what this all is, before we can think of adding any
> friendlier config option to udev/networkd/systemd for this.

Hello,

I'm an oVirt developer responsible for VFIO/SR-IOV passthrough on the host
side.

SR-IOV is a specification from the PCI SIG whereby a single hardware device
(we're using NICs as an example) can act as multiple devices. That device
is then considered the PF (physical function) and the spawned devices are
so-called VFs (virtual functions). This functionality allows system
administrators to assign these devices to virtual machines to get near
bare-metal performance from the device, and possibly to share it amongst
multiple VMs.

Spawning of the VFs was previously done via the device driver, using the
max_vfs module parameter. This meant that if you wanted to persist the VFs,
you had to configure the option in modprobe.d. Since device driver authors
used different parameter names, spawning of VFs was moved to sysfs, where
it is operated via

  echo ${number} > /sys/bus/pci/devices/${device_name}/sriov_numvfs

where ${number} must not exceed
/sys/bus/pci/devices/${device_name}/sriov_totalvfs, and when changing the
number of VFs from a nonzero value, it first needs to be set to 0.
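That zero-first dance can be wrapped in a small helper; a sketch, with the
device's sysfs directory passed as an argument (an assumption made here so
the logic can also be exercised against a stand-in directory rather than
real hardware):

```shell
#!/bin/sh
# set_numvfs <device-sysfs-dir> <count>
# Writes <count> to sriov_numvfs, zeroing it first when moving between
# two nonzero values, and refusing counts above sriov_totalvfs.
set_numvfs() {
    dev_dir=$1
    want=$2
    total=$(cat "$dev_dir/sriov_totalvfs")
    if [ "$want" -gt "$total" ]; then
        echo "error: $want VFs requested, device supports at most $total" >&2
        return 1
    fi
    current=$(cat "$dev_dir/sriov_numvfs")
    if [ "$current" -ne 0 ] && [ "$want" -ne "$current" ]; then
        # The kernel rejects a direct nonzero-to-nonzero change.
        echo 0 > "$dev_dir/sriov_numvfs"
    fi
    if [ "$want" -ne 0 ] && [ "$want" -ne "$current" ]; then
        echo "$want" > "$dev_dir/sriov_numvfs"
    fi
    return 0
}
```

Usage on a real host would be e.g. set_numvfs /sys/bus/pci/devices/0000:01:00.0 7
(the PCI address is a hypothetical example).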

We've encountered the need to persist this configuration and load it before
network scripts (and possibly other scripts in the future) so that the
hardware can be referenced in those scripts. There is currently no such
option. We are seeking help in creating a standardized way of handling this
persistence.
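For completeness, one mechanism that exists today: systemd-tmpfiles can
write an arbitrary value into a file at boot with a 'w' line, which happens
to work for sriov_numvfs since the count starts at 0 on boot - though it
gives no ordering guarantee relative to network setup (the PCI address and
VF count below are hypothetical):

```
# /etc/tmpfiles.d/sriov.conf -- hypothetical example
# Type  Path                                            Mode UID GID Age Argument
w       /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs  -    -   -   -   7
```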

mpolednik
 
> Lennart
> 
> --
> Lennart Poettering, Red Hat
> 