On Thu, Jun 10, 2021 at 10:13 AM Frode Nordahl <frode.nord...@canonical.com> wrote: > > On Thu, Jun 10, 2021 at 1:46 PM Ilya Maximets <i.maxim...@ovn.org> wrote: > > > > On 6/10/21 8:36 AM, Han Zhou wrote: > > > > > > > > > On Thu, May 13, 2021 at 9:25 AM Frode Nordahl > > > <frode.nord...@canonical.com <mailto:frode.nord...@canonical.com>> wrote: > > >> > > >> On Thu, May 13, 2021 at 5:12 PM Ilya Maximets <i.maxim...@ovn.org > > >> <mailto:i.maxim...@ovn.org>> wrote: > > >> > > > >> > On 5/9/21 4:03 PM, Frode Nordahl wrote: > > >> > > Introduce plugging module that adds and removes ports on the > > >> > > integration bridge, as directed by Port_Binding options. > > >> > > > > >> > > Traditionally it has been the CMSs responsibility to create Virtual > > >> > > Interfaces (VIFs) as part of instance (Container, Pod, Virtual > > >> > > Machine etc.) life cycle, and subsequently manage plug/unplug > > >> > > operations on the Open vSwitch integration bridge. > > >> > > > > >> > > With the advent of NICs connected to multiple distinct CPUs we can > > >> > > have a topology where the instance runs on one host and Open > > >> > > vSwitch and OVN runs on a different host, the smartnic CPU. > > >> > > > > >> > > The act of plugging and unplugging the representor port in Open > > >> > > vSwitch running on the smartnic host CPU would be the same for > > >> > > every smartnic variant (thanks to the devlink-port[0][1] > > >> > > infrastructure) and every CMS (Kubernetes, LXD, OpenStack, etc.). > > >> > > As such it is natural to extend OVN to provide this common > > >> > > functionality through its CMS facing API. > > >> > > > >> > Hi, Frode. Thanks for putting this together, but it doesn't look > > >> > natural to me. OVN, AFAIK, never touched physical devices or > > >> > interacted with the kernel directly. This change introduces completely > > >> > new functionality inside OVN. With the same effect we can run a fully > > >> > separate service on these smartnic CPUs that will do plugging > > >> > and configuration job for CMS. You may even make it independent > > >> > from a particular CMS by creating a REST API for it or whatever. > > >> > This will additionally allow using same service for non-OVN setups. > > >> > > >> Ilya, > > >> > > >> Thank you for taking the time to comment, much appreciated. > > >> > > >> Yes, this is new functionality, NICs with separate control plane CPUs > > >> and isolation from the host are also new, so this is one proposal for > > >> how we could go about to enable the use of them. > > >> > > >> The OVN controller does today get pretty close to the physical realm > > >> by maintaining patch ports in Open vSwitch based on bridge mapping > > >> configuration and presence of bridges to physical interfaces. It also > > >> does react to events of physical interfaces being plugged into the > > >> Open vSwitch instance it manages, albeit to date some other entity has > > >> been doing the act of adding the port into the bridge. > > >> > > >> The rationale for proposing to use the OVN database for coordinating > > >> this is that the information about which ports to bind, and where to > > >> bind them is already there. The timing of the information flow from > > >> the CMS is also suitable for the task. > > >> > > >> OVN relies on OVS library code, and all the necessary libraries for > > >> interfacing with the kernel through netlink and friends are there or > > >> would be easy to add. The rationale for using the netlink-devlink > > >> interface is that it provides a generic infrastructure for these types > > >> of NICs. So by using this interface we should be able to support most > > >> if not all of the variants of these cards. > > >> > > >> > > >> Providing a separate OVN service to do the task could work, but would > > >> have the cost of an extra SB DB connection, IDL and monitors. > > > > IMHO, CMS should never connect to Southbound DB. It's just because the > > Southbound DB is not meant to be a public interface, it just happened > > to be available for connections. I know that OpenStack has metadata > > agents that connects to Sb DB, but if it's really required for them, I > > think, there should be a different way to get/set required information > > without connection to the Southbound. > > The CMS-facing API is the Northbound DB, I was not suggesting direct > use of the Southbound DB by external to OVN services. My suggestion > was to have a separate OVN process do this if your objection was to > handle it as part of the ovn-controller process. > > > >> > > >> I fear it would be quite hard to build a whole separate project with > > >> its own API, feels like a lot of duplicated effort when the flow of > > >> data and APIs in OVN already align so well with CMSs interested in > > >> using this? > > >> > > >> > Interactions with physical devices also makes OVN linux-dependent > > >> > at least for this use case, IIUC. > > >> > > >> This specific bit would be linux-specific in the first iteration, yes. > > >> But the vendors manufacturing and distributing the hardware do often > > >> have drivers for other platforms, I am sure the necessary > > >> infrastructure will become available there too over time, if it is not > > >> there already. > > >> > > >> We do currently have platform specific macros in the OVN build system, > > >> so we could enable the functionality when built on a compatible > > >> platform. > > >> > > >> > Maybe, others has different opinions. > > >> > > >> I appreciate your opinion, and enjoy discussing this topic. > > >> > > >> > Another though is that there is, obviously, a network connection > > >> > between the host and smartnic system. Maybe it's possible to just > > >> > add an extra remote to the local ovsdb-server so CMS daemon on the > > >> > host system could just add interfaces over the network connection? > > >> > > >> There are a few issues with such an approach. One of the main goals > > >> with providing and using a NIC with control plane CPUs is having an > > >> extra layer of security and isolation which is separate from the > > >> hypervisor host the card happens to share a PCI complex with and draw > > >> power from. Requiring a connection between the two for operation would > > >> defy this purpose. > > >> > > >> In addition to that, this class of cards provide visibility into > > >> kernel interfaces, enumeration of representor ports etc. only from the > > >> NIC control plane CPU side of the PCI complex, this information is not > > >> provided to the host. So if a hypervisor host CMS agent were to do the > > >> plugging through a remote ovsdb connection, it would have to > > >> communicate with something else running on the NIC control plane CPU > > >> to retrieve the information it needs before it can know what to relay > > >> back over the ovsdb connection. > > >> > > >> -- > > >> Frode Nordahl > > >> > > >> > Best regards, Ilya Maximets. > > > > > > Here are my 2 cents. > > > > > > Initially I had similar concerns to Ilya, and it seems OVN should stay > > > away from the physical interface plugging. As a reference, here is how > > > ovn-kubernetes is doing it without adding anything to OVN: > > > https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing > > > > > > <https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing> > > > > AFAICT, a big part of the work is already done on the ovn-k8s side: > > https://github.com/ovn-org/ovn-kubernetes/pull/2005 > > https://github.com/ovn-org/ovn-kubernetes/pull/2042 > > I am aware of the on-going effort to implement support for this in > ovn-kubernetes directly. What we have identified is that there are > other CMSs that want this functionality, and with that we have an > opportunity to generalise and provide an abstraction in a common place > that all the consuming CMSes can benefit from. > > > > > > > However, thinking more about it, the proposed approach in this patch just > > > expands the way how OVN can bind ports, utilizing the communication > > > channel of OVN (OVSDB connections). If all the information regarding port > > > binding can be specified by the CMS from NB, then it is not unnatural for > > > ovn-controller to perform interface binding directly (instead of > > > passively accepting what is attached by CMS). This kind of information > > > already existed to some extent - the "requested_chassis" option in > > > OpenStack. Now it seems this idea is just expanding it to a specific > > > interface. The difference is that "requested_chassis" is used for > > > validation only, but now we want to directly apply it. So I think at > > > least I don't have a strong opinion on the idea. > > > > While it's, probably, OK for OVN to add port to the OVSDB, in many > > cases these ports will require a lot of extra configuration which > > is typically done by os-vif or CNI/device plugins. Imagine that OVS > > is running with userspace datapath and you need to plug-in some DPDK > > ports, where you have to specify the port type, DPDK port config, > > porbably also number of rx/tx queues, number of descriptors in these > > queues. You may also want to configure affinity for these queues per > > PMD thread in OVS. For kernel interfaces it might be easier, but they > > also might require some extra configuration that OVN will have to > > think about now. This is a typical job for CMS to configure this > > kind of stuff, and that is why projects like os-vif or large variety > > of CNI/device plugins exists. > > CNI is Kubernetes specific, os-vif is OpenStack specific. And both of > them get their information from the CMS. Providing support for > userspace datapath and DPDK would require more information, some of > which is available through devlink, some fit well in key/value > options. Our initial target would be to support the kernel representor > port workflow. > > > > > > > There are some benefits: > > > 1) The mechanism can be reused by different CMSes, which may simplify CMS > > > implementation. > > > 2) Compared with the ovn-k8s approach, it reuses OVN's communication > > > channel, which avoids an extra CMS communication channel on the smart NIC > > > side. (of course this can be achieved by a connection between the BM and > > > smart NIC with *restricted* API just to convey the necessary information) > > > > The problem I see is that at least ovn-k8s, AFAIU, will require > > the daemon on the Smart NIC anyways to monitor and configure > > OVN components, e.g. to configure ovn-remote for ovn-controller > > or run management appctl commands if required. > > So, I don't see the point why it can't do plugging if it's already > > there. > > This pattern is Kubernetes specific and is not the case for other > CMSes. The current proposal for enabling Smart NIC with control plane > CPUs for ovn-kubernetes could be simplified if the networking platform > provided means for coordinating more of the network related bits. > > > > > > > As to the negative side, it would increase OVN's complexity, and as > > > mentioned by Ilya potentially breaks OVN's platform independence. To > > > avoid this, I think the *plugging* module itself needs to be independent > > > and pluggable. It can be extended as independent plugins. The plugin > > > would need to define what information is needed in LSP's "options", and > > > then implement corresponding drivers. With this approach, even the > > > regular VIFs can be attached by ovn-controller if CMS can tell the > > > interface name. Anyway, this is just my brief thinking. > > > > Aside from plugging, > > I also don't see the reason to have devlink code in OVN just > > because it runs once on the init stage, IIUC. So, I don't > > understand why this information about the hardware cannot be > > retrieved during the host provisioning and stored somewhere > > (Nova config?). Isn't hardware inventory a job for tripleo > > or something? > > As noted in one of the TODOs in the commit message of the RFC one of > the work items to further develop this is to monitor devlink for > changes, the end product will not do one-time initialization. > > While I agree with you that the current SR-IOV acceleration workflow > configuration is pretty static and can be done at deploy time, this > proposal prepares for the next generation subfunction workflow where > you will have a much higher density, and run-time configuration and > discovery of representor ports. There is a paper about it from netdev > conf[2], and all of this is part of the devlink infrastructure (Linux > kernel ABI). > > 2: https://netdevconf.info/0x14/pub/papers/45/0x14-paper45-talk-paper.pdf
My 2 cents. I agree with Ilya. It doesn't seem natural to me for OVN to create OVS ports and also to use devlink infrastructure. I think an external entity can do all these. The other downside I see with this RFC is that it will be now monitoring all the port binding rows without any condition because when a port binding is created the chassis column will be NULL. Correct me If I'm wrong here. Perhaps there can be a separate project called - ovn-vif which does this ? Thanks Numan > > > -- > Frode Nordahl > > > Best regards, Ilya Maximets. > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev