On Thu, Jun 10, 2021 at 10:13 AM Frode Nordahl
<frode.nord...@canonical.com> wrote:
>
> On Thu, Jun 10, 2021 at 1:46 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >
> > On 6/10/21 8:36 AM, Han Zhou wrote:
> > >
> > >
> > > On Thu, May 13, 2021 at 9:25 AM Frode Nordahl
> > > <frode.nord...@canonical.com> wrote:
> > >>
> > >> On Thu, May 13, 2021 at 5:12 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> > >> >
> > >> > On 5/9/21 4:03 PM, Frode Nordahl wrote:
> > >> > > Introduce plugging module that adds and removes ports on the
> > >> > > integration bridge, as directed by Port_Binding options.
> > >> > >
> > >> > > Traditionally it has been the CMS's responsibility to create Virtual
> > >> > > Interfaces (VIFs) as part of instance (Container, Pod, Virtual
> > >> > > Machine etc.) life cycle, and subsequently manage plug/unplug
> > >> > > operations on the Open vSwitch integration bridge.
> > >> > >
> > >> > > With the advent of NICs connected to multiple distinct CPUs we can
> > >> > > have a topology where the instance runs on one host and Open
> > >> > > vSwitch and OVN run on a different host, the smartnic CPU.
> > >> > >
> > >> > > The act of plugging and unplugging the representor port in Open
> > >> > > vSwitch running on the smartnic host CPU would be the same for
> > >> > > every smartnic variant (thanks to the devlink-port[0][1]
> > >> > > infrastructure) and every CMS (Kubernetes, LXD, OpenStack, etc.).
> > >> > > As such it is natural to extend OVN to provide this common
> > >> > > functionality through its CMS facing API.
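The plugging flow described in the commit message boils down to reconciling two sets: the interfaces that Port_Binding options ask this chassis to plug, and the interfaces the plugging module previously added to the integration bridge. A minimal Python sketch of that reconciliation, under stated assumptions: the option keys `requested-chassis` and `plug-iface`, the `PortBinding` class and the `reconcile` helper are hypothetical illustrations, not the RFC's actual code or schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical option keys; the RFC may name or structure these differently.
REQUESTED_CHASSIS = "requested-chassis"
PLUG_IFACE = "plug-iface"


@dataclass
class PortBinding:
    """Minimal stand-in for a Southbound Port_Binding row."""
    logical_port: str
    options: Dict[str, str] = field(default_factory=dict)


def desired_ifaces(bindings: List[PortBinding], chassis: str) -> Dict[str, str]:
    """Return {interface name: logical port} for bindings to plug on `chassis`."""
    desired = {}
    for pb in bindings:
        if pb.options.get(REQUESTED_CHASSIS) != chassis:
            continue  # meant for another chassis (or for none)
        iface = pb.options.get(PLUG_IFACE)
        if iface:
            desired[iface] = pb.logical_port
    return desired


def reconcile(bindings: List[PortBinding], chassis: str,
              managed: Set[str]) -> Tuple[List[str], List[str]]:
    """Compute (to_plug, to_unplug).

    `managed` holds only the ports this module added earlier (assumed to be
    tracked separately), so patch ports and CMS-created ports are never touched.
    """
    desired = set(desired_ifaces(bindings, chassis))
    return sorted(desired - managed), sorted(managed - desired)


bindings = [
    PortBinding("lsp1", {REQUESTED_CHASSIS: "dpu1", PLUG_IFACE: "pf0vf0"}),
    PortBinding("lsp2", {REQUESTED_CHASSIS: "dpu2", PLUG_IFACE: "pf0vf1"}),
    PortBinding("lsp3", {REQUESTED_CHASSIS: "dpu1"}),  # nothing to plug
]
print(reconcile(bindings, "dpu1", managed={"pf0vf5"}))  # (['pf0vf0'], ['pf0vf5'])
```

Diffing only against previously plugged ports is a deliberate choice in this sketch: a diff against every port on br-int would try to unplug patch ports and interfaces the CMS still manages itself.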
> > >> >
> > >> > Hi, Frode.  Thanks for putting this together, but it doesn't look
> > >> > natural to me.  OVN, AFAIK, never touched physical devices or
> > >> > interacted with the kernel directly.  This change introduces completely
> > >> > new functionality inside OVN.  With the same effect we can run a fully
> > >> > separate service on these smartnic CPUs that will do plugging
> > >> > and configuration job for CMS.  You may even make it independent
> > >> > from a particular CMS by creating a REST API for it or whatever.
> > >> > This will additionally allow using the same service for non-OVN setups.
> > >>
> > >> Ilya,
> > >>
> > >> Thank you for taking the time to comment, much appreciated.
> > >>
> > >> Yes, this is new functionality; NICs with separate control plane CPUs
> > >> and isolation from the host are also new, so this is one proposal for
> > >> how we could go about enabling their use.
> > >>
> > >> The OVN controller does today get pretty close to the physical realm
> > >> by maintaining patch ports in Open vSwitch based on bridge mapping
> > >> configuration and presence of bridges to physical interfaces. It also
> > >> reacts to events of physical interfaces being plugged into the
> > >> Open vSwitch instance it manages, although to date some other entity
> > >> has been adding the port to the bridge.
> > >>
> > >> The rationale for proposing to use the OVN database for coordinating
> > >> this is that the information about which ports to bind, and where to
> > >> bind them is already there. The timing of the information flow from
> > >> the CMS is also suitable for the task.
> > >>
> > >> OVN relies on OVS library code, and all the necessary libraries for
> > >> interfacing with the kernel through netlink and friends are there or
> > >> would be easy to add. The rationale for using the netlink-devlink
> > >> interface is that it provides a generic infrastructure for these types
> > >> of NICs. So by using this interface we should be able to support most
> > >> if not all of the variants of these cards.
> > >>
> > >>
> > >> Providing a separate OVN service to do the task could work, but would
> > >> have the cost of an extra SB DB connection, IDL and monitors.
> >
> > IMHO, a CMS should never connect to the Southbound DB, simply because
> > the Southbound DB is not meant to be a public interface; it just happens
> > to be available for connections.  I know that OpenStack has metadata
> > agents that connect to the SB DB, but if that's really required for them,
> > I think there should be a different way to get/set the required
> > information without connecting to the Southbound DB.
>
> The CMS-facing API is the Northbound DB; I was not suggesting direct
> use of the Southbound DB by services external to OVN. My suggestion
> was to have a separate OVN process do this if your objection was to
> handling it as part of the ovn-controller process.
>
> > >>
> > >> I fear it would be quite hard to build a whole separate project with
> > >> its own API; it feels like a lot of duplicated effort when the flow of
> > >> data and APIs in OVN already align so well with CMSs interested in
> > >> using this?
> > >>
> > >> > Interactions with physical devices also make OVN Linux-dependent,
> > >> > at least for this use case, IIUC.
> > >>
> > >> This specific bit would be Linux-specific in the first iteration, yes.
> > >> But the vendors manufacturing and distributing the hardware often
> > >> have drivers for other platforms. I am sure the necessary
> > >> infrastructure will become available there too over time, if it is not
> > >> there already.
> > >>
> > >> We do currently have platform-specific macros in the OVN build system,
> > >> so we could enable the functionality when built on a compatible
> > >> platform.
> > >>
> > >> > Maybe others have different opinions.
> > >>
> > >> I appreciate your opinion, and enjoy discussing this topic.
> > >>
> > >> > Another thought is that there is, obviously, a network connection
> > >> > between the host and smartnic system.  Maybe it's possible to just
> > >> > add an extra remote to the local ovsdb-server so the CMS daemon on the
> > >> > host system could just add interfaces over the network connection?
> > >>
> > >> There are a few issues with such an approach. One of the main goals
> > >> with providing and using a NIC with control plane CPUs is having an
> > >> extra layer of security and isolation which is separate from the
> > >> hypervisor host the card happens to share a PCI complex with and draw
> > >> power from. Requiring a connection between the two for operation would
> > >> defeat this purpose.
> > >>
> > >> In addition to that, this class of cards provides visibility into
> > >> kernel interfaces, enumeration of representor ports etc. only from the
> > >> NIC control plane CPU side of the PCI complex; this information is not
> > >> provided to the host. So if a hypervisor host CMS agent were to do the
> > >> plugging through a remote ovsdb connection, it would have to
> > >> communicate with something else running on the NIC control plane CPU
> > >> to retrieve the information it needs before it can know what to relay
> > >> back over the ovsdb connection.
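Since representor enumeration is only visible from the NIC control plane CPU, whatever does the plugging has to map a (PF, VF) pair to a representor netdev on that side. The RFC talks to netlink-devlink directly; purely for brevity, the sketch below instead parses JSON of the kind printed by `devlink -j port show`. The sample document is illustrative: attribute names such as `flavour`, `pfnum`, `vfnum` and `netdev` follow what devlink reports for VF representors, but the exact attribute set varies with kernel and driver, and the helper names are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple

# Illustrative output shaped like `devlink -j port show`; real output carries
# more attributes and differs between kernels and drivers.
SAMPLE = """
{"port": {
  "pci/0000:03:00.0/0": {"type": "eth", "netdev": "p0", "flavour": "physical"},
  "pci/0000:03:00.0/1": {"type": "eth", "netdev": "pf0vf0", "flavour": "pcivf",
                         "pfnum": 0, "vfnum": 0},
  "pci/0000:03:00.0/2": {"type": "eth", "netdev": "pf0vf1", "flavour": "pcivf",
                         "pfnum": 0, "vfnum": 1}
}}
"""


def vf_representors(devlink_json: str) -> Dict[Tuple[int, int], str]:
    """Map (pfnum, vfnum) to the representor netdev name for VF-flavoured ports."""
    ports = json.loads(devlink_json).get("port", {})
    reps = {}
    for attrs in ports.values():
        if attrs.get("flavour") == "pcivf" and "netdev" in attrs:
            reps[(attrs["pfnum"], attrs["vfnum"])] = attrs["netdev"]
    return reps


def lookup_representor(devlink_json: str, pf: int, vf: int) -> Optional[str]:
    """Return the representor for VF `vf` on PF `pf`, or None if not present."""
    return vf_representors(devlink_json).get((pf, vf))


print(lookup_representor(SAMPLE, 0, 1))  # pf0vf1
```

Because ports can appear at run time, a long-running implementation would re-read or subscribe to this information rather than cache it once at startup.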
> > >>
> > >> --
> > >> Frode Nordahl
> > >>
> > >> > Best regards, Ilya Maximets.
> > >
> > > Here are my 2 cents.
> > >
> > > Initially I had concerns similar to Ilya's, and it seemed OVN should stay
> > > away from physical interface plugging. As a reference, here is how
> > > ovn-kubernetes is doing it without adding anything to OVN:
> > > https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing
> >
> > AFAICT, a big part of the work is already done on the ovn-k8s side:
> >   https://github.com/ovn-org/ovn-kubernetes/pull/2005
> >   https://github.com/ovn-org/ovn-kubernetes/pull/2042
>
> I am aware of the ongoing effort to implement support for this in
> ovn-kubernetes directly. What we have identified is that there are
> other CMSes that want this functionality, and with that we have an
> opportunity to generalise and provide an abstraction in a common place
> that all the consuming CMSes can benefit from.
>
> > >
> > > However, thinking more about it, the proposed approach in this patch just 
> > > expands the ways in which OVN can bind ports, utilizing the communication
> > > channel of OVN (OVSDB connections). If all the information regarding port 
> > > binding can be specified by the CMS from NB, then it is not unnatural for 
> > > ovn-controller to perform interface binding directly (instead of 
> > > passively accepting what is attached by CMS). This kind of information 
> > > already existed to some extent: the "requested_chassis" option in
> > > OpenStack. Now it seems this idea is just expanding it to a specific 
> > > interface. The difference is that "requested_chassis" is used for 
> > > validation only, but now we want to directly apply it. So I think at 
> > > least I don't have a strong opinion on the idea.
> >
> > While it's probably OK for OVN to add ports to OVSDB, in many
> > cases these ports will require a lot of extra configuration, which
> > is typically done by os-vif or CNI/device plugins.  Imagine that OVS
> > is running with the userspace datapath and you need to plug in some
> > DPDK ports, where you have to specify the port type, DPDK port config,
> > probably also the number of rx/tx queues and the number of descriptors
> > in these queues.  You may also want to configure affinity of these
> > queues to PMD threads in OVS.  For kernel interfaces it might be
> > easier, but they also might require some extra configuration that OVN
> > will have to think about now.  Configuring this kind of stuff is a
> > typical job for the CMS, and that is why projects like os-vif and the
> > large variety of CNI/device plugins exist.
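As a concrete picture of the per-port DPDK configuration in question, the sketch below builds the Interface row such a port might need. The column and key names (`type` "dpdk", `options:dpdk-devargs`, `options:n_rxq`, `options:n_rxq_desc`, `options:n_txq_desc` and `other_config:pmd-rxq-affinity`) are Open vSwitch Interface settings; the helper function, its defaults and the example PCI address are illustrative assumptions.

```python
from typing import Dict, Optional


def dpdk_interface_row(name: str, pci_addr: str, n_rxq: int = 1,
                       rxq_desc: int = 2048, txq_desc: int = 2048,
                       rxq_affinity: Optional[Dict[int, int]] = None) -> dict:
    """Build a dict shaped like an OVS Interface row for a DPDK port.

    Values in OVSDB string maps are strings, hence the str() conversions.
    """
    row = {
        "name": name,
        "type": "dpdk",
        "options": {
            "dpdk-devargs": pci_addr,     # identifies the DPDK device
            "n_rxq": str(n_rxq),          # number of rx queues
            "n_rxq_desc": str(rxq_desc),  # descriptors per rx queue
            "n_txq_desc": str(txq_desc),  # descriptors per tx queue
        },
        "other_config": {},
    }
    if rxq_affinity:
        # "queue:core" pairs, e.g. "0:3,1:7" pins rxq 0 to core 3, rxq 1 to core 7.
        row["other_config"]["pmd-rxq-affinity"] = ",".join(
            f"{queue}:{core}" for queue, core in sorted(rxq_affinity.items()))
    return row


row = dpdk_interface_row("dpdk0", "0000:01:00.0", n_rxq=2, rxq_affinity={0: 3, 1: 7})
print(row["other_config"])  # {'pmd-rxq-affinity': '0:3,1:7'}
```

None of these values can be derived from the logical switch port alone, so some component would still have to supply them, for example as extra key/value options.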
>
> CNI is Kubernetes-specific, os-vif is OpenStack-specific, and both of
> them get their information from the CMS. Providing support for the
> userspace datapath and DPDK would require more information, some of
> which is available through devlink and some of which fits well in
> key/value options. Our initial target would be to support the kernel
> representor port workflow.
>
> > >
> > > There are some benefits:
> > > 1) The mechanism can be reused by different CMSes, which may simplify CMS 
> > > implementation.
> > > 2) Compared with the ovn-k8s approach, it reuses OVN's communication
> > > channel, which avoids an extra CMS communication channel on the smart NIC
> > > side. (Of course, this can also be achieved by a connection between the BM
> > > and the smart NIC with a *restricted* API just to convey the necessary
> > > information.)
> >
> > The problem I see is that at least ovn-k8s, AFAIU, will require
> > the daemon on the Smart NIC anyway to monitor and configure
> > OVN components, e.g. to configure ovn-remote for ovn-controller
> > or run management appctl commands if required.
> > So I don't see why it can't do the plugging if it's already
> > there.
>
> This pattern is Kubernetes-specific and is not the case for other
> CMSes. The current proposal for enabling Smart NIC with control plane
> CPUs for ovn-kubernetes could be simplified if the networking platform
> provided means for coordinating more of the network related bits.
>
> > >
> > > As to the negative side, it would increase OVN's complexity and, as
> > > mentioned by Ilya, potentially break OVN's platform independence. To
> > > avoid this, I think the *plugging* module itself needs to be independent
> > > and pluggable, extended through independent plugins. Each plugin
> > > would need to define what information it needs in the LSP's "options",
> > > and then implement the corresponding drivers. With this approach, even
> > > regular VIFs could be attached by ovn-controller if the CMS can tell it
> > > the interface name. Anyway, this is just my brief thinking.
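The pluggable design suggested here could look roughly like the registry below: each plugin declares the LSP options it needs and implements one driver hook. Everything in it (the `plug-type` selector key, the `plug:*` option names, the `PlugProvider` base class and the static representor table) is a hypothetical illustration of the idea, not an existing OVN interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class PlugProvider(ABC):
    """Hypothetical plugin: declares required LSP options and resolves an interface."""
    type_name: str = ""
    required_options: Tuple[str, ...] = ()

    @abstractmethod
    def interface_name(self, options: Dict[str, str]) -> str:
        """Return the name of the interface to add to the integration bridge."""


PROVIDERS: Dict[str, PlugProvider] = {}


def register(cls):
    """Class decorator that instantiates and registers a provider by type name."""
    PROVIDERS[cls.type_name] = cls()
    return cls


@register
class VifProvider(PlugProvider):
    """A regular VIF that the CMS identifies by interface name."""
    type_name = "vif"
    required_options = ("plug:ifname",)

    def interface_name(self, options):
        return options["plug:ifname"]


@register
class RepresentorProvider(PlugProvider):
    """A smartnic VF representor; a real driver would query devlink instead."""
    type_name = "representor"
    required_options = ("plug:pf-num", "plug:vf-num")
    _table = {(0, 0): "pf0vf0", (0, 1): "pf0vf1"}  # stand-in for a devlink lookup

    def interface_name(self, options):
        key = (int(options["plug:pf-num"]), int(options["plug:vf-num"]))
        return self._table[key]


def resolve(options: Dict[str, str]) -> str:
    """Pick the provider named by options["plug-type"] and validate its options."""
    plug_type = options.get("plug-type")
    provider = PROVIDERS.get(plug_type)
    if provider is None:
        raise LookupError(f"no plugging provider for type {plug_type!r}")
    missing = [key for key in provider.required_options if key not in options]
    if missing:
        raise ValueError(f"{plug_type}: missing options {missing}")
    return provider.interface_name(options)


print(resolve({"plug-type": "representor", "plug:pf-num": "0", "plug:vf-num": "1"}))
# pf0vf1
```

Keeping platform-specific code inside individual providers is what would let a non-Linux build simply omit the representor plugin, which speaks to the platform-independence concern.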
> >
> > Aside from plugging, I also don't see the reason to have devlink
> > code in OVN when, IIUC, it runs only once at the init stage.  So I
> > don't understand why this information about the hardware cannot be
> > retrieved during host provisioning and stored somewhere
> > (Nova config?).  Isn't hardware inventory a job for TripleO
> > or something?
>
> As noted in one of the TODOs in the commit message of the RFC, one of
> the work items for developing this further is to monitor devlink for
> changes; the end product will not do one-time initialization.
>
> While I agree with you that the current SR-IOV acceleration workflow
> configuration is pretty static and can be done at deploy time, this
> proposal prepares for the next generation subfunction workflow where
> you will have a much higher density, and run-time configuration and
> discovery of representor ports. There is a paper about it from netdev
> conf[2], and all of this is part of the devlink infrastructure (Linux
> kernel ABI).
>
> 2: https://netdevconf.info/0x14/pub/papers/45/0x14-paper45-talk-paper.pdf


My 2 cents.

I agree with Ilya.  It doesn't seem natural to me for OVN to create
OVS ports and also to use the devlink infrastructure.  I think an
external entity can do all of this.

The other downside I see with this RFC is that it will now be
monitoring all the port binding rows without any condition, because
when a port binding is created the chassis column will be NULL.
Correct me if I'm wrong here.
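One possible way to narrow that monitor, sketched under assumptions: if the CMS always sets a `requested-chassis` option on the port, and that option is present in the Port_Binding `options` column, ovn-controller could monitor only rows whose options name its own chassis instead of filtering on the still-empty `chassis` column. The clause uses RFC 7047 condition syntax, which conditional monitoring (`monitor_cond`) reuses; the local evaluator only mimics the server-side `includes` semantics for a map column, and the function names are hypothetical.

```python
import json


def requested_chassis_clause(chassis_name: str) -> list:
    """Build an RFC 7047 condition: [<column>, <function>, <value>].

    Map values use the ["map", [[key, value], ...]] encoding.
    """
    return ["options", "includes", ["map", [["requested-chassis", chassis_name]]]]


def row_matches(row: dict, clause: list) -> bool:
    """Evaluate an "includes" clause on a map column locally.

    For maps, "includes" holds when every key/value pair in the argument is
    present in the column. Other functions are out of scope for this sketch.
    """
    column, function, value = clause
    if function != "includes" or value[0] != "map":
        raise NotImplementedError(function)
    have = row.get(column, {})
    return all(have.get(key) == val for key, val in value[1])


rows = [
    {"logical_port": "a", "options": {"requested-chassis": "dpu1"}},
    {"logical_port": "b", "options": {}},
    {"logical_port": "c", "options": {"requested-chassis": "dpu2"}},
]
clause = requested_chassis_clause("dpu1")
print(json.dumps(clause))
print([r["logical_port"] for r in rows if row_matches(r, clause)])  # ['a']
```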

Perhaps there could be a separate project, called ovn-vif, which does this?

Thanks
Numan

>
>
> --
> Frode Nordahl
>
> > Best regards, Ilya Maximets.
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>