This looks like a great start. Let's get it in, and we can continue to refine it as necessary:
Acked-by: Justin Pettit <[email protected]> Would you please push this before the other patches in the series? Thanks, --Justin > On Feb 25, 2015, at 9:13 PM, Ben Pfaff <[email protected]> wrote: > > This commit adds preliminary design documentation for Open Virtual Network, > or OVN, a new OVS-based project to add support for virtual networking to > OVS, initially with OpenStack integration. > > This initial design has been influenced by many people, including (in > alphabetical order) Aaron Rosen, Chris Wright, Gurucharan Shetty, Jeremy > Stribling, Justin Pettit, Ken Duda, Kevin Benton, Kyle Mestery, Madhu > Venugopal, Martin Casado, Natasha Gude, Pankaj Thakkar, Russell Bryant, > Teemu Koponen, and Thomas Graf. All blunders, however, are due to my own > hubris. > > Signed-off-by: Ben Pfaff <[email protected]> > --- > v1->v2: Rebase. > v2->v3: > - Multiple CMSes are possible. > - Whitespace and typo fixes. > - ovn.ovsschema: Gateway table is not a root table, other tables are. > - ovn.xml: Talk about deleting rows on HV shutdown. > - ovn-nb.xml: Clarify 'switch' column in ACL table. > - ovn-nb.ovsschema: A Logical_Router_Port is no longer a Logical_Port. > - ovn.xml: Add action for generating ARP. > - ovn-nb.xml: Add allow-related action for security group support. > v3->v4: > - Add initial TODO list. > v4->v5: > - TODO: Revise default tunnel encapsulation thoughts. > - TODO: Fill in a few details for Neutron plugin. > - ovn-architecture: Mention DHCP as desirable.
> --- > Makefile.am | 1 + > configure.ac | 3 +- > ovn/TODO | 306 ++++++++++++++++++++++++++++ > ovn/automake.mk | 77 +++++++ > ovn/ovn-architecture.7.xml | 339 +++++++++++++++++++++++++++++++ > ovn/ovn-controller.8.in | 41 ++++ > ovn/ovn-nb.ovsschema | 62 ++++++ > ovn/ovn-nb.xml | 245 ++++++++++++++++++++++ > ovn/ovn.ovsschema | 50 +++++ > ovn/ovn.xml | 497 +++++++++++++++++++++++++++++++++++++++++++++ > 10 files changed, 1620 insertions(+), 1 deletion(-) > create mode 100644 ovn/TODO > create mode 100644 ovn/automake.mk > create mode 100644 ovn/ovn-architecture.7.xml > create mode 100644 ovn/ovn-controller.8.in > create mode 100644 ovn/ovn-nb.ovsschema > create mode 100644 ovn/ovn-nb.xml > create mode 100644 ovn/ovn.ovsschema > create mode 100644 ovn/ovn.xml > > diff --git a/Makefile.am b/Makefile.am > index 0480d20..699a580 100644 > --- a/Makefile.am > +++ b/Makefile.am > @@ -370,3 +370,4 @@ include tutorial/automake.mk > include vtep/automake.mk > include datapath-windows/automake.mk > include datapath-windows/include/automake.mk > +include ovn/automake.mk > diff --git a/configure.ac b/configure.ac > index d2d02ca..795f876 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -1,4 +1,4 @@ > -# Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014 Nicira, Inc. > +# Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Nicira, Inc. > # > # Licensed under the Apache License, Version 2.0 (the "License"); > # you may not use this file except in compliance with the License. > @@ -182,6 +182,7 @@ dnl This makes sure that include/openflow gets created in > the build directory. > AC_CONFIG_COMMANDS([include/openflow/openflow.h.stamp]) > > AC_CONFIG_COMMANDS([utilities/bugtool/dummy], [:]) > +AC_CONFIG_COMMANDS([ovn/dummy], [:]) > > m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES]) > > diff --git a/ovn/TODO b/ovn/TODO > new file mode 100644 > index 0000000..e405c7c > --- /dev/null > +++ b/ovn/TODO > @@ -0,0 +1,306 @@ > +* Flow match expression handling library. 
> + > + ovn-controller is the primary user of flow match expressions, but > + the same syntax, and I imagine the same code, ought to be useful in > + ovn-nbd for ACL match expressions. > + > +** Definition of data structures to represent a match expression as a > + syntax tree. > + > +** Definition of data structures to represent variables (fields). > + > + Fields need names and prerequisites. Most fields are numeric and > + thus need widths. We also need a way to represent nominal > + fields (currently just logical port names). It might be > + appropriate to associate fields directly with OXM/NXM code points; > + we have to decide whether we want OVN to use the OVS flow structure > + or work with OXM more directly. > + > + Probably should be defined so that the data structure is also > + useful for references to fields in action parsing. > + > +** Lexical analysis. > + > + Probably should be defined so that the lexer can be reused for > + parsing actions. > + > +** Parsing into syntax tree. > + > +** Semantic checking against variable definitions. > + > +** Applying prerequisites. > + > +** Simplification into conjunction-of-disjunctions (CoD) form. > + > +** Transformation from CoD form into OXM matches. > + > +* ovn-controller > + > +** Flow table handling in ovn-controller. > + > + ovn-controller has to transform logical datapath flows from the > + database into OpenFlow flows. > + > +*** Definition (or choice) of data structure for flows and flow table. > + > + It would be natural enough to use "struct flow" and "struct > + classifier" for this. Maybe that is what we should do. However, > + "struct classifier" is optimized for searches based on packet > + headers, whereas all we care about here can be implemented with a > + hash table. Also, we may want to make it easy to add and remove > + support for fields without recompiling, which is not possible with > + "struct flow" or "struct classifier".
> + > + On the other hand, we may find that it is difficult to decide that > + two OXM flow matches are identical (to normalize them) without a > + lot of domain-specific knowledge that is already embedded in struct > + flow. It's also going to be a pain to come up with a way to make > + anything other than "struct flow" work with the ofputil_*() > + functions for encoding and decoding OpenFlow. > + > + It's also possible we could use struct flow without struct > + classifier. > + > +*** Assembling conjunctive flows from flow match expressions. > + > + This transformation explodes logical datapath flows into multiple > + OpenFlow flow table entries, since a flow match expression in CoD > + form requires several OpenFlow flow table entries. It also > + requires merging together OpenFlow flow table entries that contain > + "conjunction" actions (really just concatenating their actions). > + > +*** Translating logical datapath port names into port numbers. > + > + Logical ports are specified by name in logical datapath flows, but > + OpenFlow only works in terms of numbers. > + > +*** Translating logical datapath actions into OpenFlow actions. > + > + Some of the logical datapath actions do not have natural > + representations as OpenFlow actions: they require > + packet-in/packet-out round trips through ovn-controller. The > + trickiest part of that is going to be making sure that the > + packet-out resumes the control flow that was broken off by the > + packet-in. That's tricky; we'll probably have to restrict control > + flow or add OVS features to make resuming in general possible. Not > + sure which is better at this point. > + > +*** OpenFlow flow table synchronization. > + > + The internal representation of the OpenFlow flow table has to be > + synced across the controller connection to OVS. This probably > + boils down to the "flow monitoring" feature of OF1.4, which was then > + made available as a "standard extension" to OF1.3.
(OVS hasn't > + implemented this for OF1.4 yet, but the feature is based on an OVS > + extension to OF1.0, so it should be straightforward to add it.) > + > + We probably need some way to catch cases where OVS and OVN don't > + see eye-to-eye on what exactly constitutes a flow, so that OVN > + doesn't waste a lot of CPU time hammering at OVS trying to install > + something that it's not going to do. > + > +*** Logical/physical translation. > + > + When a packet comes into the integration bridge, the first stage of > + processing needs to translate it from a physical to a logical > + context. When a packet leaves the integration bridge, the final > + stage of processing needs to translate it back into a physical > + context. ovn-controller needs to populate the OpenFlow flow > + tables to do these translations. > + > +*** Determine how to split logical pipeline across physical nodes. > + > + From the original OVN architecture document: > + > + The pipeline processing is split between the ingress and egress > + transport nodes. In particular, the logical egress processing may > + occur at either hypervisor. Processing the logical egress on the > + ingress hypervisor requires more state about the egress vif's > + policies, but reduces traffic on the wire that would eventually be > + dropped. Whereas, processing on the egress hypervisor can reduce > + broadcast traffic on the wire by doing local replication. We > + initially plan to process logical egress on the egress hypervisor > + so that less state needs to be replicated. However, we may change > + this behavior once we gain some experience writing the logical > + flows. > + > + The pipeline processing split will influence how tunnel keys > + are encoded. > + > +** Interaction with Open_vSwitch and OVN databases: > + > +*** Monitor VIFs attached to the integration bridge in Open_vSwitch. > + > + In response to changes, add or remove corresponding rows in > + the Bindings table in OVN.
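The VIF-monitoring step above boils down to a set difference between the iface-ids attached to the integration bridge and the logical ports for which this chassis currently has Bindings rows. A minimal sketch in Python; the function name and data shapes are assumptions for illustration, not code from this patch:

```python
def sync_bindings(interfaces, claimed_ports):
    """Illustrative sketch only (hypothetical shapes, not patch code).

    interfaces: {interface_name: external_ids dict} for Interface rows on
    the integration bridge, from the local Open_vSwitch database.
    claimed_ports: set of logical port names this chassis has Bindings
    rows for.  Returns (ports needing new rows, ports whose rows should
    be removed)."""
    # Each VIF is identified by external-ids:iface-id on its Interface row.
    local_ports = {ids["iface-id"] for ids in interfaces.values()
                   if "iface-id" in ids}
    to_add = local_ports - claimed_ports      # VIFs newly attached here
    to_remove = claimed_ports - local_ports   # VIFs that went away
    return to_add, to_remove
```

Running this on every Open_vSwitch database change notification would keep the Bindings table in step with the bridge.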
> + > +*** Populate Chassis row in OVN at startup. Maintain Chassis row over time. > + > + (Warn if any other Chassis claims the same IP address.) > + > +*** Remove Chassis and Bindings rows from OVN on exit. > + > +*** Monitor Chassis table in OVN. > + > + Populate Port records for tunnels to other chassis into > + Open_vSwitch database. As a scale optimization later on, one can > + populate only records for tunnels to other chassis that have > + logical networks in common with this one. > + > +*** Monitor Pipeline table in OVN, trigger flow table recomputation on > change. > + > +** ovn-controller parameters and configuration. > + > +*** Tunnel encapsulation to publish. > + > + Default: VXLAN? Geneve? > + > +*** Location of Open_vSwitch database. > + > + We can probably use the same default as ovs-vsctl. > + > +*** Location of OVN database. > + > + Probably no useful default. > + > +*** SSL configuration. > + > + Can probably get this from Open_vSwitch database. > + > +* ovn-nbd > + > +** Monitor OVN_Northbound database, trigger Pipeline recomputation on change. > + > +** Translate each OVN_Northbound entity into Pipeline logical datapath flows. > + > + We have to first sit down and figure out what the general > + translation of each entity is. The original OVN architecture > + description at > + http://openvswitch.org/pipermail/dev/2015-January/050380.html had > + some sketches of these, but they need to be completed and > + elaborated. > + > + Initially, the simplest way to do this is probably to write > + straight C code to do a full translation of the entire > + OVN_Northbound database into the format for the Pipeline table in > + the OVN database. As scale increases, this will probably be too > + inefficient since a small change in OVN_Northbound requires a full > + recomputation. At that point, we probably want to adopt a more > + systematic approach, such as something akin to the "nlog" system > + used in NVP (see Koponen et al. 
"Network Virtualization in > + Multi-tenant Datacenters", NSDI 2014). > + > +** Push logical datapath flows to Pipeline table. > + > +** Monitor OVN database Bindings table. > + > + Sync rows in the OVN Bindings table to the "up" column in the > + OVN_Northbound database. > + > +* ovsdb-server > + > + ovsdb-server should have adequate features for OVN but it probably > + needs work for scale and possibly for availability as deployments > + grow. Here are some thoughts. > + > + Andy Zhou is looking at these issues. > + > +** Scaling number of connections. > + > + In typical use today a given ovsdb-server has only a single-digit > + number of simultaneous connections. The OVN database will have a > + connection from every hypervisor. This use case needs testing and > + probably coding work. Here are some possible improvements. > + > +*** Reducing amount of data sent to clients. > + > + Currently, whenever a row monitored by a client changes, > + ovsdb-server sends the client every monitored column in the row, > + even if only one column changes. It might be valuable to reduce > + this only to the columns that changes. > + > + Also, whenever a column changes, ovsdb-server sends the entire > + contents of the column. It might be valuable, for columns that > + are sets or maps, to send only added or removed values or > + key-values pairs. > + > + Currently, clients monitor the entire contents of a table. It > + might make sense to allow clients to monitor only rows that > + satisfy specific criteria, e.g. to allow an ovn-controller to > + receive only Pipeline rows for logical networks on its hypervisor. > + > +*** Reducing redundant data and code within ovsdb-server. > + > + Currently, ovsdb-server separately composes database update > + information to send to each of its clients. This is fine for a > + small number of clients, but it wastes time and memory when > + hundreds of clients all want the same updates (as will be in the > + case in OVN). 
> + > + (This is somewhat opposed to the idea of letting a client monitor > + only some rows in a table, since that would increase the diversity > + among clients.) > + > +*** Multithreading. > + > + If it turns out that other changes don't let ovsdb-server scale > + adequately, we can multithread ovsdb-server. Initially one might > + only break protocol handling into separate threads, leaving the > + actual database work serialized through a lock. > + > +** Increasing availability. > + > + Database availability might become an issue. The OVN system > + shouldn't grind to a halt if the database becomes unavailable, but > + it would become impossible to bring VIFs up or down, etc. > + > + My current thought on how to increase availability is to add > + clustering to ovsdb-server, probably via the Raft consensus > + algorithm. As an experiment, I wrote an implementation of Raft > + for Open vSwitch that you can clone from: > + > + https://github.com/blp/ovs-reviews.git raft > + > +** Reducing startup time. > + > + As-is, if ovsdb-server restarts, every client will fetch a fresh > + copy of the part of the database that it cares about. With > + hundreds of clients, this could cause heavy CPU load on > + ovsdb-server and use excessive network bandwidth. It would be > + better to allow incremental updates even across connection loss. > + One way might be to use "Difference Digests" as described in > + Epstein et al., "What's the Difference? Efficient Set > + Reconciliation Without Prior Context". (I'm not yet aware of > + previous non-academic use of this technique.) > + > +* Miscellaneous: > + > +** Write ovn-nbctl utility. > + > + The idea here is that we need a utility to act on the OVN_Northbound > + database in a way similar to a CMS, so that we can do some testing > + without an actual CMS in the picture. > + > + No details yet. > + > +** Init scripts for ovn-controller (on HVs), ovn-nbd, OVN DB server. > + > +** Distribution packaging. 
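To give a feel for what an ovn-nbctl-style test utility involves, the sketch below builds the OVSDB ``transact'' request (per the OVSDB management protocol) that such a tool could send to create a Logical_Port row. The helper name and the example values are illustrative assumptions; the column names follow the ovn-nb.ovsschema in this patch:

```python
import json

def make_lport_insert(switch_uuid, name, mac):
    """Illustrative sketch (hypothetical helper, not patch code): build a
    JSON-RPC "transact" request inserting one Logical_Port row into the
    OVN_Northbound database."""
    op = {
        "op": "insert",
        "table": "Logical_Port",
        "row": {
            # Strong reference to the owning Logical_Switch row.
            "switch": ["uuid", switch_uuid],
            "name": name,                  # the CMS's vif-id
            "macs": ["set", [mac]],        # OVSDB set-typed column
        },
    }
    return json.dumps({"method": "transact",
                       "params": ["OVN_Northbound", op],
                       "id": 0})
```

A real ovn-nbctl would presumably sit on the OVSDB IDL rather than hand-rolling JSON-RPC, but the wire-level shape is the same.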
> + > +* Not yet scoped: > + > +** Neutron plugin. > + > +*** Create stackforge/networking-ovn repository based on OpenStack's > +cookiecutter git repo generator > + > +*** Document mappings between Neutron data model and the OVN northbound DB > + > +*** Create a Neutron ML2 mechanism driver that implements the mappings > +on Neutron resource requests > + > +*** Add synchronization for when we need to sanity check that the OVN > +northbound DB reflects the current state of the world as intended by > +Neutron (needed for various failure scenarios) > + > +** Gateways. > diff --git a/ovn/automake.mk b/ovn/automake.mk > new file mode 100644 > index 0000000..a4951dc > --- /dev/null > +++ b/ovn/automake.mk > @@ -0,0 +1,77 @@ > +# OVN schema and IDL > +EXTRA_DIST += ovn/ovn.ovsschema > +pkgdata_DATA += ovn/ovn.ovsschema > + > +# OVN E-R diagram > +# > +# If "python" or "dot" is not available, then we do not add a graphical diagram > +# to the documentation. > +if HAVE_PYTHON > +if HAVE_DOT > +ovn/ovn.gv: ovsdb/ovsdb-dot.in ovn/ovn.ovsschema > + $(AM_V_GEN)$(OVSDB_DOT) --no-arrows $(srcdir)/ovn/ovn.ovsschema > $@ > +ovn/ovn.pic: ovn/ovn.gv ovsdb/dot2pic > + $(AM_V_GEN)(dot -T plain < ovn/ovn.gv | $(PERL) $(srcdir)/ovsdb/dot2pic > -f 3) > $@.tmp && \ > + mv $@.tmp $@ > +OVN_PIC = ovn/ovn.pic > +OVN_DOT_DIAGRAM_ARG = --er-diagram=$(OVN_PIC) > +DISTCLEANFILES += ovn/ovn.gv ovn/ovn.pic > +endif > +endif > + > +# OVN schema documentation > +EXTRA_DIST += ovn/ovn.xml > +DISTCLEANFILES += ovn/ovn.5 > +man_MANS += ovn/ovn.5 > +ovn/ovn.5: \ > + ovsdb/ovsdb-doc ovn/ovn.xml ovn/ovn.ovsschema $(OVN_PIC) > + $(AM_V_GEN)$(OVSDB_DOC) \ > + $(OVN_DOT_DIAGRAM_ARG) \ > + --version=$(VERSION) \ > + $(srcdir)/ovn/ovn.ovsschema \ > + $(srcdir)/ovn/ovn.xml > $@.tmp && \ > + mv $@.tmp $@ > + > +# OVN northbound schema and IDL > +EXTRA_DIST += ovn/ovn-nb.ovsschema > +pkgdata_DATA += ovn/ovn-nb.ovsschema > + > +# OVN northbound E-R diagram > +# > +# If
"python" or "dot" is not available, then we do not add graphical diagram > +# to the documentation. > +if HAVE_PYTHON > +if HAVE_DOT > +ovn/ovn-nb.gv: ovsdb/ovsdb-dot.in ovn/ovn-nb.ovsschema > + $(AM_V_GEN)$(OVSDB_DOT) --no-arrows $(srcdir)/ovn/ovn-nb.ovsschema > $@ > +ovn/ovn-nb.pic: ovn/ovn-nb.gv ovsdb/dot2pic > + $(AM_V_GEN)(dot -T plain < ovn/ovn-nb.gv | $(PERL) > $(srcdir)/ovsdb/dot2pic -f 3) > [email protected] && \ > + mv [email protected] $@ > +OVN_NB_PIC = ovn/ovn-nb.pic > +OVN_NB_DOT_DIAGRAM_ARG = --er-diagram=$(OVN_NB_PIC) > +DISTCLEANFILES += ovn/ovn-nb.gv ovn/ovn-nb.pic > +endif > +endif > + > +# OVN northbound schema documentation > +EXTRA_DIST += ovn/ovn-nb.xml > +DISTCLEANFILES += ovn/ovn-nb.5 > +man_MANS += ovn/ovn-nb.5 > +ovn/ovn-nb.5: \ > + ovsdb/ovsdb-doc ovn/ovn-nb.xml ovn/ovn-nb.ovsschema $(OVN_NB_PIC) > + $(AM_V_GEN)$(OVSDB_DOC) \ > + $(OVN_NB_DOT_DIAGRAM_ARG) \ > + --version=$(VERSION) \ > + $(srcdir)/ovn/ovn-nb.ovsschema \ > + $(srcdir)/ovn/ovn-nb.xml > [email protected] && \ > + mv [email protected] $@ > + > +man_MANS += ovn/ovn-controller.8 ovn/ovn-architecture.7 > +EXTRA_DIST += ovn/ovn-controller.8.in ovn/ovn-architecture.7.xml > + > +SUFFIXES += .xml > +%: %.xml > + $(AM_V_GEN)$(run_python) $(srcdir)/build-aux/xml2nroff \ > + --version=$(VERSION) $< > [email protected] && mv [email protected] $@ > + > +EXTRA_DIST += ovn/TODO > diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml > new file mode 100644 > index 0000000..9ffa036 > --- /dev/null > +++ b/ovn/ovn-architecture.7.xml > @@ -0,0 +1,339 @@ > +<?xml version="1.0" encoding="utf-8"?> > +<manpage program="ovn-architecture" section="7" title="OVN Architecture"> > + <h1>Name</h1> > + <p>ovn-architecture -- Open Virtual Network architecture</p> > + > + <h1>Description</h1> > + > + <p> > + OVN, the Open Virtual Network, is a system to support virtual network > + abstraction. 
OVN complements the existing capabilities of OVS to add > + native support for virtual network abstractions, such as virtual L2 and > L3 > + overlays and security groups. Services such as DHCP are also desirable > + features. Just like OVS, OVN aims to have a > production-quality > + implementation that can operate at significant scale. > + </p> > + > + <p> > + An OVN deployment consists of several components: > + </p> > + > + <ul> > + <li> > + <p> > + A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is > + OVN's ultimate client (via its users and administrators). OVN > + integration requires installing a CMS-specific plugin and > + related software (see below). OVN initially targets OpenStack > + as its CMS. > + </p> > + > + <p> > + We generally speak of ``the'' CMS, but one can imagine scenarios in > + which multiple CMSes manage different parts of an OVN deployment. > + </p> > + </li> > + > + <li> > + An OVN Database physical or virtual node (or, eventually, cluster) > + installed in a central location. > + </li> > + > + <li> > + One or more (usually many) <dfn>hypervisors</dfn>. Hypervisors must > run > + Open vSwitch and implement the interface described in > + <code>IntegrationGuide.md</code> in the OVS source tree. Any > hypervisor > + platform supported by Open vSwitch is acceptable. > + </li> > + > + <li> > + <p> > + Zero or more <dfn>gateways</dfn>. A gateway extends a tunnel-based > + logical network into a physical network by bidirectionally forwarding > + packets between tunnels and a physical Ethernet port. This allows > + non-virtualized machines to participate in logical networks. A gateway > + may be a physical host, a virtual machine, or an ASIC-based hardware > + switch that supports the <code>vtep</code>(5) schema. (Support for the > + latter will come later in the OVN implementation.) > + </p> > + > + <p> > + Hypervisors and gateways are together called <dfn>transport nodes</dfn> > + or <dfn>chassis</dfn>.
> + </p> > + </li> > + </ul> > + > + <p> > + The diagram below shows how the major components of OVN and related > + software interact. Starting at the top of the diagram, we have: > + </p> > + > + <ul> > + <li> > + The Cloud Management System, as defined above. > + </li> > + > + <li> > + <p> > + The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that > + interfaces to OVN. In OpenStack, this is a Neutron plugin. > + The plugin's main purpose is to translate the CMS's notion of logical > + network configuration, stored in the CMS's configuration database in a > + CMS-specific format, into an intermediate representation understood by > + OVN. > + </p> > + > + <p> > + This component is necessarily CMS-specific, so a new plugin needs to be > + developed for each CMS that is integrated with OVN. All of the > + components below this one in the diagram are CMS-independent. > + </p> > + </li> > + > + <li> > + <p> > + The <dfn>OVN Northbound Database</dfn> receives the intermediate > + representation of logical network configuration passed down by the > + OVN/CMS Plugin. The database schema is meant to be ``impedance > + matched'' with the concepts used in a CMS, so that it directly supports > + notions of logical switches, routers, ACLs, and so on. See > + <code>ovn-nb</code>(5) for details. > + </p> > + > + <p> > + The OVN Northbound Database has only two clients: the OVN/CMS Plugin > + above it and <code>ovn-nbd</code> below it. > + </p> > + </li> > + > + <li> > + <code>ovn-nbd</code>(8) connects to the OVN Northbound Database above > it > + and the OVN Database below it. It translates the logical network > + configuration in terms of conventional network concepts, taken from the > + OVN Northbound Database, into logical datapath flows in the OVN > Database > + below it. > + </li> > + > + <li> > + <p> > + The <dfn>OVN Database</dfn> is the center of the system.
Its clients > + are <code>ovn-nbd</code>(8) above it and <code>ovn-controller</code>(8) > + on every transport node below it. > + </p> > + > + <p> > + The OVN Database contains three kinds of data: <dfn>Physical > + Network</dfn> (PN) tables that specify how to reach hypervisor and > + other nodes, <dfn>Logical Network</dfn> (LN) tables that describe the > + logical network in terms of ``logical datapath flows,'' and > + <dfn>Binding</dfn> tables that link logical network components' > + locations to the physical network. The hypervisors populate the PN and > + Binding tables, whereas <code>ovn-nbd</code>(8) populates the LN > + tables. > + </p> > + > + <p> > + OVN Database performance must scale with the number of transport nodes. > + This will likely require some work on <code>ovsdb-server</code>(1) as > + we encounter bottlenecks. Clustering for availability may be needed. > + </p> > + </li> > + </ul> > + > + <p> > + The remaining components are replicated onto each hypervisor: > + </p> > + > + <ul> > + <li> > + <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and > + software gateway. Northbound, it connects to the OVN Database to learn > + about OVN configuration and status and to populate the PN and > <code>Bindings</code> > + tables with the hypervisor's status. Southbound, it connects to > + <code>ovs-vswitchd</code>(8) as an OpenFlow controller, for control > over > + network traffic, and to the local <code>ovsdb-server</code>(1) to allow > + it to monitor and control Open vSwitch configuration. > + </li> > + > + <li> > + <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are > + conventional components of Open vSwitch. 
> + </li> > + </ul> > + > + <pre fixed="yes"> > + CMS > + | > + | > + +-----------|-----------+ > + | | | > + | OVN/CMS Plugin | > + | | | > + | | | > + | OVN Northbound DB | > + | | | > + | | | > + | ovn-nbd | > + | | | > + +-----------|-----------+ > + | > + | > + +------+ > + |OVN DB| > + +------+ > + | > + | > + +------------------+------------------+ > + | | | > + HV 1 | | HV n | > ++---------------|---------------+ . +---------------|---------------+ > +| | | . | | | > +| ovn-controller | . | ovn-controller | > +| | | | . | | | | > +| | | | | | | | > +| ovs-vswitchd ovsdb-server | | ovs-vswitchd ovsdb-server | > +| | | | > ++-------------------------------+ +-------------------------------+ > + </pre> > + > + <h3>Life Cycle of a VIF</h3> > + > + <p> > + Tables and their schemas presented in isolation are difficult to > + understand. Here's an example. > + </p> > + > + <p> > + The steps in this example refer often to details of the OVN and OVN > + Northbound database schemas. Please see <code>ovn</code>(5) and > + <code>ovn-nb</code>(5), respectively, for the full story on these > + databases. > + </p> > + > + <ol> > + <li> > + A VIF's life cycle begins when a CMS administrator creates a new VIF > + using the CMS user interface or API and adds it to a switch (one > + implemented by OVN as a logical switch). The CMS updates its own > + configuration. This includes associating a unique, persistent identifier > + <var>vif-id</var> and Ethernet address <var>mac</var> with the VIF. > + </li> > + > + <li> > + The CMS plugin updates the OVN Northbound database to include the new > + VIF, by adding a row to the <code>Logical_Port</code> table. In the > new > + row, <code>name</code> is <var>vif-id</var>, <code>mac</code> is > + <var>mac</var>, <code>switch</code> points to the OVN logical switch's > + Logical_Switch record, and other columns are initialized appropriately. > + </li> > + > + <li> > + <code>ovn-nbd</code> receives the OVN Northbound database update.
In > + turn, it makes the corresponding updates to the OVN database, by adding > + rows to the OVN database <code>Pipeline</code> table to reflect the new > + port, e.g. add a flow to recognize that packets destined to the new > + port's MAC address should be delivered to it, and update the flow that > + delivers broadcast and multicast packets to include the new port. > + </li> > + > + <li> > + On every hypervisor, <code>ovn-controller</code> receives the > + <code>Pipeline</code> table updates that <code>ovn-nbd</code> made in > the > + previous step. As long as the VM that owns the VIF is powered off, > + <code>ovn-controller</code> cannot do much; it cannot, for example, > + arrange to send packets to or receive packets from the VIF, because the > + VIF does not actually exist anywhere. > + </li> > + > + <li> > + Eventually, a user powers on the VM that owns the VIF. On the > hypervisor > + where the VM is powered on, the integration between the hypervisor and > + Open vSwitch (described in <code>IntegrationGuide.md</code>) adds the > VIF > + to the OVN integration bridge and stores <var>vif-id</var> in > + <code>external-ids</code>:<code>iface-id</code> to indicate that the > + interface is an instantiation of the new VIF. (None of this code is > new > + in OVN; this is pre-existing integration work that has already been > done > + on hypervisors that support OVS.) > + </li> > + > + <li> > + On the hypervisor where the VM is powered on, > <code>ovn-controller</code> > + notices <code>external-ids</code>:<code>iface-id</code> in the new > + Interface. In response, it updates the local hypervisor's OpenFlow > + tables so that packets to and from the VIF are properly handled. > + Afterward, it updates the <code>Bindings</code> table in the OVN DB, > + adding a row that links the logical port from > + <code>external-ids</code>:<code>iface-id</code> to the hypervisor.
> + </li> > + > + <li> > + Some CMS systems, including OpenStack, fully start a VM only when its > + networking is ready. To support this, <code>ovn-nbd</code> notices the > + new row in the <code>Bindings</code> table, and pushes this upward by > + updating the <ref column="up" table="Logical_Port" db="OVN_NB"/> column > + in the OVN Northbound database's <ref table="Logical_Port" > db="OVN_NB"/> > + table to indicate that the VIF is now up. The CMS, if it uses this > + feature, can then react by allowing the VM's execution to proceed. > + </li> > + > + <li> > + On every hypervisor but the one where the VIF resides, > + <code>ovn-controller</code> notices the new row in the > + <code>Bindings</code> table. This provides <code>ovn-controller</code> > + the physical location of the logical port, so each instance updates the > + OpenFlow tables of its switch (based on logical datapath flows in the > OVN > + DB <code>Pipeline</code> table) so that packets to and from the VIF can > + be properly handled via tunnels. > + </li> > + > + <li> > + Eventually, a user powers off the VM that owns the VIF. On the > + hypervisor where the VM was powered on, the VIF is deleted from the OVN > + integration bridge. > + </li> > + > + <li> > + On the hypervisor where the VM was powered on, > + <code>ovn-controller</code> notices that the VIF was deleted. In > + response, it removes the logical port's row from the > + <code>Bindings</code> table. > + </li> > + > + <li> > + On every hypervisor, <code>ovn-controller</code> notices the row > removed > + from the <code>Bindings</code> table. This means that > + <code>ovn-controller</code> no longer knows the physical location of > the > + logical port, so each instance updates its OpenFlow table to reflect > + that. > + </li> > + > + <li> > + Eventually, when the VIF (or its entire VM) is no longer needed by > + anyone, an administrator deletes the VIF using the CMS user interface > or > + API. The CMS updates its own configuration. 
> + </li> > + > + <li> > + The CMS plugin removes the VIF from the OVN Northbound database, > + by deleting its row in the <code>Logical_Port</code> table. > + </li> > + > + <li> > + <code>ovn-nbd</code> receives the OVN Northbound update and in turn > + updates the OVN database accordingly, by removing or updating the > + rows from the OVN database <code>Pipeline</code> table that were > related > + to the now-destroyed VIF. > + </li> > + > + <li> > + On every hypervisor, <code>ovn-controller</code> receives the > + <code>Pipeline</code> table updates that <code>ovn-nbd</code> made in > the > + previous step. <code>ovn-controller</code> updates OpenFlow tables to > + reflect the update, although there may not be much to do, since the VIF > + had already become unreachable when it was removed from the > + <code>Bindings</code> table in a previous step. > + </li> > + </ol> > + > +</manpage> > diff --git a/ovn/ovn-controller.8.in b/ovn/ovn-controller.8.in > new file mode 100644 > index 0000000..59fcb59 > --- /dev/null > +++ b/ovn/ovn-controller.8.in > @@ -0,0 +1,41 @@ > +.\" -*- nroff -*- > +.de IQ > +. br > +. ns > +. IP "\\$1" > +.. > +.TH ovn\-controller 8 "@VERSION@" "Open vSwitch" "Open vSwitch Manual" > +.ds PN ovn\-controller > +. > +.SH NAME > +ovn\-controller \- OVN local controller > +. > +.SH SYNOPSIS > +\fBovn\-controller\fR [\fIoptions\fR] > +. > +.SH DESCRIPTION > +\fBovn\-controller\fR is the local controller daemon for OVN, the Open > +Virtual Network. It connects northbound to the OVN database (see > +\fBovn\fR(5)) over the OVSDB protocol, and southbound to the Open > +vSwitch database (see \fBovs\-vswitchd.conf.db\fR(5)) over the OVSDB > +protocol and to \fBovs\-vswitchd\fR(8) via OpenFlow. Each hypervisor > +and software gateway in an OVN deployment runs its own independent > +copy of \fBovn\-controller\fR; thus, \fBovn\-controller\fR's > +southbound connections are machine-local and do not run over a > +physical network.
> +.PP > +XXX this is completely skeletal. > +. > +.SH OPTIONS > +.SS "Public Key Infrastructure Options" > +.so lib/ssl.man > +.so lib/ssl-peer-ca-cert.man > +.ds DD > +.so lib/daemon.man > +.so lib/vlog.man > +.so lib/unixctl.man > +.so lib/common.man > +. > +.SH "SEE ALSO" > +. > +\fBovn\-architecture\fR(7) > diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema > new file mode 100644 > index 0000000..ad675ac > --- /dev/null > +++ b/ovn/ovn-nb.ovsschema > @@ -0,0 +1,62 @@ > +{ > + "name": "OVN_Northbound", > + "tables": { > + "Logical_Switch": { > + "columns": { > + "router_port": {"type": {"key": {"type": "uuid", > + "refTable": > "Logical_Router_Port", > + "refType": "strong"}, > + "min": 0, "max": 1}}, > + "external_ids": { > + "type": {"key": "string", "value": "string", > + "min": 0, "max": "unlimited"}}}}, > + "Logical_Port": { > + "columns": { > + "switch": {"type": {"key": {"type": "uuid", > + "refTable": "Logical_Switch", > + "refType": "strong"}}}, > + "name": {"type": "string"}, > + "macs": {"type": {"key": "string", > + "min": 0, > + "max": "unlimited"}}, > + "port_security": {"type": {"key": "string", > + "min": 0, > + "max": "unlimited"}}, > + "up": {"type": {"key": "boolean", "min": 0, "max": 1}}, > + "external_ids": { > + "type": {"key": "string", "value": "string", > + "min": 0, "max": "unlimited"}}}, > + "indexes": [["name"]]}, > + "ACL": { > + "columns": { > + "switch": {"type": {"key": {"type": "uuid", > + "refTable": "Logical_Switch", > + "refType": "strong"}}}, > + "priority": {"type": {"key": {"type": "integer", > + "minInteger": 0, > + "maxInteger": 65535}}}, > + "match": {"type": "string"}, > + "action": {"type": {"key": {"type": "string", > + "enum": ["set", ["allow", > "allow-related", "drop", "reject"]]}}}, > + "log": {"type": "boolean"}, > + "external_ids": { > + "type": {"key": "string", "value": "string", > + "min": 0, "max": "unlimited"}}}}, > + "Logical_Router": { > + "columns": { > + "ip": {"type": "string"}, > + 
"default_gw": {"type": {"key": "string", "min": 0, "max": > 1}}, > + "external_ids": { > + "type": {"key": "string", "value": "string", > + "min": 0, "max": "unlimited"}}}}, > + "Logical_Router_Port": { > + "columns": { > + "router": {"type": {"key": {"type": "uuid", > + "refTable": "Logical_Router", > + "refType": "strong"}}}, > + "network": {"type": "string"}, > + "mac": {"type": "string"}, > + "external_ids": { > + "type": {"key": "string", "value": "string", > + "min": 0, "max": "unlimited"}}}}}, > + "version": "1.0.0"} > diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml > new file mode 100644 > index 0000000..80190ca > --- /dev/null > +++ b/ovn/ovn-nb.xml > @@ -0,0 +1,245 @@ > +<?xml version="1.0" encoding="utf-8"?> > +<database name="ovn-nb" title="OVN Northbound Database"> > + <p> > + This database is the interface between OVN and the cloud management > system > + (CMS), such as OpenStack, running above it. The CMS produces almost all > of > + the contents of the database. The <code>ovn-nbd</code> program monitors > + the database contents, transforms them, and stores them into the <ref > + db="OVN"/> database. > + </p> > + > + <p> > + We generally speak of ``the'' CMS, but one can imagine scenarios in > + which multiple CMSes manage different parts of an OVN deployment. > + </p> > + > + <h2>External IDs</h2> > + > + <p> > + Each of the tables in this database contains a special column, named > + <code>external_ids</code>. This column has the same form and purpose > each > + place it appears. > + </p> > + > + <dl> > + <dt><code>external_ids</code>: map of string-string pairs</dt> > + <dd> > + Key-value pairs for use by the CMS. The CMS might use certain pairs, > for > + example, to identify entities in its own configuration that correspond > to > + those in this database. > + </dd> > + </dl> > + > + <table name="Logical_Switch" title="L2 logical switch"> > + <p> > + Each row represents one L2 logical switch.
A given switch's ports are > + the <ref table="Logical_Port"/> rows whose <ref table="Logical_Port" > + column="switch"/> column points to its row. > + </p> > + > + <column name="router_port"> > + <p> > + The router port to which this logical switch is connected, or empty > if > + this logical switch is not connected to any router. A switch may be > + connected to at most one logical router, but this is not a > significant > + restriction because logical routers may be connected into arbitrary > + topologies. > + </p> > + </column> > + > + <group title="Common Columns"> > + <column name="external_ids"> > + See <em>External IDs</em> at the beginning of this document. > + </column> > + </group> > + </table> > + > + <table name="Logical_Port" title="L2 logical switch port"> > + <p> > + A port within an L2 logical switch. > + </p> > + > + <column name="switch"> > + The logical switch to which the logical port is connected. > + </column> > + > + <column name="name"> > + The logical port name. The name used here must match those used in the > + <ref key="iface-id" table="Interface" column="external_ids" > + db="Open_vSwitch"/> in the <ref db="Open_vSwitch"/> database's <ref > + table="Interface" db="Open_vSwitch"/> table, because hypervisors use > <ref > + key="iface-id" table="Interface" column="external_ids" > + db="Open_vSwitch"/> as a lookup key for logical ports. > + </column> > + > + <column name="up"> > + This column is populated by <code>ovn-nbd</code>, rather than by the > CMS > + plugin as is most of this database. When a logical port is bound to a > + physical location in the OVN database <ref db="OVN" table="Bindings"/> > + table, <code>ovn-nbd</code> sets this column to <code>true</code>; > + otherwise, or if the port becomes unbound later, it sets it to > + <code>false</code>. This allows the CMS to wait for a VM's networking > to > + become active before it allows the VM to start. 
> + </column> > + > + <column name="macs"> > + The logical port's own Ethernet address or addresses, each in the form > + > <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>. > + Like a physical Ethernet NIC, a logical port ordinarily has a single > + fixed Ethernet address. The string <code>unknown</code> is also > allowed > + to indicate that the logical port has an unknown set of (additional) > + source addresses. > + </column> > + > + <column name="port_security"> > + <p> > + A set of L2 (Ethernet) or L3 (IPv4 or IPv6) addresses or L2+L3 pairs > + from which the logical port is allowed to send packets and to which > it > + is allowed to receive packets. If this column is empty, all > addresses > + are permitted. > + </p> > + > + <p> > + Exact syntax is TBD. One could simply use comma- or space-separated > L2 > + and L3 addresses in each set member, or replace this by a subset of > the > + general-purpose expression language used for the <ref column="match" > + table="Pipeline" db="OVN"/> column in the OVN database's <ref > + table="Pipeline" db="OVN"/> table. > + </p> > + </column> > + > + <group title="Common Columns"> > + <column name="external_ids"> > + See <em>External IDs</em> at the beginning of this document. > + </column> > + </group> > + </table> > + > + <table name="ACL" title="Access Control List (ACL) rule"> > + <p> > + Each row in this table represents one ACL rule for the logical switch > in > + its <ref column="switch"/> column. The <ref column="action"/> column > for > + the highest-<ref column="priority"/> matching row in this table > + determines a packet's treatment. If no row matches, packets are > allowed > + by default. (Default-deny treatment is possible: add a rule with <ref > + column="priority"/> 0, <code>true</code> as <ref column="match"/>, and > + <code>drop</code> as <ref column="action"/>.) > + </p> > + > + <column name="switch"> > + The switch to which the ACL rule applies.
The expression in the > + <ref column="match"/> column may match against logical ports > + within this switch. > + </column> > + > + <column name="priority"> > + The ACL rule's priority. Rules with numerically higher priority take > + precedence over those with lower. If two ACL rules with the same > + priority both match, then the one actually applied to a packet is > + undefined. > + </column> > + > + <column name="match"> > + The packets that the ACL should match, in the same expression language > + used for the <ref column="match" table="Pipeline" db="OVN"/> column in > + the OVN database's <ref table="Pipeline" db="OVN"/> table. Match > + <code>inport</code> and <code>outport</code> against names of logical > + ports within <ref column="switch"/> to implement ingress and egress > ACLs, > + respectively. In logical switches connected to logical routers, the > + special port name <code>ROUTER</code> refers to the logical router > port. > + </column> > + > + <column name="action"> > + <p>The action to take when the ACL rule matches:</p> > + > + <ul> > + <li> > + <code>allow</code>: Forward the packet. > + </li> > + > + <li> > + <code>allow-related</code>: Forward the packet and related traffic > + (e.g. inbound replies to an outbound connection). > + </li> > + > + <li> > + <code>drop</code>: Silently drop the packet. > + </li> > + > + <li> > + <code>reject</code>: Drop the packet, replying with a RST for TCP or > + ICMP unreachable message for other IP-based protocols. > + </li> > + </ul> > + </column> > + > + <column name="log"> > + If set to <code>true</code>, packets that match the ACL will trigger a > + log message on the transport node or nodes that perform ACL processing. > + Logging may be combined with any <ref column="action"/>. > + </column> > + > + <group title="Common Columns"> > + <column name="external_ids"> > + See <em>External IDs</em> at the beginning of this document. 
> + </column> > + </group> > + </table> > + > + <table name="Logical_Router" title="L3 logical router"> > + <p> > + Each row represents one L3 logical router. A given router's ports are > + the <ref table="Logical_Router_Port"/> rows whose <ref > + table="Logical_Router_Port" column="router"/> column points to its row. > + </p> > + > + <column name="ip"> > + The logical router's own IP address. The logical router uses this > + address for ICMP replies (e.g. network unreachable messages) and other > + traffic that it originates and responds to traffic destined to this > + address (e.g. ICMP echo requests). > + </column> > + > + <column name="default_gw"> > + IP address to use as default gateway, if any. > + </column> > + > + <group title="Common Columns"> > + <column name="external_ids"> > + See <em>External IDs</em> at the beginning of this document. > + </column> > + </group> > + </table> > + > + <table name="Logical_Router_Port" title="L3 logical router port"> > + <p> > + A port within an L3 logical router. > + </p> > + > + <p> > + A router port is always attached to a switch port. The connection can > be > + identified by following the <ref column="router_port" > + table="Logical_Port"/> column from an appropriate <ref > + table="Logical_Port"/> row. > + </p> > + > + <column name="router"> > + The router to which the port belongs. > + </column> > + > + <column name="network"> > + The IP network and netmask of the network on the router port. Used for > + routing. > + </column> > + > + <column name="mac"> > + The Ethernet address that belongs to this router port. > + </column> > + > + <group title="Common Columns"> > + <column name="external_ids"> > + See <em>External IDs</em> at the beginning of this document. 
> + </column> > + </group> > + </table> > +</database> > diff --git a/ovn/ovn.ovsschema b/ovn/ovn.ovsschema > new file mode 100644 > index 0000000..5597df4 > --- /dev/null > +++ b/ovn/ovn.ovsschema > @@ -0,0 +1,50 @@ > +{ > + "name": "OVN", > + "tables": { > + "Chassis": { > + "columns": { > + "name": {"type": "string"}, > + "encap": {"type": {"key": {"type": "string", > + "enum": ["set", ["stt", "vxlan", > "gre"]]}}}, > + "encap_options": {"type": {"key": "string", > + "value": "string", > + "min": 0, > + "max": "unlimited"}}, > + "ip": {"type": "string"}, > + "gateway_ports": {"type": {"key": "string", > + "value": {"type": "uuid", > + "refTable": "Gateway", > + "refType": "strong"}, > + "min": 0, > + "max": "unlimited"}}}, > + "isRoot": true, > + "indexes": [["name"]]}, > + "Gateway": { > + "columns": {"attached_port": {"type": "string"}, > + "vlan_map": {"type": {"key": {"type": "integer", > + "minInteger": 0, > + "maxInteger": 4095}, > + "value": {"type": "string"}, > + "min": 0, > + "max": "unlimited"}}}}, > + "Pipeline": { > + "columns": { > + "table_id": {"type": {"key": {"type": "integer", > + "minInteger": 0, > + "maxInteger": 127}}}, > + "priority": {"type": {"key": {"type": "integer", > + "minInteger": 0, > + "maxInteger": 65535}}}, > + "match": {"type": "string"}, > + "actions": {"type": "string"}}, > + "isRoot": true}, > + "Bindings": { > + "columns": { > + "logical_port": {"type": "string"}, > + "chassis": {"type": "string"}, > + "mac": {"type": {"key": "string", > + "min": 0, > + "max": "unlimited"}}}, > + "indexes": [["logical_port"]], > + "isRoot": true}}, > + "version": "1.0.0"} > diff --git a/ovn/ovn.xml b/ovn/ovn.xml > new file mode 100644 > index 0000000..a233112 > --- /dev/null > +++ b/ovn/ovn.xml > @@ -0,0 +1,497 @@ > +<?xml version="1.0" encoding="utf-8"?> > +<database name="ovn" title="OVN Database"> > + <p> > + This database holds logical and physical configuration and state for the > + Open Virtual Network (OVN) system to support virtual 
network abstraction. > + For an introduction to OVN, please see <code>ovn-architecture</code>(7). > + </p> > + > + <p> > + The OVN database sits at the center of the OVN architecture. It is the > one > + component that speaks both southbound directly to all the hypervisors and > + gateways, via <code>ovn-controller</code>, and northbound to the Cloud > + Management System, via <code>ovn-nbd</code>: > + </p> > + > + <h2>Database Structure</h2> > + > + <p> > + The OVN database contains three classes of data with different > properties, > + as described in the sections below. > + </p> > + > + <h3>Physical Network (PN) data</h3> > + > + <p> > + PN tables contain information about the chassis nodes in the system. > This > + contains all the information necessary to wire the overlay, such as IP > + addresses, supported tunnel types, and security keys. > + </p> > + > + <p> > + The amount of PN data is small (O(n) in the number of chassis) and it > + changes infrequently, so it can be replicated to every chassis. > + </p> > + > + <p> > + The <ref table="Chassis"/> and <ref table="Gateway"/> tables comprise the > + PN tables. > + </p> > + > + <h3>Logical Network (LN) data</h3> > + > + <p> > + LN tables contain the topology of logical switches and routers, ACLs, > + firewall rules, and everything needed to describe how packets traverse a > + logical network, represented as logical datapath flows (see Logical > + Datapath Flows, below). > + </p> > + > + <p> > + LN data may be large (O(n) in the number of logical ports, ACL rules, > + etc.). Thus, to improve scaling, each chassis should receive only data > + related to logical networks in which that chassis participates. Past > + experience shows that in the presence of large logical networks, even > + finer-grained partitioning of data, e.g. designing logical flows so that > + only the chassis hosting a logical port needs related flows, pays off > + scale-wise. 
(This is not necessary initially but it is worth bearing in > + mind in the design.) > + </p> > + > + <p> > + The LN is a slave of the cloud management system running northbound of > OVN. > + That CMS determines the entire OVN logical configuration and therefore > the > + LN's content at any given time is a deterministic function of the CMS's > + configuration, although that happens indirectly via the OVN Northbound DB > + and <code>ovn-nbd</code>. > + </p> > + > + <p> > + LN data is likely to change more quickly than PN data. This is > especially > + true in a container environment where VMs are created and destroyed (and > + therefore added to and deleted from logical switches) quickly. > + </p> > + > + <p> > + The <ref table="Pipeline"/> table is currently the only LN table. > + </p> > + > + <h3>Bindings data</h3> > + > + <p> > + The Bindings tables contain the current placement of logical components > + (such as VMs and VIFs) onto chassis and the bindings between logical > ports > + and MACs. > + </p> > + > + <p> > + Bindings change frequently, at least every time a VM powers up or down > + or migrates, and especially quickly in a container environment. The > + amount of data per VM (or VIF) is small. > + </p> > + > + <p> > + Each chassis is authoritative about the VMs and VIFs that it hosts at any > + given time and can efficiently flood that state to a central location, so > + the consistency needs are minimal. > + </p> > + > + <p> > + The <ref table="Bindings"/> table is currently the only Bindings table. > + </p> > + > + <table name="Chassis" title="Physical Network Hypervisor and Gateway > Information"> > + <p> > + Each row in this table represents a hypervisor or gateway (a chassis) > in > + the physical network (PN). Each chassis, via > + <code>ovn-controller</code>, adds and updates its own row, and keeps a > + copy of the remaining rows to determine how to reach other hypervisors.
> + </p> > + > + <p> > + When a chassis shuts down gracefully, it should remove its own row. > + (This is not critical because resources hosted on the chassis are > equally > + unreachable regardless of whether the row is present.) If a chassis > + shuts down permanently without removing its row, some kind of manual or > + automatic cleanup is eventually needed; we can devise a process for > that > + as necessary. > + </p> > + > + <column name="name"> > + A chassis name, taken from <ref key="system-id" table="Open_vSwitch" > + column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch > + database's <ref table="Open_vSwitch" db="Open_vSwitch"/> table. OVN > does > + not prescribe a particular format for chassis names. > + </column> > + > + <group title="Encapsulation"> > + <p> > + These columns together identify how OVN may transmit logical > dataplane > + packets to this chassis. > + </p> > + > + <column name="encap"> > + The encapsulation to use to transmit packets to this chassis. > + </column> > + > + <column name="encap_options"> > + Options for configuring the encapsulation, e.g. IPsec parameters when > + IPsec support is introduced. No options are currently defined. > + </column> > + > + <column name="ip"> > + The IPv4 address of the encapsulation tunnel endpoint. > + </column> > + </group> > + > + <group title="Gateway Configuration"> > + <p> > + A <dfn>gateway</dfn> is a chassis that forwards traffic between a > + logical network and a physical VLAN. Gateways are typically > dedicated > + nodes that do not host VMs. > + </p> > + > + <column name="gateway_ports"> > + Maps from the name of a gateway port, which is typically a physical > + port (e.g. <code>eth1</code>) or an Open vSwitch patch port, to a > <ref > + table="Gateway"/> record that describes the details of the gatewaying > + function. 
> + </column> > + </group> > + </table> > + > + <table name="Gateway" title="Physical Network Gateway Ports"> > + <p> > + The <ref column="gateway_ports" table="Chassis"/> column in the <ref > + table="Chassis"/> table refers to rows in this table to connect a > chassis > + port to a gateway function. Each row in this table describes the > logical > + networks to which a gateway port is attached. Each chassis, via > + <code>ovn-controller</code>(8), adds and updates its own rows, if any > + (since most chassis are not gateways), and keeps a copy of the > remaining > + rows to determine how to reach other chassis. > + </p> > + > + <column name="vlan_map"> > + Maps from a VLAN ID to a logical port name. Thus, each named logical > + port corresponds to one VLAN on the gateway port. > + </column> > + > + <column name="attached_port"> > + The name of the gateway port in the chassis's Open vSwitch integration > + bridge. > + </column> > + </table> > + > + <table name="Pipeline" title="Logical Network Pipeline"> > + <p> > + Each row in this table represents one logical flow. The cloud > management > + system, via its OVN integration, populates this table with logical > flows > + that implement the L2 and L3 topology specified in the CMS > configuration. > + Each hypervisor, via <code>ovn-controller</code>, translates the > logical > + flows into OpenFlow flows specific to its hypervisor and installs them > + into Open vSwitch. > + </p> > + > + <p> > + Logical flows are expressed in an OVN-specific format, described here. > A > + logical datapath flow is much like an OpenFlow flow, except that the > + flows are written in terms of logical ports and logical datapaths > instead > + of physical ports and physical datapaths. Translation between logical > + and physical flows helps to ensure isolation between logical datapaths. 
> + (The logical flow abstraction also allows the CMS to do less work, > since > + it does not have to separately compute and push out physical > flows to each chassis.) > + </p> > + > + <p> > + The default action when no flow matches is to drop packets. > + </p> > + > + <column name="table_id"> > + The stage in the logical pipeline, analogous to an OpenFlow table > number. > + </column> > + > + <column name="priority"> > + The flow's priority. Flows with numerically higher priority take > + precedence over those with lower. If two logical datapath flows with > the > + same priority both match, then the one actually applied to the packet > is > + undefined. > + </column> > + > + <column name="match"> > + <p> > + A matching expression. OVN provides a superset of OpenFlow matching > + capabilities, using a syntax similar to Boolean expressions in a > + programming language. > + </p> > + > + <p> > + Matching expressions have two important kinds of primary expression: > + <dfn>fields</dfn> and <dfn>constants</dfn>. A field names a piece of > + data or metadata. The supported fields are: > + </p> > + > + <ul> > + <li> > + <code>metadata</code> <code>reg0</code> ... <code>reg7</code> > + <code>xreg0</code> ...
<code>xreg3</code> > + </li> > + <li><code>inport</code> <code>outport</code> <code>queue</code></li> > + <li><code>eth.src</code> <code>eth.dst</code> > <code>eth.type</code></li> > + <li><code>vlan.tci</code> <code>vlan.vid</code> > <code>vlan.pcp</code> <code>vlan.present</code></li> > + <li><code>ip.proto</code> <code>ip.dscp</code> <code>ip.ecn</code> > <code>ip.ttl</code> <code>ip.frag</code></li> > + <li><code>ip4.src</code> <code>ip4.dst</code></li> > + <li><code>ip6.src</code> <code>ip6.dst</code> > <code>ip6.label</code></li> > + <li><code>arp.op</code> <code>arp.spa</code> <code>arp.tpa</code> > <code>arp.sha</code> <code>arp.tha</code></li> > + <li><code>tcp.src</code> <code>tcp.dst</code> > <code>tcp.flags</code></li> > + <li><code>udp.src</code> <code>udp.dst</code></li> > + <li><code>sctp.src</code> <code>sctp.dst</code></li> > + <li><code>icmp4.type</code> <code>icmp4.code</code></li> > + <li><code>icmp6.type</code> <code>icmp6.code</code></li> > + <li><code>nd.target</code> <code>nd.sll</code> > <code>nd.tll</code></li> > + </ul> > + > + <p> > + Subfields may be addressed using a <code>[]</code> suffix, > + e.g. <code>tcp.src[0..7]</code> refers to the low 8 bits of the TCP > + source port. A subfield may be used in any context a field is > allowed. > + </p> > + > + <p> > + Some fields have prerequisites. OVN implicitly adds clauses to > satisfy > + these. For example, <code>arp.op == 1</code> is equivalent to > + <code>eth.type == 0x0806 && arp.op == 1</code>, and > + <code>tcp.src == 80</code> is equivalent to <code>(eth.type == 0x0800 > + || eth.type == 0x86dd) && ip.proto == 6 && tcp.src == > + 80</code>. > + </p> > + > + <p> > + Most fields have integer values. Integer constants may be expressed > in > + several forms: decimal integers, hexadecimal integers prefixed by > + <code>0x</code>, dotted-quad IPv4 addresses, IPv6 addresses in their > + standard forms, and as Ethernet addresses as colon-separated hex > + digits. 
A constant in any of these forms may be followed by a slash > + and a second constant (the mask) in the same form, to form a masked > + constant. IPv4 and IPv6 masks may be given as integers, to express > + CIDR prefixes. > + </p> > + > + <p> > + The <code>inport</code> and <code>outport</code> fields have string > + values. The useful values are <ref column="logical_port"/> names > from > + the <ref table="Bindings"/> and <ref table="Gateway"/> tables. > + </p> > + > + <p> > + The available operators, from highest to lowest precedence, are: > + </p> > + > + <ul> > + <li><code>()</code></li> > + <li><code>== != < <= > >= in not > in</code></li> > + <li><code>!</code></li> > + <li><code>&&</code></li> > + <li><code>||</code></li> > + </ul> > + > + <p> > + The <code>()</code> operator is used for grouping. > + </p> > + > + <p> > + The equality operator <code>==</code> is the most important operator. > + Its operands must be a field and an optionally masked constant, in > + either order. The <code>==</code> operator yields true when the > + field's value equals the constant's value for all the bits included > in > + the mask. The <code>==</code> operator translates simply and > naturally > + to OpenFlow. > + </p> > + > + <p> > + The inequality operator <code>!=</code> yields the inverse of > + <code>==</code> but its syntax and use are the same. Implementation > of > + the inequality operator is expensive. > + </p> > + > + <p> > + The relational operators are <, <=, >, and >=. Their > + operands must be a field and a constant, in either order; the > constant > + must not be masked. These operators are most commonly useful for L4 > + ports, e.g. <code>tcp.src < 1024</code>. Implementation of the > + relational operators is expensive. > + </p> > + > + <p> > + The set membership operator <code>in</code>, with syntax > + ``<code><var>field</var> in { <var>constant1</var>, > + <var>constant2</var>,</code> ...
<code>}</code>'', is syntactic sugar > + for ``<code>(<var>field</var> == <var>constant1</var> || > + <var>field</var> == <var>constant2</var> || </code>...<code>)</code>''. > + Conversely, ``<code><var>field</var> not in { <var>constant1</var>, > + <var>constant2</var>, </code>...<code> }</code>'' is syntactic sugar > + for ``<code>(<var>field</var> != <var>constant1</var> && > + <var>field</var> != <var>constant2</var> && > + </code>...<code>)</code>''. > + </p> > + > + <p> > + The unary prefix operator <code>!</code> yields its operand's > inverse. > + </p> > + > + <p> > + The logical AND operator <code>&&</code> yields true only if > + both of its operands are true. > + </p> > + > + <p> > + The logical OR operator <code>||</code> yields true if at least one > of > + its operands is true. > + </p> > + > + <p> > + Finally, the keywords <code>true</code> and <code>false</code> may > also > + be used in matching expressions. <code>true</code> is useful by > itself > + as a catch-all expression that matches every packet. > + </p> > + > + <p> > + (The above is pretty ambitious. It probably makes sense to initially > + implement only a subset of this specification. The full > specification > + is written out mainly to get an idea of what a fully general matching > + expression language could include.) > + </p> > + </column> > + > + <column name="actions"> > + <p> > + Below, a <var>value</var> is either a <var>constant</var> or a > + <var>field</var>.
The following actions seem most likely to be > useful: > + </p> > + > + <dl> > + <dt><code>drop;</code></dt> > + <dd>syntactic sugar for no actions</dd> > + > + <dt><code>output(<var>value</var>);</code></dt> > + <dd>output to port</dd> > + > + <dt><code>broadcast;</code></dt> > + <dd>output to every logical port except ingress port</dd> > + > + <dt><code>resubmit;</code></dt> > + <dd>execute next logical datapath table as subroutine</dd> > + > + <dt><code>set(<var>field</var>=<var>value</var>);</code></dt> > + <dd>set data or metadata field, or copy between fields</dd> > + </dl> > + > + <p> > + Following are not well thought out: > + </p> > + > + <dl> > + <dt><code>learn</code></dt> > + > + <dt><code>conntrack</code></dt> > + > + <dt><code>with(<var>field</var>=<var>value</var>) { > <var>action</var>, </code>...<code> }</code></dt> > + <dd>execute <var>actions</var> with temporary changes to > <var>fields</var></dd> > + > + <dt><code>dec_ttl { <var>action</var>, </code>...<code> } { > <var>action</var>; </code>...<code>}</code></dt> > + <dd> > + decrement TTL; execute first set of actions if > + successful, second set if TTL decrement fails > + </dd> > + > + <dt><code>icmp_reply { <var>action</var>, </code>...<code> > }</code></dt> > + <dd>generate ICMP reply from packet, execute > <var>action</var>s</dd> > + > + <dt><code>arp { <var>action</var>, </code>...<code> }</code></dt> > + <dd>generate ARP from packet, execute <var>action</var>s</dd> > + </dl> > + > + <p> > + Other actions can be added as needed > + (e.g. <code>push_vlan</code>, <code>pop_vlan</code>, > + <code>push_mpls</code>, <code>pop_mpls</code>). > + </p> > + > + <p> > + Some of the OVN actions do not map directly to OpenFlow actions, > e.g.: > + </p> > + > + <ul> > + <li> > + <code>with</code>: Implemented as <code>stack_push; > + set(</code>...<code>); <var>actions</var>; stack_pop</code>. 
> + </li> > + > + <li> > + <code>dec_ttl</code>: Implemented as <code>dec_ttl</code> followed > + by the successful actions. The failure case has to be implemented > by > + ovn-controller interpreting packet-ins. It might be difficult to > + identify the particular place in the processing pipeline in > + <code>ovn-controller</code>; maybe some restrictions will be > + necessary. > + </li> > + > + <li> > + <code>icmp_reply</code>: Implemented by sending the packet to > + <code>ovn-controller</code>, which generates the ICMP reply and > sends > + the packet back to <code>ovs-vswitchd</code>. > + </li> > + </ul> > + </column> > + </table> > + > + <table name="Bindings" title="Physical-Logical Bindings"> > + <p> > + Each row in this table identifies the physical location of a logical > + port. Each hypervisor, via <code>ovn-controller</code>, populates this > + table with rows for the logical ports that are located on its > hypervisor, > + which <code>ovn-controller</code> in turn finds out by monitoring the > + local hypervisor's Open_vSwitch database, which identifies logical > ports > + via the conventions described in <code>IntegrationGuide.md</code>. > + </p> > + > + <p> > + When a chassis shuts down gracefully, it should remove its bindings. > + (This is not critical because resources hosted on the chassis are > equally > + unreachable regardless of whether their rows are present.) To handle > the > + case where a VM is shut down abruptly on one chassis, then brought up > + again on a different one, <code>ovn-controller</code> must delete any > + existing <ref table="Bindings"/> record for a logical port when it adds > a > + new one. > + </p> > + > + <column name="logical_port"> > + A logical port, taken from <ref key="iface-id" table="Interface" > + column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch > database's > + <ref table="Interface" db="Open_vSwitch"/> table. OVN does not > prescribe > + a particular format for the logical port ID.
> + </column> > + > + <column name="chassis"> > + The physical location of the logical port. To successfully identify a > + chassis, this column must match the <ref table="Chassis" > column="name"/> > + column in some row in the <ref table="Chassis"/> table. > + </column> > + > + <column name="mac"> > + <p> > + The Ethernet address or addresses used as a source address on the > + logical port, each in the form > + > <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>. > + The string <code>unknown</code> is also allowed to indicate that the > + logical port has an unknown set of (additional) source addresses. > + </p> > + > + <p> > + A VM interface would ordinarily have a single Ethernet address. A > + gateway port might initially only have <code>unknown</code>, and then > + add MAC addresses to the set as it learns new source addresses. > + </p> > + </column> > + </table> > +</database> > -- > 2.1.3 > > _______________________________________________ > dev mailing list > [email protected] > http://openvswitch.org/mailman/listinfo/dev _______________________________________________ dev mailing list [email protected] http://openvswitch.org/mailman/listinfo/dev
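[Editor's note] To make the ACL semantics in ovn-nb.xml above concrete, here is a minimal sketch of the lookup rule (highest-priority matching row wins; packets matching no row are allowed by default). This is not code from the patch; the row dicts, the `matches` callback, and the toy packet model are invented for illustration:

```python
# Sketch of the ACL lookup rule from ovn-nb.xml: the action of the
# highest-priority matching row applies; a packet that matches no row
# is allowed by default.  Rows are plain dicts standing in for ovn-nb
# ACL table rows.

def acl_action(acl_rows, matches):
    """Return the action for a packet.  `matches(expr)` reports whether
    the packet satisfies a row's match expression."""
    hits = [row for row in acl_rows if matches(row["match"])]
    if not hits:
        return "allow"          # default when no ACL row matches
    return max(hits, key=lambda row: row["priority"])["action"]

acls = [
    {"priority": 0,   "match": "true",          "action": "drop"},
    {"priority": 100, "match": "tcp.dst == 22", "action": "allow-related"},
]
# Toy matcher: a packet is described by the set of clauses it satisfies.
pkt = {"true", "tcp.dst == 22"}
print(acl_action(acls, pkt.__contains__))   # allow-related
```

Note how the priority-0 `true`/`drop` row implements the default-deny configuration described in the ACL table documentation.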
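[Editor's note] The implicit-prerequisite rule in the Pipeline match-expression spec can be sketched as a table lookup. Only the two prerequisites spelled out in the text (`arp` and `tcp`) are listed; a real implementation would cover every field, so the table below is an illustrative assumption, not the patch's design:

```python
# Sketch of implicit prerequisites from the match spec: "arp.op == 1"
# is equivalent to "eth.type == 0x0806 && arp.op == 1", and a tcp field
# pulls in the IP ethertype and ip.proto clauses.
PREREQS = {
    "arp": "eth.type == 0x0806",
    "tcp": "(eth.type == 0x0800 || eth.type == 0x86dd) && ip.proto == 6",
}

def with_prereqs(clause):
    """Prepend the prerequisite clauses implied by the clause's field."""
    field_prefix = clause.split(".", 1)[0]
    prereq = PREREQS.get(field_prefix)
    return f"{prereq} && {clause}" if prereq else clause

print(with_prereqs("arp.op == 1"))
# eth.type == 0x0806 && arp.op == 1
```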
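[Editor's note] Subfield addressing (`tcp.src[0..7]` is the low 8 bits of the TCP source port) amounts to a shift and mask with bit 0 as the least significant bit. A quick sketch, with the port value chosen only for illustration:

```python
# Sketch of subfield addressing from the match spec: field[lo..hi]
# selects bits lo..hi inclusive, bit 0 being least significant.

def subfield(value, lo, hi):
    """Extract bits lo..hi (inclusive) of an integer field value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

print(hex(subfield(0x1f90, 0, 7)))   # low byte of TCP port 8080 -> 0x90
```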
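[Editor's note] The `in` / `not in` operators are defined above purely as syntactic sugar, so a front end could expand them before matching. A sketch of that desugaring (the function name and string representation are invented for the example):

```python
# Sketch of the set-membership sugar from the match spec:
# "field in { c1, c2, ... }" expands to ==-tests joined by ||, and
# "field not in { ... }" to !=-tests joined by &&.

def desugar_in(field, constants, negated=False):
    """Expand an OVN-style set-membership test into its sugar-free form."""
    op, joiner = ("!=", " && ") if negated else ("==", " || ")
    return "(" + joiner.join(f"{field} {op} {c}" for c in constants) + ")"

print(desugar_in("tcp.dst", ["80", "443"]))
# (tcp.dst == 80 || tcp.dst == 443)
```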
