On Wed, Mar 25, 2020 at 1:03 PM Ihar Hrachyshka <ihrac...@redhat.com> wrote:
>
> On Mon, Mar 23, 2020 at 7:47 PM Ben Pfaff <b...@ovn.org> wrote:
> >
> > On Mon, Mar 23, 2020 at 06:39:14PM -0400, Ihar Hrachyshka wrote:
> > > First, some questions as to implementation (or feasibility) of several
> > > todo items in my list for the patch.
> > >
> > > 1) I initially thought that, because VXLAN would have limited space
> > > for both networks and ports in its VNI, the encap type would not be
> > > able to support as many of both as Geneve / STT, and so we would need
> > > to enforce the limit programmatically somehow. But in OVN context, is
> > > it even doable? North DB resources may be created before any chassis
> > > are registered; once a chassis that is VXLAN only joins, it's too late
> > > to forbid the spilling resources from existence (though it may be a
> > > good time to detect this condition and perhaps fail to register the
> > > chassis / configure flow tables). How do we want to handle this case?
> > > Do we fail to start VXLAN configured ovn-controller when too many
> > > networks / ports per network created? Do we forbid creating too many
> > > resources when a chassis is registered that is VXLAN only? Both? Or do
> > > we leave it up to the deployment / CMS to control the chassis / north
> > > DB configuration?
> > >
> > > 2) Similar to the issue above, I originally planned to forbid using
> > > ACLs relying on ingress port when a VXLAN chassis is involved (because
> > > the VNI won't carry the information). I believe the approach should be
> > > similar to how we choose to handle the issue with the maximum number
> > > of resources, described above.
> > >
> > > I am new to OVN so maybe there are existing examples for such
> > > situations already that I could get inspiration from. Let me know what
> > > you think.
> >
> > I don't have good solutions for the above resource limit problems. We
> > designed OVN so that this kind of resource limit wouldn't be a problem
> > in practice, so we didn't think through what would happen if the limits
> > suddenly became more stringent.
> >
> > I think that it falls upon the CMS by default.
> >
>
> For ACLs, I think it's fair to put the burden on CMS (just because it
> should be easy for them to follow the simple rule: "Don't use ingress
> matching ACLs in your OVN driver.")
>
> While having a guard against overflowing resource number limits in CMS
> may be helpful (for example, for immediate failure mode feedback to
> CMS user - compare to async notification about a CMS resource to OVSDB
> primitive conversion),
>
> I believe OVN should handle the case too. The risk of not doing it is
> - the limits are reached, and we start to send traffic that belongs to
> one network to another, because their lower 12 bits of datapath ID are
> the same.
>
> While CMS could guard against that, it may be less aware about chassis
> configuration than OVN. A dumb way to resolve this in CMS would be
> having a global configuration option set by deployment tool that
> configures OVN and that would know whether any VXLAN capable chassis
> are deployed in the cluster. A more proper way to solve it would be to
> make CMS aware of chassis configuration by maintaining a cache of
> Chassis table records and checking their encap types on each network /
> port created.
>
> The same could be done by OVN itself, and arguably OVN is the owner of
> the data source (encap records) and is in a better position to control
> it:
>
> 1. on network creation, if VXLAN is enabled on any chassis, count
> networks; if result >= limit, fail; same for ports per network;
> 2. on ovn-controller start, if VXLAN is enabled for the chassis,
> calculate networks / ports per network; if result >= limit, fail to
> start the service.
>
> Note that in most common scenario, all chassis have the same
> encapsulation types registered; there are multiple ovn-controller
> nodes; and resources are created after all chassis are registered in
> the database. So point (2) above is to handle a corner case that
> probably won't ever happen in real life. (1) is a hot path.
>
> Any specific objections to having this kind of guards in OVN itself?
> This may be in addition to CMS side guards (to avoid even trying to
> create CMS resources that are known to fail to sync to OVN).
>
> (A similar approach may be extended to ACLs allowed though it's not as
> pressing because there are no known CMS that rely on unsupported
> ACLs.)
>

The more I think about the issue, the more important it looks that OVN
is aware of VXLAN limitations and guards against overflowing the number
of resources. Here is why.

While CMS could relatively easily control the overall number of
resources in the database (it should be aware of its own resource
records), it does not, in the general case, control the tunnel keys
selected for datapaths: OVN allocates the IDs on Datapath_Binding
creation.

OVN selects datapath IDs sequentially, starting from 1 up to the max
value for the 24-bit ID, then wraps to the start. A problem with this
approach may occur when, after a significant number of networks were
created and then deleted, the "next tunnel ID" counter moves to the
"edge" of the 12-bit space available for unique VXLAN datapath
identifiers. Then, once a new logical switch creation request is
submitted, OVN may allocate an ID whose lower 12 bits are the same as
those of another existing switch (the full 24-bit datapath ID would be
unique, but that won't translate into a unique ID passed to a remote
hypervisor through the VXLAN VNI due to the proposed 12/12-bit split
scheme).

This is probably a bit convoluted, so to give an example, consider
there is a network A with datapath ID = 0b000000000000000000000001.
When VXLAN is enabled, we truncate the datapath ID to 12 bits before
setting it in the outgoing packet metadata. Then network B is created
with datapath ID = 0b000000000001000000000001. (Note two bits set.)
This unique datapath ID will map to the same 12-bit value when setting
it for the outgoing packet, causing traffic from one network to flow
into another network. Note that in this example, the number of switches
in the database is below the maximum number allowed for VXLAN (2^12).

The only way CMS could guard against this scenario is monitoring all
tunnel keys allocated to all datapaths and explicitly requesting tunnel
keys when creating new switches, doing it in a way that would not
produce a 12-bit clash.
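
To make the clash concrete, here is a minimal Python sketch of the
scenario above. It is only a model under stated assumptions, not OVN
code: the 12-bit mask follows the proposed 12/12 split, and the names
vxlan_dp_key and next_datapath_id are hypothetical helpers invented for
this illustration. The allocator simply returns the first free ID after
a "next tunnel ID" hint and wraps at the end of the 24-bit range.

```python
VXLAN_DP_BITS = 12                        # assumption: proposed 12/12 VNI split
VXLAN_DP_MASK = (1 << VXLAN_DP_BITS) - 1  # lower 12 bits
MAX_DP_ID = (1 << 24) - 1                 # 24-bit datapath ID space


def vxlan_dp_key(datapath_id):
    """Return the part of a datapath ID that would fit into the VNI."""
    return datapath_id & VXLAN_DP_MASK


def next_datapath_id(in_use, hint):
    """Hypothetical sequential allocator: first free ID after hint, wrapping to 1."""
    candidate = hint
    for _ in range(MAX_DP_ID):
        candidate = candidate + 1 if candidate < MAX_DP_ID else 1
        if candidate not in in_use:
            return candidate
    raise RuntimeError("datapath ID space exhausted")


network_a = 0b000000000000000000000001
network_b = 0b000000000001000000000001

# Distinct 24-bit IDs, identical 12-bit keys on the wire.
clash = vxlan_dp_key(network_a) == vxlan_dp_key(network_b)

# After enough create/delete churn the counter sits at the edge of the
# 12-bit space while only network A still exists in the database.
in_use = {network_a}
new_id = next_datapath_id(in_use, hint=(1 << VXLAN_DP_BITS))

print(clash, bin(new_id), vxlan_dp_key(new_id))
```

In this model the allocator hands out exactly the ID of network B while
only one switch exists, so counting resources alone would not catch the
problem; the check has to look at the truncated keys.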
(There is already the `requested-tnl-key` option for this.)

It is not a good idea to offload tunnel key management onto CMS (or at
least it's not a good idea to assume that all CMS implement this
correctly, considering that the risk of not doing so has serious tenant
privacy and connectivity implications).

My belief is that OVN should detect whether VXLAN is enabled in the
cluster, in which case the datapath ID range available to new switches
would be reduced from 2^24 to 2^12 values. This would involve
additional work in ovn-northd, which allocates the tunnel keys;
specifically, it would need to, on switch and port creation, detect
VXLAN mode by fetching (probably subscribing to and caching) all chassis
encaps and checking if any have VXLAN enabled, and if so, adjust the
maximum allowed value for datapath IDs to 2^12.

Another issue that I initially hadn't considered concerns the available
space for port IDs. I assumed 2^12 port IDs available per network in
the proposed solution, but I missed that OVN allocates a separate
sub-range for multicast groups that occupies half of the total range
for port IDs. (The reserved multicast space is IDs 32768 through
65535.) Perhaps having 2^11 unique port IDs is still ok, but since we
have already reduced the available limits pretty significantly, this is
something to keep in mind.

Let me know what you think.

Ihar
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev