Hi! This email is an update regarding stateless load balancing, along with a couple of questions before I start working on the implementation.
In my previous email (https://mail.openvswitch.org/pipermail/ovs-dev/2026-March/431365.html), I described several approaches to implementing stateless load balancing in OVN. To summarize: I described the idea of deferring conntrack usage from the gateway node to the compute node where the virtual machine is located, as well as the main issue with this approach. Since backend selection is stateless and the connection is not stored in conntrack on the gateway, some connections may break during backend reconfiguration.
Also, thanks to Dumitru for the idea he suggested. His approach is to use conntrack on the gateway node and store the MAC address of the selected backend in conntrack labels, which solves the session persistence problem. However, this makes it mandatory for the return traffic to pass through the same gateway that initially handled the connection. In general, I think this idea could be made more generic: instead of doing any L2 balancing with MAC selection on the gateway node, we could rely fully on DNAT in this case. The main requirement would still be ensuring that the return traffic goes back through the same gateway.
This is exactly the part I could not figure out for our topology: we have two gateways for incoming traffic (in the simplest case; there may be more), while the backend VMs are located on different compute nodes. Here is the topology diagram for your convenience: https://s3.ru-msk.k2.cloud/stateless-lb-topology/stateless-lb-topology.drawio
If I understand correctly, I cannot use ecmp-symmetric-reply in such a topology. Such a route has to be attached to a router that has a chassis assigned to it, meaning it has to be a centralized router bound to some chassis, and our topology does not have such a router.
At this point, based on the analysis from the previous email, I would like to start implementing the following approach: do stateless L2 balancing on the gateway node, then do DNAT relying on conntrack on the compute node where the virtual machine is located, while using rendezvous hashing in OVS for backend selection.
Currently, rendezvous hashing works only with the hash selection method, which has the downside of requiring an upcall for every SYN packet. I made a small hack that reuses the code path currently used for dp_hash but, instead of the Webster distribution that dp_hash currently uses, selects the backend using rendezvous hashing (https://github.com/Sashhkaa/ovs/commit/ad82205ed4df125e7072c6b2e480c26e4af297ae). I tested this by incrementally inserting and removing buckets (the insert-buckets and remove-buckets commands in ovs-ofctl), and it behaves as expected, similarly to the existing hash method: when I remove a backend, established connections to the other backends are not broken, and when I add a backend, only about 1/(n + 1) of the sessions are remapped, where n is the number of backends before the addition.
Unfortunately, I do not yet understand this well enough to determine whether there are any fundamental limitations with this approach. The way I currently see it is the following: I would use dp_hash at the datapath level, meaning there would be a single upcall for a group of packets going through the load balancer, after which packets are recirculated for each individual hash value that was calculated. Then, while processing the upcall for a recirculated packet, I select the backend for that recirculation flow using rendezvous hashing, the same way the hash method does.
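As an illustration of the remapping behaviour described above, here is a minimal Python sketch of rendezvous (highest-random-weight) hashing. It is not the OVS code from the linked commit: the function names, the backend labels, and the use of BLAKE2b as the scoring hash are assumptions made only for this example, and the integer flow_hash merely stands in for a per-flow hash such as the one produced by dp_hash.

```python
import hashlib


def score(flow_hash: int, backend: str) -> int:
    """Score one (flow, backend) pair; the hash choice is an assumption."""
    data = flow_hash.to_bytes(4, "big") + backend.encode()
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def select_backend(flow_hash: int, backends: list[str]) -> str:
    """Pick the backend with the highest score for this flow.

    Every backend is scored, so the cost is O(number of backends) per lookup.
    """
    return max(backends, key=lambda b: score(flow_hash, b))


def moved_fraction(flows, old: list[str], new: list[str]) -> float:
    """Fraction of flows whose selected backend differs between two sets."""
    flows = list(flows)
    moved = sum(
        1 for f in flows if select_backend(f, old) != select_backend(f, new)
    )
    return moved / len(flows)


if __name__ == "__main__":
    backends = ["b0", "b1", "b2", "b3"]  # hypothetical backend labels
    flows = range(5000)                  # stand-ins for per-flow hash values

    # Removing b3 only moves the flows that were on b3.
    print("moved after removal: ", moved_fraction(flows, backends, backends[:-1]))
    # Adding b4 moves roughly 1/(n + 1) of the flows, all of them to b4.
    print("moved after addition:", moved_fraction(flows, backends, backends + ["b4"]))
```

Because each flow is scored against every backend independently, removing a backend cannot change the winner for flows that were mapped elsewhere, and a newly added backend can only take over the flows for which it scores highest. The same per-backend loop is also where the extra cost relative to a precomputed bucket table comes from.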
The most obvious downside I currently see is that rendezvous hashing is inherently more expensive than the current dp_hash approach, because it calculates a hash for each backend, so I plan to measure the performance impact under high traffic load and with a large number of backends. I was also considering possible differences in traffic distribution across backends, but if my understanding of the math is correct, both algorithms should provide roughly similar distribution properties.
So here are my questions:
Did I miss anything at first glance, and how valid would it be to use dp_hash in such an implementation?
If there are no strict limitations requiring this approach, would it make sense to introduce a new selection_method, for example something like consistent-dp-hash, which would still calculate the packet hash at the datapath level but would use rendezvous hashing instead of the Webster method for backend selection?
Is it possible that I am missing something and that there is a way to implement this without patching OVS?
If this approach is considered valid, then my next steps would be implementing the changes described above, extending ovn-controller so that bucket updates happen incrementally without deleting and recreating the group, and finishing the northd changes that I attached to the previous email.
Also, I wanted to ask this last time but forgot: perhaps we should clarify the limitations of the stateless LB option in the documentation? I mean both the limitation that a backend can only belong to a single LB and the fact that changing the number of backends may cause existing connections to break.
Thanks a lot!
