Hi!
This email is an update regarding stateless load balancing, along with a 
couple of questions before I start working on the implementation.

In my previous email 
(https://mail.openvswitch.org/pipermail/ovs-dev/2026-March/431365.html), 
I described several approaches for how I see the implementation of 
stateless load balancing in OVN.

To summarize the points from the previous email:

I described the idea of deferring conntrack usage from the gateway node 
to the compute node where the virtual machine is located, as well as the 
main issue with this approach: since backend selection happens in a 
stateless manner and we do not store the connection in conntrack on the 
gateway, some connections may break during backend reconfiguration.

Also, thanks to Dumitru for the idea he suggested.

His approach is to use conntrack on the gateway node and store the MAC 
address of the selected backend in conntrack labels, which solves the 
session persistence problem. However, this makes it mandatory for the 
return traffic to pass through the same gateway that initially handled 
the connection. In general, I think this idea could be made more generic 
— instead of doing any L2 balancing with MAC selection on the gateway 
node, we could fully rely on DNAT in this case. The main requirement 
would still be ensuring that the return traffic goes back through the 
same gateway. This is exactly the part I could not figure out for our 
topology: we have two gateways (in the simplest case, there may be more) 
for incoming traffic, while the backend VMs are located on different 
compute nodes.

Here is the topology diagram for your convenience:
https://s3.ru-msk.k2.cloud/stateless-lb-topology/stateless-lb-topology.drawio

If I understand correctly, I cannot use ecmp-symmetric-reply in such a 
topology. Such a route has to be attached to a router that has a chassis 
assigned to it, meaning it has to be a centralized router bound to some 
chassis, and our topology does not have such a router.

At this point, based on the analysis from the previous email, I would 
like to start implementing the following approach: do stateless L2 
balancing on the gateway node, then do DNAT relying on conntrack on 
the compute node where the virtual machine is located, while using 
rendezvous hashing in OVS.

Currently, rendezvous hashing works only with the hash selection method, 
which has the downside of requiring an upcall for every SYN packet. I 
made a small hack that reuses the code path currently used for dp_hash, 
but instead of the Webster distribution that dp_hash currently uses, it 
selects the backend using rendezvous hashing 
(https://github.com/Sashhkaa/ovs/commit/ad82205ed4df125e7072c6b2e480c26e4af297ae).

I tested this by incrementally inserting and removing buckets 
(insert-buckets/remove-buckets commands in ovs-ofctl), and it behaves 
as expected, similarly to the existing hash method: when I remove a 
backend, established connections on other backends are not broken, and 
when I add a backend, only about 1/(n + 1) of the sessions are remapped.
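
To illustrate the behaviour described above, here is a minimal Python 
sketch of rendezvous (highest-random-weight) hashing. It is not the OVS 
code: the score function (SHA-256 over "flow:bucket"), the bucket ids, 
and the flow hashes are all assumptions chosen for the example.

```python
import hashlib

def score(flow_hash: int, bucket_id: int) -> int:
    """Pseudo-random weight for a (flow, bucket) pair.
    The hash function here is an arbitrary choice for illustration."""
    data = f"{flow_hash}:{bucket_id}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def select_bucket(flow_hash: int, buckets: list[int]) -> int:
    """Rendezvous selection: the bucket with the highest score wins."""
    return max(buckets, key=lambda b: score(flow_hash, b))

flows = range(20000)  # stand-ins for per-connection hash values

before = {f: select_bucket(f, [1, 2, 3, 4]) for f in flows}
after_remove = {f: select_bucket(f, [1, 2, 4]) for f in flows}
after_add = {f: select_bucket(f, [1, 2, 3, 4, 5]) for f in flows}

# Removing bucket 3: flows that were on other buckets keep their bucket.
kept = all(after_remove[f] == b for f, b in before.items() if b != 3)

# Adding bucket 5: only flows that now score highest on 5 move.
moved = [f for f in flows if after_add[f] != before[f]]
moved_fraction = len(moved) / len(flows)
all_moved_to_new = all(after_add[f] == 5 for f in moved)

print(kept, all_moved_to_new, round(moved_fraction, 3))
```

With n = 4 existing buckets, the moved fraction should come out close 
to 1/(n + 1) = 0.2, and every moved flow lands on the new bucket.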

Unfortunately, I do not yet have enough understanding to determine 
whether there are any fundamental limitations with this approach. The 
way I currently see it is the following: I would use dp_hash at the 
datapath level, meaning there would be a single upcall for a group of 
packets going through the load balancer, after which I would get a 
recirculation for each individual hash value that was calculated. Then, 
while processing the upcall for the recirculated packet, I would select 
the backend for the recirculation flow using rendezvous hashing, the 
same way the hash method does.
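
The flow described above can be modelled roughly as follows. This is a 
toy Python simulation, not OVS code: the GroupSim class, the 4-bit 
HASH_MASK, the SHA-256 based hash, and the dictionary standing in for 
datapath flows are all hypothetical. It only counts the per-hash-value 
upcalls after recirculation and ignores the initial upcall for the 
group action itself.

```python
import hashlib

HASH_MASK = 0xF  # assumption: the datapath hash is masked to 16 values

def h(*parts) -> int:
    """Stand-in hash; the real datapath hash is different."""
    data = ":".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

class GroupSim:
    def __init__(self, buckets):
        self.buckets = list(buckets)
        self.recirc_flows = {}  # masked dp_hash -> bucket (models datapath flows)
        self.upcalls = 0

    def upcall(self, dp_hash: int) -> int:
        """Slow path: pick a bucket by rendezvous hashing and cache it."""
        self.upcalls += 1
        bucket = max(self.buckets, key=lambda b: h(dp_hash, b))
        self.recirc_flows[dp_hash] = bucket
        return bucket

    def process(self, five_tuple) -> int:
        """Fast path: hash, mask, then hit the cached flow if one exists."""
        dp_hash = h(five_tuple) & HASH_MASK
        bucket = self.recirc_flows.get(dp_hash)
        if bucket is None:
            bucket = self.upcall(dp_hash)
        return bucket

sim = GroupSim([1, 2, 3])
packets = [("10.0.0.%d" % (i % 250), 1024 + i, "192.0.2.10", 80, "tcp")
           for i in range(5000)]
chosen = [sim.process(p) for p in packets]
print(sim.upcalls, "upcalls for", len(packets), "packets")
```

In this model the number of upcalls is bounded by the number of masked 
hash values, not by the number of new connections, which is the property 
I am after compared to the hash selection method.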

The most obvious downside I currently see is that rendezvous hashing is 
inherently more expensive than the current dp_hash approach because it 
calculates the hash for each backend, so I plan to measure the 
performance impact under high traffic load and with a large number of 
backends.

I was also considering possible differences in traffic distribution 
across backends, but if my understanding of the math is correct, both 
algorithms should provide roughly similar distribution properties.
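
As a rough way to compare the two, the Python sketch below fills a 
table of hash values once with Webster's (Sainte-Lague) method and once 
with rendezvous hashing, then counts slots per bucket. It is only an 
illustration under assumptions: a table of 256 hash values, four 
equal-weight buckets, and a SHA-256 based score. It does not reproduce 
the actual OVS table setup code.

```python
import hashlib
from collections import Counter

N_HASH = 256  # assumption: number of distinct masked hash values

def webster_table(weights: dict, n_hash: int) -> list:
    """Webster / Sainte-Lague: each slot goes to the bucket with the
    highest weight / (2 * slots_already_allocated + 1)."""
    alloc = {b: 0 for b in weights}
    table = []
    for _ in range(n_hash):
        winner = max(weights, key=lambda b: weights[b] / (2 * alloc[b] + 1))
        alloc[winner] += 1
        table.append(winner)
    return table

def rendezvous_table(buckets: list, n_hash: int) -> list:
    """Each hash value independently picks its highest-scoring bucket."""
    def score(value: int, bucket: int) -> int:
        data = f"{value}:{bucket}".encode()
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
    return [max(buckets, key=lambda b: score(v, b)) for v in range(n_hash)]

weights = {1: 1, 2: 1, 3: 1, 4: 1}  # equal weights, for simplicity
webster_counts = Counter(webster_table(weights, N_HASH))
rendezvous_counts = Counter(rendezvous_table(list(weights), N_HASH))

print("webster:   ", dict(webster_counts))
print("rendezvous:", dict(rendezvous_counts))
```

With equal weights, the Webster table is exactly even by construction, 
while the rendezvous table is only even on average, so some deviation 
per bucket is expected when the number of hash values is small. Unequal 
weights would need a weighted rendezvous variant, which this sketch 
does not cover.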

So here are my questions:
Did I miss anything at first glance, and how valid would it be to use 
dp_hash in such an implementation? If there are no strict limitations 
requiring this approach, would it make sense to introduce a new 
selection_method, for example something like consistent-dp-hash, which 
would still calculate the packet hash at the datapath level, but would 
use rendezvous hashing instead of the Webster method for backend 
selection? It is also possible that I am missing something and that 
there is a way to implement this without patching OVS?

If this approach is considered valid, then my next steps would be 
implementing the changes described above, extending the ovn-controller 
so that bucket updates happen incrementally without deleting and 
recreating the group, and finishing the northd changes that I attached 
in the previous email.

Also, I wanted to ask this last time but forgot: perhaps we should 
clarify the limitations of the stateless LB option in the documentation? 
I mean both the limitation that a backend can only belong to a single 
LB, as well as the fact that changing the number of backends may cause 
existing connections to break.

Thanks a lot!

