Hello all,

=== Background material, skip this if you just want to know what I've done ===

For some time I've been wishing for a way to combine the best aspects of 
consistent hashing and "balance leastconn". I use haproxy to distribute traffic 
across a cluster of servers doing dynamic video delivery. Without going into 
too much detail, there's a substantial cache effect — there's a bunch of 
information that needs to be calculated for each video, and if that 
information is cached, then the request is quite a bit faster and uses fewer 
resources. Also, the server cluster runs in the cloud and auto-scales based on 
load.

Initially our approach was to cache the information locally on each server, 
and use uri-based balancing with consistent hashing to distribute the traffic 
among servers according to video ID. This gave a satisfying cache hit ratio, 
but had the problem that a single server could become overloaded by 
receiving the traffic for one or two exceptionally popular videos, resulting in 
bad performance.

Our second whack was to change to "balance leastconn" (which destroys the 
cache hit ratio) and add a second layer of shared cache. This gives more 
consistent performance, but the bandwidth used for cache coherency is 
substantial, and the cache servers create a single point of failure.

What I really wanted was a policy that uses consistent hashing most of the 
time, but avoids overloading any given server by diverting traffic from an 
overloaded server to a less-loaded one, in a consistent order based on the 
consistent hash ring. I did some experimenting with algorithms, but didn't 
find a satisfactory one.

=== End of background material, here's the paper and the patch ===

Recently I came upon the paper "Consistent Hashing with Bounded Loads", 
http://arxiv.org/abs/1608.01350 . It describes a very simple algorithm for 
doing exactly what I wanted. It defines a parameter c > 1, and enforces that no 
server receives more than ceil(c * avg(conns)) connections. If the server 
chosen by the hash is already at that limit, the algorithm walks around the 
hash ring until it finds one that is below the limit, and assigns the 
connection to that server. A large "c" 
provides stricter hashing (a greater chance of requests with the same hash key 
going to the same server), while a small "c" provides stricter balancing (less 
difference between the smallest and largest number of connections to a 
server). I simulated this algorithm, found that it would work better than the 
ones I tested before, and then implemented it in HAProxy. On testing in 
production I saw reduced 95th-percentile response times, and a great reduction 
in cache-coherency traffic.
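
To make the idea concrete, here's a rough, self-contained sketch of the 
lookup. This is illustrative code, not the code from my patch: the structure 
names are made up, the counters are simplified, and the linear scan stands in 
for the ebtree walk a real implementation would do.

#include <math.h>
#include <stddef.h>

/* One point on the hash ring (illustrative, not HAProxy's own structs). */
struct ring_point {
    unsigned int hash;   /* position of this point on the ring */
    int          server; /* index of the server owning this point */
};

/*
 * Pick a server for 'key' from 'ring' (sorted by hash, every server present
 * at least once). 'conns[s]' is the current connection count of server s,
 * 'total' is their sum, 'nsrv' is the number of servers, and c > 1.
 */
static int chash_bounded_pick(const struct ring_point *ring, size_t npoints,
                              const int *conns, int total, int nsrv,
                              unsigned int key, double c)
{
    /* Count the incoming connection in the average so that at least one
     * server is always below the limit. */
    int limit = (int)ceil(c * (total + 1) / nsrv);
    size_t start = 0;

    /* First ring point at or after the key (linear scan for clarity). */
    while (start < npoints && ring[start].hash < key)
        start++;
    if (start == npoints)
        start = 0; /* wrap around the ring */

    /* Walk clockwise until we reach a server below its limit. */
    for (size_t i = 0; i < npoints; i++) {
        const struct ring_point *p = &ring[(start + i) % npoints];
        if (conns[p->server] < limit)
            return p->server;
    }
    return ring[start].server; /* not reached when every server is on the ring */
}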

I'm attaching the patch that I tested. However, I don't expect that it is in 
any way mergeable. I'm not very familiar with the HAProxy codebase and I did a 
number of things that are almost certainly wrong. I replaced the existing 
chash_get_server_hash instead of attempting to add a new method or flag to the 
configuration. The "c" parameter is hard-coded instead of configurable, as 
haproxy's configuration currently has no support for non-integer values, and I wasn't sure 
how to deal with that. My code for enumerating servers and assigning 
capacities is probably suboptimal, although it didn't cause a noticeable bump 
in CPU usage on my production systems. There are probably any number of 
coding-style gaffes. And there is at least one bug that causes haproxy to 
become unresponsive when servers are marked down.
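
For what it's worth, one way around the lack of non-integer values might be 
to express "c" as an integer percentage (say, 125 for c = 1.25) and keep the 
capacity computation in integer arithmetic. A hypothetical sketch, with names 
I'm making up here rather than anything in the patch:

/*
 * Hypothetical helper (not in the patch): per-server capacity
 * ceil(c * avg(conns)) with "c" given as an integer percentage,
 * e.g. factor_pct = 125 for c = 1.25. Assumes nsrv > 0.
 */
static inline unsigned int bounded_capacity(unsigned int total_conns,
                                            unsigned int nsrv,
                                            unsigned int factor_pct)
{
    /* ceil(factor_pct * (total_conns + 1) / (100 * nsrv)) */
    return (factor_pct * (total_conns + 1) + 100 * nsrv - 1) / (100 * nsrv);
}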

That being said, I would be thrilled to work with someone to repair all of 
these problems, if there is interest in having this feature in HAProxy. I 
think it's an exciting feature and would be a great addition to HAProxy's 
already excellent set of load-balancing tools. I just need a little 
guidance to make it happen.

My changes are on GitHub at https://github.com/arodland/haproxy and they're 
up-to-date against haproxy master. I'd encourage anyone to review the code, 
and to contact me by email, on IRC (I'm on #haproxy as hobbs), or on GitHub.

Thank you,

Andrew
