Hi HAProxy List,

I’ve run into an issue with stick tables and peering that may be of 
interest to some of you.

I’ve got a fleet of 10 proxy servers peering with each other, fronting several 
backend servers.  I have a very simple stick table setup which I’ve pasted 
examples of below.  Basically I use a URL parameter to control server 
stickiness.

This works great, and is an amazing solution to a sticky problem for our 
BOSH-based XMPP messaging, as long as the stick table entries stay in sync.  
However, sometimes one HAProxy instance will lose one or more entries which are 
still present on the others.

This state can persist anywhere from minutes to hours, during which the 
out-of-sync server continues to receive updates on some entries but is 
missing others.

A restart of the server can resolve the issue by causing the table to refresh, 
but this is less than ideal.

When it occurs, it appears that all the other servers continue to update the 
“TTL” on the entry, but the errant server slowly allows the entry to expire and 
be removed.

I have developed a tool which pulls the stick table from each proxy and 
compares the entries.  There’s obviously some room for expiry times to be 
different on each proxy, but I’d expect that entries which are regularly 
refreshed on all other peers should be propagated everywhere.
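
For anyone who wants to reproduce the comparison, here is a minimal sketch of 
the idea in Python. It is not the actual tool. It assumes each proxy's table 
has already been dumped as text with "show table nodes" on the stats socket, 
and that entry lines look like "0x...: key=<room> use=0 exp=<ms> server_id=<id>". 
The function names and the sample peer names are hypothetical.

```python
import re

# Matches one entry line of "show table <name>" output, for example:
#   0x55d0c8a0: key=room42 use=0 exp=287113 server_id=10
# The exact field set is an assumption based on typical output.
ENTRY_RE = re.compile(r"^0x[0-9a-fA-F]+:\s+(.*)$")


def parse_table_dump(text):
    """Parse one dump into {key: {"exp": ms, "server_id": id or None}}."""
    entries = {}
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if not m:
            continue  # skips the "# table: ..." header and blank lines
        # Split "name=value" fields; assumes keys contain no whitespace.
        fields = dict(f.split("=", 1) for f in m.group(1).split() if "=" in f)
        if "key" not in fields:
            continue
        sid = fields.get("server_id")
        entries[fields["key"]] = {
            "exp": int(fields.get("exp", 0)),
            "server_id": int(sid) if sid is not None else None,
        }
    return entries


def find_missing(dumps):
    """dumps: {peer_name: parsed entries}. Return {key: [peers lacking it]}."""
    all_keys = set().union(*(d.keys() for d in dumps.values()))
    missing = {}
    for key in all_keys:
        absent = sorted(p for p, d in dumps.items() if key not in d)
        if absent:
            missing[key] = absent
    return missing


if __name__ == "__main__":
    # Hypothetical sample dumps from two peers.
    dump_a = (
        "# table: nodes, type: string, size:20480, used:2\n"
        "0x1a: key=room1 use=0 exp=299000 server_id=10\n"
        "0x1b: key=room2 use=0 exp=120000 server_id=10\n"
    )
    dump_b = (
        "# table: nodes, type: string, size:20480, used:1\n"
        "0x2a: key=room1 use=0 exp=298000 server_id=10\n"
    )
    parsed = {
        "proxy-a": parse_table_dump(dump_a),
        "proxy-b": parse_table_dump(dump_b),
    }
    for key, peers in find_missing(parsed).items():
        print(key, "missing on", ", ".join(peers))
```

Comparing only key presence sidesteps the expected drift in expiry times; the 
"exp" values are kept in the parsed output in case a tolerance check is wanted.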

I suspect either intermittent network connectivity between the peers or 
some other error, but I haven’t seen anything in the logs that seems relevant.

lsof analysis of open TCP sockets shows all peers connected on port 1024 as 
expected.

I wondered if this list would have any ideas on further avenues for analysis on 
this particular problem.  I’ve seen this happen consistently on HAProxy 1.6 and 
1.7 through several point releases of each.  If anything it seems more frequent 
in 1.7.

Please let me know if you have any good ideas or if anyone has seen behavior 
like this before. 

Thanks,

-Aaron van Meerten

Below is an example of my peer and stick table configuration, extracted from a 
larger haproxy.cfg.
If there’s more info that’d help track this down, I’m happy to provide it.


peers mypeers
 peer hcv-chaos-haproxy-13056 XX.XX.130.56:1024
 peer hcv-chaos-haproxy-230228 XX.XX.230.228:1024
 peer hcv-chaos-haproxy-35147 XX.XX.35.147:1024
 peer hcv-chaos-haproxy-10660 10.186.3.137:1024
 peer hcv-chaos-haproxy-9682 XX.XX.96.82:1024
 peer hcv-chaos-haproxy-239179 XX.XX.239.179:1024
 peer hcv-chaos-haproxy-246171 XX.XX.246.171:1024
 peer hcv-chaos-haproxy-68128 XX.XX.68.128:1024
 peer hcv-chaos-haproxy-151101 XX.XX.151.101:1024
 peer hcv-chaos-haproxy-207217 XX.XX.207.217:1024


backend nodes
  redirect scheme https if !{ ssl_fc }

# make sure we send the client's ip
 option forwardfor

  balance url_param room
  hash-type consistent
  stick-table type string len 128 size 20k peers mypeers expire 5m
  stick on url_param(room) table nodes

  #example server
  server chaos-us-east-1a-s0 XX.XX.XX.XXX:443 id 10 ssl verify none check port 8888 inter 5s fastinter 1s fall 2 rise 30
