Hi HAProxy List,

I posted the following approximately 2 weeks ago and was hoping that someone 
else might have experienced these inconsistencies within the stick tables 
between peers.  It seems to be an issue even in the latest release (HAProxy 
1.7.4).  I hope to get some guidance on what information I could collect which 
might be of interest to the developers or the community.

Would a tcpdump of the chatter between peers (on TCP port 1024) be of use?  I 
cannot always predict when the stick table corruption will occur, but I can try 
to collect some data about the traffic between the peers once the corruption 
has happened.

Or is there anything else I could be doing to increase the logging with 
relation to peer connections and stick table updates? At the moment I don’t see 
anything in the HAProxy logs related to this feature.

Thanks again for this amazing product; I’m still a very happy user!

Cheers,

-Aaron


> On Mar 15, 2017, at 22:22, Aaron van Meerten <avanmeer...@atlassian.com> 
> wrote:
> 
> Hi HAProxy List,
> 
> I’ve run into an issue with stick tables and peering that may be of 
> interest to some of you.
> 
> I’ve got a fleet of 10 proxy servers peering with each other, fronting 
> several backend servers.  I have a very simple stick table setup which I’ve 
> pasted examples of below.  Basically I use a URL parameter to control server 
> stickiness.
> 
> This works great, and is an amazing solution to a sticky problem for our 
> BOSH-based XMPP messaging, as long as the stick table entries stay in sync.  
> However, sometimes one HAProxy instance will lose one or more entries which 
> are still present on the others.
> 
> This state persists for anywhere from minutes to hours, during which the 
> out-of-sync server continues to receive updates on some entries but is 
> missing others.
> 
> A restart of the server can resolve the issue by causing the table to 
> refresh, but this is less than ideal.
> 
> When it occurs, it appears that all the other servers continue to update the 
> “TTL” on the entry, but the errant server slowly allows the entry to expire 
> and be removed.
> 
> I have developed a tool which pulls the stick table from each proxy and 
> compares the entries.  There’s obviously some room for expiry times to be 
> different on each proxy, but I’d expect that entries which are regularly 
> refreshed on all other peers should be propagated everywhere.
> 
> I suspect either a transient loss of network connectivity between the peers 
> or some other error, but I haven’t seen anything in the logs that seems 
> relevant.
> 
> lsof analysis of open TCP sockets shows all peers connected on 1024 as 
> expected.
> 
> I wondered if this list would have any ideas on further avenues for analysis 
> on this particular problem.  I’ve seen this happen consistently on HAProxy 
> 1.6 and 1.7 through several point releases of each.  If anything it seems 
> more frequent in 1.7.
> 
> Please let me know if you have any good ideas or if anyone has seen behavior 
> like this before. 
> 
> Thanks,
> 
> -Aaron van Meerten
> 
> Below is an example of my peers and stick table configuration, extracted 
> from a larger haproxy.cfg.
> If there’s more info that’d help track this down, I’m happy to provide it.
> 
> 
> peers mypeers
> peer hcv-chaos-haproxy-13056 XX.XX.130.56:1024
> peer hcv-chaos-haproxy-230228 XX.XX.230.228:1024
> peer hcv-chaos-haproxy-35147 XX.XX.35.147:1024
> peer hcv-chaos-haproxy-10660 10.186.3.137:1024
> peer hcv-chaos-haproxy-9682 XX.XX.96.82:1024
> peer hcv-chaos-haproxy-239179 XX.XX.239.179:1024
> peer hcv-chaos-haproxy-246171 XX.XX.246.171:1024
> peer hcv-chaos-haproxy-68128 XX.XX.68.128:1024
> peer hcv-chaos-haproxy-151101 XX.XX.151.101:1024
> peer hcv-chaos-haproxy-207217 XX.XX.207.217:1024
> 
> 
> backend nodes
>   redirect scheme https if !{ ssl_fc }
> 
>   # make sure we send the client's ip
>   option forwardfor
> 
>   balance url_param room
>   hash-type consistent
>   stick-table type string len 128 size 20k peers mypeers expire 5m
>   stick on url_param(room) table nodes
> 
>   # example server
>   server chaos-us-east-1a-s0 XX.XX.XX.XXX:443 id 10 ssl verify none check port 8888 inter 5s fastinter 1s fall 2 rise 30
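
For anyone who wants to reproduce the comparison described in the quoted message, here is a minimal sketch of how per-peer table dumps could be diffed. It is not the actual tool mentioned above. It assumes each dump is the text returned by "show table nodes" on the stats socket (for example via socat), and that entry lines look like "0x...: key=<key> use=<n> exp=<ms> server_id=<id>". The function names and peer names are hypothetical, and keys containing whitespace are not handled.

```python
import re

# Assumed shape of one entry line in `show table <name>` output, e.g.
#   0x7f2c: key=room42 use=0 exp=298123 server_id=10
ENTRY_RE = re.compile(r"^0x[0-9a-fA-F]+:\s+key=(?P<key>\S+)\s+(?P<fields>.*)$")


def parse_table_dump(text):
    """Return {key: {field: value}} for one peer's table dump."""
    entries = {}
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if not match:
            continue  # header line ("# table: ...") or blank line
        fields = dict(
            pair.split("=", 1)
            for pair in match.group("fields").split()
            if "=" in pair
        )
        entries[match.group("key")] = fields
    return entries


def find_missing_keys(dumps):
    """dumps maps a peer name to its raw dump text.

    Returns {peer: sorted keys present on at least one other peer but
    absent on this one}. Peers that hold every key are omitted.
    """
    tables = {peer: parse_table_dump(text) for peer, text in dumps.items()}
    all_keys = set()
    for table in tables.values():
        all_keys.update(table)
    missing = {}
    for peer, table in tables.items():
        absent = all_keys - set(table)
        if absent:
            missing[peer] = sorted(absent)
    return missing


if __name__ == "__main__":
    # Hypothetical dumps from two peers; peer-b has lost "room2".
    sample = {
        "peer-a": (
            "# table: nodes, type: string, size:20480, used:2\n"
            "0x1a: key=room1 use=0 exp=250000 server_id=10\n"
            "0x1b: key=room2 use=0 exp=120000 server_id=11\n"
        ),
        "peer-b": (
            "# table: nodes, type: string, size:20480, used:1\n"
            "0x2a: key=room1 use=0 exp=249000 server_id=10\n"
        ),
    }
    print(find_missing_keys(sample))
```

The sketch deliberately ignores differences in "exp", since expiry times are expected to drift slightly between peers. It only reports keys that are absent outright, which is the symptom described above. Running it periodically could also help decide when to start a tcpdump of the peer traffic.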

