Re: How to redispatch a request after queue timeout
Hi Lukas,

On Fri, Aug 30, 2013 at 10:27:34AM +0200, Lukas Tribus wrote:
> > Is there a way to redispatch the requests to other servers ignoring the
> > persistence on queue timeout?
>
> No; after a queue timeout (configurable by "timeout queue"), haproxy will
> drop the request. Think of what would happen if we redispatched on queue
> timeout: the request would wait seconds for a free slot on the backend and
> only then be redispatched to another server. This would delay many requests
> and hide the problem from you, while your customers furiously switch to a
> competitor.

It could be even worse if you do so with caches: it would end up mixing
objects among all nodes whenever one cache takes a long time to respond,
resulting in a lower hit rate, hence a higher overall load that makes the
problem worse, and you could very well end up killing the whole farm in a
domino effect.

Willy
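[Editor's note: the behavior Willy describes is driven by the backend's
"timeout queue" and per-server "maxconn" settings. A minimal sketch; server
names, addresses and values are illustrative:

```
backend cache_farm
    balance hdr(host)      # stick each Host header value to one server
    timeout queue 5s       # queued requests give up after 5s ("sQ" in the logs)
    server cache1 10.0.0.1:80 maxconn 100   # requests queue once 100 connections are busy
    server cache2 10.0.0.2:80 maxconn 100
```

Once a server holds maxconn connections, further requests hashed to it wait
in the queue; any request that does not get a slot within "timeout queue" is
answered with a 503 rather than redispatched.]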
Re: How to redispatch a request after queue timeout
On Fri, Aug 30, 2013 at 02:10:50PM +0530, Sachin Shetty wrote:
> Thanks Lukas. Yes, I was hoping to work around this by setting a smaller
> maxqueue limit and queue timeout. So what other options do we have? I need
> to:
> 1. Send all requests for a host (mytest.mydomain.com) to one backend
>    server as long as it can serve them.
> 2. If that server is swamped, send them to any other server available.

I'm wondering if we should not try to implement this when the hash type is
set to "consistent". The principle of the consistent hash is precisely that
we want the closest node, but we know that sometimes things will be slightly
redistributed (e.g. when adding/removing a server in the farm). So maybe it
would make sense to specify that when using consistent hash, if a server has
a maxqueue parameter and this maxqueue is reached, then we look for the
closest server. That might be OK with caches as well, since nodes close to
each other tend to share a few objects when the farm size changes.

What do others think?

Willy
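[Editor's note: the combination Willy discusses would be configured roughly
as below. Overflowing to the next-closest node when maxqueue is reached is
the *proposal* here, not behavior haproxy has at the time of this thread;
names and values are illustrative:

```
backend cache_farm
    balance hdr(host)
    hash-type consistent    # only a small share of keys moves when the farm changes
    server cache1 10.0.0.1:80 maxconn 100 maxqueue 10
    server cache2 10.0.0.2:80 maxconn 100 maxqueue 10
```

Under the proposal, a request hashed to cache1 would spill over to the next
server on the hash ring once cache1's queue holds 10 pending requests,
instead of waiting there.]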
Re: How to redispatch a request after queue timeout
Thanks Willy. I am precisely using it for caching. I need requests to go to
the same nodes for cache hits, but when a node is already swamped I would
prefer a cache miss over a 503.

Thanks
Sachin

On 8/31/13 12:57 PM, Willy Tarreau <w...@1wt.eu> wrote:
> On Fri, Aug 30, 2013 at 02:10:50PM +0530, Sachin Shetty wrote:
> > Thanks Lukas. Yes, I was hoping to work around this by setting a smaller
> > maxqueue limit and queue timeout. So what other options do we have? I
> > need to:
> > 1. Send all requests for a host (mytest.mydomain.com) to one backend
> >    server as long as it can serve them.
> > 2. If that server is swamped, send them to any other server available.
>
> I'm wondering if we should not try to implement this when the hash type is
> set to "consistent". The principle of the consistent hash is precisely
> that we want the closest node, but we know that sometimes things will be
> slightly redistributed (e.g. when adding/removing a server in the farm).
> So maybe it would make sense to specify that when using consistent hash,
> if a server has a maxqueue parameter and this maxqueue is reached, then we
> look for the closest server. That might be OK with caches as well, since
> nodes close to each other tend to share a few objects when the farm size
> changes.
>
> What do others think?
>
> Willy
Re: How to redispatch a request after queue timeout
We did try consistent hashing, but I found better distribution without it.
We don't add or remove servers often, so we should be OK. Our total pool is
sized correctly and we are able to serve 100% of requests when we use
roundrobin; however, stickiness on host is what causes some nodes to hit
maxconn. My goal is to never send a 503 as long as we have other nodes
available, which is always the case in our pool.

Thanks
Sachin

On 8/31/13 1:17 PM, Willy Tarreau <w...@1wt.eu> wrote:
> On Sat, Aug 31, 2013 at 01:03:34PM +0530, Sachin Shetty wrote:
> > Thanks Willy. I am precisely using it for caching. I need requests to go
> > to the same nodes for cache hits, but when a node is already swamped I
> > would prefer a cache miss over a 503.
>
> Then you should already be using "hash-type consistent", otherwise when
> you lose or add a server, you redistribute everything and will end up with
> only about 1/#caches of the objects in the same place and all the rest as
> misses. Not many cache architectures resist this, really.
>
> Interestingly, a long time ago I wanted to have some outgoing rules
> (they're on the diagram in the doc directory). The idea was to be able to
> apply some processing *after* the LB algorithm was called. Such processing
> could include checking the selected server's queue size or any such thing
> and deciding to force the use of another server. But in practice it
> doesn't play well with the current sequencing, so it was never done. It
> could have been useful in such a situation, I think.
>
> I'll wait a bit for others to step up about the idea of redistributing
> connections only for consistent hashing. I really don't want to break
> existing setups (even though I think in this case it should be OK).
>
> Willy
Re: How to redispatch a request after queue timeout
On Sat, Aug 31, 2013 at 01:27:41PM +0530, Sachin Shetty wrote:
> We did try consistent hashing, but I found better distribution without it.

That's known and normal.

> We don't add or remove servers often so we should be ok.

It depends on what you do with them, in fact, because most places will not
accept that the whole farm goes down due to one server falling down and
causing 100% redistribution. If you have reverse caches, in general it is
not a big issue because the number of objects is very limited and caches can
quickly refill. But outgoing caches generally take ages to fill up.

> Our total pool is sized correctly and we are able to serve 100% of
> requests when we use roundrobin, however stickiness on host is what causes
> some nodes to hit maxconn. My goal is to never send a 503 as long as we
> have other nodes available, which is always the case in our pool.

OK, so if we implement the proposed change, it will not match your usage
since you're not using consistent hashing anyway. So we might have to add
another explicit option for loose/strict assignment of the server. We could
have 3 levels, BTW:

  - no-queue : find another server if the destination is full
  - loose    : find another server if the destination has reached maxqueue
  - strict   : never switch to another server

I would just like to find a way to do something clean for the map-based hash
that you're using, without recomputing a map excluding the unusable
server(s), while trying to stick as much as possible to the same servers to
optimize the hit rate. Maybe scanning the table for the next usable server
will be enough, though it will not match the same servers as the ones used
in case of a change of the farm size. This could be a limitation that has to
be accepted for this.

Willy
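[Editor's note: no such option exists at the time of this thread; the sketch
below only illustrates how the three proposed levels might surface in a
configuration. The "queue-policy" keyword is entirely hypothetical:

```
backend cache_farm
    balance hdr(host)
    # hypothetical keyword for the three proposed levels -- not real haproxy syntax
    queue-policy no-queue    # or: loose | strict
    server cache1 10.0.0.1:80 maxconn 100 maxqueue 10
    server cache2 10.0.0.2:80 maxconn 100 maxqueue 10
```
]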
Re: How to redispatch a request after queue timeout
Yes, no-queue is what I am looking for. If consistent hashing makes it
easier to accommodate this change, I won't mind switching to consistent
hashing when the fix is available. Right now, with no support for maxqueue
failover, consistent hashing is even more severe for our use case.

Thanks again Willy.

Thanks
Sachin

On 8/31/13 2:17 PM, Willy Tarreau <w...@1wt.eu> wrote:
> On Sat, Aug 31, 2013 at 01:27:41PM +0530, Sachin Shetty wrote:
> > We did try consistent hashing, but I found better distribution without
> > it.
>
> That's known and normal.
>
> > We don't add or remove servers often so we should be ok.
>
> It depends on what you do with them, in fact, because most places will not
> accept that the whole farm goes down due to one server falling down and
> causing 100% redistribution. If you have reverse caches, in general it is
> not a big issue because the number of objects is very limited and caches
> can quickly refill. But outgoing caches generally take ages to fill up.
>
> > Our total pool is sized correctly and we are able to serve 100% of
> > requests when we use roundrobin, however stickiness on host is what
> > causes some nodes to hit maxconn. My goal is to never send a 503 as long
> > as we have other nodes available, which is always the case in our pool.
>
> OK, so if we implement the proposed change, it will not match your usage
> since you're not using consistent hashing anyway. So we might have to add
> another explicit option for loose/strict assignment of the server. We
> could have 3 levels, BTW:
>
>   - no-queue : find another server if the destination is full
>   - loose    : find another server if the destination has reached maxqueue
>   - strict   : never switch to another server
>
> I would just like to find a way to do something clean for the map-based
> hash that you're using, without recomputing a map excluding the unusable
> server(s), while trying to stick as much as possible to the same servers
> to optimize the hit rate. Maybe scanning the table for the next usable
> server will be enough, though it will not match the same servers as the
> ones used in case of a change of the farm size. This could be a limitation
> that has to be accepted for this.
>
> Willy
Re: How to redispatch a request after queue timeout
On Sat, Aug 31, 2013 at 02:25:47PM +0530, Sachin Shetty wrote:
> Yes, no-queue is what I am looking for. If consistent hashing is easier to
> accommodate this change,

I wouldn't say it's easier, rather that it's less complicated :-/

> I won't mind switching to consistent hashing when the fix is available.
> Right now with no support for maxqueue failover, consistent hashing is
> even more severe for our use case.

OK, I'll think about this. I can't provide any ETA though.

Willy
RE: How to redispatch a request after queue timeout
Hi Sachin,

> We want to maintain stickiness to a backend server based on the host
> header, so balance hdr(host) works pretty well for us. However, as soon as
> the backend hits max connections, requests pile up in the queue and
> eventually time out with a 503 and "sQ" in the logs.

I don't think balance hdr(host) is the correct load-balancing method then.

> Is there a way to redispatch the requests to other servers ignoring the
> persistence on queue timeout?

No; after a queue timeout (configurable by "timeout queue"), haproxy will
drop the request. Think of what would happen if we redispatched on queue
timeout: the request would wait seconds for a free slot on the backend and
only then be redispatched to another server. This would delay many requests
and hide the problem from you, while your customers furiously switch to a
competitor.

You should find another way to better distribute the load across your
backends.

Lukas
Re: How to redispatch a request after queue timeout
Thanks Lukas. Yes, I was hoping to work around this by setting a smaller
maxqueue limit and queue timeout. So what other options do we have? I need
to:

1. Send all requests for a host (mytest.mydomain.com) to one backend server
   as long as it can serve them.
2. If that server is swamped, send them to any other server available.

Thanks
Sachin

On 8/30/13 1:57 PM, Lukas Tribus <luky...@hotmail.com> wrote:
> Hi Sachin,
>
> > We want to maintain stickiness to a backend server based on the host
> > header, so balance hdr(host) works pretty well for us. However, as soon
> > as the backend hits max connections, requests pile up in the queue and
> > eventually time out with a 503 and "sQ" in the logs.
>
> I don't think balance hdr(host) is the correct load-balancing method then.
>
> > Is there a way to redispatch the requests to other servers ignoring the
> > persistence on queue timeout?
>
> No; after a queue timeout (configurable by "timeout queue"), haproxy will
> drop the request. Think of what would happen if we redispatched on queue
> timeout: the request would wait seconds for a free slot on the backend and
> only then be redispatched to another server. This would delay many
> requests and hide the problem from you, while your customers furiously
> switch to a competitor.
>
> You should find another way to better distribute the load across your
> backends.
>
> Lukas
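[Editor's note: the workaround Sachin describes, failing fast rather than
letting requests wait out the full queue timeout, would look roughly like
this (values illustrative):

```
backend cache_farm
    balance hdr(host)
    timeout queue 1s       # give up on queued requests quickly
    server cache1 10.0.0.1:80 maxconn 100 maxqueue 5   # keep the queue short
    server cache2 10.0.0.2:80 maxconn 100 maxqueue 5
```

This shortens how long a request can sit in the queue, but as the thread
notes, requests that time out are still answered with a 503 rather than
redispatched to another server.]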