Re: Help with dynamic backend selection
- Original Message -
From: Willy Tarreau w...@1wt.eu
To: Rajat Chopra rcho...@redhat.com
Cc: haproxy haproxy@formilux.org
Sent: Saturday, May 10, 2014 12:01:14 AM
Subject: Re: Help with dynamic backend selection

On Sat, May 10, 2014 at 07:58:25AM +0200, Willy Tarreau wrote:

May I ask about the ETA on this?

It's too early for me to know, I need to go down deep into the ebtrees first to see if longest match is compatible with strings storage-wise, then I need to study how patterns are built as trees to see how to do that as well. Possibly it's just one or two days of work once I understand everything.

OK in the end it was extremely easy :-) Thierry has done an amazing job at making the pattern management very modular, because I just changed the index and lookup to try in a tree first with a different algorithm and that works fine! So we don't care about the compatibility between regular string match and beginning. So that's pushed into git now if you want to give it a try.

Thanks a lot Willy. It works as intended. I am trying to spin a real large case and check the performance. Will report.

Thanks again.
Rajat
Re: Help with dynamic backend selection
- Original Message -
From: Willy Tarreau w...@1wt.eu
To: Rajat Chopra rcho...@redhat.com
Cc: haproxy haproxy@formilux.org
Sent: Friday, May 9, 2014 10:43:32 AM
Subject: Re: Help with dynamic backend selection

Hi Rajat,

On Tue, May 06, 2014 at 07:34:34PM -0400, Rajat Chopra wrote:

Hi! The new feature to dynamically select a backend has been going great for me. I use it like this and it all works like a charm :

    use_backend bk_%[hdr(host),host_to_backend.mapfile] if TRUE

Now I need help with a situation. Some of the servers meant for a particular hostname need path replacement, i.e. reqrep/resrep. E.g. I have a set of 5 servers. While three of them need to be balanced for any hostname/service1 requests, the other two need to be load balanced for hostname/service2 requests. How can I dynamically choose between two backends depending on hostname+path_beg? Can I use something like this?

    use_backend bk_%[hdr(host),host_to_backend.mapfile]_%[path]

where a request to hostname/service1 would mean look for a backend named bk_hostname_service1. Clearly it will not work like that because hostname/service1/login.php will break the backend lookup.

Absolutely. And you'd have to have as many backends as possible paths, which is not practical! Why don't you run that based on ACLs if you're certain that service2 will not cause any trouble ? Eg:

    use_backend bk_%[hdr(host),host_to_backend.mapfile]_%[path] if { path_beg /service2 }
    use_backend bk_%[hdr(host),host_to_backend.mapfile]

But again, that requires that you know the exhaustive list of valid paths for service2. I don't know if that's your case or not, but that could be the solution.

Yes. That's the only solution I could come up with. I would have the exhaustive list as the script generates the config file, but I worry that with 150k backends, what if each backend comes up with its own unique path_beg requirement. It would defeat the purpose of the map.

Also, in this statement

    use_backend bk_%[hdr(host),host_to_backend.mapfile]_%[path] if { path_beg /service2 }

using _%[path] is not good, right? We need to use 'bk_%[hdr(host),map(mapfile)]_service2', right?

Wonder if we could use some pluggable logic for map. E.g. the user can provide a .so with the user defined map function. So my call will become :

    use_backend bk_%[hdr(host),path,myown_func(path_to_dot_so)] if TRUE

And at config time, haproxy does a dlopen on path_to_dot_so, and expects that dlsym finds a myown_func method. One that returns a string when given the two arguments (host,path). It may be useful for other dynamic user defined acls also.

Far fetched feature, eh? :)

Thanks for the help again.
Rajat
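A minimal sketch of the ACL-based variant discussed in this message, assuming the map is called through the map() converter and that the service name is written as a literal suffix instead of being taken from %[path]. The map file path, its contents, the backend names and the server addresses are all hypothetical.

```
# Hypothetical map file /etc/haproxy/host_to_backend.map:
#   www.example.com   example

frontend fe_main
    mode http
    bind :80
    # requests under /service2 go to bk_<mapped name>_service2
    use_backend bk_%[hdr(host),map(/etc/haproxy/host_to_backend.map)]_service2 if { path_beg /service2 }
    # everything else goes to bk_<mapped name>
    use_backend bk_%[hdr(host),map(/etc/haproxy/host_to_backend.map)]

# three servers for hostname/service1 and other paths
backend bk_example
    mode http
    server s1 192.0.2.11:8080
    server s2 192.0.2.12:8080
    server s3 192.0.2.13:8080

# two servers for hostname/service2
backend bk_example_service2
    mode http
    server s4 192.0.2.14:8080
    server s5 192.0.2.15:8080
```

One such use_backend rule is needed per distinct path prefix, which is the scaling concern raised above for a large number of backends.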
Re: Help with dynamic backend selection
Wonder if we could use some pluggable logic for map. e.g. User can provide a .so with the user defined map function. So my call will become :

It's still early to do this, there are strict types on the patterns and their own alloc/release/insert/delete functions.

    use_backend bk_%[hdr(host),path,myown_func(path_to_dot_so)] if TRUE

And at config time, haproxy does a dlopen on path_to_dot_so, and expects that dlsym finds a myown_func method. One that returns a string when given the two arguments (host,path). It may be useful for other dynamic user defined acls also.

In practice we should not need to define that many new ACLs since they're supposed to rely on the sample fetch functions. However, sample fetch functions can very easily be added.

Far fetched feature, eh? :)

Not that much. I'm thinking about something. Maps can retrieve the first matching prefix using map_beg. So if you pass base to your map, you'll have the host+path on input, and take the first matching prefix. Thus, it will give you this :

    use_backend bk_%[base,map_beg(mapfile)]

The file would contain all the host+path sorted from the longest to the shortest (basically just a sort -r) :

    domain1/path1/ backend1
    domain1/path2/ backend2
    domain1/ backend3
    domain2/path1/ backend1
    domain2/ backend4

The only problem is that the beg match needs to iterate over all entries, so it will be slower than a tree based lookup, but much faster than running 150k rules!

That looks great. Humble opinion here, or just my discomfort maybe. The map_beg will certainly solve my current case, but it feels like a generic user callable function would offload the myriad other cases that might crop up in the future. Pretty sure someone will ask for a map_reg down the line - a map with keys as compiled regexes is compelling.

And further, I think that with some tweaking, we could make the beg match use the longest prefix match function of the ebtrees. I remember that there's something tricky about it related to the string length, but if we need to pad with a zero or something like this, it might not be a big deal. So maybe in the end we could improve the map match to lookup host+uri prefixes faster than we do today. That would be great for URL or Location rewriting!

May I ask about the ETA on this?

Greedily,
With sincere thanks.
Rajat
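The map_beg suggestion above could look roughly like the following in a complete configuration. This is only a sketch: the map file path, the frontend name and the server addresses are assumptions, and only two of the backends named in the map are shown.

```
# Hypothetical map file /etc/haproxy/base_to_backend.map, longest prefixes first:
#   domain1/path1/   backend1
#   domain1/path2/   backend2
#   domain1/         backend3
#   domain2/path1/   backend1
#   domain2/         backend4

frontend fe_main
    mode http
    bind :80
    # base is the Host header followed by the path; the map value completes the backend name
    use_backend bk_%[base,map_beg(/etc/haproxy/base_to_backend.map)]
    # fallback when the dynamic name does not resolve to an existing backend
    default_backend bk_backend3

backend bk_backend1
    mode http
    server s1 192.0.2.21:8080

backend bk_backend3
    mode http
    server s1 192.0.2.23:8080
```

Because base is built from the Host header as the client sent it, the keys in the map need to match that form; a request for domain1/path1/login.php would match the domain1/path1/ entry and be routed to bk_backend1.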
Help with dynamic backend selection
Hi! The new feature to dynamically select a backend has been going great for me. I use it like this and it all works like a charm :

    use_backend bk_%[hdr(host),host_to_backend.mapfile] if TRUE

Now I need help with a situation. Some of the servers meant for a particular hostname need path replacement. i.e. reqrep/resrep. e.g. I have a set of 5 servers. While three of them need to be balanced for any hostname/service1 requests, the other two need to be load balanced for hostname/service2 requests. How can I dynamically choose between two backends depending on hostname+path_beg? Can I use something like this?

    use_backend bk_%[hdr(host),host_to_backend.mapfile]_%[path]

where a request to hostname/service1 would mean look for a backend named bk_hostname_service1. Clearly it will not work like that because hostname/service1/login.php will break the backend lookup.

Advice/suggestions please.

Thanks.
Rajat
Re: [PATCH] proxy: support use_backend with dynamic names
Hi Steven,

With the patch from Bertrand, you should not need many ACLs I believe. For the thousands of backends, yes, I did have the issue with huge startup times, but it has been solved with recent commits from Willy. 'Huge' is relative obviously - it came down from 15 minutes or so to 8s, and for me it is reasonable enough now. I changed a few things in my config file too and I have posted the optimizations in the stackoverflow post - e.g. use fullconn 1000 in defaults and use IP addresses for destinations instead of DNS.

Did you try the latest code from git? Send an example of your config file otherwise and I am sure the experts on the list will be able to help.

Best,
Rajat

- Original Message -
From: Steven Le Roux ste...@le-roux.info
To: Rajat Chopra rcho...@redhat.com
Cc: haproxy haproxy@formilux.org
Sent: Monday, March 31, 2014 4:04:55 PM
Subject: Re: [PATCH] proxy: support use_backend with dynamic names

Hi !

Since I experienced the same behaviour with a similar configuration, don't you have a huge startup time due to the ACL parsing ?

--
Steven Le Roux

On 28 March 2014 01:59, Rajat Chopra rcho...@redhat.com wrote:

Hi!

This solution very much solves the problem that I have been facing i.e. large number of acl rules causing latency in requests. Been in discussions separately about it and today I got a chance to test out this patch. I report that it works great! I have been able to route 150k backends with this and the latency added because of the dynamic lookup is in order of microseconds (compared to 24ms earlier).

The usage 'use_backend bk_%[hdr(Host)] if TRUE' works for my use-case but originally I was wondering if one could do a map based lookup for the backend. As posted here :
http://stackoverflow.com/questions/22025412/how-to-use-thousands-of-backends-in-haproxy-is-the-new-map-feature-useful-for-t

Most of the issues in the above question are now solved, but I tested this with the patch -

    use_backend bk_%[hdr(Host), map(host_to_backend_map.file)] if TRUE

And it does not work. I am not yet familiar with code to determine why this does not work. Again, the current proposal works well for me but an enhancement should probably consider using maps within dynamic lookup.

+1 for the patch.

Thanks.
Rajat

Hi Bertrand,

On Sun, Mar 23, 2014 at 04:18:44PM +0100, Bertrand Jacquin wrote:

Hi, I did this patch for dev19 some time ago but I am still not sure whether it is the best way to do it or not, and did not have the time to discuss it since. As the latest changes broke it and forced me to rebase it, and it's very useful for us, I'd like to propose it for inclusion before the final release if you think it's OK, or to discuss how it should be done.

Great!

The main purpose I wanted to achieve is to be able to use many backends without the need to declare each routing process from frontend to backend and instead use generic and dynamic switching when a sane parameter can be used from user request using the logformat logic. For example when we have a backend farm dedicated to each 'Host: ' http-header, it's pain in the ass to have to declare the backend and the relevant use_backend.

Yes I know there's this request coming from time to time. In fact it was even planned to work like this before we finally went with ACLs and use_backend, but we felt it would be a too limited design (eg: no choice of other routing key).

With the proposed solution, you first need to declare a dynamic use_backend as the following :

    use_backend bk_cust_%[hdr(Host)] if { hdr(Host) -m found }

And then to declare the needed backend. For every new vhost hosted will only need to add the backend section to the configuration.

I'm not opposed to the feature at all, in fact I've even been involved in a discussion about something more or less in this vein recently. But I'm having some fears about the use of the %[] form in a use_backend directive. Indeed, this string format was initially done only for logformat. Then it was adopted for unique-id. Then for all http-request directives. And we start to see from time to time people trying to use it in places which have no relation with it (eg: in ACL declaration).

I'm seeing several solutions in fact :

- yours above
- append some argument to use_backend to indicate that it's a logformat string or a dynamic backend (eg: use_backend -d foo%[bar]), but -d might be a valid backend name, so ...
- have a different directive name (eg: use_backend_dyn or use_backend_lf), but that might increase the confusion for some users who will not necessarily know that they're part of the same ruleset.
- put use_backend in http-request rules and clearly state that only http-request can
Re: [PATCH] proxy: support use_backend with dynamic names
Haproxy 1.5 and earlier cut the lines in words around spaces, so above your expression does not work because it's split in two. Just remove the space before map and it will do exactly what you need. Also I think it's better to use a map than the plain header because this way you can ensure what the exact list of accessible backends will be, and you can also get rid of some matching details such as lower/upper case etc...

Thanks for the pointer. Removed the space, and it works.

Do you think that use-server would simplify your configuration (eg: have only one backend with thousands of servers, one per host) ? We could then imagine something like this :

    frontend foo
        use_backend bk_%[hdr(host),map(host_to_backend.map)]

    bk_host_xxx
        server 1 ...
        server 2 ...

    bk_virtual_hosted
        use-server srv_%[hdr(host),map(host_to_server.map)]
        server srv_1 ...
        server srv_2 ...

The idea would be to have a few backends for multi-server hosts, and a single backend with all single-server hosts and as many servers. But maybe this would make the config more complex in the end, I don't know. I think the natural next step is the usedst server directive that people who want to run proxies are requesting :-)

I generate the configuration through a script, so I guess not much difference unless you think it is more optimal to run single-server backends by grouping them. On the other hand, it appears cumbersome when such host-servers start to scale-up and we have to move it from being a stag server to an actual backend for load-balancing.

Not sure I understand the usedst directive. Dynamic lookup of server's target destination?

:),
Rajat
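The whitespace fix described at the start of this message can be summarised as below. This is a sketch using the map file name from the thread; the last rule, which adds the lower converter to sidestep upper/lower case differences in the Host header, is an assumption that goes beyond what the thread tested.

```
# Does not work on 1.5: the space after the comma splits the expression into two words
#   use_backend bk_%[hdr(Host), map(host_to_backend_map.file)] if TRUE

# Works: no whitespace inside %[...]
use_backend bk_%[hdr(Host),map(host_to_backend_map.file)] if TRUE

# Alternative (assumption, not from the thread): lower-case the header before the lookup
#   use_backend bk_%[hdr(Host),lower,map(host_to_backend_map.file)] if TRUE
```

With the lower-casing variant, the keys in the map file would need to be written in lower case as well.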
Re: [PATCH] proxy: support use_backend with dynamic names
Hi!

This solution very much solves the problem that I have been facing i.e. large number of acl rules causing latency in requests. Been in discussions separately about it and today I got a chance to test out this patch. I report that it works great! I have been able to route 150k backends with this and the latency added because of the dynamic lookup is in order of microseconds (compared to 24ms earlier).

The usage 'use_backend bk_%[hdr(Host)] if TRUE' works for my use-case but originally I was wondering if one could do a map based lookup for the backend. As posted here :
http://stackoverflow.com/questions/22025412/how-to-use-thousands-of-backends-in-haproxy-is-the-new-map-feature-useful-for-t

Most of the issues in the above question are now solved, but I tested this with the patch -

    use_backend bk_%[hdr(Host), map(host_to_backend_map.file)] if TRUE

And it does not work. I am not yet familiar with code to determine why this does not work. Again, the current proposal works well for me but an enhancement should probably consider using maps within dynamic lookup.

+1 for the patch.

Thanks.
Rajat

Hi Bertrand,

On Sun, Mar 23, 2014 at 04:18:44PM +0100, Bertrand Jacquin wrote:

Hi, I did this patch for dev19 some time ago but I am still not sure whether it is the best way to do it or not, and did not have the time to discuss it since. As the latest changes broke it and forced me to rebase it, and it's very useful for us, I'd like to propose it for inclusion before the final release if you think it's OK, or to discuss how it should be done.

Great!

The main purpose I wanted to achieve is to be able to use many backends without the need to declare each routing process from frontend to backend and instead use generic and dynamic switching when a sane parameter can be used from user request using the logformat logic. For example when we have a backend farm dedicated to each 'Host: ' http-header, it's pain in the ass to have to declare the backend and the relevant use_backend.

Yes I know there's this request coming from time to time. In fact it was even planned to work like this before we finally went with ACLs and use_backend, but we felt it would be a too limited design (eg: no choice of other routing key).

With the proposed solution, you first need to declare a dynamic use_backend as the following :

    use_backend bk_cust_%[hdr(Host)] if { hdr(Host) -m found }

And then to declare the needed backend. For every new vhost hosted will only need to add the backend section to the configuration.

I'm not opposed to the feature at all, in fact I've even been involved in a discussion about something more or less in this vein recently. But I'm having some fears about the use of the %[] form in a use_backend directive. Indeed, this string format was initially done only for logformat. Then it was adopted for unique-id. Then for all http-request directives. And we start to see from time to time people trying to use it in places which have no relation with it (eg: in ACL declaration).

I'm seeing several solutions in fact :

- yours above
- append some argument to use_backend to indicate that it's a logformat string or a dynamic backend (eg: use_backend -d foo%[bar]), but -d might be a valid backend name, so ...
- have a different directive name (eg: use_backend_dyn or use_backend_lf), but that might increase the confusion for some users who will not necessarily know that they're part of the same ruleset.
- put use_backend in http-request rules and clearly state that only http-request can use logformat, but then it means that we can't do it on TCP, and it can further confuse users who will try to chain multiple backends using http-request rules in backends.

So in the end, I tend to think that your solution might still be the best one, or the least confusing one. But I'd like to read other people's comments about this, maybe someone has a better idea.

More detailed usage and implementation in patch itself. It was rebased on commit 0e9b1b4d1f0efc5e46a10371d9be21e97581faab.

OK thanks!

A current limitation of that patch can be that if a dynamic use_backend is evaluated and the named backend is not found, the default_backend is then used. This is not a need we have, but some may complain about it.

Well, *all* use_backend rules are final today. So if the condition after use_backend is true, then the rule is executed and the evaluation is stopped. So I think this is the proper behaviour. Doing it differently could instead increase confusion, because if we changed the way it works, some people might then complain for example that when use_backend directs to a backend whose all servers are dead, we ought to go back to evaluate all the rules instead.

Also I could easily see some security issues by not having this behaviour, because if multiple dynamic rules are chained and it is not at