Re: Help with dynamic backend selection

2014-05-12 Thread Rajat Chopra




- Original Message -
 From: Willy Tarreau w...@1wt.eu
 To: Rajat Chopra rcho...@redhat.com
 Cc: haproxy haproxy@formilux.org
 Sent: Saturday, May 10, 2014 12:01:14 AM
 Subject: Re: Help with dynamic backend selection
 
 On Sat, May 10, 2014 at 07:58:25AM +0200, Willy Tarreau wrote:
   May I ask about the ETA on this?
  
  It's too early for me to know, I need to go down deep into the ebtrees first
  to see if longest match is compatible with strings storage-wise, then I need
  to study how patterns are built as trees to see how to do that as well.
  Possibly it's just one or two days of work once I understand everything.
 
 OK in the end it was extremely easy :-)
 Thierry has done an amazing job at making the pattern management very
 modular, because I just changed the index and lookup to try in a tree first
 with a different algorithm and that works fine! So we don't care about the
 compatibility between regular string match and beginning.
 
 So that's pushed into git now if you want to give it a try.


Thanks a lot Willy. It works as intended. I am trying to spin up a really large
test case to check the performance. I will report back.

Thanks again.
Rajat



Re: Help with dynamic backend selection

2014-05-09 Thread Rajat Chopra




- Original Message -
 From: Willy Tarreau w...@1wt.eu
 To: Rajat Chopra rcho...@redhat.com
 Cc: haproxy haproxy@formilux.org
 Sent: Friday, May 9, 2014 10:43:32 AM
 Subject: Re: Help with dynamic backend selection
 
 Hi Rajat,
 
 On Tue, May 06, 2014 at 07:34:34PM -0400, Rajat Chopra wrote:
  Hi!
  The new feature to dynamically select a backend has been going great for me.
  I use it like this and it all works like a charm :
  
  use_backend bk_%[hdr(host),host_to_backend.mapfile] if TRUE
  
  
  Now I need help with a situation. Some of the servers meant for a particular
  hostname need path replacement. i.e. reqrep/resrep.
  e.g. I have a set of 5 servers. While three of them need to be balanced for
  any hostname/service1 requests, the other two need to be load balanced for
  hostname/service2 requests. How can I dynamically choose between two
  backends depending on hostname+path_beg?
  
  Can I use something like this?
  
  use_backend bk_%[hdr(host),host_to_backend.mapfile]_%[path]
  
  where a request to hostname/service1 would mean look for a backend named
  bk_hostname_service1.
  Clearly it will not work like that because hostname/service1/login.php
  will break the backend lookup.
 
 Absolutely. And you'd have to have as many backends as possible paths, which
 is not practical!
 
 Why don't you run that based on ACLs if you're certain that service2 will
 not cause any trouble ?
 
 Eg:
 
 use_backend bk_%[hdr(host),host_to_backend.mapfile]_%[path] if { path_beg /service2 }
 use_backend bk_%[hdr(host),host_to_backend.mapfile]
 
 But again, that requires that you know the exhaustive list of valid paths for
 service2. I don't know if that's your case or not, but that could be the
 solution.

Yes. That's the only solution I could come up with. I would have the exhaustive
list, since a script generates the config file, but I worry that with 150k
backends, each backend could come up with its own unique path_beg
requirement. That would defeat the purpose of the map.

Also, in this statement (use_backend
bk_%[hdr(host),host_to_backend.mapfile]_%[path] if { path_beg /service2 })
using _%[path] is not right, is it? Shouldn't we use
'bk_%[hdr(host),map(mapfile)]_service2' instead?

I wonder if we could use some pluggable logic for map, e.g. the user provides a
.so with a user-defined map function. My call would then become:
use_backend bk_%[hdr(host),path,myown_func(path_to_dot_so)] if TRUE
At config time, haproxy would dlopen path_to_dot_so and expect dlsym to find a
myown_func function, one that returns a string when given the two arguments
(host,path).
It may be useful for other dynamic user-defined ACLs also.

Far fetched feature, eh? :)

Thanks for the help again.
Rajat 






Re: Help with dynamic backend selection

2014-05-09 Thread Rajat Chopra

 
  Wonder if we could use some pluggable logic for map. e.g. User can provide a
  .so with the user defined map function. So my call will become :
 
 It's still early to do this, there are strict types on the patterns and
 their own alloc/release/insert/delete functions.
 
  use_backend bk_%[hdr(host),path,myown_func(path_to_dot_so)] if TRUE
  And at config time, haproxy does a dlopen on path_to_dot_so, and expects that
  dlsym finds a myown_func method. One that returns a string when given the two
  arguments (host,path).
  It may be useful for other dynamic user defined acls also.
 
 In practice we should not need to define that many new ACLs since they're
 supposed to rely on the sample fetch functions. However, sample fetch
 functions can very easily be added.
 
  Far fetched feature, eh? :)
 
 Not that much. I'm thinking about something. Maps can retrieve the first
 matching prefix using map_beg. So if you pass base to your map, you'll
 have the host+path on input, and take the first matching prefix. Thus, it
 will give you this :
 
     use_backend bk_%[base,map_beg(mapfile)]
 
 The file would contain all the host+path sorted from the longest to the
 shortest (basically just a sort -r) :
 
     domain1/path1/   backend1
     domain1/path2/   backend2
     domain1/         backend3
     domain2/path1/   backend1
     domain2/         backend4
 
 The only problem is that the beg match needs to iterate over all entries,
 so it will be slower than a tree based lookup, but much faster than running
 150k rules!
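
For illustration, the first-match lookup described above can be sketched in Python. This is only a model of the behaviour as described in this message, not HAProxy's code: `first_prefix_match` and `ENTRIES` are hypothetical names, and the entries are simply the example file shown above.

```python
# Sketch of a "first matching prefix" lookup over an ordered map file.
# Assumption: entries are scanned in file order and the first key that is a
# prefix of the input wins, which is why longer prefixes must come first.

ENTRIES = [
    ("domain1/path1/", "backend1"),
    ("domain1/path2/", "backend2"),
    ("domain1/", "backend3"),
    ("domain2/path1/", "backend1"),
    ("domain2/", "backend4"),
]


def first_prefix_match(key, entries=ENTRIES, default=None):
    """Return the value of the first entry whose key is a prefix of `key`."""
    for prefix, value in entries:  # linear scan over all entries
        if key.startswith(prefix):
            return value
    return default


if __name__ == "__main__":
    # The input plays the role of `base`, i.e. host + path.
    print(first_prefix_match("domain1/path2/login.php"))
    print(first_prefix_match("domain1/other"))
```

The linear scan is what makes this slower than a tree lookup: the cost grows with the number of entries, and the result depends on the order of the file.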
 

That looks great.
Humble opinion here, or maybe just my discomfort. The map_beg will certainly
solve my current case, but it feels like a generic user-callable function
would cover the myriad other cases that might crop up in the future. I am pretty
sure someone will ask for a map_reg down the line - a map with keys as compiled
regexes is compelling.



 And further, I think that with some tweaking, we could make the beg match
 use the longest prefix match function of the ebtrees. I remember that there's
 something tricky about it related to the string length, but if we need to pad
 with a zero or something like this, it might not be a big deal. So maybe in
 the end we could improve the map match to lookup host+uri prefixes faster
 than we do today.
 
 That would be great for URL or Location rewriting!
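
As a rough sketch of the longest-prefix idea, the Python below uses a plain character trie. It is an assumption-laden stand-in: ebtrees are not modelled here, and `PrefixMap` with its methods is a hypothetical name. It only shows that the lookup cost follows the key length and that the insertion order no longer matters.

```python
# Sketch of a longest-prefix lookup using a plain character trie.
# Assumption: this stands in for the tree-based lookup discussed above; it
# is not how ebtrees are implemented. All names here are hypothetical.


class PrefixMap:
    """Maps string prefixes to values; lookup returns the longest match."""

    _VALUE = object()  # sentinel key marking "a stored prefix ends here"

    def __init__(self):
        self._root = {}

    def insert(self, prefix, value):
        node = self._root
        for ch in prefix:
            node = node.setdefault(ch, {})
        node[self._VALUE] = value

    def lookup(self, key, default=None):
        node = self._root
        best = node.get(self._VALUE, default)
        for ch in key:  # cost depends on len(key), not on the entry count
            node = node.get(ch)
            if node is None:
                break
            if self._VALUE in node:
                best = node[self._VALUE]  # remember the deepest match so far
        return best


if __name__ == "__main__":
    m = PrefixMap()
    m.insert("domain1/", "backend3")        # shorter prefix inserted first
    m.insert("domain1/path1/", "backend1")
    print(m.lookup("domain1/path1/login.php"))
    print(m.lookup("domain1/other"))
```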
 

May I ask about the ETA on this?


Greedily,
With sincere thanks.
Rajat



Help with dynamic backend selection

2014-05-06 Thread Rajat Chopra
Hi!
The new feature to dynamically select a backend has been going great for me.
I use it like this and it all works like a charm :

use_backend bk_%[hdr(host),host_to_backend.mapfile] if TRUE


Now I need help with a situation. Some of the servers meant for a particular 
hostname need path replacement. i.e. reqrep/resrep.
e.g. I have a set of 5 servers. While three of them need to be balanced for any 
hostname/service1 requests, the other two need to be load balanced for 
hostname/service2 requests. How can I dynamically choose between two backends 
depending on hostname+path_beg?

Can I use something like this?

use_backend bk_%[hdr(host),host_to_backend.mapfile]_%[path] 

where a request to hostname/service1 would mean look for a backend named 
bk_hostname_service1. 
Clearly it will not work like that because hostname/service1/login.php will 
break the backend lookup.
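
To make the problem concrete, here is a small Python model of the name expansion. It assumes the backend name is built by plain string substitution and must match a declared backend exactly. `expand_name`, `first_segment` and the backend names are hypothetical and exist only for this illustration; `first_segment` is not an HAProxy feature.

```python
# Sketch of why appending the raw path to the backend name breaks the lookup.
# Assumption: the name is built by verbatim substitution and has to match a
# declared backend exactly. All names below are hypothetical.

DECLARED_BACKENDS = {"bk_hostname_service1", "bk_hostname_service2"}


def expand_name(mapped_host, path):
    """Model of 'bk_%[...]_%[path]': substitute the values verbatim."""
    return "bk_" + mapped_host + "_" + path


def first_segment(path):
    """Keep only the first path component: '/service1/x' -> 'service1'."""
    return path.lstrip("/").split("/", 1)[0]


if __name__ == "__main__":
    raw = expand_name("hostname", "/service1/login.php")
    print(raw, raw in DECLARED_BACKENDS)

    trimmed = expand_name("hostname", first_segment("/service1/login.php"))
    print(trimmed, trimmed in DECLARED_BACKENDS)
```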

Advice/suggestions please. Thanks.
Rajat



Re: [PATCH] proxy: support use_backend with dynamic names

2014-03-31 Thread Rajat Chopra


Hi Steven,

With the patch from Bertrand, you should not need many ACLs I believe.

For the thousands of backends, yes, I did have the issue with huge startup
times, but it has been solved with recent commits from Willy.
'Huge' is relative obviously - it came down from 15 minutes or so to 8s, and for
me that is reasonable enough now.

I changed a few things in my config file too and I have posted the
optimizations in the Stack Overflow post - e.g. use fullconn 1000 in defaults
and use IP addresses for destinations instead of DNS names.


Did you try the latest code from git? Send an example of your config file 
otherwise and I am sure the experts on the list will be able to help.

Best,
Rajat



- Original Message -
 From: Steven Le Roux ste...@le-roux.info
 To: Rajat Chopra rcho...@redhat.com
 Cc: haproxy haproxy@formilux.org
 Sent: Monday, March 31, 2014 4:04:55 PM
 Subject: Re: [PATCH] proxy: support use_backend with dynamic names
 
 Hi !
 
 Since I experienced the same behaviour with a similar configuration, don't
 you have a huge startup time due to the ACL parsing?
 
 --
 Steven Le Roux
 On 28 March 2014 at 01:59, Rajat Chopra rcho...@redhat.com wrote:
 
  Hi!
  This solution very much solves the problem that I have been facing i.e.
  a large number of acl rules causing latency in requests. I have been in
  discussions separately about it and today I got a chance to test out this
  patch. I report that it works great! I have been able to route 150k backends
  with this and the latency added because of the dynamic lookup is on the
  order of microseconds (compared to 24ms earlier).
 
 
  The usage 'use_backend bk_%[hdr(Host)] if TRUE' works for my use-case but
  originally I was wondering if one could do a map based lookup for the
  backend.
  As posted here :
 
  http://stackoverflow.com/questions/22025412/how-to-use-thousands-of-backends-in-haproxy-is-the-new-map-feature-useful-for-t
 
  Most of the issues in the above question are now solved, but I tested this
  with the patch -
  use_backend bk_%[hdr(Host), map(host_to_backend_map.file)] if TRUE
 
  And it does not work. I am not yet familiar with code to determine why
  this does not work. Again, the current proposal works well for me but an
  enhancement should probably consider using maps within dynamic lookup.
 
  +1 for the patch.
  Thanks.
  Rajat
 
 
 
 
 
   Hi Bertrand,
  
   On Sun, Mar 23, 2014 at 04:18:44PM +0100, Bertrand Jacquin wrote:
    Hi,

    I did this patch for dev19 some time ago but I am still not sure whether
    it is the best way to do it or not, and did not have the time to discuss
    it since. As the latest changes broke it and forced me to rebase it, and
    it's very useful for us, I'd like to propose it for inclusion before the
    final release if you think it's OK, or to discuss how it should be done.
  
   Great!
  
    The main purpose is to be able to use many backends without the need to
    declare each routing process from frontend to backend, and instead use
    generic and dynamic switching when a sane parameter can be taken from the
    user request using the logformat logic. For example when we have a backend
    farm dedicated to each 'Host: ' http-header, it's a pain in the ass to have
    to declare the backend and the relevant use_backend.
  
   Yes I know there's this request coming from time to time. In fact it
   was even planned to work like this before we finally went with ACLs
   and use_backend, but we felt it would be a too limited design (eg: no
   choice of other routing key).
  
    With the proposed solution, you first need to declare a dynamic
    use_backend as follows:

      use_backend bk_cust_%[hdr(Host)] if { hdr(Host) -m found }

    And then declare the needed backend. For every new vhost hosted, you will
    only need to add the backend section to the configuration.
  
   I'm not opposed to the feature at all, in fact I've even been involved
   in a discussion about something more or less in this vein recently. But
   I'm having some fears about the use of the %[] form in a use_backend
   directive. Indeed, this string format was initially done only for
   logformat. Then it was adopted for unique-id. Then for all http-request
   directives. And we start to see from time to time people trying to use
   it in places which have no relation with it (eg: in ACL declaration).
  
   I'm seeing several solutions in fact :
 - yours above
  
 - append some argument to use_backend to indicate that it's a logformat
   string or a dynamic backend (eg: use_backend -d foo%[bar]), but -d
   might be a valid backend name, so ...

 - have a different directive name (eg: use_backend_dyn or use_backend_lf),
   but that might increase the confusion for some users who will not
   necessarily know that they're part of the same ruleset.
  
 - put use_backend in http-request rules and clearly state that only
   http-request can

Re: [PATCH] proxy: support use_backend with dynamic names

2014-03-28 Thread Rajat Chopra


 
 Haproxy 1.5 and earlier cut the lines in words around spaces, so above your
 expression does not work because it's split in two. Just remove the space
 before map and it will do exactly what you need. Also I think it's better
 to use a map than the plain header because this way you can ensure what the
 exact list of accessible backends will be, and you can also get rid of some
 matching details such as lower/upper case etc...
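
The effect of the stray space can be shown with a deliberately simplified Python model of word splitting. It ignores quoting and escaping, and `split_words` is a hypothetical helper, not the HAProxy parser. The only point is that a space inside %[...] turns one argument into two words.

```python
# Sketch of whitespace word splitting on a configuration line.
# Assumption: a simplified model (no quoting or escaping), only meant to
# show how a space inside %[...] splits one argument into two words.


def split_words(line):
    return line.split()


if __name__ == "__main__":
    broken = "use_backend bk_%[hdr(Host), map(host_to_backend_map.file)] if TRUE"
    fixed = "use_backend bk_%[hdr(Host),map(host_to_backend_map.file)] if TRUE"
    print(split_words(broken))  # the expression is cut in two
    print(split_words(fixed))   # the expression stays a single word
```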

Thanks for the pointer. Removed the space, and it works.



 
 Do you think that use-server would simplify your configuration (eg: have
 only one backend with thousands of servers, one per host) ? We could then
 imagine something like this :
 
   frontend foo
      use_backend bk_%[hdr(host),map(host_to_backend.map)]

   bk_host_xxx
      server 1 ...
      server 2 ...

   bk_virtual_hosted
      use-server srv_%[hdr(host),map(host_to_server.map)]
      server srv_1 ...
      server srv_2 ...
 
 The idea would be to have a few backends for multi-server hosts, and a
 single backend with all single-server hosts and as many servers. But maybe
 this would make the config more complex in the end, I don't know. I think
 the natural next step is the usedst server directive that people who want
 to run proxies are requesting :-)

I generate the configuration through a script, so I guess it makes little
difference unless you think it is more efficient to run single-server backends
by grouping them.
On the other hand, it appears cumbersome when such host-servers start to
scale up and we have to move one from being a stag server to an actual backend
for load-balancing.

Not sure I understand the usedst directive. Dynamic lookup of server's target 
destination?


:),
Rajat



Re: [PATCH] proxy: support use_backend with dynamic names

2014-03-27 Thread Rajat Chopra
Hi!
This solution very much solves the problem that I have been facing i.e.
a large number of acl rules causing latency in requests. I have been in
discussions separately about it and today I got a chance to test out this
patch. I report that it works great! I have been able to route 150k backends
with this and the latency added because of the dynamic lookup is on the order
of microseconds (compared to 24ms earlier).


The usage 'use_backend bk_%[hdr(Host)] if TRUE' works for my use-case but 
originally I was wondering if one could do a map based lookup for the backend.
As posted here :
http://stackoverflow.com/questions/22025412/how-to-use-thousands-of-backends-in-haproxy-is-the-new-map-feature-useful-for-t

Most of the issues in the above question are now solved, but I tested this with 
the patch -
use_backend bk_%[hdr(Host), map(host_to_backend_map.file)] if TRUE

And it does not work. I am not yet familiar with code to determine why this 
does not work. Again, the current proposal works well for me but an enhancement 
should probably consider using maps within dynamic lookup.

+1 for the patch.
Thanks.
Rajat





 Hi Bertrand,

 On Sun, Mar 23, 2014 at 04:18:44PM +0100, Bertrand Jacquin wrote:
  Hi,
 
  I did this patch for dev19 some time ago but I am still not sure whether
  it is the best way to do it or not, and did not have the time to discuss
  it since. As the latest changes broke it and forced me to rebase it, and
  it's very useful for us, I'd like to propose it for inclusion before the
  final release if you think it's OK, or to discuss how it should be done.

 Great!

  The main purpose is to be able to use many backends without the need to
  declare each routing process from frontend to backend, and instead use
  generic and dynamic switching when a sane parameter can be taken from the
  user request using the logformat logic. For example when we have a backend
  farm dedicated to each 'Host: ' http-header, it's a pain in the ass to have
  to declare the backend and the relevant use_backend.

 Yes I know there's this request coming from time to time. In fact it
 was even planned to work like this before we finally went with ACLs
 and use_backend, but we felt it would be a too limited design (eg: no
 choice of other routing key).

  With the proposed solution, you first need to declare a dynamic
  use_backend as the following :
 
use_backend bk_cust_%[hdr(Host)] if { hdr(Host) -m found }
 
  And then declare the needed backend. For every new vhost hosted, you will
  only need to add the backend section to the configuration.

 I'm not opposed to the feature at all, in fact I've even been involved
 in a discussion about something more or less in this vein recently. But
 I'm having some fears about the use of the %[] form in a use_backend
 directive. Indeed, this string format was initially done only for
 logformat. Then it was adopted for unique-id. Then for all http-request
 directives. And we start to see from time to time people trying to use
 it in places which have no relation with it (eg: in ACL declaration).

 I'm seeing several solutions in fact :
   - yours above

   - append some argument to use_backend to indicate that it's a logformat
 string or a dynamic backend (eg: use_backend -d foo%[bar]), but -d
 might be a valid backend name, so ...

   - have a different directive name (eg: use_backend_dyn or use_backend_lf),
 but that might increase the confusion for some users who will not
 necessarily know that they're part of the same ruleset.

   - put use_backend in http-request rules and clearly state that only
 http-request can use logformat, but then it means that we can't do
 it on TCP, and it can further confuse users who will try to chain
 multiple backends using http-request rules in backends.

 So in the end, I tend to think that your solution might still be the best
 one, or the least confusing one. But I'd like to read other people's comments
 about this, maybe someone has a better idea.

  More detailed usage and implementation in patch itself.
 
  It was rebased on commit 0e9b1b4d1f0efc5e46a10371d9be21e97581faab.

 OK thanks!

  A current limitation of that patch can be that if a dynamic use_backend
  is evaluated and the named backend is not found, the default_backend is
  then used. This is not a need we have, but some may complain about it.

 Well, *all* use_backend rules are final today. So if the condition after
 use_backend is true, then the rule is executed and the evaluation is
 stopped. So I think this is the proper behaviour. Doing it differently
 could instead increase confusion, because if we changed the way it works,
 some people might then complain for example that when use_backend directs
 to a backend whose all servers are dead, we ought to go back to evaluate
 all the rules instead.
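
The "final rule" behaviour discussed here can be modelled with a few lines of Python. This is a sketch of the semantics as described in this thread, not HAProxy's implementation: `select_backend`, the rule tuples and the backend names are hypothetical.

```python
# Sketch of "final" use_backend evaluation as described in this thread.
# Assumption: rules are tried in order; the first rule whose condition holds
# ends the evaluation, and an unknown dynamic name falls back to the default
# backend. All names are hypothetical.


def select_backend(rules, request, backends, default_backend):
    for condition, make_name in rules:
        if condition(request):
            name = make_name(request)
            # The rule is final: later rules are not evaluated.
            return name if name in backends else default_backend
    return default_backend


if __name__ == "__main__":
    backends = {"bk_a", "bk_static"}
    rules = [
        (lambda r: "host" in r, lambda r: "bk_" + r["host"]),  # dynamic rule
        (lambda r: True, lambda r: "bk_static"),               # static rule
    ]
    print(select_backend(rules, {"host": "a"}, backends, "bk_default"))
    # Unknown host: the dynamic rule matched, so the static rule is skipped.
    print(select_backend(rules, {"host": "zzz"}, backends, "bk_default"))
```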

 Also I could easily see some security issues by not having this behaviour,
 because if multiple dynamic rules are chained and it is not at