Re: 1.5 badly dies after a few seconds
On Wed, Sep 15, 2010 at 07:17:32AM +0200, R.Nagy József wrote:
> My bad, most likely. After killing the haproxy process completely -instead of just config reloads- and restarting it, the problem can't be reproduced anymore without the rate limiting config.

OK, thanks for this clarification.

> So most likely it was simply rejecting the request where it seemed to be serving 'random' blank pages, due to the config not being reloaded properly.

Indeed.

> The number of denied reqs in the stats is 0 all along though. Bug?

No, it's expected if you drop at the connection level. Only sessions are accounted right now in the stats. A session is defined as a connection that has been accepted. The difference is important for analyzing what causes the drops. More counters should be added, but there will probably be some more general work on the stats first.

> Let me mod the question then, though: all I'm trying to achieve is a simple rate limiting config against (d)dos attacks. I need to:
> - serve a custom 503 page when a client is banned (never give a blank page)
> - ban at over 30 reqs/10 secs, then temp ban for 10 mins
>
> Based on "better rate limiting" and the docs, I came up with the config below, but the problem is that the rate limiting does not take place with "use_backend ease-up if conn_rate_abuse mark_as_abuser" in the backend, while it does _reject_ the page if I use "tcp-request content reject if conn_rate_abuse mark_as_abuser" in there (but I need the custom 503 as stated above).

In my opinion your config is OK for this and I see no reason why it should not work (however you have src_get_gpc0(http) instead of naming the correct frontend, but I assume that's because you renamed the frontend before sending the conf).

> By the way: to achieve this with as simple a config as possible, could 2 stick-table configs be put under a single listen block (I don't need separate frontend/backend blocks for anything but this)?

Yes, you could even have the same stick-table for this and store two different data. The fact that Stackoverflow's config makes use of two stick tables is because they wanted to measure the request rate only on some backends. If you want to store both gpc0 and conn_rate over 10 seconds, simply declare it this way:

    stick-table type ip size 200k expire 10m store gpc0,conn_rate(10s)

Regards,
Willy

---

> So the config is as follows:
>
>     global
>         log 127.0.0.1 daemon debug
>         maxconn 1024
>         chroot /var/chroot/haproxy
>         uid 99
>         gid 99
>         daemon
>         quiet
>         pidfile /var/run/haproxy-private2.pid
>
>     defaults
>         log global
>         mode http
>         option httplog
>         option dontlognull
>         option redispatch
>         retries 3
>         maxconn 3000
>         contimeout 4000
>         clitimeout 1000
>         srvtimeout 20
>         stats enable
>         stats scope MyHost-webfarm
>         stats uri /secretadmin?stats
>         stats realm Haproxy\ Statistics
>         stats auth user:pass
>
>     frontend MyHost-webfarm 82.136.111.111:8011
>         option forwardfor
>         default_backend works
>         contimeout 6000
>         clitimeout 2000
>         errorfile 503 /usr/local/etc/503error.html
>
>         ### (d)dos protection ###
>         # check master 'banned' table first
>         stick-table type ip size 200k expire 10m store gpc0
>         acl source_is_abuser src_get_gpc0(http) gt 0
>         use_backend ease-up if source_is_abuser
>         tcp-request connection track-sc1 src if ! source_is_abuser
>
>     backend works
>         option httpchk /!healthcheck.php
>         option httpclose
>         balance roundrobin
>         server myserv1 192.168.0.4:80 check inter 5000 rise 2 fall 3
>         server myserv2 192.168.0.3:80 check inter 5000 rise 2 fall 3
>         stick-table type ip size 200k expire 1m store conn_rate(10s)
>         # values below are specific to the backend
>         tcp-request content track-sc2 src
>         acl conn_rate_abuse sc2_conn_rate gt 3
>         # abuse is marked in the frontend so that it's shared between all sites
>         acl mark_as_abuser sc1_inc_gpc0 gt 0
>         #tcp-request content reject if conn_rate_abuse mark_as_abuser
>         use_backend ease-up if conn_rate_abuse mark_as_abuser
>
>     backend ease-up
>         mode http
>         errorfile 503 /usr/local/etc/503error_dos.html
>
> Thanks for reading!
> Joe
>
> Quoting Willy Tarreau (w...@1wt.eu):
>> On Tue, Sep 14, 2010 at 11:39:05PM +0200, Jozsef R.Nagy wrote:
>>> Hello guys, I've just been testing 1.5dev2 (and the most recent snapshot as well) on FreeBSD, evaluating it for its anti-dos capabilities. The strange thing is.. it starts up just fine, serves a few pages just fine, then it returns blank pages. After a minute or so it will deliver a few pages again and then blank again.. this does happen with no limitation config (no dos protection) as well.
>>
>> Could you please send your config? (you can send it to me privately if you prefer). I suspect an uninitialized variable or something like this,
Re: hosting HAProxy and content servers in different locations
On Mon, Sep 13, 2010 at 09:40:39AM +0200, Daniel Storjordet wrote:
> Is it possible to add the prefix behind the host name (as a suffix) instead of in front of it? This would be extremely useful if you want this feature with a CDN. Then we could make a CDN server that answers to *.mycdn.com and redirect requests like www.mysite.com/mylargepicture.png to www.mysite.com.mycdn.com/mylargepicture.png without having to manually create subdomains for every domain.

This is a good idea. You can't do that with the server "redir" method because the prefix is static. This is something to think about, however.

Regards,
Willy
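For context, a minimal sketch of the static "redir" mechanism referred to above (the backend name, server name and addresses are illustrative, not from the thread):

    backend static-content
        # every GET/HEAD request reaching this server is answered with a
        # 302 whose Location is the fixed prefix plus the original URI;
        # the prefix cannot vary with the requested Host header, which is
        # why the suffix-style CDN rewrite is not possible this way.
        server cdn 192.168.0.10:80 redir http://cdn.example.com check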
Re: sticky sessions based on request param
On Mon, Sep 13, 2010 at 08:06:37AM -0400, Karl Baum wrote:
> Hi Willy. The balance url_param looks like what I need. In regards to setting a cookie, in my case each of the http clients is actually an email fetching worker calling an imap api which will eventually sit behind HAProxy. Because each api node will have a connection pool of imap connections, depending on which email address the worker is processing, I want the request to be directed to the server which already has a pool of connections open for that email address. If I didn't do this, the more api nodes behind HAProxy, the more connections I would have open to the imap server, and imap limits open connections to each email account. Each worker will be serving multiple email accounts, and workers will process the same email account in parallel, so I don't think the cookie based routing applies to this use case (but I could be wrong).

OK, so what you describe perfectly matches the typical usage of a URL parameter hash.

> Is there significant overhead in the url_param hashing algorithm?

No, it's still quite cheap. You should not notice it on a mailing system.

> Will the load be spread evenly across all nodes?

Yes, the hashing algorithm used tends to be quite smooth, but for that you obviously need a somewhat large population. I'd say that if your number of possible keys is about 100 times bigger than the number of nodes, you'll get quite a smooth distribution.

> Also, in the case of node failure, will balance url_param pick another node?

Yes, that's the same for all algorithms, provided that you enable health checks of course. The point to note with hashing algorithms is that when a node goes down, the farm size changes and everyone gets redispatched. This may or may not be desired. If this is a problem, you could use "hash-type consistent" to change the hashing algorithm to a consistent one: only the users of the failed node will be uniformly redispatched to the remaining nodes. The downside is that consistent hashing is not as smooth as the normal one, but it might still be acceptable in your case.

Regards,
Willy
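A minimal sketch of the resulting backend, assuming the URL parameter is named "mailbox" (all names and addresses are illustrative): requests carrying the same mailbox value consistently hash to the same API node, and "hash-type consistent" confines redispatching to the users of a failed node:

    backend imap-api
        # hash on the value of the "mailbox" request parameter
        balance url_param mailbox
        # on node failure, only that node's users get remapped
        hash-type consistent
        server api1 192.168.0.11:80 check
        server api2 192.168.0.12:80 check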
Re: 1.5 badly dies after a few seconds
Thank you for the heads up,

Managed to put it in a single listen block, and it worked! Temporarily! ;( It was fine in the testing environment, but after putting it into production, haproxy went wild after 40 mins, and then after 20 mins in the next round. 'Wild' being that it returned errors instead of serving up files from another block (the symptom being static files missing from the site), not even the one being rate restricted, and a few minutes later it completely 'died', not serving anything, just seemingly loading endlessly. If you could let me know what debug to save or look at, I'd be happy to do so. Meanwhile I've rolled back to 1.3 (exact same config except for the rate limiting), which has been a reliable provider for months.

Compiled from latest source, by (no mgnu):

    make -f Makefile.bsd REGEX=pcre DEBUG= COPTS.generic=-Os -fomit-frame-pointer

Whole config attached:

    global
        log 127.0.0.1 daemon debug
        maxconn 1024
        chroot /var/chroot/haproxy
        uid 99
        gid 99
        daemon
        quiet
        pidfile /var/run/haproxy-private.pid

    defaults
        log global
        mode http
        option httplog
        option dontlognull
        option redispatch
        retries 3
        maxconn 3000
        contimeout 4000
        clitimeout 1000
        srvtimeout 20
        stats enable
        stats scope Static-farm
        stats scope mySite-webfarm
        stats scope ease-up
        stats uri /admin?stats
        stats realm Haproxy\ Statistics
        stats auth user:pass

    listen mySite-webfarm 82.136.11.111:80
        option forwardfor
        option httpchk /!healthcheck.php
        option httpclose
        balance roundrobin
        server host1 192.168.0.4:80 check inter 5000 rise 2 fall 3
        server host2 192.168.0.3:80 check inter 5000 rise 2 fall 3
        contimeout 6000
        clitimeout 2000
        errorfile 503 /usr/local/etc/503error.html

        ### (d)dos protection ###
        # check master 'banned' table first
        stick-table type ip size 1m expire 5m store gpc0,conn_rate(10s)
        acl source_is_abuser src_get_gpc0 gt 0
        use_backend ease-up if source_is_abuser
        tcp-request connection track-sc1 src
        # values below are specific to the backend
        tcp-request content track-sc2 src
        acl conn_rate_abuse sc2_conn_rate gt 30
        # abuse is marked in the frontend so that it's shared between all sites
        acl mark_as_abuser sc1_inc_gpc0 gt 0
        #tcp-request content reject if conn_rate_abuse mark_as_abuser
        use_backend ease-up if conn_rate_abuse mark_as_abuser

    backend ease-up
        mode http
        errorfile 503 /usr/local/etc/503error_dos.html

    listen Static-farm 82.136.11.114:80
        balance roundrobin
        option forwardfor
        option httpclose
        option httpchk
        server stat1 192.168.0.8:80 check inter 3000 rise 3 fall 2
        server stat2 192.168.0.7:80 check inter 3000 rise 3 fall 2 backup

Thank you,
Joe
Re: 1.5 badly dies after a few seconds
On Wed, Sep 15, 2010 at 10:17:53AM +0200, R.Nagy József wrote:
> Managed to put it in a single listen block, and it worked! Temporarily! ;( [...] If you could let me know what debug to save or look at, I'd be happy to do so.

You should simply disable the anti-dos protection to check the difference. Also, I recommend enabling the stats socket in the global config, so that you can inspect your tables or even delete entries:

    global
        stats socket /var/run/haproxy.sock level admin
        stats timeout 1d

Then from the command line:

    $ socat readline /var/run/haproxy.sock
    prompt
    show table
    show table mySite-webfarm
    clear table mySite-webfarm key 192.168.0.1

etc...

Also, I think that what you're experiencing is that your block levels are too low, and that once an IP is blocked it remains blocked because the user continues to try to access the site. Also, keep in mind that the use_backend rules are processed last (I should add a warning to remind about that when they're placed before tcp-request). I would personally simplify your config like this (it does not need to track two separate counters anymore):

    stick-table type ip size 1m expire 5m store gpc0,conn_rate(10s)
    acl source_is_abuser src_get_gpc0 gt 0
    tcp-request connection track-sc1 src if ! source_is_abuser
    acl conn_rate_abuse sc1_conn_rate gt 30
    acl mark_as_abuser  sc1_inc_gpc0 gt 0
    use_backend ease-up if source_is_abuser
    use_backend ease-up if conn_rate_abuse mark_as_abuser

Regards,
Willy
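For scripted, non-interactive inspection, the same runtime commands can also be piped through socat's stdio mode — a small usage sketch, assuming the socket path configured above:

    # dump the named table once and exit (path as in the global section above)
    $ echo "show table mySite-webfarm" | socat stdio /var/run/haproxy.sock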
Re: 1.5 badly dies after a few seconds
> You should simply disable the anti-dos protection to check the difference. Also, I recommend enabling the stats socket in the global config, so that you can inspect your tables or even delete entries:
>
>     global
>         stats socket /var/run/haproxy.sock level admin
>         stats timeout 1d
>
> Then from the command line:
>
>     $ socat readline /var/run/haproxy.sock
>     prompt
>     show table
>     show table mySite-webfarm
>     clear table mySite-webfarm key 192.168.0.1

Will try this.

> Also, I think that what you're experiencing is that your block levels are too low, and that once an IP is blocked it remains blocked because the user continues to try to access the site.

That's fairly impossible; why would static be missing then? There is no rate limit on requests to static.

Thank you,
Joe
Re: 1.5 badly dies after a few seconds
On Wed, Sep 15, 2010 at 10:45:12AM +0200, Jozsef R.Nagy wrote:
>> Also, I think that what you're experiencing is that your block levels are too low, and that once an IP is blocked it remains blocked because the user continues to try to access the site.
>
> That's fairly impossible; why would static be missing then? There is no rate limit on requests to static.

Interesting, I did not understand that the first time. Indeed, there is no reason, and it smells bad. I'll have to check the code; I *suspect* that one cause might be an uninitialized variable lying somewhere in the session initialization that gets passed between the various frontends upon new allocations.

Regards,
Willy
Re: 1.5 badly dies after a few seconds
Hey,

I think I've found the reason causing this, after watching and logging the debug output. Serving requests just goes on for a while, then suddenly:

    03c0:my-webfarm.srvcls[0009:000a]
    03c0:my-webfarm.clicls[0009:000a]
    03c0:my-webfarm.closed[0009:000a]
    [ALERT] 257/101918 (78231) : accept(): cannot set the socket in non blocking mode. Giving up
    03c1:my-webfarm.srvcls[0007:0008]
    03c1:my-webfarm.clicls[0007:0008]
    03c1:my-webfarm.closed[0007:0008]

and then it's game over; further requests don't even show up in the debug (-d) output. Does that ring a bell?

Cheers,
Joe
Re: 1.5 badly dies after a few seconds
On Wed, Sep 15, 2010 at 11:34:29AM +0200, Jozsef R.Nagy wrote:
> [ALERT] 257/101918 (78231) : accept(): cannot set the socket in non blocking mode. Giving up

Good catch! It's the first time I've ever seen that error. What annoys me most is that it does not look possible. The file descriptor passed to fcntl() in session_accept() is the same as the one returned by accept() in stream_sock_accept(). So what I'm suspecting now is that either something corrupts the stack, or that someone closes the same FD by mistake at one point. In either case, it's not funny at all :-(

Have you found a minimal way to reproduce this? Also, did you have the tcp-request rules enabled in the conf causing this issue?

Thanks,
Willy
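To make the code path concrete — a minimal self-contained sketch of the accept-then-set-non-blocking pattern described above; this is not haproxy's actual source, and the function name is illustrative:

    #include <fcntl.h>
    #include <stddef.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Sketch of the pattern discussed: accept() returns a fresh fd, which
     * is then switched to non-blocking mode. fcntl() can only fail here if
     * the fd has become invalid in between, e.g. through stack corruption
     * or a stray close() on the same fd elsewhere -- which is why the
     * alert "does not look possible" under normal operation. */
    static int accept_and_set_nonblock(int listen_fd)
    {
        int cfd = accept(listen_fd, NULL, NULL);
        if (cfd < 0)
            return -1;
        if (fcntl(cfd, F_SETFL, O_NONBLOCK) == -1) {
            /* the failure path behind "accept(): cannot set the socket
             * in non blocking mode. Giving up" */
            close(cfd);
            return -1;
        }
        return cfd;
    }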
Re: sticky sessions based on request param
It sounds like balance url_param is exactly what I need. In regards to a node going down, hash-type consistent sounds great, as I would like to keep my clients using the same node if it is still available. Thanks for your help!

-karl
Re: 1.5 badly dies after a few seconds
On 2010. 09. 15. 15:08, Willy Tarreau wrote:
> On Wed, Sep 15, 2010 at 01:00:57PM +0200, Jozsef R.Nagy wrote:
>>> Have you found a minimal way to reproduce this? Also, did you have the tcp-request rules enabled in the conf causing this issue?
>>
>> No minimal way yet; the config is the 'full' one I've sent over previously, with 2 listens (and no frontend/backend blocks), with the mods you've recommended.
>
> OK, that's already useful information.

Is it? :)

>> So yes, tcp-request rules were enabled. Not sure how to reproduce it for getting to a minimal way, as it only happened 4 times on the production setup, and I can't really afford having it dead a few more times atm :/
>
> I certainly can understand, and thank you for these tests. Now we're certain there's a nasty bug, so you should stay on the safe side.
>
>> On the test instance I can't get to reproducing it just yet.. probably not enough traffic or concurrency, simply?
>
> that's very possible.
>
> Willy

3 hours and 40k randomized requests later, the test instance -with the same config- still stands. Differences between test and live:

- Test is only hit by 2 IPs, thus the rate limiter tables are much smaller
- The concurrency ratio is lower (over 100 on production every now and then)

Otherwise the test instance runs on the same host, with the same binary and the very same config (except ports). Hopefully this helps a bit to narrow down the possible causes. Let me know if I can help in any way to track this down; I'm in need of rate limiting.

Thanks,
Joe
AWStats and HAProxy Logs
Hi All, I am trying to come up with a way to combine AWStats (or a similar system) with HAProxy logging. Looking at the AWStats custom log options, it doesn't seem like the HAProxy httplog format will fit, and I don't see any custom options for HAProxy logging (perhaps I am missing that?). Has anyone ever plugged HAProxy logs into any analysis software that they might recommend?

Thank you,
Kyle Brandt
Re: AWStats and HAProxy Logs
Hi Kyle,

    option httplog clf

Hervé.

--
Hervé COMMOWICK, EXOSEC (http://www.exosec.fr/)
ZAC des Metz - 3 Rue du petit robinson - 78350 JOUY EN JOSAS
Tel: +33 1 30 67 60 65 - Fax: +33 1 75 43 40 70
mailto:hcommow...@exosec.fr
Re: AWStats and HAProxy Logs
On Wed, Sep 15, 2010 at 05:56:39PM +0200, Graeme Donaldson wrote:
> As of 1.4, HAProxy does have the "option httplog clf" option, which generates logs in common log format, which AWStats understands just fine. In our case, however, we have used combined log format in the past, which gives us the HTTP User-Agent and Referer headers, which enables some useful stats in AWStats. Switching to common log format would reduce the amount of information we get from our AWStats.

You can still enable header captures though (check "capture request header").

Regards,
Willy
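A minimal sketch of how the two suggestions could be combined (the frontend name, backend name and capture lengths are illustrative): CLF output that AWStats can parse, plus captured Referer and User-Agent headers so the combined-format information is not lost:

    frontend www
        bind :80
        # common log format output for AWStats
        option httplog clf
        # capture the two headers that combined log format would provide
        capture request header Referer len 128
        capture request header User-Agent len 128
        default_backend webfarm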
HAProxy option httpchk - Soap?
Hi,

This is a long shot, but has anyone ever been able to test a simple SOAP service using the httpchk option in haproxy? If so, are any examples around? I've tried a few using the \r\n but had no luck, e.g. (all on a single config line):

    option httpchk POST /wsdl/NAL/bp1.0 HTTP/1.1\r\nAccept-Encoding: gzip,deflate\r\nContent-Type: text/xml;charset=UTF-8\r\nSOAPAction: \r\nUser-Agent: Jakarta Commons-HttpClient/3.1\r\nHost: 10.119.37.250\r\nContent-Length: 527\r\n\r\n<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ser="http://www.bla.co.nz/xsd/serviceprovisioning"><soapenv:Header/><soapenv:Body><ser:query><ser:providerCode>bla</ser:providerCode><ser:line><ser:exchangeID exchangeProvider="TEL"><ser:exchangeID>AT2</ser:exchangeID></ser:exchangeID><ser:mpfID>L038-07-011</ser:mpfID></ser:line><ser:phoneNumber>99628030</ser:phoneNumber><ser:queryPort>true</ser:queryPort><ser:queryPhone>true</ser:queryPhone></ser:query></soapenv:Body></soapenv:Envelope>\r\n\r\n

Any pointers are most appreciated.

Sam
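One plausible pitfall, offered as an untested sketch rather than a confirmed fix: the config parser splits arguments on unescaped spaces, so spaces inside the headers appended after the HTTP version (the official httpchk docs escape them, as in "Host:\ www") and inside the body need a backslash, and the Content-Length must match the body actually sent. A stripped-down check along those lines, with a placeholder body deliberately chosen to contain no spaces (14 bytes, matching the declared Content-Length):

    option httpchk POST /wsdl/NAL/bp1.0 HTTP/1.1\r\nHost:\ 10.119.37.250\r\nContent-Type:\ text/xml;charset=UTF-8\r\nContent-Length:\ 14\r\n\r\n<ping>x</ping>

The health check then just needs the service to answer with a 2xx or 3xx status for the server to be considered up.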