Re: Fix triggering of runtime DNS resolution?
On Thu, Sep 3, 2015 at 1:11 AM, Baptiste wrote:
> On Thu, Sep 3, 2015 at 12:56 AM, Conrad Hoffmann wrote:
>
> Hi Conrad,
>
> I noticed this as well.
> Please apply the attached patch and confirm it fixes this issue.
>
> I introduced this bug when trying to fix another one: DNS resolution
> was supposed to start with the first health check.
> Unfortunately, it started one hold.valid period after HAProxy's start time.
>
> Please confirm that the attached patch fixes this and that DNS queries
> are properly sent at startup (and later).
>
> Baptiste

Hi Conrad,

Please note that the patch in my previous mail is not the definitive one.
I started a private thread with Willy right before your mail to discuss
this point, and I'll send the definitive patch today.

Baptiste
Using getaddrinfo_a on configuration load
Hi!

I've searched the list and not found much on lengthy HAProxy start/config
load times. We tend to run HAProxy with a large number of backends (100+),
and recently noticed that we are seeing lengthy reload times (20+ seconds).
This is most noticeable in locations that are far away from our DNS server.

We've traced this back to HAProxy doing sequential DNS lookups, one address
at a time, combined with the long RTT to our DNS servers (~200ms). As such,
load times tend to be N*M ms, where N is the number of backends and M is the
RTT to the DNS server.

While searching the mailing list, I found a little discussion about using
getaddrinfo_a to make DNS queries asynchronous. I cannot find any sign of
this work in the latest development branch release. Is anyone currently
working on it, and if not, is it something that the project would be
interested in seeing?

Cheers,
Cyrus

--
Cyrus Hall | Lead Software Engineer | Twitch | 720-327-0344 | cy...@twitch.tv
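The N*M estimate above can be checked with a quick back-of-the-envelope calculation. The sketch below is only illustrative: both function names are made up for this example and are not part of HAProxy, and it assumes one blocking lookup per backend, no caching, and (for the second function) that all asynchronous queries are answered in parallel.

```c
#include <assert.h>

/* Hypothetical helper (not part of HAProxy): estimated time spent in
 * DNS lookups at config load when each backend is resolved one after
 * the other and every lookup costs one round trip to the DNS server. */
long sequential_lookup_ms(long backends, long rtt_ms)
{
    assert(backends >= 0 && rtt_ms >= 0);
    return backends * rtt_ms;
}

/* Same estimate if all lookups were issued at once (for example with
 * an asynchronous resolver interface such as glibc's getaddrinfo_a)
 * and answered in parallel: roughly one round trip, whatever the
 * number of backends. This is an idealised assumption. */
long parallel_lookup_ms(long backends, long rtt_ms)
{
    assert(backends >= 0 && rtt_ms >= 0);
    return backends > 0 ? rtt_ms : 0;
}
```

With the figures from the message (100 backends, 200 ms RTT), the sequential estimate is 20000 ms, which is in line with the reported 20+ second reloads, while the idealised parallel estimate stays at about 200 ms.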
Re: Rate Limiting - Stick-Table Entry Expiration
Hi Willy,

Thank you for your detailed and clear answer. I somehow missed it when you
sent it 8 days ago.

> It's the sc0_get_gpc0() which refreshes the entry.

I did not realize that. This is very good to know.

I will definitely try your suggestion and report back here.

-Hugues

On Tue, Aug 25, 2015 at 9:43 AM, Willy Tarreau wrote:
> Hi Hugues,
>
> On Wed, Aug 19, 2015 at 01:34:46PM -0700, Hugues Alary wrote:
> > Hi there,
> >
> > I've been trying to implement rate limiting for some HTTP POST requests on
> > my website. It works great, except for one detail: the expiration of my
> > entry in my stick-table is always reset to 30 seconds, which means that if
> > the client mistakenly makes a request 29 seconds after being blocked, it
> > will be blocked, again, for 30 seconds.
>
> Note that in general that's what is desired but in your case it could be
> different.
>
> > Here's my config, stripped down to the bare minimum for ease of reading:
> >
> > frontend http-in
> >     mode http
> >     option httplog
> >
> >     bind *:80
> >
> >     ### Request limiting
> >     # Declare stick table
> >     stick-table type string size 100k expire 30s store gpc0
> >
> >     # Inspect layer 7
> >     tcp-request inspect-delay 15s
> >
> >     # Declare ACLs
> >     acl source_is_abuser sc0_get_gpc0 gt 0
> >
> >     tcp-request content track-sc0 req.cook(frontend) if !source_is_abuser
> >     ### End Request limiting
> >
> >     use_backend rate-limit if source_is_abuser
> >
> >     default_backend mybackend
> >
> > backend mybackend
> >     mode http
> >     option httplog
> >     option forwardfor
> >
> >     stick-table type string size 100k expire 30s store http_req_rate(30s)
> >     tcp-request content track-sc1 req.cook(frontend) if METH_POST
> >
> >     acl post_req_rate_abuse sc1_http_req_rate gt 30
> >     acl mark_as_abuser sc0_inc_gpc0 gt 0
> >
> >     tcp-request content accept if post_req_rate_abuse mark_as_abuser
> >
> >     server myLocalhost 127.0.0.1:8081
> >
> > backend rate-limit
> >     mode http
> >     errorfile 503 /usr/local/etc/haproxy/rate-limit.http
> >
> >
> > With this config, as soon as a client makes more than 1 request per second
> > over 30 seconds, this client is marked as an abuser by "mybackend". The
> > following requests are then, as expected, blocked by the "http-in" frontend.
> >
> > However, every time the currently marked "source_is_abuser" client sends a
> > request, the expiration counter of "http-in"'s stick-table is reset to 30
> > seconds. I would expect the expiration counter to keep going down, since
> > the connection is supposedly only tracked when `!source_is_abuser`.
> >
> > Any insight into what I am doing wrong?
>
> It's the sc0_get_gpc0() which refreshes the entry. Please keep in mind that
> originally stick-tables are designed to maintain stickiness information and
> to ensure that entries which are still used are kept fresh.
>
> In your case you *really* want to monitor the abuse rate by watching
> gpc0_rate.
> If you measure it over 30 seconds you'll get the average amount of attempts
> over the last 30 seconds period, and it would only increase when you detect
> an access while still being blocked. You can then decide on the threshold to
> block on.
>
> But that makes me think that what you're trying to achieve in fact is a
> hysteresis : you want to trigger only once the request rate reaches 30 per
> 30s, and then you want to block until it goes down to 0 per 30 seconds.
>
> So probably something like this would work :
>
>     acl post_req_rate_abuse sc1_http_req_rate gt 30
>     acl post_req_recent sc1_http_req_rate gt 0
>
>     tcp-request content track-sc0 req.cook(frontend) if !source_is_abuser
>     ...
>     use_backend rate-limit if source_is_abuser post_req_recent
>
> It blocks only if there was still some activity over the last period.
>
> Please share your results :-)
> Willy
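The hysteresis Willy describes (start blocking once the rate exceeds 30 per 30s, keep blocking until it falls back to 0) can be sketched outside of HAProxy as a tiny state machine. This only illustrates the logic and is not how stick-tables work internally: the struct, the function name and the threshold constant are invented for the example, `rate` stands in for sc1_http_req_rate, and `flagged` for a non-zero gpc0.

```c
#include <stdbool.h>

#define ABUSE_THRESHOLD 30  /* requests per 30s window, as in the thread */

/* Hypothetical per-client state; `flagged` plays the role of gpc0 > 0. */
struct client_state {
    bool flagged;
};

/* Returns true if the request should be sent to the rate-limit backend.
 * `rate` is the number of POST requests seen over the last 30 seconds. */
bool should_block(struct client_state *c, int rate)
{
    if (rate > ABUSE_THRESHOLD)
        c->flagged = true;      /* trip once the threshold is exceeded */
    else if (rate == 0)
        c->flagged = false;     /* release only after a fully quiet window */
    return c->flagged;          /* in between, keep the previous decision */
}
```

The point of the two thresholds is that a client hovering just below 30 requests per window stays blocked once flagged, and is only released after it has been silent for a whole window.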
Re: Fix triggering of runtime DNS resolution?
On Thu, Sep 3, 2015 at 12:56 AM, Conrad Hoffmann wrote:
> Hello,
>
> it's kind of late and I am not 100% sure I'm getting this right, so would
> be great if someone could double-check this:
>
> Essentially, the runtime DNS resolution was never triggered for me. I
> tracked this down to a signed/unsigned problem in the usage of
> tick_is_expired() from checks.c:2158.
>
> curr_resolution->last_resolution is being initialized to zero
> (server.c:981), which in turn makes it, say, a few thousand once the value
> of hold.valid is added (also checks.c:2158). It is then compared to now_ms,
> which is an unsigned integer so large that it is out of the signed integer
> range. Thus, the comparison will not get the expected result, as it is done
> on integer values (now_ms cast to integer gave e.g. -1875721083 a few
> minutes ago, which is undeniably smaller than 3000).
>
> One way to fix this is to initialize curr_resolution->last_resolution to
> now_ms instead of zero (attached "patch"), but then it only works because
> both values are converted to negative integers. While I think that this
> will reasonably hide the problem for the time being, I do think there is a
> deeper problem here, which is the frequent passing of an unsigned integer
> into a function that takes signed int as argument.
>
> I see that tick_* is used all over the place, so I thought I would rather
> consult someone before spending lots of time creating a patch that would
> not be used. Also, I would need some more time to actually figure out what
> the best solution would be.
>
> Does anyone have any thoughts on this? Is someone maybe already aware of this?
>
> Thanks a lot,
> Conrad
> --
> Conrad Hoffmann
> Traffic Engineer
>
> SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany
>
> Managing Director: Alexander Ljung | Incorporated in England & Wales
> with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
> HRB 110657B

Hi Conrad,

I noticed this as well.
Please apply the attached patch and confirm it fixes this issue.

I introduced this bug when trying to fix another one: DNS resolution
was supposed to start with the first health check.
Unfortunately, it started one hold.valid period after HAProxy's start time.

Please confirm that the attached patch fixes this and that DNS queries
are properly sent at startup (and later).

Baptiste

From 06ec4730a0ed3fd5e7395d2bac907a60b62f2557 Mon Sep 17 00:00:00 2001
From: Baptiste Assmann
Date: Wed, 2 Sep 2015 22:25:50 +0200
Subject: [PATCH] MINOR: FIX: DNS resolution doesn't start

Patch f046f1156149d3d8563cc45d7608f2c42ef5b596 introduced a regression:
DNS resolution doesn't start anymore, while it was supposed to make it
start with first health check.

current patch fix this issue with an other method: the last_resolution
is setup to now_ms - hold.valid - 1 when parsing HAProxy's
configuration file. So at first check, the last_resolution is old
enough to trigger a new resolution.
---
 src/cfgparse.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/cfgparse.c b/src/cfgparse.c
index 6e2bcd7..fc7f0eb 100644
--- a/src/cfgparse.c
+++ b/src/cfgparse.c
@@ -8052,8 +8052,10 @@ out_uri_auth_compat:
 			} else {
 				free(newsrv->resolvers_id);
 				newsrv->resolvers_id = NULL;
-				if (newsrv->resolution)
+				if (newsrv->resolution) {
 					newsrv->resolution->resolvers = curr_resolvers;
+					newsrv->resolution->last_resolution = tick_add(now_ms, -1 - newsrv->resolution->resolvers->hold.valid);
+				}
 			}
 		}
 		else {
--
2.5.0
Fix triggering of runtime DNS resolution?
Hello,

it's kind of late and I am not 100% sure I'm getting this right, so would
be great if someone could double-check this:

Essentially, the runtime DNS resolution was never triggered for me. I
tracked this down to a signed/unsigned problem in the usage of
tick_is_expired() from checks.c:2158.

curr_resolution->last_resolution is being initialized to zero
(server.c:981), which in turn makes it, say, a few thousand once the value
of hold.valid is added (also checks.c:2158). It is then compared to now_ms,
which is an unsigned integer so large that it is out of the signed integer
range. Thus, the comparison will not get the expected result, as it is done
on integer values (now_ms cast to integer gave e.g. -1875721083 a few
minutes ago, which is undeniably smaller than 3000).

One way to fix this is to initialize curr_resolution->last_resolution to
now_ms instead of zero (attached "patch"), but then it only works because
both values are converted to negative integers. While I think that this
will reasonably hide the problem for the time being, I do think there is a
deeper problem here, which is the frequent passing of an unsigned integer
into a function that takes signed int as argument.

I see that tick_* is used all over the place, so I thought I would rather
consult someone before spending lots of time creating a patch that would
not be used. Also, I would need some more time to actually figure out what
the best solution would be.

Does anyone have any thoughts on this? Is someone maybe already aware of this?

Thanks a lot,
Conrad
--
Conrad Hoffmann
Traffic Engineer

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany

Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B

diff --git a/src/server.c b/src/server.c
index f3b0f16..e88302b 100644
--- a/src/server.c
+++ b/src/server.c
@@ -978,7 +978,7 @@ int parse_server(const char *file, int linenum, char **args, struct proxy *curpr
 		curr_resolution->status = RSLV_STATUS_NONE;
 		curr_resolution->step = RSLV_STEP_NONE;
 		/* a first resolution has been done by the configuration parser */
-		curr_resolution->last_resolution = 0;
+		curr_resolution->last_resolution = now_ms;
 		newsrv->resolution = curr_resolution;
 
 	skip_name_resolution:
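To make the arithmetic in this message concrete, here is a minimal, self-contained sketch of a wrapping timer comparison. It is modelled on the behaviour described above, not copied from HAProxy's tick helpers: the names `sketch_tick_add` and `sketch_tick_is_expired` are invented, the real functions may differ in detail, and the now_ms value used in the note below (2419246213) is chosen only because its signed 32-bit cast is the -1875721083 quoted in the message.

```c
#include <stdint.h>

/* Hypothetical stand-ins for tick_add()/tick_is_expired() (assumption:
 * the real helpers may handle special values differently). Arithmetic
 * is done on unsigned 32-bit values so wrap-around is well defined. */
uint32_t sketch_tick_add(uint32_t tick, uint32_t delay_ms)
{
    return tick + delay_ms;
}

/* A timer counts as expired when it is not "after" now, where "after"
 * is decided by the sign of the 32-bit difference. The answer is only
 * meaningful while the two values are less than 2^31 ms apart. The
 * cast assumes the usual two's complement conversion. */
int sketch_tick_is_expired(uint32_t timer, uint32_t now)
{
    return (int32_t)(timer - now) <= 0;
}
```

With last_resolution at 0 and hold.valid at 3000, the timer value 3000 appears to lie roughly 1.9 billion ms in the future relative to that now_ms, so the check does not fire. Starting from now_ms instead, the timer expires once hold.valid has elapsed. Seen this way, a zero initial value may simply be too far from now_ms for a wrapping comparison to order the two correctly.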
Hashing fetched samples / strings
I was wondering if there is a way to hash fetched sample strings.

For example, I have API servers and HAProxy servers. I want HAProxy to set a
header (e.g. 'X-Secure') with a hashed concatenation of the request ID and a
'shared secret' that I can have on both the API and HAProxy servers through
configuration management software (Chef).

My intention is to have a header that proves the request came from HAProxy
and was not sent directly to the API by a third party. I intend to have the
API perform the same hash and compare it.

If there is a better way to handle this issue, please let me know.

Thank you in advance,
Paul
Re: [PATCH] DOC: mention support for RFC 5077 TLS Ticket extension in starter guide
Hi all,

On 31/08/2015 at 11:59, Pavlos Parissis wrote:
>>> Maybe reStructuredText as a format and Sphinx tool could help here, but
>>> it will require quite a bit of work to migrate to.
>>
>> It was mentioned, I'm not opposed to it, I just want to ensure first that
>> people don't have to *learn* the doc language to contribute doc. Ie: if
>> the format is broken in a patch, it should not result in utter crap on
>> the output nor in errors during conversion. That's why we have the
>> current format in the first place : instead of having people learn a
>> language, we have Cyril's tool which learns people's language.
>
> That is so true, as it took me 2 days to get used to it and have a clean
> build.
>
>> In the end it makes the doc contributions extremely smooth.
>
> That would be a very strong argument against any plans to migrate to
> something else, as supplying patches for the doc with the current format
> is so easy. Just erase my e-mail :-)

Don't erase it too quickly, because it has always been the idea since the
beginning. Currently, it's only at the proof-of-concept stage, which
provides usable documentation with links, and that's where I stopped.

One day, I hope to find some time to make it a kind of preprocessor that
converts the plain text documentation to something like reStructuredText (or
another format) and to use a "more" standard tool for the final rendering.

--
Cyril Bonté
Re: Can HAProxy loadbalance multiple requests send through single TCP connection
TCP really has no notion of "messages", it's all just bytes. So no, this
would not be possible with plain TCP.

-Bryan

On Wed, Sep 2, 2015 at 12:05 PM, Prabu rajan wrote:
> Hi Team,
>
> Our client to HAProxy establishes single TCP connection and continues to
> send messages. We would like to know, is there a way to load balance those
> messages across the services sitting behind HAProxy. Please advise.
>
> Regards,
> Prabu
Can HAProxy loadbalance multiple requests send through single TCP connection
Hi Team,

Our client establishes a single TCP connection to HAProxy and keeps sending
messages over it. We would like to know whether there is a way to load
balance those messages across the services sitting behind HAProxy. Please
advise.

Regards,
Prabu
Re: Haproxy and postfix SMTPS - can't get haproxy and postfix talking to each other
Hi,

On 31.08.2015 13:44, Lukas Erlacher wrote:
> Hi,
>
>>
>> Could you send your complete config and remove private information? Could
>> you also please give us the output of haproxy -vv?
>>
>
> Full config: http://ix.io/ky6

thanks.

> haproxy -vv:
>
> HA-Proxy version 1.5.3 2014/07/25
> Copyright 2000-2014 Willy Tarreau
>
> Build options :
>   TARGET  = linux2628
>   CPU     = generic
>   CC      = gcc
>   CFLAGS  = -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2
>   OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1
>
> Default settings :
>   maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200
>
> Encrypted password support via crypt(3): yes
> Built with zlib version : 1.2.8
> Compression algorithms supported : identity, deflate, gzip
> Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
> Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
> OpenSSL library supports TLS extensions : yes
> OpenSSL library supports SNI : yes
> OpenSSL library supports prefer-server-ciphers : yes
> Built with PCRE version : 8.31 2012-07-06
> PCRE library supports JIT : no (USE_PCRE_JIT not set)
> Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
>
> Available polling systems :
>       epoll : pref=300,  test result OK
>        poll : pref=200,  test result OK
>      select : pref=150,  test result OK
> Total: 3 (3 usable), will use epoll.

looks good to me

> Best,
> Luke

Well, I created a very simple config.

/etc/haproxy.cfg

global
    maxconn 65000
    ulimit-n 85535
    uid 0
    gid 0
    daemon
    stats socket /var/run/haproxy.stat level admin
    nbproc 1
    cpu-map all 1 2
    ssl-server-verify none
    tune.ssl.default-dh-param 2048

defaults
    mode tcp
    no option http-server-close
    timeout connect 5000
    timeout client 5
    timeout server 5

listen app1
    bind :8080
    mode http
    stats enable
    stats uri /
    maxconn 200

frontend ft_smtps
    bind :465
    timeout client 1m
    default_backend bk_postfix_smtps

backend bk_postfix_smtps
    option tcp-check
    timeout server 1m
    timeout connect 5s
    server mail-1 172.1.1.21:10464 send-proxy check

/etc/postfix/master.cf on 172.1.1.21

10464 inet n - n - - smtpd
    -o smtpd_tls_wrappermode=yes
    -o smtpd_sasl_auth_enable=yes
    -o smtpd_client_restrictions=permit_sasl_authenticated,reject
    -o smtpd_upstream_proxy_protocol=haproxy

Would you mind trying

    10464 inet n - n - - smtpd

instead of

    10464 inet n - - - - smtpd

As for haproxy, the only difference is that you use chroot and user haproxy.
Could you please try with the defaults and global sections from the minimal
example?

cheers
thomas
Re: [PATCH] Support statistics in multi-process mode
Hi Willy,

I once saw a message saying that you had forgotten about this patch, but I
never saw any further comment on it:

On 04/24/15 12:34, root wrote:
> From: HiepNV
>
> Signed-off-by: root
> ---
>  Makefile                  |   4 +-
>  include/proto/shm_proxy.h |  28 +++
>  src/dumpstats.c           |  59 ++-
>  src/haproxy.c             |  48 -
>  src/shm_proxy.c           | 439 ++
>  5 files changed, 571 insertions(+), 7 deletions(-)
>  create mode 100644 include/proto/shm_proxy.h
>  create mode 100644 src/shm_proxy.c

http://comments.gmane.org/gmane.comp.web.haproxy/21470

Could you please recheck whether that would be a possible feature?

thanks
Philipp

--
---
DI Mag. Philipp Kolmann           mail: kolm...@zid.tuwien.ac.at
Technische Universitaet Wien      web:  www.zid.tuwien.ac.at
Zentraler Informatikdienst (ZID)  tel:  +43(1)58801-42011
Wiedner Hauptstr. 8-10, A-1040 Wien     DVR: 0005886
---
Re: Lua outbound Sockets in 1.6-dev4
You are NOT able to reproduce? I misunderstood your previous comment.

Further testing suggests (to me) that this is a timing issue, where HAProxy
does not discover that the connection is established, if connection
establishment doesn't happen within a very, very short window after the
connection is attempted.

Previously, I only tested "client talks first" (http) using a different
machine as the server. Consider the following new results:

server talks first (ssh) - connection to local machine - *works*
server talks first (ssh) - connection to a different machine on same LAN - *works*
server talks first (ssh) - connection to a different machine across Internet - *works*

client talks first (http) - connection to local machine - *works*
client talks first (http) - connection to a different machine on same LAN - *does not work*
client talks first (http) - connection to a different machine across Internet - *does not work*

The difference here seems to be the timing of the connection establishment,
and the presence or absence of additional events.

(Note that when I say "local machine" I do not mean 127.0.0.1; I am still
using the local machine's Ethernet IP when talking to services on the local
machine.)

When you are testing, are you using a remote machine, so that there is a
brief delay in connection establishment? If not, this may explain why you
do not see the same behavior, since local connections do not appear to have
the same problem.

Most interesting, based on my "timing" theory, I found a workaround, which
seems very wrong in principle; so wrong, in fact, that I can't believe I
tried it; however, using the following tactic, I am able to make an outgoing
socket connection to a different machine, when client talks first.

    local sock = core.tcp();
    sock:settimeout(3);
    local written = sock:send("GET /latest/meta-data/placement/availability-zone HTTP/1.0\r\nHost: 169.254.169.254\r\n\r\n");
    local connected, con_err = sock:connect("169.254.169.254",80);
    ...

This strange code works.

I hope you will agree that writing to the socket before connecting seems
very wrong, and I was surprised to find that this code works successfully
when connecting to a different machine -- presumably because I'm pre-loading
the outbound buffer, so the server's response to my request actually
triggers an event that does not occur in a condition where the client talks
first and when there is a delay in connection establishment, even a very
brief delay.
Correlate requests on multiple frontends based on src
Hi All,

Is it possible to correlate requests across multiple frontends, so that I can
direct a request on one frontend to specific backend servers based on the
backend servers accessed by the same IP address on another frontend?

To understand the scenario:

MySQL Master 1 <-> MySQL Master 2
      |                  |
   MySQL S1           MySQL S2
      |                  |
   MySQL S3           MySQL S4

In haproxy:

frontend1 points to MySQL Master 1 + MySQL Master 2
frontend2 points to MySQL S1-4

If possible, I want clients that access MySQL Master 1 through frontend1 to
be directed to MySQL S1 + S3 when they access frontend2.

--
Best regards,
Vintila Mihai Alexandru
Re: segfault with 1.6 10ec214f41385b231a0c4c529b7b555caf5280bb
Cyril Bonté writes:
> In some conditions, srv_conn is set to NULL but is then used later.

Awesome, I will roll out a new version based on master today, thanks Cyril!
Re: Lua outbound Sockets in 1.6-dev4
Thank you,

Now I'm sure that the connection is established. I also see that HAProxy
closes the connection 3 seconds later, in accordance with the timeout.

Now the hard work begins :) I can't reproduce the bug.

I agree with your conclusion: the error message "Can't connect" is found only
once in the HAProxy Lua code. It is in the yield function, so I guess that
the yield function is woken up after the 3-second timeout. I don't know why.

Can you send your complete configuration file, or a configuration which
reproduces the problem?

Thanks
Thierry

On Tue, 1 Sep 2015 12:49:22 -0400 Michael Ezzell wrote:
> You *can* reproduce the error? I feel better already.
>
> > Can you run a tcpdump for validating the TCP connection establishment ?
>
> It looks pretty much as expected. Is this what you wanted?
>
> 73 69.516276 10.10.10.10 -> 10.20.20.20 TCP 74 44748 > http [SYN] Seq=0 Win=26883 Len=0 MSS=8961 SACK_PERM=1 TSval=833894013 TSecr=0 WS=128
> 74 69.516893 10.20.20.20 -> 10.10.10.10 TCP 74 http > 44748 [SYN, ACK] Seq=0 Ack=1 Win=26847 Len=0 MSS=8961 SACK_PERM=1 TSval=20615574 TSecr=833894013 WS=128
> 75 69.516909 10.10.10.10 -> 10.20.20.20 TCP 66 44748 > http [ACK] Seq=1 Ack=1 Win=27008 Len=0 TSval=833894013 TSecr=20615574
> 93 72.517981 10.10.10.10 -> 10.20.20.20 TCP 66 44748 > http [FIN, ACK] Seq=1 Ack=1 Win=27008 Len=0 TSval=833894764 TSecr=20615574
> 94 72.518672 10.20.20.20 -> 10.10.10.10 TCP 254 [TCP segment of a reassembled PDU]
> 95 72.518689 10.10.10.10 -> 10.20.20.20 TCP 66 44748 > http [ACK] Seq=2 Ack=190 Win=28032 Len=0 TSval=833894764 TSecr=20616324
>
> Also a quick hack of src/hlua.c to discover which of the three
> possibilities is causing the error reveals that in
> hlua_socket_connect_yield()...
>
>     if (!hlua || !socket->s || channel_output_closed(&socket->s->req)) {
>
> ...the condition being matched and prompting the "Can't connect" error
> appears to be !socket->s.
Re: segfault with 1.6 @10ec214f41385b231a0c4c529b7b555caf5280bb
Hi guys,

On Wed, Sep 02, 2015 at 08:44:21AM +0200, Cyril Bonté wrote:
> I haven't made some tests yet, but I guess the issue is on srv_conn due
> to this part of code in proto_http.c :
>
>     if (((s->txn->flags & TX_CON_WANT_MSK) != TX_CON_WANT_KAL) ||
>         !si_conn_ready(&s->si[1])) {
>             si_release_endpoint(&s->si[1]);
>             srv_conn = NULL;
>     }
>
>     [...]
>
>     if (prev_status == 401 || prev_status == 407) {
>             [...]
>             s->txn->flags |= TX_PREFER_LAST;
>             srv_conn->flags |= CO_FL_PRIVATE;
>     }
>
> In some conditions, srv_conn is set to NULL but is then used later.

Good catch, thanks for this! I've merged the fix. Fortunately it only
impacts 1.6 since the connection reuse code.

Thanks,
Willy