Re: haproxy current session rate and current session count
Amyas writes:
> Wei Kong ...> writes:
>> Hi,
>>
>> Can someone please help me understand what the difference is between
>> these two metrics? See attached chart. We used to rely on session rate
>> to determine when to autoscale, but all of a sudden the session rate
>> always remains at 1 or 0 while the current session count goes up to
>> almost 50.
>>
>> Thanks,
>> Wei
>
> I believe session rate is connections or requests (depending on your
> keepalive setting) per second. Current sessions is the number of
> connections currently active on that frontend/backend/server.

Revising my previous reply: using session rate to decide when to scale is
potentially the worst way to do it. Depending on the failure modes of your
system, you could end up with your capacity overrun in a short period and
your systems overloaded; if your systems are not set up to "fail fast",
your app may hang or respond slowly. If your app responds slowly, session
rate goes _down_ and current sessions go up, possibly to the point where
requests are queued (which there were on your screenshot - are you sure
your system can really handle 50 concurrent requests if there are always
requests outstanding for those 50 slots?).

It might help to switch to "balance first", if you are not using it
already, so you can see what the capacity of each instance is when it is
fully loaded. You might find 50 too high.
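A minimal sketch of what that could look like - backend name, server names
and addresses, and the maxconn value are all made up for illustration;
"balance first" needs a maxconn on each server so it can fill one server's
slots before spilling over to the next:

```
backend app
    balance first
    # "first" fills app1 up to its maxconn before sending anything to
    # app2, which makes the real per-instance capacity visible under load
    server app1 10.0.0.11:80 check maxconn 25
    server app2 10.0.0.12:80 check maxconn 25
```

Watching smax and the queue counters on the stats page then shows how many
concurrent connections a single instance actually absorbs before requests
start queueing.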
Re: haproxy live demo IPv4-cached, IPv4-direct, etc under frontend
Willy Tarreau 1wt.eu> writes:
> You need two things :
> 1) enable "option socket-stats" so that haproxy keeps stats per-listener
> 2) name each of your listeners.
> Hoping this helps,
> Willy

Thanks! Great, I was missing "option socket-stats". I assume that
direct/cached is based on an haproxy instance in front of this one, or a
rule that routes to those sockets based on whether the address has been
seen before or not?
Re: haproxy current session rate and current session count
Wei Kong writes:
> Hi,
>
> Can someone please help me understand what the difference is between
> these two metrics? See attached chart. We used to rely on session rate
> to determine when to autoscale, but all of a sudden the session rate
> always remains at 1 or 0 while the current session count goes up to
> almost 50.
>
> Thanks,
> Wei

I believe session rate is connections or requests (depending on your
keepalive setting) per second. Current sessions is the number of
connections currently active on that frontend/backend/server. You can use
the stats socket to get the full list of active sessions, afaik.

So if you are using keepalive you might have sessions open/connected but
not sending requests, or your machines are overloaded and hanging, keeping
the sessions open, or there is some other reason the sessions are staying
open.

You probably don't care what your requests/connections per second are - if
your instances can process a request in 23 ms (you can get that from the
log), each instance can do about 43 rps. If you are generating different
dynamic pages/content, or if the speed of processing varies, then your rps
capacity will vary, so you can't know your exact rps capacity in a complex
dynamic system with many types of requests. But current sessions is a good
indicator of load that calls for additional instances, since many
applications have a fairly low capacity for concurrent connections when
fully loaded (if you are not using keepalive).

I noticed during my testing that any amount of sustained queued requests
is a good indicator that additional capacity is required. This seems to be
a better indicator than session rate or even current sessions. For
instance, I have done quite a bit of testing and found that a fairly
complex apache/php/eaccel/smarty/redis/mysql app on a micro EC2 instance
has a max capacity of 3 concurrent connections before overloading.
People are shocked at how low that number is (and after they review the
config/code/logs they find that it is not due to poor code or config), but
when there is a constant supply of queued requests, it does not take many
to fully load the CPU and resources of EC2 instances. YMMV.
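As a sketch of watching those numbers from the stats socket (the socket
path, proxy and server names below are hypothetical): the CSV that "show
stat" returns carries qcur in field 3 and scur in field 5, so something
like `echo "show stat" | socat stdio /var/run/haproxy.sock` piped through
awk pulls out queue depth and current sessions per proxy/server. The awk
step can be demonstrated on a canned sample of that CSV:

```shell
# Normally: echo "show stat" | socat stdio /var/run/haproxy.sock | awk ...
# Here we run only the awk step, on a sample line such a command returns.
sample='# pxname,svname,qcur,qmax,scur,smax,slim,stot
www,app1,4,12,50,50,50,10231'
line=$(printf '%s\n' "$sample" | awk -F, 'NR>1 {print $1"/"$2" qcur="$3" scur="$5}')
echo "$line"   # -> www/app1 qcur=4 scur=50
```

A sustained nonzero qcur is exactly the "queued requests" signal described
above.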
Remotely accessible stats socket and HATop
Hi List, We had a requirement to be able to put servers in and out of maint mode remotely from a script. To facilitate this we exposed the stats socket using socat, and wrote an init script to do that for us: https://github.com/Wirehive/haproxy-remote We then thought it would be nice, as we manage a lot of HAProxy instances, to be able to use HATop (http://feurix.org/projects/hatop/) remotely, so we've modified HATop to accept a host and port for TCP socket connections: https://github.com/Wirehive/hatop I'm not sure how useful this will be for everyone else, but I thought I'd share it just in case :) Simon
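For reference, the local building block such a script wraps is just a pipe
into the UNIX stats socket - the socket path and server name below are
hypothetical, and the command only runs if that socket actually exists:

```shell
SOCK=/var/run/haproxy.sock   # path set by "stats socket" in the global section
if [ -S "$SOCK" ]; then
    # put a (hypothetical) server into maintenance mode via the stats socket
    reply=$(echo "disable server www/app1" | socat stdio "$SOCK")
    echo "sent disable server, reply: $reply"
else
    reply="no stats socket at $SOCK"
    echo "$reply"
fi
```

Exposing it remotely is then essentially
`socat TCP-LISTEN:9999,reuseaddr,fork UNIX-CONNECT:/var/run/haproxy.sock`
on the proxy host (the port is an example); since the socket accepts admin
commands, put some access control in front of it.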
Re: RFC: set-tos followup
On Sun, Jun 23, 2013 at 05:40:44PM +0200, Lukas Tribus wrote:
> Hi again!
>
> > Seems OK in principle. However I'd rather enclose the IPv6 part in the
> > if condition instead of making the code return
>
> Agreed and fixed.
>
> > OK, I think everything is fine in your proposal.
>
> Patch attached, but you have to apply the bugfix patch from June 20th
> first I guess.

I thought I had applied it, but no, I had only read it. Both applied now.

Thanks!
Willy
RE: RFC: set-tos followup
Hi again!

> Seems OK in principle. However I'd rather enclose the IPv6 part in the
> if condition instead of making the code return

Agreed and fixed.

> OK, I think everything is fine in your proposal.

Patch attached, but you have to apply the bugfix patch from June 20th
first I guess.

Thanks for reviewing!
Lukas

0001-MEDIUM-http-add-IPv6-support-for-set-tos.patch
Description: Binary data
Re: GIT clone fails, how to proceed?
Hi,

On 23.06.2013 15:55, Willy Tarreau wrote:
> Guys, I found a workaround which seems to be working quite well at the
> moment. For some reason the kernel seems to ignore the max TCP window
> size when GSO is enabled on the interface, resulting in hundreds of kB
> in flight which take ages to recover in case of losses => haproxy sees
> nothing move and finally times out. Disabling GSO on that interface
> completely fixed the issue, now the socket's send queues are reasonable
> and match the configuration and I've not seen a timeout for the last
> hour. There were always a few per hour previously that I always
> attributed to the clients!
>
> So I think it's really fixed now.

I can confirm that. Thanks a lot.

> Cheers,
> Willy

thomas
Re: haproxy live demo IPv4-cached, IPv4-direct, etc under frontend
Hi,

On Sat, Jun 22, 2013 at 12:54:39PM +, Amyas wrote:
> I am just starting with haproxy on my personal website
> and have a basic setup.
>
> I was wondering if the config file for the "live demo" is available
> anywhere because

No, it's not public.

> I have not been able to figure out if I am missing something because
> I don't see these heading on my frontend status.
>
> IPv4-cached
> IPv4-direct
>
> where do these come from and what setting do I add
> to create these subheadings on a frontend.

You need two things :
  1) enable "option socket-stats" so that haproxy keeps stats per-listener
  2) name each of your listeners.

Here is what it looks like in the config :

frontend http-in
        option socket-stats
        bind 10.9.2.3:60080 mss 512 name IPv4-direct
        bind 10.9.2.3:60081 mss 512 name IPv4-cached
        bind :::80 mss 512 v6only name IPv6-direct
        bind 127.0.0.1:60080 name local
        bind 127.0.0.1:65443 name local-https accept-proxy ssl crt /etc/haproxy/demo.1wt.eu.pem ciphers ECDHE-RSA-AES128-SHA256:AES128-GCM-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH ecdhe prime256v1 npn http/1.1

Hoping this helps,
Willy
Re: GIT clone fails, how to proceed?
Guys, I found a workaround which seems to be working quite well at the
moment. For some reason the kernel seems to ignore the max TCP window
size when GSO is enabled on the interface, resulting in hundreds of kB
in flight which take ages to recover in case of losses => haproxy sees
nothing move and finally times out. Disabling GSO on that interface
completely fixed the issue, now the socket's send queues are reasonable
and match the configuration and I've not seen a timeout for the last
hour. There were always a few per hour previously that I always
attributed to the clients!

So I think it's really fixed now.

Cheers,
Willy
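For the record, "disabling GSO on that interface" is an ethtool toggle.
The interface name below is an example, and the commands are skipped
unless run as root with ethtool available; note the setting does not
persist across reboots:

```shell
IFACE=eth0                                 # example interface name; adjust
if [ "$(id -u)" -eq 0 ] && command -v ethtool >/dev/null 2>&1 \
   && [ -e "/sys/class/net/$IFACE" ]; then
    ethtool -K "$IFACE" gso off            # turn generic segmentation offload off
    status=$(ethtool -k "$IFACE" | grep generic-segmentation-offload)
else
    status="skipped: need root, ethtool and $IFACE"
fi
echo "$status"
```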
Re: RFC: set-tos followup
Hi Lukas,

On Sun, Jun 23, 2013 at 03:23:15PM +0200, Lukas Tribus wrote:
> > static inline void inet_set_tos(int fd, sa_family_t family, int tos)
> > {
> > 	if (family != AF_INET && family != AF_INET6)
> > 		return;
> > #if defined(IP_TOS)
> > 	if (setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos)) == 0)
> > 		return;
> > #endif
> > #if defined(IPV6_TCLASS)
> > 	if (setsockopt(fd, IPPROTO_IPV6, IPV6_TCLASS, &tos, sizeof(tos)) == 0)
> > 		return;
> > #endif
> > }
>
> Since setsockopt always returns 0, the function will return too early to
> set IPV6_TCLASS.

OK.

> However we can - as per RFC2553 section 3.7 - use the libc macro
> IN6_IS_ADDR_V4MAPPED(), to test whether the IPv6 address is actually
> a V4-mapped-in-V6 address (so we set IP_TOS) or a proper IPv6 address (so
> we set IPV6_TCLASS).

I didn't know about this one, we have the same test open-coded in acl.c.

> Instead of the sa_family_t I'm passing the whole sockaddr_storage struct
> to the function, so I can access the actual IPv6 address. Let me know if
> there are any implications of doing so:
>
> static inline void inet_set_tos(int fd, struct sockaddr_storage from, int tos)
> {
> #ifdef IP_TOS
> 	if (from.ss_family == AF_INET)
> 		setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
> #endif
> #ifdef IPV6_TCLASS
> 	if (from.ss_family != AF_INET6)
> 		return;
> 	if (IN6_IS_ADDR_V4MAPPED(&((struct sockaddr_in6 *)&from)->sin6_addr))
> 		setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
> 	else
> 		setsockopt(fd, IPPROTO_IPV6, IPV6_TCLASS, &tos, sizeof(tos));
> #endif
> }

Seems OK in principle. However I'd rather enclose the IPv6 part in the
if condition instead of making the code return when IPV6_TCLASS does not
match : if we later reuse this for other families, the behaviour should
not depend on IPV6_TCLASS.

> That means we make a single (the correct one) syscall only, and all 3
> scenarios (native IPv4, native IPv6 and v4-in-v6-mapped) are covered.
>
> To maintain compatibility we put everything v6 related in #ifdef's of
> IPV6_TCLASS.
>
> In case the libc doesn't have that macro, but defines other IPv6 related
> things like IPV6_TCLASS, we add a compatibility define in compat.h:
>
> #if defined(IPV6_TCLASS) && !defined(IN6_IS_ADDR_V4MAPPED)
> #define IN6_IS_ADDR_V4MAPPED(a) \
> 	((((const uint32_t *) (a))[0] == 0) \
> 	 && (((const uint32_t *) (a))[1] == 0) \
> 	 && (((const uint32_t *) (a))[2] == htonl (0xffff)))
> #endif

> > We could emit a warning if IPV6_TCLASS is not defined when the set-tos
> > is used, but quite frankly, it could upset users who are only using IPV4.
>
> I agree, I didn't think of that. The user may not be able to change the
> underlying kernel/libc and a warning can be annoying. Also this is only
> interesting for one situation: when IPv6 is used, but IPV6_TCLASS is not
> defined, and that particular case will probably never happen in real life.
>
> Let me know if you are ok with the approach above.

OK, I think everything is fine in your proposal.

Best regards,
Willy
RE: RFC: set-tos followup
Hi Willy,

> static inline void inet_set_tos(int fd, sa_family_t family, int tos)
> {
> 	if (family != AF_INET && family != AF_INET6)
> 		return;
> #if defined(IP_TOS)
> 	if (setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos)) == 0)
> 		return;
> #endif
> #if defined(IPV6_TCLASS)
> 	if (setsockopt(fd, IPPROTO_IPV6, IPV6_TCLASS, &tos, sizeof(tos)) == 0)
> 		return;
> #endif
> }

Since setsockopt always returns 0, the function will return too early to
set IPV6_TCLASS.

However we can - as per RFC2553 section 3.7 - use the libc macro
IN6_IS_ADDR_V4MAPPED(), to test whether the IPv6 address is actually
a V4-mapped-in-V6 address (so we set IP_TOS) or a proper IPv6 address (so
we set IPV6_TCLASS).

Instead of the sa_family_t I'm passing the whole sockaddr_storage struct
to the function, so I can access the actual IPv6 address. Let me know if
there are any implications of doing so:

static inline void inet_set_tos(int fd, struct sockaddr_storage from, int tos)
{
#ifdef IP_TOS
	if (from.ss_family == AF_INET)
		setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
#endif
#ifdef IPV6_TCLASS
	if (from.ss_family != AF_INET6)
		return;
	if (IN6_IS_ADDR_V4MAPPED(&((struct sockaddr_in6 *)&from)->sin6_addr))
		setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
	else
		setsockopt(fd, IPPROTO_IPV6, IPV6_TCLASS, &tos, sizeof(tos));
#endif
}

That means we make a single (the correct one) syscall only, and all 3
scenarios (native IPv4, native IPv6 and v4-in-v6-mapped) are covered.

To maintain compatibility we put everything v6 related in #ifdef's of
IPV6_TCLASS.

In case the libc doesn't have that macro, but defines other IPv6 related
things like IPV6_TCLASS, we add a compatibility define in compat.h:

#if defined(IPV6_TCLASS) && !defined(IN6_IS_ADDR_V4MAPPED)
#define IN6_IS_ADDR_V4MAPPED(a) \
	((((const uint32_t *) (a))[0] == 0) \
	 && (((const uint32_t *) (a))[1] == 0) \
	 && (((const uint32_t *) (a))[2] == htonl (0xffff)))
#endif

> We could emit a warning if IPV6_TCLASS is not defined when the set-tos
> is used, but quite frankly, it could upset users who are only using IPV4.

I agree, I didn't think of that. The user may not be able to change the
underlying kernel/libc and a warning can be annoying. Also this is only
interesting for one situation: when IPv6 is used, but IPV6_TCLASS is not
defined, and that particular case will probably never happen in real life.

Let me know if you are ok with the approach above.

Best regards,
Lukas
Re: GIT clone fails, how to proceed?
Hi Lukas,

OK it's a kernel issue on my reverse proxy. Look below, haproxy detected
a timeout after 30s of idle (fd 14 faces the client, fd 15 the server):

epoll_wait(0, 0x1aebdd8, 0xc8, 0) = 0
gettimeofday({1371978241, 119519}, NULL) = 0
recv(15, "-\nR\216+f\213%G\3539\"\270\246{9\3037\272\317N\215\0\226\333;\334\320y\374Z.'"..., 8030, 0) = 8030
send(14, "-\nR\216+f\213%G\3539\"\270\246{9\3037\272\317N\215\0\226\333;\334\320y\374Z.'"..., 8030, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE) = 4280
gettimeofday({1371978241, 120296}, NULL) = 0
epoll_wait(0, 0x1aebdd8, 0xc8, 0) = 0
gettimeofday({1371978241, 120635}, NULL) = 0
recv(15, "A\23,A\17\221\234k\271!\313C\245\267a Pp\316\204-9\342E\360\3438\255\322\247(-J"..., 4280, 0) = 4280
send(14, "i.\302#t\35\300\354~G\312\2606\266\201\376\254}~\362\372l_\226\31\5\210{\344\361\10`\30"..., 3750, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(0, 0x3, 0xe, 0xa2028) = 0

==> buffers are full for fd #14. Nothing happens on this FD for the next
30 seconds, until we decide it's over and close the connection:

epoll_wait(0, 0x1aebdd8, 0xc8, 0x93) = 0
gettimeofday({1371978271, 122894}, NULL) = 0
setsockopt(14, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
close(14) = 0
shutdown(15, 1 /* send */) = 0
close(15) = 0
sendto(10, "<134>Jun 23 11:04:31 haproxy[1153"..., 318, MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(514), sin_addr=inet_addr("10.8.1.2")}, 16) = 318

Now the problem is that the capture taken on the same side shows a
different story: this happens after the recovery from some losses:

10:03:47.918722 88.191.124.161.45154 > 62.212.114.60.81: . [tcp sum ok] 3500432327:3500432327(0) ack 3270212996 win 398 (DF) (ttl 53, id 27788, len 52)
10:03:48.001500 62.212.114.60.81 > 88.191.124.161.45154: . [tcp sum ok] 3270212996:3270213496(500) ack 3500432327 win 1500 (DF) (ttl 128, id 24160, len 552)
10:03:48.025513 62.212.114.60.81 > 88.191.124.161.45154: . [tcp sum ok] 3270213496:3270213996(500) ack 3500432327 win 1500 (DF) (ttl 128, id 24161, len 552)
10:03:48.063247 88.191.124.161.45154 > 62.212.114.60.81: . [tcp sum ok] 3500432327:3500432327(0) ack 3270213496 win 405 (DF) (ttl 53, id 27789, len 52)
10:03:48.086935 88.191.124.161.45154 > 62.212.114.60.81: . [tcp sum ok] 3500432327:3500432327(0) ack 3270213996 win 413 (DF) (ttl 53, id 27790, len 52)
10:03:48.344722 62.212.114.60.81 > 88.191.124.161.45154: R [tcp sum ok] 3270224496:3270224496(0) ack 3500432327 win 1500 (DF) (ttl 128, id 24183, len 52)

When the RST happens, all bytes were acked, so for sure the writes should
have retriggered. It's probably time to upgrade this kernel. Now that I'm
thinking about it, I believe that the issues started when I switched to
use this machine :-/

Best regards,
Willy
Re: GIT clone fails, how to proceed?
On Sun, Jun 23, 2013 at 10:54:00AM +0200, Lukas Tribus wrote:
> Still fails here:
>
> lukas@ubuntuvm:~/haproxy-test$ time git clone http://git.1wt.eu/git/haproxy.git/
> Cloning into 'haproxy'...
> error: Unable to get pack file http://git.1wt.eu/git/haproxy.git/objects/pack/pack-815835d1b2e20e0ad9d028756813b078cdf8f9c2.pack
> transfer closed with 233372 bytes remaining to read
> error: Unable to find 84d23dab089a4313913e22c1b0c60cc2b48216f0 under http://git.1wt.eu/git/haproxy.git
> Cannot obtain needed blob 84d23dab089a4313913e22c1b0c60cc2b48216f0
> while processing commit 0a3dd74c9cd24ab77178c9ccc65c577a91648cef.
> error: Fetch failed.
>
> real 15m37.691s
> user 0m0.012s
> sys 0m0.084s
> lukas@ubuntuvm:~/haproxy-test$

Yes I noticed it in the logs during your test. I managed to reproduce it
now. That's strange, the server-side haproxy detects an error and closes
(not a timeout) while network traces show that it's the first one to
close. I'll have to retry using strace. I suspect some timeout issue
reported by the kernel. I don't even have tcp keep-alives though :-/

Thanks for the test!
Willy
RE: GIT clone fails, how to proceed?
Hi Willy,

> I've just put the cache into maintenance so that connections will go
> directly to the origin, if you want to retry. It will be even slower
> but probably worth a try.

Still fails here:

lukas@ubuntuvm:~/haproxy-test$ time git clone http://git.1wt.eu/git/haproxy.git/
Cloning into 'haproxy'...
error: Unable to get pack file http://git.1wt.eu/git/haproxy.git/objects/pack/pack-815835d1b2e20e0ad9d028756813b078cdf8f9c2.pack
transfer closed with 233372 bytes remaining to read
error: Unable to find 84d23dab089a4313913e22c1b0c60cc2b48216f0 under http://git.1wt.eu/git/haproxy.git
Cannot obtain needed blob 84d23dab089a4313913e22c1b0c60cc2b48216f0
while processing commit 0a3dd74c9cd24ab77178c9ccc65c577a91648cef.
error: Fetch failed.

real 15m37.691s
user 0m0.012s
sys 0m0.084s
lukas@ubuntuvm:~/haproxy-test$

Regards,
Lukas
Re: GIT clone fails, how to proceed?
Hi Lukas,

On Sun, Jun 23, 2013 at 09:46:34AM +0200, Lukas Tribus wrote:
> Hi,
>
> > I find it strange that the 'normal' git repository (though slow) is
> > unable to clone correctly. But i guess thats not so important if there
> > is a good workaround / secondary up to date repository.
>
> I agree, slow is one thing, not working is another thing.
>
> Willy, can you take a look why cloning from git.1wt.eu fails?
>
> lukas@ubuntuvm:~/haproxy-test$ git clone http://git.1wt.eu/git/haproxy.git/
> Cloning into 'haproxy'...
> error: Unable to get pack file http://git.1wt.eu/git/haproxy.git/objects/pack/pack-ad332087a4ea5a65ac90791a6d55f57f2efb57d3.pack
> transfer closed with 272368 bytes remaining to read
> error: Unable to find 85eb3ee8610b7a8389e78b3f342f6101467d31c3 under http://git.1wt.eu/git/haproxy.git
> Cannot obtain needed blob 85eb3ee8610b7a8389e78b3f342f6101467d31c3
> while processing commit 0a3dd74c9cd24ab77178c9ccc65c577a91648cef.
> error: Fetch failed.
> lukas@ubuntuvm:~/haproxy-test$

We have this report from time to time with no clear explanation :-(
Here it seems the problem was a bit clearer. When you download from
git.1wt.eu, you pass via a cache (formilux.org) so that git packs are
retrieved faster.

There is one haproxy in front of this cache which reports this :

2013-06-23T07:42:16+02:00/86 127.0.0.1 haproxy[29509]: XX.XXX.XX.XX:39265 [23/Jun/2013:07:41:44.662] public cache-1wt/cache 45/0/0/2079/32021 200 51531 - - SDNI 9/9/5/5/0 0/0 {git.1wt.eu} "GET /git/haproxy.git/objects/pack/pack-ad332087a4ea5a65ac90791a6d55f57f2efb57d3.pack HTTP/1.1"

And on the site on the other side I'm seeing this :

Jun 23 09:42:16 rpx2 haproxy[1153]: 88.191.124.161:40531 [23/Jun/2013:09:41:44.857] http-in www/www 3/0/1/13/31816 200 220007 - - cD-- 1/1/1/1/0 0/0 {git.1wt.eu:81|git/1.7.9.5|XX.XXX.XX.XX, 1|||} {|323599|application/octet-st} "GET /git/haproxy.git/objects/pack/pack-ad332087a4ea5a65ac90791a6d55f57f2efb57d3.pack HTTP/1.1"

So it seems to me like this is the cache in the middle which tends to
hang on some connections. And probably that once the connection aborts,
the broken object is stored truncated in the cache.

I've just put the cache into maintenance so that connections will go
directly to the origin, if you want to retry. It will be even slower
but probably worth a try.

Regards,
Willy
RE: GIT clone fails, how to proceed?
Hi,

> I find it strange that the 'normal' git repository (though slow) is
> unable to clone correctly. But i guess thats not so important if there
> is a good workaround / secondary up to date repository.

I agree, slow is one thing, not working is another thing.

Willy, can you take a look why cloning from git.1wt.eu fails?

lukas@ubuntuvm:~/haproxy-test$ git clone http://git.1wt.eu/git/haproxy.git/
Cloning into 'haproxy'...
error: Unable to get pack file http://git.1wt.eu/git/haproxy.git/objects/pack/pack-ad332087a4ea5a65ac90791a6d55f57f2efb57d3.pack
transfer closed with 272368 bytes remaining to read
error: Unable to find 85eb3ee8610b7a8389e78b3f342f6101467d31c3 under http://git.1wt.eu/git/haproxy.git
Cannot obtain needed blob 85eb3ee8610b7a8389e78b3f342f6101467d31c3
while processing commit 0a3dd74c9cd24ab77178c9ccc65c577a91648cef.
error: Fetch failed.
lukas@ubuntuvm:~/haproxy-test$

Thanks,
Lukas