Re: http responses randomly getting RSTs
Hi guys, I just noticed I had missed a timeout client.. When disabling that - I can't seem to reproduce the 408 issue. Willy Tarreau said the following on 02/20/2014 08:44 PM: [CUT] --- a/src/proto_http.c +++ b/src/proto_http.c @@ -2484,6 +2484,10 @@ int http_wait_for_request(struct session *s, struct channel *req, int an_bit) /* 3: has the read timeout expired ? */ else if (req-flags CF_READ_TIMEOUT || tick_is_expired(req-analyse_exp, now_ms)) { + send_log(s-fe, LOG_DEBUG, +Timeout detected: fe=%s s-flags=%08x txn-flags=%08x req-flags=%08x msg-flags=%08x now_ms=%u req-analyse_exp=%u (%d)\n, +s-fe-id, s-flags, txn-flags, req-flags, msg-flags, now_ms, req-analyse_exp, req-analyse_exp - now_ms); + if (!(s-flags SN_ERR_MASK)) s-flags |= SN_ERR_CLITO; And if you're willing to do a second test, please apply this one : --- a/src/proto_http.c +++ b/src/proto_http.c @@ -,6 +4452,7 @@ void http_end_txn_clean_session(struct session *s) s-req-cons-exp = TICK_ETERNITY; s-req-cons-flags= SI_FL_DONT_WAKE; /* we're in the context of process_session */ s-req-flags = ~(CF_SHUTW|CF_SHUTW_NOW|CF_AUTO_CONNECT|CF_WRITE_ERROR|CF_STREAMER|CF_STREAMER_FAST|CF_NEVER_WAIT); + s-req-flags = ~CF_READ_TIMEOUT; s-rep-flags = ~(CF_SHUTR|CF_SHUTR_NOW|CF_READ_ATTACHED|CF_READ_ERROR|CF_READ_NOEXP|CF_STREAMER|CF_STREAMER_FAST|CF_WRITE_PARTIAL|CF_NEVER_WAIT); s-flags = ~(SN_DIRECT|SN_ASSIGNED|SN_ADDR_SET|SN_BE_ASSIGNED|SN_FORCE_PRST|SN_IGNORE_PRST); s-flags = ~(SN_CURR_SESS|SN_REDIRECTABLE); It removes the CF_READ_TIMEOUT flag between requests. If the problem disappears, it means the flag is inherited between requests (though we still have to figure when it can happen without being an issue). I applied both patches, and can still reproduce the error (when timeout client is set), but the log no longer logs cR (since I enabled httplog). This was the log entries from the last try, where I got a 408 in the browser. Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39602 [25/Feb/2014:09:00:28.834] example.dk varnish-dmz/varnish01 0/0/0/177/177 200 4840 - - 6/6/0/1/0 0/0 GET /user/login HTTP/1.1 Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39602 [25/Feb/2014:09:00:29.010] example.dk varnish-dmz/varnish02 28/0/1/0/29 304 461 - - 4/4/0/1/0 0/0 GET /sites/test1.testms.example.dk/files/css/css_LYss46ysJxRRPkNn97hs6CWd7YXPB2emRjXSSVfLgCU.css HTTP/1.1 Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39603 [25/Feb/2014:09:00:29.044] example.dk varnish-dmz/varnish02 0/0/0/0/0 304 461 - - 5/5/0/1/0 0/0 GET /sites/test1.testms.example.dk/files/css/css_PotrMOiE3wsU0p53VT41Vp_TlyGnJa_IOPSf74RYDNs.css HTTP/1.1 Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39604 [25/Feb/2014:09:00:29.045] example.dk varnish-dmz/varnish02 0/0/0/0/0 304 462 - - 9/9/4/2/0 0/0 GET /sites/test1.testms.example.dk/files/css/css_9Bngx0XRCGG3sSIyd4AvOFOks7o2IpfayRORBN6c3Xw.css HTTP/1.1 Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39606 [25/Feb/2014:09:00:29.045] example.dk varnish-dmz/varnish02 0/0/0/0/0 304 469 - - 9/9/3/1/0 0/0 GET /sites/test1.testms.example.dk/files/js/js_IkgOrrVZK1UTp2HktvFi5vo1g7b25Muyr5GoySikuic.js HTTP/1.1 Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39602 [25/Feb/2014:09:00:29.040] example.dk varnish-dmz/varnish01 4/0/1/1/6 304 463 - - 9/9/2/3/0 0/0 GET /sites/test1.testms.example.dk/files/css/css_AFNnEjvDR5CvJkNKTljzBrlXpHTR2BYWPNnWbH1OEx4.css HTTP/1.1 Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39605 [25/Feb/2014:09:00:29.045] example.dk varnish-dmz/varnish01 0/0/0/1/1 304 463 - - 9/9/1/2/0 0/0 GET /sites/test1.testms.example.dk/files/css/css__nTXAnD87M2AhWF2DiSgN1-uhQqgJ3Op1bl5N06Pvu4.css HTTP/1.1 Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39607 [25/Feb/2014:09:00:29.045] example.dk varnish-dmz/varnish01 0/0/0/1/1 304 471 - - 9/9/0/1/0 0/0 GET /sites/test1.testms.example.dk/files/js/js_5cYf1A6mNLYEmlwo2Ftk3pqeuzLOxPQdHymcr4I4idM.js HTTP/1.1 Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39603 [25/Feb/2014:09:00:29.044] example.dk varnish-dmz/varnish02 2/0/0/0/2 304 469 - - 9/9/0/1/0 0/0 GET /sites/test1.testms.example.dk/files/js/js_R_ItDCoKeqPMRgcU-D-9yzx190CQeBCgO07IOiqZeLU.js HTTP/1.1 Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39602 [25/Feb/2014:09:00:29.046] example.dk varnish-dmz/varnish01 2/0/0/0/2 304 470 - - 9/9/0/1/0 0/0 GET /sites/test1.testms.example.dk/files/js/js_Dpwj9_jdZx11LB9vWakYPT-NglBvlNPbjDg8kKHykyQ.js HTTP/1.1 Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39609 [25/Feb/2014:09:00:29.131] example.dk varnish-dmz/varnish02 0/0/0/0/0 304 414 - - 2/2/1/1/0 0/0 GET
RE: http responses randomly getting RSTs
Hi, I just noticed I had missed a timeout client.. When disabling that - I can't seem to reproduce the 408 issue. Ok. I applied both patches, and can still reproduce the error (when timeout client is set), but the log no longer logs cR (since I enabled httplog). I'm not sure if they are useful if they are both applied at the same time. The first patch enables additional logging via debug syslog, but the second patch may avoid that particular code path, if I understand correctly. Can you just apply the first patch, enable syslog at debug level: global log syslogip syslog debug reproduce it and provide the syslog output (you should see messages starting with Timeout detected: ). Regards, Lukas
RE: http responses randomly getting RSTs
On Tue, February 25, 2014 9:45 pm, Lukas Tribus wrote: [CUT] I'm not sure if they are useful if they are both applied at the same time. The first patch enables additional logging via debug syslog, but the second patch may avoid that particular code path, if I understand correctly. Sorry for not taking the time to try and understand what the patches did.. I did ponder about the one patch at a time-note.. Can you just apply the first patch, enable syslog at debug level: global log syslogip syslog debug reproduce it and provide the syslog output (you should see messages starting with Timeout detected: ). I'll do that and come back. -- Best regards, Klavs Klavsen, Cell 61281200
Re: http responses randomly getting RSTs
Finally came back here :) Lukas Tribus said the following on 02/20/2014 07:16 PM: [CUT] Try removing timeout client as well (never ever do this in production). You will see a startup warning, ignore it and test if you still can reproduce it. Without timeout http-request and timeout client you probably don't see this issue. I tried removing timeout client - and it did complain loudly about it when restarting, but I still get 408's.. (you can also verify yourself.. it shows up pretty much consistently, if I press enter on the url - and then press f5 after it has loaded. I'll run the script to test timejumps in a second :) -- Regards, Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200 Those who do not understand Unix are condemned to reinvent it, poorly. --Henry Spencer
Re: http responses randomly getting RSTs
Willy Tarreau said the following on 02/20/2014 08:44 PM: [CUT] Or you can also use this old script I used many years ago to detect such issues : #!/bin/bash old=$(date +%s); while : ; do new=$(date +%s); if [ $new -lt $old ]; then echo Time jumps backwards : $old = $new elif [ $new -gt $((old+1)) ]; then echo Time jumps forwards : $old = $new fi old=$new done I had this running for a few minutes - while I was able to reproduce the 408 issue.. and it echo'ed nothing. I do monitor time and normally catch timejumps (I didn't even have to have tinker panic 0 set in ntpd.conf). [CUT] I just checked, and the two only situations where we may report a 408 are - when detecting a timeout waiting for a request ; - when detecting a timeout waiting for a body in balance url_param check-post But only the former will return cR. So there's no risk of inheritating this from somewhere else. The condition to enter this block is the following (proto_http.c:2486) : if (req-flags CF_READ_TIMEOUT || tick_is_expired(req-analyse_exp, now_ms)) So I'm seeing two possibilities : - the CF_READ_TIMEOUT flag was present on the channel - the http-request delay was really expired. Since the logs showed that the delay was very short, either it's the first case, or we're facing a case there analyse_exp is not properly set. This expiration is set from timeout http-request or from timeout http-keep-alive when processing the second and next requests. So here I don't think it can be the issue. Thus in my opinion, the only remaining possibility is that the CF_READ_TIMEOUT flag is set. It is normally set if the read timeout is reached. So here it will be timeout client. But that doesn't seem to make much sense, especially if you manage to reproduce it easily. last time it took me ~20 reloads, in a new firefox, of the page to reproduce it.. but on IE (on a buddys machine) it reproduces more easily. One thing I'm noticing is that this flag is not cleared between two consecutive requests over the same connection. So we could imagine a scenario where a read timeout is detected on the first request, but an event arrives just in time, at the exact same moment, resulting in the data still being processed while the flag is set. The request completes and reinitializes to handle the next request over the connection, but the flag is not cleared. Thus the second request would be the victim of this inherited timeout. It seems a bit far-fetched, but it could explain what you're seeing. Could you please add the following patch to proto_http.c (1.5.22) ? It will log in debug level a few flags and timers to try to better find what is happeneing when this happens. You'll need that your syslog deamon logs debug level entries, maybe they're stored in a differen file (eg: /var/log/debug). Otherwise you can change the level LOG_DEBUG for something else such as LOG_INFO. Warning, I copy pasted both the patches below, so you'll have to copy the lines or they won't apply due to mangled spaces. I'll apply the patch and build a new rpm.. will return back later today. Thank you very much for your assistance. -- Regards, Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200 Those who do not understand Unix are condemned to reinvent it, poorly. --Henry Spencer
Re: http responses randomly getting RSTs
and timesync is disabled: # vmware-toolbox-cmd timesync status Disabled Klavs Klavsen said the following on 02/20/2014 11:06 AM: Lukas Tribus said the following on 02/20/2014 10:16 AM: Hi, Can you tell us more about this server? What OS is running? Any firewalls (software or hardware)? Any other security product in between? The server is announcing 1380 Byte MSS to me here as well, so this was not something on your client side, but this is server side and thats not very common. There must be some 'interesting' detail about this server that we don't know yet. Not that I know of. It's all virtual servers, running CentOS 6. we have lots of servers with websites, running in the same environment - without ever having seen this issue :( These are behind a hardware firewall, in dmz (which nat's to the internal address) - but that's it. The TCP handshake on the capture (server side) you sent me completes within 0,1ms. So what your firewall is doing is intercepting the tcp session. Its not just source nat and destination nat. I assume you never see the real client's IP address in your log files? Yes I do. And the dump I sent you - I had set the ip of the site url in my hosts file, to point directly to the internal address of haproxy - so there is no dest. natting in that either. But thats not the real problem. The real problem is that HAProxy sends the 408 Request Time-out response to the client 31ms after the TCP handshake (tcp.stream eq 32 in with-http-client-timeout.pcap), prior to the client sendings its request, although it should only sends this error after 10 seconds. Is it possible that your OS is jumping forwards and backwards in time? haproxy server is running ntp. ntpdate -q 10.x.x.x server 10.x.x.x, stratum 2, offset -0.000244, delay 0.02614 20 Feb 10:46:56 ntpdate[19786]: adjust time server 10.47.47.47 offset -0.000244 sec Try reproducing the 408 while running through strace -tt to confirm this. Attached strace. Checkout [1] for advise on timejumps in a VM. Yes - we follow recommendations from http://kb.vmware.com/selfservice/microsites/search.do?language=en_UScmd=displayKCexternalId=1006427 (which your link refers to as well) -- Regards, Klavs Klavsen, GSEC, kl...@enableit.dk - Tlf. +45 612 812 00 EnableIT - Open Source Server, Security and Network Consulting Open Source Software - Sometimes you get more than you paid for.
Re: http responses randomly getting RSTs
Lukas Tribus said the following on 02/20/2014 11:31 AM: Hi, 11:04:59.057223 accept(7, {sa_family=AF_INET, sin_port=htons(59491), sin_addr=inet_addr(10.46.10.145)}, [16]) = 1 11:04:59.057310 fcntl(1, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 11:04:59.057342 setsockopt(1, SOL_TCP, TCP_NODELAY, [1], 4) = 0 11:04:59.057370 accept(7, 0x7fff50f0b480, [128]) = -1 EAGAIN (Resource temporarily unavailable) 11:04:59.057411 recvfrom(1, 0x1d5b504, 8192, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 11:04:59.057446 epoll_ctl(0, EPOLL_CTL_ADD, 1, {EPOLLIN|0x2000, {u32=1, u64=1}}) = 0 11:04:59.057476 epoll_wait(0, {}, 200, 31) = 0 11:04:59.088696 epoll_ctl(0, EPOLL_CTL_DEL, 1, {0, {u32=1, u64=1}}) = 0 11:04:59.088752 epoll_wait(0, {}, 200, 0) = 0 11:04:59.088778 sendto(1, HTTP/1.0 408 Request Time-out\r\nC..., 212, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 212 11:04:59.088822 shutdown(1, 1 /* send */) = 0 11:04:59.088901 close(1)= 0 11:04:59.088952 epoll_wait(0, {{EPOLLIN, {u32=7, u64=7}}}, 200, 31) = 1 I don't get it why this happens ... and I haven't gotten the slightest idea. I'll gladly patch the source and rebuild the rpm for dev22 I use, to enable more debugging if this can assist. Do you still see issues when using defer-accept [1] on the bind line in the configuration? I've set defer-accept on the bind and it seems I can't reproduce the issue anymore - tried an apachebench with -c 2 and -n 100 - which didn't give any errors. Please always respond the the mailing list as well. I didn't want to send attachments to the list.. so it should only be those that didn't get send to the list (and I should probably just have send the attachment off-list and not the entire email). sorry. -- Regards, Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200 Those who do not understand Unix are condemned to reinvent it, poorly. --Henry Spencer
RE: http responses randomly getting RSTs
Hi, I suggest you try this an different box, not virtualized or on a different hypervisor as well. I still think what you are experiencing is time jump related. Compile the test program from this bugreport: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=186393 and let it run for a few minutes. Post the results. Regards, Lukas
Re: http responses randomly getting RSTs
Hi guys, On Thu, Feb 20, 2014 at 07:49:01PM +0100, Lukas Tribus wrote: Hi, I suggest you try this an different box, not virtualized or on a different hypervisor as well. I still think what you are experiencing is time jump related. Compile the test program from this bugreport: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=186393 and let it run for a few minutes. Post the results. Or you can also use this old script I used many years ago to detect such issues : #!/bin/bash old=$(date +%s); while : ; do new=$(date +%s); if [ $new -lt $old ]; then echo Time jumps backwards : $old = $new elif [ $new -gt $((old+1)) ]; then echo Time jumps forwards : $old = $new fi old=$new done By the way, time jumps are already dealt with in haproxy. It detects delays larger than the poll time and time shifts in the past, then compensates for them. Initially it was done because of the unsynced clock on AMD Opterons, but it revealed very useful when running on VMs. So in practice, its internal clock (the now and now_ms variables) are monotonic and cannot go back. Also, the logs are basedon this internal virtual time, and I'm very worried about the fact that it logged cR after 31 ms since both use the same clock. I just checked, and the two only situations where we may report a 408 are - when detecting a timeout waiting for a request ; - when detecting a timeout waiting for a body in balance url_param check-post But only the former will return cR. So there's no risk of inheritating this from somewhere else. The condition to enter this block is the following (proto_http.c:2486) : if (req-flags CF_READ_TIMEOUT || tick_is_expired(req-analyse_exp, now_ms)) So I'm seeing two possibilities : - the CF_READ_TIMEOUT flag was present on the channel - the http-request delay was really expired. Since the logs showed that the delay was very short, either it's the first case, or we're facing a case there analyse_exp is not properly set. This expiration is set from timeout http-request or from timeout http-keep-alive when processing the second and next requests. So here I don't think it can be the issue. Thus in my opinion, the only remaining possibility is that the CF_READ_TIMEOUT flag is set. It is normally set if the read timeout is reached. So here it will be timeout client. But that doesn't seem to make much sense, especially if you manage to reproduce it easily. One thing I'm noticing is that this flag is not cleared between two consecutive requests over the same connection. So we could imagine a scenario where a read timeout is detected on the first request, but an event arrives just in time, at the exact same moment, resulting in the data still being processed while the flag is set. The request completes and reinitializes to handle the next request over the connection, but the flag is not cleared. Thus the second request would be the victim of this inherited timeout. It seems a bit far-fetched, but it could explain what you're seeing. Could you please add the following patch to proto_http.c (1.5.22) ? It will log in debug level a few flags and timers to try to better find what is happeneing when this happens. You'll need that your syslog deamon logs debug level entries, maybe they're stored in a differen file (eg: /var/log/debug). Otherwise you can change the level LOG_DEBUG for something else such as LOG_INFO. Warning, I copy pasted both the patches below, so you'll have to copy the lines or they won't apply due to mangled spaces. --- a/src/proto_http.c +++ b/src/proto_http.c @@ -2484,6 +2484,10 @@ int http_wait_for_request(struct session *s, struct channel *req, int an_bit) /* 3: has the read timeout expired ? */ else if (req-flags CF_READ_TIMEOUT || tick_is_expired(req-analyse_exp, now_ms)) { + send_log(s-fe, LOG_DEBUG, +Timeout detected: fe=%s s-flags=%08x txn-flags=%08x req-flags=%08x msg-flags=%08x now_ms=%u req-analyse_exp=%u (%d)\n, +s-fe-id, s-flags, txn-flags, req-flags, msg-flags, now_ms, req-analyse_exp, req-analyse_exp - now_ms); + if (!(s-flags SN_ERR_MASK)) s-flags |= SN_ERR_CLITO; And if you're willing to do a second test, please apply this one : --- a/src/proto_http.c +++ b/src/proto_http.c @@ -,6 +4452,7 @@ void http_end_txn_clean_session(struct session *s) s-req-cons-exp = TICK_ETERNITY; s-req-cons-flags= SI_FL_DONT_WAKE; /* we're in the context of process_session */ s-req-flags = ~(CF_SHUTW|CF_SHUTW_NOW|CF_AUTO_CONNECT|CF_WRITE_ERROR|CF_STREAMER|CF_STREAMER_FAST|CF_NEVER_WAIT); + s-req-flags = ~CF_READ_TIMEOUT; s-rep-flags = ~(CF_SHUTR|CF_SHUTR_NOW|CF_READ_ATTACHED|CF_READ_ERROR|CF_READ_NOEXP|CF_STREAMER|CF_STREAMER_FAST|CF_WRITE_PARTIAL|CF_NEVER_WAIT); s-flags =
RE: http responses randomly getting RSTs
Hi, I have attached a dump, from the client side of the problem. As you can see it starts to send the reply, and then suddenly resets. When looking at tcp.stream eq 0: We RST in the middle of a HTTP response, without any apparent reason. When looking at tcp.stream eq 1: 270 ms after the ACK we already see the RST. The browser doesn't even start to send a request. Could you try without the timeout http-request option in the config? You said you have full MTU, but the SYN-ACK from the server announces only 1380 Byte MSS. Any idea why? Any security software or NAT Gateways in between? I think something is messing with your TCP/HTTP connections. Try reproducing it with a Linux Live CD and on a different Internet connection (or at least a different NAT router). Thursday, I will do a capture on haproxy server, and capture it there, if necessary. Yes, that would be very useful as we are not sure what the server really sees. Best thing would be to capture on the client and the server side simultaneously, so that the capture can be compared. Regards, Lukas
Re: http responses randomly getting RSTs
the odd thing is, if I point the url to the varnish right behind the haproxy - the issue goes away completely. The dump I send you, was from over the internet (a few countries apart) - so that's probably why the MSS is the size it is :) I'll grab a dump on haproxy server tomorrow, while reproducing the problem with a local client. Lukas Tribus said the following on 02/19/2014 11:13 AM: Hi, I have attached a dump, from the client side of the problem. As you can see it starts to send the reply, and then suddenly resets. When looking at tcp.stream eq 0: We RST in the middle of a HTTP response, without any apparent reason. When looking at tcp.stream eq 1: 270 ms after the ACK we already see the RST. The browser doesn't even start to send a request. Could you try without the timeout http-request option in the config? You said you have full MTU, but the SYN-ACK from the server announces only 1380 Byte MSS. Any idea why? Any security software or NAT Gateways in between? I think something is messing with your TCP/HTTP connections. Try reproducing it with a Linux Live CD and on a different Internet connection (or at least a different NAT router). Thursday, I will do a capture on haproxy server, and capture it there, if necessary. Yes, that would be very useful as we are not sure what the server really sees. Best thing would be to capture on the client and the server side simultaneously, so that the capture can be compared. Regards, Lukas -- Regards, Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200 Those who do not understand Unix are condemned to reinvent it, poorly. --Henry Spencer
RE: http responses randomly getting RSTs
Hi, the odd thing is, if I point the url to the varnish right behind the haproxy - the issue goes away completely. The dump I send you, was from over the internet (a few countries apart) - so that's probably why the MSS is the size it is :) I'll grab a dump on haproxy server tomorrow, while reproducing the problem with a local client. I think something is seriously wrong on the server side. I took the liberty of accessing http://testkkms.kk.dk myself and and I see the issues as well [1]. But haproxy doesn't behave this way. Can you tell us more about this server? What OS is running? Any firewalls (software or hardware)? Any other security product in between? The server is announcing 1380 Byte MSS to me here as well, so this was not something on your client side, but this is server side and thats not very common. There must be some 'interesting' detail about this server that we don't know yet. Regards, Lukas [1] http://cloudshark.org/captures/5b6efeef9fff
Re: http responses randomly getting RSTs
Hi all, Le 19/02/2014 20:30, Lukas Tribus a écrit : Hi, the odd thing is, if I point the url to the varnish right behind the haproxy - the issue goes away completely. The dump I send you, was from over the internet (a few countries apart) - so that's probably why the MSS is the size it is :) I'll grab a dump on haproxy server tomorrow, while reproducing the problem with a local client. I think something is seriously wrong on the server side. I took the liberty of accessing http://testkkms.kk.dk myself and and I see the issues as well [1]. But haproxy doesn't behave this way. Can you tell us more about this server? What OS is running? Any firewalls (software or hardware)? Any other security product in between? The server is announcing 1380 Byte MSS to me here as well, so this was not something on your client side, but this is server side and thats not very common. There must be some 'interesting' detail about this server that we don't know yet. I really think there isn't any issue. Remember that some browsers (such as chrome/chromium) preopen connection in case there is a need to ask for more requests in parallel. While hitting F5, the browser will ask to close those connections. As there was no data, no valid request was received on the haproxy side and haproxy logs a 408. I'm pretty sure that adding option dontlognull [1] will solve the issue. Also, as a side note, your configuration is not optimal : - you are using both option httpclose and option http-server-close, you should make a choice (or use option http-keep-alive in recent haproxy 1.5 dev versions). - the balance keyword is not valid in a frontend section. - you should avoid the use of stats enable in the defaults section. [1] http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#option%20dontlognull -- Cyril Bonté
Re: http responses randomly getting RSTs
Hi again, Le 19/02/2014 22:44, Cyril Bonté a écrit : Also, as a side note, your configuration is not optimal : - you are using both option httpclose and option http-server-close, you should make a choice (or use option http-keep-alive in recent haproxy 1.5 dev versions). - the balance keyword is not valid in a frontend section. - you should avoid the use of stats enable in the defaults section. I forgot one thing, why don't you use option httplog instead of option tcplog ? You'll have more details in the logs. -- Cyril Bonté
Re: http responses randomly getting RSTs
Hi again and sorry for spamming ;-) Le 19/02/2014 22:51, Cyril Bonté a écrit : Hi again, Le 19/02/2014 22:44, Cyril Bonté a écrit : Also, as a side note, your configuration is not optimal : - you are using both option httpclose and option http-server-close, you should make a choice (or use option http-keep-alive in recent haproxy 1.5 dev versions). I quoted the last message from Lukas without reading completely the thread. I've also made some tests on your server and indeed could see some 408 responses, even on a simple test using apache bench for 10 keep-alive requests in a range (using the server ip to hit the default virtualhost). Interstingly, putting a haproxy on my laptop that receives the traffic and sends it to your server, I don't see any more 408 nor TCP retransmissions. So, can you remove option httpclose from your configuration and let us know when it's done ? -- Cyril Bonté
Re: http responses randomly getting RSTs
Cyril Bonté said the following on 02/19/2014 10:44 PM: [CUT] I really think there isn't any issue. Remember that some browsers (such as chrome/chromium) preopen connection in case there is a need to ask for more requests in parallel. While hitting F5, the browser will ask to close those connections. As there was no data, no valid request was received on the haproxy side and haproxy logs a 408. Using f5 to reproduce is just my way of reproducing.. it happens randomly while people work on the site (not pressing f5). I'm pretty sure that adding option dontlognull [1] will solve the issue. I added this in defaults section - it didn't change anything. I set option http-keep-alive (and removed http-server-close) as well. I'm running dev22. Also, as a side note, your configuration is not optimal : - you are using both option httpclose and option http-server-close, you should make a choice (or use option http-keep-alive in recent haproxy 1.5 dev versions). - the balance keyword is not valid in a frontend section. - you should avoid the use of stats enable in the defaults section. my bad - I've removed that. -- Regards, Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200 Those who do not understand Unix are condemned to reinvent it, poorly. --Henry Spencer
Re: http responses randomly getting RSTs
Cyril Bonté said the following on 02/19/2014 10:51 PM: Hi again, Le 19/02/2014 22:44, Cyril Bonté a écrit : Also, as a side note, your configuration is not optimal : - you are using both option httpclose and option http-server-close, you should make a choice (or use option http-keep-alive in recent haproxy 1.5 dev versions). - the balance keyword is not valid in a frontend section. - you should avoid the use of stats enable in the defaults section. I forgot one thing, why don't you use option httplog instead of option tcplog ? You'll have more details in the logs. switched to httplog. when I get the 408 - I see nothing but 200 and 304 returncodes in the logs. -- Regards, Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200 Those who do not understand Unix are condemned to reinvent it, poorly. --Henry Spencer
Re: http responses randomly getting RSTs
Cyril Bonté said the following on 02/20/2014 12:03 AM: Hi again and sorry for spamming ;-) Le 19/02/2014 22:51, Cyril Bonté a écrit : Hi again, Le 19/02/2014 22:44, Cyril Bonté a écrit : Also, as a side note, your configuration is not optimal : - you are using both option httpclose and option http-server-close, you should make a choice (or use option http-keep-alive in recent haproxy 1.5 dev versions). I quoted the last message from Lukas without reading completely the thread. I've also made some tests on your server and indeed could see some 408 responses, even on a simple test using apache bench for 10 keep-alive requests in a range (using the server ip to hit the default virtualhost). Interstingly, putting a haproxy on my laptop that receives the traffic and sends it to your server, I don't see any more 408 nor TCP retransmissions. So, can you remove option httpclose from your configuration and let us know when it's done ? I had already removed it. I found a option http-server-close that I missed, which I've removed now. -- Regards, Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200 Those who do not understand Unix are condemned to reinvent it, poorly. --Henry Spencer
http responses randomly getting RSTs
Hi, I'm at my wits end and was hoping you had any ideas. We've setup a test environment with haproxy-1.5dev22, with varnish as a backend (and varnish using a webserver as it's backend). I have the same issue with haproxy-1.5dev19. randomly around the website (if I f.ex. reload /user/login url (it's a drupal website)) - the browser requests fails and the browser says 408. the VIP on haproxy is 10.27.121.134 a wireshark of the reponse shows, that the browser suddenly gets a RST - in the middle of a receiving the response. I did a tcpdump on the haproxy, and on the varnish - and the RST was originating from haproxy. The stats counters shows some ereq - which I think is these RSTs: # pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp, example.dk,FRONTEND,,,0,6,8000,132,129135,357937,0,0,9,OPEN,1,2,00,0,0,170,35,168,10,0,0,,0,27,213,,,0,0,0,0, example.dk-prod,FRONTEND,,,0,0,8000,0,0,0,0,0,0,OPEN,1,4,00,0,0,00,0,0,0,0,0,,0,0,0,,,0,0,0,0, example.dk-prod,web-p02,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,890,0,,1,4,1,,0,,2,0,,0,L4OK,,2000,0,0,0,0,0,0,00,0, example.dk-prod,web-p03,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,890,0,,1,4,2,,0,,2,0,,0,L4OK,,2001,0,0,0,0,0,0,00,0, example.dk-prod,web-p04,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,890,0,,1,4,3,,0,,2,0,,0,L4OK,,2001,0,0,0,0,0,0,00,0, example.dk-prod,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,UP,3,3,0,,0,890,0,,1,4,0,,0,,1,0,,00,0,0,0,0,0,0,0,0,0,0,0, varnish-dmz,varnish01,0,0,0,3,,102,64617,177379,,0,,0,0,0,0,UP,1,1,0,0,0,890,0,,1,5,1,,102,,2,0,,14,L4OK,,2002,0,17,85,0,0,0,00,0, varnish-dmz,varnish02,0,0,0,3,,102,64518,178650,,0,,0,0,0,0,UP,1,1,0,0,0,890,0,,1,5,2,,102,,2,0,,13,L4OK,,2001,0,18,83,1,0,0,00,0, varnish-dmz,BACKEND,0,0,0,6,800,204,129135,356029,0,0,,0,0,0,0,UP,2,2,0,,0,890,0,,1,5,0,,204,,1,0,,270,35,168,1,0,0,0,0,0,0,0,0, I've tried setting option http-server-close and tried option httpclose - and tried without any of them. My config is: global chroot /var/lib/haproxy daemon group haproxy log 127.0.0.1 local0 maxconn 4000 pidfile /var/run/haproxy.pid stats socket /var/lib/haproxy/stats level admin user haproxy defaults log global maxconn 8000 option redispatch retries 3 stats enable option httpclose option http-server-close timeout http-request 10s timeout queue 1m timeout connect 10s timeout client 1m timeout server 1m timeout check 10s frontend example.dk bind 10.27.121.134:80 bind 10.27.121.134:443 ssl crt /etc/haproxy/star2.pem ciphers RC4-SHA:AES128-SHA:AES256-SHA mode http balance roundrobin default_backend varnish-dmz option tcplog option accept-invalid-http-request timeout client 30 listen example.dk-prod bind 10.27.121.135:80 bind 10.27.121.135:443 ssl crt /etc/haproxy/wildcard.example.dk.combined.pem ciphers RC4-SHA:AES128-SHA:AES256-SHA mode http balance roundrobin option tcplog server web-p02 10.27.121.245:80 check server web-p03 10.27.121.246:80 check server web-p04 10.27.121.247:80 check backend varnish-dmz balance roundrobin mode http option tcplog server varnish01 10.27.121.240:80 check server varnish02 10.27.121.241:80 check -- Regards, Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200 Those who do not understand Unix are condemned to reinvent it, poorly. --Henry Spencer
Re: http responses randomly getting RSTs
I enabled stats listener to better get an overview of the numbers.. and it seems there's no ereq.. and haproxy doesn't log anything interesting :( In debug mode I get a LOT.. It seems it's easier to reproduce, if I hit ctrl+f5 (which goes fine) and right after that, press f5 - that gives me the 408. -- Regards, Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200 Those who do not understand Unix are condemned to reinvent it, poorly. --Henry Spencer
Re: http responses randomly getting RSTs
Enabling debug in config, I only get a more detailed upstart info and a copy of my config in my logs, and then these upon requests: Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58864 [18/Feb/2014:14:46:02.199] example.dk varnish-dmz/varnish01 0/0/1 6315 -- 6/6/0/1/0 0/0 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58860 [18/Feb/2014:14:46:02.199] example.dk varnish-dmz/varnish02 0/0/8 7302 -- 6/6/0/0/0 0/0 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58866 [18/Feb/2014:14:46:02.354] example.dk varnish-dmz/varnish01 0/0/1 1812 -- 1/1/0/0/0 0/0 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58867 [18/Feb/2014:14:46:02.390] example.dk .dk/NOSRV -1/-1/31 212 cR 0/0/0/0/0 0/0 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58868 [18/Feb/2014:14:46:02.429] example.dk varnish-dmz/varnish02 0/0/1 2639 -- 2/2/1/1/0 0/0 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58869 [18/Feb/2014:14:46:02.430] example.dk varnish-dmz/varnish01 0/0/1 1808 -- 2/2/0/1/0 0/0 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58868 [18/Feb/2014:14:46:02.431] example.dk varnish-dmz/varnish02 0/0/16 25620 -- 2/2/1/0/0 0/0 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58869 [18/Feb/2014:14:46:02.431] example.dk varnish-dmz/varnish01 0/0/16 27614 -- 3/3/0/0/0 0/0 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58870 [18/Feb/2014:14:46:02.448] example.dk varnish-dmz/varnish02 0/0/1 826 -- 3/3/0/1/0 0/0 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58870 [18/Feb/2014:14:46:02.449] example.dk varnish-dmz/varnish01 0/1/4 6526 -- 4/4/1/1/0 0/0 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58868 [18/Feb/2014:14:46:02.447] example.dk varnish-dmz/varnish02 0/0/7 3206 -- 5/5/0/0/0 0/0 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58869 [18/Feb/2014:14:46:02.448] example.dk varnish-dmz/varnish01 0/0/12 50494 -- 5/5/0/0/0 0/0 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58871 [18/Feb/2014:14:46:02.454] example.dk .dk/NOSRV -1/-1/31 212 cR 2/2/0/0/0 0/0 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58872 [18/Feb/2014:14:46:02.455] example.dk .dk/NOSRV -1/-1/30 212 cR 1/1/0/0/0 0/0 -- Regards, Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200 Those who do not understand Unix are condemned to reinvent it, poorly. --Henry Spencer
RE: http responses randomly getting RSTs
Hi, Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58871 [18/Feb/2014:14:46:02.454] example.dk .dk/NOSRV -1/-1/31 212 cR 2/2/0/0/0 0/0 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58872 [18/Feb/2014:14:46:02.455] example.dk .dk/NOSRV -1/-1/30 212 cR 1/1/0/0/0 0/0 cR means [1]: The timeout http-request stroke before the client sent a full HTTP request. This is sometimes caused by too large TCP MSS values on the client side for PPPoE networks which cannot transport full-sized packets, or by clients sending requests by hand and not typing fast enough, or forgetting to enter the empty line at the end of the request. The HTTP status code is likely a 408 here. Does your tcpdump confirm this? Regards, Lukas [1] http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#8.5
Re: http responses randomly getting RSTs
Lukas Tribus said the following on 02/18/2014 03:43 PM: Hi, Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58871 [18/Feb/2014:14:46:02.454] example.dk .dk/NOSRV -1/-1/31 212 cR 2/2/0/0/0 0/0 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58872 [18/Feb/2014:14:46:02.455] example.dk .dk/NOSRV -1/-1/30 212 cR 1/1/0/0/0 0/0 cR means [1]: The timeout http-request stroke before the client sent a full HTTP request. This is sometimes caused by too large TCP MSS values on the client side for PPPoE networks which cannot transport full-sized packets, or by clients sending requests by hand and not typing fast enough, or forgetting to enter the empty line at the end of the request. The HTTP status code is likely a 408 here. Does your tcpdump confirm this? What would I look for in the tcpdump? it's a vmware virtualserver with standard MTU and all.. no PPPoE or anything. I can reproduce it by pressing f5 (a few times in a row) in my browser (firefox) and IE. I cannot reproduce it in chromium. -- Regards, Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200 Those who do not understand Unix are condemned to reinvent it, poorly. --Henry Spencer
RE: http responses randomly getting RSTs
Hi, What would I look for in the tcpdump? Whether or not the browser takes more than 10 seconds to complete the http request, as per your configuration: timeout http-request 10s This is the timeout you are hitting. Also read this [1]. Share your tcpdump capture if it doesn't contain anything proprietary (an tell us which tcp session in the capture failed, according to your log). Regards, Lukas [1] http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#timeout%20http-request
RE: http responses randomly getting RSTs
Hi, Whether or not the browser takes more than 10 seconds to complete the http request, as per your configuration: timeout http-request 10s it's not even close to 10 seconds.. it happens as soon as I press f5 - randomly. Well, I was assuming this is the issue, since it was the only error from your logs and it seemed to be reoccurring. You did confirmed 408 errors on the client side though, so this is still my best bet. Please provide a non-truncated capture (-s 0) of the session where you see this RST. Filtering on the client IP is probably a good idea to keep the capture limited to your IP (something like tcpdump -ps0 -w haproxy-rstcapture.cap host 1.2.3.4 and tcp port 80 for example). Regards, Lukas