Re: http responses randomly getting RSTs

2014-02-25 Thread Klavs Klavsen
Hi guys,

I just noticed I had missed a timeout client.. When disabling that - I
can't seem to reproduce the 408 issue.

Willy Tarreau said the following on 02/20/2014 08:44 PM:
[CUT]
 --- a/src/proto_http.c
 +++ b/src/proto_http.c
 @@ -2484,6 +2484,10 @@ int http_wait_for_request(struct session *s, struct 
 channel *req, int an_bit)
  
 /* 3: has the read timeout expired ? */
 else if (req-flags  CF_READ_TIMEOUT || 
 tick_is_expired(req-analyse_exp, now_ms)) {
 +   send_log(s-fe, LOG_DEBUG,
 +Timeout detected: fe=%s s-flags=%08x 
 txn-flags=%08x req-flags=%08x msg-flags=%08x now_ms=%u req-analyse_exp=%u 
 (%d)\n,
 +s-fe-id, s-flags, txn-flags, req-flags, 
 msg-flags, now_ms, req-analyse_exp, req-analyse_exp - now_ms);
 +
 if (!(s-flags  SN_ERR_MASK))
 s-flags |= SN_ERR_CLITO;
  
 
 And if you're willing to do a second test, please apply this one :
 
 --- a/src/proto_http.c
 +++ b/src/proto_http.c
 @@ -,6 +4452,7 @@ void http_end_txn_clean_session(struct session *s)
 s-req-cons-exp   = TICK_ETERNITY;
 s-req-cons-flags= SI_FL_DONT_WAKE; /* we're in the context of 
 process_session */
 s-req-flags = 
 ~(CF_SHUTW|CF_SHUTW_NOW|CF_AUTO_CONNECT|CF_WRITE_ERROR|CF_STREAMER|CF_STREAMER_FAST|CF_NEVER_WAIT);
 +   s-req-flags = ~CF_READ_TIMEOUT;
 s-rep-flags = 
 ~(CF_SHUTR|CF_SHUTR_NOW|CF_READ_ATTACHED|CF_READ_ERROR|CF_READ_NOEXP|CF_STREAMER|CF_STREAMER_FAST|CF_WRITE_PARTIAL|CF_NEVER_WAIT);
 s-flags = 
 ~(SN_DIRECT|SN_ASSIGNED|SN_ADDR_SET|SN_BE_ASSIGNED|SN_FORCE_PRST|SN_IGNORE_PRST);
 s-flags = ~(SN_CURR_SESS|SN_REDIRECTABLE);
 
 It removes the CF_READ_TIMEOUT flag between requests. If the problem
 disappears, it means the flag is inherited between requests (though we still
 have to figure when it can happen without being an issue).
 

I applied both patches, and can still reproduce the error (when timeout
client is set), but the log no longer logs cR (since I enabled httplog).

This was the log entries from the last try, where I got a 408 in the
browser.

Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39602
[25/Feb/2014:09:00:28.834] example.dk varnish-dmz/varnish01
0/0/0/177/177 200 4840 - -  6/6/0/1/0 0/0 GET /user/login HTTP/1.1
Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39602
[25/Feb/2014:09:00:29.010] example.dk varnish-dmz/varnish02 28/0/1/0/29
304 461 - -  4/4/0/1/0 0/0 GET
/sites/test1.testms.example.dk/files/css/css_LYss46ysJxRRPkNn97hs6CWd7YXPB2emRjXSSVfLgCU.css
HTTP/1.1
Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39603
[25/Feb/2014:09:00:29.044] example.dk varnish-dmz/varnish02 0/0/0/0/0
304 461 - -  5/5/0/1/0 0/0 GET
/sites/test1.testms.example.dk/files/css/css_PotrMOiE3wsU0p53VT41Vp_TlyGnJa_IOPSf74RYDNs.css
HTTP/1.1
Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39604
[25/Feb/2014:09:00:29.045] example.dk varnish-dmz/varnish02 0/0/0/0/0
304 462 - -  9/9/4/2/0 0/0 GET
/sites/test1.testms.example.dk/files/css/css_9Bngx0XRCGG3sSIyd4AvOFOks7o2IpfayRORBN6c3Xw.css
HTTP/1.1
Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39606
[25/Feb/2014:09:00:29.045] example.dk varnish-dmz/varnish02 0/0/0/0/0
304 469 - -  9/9/3/1/0 0/0 GET
/sites/test1.testms.example.dk/files/js/js_IkgOrrVZK1UTp2HktvFi5vo1g7b25Muyr5GoySikuic.js
HTTP/1.1
Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39602
[25/Feb/2014:09:00:29.040] example.dk varnish-dmz/varnish01 4/0/1/1/6
304 463 - -  9/9/2/3/0 0/0 GET
/sites/test1.testms.example.dk/files/css/css_AFNnEjvDR5CvJkNKTljzBrlXpHTR2BYWPNnWbH1OEx4.css
HTTP/1.1
Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39605
[25/Feb/2014:09:00:29.045] example.dk varnish-dmz/varnish01 0/0/0/1/1
304 463 - -  9/9/1/2/0 0/0 GET
/sites/test1.testms.example.dk/files/css/css__nTXAnD87M2AhWF2DiSgN1-uhQqgJ3Op1bl5N06Pvu4.css
HTTP/1.1
Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39607
[25/Feb/2014:09:00:29.045] example.dk varnish-dmz/varnish01 0/0/0/1/1
304 471 - -  9/9/0/1/0 0/0 GET
/sites/test1.testms.example.dk/files/js/js_5cYf1A6mNLYEmlwo2Ftk3pqeuzLOxPQdHymcr4I4idM.js
HTTP/1.1
Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39603
[25/Feb/2014:09:00:29.044] example.dk varnish-dmz/varnish02 2/0/0/0/2
304 469 - -  9/9/0/1/0 0/0 GET
/sites/test1.testms.example.dk/files/js/js_R_ItDCoKeqPMRgcU-D-9yzx190CQeBCgO07IOiqZeLU.js
HTTP/1.1
Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39602
[25/Feb/2014:09:00:29.046] example.dk varnish-dmz/varnish01 2/0/0/0/2
304 470 - -  9/9/0/1/0 0/0 GET
/sites/test1.testms.example.dk/files/js/js_Dpwj9_jdZx11LB9vWakYPT-NglBvlNPbjDg8kKHykyQ.js
HTTP/1.1
Feb 25 09:00:29 localhost haproxy[10486]: 10.46.10.145:39609
[25/Feb/2014:09:00:29.131] example.dk varnish-dmz/varnish02 0/0/0/0/0
304 414 - -  2/2/1/1/0 0/0 GET

RE: http responses randomly getting RSTs

2014-02-25 Thread Lukas Tribus
Hi,


 I just noticed I had missed a timeout client.. When disabling that - I
 can't seem to reproduce the 408 issue.

Ok.



 I applied both patches, and can still reproduce the error (when timeout
 client is set), but the log no longer logs cR (since I enabled httplog).

I'm not sure if they are useful if they are both applied at the same time.

The first patch enables additional logging via debug syslog, but the second
patch may avoid that particular code path, if I understand correctly.

Can you just apply the first patch, enable syslog at debug level:

global
 log syslogip syslog debug


reproduce it and provide the syslog output (you should see messages starting
with Timeout detected: ).




Regards,

Lukas 


RE: http responses randomly getting RSTs

2014-02-25 Thread Klavs Klavsen

On Tue, February 25, 2014 9:45 pm, Lukas Tribus wrote:
[CUT]
 I'm not sure if they are useful if they are both applied at the same time.

 The first patch enables additional logging via debug syslog, but the
 second
 patch may avoid that particular code path, if I understand correctly.


Sorry for not taking the time to try and understand what the patches did..
I did ponder about the one patch at a time-note..

 Can you just apply the first patch, enable syslog at debug level:

 global
  log syslogip syslog debug


 reproduce it and provide the syslog output (you should see messages
 starting
 with Timeout detected: ).


I'll do that and come back.


-- 
Best regards,

Klavs Klavsen, Cell 61281200



Re: http responses randomly getting RSTs

2014-02-24 Thread Klavs Klavsen
Finally came back here :)

Lukas Tribus said the following on 02/20/2014 07:16 PM:
[CUT]
 Try removing timeout client as well (never ever do this in production).
 You will see a startup warning, ignore it and test if you still can
 reproduce it.
 
 Without timeout http-request and timeout client you probably don't see
 this issue.
 

I tried removing timeout client - and it did complain loudly about it
when restarting, but I still get 408's.. (you can also verify yourself..
it shows up pretty much consistently, if I press enter on the url -
and then press f5 after it has loaded.

I'll run the script to test timejumps in a second :)

-- 
Regards,
Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200

Those who do not understand Unix are condemned to reinvent it, poorly.
  --Henry Spencer




Re: http responses randomly getting RSTs

2014-02-24 Thread Klavs Klavsen
Willy Tarreau said the following on 02/20/2014 08:44 PM:
[CUT]
 Or you can also use this old script I used many years ago to detect such 
 issues :
 
 #!/bin/bash
 old=$(date +%s);
 while : ; do
   new=$(date +%s);
   if [ $new -lt $old ]; then
 echo Time jumps backwards : $old = $new
   elif [ $new -gt $((old+1)) ]; then
 echo Time jumps forwards  : $old = $new
   fi
   old=$new
 done
 
I had this running for a few minutes - while I was able to reproduce the
408 issue.. and it echo'ed nothing. I do monitor time and normally catch
timejumps (I didn't even have to have tinker panic 0 set in ntpd.conf).

[CUT]
 I just checked, and the two only situations where we may report a 408 are
   - when detecting a timeout waiting for a request ;
   - when detecting a timeout waiting for a body in balance url_param 
 check-post
 
 But only the former will return cR. So there's no risk of inheritating this
 from somewhere else.
 
 The condition to enter this block is the following (proto_http.c:2486) :
 
   if (req-flags  CF_READ_TIMEOUT || tick_is_expired(req-analyse_exp, 
 now_ms))
 
 So I'm seeing two possibilities :
   - the CF_READ_TIMEOUT flag was present on the channel
   - the http-request delay was really expired.
 
 Since the logs showed that the delay was very short, either it's the first
 case, or we're facing a case there analyse_exp is not properly set. This
 expiration is set from timeout http-request or from timeout 
 http-keep-alive
 when processing the second and next requests. So here I don't think it can be
 the issue.
 
 Thus in my opinion, the only remaining possibility is that the CF_READ_TIMEOUT
 flag is set. It is normally set if the read timeout is reached. So here it 
 will
 be timeout client. But that doesn't seem to make much sense, especially if
 you manage to reproduce it easily.
 

last time it took me ~20 reloads, in a new firefox, of the page to
reproduce it.. but on IE (on a buddys machine) it reproduces more easily.

 One thing I'm noticing is that this flag is not cleared between two 
 consecutive
 requests over the same connection. So we could imagine a scenario where a read
 timeout is detected on the first request, but an event arrives just in time, 
 at
 the exact same moment, resulting in the data still being processed while the
 flag is set. The request completes and reinitializes to handle the next 
 request
 over the connection, but the flag is not cleared. Thus the second request 
 would
 be the victim of this inherited timeout. It seems a bit far-fetched, but it
 could explain what you're seeing.
 

 Could you please add the following patch to proto_http.c (1.5.22) ? It will 
 log
 in debug level a few flags and timers to try to better find what is happeneing
 when this happens. You'll need that your syslog deamon logs debug level 
 entries,
 maybe they're stored in a differen file (eg: /var/log/debug). Otherwise you 
 can
 change the level LOG_DEBUG for something else such as LOG_INFO. Warning, I 
 copy
 pasted both the patches below, so you'll have to copy the lines or they won't
 apply due to mangled spaces.
 
I'll apply the patch and build a new rpm.. will return back later today.

Thank you very much for your assistance.

-- 
Regards,
Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200

Those who do not understand Unix are condemned to reinvent it, poorly.
  --Henry Spencer




Re: http responses randomly getting RSTs

2014-02-20 Thread Klavs Klavsen
and timesync is disabled:
# vmware-toolbox-cmd timesync status
Disabled

Klavs Klavsen said the following on 02/20/2014 11:06 AM:
 Lukas Tribus said the following on 02/20/2014 10:16 AM:
 Hi,


 Can you tell us more about this server? What OS is running? Any firewalls
 (software or hardware)? Any other security product in between?
 The server is announcing 1380 Byte MSS to me here as well, so this was
 not something on your client side, but this is server side and thats not
 very common. There must be some 'interesting' detail about this server
 that we don't know yet.

 Not that I know of.

 It's all virtual servers, running CentOS 6. we have lots of servers with
 websites, running in the same environment - without ever having seen
 this issue :(

 These are behind a hardware firewall, in dmz (which nat's to the
 internal address) - but that's it.

 The TCP handshake on the capture (server side) you sent me completes within
 0,1ms. So what your firewall is doing is intercepting the tcp session. Its 
 not
 just source nat and destination nat.

 I assume you never see the real client's IP address in your log files?

 Yes I do. And the dump I sent you - I had set the ip of the site url in
 my hosts file, to point directly to the internal address of haproxy - so
 there is no dest. natting in that either.
 
 But thats not the real problem. The real problem is that HAProxy sends the
 408 Request Time-out response to the client 31ms after the TCP handshake
 (tcp.stream eq 32 in with-http-client-timeout.pcap), prior to the client
 sendings its request, although it should only sends this error after 10
 seconds. Is it possible that your OS is jumping forwards and backwards in
 time?

 haproxy server is running ntp.
 
 ntpdate -q 10.x.x.x
 server 10.x.x.x, stratum 2, offset -0.000244, delay 0.02614
 20 Feb 10:46:56 ntpdate[19786]: adjust time server 10.47.47.47 offset
 -0.000244 sec
 
 Try reproducing the 408 while running through strace -tt to confirm this.

 Attached strace.
 
 Checkout [1] for advise on timejumps in a VM.

 Yes - we follow recommendations from
 http://kb.vmware.com/selfservice/microsites/search.do?language=en_UScmd=displayKCexternalId=1006427
 (which your link refers to as well)
 
 


-- 
Regards,
Klavs Klavsen, GSEC, kl...@enableit.dk - Tlf. +45 612 812 00
EnableIT - Open Source Server, Security and Network Consulting

Open Source Software - Sometimes you get more than you paid for.



Re: http responses randomly getting RSTs

2014-02-20 Thread Klavs Klavsen
Lukas Tribus said the following on 02/20/2014 11:31 AM:
 Hi,
 
 
 11:04:59.057223 accept(7, {sa_family=AF_INET, sin_port=htons(59491), 
 sin_addr=inet_addr(10.46.10.145)}, [16]) = 1
 11:04:59.057310 fcntl(1, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
 11:04:59.057342 setsockopt(1, SOL_TCP, TCP_NODELAY, [1], 4) = 0
 11:04:59.057370 accept(7, 0x7fff50f0b480, [128]) = -1 EAGAIN (Resource 
 temporarily unavailable)
 11:04:59.057411 recvfrom(1, 0x1d5b504, 8192, 0, 0, 0) = -1 EAGAIN (Resource 
 temporarily unavailable)
 11:04:59.057446 epoll_ctl(0, EPOLL_CTL_ADD, 1, {EPOLLIN|0x2000, {u32=1, 
 u64=1}}) = 0
 11:04:59.057476 epoll_wait(0, {}, 200, 31) = 0
 11:04:59.088696 epoll_ctl(0, EPOLL_CTL_DEL, 1, {0, {u32=1, u64=1}}) = 0
 11:04:59.088752 epoll_wait(0, {}, 200, 0) = 0
 11:04:59.088778 sendto(1, HTTP/1.0 408 Request Time-out\r\nC..., 212, 
 MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 212
 11:04:59.088822 shutdown(1, 1 /* send */) = 0
 11:04:59.088901 close(1)= 0
 11:04:59.088952 epoll_wait(0, {{EPOLLIN, {u32=7, u64=7}}}, 200, 31) = 1
 
 I don't get it why this happens ...
 
and I haven't gotten the slightest idea.  I'll gladly patch the source
and rebuild the rpm for dev22 I use, to enable more debugging if this
can assist.

 
 Do you still see issues when using defer-accept [1] on the bind line in the
 configuration?

I've set defer-accept on the bind and it seems I can't reproduce the
issue anymore - tried an apachebench with -c 2 and -n 100 - which didn't
give any errors.

 
 Please always respond the the mailing list as well.
 
I didn't want to send attachments to the list.. so it should only be
those that didn't get send to the list (and I should probably just have
send the attachment off-list and not the entire email). sorry.

-- 
Regards,
Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200

Those who do not understand Unix are condemned to reinvent it, poorly.
  --Henry Spencer




RE: http responses randomly getting RSTs

2014-02-20 Thread Lukas Tribus
Hi,

 I suggest you try this an different box, not virtualized or on a different
 hypervisor as well. I still think what you are experiencing is time jump
 related.


Compile the test program from this bugreport:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=186393

and let it run for a few minutes. Post the results.



Regards,

Lukas 


Re: http responses randomly getting RSTs

2014-02-20 Thread Willy Tarreau
Hi guys,

On Thu, Feb 20, 2014 at 07:49:01PM +0100, Lukas Tribus wrote:
 Hi,
 
  I suggest you try this an different box, not virtualized or on a different
  hypervisor as well. I still think what you are experiencing is time jump
  related.
 
 
 Compile the test program from this bugreport:
 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=186393
 
 and let it run for a few minutes. Post the results.

Or you can also use this old script I used many years ago to detect such issues 
:

#!/bin/bash
old=$(date +%s);
while : ; do
  new=$(date +%s);
  if [ $new -lt $old ]; then
echo Time jumps backwards : $old = $new
  elif [ $new -gt $((old+1)) ]; then
echo Time jumps forwards  : $old = $new
  fi
  old=$new
done

By the way, time jumps are already dealt with in haproxy. It detects delays
larger than the poll time and time shifts in the past, then compensates for
them. Initially it was done because of the unsynced clock on AMD Opterons,
but it revealed very useful when running on VMs. So in practice, its internal
clock (the now and now_ms variables) are monotonic and cannot go back.

Also, the logs are basedon this internal virtual time, and I'm very worried
about the fact that it logged cR after 31 ms since both use the same clock.

I just checked, and the two only situations where we may report a 408 are
  - when detecting a timeout waiting for a request ;
  - when detecting a timeout waiting for a body in balance url_param 
check-post

But only the former will return cR. So there's no risk of inheritating this
from somewhere else.

The condition to enter this block is the following (proto_http.c:2486) :

if (req-flags  CF_READ_TIMEOUT || tick_is_expired(req-analyse_exp, 
now_ms))

So I'm seeing two possibilities :
  - the CF_READ_TIMEOUT flag was present on the channel
  - the http-request delay was really expired.

Since the logs showed that the delay was very short, either it's the first
case, or we're facing a case there analyse_exp is not properly set. This
expiration is set from timeout http-request or from timeout http-keep-alive
when processing the second and next requests. So here I don't think it can be
the issue.

Thus in my opinion, the only remaining possibility is that the CF_READ_TIMEOUT
flag is set. It is normally set if the read timeout is reached. So here it will
be timeout client. But that doesn't seem to make much sense, especially if
you manage to reproduce it easily.

One thing I'm noticing is that this flag is not cleared between two consecutive
requests over the same connection. So we could imagine a scenario where a read
timeout is detected on the first request, but an event arrives just in time, at
the exact same moment, resulting in the data still being processed while the
flag is set. The request completes and reinitializes to handle the next request
over the connection, but the flag is not cleared. Thus the second request would
be the victim of this inherited timeout. It seems a bit far-fetched, but it
could explain what you're seeing.

Could you please add the following patch to proto_http.c (1.5.22) ? It will log
in debug level a few flags and timers to try to better find what is happeneing
when this happens. You'll need that your syslog deamon logs debug level entries,
maybe they're stored in a differen file (eg: /var/log/debug). Otherwise you can
change the level LOG_DEBUG for something else such as LOG_INFO. Warning, I copy
pasted both the patches below, so you'll have to copy the lines or they won't
apply due to mangled spaces.

--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -2484,6 +2484,10 @@ int http_wait_for_request(struct session *s, struct 
channel *req, int an_bit)
 
/* 3: has the read timeout expired ? */
else if (req-flags  CF_READ_TIMEOUT || 
tick_is_expired(req-analyse_exp, now_ms)) {
+   send_log(s-fe, LOG_DEBUG,
+Timeout detected: fe=%s s-flags=%08x 
txn-flags=%08x req-flags=%08x msg-flags=%08x now_ms=%u req-analyse_exp=%u 
(%d)\n,
+s-fe-id, s-flags, txn-flags, req-flags, 
msg-flags, now_ms, req-analyse_exp, req-analyse_exp - now_ms);
+
if (!(s-flags  SN_ERR_MASK))
s-flags |= SN_ERR_CLITO;
 

And if you're willing to do a second test, please apply this one :

--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -,6 +4452,7 @@ void http_end_txn_clean_session(struct session *s)
s-req-cons-exp   = TICK_ETERNITY;
s-req-cons-flags= SI_FL_DONT_WAKE; /* we're in the context of 
process_session */
s-req-flags = 
~(CF_SHUTW|CF_SHUTW_NOW|CF_AUTO_CONNECT|CF_WRITE_ERROR|CF_STREAMER|CF_STREAMER_FAST|CF_NEVER_WAIT);
+   s-req-flags = ~CF_READ_TIMEOUT;
s-rep-flags = 
~(CF_SHUTR|CF_SHUTR_NOW|CF_READ_ATTACHED|CF_READ_ERROR|CF_READ_NOEXP|CF_STREAMER|CF_STREAMER_FAST|CF_WRITE_PARTIAL|CF_NEVER_WAIT);
s-flags = 

RE: http responses randomly getting RSTs

2014-02-19 Thread Lukas Tribus
Hi,


 I have attached a dump, from the client side of the problem.
 As you can see it starts to send the reply, and then suddenly resets.

When looking at tcp.stream eq 0:
We RST in the middle of a HTTP response, without any apparent reason.

When looking at tcp.stream eq 1:
270 ms after the ACK we already see the RST. The browser doesn't even start
to send a request. Could you try without the timeout http-request option
in the config?

You said you have full MTU, but the SYN-ACK from the server announces only
1380 Byte MSS. Any idea why? Any security software or NAT Gateways in between?


I think something is messing with your TCP/HTTP connections. Try reproducing
it with a Linux Live CD and on a different Internet connection (or at least a
different NAT router).



 Thursday, I will do a capture on haproxy server, and capture it there, if
 necessary.

Yes, that would be very useful as we are not sure what the server really sees.

Best thing would be to capture on the client and the server side simultaneously,
so that the capture can be compared.




Regards,

Lukas 


Re: http responses randomly getting RSTs

2014-02-19 Thread Klavs Klavsen
the odd thing is, if I point the url to the varnish right behind the
haproxy - the issue goes away completely.

The dump I send you, was from over the internet (a few countries apart)
- so that's probably why the MSS is the size it is :)

I'll grab a dump on haproxy server tomorrow, while reproducing the
problem with a local client.

Lukas Tribus said the following on 02/19/2014 11:13 AM:
 Hi,
 
 
 I have attached a dump, from the client side of the problem.
 As you can see it starts to send the reply, and then suddenly resets.
 
 When looking at tcp.stream eq 0:
 We RST in the middle of a HTTP response, without any apparent reason.
 
 When looking at tcp.stream eq 1:
 270 ms after the ACK we already see the RST. The browser doesn't even start
 to send a request. Could you try without the timeout http-request option
 in the config?
 
 You said you have full MTU, but the SYN-ACK from the server announces only
 1380 Byte MSS. Any idea why? Any security software or NAT Gateways in between?
 
 
 I think something is messing with your TCP/HTTP connections. Try reproducing
 it with a Linux Live CD and on a different Internet connection (or at least a
 different NAT router).
 
 
 
 Thursday, I will do a capture on haproxy server, and capture it there, if
 necessary.
 
 Yes, that would be very useful as we are not sure what the server really sees.
 
 Best thing would be to capture on the client and the server side 
 simultaneously,
 so that the capture can be compared.
 
 
 
 
 Regards,
 
 Lukas   
 


-- 
Regards,
Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200

Those who do not understand Unix are condemned to reinvent it, poorly.
  --Henry Spencer




RE: http responses randomly getting RSTs

2014-02-19 Thread Lukas Tribus
Hi,


 the odd thing is, if I point the url to the varnish right behind the
 haproxy - the issue goes away completely.

 The dump I send you, was from over the internet (a few countries apart)
 - so that's probably why the MSS is the size it is :)

 I'll grab a dump on haproxy server tomorrow, while reproducing the
 problem with a local client.

I think something is seriously wrong on the server side. I took the liberty
of accessing http://testkkms.kk.dk myself and and I see the issues as well [1].

But haproxy doesn't behave this way.

Can you tell us more about this server? What OS is running? Any firewalls
(software or hardware)? Any other security product in between?
The server is announcing 1380 Byte MSS to me here as well, so this was
not something on your client side, but this is server side and thats not
very common. There must be some 'interesting' detail about this server
that we don't know yet.


Regards,

Lukas


[1] http://cloudshark.org/captures/5b6efeef9fff 
  


Re: http responses randomly getting RSTs

2014-02-19 Thread Cyril Bonté

Hi all,

Le 19/02/2014 20:30, Lukas Tribus a écrit :

Hi,



the odd thing is, if I point the url to the varnish right behind the
haproxy - the issue goes away completely.

The dump I send you, was from over the internet (a few countries apart)
- so that's probably why the MSS is the size it is :)

I'll grab a dump on haproxy server tomorrow, while reproducing the
problem with a local client.


I think something is seriously wrong on the server side. I took the liberty
of accessing http://testkkms.kk.dk myself and and I see the issues as well [1].

But haproxy doesn't behave this way.

Can you tell us more about this server? What OS is running? Any firewalls
(software or hardware)? Any other security product in between?
The server is announcing 1380 Byte MSS to me here as well, so this was
not something on your client side, but this is server side and thats not
very common. There must be some 'interesting' detail about this server
that we don't know yet.


I really think there isn't any issue.

Remember that some browsers (such as chrome/chromium) preopen connection 
in case there is a need to ask for more requests in parallel.


While hitting F5, the browser will ask to close those connections. As 
there was no data, no valid request was received on the haproxy side and 
haproxy logs a 408.


I'm pretty sure that adding option dontlognull [1] will solve the issue.

Also, as a side note, your configuration is not optimal :
- you are using both option httpclose and option http-server-close, 
you should make a choice (or use option http-keep-alive in recent 
haproxy 1.5 dev versions).

- the balance keyword is not valid in a frontend section.
- you should avoid the use of stats enable in the defaults section.

[1] 
http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#option%20dontlognull


--
Cyril Bonté



Re: http responses randomly getting RSTs

2014-02-19 Thread Cyril Bonté

Hi again,

Le 19/02/2014 22:44, Cyril Bonté a écrit :

Also, as a side note, your configuration is not optimal :
- you are using both option httpclose and option http-server-close,
you should make a choice (or use option http-keep-alive in recent
haproxy 1.5 dev versions).
- the balance keyword is not valid in a frontend section.
- you should avoid the use of stats enable in the defaults section.


I forgot one thing, why don't you use option httplog instead of 
option tcplog ? You'll have more details in the logs.


--
Cyril Bonté



Re: http responses randomly getting RSTs

2014-02-19 Thread Cyril Bonté

Hi again and sorry for spamming ;-)

Le 19/02/2014 22:51, Cyril Bonté a écrit :

Hi again,

Le 19/02/2014 22:44, Cyril Bonté a écrit :

Also, as a side note, your configuration is not optimal :
- you are using both option httpclose and option http-server-close,
you should make a choice (or use option http-keep-alive in recent
haproxy 1.5 dev versions).


I quoted the last message from Lukas without reading completely the thread.
I've also made some tests on your server and indeed could see some 408 
responses, even on a simple test using apache bench for 10 keep-alive 
requests in a range (using the server ip to hit the default virtualhost).


Interstingly, putting a haproxy on my laptop that receives the traffic 
and sends it to your server, I don't see any more 408 nor TCP 
retransmissions.


So, can you remove option httpclose from your configuration and let us 
know when it's done ?



--
Cyril Bonté



Re: http responses randomly getting RSTs

2014-02-19 Thread Klavs Klavsen
Cyril Bonté said the following on 02/19/2014 10:44 PM:
[CUT]
 I really think there isn't any issue.
 
 Remember that some browsers (such as chrome/chromium) preopen connection
 in case there is a need to ask for more requests in parallel.
 
 While hitting F5, the browser will ask to close those connections. As
 there was no data, no valid request was received on the haproxy side and
 haproxy logs a 408.
 
Using f5 to reproduce is just my way of reproducing.. it happens
randomly while people work on the site (not pressing f5).

 I'm pretty sure that adding option dontlognull [1] will solve the
 issue.
 
I added this in defaults section - it didn't change anything.
I set option http-keep-alive (and removed http-server-close) as well.
I'm running dev22.

 Also, as a side note, your configuration is not optimal :
 - you are using both option httpclose and option http-server-close,
 you should make a choice (or use option http-keep-alive in recent
 haproxy 1.5 dev versions).
 - the balance keyword is not valid in a frontend section.
 - you should avoid the use of stats enable in the defaults section.
 
my bad - I've removed that.

-- 
Regards,
Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200

Those who do not understand Unix are condemned to reinvent it, poorly.
  --Henry Spencer




Re: http responses randomly getting RSTs

2014-02-19 Thread Klavs Klavsen
Cyril Bonté said the following on 02/19/2014 10:51 PM:
 Hi again,
 
 Le 19/02/2014 22:44, Cyril Bonté a écrit :
 Also, as a side note, your configuration is not optimal :
 - you are using both option httpclose and option http-server-close,
 you should make a choice (or use option http-keep-alive in recent
 haproxy 1.5 dev versions).
 - the balance keyword is not valid in a frontend section.
 - you should avoid the use of stats enable in the defaults section.
 
 I forgot one thing, why don't you use option httplog instead of
 option tcplog ? You'll have more details in the logs.
 
switched to httplog.

when I get the 408 - I see nothing but 200 and 304 returncodes in the logs.

-- 
Regards,
Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200

Those who do not understand Unix are condemned to reinvent it, poorly.
  --Henry Spencer




Re: http responses randomly getting RSTs

2014-02-19 Thread Klavs Klavsen
Cyril Bonté said the following on 02/20/2014 12:03 AM:
 Hi again and sorry for spamming ;-)
 
 Le 19/02/2014 22:51, Cyril Bonté a écrit :
 Hi again,

 Le 19/02/2014 22:44, Cyril Bonté a écrit :
 Also, as a side note, your configuration is not optimal :
 - you are using both option httpclose and option http-server-close,
 you should make a choice (or use option http-keep-alive in recent
 haproxy 1.5 dev versions).
 
 I quoted the last message from Lukas without reading completely the thread.
 I've also made some tests on your server and indeed could see some 408
 responses, even on a simple test using apache bench for 10 keep-alive
 requests in a range (using the server ip to hit the default virtualhost).
 
 Interstingly, putting a haproxy on my laptop that receives the traffic
 and sends it to your server, I don't see any more 408 nor TCP
 retransmissions.
 
 So, can you remove option httpclose from your configuration and let us
 know when it's done ?
 
 
I had already removed it. I found a option http-server-close that I
missed, which I've removed now.

-- 
Regards,
Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200

Those who do not understand Unix are condemned to reinvent it, poorly.
  --Henry Spencer




http responses randomly getting RSTs

2014-02-18 Thread Klavs Klavsen
Hi,

I'm at my wits end and was hoping  you had any ideas.

We've setup a test environment with haproxy-1.5dev22, with varnish as  a
backend (and varnish using a webserver as it's backend).

I have the same issue with haproxy-1.5dev19.

randomly around the website (if I f.ex. reload /user/login url (it's a
drupal website)) - the browser requests fails and the browser says 408.
the VIP on haproxy is 10.27.121.134

a wireshark of the reponse shows, that the browser suddenly gets a RST -
in the middle of a receiving the response.

I did a tcpdump on the haproxy, and on the varnish - and the RST was
originating from haproxy.

The stats counters shows some ereq - which I think is these RSTs:
#
pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,
example.dk,FRONTEND,,,0,6,8000,132,129135,357937,0,0,9,OPEN,1,2,00,0,0,170,35,168,10,0,0,,0,27,213,,,0,0,0,0,
example.dk-prod,FRONTEND,,,0,0,8000,0,0,0,0,0,0,OPEN,1,4,00,0,0,00,0,0,0,0,0,,0,0,0,,,0,0,0,0,
example.dk-prod,web-p02,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,890,0,,1,4,1,,0,,2,0,,0,L4OK,,2000,0,0,0,0,0,0,00,0,
example.dk-prod,web-p03,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,890,0,,1,4,2,,0,,2,0,,0,L4OK,,2001,0,0,0,0,0,0,00,0,
example.dk-prod,web-p04,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,890,0,,1,4,3,,0,,2,0,,0,L4OK,,2001,0,0,0,0,0,0,00,0,
example.dk-prod,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,UP,3,3,0,,0,890,0,,1,4,0,,0,,1,0,,00,0,0,0,0,0,0,0,0,0,0,0,
varnish-dmz,varnish01,0,0,0,3,,102,64617,177379,,0,,0,0,0,0,UP,1,1,0,0,0,890,0,,1,5,1,,102,,2,0,,14,L4OK,,2002,0,17,85,0,0,0,00,0,
varnish-dmz,varnish02,0,0,0,3,,102,64518,178650,,0,,0,0,0,0,UP,1,1,0,0,0,890,0,,1,5,2,,102,,2,0,,13,L4OK,,2001,0,18,83,1,0,0,00,0,
varnish-dmz,BACKEND,0,0,0,6,800,204,129135,356029,0,0,,0,0,0,0,UP,2,2,0,,0,890,0,,1,5,0,,204,,1,0,,270,35,168,1,0,0,0,0,0,0,0,0,

I've tried setting option http-server-close and tried option httpclose -
and tried without any of them.

My config is:
global
  chroot  /var/lib/haproxy
  daemon
  group  haproxy
  log  127.0.0.1 local0
  maxconn  4000
  pidfile  /var/run/haproxy.pid
  stats  socket /var/lib/haproxy/stats level admin
  user  haproxy

defaults
  log  global
  maxconn  8000
  option  redispatch
  retries  3
  stats  enable
  option httpclose
  option http-server-close
  timeout  http-request 10s
  timeout  queue 1m
  timeout  connect 10s
  timeout  client 1m
  timeout  server 1m
  timeout  check 10s

frontend example.dk
  bind 10.27.121.134:80
  bind 10.27.121.134:443 ssl crt /etc/haproxy/star2.pem ciphers
RC4-SHA:AES128-SHA:AES256-SHA
  mode  http
  balance  roundrobin
  default_backend  varnish-dmz
  option  tcplog
  option  accept-invalid-http-request
  timeout client  30

listen example.dk-prod
  bind 10.27.121.135:80
  bind 10.27.121.135:443 ssl crt
/etc/haproxy/wildcard.example.dk.combined.pem ciphers
RC4-SHA:AES128-SHA:AES256-SHA
  mode  http
  balance  roundrobin
  option  tcplog
  server web-p02 10.27.121.245:80  check
  server web-p03 10.27.121.246:80  check
  server web-p04 10.27.121.247:80  check

backend varnish-dmz
  balance  roundrobin
  mode  http
  option  tcplog
  server varnish01 10.27.121.240:80  check
  server varnish02 10.27.121.241:80  check


-- 
Regards,
Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200

Those who do not understand Unix are condemned to reinvent it, poorly.
  --Henry Spencer




Re: http responses randomly getting RSTs

2014-02-18 Thread Klavs Klavsen
I enabled stats listener to better get an overview of the numbers.. and
it seems there's no ereq.. and haproxy doesn't log anything interesting :(

In debug mode I get a LOT..

It seems it's easier to reproduce, if I hit ctrl+f5 (which goes fine)
and right after that, press f5 - that gives me the 408.

-- 
Regards,
Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200

Those who do not understand Unix are condemned to reinvent it, poorly.
  --Henry Spencer




Re: http responses randomly getting RSTs

2014-02-18 Thread Klavs Klavsen
Enabling debug in config, I only get a more detailed upstart info and a
copy of my config in my logs, and then these upon requests:

Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58864
[18/Feb/2014:14:46:02.199] example.dk varnish-dmz/varnish01 0/0/1 6315
-- 6/6/0/1/0 0/0
Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58860
[18/Feb/2014:14:46:02.199] example.dk varnish-dmz/varnish02 0/0/8 7302
-- 6/6/0/0/0 0/0
Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58866
[18/Feb/2014:14:46:02.354] example.dk varnish-dmz/varnish01 0/0/1 1812
-- 1/1/0/0/0 0/0
Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58867
[18/Feb/2014:14:46:02.390] example.dk .dk/NOSRV -1/-1/31 212 cR
0/0/0/0/0 0/0
Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58868
[18/Feb/2014:14:46:02.429] example.dk varnish-dmz/varnish02 0/0/1 2639
-- 2/2/1/1/0 0/0
Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58869
[18/Feb/2014:14:46:02.430] example.dk varnish-dmz/varnish01 0/0/1 1808
-- 2/2/0/1/0 0/0
Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58868
[18/Feb/2014:14:46:02.431] example.dk varnish-dmz/varnish02 0/0/16 25620
-- 2/2/1/0/0 0/0
Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58869
[18/Feb/2014:14:46:02.431] example.dk varnish-dmz/varnish01 0/0/16 27614
-- 3/3/0/0/0 0/0
Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58870
[18/Feb/2014:14:46:02.448] example.dk varnish-dmz/varnish02 0/0/1 826 --
3/3/0/1/0 0/0
Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58870
[18/Feb/2014:14:46:02.449] example.dk varnish-dmz/varnish01 0/1/4 6526
-- 4/4/1/1/0 0/0
Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58868
[18/Feb/2014:14:46:02.447] example.dk varnish-dmz/varnish02 0/0/7 3206
-- 5/5/0/0/0 0/0
Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58869
[18/Feb/2014:14:46:02.448] example.dk varnish-dmz/varnish01 0/0/12 50494
-- 5/5/0/0/0 0/0
Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58871
[18/Feb/2014:14:46:02.454] example.dk .dk/NOSRV -1/-1/31 212 cR
2/2/0/0/0 0/0
Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58872
[18/Feb/2014:14:46:02.455] example.dk .dk/NOSRV -1/-1/30 212 cR
1/1/0/0/0 0/0


-- 
Regards,
Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200

Those who do not understand Unix are condemned to reinvent it, poorly.
  --Henry Spencer




RE: http responses randomly getting RSTs

2014-02-18 Thread Lukas Tribus
Hi,


 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58871
 [18/Feb/2014:14:46:02.454] example.dk .dk/NOSRV -1/-1/31 212 cR
 2/2/0/0/0 0/0
 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58872
 [18/Feb/2014:14:46:02.455] example.dk .dk/NOSRV -1/-1/30 212 cR
 1/1/0/0/0 0/0


cR means [1]:
 The timeout http-request stroke before the client sent a full HTTP
 request. This is sometimes caused by too large TCP MSS values on the
 client side for PPPoE networks which cannot transport full-sized
 packets, or by clients sending requests by hand and not typing fast
 enough, or forgetting to enter the empty line at the end of the
 request. The HTTP status code is likely a 408 here.

Does your tcpdump confirm this?



Regards,

Lukas


[1] http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#8.5
  


Re: http responses randomly getting RSTs

2014-02-18 Thread Klavs Klavsen
Lukas Tribus said the following on 02/18/2014 03:43 PM:
 Hi,
 
 
 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58871
 [18/Feb/2014:14:46:02.454] example.dk .dk/NOSRV -1/-1/31 212 cR
 2/2/0/0/0 0/0
 Feb 18 14:46:02 localhost haproxy[23414]: 10.46.10.145:58872
 [18/Feb/2014:14:46:02.455] example.dk .dk/NOSRV -1/-1/30 212 cR
 1/1/0/0/0 0/0
 
 
 cR means [1]:
 The timeout http-request stroke before the client sent a full HTTP
 request. This is sometimes caused by too large TCP MSS values on the
 client side for PPPoE networks which cannot transport full-sized
 packets, or by clients sending requests by hand and not typing fast
 enough, or forgetting to enter the empty line at the end of the
 request. The HTTP status code is likely a 408 here.
 
 Does your tcpdump confirm this?


What would I look for in the tcpdump? it's a vmware virtualserver with
standard MTU and all.. no PPPoE or anything.

I can reproduce it by pressing f5 (a few times in a row) in my browser
(firefox) and IE.

I cannot reproduce it in chromium.

-- 
Regards,
Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200

Those who do not understand Unix are condemned to reinvent it, poorly.
  --Henry Spencer




RE: http responses randomly getting RSTs

2014-02-18 Thread Lukas Tribus
Hi,


 What would I look for in the tcpdump?

Whether or not the browser takes more than 10 seconds to complete
the http request, as per your configuration:
 timeout http-request 10s


This is the timeout you are hitting. Also read this [1]. Share
your tcpdump capture if it doesn't contain anything proprietary (an tell
us which tcp session in the capture failed, according to your log).


Regards,

Lukas


[1] 
http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#timeout%20http-request
   


RE: http responses randomly getting RSTs

2014-02-18 Thread Lukas Tribus
Hi,


 Whether or not the browser takes more than 10 seconds to complete
 the http request, as per your configuration:
 timeout http-request 10s

 it's not even close to 10 seconds.. it happens as soon as I press f5 -
 randomly.

Well, I was assuming this is the issue, since it was the only error from
your logs and it seemed to be reoccurring.

You did confirmed 408 errors on the client side though, so this is still
my best bet.

Please provide a non-truncated capture (-s 0) of the session
where you see this RST. Filtering on the client IP is probably a good
idea to keep the capture limited to your IP (something like
tcpdump -ps0 -w haproxy-rstcapture.cap host 1.2.3.4 and tcp port 80

for example).



Regards,

Lukas