On 12/27/2012 1:53 AM, Willy Tarreau wrote:
On Wed, Dec 26, 2012 at 07:03:02PM -0500, Brendon Colby wrote:
I was thinking that this is just standard browser behavior too. IE also does
this - it just seems to open fewer connections. This is why I was confused
and thought I was missing something, because it seems like normal browser
behavior even though the docs indicated that it should only happen during an
attack or some other anomalous event.
This stupid behaviour is something very recent. I did not know that IE was
doing this too, but after all I'm not surprised, with the browser war...

I'm really surprised no one else has seen and reported this behavior. I know one site in particular (trello.com) uses haproxy, and that has to be higher volume (at least in terms of req/s) than our site. Reading their architecture posts is how I found out about haproxy. Maybe I will send them an e-mail to see if they've seen anything like this.

If you can reproduce the behaviour with your browser, I think that dontlognull
will be your only solution and that we'll have to update the doc to indicate
that browsers have adopted such an internet-unfriendly behaviour that it's
better to leave the option on. What I don't like about proactively opened
connections is that they hit servers with 10-100 times the load they would
otherwise have to sustain, so even small sites might run into trouble with
this. If you see 200 of them per second and they last 5s on average, that's
roughly 1,000 idle connections (arrival rate times average lifetime) sitting
there at any moment just because of this. Many web servers can't handle
this :-/
I can see smaller sites having a hard time with this! Before we added
"timeout http-request" we were seeing over 22K established connections to the
haproxy server. That's why we have such a high maxconn on our frontend: we
hit several limits once we went live and had to keep increasing it (which
reminds me that I should lower it now).
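For reference, the relevant part of our frontend ended up looking roughly like
this (just a sketch with illustrative values, not our exact production config;
the frontend and backend names are the ones from the log excerpt further down):

    frontend http-in
        bind *:80
        mode http
        # Cap the number of concurrent connections this frontend will accept;
        # we had to raise this several times after going live.
        maxconn 50000
        # Give clients a few seconds to send a complete request, then close
        # the connection instead of holding it open forever.
        timeout http-request 5s
        default_backend webservers-ngfiles
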
That's really disgusting. Products such as haproxy or nginx can easily deal
with that many concurrent connections, but many other legacy servers cannot.

Would you mind reporting this to the HTTP working group at the IETF? The
HTTP/1.1 spec is currently being refined and is almost done, but we can
still add information there. Good and bad practices can be updated with
this experience. The address is ietf-http...@w3.org.

I'll see what I can do.

Once we added the timeout though, established connections plummeted to about
5K (as of right now, for example). The total number of TCP connections did NOT
go down, however, because most of them are now in TIME_WAIT (92K!).
TIME_WAIT connections are harmless on the server side. You can easily reach
millions of them without any issues.

OK. I didn't know it could reach millions. This is good to know.

The only thing
this seems to affect is our monitoring system, which uses netstat to get TCP
stats. It sometimes takes almost a minute to run and uses 100% CPU, but
otherwise nothing seems to suffer, so we've left it alone for now.
There are two commands that you must absolutely never use in a monitoring
system:
   - netstat -a
   - ipcs -a

Both of them will saturate the system and considerably slow it down when
something starts to go wrong. For the sockets you should use what's in
/proc/net/sockstat, which has all the numbers you want. If you need more
details, use "ss -a" instead of "netstat -a"; it uses the netlink interface
and is several orders of magnitude faster.

Awesome - "ss -a" runs much faster than netstat. I will update our monitoring system to use that instead. This is good to know.

So if connections are terminated because of timeouts that I have explicitly
set, is there any reason to log and count that as a request error? That to me
seems like something that could be logged as info for troubleshooting and not
counted as an error at all. Just a thought - it's nothing major.
This is a real error. The fact that some browsers decided to do stupid things
doesn't make it any less of an error. Setting aside this case and attacks,
the most common reason for never receiving a request is that it was blocked
by too short an MTU in some VPNs. It's very important to know when a browser
could not send a POST or a request with a large cookie due to a short MTU
somewhere.

OK - makes sense. So if a browser proactively opens several connections to the server, doesn't send any HTTP requests, and then shuts them down after so many seconds, you're saying that from haproxy's point of view this looks basically the same as a client whose request was blocked by too short an MTU in some types of VPNs?

With dontlognull we still see this type of error at a rate of several per
second:

Dec 26 18:38:17 localhost haproxy[32259]: n.n.n.n:33685 [26/Dec/2012:18:38:17.155] http-in webservers-ngfiles/ngwebNN 10/0/0/1/649 200 31671 - - CD-- 4166/4159/139/31/0 0/0 "GET /images/14/o-eternal-o_the-vampire-dragon.jpg HTTP/1.1"
This one happens more frequently and is also an error as the transfer was not
complete, as seen from haproxy.

We see this a lot in our environment since we host mostly media, but it's not too big a deal.

This is logged when, for example, I am on the site watching a movie, and I
close the browser. To me, this is another event that could be logged at the
info level, but I figure there are probably good reasons why it's logged and
counted as an error. It is probably just the perfectionist in me that wants
the error rate to be nearly 0. :)
In fact if you want to bring the error rate to zero by hiding all errors,
you're cheating :-)

It seems to me that you'd like not to log errors for things that happen on the
client side, am I wrong? That could make sense in some environments, and we
could think about adding an option to do that.

Yeah, maybe a setting to ignore, or at least not log, client-side errors. The only problem I see with this is that we WOULD want to keep "timeout http-request" in place, so that would shift the bulk of the errors we're seeing (200-300 per second) from client-side to server-side errors. In that case, if we could tell haproxy to ignore connections that never send a request, maybe that would work too.

I'm OK just knowing that these errors are truly errors and not something I missed. haproxy was so easy to set up that my co-worker and I thought we must have missed something. It's good to know our config checks out and that we can just run as-is with "dontlognull". I think it would be fine to just update the documentation to reflect that this is now (for better or worse) standard browser behavior, and that in certain cases there just isn't much you can do about it but run with "dontlognull".
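For anyone else who runs into this, the logging change itself is a one-liner; a minimal sketch, shown here in a defaults section (adjust to wherever your config keeps its options):

    defaults
        mode http
        # Don't log sessions that were closed before any data was received;
        # with today's speculative browser preconnects these are mostly noise.
        option dontlognull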

Thanks for your input!

Brendon
