Stats Page Not Matching Log Lines

SC Thu, 25 Sep 2014 09:10:16 -0700

Hi,

First time listener and caller. Of course my first interaction with haproxy
is someone telling me it's not working properly. :) I've done a lot of
digging, I'm quite confused by some of what I've found, and I'm hoping to
get some help here.


After determining on their own that they were not receiving all of the web
traffic they were supposed to, our engineering team asked me to look into
it. They're specifically concerned about the stats they're seeing on the
stats page, which say something like this:

Frontend
- 3434009 requests total
- 79263 3xx errors
- 1283482 4xx errors
- 30396 5xx errors

Backend
- 79306 3xx errors
- 136 4xx errors
- 30396 5xx errors
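To compare like with like, it may help to snapshot the stats page counters at the same moment the log is grepped. Below is a minimal Python sketch, assuming the "stats socket /var/run/haproxy/haproxy.sock" line from the haproxy.cfg below and a user with permission to open that socket. It sends the "show stat" command, which returns the proxy counters as CSV, and keeps the "hrsp_1xx" through "hrsp_other" response-class columns for each proxy/server row. The function names are illustrative only.

```python
import csv
import io
import os
import socket

# Path taken from the "stats socket" line in haproxy.cfg; adjust if it differs.
STATS_SOCKET = "/var/run/haproxy/haproxy.sock"

RESPONSE_FIELDS = ("hrsp_1xx", "hrsp_2xx", "hrsp_3xx",
                   "hrsp_4xx", "hrsp_5xx", "hrsp_other")


def fetch_show_stat(path=STATS_SOCKET):
    """Send 'show stat' over the stats socket and return the raw CSV text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(b"show stat\n")
        chunks = []
        # In non-interactive mode HAProxy closes the socket after replying.
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def response_counters(csv_text):
    """Return {(pxname, svname): {field: int}} for the HTTP response counters."""
    # The header line starts with "# "; strip it so DictReader gets clean names.
    text = csv_text.lstrip("# ")
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        if not row.get("pxname"):
            continue
        result[(row["pxname"], row["svname"])] = {
            f: int(row[f]) if row.get(f) else 0 for f in RESPONSE_FIELDS
        }
    return result


if __name__ == "__main__" and os.path.exists(STATS_SOCKET):
    counters = response_counters(fetch_show_stat())
    for (proxy, name), fields in sorted(counters.items()):
        if name in ("FRONTEND", "BACKEND"):
            print(proxy, name, fields)
```

Taking this snapshot right before and right after the log window being grepped gives counter deltas that cover the same period as the log lines.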

I also grepped through haproxy.log for the same time period to get a count
of all of the log lines for various errors. It looks like this:

  Status   Count
  206      6
  304      221635
  400      230
  404      11
  405      60
  408      21
  500      15058
  503      30396
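
A plain grep for a code such as 500 can also match byte counts, timers, or URLs elsewhere on the line, which could skew the counts. Here is a minimal Python sketch that instead reads the status code from its fixed position in the default "option httplog" line, right after the [accept date], frontend, backend/server and timers fields. The regex and function names are illustrative and assume the default log layout, with captured headers appearing later in the line.

```python
import re
import sys
from collections import Counter

# Default httplog layout after the accept date:
#   [dd/Mon/yyyy:hh:mm:ss.mmm] frontend backend/server Tq/Tw/Tc/Tr/Tt status ...
HTTPLOG_RE = re.compile(
    r"\[\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}\.\d{3}\] "
    r"(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<timers>\S+) (?P<status>-?\d+) "
)


def count_status_codes(lines):
    """Count HTTP status codes in httplog-format lines, skipping non-matches."""
    counts = Counter()
    for line in lines:
        m = HTTPLOG_RE.search(line)
        if m:
            counts[int(m.group("status"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for status, n in sorted(count_status_codes(log).items()):
            print(f"{status}\t{n}")
```

Saved under a hypothetical name such as count_status.py, it would be run as "python count_status.py haproxy.log".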

My confusion comes from the discrepancy between some of these numbers. I
understand why the frontend and backend would differ; obviously, if a
request never makes it to a backend server, there will be a difference.
However, haproxy.log records different numbers for these errors than the
stats page shows.
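
One way to see how much of the frontend/backend gap comes from requests that never reached a server is to split the log by the server field. In the default "option httplog" format that field reads "<NOSRV>" when the request was not passed to any server. A hedged Python sketch, assuming log lines in that default layout; the regex and the split_by_server name are illustrative, not part of HAProxy:

```python
import re
from collections import Counter

# Same default httplog layout: [accept date] frontend backend/server timers status
HTTPLOG_RE = re.compile(
    r"\[\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}\.\d{3}\] "
    r"(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<timers>\S+) (?P<status>-?\d+) "
)


def split_by_server(lines):
    """Count status codes separately for requests that never reached a server
    (server field "<NOSRV>") and requests handed to a real server."""
    no_server, forwarded = Counter(), Counter()
    for line in lines:
        m = HTTPLOG_RE.search(line)
        if not m:
            continue
        status = int(m.group("status"))
        if m.group("server") == "<NOSRV>":
            no_server[status] += 1
        else:
            forwarded[status] += 1
    return no_server, forwarded
```

The "forwarded" counts are the ones expected to line up with the backend column of the stats page, and the "no_server" counts with the remainder of the frontend column.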

Example: according to the stats page, the frontend has a huge number of 4xx
errors (1283482), yet haproxy.log shows only a tiny fraction of that. The
5xx errors are similar: the stats page shows 30396 5xx errors on both the
frontend and the backend, and the logs contain exactly 30396 503 errors,
but the logs also contain 15058 500 errors that don't appear to be
accounted for on the stats page.
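
Grouping the log counts by status class and setting them against the frontend figures from the stats page makes the gaps concrete. A small Python sketch using only the numbers quoted above (the stats values are the approximate ones read off the page; the variable and function names are illustrative):

```python
# Per-status counts grepped from haproxy.log (the table above).
LOG_COUNTS = {206: 6, 304: 221635, 400: 230, 404: 11, 405: 60,
              408: 21, 500: 15058, 503: 30396}

# Frontend counters as read off the stats page.
STATS_FRONTEND = {"3xx": 79263, "4xx": 1283482, "5xx": 30396}


def by_class(counts):
    """Collapse individual status codes into 1xx..5xx classes."""
    classes = {}
    for status, n in counts.items():
        key = f"{status // 100}xx"
        classes[key] = classes.get(key, 0) + n
    return classes


def compare(log_counts, stats):
    """Return {class: (logged, stats_page, stats_page - logged)}."""
    logged = by_class(log_counts)
    return {cls: (logged.get(cls, 0), n, n - logged.get(cls, 0))
            for cls, n in stats.items()}


for cls, (logged, shown, diff) in sorted(compare(LOG_COUNTS, STATS_FRONTEND).items()):
    print(f"{cls}: log={logged} stats={shown} diff={diff}")
```

With these figures the log holds 322 4xx lines against 1283482 on the frontend, while it holds 45454 5xx lines against 30396, a surplus of 15058, exactly the number of 500s.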

Eng wants me to troubleshoot why we're getting all of these 4xx and 5xx
drops, but since I can't account for even half of them in the logs, it's
difficult to know where to begin. Is there a setting I'm missing that would
capture all of this in the logs?

haproxy.cfg:

global
        maxconn         300000
#       log             127.0.0.1       local6 info
#       log             172.1.1.101     local5 notice
#       log             172.1.1.101     local6 info
        log             127.0.0.1       local5 notice
        log             127.0.0.1       local6 info
        uid             99
        gid             99
        chroot          /var/empty
        nbproc          1
        spread-checks   5
        daemon
        stats           socket          /var/run/haproxy/haproxy.sock mode 0600 level admin

# The public 'www' address in the DMZ
frontend public
        bind            1.1.1.1:80
        mode            http
        log             global
#       option          checkcache
        option          httplog         # log in HTTP format
#       option          logasap         # log after processing server headers - don't wait for long sessions
        option          dontlognull     # ignore sessions which don't transfer data (health checks)
        option          forwardfor      # add a X-Forwarded-For HTTP header with the client IP
        option          httpclose       # remove connection header - defeat spurious keep-alives
#       option          forceclose      # close outgoing connections with empty buffers
        monitor-uri     /monitoruri
        # block certain browser strings
#       reqideny        ^(User-Agent:\ ) *HTTrack*
        maxconn         150000
        clitimeout      260000
        capture         request header          User-Agent      len 128
        capture         request header          Referer         len 128

# Host: will use a specific keyword soon
#       reqisetbe       ^Host:\ img                             static
#       The URI will use a specific keyword soon
#       reqisetbe       ^[^\ ]*\ /(img|css)/                    static
        reqisetbe       ^[^\ ]*\ /admin/stats                   stats

        default_backend                                         dynCreative

# The static backend backend for 'Host: img', /img and /css.
backend webservers
        mode            http
        balance         roundrobin
#       balance         source
#       option          checkcache
        timeout connect 5000
        timeout queue   300000
        srvtimeout      360000
        option          redispatch
        retries         5
        option          forwardfor
#       option          httpchk HEAD    /robots.txt
#       option          persist
        option          allbackups
#       rsprep          ^Server:.*      Server:\ lighttpd
#       cookie          SERVERID        insert  nocache # indirect nocache
        stats           refresh         5
        log             global
        fullconn        5000            # the servers will be used at full load above this number of connections
#       server          server1        1.1.1.2:8080 minconn 50 maxconn 131070 check inter 5000
        server          server2        1.1.1.3:8080 minconn 50 maxconn 1000 check inter 5000
        server          server3        1.1.1.4:8080 minconn 50 maxconn 1000 check inter 5000
        server          server4        1.1.1.5:8080 minconn 50 maxconn 1000 check inter 5000

# Haproxy Internal Stats
backend stats
#       log             global
        mode            http
        stats           uri /
        stats           realm           Haproxy\ Stats
        stats           refresh         5
        balance         roundrobin


Thanks in advance for your help and insight!
