Greetings,

Error in my last e-mail: I used the word client instead of server; fixed inline.

On 03/10/2016 02:34 PM, Chad Lavoie wrote:
Greetings,

Having paged through the logs, I see a lot of requests where the first four timers (Tq, Tw, Tc, Tr) are fairly small, indicating that everything from the request through the response headers finished before times started getting extreme, but where the overall time (Tt) is in the realm of five minutes.

This would indicate that the request is received from the client (Tq), gets through the queues (Tw), a TCP connection to the backend is established (Tc), and the backend sends the response headers (Tr), all within a few hundred ms to a couple of seconds; but then most of the time is spent with the client sending the body.
Most of the time is spent with the server sending the body, not the client data (as per the timings the server already sent the response headers, so the client is mostly out of the picture).

- Chad

Before we move on, does that sound reasonable as a potential issue location? If not, I can try running some math on the columns to get a better idea (I just looked at a random sampling of slow requests to compare to what I've seen as the baseline).
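As a rough sketch of the "math on the columns" idea: the HAProxy documentation defines the data transfer time as Td = Tt - (Tq + Tw + Tc + Tr), so pulling the timer field out of each HTTP log line shows how much of Tt went to moving the body. The function name and the regular expression below are assumptions for illustration, and they assume the default HTTP log format, where the timers are the first slash-separated group of five numbers on the line.

```python
import re

# Timer field "Tq/Tw/Tc/Tr/Tt" from an HAProxy HTTP log line (assumed
# default "option httplog" layout). Any of the values may be -1 when a
# phase never completed; Tt may carry a leading "+" with "option logasap".
# re.search returns the leftmost match, which is the timer field because
# it precedes the connection-count field on the line.
TIMERS = re.compile(r"\s(-?\d+)/(-?\d+)/(-?\d+)/(-?\d+)/\+?(-?\d+)\s")

def split_timers(line):
    """Return the timers plus the derived Td, or None if unusable."""
    m = TIMERS.search(line)
    if m is None:
        return None
    tq, tw, tc, tr, tt = (int(g) for g in m.groups())
    if min(tq, tw, tc, tr, tt) < 0:
        # An aborted phase is logged as -1; Td is not meaningful then.
        return None
    return {"Tq": tq, "Tw": tw, "Tc": tc, "Tr": tr, "Tt": tt,
            "Td": tt - (tq + tw + tc + tr)}
```

Sorting the results by Td (or keeping only lines where Td is most of Tt) would separate requests that are slow in the body transfer from those that are slow earlier on.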

Another interesting thing here is the termination states (I usually look at them because they give an idea of why connections are failing; definitions are at https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#8.5):
      7 CHVN
      9 SDVN
     10 cDVN
     12 LR--
     33 CDVN
     50 SHDN
     92 --NI
    113 sHVN
    186 SHVN
   2115 --DI
  13896 --VN

The first two chars show the state at termination, and the second two describe the persistence cookie (useful for seeing if first-time clients are failing, etc.).

The ones starting with -- indicate they were successful, so I'm ignoring them here. Other than that we have a bunch starting with SH, indicating that the TCP connection to the backend either failed or was aborted, and sH, indicating that the backend connection attempt timed out. The numbers are fairly small there in terms of failures vs successes, so I'd say that isn't likely to be the primary issue (unless we get to talking about individual connections).
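To put a number on "failures vs successes", here is a minimal sketch that groups the tallies listed above by their first two characters and treats anything not starting with -- as abnormal. The counts are copied from the list in this message; the names TALLY and summarize are made up for the example.

```python
from collections import Counter

# (count, termination state) pairs copied from the tally in this message.
TALLY = [
    (7, "CHVN"), (9, "SDVN"), (10, "cDVN"), (12, "LR--"), (33, "CDVN"),
    (50, "SHDN"), (92, "--NI"), (113, "sHVN"), (186, "SHVN"),
    (2115, "--DI"), (13896, "--VN"),
]

def summarize(tally):
    """Group counts by the first two chars (the state at termination)."""
    by_state = Counter()
    for count, flags in tally:
        by_state[flags[:2]] += count
    total = sum(by_state.values())
    # "--" means the session ended normally; everything else is abnormal.
    abnormal = total - by_state["--"]
    return by_state, total, abnormal

by_state, total, abnormal = summarize(TALLY)
print(f"{abnormal} of {total} sessions ended abnormally "
      f"({100 * abnormal / total:.1f}%)")
```

With these figures, 420 of 16523 sessions (about 2.5%) ended in a state other than --, and SH is the largest abnormal group at 236.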

If that's the case, the next step would be to figure out why the body data takes so long, which is outside of what HAProxy can cleanly help with. Do the backends have logs which would indicate what they are doing? If not, the next thing I'd try would be making a capture file with tcpdump to view in Wireshark to see what is going on between HAProxy and the backends (how to do that is outside the scope of what makes any sense to describe here, though).

- Chad

On 03/10/2016 08:06 AM, matt wrote:
I have the log, but a lot of the data is confidential.
Can I send it to you by email so you can take a look?

We can post an edited version later in order to help others
debug the same issue

Thanks in advance





