Re: Reflexions around timers in the logs

Willy Tarreau Tue, 23 Aug 2016 03:16:24 -0700

On Tue, Aug 23, 2016 at 11:34:39AM +0200, Willy Tarreau wrote:
> Maybe something between all of this could consist in this :
> 
>   - we create a new date field (eg: %tr) to note the date the request
>     starts to arrive (end of the idle period)
>   - we use a new field for the request-only time (eg: %TR)
>   - we change the default HTTP log format to use both.
> 
> That would give us %Tq = %Th + %Ti + %TR, and %tr = %t + %Th + %Ti.
> 
> This way people who use "option httplog" will suddenly have much more
> exploitable logs without changing anything, as the dates will now reflect
> the start of receipt of requests and not the end of last response, and the
> other ones will not be harmed by any change.


That's what I've implemented in the attached patch (I didn't update the
doc). It works fine here. I'd have preferred to have the date of complete
request but it would change the semantics of the logs for existing setups,
while this one respects it.

In the log below, the connection was established exactly at 12:06:00,
the first request was started at 12:06:04.941 and completed at 12:06:06.509.
The second request was started at 12:06:10.252 and took 922ms to be typed.

I'm seeing that the total time still includes the handshake and idle time
which is not fun at all, so we'd basically need a different total time here
as well.

Aug 23 12:06:06 haproxy[5688]: 127.0.0.1:47074 [23/Aug/2016:12:06:04.941] test 
test/test 1568/0/0/0/5579 200 145 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
Aug 23 12:06:11 haproxy[5688]: 127.0.0.1:47074 [23/Aug/2016:12:06:10.252] test 
test/test 922/0/0/0/4665 200 145 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"

In short we're adding Th, Ti and TR below in addition to the already
existing Tq, and we'd need to have Tsomething to represent the active
time of the request (Ta ?)  :

            first request                   2nd request
      <---------------------------------><-------------- ...
      t         tr                       t    tr ...   
   ---|----|----|----|----|----|----|----|----|--
        Th   Ti   TR   Tw   Tc   Tr   Td   Ti   ...
      <---- Tq ----->
      <--------------- Tt -------------->
                <----- Tsomething ------>

Hoping I'm not confusing too many people :-)

Willy

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 3bb70a4..61903f0 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -14687,8 +14687,8 @@ Detailed fields description :
     a full HTTP request, not counting data. It can be "-1" if the connection
     was aborted before a complete request could be received. It should always
     be very small because a request generally fits in one single packet. Large
-    times here generally indicate network trouble between the client and
-    haproxy. See "Timers" below for more details.
+    times here generally indicate network issues between the client and haproxy
+    or requests being typed by hand. See "Timers" below for more details.
 
   - "Tw" is the total time in milliseconds spent waiting in the various queues.
     It can be "-1" if the connection was aborted before reaching the queue.
@@ -14926,7 +14926,9 @@ Please refer to the table below for currently defined 
variables :
   |   | %Tc  | Tc                                            | numeric     |
   |   | %Td  | Td = Tt - (Tq + Tw + Tc + Tr)                 | numeric     |
   |   | %Tl  | local_date_time                               | date        |
-  | H | %Tq  | Tq                                            | numeric     |
+  |   | %Th  | connexion handshake time (SSL and proxy proto)| numeric     |
+  | H | %Ti  | idle time before the HTTP request             | numeric     |
+  | H | %Tq  | time to receive the full request              | numeric     |
   | H | %Tr  | Tr                                            | numeric     |
   |   | %Ts  | timestamp                                     | numeric     |
   |   | %Tt  | Tt                                            | numeric     |
@@ -15082,13 +15084,33 @@ Timers provide a great help in troubleshooting 
network problems. All values are
 reported in milliseconds (ms). These timers should be used in conjunction with
 the session termination flags. In TCP mode with "option tcplog" set on the
 frontend, 3 control points are reported under the form "Tw/Tc/Tt", and in HTTP
-mode, 5 control points are reported under the form "Tq/Tw/Tc/Tr/Tt" :
+mode, 5 control points are reported under the form "Tq/Tw/Tc/Tr/Tt". In
+addition, two other measures are provided, "Th" and "Ti".
+
+  - Th: total time to accept tcp connection and execute handshakes for low 
level
+    protocols. Currently, these protocoles are proxy-protocol and SSL. This may
+    only happen once during the whole connection's lifetime. A large time here
+    may indicate that the client only pre-established the connection without
+    speaking, that it is experiencing network issues preventing it from
+    completing a handshake in a reasonable time (eg: MTU issues), or that an
+    SSL handshake was very expensive to compute.
+
+  - Ti: is the idle time before the HTTP request (HTTP mode only). This timer
+    counts between the end of the handshakes and the first byte of the HTTP
+    request. When dealing with a second request in keep-alive mode, it starts
+    to count after the end of the transmission the previous response. Some
+    browsers pre-establish connections to a server in order to reduce the
+    latency of a future request, and keep them pending until they need it. This
+    delay will be reported as the idle time. A value of -1 indicates that
+    nothing was received on the connection.
 
   - Tq: total time to get the client request (HTTP mode only). It's the time
-    elapsed between the moment the client connection was accepted and the
-    moment the proxy received the last HTTP header. The value "-1" indicates
-    that the end of headers (empty line) has never been seen. This happens when
-    the client closes prematurely or times out.
+    elapsed between the first bytes received and the moment the proxy received
+    the empty line marking the end of the HTTP headers. The value "-1"
+    indicates that the end of headers has never been seen. This happens when
+    the client closes prematurely or times out. This time is usually very short
+    since most requests fit in a single packet. A large time may indicate a
+    request typed by hand during a test.
 
   - Tw: total time spent in the queues waiting for a connection slot. It
     accounts for backend queue as well as the server queues, and depends on the
@@ -15116,15 +15138,15 @@ mode, 5 control points are reported under the form 
"Tq/Tw/Tc/Tr/Tt" :
 
   - Tt: total session duration time, between the moment the proxy accepted it
     and the moment both ends were closed. The exception is when the "logasap"
-    option is specified. In this case, it only equals (Tq+Tw+Tc+Tr), and is
-    prefixed with a '+' sign. From this field, we can deduce "Td", the data
+    option is specified. In this case, it only equals (Th+Ti+Tq+Tw+Tc+Tr), and
+    is prefixed with a '+' sign. From this field, we can deduce "Td", the data
     transmission time, by subtracting other timers when valid :
 
-        Td = Tt - (Tq + Tw + Tc + Tr)
+        Td = Tt - (Th + Ti + Tq + Tw + Tc + Tr)
 
     Timers with "-1" values have to be excluded from this equation. In TCP
-    mode, "Tq" and "Tr" have to be excluded too. Note that "Tt" can never be
-    negative.
+    mode, "Ti", "Tq" and "Tr" have to be excluded too. Note that "Tt" can never
+    be negative.
 
 These timers provide precious indications on trouble causes. Since the TCP
 protocol defines retransmit delays of 3, 6, 12... seconds, we know for sure
@@ -15135,17 +15157,17 @@ session has been aborted on timeout.
 
 Most common cases :
 
-  - If "Tq" is close to 3000, a packet has probably been lost between the
-    client and the proxy. This is very rare on local networks but might happen
-    when clients are on far remote networks and send large requests. It may
-    happen that values larger than usual appear here without any network cause.
-    Sometimes, during an attack or just after a resource starvation has ended,
-    haproxy may accept thousands of connections in a few milliseconds. The time
-    spent accepting these connections will inevitably slightly delay processing
-    of other connections, and it can happen that request times in the order of
-    a few tens of milliseconds are measured after a few thousands of new
-    connections have been accepted at once. Using one of the keep-alive modes
-    may display larger request times since "Tq" also measures the time spent
+  - If "Th" or "Ti" are close to 3000, a packet has probably been lost between
+    the client and the proxy. This is very rare on local networks but might
+    happen when clients are on far remote networks and send large requests. It
+    may happen that values larger than usual appear here without any network
+    cause. Sometimes, during an attack or just after a resource starvation has
+    ended, haproxy may accept thousands of connections in a few milliseconds.
+    The time spent accepting these connections will inevitably slightly delay
+    processing of other connections, and it can happen that request times in 
the
+    order of a few tens of milliseconds are measured after a few thousands of
+    new connections have been accepted at once. Using one of the keep-alive
+    modes may display larger idle times since "Ti" measures the time spent
     waiting for additional requests.
 
   - If "Tc" is close to 3000, a packet has probably been lost between the
diff --git a/include/types/log.h b/include/types/log.h
index 25d872e..f3543f4 100644
--- a/include/types/log.h
+++ b/include/types/log.h
@@ -66,6 +66,7 @@ enum {
        LOG_FMT_LOGCNT,
        LOG_FMT_PID,
        LOG_FMT_DATE,
+       LOG_FMT_DATEREQ,
        LOG_FMT_DATEGMT,
        LOG_FMT_DATELOCAL,
        LOG_FMT_TS,
@@ -77,10 +78,13 @@ enum {
        LOG_FMT_BYTES,
        LOG_FMT_BYTES_UP,
        LOG_FMT_T,
+       LOG_FMT_TH,
+       LOG_FMT_TI,
        LOG_FMT_TQ,
        LOG_FMT_TW,
        LOG_FMT_TC,
        LOG_FMT_TR,
+       LOG_FMT_TReq,
        LOG_FMT_TD,
        LOG_FMT_TT,
        LOG_FMT_STATUS,
diff --git a/include/types/stream.h b/include/types/stream.h
index 8687726..17e74b8 100644
--- a/include/types/stream.h
+++ b/include/types/stream.h
@@ -97,6 +97,8 @@ struct strm_logs {
        int level;                      /* log level to force + 1 if > 0, -1 = 
no log */
        struct timeval accept_date;     /* date of the stream's accept() in 
user date */
        struct timeval tv_accept;       /* date of the stream's accept() in 
internal date (monotonic) */
+       long t_handshake;               /* hanshake duration, -1 if never 
occurs */
+       long t_idle;                    /* idle duration, -1 if never occurs */
        struct timeval tv_request;      /* date the request arrives, {0,0} if 
never occurs */
        long  t_queue;                  /* delay before the stream gets out of 
the connect queue, -1 if never occurs */
        long  t_connect;                /* delay before the connect() to the 
server succeeds, -1 if never occurs */
diff --git a/src/log.c b/src/log.c
index 4f4ede9..63567c3 100644
--- a/src/log.c
+++ b/src/log.c
@@ -125,8 +125,11 @@ static const struct logformat_type logformat_keywords[] = {
        { "ST", LOG_FMT_STATUS, PR_MODE_TCP, LW_RESP, NULL },   /* status code 
*/
        { "T", LOG_FMT_DATEGMT, PR_MODE_TCP, LW_INIT, NULL },   /* date GMT */
        { "Tc", LOG_FMT_TC, PR_MODE_TCP, LW_BYTES, NULL },       /* Tc */
+       { "Th", LOG_FMT_TH, PR_MODE_TCP, LW_BYTES, NULL },       /* Time 
handshake */
+       { "Ti", LOG_FMT_TI, PR_MODE_HTTP, LW_BYTES, NULL },       /* Time idle 
*/
+       { "TR", LOG_FMT_TReq, PR_MODE_HTTP, LW_BYTES, NULL },       /* Time to 
receive a valid request */
        { "Tl", LOG_FMT_DATELOCAL, PR_MODE_TCP, LW_INIT, NULL },   /* date 
local timezone */
-       { "Tq", LOG_FMT_TQ, PR_MODE_HTTP, LW_BYTES, NULL },       /* Tq */
+       { "Tq", LOG_FMT_TQ, PR_MODE_HTTP, LW_BYTES, NULL },       /* 
Tq=Th+Ti+TR */
        { "Tr", LOG_FMT_TR, PR_MODE_HTTP, LW_BYTES, NULL },       /* Tr */
        { "Td", LOG_FMT_TD, PR_MODE_TCP, LW_BYTES, NULL },       /* Td = Tt - 
(Tq + Tw + Tc + Tr) */
        { "Ts", LOG_FMT_TS, PR_MODE_TCP, LW_INIT, NULL },   /* timestamp GMT */
@@ -169,6 +172,7 @@ static const struct logformat_type logformat_keywords[] = {
        { "sslc", LOG_FMT_SSL_CIPHER, PR_MODE_TCP, LW_XPRT, NULL }, /* 
client-side SSL ciphers */
        { "sslv", LOG_FMT_SSL_VERSION, PR_MODE_TCP, LW_XPRT, NULL }, /* 
client-side SSL protocol version */
        { "t", LOG_FMT_DATE, PR_MODE_TCP, LW_INIT, NULL },      /* date */
+       { "tr", LOG_FMT_DATEREQ, PR_MODE_HTTP, LW_BYTES, NULL },  /* date of 
start of request tr = t + Th + Ti */
        { "ts", LOG_FMT_TERMSTATE, PR_MODE_TCP, LW_BYTES, NULL },/* termination 
state */
        { "tsc", LOG_FMT_TERMSTATE_CK, PR_MODE_TCP, LW_INIT, NULL },/* 
termination state */
 
@@ -187,7 +191,7 @@ static const struct logformat_type logformat_keywords[] = {
        { 0, 0, 0, 0, NULL }
 };
 
-char default_http_log_format[] = "%ci:%cp [%t] %ft %b/%s %Tq/%Tw/%Tc/%Tr/%Tt 
%ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"; // default 
format
+char default_http_log_format[] = "%ci:%cp [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Tt 
%ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"; // default 
format
 char clf_http_log_format[] = "%{+Q}o %{-Q}ci - - [%T] %r %ST %B \"\" \"\" %cp 
%ms %ft %b %s %Tq %Tw %Tc %Tr %Tt %tsc %ac %fc %bc %sc %rc %sq %bq %CC %CS %hrl 
%hsl";
 char default_tcp_log_format[] = "%ci:%cp [%t] %ft %b/%s %Tw/%Tc/%Tt %B %ts 
%ac/%fc/%bc/%sc/%rc %sq/%bq";
 char *log_format = NULL;
@@ -1305,6 +1309,7 @@ int build_logline(struct stream *s, char *dst, size_t 
maxsize, struct list *list
        char *ret;
        int iret;
        struct logformat_node *tmp;
+       struct timeval tv;
 
        /* FIXME: let's limit ourselves to frontend logging for now. */
 
@@ -1502,6 +1507,17 @@ int build_logline(struct stream *s, char *dst, size_t 
maxsize, struct list *list
                                last_isspace = 0;
                                break;
 
+                       case LOG_FMT_DATEREQ: // %tr
+                               /* Note that the timers are valid if we get 
here */
+                               tv_ms_add(&tv, &s->logs.accept_date, 
s->logs.t_idle >= 0 ? s->logs.t_idle + s->logs.t_handshake : 0);
+                               get_localtime(tv.tv_sec, &tm);
+                               ret = date2str_log(tmplog, &tm, &tv, dst + 
maxsize - tmplog);
+                               if (ret == NULL)
+                                       goto out;
+                               tmplog = ret;
+                               last_isspace = 0;
+                               break;
+
                        case LOG_FMT_TS: // %Ts
                                get_gmtime(s->logs.accept_date.tv_sec, &tm);
                                if (tmp->options & LOG_OPT_HEXA) {
@@ -1620,6 +1636,33 @@ int build_logline(struct stream *s, char *dst, size_t 
maxsize, struct list *list
                                last_isspace = 0;
                                break;
 
+                       case LOG_FMT_TH: // %Th
+                               ret = ltoa_o(s->logs.t_handshake,
+                                               tmplog, dst + maxsize - tmplog);
+                               if (ret == NULL)
+                                       goto out;
+                               tmplog = ret;
+                               last_isspace = 0;
+                               break;
+
+                       case LOG_FMT_TI: // %Ti
+                               ret = ltoa_o(s->logs.t_idle,
+                                               tmplog, dst + maxsize - tmplog);
+                               if (ret == NULL)
+                                       goto out;
+                               tmplog = ret;
+                               last_isspace = 0;
+                               break;
+
+                       case LOG_FMT_TReq: // %TR
+                               ret = ltoa_o((t_request >= 0) ? t_request - 
s->logs.t_idle - s->logs.t_handshake : -1,
+                                            tmplog, dst + maxsize - tmplog);
+                               if (ret == NULL)
+                                       goto out;
+                               tmplog = ret;
+                               last_isspace = 0;
+                               break;
+
                        case LOG_FMT_TQ: // %Tq
                                ret = ltoa_o(t_request, tmplog, dst + maxsize - 
tmplog);
                                if (ret == NULL)
diff --git a/src/proto_http.c b/src/proto_http.c
index 3e18a6c..c8df1a4 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -2455,6 +2455,12 @@ int http_wait_for_request(struct stream *s, struct 
channel *req, int an_bit)
         * data later, which is much more complicated.
         */
        if (buffer_not_empty(req->buf) && msg->msg_state < HTTP_MSG_ERROR) {
+
+               /* This point is executed when some data is avalaible for 
analysis,
+                * so we log the end of the idle time. */
+               if (s->logs.t_idle == -1)
+                       s->logs.t_idle = tv_ms_elapsed(&s->logs.tv_accept, 
&now) - s->logs.t_handshake;
+
                if (txn->flags & TX_NOT_FIRST) {
                        if (unlikely(!channel_is_rewritable(req))) {
                                if (req->flags & 
(CF_SHUTW|CF_SHUTW_NOW|CF_WRITE_ERROR|CF_WRITE_TIMEOUT))
@@ -5026,6 +5032,8 @@ void http_end_txn_clean_session(struct stream *s)
 
        s->logs.accept_date = date; /* user-visible date for logging */
        s->logs.tv_accept = now;  /* corrected date for internal use */
+       s->logs.t_handshake = 0; /* There are no handshake in keep alive 
connection. */
+       s->logs.t_idle = -1;
        tv_zero(&s->logs.tv_request);
        s->logs.t_queue = -1;
        s->logs.t_connect = -1;
diff --git a/src/stream.c b/src/stream.c
index 3eb5265..b1d7936 100644
--- a/src/stream.c
+++ b/src/stream.c
@@ -93,6 +93,10 @@ struct stream *stream_new(struct session *sess, struct task 
*t, enum obj_type *o
        s->logs.level = 0;
        s->logs.accept_date = sess->accept_date; /* user-visible date for 
logging */
        s->logs.tv_accept = sess->tv_accept;   /* corrected date for internal 
use */
+       /* This function is called just after the handshake, so the handshake 
duration is
+        * between the accept time and now. */
+       s->logs.t_handshake = tv_ms_elapsed(&sess->tv_accept, &now);
+       s->logs.t_idle = -1;
        tv_zero(&s->logs.tv_request);
        s->logs.t_queue = -1;
        s->logs.t_connect = -1;

Re: Reflexions around timers in the logs

Reply via email to