On Tue, Aug 23, 2016 at 11:34:39AM +0200, Willy Tarreau wrote: > Maybe something between all of this could consist in this : > > - we create a new date field (eg: %tr) to note the date the request > starts to arrive (end of the idle period) > - we use a new field for the request-only time (eg: %TR) > - we change the default HTTP log format to use both. > > That would give us %Tq = %Th + %Ti + %TR, and %tr = %t + %Th + %Ti. > > This way people who use "option httplog" will suddenly have much more > exploitable logs without changing anything, as the dates will now reflect > the start of receipt of requests and not the end of last response, and the > other ones will not be harmed by any change.
That's what I've implemented in the attached patch (I didn't update the doc). It works fine here. I'd have preferred to have the date of complete request but it would change the semantics of the logs for existing setups, while this one respects it. In the log below, the connection was established exactly at 12:06:00, the first request was started at 12:06:04.941 and completed at 12:06:06.509. The second request was started at 12:06:10.252 and took 922ms to be typed. I'm seeing that the total time still includes the handshake and idle time which is not fun at all, so we'd basically need a different total time here as well. Aug 23 12:06:06 haproxy[5688]: 127.0.0.1:47074 [23/Aug/2016:12:06:04.941] test test/test 1568/0/0/0/5579 200 145 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1" Aug 23 12:06:11 haproxy[5688]: 127.0.0.1:47074 [23/Aug/2016:12:06:10.252] test test/test 922/0/0/0/4665 200 145 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1" In short we're adding Th, Ti and TR below in addition to the already existing Tq, and we'd need to have Tsomething to represent the active time of the request (Ta ?) : first request 2nd request <---------------------------------><-------------- ... t tr t tr ... ---|----|----|----|----|----|----|----|----|-- Th Ti TR Tw Tc Tr Td Ti ... <---- Tq -----> <--------------- Tt --------------> <----- Tsomething ------> Hoping I'm not confusing too many people :-) Willy
diff --git a/doc/configuration.txt b/doc/configuration.txt index 3bb70a4..61903f0 100644 --- a/doc/configuration.txt +++ b/doc/configuration.txt @@ -14687,8 +14687,8 @@ Detailed fields description : a full HTTP request, not counting data. It can be "-1" if the connection was aborted before a complete request could be received. It should always be very small because a request generally fits in one single packet. Large - times here generally indicate network trouble between the client and - haproxy. See "Timers" below for more details. + times here generally indicate network issues between the client and haproxy + or requests being typed by hand. See "Timers" below for more details. - "Tw" is the total time in milliseconds spent waiting in the various queues. It can be "-1" if the connection was aborted before reaching the queue. @@ -14926,7 +14926,9 @@ Please refer to the table below for currently defined variables : | | %Tc | Tc | numeric | | | %Td | Td = Tt - (Tq + Tw + Tc + Tr) | numeric | | | %Tl | local_date_time | date | - | H | %Tq | Tq | numeric | + | | %Th | connexion handshake time (SSL and proxy proto)| numeric | + | H | %Ti | idle time before the HTTP request | numeric | + | H | %Tq | time to receive the full request | numeric | | H | %Tr | Tr | numeric | | | %Ts | timestamp | numeric | | | %Tt | Tt | numeric | @@ -15082,13 +15084,33 @@ Timers provide a great help in troubleshooting network problems. All values are reported in milliseconds (ms). These timers should be used in conjunction with the session termination flags. In TCP mode with "option tcplog" set on the frontend, 3 control points are reported under the form "Tw/Tc/Tt", and in HTTP -mode, 5 control points are reported under the form "Tq/Tw/Tc/Tr/Tt" : +mode, 5 control points are reported under the form "Tq/Tw/Tc/Tr/Tt". In +addition, two other measures are provided, "Th" and "Ti". + + - Th: total time to accept tcp connection and execute handshakes for low level + protocols. Currently, these protocoles are proxy-protocol and SSL. This may + only happen once during the whole connection's lifetime. A large time here + may indicate that the client only pre-established the connection without + speaking, that it is experiencing network issues preventing it from + completing a handshake in a reasonable time (eg: MTU issues), or that an + SSL handshake was very expensive to compute. + + - Ti: is the idle time before the HTTP request (HTTP mode only). This timer + counts between the end of the handshakes and the first byte of the HTTP + request. When dealing with a second request in keep-alive mode, it starts + to count after the end of the transmission the previous response. Some + browsers pre-establish connections to a server in order to reduce the + latency of a future request, and keep them pending until they need it. This + delay will be reported as the idle time. A value of -1 indicates that + nothing was received on the connection. - Tq: total time to get the client request (HTTP mode only). It's the time - elapsed between the moment the client connection was accepted and the - moment the proxy received the last HTTP header. The value "-1" indicates - that the end of headers (empty line) has never been seen. This happens when - the client closes prematurely or times out. + elapsed between the first bytes received and the moment the proxy received + the empty line marking the end of the HTTP headers. The value "-1" + indicates that the end of headers has never been seen. This happens when + the client closes prematurely or times out. This time is usually very short + since most requests fit in a single packet. A large time may indicate a + request typed by hand during a test. - Tw: total time spent in the queues waiting for a connection slot. It accounts for backend queue as well as the server queues, and depends on the @@ -15116,15 +15138,15 @@ mode, 5 control points are reported under the form "Tq/Tw/Tc/Tr/Tt" : - Tt: total session duration time, between the moment the proxy accepted it and the moment both ends were closed. The exception is when the "logasap" - option is specified. In this case, it only equals (Tq+Tw+Tc+Tr), and is - prefixed with a '+' sign. From this field, we can deduce "Td", the data + option is specified. In this case, it only equals (Th+Ti+Tq+Tw+Tc+Tr), and + is prefixed with a '+' sign. From this field, we can deduce "Td", the data transmission time, by subtracting other timers when valid : - Td = Tt - (Tq + Tw + Tc + Tr) + Td = Tt - (Th + Ti + Tq + Tw + Tc + Tr) Timers with "-1" values have to be excluded from this equation. In TCP - mode, "Tq" and "Tr" have to be excluded too. Note that "Tt" can never be - negative. + mode, "Ti", "Tq" and "Tr" have to be excluded too. Note that "Tt" can never + be negative. These timers provide precious indications on trouble causes. Since the TCP protocol defines retransmit delays of 3, 6, 12... seconds, we know for sure @@ -15135,17 +15157,17 @@ session has been aborted on timeout. Most common cases : - - If "Tq" is close to 3000, a packet has probably been lost between the - client and the proxy. This is very rare on local networks but might happen - when clients are on far remote networks and send large requests. It may - happen that values larger than usual appear here without any network cause. - Sometimes, during an attack or just after a resource starvation has ended, - haproxy may accept thousands of connections in a few milliseconds. The time - spent accepting these connections will inevitably slightly delay processing - of other connections, and it can happen that request times in the order of - a few tens of milliseconds are measured after a few thousands of new - connections have been accepted at once. Using one of the keep-alive modes - may display larger request times since "Tq" also measures the time spent + - If "Th" or "Ti" are close to 3000, a packet has probably been lost between + the client and the proxy. This is very rare on local networks but might + happen when clients are on far remote networks and send large requests. It + may happen that values larger than usual appear here without any network + cause. Sometimes, during an attack or just after a resource starvation has + ended, haproxy may accept thousands of connections in a few milliseconds. + The time spent accepting these connections will inevitably slightly delay + processing of other connections, and it can happen that request times in the + order of a few tens of milliseconds are measured after a few thousands of + new connections have been accepted at once. Using one of the keep-alive + modes may display larger idle times since "Ti" measures the time spent waiting for additional requests. - If "Tc" is close to 3000, a packet has probably been lost between the diff --git a/include/types/log.h b/include/types/log.h index 25d872e..f3543f4 100644 --- a/include/types/log.h +++ b/include/types/log.h @@ -66,6 +66,7 @@ enum { LOG_FMT_LOGCNT, LOG_FMT_PID, LOG_FMT_DATE, + LOG_FMT_DATEREQ, LOG_FMT_DATEGMT, LOG_FMT_DATELOCAL, LOG_FMT_TS, @@ -77,10 +78,13 @@ enum { LOG_FMT_BYTES, LOG_FMT_BYTES_UP, LOG_FMT_T, + LOG_FMT_TH, + LOG_FMT_TI, LOG_FMT_TQ, LOG_FMT_TW, LOG_FMT_TC, LOG_FMT_TR, + LOG_FMT_TReq, LOG_FMT_TD, LOG_FMT_TT, LOG_FMT_STATUS, diff --git a/include/types/stream.h b/include/types/stream.h index 8687726..17e74b8 100644 --- a/include/types/stream.h +++ b/include/types/stream.h @@ -97,6 +97,8 @@ struct strm_logs { int level; /* log level to force + 1 if > 0, -1 = no log */ struct timeval accept_date; /* date of the stream's accept() in user date */ struct timeval tv_accept; /* date of the stream's accept() in internal date (monotonic) */ + long t_handshake; /* hanshake duration, -1 if never occurs */ + long t_idle; /* idle duration, -1 if never occurs */ struct timeval tv_request; /* date the request arrives, {0,0} if never occurs */ long t_queue; /* delay before the stream gets out of the connect queue, -1 if never occurs */ long t_connect; /* delay before the connect() to the server succeeds, -1 if never occurs */ diff --git a/src/log.c b/src/log.c index 4f4ede9..63567c3 100644 --- a/src/log.c +++ b/src/log.c @@ -125,8 +125,11 @@ static const struct logformat_type logformat_keywords[] = { { "ST", LOG_FMT_STATUS, PR_MODE_TCP, LW_RESP, NULL }, /* status code */ { "T", LOG_FMT_DATEGMT, PR_MODE_TCP, LW_INIT, NULL }, /* date GMT */ { "Tc", LOG_FMT_TC, PR_MODE_TCP, LW_BYTES, NULL }, /* Tc */ + { "Th", LOG_FMT_TH, PR_MODE_TCP, LW_BYTES, NULL }, /* Time handshake */ + { "Ti", LOG_FMT_TI, PR_MODE_HTTP, LW_BYTES, NULL }, /* Time idle */ + { "TR", LOG_FMT_TReq, PR_MODE_HTTP, LW_BYTES, NULL }, /* Time to receive a valid request */ { "Tl", LOG_FMT_DATELOCAL, PR_MODE_TCP, LW_INIT, NULL }, /* date local timezone */ - { "Tq", LOG_FMT_TQ, PR_MODE_HTTP, LW_BYTES, NULL }, /* Tq */ + { "Tq", LOG_FMT_TQ, PR_MODE_HTTP, LW_BYTES, NULL }, /* Tq=Th+Ti+TR */ { "Tr", LOG_FMT_TR, PR_MODE_HTTP, LW_BYTES, NULL }, /* Tr */ { "Td", LOG_FMT_TD, PR_MODE_TCP, LW_BYTES, NULL }, /* Td = Tt - (Tq + Tw + Tc + Tr) */ { "Ts", LOG_FMT_TS, PR_MODE_TCP, LW_INIT, NULL }, /* timestamp GMT */ @@ -169,6 +172,7 @@ static const struct logformat_type logformat_keywords[] = { { "sslc", LOG_FMT_SSL_CIPHER, PR_MODE_TCP, LW_XPRT, NULL }, /* client-side SSL ciphers */ { "sslv", LOG_FMT_SSL_VERSION, PR_MODE_TCP, LW_XPRT, NULL }, /* client-side SSL protocol version */ { "t", LOG_FMT_DATE, PR_MODE_TCP, LW_INIT, NULL }, /* date */ + { "tr", LOG_FMT_DATEREQ, PR_MODE_HTTP, LW_BYTES, NULL }, /* date of start of request tr = t + Th + Ti */ { "ts", LOG_FMT_TERMSTATE, PR_MODE_TCP, LW_BYTES, NULL },/* termination state */ { "tsc", LOG_FMT_TERMSTATE_CK, PR_MODE_TCP, LW_INIT, NULL },/* termination state */ @@ -187,7 +191,7 @@ static const struct logformat_type logformat_keywords[] = { { 0, 0, 0, 0, NULL } }; -char default_http_log_format[] = "%ci:%cp [%t] %ft %b/%s %Tq/%Tw/%Tc/%Tr/%Tt %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"; // default format +char default_http_log_format[] = "%ci:%cp [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Tt %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"; // default format char clf_http_log_format[] = "%{+Q}o %{-Q}ci - - [%T] %r %ST %B \"\" \"\" %cp %ms %ft %b %s %Tq %Tw %Tc %Tr %Tt %tsc %ac %fc %bc %sc %rc %sq %bq %CC %CS %hrl %hsl"; char default_tcp_log_format[] = "%ci:%cp [%t] %ft %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/%sc/%rc %sq/%bq"; char *log_format = NULL; @@ -1305,6 +1309,7 @@ int build_logline(struct stream *s, char *dst, size_t maxsize, struct list *list char *ret; int iret; struct logformat_node *tmp; + struct timeval tv; /* FIXME: let's limit ourselves to frontend logging for now. */ @@ -1502,6 +1507,17 @@ int build_logline(struct stream *s, char *dst, size_t maxsize, struct list *list last_isspace = 0; break; + case LOG_FMT_DATEREQ: // %tr + /* Note that the timers are valid if we get here */ + tv_ms_add(&tv, &s->logs.accept_date, s->logs.t_idle >= 0 ? s->logs.t_idle + s->logs.t_handshake : 0); + get_localtime(tv.tv_sec, &tm); + ret = date2str_log(tmplog, &tm, &tv, dst + maxsize - tmplog); + if (ret == NULL) + goto out; + tmplog = ret; + last_isspace = 0; + break; + case LOG_FMT_TS: // %Ts get_gmtime(s->logs.accept_date.tv_sec, &tm); if (tmp->options & LOG_OPT_HEXA) { @@ -1620,6 +1636,33 @@ int build_logline(struct stream *s, char *dst, size_t maxsize, struct list *list last_isspace = 0; break; + case LOG_FMT_TH: // %Th + ret = ltoa_o(s->logs.t_handshake, + tmplog, dst + maxsize - tmplog); + if (ret == NULL) + goto out; + tmplog = ret; + last_isspace = 0; + break; + + case LOG_FMT_TI: // %Ti + ret = ltoa_o(s->logs.t_idle, + tmplog, dst + maxsize - tmplog); + if (ret == NULL) + goto out; + tmplog = ret; + last_isspace = 0; + break; + + case LOG_FMT_TReq: // %TR + ret = ltoa_o((t_request >= 0) ? t_request - s->logs.t_idle - s->logs.t_handshake : -1, + tmplog, dst + maxsize - tmplog); + if (ret == NULL) + goto out; + tmplog = ret; + last_isspace = 0; + break; + case LOG_FMT_TQ: // %Tq ret = ltoa_o(t_request, tmplog, dst + maxsize - tmplog); if (ret == NULL) diff --git a/src/proto_http.c b/src/proto_http.c index 3e18a6c..c8df1a4 100644 --- a/src/proto_http.c +++ b/src/proto_http.c @@ -2455,6 +2455,12 @@ int http_wait_for_request(struct stream *s, struct channel *req, int an_bit) * data later, which is much more complicated. */ if (buffer_not_empty(req->buf) && msg->msg_state < HTTP_MSG_ERROR) { + + /* This point is executed when some data is avalaible for analysis, + * so we log the end of the idle time. */ + if (s->logs.t_idle == -1) + s->logs.t_idle = tv_ms_elapsed(&s->logs.tv_accept, &now) - s->logs.t_handshake; + if (txn->flags & TX_NOT_FIRST) { if (unlikely(!channel_is_rewritable(req))) { if (req->flags & (CF_SHUTW|CF_SHUTW_NOW|CF_WRITE_ERROR|CF_WRITE_TIMEOUT)) @@ -5026,6 +5032,8 @@ void http_end_txn_clean_session(struct stream *s) s->logs.accept_date = date; /* user-visible date for logging */ s->logs.tv_accept = now; /* corrected date for internal use */ + s->logs.t_handshake = 0; /* There are no handshake in keep alive connection. */ + s->logs.t_idle = -1; tv_zero(&s->logs.tv_request); s->logs.t_queue = -1; s->logs.t_connect = -1; diff --git a/src/stream.c b/src/stream.c index 3eb5265..b1d7936 100644 --- a/src/stream.c +++ b/src/stream.c @@ -93,6 +93,10 @@ struct stream *stream_new(struct session *sess, struct task *t, enum obj_type *o s->logs.level = 0; s->logs.accept_date = sess->accept_date; /* user-visible date for logging */ s->logs.tv_accept = sess->tv_accept; /* corrected date for internal use */ + /* This function is called just after the handshake, so the handshake duration is + * between the accept time and now. */ + s->logs.t_handshake = tv_ms_elapsed(&sess->tv_accept, &now); + s->logs.t_idle = -1; tv_zero(&s->logs.tv_request); s->logs.t_queue = -1; s->logs.t_connect = -1;