Hi guys, On Thu, Feb 20, 2014 at 07:49:01PM +0100, Lukas Tribus wrote: > Hi, > > > I suggest you try this an different box, not virtualized or on a different > > hypervisor as well. I still think what you are experiencing is time jump > > related. > > > Compile the test program from this bugreport: > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=186393 > > and let it run for a few minutes. Post the results.
Or you can also use this old script I used many years ago to detect such issues : #!/bin/bash old=$(date +%s); while : ; do new=$(date +%s); if [ $new -lt $old ]; then echo "Time jumps backwards : $old => $new" elif [ $new -gt $((old+1)) ]; then echo "Time jumps forwards : $old => $new" fi old=$new done By the way, time jumps are already dealt with in haproxy. It detects delays larger than the poll time and time shifts in the past, then compensates for them. Initially it was done because of the unsynced clock on AMD Opterons, but it revealed very useful when running on VMs. So in practice, its internal clock (the "now" and "now_ms" variables) are monotonic and cannot go back. Also, the logs are basedon this internal virtual time, and I'm very worried about the fact that it logged cR after 31 ms since both use the same clock. I just checked, and the two only situations where we may report a 408 are - when detecting a timeout waiting for a request ; - when detecting a timeout waiting for a body in "balance url_param check-post" But only the former will return "cR". So there's no risk of inheritating this from somewhere else. The condition to enter this block is the following (proto_http.c:2486) : if (req->flags & CF_READ_TIMEOUT || tick_is_expired(req->analyse_exp, now_ms)) So I'm seeing two possibilities : - the CF_READ_TIMEOUT flag was present on the channel - the http-request delay was really expired. Since the logs showed that the delay was very short, either it's the first case, or we're facing a case there analyse_exp is not properly set. This expiration is set from "timeout http-request" or from "timeout http-keep-alive" when processing the second and next requests. So here I don't think it can be the issue. Thus in my opinion, the only remaining possibility is that the CF_READ_TIMEOUT flag is set. It is normally set if the read timeout is reached. So here it will be "timeout client". But that doesn't seem to make much sense, especially if you manage to reproduce it easily. One thing I'm noticing is that this flag is not cleared between two consecutive requests over the same connection. So we could imagine a scenario where a read timeout is detected on the first request, but an event arrives just in time, at the exact same moment, resulting in the data still being processed while the flag is set. The request completes and reinitializes to handle the next request over the connection, but the flag is not cleared. Thus the second request would be the victim of this inherited timeout. It seems a bit far-fetched, but it could explain what you're seeing. Could you please add the following patch to proto_http.c (1.5.22) ? It will log in debug level a few flags and timers to try to better find what is happeneing when this happens. You'll need that your syslog deamon logs debug level entries, maybe they're stored in a differen file (eg: /var/log/debug). Otherwise you can change the level LOG_DEBUG for something else such as LOG_INFO. Warning, I copy pasted both the patches below, so you'll have to copy the lines or they won't apply due to mangled spaces. --- a/src/proto_http.c +++ b/src/proto_http.c @@ -2484,6 +2484,10 @@ int http_wait_for_request(struct session *s, struct channel *req, int an_bit) /* 3: has the read timeout expired ? */ else if (req->flags & CF_READ_TIMEOUT || tick_is_expired(req->analyse_exp, now_ms)) { + send_log(s->fe, LOG_DEBUG, + "Timeout detected: fe=%s s->flags=%08x txn->flags=%08x req->flags=%08x msg->flags=%08x now_ms=%u req->analyse_exp=%u (%d)\n", + s->fe->id, s->flags, txn->flags, req->flags, msg->flags, now_ms, req->analyse_exp, req->analyse_exp - now_ms); + if (!(s->flags & SN_ERR_MASK)) s->flags |= SN_ERR_CLITO; And if you're willing to do a second test, please apply this one : --- a/src/proto_http.c +++ b/src/proto_http.c @@ -4444,6 +4452,7 @@ void http_end_txn_clean_session(struct session *s) s->req->cons->exp = TICK_ETERNITY; s->req->cons->flags &= SI_FL_DONT_WAKE; /* we're in the context of process_session */ s->req->flags &= ~(CF_SHUTW|CF_SHUTW_NOW|CF_AUTO_CONNECT|CF_WRITE_ERROR|CF_STREAMER|CF_STREAMER_FAST|CF_NEVER_WAIT); + s->req->flags &= ~CF_READ_TIMEOUT; s->rep->flags &= ~(CF_SHUTR|CF_SHUTR_NOW|CF_READ_ATTACHED|CF_READ_ERROR|CF_READ_NOEXP|CF_STREAMER|CF_STREAMER_FAST|CF_WRITE_PARTIAL|CF_NEVER_WAIT); s->flags &= ~(SN_DIRECT|SN_ASSIGNED|SN_ADDR_SET|SN_BE_ASSIGNED|SN_FORCE_PRST|SN_IGNORE_PRST); s->flags &= ~(SN_CURR_SESS|SN_REDIRECTABLE); It removes the CF_READ_TIMEOUT flag between requests. If the problem disappears, it means the flag is inherited between requests (though we still have to figure when it can happen without being an issue). Best regards, Willy