Hi guys,

On Thu, Feb 20, 2014 at 07:49:01PM +0100, Lukas Tribus wrote:
> Hi,
> 
> > I suggest you try this an different box, not virtualized or on a different
> > hypervisor as well. I still think what you are experiencing is time jump
> > related.
> 
> 
> Compile the test program from this bugreport:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=186393
> 
> and let it run for a few minutes. Post the results.

Or you can also use this old script I used many years ago to detect such issues 
:

#!/bin/bash
old=$(date +%s);
while : ; do
  new=$(date +%s);
  if [ $new -lt $old ]; then
    echo "Time jumps backwards : $old => $new"
  elif [ $new -gt $((old+1)) ]; then
    echo "Time jumps forwards  : $old => $new"
  fi
  old=$new
done

By the way, time jumps are already dealt with in haproxy. It detects delays
larger than the poll time and time shifts in the past, then compensates for
them. Initially it was done because of the unsynced clock on AMD Opterons,
but it revealed very useful when running on VMs. So in practice, its internal
clock (the "now" and "now_ms" variables) are monotonic and cannot go back.

Also, the logs are basedon this internal virtual time, and I'm very worried
about the fact that it logged cR after 31 ms since both use the same clock.

I just checked, and the two only situations where we may report a 408 are
  - when detecting a timeout waiting for a request ;
  - when detecting a timeout waiting for a body in "balance url_param 
check-post"

But only the former will return "cR". So there's no risk of inheritating this
from somewhere else.

The condition to enter this block is the following (proto_http.c:2486) :

        if (req->flags & CF_READ_TIMEOUT || tick_is_expired(req->analyse_exp, 
now_ms))

So I'm seeing two possibilities :
  - the CF_READ_TIMEOUT flag was present on the channel
  - the http-request delay was really expired.

Since the logs showed that the delay was very short, either it's the first
case, or we're facing a case there analyse_exp is not properly set. This
expiration is set from "timeout http-request" or from "timeout http-keep-alive"
when processing the second and next requests. So here I don't think it can be
the issue.

Thus in my opinion, the only remaining possibility is that the CF_READ_TIMEOUT
flag is set. It is normally set if the read timeout is reached. So here it will
be "timeout client". But that doesn't seem to make much sense, especially if
you manage to reproduce it easily.

One thing I'm noticing is that this flag is not cleared between two consecutive
requests over the same connection. So we could imagine a scenario where a read
timeout is detected on the first request, but an event arrives just in time, at
the exact same moment, resulting in the data still being processed while the
flag is set. The request completes and reinitializes to handle the next request
over the connection, but the flag is not cleared. Thus the second request would
be the victim of this inherited timeout. It seems a bit far-fetched, but it
could explain what you're seeing.

Could you please add the following patch to proto_http.c (1.5.22) ? It will log
in debug level a few flags and timers to try to better find what is happeneing
when this happens. You'll need that your syslog deamon logs debug level entries,
maybe they're stored in a differen file (eg: /var/log/debug). Otherwise you can
change the level LOG_DEBUG for something else such as LOG_INFO. Warning, I copy
pasted both the patches below, so you'll have to copy the lines or they won't
apply due to mangled spaces.

--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -2484,6 +2484,10 @@ int http_wait_for_request(struct session *s, struct 
channel *req, int an_bit)
 
                /* 3: has the read timeout expired ? */
                else if (req->flags & CF_READ_TIMEOUT || 
tick_is_expired(req->analyse_exp, now_ms)) {
+                       send_log(s->fe, LOG_DEBUG,
+                                "Timeout detected: fe=%s s->flags=%08x 
txn->flags=%08x req->flags=%08x msg->flags=%08x now_ms=%u req->analyse_exp=%u 
(%d)\n",
+                                s->fe->id, s->flags, txn->flags, req->flags, 
msg->flags, now_ms, req->analyse_exp, req->analyse_exp - now_ms);
+
                        if (!(s->flags & SN_ERR_MASK))
                                s->flags |= SN_ERR_CLITO;
 

And if you're willing to do a second test, please apply this one :

--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -4444,6 +4452,7 @@ void http_end_txn_clean_session(struct session *s)
        s->req->cons->exp       = TICK_ETERNITY;
        s->req->cons->flags    &= SI_FL_DONT_WAKE; /* we're in the context of 
process_session */
        s->req->flags &= 
~(CF_SHUTW|CF_SHUTW_NOW|CF_AUTO_CONNECT|CF_WRITE_ERROR|CF_STREAMER|CF_STREAMER_FAST|CF_NEVER_WAIT);
+       s->req->flags &= ~CF_READ_TIMEOUT;
        s->rep->flags &= 
~(CF_SHUTR|CF_SHUTR_NOW|CF_READ_ATTACHED|CF_READ_ERROR|CF_READ_NOEXP|CF_STREAMER|CF_STREAMER_FAST|CF_WRITE_PARTIAL|CF_NEVER_WAIT);
        s->flags &= 
~(SN_DIRECT|SN_ASSIGNED|SN_ADDR_SET|SN_BE_ASSIGNED|SN_FORCE_PRST|SN_IGNORE_PRST);
        s->flags &= ~(SN_CURR_SESS|SN_REDIRECTABLE);

It removes the CF_READ_TIMEOUT flag between requests. If the problem
disappears, it means the flag is inherited between requests (though we still
have to figure when it can happen without being an issue).

Best regards,
Willy


Reply via email to