Re: [PATCH] MEDIUM/RFC: Implement time-based server latency metrics

Krishna Kumar (Engineering) Thu, 22 Dec 2016 03:20:49 -0800

Thanks Willy for your detailed review, especially some design related points
that I was not aware of. I will go through these and respond accordingly.


Regards,
- Krishna


On Thu, Dec 22, 2016 at 4:23 PM, Willy Tarreau <w...@1wt.eu> wrote:

> Hi Krishna,
>
> On Thu, Dec 22, 2016 at 09:41:49AM +0530, Krishna Kumar (Engineering)
> wrote:
> > We have found that the current mechanism of qtime, ctime, rtime, and
> ttime
> > based on last 1024 requests is not the most suitable to debug/visualize
> > latency issues with servers, especially if they happen to last a very
> short
> > time. For live dashboards showing server timings, we found an additional
> > last-'n' seconds metrics useful. The logs could also be parsed to derive
> > these
> > values, but suffers from delays at high volume, requiring higher
> processing
> > power and enabling logs.
> >
> > The 'last-n' seconds metrics per server/backend can be configured as
> follows
> > in the HAProxy configuration file:
> > backend backend-1
> > stats-period 32
> > ...
> >
> > To retrieve these stats at the CLI (in addition to existing metrics),
> run:
> > echo show stat-duration time 3 | socat /var/run/admin.sock stdio
> >
> > These are also available on the GUI.
> >
> > The justification for this patch are:
> > 1. Allows to capture spikes for a server during a short period. This
> helps
> >    having dashboards that show server response times every few seconds
> (e.g.
> >    every 1 second), so as to be able to chart it across timelines.
> > 2. Be able to get an average across different time intervals, e.g.  the
> >    configuration file may specify to save the last 32 seconds, but the
> cli
> >    interface can request for average across any interval upto 32 seconds.
> > E.g.
> >    the following command prints the existing metrics appended by the time
> >    based ones for the last 1 second:
> > echo show stat-duration time 1 | socat /var/run/admin.sock stdio
> >    Running the following existing command appends the time-based metric
> > values
> >    based on the time period configured in the configuration file per
> >    backend/server:
> > echo show stat | socat /var/run/admin.sock stdio
> > 3. Option per backend for configuring the server stat's time interval,
> and.
> >    no API breakage to stats (new metrics are added at end of line).
> >
> > Please review, any feedback on the code/usability/extensibility is very
> much
> > appreciated.
>
> First, thanks for this work. I'm having several concerns and comments
> however
> about it.
>
> The first one is that the amount of storage is overkill if the output can
> only emit an average over a few periods. I mean, the purpose of stats is
> to emit what we know internally. Some people might want to see historgrams,
> and while we have everything internally with your patch, it's not possible
> to produce them.
>
> For this reason I think we should proceed differently and always emit these
> stats over a few hard-coded periods. You proved that they don't take that
> much space, and I think it would make sense probably to emit them over a
> small series of power of 4 seconds : 1s, 4s, 16s, 64s. That's quite cheap
> to store and easy to compute because it's not needed anymore to store all
> individual values, you can cascade them while filling a bucket.
>
> And if you go down that route then there's no need anymore for adding yet
> another setting in the configuration, it will be done by default. Another
> point regarding this setting "stats-period" is that you currently do not
> support the "defaults" section while I guess almost all users will want to
> have it there. That's another reason for keeping it hard-coded.
>
> Now some general comments and guidance on the code itself :
>
> > diff --git a/include/proto/stats.h b/include/proto/stats.h
> > index ac893b8..f33ceb1 100644
> > --- a/include/proto/stats.h
> > +++ b/include/proto/stats.h
> > @@ -95,7 +95,8 @@ int stats_fill_fe_stats(struct proxy *px, struct field
> *stats, int len);
> >  int stats_fill_li_stats(struct proxy *px, struct listener *l, int flags,
> >                          struct field *stats, int len);
> >  int stats_fill_sv_stats(struct proxy *px, struct server *sv, int flags,
> > -                        struct field *stats, int len);
> > +                        struct field *stats, int len,
> > +                        struct stream_interface *si);
>
> Please avoid re-introducing the stream_interface in stats, we're trying
> hard
> to abstract it. From what I'm seeing you only need it to know if there's a
> different period. Better pass the period directly in argument from the
> caller.
>
> > diff --git a/include/types/counters.h b/include/types/counters.h
> > index 06ce617..a33e01f 100644
> > --- a/include/types/counters.h
> > +++ b/include/types/counters.h
> > @@ -105,6 +105,41 @@ struct be_counters {
> >       } p;                                    /* protocol-specific stats
> */
> >  };
> >
> > +/*
> > + * Struct time-stat-counters:
> > + *
> > + * Implement a last-n seconds based counters for backend/server's
> q/c/r/t
> > + * times. Total memory consumption on x86_64 to maintain all 4 event
> time
> > + * values per backend/server is: 16 + #seconds * 56 bytes.
> > + *   for 1 second:   72 bytes
> > + *   for 16 seconds: 912 bytes
> > + *   for 32 seconds: 1808 bytes
> > + *   for 64 second:  3600 bytes
> > + */
> > +struct __tstat_counters {
>
> Avoid struct names using '__' as a prefix. In general we use such a prefix
> for functions that people should use with extreme caution (eg: functions
> which do not check for nulls or do not lock). Here there's no reason for
> this.
>
> > +     unsigned int timestamp;         /* time when this entry was added
> */
> > +     long t_queue;                   /* sum of queue times */
> > +     long t_connect;                 /* sum of times to connect to
> server */
> > +     long t_data;                    /* sum of times to get response */
> > +     long t_close;                   /* sum of session times */
> > +     int t_queue_counter;            /* total number of 'queue' entries
> */
> > +     int t_connect_counter;          /* total number of 'connect'
> entries */
> > +     int t_data_counter;             /* total number of 'response'
> entries */
> > +     int t_close_counter;            /* total number of 'session'
> entries */
> > +};
> > +
> > +struct tstat_counters {
> > +     int max_entries;                /* max number of 'times' entries */
> > +     struct __tstat_counters *times; /* Stats for every second */
> > +};
>
> (...)
> > diff --git a/include/types/proxy.h b/include/types/proxy.h
> > index 3bf5a4d..97c5732 100644
> > --- a/include/types/proxy.h
> > +++ b/include/types/proxy.h
> > @@ -365,6 +365,8 @@ struct proxy {
> >       struct be_counters be_counters;         /* backend statistics
> counters */
> >       struct fe_counters fe_counters;         /* frontend statistics
> counters */
> >
> > +     struct tstat_counters be_tstat_counters; /* last 'n' time counters
> for backend */
> > +
>
> Given that you use it both in the backend and in the server, you should
> probably move that directly to the struct be_counters. That will also get
> rid of your intermediary tstats_counters struct.
>
> > diff --git a/src/cfgparse.c b/src/cfgparse.c
> > index 3ed2c22..b212ee6 100644
> > --- a/src/cfgparse.c
> > +++ b/src/cfgparse.c
> > @@ -2551,6 +2551,50 @@ static void free_email_alert(struct proxy *p)
> >       p->email_alert.myhostname = NULL;
> >  }
> >
> > +/*
> > + * Parse "stats-period" option in backend specifying how many
> > + * seconds of stats to keep per server/backend.
> > + */
> > +int parse_stats_period(const char *file, int linenum, char **args,
> > +                    struct proxy *curproxy, struct proxy *defproxy)
>
> Better register a new keyword from stats.c than further increasing
> cfgparse.c.
> We're trying to deflate that huge beast by providing a more decentralized
> parsing system. Please check "cfg_register_keywords()" for this.
>
> > diff --git a/src/haproxy.c b/src/haproxy.c
> > index 239db80..57536ea 100644
> > --- a/src/haproxy.c
> > +++ b/src/haproxy.c
> > @@ -860,6 +860,12 @@ static void init(int argc, char **argv)
> >          }
> >  #endif
> >
> > +     /* Enable time stats counters for those backends that asked for it
> */
> > +     if (enable_be_tstat_counters()) {
> > +             Alert("Unable to allocate backend time-counter stats.\n");
> > +             exit(1);
> > +     }
> > +
>
> At some point all proxies are started (start_proxies()). This should be
> done at the same time instead of having it own loop.
>
> > +/* Initialize data structure for a single server/backend */
> > +int tstat_counters_init(struct tstat_counters *tstat, int num_entries)
>
> Please always indicate the expected return values. That saves a lot of
> time during debugging sessions, otherwise we always have to read the whole
> function to know what 1 or 0 means for example. Here I'm seeing that 0 is
> OK and non-zero indicates a failure.
>
> > +{
> > +     tstat->max_entries = num_entries;
> > +     if (!num_entries) {
> > +             tstat->times = NULL;
> > +             return 0;
> > +     }
> > +
> > +     tstat->times = calloc(num_entries, sizeof(*tstat->times));
> > +     return tstat->times == NULL;
> > +}
> > +
> > +/* Wrapper function for tstat_counters_init() for all backends */
> > +int enable_be_tstat_counters(void)
>
> Same here.
>
> (...)
> > diff --git a/src/stats.c b/src/stats.c
> > index 976496e..571ec11 100644
> > --- a/src/stats.c
> > +++ b/src/stats.c
> > @@ -216,6 +216,10 @@ const char *stat_field_names[ST_F_TOTAL_FIELDS] = {
> >       [ST_F_INTERCEPTED]    = "intercepted",
> >       [ST_F_DCON]           = "dcon",
> >       [ST_F_DSES]           = "dses",
> > +     [ST_F_QTIME_PERIOD]   = "qt_period",
> > +     [ST_F_CTIME_PERIOD]   = "ct_period",
> > +     [ST_F_RTIME_PERIOD]   = "rt_period",
> > +     [ST_F_TTIME_PERIOD]   = "tt_period",
>
> I think that if we go for the multiple periods output, it would make sense
> to have the value directly in the name. For example, qt1, qt4, qt16, qt64
> etc...
>
> >  };
> >
> >  /* one line of info */
> > @@ -385,7 +389,7 @@ static int stats_dump_fields_typed(struct chunk
> *out, const struct field *stats)
> >   * reserved for the checkbox is ST_SHOWADMIN is set in <flags>. Some
> extra info
> >   * are provided if ST_SHLGNDS is present in <flags>.
> >   */
> > -static int stats_dump_fields_html(struct chunk *out, const struct field
> *stats, unsigned int flags)
> > +static int stats_dump_fields_html(struct chunk *out, const struct field
> *stats, unsigned int flags, struct proxy *px)
>
> Please don't add the proxy here. We've tried hard to remove any dependency
> on internal information in the dump code so that all outputs can contain
> all info. That's what will make it possible to have some JSON output for
> example. The dump functions are just "output drivers" which translate the
> contents of the stats table. Here the only reason why you need this is to
> have the duration. Just add another metric in the stats which is the
> duration
> and an external aggregator will even be able to use it and report it. But
> as
> you see this is also eliminated if you produce a few values over different
> periods.
>
> (...)
>
> > +/*
> > + * Time stat entries have one of the following values:
> > + *   - 0 (never used, or was set with a value of 0. May or may not be
> valid)
> > + *   - -1 (time was not relevant at update, this entry is not used to
> > + *           calculate the average)
> > + *   - non-zero (used in the past, may be valid or invalid).
> > + * To determine whether the value is useful or not, we compare current
> > + * time to saved time. We cannot stop once we reach a time outside of
> the
> > + * interval, as intermediate entries may have old values and not be
> updated
> > + * if no updates took place at those times. This means going through the
> > + * entire array, but it is better to do extra work at (rare) retrieval
> times
> > + * (control plane), rather than updating unused entries at run time
> (data
> > + * plane). This leads to fast updates and slower retrievals.
> > + */
> > +static void get_tstat_counters(struct tstat_counters *tstat,
> > +                            struct stream_interface *si,
> > +                            unsigned int now, struct field *qtimep,
> > +                            struct field *ctimep, struct field *rtimep,
> > +                            struct field *ttimep)
> > +{
> > +     int i;
> > +     int64_t qtime = 0, ctime = 0, rtime = 0, ttime = 0;
> > +     int64_t qcounter = 0, ccounter = 0, rcounter = 0, tcounter = 0;
> > +     int duration = tstat->max_entries;
> > +
> > +     /*
> > +      * Return zero for those backends that did not ask for time based
> > +      * stats.
> > +      */
> > +     if (!tstat->times) {
> > +             *qtimep = mkf_u32(FN_AVG, 0);
> > +             *ctimep = mkf_u32(FN_AVG, 0);
> > +             *rtimep = mkf_u32(FN_AVG, 0);
> > +             *ttimep = mkf_u32(FN_AVG, 0);
> > +             return;
> > +     }
> > +
> > +     if (si && si->end) {
> > +             /*
> > +              * If user via CLI specified a specific time period over
> which
> > +              * to report stats, then retrieve that value instead of the
> > +              * default value specified in the haproxy configuration
> file.
> > +              */
> > +             struct appctx *appctx = objt_appctx(si->end);
> > +
> > +             if (appctx && (appctx->ctx.stats.flags & STAT_DURATION))
> > +                     duration = MAX(MIN(duration,
> > +                                        appctx->ctx.stats.duration), 0);
> > +     }
> > +
> > +     /* Sum all entries to calculate time window based average */
> > +     for (i = 0; i < tstat->max_entries; i++) {
> > +             if (!within_period(tstat->times[i].timestamp, now,
> duration)) {
> > +                     /* Outside the requested period, but cannot break
> out */
> > +                     continue;
> > +             }
> > +
> > +             /* Valid time period - increment counts/average */
> > +             qtime += tstat->times[i].t_queue;
> > +             ctime += tstat->times[i].t_connect;
> > +             rtime += tstat->times[i].t_data;
> > +             ttime += tstat->times[i].t_close;
> > +
> > +             qcounter += tstat->times[i].t_queue_counter;
> > +             ccounter += tstat->times[i].t_connect_counter;
> > +             rcounter += tstat->times[i].t_data_counter;
> > +             tcounter += tstat->times[i].t_close_counter;
> > +     }
> > +
> > +     /* Avoid divide-by-zero errors */
> > +     if (!qcounter)
> > +             qcounter = 1;
> > +     if (!ccounter)
> > +             ccounter = 1;
> > +     if (!rcounter)
> > +             rcounter = 1;
> > +     if (!tcounter)
> > +             tcounter = 1;
> > +
> > +     /* Convert average times to microsecond */
> > +     *qtimep = mkf_u32(FN_AVG, qtime / qcounter);
> > +     *ctimep = mkf_u32(FN_AVG, ctime / ccounter);
> > +     *rtimep = mkf_u32(FN_AVG, rtime / rcounter);
> > +     *ttimep = mkf_u32(FN_AVG, ttime / tcounter);
>
> Above I'd proceed differently. Since the stats supports the "empty" type
> (which is the default), I'd do it this way instead, which has both the
> benefit of avoiding the divide by zero and not to emit any value when
> there's none (instead of emitting zero) :
>
>         /* Convert average times to microsecond */
>         if (qcounter)
>                 *qtimep = mkf_u32(FN_AVG, qtime / qcounter);
>         if (ccounter)
>                 *ctimep = mkf_u32(FN_AVG, ctime / ccounter);
>         if (rcounter)
>                 *rtimep = mkf_u32(FN_AVG, rtime / rcounter);
>         if (tcounter)
>                 *ttimep = mkf_u32(FN_AVG, ttime / tcounter);
>
> > +/* Update time-stat counter values if enabled for this backend */
> > +void set_tstat_counters(struct tstat_counters *tstat,
> > +                     unsigned int now, int t_queue,
> > +                     int t_connect, int t_data, int t_close)
> > +{
> > +     if (tstat->times) {
> > +             unsigned int slot = now % tstat->max_entries;
> > +
> > +             /*
> > +              * Optimize for heavy load when many requests are made in
> > +              * the same second. During very low load, the following
> > +              * 'likely' will become false more often.
> > +              */
> > +             if (likely(now == tstat->times[slot].timestamp)) {
> (...)
>
> Here you can make the code slightly simpler by first checking if the period
> is different to reinitialize the slot to zero using a simple memset(), and
> then go to the usual increment. That removes a significant amount of logic.
>
> Cheers,
> Willy
>

Re: [PATCH] MEDIUM/RFC: Implement time-based server latency metrics

Reply via email to