On Sat, Apr 25, 2020 at 08:10:40PM +0200, Rainer Jung wrote: > Patch available at > > home.apache.org/~rjung/patches/httpd-trunk-mon-snaps-v1_2.patch
Very nice! +1 from me. Does the times_per_thread logic still make any sense? It's always been wrong for Linux AFAICT so maybe can just be dropped. A minor nit, I think the snap_index should be read/written using apr_atomic_t since it's going to be accessed concurrently across threads? Regards, Joe > Some notes: > > - in order to use the data from mod_systemd (monitor hook), which runs in > the maim process, and also from mod_status, so from child processes, it > needs to sit in shared memory. Since most of the data is tightly coupled to > the scoreboard, I added the new cached data to the global part of the > scoreboard. > > - it was then IMHO a good fit to move the existing ap_get_sload(ap_sload_t > *ld) from server/util.c to server/scoreboard.c. > > - ap_sload_t is used as a collection of data which can be used to generate > averaged data from a pair of ap_sload_t structs. It was extended to also > contain the total request duration plus total usr and sys CPU and the > timestamp when the data was taken. So it should now contain all data from > which mod-systemd and mod_status currently like to display derived averaged > values. > > - busy and idle in ap_sload_t have been changed from int to float, because > they were already only used a percentages. IMHO it doesn't make sense to > express idle and busy as percentages based on a total of busy plus idle, > because that sum is not a meaningful total. The server can grow by creating > new processes and suddenly your busy percentage might shrink, although the > absolute number of busy threads increases. That's counter intuitive. So I > added the unused process slots to the total count, and we have three > counters that sum up to this total, busy, idle and dead. We need a better > name than "dead" for these unused process slots that might get used later. > "unused" is to close to "idle" and I don't have a good name yet. > > - the new ap_mon_snap_t contains a pointer to an ap_sload_t, six averaged > values generated from two ap_sload_t and a member that conatins the time > delta between those two ap_sload_t. The ap_sload_t referenced by > ap_mon_snap_t contains the data at time t1. During the next monitor run, new > t1 data will be collected and the previous t1 data will be used as t0 data > to generate the new averages. > > - the scoreboard contains two ap_mon_snap_t (plus the two ap_sload_t > referenced by them) to be able to update one of them without breaking access > by consumers to the other one. After the update the roles get switched. > > - both structs, ap_sload_t and ap_mon_snap_t are declared in httpd.h, > because ap_sload_t already had been there. t might be a better fit to move > them to scoreboard.h, but I'm not sure about that and I don't know if that > would be allowed by the compatibility rules. > > - also in httpd.h there are now three new function declarations. Two only > used by server/core.c as hook functions: > > int ap_scoreboard_monitor(apr_pool_t *p, server_rec *s); > void ap_scoreboard_child_init(apr_pool_t *p, server_rec *s); > > and one for public consumption: > > AP_DECLARE(void) ap_get_mon_snap(ap_mon_snap_t *ms); > > - mod_systemd and mod_status now call ap_get_mon_snap(&snap) to retrieve the > latest averaged data. mod_status still uses the scoreboard in addition to > retrieve up-to-date current scalar metrics. Small adjustments to the > mod_status output plus additions to mod_systemd notification messages. Tne > STATUS in the notification of mod_systemd now looks liek this: > > STATUS=Pct Idle workers 28.4; Pct Busy workers 10.6; Pct Dead workers 60.9; > Requests/sec: 5030; Bytes served/sec: 2.9MB/sec; Bytes served/request: 596 > B/req; Avg Duration(ms): 5.78; Avg Concurrency: 29.1; Cpu Pct: 40.5 > > - scoreboard.c changes: > > - take over ap_get_sload(ap_sload_t *sl) from server/util.c. > Added copied code from mod_status to that function to also add up the > request duration and the usr and sys CPU times. > > - ap_scoreboard_monitor() runs in the monitor hook. It calls > ap_get_sload() and then a static utility function calc_mon_data() to > calculate the new averages. > > - Minor mmn change not yet part of the patch. > > It compiles fine (maintainer mode) on RHEL 7 x86_64 and on Solaris 10 Sparc > and I did some tests with mod_status and mod_systemd. > > Regards, > > Rainer > > Am 24.04.2020 um 18:32 schrieb Rainer Jung: > > Am 24.04.2020 um 16:21 schrieb Joe Orton: > > > On Fri, Apr 24, 2020 at 12:17:19PM +0200, Rainer Jung wrote: > > > > Thinking further: I think it would make sense to have a module or core > > > > implement the monitor hook to generate that derived data (requests/sec, > > > > bytes/sec, durationMs/request, avgConcurrency) in the last > > > > monitor interval > > > > and to provide that data to consumers like mod_systemd or - new > > > > - mod_status > > > > - instead of the long term averages since start. It could > > > > probably be added > > > > to the code that already provides "sload". That way mod_status > > > > would also > > > > profit from the more precise average values (taken over the last monitor > > > > interval). > > > > > > I definitely like the patch, it has bothered me that the "per second" > > > stats are not very useful but wasn't sure how to make it better. > > > > > > This is also an interesting idea. > > > > > > So you would suggest having a new monitor hook which runs REALLY_LAST in > > > the order, calls ap_get_sload() and stores it in a global, and then we'd > > > have an ap_get_cached_sload() (or whatever) which gives you the cached > > > data from the last iteration? Or are you thinking of a more > > > sophisticated API which does the "diff" between intervals internally? > > > > Thanks for the positive feedback. > > > > The averaged metrics IMHO only make sense as cached data, updated in > > regular intervals and provided for use by various modules (probably only > > mod_systemd and mod_status). > > > > I would like to provide the already averaged data in a struct, each > > metric as a float or double. The bytes/request probably not already > > human readably scaled, because it makes its use less flexible. Since we > > already also have the absolute counters at that point, we can easily add > > them to the same struct as 32 or 64 bit counters and return a consistent > > set of data (five old values, five new values, five averages and two > > time stamps). [idle(32), busy(32), requests(64), bytes(64), > > duration(64); req/s, bytes/s, bytes/req, dur/s, dur/req]. So consumers > > needing a consistent view can get it. > > > > Even more so since the absolute metrics are currently not cheap to > > access. We collect all of them by iterating over the scoreboard and > > summing up. By adding them to the cached data, the consuming code could > > decide, whether such near-time data is good enough or it needs to > > acquire new and curent counters. For mod_systemd, cached data (10 second > > interval) might be OK. > > > > For some modules - like mod_status - cached averages are fine, but I > > think the counters should be correct for the point in time the status > > request was handled by the module. So the scoreboard statistics code in > > mod_status unfortunately would not go away.´, but the data quality for > > the averages would become better. > > > > Implementation wise I am thinking about adding > > > > ap_hook_monitor(mon_avg_monitor, NULL, NULL, ???); > > > > to server/util.c, which calculates the new averages and > > > > ap_get_mon_avg(ap_mon_avg_t *ma) > > > > which returns the four averages in a struct similar to the existing > > ap_get_loadavg() and ap_get_sload(). > > > > We might have a little hassle to make the statistics update > > atomic/thread-safe (eg. two instances of the internal data struct, so > > that we only need to make the switch between them after the new > > calculation atomic). > > > > About REALLY_LAST: why last? If other modules collect data via this API > > and wasn't to do it in the monitor hook as well, shouldn't run the > > caching of data REALLY_FIRST, so you get the new averages? > > > > I'll try to draft and test something along these lines later today. Fun > > stuff. And more comments are very welcome. > > > > And I like, that mod_status will profit by showing betrer averages as well. > > > > Regards, > > > > Rainer > > > > > > Am 23.04.2020 um 21:29 schrieb Rainer Jung: > > > > > Hi all, > > > > > > > > > > triggered by the new mod_systemd I drafted a patch to enhance the > > > > > monitoring data it provides during the monitor hook run. > > > > > > > > > > Currently it publishes important data, like idle and busy slots and > > > > > total request count, but also not so useful info like requests/second > > > > > and bytes/second as a long term average (since start). These two > > > > > figues > > > > > tend to become near constant after a longer time of operation. > > > > > > > > > > Since the monitor hook of the module always seems to run in the same > > > > > (parent) process, it is easy to remember the previous request and byte > > > > > count data and average only over the last monitor hook interval. This > > > > > should give more meaningful data. And is a change local to > > > > > mod_systemd. > > > > > > > > > > In addition we have a third metric available in the scoreboard, namely > > > > > the total request duration. From that we can get the average request > > > > > duration and the average request concurrency. This part also needs a > > > > > change to the sload structure. Maybe we need a minor MMN bump for > > > > > that. > > > > > > > > > > I scetched a patch under > > > > > > > > > > home.apache.org/~rjung/patches/httpd-trunk-mod_systemd-interval-stats.patch > > > > > > > > > > > > > > > Any comments, likes or dislikes? > > > > > > > > > > Thanks and regards, > > > > > > > > > > Rainer >