Hi, On 2022-04-05 01:16:04 +1200, Thomas Munro wrote: > On Mon, Apr 4, 2022 at 4:16 PM Andres Freund <and...@anarazel.de> wrote: > > Please take a look! > > A few superficial comments: > > > [PATCH v68 01/31] pgstat: consistent function header formatting. > > [PATCH v68 02/31] pgstat: remove some superflous comments from pgstat.h. > > +1
Planning to commit these after making another coffee and proof reading them some more. > > [PATCH v68 03/31] dshash: revise sequential scan support. > > Logic looks good. That is, > lock-0-and-ensure_valid_bucket_pointers()-only-once makes sense. Just > some comment trivia: > > + * dshash_seq_term needs to be called when a scan finished. The caller may > + * delete returned elements midst of a scan by using dshash_delete_current() > + * if exclusive = true. > > s/scan finished/scan is finished/ > s/midst of/during/ (or /in the middle of/, ...) > > > [PATCH v68 04/31] dsm: allow use in single user mode. > > LGTM. > + Assert(IsUnderPostmaster || !IsPostmasterEnvironment); > (Not this patch's fault, but I wish we had a more explicit way to say "am > single user".) Agreed. > > [PATCH v68 05/31] pgstat: stats collector references in comments > > LGTM. I could think of some alternative suggested names for this subsystem, > but don't think it would be helpful at this juncture so I will refrain :-) Heh. I did start a thread about it a while ago :) > > [PATCH v68 08/31] pgstat: introduce PgStat_Kind enum. > > +#define PGSTAT_KIND_FIRST PGSTAT_KIND_DATABASE > +#define PGSTAT_KIND_LAST PGSTAT_KIND_WAL > +#define PGSTAT_NUM_KINDS (PGSTAT_KIND_LAST + 1) > > It's a little confusing that PGSTAT_NUM_KINDS isn't really the number of > kinds, > because there is no kind 0. For the two users of it... maybe just use > pgstat_kind_infos[] = {...}, and > global_valid[PGSTAT_KIND_LAST + 1]? Maybe the whole justification for not defining an invalid kind is moot now. There's not a single switch covering all kinds of stats left, and I hope that we don't introduce one again... > > [PATCH v68 10/31] pgstat: scaffolding for transactional stats creation / > > drop. > > + /* > + * Dropping the statistics for objects that dropped transactionally itself > + * needs to be transactional. ... > > Hard to parse. How about: "Objects are dropped transactionally, so > related statistics need to be dropped transactionally too." Not all objects are dropped transactionally. But I agree it reads awkwardly. I now, incorporating feedback from Justin as well, rephrased it to: /* * Statistics for transactionally dropped objects need to be * transactionally dropped as well. Collect the stats dropped in the * current (sub-)transaction and only execute the stats drop when we know * if the transaction commits/aborts. To handle replicas and crashes, * stats drops are included in commit / abort records. */ A few too many "drop"s in there, but maybe that's unavoidable. > + /* > + * Whenever the for a dropped stats entry could not be freed (because > + * backends still have references), this is incremented, causing backends > + * to run pgstat_gc_entry_refs(), allowing that memory to be reclaimed. > + */ > + pg_atomic_uint64 gc_count; > > Whenever the ...? * Whenever statistics for dropped objects could not be freed - because * backends still have references - the dropping backend calls * pgstat_request_entry_refs_gc() incrementing this counter. Eventually * that causes backends to run pgstat_gc_entry_refs(), allowing memory to * be reclaimed. > Would it be better to call this variable gc_request_count? Agreed. > + * Initialize refcount to 1, marking it as valid / not tdroped. The entry > > s/tdroped/dropped/ > > + * further if a longer lived references is needed. > > s/references/reference/ Fixed. > + /* > + * There are legitimate cases where the old stats entry might not > + * yet have been dropped by the time it's reused. The easiest > case > + * are replication slot stats. But oid wraparound can lead to > + * other cases as well. We just reset the stats to their plain > + * state. > + */ > + shheader = pgstat_reinit_entry(kind, shhashent); > > This whole comment is repeated in pgstat_reinit_entry and its caller. I guess I felt as indecisive about where to place it between the two locations when I wrote it as I do now. Left it at the callsite for now. > + /* > + * XXX: Might be worth adding some frobbing of the allocation before > + * freeing, to make it easier to detect use-after-free style bugs. > + */ > + dsa_free(pgStatLocal.dsa, pdsa); > > FWIW dsa_free() clobbers memory in assert builds, just like pfree(). Oh. I could swear I saw that not being the case a while ago. But clearly it is the case. Removed. > +static Size > +pgstat_dsa_init_size(void) > +{ > + Size sz; > + > + /* > + * The dshash header / initial buckets array needs to fit into "plain" > + * shared memory, but it's beneficial to not need dsm segments > + * immediately. A size of 256kB seems works well and is not > + * disproportional compared to other constant sized shared memory > + * allocations. NB: To avoid DSMs further, the user can configure > + * min_dynamic_shared_memory. > + */ > + sz = 256 * 1024; > > It kinda bothers me that the memory reserved by > min_dynamic_shared_memory might eventually fill up with stats, and not > be available for temporary use by parallel queries (which can benefit > more from fast acquire/release on DSMs, and probably also huge pages, > or maybe not...), and that's hard to diagnose. It's not great, but I don't really see an alternative? The saving grace is that it's hard to imagine "real" usages of min_dynamic_shared_memory being used up by stats. > + * (4) turn off the idle-in-transaction, idle-session and > + * idle-state-update timeouts if active. We do this before step (5) > so > > s/idle-state-/idle-stats-/ > > + /* > + * Some of the pending stats may have not been flushed due to lock > + * contention. If we have such pending stats here, let the caller know > + * the retry interval. > + */ > + if (partial_flush) > + { > > I think it's better for a comment that is outside the block to say "If > some of the pending...". Or the comment should be inside the blocks. The comment says "if" in the second sentence? But it's a bit awkward anyway, rephrased to: * If some of the pending stats could not be flushed due to lock * contention, let the caller know when to retry. > +static void > +pgstat_build_snapshot(void) > +{ > ... > + dshash_seq_init(&hstat, pgStatLocal.shared_hash, false); > + while ((p = dshash_seq_next(&hstat)) != NULL) > + { > ... > + entry->data = MemoryContextAlloc(pgStatLocal.snapshot.context, > ... > + } > + dshash_seq_term(&hstat); > > Doesn't allocation failure leave the shared hash table locked? The shared table itself not - the error path does LWLockReleaseAll(). The problem is the backend local dshash_table, specifically find_[exclusively_]locked will stay set, and then cause assertion failures when used next. I think we need to fix that in dshash.c. We have code in released branches that's vulnerable to this problem. E.g. ensure_record_cache_typmod_slot_exists() in lookup_rowtype_tupdesc_internal(). See also https://postgr.es/m/20220311012712.botrpsikaufzteyt%40alap3.anarazel.de Afaics the only real choice is to remove find_[exclusively_]locked and rely on LWLockHeldByMeInMode() instead. > > PATCH v68 16/31] pgstat: add pg_stat_exists_stat() for easier testing. > > pg_stat_exists_stat() is a weird name, ... would it be better as > pg_stat_object_exists()? I was fighting with this one a bunch :). Earlier it was called pg_stat_stats_exist() I think. "object" makes it sound a bit too much like it's the database object? Maybe pg_stat_have_stat()? > > [PATCH v68 28/31] pgstat: update docs. > > + Determines the behaviour when cumulative statistics are accessed > > AFAIK our manual is written in en_US, so s/behaviour/behavior/. Fixed like 10 instances of this in the patchset. Not sure why I just can't make myself type behavior. > + memory. When set to <literal>cache</literal>, the first access to > + statistics for an object caches those statistics until the end of the > + transaction / until <function>pg_stat_clear_snapshot()</function> is > > s|/|unless| > > + <literal>none</literal> is most suitable for monitoring solutions. > If > > I'd change "solutions" to "tools" or maybe "systems". Done. > + When using the accumulated statistics views and functions to > monitor collected data, it is important > > Did you intend to write "accumulated" instead of "cumulative" here? Not sure. I think I got bored of the word at some point :P > + You can invoke <function>pg_stat_clear_snapshot</function>() to discard > the > + current transaction's statistics snapshot / cache (if any). The next use > > I'd change s|/ cache|or cached values|. I think "/" like this is an informal > thing. I think we have a few other uses of it. But anyway, changed. Thanks! Andres