Re: Split index and table statistics into different types of stats

Drouvot, Bertrand Tue, 14 Nov 2023 00:04:25 -0800

Hi,

On 11/13/23 9:44 PM, Andres Freund wrote:

Hi,


On 2023-11-13 09:26:56 +0100, Drouvot, Bertrand wrote:

--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -799,11 +799,19 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, 
BlockNumber blockNum,
         * Read the buffer, and update pgstat counters to reflect a cache hit or
         * miss.
         */
-       pgstat_count_buffer_read(reln);
+       if (reln->rd_rel->relkind == RELKIND_INDEX)
+               pgstat_count_index_buffer_read(reln);
+       else
+               pgstat_count_table_buffer_read(reln);


It's not nice from a layering POV that we need this level of awareness in
bufmgr.c.  I wonder if this is an argument for first splitting out stats like
blocks_hit, blocks_fetched into something like "relfilenode stats" - they're

agnostic of the relkind.


Thanks for looking at it! Yeah I think that would make a lot of sense
to track some stats per relfilenode.

There aren't that many such stats right now,
admittedly, but I think we'll want to also track dirtied, written blocks on a
per relation basis once we can (i.e. we key the relevant stats by relfilenode
instead of oid, so we can associate stats when writing out buffers).


Agree. Then, I think that would make sense to start this effort before the
split index/table one. I can work on a per relfilenode stat patch first.

Does this patch ordering make sense to you?

1) Introduce per relfilenode stats
2) Split index and table stats

+/*
+ * Initialize a relcache entry to count access statistics.  Called whenever an
+ * index is opened.
+ *
+ * We assume that a relcache entry's pgstatind_info field is zeroed by 
relcache.c
+ * when the relcache entry is made; thereafter it is long-lived data.
+ *
+ * This does not create a reference to a stats entry in shared memory, nor
+ * allocate memory for the pending stats. That happens in
+ * pgstat_assoc_index().
+ */
+void
+pgstat_init_index(Relation rel)
+{
+       /*
+        * We only count stats for indexes
+        */
+       Assert(rel->rd_rel->relkind == RELKIND_INDEX);
+
+       if (!pgstat_track_counts)
+       {
+               if (rel->pgstatind_info != NULL)
+                       pgstat_unlink_index(rel);
+
+               /* We're not counting at all */
+               rel->pgstat_enabled = false;
+               rel->pgstatind_info = NULL;
+               return;
+       }
+
+       rel->pgstat_enabled = true;
+}
+
+/*
+ * Prepare for statistics for this index to be collected.
+ *
+ * This ensures we have a reference to the stats entry before stats can be
+ * generated. That is important because an index drop in another
+ * connection could otherwise lead to the stats entry being dropped, which then
+ * later would get recreated when flushing stats.
+ *
+ * This is separate from pgstat_init_index() as it is not uncommon for
+ * relcache entries to be opened without ever getting stats reported.
+ */
+void
+pgstat_assoc_index(Relation rel)
+{
+       Assert(rel->pgstat_enabled);
+       Assert(rel->pgstatind_info == NULL);
+
+       /* Else find or make the PgStat_IndexStatus entry, and update link */
+       rel->pgstatind_info = pgstat_prep_index_pending(RelationGetRelid(rel),
+                                                                                     
                  rel->rd_rel->relisshared);
+
+       /* don't allow link a stats to multiple relcache entries */
+       Assert(rel->pgstatind_info->relation == NULL);
+
+       /* mark this relation as the owner */
+       rel->pgstatind_info->relation = rel;
+}
+
+/*
+ * Break the mutual link between a relcache entry and pending index stats 
entry.
+ * This must be called whenever one end of the link is removed.
+ */
+void
+pgstat_unlink_index(Relation rel)
+{
+
+       if (rel->pgstatind_info == NULL)
+               return;
+
+       /* link sanity check for the index stats */
+       if (rel->pgstatind_info)
+       {
+               Assert(rel->pgstatind_info->relation == rel);
+               rel->pgstatind_info->relation = NULL;
+               rel->pgstatind_info = NULL;
+       }
+}
...


This is a fair bit of duplicated code - perhaps we could have shared helpers?


Yeah, I had it in mind and that was part of the "Will now work on addressing the
up-thread remaining comments" remark I made up-thread.

+/* ----------
+ * PgStat_IndexStatus                  Per-index status within a backend
+ *
+ * Many of the event counters are nontransactional, ie, we count events
+ * in committed and aborted transactions alike.  For these, we just count
+ * directly in the PgStat_IndexStatus.
+ * ----------
+ */
+typedef struct PgStat_IndexStatus
+{
+       Oid                     r_id;                   /* relation's OID */
+       bool            r_shared;               /* is it a shared catalog? */
+       struct PgStat_IndexXactStatus *trans;   /* lowest subxact's counts */
+       PgStat_IndexCounts counts;      /* event counts to be sent */
+       Relation        relation;               /* rel that is using this entry 
*/
+} PgStat_IndexStatus;
+
  /* ----------
   * PgStat_TableXactStatus             Per-table, per-subtransaction status
   * ----------
@@ -227,6 +264,29 @@ typedef struct PgStat_TableXactStatus
  } PgStat_TableXactStatus;

+/* ----------

+ * PgStat_IndexXactStatus              Per-index, per-subtransaction status
+ * ----------
+ */
+typedef struct PgStat_IndexXactStatus
+{
+       PgStat_Counter tuples_inserted; /* tuples inserted in (sub)xact */
+       PgStat_Counter tuples_updated;  /* tuples updated in (sub)xact */
+       PgStat_Counter tuples_deleted;  /* tuples deleted in (sub)xact */
+       bool            truncdropped;   /* relation truncated/dropped in this
+                                                                * (sub)xact */
+       /* tuples i/u/d prior to truncate/drop */
+       PgStat_Counter inserted_pre_truncdrop;
+       PgStat_Counter updated_pre_truncdrop;
+       PgStat_Counter deleted_pre_truncdrop;
+       int                     nest_level;             /* subtransaction nest 
level */
+       /* links to other structs for same relation: */
+       struct PgStat_IndexXactStatus *upper;   /* next higher subxact if any */
+       PgStat_IndexStatus *parent; /* per-table status */
+       /* structs of same subxact level are linked here: */
+       struct PgStat_IndexXactStatus *next;    /* next of same subxact */
+} PgStat_IndexXactStatus;


I don't think much of this is used? It doesn't look like you're using most of
the fields. Which makes sense - there's not really the same transactional
behaviour for indexes as there is for tables.


Fully agree. I had in mind to revisit this stuff too.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Split index and table statistics into different types of stats

Reply via email to