Size of pg_rewrite (Was: Report checkpoint progress with pg_stat_progress_checkpoint)

Matthias van de Meent Fri, 08 Apr 2022 07:52:36 -0700

On Sat, 19 Mar 2022 at 01:15, Andres Freund <and...@anarazel.de> wrote:
> pg_rewrite without pg_stat_progress_checkpoint: 745472, with: 753664
>
> pg_rewrite is the second biggest relation in an empty database already...


Yeah, that's not great. Thanks for nerd-sniping me into looking into
how views and pg_rewrite rules work, that was very interesting and I
learned quite a lot.

# Immediately potential, limited to progress views

I noticed that the CASE-WHEN (used in translating progress stage index
to stage names) in those progress reporting views can be more
efficiently described (althoug with slightly worse behaviour around
undefined values) using text array lookups (as attached). That
resulted in somewhat smaller rewrite entries for the progress views
(toast compression was good old pglz):

template1=# SELECT sum(octet_length(ev_action)),
SUM(pg_column_size(ev_action)) FROM pg_rewrite WHERE
ev_class::regclass::text LIKE '%progress%';

master:
  sum  |  sum
-------+-------
 97277 | 19956
patched:
  sum  |  sum
-------+-------
 77069 | 18417

So this seems like a nice improvement of 20% uncompressed / 7% compressed.

I tested various cases of phase number to text translations: `CASE ..
WHEN`; `(ARRAY[]::text[])[index]` and `('{}'::text[])[index]`. See
results below:

postgres=# create or replace view arrayliteral_view as select
(ARRAY['a','b','c','d','e','f']::text[])[index] as name from tst
s(index);
CREATE INDEX
postgres=# create or replace view stringcast_view as select
('{a,b,c,d,e,f}'::text[])[index] as name from tst s(index);
CREATE INDEX
postgres=# create or replace view split_stringcast_view as select
(('{a,b,' || 'c,d,e,f}')::text[])[index] as name from tst s(index);
CREATE VIEW
postgres=# create or replace view case_view as select case index when
0 then 'a' when 1 then 'b' when 2 then 'c' when 3 then 'd' when 4 then
'e' when 5 then 'f' end as name from tst s(index);
CREATE INDEX


postgres=# postgres=# select ev_class::regclass::text,
octet_length(ev_action), pg_column_size(ev_action) from pg_rewrite
where ev_class in ('arrayliteral_view'::regclass::oid,
'case_view'::regclass::oid, 'split_stringcast_view'::regclass::oid,
'stringcast_view'::regclass::oid);
       ev_class        | octet_length | pg_column_size
-----------------------+--------------+----------------
 arrayliteral_view     |         3311 |           1322
 stringcast_view       |         2610 |           1257
 case_view             |         5170 |           1412
 split_stringcast_view |         2847 |           1350

It seems to me that we could consider replacing the CASE statements
with array literals and lookups if we really value our template
database size. But, as text literal concatenations don't seem to get
constant folded before storing them in the rules table, this rewrite
of the views would result in long lines in the system_views.sql file,
or we'd have to deal with the additional overhead of the append
operator and cast nodes.

# Future work; nodeToString / readNode, all rewrite rules

Additionally, we might want to consider other changes like default (or
empty value) elision in nodeToString, if that is considered a
reasonable option and if we really want to reduce the size of the
pg_rewrite table.

I think a lot of space can be recovered from that: A manual removal of
what seemed to be fields with default values (and the removal of all
query location related fields) in the current definition of
pg_stat_progress_create_index reduces its uncompressed size from
23226B raw and 4204B compressed to 13821B raw and 2784B compressed,
for an on-disk space saving of 33% for this view's ev_action.

Do note, however, that that would add significant branching in the
nodeToString and readNode code, which might slow down that code
significantly. I'm not planning on working on that; but in my opinion
that is a viable path to reducing the size of new database catalogs.


-Matthias

PS. attached patch is not to be considered complete - it is a minimal
example of the array literal form. It fails regression tests because I
didn't bother updating or including the regression tests on system
views.

Index: src/backend/catalog/system_views.sql
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
--- a/src/backend/catalog/system_views.sql      (revision 
32723e5fabcc7db1bf4e897baaf0d251b500c1dc)
+++ b/src/backend/catalog/system_views.sql      (date 1649160138886)
@@ -1120,13 +1120,7 @@
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
         CAST(S.relid AS oid) AS relid,
-        CASE S.param1 WHEN 0 THEN 'initializing'
-                      WHEN 1 THEN 'acquiring sample rows'
-                      WHEN 2 THEN 'acquiring inherited sample rows'
-                      WHEN 3 THEN 'computing statistics'
-                      WHEN 4 THEN 'computing extended statistics'
-                      WHEN 5 THEN 'finalizing analyze'
-                      END AS phase,
+        (('{initializing,acquiring sample rows,acquiring inherited sample 
rows,computing statistics,computing extended statistics,finalizing 
analyze}')::text[])[S.param1] AS phase,
         S.param2 AS sample_blks_total,
         S.param3 AS sample_blks_scanned,
         S.param4 AS ext_stats_total,
@@ -1141,14 +1135,8 @@
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
         S.relid AS relid,
-        CASE S.param1 WHEN 0 THEN 'initializing'
-                      WHEN 1 THEN 'scanning heap'
-                      WHEN 2 THEN 'vacuuming indexes'
-                      WHEN 3 THEN 'vacuuming heap'
-                      WHEN 4 THEN 'cleaning up indexes'
-                      WHEN 5 THEN 'truncating heap'
-                      WHEN 6 THEN 'performing final cleanup'
-                      END AS phase,
+        (('{initializing,scanning heap,vacuuming indexes,vacuuming 
heap,cleaning up indexes,truncating heap,performing final cleanup}')::text[]
+        )[S.param1] AS phase,
         S.param2 AS heap_blks_total, S.param3 AS heap_blks_scanned,
         S.param4 AS heap_blks_vacuumed, S.param5 AS index_vacuum_count,
         S.param6 AS max_dead_tuples, S.param7 AS num_dead_tuples
@@ -1161,18 +1149,8 @@
         S.datid AS datid,
         D.datname AS datname,
         S.relid AS relid,
-        CASE S.param1 WHEN 1 THEN 'CLUSTER'
-                      WHEN 2 THEN 'VACUUM FULL'
-                      END AS command,
-        CASE S.param2 WHEN 0 THEN 'initializing'
-                      WHEN 1 THEN 'seq scanning heap'
-                      WHEN 2 THEN 'index scanning heap'
-                      WHEN 3 THEN 'sorting tuples'
-                      WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
-                      END AS phase,
+        ('{NULL,CLUSTER,VACUUM FULL}'::text[])[S.param1] AS command,
+        (('{initializing,initializing,seq scanning heap,index scanning 
heap,sorting tuples,writing new heap,swapping relation files,rebuilding 
index,performing final cleanup}')::text[])[S.param2] AS phase,
         CAST(S.param3 AS oid) AS cluster_index_relid,
         S.param4 AS heap_tuples_scanned,
         S.param5 AS heap_tuples_written,
@@ -1187,24 +1165,12 @@
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
         S.relid AS relid,
         CAST(S.param7 AS oid) AS index_relid,
-        CASE S.param1 WHEN 1 THEN 'CREATE INDEX'
-                      WHEN 2 THEN 'CREATE INDEX CONCURRENTLY'
-                      WHEN 3 THEN 'REINDEX'
-                      WHEN 4 THEN 'REINDEX CONCURRENTLY'
-                      END AS command,
-        CASE S.param10 WHEN 0 THEN 'initializing'
-                       WHEN 1 THEN 'waiting for writers before build'
-                       WHEN 2 THEN 'building index' ||
+        ('{NULL,CREATE INDEX,CREATE INDEX CONCURRENTLY,REINDEX,REINDEX 
CONCURRENTLY}'::text[])[S.param1] AS command,
+        CASE S.param10 WHEN 2 THEN 'building index' ||
                            COALESCE((': ' || 
pg_indexam_progress_phasename(S.param9::oid, S.param11)),
                                     '')
-                       WHEN 3 THEN 'waiting for writers before validation'
-                       WHEN 4 THEN 'index validation: scanning index'
-                       WHEN 5 THEN 'index validation: sorting tuples'
-                       WHEN 6 THEN 'index validation: scanning table'
-                       WHEN 7 THEN 'waiting for old snapshots'
-                       WHEN 8 THEN 'waiting for readers before marking dead'
-                       WHEN 9 THEN 'waiting for readers before dropping'
-                       END as phase,
+                       ELSE ('{NULL,initializing,waiting for writers before 
build,NULL,waiting for writers before validation,index validation: scanning 
index,index validation: sorting tuples,index validation: scanning table,waiting 
for old snapshots,waiting for readers before marking dead,waiting for readers 
before dropping}'::text[])[S.param10]
+                       END AS phase,
         S.param4 AS lockers_total,
         S.param5 AS lockers_done,
         S.param6 AS current_locker_pid,
@@ -1220,13 +1186,8 @@
 CREATE VIEW pg_stat_progress_basebackup AS
     SELECT
         S.pid AS pid,
-        CASE S.param1 WHEN 0 THEN 'initializing'
-                      WHEN 1 THEN 'waiting for checkpoint to finish'
-                      WHEN 2 THEN 'estimating backup size'
-                      WHEN 3 THEN 'streaming database files'
-                      WHEN 4 THEN 'waiting for wal archiving to finish'
-                      WHEN 5 THEN 'transferring wal files'
-                      END AS phase,
+        ('{NULL,initializing,waiting for checkpoint to finish,estimating 
backup size,streaming database files,waiting for wal archiving to 
finish,transferring wal files}'::text[]
+        )[S.param1] AS phase,
         CASE S.param2 WHEN -1 THEN NULL ELSE S.param2 END AS backup_total,
         S.param3 AS backup_streamed,
         S.param4 AS tablespaces_total,
@@ -1238,14 +1199,8 @@
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
         S.relid AS relid,
-        CASE S.param5 WHEN 1 THEN 'COPY FROM'
-                      WHEN 2 THEN 'COPY TO'
-                      END AS command,
-        CASE S.param6 WHEN 1 THEN 'FILE'
-                      WHEN 2 THEN 'PROGRAM'
-                      WHEN 3 THEN 'PIPE'
-                      WHEN 4 THEN 'CALLBACK'
-                      END AS "type",
+        ('{NULL,COPY FROM,COPY TO}'::text[])[S.param5] AS command,
+        ('{NULL,FILE,PROGRAM,PIPE,CALLBACK}'::text[])[S.param6] AS "type",
         S.param1 AS bytes_processed,
         S.param2 AS bytes_total,
         S.param3 AS tuples_processed,

Size of pg_rewrite (Was: Report checkpoint progress with pg_stat_progress_checkpoint)

Reply via email to