On Sat, 19 Mar 2022 at 01:15, Andres Freund <and...@anarazel.de> wrote: > pg_rewrite without pg_stat_progress_checkpoint: 745472, with: 753664 > > pg_rewrite is the second biggest relation in an empty database already...
Yeah, that's not great. Thanks for nerd-sniping me into looking into how views and pg_rewrite rules work, that was very interesting and I learned quite a lot. # Immediately potential, limited to progress views I noticed that the CASE-WHEN (used in translating progress stage index to stage names) in those progress reporting views can be more efficiently described (althoug with slightly worse behaviour around undefined values) using text array lookups (as attached). That resulted in somewhat smaller rewrite entries for the progress views (toast compression was good old pglz): template1=# SELECT sum(octet_length(ev_action)), SUM(pg_column_size(ev_action)) FROM pg_rewrite WHERE ev_class::regclass::text LIKE '%progress%'; master: sum | sum -------+------- 97277 | 19956 patched: sum | sum -------+------- 77069 | 18417 So this seems like a nice improvement of 20% uncompressed / 7% compressed. I tested various cases of phase number to text translations: `CASE .. WHEN`; `(ARRAY[]::text[])[index]` and `('{}'::text[])[index]`. See results below: postgres=# create or replace view arrayliteral_view as select (ARRAY['a','b','c','d','e','f']::text[])[index] as name from tst s(index); CREATE INDEX postgres=# create or replace view stringcast_view as select ('{a,b,c,d,e,f}'::text[])[index] as name from tst s(index); CREATE INDEX postgres=# create or replace view split_stringcast_view as select (('{a,b,' || 'c,d,e,f}')::text[])[index] as name from tst s(index); CREATE VIEW postgres=# create or replace view case_view as select case index when 0 then 'a' when 1 then 'b' when 2 then 'c' when 3 then 'd' when 4 then 'e' when 5 then 'f' end as name from tst s(index); CREATE INDEX postgres=# postgres=# select ev_class::regclass::text, octet_length(ev_action), pg_column_size(ev_action) from pg_rewrite where ev_class in ('arrayliteral_view'::regclass::oid, 'case_view'::regclass::oid, 'split_stringcast_view'::regclass::oid, 'stringcast_view'::regclass::oid); ev_class | octet_length | pg_column_size -----------------------+--------------+---------------- arrayliteral_view | 3311 | 1322 stringcast_view | 2610 | 1257 case_view | 5170 | 1412 split_stringcast_view | 2847 | 1350 It seems to me that we could consider replacing the CASE statements with array literals and lookups if we really value our template database size. But, as text literal concatenations don't seem to get constant folded before storing them in the rules table, this rewrite of the views would result in long lines in the system_views.sql file, or we'd have to deal with the additional overhead of the append operator and cast nodes. # Future work; nodeToString / readNode, all rewrite rules Additionally, we might want to consider other changes like default (or empty value) elision in nodeToString, if that is considered a reasonable option and if we really want to reduce the size of the pg_rewrite table. I think a lot of space can be recovered from that: A manual removal of what seemed to be fields with default values (and the removal of all query location related fields) in the current definition of pg_stat_progress_create_index reduces its uncompressed size from 23226B raw and 4204B compressed to 13821B raw and 2784B compressed, for an on-disk space saving of 33% for this view's ev_action. Do note, however, that that would add significant branching in the nodeToString and readNode code, which might slow down that code significantly. I'm not planning on working on that; but in my opinion that is a viable path to reducing the size of new database catalogs. -Matthias PS. attached patch is not to be considered complete - it is a minimal example of the array literal form. It fails regression tests because I didn't bother updating or including the regression tests on system views.
Index: src/backend/catalog/system_views.sql IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql --- a/src/backend/catalog/system_views.sql (revision 32723e5fabcc7db1bf4e897baaf0d251b500c1dc) +++ b/src/backend/catalog/system_views.sql (date 1649160138886) @@ -1120,13 +1120,7 @@ SELECT S.pid AS pid, S.datid AS datid, D.datname AS datname, CAST(S.relid AS oid) AS relid, - CASE S.param1 WHEN 0 THEN 'initializing' - WHEN 1 THEN 'acquiring sample rows' - WHEN 2 THEN 'acquiring inherited sample rows' - WHEN 3 THEN 'computing statistics' - WHEN 4 THEN 'computing extended statistics' - WHEN 5 THEN 'finalizing analyze' - END AS phase, + (('{initializing,acquiring sample rows,acquiring inherited sample rows,computing statistics,computing extended statistics,finalizing analyze}')::text[])[S.param1] AS phase, S.param2 AS sample_blks_total, S.param3 AS sample_blks_scanned, S.param4 AS ext_stats_total, @@ -1141,14 +1135,8 @@ SELECT S.pid AS pid, S.datid AS datid, D.datname AS datname, S.relid AS relid, - CASE S.param1 WHEN 0 THEN 'initializing' - WHEN 1 THEN 'scanning heap' - WHEN 2 THEN 'vacuuming indexes' - WHEN 3 THEN 'vacuuming heap' - WHEN 4 THEN 'cleaning up indexes' - WHEN 5 THEN 'truncating heap' - WHEN 6 THEN 'performing final cleanup' - END AS phase, + (('{initializing,scanning heap,vacuuming indexes,vacuuming heap,cleaning up indexes,truncating heap,performing final cleanup}')::text[] + )[S.param1] AS phase, S.param2 AS heap_blks_total, S.param3 AS heap_blks_scanned, S.param4 AS heap_blks_vacuumed, S.param5 AS index_vacuum_count, S.param6 AS max_dead_tuples, S.param7 AS num_dead_tuples @@ -1161,18 +1149,8 @@ S.datid AS datid, D.datname AS datname, S.relid AS relid, - CASE S.param1 WHEN 1 THEN 'CLUSTER' - WHEN 2 THEN 'VACUUM FULL' - END AS command, - CASE S.param2 WHEN 0 THEN 'initializing' - WHEN 1 THEN 'seq scanning heap' - WHEN 2 THEN 'index scanning heap' - WHEN 3 THEN 'sorting tuples' - WHEN 4 THEN 'writing new heap' - WHEN 5 THEN 'swapping relation files' - WHEN 6 THEN 'rebuilding index' - WHEN 7 THEN 'performing final cleanup' - END AS phase, + ('{NULL,CLUSTER,VACUUM FULL}'::text[])[S.param1] AS command, + (('{initializing,initializing,seq scanning heap,index scanning heap,sorting tuples,writing new heap,swapping relation files,rebuilding index,performing final cleanup}')::text[])[S.param2] AS phase, CAST(S.param3 AS oid) AS cluster_index_relid, S.param4 AS heap_tuples_scanned, S.param5 AS heap_tuples_written, @@ -1187,24 +1165,12 @@ S.pid AS pid, S.datid AS datid, D.datname AS datname, S.relid AS relid, CAST(S.param7 AS oid) AS index_relid, - CASE S.param1 WHEN 1 THEN 'CREATE INDEX' - WHEN 2 THEN 'CREATE INDEX CONCURRENTLY' - WHEN 3 THEN 'REINDEX' - WHEN 4 THEN 'REINDEX CONCURRENTLY' - END AS command, - CASE S.param10 WHEN 0 THEN 'initializing' - WHEN 1 THEN 'waiting for writers before build' - WHEN 2 THEN 'building index' || + ('{NULL,CREATE INDEX,CREATE INDEX CONCURRENTLY,REINDEX,REINDEX CONCURRENTLY}'::text[])[S.param1] AS command, + CASE S.param10 WHEN 2 THEN 'building index' || COALESCE((': ' || pg_indexam_progress_phasename(S.param9::oid, S.param11)), '') - WHEN 3 THEN 'waiting for writers before validation' - WHEN 4 THEN 'index validation: scanning index' - WHEN 5 THEN 'index validation: sorting tuples' - WHEN 6 THEN 'index validation: scanning table' - WHEN 7 THEN 'waiting for old snapshots' - WHEN 8 THEN 'waiting for readers before marking dead' - WHEN 9 THEN 'waiting for readers before dropping' - END as phase, + ELSE ('{NULL,initializing,waiting for writers before build,NULL,waiting for writers before validation,index validation: scanning index,index validation: sorting tuples,index validation: scanning table,waiting for old snapshots,waiting for readers before marking dead,waiting for readers before dropping}'::text[])[S.param10] + END AS phase, S.param4 AS lockers_total, S.param5 AS lockers_done, S.param6 AS current_locker_pid, @@ -1220,13 +1186,8 @@ CREATE VIEW pg_stat_progress_basebackup AS SELECT S.pid AS pid, - CASE S.param1 WHEN 0 THEN 'initializing' - WHEN 1 THEN 'waiting for checkpoint to finish' - WHEN 2 THEN 'estimating backup size' - WHEN 3 THEN 'streaming database files' - WHEN 4 THEN 'waiting for wal archiving to finish' - WHEN 5 THEN 'transferring wal files' - END AS phase, + ('{NULL,initializing,waiting for checkpoint to finish,estimating backup size,streaming database files,waiting for wal archiving to finish,transferring wal files}'::text[] + )[S.param1] AS phase, CASE S.param2 WHEN -1 THEN NULL ELSE S.param2 END AS backup_total, S.param3 AS backup_streamed, S.param4 AS tablespaces_total, @@ -1238,14 +1199,8 @@ SELECT S.pid AS pid, S.datid AS datid, D.datname AS datname, S.relid AS relid, - CASE S.param5 WHEN 1 THEN 'COPY FROM' - WHEN 2 THEN 'COPY TO' - END AS command, - CASE S.param6 WHEN 1 THEN 'FILE' - WHEN 2 THEN 'PROGRAM' - WHEN 3 THEN 'PIPE' - WHEN 4 THEN 'CALLBACK' - END AS "type", + ('{NULL,COPY FROM,COPY TO}'::text[])[S.param5] AS command, + ('{NULL,FILE,PROGRAM,PIPE,CALLBACK}'::text[])[S.param6] AS "type", S.param1 AS bytes_processed, S.param2 AS bytes_total, S.param3 AS tuples_processed,