On Wed, Sep 29, 2021 at 11:27 AM Peter Geoghegan <p...@bowt.ie> wrote:
> I would like to enable deduplication within system catalog indexes for
> Postgres 15.

I decided to run a simple experiment, to give us some idea of what
benefits my proposal gives users: I ran "make installcheck" on a newly
initdb'd database (master branch), and then with the attached patch
(which enables deduplication with system catalog indexes) applied.

I ran a query that shows the 20 largest system catalog indexes in each
case. I'm interested in when and where we see improvements to space
utilization. Any reduction in index size must be a result of index
deduplication (excluding any noise-level changes).

Master branch:

regression=# SELECT
  pg_size_pretty(pg_relation_size(c.oid)) as sz,
  c.relname
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY pg_relation_size(c.oid) DESC LIMIT 20;
   sz    |              relname
---------+-----------------------------------
 1088 kB | pg_attribute_relid_attnam_index
 928 kB  | pg_depend_depender_index
 800 kB  | pg_attribute_relid_attnum_index
 736 kB  | pg_depend_reference_index
 352 kB  | pg_proc_proname_args_nsp_index
 216 kB  | pg_description_o_c_o_index
 200 kB  | pg_class_relname_nsp_index
 184 kB  | pg_type_oid_index
 176 kB  | pg_class_tblspc_relfilenode_index
 160 kB  | pg_type_typname_nsp_index
 104 kB  | pg_proc_oid_index
 64 kB   | pg_class_oid_index
 64 kB   | pg_statistic_relid_att_inh_index
 56 kB   | pg_collation_name_enc_nsp_index
 48 kB   | pg_constraint_conname_nsp_index
 48 kB   | pg_amop_fam_strat_index
 48 kB   | pg_amop_opr_fam_index
 48 kB   | pg_largeobject_loid_pn_index
 48 kB   | pg_operator_oprname_l_r_n_index
 48 kB   | pg_index_indexrelid_index
(20 rows)

Patch:

   sz    |              relname
---------+-----------------------------------
 1048 kB | pg_attribute_relid_attnam_index
 888 kB  | pg_depend_depender_index
 752 kB  | pg_attribute_relid_attnum_index
 616 kB  | pg_depend_reference_index
 352 kB  | pg_proc_proname_args_nsp_index
 216 kB  | pg_description_o_c_o_index
 192 kB  | pg_class_relname_nsp_index
 184 kB  | pg_type_oid_index
 152 kB  | pg_type_typname_nsp_index
 144 kB  | pg_class_tblspc_relfilenode_index
 104 kB  | pg_proc_oid_index
 72 kB   | pg_class_oid_index
 56 kB   | pg_collation_name_enc_nsp_index
 56 kB   | pg_statistic_relid_att_inh_index
 48 kB   | pg_index_indexrelid_index
 48 kB   | pg_amop_fam_strat_index
 48 kB   | pg_amop_opr_fam_index
 48 kB   | pg_largeobject_loid_pn_index
 48 kB   | pg_operator_oprname_l_r_n_index
 40 kB   | pg_index_indrelid_index
(20 rows)

The improvements to space utilization for the larger indexes
(especially the two pg_depends non-unique indexes) is smaller than I
remember from last time around, back in early 2020. This is probably
due to a combination of the Postgres 14 work and the pg_depend PIN
optimization work from commit a49d0812.

The single biggest difference is the decrease in the size of
pg_depend_reference_index -- it goes from 736 kB to 616 kB. Another
notable difference is that pg_class_tblspc_relfilenode_index shrinks,
going from 176 kB to 144 kB. These are not huge differences, but they
still seem worth having.

The best argument in favor of my proposal is definitely the index
bloat argument, which this test case tells us little or nothing about.
I'm especially concerned about scenarios where logical replication is
used, or where index deletion and VACUUM are inherently unable to
remove older index tuple versions for some other reason.

--
Peter Geoghegan

Attachment: v1-0001-Enable-deduplication-in-system-catalog-indexes.patch
Description: Binary data

Reply via email to