On Wed, Sep 29, 2021 at 11:27 AM Peter Geoghegan <p...@bowt.ie> wrote: > I would like to enable deduplication within system catalog indexes for > Postgres 15.
I decided to run a simple experiment, to give us some idea of what benefits my proposal gives users: I ran "make installcheck" on a newly initdb'd database (master branch), and then with the attached patch (which enables deduplication with system catalog indexes) applied. I ran a query that shows the 20 largest system catalog indexes in each case. I'm interested in when and where we see improvements to space utilization. Any reduction in index size must be a result of index deduplication (excluding any noise-level changes). Master branch: regression=# SELECT pg_size_pretty(pg_relation_size(c.oid)) as sz, c.relname FROM pg_index i JOIN pg_opclass op ON i.indclass[0] = op.oid JOIN pg_am am ON op.opcmethod = am.oid JOIN pg_class c ON i.indexrelid = c.oid JOIN pg_namespace n ON c.relnamespace = n.oid WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog' AND c.relkind = 'i' AND i.indisready AND i.indisvalid ORDER BY pg_relation_size(c.oid) DESC LIMIT 20; sz | relname ---------+----------------------------------- 1088 kB | pg_attribute_relid_attnam_index 928 kB | pg_depend_depender_index 800 kB | pg_attribute_relid_attnum_index 736 kB | pg_depend_reference_index 352 kB | pg_proc_proname_args_nsp_index 216 kB | pg_description_o_c_o_index 200 kB | pg_class_relname_nsp_index 184 kB | pg_type_oid_index 176 kB | pg_class_tblspc_relfilenode_index 160 kB | pg_type_typname_nsp_index 104 kB | pg_proc_oid_index 64 kB | pg_class_oid_index 64 kB | pg_statistic_relid_att_inh_index 56 kB | pg_collation_name_enc_nsp_index 48 kB | pg_constraint_conname_nsp_index 48 kB | pg_amop_fam_strat_index 48 kB | pg_amop_opr_fam_index 48 kB | pg_largeobject_loid_pn_index 48 kB | pg_operator_oprname_l_r_n_index 48 kB | pg_index_indexrelid_index (20 rows) Patch: sz | relname ---------+----------------------------------- 1048 kB | pg_attribute_relid_attnam_index 888 kB | pg_depend_depender_index 752 kB | pg_attribute_relid_attnum_index 616 kB | pg_depend_reference_index 352 kB | pg_proc_proname_args_nsp_index 216 kB | pg_description_o_c_o_index 192 kB | pg_class_relname_nsp_index 184 kB | pg_type_oid_index 152 kB | pg_type_typname_nsp_index 144 kB | pg_class_tblspc_relfilenode_index 104 kB | pg_proc_oid_index 72 kB | pg_class_oid_index 56 kB | pg_collation_name_enc_nsp_index 56 kB | pg_statistic_relid_att_inh_index 48 kB | pg_index_indexrelid_index 48 kB | pg_amop_fam_strat_index 48 kB | pg_amop_opr_fam_index 48 kB | pg_largeobject_loid_pn_index 48 kB | pg_operator_oprname_l_r_n_index 40 kB | pg_index_indrelid_index (20 rows) The improvements to space utilization for the larger indexes (especially the two pg_depends non-unique indexes) is smaller than I remember from last time around, back in early 2020. This is probably due to a combination of the Postgres 14 work and the pg_depend PIN optimization work from commit a49d0812. The single biggest difference is the decrease in the size of pg_depend_reference_index -- it goes from 736 kB to 616 kB. Another notable difference is that pg_class_tblspc_relfilenode_index shrinks, going from 176 kB to 144 kB. These are not huge differences, but they still seem worth having. The best argument in favor of my proposal is definitely the index bloat argument, which this test case tells us little or nothing about. I'm especially concerned about scenarios where logical replication is used, or where index deletion and VACUUM are inherently unable to remove older index tuple versions for some other reason. -- Peter Geoghegan
v1-0001-Enable-deduplication-in-system-catalog-indexes.patch
Description: Binary data