Hi hackers, This is my proposal for 'pg_dump' utility enhancements which enable backup and recovery of statistical data stored in the 'pg_statistic' table. Statistical data is very valuable for the query planner, so support engineers have to manually generate it using ANALYZE whenever they migrate or restore a database from backup, which is unpleasant. This proposal aims to improve the overall situation.
Problem description =================== Currently there is no way to backup 'pg_statistic' because columns of 'anyarray' type cannot be reconstructed solely with their textual representation. Meanwhile, all that is needed to solve this problem is a small extension capable of retrieving the element type of an 'anyarray' object and recreating this particular 'anyarray' object using the 'array_in' procedure. Another vital feature is the ability to transform various object identificators that are stored in this table to textual representations in order to make them portable. Such functionality could be easily implemented, which is why I've made up a tiny proof of concept extension that is able to demonstrate the possibility of recovery. Design ====== Several things come to mind when we think of the 'pg_statistic' recovery: * The procedure written in the C language is needed in order to determine the element type of a composite 'anyarray' type. The returned 'oid' will be used to reconstruct the 'anyarray' object using the 'array_in' procedure. arr := array_in('{1,2,3}', 'text'::regtype::oid, -1); anyarray_elemtype(arr) -> 25 ('text'::regtype::oid) * The columns 'starelid' (relation identifier) and 'staop' (operator identifier) have type 'oid', so their values could be invalid within a new DB, because the object IDs of newly recovered relations and operators might have changed during recovery. These kinds of values should be substituted with proper casts to internal types, for example: 65554 (relid) -> 'public.test'::regclass::oid 15 (staopN) -> 'public.=(pg_catalog.=(pg_catalog.int4, pg_catalog.int8)' Note that every type is schema-qualified in order to avoid naming conflicts. * The type of a column which is referenced by the 'staattnum' column also needs to be checked for the sake of consistency. It should remain the same, otherwise collected stats won't be of any use. * The main procedure will simply generate a bunch of INSERT queries (one query per each row of the 'pg_statistic') which can be saved to a text file. Proof of concept ================ Interface --------- Currently there's a PoC extension which contains several functions: dump_statistic() - returns a set of INSERT queries; anyarray_elemtype(anyarray) - returns the object identificator of an element type; to_schema_qualified_operator(opid oid) - converts the 'opid' (operator ID) to a schema-qualified operator name; to_schema_qualified_relname(relid oid) - converts the 'relid' (relation ID) to a schema-qualified relation name; to_schema_qualified_type(typid oid) - converts the 'typid' (type ID) to a schema-qualified type name; to_attname(rel text, colnum smallint) - returns the name of the Nth column of the specified table 'rel'; to_attnum(rel text, col text) - converts the table name 'rel' and the column name 'col' to a column number; to_atttype(rel text, col text) - returns the type of the column 'col'; to_atttype(rel text, colnum smallint) - overloaded for the column number 'colnum'; The extension is compatible with versions 9.4 and above. Usage example ------------- DB=# \copy (select dump_statistic()) to 'stat_backup.sql' $ psql DB < stat_backup.sql Proposed changes to pg_dump =========================== Now that the approach has been developed, it may be applied to improve the 'pg_dump' utility. Some minor code changes would make the 'pg_dump' emit specially-formed recovery INSERTS for 'pg_statistic' in the 'binary-upgrade' mode, thus allowing us to restore saved stats after an upgrade. Conclusion ========== I've attached a tarball with sources so that anyone could try the extension. -- Dmitry Ivanov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
dump_stat.tar.gz
Description: application/compressed-tar
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers