Hi hackers,

This is my proposal for 'pg_dump' utility enhancements which enable backup and 
recovery of statistical data stored in the 'pg_statistic' table. Statistical 
data is very valuable for the query planner, so support engineers have to 
manually generate it using ANALYZE whenever they migrate or restore a database 
from backup, which is unpleasant. This proposal aims to improve the overall 
situation.


Problem description
===================

Currently there is no way to backup 'pg_statistic' because columns of 
'anyarray' type cannot be reconstructed solely with their textual 
representation. Meanwhile, all that is needed to solve this problem is a small 
extension capable of retrieving the element type of an 'anyarray' object and 
recreating this particular 'anyarray' object using the 'array_in' procedure. 
Another vital feature is the ability to transform various object identificators 
that are stored in this table to textual representations in order to make them 
portable. Such functionality could be easily implemented, which is why I've 
made up a tiny proof of concept extension that is able to demonstrate the 
possibility of recovery.


Design
======

Several things come to mind when we think of the 'pg_statistic' recovery:

* The procedure written in the C language is needed in order to determine the 
element type of a composite 'anyarray' type. The returned 'oid' will be used 
to reconstruct the 'anyarray' object using the 'array_in' procedure.

        arr := array_in('{1,2,3}', 'text'::regtype::oid, -1);
        anyarray_elemtype(arr) -> 25 ('text'::regtype::oid)

* The columns 'starelid' (relation identifier) and 'staop' (operator 
identifier) 
have type 'oid', so their values could be invalid within a new DB, because the 
object IDs of newly recovered relations and operators might have changed 
during recovery. These kinds of values should be substituted with proper casts 
to internal types, for example:

        65554 (relid) -> 'public.test'::regclass::oid
        15 (staopN) -> 'public.=(pg_catalog.=(pg_catalog.int4, pg_catalog.int8)'

Note that every type is schema-qualified in order to avoid naming conflicts.

* The type of a column which is referenced by the 'staattnum' column also 
needs to be checked for the sake of consistency. It should remain the same, 
otherwise collected stats won't be of any use.

* The main procedure will simply generate a bunch of INSERT queries (one query 
per each row of the 'pg_statistic') which can be saved to a text file.


Proof of concept
================

Interface
---------

Currently there's a PoC extension which contains several functions:

dump_statistic() - returns a set of INSERT queries;

anyarray_elemtype(anyarray) - returns the object identificator of an element 
type;

to_schema_qualified_operator(opid oid) - converts the 'opid' (operator ID) to a 
schema-qualified operator name;
to_schema_qualified_relname(relid oid) - converts the 'relid' (relation ID) to 
a schema-qualified relation name;
to_schema_qualified_type(typid oid) - converts the 'typid' (type ID) to a 
schema-qualified type name;

to_attname(rel text, colnum smallint) - returns the name of the Nth column of 
the specified table 'rel';
to_attnum(rel text, col text) - converts the table name 'rel' and the column 
name 'col' to a column number;

to_atttype(rel text, col text) - returns the type of the column 'col';
to_atttype(rel text, colnum smallint) - overloaded for the column number 
'colnum';

The extension is compatible with versions 9.4 and above.

Usage example
-------------

DB=# \copy (select dump_statistic()) to 'stat_backup.sql'
$ psql DB < stat_backup.sql


Proposed changes to pg_dump
===========================

Now that the approach has been developed, it may be applied to improve the 
'pg_dump' utility. Some minor code changes would make the 'pg_dump' emit 
specially-formed recovery INSERTS for 'pg_statistic' in the 'binary-upgrade' 
mode, thus allowing us to restore saved stats after an upgrade.


Conclusion
==========

I've attached a tarball with sources so that anyone could try the extension.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment: dump_stat.tar.gz
Description: application/compressed-tar

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to