Re: Scriptable way to validate a pg_dump restore ?

Ron Johnson Mon, 29 Jan 2024 01:07:03 -0800

On Mon, Jan 29, 2024 at 3:12 AM Laura Smith <
[email protected]> wrote:


> Hi
>
> Let's say I've got a scenario where I'm doing a pg_dump replication rather
> than online streaming, e.g. due to air-gap or whatever.
>
> Is there a scriptable way to validate the restore ?  e.g. using doing
> something clever with ctid or something to ensure both the schema and all
> its rows were restored to the same point at which the dump was taken ?
>

1. Since pg_dump creates a logical copy, nothing binary like ctid will work.
2. pg_dump is designed to take a snapshot.If it doesn't, it would fail
for people who do logical replication.  However, no one has noticed.

That's kinda like being asked to prove that rocks *always* fall when you
drop them.  Either you trust physics, because physics has always worked, or
you must watch every rock, because next time it might not fall.  The
analogy is slightly flawed, since we always check the pg_dump and
pg_restore return codes, since something else might impact their function.

But if you still need evidence, here's what I'm doing to verify table and
record counts during a 9.6 -> 14 migration.  You'll have to modify it for
your purpose.

Create this table and function beforehand:
CREATE TABLE dba.migration_table_counts (
    location text check (location in ('96', '14'))
  , table_name text
  , row_count bigint
  , count_time timestamp without time zone default current_timestamp
  , primary key (table_name, location)
);
CREATE OR REPLACE FUNCTION dba.get_table_counts(_p_source TEXT) RETURNS
INTEGER
    LANGUAGE plpgsql
AS $func$
DECLARE
    r RECORD;
    _sql TEXT;
    _table_count BIGINT;
BEGIN
    FOR r IN select relnamespace::regnamespace::text||'.'||relname as
table_name
            from pg_class cla
            where relkind = 'r'
            and not exists (select 1   -- excludes parent tables
                            from pg_inherits inh1
                            where inh1.inhparent = cla.oid)
            and relnamespace::regnamespace::text
                    not in ('pg_catalog', 'information_schema', 'dba')
            order by 1
    LOOP
        _sql := FORMAT('SELECT COUNT(*) FROM %s;', r.table_name);
        RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:
SS.MS'), _sql;
        EXECUTE _sql INTO _table_count;
        --RAISE NOTICE '%', _table_count;
        INSERT INTO dba.migration_table_counts (location, table_name,
row_count)
            VALUES (_p_source, r.table_name, _table_count);
    END LOOP;
    RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:SS.MS'),
'Finished';
    RETURN 0;
END
$func$;

Run this script in a cron job that executes at the same time as your
pg_dump cron job.  Parameters should be for the source database.
#!/bin/bash
declare -gr Server=$1
declare -gr DB=$2
declare -gr Source=$3
if [ -z $Server ]; then echo "Requires a server name."; exit 2; fi
if [ -z $DB ]; then echo "Requires a DB name."; exit 3; fi
if [ -z $Source ]; then echo "Requires a source: 96|14."; exit 4; fi
psql -U postgres -h ${Server} $DB -Xac "DELETE FROM
dba.migration_table_counts WHERE location = '$Source';"
psql -U postgres -h ${Server} $DB -Xac "select * from
dba.get_table_counts('$Source');"

Run the same script on the destination server after the pg_restore is
finished.
Dump the source dba.migration_table_counts then load it into the
destination dba.migration_table_counts.

These two queries run on the destination server will check that all tables
exist in both databases, and that the record counts are the same.

You'll have some slight variations, since the two jobs are in separate
transactions.  (Mine won't, since the applications will be shut down, and
pg_hba.conf will block them.)

declare -gr SQL1="
    with
    c96 as (select table_name, row_count from dba.migration_table_counts
where location = '96'),
    c14 as (select table_name, row_count from dba.migration_table_counts
where location = '14')
    select c96.*, c14.*
    from c96 full join c14 on c96.table_name = c14.table_name
    where c96.table_name is null
       or c14.table_name is null
    order by c96.table_name, c14.table_name;"
psql $DB -ac "$SQL1"

declare -gr SQL2="
    with
    c96 as (select table_name, row_count from dba.migration_table_counts
where location = '96'),
    c14 as (select table_name, row_count from dba.migration_table_counts
where location = '14')
    select c96.*, c14.*, c96.row_count - c14.row_count as row_diff
    from c96 inner join c14 on c96.table_name = c14.table_name
    where c96.row_count != c14.row_count
    order by c96.table_name;"
psql $DB -ac "$SQL2"

Re: Scriptable way to validate a pg_dump restore ?

Reply via email to