On Mon, Jan 29, 2024 at 3:12 AM Laura Smith < n5d9xq3ti233xiyif...@protonmail.ch> wrote:
> Hi > > Let's say I've got a scenario where I'm doing a pg_dump replication rather > than online streaming, e.g. due to air-gap or whatever. > > Is there a scriptable way to validate the restore ? e.g. using doing > something clever with ctid or something to ensure both the schema and all > its rows were restored to the same point at which the dump was taken ? > 1. Since pg_dump creates a logical copy, nothing binary like ctid will work. 2. pg_dump is designed to take a snapshot.If it doesn't, it would fail for people who do logical replication. However, no one has noticed. That's kinda like being asked to prove that rocks *always* fall when you drop them. Either you trust physics, because physics has always worked, or you must watch every rock, because next time it might not fall. The analogy is slightly flawed, since we always check the pg_dump and pg_restore return codes, since something else might impact their function. But if you still need evidence, here's what I'm doing to verify table and record counts during a 9.6 -> 14 migration. You'll have to modify it for your purpose. Create this table and function beforehand: CREATE TABLE dba.migration_table_counts ( location text check (location in ('96', '14')) , table_name text , row_count bigint , count_time timestamp without time zone default current_timestamp , primary key (table_name, location) ); CREATE OR REPLACE FUNCTION dba.get_table_counts(_p_source TEXT) RETURNS INTEGER LANGUAGE plpgsql AS $func$ DECLARE r RECORD; _sql TEXT; _table_count BIGINT; BEGIN FOR r IN select relnamespace::regnamespace::text||'.'||relname as table_name from pg_class cla where relkind = 'r' and not exists (select 1 -- excludes parent tables from pg_inherits inh1 where inh1.inhparent = cla.oid) and relnamespace::regnamespace::text not in ('pg_catalog', 'information_schema', 'dba') order by 1 LOOP _sql := FORMAT('SELECT COUNT(*) FROM %s;', r.table_name); RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI: SS.MS'), _sql; EXECUTE _sql INTO _table_count; --RAISE NOTICE '%', _table_count; INSERT INTO dba.migration_table_counts (location, table_name, row_count) VALUES (_p_source, r.table_name, _table_count); END LOOP; RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:SS.MS'), 'Finished'; RETURN 0; END $func$; Run this script in a cron job that executes at the same time as your pg_dump cron job. Parameters should be for the source database. #!/bin/bash declare -gr Server=$1 declare -gr DB=$2 declare -gr Source=$3 if [ -z $Server ]; then echo "Requires a server name."; exit 2; fi if [ -z $DB ]; then echo "Requires a DB name."; exit 3; fi if [ -z $Source ]; then echo "Requires a source: 96|14."; exit 4; fi psql -U postgres -h ${Server} $DB -Xac "DELETE FROM dba.migration_table_counts WHERE location = '$Source';" psql -U postgres -h ${Server} $DB -Xac "select * from dba.get_table_counts('$Source');" Run the same script on the destination server after the pg_restore is finished. Dump the source dba.migration_table_counts then load it into the destination dba.migration_table_counts. These two queries run on the destination server will check that all tables exist in both databases, and that the record counts are the same. You'll have some slight variations, since the two jobs are in separate transactions. (Mine won't, since the applications will be shut down, and pg_hba.conf will block them.) declare -gr SQL1=" with c96 as (select table_name, row_count from dba.migration_table_counts where location = '96'), c14 as (select table_name, row_count from dba.migration_table_counts where location = '14') select c96.*, c14.* from c96 full join c14 on c96.table_name = c14.table_name where c96.table_name is null or c14.table_name is null order by c96.table_name, c14.table_name;" psql $DB -ac "$SQL1" declare -gr SQL2=" with c96 as (select table_name, row_count from dba.migration_table_counts where location = '96'), c14 as (select table_name, row_count from dba.migration_table_counts where location = '14') select c96.*, c14.*, c96.row_count - c14.row_count as row_diff from c96 inner join c14 on c96.table_name = c14.table_name where c96.row_count != c14.row_count order by c96.table_name;" psql $DB -ac "$SQL2"