On Fri, Mar 8, 2024 at 11:22 AM David Gauthier <dfgpostg...@gmail.com> wrote:
> Here's the situation.... > > - The DB contains data for several projects. > - The tables of the DB contain data for all projects (data is not > partitioned on project name or anything like that) > - The "project" identifier (table column) exists in a few "parent" tables > with many child... grandchild,... tables under them connected with foreign > keys defined with "on delete cascade". So if a record in one of the parent > table records is deleted, all of its underlying, dependent records get > deleted too. > - New projects come in, and old ones need to be removed and "archived" in > DBs of their own. So there's a DB called "active_projects" and there's a > DB called "project_a_archive" (identical metadata). > - The idea is to copy the data for project "a" that's in "active_projects" > to the "project_a_arhchive" DB AND delete the project a data out of > "active_projects". > - Leave "project_a_archive" up and running if someone needs to attach to > that and get some old/archived data. > > The brute-force method I've been using is... > 1) pg_dump "active_projects" to a (huge) file then populate > "project_a_archive" using that (I don't have the privs to create database, > IT creates an empty one for me, so this is how I do it). > 2) go into the "project_a_archive" DB and run... "delete from par_tbl_1 > where project <> 'a' ", "delete from par_tbl_2 where project <> 'a' ", > etc... leaving only project "a" data in the DB. > 3) go into the "active_projects" DB and "delete from par_tbl_1 where > project = 'a' ", etc... removing project "a" from the "active_projects DB. > > Ya, not very elegant, it takes a long time and it takes a lot of > resources. So I'm looking for ideas on how to do this better. > > I wrote this function to list all of the FK "children" that reference a "parent" table. It might help you determine all of the children, grandchildren, etc. CREATE OR REPLACE FUNCTION dba.get_fk_referenced_by_child(_p_table_name TEXT) RETURNS TABLE (downstream_table TEXT, downstream_column NAME, downstream_index NAME) LANGUAGE plpgsql AS $$ DECLARE _v_name TEXT; _v_downstream_index_name NAME; _c_downstream CURSOR FOR select cla2.relnamespace::regnamespace::text||'.'||co.conrelid::regclass::text AS downstream_table , co.conrelid AS downstream_oid , att.attname AS downstream_column from pg_constraint co inner join pg_class cla1 on co.confrelid = cla1.oid inner join pg_class cla2 on co.conrelid = cla2.oid inner join pg_attribute att on co.conrelid = att.attrelid where co.contype = 'f' and cla1.relnamespace::regnamespace::text||'.'||cla1.relname = _v_name and att.attnum = co.conkey[1] order by 1; BEGIN _v_name := TRIM(BOTH FROM _p_table_name); PERFORM relname FROM pg_class WHERE relnamespace::regnamespace::text||'.'||relname = _v_name AND relkind = 'r'; IF NOT FOUND THEN downstream_table := _v_name; downstream_column := 'does not exist'; downstream_index := NULL; RETURN NEXT; RETURN; END IF; FOR i IN _c_downstream LOOP downstream_table := i.downstream_table; downstream_column := i.downstream_column; SELECT indexrelid::regclass into downstream_index FROM pg_index INNER JOIN pg_attribute ON indexrelid = attrelid WHERE indrelid = i.downstream_oid AND attname = i.downstream_column; IF downstream_index IS NULL THEN downstream_index := 'ALERT!!! MISSING INDEX'; END IF; RETURN NEXT; END LOOP; END; $$; > Related question... > The "delete from par_tbl_a where project <> 'a' " is taking forever. I > fear it's because it's trying to journal everything in case I want to > rollback. But this is just in the archive DB and I don't mind taking the > risk if I can speed this up outside of a transaction. How can I run a > delete command like this without the rollback recovery overhead ? >