Re: creating a subset DB efficiently ?

Ron Johnson Fri, 08 Mar 2024 09:05:38 -0800

On Fri, Mar 8, 2024 at 11:22 AM David Gauthier <dfgpostg...@gmail.com>
wrote:


> Here's the situation....
>
> - The DB contains data for several projects.
> - The tables of the DB contain data for all projects (data is not
> partitioned on project name or anything like that)
> - The "project" identifier (table column) exists in a few "parent" tables
> with many child... grandchild,... tables under them connected with foreign
> keys defined with "on delete cascade".  So if a record in one of the parent
> table records is deleted, all of its underlying, dependent records get
> deleted too.
> - New projects come in, and old ones need to be removed and "archived" in
> DBs of their own.  So there's a DB called "active_projects" and there's a
> DB called "project_a_archive" (identical metadata).
> - The idea is to copy the data for project "a" that's in "active_projects"
> to the "project_a_arhchive" DB AND delete the project a data out of
> "active_projects".
> - Leave "project_a_archive" up and running if someone needs to attach to
> that and get some old/archived data.
>
> The brute-force method I've been using is...
> 1)  pg_dump "active_projects" to a (huge) file then populate
> "project_a_archive" using that (I don't have the privs to create database,
> IT creates an empty one for me, so this is how I do it).
> 2) go into the "project_a_archive" DB and run... "delete from par_tbl_1
> where project <> 'a' ", "delete from par_tbl_2 where project <> 'a' ",
> etc... leaving only project "a" data in the DB.
> 3) go into the "active_projects" DB and "delete from par_tbl_1 where
> project = 'a' ", etc... removing project "a" from the "active_projects DB.
>
> Ya, not very elegant, it takes a long time and it takes a lot of
> resources.  So I'm looking for ideas on how to do this better.
>
>
I wrote this function to list all of the FK "children" that reference a
"parent" table.  It might help you determine all of the children,
grandchildren, etc.

CREATE OR REPLACE FUNCTION dba.get_fk_referenced_by_child(_p_table_name
TEXT)
    RETURNS TABLE (downstream_table TEXT, downstream_column NAME,
downstream_index NAME)
    LANGUAGE plpgsql
    AS
$$
    DECLARE
        _v_name TEXT;
        _v_downstream_index_name NAME;
        _c_downstream CURSOR FOR
            select
cla2.relnamespace::regnamespace::text||'.'||co.conrelid::regclass::text AS
downstream_table
                    , co.conrelid AS downstream_oid
                    , att.attname AS downstream_column
            from pg_constraint co
                inner join pg_class cla1
                    on co.confrelid = cla1.oid
                inner join pg_class cla2
                    on co.conrelid = cla2.oid
                inner join pg_attribute att
                    on co.conrelid = att.attrelid
            where co.contype = 'f'
                and
cla1.relnamespace::regnamespace::text||'.'||cla1.relname = _v_name
                and att.attnum = co.conkey[1]
            order by 1;
    BEGIN
        _v_name := TRIM(BOTH FROM _p_table_name);
        PERFORM relname
        FROM pg_class
        WHERE relnamespace::regnamespace::text||'.'||relname = _v_name
          AND relkind = 'r';
        IF NOT FOUND THEN
            downstream_table := _v_name;
            downstream_column := 'does not exist';
            downstream_index := NULL;
            RETURN NEXT;
            RETURN;
        END IF;
        FOR i IN _c_downstream
        LOOP
            downstream_table := i.downstream_table;
            downstream_column := i.downstream_column;
            SELECT indexrelid::regclass
            into downstream_index
            FROM pg_index
                INNER JOIN pg_attribute ON indexrelid = attrelid
            WHERE indrelid = i.downstream_oid
              AND attname = i.downstream_column;
            IF downstream_index IS NULL THEN
                downstream_index := 'ALERT!!! MISSING INDEX';
            END IF;
            RETURN NEXT;
        END LOOP;
    END;
$$;



> Related question...
> The "delete from par_tbl_a where project <> 'a' " is taking forever.  I
> fear it's because it's trying to journal everything in case I want to
> rollback.  But this is just in the archive DB and I don't mind taking the
> risk if I can speed this up outside of a transaction.  How can I run a
> delete command like this without the rollback recovery overhead ?
>

Re: creating a subset DB efficiently ?

Reply via email to