On 01/12/2017 12:06 PM, Jonathan Vanasco wrote:
> I'm just wondering if there's a more efficient way of handling a certain 
> periodic data migration.
> 
> We have a pair of tables with this structure:
> 
>       table_a__live
>               column_1 INT
>               column_2 INT
>               record_timestamp TIMESTAMP
> 
>       table_a__archive
>               column_1 INT
>               column_2 INT
>               record_timestamp TIMESTAMP
> 
> Periodically, we must migrate items that are 'stale' from `table_a__live` to 
> `table_a__archive`.  The entries are copied over to the archive, then deleted.
> 
> The staleness is calculated based on age, so we need to use INTERVAL.  The 
> "live" table can have anywhere from 100k to 20MM records.
> 
> The primary key on `table_a__live` is a composite of column_1 & column_2.
> 
> To minimize scanning the table, we opted to flag rows for migration with a 
> dedicated column:
> 
>       ALTER TABLE table_a__live ADD is_migrate BOOLEAN DEFAULT NULL;
>       CREATE INDEX idx_table_a__live_migrate ON table_a__live(is_migrate)
>           WHERE is_migrate IS NOT NULL;
> 
> Our migration is then based on that `is_migrate` column:
> 
>       BEGIN;
>       UPDATE table_a__live SET is_migrate = TRUE
>           WHERE record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC'
>               - INTERVAL '1 month';
>       INSERT INTO table_a__archive (column_1, column_2, record_timestamp)
>           SELECT column_1, column_2, record_timestamp FROM table_a__live
>           WHERE is_migrate IS TRUE;
>       DELETE FROM table_a__live WHERE is_migrate IS TRUE;
>       COMMIT;
> 
> The inserts & deletes are blazing fast, but the UPDATE is a bit slow because 
> Postgres rewrites all the updated rows.

Maybe I am missing something, but why do the UPDATE?

Why not?:

BEGIN;

    INSERT INTO
        table_a__archive (column_1, column_2, record_timestamp)
    SELECT
        column_1, column_2, record_timestamp
    FROM
        table_a__live
    WHERE
        record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC'
            - INTERVAL '1 month';

    DELETE FROM
        table_a__live
    WHERE
        record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC'
            - INTERVAL '1 month';

COMMIT;

With an index on record_timestamp.
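
For example (the index name here is just illustrative):

    CREATE INDEX idx_table_a__live_record_timestamp
        ON table_a__live (record_timestamp);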

> 
> can anyone suggest a better approach?
> 
> I considered copying everything to a tmp table, then inserting/deleting based 
> on that table, but there's a lot of disk I/O in that approach too.
> 
> 
> FWIW, we're on Postgres 9.6.1.
> 
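
Since you're on 9.6, another option is to fold the copy and the delete into a
single statement with a writable CTE, so the live table is only scanned once
and there's no is_migrate column to maintain. A sketch, reusing the same
cutoff expression as above:

    -- Delete the stale rows from the live table and capture them ...
    WITH moved AS (
        DELETE FROM table_a__live
        WHERE record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC'
            - INTERVAL '1 month'
        RETURNING column_1, column_2, record_timestamp
    )
    -- ... then insert the captured rows into the archive in the same statement.
    INSERT INTO table_a__archive (column_1, column_2, record_timestamp)
    SELECT column_1, column_2, record_timestamp FROM moved;

The DELETE ... RETURNING feeds the archived rows straight into the INSERT, so
the cutoff predicate is evaluated only once and both steps see the same rows.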


-- 
Adrian Klaver
adrian.kla...@aklaver.com

