Rob, I'm going to make half of the list cringe at this suggestion though I have used it successfully.
If you can guarantee the table will not be vacuumed during this cleanup or rows you want deleted updated, I would suggest using the ctid column to facilitate the delete. Using the simple transaction below, I have witnessed a DELETE move much more quickly than one using a PK or any other column with an index. BEGIN; SELECT ctid INTO TEMP TABLE ctids_to_be deleted FROM my_big_table WHERE *delete criteria*; DELETE FROM my_big_table bt USING ctids_to_be_deleted dels WHERE bt.ctid = dels.ctid; COMMIT; HTH. -Greg On Fri, May 17, 2013 at 5:26 AM, Rob Emery <re-pg...@codeweavers.net> wrote: > Hi All, > > We've got 3 quite large tables that due to an unexpected surge in > usage (!) have grown to about 10GB each, with 72, 32 and 31 million > rows in. I've been tasked with cleaning out about half of them, the > problem I've got is that even deleting the first 1,000,000 rows seems > to take an unreasonable amount of time. Unfortunately this is on quite > an old server (Dell 2950 with a RAID-10 over 6 disks) running Postgres > 8.4; which serves other things like our logging systems. > > If I run a sustained (more than about 5 minutes) delete it'll have a > detrimental effect on the other services. I'm trying to batch up the > deletes into small chunks of approximately 1 month of data ; even this > seems to take too long, I originally reduced this down to a single > day's data and had the same problem. I can keep decreasing the size of > the window I'm deleting but I feel I must be doing something either > fundamentally wrong or over-complicating this enormously. I've > switched over to retrieving a list of IDs to delete, storing them in > temporary tables and deleting based on the primary keys on each of the > tables with something similar to this: > > BEGIN TRANSACTION; > > CREATE TEMPORARY TABLE table_a_ids_to_delete (id INT); > CREATE TEMPORARY TABLE table_b_ids_to_delete (id INT); > > INSERT INTO table_a_ids_to_delete > SELECT id FROM table_a WHERE purchased ='-infinity' AND created_at > < '2007-01-01T00:00:00'; > > INSERT INTO table_b_ids_to_delete > SELECT table_b_id FROM table_a_table_b_xref > INNER JOIN table_a_ids_to_delete ON (table_a_ids_to_delete.id = > table_a_table_b.quote_id); > > DELETE FROM table_a_table_b_xref USING table_a_ids_to_delete > WHERE table_a_table_b_xref.table_a_id = table_a_ids_to_delete.id; > > DELETE FROM table_b USING table_b_ids_to_delete > WHERE table_b.id = table_b_ids_to_delete.id; > > DELETE FROM table_a USING table_a_ids_to_delete > WHERE table_a.id = table_a_ids_to_delete.id; > > COMMIT; > > There're indices on table_a on the queried columns, table_b's primary > key is it's id, and table_a_table_b_xref has an index on (table_a_id, > table_b_id). There're FK defined on the xref table, hence why I'm > deleting from it first. > > Does anyone have any ideas as to what I can do to make the deletes any > faster? I'm running out of ideas! > > Thanks in advance, > > -- > Rob Emery > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >