Josh <slus...@gmail.com> writes:
> I'm trying to do a DELETE FROM on my large table (about 800 million
> rows) based on the contents of another, moderately large table (about
> 110 million rows). The command I'm using is:

> DELETE FROM records WHERE id NOT IN (SELECT id FROM unique_records);

> This process ran for about two weeks before I decided to stop it -- it
> was dragging down the DB server. I can understand long-running
> processes, but two weeks seems a bit much even for a big table.

> Is this the best way to approach the problem? Is there a better way?

> Some background: The server is version 8.3, running nothing but Pg.
> The 'records' table has 'id' as its primary key, and one other index
> on another column. The table is referenced by just about every other
> table in my DB (about 15 other tables) via foreign key constraints,

Hmm ... do all of those referencing tables have indexes on the
referencing columns?  It seems plausible that the time is going into
seqscan searches for referencing rows.
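If any of them are missing one, every row deleted from "records" forces a
full scan of that referencing table to verify nothing still points at it.
Just as a sketch (the table and column names here are invented, adjust to
your schema), each FK column wants something like

    -- hypothetical referencing table "orders" with FK column "record_id"
    CREATE INDEX orders_record_id_idx ON orders (record_id);

so the FK trigger can check for dependent rows with an index probe instead
of a seqscan.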

You might try doing EXPLAIN ANALYZE of this same delete for a limited
number of rows (maybe 1000 or so) so that you can see what plan you're
getting and where the time really goes.  8.3's EXPLAIN ANALYZE is able to
break out time spent in triggers, so if the problem is the FK
propagation, EXPLAIN ANALYZE would show it.
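Something along these lines (untested, and assuming "id" is a numeric key
so a range predicate is a cheap way to limit the rows touched):

    BEGIN;
    EXPLAIN ANALYZE
      DELETE FROM records
      WHERE id NOT IN (SELECT id FROM unique_records)
        AND id < 1000;   -- or whatever restriction hits only a small slice
    ROLLBACK;

Doing it inside BEGIN/ROLLBACK lets you look at the plan and the
per-trigger times without actually throwing the rows away.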

Also, the NOT IN is probably going to suck performance-wise no matter
what, for such large numbers of rows.  Converting it to NOT EXISTS might
help some, though I don't remember offhand how smart 8.3 is about
either form.
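The rewrite would look about like this (untested; note that NOT IN and
NOT EXISTS behave differently if unique_records.id can be null, though
for a key column that shouldn't matter):

    DELETE FROM records r
    WHERE NOT EXISTS (SELECT 1 FROM unique_records u WHERE u.id = r.id);

It'd be worth comparing EXPLAIN output for both forms on the limited test
above to see whether the planner does anything smarter with the EXISTS
version.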

                        regards, tom lane
