Hello.

I have a database with company table that have a bunch of related
(delete=cascade) tables.
Also it has 1<->M relation to company_descr table.
Once we've found that ~half of our companies do not have any description and
we would like to remove them.
First this I've tried was
delete from company where id not in (select company_id from company_descr);
I've tried to analyze command, but unlike to other RDBM I've used it did not
include cascade deletes/checks into query plan. That is first problem.
It was SLOW. To make it faster I've done next thing:

create temporary table comprm(id) as select id from company;
delete from comprm where id in (select company_id from company_descr);
delete from company where id in (select id from comprm);

That was much better. So the question is why postgresql can't do such a
thing.
But it was better only until "removing" dataset was small (~5% of all
table).
As soon as I've tried to remove 50% I've got speed problems. I've ensured I
have all indexes for both ends of foreign key.
I've tried to remove all cascaded entries by myself, e.g.:

create temporary table comprm(id) as select id from company;
delete from comprm where id in (select company_id from company_descr);
delete from company_alias where company_id in (select id from comprm);
...
delete from company where id in (select id from comprm);

It did not help until I drop all constraints before and recreate all
constraints after.
Now I have it work for 15minutes, while previously it could not do in a day.

Is it OK? I'd say, some (if not all) of the optimizations could be done by
postgresql optimizer.

Reply via email to