On 6/15/07, Mark Martinec <[EMAIL PROTECTED]> wrote:
> The README.sql-pg in amavisd-new distribution suggests
> the following SQL clauses to purge old records from a database:
>
> DELETE FROM msgs WHERE time_iso < now() - INTERVAL '3 weeks';

The script that does this on my amavis-setup had a convenient bug in
it so that it refused to run, leading to no deletion, leading to a
database larger than 4 gigabytes, leading to the above taking days. (I
only let it run for 48 hours though before I started looking for more
efficient methods.)

The problem is the foreign key constraints on msgrcpt.

I found a solution:
* put/have a proxy in front of the amavis-server to queue mail
* stop postfix+amavis on the amavis-server
* rename msgrcpt to msgrcpt_old (alter table...)
* record the grants and foreign key constraints somewhere, for
instance via  pg_dump(1)
* make a new msgrcpt that lacks all foreign key constraints
* make a temporary table of all "new" mail_id in msgs (flip the comparison)
* copy from msgrcpt_old all fields with new mail_ids (insert with
inner join with temp-table)
* drop msgrcpt_old (and suddenly the db was two gigs smaller...)
* THEN delete old msgs with the standard method
* add foreign keys to the "new" msgrcpt
* fix the grants
* turn on amavis and postfix

Voila, takes less than 30 minutes, including typing/pasting, on 3.4
million rows in msgs/4.2 million rows in msgrcpt which should be a
small enough delay for most uses.

Also, if on a 32-bit linux system, set SHMALL (sysctl(1)) to the
amount of memory on the server and SHMMAX to at least a gigabyte
provided you have more than a gig on the system.

> DELETE FROM msgs WHERE time_iso < now() - INTERVAL '1 h' AND content IS NULL;

This was only a few hundred messages so with an index on time_iso this
was quick.

> DELETE FROM maddr WHERE id IN (
>   SELECT id FROM maddr LEFT JOIN (
>     SELECT sid AS id, 1 AS f FROM msgs UNION ALL
>     SELECT rid AS id, 1 AS f FROM msgrcpt
>   ) AS u USING(id) WHERE u.f IS NULL);
>
> I'd be interested in your experience before I update
> the documentation.

DELETEs are expensive but SELECTs are quick. The cost (through EXPLAIN
<query>) of the above for my data (after getting rid of old messages)
was "cost=1623113.61..1765429.58 rows=1664586 width=6".

So instead, I used temporary tables to get a list of maddr.ids to
delete, and the cost for the final delete was
"cost=86070.32..177104.67 rows=1247620 width=6", an order of magnitude
less.

As it says in the postgres-docs, what is speedy for a small db might
not be speedy for a large one. It seems that for postgres, using
temporary tables can pay in a big way.


HM

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user
AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/howto/

Reply via email to