On 6/15/07, Mark Martinec <[EMAIL PROTECTED]> wrote: > The README.sql-pg in amavisd-new distribution suggests > the following SQL clauses to purge old records from a database: > > DELETE FROM msgs WHERE time_iso < now() - INTERVAL '3 weeks';
The script that does this on my amavis-setup had a convenient bug in it so that it refused to run, leading to no deletion, leading to a database larger than 4 gigabytes, leading to the above taking days. (I only let it run for 48 hours though before I started looking for more efficient methods.) The problem is the foreign key constraints on msgrcpt. I found a solution: * put/have a proxy in front of the amavis-server to queue mail * stop postfix+amavis on the amavis-server * rename msgrcpt to msgrcpt_old (alter table...) * record the grants and foreign key constraints somewhere, for instance via pg_dump(1) * make a new msgrcpt that lacks all foreign key constraints * make a temporary table of all "new" mail_id in msgs (flip the comparison) * copy from msgrcpt_old all fields with new mail_ids (insert with inner join with temp-table) * drop msgrcpt_old (and suddenly the db was two gigs smaller...) * THEN delete old msgs with the standard method * add foreign keys to the "new" msgrcpt * fix the grants * turn on amavis and postfix Voila, takes less than 30 minutes, including typing/pasting, on 3.4 million rows in msgs/4.2 million rows in msgrcpt which should be a small enough delay for most uses. Also, if on a 32-bit linux system, set SHMALL (sysctl(1)) to the amount of memory on the server and SHMMAX to at least a gigabyte provided you have more than a gig on the system. > DELETE FROM msgs WHERE time_iso < now() - INTERVAL '1 h' AND content IS NULL; This was only a few hundred messages so with an index on time_iso this was quick. > DELETE FROM maddr WHERE id IN ( > SELECT id FROM maddr LEFT JOIN ( > SELECT sid AS id, 1 AS f FROM msgs UNION ALL > SELECT rid AS id, 1 AS f FROM msgrcpt > ) AS u USING(id) WHERE u.f IS NULL); > > I'd be interested in your experience before I update > the documentation. DELETEs are expensive but SELECTs are quick. The cost (through EXPLAIN <query>) of the above for my data (after getting rid of old messages) was "cost=1623113.61..1765429.58 rows=1664586 width=6". So instead, I used temporary tables to get a list of maddr.ids to delete, and the cost for the final delete was "cost=86070.32..177104.67 rows=1247620 width=6", an order of magnitude less. As it says in the postgres-docs, what is speedy for a small db might not be speedy for a large one. It seems that for postgres, using temporary tables can pay in a big way. HM ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ _______________________________________________ AMaViS-user mailing list AMaViS-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/amavis-user AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3 AMaViS-HowTos:http://www.amavis.org/howto/