On Sun, Jul 31, 2005 at 08:51:06AM -0800, Matthew Schumacher wrote: > Ok, here is the current plan. > > Change the spamassassin API to pass a hash of tokens into the storage > module, pass the tokens to the proc as an array, start a transaction, > load the tokens into a temp table using copy, select the tokens distinct > into the token table for new tokens, update the token table for known > tokens, then commit.
You might consider: UPDATE tokens FROM temp_table (this updates existing records) INSERT INTO tokens SELECT ... FROM temp_table WHERE NOT IN (SELECT ... FROM tokens) This way you don't do an update to newly inserted tokens, which helps keep vacuuming needs in check. > This solves the following problems: > > 1. Each email is a transaction instead of each token. > 2. The update statement is only called when we really need an update > which avoids all of those searches. > 3. The looping work is done inside the proc instead of perl calling a > method a zillion times per email. > > I'm not sure how vacuuming will be done yet, if we vacuum once per email > that may be too often, so I may do that every 5 mins in cron. I would suggest leaving an option to have SA vacuum every n emails, since some people may not want to mess with cron, etc. I suspect that pg_autovacuum would be able to keep up with things pretty well, though. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?" ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly