Re: [SQL] Re: How to keep at-most N rows per group? periodic DELETEs or constraints or..?
On 1/9/08, Erik Jones <[EMAIL PROTECTED]> wrote: > On Jan 9, 2008, at 1:09 PM, Scott Marlowe wrote: > > On Jan 9, 2008 12:20 PM, Steve Midgley <[EMAIL PROTECTED]> wrote: > >> This is kludgy but you would have some kind of random number test at > >> the start of the trigger - if it evals true once per every ten > >> calls to > >> the trigger (say), you'd cut your delete statements execs by about > >> 10x > >> and still periodically truncate every set of user rows fairly > >> often. > > > > And, if you always through a limit 50 on the end of queries that > > retrieve data, you could let it grow quite a bit more than 60 or 70... > > Say 200. Then you could have it so that the random chopper function > > only gets kicked off every 100th or so time. > > I like that idea. I do too! I'm going to have a shot at implementing this tomorrow. Thanks for all your opinions guys :-) Cheers, Jamie ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [SQL] Re: How to keep at-most N rows per group? periodic DELETEs or constraints or..?
On Jan 9, 2008, at 1:09 PM, Scott Marlowe wrote: On Jan 9, 2008 12:20 PM, Steve Midgley <[EMAIL PROTECTED]> wrote: This is kludgy but you would have some kind of random number test at the start of the trigger - if it evals true once per every ten calls to the trigger (say), you'd cut your delete statements execs by about 10x and still periodically truncate every set of user rows fairly often. On average you'd have ~55 rows per user, never less than 50 and a few outliers with 60 or 70 rows before they get trimmed back down to 50.. Seems more reliable than a cron job, and solves your problem of an ever growing table? You could adjust the random number test easily if you change your mind of the balance of size of table vs. # of delete statements down the road. And, if you always through a limit 50 on the end of queries that retrieve data, you could let it grow quite a bit more than 60 or 70... Say 200. Then you could have it so that the random chopper function only gets kicked off every 100th or so time. I like that idea. Erik Jones DBA | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] Re: How to keep at-most N rows per group? periodic DELETEs or constraints or..?
On Jan 9, 2008 12:20 PM, Steve Midgley <[EMAIL PROTECTED]> wrote: > This is kludgy but you would have some kind of random number test at > the start of the trigger - if it evals true once per every ten calls to > the trigger (say), you'd cut your delete statements execs by about 10x > and still periodically truncate every set of user rows fairly often. On > average you'd have ~55 rows per user, never less than 50 and a few > outliers with 60 or 70 rows before they get trimmed back down to 50.. > Seems more reliable than a cron job, and solves your problem of an ever > growing table? You could adjust the random number test easily if you > change your mind of the balance of size of table vs. # of delete > statements down the road. And, if you always through a limit 50 on the end of queries that retrieve data, you could let it grow quite a bit more than 60 or 70... Say 200. Then you could have it so that the random chopper function only gets kicked off every 100th or so time. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[SQL] Re: How to keep at-most N rows per group? periodic DELETEs or constraints or..?
At 07:20 AM 1/9/2008, [EMAIL PROTECTED] wrote: Date: Tue, 8 Jan 2008 17:41:18 + From: "Jamie Tufnell" <[EMAIL PROTECTED]> To: pgsql-sql@postgresql.org Subject: Re: How to keep at-most N rows per group? periodic DELETEs or constraints or..? Message-ID: <[EMAIL PROTECTED]> On 1/8/08, codeWarrior <[EMAIL PROTECTED]> wrote: > Jamie: > > I think you are probably having slowdown issues in your "DELETE FROM WHERE > NOT IN SELECT ORDER BY DESCENDING" construct -- that seems a bit convoluted > to me Hmm so rather than NOT IN ( .. LIMIT 50) would you suggest IN ( ... OFFSET 50) like in Erik's example? Or something else entirely? > ALSO: It looks to me like you have a column named "timestamp' ??? This is > bad practice since "timestamp" is a reserved word... You really ought NOT to > use reserved words for column names... different debate. I do realize it would be better to use something else and thanks for the tip This is an established database and "timestamp" has been used in other tables which is why I stuck to it here.. one day when time permits maybe I'll rename them all! > Why bother deleting records anyway ? Why not alter your query that tracks > the 50 records to LIMIT 50 ??? The read query does LIMIT 50 and the reason for deleting the rest of the records is because they're not needed by the application and there's loads of them being created all the time (currently several million unnecessary rows) -- I imagine eventually this will slow things down? Do you think a regular batch process to delete rows might be more appropriate than a trigger in this scenario? Thanks, Jamie This is kludgy but you would have some kind of random number test at the start of the trigger - if it evals true once per every ten calls to the trigger (say), you'd cut your delete statements execs by about 10x and still periodically truncate every set of user rows fairly often. On average you'd have ~55 rows per user, never less than 50 and a few outliers with 60 or 70 rows before they get trimmed back down to 50.. Seems more reliable than a cron job, and solves your problem of an ever growing table? You could adjust the random number test easily if you change your mind of the balance of size of table vs. # of delete statements down the road. Steve ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[SQL] Re: How to keep at-most N rows per group? periodic DELETEs or constraints or..?
I think what you want is related to this post on how to create a FIFO queue in Postgres: http://people.planetpostgresql.org/greg/index.php?/archives/89-Implementing-a-queue-in-SQL-Postgres-version.html The major difference is that you want a FIFO queue per user_id, so the triggering code would want to bump old records aggregating on user_id to calculate the "limit" subquery. His original code is this: DELETE FROM q WHERE id NOT IN (SELECT id FROM q ORDER BY id DESC LIMIT maxrows); And that subquery is where (I think!) you'd want to add "where user_id = [user_id]" - I'm not sure how you'll pass user_id into this function, maybe someone else can help with that? Hopefully this is useful? Steve At 06:24 AM 1/8/2008, [EMAIL PROTECTED] wrote: Date: Tue, 8 Jan 2008 14:24:22 + From: "Jamie Tufnell" <[EMAIL PROTECTED]> To: pgsql-sql@postgresql.org Subject: How to keep at-most N rows per group? periodic DELETEs or constraints or..? Message-ID: <[EMAIL PROTECTED]> Hi, I have a table that stores per-user histories of recently viewed items and I'd like to limit the amount of history items to <= 50 per user. I'm considering doing this with a query run from cron every so often but I'm not happy with what I've come up with so far, and since it's a quite active table I thought I'd ask here to see if there's a more efficient way. Right now the table structure is as follows... user_item_history: id (PK), user_id (FK), item_id (FK), timestamp For user_ids that have more than 50 rows, I want to keep the most recent 50 and delete the rest. The most obvious way of doing this for me is: -- -- Get the user_ids with 50 or more history entries like this -- SELECT user_id, count(*) FROM user_scene_history GROUP BY user_id HAVING count(*) > 50; -- -- Then iterate the ids above (_user_id) -- DELETE FROM user_scene_history WHERE user_id = _user_id AND id NOT IN ( SELECT id FROM user_scene_history WHERE user_id = _user_id ORDER BY timestamp DESC LIMIT 50); I've left out the simple logic tying the above two queries together for clarity.. I haven't actually tested this but while I assume it would work I imagine there is a neater and possibly more efficient way of attacking this. I'm also open to different approaches of limiting the user's history too ... perhaps with table constraints so they can simply never exceed 50 entries? But I'm not sure how to do this.. Any help would be greatly appreciated.. Thanks, Jamie