Re: [HACKERS] count(*) performance improvement ideas

2008-06-23 Thread Bruce Momjian
Added to TODO: * Allow custom variables to appear in pg_settings() --- Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: Really? [ pokes around ... ] Hm, you're right, because

Re: [HACKERS] count(*) performance improvement ideas

2008-04-17 Thread PFC
On Thu, 17 Apr 2008 02:48:37 +0200, Stephen Denne [EMAIL PROTECTED] wrote: PFC wrote: Let's try this quick dirty implementation of a local count-delta cache using a local in-memory hashtable (ie. {}). CREATE OR REPLACE FUNCTION update_count( key TEXT, delta INTEGER ) RETURNS INTEGER

Re: [HACKERS] count(*) performance improvement ideas

2008-04-17 Thread Tom Lane
Joe Conway [EMAIL PROTECTED] writes: Tom Lane wrote: I thought about this in the shower just now, and ISTM that if we want to turn this into an actual feature rather than a kluge, there needs to be some sort of define variable command that sets up a custom variable and specifies its type (and

Re: [HACKERS] count(*) performance improvement ideas

2008-04-17 Thread Stephen Denne
PFC wrote [EMAIL PROTECTED] wrote: I don't know plpythonu (nor python), just read a few docs now: Learn Python, it is a really useful language ;) My wife has a snake phobia, besides, I've just started learning Scala. There is no sharing between processes, so - both SD

Re: [HACKERS] count(*) performance improvement ideas

2008-04-17 Thread Stephen Denne
The big gotcha is that these are all non-transactional : if you rollback, GD and SD stay the same, and when you issue a query, you can assume the state of SD and GD is random (due to previous queries) unless you initialize them to a known value. Using txid_current() as a

Re: [HACKERS] count(*) performance improvement ideas

2008-04-17 Thread PFC
My wife has a snake phobia, besides, I've just started learning Scala. Just had a look at Scala, it looks nice. Slightly Lispish (like all good languages)... txid_current() No... hold on, it is per session, and a session can't have two or more transactions active at once can it?

Re: [HACKERS] count(*) performance improvement ideas

2008-04-16 Thread Magnus Hagander
Tom Lane wrote: Stephen Denne [EMAIL PROTECTED] writes: From: Tom Lane [mailto:[EMAIL PROTECTED] As for 2) and 3), can't you look into the pg_settings view? pg_settings view doesn't contain custom variables created on the fly, Really? [ pokes around ... ] Hm, you're right, because

Re: [HACKERS] count(*) performance improvement ideas

2008-04-16 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: Really? [ pokes around ... ] Hm, you're right, because add_placeholder_variable() sets the GUC_NO_SHOW_ALL flag, and in this usage it'll never be cleared. I wonder if we should change that. The whole thing is a bit of an abuse of

Re: [HACKERS] count(*) performance improvement ideas

2008-04-16 Thread Joe Conway
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: Really? [ pokes around ... ] Hm, you're right, because add_placeholder_variable() sets the GUC_NO_SHOW_ALL flag, and in this usage it'll never be cleared. I wonder if we should change that. The whole thing is a bit

Re: [HACKERS] count(*) performance improvement ideas

2008-04-16 Thread Stephen Denne
Tom Lane wrote Stephen Denne [EMAIL PROTECTED] writes: From: Tom Lane [mailto:[EMAIL PROTECTED] As for 2) and 3), can't you look into the pg_settings view? pg_settings view doesn't contain custom variables created on the fly, Really? [ pokes around ... ] Hm, you're right, because

Re: [HACKERS] count(*) performance improvement ideas

2008-04-16 Thread Tom Lane
Stephen Denne [EMAIL PROTECTED] writes: Aside: It is currently more cumbersome to get a function to run, if needed, at commit. Ideal solution would be something like EXECUTE ON COMMIT my_function() or maybe SAVEPOINT my_name ON COMMIT my_function(), but these suggestions are made without

Re: [HACKERS] count(*) performance improvement ideas

2008-04-16 Thread PFC
The whole thing is a bit of an abuse of what the mechanism was intended for, and so I'm not sure we should rejigger GUC's behavior to make it more pleasant, but on the other hand if we're not ready to provide a better substitute ... In my experiments with materialized views, I identified

Re: [HACKERS] count(*) performance improvement ideas

2008-04-16 Thread Stephen Denne
Tom Lane wrote Transaction commit is an exceedingly subtle and carefully structured thing. Throwing random user-defined code into it ain't gonna happen. Deferred constraint triggers currently run random user-defined code. This'll do me. Regards, Stephen Denne. Disclaimer: At the Datamail

Re: [HACKERS] count(*) performance improvement ideas

2008-04-16 Thread Stephen Denne
PFC wrote: Let's try this quick dirty implementation of a local count-delta cache using a local in-memory hashtable (ie. {}). CREATE OR REPLACE FUNCTION update_count( key TEXT, delta INTEGER ) RETURNS INTEGER AS $$ if key in GD: GD[key] += delta else:

Re: [HACKERS] count(*) performance improvement ideas

2008-04-15 Thread Stephen Denne
(There is a possible performance bug mentioned at the end of the email, the rest is further discussion regarding materialised views) I wrote Pavan Deolasee wrote On Thu, Mar 20, 2008 at 3:24 AM, Stephen Denne [EMAIL PROTECTED] wrote: Pavan also refers to deferred triggers, which

Re: [HACKERS] count(*) performance improvement ideas

2008-04-15 Thread Tom Lane
Stephen Denne [EMAIL PROTECTED] writes: The custom variable class can also be used to aggregate the deltas within a transaction, though there are some minor difficulties: 1) They only store text 2) You can only find out that a value has not been set by catching an exception 3) There is no

Re: [HACKERS] count(*) performance improvement ideas

2008-04-15 Thread Stephen Denne
(apologies for the email format top posting, I've had to temporarily switch to using a web interface from home that doesn't seem to know what plain text is) pg_settings view doesn't contain custom variables created on the fly, (nor, from memory, ones defined in postgresql.conf. I'm not able

Re: [HACKERS] count(*) performance improvement ideas

2008-04-15 Thread Tom Lane
Stephen Denne [EMAIL PROTECTED] writes: From: Tom Lane [mailto:[EMAIL PROTECTED] As for 2) and 3), can't you look into the pg_settings view? pg_settings view doesn't contain custom variables created on the fly, Really? [ pokes around ... ] Hm, you're right, because

Re: [HACKERS] count(*) performance improvement ideas

2008-03-24 Thread Stephen Denne
Pavan Deolasee wrote On Thu, Mar 20, 2008 at 3:24 AM, Stephen Denne [EMAIL PROTECTED] wrote: Pavan also refers to deferred triggers, which has got me thinking about another possible solution: Instead of inserting a delta row, that will be updated a lot of times, create an on

Re: [HACKERS] count(*) performance improvement ideas

2008-03-20 Thread Pavan Deolasee
On Thu, Mar 20, 2008 at 3:24 AM, Stephen Denne [EMAIL PROTECTED] wrote: Pavan also refers to deferred triggers, which has got me thinking about another possible solution: Instead of inserting a delta row, that will be updated a lot of times, create an on commit drop temp table named

Re: [HACKERS] count(*) performance improvement ideas

2008-03-19 Thread Stephen Denne
Mark Mielke wrote This returns to the question of whether count of the whole table is useful, or whether count of a GROUP BY or WHERE is useful. If GROUP BY or WHERE is useful, then trigger on UPDATE becomes necessary. True... for the example I gave I should have had an update trigger on my

Re: [HACKERS] count(*) performance improvement ideas

2008-03-18 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: Stephen Denne [EMAIL PROTECTED] writes: However I'm not after a fast count(*) from table, but more like a fast select grouping_id, count(*) from my_table group by grouping_id You could apply the same technique across each group id, though this

Re: [HACKERS] count(*) performance improvement ideas

2008-03-18 Thread Stephen Denne
(Sorry for the repeat email Tom, I forgot the cc's) Tom Lane wrote: Stephen Denne [EMAIL PROTECTED] writes: I initially thought that what you meant by having transactions enter delta entries was that I have a trigger that would create a row each time it was called, writing how many

Re: [HACKERS] count(*) performance improvement ideas

2008-03-18 Thread Stephen Denne
Tom Lane wrote Stephen Denne [EMAIL PROTECTED] writes: What I was asking about with those questions, is if a single row is inserted in transaction A, and updated 1000 times still within transaction A, then transaction A commits... does a single row version get written, or 1001, 1000

Re: [HACKERS] count(*) performance improvement ideas

2008-03-18 Thread Tom Lane
Stephen Denne [EMAIL PROTECTED] writes: Tom Lane wrote Umm ... AFAICS there is no need for an UPDATE to touch the count table at all. You'd only need ON INSERT and ON DELETE triggers. I'm not referring to updates of my base table... the single row inserted was referring to the delta row...

Re: [HACKERS] count(*) performance improvement ideas

2008-03-18 Thread Tom Lane
Stephen Denne [EMAIL PROTECTED] writes: What I was asking about with those questions, is if a single row is inserted in transaction A, and updated 1000 times still within transaction A, then transaction A commits... does a single row version get written, or 1001, 1000 of which are

Re: [HACKERS] count(*) performance improvement ideas

2008-03-18 Thread Mark Mielke
Tom Lane wrote: Stephen Denne [EMAIL PROTECTED] writes: What I was asking about with those questions, is if a single row is inserted in transaction A, and updated 1000 times still within transaction A, then transaction A commits... does a single row version get written, or 1001, 1000 of

Re: [HACKERS] count(*) performance improvement ideas

2008-03-17 Thread Stephen Denne
Tom Lane wrote Pavan Deolasee [EMAIL PROTECTED] writes: I am sure this must have been discussed before. Indeed. Apparently you didn't find the threads in which the idea of having transactions enter delta entries was discussed? Solves both the locking and the MVCC problems, at the cost

Re: [HACKERS] count(*) performance improvement ideas

2008-03-17 Thread Tom Lane
Stephen Denne [EMAIL PROTECTED] writes: I initially thought that what you meant by having transactions enter delta entries was that I have a trigger that would create a row each time it was called, writing how many records where inserted or deleted. I didn't understand how this would be

Re: [HACKERS] count(*) performance improvement ideas

2008-03-12 Thread Tom Lane
Pavan Deolasee [EMAIL PROTECTED] writes: I am sure this must have been discussed before. Indeed. Apparently you didn't find the threads in which the idea of having transactions enter delta entries was discussed? Solves both the locking and the MVCC problems, at the cost that you need to make

Re: [HACKERS] count(*) performance improvement ideas

2008-03-12 Thread Bruce Momjian
Tom Lane wrote: Pavan Deolasee [EMAIL PROTECTED] writes: I am sure this must have been discussed before. Indeed. Apparently you didn't find the threads in which the idea of having transactions enter delta entries was discussed? Solves both the locking and the MVCC problems, at the cost

Re: [HACKERS] count(*) performance improvement ideas

2008-03-12 Thread Pavan Deolasee
On Wed, Mar 12, 2008 at 9:01 PM, Tom Lane [EMAIL PROTECTED] wrote: Pavan Deolasee [EMAIL PROTECTED] writes: I am sure this must have been discussed before. Indeed. Apparently you didn't find the threads in which the idea of having transactions enter delta entries was discussed? Solves

Re: [HACKERS] count(*) performance improvement ideas

2008-03-12 Thread Mark Mielke
Pavan Deolasee wrote: I am reading discussion about improving count(*) performance. I have also seen a TODO for this. Many people have suggested TRIGGER based solution to the slow count(*) problem. I looked at the following link which presents the solution neatly.

Re: [HACKERS] count(*) performance improvement ideas

2008-03-12 Thread Pavan Deolasee
On Wed, Mar 12, 2008 at 9:14 PM, Mark Mielke [EMAIL PROTECTED] wrote: If you are talking about automatically doing this for every table - I have an objection that the performance impact seems unwarranted against the gain. We are still talking about every insert or update updating some

Re: [HACKERS] count(*) performance improvement ideas

2008-03-12 Thread Mark Mielke
Pavan Deolasee wrote: On Wed, Mar 12, 2008 at 9:14 PM, Mark Mielke [EMAIL PROTECTED] wrote: If you are talking about automatically doing this for every table - I have an objection that the performance impact seems unwarranted against the gain. We are still talking about every insert or

Re: [HACKERS] count(*) performance improvement ideas

2008-03-12 Thread Tom Lane
Mark Mielke [EMAIL PROTECTED] writes: Fine - once per transaction instead of once per insert. Still, if there is overhead to this (updating a secondary summary table), does it really make sense to have it for every table? We certainly wouldn't accept a patch that imposed this overhead on

Re: [HACKERS] count(*) performance improvement ideas

2008-03-12 Thread Pavan Deolasee
On Wed, Mar 12, 2008 at 9:53 PM, Mark Mielke [EMAIL PROTECTED] wrote: Fine - once per transaction instead of once per insert. Still, if there is overhead to this (updating a secondary summary table), does it really make sense to have it for every table? Most of my tables do not require

Re: [HACKERS] count(*) performance improvement ideas

2008-03-12 Thread Mark Mielke
Pavan Deolasee wrote: On Wed, Mar 12, 2008 at 9:53 PM, Mark Mielke [EMAIL PROTECTED] wrote: Fine - once per transaction instead of once per insert. Still, if there is overhead to this (updating a secondary summary table), does it really make sense to have it for every table? Most of my

Re: [HACKERS] count(*) performance improvement ideas

2008-03-12 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: Mark Mielke [EMAIL PROTECTED] writes: Fine - once per transaction instead of once per insert. Still, if there is overhead to this (updating a secondary summary table), does it really make sense to have it for every table? We certainly wouldn't accept a

Re: [HACKERS] count(*) performance improvement ideas

2008-03-12 Thread Mark Mielke
Gregory Stark wrote: In an ideal world I would love to be able to do something like: CREATE MATERIALIZED VIEW foo AS (select count(*) from bar) WITH INCREMENTAL UPDATES; and have that automatically create both a heap to store the count and another to store the incremental changes. To do this