On Fri, Apr 11, 2008 at 12:34 PM, PFC <[EMAIL PROTECTED]> wrote:
        Well, I realized the idea of global prepared statements actually
sucked, so I set on another approach thanks to ideas from this list, this is
caching query plans.

Well, that's a blatantly bad realization. Perhaps you should do more research.

No, what I meant is that the "global prepared statements" as I tried to implement them before weren't that good... I think simple caching based on the query text itself is preferable to having to name each of your queries, extract them from your programs and replace them by executes, issue a "create statement" command for each of them, etc. Few people would actually use that feature because it would mean lots of modifications to the application, so all the applications that have to be compatible with other databases would not use the feature (*) It could be useful for permissions and fine access control, though, but views and stored procs already provide that functionality...

(*) = Note that caching the plans based on the query text (with $ params) from a parse message will not provide caching for oldskool queries with params inside in the form of escaped strings. This is good, because it means the safer solution (using $-quoted params) will also be the faster solution. And in the application, only a very small part of the code needs to be changed, that's the DB abstraction layer.


 Doesn't Oracle do this now transparently to clients?

Of course it does, and it has since the late 80's I believe.

 Oracle keeps a statement/plan cache in its shared memory segment (SGA)
 that greatly improves its performance at running queries that don't
 change very often.

        Can we have more details on how Oracle does it ? For "inspiration"...

        Here is what I'm thinking about :
Don't flame me too much about implementation issues, this is just throwing ideas in the air to see where they'll fall ;)

* global plan cache in shared memory, implemented as hashtable, hash key being the (search_path, query_string) Doubt : Can a plan be stored in shared memory ? Will it have to be copied to local memory before being executed ?

This stores :
- the plans (not for all keys, see below)
- the stats :
        - number of times this query has been executed,
- total, min and max wallclock time and CPU time spent planning this query, - total, min and max wallclock time, CPU time and RAM spent executing this query,
        - total, min and max number of rows returned,
        - last timestamp of execution of this query,

There should be separate GUCs to control this :
        - should the whole thing be activated ?
        - should the cache be active ? or just the stats ? and what stats ?

There should be also a way to query this to display the statistics (ie "what query is killing my server ?"), and a way to purge old plans.

* every time a Parse message comes up :
- look if the (search_path, query_string) is in the cache
- if it is in the cache :
- if there is a cached plan, make the unnamed statement point to it, and we're done. - if there is no cached plan, prepare the query, and put it in the unnamed statement.

Now, the query has been parsed, so we can decide if it is cacheable. Should this be done in Parse, in Bind, or somewhere else ? I have no idea.

For instance, queries which contain VALUES() or IN( list of consts ) should not be cached, since the IN() is likely to change all the time, it would just trash the cache. Using =ANY( $1 ) instead will work with cached plans.

Also, will a plan to be cached have to be prepared with or without the parameters ? That's also an interesting question... Perhaps the user should also be able to specify wether to cache a plan or not, or wether to use the params or not, with hint flags in the query string ?
(like mysql, /* flags */ SELECT blah )

Now, if the query is cacheable, store it in the cache, and update the stats. If we decided to store the plan, do that too. For instance we might decide to store the plan only if this query has been executed a certain number of times, etc.

* In the Execute message, if a cached plan was used, execute it and update the stats (time spent, etc).

Now, about contention, since this is one shared hashtable for everyone, it will be fought for... However, the lock on it is likely to be held during a very small time (much less than a microsecond), so would it be that bad ? Also, GUC can be used to mitigate the contention, for instance if the user is not interested in the stats, the thing becomes mostly read-only





















--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to