Re: [HACKERS] Cached Query Plans
* Dawid Kuroczko:

> Right now the only way of getting such information from PostgreSQL
> is by logging all queries and analyzing logs. The current_query
> column of pg_stat_activity is useless as the (prepared) queries are
> usually so short lived that you will see one execution out of
> thousands happening.

If the cached plans are kept a bit longer than actually necessary, it might also be possible to see the query plan of a query that involves temporary tables, something that is somewhat convoluted to do otherwise (ptop uses the query string from pg_stat_activity and tacks an EXPLAIN in front of it, which breaks with temporary tables).

-- 
Florian Weimer <[EMAIL PROTECTED]>
BFK edv-consulting GmbH          http://www.bfk.de/
Kriegsstraße 100                 tel: +49-721-96201-1
D-76133 Karlsruhe                fax: +49-721-96201-99

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans
On Mon, Apr 14, 2008 at 5:01 PM, Csaba Nagy <[EMAIL PROTECTED]> wrote:
> On Mon, 2008-04-14 at 10:55 -0400, Mark Mielke wrote:
> > The other ideas about automatically deciding between plans based on
> > ranges and such strike me as involving enough complexity and logic, that
> > to do properly, it might as well be completely re-planned from the
> > beginning to get the most benefit.
>
> ... except if you hard-wire the most common alternative plans, you still
> get the benefit of a cached plan for a wider range of parameter values.
> Not to mention that if you know you'll cache the plan, you can try
> harder planning it right, getting possibly better plans for complex
> queries... you could argue that complex queries tend not to be repeated,
> but we do have here some which are in fact repeated a lot in batches,
> then discarded. So I guess a cached plan discard/timeout mechanism would
> also be nice.

I think ANALYZE on the tables involved should _force_ replanning of a cached query. After all, if ANALYZE was fired, the contents changed substantially, and replanning feels like a good idea.

As for the planner getting smarter (and slower ;)) -- complex queries tend not to be repeated, so it is worth the trouble to plan them carefully. These would benefit from a smarter planner with or without caching. The problem is with "simple queries", which arguably make up the majority of queries; that's where the caching comes in. If you cache the queries, you can let the planner be smarter (and slower). If you don't cache, you probably don't want to trade the speed of a "frequent simple query" for that of a "once in a while complex query".

That said, for me the most important feature is the possibility to have good online query statistics. :)

Regards,
Dawid
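To make the invalidation rule above concrete, here is a minimal sketch in plain Python (names and structure are hypothetical, nothing like PostgreSQL's actual plan cache): each cached plan records the tables it depends on, and ANALYZE on any of those tables evicts the plan so the next execution replans against the new statistics.

```python
class PlanCache:
    """Toy plan cache that evicts entries when a table is re-ANALYZEd."""

    def __init__(self):
        self._plans = {}  # query string -> (plan, set of tables it touches)

    def put(self, query, plan, tables):
        self._plans[query] = (plan, set(tables))

    def get(self, query):
        entry = self._plans.get(query)
        return entry[0] if entry else None

    def on_analyze(self, table):
        # ANALYZE changed the statistics: force replanning of every
        # cached query that touches this table.
        stale = [q for q, (_, deps) in self._plans.items() if table in deps]
        for q in stale:
            del self._plans[q]
        return stale

cache = PlanCache()
cache.put("SELECT * FROM orders WHERE id = $1", "IndexScan(orders_pkey)", ["orders"])
cache.put("SELECT count(*) FROM users", "SeqScan(users)", ["users"])
evicted = cache.on_analyze("orders")
```

A background analyze process could run the equivalent of `on_analyze` and then replan the evicted entries itself, instead of leaving that to the next caller.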
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Mon, 2008-04-14 at 17:08 +0200, PFC wrote:
> Those "Decision" nodes could potentially lead to lots of decisions
> (ahem). What if you have 10 conditions in the Where, plus some joined
> ones ? That would make lots of possibilities...

Yes, that's true, but most of them are likely not relevant for the end result. In any real-life query there are a few parameters which are really important for what plan you should choose... The key here is that you can afford to spend more time finding the possibilities for a cached plan than you would for a one-shot query. In principle one-shot planning should be the default, and caching should be something the user has to choose deliberately. I would really like a special command to plan and cache a query without actually executing it, possibly with a parameter for how hard to try... E.g. you could expend the extra cycles to eliminate all redundancies from boolean expressions and IN lists, to get the parse tree into a canonical format -- all things which can make planning easier. All of these lose on one-shot queries, but once you cache you can really do a lot of smart things which were a no-no before...

> Consider several types of queries :
>
> - The small, quick query which returns one or a few rows : in this
> case, planning overhead is large relative to execution time, but I
> would venture to guess that the plans always end up being the same.

Consider 'SELECT a WHERE b LIKE $1' -> the parameter $1 will considerably affect the query plan. A query can't get much simpler...

> - The query that takes a while : in this case, planning overhead is nil
> compared to execution time, better replan every time with the params.

I guess these queries are not the ones targeted by this feature. In fact for these queries it really doesn't matter whether you cache or not, except: if you know you're going to cache, you'll expend more effort planning right, and that could still matter for a query which runs long. Note that if you don't cache, planning harder will lose in the long run; only once you cache can you afford to plan harder...

> - The complex query that still executes fast because it doesn't process
> a lot of rows and postgres finds a good plan (for instance, a well
> optimized search query). Those would benefit from reducing the planning
> overhead, but those also typically end up having many different plans
> depending on the search parameters. Besides, those queries are likely
> to be dynamically generated. So, would it be worth it to add all those
> features just to optimize those ? I don't know...

We have here dynamically generated queries which are specifically chunked to be executed in small increments so that none of the queries runs too long (they would block vacuuming of vital tables otherwise). Those chunks would greatly benefit from properly planned and cached plans... A really smart system would store canonical plan fragments as responses to (also canonicalized) parse tree fragments, and then assemble the plan out of those fragments, but that would indeed be really complex (to design; the resulting code might be simpler than one thinks) ;-)

Cheers,
Csaba.
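The "Decision" node being discussed can be sketched in a few lines of Python (a toy model, not PostgreSQL code; the planner rule and the selectivity threshold are invented for illustration): each cached sub-plan is tagged with the parameter range it was planned for, and a parameter outside every known range triggers a replan that adds a new sub-plan.

```python
class DecisionNode:
    """Toy 'decision point' wired into a cached plan: each subplan is
    tagged with the parameter range it is considered valid for."""

    def __init__(self, planner):
        self._planner = planner   # plans for a concrete parameter value
        self._subplans = []       # list of ((low, high), plan)

    def execute(self, param):
        for (low, high), plan in self._subplans:
            if low <= param <= high:
                return plan       # a cached subplan covers this value
        # Out of every known range: replan, remember the new subplan.
        rng, plan = self._planner(param)
        self._subplans.append((rng, plan))
        return plan

def toy_planner(selectivity):
    # Hypothetical rule: low selectivity -> index scan, else seq scan.
    if selectivity < 0.1:
        return (0.0, 0.1), "IndexScan"
    return (0.1, 1.0), "SeqScan"

node = DecisionNode(toy_planner)
```

With ten independent conditions you would indeed get a combinatorial number of potential decision points, which is why only the few parameters that actually change the plan shape are worth wiring in.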
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Mon, 14 Apr 2008 16:17:18 +0200, Csaba Nagy <[EMAIL PROTECTED]> wrote:

> On Mon, 2008-04-14 at 16:10 +0200, Csaba Nagy wrote:
> > ... or plan the query with the actual parameter value you get, and
> > also record the range of the parameter values you expect the plan to
> > be valid for. If at execution time the parameter happens to be out of
> > that range, replan, and possibly add a new subplan covering the extra
> > range. This could still work with prepared queries (where you don't
> > get any parameter values to start with) by estimating the most
> > probable parameter range (whatever that could mean), and planning for
> > that.
>
> More on that: recording the presumptions under which the (cached!) plan
> is thought to be valid would also facilitate setting up dependencies
> against statistics, to be checked when you analyze tables... and if the
> key value which you depend on with your query changed, the analyze
> process could possibly replan it in the background. LOL, it started
> with the idea to make small queries faster, and now the brain juice is
> pouring.

Those "Decision" nodes could potentially lead to lots of decisions (ahem). What if you have 10 conditions in the WHERE, plus some joined ones ? That would make lots of possibilities...

Consider several types of queries :

- The small, quick query which returns one or a few rows : in this case, planning overhead is large relative to execution time, but I would venture to guess that the plans always end up being the same.

- The query that takes a while : in this case, planning overhead is nil compared to execution time, better replan every time with the params.

- The complex query that still executes fast because it doesn't process a lot of rows and postgres finds a good plan (for instance, a well optimized search query). Those would benefit from reducing the planning overhead, but those also typically end up having many different plans depending on the search parameters. Besides, those queries are likely to be dynamically generated.

So, would it be worth it to add all those features just to optimize those ? I don't know...
Re: [HACKERS] Cached Query Plans
On Mon, 2008-04-14 at 10:55 -0400, Mark Mielke wrote:
> The other ideas about automatically deciding between plans based on
> ranges and such strike me as involving enough complexity and logic, that
> to do properly, it might as well be completely re-planned from the
> beginning to get the most benefit.

... except if you hard-wire the most common alternative plans, you still get the benefit of a cached plan for a wider range of parameter values. Not to mention that if you know you'll cache the plan, you can try harder planning it right, getting possibly better plans for complex queries... You could argue that complex queries tend not to be repeated, but we do have here some which are in fact repeated a lot in batches, then discarded. So I guess a cached plan discard/timeout mechanism would also be nice.

Cheers,
Csaba.
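The discard/timeout mechanism asked for above could look like the following Python sketch (all names hypothetical; the clock is injectable only so the example is deterministic): plans unused for longer than a maximum idle time are dropped, so batch-style queries that run a lot and are then abandoned don't pin memory forever.

```python
import time

class ExpiringPlanCache:
    """Plans unused for longer than max_idle seconds are discarded."""

    def __init__(self, max_idle, clock=time.monotonic):
        self.max_idle = max_idle
        self._clock = clock
        self._plans = {}              # query -> (plan, last_used)

    def put(self, query, plan):
        self._plans[query] = (plan, self._clock())

    def get(self, query):
        entry = self._plans.get(query)
        if entry is None:
            return None
        plan, _ = entry
        self._plans[query] = (plan, self._clock())   # refresh on use
        return plan

    def sweep(self):
        """Drop every plan idle longer than max_idle; return the count."""
        now = self._clock()
        stale = [q for q, (_, t) in self._plans.items()
                 if now - t > self.max_idle]
        for q in stale:
            del self._plans[q]
        return len(stale)

# Simulated clock so the example is reproducible.
t = [0.0]
cache = ExpiringPlanCache(max_idle=60, clock=lambda: t[0])
cache.put("SELECT ... batch chunk ...", "plan A")
t[0] = 30; cache.get("SELECT ... batch chunk ...")   # batch still running
t[0] = 120                                            # batch finished
```

A periodic `sweep()` call would play the role of the timeout; an explicit discard command would just delete the entry directly.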
Re: [HACKERS] Cached Query Plans
I like the cross-session query plan caching talk. I would prefer the feature to be optional (i.e. a per-session "use cross-session query plan cache" variable).

I like the "automatically re-plan if the estimate did not match the actual" idea, with some softening technique involved, such as "if the last 3 times it ran it did the wrong thing, learn from our mistake and adapt".

The other ideas about automatically deciding between plans based on ranges and such strike me as involving enough complexity and logic that, to do it properly, the query might as well be completely re-planned from the beginning to get the most benefit.

Cheers,
mark

-- 
Mark Mielke <[EMAIL PROTECTED]>
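The "learn from our mistake" softening above can be sketched as follows (plain Python, all thresholds and names invented for illustration): replan only after the row estimate has been badly wrong several executions in a row, so a single outlier doesn't throw away a good plan.

```python
class AdaptivePlan:
    """Replan only after the estimate was badly off N executions in a row."""

    def __init__(self, plan, estimated_rows, strikes=3, error_factor=10):
        self.plan = plan
        self.estimated_rows = estimated_rows
        self.strikes = strikes            # consecutive misses before replan
        self.error_factor = error_factor  # "badly off" = >10x or <1/10x
        self._misses = 0
        self.replans = 0

    def record_execution(self, actual_rows):
        est = max(self.estimated_rows, 1)
        off = actual_rows / est
        if off > self.error_factor or off < 1.0 / self.error_factor:
            self._misses += 1
        else:
            self._misses = 0              # a good run resets the counter
        if self._misses >= self.strikes:
            self._replan(actual_rows)

    def _replan(self, actual_rows):
        # Stand-in for real replanning: adopt the observed cardinality.
        self.estimated_rows = actual_rows
        self._misses = 0
        self.replans += 1

p = AdaptivePlan("NestLoop", estimated_rows=2)
for _ in range(3):
    p.record_execution(5000)   # wildly more rows than planned for
```

After three consecutive misestimates the toy plan adapts once and then stays stable while the estimates hold.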
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
Bind message behaviour was modified : - If the user asks for execution of a named prepared statement, and the named statement does not exist in PG's prepared statements cache, instead of issuing an error and borking the transaction, it Binds to an empty statement, that takes no parameters, and returns no result. Parameters sent by the user are consumed but not used. You mentioned the need for a wire protocol change to allow this. Why can't this be controlled with a server variable, like SET auto_prepare = 'true'? Actually, thanks to the hack, the wire protocol doesn't change. Explanation : - Send Parse(SQL) to unnamed statement + Bind unnamed statement => works as usual (no cache) - Send only Bind (named statement) with a statement name that is not found in the cache => doesn't raise an error, instead informs the application that the statement does not exist. The application can then prepare (send a Parse message with SQL and a name) the statement and give it a name. I used as name the SQL itself, but you can use anything else. The application can then send the Bind again, which will (hopefully) work. So, here, the information ("cache" or "don't cache") is passed from the client to the server, in a hidden way : it depends on what function you use to send the query (unnamed statements are not cached, named statements are cached). There is no protocol change, but a new information is provided to the server nonetheless. Downside to this is that the application needs to be modified (only a little, though) and applications that expect exceptions on "Statement does not exist" will break, thus the necessity of a GUC to control it. It was just a quick & dirty test to see if this way of doing it was an option to consider or not. Apparently it works, but wether it is The Right Way remains to be seen... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
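The round-trip described above can be simulated in plain Python, with a dict standing in for the server's prepared-statement cache (no real wire protocol or driver involved; `FakeServer` and `query_cached` are invented names): the first Bind on an unknown name falls through to a Parse, and every later call skips the hard parse.

```python
class FakeServer:
    """Stands in for a backend with the modified Bind behaviour:
    Bind on an unknown statement name yields an empty result
    instead of raising an error."""

    EMPTY = object()              # the 'empty statement' sentinel

    def __init__(self):
        self._statements = {}
        self.parses = 0           # how many hard parses happened

    def parse(self, name, sql):
        self.parses += 1
        self._statements[name] = "plan for: " + sql

    def bind_execute(self, name, params):
        return self._statements.get(name, self.EMPTY)

def query_cached(server, sql, params):
    # Use the SQL text itself as the statement name, as in the hack.
    result = server.bind_execute(sql, params)
    if result is server.EMPTY:
        server.parse(sql, sql)    # prepare once...
        result = server.bind_execute(sql, params)
    return result                 # ...and later calls skip parsing

srv = FakeServer()
for uid in (1, 2, 3):
    query_cached(srv, "SELECT * FROM t WHERE id = $1", [uid])
```

Three executions, one parse: that is the whole point of the hack.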
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
> ... or plan the query with the actual parameter value you get, and also
> record the range of the parameter values you expect the plan to be valid
> for. If at execution time the parameter happens to be out of that range,
> replan, and possibly add a new subplan covering the extra range. This
> could still work with prepared queries (where you don't get any
> parameter values to start with) by estimating the most probable
> parameter range (whatever that could mean), and planning for that.

Another thought: if the cached plans get their own table (as was suggested), then you could also start gathering parameter range statistics meaningfully... and on the next replan you know what to optimize your planning efforts for.

Cheers,
Csaba.
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Mon, 2008-04-14 at 16:10 +0200, Csaba Nagy wrote:
> ... or plan the query with the actual parameter value you get, and also
> record the range of the parameter values you expect the plan to be valid
> for. If at execution time the parameter happens to be out of that range,
> replan, and possibly add a new subplan covering the extra range. This
> could still work with prepared queries (where you don't get any
> parameter values to start with) by estimating the most probable
> parameter range (whatever that could mean), and planning for that.

More on that: recording the presumptions under which the (cached!) plan is thought to be valid would also facilitate setting up dependencies against statistics, to be checked when you analyze tables... and if the key value which your query depends on changed, the analyze process could possibly replan it in the background.

Cheers,
Csaba.
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Mon, 2008-04-14 at 16:54 +0300, Heikki Linnakangas wrote:
> Figuring out the optimal "decision points" is hard, and potentially very
> expensive. There is one pretty simple scenario though: enabling the use
> of partial indexes, preparing one plan where a partial index can be
> used, and another one where it can't. Another such case is "col LIKE ?"
> queries, where ? is actually a prefix query, "foo%".

Another case is when the cardinality distribution of some key's values is very skewed, with some values very frequent and the majority of values being unique. There you could check the stats at execution time just to decide between the low-cardinality plan and the high-cardinality one...

> As an optimization, we could decide the decision points on the prepare
> message, and delay actually planning the queries until they're needed.
> That way we wouldn't waste time planning queries for combinations of
> parameters that are never used.

... or plan the query with the actual parameter value you get, and also record the range of the parameter values you expect the plan to be valid for. If at execution time the parameter happens to be out of that range, replan, and possibly add a new subplan covering the extra range. This could still work with prepared queries (where you don't get any parameter values to start with) by estimating the most probable parameter range (whatever that could mean), and planning for that.

Cheers,
Csaba.
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
Csaba Nagy wrote:
> If cached plans would be implemented, the dependence on parameter values
> could be solved too: use special "fork" nodes in the plan which execute
> different sub-plans depending on special parameter values/ranges,
> possibly looking up the stats at runtime, so that the plan is in a
> compiled state with the "decision points" wired in.

That's an idea I've been thinking about for a long time, but never got around to implementing. I see it as a completely orthogonal feature to the server-side shared plan cache, though. There are plenty of scenarios, like with a client-side prepared statement cache, where it would be useful.

Figuring out the optimal "decision points" is hard, and potentially very expensive. There is one pretty simple scenario though: enabling the use of partial indexes, preparing one plan where a partial index can be used, and another one where it can't. Another such case is "col LIKE ?" queries, where ? is actually a prefix query, "foo%".

As an optimization, we could decide the decision points on the prepare message, and delay actually planning the queries until they're needed. That way we wouldn't waste time planning queries for combinations of parameters that are never used.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
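The lazy-planning optimization above can be sketched with simple memoization (toy Python, invented names; the LIKE classification is the example from the message): decision points are fixed at prepare time, but each branch's plan is built only the first time that branch is actually taken.

```python
class LazyBranchPlan:
    """Decision points are fixed at prepare time; each branch's plan is
    built only when that branch is first taken."""

    def __init__(self, classify, planner):
        self._classify = classify   # param -> branch label (cheap)
        self._planner = planner     # branch label -> plan (expensive)
        self._plans = {}
        self.plans_built = 0

    def execute(self, param):
        branch = self._classify(param)
        if branch not in self._plans:
            self._plans[branch] = self._planner(branch)
            self.plans_built += 1
        return self._plans[branch]

# 'col LIKE $1': a literal prefix can use an index, a '%...' pattern cannot.
plan = LazyBranchPlan(
    classify=lambda pat: "prefix" if not pat.startswith("%") else "scan",
    planner=lambda branch: "IndexScan" if branch == "prefix" else "SeqScan",
)
```

If the application only ever sends prefix patterns, the sequential-scan branch is never planned at all.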
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
PFC wrote:
> Bind message behaviour was modified :
> - If the user asks for execution of a named prepared statement, and the
> named statement does not exist in PG's prepared statements cache, instead
> of issuing an error and borking the transaction, it Binds to an empty
> statement, that takes no parameters, and returns no result. Parameters
> sent by the user are consumed but not used.

You mentioned the need for a wire protocol change to allow this. Why can't this be controlled with a server variable, like SET auto_prepare = 'true'?

-- 
Bruce Momjian <[EMAIL PROTECTED]>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
> If cached plans would be implemented, the dependence on parameter values
> could be solved too: use special "fork" nodes in the plan which execute
> different sub-plans depending on special parameter values/ranges,
> possibly looking up the stats at runtime, so that the plan is in a
> compiled state with the "decision points" wired in. This of course would
> mean a lot heavier planning and possibly a lot bigger plans, but you
> could afford that if you cache the plan. You could even have a special
> command to plan a query this way.

And the "fork node" could mutter to itself "Strange, I'm getting 1 rows instead of the 2 for which I was planned, perhaps I should switch to a different plan..."

I have made another very simple hack to test for another option :

Bind message behaviour was modified :
- If the user asks for execution of a named prepared statement, and the named statement does not exist in PG's prepared statements cache, instead of issuing an error and borking the transaction, it Binds to an empty statement, that takes no parameters, and returns no result. Parameters sent by the user are consumed but not used.

The application was modified thusly :
- Calls to pg_query_params were changed to calls to the following function :

function pg_query_cached( $sql, $params )
{
    // Try to execute it, using the query string as the statement name.
    $q = pg_execute( $sql, $params );
    if( !$q ) die( pg_last_error() );

    // If it worked, return the result to the caller.
    if( pg_result_status( $q, PGSQL_STATUS_STRING ) != "" )
        return $q;

    // If we got an empty query result (not a result with 0 rows, which
    // is valid), then prepare the query, named after its own text...
    $q = pg_prepare( $sql, $sql );
    if( !$q ) die( pg_last_error() );

    // ... and execute it again.
    $q = pg_execute( $sql, $params );
    if( !$q ) die( pg_last_error() );
    return $q;
}

Pros :
- It works
- It is very very simple
- The user can choose between caching plans or not by calling pg_query_params() (no cached plans) or pg_query_cached() (cached plans)
- It works with persistent connections

Cons :
- It is too simple
- Plans are cached locally, so memory use is proportional to the number of connections
- It is still vulnerable to search_path problems
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
> The hairiness is in the plan dependence (or independence) on parameter
> values, ideally we only want to cache plans that would be good for all
> parameter values, only the user knows that precisely. Although it could
> be possible to examine the column histograms...

If cached plans would be implemented, the dependence on parameter values could be solved too: use special "fork" nodes in the plan which execute different sub-plans depending on special parameter values/ranges, possibly looking up the stats at runtime, so that the plan is in a compiled state with the "decision points" wired in.

This of course would mean a lot heavier planning and possibly a lot bigger plans, but you could afford that if you cache the plan. You could even have a special command to plan a query this way.

Cheers,
Csaba.
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
> Why limit ourselves to Oracle? Let's see how all the major proprietary
> RDBMSs do it.

Thanks for the links. Very interesting. The DB2 document especially mentions an important point : in order to make their planner/optimizer smarter, they had to make it slower, hence it became crucial to cache the plans. Contrast this with MySQL, where using prepared statements gains nothing : the "optimizer" does so little work that it actually doesn't matter.

So, basically, Oracle :
- Parses the query every time (identifies tables, permissions etc) (soft parse)
- From that parsed query it looks up a cached plan (the lookup key could then be different depending on the schema etc)
- If not found, it must plan the query (hard parse).

Also the Oracle doc mentions that soft parsing should be avoided by using prepared statements in the application (ie Parse once and Bind lots of times). So, Oracle will redo the parsing + permission checks each time, unless prepared statements are used, in which case it's direct execution.

And DB2 : er, the document is not very clear about what it actually does, but the stats look nice ;)

> I liked your global prepared statements idea much better. Naming the
> statements is no problem: DB frontends do that for you anyway sometimes.

Hm. The "global statements" and the cache would complement each other, actually. Why not. When the user wants to name the statements, he can do so (and perhaps control who can execute what, etc, like with stored procs).

Permission checking overhead will be there at each execution. Should the plan be cached locally ? (RAM consumption times N backends...) Cached per user once permissions have been checked ? (avoids the overhead of rechecking permissions) What about the search path ? (I'd force the global statements to use the default search path no matter what; being explicit is better than "why does it stop working ?")

Can the application or the database library name the statements ? I'm not so sure. This could work for compiled languages (but what about when you run several applications ? or several versions of the same application ? do we need uniqueness of statement names from all developers all over the world ?). Solution : make each application use a different user name, with global prepared statements only visible to the user that created them, perhaps. This conflicts with some desirable features, though. It needs more thinking. What about non-compiled languages ? It will not be possible to generate a list of statements beforehand... And queries are also constructed dynamically by frameworks such as Rails, which makes naming them impossible, but caching the plans would work well. So, some situations would benefit from a plan cache...

> Frankly, I think you're better off storing them in a table. Shared
> memory is a limited resource and you cannot change how much you've
> allocated after the server has started.

I'd say that unless you have a perverse application that will try all the permutations of column names just to make sure the query is different every time, how many different queries would you want to cache ?... probably fewer than 1000... so it wouldn't take more than a couple of megabytes...

> It does mean you'll have to serialise/deserialise them, but this will be
> cheaper than replanning, right?

What would be the overhead of a catalog lookup to get a cached plan for a statement that returns 1 row ? Would the catalog cache make it fast enough ? And what about deserialization ?...

> I am not too sure that plans and statistical counters should be stored
> together... Probably plans should go in one place, and statistics should
> go to the stats collector (I know he's not quite ready for this ;)).

Not sure either. That's the problem...

> Hm, a limit on how much memory can be used for plans
> (query_plan_cache_size GUC?), and a LRU/LFU expiration of old plans?

Now it gets hairy ;) Yes, memory size should be limited. But how to make an LRU cleaner which doesn't create lots of contention ?... Luckily, with a hash having a fixed number of buckets, it is easier (clean a bucket every N seconds, for instance).

> Perhaps a GUC for controlling the query cache should have three values:
> none -- don't cache any statement
> smart -- use heuristics for deciding whether to cache it
> all -- force caching all queries -- for uncommon/statistical/testing
> purposes.

I would not volunteer to write that heuristic ;) Although there would be a very simple solution : if time to parse > some percentage of time to execute, then cache. The hairiness is in the plan dependence (or independence) on parameter values; ideally we only want to cache plans that would be good for all parameter values, and only the user knows that precisely. Although it could be possible to examine the column histograms...

> > (like mysql, /* flags */ SELECT blah )
> I don't like the hint flags. They tend to haunt later on.
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Sun, Apr 13, 2008 at 2:26 PM, PFC <[EMAIL PROTECTED]> wrote:
> > Oracle keeps a statement/plan cache in its shared memory segment (SGA)
> > that greatly improves its performance at running queries that don't
> > change very often.
>
> Can we have more details on how Oracle does it ? For "inspiration"...

Why limit ourselves to Oracle? Let's see how all the major proprietary RDBMSs do it. Here is a nice presentation I've found on DB2; they call it the "Dynamic Statement Cache":
http://www.tbrug.com/TB%20UG%20Dynamic%20Statement%20Cache.ppt

> Here is what I'm thinking about :
> Don't flame me too much about implementation issues, this is just
> throwing ideas in the air to see where they'll fall ;)
>
> * global plan cache in shared memory, implemented as hashtable, hash key
> being the (search_path, query_string)
> Doubt : Can a plan be stored in shared memory ? Will it have to be
> copied to local memory before being executed ?

Well, Oracle uses the terms "hard parse" and "soft parse", the former being preparing the whole query, the latter reusing a query plan prepared by some other session. More or less. See this link for a more detailed description:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0P11_QUESTION_ID:2588723819082
(this is quite an interesting read)

> This stores :
> - the plans (not for all keys, see below)
> - the stats : [...]

I am not too sure that plans and statistical counters should be stored together... Probably plans should go in one place, and statistics should go to the stats collector (I know he's not quite ready for this ;)).

> There should be also a way to query this to display the statistics (ie
> "what query is killing my server ?"), and a way to purge old plans.

Hm, a limit on how much memory can be used for plans (query_plan_cache_size GUC?), and a LRU/LFU expiration of old plans?

> * every time a Parse message comes up :
> - look if the (search_path, query_string) is in the cache
> - if it is in the cache :
>   - if there is a cached plan, make the unnamed statement point to it,
>     and we're done.
>   - if there is no cached plan, prepare the query, and put it in the
>     unnamed statement.
>
> Now, the query has been parsed, so we can decide if it is cacheable.
> Should this be done in Parse, in Bind, or somewhere else ? I have no
> idea.
>
> For instance, queries which contain VALUES() or IN( list of consts )
> should not be cached, since the IN() is likely to change all the time,
> it would just trash the cache. Using =ANY( $1 ) instead will work with
> cached plans.

Perhaps a GUC for controlling the query cache should have three values:
none -- don't cache any statement
smart -- use heuristics for deciding whether to cache it
all -- force caching all queries -- for uncommon/statistical/testing purposes.

> Also, will a plan to be cached have to be prepared with or without the
> parameters ? That's also an interesting question...
> Perhaps the user should also be able to specify whether to cache a plan
> or not, or whether to use the params or not, with hint flags in the
> query string ? (like mysql, /* flags */ SELECT blah )

I don't like the hint flags. They tend to haunt later on (when the database gets smarter, but the application forces it to be dumb). I would say a GUC. A GUC gives freedom of change to the application, and can also be set per user with ALTER USER.

> Now, if the query is cacheable, store it in the cache, and update the
> stats. If we decided to store the plan, do that too. For instance we
> might decide to store the plan only if this query has been executed a
> certain number of times, etc.

Interesting idea. I think I like it.

> * In the Execute message, if a cached plan was used, execute it and
> update the stats (time spent, etc).
>
> Now, about contention, since this is one shared hashtable for everyone,
> it will be fought for... However, the lock on it is likely to be held
> during a very small time (much less than a microsecond), so would it be
> that bad ? Also, GUC can be used to mitigate the contention, for
> instance if the user is not interested in the stats, the thing becomes
> mostly read-only.

I would say: keep the stats separate. For every plan cached, generate some unique id (perhaps an OID? I am not convinced), and use this ID as the key for the statistics. I tend to think of it as a "temporary table, and temporary table stats". :)

Regards,
Dawid
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Sun, Apr 13, 2008 at 02:26:04PM +0200, PFC wrote:
> * global plan cache in shared memory, implemented as hashtable, hash key
> being the (search_path, query_string)
> Doubt : Can a plan be stored in shared memory ? Will it have to be
> copied to local memory before being executed ?

Frankly, I think you're better off storing them in a table. Shared memory is a limited resource and you cannot change how much you've allocated after the server has started. It does mean you'll have to serialise/deserialise them, but this will be cheaper than replanning, right?

I liked your global prepared statements idea much better. Naming the statements is no problem: DB frontends do that for you anyway sometimes.

Have a nice day,
-- 
Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/

> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
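The table-backed store suggested above can be sketched with Python's standard library, sqlite3 standing in for a catalog table and pickle for the plan serializer (the table name, schema and the dict "plan" are all invented; real PostgreSQL plan trees would need their own serialization): the point is the store/lookup shape, keyed on (search_path, query).

```python
import pickle
import sqlite3

# An ordinary table standing in for a hypothetical plan catalog.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE cached_plans (
                  search_path TEXT,
                  query       TEXT,
                  plan        BLOB,
                  PRIMARY KEY (search_path, query))""")

def store_plan(search_path, query, plan):
    # Serialise the plan once; replanning is assumed to cost more.
    db.execute("INSERT OR REPLACE INTO cached_plans VALUES (?, ?, ?)",
               (search_path, query, pickle.dumps(plan)))

def load_plan(search_path, query):
    row = db.execute("SELECT plan FROM cached_plans "
                     "WHERE search_path = ? AND query = ?",
                     (search_path, query)).fetchone()
    return pickle.loads(row[0]) if row else None   # deserialise on hit

plan = {"node": "IndexScan", "index": "orders_pkey", "cost": 8.27}
store_plan("public", "SELECT * FROM orders WHERE id = $1", plan)
```

Making search_path part of the key is what keeps two schemas with same-named tables from picking up each other's plans.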
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Fri, Apr 11, 2008 at 12:34 PM, PFC <[EMAIL PROTECTED]> wrote: Well, I realized the idea of global prepared statements actually sucked, so I set on another approach thanks to ideas from this list, this is caching query plans. Well, that's a blatantly bad realization. Perhaps you should do more research. No, what I meant is that the "global prepared statements" as I tried to implement them before weren't that good... I think simple caching based on the query text itself is preferable to having to name each of your queries, extract them from your programs and replace them by executes, issue a "create statement" command for each of them, etc. Few people would actually use that feature because it would mean lots of modifications to the application, so all the applications that have to be compatible with other databases would not use the feature (*) It could be useful for permissions and fine access control, though, but views and stored procs already provide that functionality... (*) = Note that caching the plans based on the query text (with $ params) from a parse message will not provide caching for oldskool queries with params inside in the form of escaped strings. This is good, because it means the safer solution (using $-quoted params) will also be the faster solution. And in the application, only a very small part of the code needs to be changed, that's the DB abstraction layer. Doesn't Oracle do this now transparently to clients? Of course it does, and it has since the late 80's I believe. Oracle keeps a statement/plan cache in its shared memory segment (SGA) that greatly improves its performance at running queries that don't change very often. Can we have more details on how Oracle does it ? For "inspiration"... 
Here is what I'm thinking about : Don't flame me too much about implementation issues, this is just throwing ideas in the air to see where they'll fall ;)

* global plan cache in shared memory, implemented as a hashtable, the hash key being (search_path, query_string)
Doubt : Can a plan be stored in shared memory ? Will it have to be copied to local memory before being executed ?
This stores :
- the plans (not for all keys, see below)
- the stats :
  - number of times this query has been executed,
  - total, min and max wallclock time and CPU time spent planning this query,
  - total, min and max wallclock time, CPU time and RAM spent executing this query,
  - total, min and max number of rows returned,
  - last timestamp of execution of this query.
There should be separate GUCs to control this :
- should the whole thing be activated ?
- should the cache be active ? or just the stats ? and what stats ?
There should also be a way to query this to display the statistics (ie "what query is killing my server ?"), and a way to purge old plans.

* every time a Parse message comes up :
- look up (search_path, query_string) in the cache
- if it is in the cache :
  - if there is a cached plan, make the unnamed statement point to it, and we're done.
  - if there is no cached plan, prepare the query, and put it in the unnamed statement.

Now, the query has been parsed, so we can decide if it is cacheable. Should this be done in Parse, in Bind, or somewhere else ? I have no idea.

For instance, queries which contain VALUES() or IN( list of consts ) should not be cached, since the IN() list is likely to change all the time; it would just thrash the cache. Using =ANY( $1 ) instead will work with cached plans.

Also, will a plan to be cached have to be prepared with or without the parameters ? That's also an interesting question... Perhaps the user should also be able to specify whether to cache a plan or not, or whether to use the params or not, with hint flags in the query string ? 
(like mysql, /* flags */ SELECT blah ) Now, if the query is cacheable, store it in the cache, and update the stats. If we decided to store the plan, do that too. For instance we might decide to store the plan only if this query has been executed a certain number of times, etc. * In the Execute message, if a cached plan was used, execute it and update the stats (time spent, etc). Now, about contention, since this is one shared hashtable for everyone, it will be fought for... However, the lock on it is likely to be held during a very small time (much less than a microsecond), so would it be that bad ? Also, GUC can be used to mitigate the contention, for instance if the user is not interested in the stats, the thing becomes mostly read-only -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
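The Parse-time flow sketched in this thread — check the cache, plan on a miss, and only store the plan once the query has been seen often enough — can be modeled as toy Python (the threshold, the cacheability heuristic, and every name here are made up for illustration; none of this is PostgreSQL code):

```python
import re

CACHE_AFTER = 3   # hypothetical threshold: store a plan only after N executions

class ParseCache:
    def __init__(self):
        self.seen = {}    # (search_path, sql) -> times this query was parsed
        self.plans = {}   # (search_path, sql) -> cached plan

    @staticmethod
    def cacheable(sql):
        # crude heuristic: literal IN ( ... ) lists change constantly and
        # would thrash the cache, while =ANY($1) keeps the query text stable
        return not re.search(r"\bIN\s*\(", sql, re.I)

    def on_parse(self, search_path, sql, plan_fn):
        key = (search_path, sql)
        if key in self.plans:
            return self.plans[key], True          # cache hit
        plan = plan_fn(sql)                        # miss: must plan this time
        if self.cacheable(sql):
            self.seen[key] = self.seen.get(key, 0) + 1
            if self.seen[key] >= CACHE_AFTER:
                self.plans[key] = plan             # popular enough: keep it
        return plan, False

pc = ParseCache()
q = "SELECT * FROM test WHERE id = ANY($1)"
for _ in range(3):
    plan, hit = pc.on_parse("public", q, lambda s: "plan")
print(hit)   # -> False: the third parse stores the plan but still had to build it
plan, hit = pc.on_parse("public", q, lambda s: "plan")
print(hit)   # -> True: fourth parse reuses the cached plan
```

A literal list such as `WHERE id IN (1,2,3)` fails the heuristic and is never cached, which matches the concern about trashing the cache with ever-changing IN() lists.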
Re: [HACKERS] Cached Query Plans
Would it be possible to store plans with an indication of the search path that was used to find tables, and for temp tables some snapshot of the statistics for the table if any? My suspicions are that: * where you have a lot of short-lived connections then actually they will often use the default search path - or a similar one * if a temp table is in use then normally these will be small or contain 'similar' data There is a danger that these heuristics will be poor if long-running connections are in play - but they have no excuse not to do their own preparation. Perhaps you could have named cache segments and connections could 'opt in' to a cache segment if they want such sharing? James -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
Another issue with plan caches, besides contention, in Oracle at least, is shared memory fragmentation (as plans aren't all the same size in memory ...) But this cache is very helpful for deployments where every query is done via prepare/execute/deallocate. I've seen it a lot in Java apps, the purpose being to benefit from the advantages of prepared statements, but without having to deal with storing those prepared statements somewhere. And of course, as said before, the statistics associated with those plans can be very helpful, mostly for all those very small queries that are run very frequently (a badly developed app most of the time, but that happens). On Sunday 13 April 2008 06:21:41, Jonah H. Harris wrote: > On Sat, Apr 12, 2008 at 10:17 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > > > Yes, this is worthless on large active databases. The logging > > > overhead alone starts to affect performance. > > > > But somehow, all that stuff with cached plans is free? > > Of course not. The first time you execute a query, it is cached... so > you pay the same penalty you do in PG, but in many cases, only once. > In regards to plan re-use, sure there's going to be some contention on > the hash buckets... but that can be mitigated in a lot of ways. > > In addition to that, Oracle collects over two thousand other > statistics in real-time... yet somehow Oracle is quite fast. So, I > would say that the usual complaint about collecting stats should be > more an issue of proper implementation than a complaint about the act > of collection itself. > > -- > Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 > EnterpriseDB Corporation | fax: 732.331.1301 > 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] > Edison, NJ 08837 | http://www.enterprisedb.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Sat, Apr 12, 2008 at 10:17 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > > Yes, this is worthless on large active databases. The logging > > overhead alone starts to affect performance. > > But somehow, all that stuff with cached plans is free? Of course not. The first time you execute a query, it is cached... so you pay the same penalty you do in PG, but in many cases, only once. In regards to plan re-use, sure there's going to be some contention on the hash buckets... but that can be mitigated in a lot of ways. In addition to that, Oracle collects over two thousand other statistics in real-time... yet somehow Oracle is quite fast. So, I would say that the usual complaint about collecting stats should be more an issue of proper implementation than a complaint about the act of collection itself. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
"Jonah H. Harris" <[EMAIL PROTECTED]> writes: > On Sat, Apr 12, 2008 at 2:19 PM, Dawid Kuroczko <[EMAIL PROTECTED]> wrote: >> There are other benefits as well. Oracle lets you see the statistics >> associated >> with given plans. So you can see how many times given (cached) query was >> executed, how much resources did it consume and do on. > Yes, and it also uses that data at both the statement and column level > to determine what needs more analysis to help build better plans in > the future. >> Right now the only way of getting such information from PostgreSQL is by >> logging all queries and analyzing logs. The current_query column of >> pg_stat_activity is useless as the (prepared) queries are usually so short >> lived that you will see one execution out of thousands happening. > Yes, this is worthless on large active databases. The logging > overhead alone starts to affect performance. But somehow, all that stuff with cached plans is free? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Sat, Apr 12, 2008 at 2:19 PM, Dawid Kuroczko <[EMAIL PROTECTED]> wrote: > There are other benefits as well. Oracle lets you see the statistics > associated > with given plans. So you can see how many times a given (cached) query was > executed, how many resources it consumed, and so on. Yes, and it also uses that data at both the statement and column level to determine what needs more analysis to help build better plans in the future. > Right now the only way of getting such information from PostgreSQL is by > logging all queries and analyzing logs. The current_query column of > pg_stat_activity is useless as the (prepared) queries are usually so short > lived that you will see one execution out of thousands happening. Yes, this is worthless on large active databases. The logging overhead alone starts to affect performance. > Nooow, suppose we do have cached plans. Then we can have a view > pg_stat_queries + a stats collector which will track number of executions, > number of blocks hit, blocks read, etc. Would be great! :) With the exception that the stats collector itself needs a bit of work to handle larger volumes of statistics, I agree. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Sat, Apr 12, 2008 at 8:44 AM, Perez <[EMAIL PROTECTED]> wrote: > Doesn't Oracle do this now transparently to clients? Of course it does, and it has since the late 80's I believe. > Oracle keeps a statement/plan cache in its shared memory segment (SGA) > that greatly improves its performance at running queries that don't > change very often. Yep. > From that point of view, Oracle at least sees benefits in doing this. Yes, it is also a bit more advanced than we're discussing here, so I'll just leave it at that. > From my POV a transparent performance enhancer for all those PHP and > Rails apps out there. Yes. > With plan invalidation in 8.3 this becomes feasible for pgSQL to do as > well. > > -arturo -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Fri, Apr 11, 2008 at 12:34 PM, PFC <[EMAIL PROTECTED]> wrote: > Well, I realized the idea of global prepared statements actually > sucked, so I set on another approach thanks to ideas from this list, this is > caching query plans. Well, that's a blatantly bad realization. Perhaps you should do more research. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Sat, Apr 12, 2008 at 2:44 PM, Perez <[EMAIL PROTECTED]> wrote: > In article <[EMAIL PROTECTED]>, > > > PFC wrote: > > > > > So, where to go from that ? I don't see a way to implement this > without > > > a (backwards-compatible) change to the wire protocol, because the clients > > > will want to specify when a plan should be cached or not. Since the user > > > should not have to name each and every one of the statements they want to > > > use plan caching, I see the following choices : > > > Doesn't Oracle do this now transparently to clients? That, I believe > Oracle keeps a statement/plan cache in its shared memory segment (SGA) > that greatly improves its performance at running queries that don't > change very often. > > From that point of view, Oracle at least sees benefits in doing this. > From my POV a transparent performance enhancer for all those PHP and > Rails apps out there. There are other benefits as well. Oracle lets you see the statistics associated with given plans. So you can see how many times a given (cached) query was executed, how many resources it consumed, and so on. Right now the only way of getting such information from PostgreSQL is by logging all queries and analyzing logs. The current_query column of pg_stat_activity is useless as the (prepared) queries are usually so short lived that you will see one execution out of thousands happening. Nooow, suppose we do have cached plans. Then we can have a view pg_stat_queries + a stats collector which will track number of executions, number of blocks hit, blocks read, etc. Would be great! :) Regards, Dawid -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
In article <[EMAIL PROTECTED]>, > PFC wrote: > > > So, where to go from that ? I don't see a way to implement this without > > a (backwards-compatible) change to the wire protocol, because the clients > > will want to specify when a plan should be cached or not. Since the user > > should not have to name each and every one of the statements they want to > > use plan caching, I see the following choices : > Doesn't Oracle do this now transparently to clients? That, I believe Oracle keeps a statement/plan cache in its shared memory segment (SGA) that greatly improves its performance at running queries that don't change very often. From that point of view, Oracle at least sees benefits in doing this. From my POV a transparent performance enhancer for all those PHP and Rails apps out there. With plan invalidation in 8.3 this becomes feasible for pgSQL to do as well. -arturo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans
Well if you're caching per-connection then it doesn't really matter whether you do it on the client side or the server side, it's pretty much exactly the same problem. Actually I thought about doing it on the server since it would then also work with connection pooling. Doing it on the client means the client has to maintain state, which is not possible in a pool... Unsurprisingly most drivers do precisely what you're describing. In Perl DBI for example you just change $dbh->prepare("") into $dbh->prepare_cached("") and it does exactly what you want. I would expect the PHP drivers to have something equivalent. Well, you clearly have "expectations" about PHP, lol. PHP takes pride in always aiming below your expectations, not above ;) It has no such feature. Also pg_query_params() is SLOWER than pg_query() which makes you choose between clean&slow, and string quoting hell. Perhaps I should patch PHP instead... Or perhaps this feature should be implemented in pgpool or pgbouncer. But, using prepared statements with persistent connections is messy, because you never know if the connection is new or not, If you were to fix *that* then both this problem and others (such as setting up desired SET-parameter values) would go away. True. Languages that keep a long-running context (like application servers etc) can do this easily. Although in the newer versions of PHP, it's not so bad, pconnect seems to work (ie. it will issue ROLLBACKs when the script dies, reset session variables like enable_indexscan, etc), so the only remaining problem seems to be prepared statements. And again, adding a method for the application to know if the persistent connection is new or not, will not work in a connection pool... Perhaps a GUC flag saying EXECUTE should raise an error but not kill the current transaction if the requested prepared statement does not exist ? Then the application would issue a PREPARE. 
It could also raise a non-fatal error when the tables have changed (column added, for instance) so the application can re-issue a PREPARE. But I still think it would be cleaner to do it in the server. Also, I rethought about what Gregory Stark said : The contention on the shared cache is likely to negate much of the planning savings but I think it would still be a win. If a shared plan cache is implemented, it will mostly be read-only, ie. when the application is started, new queries will come, so the plans will have to be written to the cache, but then once the cache contains everything it needs, it will not be modified that often, so I wouldn't think contention would be such a problem... It's not so easy as all that. Consider search_path. Consider temp tables. Temp tables : I thought plan revalidation took care of this ? (After testing, it does work, if a temp table is dropped and recreated, PG finds it, although of course if a table is altered by adding a column for instance, it logically fails). search_path: I suggested to either put the search_path in the cache key along with the SQL string, or force queries to specify schema.table for all tables. It is also possible to shoot one's foot with the current PREPARE (ie. search_path is used to PREPARE but of course not for EXECUTE), and also with plpgsql functions (ie. the search path used to compile the function is the one that is active when it is compiled, ie at its first call in the current connection, and not the search path that was active when the function was defined)... 
SET search_path TO DEFAULT;
CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.test( v TEXT );
CREATE TABLE b.test( v TEXT );
INSERT INTO a.test VALUES ('This is schema a');
INSERT INTO b.test VALUES ('This is schema b');

CREATE OR REPLACE FUNCTION test_search_path() RETURNS SETOF TEXT LANGUAGE plpgsql AS $$
DECLARE
    x TEXT;
BEGIN
    FOR x IN SELECT v FROM test LOOP
        RETURN NEXT x;
    END LOOP;
END;
$$;

test=> SET search_path TO a,public;
test=> SELECT * FROM test_search_path();
 test_search_path
------------------
 This is schema a

test=> \q
$ psql test
test=> SET search_path TO b,public;
test=> SELECT * FROM test_search_path();
 test_search_path
------------------
 This is schema b

test=> SET search_path TO a,public;
test=> SELECT * FROM test_search_path();
 test_search_path
------------------
 This is schema b

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans
"PFC" <[EMAIL PROTECTED]> writes: > But, using prepared statements with persistent connections is messy, > because you never know if the connection is new or not, if it contains > already > prepared statements or not, you'd have to maintain a list of those statements > (named) for every query in your application, and when someone changes a > query, > it's a mess, not to mention queries generated by the ORM like Rails etc. Well if you're caching per-connection then it doesn't really matter whether you do it on the client side or the server side, it's pretty much exactly the same problem. Unsurprisingly most drivers do precisely what you're describing. In Perl DBI for example you just change $dbh->prepare("") into $dbh->prepare_cached("") and it does exactly what you want. I would expect the PHP drivers to have something equivalent. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans
PFC <[EMAIL PROTECTED]> writes: > And it is very easy to recognize a query we've seen before, It's not so easy as all that. Consider search_path. Consider temp tables. The real problem here is entirely on the client side: > But, using prepared statements with persistent connections is messy, > because you never know if the connection is new or not, If you were to fix *that* then both this problem and others (such as setting up desired SET-parameter values) would go away. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans
I think what he's referring to is persistently caching plans so that new connections can use them. That makes a lot more sense if you have lots of short-lived connections like a stock php server without persistent connections turned on or a connection pooler. You can prepare queries but they only live for a single web page so you don't get any benefit. Let me explain a little further. Persistent database connections are the way to go for web applications, because the connection is only going to get used for a few queries, and the time needed to start the postgres process and establish the connection is often significant compared to the time used for the actual queries. Connection pooling can also be used, you get the idea. So, using persistent database connections, it makes sense to use prepared statements to speed up execution of simple queries, like those returning a few rows with a few joins and no complicated WHERE clauses, which is actually most of the queries on your average website. As shown in my previous message, the CPU time spent planning the query can be as much or even a lot more than CPU time spent actually executing the query. But, using prepared statements with persistent connections is messy, because you never know if the connection is new or not, if it contains already prepared statements or not, you'd have to maintain a list of those statements (named) for every query in your application, and when someone changes a query, it's a mess, not to mention queries generated by the ORM like Rails etc. The idea in this "proof of concept" was : Wouldn't it be nice if Postgres could just say "Hey, I already planned that query, I'll reuse that plan". And it is very easy to recognize a query we've seen before, since $-params takes the parameters out of the equation, and eliminates parsing time and string quoting hell. 
Storing the cached plans as prepared statements in the connection-local hashtable makes sense : it doesn't use that much memory anyway, and there are no locking and contention problems. Just like PREPARE and EXECUTE. Personally I would like to see this, not primarily for the performance gains, but for the possibility of managing when plans change -- ie, plan stability. Unfortunately, this isn't compatible with a non-shared memory approach... But there is resistance from other quarters about the reliability hit of having the plan data structures in shared memory. I agree. Hence the idea to put them in non-shared memory, local to a process. Perfectly useless when using non-persistent connections, but very powerful when using persistent connections. I still don't see why you would need a wire protocol change. Because I'd think that sometimes the client will not want to use a cached plan, when the query is rarely used (no need to waste memory to cache the plan), or it is complex and needs to be replanned according to parameter values every time. Sure, the client could use the oldskool "send query as text with parameters inside" but that's back to string escaping hell, and it's ugly. It would be nicer to have a bool "cache_plan". You would just have clients prepare plans normally and stash them in shared memory for other backends in a hash table keyed by, well, something, perhaps the original query text. Query text seems to be the simplest, better not ask the user to come up with distinct names when the query text will be a perfect key. Besides, hand-generated names might turn out not to be so distinct after all... Then whenever you're asked to prepare a query you go check if someone else has already done it for you and find an already generated plan in the shared memory hash table. The contention on the shared cache is likely to negate much of the planning savings but I think it would still be a win. 
But what's really interesting to me is then providing an interface to see and manipulate that cache. Then you could see what plans other backends are using for queries, mark plans as being acceptable or not, and even revoke users' permissions to execute queries which aren't already present and marked as being acceptable. If it can be made to work with a shared cache, why not, but that would be more complex. You'd also have to deal with permissions, different users with different privileges, etc. But local would probably be simplest (and faster). Also, there will be problems with the schema search path. Perhaps a query should be required to specify the fully qualified table names (schema.table) for all tables in order to be cacheable. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
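The client-side alternative discussed here (Perl DBI's prepare_cached, or what a DB abstraction layer would do) amounts to keying a per-connection statement-name table by the SQL text itself, so the application never has to invent names. A sketch in Python against a stub connection (the stub and every name are hypothetical, standing in for a driver that supports server-side PREPARE):

```python
class StubConnection:
    """Hypothetical stand-in for a driver connection with server-side
    prepared statements; counts how often we actually prepare."""
    def __init__(self):
        self.prepared = {}
        self.prepare_calls = 0

    def prepare(self, name, sql):
        self.prepare_calls += 1
        self.prepared[name] = sql

    def execute_prepared(self, name, params):
        return ("executed", self.prepared[name], params)

class CachingCursor:
    """prepare_cached-style wrapper: the SQL text is the cache key,
    so changing a query in the application invalidates nothing by hand."""
    def __init__(self, conn):
        self.conn = conn
        self.names = {}   # sql text -> generated statement name

    def execute(self, sql, params=()):
        name = self.names.get(sql)
        if name is None:                      # first use on this connection
            name = "s%d" % len(self.names)
            self.conn.prepare(name, sql)
            self.names[sql] = name
        return self.conn.execute_prepared(name, params)

conn = StubConnection()
cur = CachingCursor(conn)
cur.execute("SELECT * FROM test WHERE id = $1", (1,))
cur.execute("SELECT * FROM test WHERE id = $1", (2,))
print(conn.prepare_calls)   # -> 1: prepared once, executed twice
```

This is exactly the state that breaks under connection pooling, as PFC notes: the `names` table lives with the client, so a pooled connection handed to another client loses the mapping while the server-side statements linger.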
Re: [HACKERS] Cached Query Plans
"Alvaro Herrera" <[EMAIL PROTECTED]> writes: > PFC wrote: > >> So, where to go from that ? I don't see a way to implement this without >> a (backwards-compatible) change to the wire protocol, because the clients >> will want to specify when a plan should be cached or not. Since the user >> should not have to name each and every one of the statements they want to >> use plan caching, I see the following choices : > > I don't understand the point here. We already have cached plans: you > send a Parse. You can then Bind/Execute many times. I think what he's referring to is persistently caching plans so that new connections can use them. That makes a lot more sense if you have lots of short-lived connections like a stock php server without persistent connections turned on or a connection pooler. You can prepare queries but they only live for a single web page so you don't get any benefit. Personally I would like to see this, not primarily for the performance gains, but for the possibility of managing when plans change -- ie, plan stability. But there is resistance from other quarters about the reliability hit of having the plan data structures in shared memory. A bug in one backend could cause other backends to crash or corrupt their memory. The possibility exists with other shared data structures but, arguably, plans are much more complex data structures than PGPROC entries and buffers. I still don't see why you would need a wire protocol change. You would just have clients prepare plans normally and stash them in shared memory for other backends in a hash table keyed by, well, something, perhaps the original query text. Then whenever you're asked to prepare a query you go check if someone else has already done it for you and find an already generated plan in the shared memory hash table. The contention on the shared cache is likely to negate much of the planning savings but I think it would still be a win. 
But what's really interesting to me is then providing an interface to see and manipulate that cache. Then you could see what plans other backends are using for queries, mark plans as being acceptable or not, and even revoke users' permissions to execute queries which aren't already present and marked as being acceptable. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
Alvaro Herrera <[EMAIL PROTECTED]> writes: > PFC wrote: >> So, where to go from that ? I don't see a way to implement this without >> a (backwards-compatible) change to the wire protocol, because the clients >> will want to specify when a plan should be cached or not. Since the user >> should not have to name each and every one of the statements they want to >> use plan caching, I see the following choices : > I don't understand the point here. We already have cached plans: you > send a Parse. You can then Bind/Execute many times. > Maybe what we need is support for this in libpq, so that PHP can use it? We already have that, too ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
PFC wrote: > So, where to go from that ? I don't see a way to implement this without > a (backwards-compatible) change to the wire protocol, because the clients > will want to specify when a plan should be cached or not. Since the user > should not have to name each and every one of the statements they want to > use plan caching, I see the following choices : I don't understand the point here. We already have cached plans: you send a Parse. You can then Bind/Execute many times. Maybe what we need is support for this in libpq, so that PHP can use it? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Cached Query Plans (was: global prepared statements)
Well, I realized the idea of global prepared statements actually sucked, so I set out on another approach, thanks to ideas from this list: caching query plans.

First, let's see if there is low-hanging fruit with the typical small, often-executed queries that are so frequent on websites.

Tables test, test2 and test3 contain id (integer primary key) and another integer field. There are 100K rows in each.

First, the simplest query:

	SELECT * FROM test WHERE id = $1

	110 us : Send query as text (PHP: pg_query -> PQexec)
	125 us : Parse+Bind (PHP: pg_query_params -> PQexecParams)
	 67 us : Execute a previously prepared statement (PHP: pg_execute -> PQexecPrepared)

A slightly more complex one, but still pretty classic:

	SELECT * FROM (SELECT * FROM test WHERE id > $1 ORDER BY id LIMIT 5) AS a
	NATURAL LEFT JOIN test2 NATURAL LEFT JOIN test3 ORDER BY id

	523 us : Send query as text (PHP: pg_query -> PQexec)
	580 us : Parse+Bind (PHP: pg_query_params -> PQexecParams)
	148 us : Execute a previously prepared statement (PHP: pg_execute -> PQexecPrepared)

OK, so there is low-hanging fruit here, since the parse+plan time of these queries is longer than executing them.

Since the Parse message includes a $-parameterized query that is to be prepared, it seems logical to put the caching logic there: the query string (without parameters) makes a nice cache key. So I made a really quick and really dirty experiment, without changing the wire protocol between client and server. This is only a proof of concept.

Try #1: in exec_parse_message(), if the statement is named, look it up in the prepared statements cache; if it is found, return at once and do nothing else. To exploit this, I issue a pg_prepare() followed by pg_execute() on every query, whether or not the statement already exists. If it already exists, pg_prepare() now does nothing (except lose a little time).

Results:

	 88 us : simple query
	173 us : complex query

So, the timings fall between a plain execute and a plan+execute.
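The prepare-if-absent behaviour of Try #1 can be sketched outside the backend. The following toy Python model (class and method names invented here, nothing below is PostgreSQL code) shows the essential property: with the $-parameterized query string as the cache key, planning happens once and every later "prepare" for the same string is a cheap lookup.

```python
# Toy sketch of Try #1: a plan cache keyed by the $-parameterized query
# string, where preparing an already-cached statement is a no-op.
# The "plan" stored here is just a placeholder tuple.

class PlanCache:
    def __init__(self):
        self._plans = {}          # query string -> cached "plan"
        self.plannings = 0        # how many real plannings happened

    def prepare(self, sql):
        # Cache hit: return at once and do nothing else (Try #1's
        # early exit); cache miss: "plan" the query and store it.
        if sql not in self._plans:
            self.plannings += 1
            self._plans[sql] = ("plan-for", sql)
        return self._plans[sql]

    def execute(self, sql, params):
        plan = self.prepare(sql)  # hit costs only a dict lookup
        return (plan, params)     # stand-in for actual execution

cache = PlanCache()
for uid in (1, 2, 3):
    cache.execute("SELECT * FROM test WHERE id = $1", (uid,))
print(cache.plannings)   # planned once despite three executions
```

This is why the measured timings land between a bare execute and a full plan+execute: the first call pays for planning, every later call pays only the lookup.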
It provides a nice performance gain versus replanning every time, but it is not perfect.

Try #2: again in exec_parse_message(), if the statement is unnamed, I use the query string as the statement name and look the plan up in the prepared statements hash table. If it is not found, the statement is prepared. Then I make the unnamed statement's plan point to this. Of course, this is dangerous since it probably introduces a dozen crash bugs, but for this proof of concept, it's OK. Client code is unchanged; PQexecParams will benefit from the plan caching, since it always sends a Parse+Bind message using the unnamed statement.

Results are identical to executing an execute on a prepared statement, modulo a few microseconds. This means the overhead of sending the Parse message, and of the server ignoring it when the statement is cached, is negligible.

So, where to go from that? I don't see a way to implement this without a (backwards-compatible) change to the wire protocol, because the clients will want to specify when a plan should be cached or not. Since the user should not have to name each and every one of the statements they want plan caching for, I see the following choices:

- Add a new Parse+Bind command, which gets the $-parameterized SQL and the parameters. If the plan is cached, grab it and execute; else prepare and execute. Add a flag to let the client specify whether it wants caching or not.
  Pros: only one message, faster.
  Cons: the SQL is transmitted in full, which is useless most of the time, but this overhead is rather small.

- Send the SQL as the statement name in Bind, add a flag to Bind telling it to report a cache miss instead of raising an error, then have the client send a Parse and Bind again.

- Should there be one common hashtable for named prepared statements and cached plans, or two hashtables? Using the SQL string as the statement name is not clean.
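The second option above (Bind with the SQL as statement name, reporting a cache miss instead of erroring) can be modelled as a small client/server exchange. This is an illustrative Python sketch of the proposed message flow, not actual wire-protocol code; all function names and the flag are invented here.

```python
# Toy model of the "Bind with cache-miss report" protocol option:
# the client Binds using the SQL text as the statement name; on a
# miss the server reports it (instead of raising an error) and the
# client falls back to Parse + Bind. All names are hypothetical.

server_cache = {}       # sql -> cached "plan"
parse_count = 0         # how many Parse messages the server handled

def server_parse(sql):
    global parse_count
    parse_count += 1
    server_cache[sql] = ("plan-for", sql)

def server_bind(sql, report_cache_miss):
    if sql in server_cache:
        return ("bound", server_cache[sql])
    if report_cache_miss:
        return ("cache_miss", None)   # proposed flag: no error raised
    raise KeyError("unknown prepared statement")

def client_execute(sql):
    status, plan = server_bind(sql, report_cache_miss=True)
    if status == "cache_miss":
        server_parse(sql)             # extra round trip, first use only
        status, plan = server_bind(sql, report_cache_miss=True)
    return status

print(client_execute("SELECT * FROM test WHERE id = $1"))  # miss, then bound
print(client_execute("SELECT * FROM test WHERE id = $1"))  # bound directly
```

The trade-off against the combined Parse+Bind option is visible in the model: this variant never retransmits the SQL once cached, but a cold cache costs an extra round trip.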