Re: [HACKERS] Cached Query Plans

2008-04-15 Thread Florian Weimer
* Dawid Kuroczko:

> Right now the only way of getting such information from PostgreSQL
> is by logging all queries and analyzing logs.  The current_query
> column of pg_stat_activity is useless as the (prepared) queries are
> usually so short lived that you will see one execution out of
> thousands happening.

If the cached plans are kept a bit longer than actually necessary, it
might also be possible to see the query plan of a query that involves
temporary tables, something that is somewhat convoluted to do
otherwise (ptop uses the query string from pg_stat_activity and tacks
an EXPLAIN in front of it, which breaks with temporary tables).
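The ptop trick (and why it breaks) can be shown in a tiny sketch; the function name and the temp_orders table are invented for illustration, this is not ptop's actual code:

```python
# A toy illustration of the ptop behaviour described above: take the query
# string reported by pg_stat_activity and tack an EXPLAIN in front of it.

def explain_statement(current_query):
    """Build the statement a monitoring tool would run to see the plan."""
    return "EXPLAIN " + current_query

# The catch: the monitor runs this in its *own* session. A temporary table
# is only visible to the session that created it, so if current_query
# references one, the EXPLAIN fails even though the original query ran fine.
stmt = explain_statement("SELECT * FROM temp_orders WHERE id = 42")
```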

-- 
Florian Weimer<[EMAIL PROTECTED]>
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cached Query Plans

2008-04-14 Thread Dawid Kuroczko
On Mon, Apr 14, 2008 at 5:01 PM, Csaba Nagy <[EMAIL PROTECTED]> wrote:
> On Mon, 2008-04-14 at 10:55 -0400, Mark Mielke wrote:
>  > The other ideas about automatically deciding between plans based on
>  > ranges and such strike me as involving enough complexity and logic, that
>  > to do properly, it might as well be completely re-planned from the
>  > beginning to get the most benefit.
>
>  ... except if you hard-wire the most common alternative plans, you still
>  get the benefit of cached plan for a wider range of parameter values.
>  Not to mention that if you know you'll cache the plan, you can try
>  harder planning it right, getting possibly better plans for complex
>  queries... you could argue that complex queries tend not to be repeated,
>  but we do have here some which are in fact repeated a lot in batches,
>  then discarded. So I guess a cached plan discard/timeout mechanism would
>  also be nice.

I think an ANALYZE on the tables involved should _force_ replanning of the
cached query. After all, if ANALYZE was fired, the contents have changed
substantially, and replanning feels like a good idea.
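That invalidation rule can be sketched as a toy model (invented structures, not PostgreSQL code): each cached plan remembers the statistics "version" of the tables it touches, ANALYZE bumps the version, and a stale plan is replanned on the next lookup.

```python
# Toy sketch of ANALYZE-driven plan invalidation (all names invented).

stats_version = {}   # table name -> counter bumped by ANALYZE
plan_cache = {}      # query string -> (plan, {table: version at plan time})

def analyze(table):
    stats_version[table] = stats_version.get(table, 0) + 1

def get_plan(query, tables, planner):
    entry = plan_cache.get(query)
    if entry is not None:
        plan, versions = entry
        # Reuse only if no table's statistics changed since planning.
        if all(stats_version.get(t, 0) == v for t, v in versions.items()):
            return plan, "cached"
    plan = planner(query)
    plan_cache[query] = (plan, {t: stats_version.get(t, 0) for t in tables})
    return plan, "replanned"

plan1, how1 = get_plan("SELECT * FROM t WHERE a = $1", ["t"], lambda q: "seqscan")
plan2, how2 = get_plan("SELECT * FROM t WHERE a = $1", ["t"], lambda q: "seqscan")
analyze("t")   # contents changed substantially -> force replanning
plan3, how3 = get_plan("SELECT * FROM t WHERE a = $1", ["t"], lambda q: "indexscan")
```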

As for the planner getting smarter (and slower ;)) -- complex queries tend not
to be repeated -- so it is worth the trouble to plan them carefully.  These
would benefit from a smarter planner with or without caching.

The problem is with "simple queries", which arguably make up the majority
of queries.  That is where the caching comes in.  If you cache the queries,
you can let the planner be smarter (and slower).  If you don't cache, you
probably don't want to trade the speed of a "frequent simple query" for that
of a "once in a while complex query".

That stated, for me the most important feature is the possibility to
have good online query statistics. :)

   Regards,
 Dawid



Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread Csaba Nagy
On Mon, 2008-04-14 at 17:08 +0200, PFC wrote:
>   Those "Decision" nodes could potentially lead to lots of decisions (ahem).
>   What if you have 10 conditions in the Where, plus some joined ones ? That
> would make lots of possibilities...

Yes, that's true, but most of them are likely not relevant for the end
result. In any real life query there are a few parameters which are
really important for what plan you should choose... the key here is that
you should spend more time on finding the possibilities for a cached
plan than you do for a one shot query. 

In principle one-shot planning should be the default and caching should
be something the user has to choose deliberately. I would really like a
special command to plan and cache a query without actually executing it,
possibly with a parameter for how hard to try... e.g. you could expend
the extra cycles to eliminate all redundancies from boolean expressions
and IN lists, to get the parse tree into a canonical format - all things
which make planning easier. All of these lose on one-shot queries, but once
you cache you can really do a lot of smart things which were a no-no
before...

>   Consider several types of queries :
> 
>   - The small, quick query which returns one or a few rows : in this case,
> planning overhead is large relative to execution time, but I would venture
> to guess that the plans always end up being the same.

Consider a 'select a where b like $1' -> the parameter $1 will
considerably affect the query plan. A query can't go much simpler...

>   - The query that takes a while : in this case, planning overhead is nil
> compared to execution time, better replan every time with the params.

I guess these queries are not the ones targeted by this feature. In fact
for these queries it really doesn't matter whether you cache or not,
except: if you know you're going to cache, you'll expend more effort
planning right, and that could still matter for a query which runs long.
Note that if you don't cache, planning harder will lose in the long run;
only once you cache can you afford to plan harder...

>   - The complex query that still executes fast because it doesn't process a
> lot of rows and postgres finds a good plan (for instance, a well optimized
> search query). Those would benefit from reducing the planning overhead,
> but those also typically end up having many different plans depending on
> the search parameters. Besides, those queries are likely to be dynamically
> generated. So, would it be worth it to add all those features just to
> optimize those ? I don't know...

We have here dynamically generated queries which are specifically
chunked to be executed in small increments so none of the queries runs
too long (they would block vacuuming vital tables otherwise). Those
chunks would greatly benefit from properly planned and cached plans... 

A real smart system would store canonical plan fragments as response to
(also canonicalized) parse tree fragments, and then assemble the plan
out of those fragments, but that would be indeed really complex (to
design, the resulting code might be simpler than one thinks) ;-)

Cheers,
Csaba.






Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread PFC

On Mon, 14 Apr 2008 16:17:18 +0200, Csaba Nagy <[EMAIL PROTECTED]> wrote:


> On Mon, 2008-04-14 at 16:10 +0200, Csaba Nagy wrote:
>> ... or plan the query with the actual parameter value you get, and also
>> record the range of the parameter values you expect the plan to be valid
>> for. If at execution time the parameter happens to be out of that range,
>> replan, and possibly add a new subplan covering the extra range. This
>> could still work with prepared queries (where you don't get any
>> parameter values to start with) by estimating the most probable
>> parameter range (whatever that could mean), and planning for that.
>
> More on that: recording the presumptions under which the (cached!) plan
> is thought to be valid would also facilitate setting up dependencies
> against statistics, to be checked when you analyze tables... and if the
> key value which you depend on with your query changed, the analyze
> process could possibly replan it in the background.


	LOL, it started with the idea to make small queries faster, and now the  
brain juice is pouring.

Those "Decision" nodes could potentially lead to lots of decisions (ahem).
What if you have 10 conditions in the Where, plus some joined ones ? That
would make lots of possibilities...


Consider several types of queries :

	- The small, quick query which returns one or a few rows : in this case,  
planning overhead is large relative to execution time, but I would venture  
to guess that the plans always end up being the same.
	- The query that takes a while : in this case, planning overhead is nil  
compared to execution time, better replan every time with the params.
	- The complex query that still executes fast because it doesn't process a  
lot of rows and postgres finds a good plan (for instance, a well optimized  
search query). Those would benefit from reducing the planning overhead,  
but those also typically end up having many different plans depending on  
the search parameters. Besides, those queries are likely to be dynamically  
generated. So, would it be worth it to add all those features just to  
optimize those ? I don't know...




Re: [HACKERS] Cached Query Plans

2008-04-14 Thread Csaba Nagy
On Mon, 2008-04-14 at 10:55 -0400, Mark Mielke wrote:
> The other ideas about automatically deciding between plans based on 
> ranges and such strike me as involving enough complexity and logic, that 
> to do properly, it might as well be completely re-planned from the 
> beginning to get the most benefit.

... except if you hard-wire the most common alternative plans, you still
get the benefit of cached plan for a wider range of parameter values.
Not to mention that if you know you'll cache the plan, you can try
harder planning it right, getting possibly better plans for complex
queries... you could argue that complex queries tend not to be repeated,
but we do have here some which are in fact repeated a lot in batches,
then discarded. So I guess a cached plan discard/timeout mechanism would
also be nice.

Cheers,
Csaba.





Re: [HACKERS] Cached Query Plans

2008-04-14 Thread Mark Mielke
I like the cross-session query plan caching talk. I would prefer that the
feature be optional (i.e. a per-session "use cross-session query plan
cache" variable).


I like the "automatic re-plan if the estimate did not match the actual" 
idea with some softening technique involved such as "if the last 3 times 
it ran, it did the wrong thing, learn from our mistake and adapt".
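That softening idea could look something like this toy sketch; the 10x "wrong estimate" factor and the three-strikes limit are invented numbers, not anything PostgreSQL does:

```python
# Toy sketch of adaptive replanning: keep a cached plan, but count executions
# where the row-count estimate was badly wrong; after three misses in a row,
# discard the plan and replan.

MISS_FACTOR = 10   # "wrong" = actual rows off by more than 10x the estimate
MISS_LIMIT = 3     # learn from our mistake after 3 consecutive misses

class CachedPlan:
    def __init__(self, name, estimated_rows):
        self.name = name
        self.estimated_rows = estimated_rows
        self.misses = 0

    def record_execution(self, actual_rows):
        """Return True if the plan should be thrown away and replanned."""
        est = self.estimated_rows
        wrong = actual_rows > est * MISS_FACTOR or actual_rows * MISS_FACTOR < est
        self.misses = self.misses + 1 if wrong else 0   # reset on a good run
        return self.misses >= MISS_LIMIT

plan = CachedPlan("index scan on t", estimated_rows=2)
# Three executions in a row return far more rows than estimated:
results = [plan.record_execution(n) for n in (20000, 30000, 25000)]
```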


The other ideas about automatically deciding between plans based on 
ranges and such strike me as involving enough complexity and logic, that 
to do properly, it might as well be completely re-planned from the 
beginning to get the most benefit.


Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>




Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread PFC



>> Bind message behaviour was modified :
>> - If the user asks for execution of a named prepared statement, and the
>> named statement does not exist in PG's prepared statements cache, instead
>> of issuing an error and borking the transaction, it Binds to an empty
>> statement, that takes no parameters, and returns no result. Parameters
>> sent by the user are consumed but not used.
>
> You mentioned the need for a wire protocol change to allow this.  Why
> can't this be controlled with a server variable, like SET auto_prepare =
> 'true'?


Actually, thanks to the hack, the wire protocol doesn't change.
Explanation :

- Send Parse(SQL) to unnamed statement + Bind unnamed statement => works  
as usual (no cache)
- Send only Bind (named statement) with a statement name that is not found  
in the cache => doesn't raise an error, instead informs the application  
that the statement does not exist. The application can then prepare (send  
a Parse message with SQL and a name) the statement and give it a name. I  
used as name the SQL itself, but you can use anything else. The  
application can then send the Bind again, which will (hopefully) work.


	So, here, the information ("cache" or "don't cache") is passed from the  
client to the server, in a hidden way : it depends on what function you  
use to send the query (unnamed statements are not cached, named statements  
are cached).
	There is no protocol change, but new information is provided to the
server nonetheless.


	Downside to this is that the application needs to be modified (only a  
little, though) and applications that expect exceptions on "Statement does  
not exist" will break, thus the necessity of a GUC to control it.


	It was just a quick & dirty test to see whether this way of doing it was
an option to consider or not. Apparently it works, but whether it is The
Right Way remains to be seen...





Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread Csaba Nagy
> ... or plan the query with the actual parameter value you get, and also
> record the range of the parameter values you expect the plan to be valid
> for. If at execution time the parameter happens to be out of that range,
> replan, and possibly add a new subplan covering the extra range. This
> could still work with prepared queries (where you don't get any
> parameter values to start with) by estimating the most probable
> parameter range (whatever that could mean), and planning for that.

Another thought: if the cached plans get their own table (as was
suggested) then you could also start gathering parameter range
statistics meaningfully... and on the next replan you would know what to
optimize your planning efforts for.

Cheers,
Csaba.





Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread Csaba Nagy
On Mon, 2008-04-14 at 16:10 +0200, Csaba Nagy wrote:
> ... or plan the query with the actual parameter value you get, and also
> record the range of the parameter values you expect the plan to be valid
> for. If at execution time the parameter happens to be out of that range,
> replan, and possibly add a new subplan covering the extra range. This
> could still work with prepared queries (where you don't get any
> parameter values to start with) by estimating the most probable
> parameter range (whatever that could mean), and planning for that.

More on that: recording the presumptions under which the (cached!) plan
is thought to be valid would also facilitate setting up dependencies
against statistics, to be checked when you analyze tables... and if the
key value which you depend on with your query changed, the analyze
process could possibly replan it in the background.

Cheers,
Csaba.





Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread Csaba Nagy
On Mon, 2008-04-14 at 16:54 +0300, Heikki Linnakangas wrote:
> Figuring out the optimal "decision points" is hard, and potentially very 
> expensive. There is one pretty simple scenario though: enabling the use 
> of partial indexes, preparing one plan where a partial index can be 
> used, and another one where it can't. Another such case is "col LIKE ?" 
> queries, where ? is actually a prefix query, "foo%".

Another point is when the cardinality distribution of some key's values
is very skewed, with some values very frequent and the majority of
values being unique. There you could check the stats at execution time
just for deciding to go for the low cardinality plan or the high one...

> As an optimization, we could decide the decision points on the prepare 
> message, and delay actually planning the queries until they're needed. 
> That way we wouldn't waste time planning queries for combinations of 
> parameters that are never used.

... or plan the query with the actual parameter value you get, and also
record the range of the parameter values you expect the plan to be valid
for. If at execution time the parameter happens to be out of that range,
replan, and possibly add a new subplan covering the extra range. This
could still work with prepared queries (where you don't get any
parameter values to start with) by estimating the most probable
parameter range (whatever that could mean), and planning for that.
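A rough sketch of that idea, with invented data structures and an arbitrary +/-50 validity window standing in for a real range estimate:

```python
# Toy sketch: plan with the actual parameter value, record the range for
# which the plan is assumed valid, and replan (adding a new subplan entry)
# when a later value falls outside every recorded range.

plans = []   # list of (low, high, plan) entries for one cached statement

def plan_for(value, planner):
    for low, high, plan in plans:
        if low <= value <= high:
            return plan, "cached"
    # Out of every known range: replan and record a new validity range.
    # The +/-50 window is an arbitrary stand-in for a real estimate.
    plan = planner(value)
    plans.append((value - 50, value + 50, plan))
    return plan, "replanned"

p1, h1 = plan_for(10, lambda v: "indexscan")   # first execution: replan
p2, h2 = plan_for(30, lambda v: "indexscan")   # inside the recorded range
p3, h3 = plan_for(500, lambda v: "seqscan")    # outside: new subplan added
```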

Cheers,
Csaba.





Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread Heikki Linnakangas

Csaba Nagy wrote:

> If cached plans would be implemented, the dependence on parameter values
> could be solved too: use special "fork" nodes in the plan which execute
> different sub-plans depending on special parameter values/ranges,
> possibly looking up the stats at runtime, so that the plan is in a
> compiled state with the "decision points" wired in.


That's an idea I've been thinking about for a long time, but never got 
around to implementing. I see it as a completely orthogonal feature to 
the server-side shared plan cache, though. There are plenty of scenarios, 
like with a client-side prepared statement cache, where it would be useful.


Figuring out the optimal "decision points" is hard, and potentially very 
expensive. There is one pretty simple scenario though: enabling the use 
of partial indexes, preparing one plan where a partial index can be 
used, and another one where it can't. Another such case is "col LIKE ?" 
queries, where ? is actually a prefix query, "foo%".
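The "col LIKE ?" case could be decided with something like this sketch; the helper names are invented, and real pattern handling (escapes, collations) is heavily simplified:

```python
# Toy sketch: inspect the actual LIKE parameter at Bind time. A pattern with
# a literal prefix and no leading wildcard ('foo%') can use an index-range
# plan; anything else falls back to a sequential-scan plan.

def literal_prefix(pattern):
    """Return the fixed prefix of a LIKE pattern, or None if there is none."""
    prefix = []
    for ch in pattern:
        if ch in ("%", "_"):
            break
        if ch == "\\":
            return None   # simplification: give up on escaped patterns
        prefix.append(ch)
    return "".join(prefix) or None

def choose_plan(pattern):
    return "index range scan" if literal_prefix(pattern) else "seq scan"

choice1 = choose_plan("foo%")   # prefix query: index is usable
choice2 = choose_plan("%bar")   # leading wildcard: it is not
```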


As an optimization, we could decide the decision points on the prepare 
message, and delay actually planning the queries until they're needed. 
That way we wouldn't waste time planning queries for combinations of 
parameters that are never used.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread Bruce Momjian
PFC wrote:
> Bind message behaviour was modified :
> - If the user asks for execution of a named prepared statement, and the  
> named statement does not exist in PG's prepared statements cache, instead  
> of issuing an error and borking the transaction, it Binds to an empty  
> statement, that takes no parameters, and returns no result. Parameters  
> sent by the user are consumed but not used.

You mentioned the need for a wire protocol change to allow this.  Why
can't this be controlled with a server variable, like SET auto_prepare =
'true'?

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread PFC



> If cached plans would be implemented, the dependence on parameter values
> could be solved too: use special "fork" nodes in the plan which execute
> different sub-plans depending on special parameter values/ranges,
> possibly looking up the stats at runtime, so that the plan is in a
> compiled state with the "decision points" wired in.
>
> This of course would mean a lot heavier planning and possibly a lot
> bigger plans, but you could afford that if you cache the plan. You could
> even have a special command to plan a query this way.


	And, the "fork node" could mutter to itself "Strange, I'm getting 1
row instead of the 2 for which I was planned, perhaps I should switch to
a different plan..."


I have made another very simple hack to test for another option :

Bind message behaviour was modified :
- If the user asks for execution of a named prepared statement, and the  
named statement does not exist in PG's prepared statements cache, instead  
of issuing an error and borking the transaction, it Binds to an empty  
statement, that takes no parameters, and returns no result. Parameters  
sent by the user are consumed but not used.


The application was modified thusly :
- Calls to pg_query_params were changed to calls to the following
function :


function pg_query_cached( $sql, $params )
{
    // Try to execute it, using the query string itself as the statement name.
    $q = pg_execute( $sql, $params );
    if( !$q ) die( pg_last_error() );

    // If it worked, return the result to the caller.
    if( pg_result_status( $q, PGSQL_STATUS_STRING ) != "" )
        return $q;

    // We got an empty query result (not a result with 0 rows, which would
    // be valid), so the server doesn't know this statement yet: prepare it...
    $q = pg_prepare( $sql, $sql );
    if( !$q ) die( pg_last_error() );

    // ... and execute it again.
    $q = pg_execute( $sql, $params );
    if( !$q ) die( pg_last_error() );

    return $q;
}

Pros :
- It works
- It is very very simple
- The user can choose between caching plans or not by calling
  pg_query_params() (no cached plans) or pg_query_cached() (cached plans)
- It works with persistent connections

Cons :
- It is too simple
- Plans are cached locally, so memory use is proportional to the number
  of connections
- It is still vulnerable to search_path problems






Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread Csaba Nagy
>   The hairiness is in the plan dependence (or independence) on parameter  
> values, ideally we only want to cache plans that would be good for all  
> parameter values, only the user knows that precisely. Although it could be  
> possible to examine the column histograms...

If cached plans were implemented, the dependence on parameter values
could be solved too: use special "fork" nodes in the plan which execute
different sub-plans depending on special parameter values/ranges,
possibly looking up the stats at runtime, so that the plan is in a
compiled state with the "decision points" wired in.

This of course would mean a lot heavier planning and possibly a lot
bigger plans, but you could afford that if you cache the plan. You could
even have a special command to plan a query this way.
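A minimal sketch of such a "fork" node (an invented representation, not a real executor node), here deciding at execution time by a most-common-values lookup, as with a skewed key distribution:

```python
# Toy sketch: the cached plan is a tree in which a Fork node holds several
# pre-built subplans plus a decision function evaluated at execution time
# against the actual parameter values, consulting (pretend) statistics.

class Fork:
    def __init__(self, decide, subplans):
        self.decide = decide        # params -> key selecting a subplan
        self.subplans = subplans

    def execute(self, params):
        return self.subplans[self.decide(params)].execute(params)

class Leaf:
    def __init__(self, label):
        self.label = label
    def execute(self, params):
        return self.label           # stand-in for running the real subplan

mcv = {"status_active"}   # pretend stats: most-common values for the column
plan = Fork(
    decide=lambda p: "common" if p["key"] in mcv else "rare",
    subplans={"common": Leaf("seq scan"), "rare": Leaf("index scan")},
)

run1 = plan.execute({"key": "status_active"})   # frequent value
run2 = plan.execute({"key": "user_9814"})       # nearly unique value
```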

Cheers,
Csaba.





Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-13 Thread PFC

> Why limit ourselves to Oracle?  Look at how all major proprietary RDBMSs do it.


Thanks for the links. Very interesting.
The DB2 document especially mentions an important point : in order to make  
their planner/optimizer smarter, they had to make it slower, hence it  
became crucial to cache the plans.
Contrast this with MySQL where using prepared statements gains nothing :  
the "optimizer" does so little work that it actually doesn't matter.


So, basically, Oracle :
- Parses the query every time (identifies tables, permissions etc) (soft parse)
- From that parsed query it looks up a cached plan (the lookup key could
be different depending on the schema etc)
- If no cached plan is found, it must plan the query (hard parse).

Also the Oracle doc mentions that soft parsing should be avoided by using
prepared statements in the application (ie Parse once and Bind lots of times).
So, Oracle will redo the parsing + permission checks each time, unless
prepared statements are used, in which case it's direct execution.


And DB2 :
Er, the document is not very clear about what it actually does, but the  
stats look nice ;)



> I liked your global prepared statements idea much better. Naming the
> statements is no problem: DB frontends do that for you anyway
> sometimes.


Hm. The "global statements" and the cache would complement each other  
actually. Why not.


When the user wants to name the statements, he can do so (and perhaps  
control who can execute what, etc, like with stored procs)

Permission checking overhead will be there at each execution.
Should the plan be cached locally ? (RAM consumption times N bakends...)
Cached per user once permissions have been checked ? (avoids the overhead  
of rechecking permissions)

What about the search path ?
(I'd force the global statements to use the default search path no matter  
what, being explicit is better than "why does it stop working ?")


Can the application or the database library name the statements ?
I'm not so sure. This could work for compiled languages (what about when  
you run several applications ? or several versions of the same application  
? do we need a uniqueness of statement names from all developers all over  
the world ?) Solution : make each application use a different user name,  
and global prepared statements only visible to the user that created them,  
perhaps. This conflicts with some desirable features, though. It needs  
more thinking.


What about non-compiled languages ? It will not be possible to generate a
list of statements beforehand... And queries are also constructed
dynamically by frameworks such as Rails, which makes naming them
impossible, but caching the plans would work well.


So, some situations would benefit from a plan cache,


> Frankly, I think you're better off storing them in a table. Shared
> memory is a limited resource and you cannot change how much you've


	I'd say that unless you have a perverse application that will try all the  
permutations of column names just to make sure the query is different  
every time, how many different queries would you want to cache ?...  
probably less than 1000... so it wouldn't take more than a couple  
megabytes...



> allocated after the server has started. It does mean you'll have to
> serialise/deserialise them, but this will be cheaper than replanning,
> right?


	What would be the overhead of a catalog lookup to get a cached plan for a  
statement that returns 1 row ? Would the catalog cache make it fast enough  
?

And what about deserialization ?...


> I am not too sure that plans and statistical counters should be stored
> together...


Not sure either.


> Probably plans should go in one place, and statistics should go to the
> stats collector (I know he's not quite ready for this ;)).


That's the problem...


> Hm, a limit on how much memory can be used for plans
> (query_plan_cache_size GUC?), and a LRU/LFU expiration
> of old plans?


Now it gets hairy ;)
	Yes, memory size should be limited. But how to build an LRU cleaner which
doesn't create lots of contention ?... Luckily, with a hash having a fixed
number of buckets, it is easier (clean a bucket every N seconds, for
instance).
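The bucket-at-a-time cleaning could look like this toy sketch (all names invented; a real implementation would take one short lock per bucket instead of one global lock for a full LRU pass):

```python
# Toy sketch: a fixed-bucket hash table where a periodic sweep cleans one
# bucket at a time, evicting entries whose last use is at or before the
# sweep mark, so the whole cache is never locked at once.

N_BUCKETS = 4

buckets = [{} for _ in range(N_BUCKETS)]   # each bucket: query -> last-used tick
tick = 0

def touch(query):
    """Record a use of this query's cached plan."""
    global tick
    tick += 1
    buckets[hash(query) % N_BUCKETS][query] = tick

def clean_bucket(i, cutoff):
    """Evict entries in one bucket not used after `cutoff` (one small lock)."""
    b = buckets[i]
    for q in [q for q, t in b.items() if t <= cutoff]:
        del b[q]

touch("SELECT 1")      # tick 1
touch("SELECT 2")      # tick 2
sweep_mark = tick      # everything not used after this point gets evicted
touch("SELECT 1")      # tick 3: refreshed after the sweep mark
for i in range(N_BUCKETS):
    clean_bucket(i, cutoff=sweep_mark)

remaining = sorted(q for b in buckets for q in b)
```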



> Perhaps a GUC for controlling the query cache should have three values:
>  none -- don't cache any statement
>  smart -- use heuristics for deciding whether to cache it
>  all -- force caching all queries -- for uncommon/statistical/testing
> purposes.


I would not volunteer to write that heuristic ;)
	Although there would be a very simple solution : if time to parse > some  
percentage of time to execute then cache.
	The hairiness is in the plan's dependence (or independence) on parameter
values; ideally we only want to cache plans that are good for all
parameter values, and only the user knows that precisely. Although it could
be possible to examine the column histograms...
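That simple parse-time heuristic as a sketch; the 20% threshold and the timings are invented numbers:

```python
# Toy sketch: cache a statement only when planning took more than a given
# fraction of execution time.

CACHE_IF_PLAN_FRACTION = 0.2   # cache when planning cost > 20% of execution

def should_cache(plan_time_ms, exec_time_ms):
    return plan_time_ms > exec_time_ms * CACHE_IF_PLAN_FRACTION

# A fast primary-key lookup: planning dominates -> worth caching.
fast = should_cache(plan_time_ms=0.4, exec_time_ms=0.5)
# A long report query: planning is noise -> no point caching.
slow = should_cache(plan_time_ms=5.0, exec_time_ms=120_000.0)
```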



>> (like mysql, /* flags */ SELECT blah )
>
> I don't like the hint flags.  They tend to haunt later on (when the
> database gets smarter, but the application forces it to be dumb).

Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-13 Thread Dawid Kuroczko
On Sun, Apr 13, 2008 at 2:26 PM, PFC <[EMAIL PROTECTED]> wrote:
> > >  Oracle keeps a statement/plan cache in its shared memory segment (SGA)
> > >  that greatly improves its performance at running queries that don't
> > >  change very often.
> Can we have more details on how Oracle does it ? For
> "inspiration"...

Why limit ourselves to Oracle?  Look at how all major proprietary RDBMSs do it.

Here is a nice presentation I've found on DB2; they call it "Dynamic
Statement Cache":

http://www.tbrug.com/TB%20UG%20Dynamic%20Statement%20Cache.ppt

> Here is what I'm thinking about :
> Don't flame me too much about implementation issues, this is just
> throwing ideas in the air to see where they'll fall ;)
>
>  * global plan cache in shared memory, implemented as hashtable, hash key
> being the (search_path, query_string)
>  Doubt : Can a plan be stored in shared memory ? Will it have to be copied
> to local memory before being executed ?

Well, Oracle uses the terms "hard parse" and "soft parse", the former being
preparing the whole query, the latter reusing a query plan prepared by
some other session.  More or less.  See this link for a more detailed
description:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0P11_QUESTION_ID:2588723819082

(this is quite an interesting read)
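The hash key quoted further down in this message and Oracle's soft/hard parse distinction fit together roughly like this sketch (invented structures, not Oracle or PostgreSQL code): a shared cache keyed by (search_path, query_string), where a hit plays the role of the cheap soft parse and a miss forces the expensive hard parse.

```python
# Toy sketch of a shared plan cache keyed on (search_path, query_string).

plan_cache = {}
hard_parses = 0

def get_plan(search_path, query):
    global hard_parses
    key = (search_path, query)
    if key not in plan_cache:
        hard_parses += 1                     # "hard parse": plan from scratch
        plan_cache[key] = "plan for " + query
    return plan_cache[key]                   # a hit is the cheap "soft parse"

p1 = get_plan("public", "SELECT * FROM t WHERE id = $1")
p2 = get_plan("public", "SELECT * FROM t WHERE id = $1")   # soft parse
# The same SQL under a different search_path may resolve to different
# tables, so it must get its own cache entry:
p3 = get_plan("app, public", "SELECT * FROM t WHERE id = $1")
```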

>  This stores :
>  - the plans (not for all keys, see below)
>  - the stats :
[...]

I am not too sure that plans and statistical counters should be stored
together...
Probably plans should go in one place, and statistics should go to the
stats collector (I know he's not quite ready for this ;)).

>  There should be also a way to query this to display the statistics (ie
> "what query is killing my server ?"), and a way to purge old plans.

Hm, a limit on how much memory can be used for plans
(query_plan_cache_size GUC?), and a LRU/LFU expiration
of old plans?

>  * every time a Parse message comes up :
>  - look if the (search_path, query_string) is in the cache
>  - if it is in the cache :
> - if there is a cached plan, make the unnamed statement point to it,
> and we're done.
> - if there is no cached plan, prepare the query, and put it in the
> unnamed statement.
>
>  Now, the query has been parsed, so we can decide if it is cacheable. Should
> this be done in Parse, in Bind, or somewhere else ? I have no idea.
>
>  For instance, queries which contain VALUES() or IN( list of consts ) should
> not be cached, since the IN() is likely to change all the time, it would
> just trash the cache. Using =ANY( $1 ) instead will work with cached plans.
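The cache-trashing concern quoted above shows up directly if the cache is keyed on the literal query text, as in this tiny sketch:

```python
# Toy sketch: every distinct IN() list yields a distinct query string (a new
# cache entry each time), while the =ANY($1) form keeps one cache entry and
# passes the list as a parameter.

def cache_key(query):
    return query   # the cache is keyed on the literal query text

in_style = {cache_key("SELECT * FROM t WHERE id IN (%s)" % ",".join(map(str, ids)))
            for ids in ([1, 2], [3], [4, 5, 6])}
any_style = {cache_key("SELECT * FROM t WHERE id = ANY($1)")
             for _ in range(3)}
```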

Perhaps a GUC for controlling the query cache should have three values:
 none -- don't cache any statement
 smart -- use heuristics for deciding whether to cache it
 all -- force caching all queries -- for uncommon/statistical/testing purposes.

>  Also, will a plan to be cached have to be prepared with or without the
> parameters ? That's also an interesting question...
>  Perhaps the user should also be able to specify whether to cache a plan or
> not, or whether to use the params or not, with hint flags in the query
> string ?
>  (like mysql, /* flags */ SELECT blah )

I don't like the hint flags.  They tend to haunt later on (when the database
gets smarter, but the application forces it to be dumb).  I would say a GUC.
GUC gives freedom of change to the application, and can also be set
per user with ALTER USER.

> Now, if the query is cacheable, store it in the cache, and update
> the stats. If we decided to store the plan, do that too. For instance we
> might decide to store the plan only if this query has been executed a
> certain number of times, etc.

Interesting idea.  I think I like it.

>  * In the Execute message, if a cached plan was used, execute it and update
> the stats (time spent, etc).
>
> Now, about contention, since this is one shared hashtable for
> everyone, it will be fought for...
> However, the lock on it is likely to be held during a very small
> time (much less than a microsecond), so would it be that bad ?
> Also, GUC can be used to mitigate the contention, for instance if
> the user is not interested in the stats, the thing becomes mostly read-only

I would say: keep the stats separate.  For every plan cached, generate
some unique id (perhaps an OID? I am not convinced), and use this ID
as the key for the statistics.  I tend to think of it as a "temporary table,
and temporary table stats". :)
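Keeping the statistics in their own structure, keyed by a plan id rather than embedded in the plan, might look like this rough Python sketch (names and fields invented; the real thing would live in shared memory or go through the stats collector):

```python
from dataclasses import dataclass

@dataclass
class PlanStats:
    """Per-plan execution statistics, stored apart from the plan itself."""
    calls: int = 0
    total_ms: float = 0.0
    min_ms: float = float("inf")
    max_ms: float = 0.0

    def record(self, elapsed_ms):
        self.calls += 1
        self.total_ms += elapsed_ms
        self.min_ms = min(self.min_ms, elapsed_ms)
        self.max_ms = max(self.max_ms, elapsed_ms)

# Stats live in their own table keyed by plan id, so purging a plan
# never has to touch the (hotter) plan cache, and vice versa.
stats_by_plan_id = {}

def record_execution(plan_id, elapsed_ms):
    stats_by_plan_id.setdefault(plan_id, PlanStats()).record(elapsed_ms)
```

The separation is the point: the plan cache stays mostly read-only while the stats side absorbs all the writes.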

   Regards,
  Dawid



Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-13 Thread Martijn van Oosterhout
On Sun, Apr 13, 2008 at 02:26:04PM +0200, PFC wrote:
> * global plan cache in shared memory, implemented as hashtable, hash key  
> being the (search_path, query_string)
> Doubt : Can a plan be stored in shared memory ? Will it have to be copied  
> to local memory before being executed ?

Frankly, I think you're better off storing them in a table. Shared
memory is a limited resource and you cannot change how much you've
allocated after the server has started. It does mean you'll have to
serialise/deserialise them, but this will be cheaper than replanning,
right?

I liked your global prepared statements idea much better. Naming the
statements is no problem: DB frontends sometimes do that for you anyway.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while 
> boarding. Thank you for flying nlogn airlines.




Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-13 Thread PFC



On Fri, Apr 11, 2008 at 12:34 PM, PFC <[EMAIL PROTECTED]> wrote:
>> Well, I realized the idea of global prepared statements actually
>> sucked, so I set on another approach thanks to ideas from this list,
>> this is caching query plans.
>
> Well, that's a blatantly bad realization.  Perhaps you should do more
> research.


	No, what I meant is that the "global prepared statements" as I tried to
implement them before weren't that good...
	I think simple caching based on the query text itself is preferable to
having to name each of your queries, extract them from your programs,
replace them with EXECUTEs, issue a "create statement" command for each of
them, etc. Few people would actually use that feature because it would
mean lots of modifications to the application, so all the applications
that have to stay compatible with other databases would not use it (*)
	It could be useful for permissions and fine access control, though, but  
views and stored procs already provide that functionality...


	(*) = Note that caching the plans based on the query text (with $ params)  
from a parse message will not provide caching for oldskool queries with  
params inside in the form of escaped strings. This is good, because it  
means the safer solution (using $-quoted params) will also be the faster  
solution. And in the application, only a very small part of the code needs  
to be changed, that's the DB abstraction layer.




>> Doesn't Oracle do this now transparently to clients?
>
> Of course it does, and it has since the late 80's I believe.
>
>> Oracle keeps a statement/plan cache in its shared memory segment (SGA)
>> that greatly improves its performance at running queries that don't
>> change very often.

Can we have more details on how Oracle does it ? For "inspiration"...

Here is what I'm thinking about :
	Don't flame me too much about implementation issues, this is just  
throwing ideas in the air to see where they'll fall ;)


* global plan cache in shared memory, implemented as hashtable, hash key  
being the (search_path, query_string)
Doubt : Can a plan be stored in shared memory ? Will it have to be copied  
to local memory before being executed ?


This stores :
- the plans (not for all keys, see below)
- the stats :
- number of times this query has been executed,
	- total, min and max wallclock time and CPU time spent planning this  
query,
	- total, min and max wallclock time, CPU time and RAM spent executing  
this query,

- total, min and max number of rows returned,
- last timestamp of execution of this query,

There should be separate GUCs to control this :
- should the whole thing be activated ?
- should the cache be active ? or just the stats ? and what stats ?

There should be also a way to query this to display the statistics (ie  
"what query is killing my server ?"), and a way to purge old plans.


* every time a Parse message comes up :
- look if the (search_path, query_string) is in the cache
- if it is in the cache :
	- if there is a cached plan, make the unnamed statement point to it, and  
we're done.
	- if there is no cached plan, prepare the query, and put it in the  
unnamed statement.


Now, the query has been parsed, so we can decide if it is cacheable.  
Should this be done in Parse, in Bind, or somewhere else ? I have no idea.


For instance, queries which contain VALUES() or IN( list of consts )  
should not be cached, since the IN() is likely to change all the time, it  
would just trash the cache. Using =ANY( $1 ) instead will work with cached  
plans.


Also, will a plan to be cached have to be prepared with or without the  
parameters ? That's also an interesting question...
Perhaps the user should also be able to specify whether to cache a plan or
not, or whether to use the params or not, with hint flags in the query
string ?

(like mysql, /* flags */ SELECT blah )

	Now, if the query is cacheable, store it in the cache, and update the  
stats. If we decided to store the plan, do that too. For instance we might  
decide to store the plan only if this query has been executed a certain  
number of times, etc.
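The "store the plan only once the query has been executed a certain number of times" policy could be sketched like this (a toy Python stand-in; the planner callback and threshold knob are invented placeholders, not the proposed implementation):

```python
class PlanCache:
    """Count executions per query key; promote a query to the plan
    cache only after it has been seen `threshold` times."""
    def __init__(self, planner, threshold=5):
        self.planner = planner      # callback: query text -> plan
        self.threshold = threshold
        self.seen = {}              # query key -> execution count
        self.plans = {}             # query key -> cached plan

    def get_plan(self, key):
        if key in self.plans:
            return self.plans[key]  # cache hit: no planning at all
        self.seen[key] = self.seen.get(key, 0) + 1
        plan = self.planner(key)    # below threshold: plan every time
        if self.seen[key] >= self.threshold:
            self.plans[key] = plan  # promote: future calls skip planning
        return plan
```

One-off queries then never pollute the cache, while genuinely hot queries pay the planning cost only threshold times.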


* In the Execute message, if a cached plan was used, execute it and update  
the stats (time spent, etc).


	Now, about contention, since this is one shared hashtable for everyone,  
it will be fought for...
	However, the lock on it is likely to be held during a very small time  
(much less than a microsecond), so would it be that bad ?
	Also, GUC can be used to mitigate the contention, for instance if the  
user is not interested in the stats, the thing becomes mostly read-only




Re: [HACKERS] Cached Query Plans

2008-04-13 Thread James Mansion

Would it be possible to store plans with an indication of the
search path that was used to find tables, and for temp tables
some snapshot of the statistics for the table if any?

My suspicions are that:

* where you have a lot of short-lived connections then actually
 they will often use the default search path - or a similar one

* if a temp table is in use then normally these will be small or
 contain 'similar' data

There is a danger that these heuristics will be poor if long-running
connections are in play - but they have no excuse not to do their
own preparation.

Perhaps you could have named cache segments and connections
could 'opt in' to a cache segment if they want such sharing?

James




Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-13 Thread Marc Cousin
Another issue with plan caches, besides contention, in Oracle at least, is 
shared memory fragmentation (as plans aren't all the same size in memory ...)

But this cache is very helpful for applications where every query is done via
prepare/execute/deallocate. I've seen it a lot in Java apps, the purpose
being to benefit from the advantages of prepared statements, but without
having to deal with storing those prepared statements somewhere.

And of course, as said before, the statistics associated with those plans can
be very helpful, mostly for all those very small queries that are run very
frequently (a badly developed app most of the time, but that happens).

On Sunday 13 April 2008 06:21:41, Jonah H. Harris wrote:
> On Sat, Apr 12, 2008 at 10:17 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> >  > Yes, this is worthless on large active databases.  The logging
> >  > overhead alone starts to affect performance.
> >
> >  But somehow, all that stuff with cached plans is free?
>
> Of course not.  The first time you execute a query, it is cached... so
> you pay the same penalty you do in PG, but in many cases, only once.
> In regards to plan re-use, sure there's going to be some contention on
> the hash buckets... but that can be mitigated in a lot of ways.
>
> In addition to that, Oracle collects over two thousand other
> statistics in real-time... yet somehow Oracle is quite fast.  So, I
> would say that the usual complaint about collecting stats should be
> more an issue of proper implementation than a complaint about the act
> of collection itself.
>
> --
> Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
> EnterpriseDB Corporation | fax: 732.331.1301
> 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED]
> Edison, NJ 08837 | http://www.enterprisedb.com/





Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-12 Thread Jonah H. Harris
On Sat, Apr 12, 2008 at 10:17 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>  > Yes, this is worthless on large active databases.  The logging
>  > overhead alone starts to affect performance.
>
>  But somehow, all that stuff with cached plans is free?

Of course not.  The first time you execute a query, it is cached... so
you pay the same penalty you do in PG, but in many cases, only once.
In regards to plan re-use, sure there's going to be some contention on
the hash buckets... but that can be mitigated in a lot of ways.

In addition to that, Oracle collects over two thousand other
statistics in real-time... yet somehow Oracle is quite fast.  So, I
would say that the usual complaint about collecting stats should be
more an issue of proper implementation than a complaint about the act
of collection itself.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | [EMAIL PROTECTED]
Edison, NJ 08837 | http://www.enterprisedb.com/



Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-12 Thread Tom Lane
"Jonah H. Harris" <[EMAIL PROTECTED]> writes:
> On Sat, Apr 12, 2008 at 2:19 PM, Dawid Kuroczko <[EMAIL PROTECTED]> wrote:
>> There are other benefits as well.  Oracle lets you see the statistics 
>> associated
>> with given plans.  So you can see how many times given (cached) query was
>> executed, how much resources it consumed, and so on.

> Yes, and it also uses that data at both the statement and column level
> to determine what needs more analysis to help build better plans in
> the future.

>> Right now the only way of getting such information from PostgreSQL is by
>> logging all queries and analyzing logs.  The current_query column of
>> pg_stat_activity is useless as the (prepared) queries are usually so short
>> lived that you will see one execution out of thousands happening.

> Yes, this is worthless on large active databases.  The logging
> overhead alone starts to affect performance.

But somehow, all that stuff with cached plans is free?

regards, tom lane



Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-12 Thread Jonah H. Harris
On Sat, Apr 12, 2008 at 2:19 PM, Dawid Kuroczko <[EMAIL PROTECTED]> wrote:
>  There are other benefits as well.  Oracle lets you see the statistics 
> associated
>  with given plans.  So you can see how many times given (cached) query was
>  executed, how much resources it consumed, and so on.

Yes, and it also uses that data at both the statement and column level
to determine what needs more analysis to help build better plans in
the future.

>  Right now the only way of getting such information from PostgreSQL is by
>  logging all queries and analyzing logs.  The current_query column of
>  pg_stat_activity is useless as the (prepared) queries are usually so short
>  lived that you will see one execution out of thousands happening.

Yes, this is worthless on large active databases.  The logging
overhead alone starts to affect performance.

>  Nooow, suppose we do have cached plans.  Then we can have a view
>  pg_stat_queries + a stats collector which will track number of executions,
>  number of blocks hit, blocks read, etc.  Would be great! :)

With the exception that the stats collector itself needs a bit of work
to handle larger volumes of statistics, I agree.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | [EMAIL PROTECTED]
Edison, NJ 08837 | http://www.enterprisedb.com/



Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-12 Thread Jonah H. Harris
On Sat, Apr 12, 2008 at 8:44 AM, Perez <[EMAIL PROTECTED]> wrote:
>  Doesn't Oracle do this now transparently to clients?

Of course it does, and it has since the late 80's I believe.

>  Oracle keeps a statement/plan cache in its shared memory segment (SGA)
>  that greatly improves its performance at running queries that don't
>  change very often.

Yep.

>  From that point of view, Oracle at least sees benefits in doing this.

Yes, it is also a bit more advanced than we're discussing here, so
I'll just leave it at that.

>  From my POV a transparent performance enhancer for all those PHP and
>  Rails apps out there.

Yes.

>
>  With plan invalidation in 8.3 this becomes feasible for pgSQL to do as
>  well.
>
>  -arturo
>



-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | [EMAIL PROTECTED]
Edison, NJ 08837 | http://www.enterprisedb.com/



Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-12 Thread Jonah H. Harris
On Fri, Apr 11, 2008 at 12:34 PM, PFC <[EMAIL PROTECTED]> wrote:
> Well, I realized the idea of global prepared statements actually
> sucked, so I set on another approach thanks to ideas from this list, this is
> caching query plans.

Well, that's a blatantly bad realization.  Perhaps you should do more research.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | [EMAIL PROTECTED]
Edison, NJ 08837 | http://www.enterprisedb.com/



Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-12 Thread Dawid Kuroczko
On Sat, Apr 12, 2008 at 2:44 PM, Perez <[EMAIL PROTECTED]> wrote:
> In article <[EMAIL PROTECTED]>,
>
>  > PFC wrote:
>  >
>  > > So, where to go from that ? I don't see a way to implement this 
> without
>  > > a (backwards-compatible) change to the wire protocol, because the clients
>  > > will want to specify when a plan should be cached or not. Since the user
>  > > should not have to name each and every one of the statements they want to
>  > > use plan caching, I see the following choices :
>
>
>  Doesn't Oracle do this now transparently to clients?  That, I believe
>  Oracle keeps a statement/plan cache in its shared memory segment (SGA)
>  that greatly improves its performance at running queries that don't
>  change very often.
>
>  From that point of view, Oracle at least sees benefits in doing this.
>  From my POV a transparent performance enhancer for all those PHP and
>  Rails apps out there.

There are other benefits as well.  Oracle lets you see the statistics associated
with given plans.  So you can see how many times a given (cached) query was
executed, how much resources it consumed, and so on.

Right now the only way of getting such information from PostgreSQL is by
logging all queries and analyzing logs.  The current_query column of
pg_stat_activity is useless as the (prepared) queries are usually so short
lived that you will see one execution out of thousands happening.

Nooow, suppose we do have cached plans.  Then we can have a view
pg_stat_queries + a stats collector which will track number of executions,
number of blocks hit, blocks read, etc.  Would be great! :)

   Regards,
   Dawid



Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-12 Thread Perez
In article <[EMAIL PROTECTED]>,
 > PFC wrote:
> 
> > So, where to go from that ? I don't see a way to implement this without 
> > a (backwards-compatible) change to the wire protocol, because the clients 
> > will want to specify when a plan should be cached or not. Since the user  
> > should not have to name each and every one of the statements they want to 
> > use plan caching, I see the following choices :
> 


Doesn't Oracle do this now transparently to clients?  That, I believe 
Oracle keeps a statement/plan cache in its shared memory segment (SGA) 
that greatly improves its performance at running queries that don't 
change very often.

From that point of view, Oracle at least sees benefits in doing this.
From my POV it's a transparent performance enhancer for all those PHP and
Rails apps out there.

With plan invalidation in 8.3 this becomes feasible for pgSQL to do as 
well.

-arturo



Re: [HACKERS] Cached Query Plans

2008-04-12 Thread PFC


> Well if you're caching per-connection then it doesn't really matter whether
> you do it on the client side or the server side, it's pretty much exactly
> the same problem.

Actually I thought about doing it on the server since it would then also
work with connection pooling.
Doing it on the client means the client has to maintain state, which is
not possible in a pool...

> Unsurprisingly most drivers do precisely what you're describing. In Perl DBI
> for example you just change $dbh->prepare("") into $dbh->prepare_cached("")
> and it does exactly what you want. I would expect the PHP drivers to have
> something equivalent.


	Well, you clearly have "expectations" about PHP, lol. PHP takes pride in  
always aiming below your expectations, not above ;)
	It has no such feature. Also pg_query_params() is SLOWER than pg_query()  
which makes you choose between clean&slow, and string quoting hell.

Perhaps I should patch PHP instead...
Or perhaps this feature should be implemented in pgpool or pgbouncer.

>> But, using prepared statements with persistent connections is messy,
>> because you never know if the connection is new or not,
>
> If you were to fix *that* then both this problem and others (such as
> setting up desired SET-parameter values) would go away.


True. Languages that keep a long-running context (like application
servers etc) can do this easily.
Although in the newer versions of PHP, it's not so bad, pconnect seems to
work (ie. it will issue ROLLBACKs when the script dies, reset session
variables like enable_indexscan, etc), so the only remaining problem seems
to be prepared statements.
And again, adding a method for the application to know if the persistent
connection is new or not, will not work in a connection pool...

Perhaps a GUC flag saying EXECUTE should raise an error but not kill the
current transaction if the requested prepared statement does not exist ?
Then the application would issue a PREPARE. It could also raise a
non-fatal error when the tables have changed (column added, for instance)
so the application can re-issue a PREPARE.

But I still think it would be cleaner to do it in the server.

Also, I rethought about what Gregory Stark said :
> The contention on the shared cache is likely to negate much of the planning
> savings but I think it would still be a win.


If a shared plan cache is implemented, it will mostly be read-only, ie.
when the application is started, new queries will come, so the plans will
have to be written to the cache, but then once the cache contains
everything it needs, it will not be modified that often, so I wouldn't
think contention would be such a problem...


> It's not so easy as all that.  Consider search_path.  Consider temp
> tables.


Temp tables : I thought plan revalidation took care of this ?
(After testing, it does work, if a temp table is dropped and recreated,
PG finds it, although of course if a table is altered by adding a column
for instance, it logically fails).

search_path: I suggested either putting the search_path in the cache key
along with the SQL string, or forcing queries to specify schema.table for
all tables.
It is also possible to shoot one's foot with the current PREPARE (ie.
search_path is used to PREPARE but of course not for EXECUTE), and also
with plpgsql functions (ie. the search path used to compile the function
is the one that is active when it is compiled, ie at its first call in the
current connection, and not the search path that was active when the
function was defined)...

SET search_path TO DEFAULT;

CREATE SCHEMA a;
CREATE SCHEMA b;

CREATE TABLE a.test( v TEXT );
CREATE TABLE b.test( v TEXT );

INSERT INTO a.test VALUES ('This is schema a');
INSERT INTO b.test VALUES ('This is schema b');

CREATE OR REPLACE FUNCTION test_search_path()
 RETURNS SETOF TEXT
 LANGUAGE plpgsql
 AS
$$
DECLARE
 x TEXT;
BEGIN
 FOR x IN SELECT v FROM test LOOP
 RETURN NEXT x;
 END LOOP;
END;
$$;

test=> SET search_path TO a,public;
test=> SELECT * FROM test_search_path();
  test_search_path
 ------------------
  This is schema a
test=> \q
$ psql test

test=> SET search_path TO b,public;
test=> SELECT * FROM test_search_path();
  test_search_path
 ------------------
  This is schema b

test=> SET search_path TO a,public;
test=> SELECT * FROM test_search_path();
  test_search_path
 ------------------
  This is schema b
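The demonstration above is why a plan cache would need search_path in its key: the same SQL text must map to different plans under different schema settings. A toy sketch of such a keyed lookup (invented names, a Python stand-in for the proposed hashtable):

```python
plan_cache = {}

def plan_for(search_path, query, planner):
    """Look up a plan keyed on (search_path, query) so that identical
    SQL text planned under different schemas gets distinct entries."""
    key = (search_path, query)
    if key not in plan_cache:
        plan_cache[key] = planner(search_path, query)
    return plan_cache[key]
```

With this key, the "This is schema b" surprise above cannot happen: switching search_path switches to a different cache slot instead of reusing a stale plan.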



Re: [HACKERS] Cached Query Plans

2008-04-11 Thread Gregory Stark
"PFC" <[EMAIL PROTECTED]> writes:

>   But, using prepared statements with persistent connections is messy,
> because you never know if the connection is new or not, if it contains
> already prepared statements or not, you'd have to maintain a list of those
> statements (named) for every query in your application, and when someone
> changes a query, it's a mess, not to mention queries generated by the ORM
> like Rails etc.

Well if you're caching per-connection then it doesn't really matter whether
you do it on the client side or the server side, it's pretty much exactly the
same problem.

Unsurprisingly most drivers do precisely what you're describing. In Perl DBI
for example you just change $dbh->prepare("") into $dbh->prepare_cached("")
and it does exactly what you want. I would expect the PHP drivers to have
something equivalent.
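The prepare_cached idiom Gregory describes — prepare once per distinct SQL text on a given connection, then reuse the handle — can be sketched like this (a toy Python stand-in to show the shape of the client-side cache, not a real driver API):

```python
class Connection:
    """Toy connection illustrating the DBI prepare_cached idiom:
    per-connection, client-side caching of prepared statements."""
    def __init__(self):
        self._stmt_cache = {}
        self.prepares = 0           # how many real PREPAREs were sent

    def prepare(self, sql):
        self.prepares += 1
        return ("stmt", sql)        # stand-in for a server-side handle

    def prepare_cached(self, sql):
        # First call for this SQL text prepares; later calls reuse.
        if sql not in self._stmt_cache:
            self._stmt_cache[sql] = self.prepare(sql)
        return self._stmt_cache[sql]
```

This is exactly the per-connection caching being contrasted with the server-side proposal: it works, but the cache dies with the connection and does nothing for a pool of short-lived clients.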

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] Cached Query Plans

2008-04-11 Thread Tom Lane
PFC <[EMAIL PROTECTED]> writes:
>   And it is very easy to recognize a query we've seen before,

It's not so easy as all that.  Consider search_path.  Consider temp
tables.

The real problem here is entirely on the client side:

>   But, using prepared statements with persistent connections is messy,  
> because you never know if the connection is new or not,

If you were to fix *that* then both this problem and others (such as
setting up desired SET-parameter values) would go away.

regards, tom lane



Re: [HACKERS] Cached Query Plans

2008-04-11 Thread PFC



> I think what he's referring to is persistently caching plans so that new
> connections can use them. That makes a lot more sense if you have lots of
> short-lived connections like a stock php server without persistent
> connections turned on or a connection pooler. You can prepare queries but
> they only live for a single web page so you don't get any benefit.


Let me explain a little further.

	Persistent database connections are the way to go for web applications,  
because the connection is only going to get used for a few queries, and  
the time needed to start the postgres process and establish the connection  
is often significant compared to the time used for the actual queries.  
Connection pooling can also be used, you get the idea.


	So, using persistent database connections, it makes sense to use prepared  
statements to speed up execution of simple queries, like those returning a  
few rows with a few joins and no complicated WHERE clauses, which is  
actually most of the queries on your average website. As shown in my  
previous message, the CPU time spent planning the query can be as much or  
even a lot more than CPU time spent actually executing the query.


	But, using prepared statements with persistent connections is messy,  
because you never know if the connection is new or not, if it contains  
already prepared statements or not, you'd have to maintain a list of those  
statements (named) for every query in your application, and when someone  
changes a query, it's a mess, not to mention queries generated by the ORM  
like Rails etc.


The idea in this "proof of concept" was :

	Wouldn't it be nice if Postgres could just say "Hey, I already planned  
that query, I'll reuse that plan".
	And it is very easy to recognize a query we've seen before, since  
$-params takes the parameters out of the equation, and eliminates parsing  
time and string quoting hell.


	Storing the cached plans as prepared statements in the connection-local  
hashtable makes sense : it doesn't use that much memory anyway, and there  
are no locking and contention problems. Just like PREPARE and EXECUTE.


> Personally I would like to see this, not primarily for the performance
> gains, but for the possibility of managing when plans change -- ie, plan
> stability.

Unfortunately, this isn't compatible with a non-shared memory
approach...

> But there is resistance from other quarters about the reliability hit of
> having the plan data structures in shared memory.


I agree.
Hence the idea to put them in non-shared memory, local to a process.
	Perfectly useless when using non-persistent connections, but very  
powerful when using persistent connections.



> I still don't see why you would need a wire protocol change.


	Because I'd think that sometimes the client will not want to use a cached  
plan, when the query is rarely used (no need to waste memory to cache the  
plan), or it is complex and needs to be replanned according to parameter  
values every time.
	Sure, the client could use the oldskool "send query as text with  
parameters inside" but that's back to string escaping hell, and it's ugly.

It would be nicer to have a bool "cache_plan".


> You would just have clients prepare plans normally and stash them in shared
> memory for other backends in a hash table keyed by, well, something, perhaps
> the original query text.


	Query text seems to be the simplest, better not ask the user to come up  
with distinct names when the query text will be a perfect key. Besides,  
hand-generated names might turn out not to be so distinct after all...


> Then whenever you're asked to prepare a query you go check if someone else
> has already done it for you and find an already generated plan in the shared
> memory hash table.
> The contention on the shared cache is likely to negate much of the planning
> savings but I think it would still be a win. But what's really interesting
> to me is then providing an interface to see and manipulate that cache. Then
> you could see what plans other backends are using for queries, mark plans as
> being acceptable or not, and even revoke users' permissions to execute
> queries which aren't already present and marked as being acceptable.


	If it can be made to work with a shared cache, why not, but that would be  
more complex. You'd also have to deal with permissions, different users  
with different privileges, etc. But local would probably be simplest (and  
faster).


	Also, there will be problems with the schema search path. Perhaps a query  
should be required to specify the fully qualified table names  
(schema.table) for all tables in order to be cacheable.







Re: [HACKERS] Cached Query Plans

2008-04-11 Thread Gregory Stark
"Alvaro Herrera" <[EMAIL PROTECTED]> writes:

> PFC wrote:
>
>>  So, where to go from that ? I don't see a way to implement this without 
>> a (backwards-compatible) change to the wire protocol, because the clients 
>> will want to specify when a plan should be cached or not. Since the user  
>> should not have to name each and every one of the statements they want to 
>> use plan caching, I see the following choices :
>
> I don't understand the point here.  We already have cached plans: you
> send a Parse.  You can then Bind/Execute many times.

I think what he's referring to is persistently caching plans so that new
connections can use them. That makes a lot more sense if you have lots of
short-lived connections like a stock php server without persistent connections
turned on or a connection pooler. You can prepare queries but they only live
for a single web page so you don't get any benefit.

Personally I would like to see this, not primarily for the performance gains,
but for the possibility of managing when plans change -- ie, plan stability.

But there is resistance from other quarters about the reliability hit of
having the plan data structures in shared memory. A bug in one backend could
cause other backends to crash or corrupt their memory. The possibility exists
with other shared data structures but, arguably, plans are much more complex
data structures than PGPROC entries and buffers.

I still don't see why you would need a wire protocol change. You would just
have clients prepare plans normally and stash them in shared memory for other
backends in a hash table keyed by, well, something, perhaps the original query
text.

Then whenever you're asked to prepare a query you go check if someone else has
already done it for you and find an already generated plan in the shared
memory hash table.
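The lookup described above can be sketched as follows (Python, illustrative only — in PostgreSQL this would be a C hash table in shared memory, and `PlanCache`/`prepare` are invented names, not backend APIs):

```python
# Sketch of a cross-backend plan cache keyed by the original query text:
# the first backend to prepare a statement plans it; later backends find
# the shared plan and skip planning entirely.

class PlanCache:
    def __init__(self):
        self._plans = {}          # query text -> cached plan
        self.hits = 0
        self.misses = 0

    def prepare(self, query_text, planner):
        """Return a cached plan if some backend already built one,
        otherwise run the (expensive) planner and share the result."""
        plan = self._plans.get(query_text)
        if plan is not None:
            self.hits += 1
            return plan
        self.misses += 1
        plan = planner(query_text)          # stand-in for parse + plan
        self._plans[query_text] = plan
        return plan

cache = PlanCache()
planner = lambda q: ("PLAN", q)             # dummy planner for the sketch

# Two "backends" preparing the same statement: only the first one plans.
cache.prepare("SELECT * FROM test WHERE id = $1", planner)
cache.prepare("SELECT * FROM test WHERE id = $1", planner)
print(cache.hits, cache.misses)             # prints: 1 1
```

The interesting (and hard) parts this sketch omits are exactly the ones debated above: locking the shared table, and invalidating entries when schemas or statistics change.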

The contention on the shared cache is likely to negate much of the planning
savings but I think it would still be a win. But what's really interesting to
me is then providing an interface to see and manipulate that cache. Then you
could see what plans other backends are using for queries, mark plans as being
acceptable or not, and even revoke users' permissions to execute queries which
aren't already present and marked as being acceptable.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning



Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-11 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> PFC wrote:
>> So, where to go from that ? I don't see a way to implement this without 
>> a (backwards-compatible) change to the wire protocol, because the clients 
>> will want to specify when a plan should be cached or not. Since the user  
>> should not have to name each and every one of the statements they want to 
>> use plan caching, I see the following choices :

> I don't understand the point here.  We already have cached plans: you
> send a Parse.  You can then Bind/Execute many times.
> Maybe what we need is support for this in libpq, so that PHP can use it?

We already have that, too ...

regards, tom lane



Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-11 Thread Alvaro Herrera
PFC wrote:

>   So, where to go from that ? I don't see a way to implement this without 
> a (backwards-compatible) change to the wire protocol, because the clients 
> will want to specify when a plan should be cached or not. Since the user  
> should not have to name each and every one of the statements they want to 
> use plan caching, I see the following choices :

I don't understand the point here.  We already have cached plans: you
send a Parse.  You can then Bind/Execute many times.

Maybe what we need is support for this in libpq, so that PHP can use it?

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



[HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-11 Thread PFC


	Well, I realized the idea of global prepared statements actually sucked,  
so I set out on another approach, thanks to ideas from this list: caching  
query plans.


First, let's see if there is low-hanging fruit with the typical small,  
often-executed queries that are so frequent on websites.
Tables test, test2, and test3 each contain an id (integer primary key) and  
another integer field. There are 100K rows in each.


First, the simplest query :
SELECT * FROM test WHERE id = $1

110 us : Send query as text (PHP:pg_query -> PQexec)
125 us : Parse+Bind (PHP:pg_query_params -> PQexecParams)
 67 us : Execute a previously prepared statement (PHP:pg_execute -> PQexecPrepared)


A slightly more complex one but still pretty classic :
SELECT * FROM (SELECT * FROM test WHERE id>$1 ORDER BY id LIMIT 5) AS a  
NATURAL LEFT JOIN test2 NATURAL LEFT JOIN test3 ORDER BY id


523 us : Send query as text (PHP:pg_query -> PQexec)
580 us : Parse+Bind (PHP:pg_query_params -> PQexecParams)
148 us : Execute a previously prepared statement (PHP:pg_execute -> PQexecPrepared)


OK, so there is low-hanging fruit here, since parsing+planning these queries  
takes longer than executing them.


Since the Parse message includes a $-parameterized query that is to be  
prepared, it seems logical to put the caching logic there: the query  
string (without parameters) makes a nice cache key.


So I made a really quick and really dirty experiment without changing  
the wire protocol between client and server. This is only a proof of  
concept.


Try #1: in exec_parse_message(), if the statement is named, look it up in  
the prepared statements cache; if it is found, return at once and do  
nothing else.
To exploit this, I issue a pg_prepare() followed by pg_execute() for every  
query, whether or not the statement already exists. If it already exists,  
pg_prepare() now does nothing (except losing a little time).


Results:
 88 us : simple query
173 us : complex query

So the timings fall between a plain execute and a plan+execute. This  
provides a nice performance gain versus replanning every time, but it is  
not perfect.
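The Try #1 logic above can be sketched like this (Python, illustrative only — the real exec_parse_message() is C inside the PostgreSQL backend, and the function signature here is invented):

```python
# Sketch of the Try #1 early-return: a named statement that already
# exists is simply reused, so the client can unconditionally prepare
# before every execute without paying for planning twice.

def exec_parse_message(stmt_name, query_text, prepared, plan_query):
    if stmt_name and stmt_name in prepared:
        return prepared[stmt_name]        # already prepared: do nothing
    plan = plan_query(query_text)         # the expensive parse + plan step
    if stmt_name:
        prepared[stmt_name] = plan
    return plan

planned = []
plan_query = lambda q: planned.append(q) or ("PLAN", q)
prepared = {}

# The client prepares + executes every time; planning happens only once.
exec_parse_message("s1", "SELECT * FROM test WHERE id = $1", prepared, plan_query)
exec_parse_message("s1", "SELECT * FROM test WHERE id = $1", prepared, plan_query)
print(len(planned))                       # prints: 1
```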


Try #2: again in exec_parse_message(), if the statement is unnamed, I  
use the query string as the statement name and look the plan up in the  
prepared statements hash table. If it is not found, it is prepared.  
Then I make the unnamed statement plan point to it. Of course, this is  
dangerous since it probably introduces a dozen crash bugs, but for this  
proof of concept it's OK.
Client code is unchanged; PQexecParams benefits from the plan caching,  
since it always sends a Parse+Bind message using the unnamed statement.


Results are identical to executing a previously prepared statement,  
modulo a few microseconds.
This means the overhead of sending the Parse message, and of the server  
ignoring it when the statement is cached, is negligible.
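Try #2 can be sketched the same way (again Python and invented names, standing in for C backend code):

```python
# Sketch of Try #2: the unnamed statement is looked up by its query text,
# so PQexecParams callers get plan caching with no client-side changes.

def exec_parse_unnamed(query_text, plan_cache, plan_query):
    plan = plan_cache.get(query_text)
    if plan is None:                      # first time we see this text
        plan = plan_query(query_text)
        plan_cache[query_text] = plan
    return plan                           # unnamed statement now points here

unnamed_planned = []
plan_query = lambda q: unnamed_planned.append(q) or ("PLAN", q)
plan_cache = {}

# Three Parse+Bind round trips for the same text: planned only once.
for _ in range(3):
    exec_parse_unnamed("SELECT * FROM test WHERE id = $1", plan_cache, plan_query)
print(len(unnamed_planned))               # prints: 1
```

The "dozen crash bugs" caveat above is real: in the actual backend the unnamed statement has different memory-context lifetimes than named ones, which this sketch glosses over entirely.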


	So, where to go from here? I don't see a way to implement this without a  
(backwards-compatible) change to the wire protocol, because clients  
will want to specify when a plan should be cached or not. Since the user  
should not have to name each and every statement they want plan caching  
for, I see the following choices:


	- Add a new Parse+Bind command which carries the $-parameterized SQL and  
the parameters. If the plan is cached, grab it and execute; else prepare  
and execute. Add a flag so the client can specify whether it wants  
caching.

Pros: only one message, so it is faster.
	Cons: the SQL is transmitted in full, which is useless most of the time,  
but this overhead is rather small.
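For concreteness, a combined message could look something like this (Python; the byte layout is entirely invented for illustration and is NOT part of the real PostgreSQL frontend/backend protocol):

```python
# Sketch of a hypothetical combined Parse+Bind packet: a want-cache flag,
# the $-parameterized SQL, then the parameter values.
import struct

def encode_parse_bind(query_text, params, want_cache):
    body = bytes([1 if want_cache else 0])
    body += query_text.encode("utf-8") + b"\x00"
    body += struct.pack("!H", len(params))
    for p in params:
        data = p.encode("utf-8")
        body += struct.pack("!I", len(data)) + data
    return body

def decode_parse_bind(body):
    want_cache = bool(body[0])
    end = body.index(b"\x00", 1)
    query_text = body[1:end].decode("utf-8")
    (nparams,) = struct.unpack_from("!H", body, end + 1)
    off, params = end + 3, []
    for _ in range(nparams):
        (n,) = struct.unpack_from("!I", body, off)
        params.append(body[off + 4:off + 4 + n].decode("utf-8"))
        off += 4 + n
    return query_text, params, want_cache

msg = encode_parse_bind("SELECT * FROM test WHERE id = $1", ["42"], True)
print(decode_parse_bind(msg))
```

The server would hash the query text on arrival: cache hit means execute directly, cache miss means prepare first, exactly as described above.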


	- Send the SQL with Bind as the statement name, add a flag to Bind  
telling it to report a cache miss instead of raising an error, and then  
have the client send Parse and Bind again.


	- Should there be one common hash table for named prepared statements  
and cached plans, or two? Using the SQL string as the statement name is  
not clean.

