Re: [HACKERS] first time hacker ;) messing with prepared statements

2008-04-02 Thread PFC


The MAJOR benefit of Microsoft's approach is that it works on existing  
application,


Yes, that is a nice benefit !
	Is there a way to turn it on/off ? Or is it smart enough to only cache  
plans for cases where it is relevant ?
	For instance, I absolutely want some queries to be planned according to  
real parameters (makes huge difference on some search queries, as  
expected), whereas most simple queries like the proverbial select by ID  
etc could be cached without problems...


and, most importantly makes NO assumptions on the "volatile" server  
state. A few cases where the Microsoft solution works, while yours will  
fail is:


* Server restart and assorted like failover (you need to redo a
  global prepare).
* Cleanup and instantiation of a prepared statement.


	Hehe, actually, mine does work after restart since the statements are  
stored in a database-specific system catalog which is persistent.
	Actually, what I store is not the result of PREPARE (a plan) but the text  
of the SQL query "PREPARE foo.", that is I just cut the GLOBAL from  
"GLOBAL PREPARE" and store the rest. The actual PREPARE is realized by  
each connection when it encounters an EXECUTE request and doesn't find the  
cached plan. It is actually extremely simple ;) did you expect a fancy  
shared memory cache (ahem...) ? No, no, it's very basic.
	This way, if a table was dropped and recreated, or whatever other stuff  
that can invalidate a plan since the GLOBAL PREPARE was issued, no  
problem, since there was no global stored plan anyway, just some SQL text.  
Also if a needed table was dropped, the user will get the same error  
message as he would have got issuing a PREPARE for the associated SQL  
query string.
	The overhead of each connection doing its own PREPARE is negligible,  
since, if you use that feature, you intend to issue this query many, many  
times during the life of the persistent connection.


What you are doing for a global query cache is already in consideration  
and having plan invalidation mechanism on schema changes or, maybe,  
statistic updates was a step into that direction. You code mostly  
contributed the other parts already.


As I said it is much simpler than that : I store no plans ;)
Of course this means it only works with persistent connections.

Another considerations is whether most task are getting CPU bound or IO  
bound. A better, per query, plan might reduce IO load due to better use  
of statistics on that single case, while for CPU bound it is very nice  
to reduce the planning overhead significantly.


	Well, if it is IO bound, then this thing is useless. However, since the  
purpose is to optimize often-used, simple queries, the likes of which  
abound in web applications, then it is relevant... because, if this kind  
of simple selects become IO bound, and you have a few on each page, you're  
really in trouble...


Another possible implementation would be to use a connection pooler  
which, when opening a new connection, can be configured to send a SQL  
script containing all the PREPARE statements.
This is, IMHO, an application side feature that might be a good addition  
to PHP and other languages that provide the "persistent connection"  
feature.


	On second thought, if it is not in Postgres, I'd rather put this in the  
connection pooler, because this way it can be used by different  
applications. But then you have to use a connection pooler. Also, by  
putting it in Postgres, statements are only prepared as needed, whereas  
the pooler would have to issue a lot of PREPAREs at connection startup,  
making new connection startup slower.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] first time hacker ;) messing with prepared statements

2008-03-31 Thread James Mansion

Tom Lane wrote:

PFC <[EMAIL PROTECTED]> writes:
  
	Do the parse tree store fully qualified "schema.table" or  
"schema.function" ?



They store OIDs.
  
So, what happens if we reference a temporary table or something else 
that requires resolution

down a search path?

I believe Microsoft and Sybase have to defer some optimisation because 
of this.


James


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] first time hacker ;) messing with prepared statements

2008-03-31 Thread Tom Lane
PFC <[EMAIL PROTECTED]> writes:
>   Do the parse tree store fully qualified "schema.table" or  
> "schema.function" ?

They store OIDs.

>   I mean, if table T is mentioned in a parse tree which is stored, and 
> the  
> table is later dropped and recreated... or a column dropped... what  
> happens ?

Dependencies take care of that --- if you drop the table, the statement
goes away too.

>> I also wonder whether statements should belong to schemas...

>   Since they are basically an extremely simple form of a function, why 
> not ?
>   (but since part of the goodness on prepared statements is that they are 
>  
> stored in a fast hash cache, wouldn't that add too much overhead ?)

The lookup overhead would be trivial, I expect, compared to everything
else involved in a query.  But what you'd have to work out is the
interaction between that and ordinary prepared statements, which
traditionally haven't had a schema name attached to the statement name.

(Come to think of it, if there's a statement FOO and I explicitly do
PREPARE FOO, what happens?  Should the result depend on whether I've
used FOO earlier in the session?)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] first time hacker ;) messing with prepared statements

2008-03-31 Thread PFC

* Server restart and assorted like failover (you need to redo a
  global prepare).


Hmm?  He's proposing storing the info in a system catalog.  That hardly
seems "volatile"; it'll certainly survive a server restart.


Yes, it's in a system catalog.


I agree with the point that this isn't completely transparent to
applications, but if an app is already using named prepared statements
it would surely be a pretty small matter to make it use this feature.
The app code would likely get simpler instead of more complex, since
you'd stop worrying about whether a given statement had been prepared
yet in the current session.


Thanks. That was the idea behing this hack...


I'm having a problem with the terminology here, since AFAICT what your
patch does is exactly not a global "prepare" --- there is no permanently
stored cached plan.  That's a good thing probably, but it seems like
the feature needs to be described differently.


	Sure, but I couldn't come up with a suitable name at the time... perhaps  
CREATE STATEMENT (and DROP STATEMENT) ? This would describe it better  
(especially the DROP, because GLOBAL DEALLOCATE is a rather bad name,  
since it doesn't actually deallocate anything...)



I'm also pretty dubious about storing raw text in that catalog.  In the
first place, while I've not looked at your patch, I expect you are
pulling the raw text from debug_query_string.  That won't work in cases
where multiple SQL commands were submitted in one query string.


	LOL, you are right, I had tested with multiple queries on the same line  
from psql, but psql apparently splits the queries, when I feed multiple  
queries from PHP, one of them being GLOBAL PREPARE, it fails.



In the
second place, raw-text SQL commands will be subject to a whole lot of
ambiguity at parse time.  If for instance another session tries to use
the command with a different search_path or standard_conforming_string
setting, it'll get different results.  While I can think of use-cases
for that sort of behavior, it seems like mostly a bad idea.


You're right.


I'm thinking that a more appropriate representation would use stored
parse trees, the same as we do in pg_rewrite, and with the same
dependency information so that a stored statement couldn't outlive the
objects it depends on.


	Do the parse tree store fully qualified "schema.table" or  
"schema.function" ?
	I mean, if table T is mentioned in a parse tree which is stored, and the  
table is later dropped and recreated... or a column dropped... what  
happens ? Dropping the statement would seem more logical, since it would  
probably no longer be valid...



Another area that could do with more thought is the hard-wired
association between statement ownership and accessibility.  That's
likely to be pretty inconvenient in a lot of cases, particularly
systems that use role membership heavily.


Yes, need to think about that.


I also wonder whether statements should belong to schemas...


Since they are basically an extremely simple form of a function, why 
not ?
	(but since part of the goodness on prepared statements is that they are  
stored in a fast hash cache, wouldn't that add too much overhead ?)


Thanks for the helpful advice.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] first time hacker ;) messing with prepared statements

2008-03-30 Thread Tom Lane
Joris Dobbelsteen <[EMAIL PROTECTED]> writes:
> The MAJOR benefit of Microsoft's approach is that it works on existing 
> application, and, most importantly makes NO assumptions on the 
> "volatile" server state. A few cases where the Microsoft solution works, 
> while yours will fail is:

> * Server restart and assorted like failover (you need to redo a
>   global prepare).

Hmm?  He's proposing storing the info in a system catalog.  That hardly
seems "volatile"; it'll certainly survive a server restart.

> * Cleanup and instantiation of a prepared statement.

Again, it's not clear what you've got in mind.

I agree with the point that this isn't completely transparent to
applications, but if an app is already using named prepared statements
it would surely be a pretty small matter to make it use this feature.
The app code would likely get simpler instead of more complex, since
you'd stop worrying about whether a given statement had been prepared
yet in the current session.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] first time hacker ;) messing with prepared statements

2008-03-30 Thread Joris Dobbelsteen

PFC wrote:

PFC wrote:

Hello,
So, I embarked (yesterday) on a weekend project to add a new 
feature to Postgres...
I use PHP with persistent connections and always have been 
bothered that those very small AJAX queries (usually simple selects 
returning 1 row) take more CPU in postgres to parse & plan than to 
actually execute.
Microsoft's answer to this issue with SQLServer appears to have been 
to introduce a smart
cache for all statement plans. It seems to be very effective. I guess 
you're doing much the

same thing but with more user intervention,  in effect.
Actually, the main purpose was to 1) have fun hacking Postgres, and 2) 
perhaps something useful would come of it...

Nice job so far!
Since you have achieved (1), now its time to get to (2) and I thinks its 
very well possible. In general its quite nice to have parts of your work 
integrated.
And I did find it very interesting, probably due to the fact that 
Postgres source code is so... impressively clean... well organized... 
readable... it took two hours from downloading the source to having an 
extra functional system catalog, and it worked at the first compile, 
all due to the concise but to the point comments in include/catalog, I 
couldn't believe it.
Anyway, Microsoft's solution is cool, too, but you need to reparse 
the entire query to then detect "I've planned this query before, with 
other parameters, so I'll reuse that prepared plan", so it adds 
another parsing step, which is less efficient.
The MAJOR benefit of Microsoft's approach is that it works on existing 
application, and, most importantly makes NO assumptions on the 
"volatile" server state. A few cases where the Microsoft solution works, 
while yours will fail is:


   * Server restart and assorted like failover (you need to redo a
 global prepare).
   * Cleanup and instantiation of a prepared statement.

Postgres could also do that with a small modification, by the way 
: like by using the entire string (with $1 style parameters) instead 
of the statement name, use that as a cache key, and send parameters 
separately, but I think it would be less clean than, say, a statement 
called "get_user_by_id" or something.
Also I like the idea of named prepared queries, which feel a bit 
like procedures, because many of those small, often-used queries would 
end up being defined in the same place, which makes schema changes 
(and having to modify queries in your application) slightly less painful.
PHP can also use pg_exec() which sends the parameters separately, 
automagically converted to postgres format, so you save time and 
hassle on quoting and de-quoting versus a SQL EXECUTE.
Since the performance of pg_exec() is almost twice as fast as 
plain SQL, and PHP scripts tend to use quite a lot of these queries, 
it also means a free database server performance upgrade (ie. the same 
DB server can handle more webservers, for instance).
What you are doing for a global query cache is already in consideration 
and having plan invalidation mechanism on schema changes or, maybe, 
statistic updates was a step into that direction. You code mostly 
contributed the other parts already.


I thinks a good middle ground to address the "persistence" problems I 
think there are and keep close to your patch might be using "stored 
procedures" and have these cached globally.


Another considerations is whether most task are getting CPU bound or IO 
bound. A better, per query, plan might reduce IO load due to better use 
of statistics on that single case, while for CPU bound it is very nice 
to reduce the planning overhead significantly.
Another possible implementation would be to use a connection pooler 
which, when opening a new connection, can be configured to send a SQL 
script containing all the PREPARE statements.
This is, IMHO, an application side feature that might be a good addition 
to PHP and other languages that provide the "persistent connection" feature.


- Joris

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] first time hacker ;) messing with prepared statements

2008-03-30 Thread PFC



PFC wrote:

Hello,
So, I embarked (yesterday) on a weekend project to add a new  
feature to Postgres...
I use PHP with persistent connections and always have been bothered  
that those very small AJAX queries (usually simple selects returning 1  
row) take more CPU in postgres to parse & plan than to actually execute.
Microsoft's answer to this issue with SQLServer appears to have been to  
introduce a smart
cache for all statement plans. It seems to be very effective. I guess  
you're doing much the

same thing but with more user intervention,  in effect.


	Actually, the main purpose was to 1) have fun hacking Postgres, and 2)  
perhaps something useful would come of it...
	And I did find it very interesting, probably due to the fact that  
Postgres source code is so... impressively clean... well organized...  
readable... it took two hours from downloading the source to having an  
extra functional system catalog, and it worked at the first compile, all  
due to the concise but to the point comments in include/catalog, I  
couldn't believe it.


	Anyway, Microsoft's solution is cool, too, but you need to reparse the  
entire query to then detect "I've planned this query before, with other  
parameters, so I'll reuse that prepared plan", so it adds another parsing  
step, which is less efficient.
	Postgres could also do that with a small modification, by the way : like  
by using the entire string (with $1 style parameters) instead of the  
statement name, use that as a cache key, and send parameters separately,  
but I think it would be less clean than, say, a statement called  
"get_user_by_id" or something.
	Also I like the idea of named prepared queries, which feel a bit like  
procedures, because many of those small, often-used queries would end up  
being defined in the same place, which makes schema changes (and having to  
modify queries in your application) slightly less painful.
	PHP can also use pg_exec() which sends the parameters separately,  
automagically converted to postgres format, so you save time and hassle on  
quoting and de-quoting versus a SQL EXECUTE.
	Since the performance of pg_exec() is almost twice as fast as plain SQL,  
and PHP scripts tend to use quite a lot of these queries, it also means a  
free database server performance upgrade (ie. the same DB server can  
handle more webservers, for instance).


	Another possible implementation would be to use a connection pooler  
which, when opening a new connection, can be configured to send a SQL  
script containing all the PREPARE statements.


Are you sure that you application wouldn't benefit more from a MOM  
solution with
persisted database connections? Have you looked at  
http://safmq.sourceforge.net/?


Dunno. Is this related to Postgres ?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] first time hacker ;) messing with prepared statements

2008-03-30 Thread James Mansion

PFC wrote:

Hello,
So, I embarked (yesterday) on a weekend project to add a new 
feature to Postgres...
I use PHP with persistent connections and always have been 
bothered that those very small AJAX queries (usually simple selects 
returning 1 row) take more CPU in postgres to parse & plan than to 
actually execute.
Microsoft's answer to this issue with SQLServer appears to have been to 
introduce a smart
cache for all statement plans. It seems to be very effective. I guess 
you're doing much the

same thing but with more user intervention,  in effect.

Are you sure that you application wouldn't benefit more from a MOM 
solution with
persisted database connections? Have you looked at 
http://safmq.sourceforge.net/?


James


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers