Re: [HACKERS] Cached plans and statement generalization

2017-09-12 Thread Konstantin Knizhnik



On 11.09.2017 12:24, Konstantin Knizhnik wrote:

Attached please find a rebased version of the patch.
Here are the updated performance results (pgbench -s 100 -c 1):

protocol (-M)         read-write   read-only (-S)
simple                      3327            19325
extended                    2256            16908
prepared                    6145            39326
simple+autoprepare          4950            34841




Here is one more version of the patch, which passes all regression tests with autoprepare_threshold=1.
I still do not think that it should be switched on by default...

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index e3eb0c5..17f3dfd 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3688,6 +3688,454 @@ raw_expression_tree_walker(Node *node,
 }
 
 /*
+ * raw_expression_tree_mutator --- transform a raw parse tree.
+ *
+ * This function implements a slightly different approach to tree updates than expression_tree_mutator().
+ * The callback is given a pointer to the pointer to the current node and can update that field instead of returning a reference to a new node.
+ * This makes it possible to remember changes and easily revert them without an extra traversal of the tree.
+ *
+ * This function does not need the QTW_DONT_COPY_QUERY flag: it never implicitly copies tree nodes, updating them in place instead.
+ *
+ * Like raw_expression_tree_walker, there is no special rule about query
+ * boundaries: we descend to everything that's possibly interesting.
+ *
+ * Currently, the node type coverage here extends only to DML statements
+ * (SELECT/INSERT/UPDATE/DELETE) and nodes that can appear in them, because
+ * this is used mainly during analysis of CTEs, and only DML statements can
+ * appear in CTEs. If some other node is visited, iteration stops immediately and true is returned.
+ */
+bool
+raw_expression_tree_mutator(Node *node,
+			bool (*mutator) (),
+			void *context)
+{
+	ListCell   *temp;
+
+	/*
+	 * The walker has already visited the current node, and so we need only
+	 * recurse into any sub-nodes it has.
+	 */
+	if (node == NULL)
+		return false;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	switch (nodeTag(node))
+	{
+		case T_SetToDefault:
+		case T_CurrentOfExpr:
+		case T_Integer:
+		case T_Float:
+		case T_String:
+		case T_BitString:
+		case T_Null:
+		case T_ParamRef:
+		case T_A_Const:
+		case T_A_Star:
+			/* primitive node types with no subnodes */
+			break;
+		case T_Alias:
+			/* we assume the colnames list isn't interesting */
+			break;
+		case T_RangeVar:
+			return mutator(&((RangeVar *) node)->alias, context);
+		case T_GroupingFunc:
+			return mutator(&((GroupingFunc *) node)->args, context);
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				if (mutator(&sublink->testexpr, context))
+					return true;
+				/* we assume the operName is not interesting */
+				if (mutator(&sublink->subselect, context))
+					return true;
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				if (mutator(&caseexpr->arg, context))
+					return true;
+				/* we assume the mutator doesn't care about CaseWhens, either */
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = (CaseWhen *) lfirst(temp);
+
+					Assert(IsA(when, CaseWhen));
+					if (mutator(&when->expr, context))
+						return true;
+					if (mutator(&when->result, context))
+						return true;
+				}
+				if (mutator(&caseexpr->defresult, context))
+					return true;
+			}
+			break;
+		case T_RowExpr:
+			/* Assume colnames isn't interesting */
+			return mutator(&((RowExpr *) node)->args, context);
+		case T_CoalesceExpr:
+			return mutator(&((CoalesceExpr *) node)->args, context);
+		case T_MinMaxExpr:
+			return mutator(&((MinMaxExpr *) node)->args, context);
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				if (mutator(&xexpr->named_args, context))
+					return true;
+				/* we assume the mutator doesn't care about arg_names */
+				if (mutator(&xexpr->args, context))
+					return true;
+			}
+			break;
+		case T_NullTest:
+			return mutator(&((NullTest *) node)->arg, context);
+		case T_BooleanTest:
+			return mutator(&((BooleanTest *) node)->arg, context);
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				if (mutator(&join->larg, context))
+					return true;
+				if (mutator(&join->rarg, context))
+					return true;
+				if (mutator(&join->quals, context))
+					return true;
+				if (mutator(&join->alias, context))
+					return true;
+				/* using list is deemed uninteresting */
+			}
+			break;
+		case T_IntoClause:
+			{
+				IntoClause *into = (IntoClause *) node;
+
+				if (mutator(&into->rel, context))
+					return true;
+				/* colNames, options are deemed uninteresting */
+				/* viewQuery should be null in raw parsetree, but check it */
+				if (mutator(&int
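
To illustrate how a callback written against this interface could be used for statement generalization, here is a minimal sketch (not taken from the patch; the struct and function names are hypothetical) that replaces A_Const nodes with ParamRef nodes through the pointer-to-pointer argument and records every substitution so that it can later be reverted without re-walking the tree:

typedef struct ConstSubst
{
	Node	  **ref;			/* field in the tree that was patched */
	Node	   *old_node;		/* original A_Const, kept so the change can be reverted */
} ConstSubst;

typedef struct GeneralizeCtx
{
	List	   *substitutions;	/* list of ConstSubst* */
	int			n_params;
} GeneralizeCtx;

static bool
generalize_literals(Node **ref, void *context)
{
	GeneralizeCtx *ctx = (GeneralizeCtx *) context;
	Node	   *node = *ref;

	if (node == NULL)
		return false;

	if (IsA(node, A_Const))
	{
		ParamRef   *param = makeNode(ParamRef);
		ConstSubst *subst = (ConstSubst *) palloc(sizeof(ConstSubst));

		param->number = ++ctx->n_params;
		param->location = ((A_Const *) node)->location;

		/* remember the change so it can be undone without re-walking the tree */
		subst->ref = ref;
		subst->old_node = node;
		ctx->substitutions = lappend(ctx->substitutions, subst);

		*ref = (Node *) param;	/* in-place update through the pointer-to-pointer */
		return false;
	}

	/* recurse; returns true if an unsupported statement type is encountered */
	return raw_expression_tree_mutator(node, generalize_literals, ctx);
}

/* Revert all substitutions, restoring the original literals */
static void
undo_substitutions(GeneralizeCtx *ctx)
{
	ListCell   *lc;

	foreach(lc, ctx->substitutions)
	{
		ConstSubst *subst = (ConstSubst *) lfirst(lc);

		*subst->ref = subst->old_node;
	}
}

Calling undo_substitutions() after the cache lookup restores the original literals, which is exactly the "remember changes and easily revert them" property described in the comment above.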

Re: [HACKERS] Cached plans and statement generalization

2017-09-11 Thread Konstantin Knizhnik



On 09.09.2017 06:35, Thomas Munro wrote:

On Fri, May 26, 2017 at 3:54 AM, Konstantin Knizhnik
 wrote:

Attached please find a rebased version of the autoprepare patch based on Tom's
proposal (perform analysis of the tree with constant literals and then replace
them with parameters).
I also submitted this patch for the Autumn commitfest.

The patch didn't survive the Summer bitrotfest.  Could you please rebase it?


Attached please find a rebased version of the patch.
Here are the updated performance results (pgbench -s 100 -c 1):

protocol (-M)         read-write   read-only (-S)
simple                      3327            19325
extended                    2256            16908
prepared                    6145            39326
simple+autoprepare          4950            34841



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index e3eb0c5..17f3dfd 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3688,6 +3688,454 @@ raw_expression_tree_walker(Node *node,
 }
 
 /*
+ * raw_expression_tree_mutator --- transform a raw parse tree.
+ *
+ * This function implements a slightly different approach to tree updates than expression_tree_mutator().
+ * The callback is given a pointer to the pointer to the current node and can update that field instead of returning a reference to a new node.
+ * This makes it possible to remember changes and easily revert them without an extra traversal of the tree.
+ *
+ * This function does not need the QTW_DONT_COPY_QUERY flag: it never implicitly copies tree nodes, updating them in place instead.
+ *
+ * Like raw_expression_tree_walker, there is no special rule about query
+ * boundaries: we descend to everything that's possibly interesting.
+ *
+ * Currently, the node type coverage here extends only to DML statements
+ * (SELECT/INSERT/UPDATE/DELETE) and nodes that can appear in them, because
+ * this is used mainly during analysis of CTEs, and only DML statements can
+ * appear in CTEs. If some other node is visited, iteration stops immediately and true is returned.
+ */
+bool
+raw_expression_tree_mutator(Node *node,
+			bool (*mutator) (),
+			void *context)
+{
+	ListCell   *temp;
+
+	/*
+	 * The walker has already visited the current node, and so we need only
+	 * recurse into any sub-nodes it has.
+	 */
+	if (node == NULL)
+		return false;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	switch (nodeTag(node))
+	{
+		case T_SetToDefault:
+		case T_CurrentOfExpr:
+		case T_Integer:
+		case T_Float:
+		case T_String:
+		case T_BitString:
+		case T_Null:
+		case T_ParamRef:
+		case T_A_Const:
+		case T_A_Star:
+			/* primitive node types with no subnodes */
+			break;
+		case T_Alias:
+			/* we assume the colnames list isn't interesting */
+			break;
+		case T_RangeVar:
+			return mutator(&((RangeVar *) node)->alias, context);
+		case T_GroupingFunc:
+			return mutator(&((GroupingFunc *) node)->args, context);
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				if (mutator(&sublink->testexpr, context))
+					return true;
+				/* we assume the operName is not interesting */
+				if (mutator(&sublink->subselect, context))
+					return true;
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				if (mutator(&caseexpr->arg, context))
+					return true;
+				/* we assume the mutator doesn't care about CaseWhens, either */
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = (CaseWhen *) lfirst(temp);
+
+					Assert(IsA(when, CaseWhen));
+					if (mutator(&when->expr, context))
+						return true;
+					if (mutator(&when->result, context))
+						return true;
+				}
+				if (mutator(&caseexpr->defresult, context))
+					return true;
+			}
+			break;
+		case T_RowExpr:
+			/* Assume colnames isn't interesting */
+			return mutator(&((RowExpr *) node)->args, context);
+		case T_CoalesceExpr:
+			return mutator(&((CoalesceExpr *) node)->args, context);
+		case T_MinMaxExpr:
+			return mutator(&((MinMaxExpr *) node)->args, context);
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				if (mutator(&xexpr->named_args, context))
+					return true;
+				/* we assume the mutator doesn't care about arg_names */
+				if (mutator(&xexpr->args, context))
+					return true;
+			}
+			break;
+		case T_NullTest:
+			return mutator(&((NullTest *) node)->arg, context);
+		case T_BooleanTest:
+			return mutator(&((BooleanTest *) node)->arg, context);
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				if (mutator(&join->larg, context))
+					return true;
+				if (mutator(&join->rarg, context))
+					return true;
+				if (mutator(&join->quals, context))
+					return true;
+				if (mutator(&join->alias, context))
+					return true;
+				/* using list is deemed uninteresting */
+			}
+			break;
+		case T_IntoClause:
+			{
+				IntoClause *into 

Re: [HACKERS] Cached plans and statement generalization

2017-09-08 Thread Thomas Munro
On Fri, May 26, 2017 at 3:54 AM, Konstantin Knizhnik
 wrote:
> Attached please find a rebased version of the autoprepare patch based on Tom's
> proposal (perform analysis of the tree with constant literals and then replace
> them with parameters).
> I also submitted this patch for the Autumn commitfest.

The patch didn't survive the Summer bitrotfest.  Could you please rebase it?

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Cached plans and statement generalization

2017-05-25 Thread Konstantin Knizhnik

On 10.05.2017 19:11, Konstantin Knizhnik wrote:


Based on Robert's feedback and Tom's proposal, I have implemented
two new versions of the autoprepare patch.


The first version is just a refactoring of my original implementation: I
have extracted common code into the prepare_cached_plan and
exec_prepared_plan
functions to avoid code duplication. I also rewrote the assignment of
values to parameters. Types of parameters are now inferred from the types
of literals, so there may be several
prepared plans which differ only in the types of their parameters. Due to
the problem with type coercion for parameters, I have to catch errors
in parse_analyze_varparams.




Attached please find a rebased version of the autoprepare patch based on
Tom's proposal (perform analysis of the tree with constant literals and then
replace them with parameters).

I also submitted this patch for the Autumn commitfest.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 95c1d3e..0b0642b 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3710,6 +3710,454 @@ raw_expression_tree_walker(Node *node,
 }
 
 /*
+ * raw_expression_tree_mutator --- transform a raw parse tree.
+ *
+ * This function implements a slightly different approach to tree updates than expression_tree_mutator().
+ * The callback is given a pointer to the pointer to the current node and can update that field instead of returning a reference to a new node.
+ * This makes it possible to remember changes and easily revert them without an extra traversal of the tree.
+ *
+ * This function does not need the QTW_DONT_COPY_QUERY flag: it never implicitly copies tree nodes, updating them in place instead.
+ *
+ * Like raw_expression_tree_walker, there is no special rule about query
+ * boundaries: we descend to everything that's possibly interesting.
+ *
+ * Currently, the node type coverage here extends only to DML statements
+ * (SELECT/INSERT/UPDATE/DELETE) and nodes that can appear in them, because
+ * this is used mainly during analysis of CTEs, and only DML statements can
+ * appear in CTEs. If some other node is visited, iteration stops immediately and true is returned.
+ */
+bool
+raw_expression_tree_mutator(Node *node,
+			bool (*mutator) (),
+			void *context)
+{
+	ListCell   *temp;
+
+	/*
+	 * The walker has already visited the current node, and so we need only
+	 * recurse into any sub-nodes it has.
+	 */
+	if (node == NULL)
+		return false;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	switch (nodeTag(node))
+	{
+		case T_SetToDefault:
+		case T_CurrentOfExpr:
+		case T_Integer:
+		case T_Float:
+		case T_String:
+		case T_BitString:
+		case T_Null:
+		case T_ParamRef:
+		case T_A_Const:
+		case T_A_Star:
+			/* primitive node types with no subnodes */
+			break;
+		case T_Alias:
+			/* we assume the colnames list isn't interesting */
+			break;
+		case T_RangeVar:
+			return mutator(&((RangeVar *) node)->alias, context);
+		case T_GroupingFunc:
+			return mutator(&((GroupingFunc *) node)->args, context);
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				if (mutator(&sublink->testexpr, context))
+					return true;
+				/* we assume the operName is not interesting */
+				if (mutator(&sublink->subselect, context))
+					return true;
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				if (mutator(&caseexpr->arg, context))
+					return true;
+				/* we assume the mutator doesn't care about CaseWhens, either */
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = (CaseWhen *) lfirst(temp);
+
+					Assert(IsA(when, CaseWhen));
+					if (mutator(&when->expr, context))
+						return true;
+					if (mutator(&when->result, context))
+						return true;
+				}
+				if (mutator(&caseexpr->defresult, context))
+					return true;
+			}
+			break;
+		case T_RowExpr:
+			/* Assume colnames isn't interesting */
+			return mutator(&((RowExpr *) node)->args, context);
+		case T_CoalesceExpr:
+			return mutator(&((CoalesceExpr *) node)->args, context);
+		case T_MinMaxExpr:
+			return mutator(&((MinMaxExpr *) node)->args, context);
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				if (mutator(&xexpr->named_args, context))
+					return true;
+				/* we assume the mutator doesn't care about arg_names */
+				if (mutator(&xexpr->args, context))
+					return true;
+			}
+			break;
+		case T_NullTest:
+			return mutator(&((NullTest *) node)->arg, context);
+		case T_BooleanTest:
+			return mutator(&((BooleanTest *) node)->arg, context);
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				if (mutator(&join->larg, context))
+					return true;
+				if (mutator(&join->rarg, context))
+					return true;
+				if (mutator(&join->quals, context))
+					return true;
+				if (mutator(&join->alias

Re: [HACKERS] Cached plans and statement generalization

2017-05-18 Thread Andres Freund
On 2017-05-18 11:57:57 +0300, Konstantin Knizhnik wrote:
> From my own experience I found out that PG_TRY/PG_CATCH mechanism is not
> providing proper cleanup (unlike C++ exceptions).

Right, simply because there's no portable way to transparently do so.
It would be possible on ELF glibc platforms, but ...


> If there are opened relations, catalog cache entries,... then throwing error
> will not release them.
> It will cause no problems if error is handled in PostgresMain which aborts
> current transaction and releases all resources in any case.
> But if I want to ignore this error and continue query execution, then
> warnings about resources leaks can be reported.
> Is this what you mean by the unsafety of PG_TRY/PG_CATCH constructions?

There's worse than just leaking resources.  Everything touching the
database might cause persistent corruption if you don't roll back.
Consider an insert that failed with a foreign key exception, done from
some function.  If you ignore that error, the row will still be visible,
but the foreign key will be violated.   If you want to continue after a
PG_CATCH you have to use a subtransaction/savepoint for the PG_TRY
contents, like several PLs do.
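
For reference, the subtransaction pattern referred to here (as used, for example, by plpgsql) looks roughly like this - a minimal sketch, with the risky operation left as a placeholder comment and the function name chosen for illustration only:

static bool
try_risky_operation(void)
{
	MemoryContext oldcontext = CurrentMemoryContext;
	ResourceOwner oldowner = CurrentResourceOwner;
	bool		ok = true;

	BeginInternalSubTransaction(NULL);
	/* run the protected code in the caller's memory context */
	MemoryContextSwitchTo(oldcontext);

	PG_TRY();
	{
		/* ... code that may throw, e.g. parse analysis of the query ... */

		ReleaseCurrentSubTransaction();		/* commit the subtransaction */
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;
	}
	PG_CATCH();
	{
		ErrorData  *edata;

		MemoryContextSwitchTo(oldcontext);
		edata = CopyErrorData();			/* save error info before cleanup */
		FlushErrorState();

		/* roll back the subtransaction: this undoes changes and releases resources */
		RollbackAndReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;

		FreeErrorData(edata);
		ok = false;							/* caller can now safely fall back */
	}
	PG_END_TRY();

	return ok;
}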

- Andres




Re: [HACKERS] Cached plans and statement generalization

2017-05-18 Thread Konstantin Knizhnik



On 15.05.2017 18:31, Robert Haas wrote:

On Wed, May 10, 2017 at 12:11 PM, Konstantin Knizhnik
 wrote:

Robert, can you please explain why using TRY/CATCH is not safe here:

This is definitely not a safe way of using TRY/CATCH.

This has been discussed many, many times on this mailing list before,
and I don't really want to go over it again here.  We really need a
README or some documentation about this so that we don't have to keep
answering this same question again and again.

First of all, I want to note that the new version of my patch does not use
PG_TRY/PG_CATCH.

But I still want to clarify for myself what is wrong with this construction.
I searched both the hackers mailing list archive and the web using Google,
but failed to find any references except about sharing non-volatile variables
between try and catch blocks.
Can you please point me to the thread where this problem was discussed,
or just explain the source of the problem in a few words?


From my own experience I found out that the PG_TRY/PG_CATCH mechanism
does not provide proper cleanup (unlike C++ exceptions).
If there are opened relations, catalog cache entries, etc., then throwing
an error will not release them.
This causes no problems if the error is handled in PostgresMain, which
aborts the current transaction and releases all resources in any case.
But if I want to ignore this error and continue query execution,
warnings about resource leaks can be reported.

Is this what you mean by the unsafety of the PG_TRY/PG_CATCH construction?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-05-15 Thread Robert Haas
On Wed, May 10, 2017 at 12:11 PM, Konstantin Knizhnik
 wrote:
> Robert, can you please explain why using TRY/CATCH is not safe here:
>>
>> This is definitely not a safe way of using TRY/CATCH.

This has been discussed many, many times on this mailing list before,
and I don't really want to go over it again here.  We really need a
README or some documentation about this so that we don't have to keep
answering this same question again and again.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cached plans and statement generalization

2017-05-12 Thread Bruce Momjian
On Fri, May 12, 2017 at 08:35:26PM +0300, Konstantin Knizhnik wrote:
> 
> 
> On 12.05.2017 18:23, Bruce Momjian wrote:
> >On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote:
> >>Definitely changing session context (search_path, date/time format, ...) may
> >>cause incorrect behavior of cached statements.
> >I wonder if we should clear the cache whenever any SET command is
> >issued.
> 
> Well, with autoprepare cache disabled on each set variable, alter system and
> any slow utility statement
> only one regression test is not passed. And only because of different error
> message context:

Great.  We have to think of how we would keep this reliable when future
changes happen.  Simple is often better because it limits the odds of
breakage.

> >>Actually you may get the same problem with explicitly prepared statements
> >>(certainly, in the last case, you better understand what going on and it is
> >>your choice whether to use or not to use prepared statement).
> >>
> >>The fact of failure of 7 regression tests means that autoprepare can really
> >>change behavior of existed program. This is why my suggestion is  to switch
> >>off this feature by default.
> >I would like to see us target something that can be enabled by default.
> >Even if it only improves performance by 5%, it would be better overall
> >than a feature that improves performance by 90% but is only used by 1%
> >of our users.
> 
> I have to agree with you here.

Good.  I know there are database systems that will try to get the most
performance possible no matter how complex it is for users to configure
or to maintain, but that isn't us.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Cached plans and statement generalization

2017-05-12 Thread Andres Freund


On May 12, 2017 12:50:41 AM PDT, Konstantin Knizhnik 
 wrote:

>Definitely changing session context (search_path, date/time format,
>...) 
>may cause incorrect behavior of cached statements.
>Actually you may get the same problem with explicitly prepared 
>statements (certainly, in the last case, you better understand what 
>going on and it is your choice whether to use or not to use prepared 
>statement).
>
>The fact of failure of 7 regression tests means that autoprepare can 
>really change behavior of existed program. This is why my suggestion is
> 
>to switch off this feature by default.

I can't see us agreeing with this feature unless it actually reliably works, 
even if disabled by default.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [HACKERS] Cached plans and statement generalization

2017-05-12 Thread Konstantin Knizhnik



On 12.05.2017 18:23, Bruce Momjian wrote:

On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote:

Definitely changing session context (search_path, date/time format, ...) may
cause incorrect behavior of cached statements.

I wonder if we should clear the cache whenever any SET command is
issued.


Well, with the autoprepare cache invalidated on each SET of a variable, ALTER SYSTEM,
and any slow utility statement,
only one regression test fails, and only because of a different
error message context:


*** /home/knizhnik/postgresql.master/src/test/regress/expected/date.out	2017-04-11 18:07:56.497461208 +0300
--- /home/knizhnik/postgresql.master/src/test/regress/results/date.out	2017-05-12 20:21:19.767566302 +0300

***
*** 1443,1452 
  --
  SELECT EXTRACT(MICROSEC  FROM DATE 'infinity'); -- ERROR:  timestamp units "microsec" not recognized
  ERROR:  timestamp units "microsec" not recognized
- CONTEXT:  SQL function "date_part" statement 1
  SELECT EXTRACT(UNDEFINED FROM DATE 'infinity'); -- ERROR:  timestamp units "undefined" not supported
  ERROR:  timestamp units "undefined" not supported
- CONTEXT:  SQL function "date_part" statement 1
  -- test constructors
  select make_date(2013, 7, 15);
   make_date
--- 1443,1450 

==




Actually you may get the same problem with explicitly prepared statements
(certainly, in the last case, you better understand what going on and it is
your choice whether to use or not to use prepared statement).

The fact of failure of 7 regression tests means that autoprepare can really
change behavior of existed program. This is why my suggestion is  to switch
off this feature by default.

I would like to see us target something that can be enabled by default.
Even if it only improves performance by 5%, it would be better overall
than a feature that improves performance by 90% but is only used by 1%
of our users.


I have to agree with you here.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-05-12 Thread Bruce Momjian
On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote:
> Definitely changing session context (search_path, date/time format, ...) may
> cause incorrect behavior of cached statements.

I wonder if we should clear the cache whenever any SET command is
issued.
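
For illustration, a minimal sketch of that idea at the point where autoprepare inspects the raw statement might look like this (autoprepare_reset_cache and the function name here are hypothetical helpers, not something taken from the patch):

static void
autoprepare_check_utility(Node *parse_tree)
{
	/* drop all cached plans whenever the session is changed by a SET command */
	if (IsA(parse_tree, VariableSetStmt))
		autoprepare_reset_cache();
}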

> Actually you may get the same problem with explicitly prepared statements
> (certainly, in the last case, you better understand what going on and it is
> your choice whether to use or not to use prepared statement).
> 
> The fact of failure of 7 regression tests means that autoprepare can really
> change behavior of existed program. This is why my suggestion is  to switch
> off this feature by default.

I would like to see us target something that can be enabled by default. 
Even if it only improves performance by 5%, it would be better overall
than a feature that improves performance by 90% but is only used by 1%
of our users.

> But in 99.9% real cases (my estimation plucked out of thin air:) there will
> be no such problems with autoprepare. And it can significantly improve
> performance of OLTP applications
> which are not able to use prepared statements (because of working through
> pgbouncer or any other reasons).

Right, but we can't ship something that is 99.9% accurate when the
inaccuracy is indeterminate.  The bottom line is that Postgres has a
very high bar, and I realize you are just prototyping at this point, but
the final product is going to have to address all the intricacies of the
issue for it to be added.

> Can autoprepare slow down the system?
> Yes, it can. It can happen if application perform larger number of unique
> queries and autoprepare cache size is not limited.
> In this case large (and infinitely growing) number of stored plans can
> consume a lot of memory and, what is even worse, slowdown cache lookup.
> This is why I by default limit number of cached statements
> (autoprepare_limit parameter) by 100.

Yes, good idea.

> I am almost sure that there will be some other issues with autoprepare which
> I have not encountered yet (because I mostly tested it on pgbench and
> Postgres regression tests).
> But I am also sure that benefit of doubling system performance is good
> motivation to continue work in this direction.

Right, you are still testing to see where the edges are.

> My main concern is whether to continue to improve current approach with
> local (per-backend) cache of prepared statements.
> Or create shared cache (as in Oracle). It is much more difficult to
> implement shared cache (the same problem with session context, different
> catalog snapshots, cache invalidation,...)
> but it also provides more opportunities for queries optimization and tuning.

I would continue in the per-backend cache direction at this point
because we don't even have that solved yet.  The global cache is going
to be even more complex.

Ultimately I think we are going to want global and local caches because
the plans of the local cache are much more likely to be accurate.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Cached plans and statement generalization

2017-05-12 Thread Konstantin Knizhnik



On 12.05.2017 03:58, Bruce Momjian wrote:

On Thu, May 11, 2017 at 10:41:45PM +0300, Konstantin Knizhnik wrote:

This is why I have provided second implementation which replace
literals with parameters after raw parsing.  Certainly it is slower
than first approach. But still provide significant advantage in
performance: more than two times at pgbench.  Then I tried to run
regression tests and find several situations where type analysis is
not correctly performed in case of replacing literals with parameters.

So the issue is that per-command output from the parser, SelectStmt,
only has strings for identifiers, e.g. table and column names, so you
can't be sure it is the same as the cached entry you matched.  I suppose
if you cleared the cache every time someone created an object or changed
search_path, it might work.

Definitely, changing the session context (search_path, date/time format, ...)
may cause incorrect behavior of cached statements.
Actually, you may get the same problem with explicitly prepared
statements (certainly, in the latter case, you better understand what is
going on, and it is your choice whether or not to use a prepared
statement).


The fact that 7 regression tests fail means that autoprepare can
really change the behavior of existing programs. This is why my suggestion is
to switch this feature off by default.
But in 99.9% of real cases (my estimation, plucked out of thin air :) there
will be no such problems with autoprepare. And it can significantly
improve the performance of OLTP applications
which are not able to use prepared statements (because they work
through pgbouncer or for other reasons).


Can autoprepare slow down the system?
Yes, it can. It can happen if an application performs a large number of
unique queries and the autoprepare cache size is not limited.
In this case the large (and infinitely growing) number of stored plans can
consume a lot of memory and, what is even worse, slow down cache lookup.
This is why by default I limit the number of cached statements
(the autoprepare_limit parameter) to 100.
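
A minimal sketch of how such a limit might be enforced with simple LRU eviction (illustrative only: the CachedStmt struct and helper names are assumptions, not taken from the patch):

typedef struct CachedStmt
{
	dlist_node	lru_node;			/* position in the LRU list */
	uint32		query_hash;			/* lookup key */
	CachedPlanSource *psrc;			/* the prepared plan */
} CachedStmt;

static dlist_head autoprepare_lru = DLIST_STATIC_INIT(autoprepare_lru);
static int	n_cached_stmts = 0;
static int	autoprepare_limit = 100;	/* a GUC in the real patch */

/* Move an entry to the head of the LRU list on every use */
static void
touch_cached_stmt(CachedStmt *stmt)
{
	dlist_move_head(&autoprepare_lru, &stmt->lru_node);
}

/* Evict least recently used statements until we are back under the limit */
static void
enforce_autoprepare_limit(void)
{
	while (n_cached_stmts > autoprepare_limit && !dlist_is_empty(&autoprepare_lru))
	{
		CachedStmt *victim = dlist_tail_element(CachedStmt, lru_node, &autoprepare_lru);

		dlist_delete(&victim->lru_node);
		DropCachedPlan(victim->psrc);	/* release the cached plan */
		/* the entry would also be removed from the lookup hash table here */
		n_cached_stmts--;
	}
}

/* Register a newly created entry and enforce the limit */
static void
remember_cached_stmt(CachedStmt *stmt)
{
	dlist_push_head(&autoprepare_lru, &stmt->lru_node);
	n_cached_stmts++;
	enforce_autoprepare_limit();
}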


I am almost sure that there will be some other issues with autoprepare
which I have not encountered yet (because I have mostly tested it on pgbench
and the Postgres regression tests).
But I am also sure that the benefit of doubling system performance is good
motivation to continue working in this direction.


My main concern is whether to continue to improve the current approach with
a local (per-backend) cache of prepared statements,
or to create a shared cache (as in Oracle). It is much more difficult to
implement a shared cache (the same problems with session context, different
catalog snapshots, cache invalidation, ...),
but it also provides more opportunities for query optimization and tuning.






--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Bruce Momjian
On Thu, May 11, 2017 at 10:41:45PM +0300, Konstantin Knizhnik wrote:
> This is why I have provided second implementation which replace
> literals with parameters after raw parsing.  Certainly it is slower
> than first approach. But still provide significant advantage in
> performance: more than two times at pgbench.  Then I tried to run
> regression tests and find several situations where type analysis is
> not correctly performed in case of replacing literals with parameters.

So the issue is that per-command output from the parser, SelectStmt,
only has strings for identifiers, e.g. table and column names, so you
can't be sure it is the same as the cached entry you matched.  I suppose
if you cleared the cache every time someone created an object or changed
search_path, it might work.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Konstantin Knizhnik

On 05/11/2017 10:52 PM, Andres Freund wrote:

On 2017-05-11 22:48:26 +0300, Konstantin Knizhnik wrote:

On 05/11/2017 09:31 PM, Tom Lane wrote:

Bruce Momjian  writes:

Good point.  I think we need to do some measurements to see if the
parser-only stage is actually significant.  I have a hunch that
commercial databases have much heavier parsers than we do.

FWIW, gram.y does show up as significant in many of the profiles I take.
I speculate that this is not so much that it eats many CPU cycles, as that
the constant tables are so large as to incur lots of cache misses.  scan.l
is not quite as big a deal for some reason, even though it's also large.

regards, tom lane

Yes, my results shows that pg_parse_query adds not so much overhead:
206k TPS for my first variant with string literal substitution and modified 
query text used as hash key vs.
181k. TPS for version with patching raw parse tree constructed by 
pg_parse_query.

Those numbers and your statement seem to contradict each other?


Oops, my parse error :( I incorrectly read Tom's statement.
Actually, I also was afraid that the cost of parsing would be large, and this is
why my first attempt was to avoid parsing.
But then I found out that most of the time is spent in analysis and planning (see
the attached profile):

pg_parse_query: 4.23%
pg_analyze_and_rewrite: 8.45%
pg_plan_queries: 15.49%



- Andres



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Andres Freund
On 2017-05-11 22:48:26 +0300, Konstantin Knizhnik wrote:
> On 05/11/2017 09:31 PM, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > Good point.  I think we need to do some measurements to see if the
> > > parser-only stage is actually significant.  I have a hunch that
> > > commercial databases have much heavier parsers than we do.
> > FWIW, gram.y does show up as significant in many of the profiles I take.
> > I speculate that this is not so much that it eats many CPU cycles, as that
> > the constant tables are so large as to incur lots of cache misses.  scan.l
> > is not quite as big a deal for some reason, even though it's also large.
> > 
> > regards, tom lane
> Yes, my results shows that pg_parse_query adds not so much overhead:
> 206k TPS for my first variant with string literal substitution and modified 
> query text used as hash key vs.
> 181k. TPS for version with patching raw parse tree constructed by 
> pg_parse_query.

Those numbers and your statement seem to contradict each other?

- Andres




Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Konstantin Knizhnik

On 05/11/2017 09:31 PM, Tom Lane wrote:

Bruce Momjian  writes:

Good point.  I think we need to do some measurements to see if the
parser-only stage is actually significant.  I have a hunch that
commercial databases have much heavier parsers than we do.

FWIW, gram.y does show up as significant in many of the profiles I take.
I speculate that this is not so much that it eats many CPU cycles, as that
the constant tables are so large as to incur lots of cache misses.  scan.l
is not quite as big a deal for some reason, even though it's also large.

regards, tom lane

Yes, my results show that pg_parse_query does not add much overhead:
206k TPS for my first variant with string literal substitution and the modified
query text used as a hash key, vs.
181k TPS for the version that patches the raw parse tree constructed by
pg_parse_query.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Konstantin Knizhnik

On 05/11/2017 06:12 PM, Bruce Momjian wrote:

On Wed, May 10, 2017 at 07:11:07PM +0300, Konstantin Knizhnik wrote:

I am going to continue work on this patch I will be glad to receive any
feedback and suggestions for its improvement.
In most cases, applications are not accessing Postgres directly, but using
some connection pooling layer and so them are not able to use prepared
statements.
But at simple OLTP Postgres spent more time on building query plan than on
execution itself. And it is possible to speedup Postgres about two times at
such workload!
Another alternative is true shared plan cache.  May be it is even more
perspective approach, but definitely much more invasive and harder to
implement.

Can we back up and get an overview of what you are doing and how you are
doing it?  Our TODO list suggests this order for successful patches:

Desirability -> Design -> Implement -> Test -> Review -> Commit

You kind of started at the Implementation/patch level, which makes it
hard to evaluate.

I think everyone agrees on the Desirability of the feature, but the
Design is the tricky part.  I think the design questions are:

*  What information is stored about cached plans?
*  How are the cached plans invalidated?
*  How is a query matched against a cached plan?

Looking at the options, ideally the plan would be cached at the same
query stage as the stage where the incoming query is checked against the
cache.  However, caching and checking at the same level offers no
benefit, so they are going to be different.  For example, caching a
parse tree at the time it is created, then checking at the same point if
the incoming query is the same doesn't help you because you already had
to create the parse tree to get to that point.

A more concrete example is prepared statements.  They are stored at the
end of planning and matched in the parser.  However, you can easily do
that since the incoming query specifies the name of the prepared query,
so there is no trick to matching.

The desire is to cache as late as possible so you cache more work and
you have more detail about the referenced objects, which helps with
cache invalidation.  However, you also want to do cache matching as
early as possible to improve performance.

So, let's look at some options.  One interesting idea from Doug Doole
was to do it between the tokenizer and parser.  I think they are glued
together so you would need a way to run the tokenizer separately and
compare that to the tokens you stored for the cached plan.  The larger
issue is that prepared plans already are checked after parsing, and we
know they are a win, so matching any earlier than that just seems like
overkill and likely to lead to lots of problems.

So, you could do it after parsing but before parse-analysis, which is
kind of what prepared queries do.  One tricky problem is that we don't
bind the query string tokens to database objects until after parse
analysis.

Doing matching before parse-analysis is going to be tricky, which is why
there are so many comments about the approach.  Changing search_path can
certainly affect it, but creating objects in earlier-mentioned schemas
can also change how an object reference in a query is resolved.  Even
obscure things like the creation of a new operator that has higher
precedence in the query could change the plan, though am not sure if
our prepared query system even handles that properly.

Anyway, that is my feedback.  I would like to get an overview of what
you are trying to do and the costs/benefits of each option so we can
best guide you.


Sorry for the lack of an overview.
I started with a small prototype just to investigate whether such an optimization
makes sense or not.
When I got more than a two times advantage in performance on standard pgbench, I
came to the conclusion that this
optimization can be really very useful, and now I am trying to find the best way to
implement it.

I started with the simplest approach, where string literals are replaced with
parameters. It is done before parsing
and can be done very fast - you just need to locate the data in quotes.
But this approach is neither safe nor universal: you will not be able to speed up
most existing queries without rewriting them.

This is why I have provided a second implementation, which replaces literals with
parameters after raw parsing.
Certainly it is slower than the first approach, but it still provides a significant
advantage in performance: more than two times at pgbench.
Then I tried to run the regression tests and found several situations where type
analysis is not correctly performed when literals are replaced with
parameters.

So my third attempt is to replace constant nodes with parameters in the already
analyzed tree.

Now answering your questions:

*  What information is stored about cached plans?

The key used to locate a cached plan is the raw parse tree; the value is the saved CachedPlanSource (a sketch of such a lookup appears after this list).

*  How are the cached plans invalidated?

In the same way as plans for explicitly prepared statements.

*  How is a 
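
A minimal sketch of what such a lookup might look like (the names are hypothetical and the key handling is simplified - hash collisions are ignored here, whereas the real patch can compare the full trees):

typedef struct AutoprepareEntry
{
	uint32		query_hash;			/* hash of the generalized raw parse tree (key) */
	CachedPlanSource *psrc;			/* saved plan source, NULL until built */
} AutoprepareEntry;

static HTAB *autoprepare_hash = NULL;

static AutoprepareEntry *
autoprepare_lookup(Node *generalized_parse_tree, bool *found)
{
	char	   *tree_str;
	uint32		hash;

	if (autoprepare_hash == NULL)
	{
		HASHCTL		ctl;

		MemSet(&ctl, 0, sizeof(ctl));
		ctl.keysize = sizeof(uint32);
		ctl.entrysize = sizeof(AutoprepareEntry);
		autoprepare_hash = hash_create("autoprepare cache", 128, &ctl,
									   HASH_ELEM | HASH_BLOBS);
	}

	/* serialize the generalized tree (literals already replaced) and hash it */
	tree_str = nodeToString(generalized_parse_tree);
	hash = DatumGetUInt32(hash_any((const unsigned char *) tree_str,
								   strlen(tree_str)));
	pfree(tree_str);

	return (AutoprepareEntry *) hash_search(autoprepare_hash, &hash,
											HASH_ENTER, found);
}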

Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Andres Freund


On May 11, 2017 11:31:02 AM PDT, Tom Lane  wrote:
>Bruce Momjian  writes:
>> Good point.  I think we need to do some measurements to see if the
>> parser-only stage is actually significant.  I have a hunch that
>> commercial databases have much heavier parsers than we do.
>
>FWIW, gram.y does show up as significant in many of the profiles I
>take.
>I speculate that this is not so much that it eats many CPU cycles, as
>that
>the constant tables are so large as to incur lots of cache misses. 
>scan.l
>is not quite as big a deal for some reason, even though it's also
>large.

And that there are a lot of unpredictable branches, leading to a lot of pipeline
stalls.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Tom Lane
Bruce Momjian  writes:
> Good point.  I think we need to do some measurements to see if the
> parser-only stage is actually significant.  I have a hunch that
> commercial databases have much heavier parsers than we do.

FWIW, gram.y does show up as significant in many of the profiles I take.
I speculate that this is not so much that it eats many CPU cycles, as that
the constant tables are so large as to incur lots of cache misses.  scan.l
is not quite as big a deal for some reason, even though it's also large.

regards, tom lane




Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Bruce Momjian
On Thu, May 11, 2017 at 05:39:58PM +, Douglas Doole wrote:
> One interesting idea from Doug Doole was to do it between the tokenizer 
> and
> parser.  I think they are glued together so you would need a way to run 
> the
> tokenizer separately and compare that to the tokens you stored for the
> cached plan.
> 
> 
> When I did this, we had the same problem that the tokenizer and parser were
> tightly coupled. Fortunately, I was able to do as you suggest and run the
> tokenizer separately to do my analysis. 
> 
> So my model was to do statement generalization before entering the compiler at
> all. I would tokenize the statement to find the literals and generate a new
> statement string with placeholders. The new string would the be passed to the
> compiler which would then tokenize and parse the reworked statement.
> 
> This means we incurred the cost of tokenizing twice, but the tokenizer was
> lightweight enough that it wasn't a problem. In exchange I was able to do
> statement generalization without touching the compiler - the compiler saw the
> generalized statement text as any other statement and handled it in the exact
> same way. (There was just a bit of new code around variable binding.)

Good point.  I think we need to do some measurements to see if the
parser-only stage is actually significant.  I have a hunch that
commercial databases have much heavier parsers than we do.

This split would also not work if the scanner feeds changes back into
the parser.  I know C does that for typedefs but I don't think we do.

Ideally I would like to see percentage-of-execution numbers for typical
queries for scan, parse, parse-analysis, plan, and execute to see where
the wins are.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Douglas Doole
>
> One interesting idea from Doug Doole was to do it between the tokenizer
> and parser.  I think they are glued together so you would need a way to run
> the tokenizer separately and compare that to the tokens you stored for the
> cached plan.
>

When I did this, we had the same problem that the tokenizer and parser were
tightly coupled. Fortunately, I was able to do as you suggest and run the
tokenizer separately to do my analysis.

So my model was to do statement generalization before entering the compiler
at all. I would tokenize the statement to find the literals and generate a
new statement string with placeholders. The new string would then be passed
to the compiler which would then tokenize and parse the reworked statement.

This means we incurred the cost of tokenizing twice, but the tokenizer was
lightweight enough that it wasn't a problem. In exchange I was able to do
statement generalization without touching the compiler - the compiler saw
the generalized statement text as any other statement and handled it in the
exact same way. (There was just a bit of new code around variable binding.)
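
As a standalone illustration of this kind of pre-parser generalization (a sketch under the stated assumptions, not the implementation Doug describes), the following C function replaces quoted string and numeric literals with $n placeholders:

#include <ctype.h>
#include <stdio.h>

/* Replace quoted string and numeric literals with $1, $2, ... placeholders. */
static void
generalize_statement(const char *src, char *dst, size_t dstlen)
{
	int		paramno = 0;
	size_t	o = 0;

	for (size_t i = 0; src[i] != '\0' && o + 16 < dstlen; )
	{
		if (src[i] == '\'')
		{
			/* skip the string literal, honouring '' escapes */
			i++;
			while (src[i] != '\0' && !(src[i] == '\'' && src[i + 1] != '\''))
			{
				if (src[i] == '\'' && src[i + 1] == '\'')
					i++;
				i++;
			}
			if (src[i] == '\'')
				i++;
			o += snprintf(dst + o, dstlen - o, "$%d", ++paramno);
		}
		else if (isdigit((unsigned char) src[i]) &&
				 (i == 0 || !isalnum((unsigned char) src[i - 1])))
		{
			/* skip the numeric literal */
			while (isdigit((unsigned char) src[i]) || src[i] == '.')
				i++;
			o += snprintf(dst + o, dstlen - o, "$%d", ++paramno);
		}
		else
			dst[o++] = src[i++];
	}
	dst[o] = '\0';
}

int
main(void)
{
	char	buf[256];

	generalize_statement("SELECT abalance FROM pgbench_accounts WHERE aid = 42",
						 buf, sizeof(buf));
	printf("%s\n", buf);	/* SELECT abalance FROM pgbench_accounts WHERE aid = $1 */
	return 0;
}

The generalized text can then be handed to the regular compiler path and also used as the cache key, which is the essence of the approach described above.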


Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Bruce Momjian
On Wed, May 10, 2017 at 07:11:07PM +0300, Konstantin Knizhnik wrote:
> I am going to continue work on this patch I will be glad to receive any
> feedback and suggestions for its improvement.
> In most cases, applications are not accessing Postgres directly, but using
> some connection pooling layer and so them are not able to use prepared
> statements.
> But at simple OLTP Postgres spent more time on building query plan than on
> execution itself. And it is possible to speedup Postgres about two times at
> such workload!
> Another alternative is true shared plan cache.  May be it is even more
> perspective approach, but definitely much more invasive and harder to
> implement.

Can we back up and get an overview of what you are doing and how you are
doing it?  Our TODO list suggests this order for successful patches:

Desirability -> Design -> Implement -> Test -> Review -> Commit

You kind of started at the Implementation/patch level, which makes it
hard to evaluate.

I think everyone agrees on the Desirability of the feature, but the
Design is the tricky part.  I think the design questions are:

*  What information is stored about cached plans?
*  How are the cached plans invalidated?
*  How is a query matched against a cached plan?

Looking at the options, ideally the plan would be cached at the same
query stage as the stage where the incoming query is checked against the
cache.  However, caching and checking at the same level offers no
benefit, so they are going to be different.  For example, caching a
parse tree at the time it is created, then checking at the same point if
the incoming query is the same doesn't help you because you already had
to create the parse tree to get to that point.

A more concrete example is prepared statements.  They are stored at the
end of planning and matched in the parser.  However, you can easily do
that since the incoming query specifies the name of the prepared query,
so there is no trick to matching.

The desire is to cache as late as possible so you cache more work and
you have more detail about the referenced objects, which helps with
cache invalidation.  However, you also want to do cache matching as
early as possible to improve performance.

So, let's look at some options.  One interesting idea from Doug Doole
was to do it between the tokenizer and parser.  I think they are glued
together so you would need a way to run the tokenizer separately and
compare that to the tokens you stored for the cached plan.  The larger
issue is that prepared plans already are checked after parsing, and we
know they are a win, so matching any earlier than that just seems like
overkill and likely to lead to lots of problems.

So, you could do it after parsing but before parse-analysis, which is
kind of what prepared queries do.  One tricky problem is that we don't
bind the query string tokens to database objects until after parse
analysis.

Doing matching before parse-analysis is going to be tricky, which is why
there are so many comments about the approach.  Changing search_path can
certainly affect it, but creating objects in earlier-mentioned schemas
can also change how an object reference in a query is resolved.  Even
obscure things like the creation of a new operator that has higher
precedence in the query could change the plan, though I am not sure if
our prepared query system even handles that properly.

Anyway, that is my feedback.  I would like to get an overview of what
you are trying to do and the costs/benefits of each option so we can
best guide you.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Cached plans and statement generalization

2017-05-10 Thread Konstantin Knizhnik

On 02.05.2017 21:26, Robert Haas wrote:


I am sympathetic to the fact that this is a hard problem to solve.
I'm just telling you that the way you've got it is not acceptable and
nobody's going to commit it like that (or if they do, they will end up
having to revert it).  If you want to have a technical discussion
about what might be a way to change the patch to be more acceptable,
cool, but I don't want to get into a long debate about whether what
you have is acceptable or not; I've already said what I think about
that and I believe that opinion will be widely shared.  I am not
trying to beat you up here, just trying to be clear.




Based on Robert's feedback and Tom's proposal, I have implemented two
new versions of the autoprepare patch.


The first version is just a refactoring of my original implementation: I have
extracted common code into the prepare_cached_plan and exec_prepared_plan
functions to avoid code duplication. I also rewrote the assignment of values
to parameters. Types of parameters are now inferred from the types of
literals, so there may be several
prepared plans which differ only in the types of their parameters. Due to
the problem with type coercion for parameters, I have to catch errors in
parse_analyze_varparams.


Robert, can you please explain why using TRY/CATCH is not safe here:

This is definitely not a safe way of using TRY/CATCH.


Second version is based on Tom's suggestion:

Personally I'd think about
replacing the entire literal-with-cast construct with a Param having
already-known type.
So here I first patch the raw parse tree, replacing A_Const with ParamRef;
this generalized tree is needed to perform the cache lookup.
Then I restore the original raw parse tree and call pg_analyze_and_rewrite.
Then I mutate the analyzed tree, replacing Const with Param nodes.
At this point type coercion has already been performed, and I know the precise
types which should be used for the parameters.
It seems to be a more sophisticated approach, and I cannot extract common
code into prepare_cached_plan,
because preparing a plan is currently a mix of the steps done in
exec_simple_query and exec_parse_message.

But there is no need to catch analyze errors.

Finally, the performance of both approaches is the same: at pgbench it is
180k TPS on read-only queries, compared with 80k TPS for non-prepared
queries.
In both cases 7 out of 177 regression tests fail (mostly
because the session environment is changed between subsequent query executions).


I am going to continue working on this patch, and I will be glad to receive any
feedback and suggestions for its improvement.
In most cases, applications are not accessing Postgres directly, but
through some connection pooling layer, and so they are not able to use
prepared statements.
But at simple OLTP, Postgres spends more time on building the query plan than
on the execution itself. And it is possible to speed up Postgres about two
times on such a workload!
Another alternative is a true shared plan cache. Maybe it is an even more
promising approach, but definitely much more invasive and harder to
implement.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6e52eb7..f2eb0f5 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3696,6 +3696,454 @@ raw_expression_tree_walker(Node *node,
 }
 
 /*
+ * raw_expression_tree_mutator --- transform a raw parse tree.
+ *
+ * This function implements a slightly different approach to tree updates than expression_tree_mutator().
+ * The callback is given a pointer to the pointer to the current node and can update that field instead of returning a reference to a new node.
+ * This makes it possible to remember changes and easily revert them without an extra traversal of the tree.
+ *
+ * This function does not need the QTW_DONT_COPY_QUERY flag: it never implicitly copies tree nodes, updating them in place instead.
+ *
+ * Like raw_expression_tree_walker, there is no special rule about query
+ * boundaries: we descend to everything that's possibly interesting.
+ *
+ * Currently, the node type coverage here extends only to DML statements
+ * (SELECT/INSERT/UPDATE/DELETE) and nodes that can appear in them, because
+ * this is used mainly during analysis of CTEs, and only DML statements can
+ * appear in CTEs. If some other node is visited, iteration stops immediately and true is returned.
+ */
+bool
+raw_expression_tree_mutator(Node *node,
+			bool (*mutator) (),
+			void *context)
+{
+	ListCell   *temp;
+
+	/*
+	 * The walker has already visited the current node, and so we need only
+	 * recurse into any sub-nodes it has.
+	 */
+	if (node == NULL)
+		return false;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	switch (nodeTag(node))
+	{
+		case T_SetToDefault:
+		case T_CurrentOfExpr:
+		case T_Integer:
+		case T_Float:
+		case T_String:
+		case T_BitString:
+		case T_Null:
+		case T_ParamR

Re: [HACKERS] Cached plans and statement generalization

2017-05-02 Thread Robert Haas
On Tue, May 2, 2017 at 5:50 AM, Konstantin Knizhnik
 wrote:
>> I don't see something with a bunch of hard-coded rules for particular type
>> OIDs having any chance of being acceptable.
>
> Well, what I need is ...

Regarding this...

> Definitely copying of code is bad flaw. It will be much better and easier
> just to call three original functions instead of mixing gathering their code
> into the new function.
> But I failed to do it because ...

...and also this:

I am sympathetic to the fact that this is a hard problem to solve.
I'm just telling you that the way you've got it is not acceptable and
nobody's going to commit it like that (or if they do, they will end up
having to revert it).  If you want to have a technical discussion
about what might be a way to change the patch to be more acceptable,
cool, but I don't want to get into a long debate about whether what
you have is acceptable or not; I've already said what I think about
that and I believe that opinion will be widely shared.  I am not
trying to beat you up here, just trying to be clear.

> Thanks for your feedback.

Sure thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cached plans and statement generalization

2017-05-02 Thread Konstantin Knizhnik



On 01.05.2017 18:52, Robert Haas wrote:
On Fri, Apr 28, 2017 at 6:01 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:



Any comments and suggestions for future improvement of this patch
are welcome.


+PG_TRY();
+{
+query = parse_analyze_varparams(parse_tree,
+query_string,
+ &param_types,
+ &num_params);
+}
+PG_CATCH();
+{
+/*
+ * In case of analyze errors revert back to original 
query processing
+ * and disable autoprepare for this query to avoid such 
problems in future.

+ */
+FlushErrorState();
+if (snapshot_set) {
+PopActiveSnapshot();
+}
+entry->disable_autoprepare = true;
+undo_query_plan_changes(parse_tree, const_param_list);
+MemoryContextSwitchTo(old_context);
+return false;
+}
+PG_END_TRY();

This is definitely not a safe way of using TRY/CATCH.

+
+/* Convert literal value to parameter value */
+switch (const_param->literal->val.type)
+{
+  /*
+   * Convert from integer literal
+   */
+  case T_Integer:
+switch (ptype) {
+  case INT8OID:
+params->params[paramno].value = 
Int64GetDatum((int64)const_param->literal->val.val.ival);

+break;
+  case INT4OID:
+params->params[paramno].value = 
Int32GetDatum((int32)const_param->literal->val.val.ival);

+break;
+  case INT2OID:
+if (const_param->literal->val.val.ival < SHRT_MIN
+|| const_param->literal->val.val.ival > SHRT_MAX)
+{
+ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("smallint out of range")));
+}
+params->params[paramno].value = 
Int16GetDatum((int16)const_param->literal->val.val.ival);

+break;
+  case FLOAT4OID:
+params->params[paramno].value = 
Float4GetDatum((float)const_param->literal->val.val.ival);

+break;
+  case FLOAT8OID:
+params->params[paramno].value = 
Float8GetDatum((double)const_param->literal->val.val.ival);

+break;
+  case INT4RANGEOID:
+sprintf(buf, "[%ld,%ld]", 
const_param->literal->val.val.ival, const_param->literal->val.val.ival);

+getTypeInputInfo(ptype, &typinput, &typioparam);
+params->params[paramno].value = 
OidInputFunctionCall(typinput, buf, typioparam, -1);

+break;
+  default:
+ pg_lltoa(const_param->literal->val.val.ival, buf);
+getTypeInputInfo(ptype, &typinput, &typioparam);
+params->params[paramno].value = 
OidInputFunctionCall(typinput, buf, typioparam, -1);

+}
+break;
+  case T_Null:
+params->params[paramno].isnull = true;
+break;
+  default:
+/*
+ * Convert from string literal
+ */
+getTypeInputInfo(ptype, &typinput, &typioparam);
+params->params[paramno].value = 
OidInputFunctionCall(typinput, const_param->literal->val.val.str, 
typioparam, -1);

+}

I don't see something with a bunch of hard-coded rules for particular 
type OIDs having any chance of being acceptable.




Well, what I need is to convert a literal value represented in a Value
struct to a parameter datum value.

The Value struct contains a union with an integer literal and text.
So this piece of code just provides efficient handling of the most common
cases (integer parameters) and falls back to the type's input function in other cases.



This patch seems to duplicate a large amount of existing code.  That 
would be a good thing to avoid.


Yes, I have to copy a lot of code from the exec_parse_message,
exec_bind_message and exec_execute_message functions.
Copying code is definitely a bad flaw. It would be much better and
easier to just call the three original functions instead of gathering
their code into a new function.

But I failed to do it because:
1. Autoprepare has to be integrated into exec_simple_query. Before
executing the query in the normal way, I need to perform a cache lookup for
a previously prepared plan for this generalized query.
And generalization of the query requires building the query tree (query
parsing). In other words, parsing has to be done before I can call
exec_parse_message.
2. exec_bind_message works with parameters passed by the client through
the libpq protocol, while autoprepare deals with values of 

Re: [HACKERS] Cached plans and statement generalization

2017-05-01 Thread Robert Haas
On Fri, Apr 28, 2017 at 6:01 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

>
> Any comments and suggestions for future improvement of this patch are
> welcome.
>

+PG_TRY();
+{
+query = parse_analyze_varparams(parse_tree,
+query_string,
+&param_types,
+&num_params);
+}
+PG_CATCH();
+{
+/*
+ * In case of analyze errors revert back to original query
processing
+ * and disable autoprepare for this query to avoid such
problems in future.
+ */
+FlushErrorState();
+if (snapshot_set) {
+PopActiveSnapshot();
+}
+entry->disable_autoprepare = true;
+undo_query_plan_changes(parse_tree, const_param_list);
+MemoryContextSwitchTo(old_context);
+return false;
+}
+PG_END_TRY();

This is definitely not a safe way of using TRY/CATCH.

+
+/* Convert literal value to parameter value */
+switch (const_param->literal->val.type)
+{
+  /*
+   * Convert from integer literal
+   */
+  case T_Integer:
+switch (ptype) {
+  case INT8OID:
+params->params[paramno].value =
Int64GetDatum((int64)const_param->literal->val.val.ival);
+break;
+  case INT4OID:
+params->params[paramno].value =
Int32GetDatum((int32)const_param->literal->val.val.ival);
+break;
+  case INT2OID:
+if (const_param->literal->val.val.ival < SHRT_MIN
+|| const_param->literal->val.val.ival > SHRT_MAX)
+{
+ereport(ERROR,
+
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("smallint out of range")));
+}
+params->params[paramno].value =
Int16GetDatum((int16)const_param->literal->val.val.ival);
+break;
+  case FLOAT4OID:
+params->params[paramno].value =
Float4GetDatum((float)const_param->literal->val.val.ival);
+break;
+  case FLOAT8OID:
+params->params[paramno].value =
Float8GetDatum((double)const_param->literal->val.val.ival);
+break;
+  case INT4RANGEOID:
+sprintf(buf, "[%ld,%ld]",
const_param->literal->val.val.ival, const_param->literal->val.val.ival);
+getTypeInputInfo(ptype, &typinput, &typioparam);
+params->params[paramno].value =
OidInputFunctionCall(typinput, buf, typioparam, -1);
+break;
+  default:
+pg_lltoa(const_param->literal->val.val.ival, buf);
+getTypeInputInfo(ptype, &typinput, &typioparam);
+params->params[paramno].value =
OidInputFunctionCall(typinput, buf, typioparam, -1);
+}
+break;
+  case T_Null:
+params->params[paramno].isnull = true;
+break;
+  default:
+/*
+ * Convert from string literal
+ */
+getTypeInputInfo(ptype, &typinput, &typioparam);
+params->params[paramno].value =
OidInputFunctionCall(typinput, const_param->literal->val.val.str,
typioparam, -1);
+}

I don't see something with a bunch of hard-coded rules for particular type
OIDs having any chance of being acceptable.

This patch seems to duplicate a large amount of existing code.  That would
be a good thing to avoid.

It could use a visit from the style police and a spell-checker, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Cached plans and statement generalization

2017-04-28 Thread Konstantin Knizhnik



On 26.04.2017 13:46, Pavel Stehule wrote:


I attach new patch which allows to limit the number of
autoprepared statements (autoprepare_limit GUC variable).
Also I did more measurements, now with several concurrent
connections and read-only statements.
Results of pgbench with 10 connections, scale 10 and read-only
statements are below:

Protocol               TPS
extended               87k
prepared               209k
simple+autoprepare     206k


As you can see, autoprepare provides more than 2 times speed
improvement.

Also I tried to measure overhead of parsing (to be able to
substitute all literals, not only string literals).
I just added extra call of pg_parse_query. Speed is reduced to 181k.
So overhead is noticeable, but still making such optimization useful.
This is why I want to ask question:  is it better to implement
slower but safer and more universal solution?


Unsafe solution has not any sense, and it is dangerous (80% of 
database users has not necessary knowledge). If somebody needs the max 
possible performance, then he use explicit prepared statements.




I attached a new patch to this mail. I have completely reimplemented my
original approach and now use parse tree transformation.

New pgbench (-S -c 10) results are the following:

Protocol               TPS
extended               87k
prepared               209k
simple+autoprepare     185k


So there is some slowdown compared with my original implementation and with
explicitly prepared statements, but it still provides a more than two times
speed-up compared with unprepared queries. And it doesn't require changing
existing applications.
Since most real production applications work with the DBMS
through some connection pool (pgbouncer, ...), I think that such an
optimization will be useful.
Isn't it interesting that we can increase system throughput almost two
times just by setting one parameter in the configuration file?


I also tried to enable autoprepare by default and ran the regression tests.
7 tests fail for the following reasons:
1. Slightly different error reporting (for example, the error location is not
always specified identically).
2. Differences in query behavior caused by changed local settings
(Andres gave an example with search_path, and the date test fails
because of a changed datestyle).
3. Problems with indirect dependencies (when a table is altered, only
cached plans directly depending on this relation are invalidated, but
not plans with indirect dependencies).
4. Domain checks are not performed for null values.

I do not think that these issues can cause problems for real applications.

It is also possible to limit the number of autoprepared statements using the
autoprepare_limit parameter, avoiding possible backend memory overflow when
a large number of unique queries is sent by the application. An LRU
discipline is used to drop the least recently used plans.


Any comments and suggestions for future improvement of this patch are 
welcome.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index cd39167..4fbc8b7 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3610,6 +3610,454 @@ raw_expression_tree_walker(Node *node,
 }
 
 /*
+ * raw_expression_tree_mutator --- transform a raw parse tree.
+ *
+ * This function implements a slightly different approach to tree updates than expression_tree_mutator().
+ * The callback is given a pointer to the pointer to the current node and can update this field instead of returning a reference to a new node.
+ * This makes it possible to remember changes and easily revert them without an extra traversal of the tree.
+ *
+ * This function does not need the QTW_DONT_COPY_QUERY flag: it never implicitly copies tree nodes, doing in-place updates instead.
+ *
+ * Like raw_expression_tree_walker, there is no special rule about query
+ * boundaries: we descend to everything that's possibly interesting.
+ *
+ * Currently, the node type coverage here extends only to DML statements
+ * (SELECT/INSERT/UPDATE/DELETE) and nodes that can appear in them, because
+ * this is used mainly during analysis of CTEs, and only DML statements can
+ * appear in CTEs. If some other node is visited, iteration stops immediately and true is returned.
+ */
+bool
+raw_expression_tree_mutator(Node *node,
+			bool (*mutator) (),
+			void *context)
+{
+	ListCell   *temp;
+
+	/*
+	 * The walker has already visited the current node, and so we need only
+	 * recurse into any sub-nodes it has.
+	 */
+	if (node == NULL)
+		return false;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	switch (nodeTag(node))
+	{
+		case T_SetToDefault:
+		case T_CurrentOfExpr:
+		case T_Integer:
+		case T_Float:
+		case T_String:
+		case T_BitString:
+		case T_Null:
+		case T_ParamRef:
+		case T_A_Const:
+		case T_A

Re: [HACKERS] Cached plans and statement generalization

2017-04-26 Thread Konstantin Knizhnik

On 04/26/2017 08:08 PM, Doug Doole wrote:


A naive option would be to invalidate anything that depends on table or 
view *.FOOBAR. You could probably make it a bit smarter by also requiring that 
schema A appear in the path.


This has been rumbling around in my head. I wonder if you could solve this 
problem by registering dependencies on objects which don't yet exist. Consider:

CREATE TABLE C.T1(...);
CREATE TABLE C.T2(...);
SET search_path='A,B,C,D';
SELECT * FROM C.T1, T2;

For T1, you'd register a hard dependency on C.T1 and no virtual dependencies 
since the table is explicitly qualified.

For T2, you'd register a hard dependency on C.T2 since that is the table that was selected for the query. You'd also register virtual dependencies on A.T2 and B.T2 since if either of those tables (or views) are created you need to recompile the 
statement. (Note that no virtual dependency is created on D.T2() since that table would never be selected by the compiler.)


The catch is that virtual dependencies would have to be recorded and searched 
as strings, not OIDs since the objects don't exist. Virtual dependencies only 
have to be checked during CREATE processing though, so that might not be too 
bad.

But this is getting off topic - I just wanted to capture the idea while it was 
rumbling around.


I think that it will be enough to handle modification of the search path and
invalidate the prepared statement cache in this case.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Cached plans and statement generalization

2017-04-26 Thread Doug Doole
>
> A naive option would be to invalidate anything that depends on table or
> view *.FOOBAR. You could probably make it a bit smarter by also requiring
> that schema A appear in the path.
>

This has been rumbling around in my head. I wonder if you could solve this
problem by registering dependencies on objects which don't yet exist.
Consider:

CREATE TABLE C.T1(...);
CREATE TABLE C.T2(...);
SET search_path='A,B,C,D';
SELECT * FROM C.T1, T2;

For T1, you'd register a hard dependency on C.T1 and no virtual
dependencies since the table is explicitly qualified.

For T2, you'd register a hard dependency on C.T2 since that is the table
that was selected for the query. You'd also register virtual dependencies
on A.T2 and B.T2 since if either of those tables (or views) are created you
need to recompile the statement. (Note that no virtual dependency is
created on D.T2() since that table would never be selected by the compiler.)

The catch is that virtual dependencies would have to be recorded and
searched as strings, not OIDs since the objects don't exist. Virtual
dependencies only have to be checked during CREATE processing though, so
that might not be too bad.
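To make the trigger point concrete, continuing the example above, the moment a
virtual dependency would fire could be sketched like this (purely illustrative;
no such mechanism exists in PostgreSQL today):

CREATE TABLE B.T2(anothercol int);   -- matches the virtual dependency recorded on B.T2
SELECT * FROM C.T1, T2;              -- must be recompiled: T2 now resolves to B.T2, not C.T2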

But this is getting off topic - I just wanted to capture the idea while it
was rumbling around.


Re: [HACKERS] Cached plans and statement generalization

2017-04-26 Thread Pavel Stehule
2017-04-26 12:30 GMT+02:00 Konstantin Knizhnik :

>
>
> On 26.04.2017 10:49, Konstantin Knizhnik wrote:
>
>
>
> On 26.04.2017 04:00, Tsunakawa, Takayuki wrote:   Are you considering some
> upper limit on the number of prepared statements?
> In this case we need some kind of LRU for maintaining cache of
> autoprepared statements.
> I think that it is good idea to have such limited cached - it can avoid
> memory overflow problem.
> I will try to implement it.
>
>
> I attach new patch which allows to limit the number of autoprepared
> statements (autoprepare_limit GUC variable).
> Also I did more measurements, now with several concurrent connections and
> read-only statements.
> Results of pgbench with 10 connections, scale 10 and read-only statements
> are below:
>
> Protocol               TPS
> extended               87k
> prepared               209k
> simple+autoprepare     206k
>
> As you can see, autoprepare provides more than 2 times speed improvement.
>
> Also I tried to measure overhead of parsing (to be able to substitute all
> literals, not only string literals).
> I just added extra call of pg_parse_query. Speed is reduced to 181k.
> So overhead is noticeable, but still making such optimization useful.
> This is why I want to ask question:  is it better to implement slower but
> safer and more universal solution?
>

An unsafe solution does not make any sense, and it is dangerous (80% of database
users do not have the necessary knowledge). If somebody needs the maximum
possible performance, then they can use explicit prepared statements.

Regards

Pavel


>
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>
>
>


Re: [HACKERS] Cached plans and statement generalization

2017-04-26 Thread Konstantin Knizhnik



On 26.04.2017 10:49, Konstantin Knizhnik wrote:



On 26.04.2017 04:00, Tsunakawa, Takayuki wrote:   Are you considering 
some upper limit on the number of prepared statements?
In this case we need some kind of LRU for maintaining cache of 
autoprepared statements.
I think that it is good idea to have such limited cached - it can 
avoid memory overflow problem.

I will try to implement it.


I attach a new patch which allows limiting the number of autoprepared
statements (the autoprepare_limit GUC variable).
I also did more measurements, now with several concurrent connections
and read-only statements.
Results of pgbench with 10 connections, scale 10 and read-only
statements are below:


Protocol               TPS
extended               87k
prepared               209k
simple+autoprepare     206k


As you can see, autoprepare provides more than 2 times speed improvement.

I also tried to measure the overhead of parsing (to be able to substitute
all literals, not only string literals).

I just added an extra call of pg_parse_query. Speed is reduced to 181k TPS.
So the overhead is noticeable, but it still leaves such an optimization useful.
This is why I want to ask the question: is it better to implement a slower
but safer and more universal solution?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f6be98b..0c9abfc 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -188,6 +188,7 @@ static bool IsTransactionStmtList(List *parseTrees);
 static void drop_unnamed_stmt(void);
 static void SigHupHandler(SIGNAL_ARGS);
 static void log_disconnections(int code, Datum arg);
+static bool exec_cached_query(const char *query_string);
 
 
 /* 
@@ -916,6 +917,14 @@ exec_simple_query(const char *query_string)
 	drop_unnamed_stmt();
 
 	/*
+	 * Try to find cached plan
+	 */
+	if (autoprepare_threshold != 0 && exec_cached_query(query_string))
+	{
+		return;
+	}
+
+	/*
 	 * Switch to appropriate context for constructing parsetrees.
 	 */
 	oldcontext = MemoryContextSwitchTo(MessageContext);
@@ -4500,3 +4509,606 @@ log_disconnections(int code, Datum arg)
 	port->user_name, port->database_name, port->remote_host,
   port->remote_port[0] ? " port=" : "", port->remote_port)));
 }
+
+typedef struct { 
+	char const*   query;
+	dlist_nodelru;
+	int64 exec_count;
+	CachedPlanSource* plan;	
+	int   n_params;
+	int16 format;
+	bool  disable_autoprepare;
+} plan_cache_entry;
+
+/*
+ * Replace string literals with parameters. We do not consider integer or real literals to avoid problems with
+ * negative numbers, user-defined operators, ... For example, it is not easy to distinguish the cases (-1), (1-1), (1-1)-1
+ */
+static void generalize_statement(const char *query_string, char** gen_query, char** query_params, int* n_params)
+{
+	size_t query_len = strlen(query_string);
+	char const* src = query_string;
+	char* dst;
+	char* params;
+	unsigned char ch;
+
+	*n_params = 0;
+
+	*gen_query = (char*)palloc(query_len*2); /* assume that we have less than 1000 parameters, the worst case is replacing '' with $999 */
+	*query_params = (char*)palloc(query_len + 1);
+	dst = *gen_query;
+	params = *query_params;
+
+	while ((ch = *src++) != '\0') { 
+		if (isspace(ch)) { 
+			/* Replace sequence of whitespaces with just one space */
+			while (*src && isspace(*(unsigned char*)src)) { 
+src += 1;
+			}
+			*dst++ = ' ';
+		} else if (ch == '\'') { 
+			while (true) { 
+ch = *src++;
+if (ch == '\'') { 
+	if (*src != '\'') { 
+		break;
+	} else {
+		/* escaped quote */
+		*params++ = '\'';
+		src += 1;
+	}
+} else { 
+	*params++ = ch;
+}
+			}
+			*params++ = '\0';
+			dst += sprintf(dst, "$%d", ++*n_params);
+		} else { 
+			*dst++ = ch;
+		}
+	}			
+	Assert(dst <= *gen_query + query_len);
+	Assert(params <= *query_params + query_len*2);
+	*dst = '\0';
+}
+
+static uint32 plan_cache_hash_fn(const void *key, Size keysize)
+{
+	return string_hash(((plan_cache_entry*)key)->query, 0);
+}
+
+static int plan_cache_match_fn(const void *key1, const void *key2, Size keysize)
+{
+	return strcmp(((plan_cache_entry*)key1)->query, ((plan_cache_entry*)key2)->query);
+}
+
+static void* plan_cache_keycopy_fn(void *dest, const void *src, Size keysize)
+{ 
+	((plan_cache_entry*)dest)->query = pstrdup(((plan_cache_entry*)src)->query);
+return dest;
+}
+
+#define PLAN_CACHE_SIZE 113
+
+size_t nPlanCacheHits;
+size_t nPlanCacheMisses;
+
+/*
+ * Try to generalize query, find cached plan for it and execute
+ */
+static bool exec_cached_query(const char *query_string)
+{
+	CommandDest   dest = whereToSendOutput;
+	DestReceiver *receiver;
+	char *gen_query;
+	char *query_params;
+	int   n_params;
+	plan_cache_entry *entry;
+	bo

Re: [HACKERS] Cached plans and statement generalization

2017-04-26 Thread Konstantin Knizhnik



On 26.04.2017 04:00, Tsunakawa, Takayuki wrote:

From: pgsql-hackers-ow...@postgresql.org

[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Konstantin
Knizhnik
Well, first of all I want to share results I already get: pgbench with default
parameters,  scale 10 and one connection:

So autoprepare is as efficient as explicit prepare and can increase
performance almost two times.

This sounds great.

BTW, when are the autoprepared statements destroyed?
Right now they are destroyed only on receiving an invalidation
message (when the catalog is changed).
Prepared statements are local to a backend and are located in the backend's
memory.
It is unlikely that there will be so many different queries as to
cause memory overflow.

But in theory such a situation is certainly possible.



  Are you considering some upper limit on the number of prepared statements?
In this case we need some kind of LRU for maintaining the cache of
autoprepared statements.
I think it is a good idea to have such a limited cache - it can avoid
the memory overflow problem.

I will try to implement it.



Regards
Takayuki Tsunakawa




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-04-26 Thread Konstantin Knizhnik



On 26.04.2017 01:34, Andres Freund wrote:

Hi,

(FWIW, on this list we don't do top-quotes)

On 2017-04-25 22:21:22 +, Doug Doole wrote:

Plan invalidation was no different than for any SQL statement. DB2 keeps a
list of the objects the statement depends on. If any of the objects changes
in an incompatible way the plan is invalidated and kicked out of the cache.

I suspect what is more interesting is plan lookup. DB2 has something called
the "compilation environment". This is a collection of everything that
impacts how a statement is compiled (SQL path, optimization level, etc.).
Plan lookup is done using both the statement text and the compilation
environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your
path is ANDRES, MYTEAM, SYSIBM we will have different compilation
environments. If we both issue "SELECT * FROM T" we'll end up with
different cache entries even if T in both of our statements resolves to
MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then
execute "SELECT * FROM T" again, I have a new compilation environment so
the second invocation of the statement will create a new entry in the
cache. The first entry is not kicked out - it will still be there for
re-use if I change my SQL path back to my original value (modulo LRU for
cache memory management of course).

It's not always that simple, at least in postgres, unless you disregard
search_path.  Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_path = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?


There is the same problem with explicitly prepared statements, isn't there?
Certainly, when prepared statements are used it is the programmer's
responsibility to avoid such collisions.

In the case of autoprepare this is hidden from the programmer.
But there is a GUC variable controlling the autoprepare feature, and by default
it is switched off.
So if a programmer or DBA enables it, they should take the effects of such
a decision into account.


By the way, isn't it a bug in PostgreSQL that altering the search path does
not invalidate cached plans?
As I already mentioned, the same problem can be reproduced with
explicitly prepared statements.
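For example, a sketch of such a reproduction (reusing the schema layout from
Andres's example above; the exact behavior should of course be re-verified):

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_path = 'b,a';
PREPARE q AS SELECT * FROM foobar;     -- resolves to a.foobar at prepare time
CREATE TABLE b.foobar(anothercol int);
EXECUTE q;                             -- still runs against a.foobar; the cached plan is not invalidated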






Greetings,

Andres Freund


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-04-26 Thread Konstantin Knizhnik



On 26.04.2017 00:47, Andres Freund wrote:

On 2017-04-25 21:11:08 +, Doug Doole wrote:

When I did this in DB2, I didn't use the parser - it was too expensive. I
just tokenized the statement and used some simple rules to bypass the
invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
disallow replacement replacement until I hit the end of the current
subquery or statement.

How did you manage plan invalidation and such?


The same mechanism as for prepared statements:
cached plans are linked into the list by the SaveCachedPlan function and are
invalidated by PlanCacheRelCallback.





- Andres




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Konstantin
> Knizhnik
> Well, first of all I want to share results I already get: pgbench with default
> parameters,  scale 10 and one connection:
> 
> So autoprepare is as efficient as explicit prepare and can increase
> performance almost two times.

This sounds great.

BTW, when are the autoprepared statements destroyed?  Are you considering some 
upper limit on the number of prepared statements?

Regards
Takayuki Tsunakawa






Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Serge Rielau
On Tue, Apr 25, 2017 at 3:48 PM, Doug Doole  wrote:
It's not always that simple, at least in postgres, unless you disregard
search_path. Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_path = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

DB2 does handle this case. Unfortunately I don't know the details of how it
worked though.
A naive option would be to invalidate anything that depends on table or view
*.FOOBAR. You could probably make it a bit smarter by also requiring that
schema A appear in the path.

While this specific scenario does not arise in DB2, since it uses CURRENT
SCHEMA only for tables (much to my dislike), your example holds for functions
and types, which are resolved by path. For encapsulated SQL (in views,
functions) conservative semantics are enforced by including the timestamp.
For dynamic SQL the problem you describe does exist, though, and I think it is
handled in the way Doug describes. However, as noted by Doug, the topic of plan
invalidation is really orthogonal to normalizing the queries. All it does is
provide more opportunities to run into any pre-existing bugs.

Cheers
Serge

PS: I’m just starting to look at the plan invalidation code in PG because we
are dealing with potentially tens of thousands of cached SQL statements. So
these complete wipe-outs or walks of every plan in the cache don’t scale.

Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Finnerty, Jim

On 4/25/17, 6:34 PM, "pgsql-hackers-ow...@postgresql.org on behalf of Andres 
Freund"  
wrote:

It's not always that simple, at least in postgres, unless you disregard
search_path.  Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_path = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

Greetings,

Andres Freund




You may need to store the reloid of each relation in the cached query to ensure 
that the schema bindings are the same, and also invalidate dependent cache 
entries if a referenced relation is dropped. 

Regards,

Jim Finnerty




Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Doug Doole
>
> (FWIW, on this list we don't do top-quotes)
>

I know. Forgot and just did "reply all". My bad.

It's not always that simple, at least in postgres, unless you disregard
> search_path.  Consider e.g. cases like
>
> CREATE SCHEMA a;
> CREATE SCHEMA b;
> CREATE TABLE a.foobar(somecol int);
> SET search_path = 'b,a';
> SELECT * FROM foobar;
> CREATE TABLE b.foobar(anothercol int);
> SELECT * FROM foobar; -- may not be cached plan from before!
>
> it sounds - my memory of DB2 is very faint, and I never used it much -
> like similar issues could arise in DB2 too?
>

DB2 does handle this case. Unfortunately I don't know the details of how it
worked though.

A naive option would be to invalidate anything that depends on table or
view *.FOOBAR. You could probably make it a bit smarter by also requiring
that schema A appear in the path.

- Doug


Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread David G. Johnston
On Tue, Apr 25, 2017 at 3:24 PM, David Fetter  wrote:

> I don't have an exploit yet.  What concerns me is attackers' access to
> what is in essence the ability to poke at RULEs when they only have
> privileges to read.
>

If they want to see how it works they can read the source code.  In terms
of runtime data it would limited to whatever the session itself created.
In most cases the presence of the cache would be invisible.  I suppose it
might appear if one were to explain a query, reset the session, explain
another query and then re-explain the original.  If the chosen plan in the
second pass differed because of the presence of the leading query it would
be noticeable but not revealing.  Albeit I'm a far cry from a security
expert...

David J.


Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Andres Freund
Hi,

(FWIW, on this list we don't do top-quotes)

On 2017-04-25 22:21:22 +, Doug Doole wrote:
> Plan invalidation was no different than for any SQL statement. DB2 keeps a
> list of the objects the statement depends on. If any of the objects changes
> in an incompatible way the plan is invalidated and kicked out of the cache.
> 
> I suspect what is more interesting is plan lookup. DB2 has something called
> the "compilation environment". This is a collection of everything that
> impacts how a statement is compiled (SQL path, optimization level, etc.).
> Plan lookup is done using both the statement text and the compilation
> environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your
> path is ANDRES, MYTEAM, SYSIBM we will have different compilation
> environments. If we both issue "SELECT * FROM T" we'll end up with
> different cache entries even if T in both of our statements resolves to
> MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then
> execute "SELECT * FROM T" again, I have a new compilation environment so
> the second invocation of the statement will create a new entry in the
> cache. The first entry is not kicked out - it will still be there for
> re-use if I change my SQL path back to my original value (modulo LRU for
> cache memory management of course).

It's not always that simple, at least in postgres, unless you disregard
search_path.  Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_path = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

Greetings,

Andres Freund




Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread David Fetter
On Tue, Apr 25, 2017 at 11:35:21PM +0300, Konstantin Knizhnik wrote:
> On 04/25/2017 07:54 PM, David Fetter wrote:
> > On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:
> > > On 24.04.2017 21:43, Andres Freund wrote:
> > > > Hi,
> > > > 
> > > > On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> > > > > So what I am thinking now is implicit query caching. If the same 
> > > > > query with
> > > > > different literal values is repeated many times, then we can try to
> > > > > generalize this query and replace it with prepared query with
> > > > > parameters.
> > > > That's not actuall all that easy:
> > > > - You pretty much do parse analysis to be able to do an accurate match.
> > > > How much overhead is parse analysis vs. planning in your cases?
> > > > - The invalidation infrastructure for this, if not tied to to fully
> > > > parse-analyzed statements, is going to be hell.
> > > > - Migrating to parameters can actually cause significant slowdowns, not
> > > > nice if that happens implicitly.
> > > Well, first of all I want to share results I already get: pgbench with
> > > default parameters,  scale 10 and one connection:
> > > 
> > > protocol               TPS
> > > simple                 3492
> > > extended               2927
> > > prepared               6865
> > > simple + autoprepare   6844
> > If this is string mashing on the unparsed query, as it appears to be,
> > it's going to be a perennial source of security issues.
> 
> Sorry, may be I missed something, but I can not understand how
> security can be violated by extracting string literals from query. I
> am just copying bytes from one buffer to another. I do not try to
> somehow interpret this parameters.  What I am doing is very similar
> with standard prepared statements.  And moreover query is parsed!
> Only query which was already parsed and executed (but with different
> values of parameters) can be autoprepared.

I don't have an exploit yet.  What concerns me is attackers' access to
what is in essence the ability to poke at RULEs when they only have
privileges to read.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Doug Doole
Plan invalidation was no different than for any SQL statement. DB2 keeps a
list of the objects the statement depends on. If any of the objects changes
in an incompatible way the plan is invalidated and kicked out of the cache.

I suspect what is more interesting is plan lookup. DB2 has something called
the "compilation environment". This is a collection of everything that
impacts how a statement is compiled (SQL path, optimization level, etc.).
Plan lookup is done using both the statement text and the compilation
environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your
path is ANDRES, MYTEAM, SYSIBM we will have different compilation
environments. If we both issue "SELECT * FROM T" we'll end up with
different cache entries even if T in both of our statements resolves to
MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then
execute "SELECT * FROM T" again, I have a new compilation environment so
the second invocation of the statement will create a new entry in the
cache. The first entry is not kicked out - it will still be there for
re-use if I change my SQL path back to my original value (modulo LRU for
cache memory management of course).

With literal replacement, the cache entry is on the modified statement
text. Given the modified statement text and the compilation environment,
you're guaranteed to get the right plan entry.
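Mapped onto PostgreSQL-style syntax, the lookup behavior described here would
act roughly as follows (an illustration of the DB2 idea only, not of any
existing PostgreSQL feature; the schema names are placeholders):

SET search_path = 'doug, myteam';     -- compilation environment #1
SELECT * FROM t;                      -- cached under (statement text, environment #1)
SET search_path = 'andres, myteam';   -- compilation environment #2
SELECT * FROM t;                      -- new cache entry under (statement text, environment #2)
SET search_path = 'doug, myteam';     -- back to environment #1: the first entry is reused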

On Tue, Apr 25, 2017 at 2:47 PM Andres Freund  wrote:

> On 2017-04-25 21:11:08 +, Doug Doole wrote:
> > When I did this in DB2, I didn't use the parser - it was too expensive. I
> > just tokenized the statement and used some simple rules to bypass the
> > invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
> > disallow replacement replacement until I hit the end of the current
> > subquery or statement.
>
> How did you manage plan invalidation and such?
>
> - Andres
>


Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Andres Freund
On 2017-04-25 21:11:08 +, Doug Doole wrote:
> When I did this in DB2, I didn't use the parser - it was too expensive. I
> just tokenized the statement and used some simple rules to bypass the
> invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
> disallow replacement replacement until I hit the end of the current
> subquery or statement.

How did you manage plan invalidation and such?

- Andres




Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Doug Doole
When I did this in DB2, I didn't use the parser - it was too expensive. I
just tokenized the statement and used some simple rules to bypass the
invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
disallow replacement replacement until I hit the end of the current
subquery or statement.

There are a few limitations to this approach. For example, DB2 allowed you
to cast using function notation like VARCHAR(foo, 10). This meant I would
never replace the second parameter of any VARCHAR function. Now it's
possible that when the statement was fully compiled we'd find that
VARCHAR(foo,10) actually resolved to BOB.VARCHAR() instead of the built-in
cast function. Our thinking that these cases were rare enough that we
wouldn't worry about them. (Of course, PostgreSQL's ::VARCHAR(10) syntax
avoids this problem completely.)

Because SQL is so structured, the implementation ended up being quite
simple (a few hundred line of code) with no significant maintenance issues.
(Other developers had no problem adding in new cases where constants had to
be preserved.)

The simple tokenizer was also fairly extensible. I'd prototyped using the
same code to also normalize statements (uppercase all keywords, collapse
whitespace to a single blank, etc.) but that feature was never added to the
product.

- Doug

On Tue, Apr 25, 2017 at 1:47 PM Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> On 04/25/2017 11:40 PM, Serge Rielau wrote:
>
>
> On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <
> k.knizh...@postgrespro.ru> wrote:
>
>
> SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;
>
> You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.
>
>
> I am substituting only string literals. So the query above will be
> transformed to
>
> SELECT $1::CHAR(10) || $2, 5 + 6;
>
> What's wrong with it?
>
>
> Oh, well that leaves a lot of opportunities on the table, doesn’t it?
>
>
> Well, actually my primary intention was not to make badly designed
> programs (not using prepared statements) work faster.
> I wanted to address cases when it is not possible to use prepared
> statements.
> If we want to substitute with parameters as much literals as possible,
> then parse+deparse tree seems to be the only reasonable approach.
> I will try to implement it also, just to estimate parsing overhead.
>
>
>
>
> Cheers
> Serge
>
>
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Konstantin Knizhnik

On 04/25/2017 11:40 PM, Serge Rielau wrote:



On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:



SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.


I am substituting only string literals. So the query above will be transformed 
to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?


Oh, well that leaves a lot of opportunities on the table, doesn’t it?


Well, actually my primary intention was not to make badly designed programs
(those not using prepared statements) work faster.
I wanted to address cases where it is not possible to use prepared statements.
If we want to substitute parameters for as many literals as possible, then
the parse+deparse approach seems to be the only reasonable one.
I will also try to implement it, just to estimate the parsing overhead.





Cheers
Serge




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Serge Rielau

> On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik  
> wrote:
>> 
>> SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;
>> 
>> You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.
> 
> I am substituting only string literals. So the query above will be 
> transformed to 
> 
> SELECT $1::CHAR(10) || $2, 5 + 6;
> 
> What's wrong with it?

Oh, well that leaves a lot of opportunities on the table, doesn’t it?

Cheers
Serge



Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Konstantin Knizhnik

On 04/25/2017 08:09 PM, Serge Rielau wrote:


On Tue, Apr 25, 2017 at 9:45 AM, Konstantin Knizhnik 
 wrote:

On 25.04.2017 19:12, Serge Rielau wrote:


On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:
Another problem is caused by using integer literals in context where 
parameters can not be used, for example "order by 1”.

You will also need to deal with modifiers in types such as VARCHAR(10). 
Not sure if there are specific functions which can only deal with literals (?) 
as well.

Sorry, I do not completely understand how presence of type modifiers can 
affect string literals used in query.
Can you provide me some example?

SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.


I am substituting only string literals. So the query above will be transformed 
to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?



Also some OLAP syntax like “rows preceding”

It pretty much boils down to whether you can do some shallow parsing rather 
than expending the effort to build the parse tree.

Cheers
Serge



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Konstantin Knizhnik

On 04/25/2017 07:54 PM, David Fetter wrote:

On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:

On 24.04.2017 21:43, Andres Freund wrote:

Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:

So what I am thinking now is implicit query caching. If the same query with
different literal values is repeated many times, then we can try to
generalize this query and replace it with prepared query with
parameters.

That's not actuall all that easy:
- You pretty much do parse analysis to be able to do an accurate match.
How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to to fully
parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not
nice if that happens implicitly.

Well, first of all I want to share results I already get: pgbench with
default parameters,  scale 10 and one connection:

protocol               TPS
simple                 3492
extended               2927
prepared               6865
simple + autoprepare   6844

If this is string mashing on the unparsed query, as it appears to be,
it's going to be a perennial source of security issues.


Sorry, maybe I missed something, but I cannot understand how security can be
violated by extracting string literals from a query. I am just copying bytes
from one buffer to another; I do not try to interpret these parameters in any
way. What I am doing is very similar to standard prepared statements.

And moreover, the query is parsed! Only a query which was already parsed and
executed (but with different parameter values) can be autoprepared.




Best,
David.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Serge Rielau
On Tue, Apr 25, 2017 at 9:45 AM, Konstantin Knizhnik  wrote:

On 25.04.2017 19:12, Serge Rielau wrote:

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:
Another problem is caused by using integer literals in contexts where
parameters cannot be used, for example "order by 1".

You will also need to deal with modifiers in types such as VARCHAR(10). Not
sure if there are specific functions which can only deal with literals (?) as
well.

Sorry, I do not completely understand how the presence of type modifiers can
affect string literals used in a query.
Can you provide an example?

SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.
Also some OLAP syntax like “rows preceding”.

It pretty much boils down to whether you can do some shallow parsing rather
than expending the effort to build the parse tree.

Cheers
Serge

Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread David Fetter
On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:
> On 24.04.2017 21:43, Andres Freund wrote:
> > Hi,
> > 
> > On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> > > So what I am thinking now is implicit query caching. If the same query 
> > > with
> > > different literal values is repeated many times, then we can try to
> > > generalize this query and replace it with prepared query with
> > > parameters.
> > That's not actuall all that easy:
> > - You pretty much do parse analysis to be able to do an accurate match.
> >How much overhead is parse analysis vs. planning in your cases?
> > - The invalidation infrastructure for this, if not tied to to fully
> >parse-analyzed statements, is going to be hell.
> > - Migrating to parameters can actually cause significant slowdowns, not
> >nice if that happens implicitly.
> 
> Well, first of all I want to share results I already get: pgbench with
> default parameters,  scale 10 and one connection:
> 
> protocol               TPS
> simple                 3492
> extended               2927
> prepared               6865
> simple + autoprepare   6844

If this is string mashing on the unparsed query, as it appears to be,
it's going to be a perennial source of security issues.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Konstantin Knizhnik



On 25.04.2017 19:12, Serge Rielau wrote:


On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:
Another problem is caused by using integer literals in context where 
parameters can not be used, for example "order by 1”.
You will also need to deal with modifiers in types such as 
VARCHAR(10). Not sure if there are specific functions which can only 
deal with literals (?) as well.


Sorry, I do not completely understand how the presence of type modifiers can
affect string literals used in a query.

Can you provide an example?



Doug Doole did this work in DB2 LUW and he may be able to point to 
more places to watch out for semantically.


Generally, in my experience, this feature is very valuable when 
dealing with (poorly designed) web apps that just glue together strings.


I do not think that this optimization will be useful only for poorly
designed applications.
I already pointed out two use cases where prepared statements cannot be
used:

1. pgbouncer without session-level pooling.
2. partitioning

Protecting it under a GUC would allow to only do the work if it’s 
deemed likely to help.
Another rule I find useful is to abort any efforts to substitute 
literals if any bind variable is found in the query.
That can be used as a cue that the author of the SQL left the 
remaining literals in on purpose.


A follow up feature would be to formalize different flavors of peeking.
I.e. can you produce a generic plan, but still recruit the initial set 
of bind values/substituted literals to dos costing?

Here the situation is the same as for explicitly prepared statements, isn't it?
Sometimes it is preferable to use a specialized plan rather than a generic
plan.

I am not sure whether Postgres is currently able to do that.




Cheers
Serge Rielau
Salesforce.com 

PS: FWIW, I like this feature.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Serge Rielau

> On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik  
> wrote:
> Another problem is caused by using integer literals in context where 
> parameters can not be used, for example "order by 1”.
You will also need to deal with modifiers in types such as VARCHAR(10). Not 
sure if there are specific functions which can only deal with literals (?) as 
well.

Doug Doole did this work in DB2 LUW and he may be able to point to more places 
to watch out for semantically.

Generally, in my experience, this feature is very valuable when dealing with 
(poorly designed) web apps that just glue together strings.
Protecting it under a GUC would allow to only do the work if it’s deemed likely 
to help.
Another rule I find useful is to abort any efforts to substitute literals if 
any bind variable is found in the query.
That can be used as a cue that the author of the SQL left the remaining 
literals in on purpose.

A follow-up feature would be to formalize different flavors of peeking,
i.e. can you produce a generic plan, but still recruit the initial set of bind
values/substituted literals to do costing?

Cheers
Serge Rielau
Salesforce.com 

PS: FWIW, I like this feature.

Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Konstantin Knizhnik

On 24.04.2017 21:43, Andres Freund wrote:

Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:

So what I am thinking now is implicit query caching. If the same query with
different literal values is repeated many times, then we can try to
generalize this query and replace it with prepared query with
parameters.

That's not actuall all that easy:
- You pretty much do parse analysis to be able to do an accurate match.
   How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to to fully
   parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not
   nice if that happens implicitly.


Well, first of all I want to share the results I have already got: pgbench with
default parameters, scale 10 and one connection:


protocol               TPS
simple                 3492
extended               2927
prepared               6865
simple + autoprepare   6844


So autoprepare is as efficient as explicit prepare and can increase 
performance almost two times.


My current implementation replaces only string literals in the query with
parameters, i.e. select * from T where x='123'; -> select *
from T where x=$1;
This greatly simplifies matching of parameters - it is just necessary to
locate the '\'' character and then correctly handle pairs of quotes.

Handling of integer and real literals is a really challenging task.
One source of problems is negation: it is not so easy to correctly
decide whether a minus should be treated as part of the literal or as an
operator:

(-1), (1-1), (1-1)-1
Another problem is caused by integer literals used in contexts where
parameters cannot be used, for example "order by 1" (see the examples below).
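A couple of illustrations of the problematic cases (assuming some table t with
columns x and y; these lines only show the ambiguity, they are not code from
the patch):

SELECT (-1), (1-1), (1-1)-1;     -- is each minus part of a literal or an operator?
SELECT x, y FROM t ORDER BY 1;   -- "1" names the first output column here; replacing it
                                 -- with $1 would change the meaning (a parameter there
                                 -- sorts by a constant value, not by column position)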


Fully correct substitution can be done by first parsing the
query, then transforming the parse tree, replacing literal nodes with parameter
nodes, and finally deparsing the tree into a generalized query. postgres_fdw
already contains such deparse code; it could be moved into the Postgres core and
reused for autoprepare (and maybe elsewhere).

But in this case the overhead will be much higher.
I still think that query parsing time is significantly smaller than the time
needed for building and optimizing the query execution plan.

But it should be measured if the community is interested in such an approach.

There is an obvious question: how did I manage to get these pgbench results if
currently only substitution of string literals is supported and the queries
constructed by pgbench don't contain string literals? I just made a small
patch to pgbench's replaceVariable method, wrapping the value's representation
in quotes. It has almost no impact on performance (3482 TPS vs. 3492 TPS),
but allows autoprepare to deal with pgbench queries, as illustrated below.
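For example, assuming the standard pgbench_accounts schema, a read-only pgbench
query goes through roughly the following stages (the value 123 is just an
example):

SELECT abalance FROM pgbench_accounts WHERE aid = 123;    -- what unmodified pgbench sends
SELECT abalance FROM pgbench_accounts WHERE aid = '123';  -- with the quoting tweak in replaceVariable
SELECT abalance FROM pgbench_accounts WHERE aid = $1;     -- generalized form that gets cached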

I attached my patch to this mail. It is just the first version of the patch
(based on the REL9_6_STABLE branch), intended to illustrate the proposed approach.
I will be glad to receive any comments, and if such an optimization is
considered useful, I will continue working on this patch.


--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f6be98b..6291d66 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -96,8 +96,6 @@ int			max_stack_depth = 100;
 /* wait N seconds to allow attach from a debugger */
 int			PostAuthDelay = 0;
 
-
-
 /* 
  *		private variables
  * 
@@ -188,6 +186,7 @@ static bool IsTransactionStmtList(List *parseTrees);
 static void drop_unnamed_stmt(void);
 static void SigHupHandler(SIGNAL_ARGS);
 static void log_disconnections(int code, Datum arg);
+static bool exec_cached_query(const char *query_string);
 
 
 /* 
@@ -916,6 +915,14 @@ exec_simple_query(const char *query_string)
 	drop_unnamed_stmt();
 
 	/*
+	 * Try to find cached plan
+	 */
+	if (autoprepare_threshold != 0 && exec_cached_query(query_string))
+	{
+		return;
+	}
+
+	/*
 	 * Switch to appropriate context for constructing parsetrees.
 	 */
 	oldcontext = MemoryContextSwitchTo(MessageContext);
@@ -4500,3 +4509,566 @@ log_disconnections(int code, Datum arg)
 	port->user_name, port->database_name, port->remote_host,
   port->remote_port[0] ? " port=" : "", port->remote_port)));
 }
+
+
+
+typedef struct { 
+	char const* query;
+	int64 exec_count;
+	CachedPlanSource* plan;	
+	int n_params;
+	int16 format;
+} plan_cache_entry;
+
+
+/*
+ * Replace string literals with parameters. We do not consider integer or real literals to avoid problems with 
+ * negative number, user defined operators, ... For example it is not easy to distinguish cases (-1), (1-1), (1-1)-1
+ */
+static void generalize_statement(const char *query_string, char** gen_query, char** query_params, int* n_params)
+{
+	size_t query_len = strlen(query_string);
+	char const* src = query_string;
+	char* dst;
+	char* params;
+	uns

Re: [HACKERS] Cached plans and statement generalization

2017-04-24 Thread Andres Freund
Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> So what I am thinking about now is implicit query caching. If the same query with
> different literal values is repeated many times, then we can try to
> generalize this query and replace it with a prepared query with
> parameters.

That's not actually all that easy:
- You pretty much have to do parse analysis to be able to do an accurate match.
  How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to fully
  parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not
  nice if that happens implicitly.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cached plans and statement generalization

2017-04-24 Thread Konstantin Knizhnik



On 24.04.2017 13:24, Alexander Korotkov wrote:

Hi, Konstantin!

On Mon, Apr 24, 2017 at 11:46 AM, Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:


There were a lot of discussions about query plan caching in
hackers mailing list, but I failed to find some clear answer for
my question and the current consensus on this question in Postgres
community. As far as I understand current state is the following:
1. We have per-connection prepared statements.
2. Queries executed inside plpgsql code are implicitly prepared.

It is not always possible or convenient to use prepared statements.
For example, if pgbouncer is used to perform connection pooling.
Another use case (which is actually the problem I am trying to
solve now) is partitioning.
Efficient execution of a query against a partitioned table requires a
hardcoded value for the partitioning key.
Only in this case will the optimizer be able to construct an efficient
query plan which accesses only the affected tables (partitions).

My small benchmark for a distributed partitioned table based on
pg_pathman + postgres_fdw shows a threefold degradation of performance
when using prepared statements.
But without prepared statements a substantial amount of time is
spent in query compilation and planning. I was able to speed up the
benchmark more than two times by
sending prepared queries directly to the remote nodes.


I don't think it's correct to ask PostgreSQL hackers about a problem
which arises with pg_pathman, while pg_pathman is an extension
supported by Postgres Pro.
Since we have declarative partitioning committed to 10, I think that the
community should address this issue in the context of declarative
partitioning.
However, it's unlikely we can spot this issue with declarative
partitioning because it still uses the very inefficient constraint
exclusion mechanism.  Thus, the issues you are writing about would become
visible with declarative partitioning only when constraint exclusion
is replaced with something more efficient.


Long story short, could you reproduce this issue without pg_pathman?



Sorry, I mentioned pg_pathman just as an example.
The same problem takes place with partitioning based on the standard
Postgres inheritance mechanism (when I manually create derived tables
and specify constraints for them).
I haven't yet tested the declarative partitioning committed to 10, but I
expect that it will also suffer from this problem (because it is based on
inheritance).
But as I wrote, I think that the problem with plan caching is wider and
is not limited just to partitioning.





--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Cached plans and statement generalization

2017-04-24 Thread Alexander Korotkov
Hi, Konstantin!

On Mon, Apr 24, 2017 at 11:46 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> There were a lot of discussions about query plan caching in hackers
> mailing list, but I failed to find some clear answer for my question and
> the current consensus on this question in Postgres community. As far as I
> understand current state is the following:
> 1. We have per-connection prepared statements.
> 2. Queries executed inside plpgsql code are implicitly prepared.
>
> It is not always possible or convenient to use prepared statements.
> For example, if pgbouncer is used to perform connection pooling.
> Another use case (which is actually the problem I am trying to solve now)
> is partitioning.
> Efficient execution of a query against a partitioned table requires a hardcoded
> value for the partitioning key.
> Only in this case will the optimizer be able to construct an efficient query plan
> which accesses only the affected tables (partitions).
>
> My small benchmark for a distributed partitioned table based on pg_pathman +
> postgres_fdw shows a threefold degradation of performance when using prepared
> statements.
> But without prepared statements a substantial amount of time is spent in
> query compilation and planning. I was able to speed up the benchmark more
> than two times by
> sending prepared queries directly to the remote nodes.
>

I don't think it's correct to ask PostgreSQL hackers about a problem which
arises with pg_pathman, while pg_pathman is an extension supported by
Postgres Pro.
Since we have declarative partitioning committed to 10, I think that the
community should address this issue in the context of declarative
partitioning.
However, it's unlikely we can spot this issue with declarative partitioning
because it still uses the very inefficient constraint exclusion mechanism.
Thus, the issues you are writing about would become visible with declarative
partitioning only when constraint exclusion is replaced with
something more efficient.

Long story short, could you reproduce this issue without pg_pathman?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Cached plans and statement generalization

2017-04-24 Thread Konstantin Knizhnik

Hi hackers,

There have been a lot of discussions about query plan caching on the hackers
mailing list, but I failed to find a clear answer to my question and
the current consensus on this question in the Postgres community. As far as
I understand, the current state is the following:

1. We have per-connection prepared statements.
2. Queries executed inside plpgsql code are implicitly prepared.

It is not always possible or convenient to use prepared statements,
for example if pgbouncer is used to perform connection pooling.
Another use case (which is actually the problem I am trying to solve
now) is partitioning.
Efficient execution of a query against a partitioned table requires a
hardcoded value for the partitioning key.
Only in this case will the optimizer be able to construct an efficient
query plan which accesses only the affected tables (partitions).


My small benchmark for a distributed partitioned table based on pg_pathman
+ postgres_fdw shows a threefold degradation of performance when using
prepared statements.
But without prepared statements a substantial amount of time is spent in
query compilation and planning. I was able to speed up the benchmark more
than two times by
sending prepared queries directly to the remote nodes.

So what I am thinking about now is implicit query caching. If the same query
with different literal values is repeated many times, then we can try to
generalize this query and replace it with a prepared query with
parameters. I am not considering a shared query cache for now: it seems to be
much harder to implement. But local caching of generalized queries seems
not so difficult to implement and requires not so many changes in the
Postgres code. And it can be useful not only for sharding, but for many
other cases where prepared statements cannot be used.
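
A very rough sketch of what I have in mind for the per-backend cache
(CachedPlanSource is the existing plancache type; the entry struct, the
threshold variable and all helper functions named here are made up purely
for illustration):

#include "postgres.h"
#include "utils/plancache.h"

typedef struct
{
    const char *gen_query;      /* query text with literals replaced by $n */
    int64       exec_count;     /* how many times this query has been seen */
    CachedPlanSource *plan;     /* NULL until the query becomes "hot" */
} autoprepare_entry;

/* hypothetical helpers and threshold variable, not shown here */
extern bool generalize_query(const char *src, char **gen_query, int *n_params);
extern autoprepare_entry *autoprepare_lookup_or_insert(const char *gen_query);
extern CachedPlanSource *prepare_generalized_query(const char *gen_query,
                                                   int n_params);
extern void execute_cached_plan(CachedPlanSource *plan,
                                const char *query_string);
extern int  autoprepare_threshold;

/*
 * Called from simple query processing: returns true if the query was
 * executed through a cached generalized plan, false if the caller should
 * fall back to the normal parse/plan/execute path.
 */
static bool
try_autoprepare(const char *query_string)
{
    char       *gen_query;
    int         n_params;
    autoprepare_entry *entry;

    if (!generalize_query(query_string, &gen_query, &n_params))
        return false;           /* nothing to substitute */

    entry = autoprepare_lookup_or_insert(gen_query);
    if (++entry->exec_count < autoprepare_threshold)
        return false;           /* not repeated often enough yet */

    if (entry->plan == NULL)
        entry->plan = prepare_generalized_query(gen_query, n_params);

    execute_cached_plan(entry->plan, query_string);
    return true;
}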


I wonder if such an option was already considered and, if it was, rejected
for some reason: can you point me to those reasons?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers