Re: [HACKERS] Cached plans and statement generalization

2017-09-12 Thread Konstantin Knizhnik



On 11.09.2017 12:24, Konstantin Knizhnik wrote:

Attached please find a rebased version of the patch.
Here are the updated performance results (pgbench -s 100 -c 1):

protocol (-M)         read-write   read-only (-S)
simple                      3327            19325
extended                    2256            16908
prepared                    6145            39326
simple+autoprepare          4950            34841




Here is one more version of the patch, which passes all regression tests with autoprepare_threshold=1.
I still do not think that it should be switched on by default...

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index e3eb0c5..17f3dfd 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3688,6 +3688,454 @@ raw_expression_tree_walker(Node *node,
 }
 
 /*
+ * raw_expression_tree_mutator --- transform a raw parse tree.
+ *
+ * This function implements a slightly different approach to tree updates than expression_tree_mutator().
+ * The callback is given a pointer to the pointer to the current node and can update that field instead of returning a reference to a new node.
+ * This makes it possible to remember changes and easily revert them without an extra traversal of the tree.
+ *
+ * This function does not need the QTW_DONT_COPY_QUERY flag: it never implicitly copies tree nodes, updating them in place instead.
+ *
+ * Like raw_expression_tree_walker, there is no special rule about query
+ * boundaries: we descend to everything that's possibly interesting.
+ *
+ * Currently, the node type coverage here extends only to DML statements
+ * (SELECT/INSERT/UPDATE/DELETE) and nodes that can appear in them, because
+ * this is used mainly during analysis of CTEs, and only DML statements can
+ * appear in CTEs. If some other node is visited, iteration stops immediately and true is returned.
+ */
+bool
+raw_expression_tree_mutator(Node *node,
+			bool (*mutator) (),
+			void *context)
+{
+	ListCell   *temp;
+
+	/*
+	 * The walker has already visited the current node, and so we need only
+	 * recurse into any sub-nodes it has.
+	 */
+	if (node == NULL)
+		return false;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	switch (nodeTag(node))
+	{
+		case T_SetToDefault:
+		case T_CurrentOfExpr:
+		case T_Integer:
+		case T_Float:
+		case T_String:
+		case T_BitString:
+		case T_Null:
+		case T_ParamRef:
+		case T_A_Const:
+		case T_A_Star:
+			/* primitive node types with no subnodes */
+			break;
+		case T_Alias:
+			/* we assume the colnames list isn't interesting */
+			break;
+		case T_RangeVar:
+			return mutator(&((RangeVar *) node)->alias, context);
+		case T_GroupingFunc:
+			return mutator(&((GroupingFunc *) node)->args, context);
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				if (mutator(&sublink->testexpr, context))
+					return true;
+				/* we assume the operName is not interesting */
+				if (mutator(&sublink->subselect, context))
+					return true;
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				if (mutator(&caseexpr->arg, context))
+					return true;
+				/* we assume the mutator doesn't care about CaseWhens, either */
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = (CaseWhen *) lfirst(temp);
+
+					Assert(IsA(when, CaseWhen));
+					if (mutator(&when->expr, context))
+						return true;
+					if (mutator(&when->result, context))
+						return true;
+				}
+				if (mutator(&caseexpr->defresult, context))
+					return true;
+			}
+			break;
+		case T_RowExpr:
+			/* Assume colnames isn't interesting */
+			return mutator(&((RowExpr *) node)->args, context);
+		case T_CoalesceExpr:
+			return mutator(&((CoalesceExpr *) node)->args, context);
+		case T_MinMaxExpr:
+			return mutator(&((MinMaxExpr *) node)->args, context);
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				if (mutator(&xexpr->named_args, context))
+					return true;
+				/* we assume the mutator doesn't care about arg_names */
+				if (mutator(&xexpr->args, context))
+					return true;
+			}
+			break;
+		case T_NullTest:
+			return mutator(&((NullTest *) node)->arg, context);
+		case T_BooleanTest:
+			return mutator(&((BooleanTest *) node)->arg, context);
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				if (mutator(&join->larg, context))
+					return true;
+				if (mutator(&join->rarg, context))
+					return true;
+				if (mutator(&join->quals, context))
+					return true;
+				if (mutator(&join->alias, context))
+					return true;
+				/* using list is deemed uninteresting */
+			}
+			break;
+		case T_IntoClause:
+			{
+				IntoClause *into = (IntoClause *) node;
+
+				if (mutator(&into->rel, context))
+					return true;
+				/* colNames, options are deemed uninteresting */
+				/* viewQuery should be null in raw parsetree, but check it */
+				if (mutator(&int
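
To illustrate how a callback written against this interface could be used for statement generalization, here is a minimal sketch (not taken from the patch; the struct and function names are hypothetical) that replaces A_Const nodes with ParamRef nodes through the pointer-to-pointer argument and records every substitution so that it can later be reverted without re-walking the tree:

typedef struct ConstSubst
{
	Node	  **ref;			/* field in the tree that was patched */
	Node	   *old_node;		/* original A_Const, kept so the change can be reverted */
} ConstSubst;

typedef struct GeneralizeCtx
{
	List	   *substitutions;	/* list of ConstSubst* */
	int			n_params;
} GeneralizeCtx;

static bool
generalize_literals(Node **ref, void *context)
{
	GeneralizeCtx *ctx = (GeneralizeCtx *) context;
	Node	   *node = *ref;

	if (node == NULL)
		return false;

	if (IsA(node, A_Const))
	{
		ParamRef   *param = makeNode(ParamRef);
		ConstSubst *subst = (ConstSubst *) palloc(sizeof(ConstSubst));

		param->number = ++ctx->n_params;
		param->location = ((A_Const *) node)->location;

		/* remember the change so it can be undone without re-walking the tree */
		subst->ref = ref;
		subst->old_node = node;
		ctx->substitutions = lappend(ctx->substitutions, subst);

		*ref = (Node *) param;	/* in-place update through the pointer-to-pointer */
		return false;
	}

	/* recurse; returns true if an unsupported statement type is encountered */
	return raw_expression_tree_mutator(node, generalize_literals, ctx);
}

/* Revert all substitutions, restoring the original literals */
static void
undo_substitutions(GeneralizeCtx *ctx)
{
	ListCell   *lc;

	foreach(lc, ctx->substitutions)
	{
		ConstSubst *subst = (ConstSubst *) lfirst(lc);

		*subst->ref = subst->old_node;
	}
}

Calling undo_substitutions() after the cache lookup restores the original literals, which is exactly the "remember changes and easily revert them" property described in the comment above.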

Re: [HACKERS] Cached plans and statement generalization

2017-09-11 Thread Konstantin Knizhnik



On 09.09.2017 06:35, Thomas Munro wrote:

On Fri, May 26, 2017 at 3:54 AM, Konstantin Knizhnik
 wrote:

Attached please find a rebased version of the autoprepare patch based on Tom's
proposal (perform analysis of the tree with constant literals and then replace
them with parameters).
I also submitted this patch for the Autumn commitfest.

The patch didn't survive the Summer bitrotfest.  Could you please rebase it?


Attached please find a rebased version of the patch.
Here are the updated performance results (pgbench -s 100 -c 1):

protocol (-M)         read-write   read-only (-S)
simple                      3327            19325
extended                    2256            16908
prepared                    6145            39326
simple+autoprepare          4950            34841



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index e3eb0c5..17f3dfd 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3688,6 +3688,454 @@ raw_expression_tree_walker(Node *node,
 }
 
 /*
+ * raw_expression_tree_mutator --- transform a raw parse tree.
+ *
+ * This function implements a slightly different approach to tree updates than expression_tree_mutator().
+ * The callback is given a pointer to the pointer to the current node and can update that field instead of returning a reference to a new node.
+ * This makes it possible to remember changes and easily revert them without an extra traversal of the tree.
+ *
+ * This function does not need the QTW_DONT_COPY_QUERY flag: it never implicitly copies tree nodes, updating them in place instead.
+ *
+ * Like raw_expression_tree_walker, there is no special rule about query
+ * boundaries: we descend to everything that's possibly interesting.
+ *
+ * Currently, the node type coverage here extends only to DML statements
+ * (SELECT/INSERT/UPDATE/DELETE) and nodes that can appear in them, because
+ * this is used mainly during analysis of CTEs, and only DML statements can
+ * appear in CTEs. If some other node is visited, iteration stops immediately and true is returned.
+ */
+bool
+raw_expression_tree_mutator(Node *node,
+			bool (*mutator) (),
+			void *context)
+{
+	ListCell   *temp;
+
+	/*
+	 * The walker has already visited the current node, and so we need only
+	 * recurse into any sub-nodes it has.
+	 */
+	if (node == NULL)
+		return false;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	switch (nodeTag(node))
+	{
+		case T_SetToDefault:
+		case T_CurrentOfExpr:
+		case T_Integer:
+		case T_Float:
+		case T_String:
+		case T_BitString:
+		case T_Null:
+		case T_ParamRef:
+		case T_A_Const:
+		case T_A_Star:
+			/* primitive node types with no subnodes */
+			break;
+		case T_Alias:
+			/* we assume the colnames list isn't interesting */
+			break;
+		case T_RangeVar:
+			return mutator(&((RangeVar *) node)->alias, context);
+		case T_GroupingFunc:
+			return mutator(&((GroupingFunc *) node)->args, context);
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				if (mutator(&sublink->testexpr, context))
+					return true;
+				/* we assume the operName is not interesting */
+				if (mutator(&sublink->subselect, context))
+					return true;
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				if (mutator(&caseexpr->arg, context))
+					return true;
+				/* we assume the mutator doesn't care about CaseWhens, either */
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = (CaseWhen *) lfirst(temp);
+
+					Assert(IsA(when, CaseWhen));
+					if (mutator(&when->expr, context))
+						return true;
+					if (mutator(&when->result, context))
+						return true;
+				}
+				if (mutator(&caseexpr->defresult, context))
+					return true;
+			}
+			break;
+		case T_RowExpr:
+			/* Assume colnames isn't interesting */
+			return mutator(&((RowExpr *) node)->args, context);
+		case T_CoalesceExpr:
+			return mutator(&((CoalesceExpr *) node)->args, context);
+		case T_MinMaxExpr:
+			return mutator(&((MinMaxExpr *) node)->args, context);
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				if (mutator(&xexpr->named_args, context))
+					return true;
+				/* we assume the mutator doesn't care about arg_names */
+				if (mutator(&xexpr->args, context))
+					return true;
+			}
+			break;
+		case T_NullTest:
+			return mutator(&((NullTest *) node)->arg, context);
+		case T_BooleanTest:
+			return mutator(&((BooleanTest *) node)->arg, context);
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				if (mutator(&join->larg, context))
+					return true;
+				if (mutator(&join->rarg, context))
+					return true;
+				if (mutator(&join->quals, context))
+					return true;
+				if (mutator(&join->alias, context))
+					return true;
+				/* using list is deemed uninteresting */
+			}
+			break;
+		case T_IntoClause:
+			{
+				IntoClause *into 

Re: [HACKERS] Cached plans and statement generalization

2017-09-08 Thread Thomas Munro
On Fri, May 26, 2017 at 3:54 AM, Konstantin Knizhnik
 wrote:
> Attached please find a rebased version of the autoprepare patch based on Tom's
> proposal (perform analysis of the tree with constant literals and then replace
> them with parameters).
> I also submitted this patch for the Autumn commitfest.

The patch didn't survive the Summer bitrotfest.  Could you please rebase it?

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Cached plans and statement generalization

2017-05-25 Thread Konstantin Knizhnik

On 10.05.2017 19:11, Konstantin Knizhnik wrote:


Based on Robert's feedback and Tom's proposal, I have implemented
two new versions of the autoprepare patch.


The first version is just a refactoring of my original implementation: I
have extracted common code into the prepare_cached_plan and
exec_prepared_plan
functions to avoid code duplication. I also rewrote the assignment of
values to parameters. Types of parameters are now inferred from the types
of literals, so there may be several
prepared plans which differ only in the types of their parameters. Due to
the problem with type coercion for parameters, I have to catch errors
in parse_analyze_varparams.




Attached please find a rebased version of the autoprepare patch based on
Tom's proposal (perform analysis of the tree with constant literals and then
replace them with parameters).

I also submitted this patch for the Autumn commitfest.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 95c1d3e..0b0642b 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3710,6 +3710,454 @@ raw_expression_tree_walker(Node *node,
 }
 
 /*
+ * raw_expression_tree_mutator --- transform a raw parse tree.
+ *
+ * This function implements a slightly different approach to tree updates than expression_tree_mutator().
+ * The callback is given a pointer to the pointer to the current node and can update that field instead of returning a reference to a new node.
+ * This makes it possible to remember changes and easily revert them without an extra traversal of the tree.
+ *
+ * This function does not need the QTW_DONT_COPY_QUERY flag: it never implicitly copies tree nodes, updating them in place instead.
+ *
+ * Like raw_expression_tree_walker, there is no special rule about query
+ * boundaries: we descend to everything that's possibly interesting.
+ *
+ * Currently, the node type coverage here extends only to DML statements
+ * (SELECT/INSERT/UPDATE/DELETE) and nodes that can appear in them, because
+ * this is used mainly during analysis of CTEs, and only DML statements can
+ * appear in CTEs. If some other node is visited, iteration stops immediately and true is returned.
+ */
+bool
+raw_expression_tree_mutator(Node *node,
+			bool (*mutator) (),
+			void *context)
+{
+	ListCell   *temp;
+
+	/*
+	 * The walker has already visited the current node, and so we need only
+	 * recurse into any sub-nodes it has.
+	 */
+	if (node == NULL)
+		return false;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	switch (nodeTag(node))
+	{
+		case T_SetToDefault:
+		case T_CurrentOfExpr:
+		case T_Integer:
+		case T_Float:
+		case T_String:
+		case T_BitString:
+		case T_Null:
+		case T_ParamRef:
+		case T_A_Const:
+		case T_A_Star:
+			/* primitive node types with no subnodes */
+			break;
+		case T_Alias:
+			/* we assume the colnames list isn't interesting */
+			break;
+		case T_RangeVar:
+			return mutator(&((RangeVar *) node)->alias, context);
+		case T_GroupingFunc:
+			return mutator(&((GroupingFunc *) node)->args, context);
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				if (mutator(&sublink->testexpr, context))
+					return true;
+				/* we assume the operName is not interesting */
+				if (mutator(&sublink->subselect, context))
+					return true;
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				if (mutator(&caseexpr->arg, context))
+					return true;
+				/* we assume the mutator doesn't care about CaseWhens, either */
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = (CaseWhen *) lfirst(temp);
+
+					Assert(IsA(when, CaseWhen));
+					if (mutator(&when->expr, context))
+						return true;
+					if (mutator(&when->result, context))
+						return true;
+				}
+				if (mutator(&caseexpr->defresult, context))
+					return true;
+			}
+			break;
+		case T_RowExpr:
+			/* Assume colnames isn't interesting */
+			return mutator(&((RowExpr *) node)->args, context);
+		case T_CoalesceExpr:
+			return mutator(&((CoalesceExpr *) node)->args, context);
+		case T_MinMaxExpr:
+			return mutator(&((MinMaxExpr *) node)->args, context);
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				if (mutator(&xexpr->named_args, context))
+					return true;
+				/* we assume the mutator doesn't care about arg_names */
+				if (mutator(&xexpr->args, context))
+					return true;
+			}
+			break;
+		case T_NullTest:
+			return mutator(&((NullTest *) node)->arg, context);
+		case T_BooleanTest:
+			return mutator(&((BooleanTest *) node)->arg, context);
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				if (mutator(&join->larg, context))
+					return true;
+				if (mutator(&join->rarg, context))
+					return true;
+				if (mutator(&join->quals, context))
+					return true;
+				if (mutator(&join->alias

Re: [HACKERS] Cached plans and statement generalization

2017-05-18 Thread Andres Freund
On 2017-05-18 11:57:57 +0300, Konstantin Knizhnik wrote:
> From my own experience I found out that PG_TRY/PG_CATCH mechanism is not
> providing proper cleanup (unlike C++ exceptions).

Right, simply because there's no portable way to transparently do so.
It would be possible on ELF glibc platforms, but ...


> If there are opened relations, catalog cache entries,... then throwing error
> will not release them.
> It will cause no problems if error is handled in PostgresMain which aborts
> current transaction and releases all resources in any case.
> But if I want to ignore this error and continue query execution, then
> warnings about resources leaks can be reported.
> Is this what you mean by the unsafety of PG_TRY/PG_CATCH constructions?

There's worse than just leaking resources.  Everything touching the
database might cause persistent corruption if you don't roll back.
Consider an insert that failed with a foreign key exception, done from
some function.  If you ignore that error, the row will still be visible,
but the foreign key will be violated.   If you want to continue after a
PG_CATCH you have to use a subtransaction/savepoint for the PG_TRY
contents, like several PLs do.
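
For reference, the subtransaction pattern referred to here (as used, for example, by plpgsql) looks roughly like this - a minimal sketch, with the risky operation left as a placeholder comment and the function name chosen for illustration only:

static bool
try_risky_operation(void)
{
	MemoryContext oldcontext = CurrentMemoryContext;
	ResourceOwner oldowner = CurrentResourceOwner;
	bool		ok = true;

	BeginInternalSubTransaction(NULL);
	/* run the protected code in the caller's memory context */
	MemoryContextSwitchTo(oldcontext);

	PG_TRY();
	{
		/* ... code that may throw, e.g. parse analysis of the query ... */

		ReleaseCurrentSubTransaction();		/* commit the subtransaction */
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;
	}
	PG_CATCH();
	{
		ErrorData  *edata;

		MemoryContextSwitchTo(oldcontext);
		edata = CopyErrorData();			/* save error info before cleanup */
		FlushErrorState();

		/* roll back the subtransaction: this undoes changes and releases resources */
		RollbackAndReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;

		FreeErrorData(edata);
		ok = false;							/* caller can now safely fall back */
	}
	PG_END_TRY();

	return ok;
}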

- Andres




Re: [HACKERS] Cached plans and statement generalization

2017-05-18 Thread Konstantin Knizhnik



On 15.05.2017 18:31, Robert Haas wrote:

On Wed, May 10, 2017 at 12:11 PM, Konstantin Knizhnik
 wrote:

Robert, can you please explain why using TRY/CATCH is not safe here:

This is definitely not a safe way of using TRY/CATCH.

This has been discussed many, many times on this mailing list before,
and I don't really want to go over it again here.  We really need a
README or some documentation about this so that we don't have to keep
answering this same question again and again.

First of all, I want to note that the new version of my patch does not use
PG_TRY/PG_CATCH.

But I still want to clarify for myself what is wrong with this construction.
I searched both the hackers mailing list archive and the web using Google,
but failed to find any references except about sharing non-volatile variables
between try and catch blocks.
Can you please point me to the thread where this problem was discussed,
or just explain the source of the problem in a few words?


From my own experience I found out that the PG_TRY/PG_CATCH mechanism
does not provide proper cleanup (unlike C++ exceptions).
If there are opened relations, catalog cache entries, etc., then throwing
an error will not release them.
This causes no problems if the error is handled in PostgresMain, which
aborts the current transaction and releases all resources in any case.
But if I want to ignore this error and continue query execution,
warnings about resource leaks can be reported.

Is this what you mean by the unsafety of the PG_TRY/PG_CATCH construction?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-05-15 Thread Robert Haas
On Wed, May 10, 2017 at 12:11 PM, Konstantin Knizhnik
 wrote:
> Robert, can you please explain why using TRY/CATCH is not safe here:
>>
>> This is definitely not a safe way of using TRY/CATCH.

This has been discussed many, many times on this mailing list before,
and I don't really want to go over it again here.  We really need a
README or some documentation about this so that we don't have to keep
answering this same question again and again.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cached plans and statement generalization

2017-05-12 Thread Bruce Momjian
On Fri, May 12, 2017 at 08:35:26PM +0300, Konstantin Knizhnik wrote:
> 
> 
> On 12.05.2017 18:23, Bruce Momjian wrote:
> >On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote:
> >>Definitely changing session context (search_path, date/time format, ...) may
> >>cause incorrect behavior of cached statements.
> >I wonder if we should clear the cache whenever any SET command is
> >issued.
> 
> Well, with autoprepare cache disabled on each set variable, alter system and
> any slow utility statement
> only one regression test is not passed. And only because of different error
> message context:

Great.  We have to think of how we would keep this reliable when future
changes happen.  Simple is often better because it limits the odds of
breakage.

> >>Actually you may get the same problem with explicitly prepared statements
> >>(certainly, in the last case, you better understand what going on and it is
> >>your choice whether to use or not to use prepared statement).
> >>
> >>The fact of failure of 7 regression tests means that autoprepare can really
> >>change behavior of existed program. This is why my suggestion is  to switch
> >>off this feature by default.
> >I would like to see us target something that can be enabled by default.
> >Even if it only improves performance by 5%, it would be better overall
> >than a feature that improves performance by 90% but is only used by 1%
> >of our users.
> 
> I have to agree with you here.

Good.  I know there are database systems that will try to get the most
performance possible no matter how complex it is for users to configure
or to maintain, but that isn't us.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Cached plans and statement generalization

2017-05-12 Thread Andres Freund


On May 12, 2017 12:50:41 AM PDT, Konstantin Knizhnik 
 wrote:

>Definitely changing session context (search_path, date/time format,
>...) 
>may cause incorrect behavior of cached statements.
>Actually you may get the same problem with explicitly prepared 
>statements (certainly, in the last case, you better understand what 
>going on and it is your choice whether to use or not to use prepared 
>statement).
>
>The fact of failure of 7 regression tests means that autoprepare can 
>really change behavior of existed program. This is why my suggestion is
> 
>to switch off this feature by default.

I can't see us agreeing with this feature unless it actually reliably works, 
even if disabled by default.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [HACKERS] Cached plans and statement generalization

2017-05-12 Thread Konstantin Knizhnik



On 12.05.2017 18:23, Bruce Momjian wrote:

On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote:

Definitely changing session context (search_path, date/time format, ...) may
cause incorrect behavior of cached statements.

I wonder if we should clear the cache whenever any SET command is
issued.


Well, with the autoprepare cache invalidated on each SET of a variable, ALTER SYSTEM,
and any slow utility statement,
only one regression test fails, and only because of a different
error message context:


*** /home/knizhnik/postgresql.master/src/test/regress/expected/date.out	2017-04-11 18:07:56.497461208 +0300
--- /home/knizhnik/postgresql.master/src/test/regress/results/date.out	2017-05-12 20:21:19.767566302 +0300

***
*** 1443,1452 
  --
  SELECT EXTRACT(MICROSEC  FROM DATE 'infinity'); -- ERROR:  timestamp units "microsec" not recognized
  ERROR:  timestamp units "microsec" not recognized
- CONTEXT:  SQL function "date_part" statement 1
  SELECT EXTRACT(UNDEFINED FROM DATE 'infinity'); -- ERROR:  timestamp units "undefined" not supported
  ERROR:  timestamp units "undefined" not supported
- CONTEXT:  SQL function "date_part" statement 1
  -- test constructors
  select make_date(2013, 7, 15);
   make_date
--- 1443,1450 

==




Actually you may get the same problem with explicitly prepared statements
(certainly, in the last case, you better understand what going on and it is
your choice whether to use or not to use prepared statement).

The fact of failure of 7 regression tests means that autoprepare can really
change behavior of existed program. This is why my suggestion is  to switch
off this feature by default.

I would like to see us target something that can be enabled by default.
Even if it only improves performance by 5%, it would be better overall
than a feature that improves performance by 90% but is only used by 1%
of our users.


I have to agree with you here.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-05-12 Thread Bruce Momjian
On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote:
> Definitely changing session context (search_path, date/time format, ...) may
> cause incorrect behavior of cached statements.

I wonder if we should clear the cache whenever any SET command is
issued.
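
For illustration, a minimal sketch of that idea at the point where autoprepare inspects the raw statement might look like this (autoprepare_reset_cache and the function name here are hypothetical helpers, not something taken from the patch):

static void
autoprepare_check_utility(Node *parse_tree)
{
	/* drop all cached plans whenever the session is changed by a SET command */
	if (IsA(parse_tree, VariableSetStmt))
		autoprepare_reset_cache();
}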

> Actually you may get the same problem with explicitly prepared statements
> (certainly, in the last case, you better understand what going on and it is
> your choice whether to use or not to use prepared statement).
> 
> The fact of failure of 7 regression tests means that autoprepare can really
> change behavior of existed program. This is why my suggestion is  to switch
> off this feature by default.

I would like to see us target something that can be enabled by default. 
Even if it only improves performance by 5%, it would be better overall
than a feature that improves performance by 90% but is only used by 1%
of our users.

> But in 99.9% real cases (my estimation plucked out of thin air:) there will
> be no such problems with autoprepare. And it can significantly improve
> performance of OLTP applications
> which are not able to use prepared statements (because of working through
> pgbouncer or any other reasons).

Right, but we can't ship something that is 99.9% accurate when the
inaccuracy is indeterminate.  The bottom line is that Postgres has a
very high bar, and I realize you are just prototyping at this point, but
the final product is going to have to address all the intricacies of the
issue for it to be added.

> Can autoprepare slow down the system?
> Yes, it can. It can happen if application perform larger number of unique
> queries and autoprepare cache size is not limited.
> In this case large (and infinitely growing) number of stored plans can
> consume a lot of memory and, what is even worse, slowdown cache lookup.
> This is why I by default limit number of cached statements
> (autoprepare_limit parameter) by 100.

Yes, good idea.

> I am almost sure that there will be some other issues with autoprepare which
> I have not encountered yet (because I mostly tested it on pgbench and
> Postgres regression tests).
> But I am also sure that benefit of doubling system performance is good
> motivation to continue work in this direction.

Right, you are still testing to see where the edges are.

> My main concern is whether to continue to improve current approach with
> local (per-backend) cache of prepared statements.
> Or create shared cache (as in Oracle). It is much more difficult to
> implement shared cache (the same problem with session context, different
> catalog snapshots, cache invalidation,...)
> but it also provides more opportunities for queries optimization and tuning.

I would continue in the per-backend cache direction at this point
because we don't even have that solved yet.  The global cache is going
to be even more complex.

Ultimately I think we are going to want global and local caches because
the plans of the local cache are much more likely to be accurate.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Cached plans and statement generalization

2017-05-12 Thread Konstantin Knizhnik



On 12.05.2017 03:58, Bruce Momjian wrote:

On Thu, May 11, 2017 at 10:41:45PM +0300, Konstantin Knizhnik wrote:

This is why I have provided second implementation which replace
literals with parameters after raw parsing.  Certainly it is slower
than first approach. But still provide significant advantage in
performance: more than two times at pgbench.  Then I tried to run
regression tests and find several situations where type analysis is
not correctly performed in case of replacing literals with parameters.

So the issue is that per-command output from the parser, SelectStmt,
only has strings for identifiers, e.g. table and column names, so you
can't be sure it is the same as the cached entry you matched.  I suppose
if you cleared the cache every time someone created an object or changed
search_path, it might work.

Definitely, changing the session context (search_path, date/time format, ...)
may cause incorrect behavior of cached statements.
Actually, you may get the same problem with explicitly prepared
statements (certainly, in the latter case, you better understand what is
going on, and it is your choice whether or not to use a prepared
statement).


The fact that 7 regression tests fail means that autoprepare can
really change the behavior of existing programs. This is why my suggestion is
to switch this feature off by default.
But in 99.9% of real cases (my estimation, plucked out of thin air :) there
will be no such problems with autoprepare. And it can significantly
improve the performance of OLTP applications
which are not able to use prepared statements (because they work
through pgbouncer or for other reasons).


Can autoprepare slow down the system?
Yes, it can. It can happen if an application performs a large number of
unique queries and the autoprepare cache size is not limited.
In this case the large (and infinitely growing) number of stored plans can
consume a lot of memory and, what is even worse, slow down cache lookup.
This is why by default I limit the number of cached statements
(the autoprepare_limit parameter) to 100.
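
A minimal sketch of how such a limit might be enforced with simple LRU eviction (illustrative only: the CachedStmt struct and helper names are assumptions, not taken from the patch):

typedef struct CachedStmt
{
	dlist_node	lru_node;			/* position in the LRU list */
	uint32		query_hash;			/* lookup key */
	CachedPlanSource *psrc;			/* the prepared plan */
} CachedStmt;

static dlist_head autoprepare_lru = DLIST_STATIC_INIT(autoprepare_lru);
static int	n_cached_stmts = 0;
static int	autoprepare_limit = 100;	/* a GUC in the real patch */

/* Move an entry to the head of the LRU list on every use */
static void
touch_cached_stmt(CachedStmt *stmt)
{
	dlist_move_head(&autoprepare_lru, &stmt->lru_node);
}

/* Evict least recently used statements until we are back under the limit */
static void
enforce_autoprepare_limit(void)
{
	while (n_cached_stmts > autoprepare_limit && !dlist_is_empty(&autoprepare_lru))
	{
		CachedStmt *victim = dlist_tail_element(CachedStmt, lru_node, &autoprepare_lru);

		dlist_delete(&victim->lru_node);
		DropCachedPlan(victim->psrc);	/* release the cached plan */
		/* the entry would also be removed from the lookup hash table here */
		n_cached_stmts--;
	}
}

/* Register a newly created entry and enforce the limit */
static void
remember_cached_stmt(CachedStmt *stmt)
{
	dlist_push_head(&autoprepare_lru, &stmt->lru_node);
	n_cached_stmts++;
	enforce_autoprepare_limit();
}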


I am almost sure that there will be some other issues with autoprepare
which I have not encountered yet (because I have mostly tested it on pgbench
and the Postgres regression tests).
But I am also sure that the benefit of doubling system performance is good
motivation to continue working in this direction.


My main concern is whether to continue to improve the current approach with
a local (per-backend) cache of prepared statements,
or to create a shared cache (as in Oracle). It is much more difficult to
implement a shared cache (the same problems with session context, different
catalog snapshots, cache invalidation, ...),
but it also provides more opportunities for query optimization and tuning.






--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Bruce Momjian
On Thu, May 11, 2017 at 10:41:45PM +0300, Konstantin Knizhnik wrote:
> This is why I have provided second implementation which replace
> literals with parameters after raw parsing.  Certainly it is slower
> than first approach. But still provide significant advantage in
> performance: more than two times at pgbench.  Then I tried to run
> regression tests and find several situations where type analysis is
> not correctly performed in case of replacing literals with parameters.

So the issue is that per-command output from the parser, SelectStmt,
only has strings for identifiers, e.g. table and column names, so you
can't be sure it is the same as the cached entry you matched.  I suppose
if you cleared the cache every time someone created an object or changed
search_path, it might work.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Konstantin Knizhnik

On 05/11/2017 10:52 PM, Andres Freund wrote:

On 2017-05-11 22:48:26 +0300, Konstantin Knizhnik wrote:

On 05/11/2017 09:31 PM, Tom Lane wrote:

Bruce Momjian  writes:

Good point.  I think we need to do some measurements to see if the
parser-only stage is actually significant.  I have a hunch that
commercial databases have much heavier parsers than we do.

FWIW, gram.y does show up as significant in many of the profiles I take.
I speculate that this is not so much that it eats many CPU cycles, as that
the constant tables are so large as to incur lots of cache misses.  scan.l
is not quite as big a deal for some reason, even though it's also large.

regards, tom lane

Yes, my results shows that pg_parse_query adds not so much overhead:
206k TPS for my first variant with string literal substitution and modified 
query text used as hash key vs.
181k. TPS for version with patching raw parse tree constructed by 
pg_parse_query.

Those numbers and your statement seem to contradict each other?


Oops, my parse error :( I incorrectly read Tom's statement.
Actually, I also was afraid that the cost of parsing would be large, and this is
why my first attempt was to avoid parsing.
But then I found out that most of the time is spent in analysis and planning (see
the attached profile):

pg_parse_query: 4.23%
pg_analyze_and_rewrite: 8.45%
pg_plan_queries: 15.49%



- Andres



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Andres Freund
On 2017-05-11 22:48:26 +0300, Konstantin Knizhnik wrote:
> On 05/11/2017 09:31 PM, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > Good point.  I think we need to do some measurements to see if the
> > > parser-only stage is actually significant.  I have a hunch that
> > > commercial databases have much heavier parsers than we do.
> > FWIW, gram.y does show up as significant in many of the profiles I take.
> > I speculate that this is not so much that it eats many CPU cycles, as that
> > the constant tables are so large as to incur lots of cache misses.  scan.l
> > is not quite as big a deal for some reason, even though it's also large.
> > 
> > regards, tom lane
> Yes, my results shows that pg_parse_query adds not so much overhead:
> 206k TPS for my first variant with string literal substitution and modified 
> query text used as hash key vs.
> 181k. TPS for version with patching raw parse tree constructed by 
> pg_parse_query.

Those numbers and your statement seem to contradict each other?

- Andres




Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Konstantin Knizhnik

On 05/11/2017 09:31 PM, Tom Lane wrote:

Bruce Momjian  writes:

Good point.  I think we need to do some measurements to see if the
parser-only stage is actually significant.  I have a hunch that
commercial databases have much heavier parsers than we do.

FWIW, gram.y does show up as significant in many of the profiles I take.
I speculate that this is not so much that it eats many CPU cycles, as that
the constant tables are so large as to incur lots of cache misses.  scan.l
is not quite as big a deal for some reason, even though it's also large.

regards, tom lane

Yes, my results show that pg_parse_query does not add much overhead:
206k TPS for my first variant with string literal substitution and the modified
query text used as a hash key, vs.
181k TPS for the version that patches the raw parse tree constructed by
pg_parse_query.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Konstantin Knizhnik

On 05/11/2017 06:12 PM, Bruce Momjian wrote:

On Wed, May 10, 2017 at 07:11:07PM +0300, Konstantin Knizhnik wrote:

I am going to continue work on this patch I will be glad to receive any
feedback and suggestions for its improvement.
In most cases, applications are not accessing Postgres directly, but using
some connection pooling layer and so them are not able to use prepared
statements.
But at simple OLTP Postgres spent more time on building query plan than on
execution itself. And it is possible to speedup Postgres about two times at
such workload!
Another alternative is true shared plan cache.  May be it is even more
perspective approach, but definitely much more invasive and harder to
implement.

Can we back up and get an overview of what you are doing and how you are
doing it?  Our TODO list suggests this order for successful patches:

Desirability -> Design -> Implement -> Test -> Review -> Commit

You kind of started at the Implementation/patch level, which makes it
hard to evaluate.

I think everyone agrees on the Desirability of the feature, but the
Design is the tricky part.  I think the design questions are:

*  What information is stored about cached plans?
*  How are the cached plans invalidated?
*  How is a query matched against a cached plan?

Looking at the options, ideally the plan would be cached at the same
query stage as the stage where the incoming query is checked against the
cache.  However, caching and checking at the same level offers no
benefit, so they are going to be different.  For example, caching a
parse tree at the time it is created, then checking at the same point if
the incoming query is the same doesn't help you because you already had
to create the parse tree to get to that point.

A more concrete example is prepared statements.  They are stored at the
end of planning and matched in the parser.  However, you can easily do
that since the incoming query specifies the name of the prepared query,
so there is no trick to matching.

The desire is to cache as late as possible so you cache more work and
you have more detail about the referenced objects, which helps with
cache invalidation.  However, you also want to do cache matching as
early as possible to improve performance.

So, let's look at some options.  One interesting idea from Doug Doole
was to do it between the tokenizer and parser.  I think they are glued
together so you would need a way to run the tokenizer separately and
compare that to the tokens you stored for the cached plan.  The larger
issue is that prepared plans already are checked after parsing, and we
know they are a win, so matching any earlier than that just seems like
overkill and likely to lead to lots of problems.

So, you could do it after parsing but before parse-analysis, which is
kind of what prepared queries do.  One tricky problem is that we don't
bind the query string tokens to database objects until after parse
analysis.

Doing matching before parse-analysis is going to be tricky, which is why
there are so many comments about the approach.  Changing search_path can
certainly affect it, but creating objects in earlier-mentioned schemas
can also change how an object reference in a query is resolved.  Even
obscure things like the creation of a new operator that has higher
precedence in the query could change the plan, though am not sure if
our prepared query system even handles that properly.

Anyway, that is my feedback.  I would like to get an overview of what
you are trying to do and the costs/benefits of each option so we can
best guide you.


Sorry for the lack of an overview.
I started with a small prototype just to investigate whether such an optimization
makes sense or not.
When I got more than a two times advantage in performance on standard pgbench, I
came to the conclusion that this
optimization can be really very useful, and now I am trying to find the best way to
implement it.

I started with the simplest approach, where string literals are replaced with
parameters. It is done before parsing
and can be done very fast - you just need to locate the data in quotes.
But this approach is neither safe nor universal: you will not be able to speed up
most existing queries without rewriting them.

This is why I have provided a second implementation, which replaces literals with
parameters after raw parsing.
Certainly it is slower than the first approach, but it still provides a significant
advantage in performance: more than two times at pgbench.
Then I tried to run the regression tests and found several situations where type
analysis is not correctly performed when literals are replaced with
parameters.

So my third attempt is to replace constant nodes with parameters in the already
analyzed tree.

Now answering your questions:

*  What information is stored about cached plans?

The key used to locate a cached plan is the raw parse tree; the value is the saved CachedPlanSource (a sketch of such a lookup appears after this list).

*  How are the cached plans invalidated?

In the same way as plans for explicitly prepared statements.

*  How is a 
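
A minimal sketch of what such a lookup might look like (the names are hypothetical and the key handling is simplified - hash collisions are ignored here, whereas the real patch can compare the full trees):

typedef struct AutoprepareEntry
{
	uint32		query_hash;			/* hash of the generalized raw parse tree (key) */
	CachedPlanSource *psrc;			/* saved plan source, NULL until built */
} AutoprepareEntry;

static HTAB *autoprepare_hash = NULL;

static AutoprepareEntry *
autoprepare_lookup(Node *generalized_parse_tree, bool *found)
{
	char	   *tree_str;
	uint32		hash;

	if (autoprepare_hash == NULL)
	{
		HASHCTL		ctl;

		MemSet(&ctl, 0, sizeof(ctl));
		ctl.keysize = sizeof(uint32);
		ctl.entrysize = sizeof(AutoprepareEntry);
		autoprepare_hash = hash_create("autoprepare cache", 128, &ctl,
									   HASH_ELEM | HASH_BLOBS);
	}

	/* serialize the generalized tree (literals already replaced) and hash it */
	tree_str = nodeToString(generalized_parse_tree);
	hash = DatumGetUInt32(hash_any((const unsigned char *) tree_str,
								   strlen(tree_str)));
	pfree(tree_str);

	return (AutoprepareEntry *) hash_search(autoprepare_hash, &hash,
											HASH_ENTER, found);
}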

Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Andres Freund


On May 11, 2017 11:31:02 AM PDT, Tom Lane  wrote:
>Bruce Momjian  writes:
>> Good point.  I think we need to do some measurements to see if the
>> parser-only stage is actually significant.  I have a hunch that
>> commercial databases have much heavier parsers than we do.
>
>FWIW, gram.y does show up as significant in many of the profiles I
>take.
>I speculate that this is not so much that it eats many CPU cycles, as
>that
>the constant tables are so large as to incur lots of cache misses. 
>scan.l
>is not quite as big a deal for some reason, even though it's also
>large.

And that there are a lot of unpredictable branches, leading to a lot of pipeline
stalls.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Tom Lane
Bruce Momjian  writes:
> Good point.  I think we need to do some measurements to see if the
> parser-only stage is actually significant.  I have a hunch that
> commercial databases have much heavier parsers than we do.

FWIW, gram.y does show up as significant in many of the profiles I take.
I speculate that this is not so much that it eats many CPU cycles, as that
the constant tables are so large as to incur lots of cache misses.  scan.l
is not quite as big a deal for some reason, even though it's also large.

regards, tom lane




Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Bruce Momjian
On Thu, May 11, 2017 at 05:39:58PM +, Douglas Doole wrote:
> One interesting idea from Doug Doole was to do it between the tokenizer 
> and
> parser.  I think they are glued together so you would need a way to run 
> the
> tokenizer separately and compare that to the tokens you stored for the
> cached plan.
> 
> 
> When I did this, we had the same problem that the tokenizer and parser were
> tightly coupled. Fortunately, I was able to do as you suggest and run the
> tokenizer separately to do my analysis. 
> 
> So my model was to do statement generalization before entering the compiler at
> all. I would tokenize the statement to find the literals and generate a new
> statement string with placeholders. The new string would the be passed to the
> compiler which would then tokenize and parse the reworked statement.
> 
> This means we incurred the cost of tokenizing twice, but the tokenizer was
> lightweight enough that it wasn't a problem. In exchange I was able to do
> statement generalization without touching the compiler - the compiler saw the
> generalized statement text as any other statement and handled it in the exact
> same way. (There was just a bit of new code around variable binding.)

Good point.  I think we need to do some measurements to see if the
parser-only stage is actually significant.  I have a hunch that
commercial databases have much heavier parsers than we do.

This split would also not work if the scanner feeds changes back into
the parser.  I know C does that for typedefs but I don't think we do.

Ideally I would like to see percentage-of-execution numbers for typical
queries for scan, parse, parse-analysis, plan, and execute to see where
the wins are.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Douglas Doole
>
> One interesting idea from Doug Doole was to do it between the tokenizer
> and parser.  I think they are glued together so you would need a way to run
> the tokenizer separately and compare that to the tokens you stored for the
> cached plan.
>

When I did this, we had the same problem that the tokenizer and parser were
tightly coupled. Fortunately, I was able to do as you suggest and run the
tokenizer separately to do my analysis.

So my model was to do statement generalization before entering the compiler
at all. I would tokenize the statement to find the literals and generate a
new statement string with placeholders. The new string would then be passed
to the compiler which would then tokenize and parse the reworked statement.

This means we incurred the cost of tokenizing twice, but the tokenizer was
lightweight enough that it wasn't a problem. In exchange I was able to do
statement generalization without touching the compiler - the compiler saw
the generalized statement text as any other statement and handled it in the
exact same way. (There was just a bit of new code around variable binding.)
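
As a standalone illustration of this kind of pre-parser generalization (a sketch under the stated assumptions, not the implementation Doug describes), the following C function replaces quoted string and numeric literals with $n placeholders:

#include <ctype.h>
#include <stdio.h>

/* Replace quoted string and numeric literals with $1, $2, ... placeholders. */
static void
generalize_statement(const char *src, char *dst, size_t dstlen)
{
	int		paramno = 0;
	size_t	o = 0;

	for (size_t i = 0; src[i] != '\0' && o + 16 < dstlen; )
	{
		if (src[i] == '\'')
		{
			/* skip the string literal, honouring '' escapes */
			i++;
			while (src[i] != '\0' && !(src[i] == '\'' && src[i + 1] != '\''))
			{
				if (src[i] == '\'' && src[i + 1] == '\'')
					i++;
				i++;
			}
			if (src[i] == '\'')
				i++;
			o += snprintf(dst + o, dstlen - o, "$%d", ++paramno);
		}
		else if (isdigit((unsigned char) src[i]) &&
				 (i == 0 || !isalnum((unsigned char) src[i - 1])))
		{
			/* skip the numeric literal */
			while (isdigit((unsigned char) src[i]) || src[i] == '.')
				i++;
			o += snprintf(dst + o, dstlen - o, "$%d", ++paramno);
		}
		else
			dst[o++] = src[i++];
	}
	dst[o] = '\0';
}

int
main(void)
{
	char	buf[256];

	generalize_statement("SELECT abalance FROM pgbench_accounts WHERE aid = 42",
						 buf, sizeof(buf));
	printf("%s\n", buf);	/* SELECT abalance FROM pgbench_accounts WHERE aid = $1 */
	return 0;
}

The generalized text can then be handed to the regular compiler path and also used as the cache key, which is the essence of the approach described above.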


Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Bruce Momjian
On Wed, May 10, 2017 at 07:11:07PM +0300, Konstantin Knizhnik wrote:
> I am going to continue work on this patch I will be glad to receive any
> feedback and suggestions for its improvement.
> In most cases, applications are not accessing Postgres directly, but using
> some connection pooling layer and so them are not able to use prepared
> statements.
> But at simple OLTP Postgres spent more time on building query plan than on
> execution itself. And it is possible to speedup Postgres about two times at
> such workload!
> Another alternative is true shared plan cache.  May be it is even more
> perspective approach, but definitely much more invasive and harder to
> implement.

Can we back up and get an overview of what you are doing and how you are
doing it?  Our TODO list suggests this order for successful patches:

Desirability -> Design -> Implement -> Test -> Review -> Commit

You kind of started at the Implementation/patch level, which makes it
hard to evaluate.

I think everyone agrees on the Desirability of the feature, but the
Design is the tricky part.  I think the design questions are:

*  What information is stored about cached plans?
*  How are the cached plans invalidated?
*  How is a query matched against a cached plan?

Looking at the options, ideally the plan would be cached at the same
query stage as the stage where the incoming query is checked against the
cache.  However, caching and checking at the same level offers no
benefit, so they are going to be different.  For example, caching a
parse tree at the time it is created, then checking at the same point if
the incoming query is the same doesn't help you because you already had
to create the parse tree to get to that point.

A more concrete example is prepared statements.  They are stored at the
end of planning and matched in the parser.  However, you can easily do
that since the incoming query specifies the name of the prepared query,
so there is no trick to matching.

The desire is to cache as late as possible so you cache more work and
you have more detail about the referenced objects, which helps with
cache invalidation.  However, you also want to do cache matching as
early as possible to improve performance.

So, let's look at some options.  One interesting idea from Doug Doole
was to do it between the tokenizer and parser.  I think they are glued
together so you would need a way to run the tokenizer separately and
compare that to the tokens you stored for the cached plan.  The larger
issue is that prepared plans already are checked after parsing, and we
know they are a win, so matching any earlier than that just seems like
overkill and likely to lead to lots of problems.

So, you could do it after parsing but before parse-analysis, which is
kind of what prepared queries do.  One tricky problem is that we don't
bind the query string tokens to database objects until after parse
analysis.

Doing matching before parse-analysis is going to be tricky, which is why
there are so many comments about the approach.  Changing search_path can
certainly affect it, but creating objects in earlier-mentioned schemas
can also change how an object reference in a query is resolved.  Even
obscure things like the creation of a new operator that has higher
precedence in the query could change the plan, though I am not sure if
our prepared query system even handles that properly.

Anyway, that is my feedback.  I would like to get an overview of what
you are trying to do and the costs/benefits of each option so we can
best guide you.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Cached plans and statement generalization

2017-05-10 Thread Konstantin Knizhnik

On 02.05.2017 21:26, Robert Haas wrote:


I am sympathetic to the fact that this is a hard problem to solve.
I'm just telling you that the way you've got it is not acceptable and
nobody's going to commit it like that (or if they do, they will end up
having to revert it).  If you want to have a technical discussion
about what might be a way to change the patch to be more acceptable,
cool, but I don't want to get into a long debate about whether what
you have is acceptable or not; I've already said what I think about
that and I believe that opinion will be widely shared.  I am not
trying to beat you up here, just trying to be clear.




Based on Robert's feedback and Tom's proposal, I have implemented two
new versions of the autoprepare patch.


The first version is just a refactoring of my original implementation: I have
extracted common code into the prepare_cached_plan and exec_prepared_plan
functions to avoid code duplication. I also rewrote the assignment of values
to parameters. Types of parameters are now inferred from the types of
literals, so there may be several
prepared plans which differ only in the types of their parameters. Due to
the problem with type coercion for parameters, I have to catch errors in
parse_analyze_varparams.


Robert, can you please explain why using TRY/CATCH is not safe here:

This is definitely not a safe way of using TRY/CATCH.


Second version is based on Tom's suggestion:

Personally I'd think about
replacing the entire literal-with-cast construct with a Param having
already-known type.
So here I first patch the raw parse tree, replacing A_Const with ParamRef;
this generalized tree is needed to perform the cache lookup.
Then I restore the original raw parse tree and call pg_analyze_and_rewrite.
Then I mutate the analyzed tree, replacing Const with Param nodes.
At this point type coercion has already been performed, and I know the precise
types which should be used for the parameters.
It seems to be a more sophisticated approach, and I cannot extract common
code into prepare_cached_plan,
because preparing a plan is currently a mix of the steps done in
exec_simple_query and exec_parse_message.

But there is no need to catch analyze errors.

Finally, the performance of both approaches is the same: at pgbench it is
180k TPS on read-only queries, compared with 80k TPS for non-prepared
queries.
In both cases 7 out of 177 regression tests fail (mostly
because the session environment is changed between subsequent query executions).


I am going to continue working on this patch, and I will be glad to receive any
feedback and suggestions for its improvement.
In most cases, applications are not accessing Postgres directly, but
through some connection pooling layer, and so they are not able to use
prepared statements.
But at simple OLTP, Postgres spends more time on building the query plan than
on the execution itself. And it is possible to speed up Postgres about two
times on such a workload!
Another alternative is a true shared plan cache. Maybe it is an even more
promising approach, but definitely much more invasive and harder to
implement.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6e52eb7..f2eb0f5 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3696,6 +3696,454 @@ raw_expression_tree_walker(Node *node,
 }
 
 /*
+ * raw_expression_tree_mutator --- transform a raw parse tree.
+ *
+ * This function implements a slightly different approach to tree updates than expression_tree_mutator().
+ * The callback is given a pointer to the pointer to the current node and can update that field instead of returning a reference to a new node.
+ * This makes it possible to remember changes and easily revert them without an extra traversal of the tree.
+ *
+ * This function does not need the QTW_DONT_COPY_QUERY flag: it never implicitly copies tree nodes, updating them in place instead.
+ *
+ * Like raw_expression_tree_walker, there is no special rule about query
+ * boundaries: we descend to everything that's possibly interesting.
+ *
+ * Currently, the node type coverage here extends only to DML statements
+ * (SELECT/INSERT/UPDATE/DELETE) and nodes that can appear in them, because
+ * this is used mainly during analysis of CTEs, and only DML statements can
+ * appear in CTEs. If some other node is visited, iteration stops immediately and true is returned.
+ */
+bool
+raw_expression_tree_mutator(Node *node,
+			bool (*mutator) (),
+			void *context)
+{
+	ListCell   *temp;
+
+	/*
+	 * The walker has already visited the current node, and so we need only
+	 * recurse into any sub-nodes it has.
+	 */
+	if (node == NULL)
+		return false;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	switch (nodeTag(node))
+	{
+		case T_SetToDefault:
+		case T_CurrentOfExpr:
+		case T_Integer:
+		case T_Float:
+		case T_String:
+		case T_BitString:
+		case T_Null:
+		case T_ParamR

Re: [HACKERS] Cached plans and statement generalization

2017-05-02 Thread Robert Haas
On Tue, May 2, 2017 at 5:50 AM, Konstantin Knizhnik
 wrote:
>> I don't see something with a bunch of hard-coded rules for particular type
>> OIDs having any chance of being acceptable.
>
> Well, what I need is ...

Regarding this...

> Definitely copying of code is bad flaw. It will be much better and easier
> just to call three original functions instead of mixing gathering their code
> into the new function.
> But I failed to do it because ...

...and also this:

I am sympathetic to the fact that this is a hard problem to solve.
I'm just telling you that the way you've got it is not acceptable and
nobody's going to commit it like that (or if they do, they will end up
having to revert it).  If you want to have a technical discussion
about what might be a way to change the patch to be more acceptable,
cool, but I don't want to get into a long debate about whether what
you have is acceptable or not; I've already said what I think about
that and I believe that opinion will be widely shared.  I am not
trying to beat you up here, just trying to be clear.

> Thanks for your feedback.

Sure thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cached plans and statement generalization

2017-05-02 Thread Konstantin Knizhnik



On 01.05.2017 18:52, Robert Haas wrote:
On Fri, Apr 28, 2017 at 6:01 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:



Any comments and suggestions for future improvement of this patch
are welcome.


+PG_TRY();
+{
+query = parse_analyze_varparams(parse_tree,
+query_string,
+ &param_types,
+ &num_params);
+}
+PG_CATCH();
+{
+/*
+ * In case of analyze errors revert back to original 
query processing
+ * and disable autoprepare for this query to avoid such 
problems in future.

+ */
+FlushErrorState();
+if (snapshot_set) {
+PopActiveSnapshot();
+}
+entry->disable_autoprepare = true;
+undo_query_plan_changes(parse_tree, const_param_list);
+MemoryContextSwitchTo(old_context);
+return false;
+}
+PG_END_TRY();

This is definitely not a safe way of using TRY/CATCH.

+
+/* Convert literal value to parameter value */
+switch (const_param->literal->val.type)
+{
+  /*
+   * Convert from integer literal
+   */
+  case T_Integer:
+switch (ptype) {
+  case INT8OID:
+params->params[paramno].value = 
Int64GetDatum((int64)const_param->literal->val.val.ival);

+break;
+  case INT4OID:
+params->params[paramno].value = 
Int32GetDatum((int32)const_param->literal->val.val.ival);

+break;
+  case INT2OID:
+if (const_param->literal->val.val.ival < SHRT_MIN
+|| const_param->literal->val.val.ival > SHRT_MAX)
+{
+ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("smallint out of range")));
+}
+params->params[paramno].value = 
Int16GetDatum((int16)const_param->literal->val.val.ival);

+break;
+  case FLOAT4OID:
+params->params[paramno].value = 
Float4GetDatum((float)const_param->literal->val.val.ival);

+break;
+  case FLOAT8OID:
+params->params[paramno].value = 
Float8GetDatum((double)const_param->literal->val.val.ival);

+break;
+  case INT4RANGEOID:
+sprintf(buf, "[%ld,%ld]", 
const_param->literal->val.val.ival, const_param->literal->val.val.ival);

+getTypeInputInfo(ptype, &typinput, &typioparam);
+params->params[paramno].value = 
OidInputFunctionCall(typinput, buf, typioparam, -1);

+break;
+  default:
+ pg_lltoa(const_param->literal->val.val.ival, buf);
+getTypeInputInfo(ptype, &typinput, &typioparam);
+params->params[paramno].value = 
OidInputFunctionCall(typinput, buf, typioparam, -1);

+}
+break;
+  case T_Null:
+params->params[paramno].isnull = true;
+break;
+  default:
+/*
+ * Convert from string literal
+ */
+getTypeInputInfo(ptype, &typinput, &typioparam);
+params->params[paramno].value = 
OidInputFunctionCall(typinput, const_param->literal->val.val.str, 
typioparam, -1);

+}

I don't see something with a bunch of hard-coded rules for particular 
type OIDs having any chance of being acceptable.




Well, what I need is to convert a literal value represented in a Value
struct to a parameter datum value.

The Value struct contains a union with an integer literal and text.
So this piece of code just provides efficient handling of the most common
cases (integer parameters) and falls back to the type's input function in other cases.



This patch seems to duplicate a large amount of existing code.  That 
would be a good thing to avoid.


Yes, I have to copy a lot of code from the exec_parse_message,
exec_bind_message and exec_execute_message functions.
Copying code is definitely a bad flaw. It would be much better and
easier to just call the three original functions instead of gathering
their code into a new function.

But I failed to do it because:
1. Autoprepare has to be integrated into exec_simple_query. Before
executing the query in the normal way, I need to perform a cache lookup for
a previously prepared plan for this generalized query.
And generalization of the query requires building the query tree (query
parsing). In other words, parsing has to be done before I can call
exec_parse_message.
2. exec_bind_message works with parameters passed by the client through
the libpq protocol, while autoprepare deals with values of 

Re: [HACKERS] Cached plans and statement generalization

2017-05-01 Thread Robert Haas
On Fri, Apr 28, 2017 at 6:01 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

>
> Any comments and suggestions for future improvement of this patch are
> welcome.
>

+PG_TRY();
+{
+query = parse_analyze_varparams(parse_tree,
+query_string,
+&param_types,
+&num_params);
+}
+PG_CATCH();
+{
+/*
+ * In case of analyze errors revert back to original query
processing
+ * and disable autoprepare for this query to avoid such
problems in future.
+ */
+FlushErrorState();
+if (snapshot_set) {
+PopActiveSnapshot();
+}
+entry->disable_autoprepare = true;
+undo_query_plan_changes(parse_tree, const_param_list);
+MemoryContextSwitchTo(old_context);
+return false;
+}
+PG_END_TRY();

This is definitely not a safe way of using TRY/CATCH.

+
+/* Convert literal value to parameter value */
+switch (const_param->literal->val.type)
+{
+  /*
+   * Convert from integer literal
+   */
+  case T_Integer:
+switch (ptype) {
+  case INT8OID:
+params->params[paramno].value =
Int64GetDatum((int64)const_param->literal->val.val.ival);
+break;
+  case INT4OID:
+params->params[paramno].value =
Int32GetDatum((int32)const_param->literal->val.val.ival);
+break;
+  case INT2OID:
+if (const_param->literal->val.val.ival < SHRT_MIN
+|| const_param->literal->val.val.ival > SHRT_MAX)
+{
+ereport(ERROR,
+
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("smallint out of range")));
+}
+params->params[paramno].value =
Int16GetDatum((int16)const_param->literal->val.val.ival);
+break;
+  case FLOAT4OID:
+params->params[paramno].value =
Float4GetDatum((float)const_param->literal->val.val.ival);
+break;
+  case FLOAT8OID:
+params->params[paramno].value =
Float8GetDatum((double)const_param->literal->val.val.ival);
+break;
+  case INT4RANGEOID:
+sprintf(buf, "[%ld,%ld]",
const_param->literal->val.val.ival, const_param->literal->val.val.ival);
+getTypeInputInfo(ptype, &typinput, &typioparam);
+params->params[paramno].value =
OidInputFunctionCall(typinput, buf, typioparam, -1);
+break;
+  default:
+pg_lltoa(const_param->literal->val.val.ival, buf);
+getTypeInputInfo(ptype, &typinput, &typioparam);
+params->params[paramno].value =
OidInputFunctionCall(typinput, buf, typioparam, -1);
+}
+break;
+  case T_Null:
+params->params[paramno].isnull = true;
+break;
+  default:
+/*
+ * Convert from string literal
+ */
+getTypeInputInfo(ptype, &typinput, &typioparam);
+params->params[paramno].value =
OidInputFunctionCall(typinput, const_param->literal->val.val.str,
typioparam, -1);
+}

I don't see something with a bunch of hard-coded rules for particular type
OIDs having any chance of being acceptable.

This patch seems to duplicate a large amount of existing code.  That would
be a good thing to avoid.

It could use a visit from the style police and a spell-checker, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Cached plans and statement generalization

2017-04-28 Thread Konstantin Knizhnik



On 26.04.2017 13:46, Pavel Stehule wrote:


I attach new patch which allows to limit the number of
autoprepared statements (autoprepare_limit GUC variable).
Also I did more measurements, now with several concurrent
connections and read-only statements.
Results of pgbench with 10 connections, scale 10 and read-only
statements are below:

Protocol               TPS
extended               87k
prepared               209k
simple+autoprepare     206k


As you can see, autoprepare provides more than 2 times speed
improvement.

Also I tried to measure overhead of parsing (to be able to
substitute all literals, not only string literals).
I just added extra call of pg_parse_query. Speed is reduced to 181k.
So overhead is noticeable, but still making such optimization useful.
This is why I want to ask question:  is it better to implement
slower but safer and more universal solution?


Unsafe solution has not any sense, and it is dangerous (80% of 
database users has not necessary knowledge). If somebody needs the max 
possible performance, then he use explicit prepared statements.




I attached a new patch to this mail. I have completely reimplemented my
original approach and now use parse tree transformation.

New pgbench (-S -c 10) results are the following:

Protocol               TPS
extended               87k
prepared               209k
simple+autoprepare     185k


So there is some slowdown compared with my original implementation and with
explicitly prepared statements, but it still provides a more than two times
speed-up compared with unprepared queries. And it doesn't require changing
existing applications.
Since most real production applications work with the DBMS
through some connection pool (pgbouncer, ...), I think that such an
optimization will be useful.
Isn't it interesting that we can increase system throughput almost two
times just by setting one parameter in the configuration file?


I also tried to enable autoprepare by default and ran the regression tests.
7 tests fail for the following reasons:
1. Slightly different error reporting (for example, the error location is not
always specified identically).
2. Differences in query behavior caused by changed local settings
(Andres gave an example with search_path, and the date test fails
because of a changed datestyle).
3. Problems with indirect dependencies (when a table is altered, only
cached plans directly depending on this relation are invalidated, but
not plans with indirect dependencies).
4. Domain checks are not performed for null values.

I do not think that these issues can cause problems for real applications.

It is also possible to limit the number of autoprepared statements using the
autoprepare_limit parameter, avoiding possible backend memory overflow when
a large number of unique queries is sent by the application. An LRU
discipline is used to drop the least recently used plans.


Any comments and suggestions for future improvement of this patch are 
welcome.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index cd39167..4fbc8b7 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3610,6 +3610,454 @@ raw_expression_tree_walker(Node *node,
 }
 
 /*
+ * raw_expression_tree_mutator --- transform a raw parse tree.
+ *
+ * This function implements a slightly different approach to tree updates than expression_tree_mutator().
+ * The callback is given a pointer to the pointer to the current node and can update this field instead of returning a reference to a new node.
+ * This makes it possible to remember changes and easily revert them without an extra traversal of the tree.
+ *
+ * This function does not need the QTW_DONT_COPY_QUERY flag: it never implicitly copies tree nodes, doing in-place updates instead.
+ *
+ * Like raw_expression_tree_walker, there is no special rule about query
+ * boundaries: we descend to everything that's possibly interesting.
+ *
+ * Currently, the node type coverage here extends only to DML statements
+ * (SELECT/INSERT/UPDATE/DELETE) and nodes that can appear in them, because
+ * this is used mainly during analysis of CTEs, and only DML statements can
+ * appear in CTEs. If some other node is visited, iteration stops immediately and true is returned.
+ */
+bool
+raw_expression_tree_mutator(Node *node,
+			bool (*mutator) (),
+			void *context)
+{
+	ListCell   *temp;
+
+	/*
+	 * The walker has already visited the current node, and so we need only
+	 * recurse into any sub-nodes it has.
+	 */
+	if (node == NULL)
+		return false;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	switch (nodeTag(node))
+	{
+		case T_SetToDefault:
+		case T_CurrentOfExpr:
+		case T_Integer:
+		case T_Float:
+		case T_String:
+		case T_BitString:
+		case T_Null:
+		case T_ParamRef:
+		case T_A_Const:
+		case T_A

Re: [HACKERS] Cached plans and statement generalization

2017-04-26 Thread Konstantin Knizhnik

On 04/26/2017 08:08 PM, Doug Doole wrote:


A naive option would be to invalidate anything that depends on table or 
view *.FOOBAR. You could probably make it a bit smarter by also requiring that 
schema A appear in the path.


This has been rumbling around in my head. I wonder if you could solve this 
problem by registering dependencies on objects which don't yet exist. Consider:

CREATE TABLE C.T1(...);
CREATE TABLE C.T2(...);
SET search_path='A,B,C,D';
SELECT * FROM C.T1, T2;

For T1, you'd register a hard dependency on C.T1 and no virtual dependencies 
since the table is explicitly qualified.

For T2, you'd register a hard dependency on C.T2 since that is the table that was selected for the query. You'd also register virtual dependencies on A.T2 and B.T2 since if either of those tables (or views) are created you need to recompile the 
statement. (Note that no virtual dependency is created on D.T2() since that table would never be selected by the compiler.)


The catch is that virtual dependencies would have to be recorded and searched 
as strings, not OIDs since the objects don't exist. Virtual dependencies only 
have to be checked during CREATE processing though, so that might not be too 
bad.

But this is getting off topic - I just wanted to capture the idea while it was 
rumbling around.


I think that it will be enough to handle modification of the search path and
invalidate the prepared statement cache in this case.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Cached plans and statement generalization

2017-04-26 Thread Doug Doole
>
> A naive option would be to invalidate anything that depends on table or
> view *.FOOBAR. You could probably make it a bit smarter by also requiring
> that schema A appear in the path.
>

This has been rumbling around in my head. I wonder if you could solve this
problem by registering dependencies on objects which don't yet exist.
Consider:

CREATE TABLE C.T1(...);
CREATE TABLE C.T2(...);
SET search_path='A,B,C,D';
SELECT * FROM C.T1, T2;

For T1, you'd register a hard dependency on C.T1 and no virtual
dependencies since the table is explicitly qualified.

For T2, you'd register a hard dependency on C.T2 since that is the table
that was selected for the query. You'd also register virtual dependencies
on A.T2 and B.T2 since if either of those tables (or views) are created you
need to recompile the statement. (Note that no virtual dependency is
created on D.T2() since that table would never be selected by the compiler.)

The catch is that virtual dependencies would have to be recorded and
searched as strings, not OIDs since the objects don't exist. Virtual
dependencies only have to be checked during CREATE processing though, so
that might not be too bad.
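To make the trigger point concrete, continuing the example above, the moment a
virtual dependency would fire could be sketched like this (purely illustrative;
no such mechanism exists in PostgreSQL today):

CREATE TABLE B.T2(anothercol int);   -- matches the virtual dependency recorded on B.T2
SELECT * FROM C.T1, T2;              -- must be recompiled: T2 now resolves to B.T2, not C.T2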

But this is getting off topic - I just wanted to capture the idea while it
was rumbling around.


Re: [HACKERS] Cached plans and statement generalization

2017-04-26 Thread Pavel Stehule
2017-04-26 12:30 GMT+02:00 Konstantin Knizhnik :

>
>
> On 26.04.2017 10:49, Konstantin Knizhnik wrote:
>
>
>
> On 26.04.2017 04:00, Tsunakawa, Takayuki wrote:   Are you considering some
> upper limit on the number of prepared statements?
> In this case we need some kind of LRU for maintaining cache of
> autoprepared statements.
> I think that it is good idea to have such limited cached - it can avoid
> memory overflow problem.
> I will try to implement it.
>
>
> I attach new patch which allows to limit the number of autoprepared
> statements (autoprepare_limit GUC variable).
> Also I did more measurements, now with several concurrent connections and
> read-only statements.
> Results of pgbench with 10 connections, scale 10 and read-only statements
> are below:
>
> Protocol               TPS
> extended               87k
> prepared               209k
> simple+autoprepare     206k
>
> As you can see, autoprepare provides more than 2 times speed improvement.
>
> Also I tried to measure overhead of parsing (to be able to substitute all
> literals, not only string literals).
> I just added extra call of pg_parse_query. Speed is reduced to 181k.
> So overhead is noticeable, but still making such optimization useful.
> This is why I want to ask question:  is it better to implement slower but
> safer and more universal solution?
>

An unsafe solution does not make any sense, and it is dangerous (80% of database
users do not have the necessary knowledge). If somebody needs the maximum
possible performance, then they can use explicit prepared statements.

Regards

Pavel


>
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>
>
>


Re: [HACKERS] Cached plans and statement generalization

2017-04-26 Thread Konstantin Knizhnik



On 26.04.2017 10:49, Konstantin Knizhnik wrote:



On 26.04.2017 04:00, Tsunakawa, Takayuki wrote:   Are you considering 
some upper limit on the number of prepared statements?
In this case we need some kind of LRU for maintaining cache of 
autoprepared statements.
I think that it is good idea to have such limited cached - it can 
avoid memory overflow problem.

I will try to implement it.


I attach a new patch which allows limiting the number of autoprepared
statements (the autoprepare_limit GUC variable).
I also did more measurements, now with several concurrent connections
and read-only statements.
Results of pgbench with 10 connections, scale 10 and read-only
statements are below:


Protocol               TPS
extended               87k
prepared               209k
simple+autoprepare     206k


As you can see, autoprepare provides more than 2 times speed improvement.

I also tried to measure the overhead of parsing (to be able to substitute
all literals, not only string literals).

I just added an extra call of pg_parse_query. Speed is reduced to 181k TPS.
So the overhead is noticeable, but it still leaves such an optimization useful.
This is why I want to ask the question: is it better to implement a slower
but safer and more universal solution?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f6be98b..0c9abfc 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -188,6 +188,7 @@ static bool IsTransactionStmtList(List *parseTrees);
 static void drop_unnamed_stmt(void);
 static void SigHupHandler(SIGNAL_ARGS);
 static void log_disconnections(int code, Datum arg);
+static bool exec_cached_query(const char *query_string);
 
 
 /* 
@@ -916,6 +917,14 @@ exec_simple_query(const char *query_string)
 	drop_unnamed_stmt();
 
 	/*
+	 * Try to find cached plan
+	 */
+	if (autoprepare_threshold != 0 && exec_cached_query(query_string))
+	{
+		return;
+	}
+
+	/*
 	 * Switch to appropriate context for constructing parsetrees.
 	 */
 	oldcontext = MemoryContextSwitchTo(MessageContext);
@@ -4500,3 +4509,606 @@ log_disconnections(int code, Datum arg)
 	port->user_name, port->database_name, port->remote_host,
   port->remote_port[0] ? " port=" : "", port->remote_port)));
 }
+
+typedef struct { 
+	char const*   query;
+	dlist_nodelru;
+	int64 exec_count;
+	CachedPlanSource* plan;	
+	int   n_params;
+	int16 format;
+	bool  disable_autoprepare;
+} plan_cache_entry;
+
+/*
+ * Replace string literals with parameters. We do not consider integer or real literals to avoid problems with
+ * negative numbers, user-defined operators, ... For example, it is not easy to distinguish the cases (-1), (1-1), (1-1)-1
+ */
+static void generalize_statement(const char *query_string, char** gen_query, char** query_params, int* n_params)
+{
+	size_t query_len = strlen(query_string);
+	char const* src = query_string;
+	char* dst;
+	char* params;
+	unsigned char ch;
+
+	*n_params = 0;
+
+	*gen_query = (char*)palloc(query_len*2); /* assume that we have less than 1000 parameters, the worst case is replacing '' with $999 */
+	*query_params = (char*)palloc(query_len + 1);
+	dst = *gen_query;
+	params = *query_params;
+
+	while ((ch = *src++) != '\0') { 
+		if (isspace(ch)) { 
+			/* Replace sequence of whitespaces with just one space */
+			while (*src && isspace(*(unsigned char*)src)) { 
+src += 1;
+			}
+			*dst++ = ' ';
+		} else if (ch == '\'') { 
+			while (true) { 
+ch = *src++;
+if (ch == '\'') { 
+	if (*src != '\'') { 
+		break;
+	} else {
+		/* escaped quote */
+		*params++ = '\'';
+		src += 1;
+	}
+} else { 
+	*params++ = ch;
+}
+			}
+			*params++ = '\0';
+			dst += sprintf(dst, "$%d", ++*n_params);
+		} else { 
+			*dst++ = ch;
+		}
+	}			
+	Assert(dst <= *gen_query + query_len);
+	Assert(params <= *query_params + query_len*2);
+	*dst = '\0';
+}
+
+static uint32 plan_cache_hash_fn(const void *key, Size keysize)
+{
+	return string_hash(((plan_cache_entry*)key)->query, 0);
+}
+
+static int plan_cache_match_fn(const void *key1, const void *key2, Size keysize)
+{
+	return strcmp(((plan_cache_entry*)key1)->query, ((plan_cache_entry*)key2)->query);
+}
+
+static void* plan_cache_keycopy_fn(void *dest, const void *src, Size keysize)
+{ 
+	((plan_cache_entry*)dest)->query = pstrdup(((plan_cache_entry*)src)->query);
+return dest;
+}
+
+#define PLAN_CACHE_SIZE 113
+
+size_t nPlanCacheHits;
+size_t nPlanCacheMisses;
+
+/*
+ * Try to generalize query, find cached plan for it and execute
+ */
+static bool exec_cached_query(const char *query_string)
+{
+	CommandDest   dest = whereToSendOutput;
+	DestReceiver *receiver;
+	char *gen_query;
+	char *query_params;
+	int   n_params;
+	plan_cache_entry *entry;
+	bo

Re: [HACKERS] Cached plans and statement generalization

2017-04-26 Thread Konstantin Knizhnik



On 26.04.2017 04:00, Tsunakawa, Takayuki wrote:

From: pgsql-hackers-ow...@postgresql.org

[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Konstantin
Knizhnik
Well, first of all I want to share results I already get: pgbench with default
parameters,  scale 10 and one connection:

So autoprepare is as efficient as explicit prepare and can increase
performance almost two times.

This sounds great.

BTW, when are the autoprepared statements destroyed?
Right now they are destroyed only on receiving an invalidation
message (when the catalog is changed).
Prepared statements are local to a backend and are located in the backend's
memory.
It is unlikely that there will be so many different queries as to
cause memory overflow.

But in theory such a situation is certainly possible.



  Are you considering some upper limit on the number of prepared statements?
In this case we need some kind of LRU for maintaining the cache of
autoprepared statements.
I think it is a good idea to have such a limited cache - it can avoid
the memory overflow problem.

I will try to implement it.



Regards
Takayuki Tsunakawa




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-04-26 Thread Konstantin Knizhnik



On 26.04.2017 01:34, Andres Freund wrote:

Hi,

(FWIW, on this list we don't do top-quotes)

On 2017-04-25 22:21:22 +, Doug Doole wrote:

Plan invalidation was no different than for any SQL statement. DB2 keeps a
list of the objects the statement depends on. If any of the objects changes
in an incompatible way the plan is invalidated and kicked out of the cache.

I suspect what is more interesting is plan lookup. DB2 has something called
the "compilation environment". This is a collection of everything that
impacts how a statement is compiled (SQL path, optimization level, etc.).
Plan lookup is done using both the statement text and the compilation
environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your
path is ANDRES, MYTEAM, SYSIBM we will have different compilation
environments. If we both issue "SELECT * FROM T" we'll end up with
different cache entries even if T in both of our statements resolves to
MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then
execute "SELECT * FROM T" again, I have a new compilation environment so
the second invocation of the statement will create a new entry in the
cache. The first entry is not kicked out - it will still be there for
re-use if I change my SQL path back to my original value (modulo LRU for
cache memory management of course).

It's not always that simple, at least in postgres, unless you disregard
search_path.  Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_path = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?


There is the same problem with explicitly prepared statements, isn't there?
Certainly, when prepared statements are used it is the programmer's
responsibility to avoid such collisions.

In the case of autoprepare this is hidden from the programmer.
But there is a GUC variable controlling the autoprepare feature, and by default
it is switched off.
So if a programmer or DBA enables it, they should take the effects of such
a decision into account.


By the way, isn't it a bug in PostgreSQL that altering the search path does
not invalidate cached plans?
As I already mentioned, the same problem can be reproduced with
explicitly prepared statements.
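For example, a sketch of such a reproduction (reusing the schema layout from
Andres's example above; the exact behavior should of course be re-verified):

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_path = 'b,a';
PREPARE q AS SELECT * FROM foobar;     -- resolves to a.foobar at prepare time
CREATE TABLE b.foobar(anothercol int);
EXECUTE q;                             -- still runs against a.foobar; the cached plan is not invalidated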






Greetings,

Andres Freund


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-04-26 Thread Konstantin Knizhnik



On 26.04.2017 00:47, Andres Freund wrote:

On 2017-04-25 21:11:08 +, Doug Doole wrote:

When I did this in DB2, I didn't use the parser - it was too expensive. I
just tokenized the statement and used some simple rules to bypass the
invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
disallow replacement replacement until I hit the end of the current
subquery or statement.

How did you manage plan invalidation and such?


The same mechanism as for prepared statements:
cached plans are linked into the list by the SaveCachedPlan function and are
invalidated by PlanCacheRelCallback.





- Andres




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Konstantin
> Knizhnik
> Well, first of all I want to share results I already get: pgbench with default
> parameters,  scale 10 and one connection:
> 
> So autoprepare is as efficient as explicit prepare and can increase
> performance almost two times.

This sounds great.

BTW, when are the autoprepared statements destroyed?  Are you considering some 
upper limit on the number of prepared statements?

Regards
Takayuki Tsunakawa






Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Serge Rielau
On Tue, Apr 25, 2017 at 3:48 PM, Doug Doole  wrote:
It's not always that simple, at least in postgres, unless you disregard
search_path. Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_path = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

DB2 does handle this case. Unfortunately I don't know the details of how it
worked though.
A naive option would be to invalidate anything that depends on table or view
*.FOOBAR. You could probably make it a bit smarter by also requiring that
schema A appear in the path.

While this specific scenario does not arise in DB2, since it uses CURRENT
SCHEMA only for tables (much to my dislike), your example holds for functions
and types, which are resolved by path. For encapsulated SQL (in views,
functions) conservative semantics are enforced by including the timestamp.
For dynamic SQL the problem you describe does exist, though, and I think it is
handled in the way Doug describes. However, as noted by Doug, the topic of plan
invalidation is really orthogonal to normalizing the queries. All it does is
provide more opportunities to run into any pre-existing bugs.

Cheers
Serge

PS: I’m just starting to look at the plan invalidation code in PG because we
are dealing with potentially tens of thousands of cached SQL statements. So
these complete wipe-outs or walks of every plan in the cache don’t scale.

Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Finnerty, Jim

On 4/25/17, 6:34 PM, "pgsql-hackers-ow...@postgresql.org on behalf of Andres 
Freund"  
wrote:

It's not always that simple, at least in postgres, unless you disregard
search_path.  Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_path = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

Greetings,

Andres Freund




You may need to store the reloid of each relation in the cached query to ensure 
that the schema bindings are the same, and also invalidate dependent cache 
entries if a referenced relation is dropped. 

Regards,

Jim Finnerty




Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Doug Doole
>
> (FWIW, on this list we don't do top-quotes)
>

I know. Forgot and just did "reply all". My bad.

It's not always that simple, at least in postgres, unless you disregard
> search_path.  Consider e.g. cases like
>
> CREATE SCHEMA a;
> CREATE SCHEMA b;
> CREATE TABLE a.foobar(somecol int);
> SET search_path = 'b,a';
> SELECT * FROM foobar;
> CREATE TABLE b.foobar(anothercol int);
> SELECT * FROM foobar; -- may not be cached plan from before!
>
> it sounds - my memory of DB2 is very faint, and I never used it much -
> like similar issues could arise in DB2 too?
>

DB2 does handle this case. Unfortunately I don't know the details of how it
worked though.

A naive option would be to invalidate anything that depends on table or
view *.FOOBAR. You could probably make it a bit smarter by also requiring
that schema A appear in the path.

- Doug


Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread David G. Johnston
On Tue, Apr 25, 2017 at 3:24 PM, David Fetter  wrote:

> I don't have an exploit yet.  What concerns me is attackers' access to
> what is in essence the ability to poke at RULEs when they only have
> privileges to read.
>

If they want to see how it works they can read the source code.  In terms
of runtime data it would limited to whatever the session itself created.
In most cases the presence of the cache would be invisible.  I suppose it
might appear if one were to explain a query, reset the session, explain
another query and then re-explain the original.  If the chosen plan in the
second pass differed because of the presence of the leading query it would
be noticeable but not revealing.  Albeit I'm a far cry from a security
expert...

David J.


Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Andres Freund
Hi,

(FWIW, on this list we don't do top-quotes)

On 2017-04-25 22:21:22 +, Doug Doole wrote:
> Plan invalidation was no different than for any SQL statement. DB2 keeps a
> list of the objects the statement depends on. If any of the objects changes
> in an incompatible way the plan is invalidated and kicked out of the cache.
> 
> I suspect what is more interesting is plan lookup. DB2 has something called
> the "compilation environment". This is a collection of everything that
> impacts how a statement is compiled (SQL path, optimization level, etc.).
> Plan lookup is done using both the statement text and the compilation
> environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your
> path is ANDRES, MYTEAM, SYSIBM we will have different compilation
> environments. If we both issue "SELECT * FROM T" we'll end up with
> different cache entries even if T in both of our statements resolves to
> MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then
> execute "SELECT * FROM T" again, I have a new compilation environment so
> the second invocation of the statement will create a new entry in the
> cache. The first entry is not kicked out - it will still be there for
> re-use if I change my SQL path back to my original value (modulo LRU for
> cache memory management of course).

It's not always that simple, at least in postgres, unless you disregard
search_path.  Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_path = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

Greetings,

Andres Freund




Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread David Fetter
On Tue, Apr 25, 2017 at 11:35:21PM +0300, Konstantin Knizhnik wrote:
> On 04/25/2017 07:54 PM, David Fetter wrote:
> > On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:
> > > On 24.04.2017 21:43, Andres Freund wrote:
> > > > Hi,
> > > > 
> > > > On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> > > > > So what I am thinking now is implicit query caching. If the same 
> > > > > query with
> > > > > different literal values is repeated many times, then we can try to
> > > > > generalize this query and replace it with prepared query with
> > > > > parameters.
> > > > That's not actuall all that easy:
> > > > - You pretty much do parse analysis to be able to do an accurate match.
> > > > How much overhead is parse analysis vs. planning in your cases?
> > > > - The invalidation infrastructure for this, if not tied to to fully
> > > > parse-analyzed statements, is going to be hell.
> > > > - Migrating to parameters can actually cause significant slowdowns, not
> > > > nice if that happens implicitly.
> > > Well, first of all I want to share results I already get: pgbench with
> > > default parameters,  scale 10 and one connection:
> > > 
> > > protocol               TPS
> > > simple                 3492
> > > extended               2927
> > > prepared               6865
> > > simple + autoprepare   6844
> > If this is string mashing on the unparsed query, as it appears to be,
> > it's going to be a perennial source of security issues.
> 
> Sorry, may be I missed something, but I can not understand how
> security can be violated by extracting string literals from query. I
> am just copying bytes from one buffer to another. I do not try to
> somehow interpret this parameters.  What I am doing is very similar
> with standard prepared statements.  And moreover query is parsed!
> Only query which was already parsed and executed (but with different
> values of parameters) can be autoprepared.

I don't have an exploit yet.  What concerns me is attackers' access to
what is in essence the ability to poke at RULEs when they only have
privileges to read.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Doug Doole
Plan invalidation was no different than for any SQL statement. DB2 keeps a
list of the objects the statement depends on. If any of the objects changes
in an incompatible way the plan is invalidated and kicked out of the cache.

I suspect what is more interesting is plan lookup. DB2 has something called
the "compilation environment". This is a collection of everything that
impacts how a statement is compiled (SQL path, optimization level, etc.).
Plan lookup is done using both the statement text and the compilation
environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your
path is ANDRES, MYTEAM, SYSIBM we will have different compilation
environments. If we both issue "SELECT * FROM T" we'll end up with
different cache entries even if T in both of our statements resolves to
MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then
execute "SELECT * FROM T" again, I have a new compilation environment so
the second invocation of the statement will create a new entry in the
cache. The first entry is not kicked out - it will still be there for
re-use if I change my SQL path back to my original value (modulo LRU for
cache memory management of course).

With literal replacement, the cache entry is on the modified statement
text. Given the modified statement text and the compilation environment,
you're guaranteed to get the right plan entry.
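Mapped onto PostgreSQL-style syntax, the lookup behavior described here would
act roughly as follows (an illustration of the DB2 idea only, not of any
existing PostgreSQL feature; the schema names are placeholders):

SET search_path = 'doug, myteam';     -- compilation environment #1
SELECT * FROM t;                      -- cached under (statement text, environment #1)
SET search_path = 'andres, myteam';   -- compilation environment #2
SELECT * FROM t;                      -- new cache entry under (statement text, environment #2)
SET search_path = 'doug, myteam';     -- back to environment #1: the first entry is reused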

On Tue, Apr 25, 2017 at 2:47 PM Andres Freund  wrote:

> On 2017-04-25 21:11:08 +, Doug Doole wrote:
> > When I did this in DB2, I didn't use the parser - it was too expensive. I
> > just tokenized the statement and used some simple rules to bypass the
> > invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
> > disallow replacement replacement until I hit the end of the current
> > subquery or statement.
>
> How did you manage plan invalidation and such?
>
> - Andres
>


Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Andres Freund
On 2017-04-25 21:11:08 +, Doug Doole wrote:
> When I did this in DB2, I didn't use the parser - it was too expensive. I
> just tokenized the statement and used some simple rules to bypass the
> invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
> disallow replacement replacement until I hit the end of the current
> subquery or statement.

How did you manage plan invalidation and such?

- Andres




Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Doug Doole
When I did this in DB2, I didn't use the parser - it was too expensive. I
just tokenized the statement and used some simple rules to bypass the
invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
disallow replacement replacement until I hit the end of the current
subquery or statement.

There are a few limitations to this approach. For example, DB2 allowed you
to cast using function notation like VARCHAR(foo, 10). This meant I would
never replace the second parameter of any VARCHAR function. Now it's
possible that when the statement was fully compiled we'd find that
VARCHAR(foo,10) actually resolved to BOB.VARCHAR() instead of the built-in
cast function. Our thinking that these cases were rare enough that we
wouldn't worry about them. (Of course, PostgreSQL's ::VARCHAR(10) syntax
avoids this problem completely.)

Because SQL is so structured, the implementation ended up being quite
simple (a few hundred line of code) with no significant maintenance issues.
(Other developers had no problem adding in new cases where constants had to
be preserved.)

The simple tokenizer was also fairly extensible. I'd prototyped using the
same code to also normalize statements (uppercase all keywords, collapse
whitespace to a single blank, etc.) but that feature was never added to the
product.

- Doug

On Tue, Apr 25, 2017 at 1:47 PM Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> On 04/25/2017 11:40 PM, Serge Rielau wrote:
>
>
> On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <
> k.knizh...@postgrespro.ru> wrote:
>
>
> SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;
>
> You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.
>
>
> I am substituting only string literals. So the query above will be
> transformed to
>
> SELECT $1::CHAR(10) || $2, 5 + 6;
>
> What's wrong with it?
>
>
> Oh, well that leaves a lot of opportunities on the table, doesn’t it?
>
>
> Well, actually my primary intention was not to make badly designed
> programs (not using prepared statements) work faster.
> I wanted to address cases when it is not possible to use prepared
> statements.
> If we want to substitute with parameters as much literals as possible,
> then parse+deparse tree seems to be the only reasonable approach.
> I will try to implement it also, just to estimate parsing overhead.
>
>
>
>
> Cheers
> Serge
>
>
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Konstantin Knizhnik

On 04/25/2017 11:40 PM, Serge Rielau wrote:



On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:



SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.


I am substituting only string literals. So the query above will be transformed 
to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?


Oh, well that leaves a lot of opportunities on the table, doesn’t it?


Well, actually my primary intention was not to make badly designed programs
(those not using prepared statements) work faster.
I wanted to address cases where it is not possible to use prepared statements.
If we want to substitute parameters for as many literals as possible, then
the parse+deparse approach seems to be the only reasonable one.
I will also try to implement it, just to estimate the parsing overhead.





Cheers
Serge




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Serge Rielau

> On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik  
> wrote:
>> 
>> SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;
>> 
>> You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.
> 
> I am substituting only string literals. So the query above will be 
> transformed to 
> 
> SELECT $1::CHAR(10) || $2, 5 + 6;
> 
> What's wrong with it?

Oh, well that leaves a lot of opportunities on the table, doesn’t it?

Cheers
Serge



Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Konstantin Knizhnik

On 04/25/2017 08:09 PM, Serge Rielau wrote:


On Tue, Apr 25, 2017 at 9:45 AM, Konstantin Knizhnik 
 wrote:

On 25.04.2017 19:12, Serge Rielau wrote:


On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:
Another problem is caused by using integer literals in context where 
parameters can not be used, for example "order by 1”.

You will also need to deal with modifiers in types such as VARCHAR(10). 
Not sure if there are specific functions which can only deal with literals (?) 
as well.

Sorry, I do not completely understand how presence of type modifiers can 
affect string literals used in query.
Can you provide me some example?

SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.


I am substituting only string literals. So the query above will be transformed 
to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?



Also some OLAP syntax like “rows preceding”

It pretty much boils down to whether you can do some shallow parsing rather 
than expending the effort to build the parse tree.

Cheers
Serge



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Konstantin Knizhnik

On 04/25/2017 07:54 PM, David Fetter wrote:

On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:

On 24.04.2017 21:43, Andres Freund wrote:

Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:

So what I am thinking now is implicit query caching. If the same query with
different literal values is repeated many times, then we can try to
generalize this query and replace it with prepared query with
parameters.

That's not actuall all that easy:
- You pretty much do parse analysis to be able to do an accurate match.
How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to to fully
parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not
nice if that happens implicitly.

Well, first of all I want to share results I already get: pgbench with
default parameters,  scale 10 and one connection:

protocol               TPS
simple                 3492
extended               2927
prepared               6865
simple + autoprepare   6844

If this is string mashing on the unparsed query, as it appears to be,
it's going to be a perennial source of security issues.


Sorry, maybe I missed something, but I cannot understand how security can be
violated by extracting string literals from a query. I am just copying bytes
from one buffer to another; I do not try to interpret these parameters in any
way. What I am doing is very similar to standard prepared statements.

And moreover, the query is parsed! Only a query which was already parsed and
executed (but with different parameter values) can be autoprepared.




Best,
David.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Serge Rielau
On Tue, Apr 25, 2017 at 9:45 AM, Konstantin Knizhnik  wrote:

On 25.04.2017 19:12, Serge Rielau wrote:

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:
Another problem is caused by using integer literals in contexts where
parameters cannot be used, for example "order by 1".

You will also need to deal with modifiers in types such as VARCHAR(10). Not
sure if there are specific functions which can only deal with literals (?) as
well.

Sorry, I do not completely understand how the presence of type modifiers can
affect string literals used in a query.
Can you provide an example?

SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.
Also some OLAP syntax like “rows preceding”.

It pretty much boils down to whether you can do some shallow parsing rather
than expending the effort to build the parse tree.

Cheers
Serge

Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread David Fetter
On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:
> On 24.04.2017 21:43, Andres Freund wrote:
> > Hi,
> > 
> > On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> > > So what I am thinking now is implicit query caching. If the same query 
> > > with
> > > different literal values is repeated many times, then we can try to
> > > generalize this query and replace it with prepared query with
> > > parameters.
> > That's not actuall all that easy:
> > - You pretty much do parse analysis to be able to do an accurate match.
> >How much overhead is parse analysis vs. planning in your cases?
> > - The invalidation infrastructure for this, if not tied to to fully
> >parse-analyzed statements, is going to be hell.
> > - Migrating to parameters can actually cause significant slowdowns, not
> >nice if that happens implicitly.
> 
> Well, first of all I want to share results I already get: pgbench with
> default parameters,  scale 10 and one connection:
> 
> protocol               TPS
> simple                 3492
> extended               2927
> prepared               6865
> simple + autoprepare   6844

If this is string mashing on the unparsed query, as it appears to be,
it's going to be a perennial source of security issues.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Konstantin Knizhnik



On 25.04.2017 19:12, Serge Rielau wrote:


On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:
Another problem is caused by using integer literals in context where 
parameters can not be used, for example "order by 1”.
You will also need to deal with modifiers in types such as 
VARCHAR(10). Not sure if there are specific functions which can only 
deal with literals (?) as well.


Sorry, I do not completely understand how the presence of type modifiers can
affect string literals used in a query.

Can you provide an example?



Doug Doole did this work in DB2 LUW and he may be able to point to 
more places to watch out for semantically.


Generally, in my experience, this feature is very valuable when 
dealing with (poorly designed) web apps that just glue together strings.


I do not think that this optimization will be useful only for poorly
designed applications.
I already pointed out two use cases where prepared statements cannot be
used:

1. pgbouncer without session-level pooling.
2. partitioning

Protecting it under a GUC would allow to only do the work if it’s 
deemed likely to help.
Another rule I find useful is to abort any efforts to substitute 
literals if any bind variable is found in the query.
That can be used as a cue that the author of the SQL left the 
remaining literals in on purpose.


A follow up feature would be to formalize different flavors of peeking.
I.e. can you produce a generic plan, but still recruit the initial set 
of bind values/substituted literals to dos costing?

Here the situation is the same as for explicitly prepared statements, isn't it?
Sometimes it is preferable to use a specialized plan rather than a generic
plan.

I am not sure whether Postgres is currently able to do that.




Cheers
Serge Rielau
Salesforce.com 

PS: FWIW, I like this feature.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Serge Rielau

> On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik  
> wrote:
> Another problem is caused by using integer literals in context where 
> parameters can not be used, for example "order by 1”.
You will also need to deal with modifiers in types such as VARCHAR(10). Not 
sure if there are specific functions which can only deal with literals (?) as 
well.

Doug Doole did this work in DB2 LUW and he may be able to point to more places 
to watch out for semantically.

Generally, in my experience, this feature is very valuable when dealing with 
(poorly designed) web apps that just glue together strings.
Protecting it under a GUC would allow to only do the work if it’s deemed likely 
to help.
Another rule I find useful is to abort any efforts to substitute literals if 
any bind variable is found in the query.
That can be used as a cue that the author of the SQL left the remaining 
literals in on purpose.

A follow-up feature would be to formalize different flavors of peeking,
i.e. can you produce a generic plan, but still recruit the initial set of bind
values/substituted literals to do costing?

Cheers
Serge Rielau
Salesforce.com 

PS: FWIW, I like this feature.

Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Konstantin Knizhnik

On 24.04.2017 21:43, Andres Freund wrote:

Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:

So what I am thinking now is implicit query caching. If the same query with
different literal values is repeated many times, then we can try to
generalize this query and replace it with prepared query with
parameters.

That's not actuall all that easy:
- You pretty much do parse analysis to be able to do an accurate match.
   How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to to fully
   parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not
   nice if that happens implicitly.


Well, first of all I want to share the results I have already got: pgbench with
default parameters, scale 10 and one connection:


protocol               TPS
simple                 3492
extended               2927
prepared               6865
simple + autoprepare   6844


So autoprepare is as efficient as explicit prepare and can increase 
performance almost two times.


My current implementation replaces only string literals in the query with
parameters, i.e. select * from T where x='123'; -> select *
from T where x=$1;
This greatly simplifies matching of parameters - it is just necessary to
locate the '\'' character and then correctly handle pairs of quotes.

Handling of integer and real literals is a really challenging task.
One source of problems is negation: it is not so easy to correctly
decide whether a minus should be treated as part of the literal or as an
operator:

(-1), (1-1), (1-1)-1
Another problem is caused by integer literals used in contexts where
parameters cannot be used, for example "order by 1" (see the examples below).
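A couple of illustrations of the problematic cases (assuming some table t with
columns x and y; these lines only show the ambiguity, they are not code from
the patch):

SELECT (-1), (1-1), (1-1)-1;     -- is each minus part of a literal or an operator?
SELECT x, y FROM t ORDER BY 1;   -- "1" names the first output column here; replacing it
                                 -- with $1 would change the meaning (a parameter there
                                 -- sorts by a constant value, not by column position)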


Fully correct substitution can be done by first parsing the
query, then transforming the parse tree, replacing literal nodes with parameter
nodes, and finally deparsing the tree into a generalized query. postgres_fdw
already contains such deparse code; it could be moved into the Postgres core and
reused for autoprepare (and maybe elsewhere).

But in this case the overhead will be much higher.
I still think that query parsing time is significantly smaller than the time
needed for building and optimizing the query execution plan.

But it should be measured if the community is interested in such an approach.

There is an obvious question: how did I manage to get these pgbench results if
currently only substitution of string literals is supported and the queries
constructed by pgbench don't contain string literals? I just made a small
patch to pgbench's replaceVariable method, wrapping the value's representation
in quotes. It has almost no impact on performance (3482 TPS vs. 3492 TPS),
but allows autoprepare to deal with pgbench queries, as illustrated below.
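For example, assuming the standard pgbench_accounts schema, a read-only pgbench
query goes through roughly the following stages (the value 123 is just an
example):

SELECT abalance FROM pgbench_accounts WHERE aid = 123;    -- what unmodified pgbench sends
SELECT abalance FROM pgbench_accounts WHERE aid = '123';  -- with the quoting tweak in replaceVariable
SELECT abalance FROM pgbench_accounts WHERE aid = $1;     -- generalized form that gets cached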

I attached my patch to this mail. It is just the first version of the patch
(based on the REL9_6_STABLE branch), intended to illustrate the proposed approach.
I will be glad to receive any comments, and if such an optimization is
considered useful, I will continue working on this patch.


--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f6be98b..6291d66 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -96,8 +96,6 @@ int			max_stack_depth = 100;
 /* wait N seconds to allow attach from a debugger */
 int			PostAuthDelay = 0;
 
-
-
 /* 
  *		private variables
  * 
@@ -188,6 +186,7 @@ static bool IsTransactionStmtList(List *parseTrees);
 static void drop_unnamed_stmt(void);
 static void SigHupHandler(SIGNAL_ARGS);
 static void log_disconnections(int code, Datum arg);
+static bool exec_cached_query(const char *query_string);
 
 
 /* 
@@ -916,6 +915,14 @@ exec_simple_query(const char *query_string)
 	drop_unnamed_stmt();
 
 	/*
+	 * Try to find cached plan
+	 */
+	if (autoprepare_threshold != 0 && exec_cached_query(query_string))
+	{
+		return;
+	}
+
+	/*
 	 * Switch to appropriate context for constructing parsetrees.
 	 */
 	oldcontext = MemoryContextSwitchTo(MessageContext);
@@ -4500,3 +4509,566 @@ log_disconnections(int code, Datum arg)
 	port->user_name, port->database_name, port->remote_host,
   port->remote_port[0] ? " port=" : "", port->remote_port)));
 }
+
+
+
+typedef struct { 
+	char const* query;
+	int64 exec_count;
+	CachedPlanSource* plan;	
+	int n_params;
+	int16 format;
+} plan_cache_entry;
+
+
+/*
+ * Replace string literals with parameters. We do not consider integer or real literals to avoid problems with 
+ * negative number, user defined operators, ... For example it is not easy to distinguish cases (-1), (1-1), (1-1)-1
+ */
+static void generalize_statement(const char *query_string, char** gen_query, char** query_params, int* n_params)
+{
+	size_t query_len = strlen(query_string);
+	char const* src = query_string;
+	char* dst;
+	char* params;
+	uns

Re: [HACKERS] Cached plans and statement generalization

2017-04-24 Thread Andres Freund
Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> So what I am thinking about now is implicit query caching. If the same query with
> different literal values is repeated many times, then we can try to
> generalize this query and replace it with a prepared query with
> parameters.

That's not actually all that easy:
- You pretty much have to do parse analysis to be able to do an accurate match.
  How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to fully
  parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not
  nice if that happens implicitly.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cached plans and statement generalization

2017-04-24 Thread Konstantin Knizhnik



On 24.04.2017 13:24, Alexander Korotkov wrote:

Hi, Konstantin!

On Mon, Apr 24, 2017 at 11:46 AM, Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:


There were a lot of discussions about query plan caching in
hackers mailing list, but I failed to find some clear answer for
my question and the current consensus on this question in Postgres
community. As far as I understand current state is the following:
1. We have per-connection prepared statements.
2. Queries executed inside plpgsql code are implicitly prepared.

It is not always possible or convenient to use prepared statements.
For example, if pgbouncer is used to perform connection pooling.
Another use case (which is actually the problem I am trying to
solve now) is partitioning.
Efficient execution of a query against a partitioned table requires a
hardcoded value for the partitioning key.
Only in this case will the optimizer be able to construct an efficient
query plan which accesses only the affected tables (partitions).

My small benchmark for a distributed partitioned table based on
pg_pathman + postgres_fdw shows a threefold degradation of performance
when using prepared statements.
But without prepared statements a substantial amount of time is
spent in query compilation and planning. I was able to speed up the
benchmark more than two times by
sending prepared queries directly to the remote nodes.


I don't think it's correct to ask PostgreSQL hackers about a problem
which arises with pg_pathman, while pg_pathman is an extension
supported by Postgres Pro.
Since we have declarative partitioning committed to 10, I think that the
community should address this issue in the context of declarative
partitioning.
However, it's unlikely we can spot this issue with declarative
partitioning because it still uses the very inefficient constraint
exclusion mechanism.  Thus, the issues you are writing about would become
visible with declarative partitioning only when constraint exclusion
is replaced with something more efficient.


Long story short, could you reproduce this issue without pg_pathman?



Sorry, I mentioned pg_pathman just as an example.
The same problem takes place with partitioning based on the standard
Postgres inheritance mechanism (when I manually create derived tables
and specify constraints for them).
I haven't yet tested the declarative partitioning committed to 10, but I
expect that it will also suffer from this problem (because it is based on
inheritance).
But as I wrote, I think that the problem with plan caching is wider and
is not limited just to partitioning.





--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Cached plans and statement generalization

2017-04-24 Thread Alexander Korotkov
Hi, Konstantin!

On Mon, Apr 24, 2017 at 11:46 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> There were a lot of discussions about query plan caching in hackers
> mailing list, but I failed to find some clear answer for my question and
> the current consensus on this question in Postgres community. As far as I
> understand current state is the following:
> 1. We have per-connection prepared statements.
> 2. Queries executed inside plpgsql code are implicitly prepared.
>
> It is not always possible or convenient to use prepared statements.
> For example, if pgbouncer is used to perform connection pooling.
> Another use case (which is actually the problem I am trying to solve now)
> is partitioning.
> Efficient execution of a query against a partitioned table requires a hardcoded
> value for the partitioning key.
> Only in this case will the optimizer be able to construct an efficient query plan
> which accesses only the affected tables (partitions).
>
> My small benchmark for a distributed partitioned table based on pg_pathman +
> postgres_fdw shows a threefold degradation of performance when using prepared
> statements.
> But without prepared statements a substantial amount of time is spent in
> query compilation and planning. I was able to speed up the benchmark more
> than two times by
> sending prepared queries directly to the remote nodes.
>

I don't think it's correct to ask PostgreSQL hackers about a problem which
arises with pg_pathman, while pg_pathman is an extension supported by
Postgres Pro.
Since we have declarative partitioning committed to 10, I think that the
community should address this issue in the context of declarative
partitioning.
However, it's unlikely we can spot this issue with declarative partitioning
because it still uses the very inefficient constraint exclusion mechanism.
Thus, the issues you are writing about would become visible with declarative
partitioning only when constraint exclusion is replaced with
something more efficient.

Long story short, could you reproduce this issue without pg_pathman?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Cached plans and statement generalization

2017-04-24 Thread Konstantin Knizhnik

Hi hackers,

There have been a lot of discussions about query plan caching on the hackers
mailing list, but I failed to find a clear answer to my question and
the current consensus on this question in the Postgres community. As far as
I understand, the current state is the following:

1. We have per-connection prepared statements.
2. Queries executed inside plpgsql code are implicitly prepared.

It is not always possible or convenient to use prepared statements,
for example if pgbouncer is used to perform connection pooling.
Another use case (which is actually the problem I am trying to solve
now) is partitioning.
Efficient execution of a query against a partitioned table requires a
hardcoded value for the partitioning key.
Only in this case will the optimizer be able to construct an efficient
query plan which accesses only the affected tables (partitions).


My small benchmark for a distributed partitioned table based on pg_pathman
+ postgres_fdw shows a threefold degradation of performance when using
prepared statements.
But without prepared statements a substantial amount of time is spent in
query compilation and planning. I was able to speed up the benchmark more
than two times by
sending prepared queries directly to the remote nodes.

So what I am thinking about now is implicit query caching. If the same query
with different literal values is repeated many times, then we can try to
generalize this query and replace it with a prepared query with
parameters. I am not considering a shared query cache for now: it seems to be
much harder to implement. But local caching of generalized queries seems
not so difficult to implement and requires not so many changes in the
Postgres code. And it can be useful not only for sharding, but for many
other cases where prepared statements cannot be used.
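
A very rough sketch of what I have in mind for the per-backend cache
(CachedPlanSource is the existing plancache type; the entry struct, the
threshold variable and all helper functions named here are made up purely
for illustration):

#include "postgres.h"
#include "utils/plancache.h"

typedef struct
{
    const char *gen_query;      /* query text with literals replaced by $n */
    int64       exec_count;     /* how many times this query has been seen */
    CachedPlanSource *plan;     /* NULL until the query becomes "hot" */
} autoprepare_entry;

/* hypothetical helpers and threshold variable, not shown here */
extern bool generalize_query(const char *src, char **gen_query, int *n_params);
extern autoprepare_entry *autoprepare_lookup_or_insert(const char *gen_query);
extern CachedPlanSource *prepare_generalized_query(const char *gen_query,
                                                   int n_params);
extern void execute_cached_plan(CachedPlanSource *plan,
                                const char *query_string);
extern int  autoprepare_threshold;

/*
 * Called from simple query processing: returns true if the query was
 * executed through a cached generalized plan, false if the caller should
 * fall back to the normal parse/plan/execute path.
 */
static bool
try_autoprepare(const char *query_string)
{
    char       *gen_query;
    int         n_params;
    autoprepare_entry *entry;

    if (!generalize_query(query_string, &gen_query, &n_params))
        return false;           /* nothing to substitute */

    entry = autoprepare_lookup_or_insert(gen_query);
    if (++entry->exec_count < autoprepare_threshold)
        return false;           /* not repeated often enough yet */

    if (entry->plan == NULL)
        entry->plan = prepare_generalized_query(gen_query, n_params);

    execute_cached_plan(entry->plan, query_string);
    return true;
}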


I wonder if such an option was already considered and, if it was, rejected
for some reason: can you point me to those reasons?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers