Re: [HACKERS] Prepared statements and generic plans

'br...@momjian.us' Mon, 13 Jun 2016 12:41:08 -0700

On Mon, Jun 13, 2016 at 01:26:04PM +0000, Albe Laurenz wrote:
> Bruce Momjian wrote:
> > Also, is it possible to do an EXPLAIN prepare() with the libpq/wire
> > protocol?  I can't do PREPARE EXPLAIN, but I can do EXPLAIN EXECUTE.
> > However, I don't see any way to inject EXPLAIN into the libpq/wire
> > prepare case.  Can you specify prepare(EXPLAIN SELECT)?  (PREPARE
> > EXPLAIN SELECT throws a syntax error.)
> 
> I am not sure what you mean:
> EXPLAIN PREPARE to get EXPLAIN for PREPARE, or PREPARE ... FOR EXPLAIN
> to get an EXPLAIN statement with parameters.
> What should EXPLAIN PREPARE show that EXPLAIN SELECT wouldn't?
> Why the need for EXPLAIN statements with parameters?


Well, you can't use EXPLAIN with SQL PREPARE:

        test=> EXPLAIN PREPARE SELECT * FROM pg_class;
        ERROR:  syntax error at or near "PREPARE"
        LINE 1: EXPLAIN PREPARE SELECT * FROM pg_class;
                        ^
        test=> PREPARE EXPLAIN SELECT * FROM pg_class;
        ERROR:  syntax error at or near "SELECT"
        LINE 1: PREPARE EXPLAIN SELECT * FROM pg_class;
                        ^
You can only do EXPLAIN EXECUTE ..., which works fine, e.g.:

        EXPLAIN EXECUTE prep_c1(0);

However, for the wire protocol prepare/execute, how do you do EXPLAIN?
The only way I can see doing it is to put the EXPLAIN in the prepare
query, but I wasn't sure that works.  So, I just wrote and tested the
attached C program and it properly output the explain information, e.g.

        res = PQprepare(conn, "prep1", "EXPLAIN SELECT * FROM pg_language", 0, 
NULL);
                                        -------
generated:

        QUERY PLAN
        
        Seq Scan on pg_language  (cost=0.00..1.04 rows=4 width=114)

so that works --- good.

> > Looking at how the code behaves, it seems custom plans that are _more_
> > expensive (plus planning cost) than the generic plan switch to the
> > generic plan after five executions, as now documented.  Custom plans
> > that are significantly _cheaper_ than the generic plan _never_ use the
> > generic plan.
> 
> Yes, that's what the suggested documentation improvement says as well,
> right?

Yes.  What is odd is that it isn't the plan of the actual supplied
parameters that is cheaper, just the generic plan that assumes each
distinct value in the query is equally likely to be used.  So, when we
say the generic plan is cheaper, it is just comparing the custom plan
with the supplied parameters vs. the generic plan --- it is not saying
that running the supplied constants with the generic plan will execute
faster, because in fact we might be using a sub-optimial generic plan.

For example, giving my test table that I posted earlier, if you ran the
most common constant (50% common) the first five time, the custom plan
would use a sequential scan.  On the sixth run of that same constant, a
bitmap scan generic plan would be used.  Now, that does have a lower
cost, but only for the _average_ distinct value, not for the 50%
constant that is being used.  A bitmap scan on a constant that would
normally use a sequential scan will take longer than even a sequential
scan, because if it didn't, the custom plan would have chosen the bitmap
scan.

I am not sure how we can improve things, but I wanted to clarify exactly
what is happening.

> > Updated patch attached.
> 
> Upon re-read, one tiny question:
> 
> !    Prepared statements can optionally use generic plans rather than
> !    re-planning with each set of supplied <command>EXECUTE</command> values.
> 
> Maybe the "optionally" should be omitted, since the user has no choice.
> 
> It is true that there is a cursor option CURSOR_OPT_CUSTOM_PLAN, but that
> cannot be used on the SQL level.

Right.  Updated patch attached.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

/*
 * src/test/isolation/isolationtester.c
 *
 * isolationtester.c
 *		Runs an isolation test specified by a spec file.
 */

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

static void printResultSet(PGresult *res);


static void
exit_nicely(PGconn *conn)
{
    PQfinish(conn);
    exit(1);
}


int
main(int argc, char **argv)
{
    const char *conninfo;
    PGconn     *conn;
	PGresult   *res;

    if (argc > 2)
    {
    	fprintf(stderr, "Usage:  %s connection-string\n", argv[0]);
    	exit(1);
    }
	else if (argc == 2)
        conninfo = argv[1];
    else
        conninfo = "dbname = postgres";

    /* Make a connection to the database */
    conn = PQconnectdb(conninfo);

    /* Check to see that the backend connection was successfully made */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "Connection to database failed: %s",
                PQerrorMessage(conn));
        exit_nicely(conn);
    }

	res = PQprepare(conn, "prep1", "EXPLAIN SELECT * FROM pg_language", 0, NULL);
	if (PQresultStatus(res) != PGRES_COMMAND_OK)
	{
		fprintf(stderr, "PQprepare() failed: %s", PQerrorMessage(conn));
		exit_nicely(conn);
	}

	PQclear(res);

	res = PQexecPrepared(conn, "prep1", 0, NULL, NULL, NULL, 0);
	if (PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		fprintf(stderr, "PQresultStatus() failed: %s", PQerrorMessage(conn));
		exit_nicely(conn);
	}

	printResultSet(res);
    PQclear(res);
	
    PQfinish(conn);

    return 0;
}

static void
printResultSet(PGresult *res)
{
	int			nFields;
	int			i,
				j;

	/* first, print out the attribute names */
	nFields = PQnfields(res);
	for (i = 0; i < nFields; i++)
		printf("%-15s", PQfname(res, i));
	printf("\n\n");

	/* next, print out the rows */
	for (i = 0; i < PQntuples(res); i++)
	{
		for (j = 0; j < nFields; j++)
			printf("%-15s", PQgetvalue(res, i, j));
		printf("\n");
	}
}

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
new file mode 100644
index 3829a14..6285dd0
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
*************** PGresult *PQprepare(PGconn *conn,
*** 2303,2310 ****
         <para>
          <function>PQprepare</> creates a prepared statement for later
          execution with <function>PQexecPrepared</>.  This feature allows
!         commands that will be used repeatedly to be parsed and planned just
!         once, rather than each time they are executed.
          <function>PQprepare</> is supported only in protocol 3.0 and later
          connections; it will fail when using protocol 2.0.
         </para>
--- 2303,2310 ----
         <para>
          <function>PQprepare</> creates a prepared statement for later
          execution with <function>PQexecPrepared</>.  This feature allows
!         commands to be executed repeatedly without being parsed and
!         planned each time;  see <xref linkend="SQL-PREPARE"> for details.
          <function>PQprepare</> is supported only in protocol 3.0 and later
          connections; it will fail when using protocol 2.0.
         </para>
diff --git a/doc/src/sgml/ref/prepare.sgml b/doc/src/sgml/ref/prepare.sgml
new file mode 100644
index dbce8f2..8efd51a
*** a/doc/src/sgml/ref/prepare.sgml
--- b/doc/src/sgml/ref/prepare.sgml
*************** PREPARE <replaceable class="PARAMETER">n
*** 70,80 ****
    </para>
  
    <para>
!    Prepared statements have the largest performance advantage when a
!    single session is being used to execute a large number of similar
     statements. The performance difference will be particularly
!    significant if the statements are complex to plan or rewrite, for
!    example, if the query involves a join of many tables or requires
     the application of several rules. If the statement is relatively simple
     to plan and rewrite but relatively expensive to execute, the
     performance advantage of prepared statements will be less noticeable.
--- 70,80 ----
    </para>
  
    <para>
!    Prepared statements potentially have the largest performance advantage
!    when a single session is being used to execute a large number of similar
     statements. The performance difference will be particularly
!    significant if the statements are complex to plan or rewrite, e.g. 
!    if the query involves a join of many tables or requires
     the application of several rules. If the statement is relatively simple
     to plan and rewrite but relatively expensive to execute, the
     performance advantage of prepared statements will be less noticeable.
*************** PREPARE <replaceable class="PARAMETER">n
*** 123,148 ****
    </variablelist>
   </refsect1>
  
!  <refsect1>
    <title>Notes</title>
  
    <para>
!    If a prepared statement is executed enough times, the server may eventually
!    decide to save and re-use a generic plan rather than re-planning each time.
!    This will occur immediately if the prepared statement has no parameters;
!    otherwise it occurs only if the generic plan appears to be not much more
!    expensive than a plan that depends on specific parameter values.
!    Typically, a generic plan will be selected only if the query's performance
!    is estimated to be fairly insensitive to the specific parameter values
!    supplied.
    </para>
  
    <para>
     To examine the query plan <productname>PostgreSQL</productname> is using
!    for a prepared statement, use <xref linkend="sql-explain">.
     If a generic plan is in use, it will contain parameter symbols
     <literal>$<replaceable>n</></literal>, while a custom plan will have the
!    current actual parameter values substituted into it.
    </para>
  
    <para>
--- 123,166 ----
    </variablelist>
   </refsect1>
  
!  <refsect1 id="SQL-PREPARE-notes">
    <title>Notes</title>
  
    <para>
!    Prepared statements can use generic plans rather than re-planning with
!    each set of supplied <command>EXECUTE</command> values.  This occurs
!    immediately for prepared statements with no parameters; otherwise
!    it occurs only after five or more executions produce plans whose
!    estimated cost average (including planning overhead) is more expensive
!    than the generic plan cost estimate.  Once a generic plan is chosen,
!    it is used for the remaining lifetime of the prepared statement.
!    Using <command>EXECUTE</command> values which are rare in columns with
!    many duplicates can generate custom plans that are so much cheaper
!    than the generic plan, even after adding planning overhead, that the
!    generic plan might never be used.
!   </para>
! 
!   <para>
!    A generic plan assumes that each value supplied to
!    <command>EXECUTE</command> is one of the column's distinct values
!    and that column values are uniformly distributed.  For example,
!    if statistics record three distinct column values, a generic plan
!    assumes a column equality comparison will match 33% of processed rows.
!    Column statistics also allow generic plans to accurately compute the
!    selectivity of unique columns.  Comparisons on non-uniformly-distributed
!    columns and specification of non-existent values affects the average
!    plan cost, and hence if and when a generic plan is chosen.
    </para>
  
    <para>
     To examine the query plan <productname>PostgreSQL</productname> is using
!    for a prepared statement, use <xref linkend="sql-explain">, e.g.
!    <command>EXPLAIN EXECUTE</>.
     If a generic plan is in use, it will contain parameter symbols
     <literal>$<replaceable>n</></literal>, while a custom plan will have the
!    supplied parameter values substituted into it.
!    The row estimates in the generic plan reflect the selectivity
!    computed for the parameters.
    </para>
  
    <para>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Prepared statements and generic plans

Reply via email to