Re: [PERFORM] performance for high-volume log insertion

2009-04-23 Thread Stephen Frost
* da...@lang.hm (da...@lang.hm) wrote:
 On Wed, 22 Apr 2009, Stephen Frost wrote:
 Erm..  Prepared queries is about using PQexecPrepared(), not about
 sending a text string as an SQL EXECUTE().  PQexecPrepared takes an
 array of arguments.  That gets translated into a Bind command in the
 protocol with a defined number of parameters and a length for each
 parameter being passed.  That removes any need for scanning/parsing the
 string sent to the backend.  That's the savings I'm referring to.

 are you sure? I thought that what goes out over the wire is always text.

Wow, why is there so much confusion and misunderstanding about this?

*psql* sends everything to the backend as text (except perhaps COPY
BINARY..  but that's because the user handles it), but if you're using
libpq, PQexecPrepared, and protocol 3.0 (any recent PG version), it's
going to use the Parse/Bind protocol-level commands.  To make it perhaps
more clear, here's a snippet from the libpq code for PQsendQueryGuts(),
which is the work-horse called by PQexecPrepared:

/*
 * We will send Parse (if needed), Bind, Describe Portal, Execute, Sync,
 * using specified statement name and the unnamed portal.
 */
[...]

/* Construct the Bind message */
if (pqPutMsgStart('B', false, conn) < 0 ||
    pqPuts("", conn) < 0 ||
    pqPuts(stmtName, conn) < 0)
    goto sendFailed;

/* Send parameter formats */
[...]
-- No param formats included, let the backend know
if (pqPutInt(0, 2, conn) < 0)
    goto sendFailed;

-- Tell the backend the number of parameters to expect
if (pqPutInt(nParams, 2, conn) < 0)
    goto sendFailed;

/* Send parameters */
for (i = 0; i < nParams; i++)
[...]
-- Pull the length from the caller-provided array for each param
nbytes = paramLengths[i];
[...]
-- Send the length, then the param, over the wire
if (pqPutInt(nbytes, 4, conn) < 0 ||
    pqPutnchar(paramValues[i], nbytes, conn) < 0)
    goto sendFailed;
[...]
-- All done, send finish indicator
if (pqPutInt(1, 2, conn) < 0 ||
    pqPutInt(resultFormat, 2, conn) < 0)
    goto sendFailed;
if (pqPutMsgEnd(conn) < 0)
    goto sendFailed;

/* construct the Describe Portal message */
if (pqPutMsgStart('D', false, conn) < 0 ||
    pqPutc('P', conn) < 0 ||
    pqPuts("", conn) < 0 ||
    pqPutMsgEnd(conn) < 0)
    goto sendFailed;

/* construct the Execute message */
if (pqPutMsgStart('E', false, conn) < 0 ||
    pqPuts("", conn) < 0 ||
    pqPutInt(0, 4, conn) < 0 ||
    pqPutMsgEnd(conn) < 0)
    goto sendFailed;

[...]
-- clear everything out
if (pqFlush(conn) < 0)
    goto sendFailed;

Any other questions?

Thanks,

Stephen




Re: [PERFORM] performance for high-volume log insertion

2009-04-23 Thread Stephen Frost
* Glenn Maynard (glennfmayn...@gmail.com) wrote:
 I'd suggest this be mentioned in the sql-prepare documentation, then,
 because that documentation only discusses using prepared statements to
 eliminate redundant planning costs.  (I'm sure it's mentioned in the
 API docs and elsewhere, but if it's a major intended use of PREPARE,
 the PREPARE documentation should make note of it.)

Argh.  Perhaps the problem is that it's somewhat 'overloaded'.  PG
supports *both* SQL-level PREPARE/EXECUTE commands and the more
traditional (well, in my view anyway...) API/protocol of PQprepare() and
PQexecPrepared().  When using the API/protocol, you don't actually
explicitly call the SQL 'PREPARE blah AS INSERT INTO', you just call
PQprepare() with 'INSERT INTO blah VALUES ($1, $2, $3);' and then call
PQexecPrepared() later.
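
To make the two levels concrete, here's a minimal JDBC sketch of both
paths (the table log_entries and the connection details are invented
for illustration; the PostgreSQL JDBC driver runs PreparedStatement
through the protocol-level Parse/Bind messages, so it corresponds
roughly to the PQprepare()/PQexecPrepared() path):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class PrepareTwoWays {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "secret");

        /* 1) SQL-level PREPARE/EXECUTE: both commands travel as plain
         *    SQL text, so the backend still parses each EXECUTE string. */
        try (Statement st = conn.createStatement()) {
            st.execute("PREPARE ins (int, text) AS "
                    + "INSERT INTO log_entries VALUES ($1, $2)");
            st.execute("EXECUTE ins (1, 'first line')");
            st.execute("DEALLOCATE ins");
        }

        /* 2) Driver/protocol-level prepare: parameters are shipped in a
         *    Bind message; no SQL string is rebuilt per row. */
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO log_entries VALUES (?, ?)")) {
            ps.setInt(1, 2);
            ps.setString(2, "second line");
            ps.executeUpdate();
        }

        conn.close();
    }
}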

That's the reason it's not documented in the SQL-level PREPARE docs,
anyway.  I'm not against adding some kind of reference there, but it's
not quite the way you think it is..

Thanks,

Stephen




Re: [PERFORM] WHERE condition not being pushed down to union parts

2009-04-23 Thread John L. Clark
On Tue, Apr 21, 2009 at 3:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Ah.  The problem is that your view contains constants in the UNION arms:

 In 8.2 and 8.3, the planner is only smart enough to generate
 inner-indexscan nestloop plans on UNIONs if all the elements of the
 SELECT lists are simple variables (that is, table columns).
 8.4 will be smarter about this.

Ah, and so it is!  I installed 8.4beta1 and have loaded it with the
big database; it is pushing the index condition down to the parts of
the UNION, and my queries are now running MUCH faster.  Here's the new
query plan for the query involving the UNION-constructed view:

<query-plan>

                                       QUERY PLAN
------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..53.32 rows=1083 width=80)
   Join Filter: (component_0_statements.subject = literalproperties.subject)
   ->  Index Scan using relations_poscindex on relations component_0_statements  (cost=0.00..13.97 rows=2 width=40)
         Index Cond: ((predicate = (-2875059751320018987)::bigint) AND (object = (-2827607394936393903)::bigint))
   ->  Append  (cost=0.00..19.65 rows=2 width=60)
         ->  Index Scan using literalproperties_subjectindex on literalproperties  (cost=0.00..10.05 rows=1 width=57)
               Index Cond: (literalproperties.subject = component_0_statements.subject)
               Filter: (literalproperties.predicate = (-2875059751320018987)::bigint)
         ->  Index Scan using relations_subjectindex on relations  (cost=0.00..9.59 rows=1 width=64)
               Index Cond: (relations.subject = component_0_statements.subject)
               Filter: (relations.predicate = (-2875059751320018987)::bigint)
(11 rows)
</query-plan>

Thanks for your help, Tom.  I am certainly amused and pleased that my
exact use case is handled in the very next PostgreSQL release.

Take care,

John L. Clark



Re: [PERFORM] WHERE condition not being pushed down to union parts

2009-04-23 Thread Tom Lane
John L. Clark j...@po.cwru.edu writes:
 Thanks for your help, Tom.  I am certainly amused and pleased that my
 exact use case is handled in the very next PostgreSQL release.

Well, sir, your timing is excellent ;-).  That's been a known problem
for quite some time, and it was only in this release cycle that it got
addressed.

regards, tom lane



Re: [PERFORM] performance for high-volume log insertion

2009-04-23 Thread Thomas Kellerer

Stephen Frost wrote on 22.04.2009 23:51:

What about 4 individual prepared inserts?  Just curious about it.



4 inserts, one prepared statement each (constructing the prepared
statement only once), in a single transaction: 1.68s

I'm surprised that there's any win here at all.


For a single column table, I wouldn't expect much either.  With more
columns I think it would be a larger improvement.


Out of curiosity I did some tests through JDBC.

Using a single-column (integer) table, re-using a prepared statement took about
7 seconds to insert 10,000 rows with JDBC's batch interface and a batch size of 1000.


Using a prepared statement that had 1000 "(?)" groups after the INSERT (insert into foo
values (?), (?), ...), the insert took about 0.8 seconds. Quite an improvement,
I'd say.


Then I switched to a three-column table (int, varchar(500), varchar(500)).

Inserting with a prepared statement and the batch interface (the first scenario)
now took ~8.5 seconds, whereas the multi-value insert took ~3 seconds. So the
difference got smaller, but was still quite substantial. This was inserting
relatively small strings (~20 characters) into the table.


When increasing the size of the inserted strings, things began to change. When I 
bumped the length of the strings to 70 and 200 characters, the multi-value 
insert slowed down considerably. Both solutions now took around 9 seconds.


The multi-value solution ranged between 7 and 9 seconds, whereas the regular 
insert syntax was pretty constant at roughly 9 seconds (I ran it about 15 times).


So it seems that, as the size of the row increases, the multi-value insert loses
its head start compared to the regular insert.


I also played around with batch size. Going beyond 200 didn't make a big 
difference.


For the JDBC batch, the batch size was the number of rows after which I called
executeBatch(); for the multi-value insert, it was the number of tuples I sent
in a single INSERT statement.


The multi-value statement seems to perform better with lower batch sizes 
(~10-50) whereas the JDBC batching seems to be fastest with about 200 statements 
per batch.
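
As a reference point, here's roughly what the two approaches look like
in JDBC (a minimal sketch; the single-column table log_entries and the
connection details are invented, and the batch sizes are just the ones
that tested well above):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchVsMultiValue {
    static final int ROWS = 10_000;  // arbitrary row count for the sketch

    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "secret");
        conn.setAutoCommit(false);

        /* Scenario 1: one-row INSERT re-used via the JDBC batch API. */
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO log_entries VALUES (?)")) {
            for (int i = 0; i < ROWS; i++) {
                ps.setInt(1, i);
                ps.addBatch();
                if ((i + 1) % 200 == 0)   // ~200 per batch was fastest
                    ps.executeBatch();
            }
            ps.executeBatch();            // flush any remainder
        }

        /* Scenario 2: multi-value INSERT, i.e. one statement of the form
         * INSERT INTO log_entries VALUES (?), (?), ..., (?). */
        int perStmt = 50;                 // ~10-50 tuples worked best
        StringBuilder sql = new StringBuilder(
                "INSERT INTO log_entries VALUES ");
        for (int i = 0; i < perStmt; i++)
            sql.append(i == 0 ? "(?)" : ", (?)");

        try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
            for (int row = 0; row < ROWS; row += perStmt) {
                for (int i = 0; i < perStmt; i++)
                    ps.setInt(i + 1, row + i);
                ps.executeUpdate();
            }
        }

        conn.commit();
        conn.close();
    }
}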



Thomas



