Re: [PERFORM] performance for high-volume log insertion
* da...@lang.hm (da...@lang.hm) wrote:
> On Wed, 22 Apr 2009, Stephen Frost wrote:
>> Erm.. Prepared queries is about using PQexecPrepared(), not about
>> sending a text string as an SQL EXECUTE(). PQexecPrepared takes an
>> array of arguments. That gets translated into a Bind command in the
>> protocol with a defined number of parameters and a length for each
>> parameter being passed. That removes any need for scanning/parsing
>> the string sent to the backend. That's the savings I'm referring to.
>
> are you sure? I thought that what goes out over the wire is always text.

Wow, why is there so much confusion and misunderstanding about this?
*psql* sends everything to the backend as text (except perhaps COPY
BINARY.. but that's because the user handles it), but if you're using
libpq, PQexecPrepared, and protocol 3.0 (any recent PG version), it's
going to use the Parse/Bind protocol-level commands.

To make it perhaps more clear, here's a snippet from the libpq code for
PQsendQueryGuts(), which is the work-horse called by PQexecPrepared:

	/*
	 * We will send Parse (if needed), Bind, Describe Portal, Execute, Sync,
	 * using specified statement name and the unnamed portal.
	 */
[...]
	/* Construct the Bind message */
	if (pqPutMsgStart('B', false, conn) < 0 ||
		pqPuts("", conn) < 0 ||
		pqPuts(stmtName, conn) < 0)
		goto sendFailed;

	/* Send parameter formats */
[...]
	-- No param formats included, let the backend know
		if (pqPutInt(0, 2, conn) < 0)
			goto sendFailed;

	-- Tell the backend the number of parameters to expect
	if (pqPutInt(nParams, 2, conn) < 0)
		goto sendFailed;

	/* Send parameters */
	for (i = 0; i < nParams; i++)
[...]
	-- Pull the length from the caller-provided array for each param
				nbytes = paramLengths[i];
[...]
	-- Send the length, then the param, over the wire
			if (pqPutInt(nbytes, 4, conn) < 0 ||
				pqPutnchar(paramValues[i], nbytes, conn) < 0)
				goto sendFailed;
[...]
	-- All done, send finish indicator
	if (pqPutInt(1, 2, conn) < 0 ||
		pqPutInt(resultFormat, 2, conn) < 0)
		goto sendFailed;
	if (pqPutMsgEnd(conn) < 0)
		goto sendFailed;

	/* construct the Describe Portal message */
	if (pqPutMsgStart('D', false, conn) < 0 ||
		pqPutc('P', conn) < 0 ||
		pqPuts("", conn) < 0 ||
		pqPutMsgEnd(conn) < 0)
		goto sendFailed;

	/* construct the Execute message */
	if (pqPutMsgStart('E', false, conn) < 0 ||
		pqPuts("", conn) < 0 ||
		pqPutInt(0, 4, conn) < 0 ||
		pqPutMsgEnd(conn) < 0)
		goto sendFailed;
[...]
	-- clear everything out
	if (pqFlush(conn) < 0)
		goto sendFailed;

Any other questions?

	Thanks,

		Stephen
Re: [PERFORM] performance for high-volume log insertion
* Glenn Maynard (glennfmayn...@gmail.com) wrote:
> I'd suggest this be mentioned in the sql-prepare documentation, then,
> because that documentation only discusses using prepared statements to
> eliminate redundant planning costs. (I'm sure it's mentioned in the
> API docs and elsewhere, but if it's a major intended use of PREPARE,
> the PREPARE documentation should make note of it.)

Argh. Perhaps the problem is that it's somewhat 'overloaded'. PG
supports *both* SQL-level PREPARE/EXECUTE commands and the more
traditional (well, in my view anyway...) API/protocol of PQprepare()
and PQexecPrepared(). When using the API/protocol, you don't actually
explicitly call the SQL 'PREPARE blah AS INSERT INTO', you just call
PQprepare() with 'INSERT INTO blah VALUES ($1, $2, $3);' and then call
PQexecPrepared() later. That's the reason it's not documented in the
SQL-level PREPARE docs, anyway. I'm not against adding some kind of
reference there, but it's not quite the way you think it is..

	Thanks,

		Stephen
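The API-level path Stephen describes might look like the sketch below. It uses only documented libpq calls (PQprepare, PQexecPrepared), but the connection string, table name, and parameter values are invented for illustration, and it obviously needs a running server, so treat it as a sketch rather than a drop-in program.

```c
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int main(void)
{
    /* "dbname=logs" and the "log" table are hypothetical. */
    PGconn *conn = PQconnectdb("dbname=logs");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return EXIT_FAILURE;
    }

    /* Parsed and planned once; note there is no SQL-level PREPARE here. */
    PGresult *res = PQprepare(conn, "log_insert",
                              "INSERT INTO log VALUES ($1, $2, $3)",
                              3, NULL);
    PQclear(res);

    /* Each execution sends only Bind/Execute with the parameter values. */
    const char *values[3] = {"2009-04-22 23:51:00", "webserver", "GET /"};
    res = PQexecPrepared(conn, "log_insert", 3, values, NULL, NULL, 0);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "insert failed: %s", PQerrorMessage(conn));
    PQclear(res);

    PQfinish(conn);
    return EXIT_SUCCESS;
}
```

The SQL-level PREPARE/EXECUTE commands reach the same prepared-statement machinery, but through the regular query path, which is why the two routes are easy to conflate.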
Re: [PERFORM] WHERE condition not being pushed down to union parts
On Tue, Apr 21, 2009 at 3:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Ah. The problem is that your view contains constants in the UNION arms:
> In 8.2 and 8.3, the planner is only smart enough to generate
> inner-indexscan nestloop plans on UNIONs if all the elements of the
> SELECT lists are simple variables (that is, table columns). 8.4 will be
> smarter about this.

Ah, and so it is! I installed 8.4beta1 and have loaded it with the big
database; it is pushing the index condition down to the parts of the
UNION, and my queries are now running MUCH faster. Here's the new query
plan for the query involving the UNION-constructed view:

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..53.32 rows=1083 width=80)
   Join Filter: (component_0_statements.subject = literalproperties.subject)
   ->  Index Scan using relations_poscindex on relations component_0_statements  (cost=0.00..13.97 rows=2 width=40)
         Index Cond: ((predicate = (-2875059751320018987)::bigint) AND (object = (-2827607394936393903)::bigint))
   ->  Append  (cost=0.00..19.65 rows=2 width=60)
         ->  Index Scan using literalproperties_subjectindex on literalproperties  (cost=0.00..10.05 rows=1 width=57)
               Index Cond: (literalproperties.subject = component_0_statements.subject)
               Filter: (literalproperties.predicate = (-2875059751320018987)::bigint)
         ->  Index Scan using relations_subjectindex on relations  (cost=0.00..9.59 rows=1 width=64)
               Index Cond: (relations.subject = component_0_statements.subject)
               Filter: (relations.predicate = (-2875059751320018987)::bigint)
(11 rows)

Thanks for your help, Tom. I am certainly amused and pleased that my
exact use case is handled in the very next PostgreSQL release.

Take care,

	John L. Clark

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] WHERE condition not being pushed down to union parts
John L. Clark j...@po.cwru.edu writes:
> Thanks for your help, Tom. I am certainly amused and pleased that my
> exact use case is handled in the very next PostgreSQL release.

Well, sir, your timing is excellent ;-). That's been a known problem for
quite some time, and it was only in this release cycle that it got
addressed.

			regards, tom lane
Re: [PERFORM] performance for high-volume log insertion
Stephen Frost wrote on 22.04.2009 23:51:
>>> What about 4 individual prepared inserts? Just curious about it.
>>
>> 4 inserts, one prepared statement each (constructing the prepared
>> statement only once), in a single transaction: 1.68s
>
> I'm surprised that there's any win here at all.

For a single column table, I wouldn't expect much either. With more
columns I think it would be a larger improvement.

Out of curiosity I did some tests through JDBC.

Using a single-column (integer) table, re-using a prepared statement
took about 7 seconds to insert 10 rows with JDBC's batch interface and a
batch size of 1000.

Using a prepared statement with 1000 "(?)" groups after the INSERT
(insert into foo values (?), (?), ...), the insert took about 0.8
seconds. Quite an improvement, I'd say.

Then I switched to a three column table (int, varchar(500),
varchar(500)). Insert using a PreparedStatement with batch (first
scenario) now was ~8.5 seconds, whereas the multi-value insert now took
~3 seconds. So the difference got smaller, but still was quite
substantial. This was inserting relatively small strings (~20
characters) into the table.

When increasing the size of the inserted strings, things began to
change. When I bumped the length of the strings to 70 and 200
characters, the multi-value insert slowed down considerably. Both
solutions now took around 9 seconds. The multi-value solution ranged
between 7 and 9 seconds, whereas the regular insert syntax was pretty
constant at roughly 9 seconds (I ran it about 15 times).

So it seems that as the size of the row increases, the multi-value
insert loses its head-start compared to the regular insert.

I also played around with batch size. Going beyond 200 didn't make a
big difference. For the JDBC batch, the batch size was the number of
rows after which I called executeBatch(); for the multi-value insert,
this was the number of tuples I sent in a single INSERT statement.
The multi-value statement seems to perform better with lower batch sizes
(~10-50) whereas the JDBC batching seems to be fastest with about 200
statements per batch.

Thomas