Re: [HACKERS] libpq pipelining
On Friday, December 05, 2014 12:22:38 PM Heikki Linnakangas wrote:
> Oh, that's how the PQgetLastQuery/PQgetNextQuery functions work! I didn't
> understand that before. I'd suggest renaming them to something like
> PQgetSentQuery() and PQgetResultQuery(). The first/last/next names made me
> think that they're used to iterate a list of queries, but in fact they're
> supposed to be used at very different stages.
> - Heikki

Okay, I have renamed them per your suggestions, and added PQsetPipelining/PQgetPipelining, defaulting to pipelining off. There should be no behavior change unless pipelining is enabled.

Documentation should be mostly complete except for the possible addition of an example and maybe a general pipelining overview paragraph.

I have implemented async query support (that takes advantage of pipelining) in Qt, along with a couple of test cases. This led to me discovering a bug with my last patch where a PGquery object could be reused twice in a row. I have fixed that. I contemplated not reusing the PGquery objects at all, but that wouldn't solve the problem because it's very possible that malloc will return a recently freed block of the same size anyway. Making the guarantee that a PGquery won't be reused twice in a row should be sufficient; the only alternative is to add a unique id, but that would add further complexity that I don't think is warranted.

Feedback is very welcome and appreciated.

Thanks,
Matt Newell

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index d829a4b..4e0431e 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3947,9 +3947,14 @@ int PQsendQuery(PGconn *conn, const char *command);
        After successfully calling <function>PQsendQuery</function>, call
        <function>PQgetResult</function> one or more times to obtain the
-       results.  <function>PQsendQuery</function> cannot be called again
-       (on the same connection) until <function>PQgetResult</function>
-       has returned a null pointer, indicating that the command is done.
+       results.  If pipelining is enabled <function>PQsendQuery</function>
+       may be called multiple times before reading the results.  See
+       <function>PQsetPipelining</function> and <function>PQisPipelining</function>.
+       Call <function>PQgetSentQuery</function> to get a <structname>PGquery</structname>
+       which can be used to identify which results obtained from
+       <function>PQgetResult</function> belong to each pipelined query.
+       If only one query is dispatched at a time, you can call <function>PQgetResult</function>
+       until a NULL value is returned to indicate the end of the query.
      </para>
     </listitem>
    </varlistentry>
@@ -4133,8 +4138,8 @@ PGresult *PQgetResult(PGconn *conn);
     <para>
      <function>PQgetResult</function> must be called repeatedly until
-     it returns a null pointer, indicating that the command is done.
-     (If called when no command is active,
+     it returns a null pointer, indicating that all dispatched commands
+     are done.  (If called when no command is active,
      <function>PQgetResult</function> will just return a null pointer
      at once.)  Each non-null result from
      <function>PQgetResult</function> should be processed using the
@@ -4144,14 +4149,17 @@ PGresult *PQgetResult(PGconn *conn);
      <function>PQgetResult</function> will block only if a command is
      active and the necessary response data has not yet been read by
      <function>PQconsumeInput</function>.
+     If query pipelining is being used, <function>PQgetResultQuery</function>
+     can be called after PQgetResult to match the result to the query.
     </para>
     <note>
     <para>
      Even when <function>PQresultStatus</function> indicates a fatal
-     error, <function>PQgetResult</function> should be called until it
-     returns a null pointer, to allow <application>libpq</> to
-     process the error information completely.
+     error, <function>PQgetResult</function> should be called until the
+     query has no more results (null pointer return if not using query
+     pipelining, otherwise see <function>PQgetResultQuery</function>),
+     to allow <application>libpq</> to process the error information completely.
     </para>
    </note>
   </listitem>
@@ -4385,6 +4393,158 @@ int PQflush(PGconn *conn);
       read-ready and then read the response as described above.
      </para>

+   <variablelist>
+    <varlistentry id="libpq-pqsetpipelining">
+     <term>
+      <function>PQsetPipelining</function>
+      <indexterm>
+       <primary>PQsetPipelining</primary>
+      </indexterm>
+     </term>
+
+     <listitem>
+      <para>
+       Enables or disables query pipelining.
+<synopsis>
+int PQsetPipelining(PGconn *conn, int arg);
+</synopsis>
+      </para>
+
+      <para>
+       Enables pipelining for the connection if arg is 1, or disables it
+       if arg is 0.  When pipelining is enabled multiple async queries can
+       be sent before processing the results of the first.  If pipelining
+       is disabled an
Re: [HACKERS] libpq pipelining
On 12/05/2014 02:30 AM, Matt Newell wrote:
> > The explanation of PQgetFirstQuery makes it sound pretty hard to match
> > up the result with the query. You have to pay attention to PQisBusy.
>
> PQgetFirstQuery should also be valid after calling PQgetResult and then
> you don't have to worry about PQisBusy, so I should probably change the
> documentation to indicate that is the preferred usage, or maybe make that
> the only guaranteed usage, and say the results are undefined if you call
> it before calling PQgetResult. That usage also makes it consistent with
> PQgetLastQuery being called immediately after PQsendQuery.
>
> I changed my second example to call PQgetFirstQuery after PQgetResult
> instead of before, and that removes the need to call PQconsumeInput and
> PQisBusy when you don't mind blocking. It makes the example super simple:
>
> PQsendQuery(conn, "INSERT INTO test(id) VALUES (DEFAULT),(DEFAULT) RETURNING id");
> query1 = PQgetLastQuery(conn);
>
> /* Duplicate primary key error */
> PQsendQuery(conn, "UPDATE test SET id=2 WHERE id=1");
> query2 = PQgetLastQuery(conn);
>
> PQsendQuery(conn, "SELECT * FROM test");
> query3 = PQgetLastQuery(conn);
>
> while( (result = PQgetResult(conn)) != NULL )
> {
>     curQuery = PQgetFirstQuery(conn);
>
>     if (curQuery == query1)
>         checkResult(conn,result,curQuery,PGRES_TUPLES_OK);
>     if (curQuery == query2)
>         checkResult(conn,result,curQuery,PGRES_FATAL_ERROR);
>     if (curQuery == query3)
>         checkResult(conn,result,curQuery,PGRES_TUPLES_OK);
> }
>
> Note that the curQuery == queryX check will work no matter how many
> results a query produces.

Oh, that's how the PQgetLastQuery/PQgetNextQuery functions work! I didn't understand that before. I'd suggest renaming them to something like PQgetSentQuery() and PQgetResultQuery(). The first/last/next names made me think that they're used to iterate a list of queries, but in fact they're supposed to be used at very different stages.
- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq pipelining
On 12/04/2014 03:11 AM, Matt Newell wrote:
> The recent discussion about pipelining in the jodbc driver prompted me to
> look at what it would take for libpq.

Great!

> I have a proof of concept patch working. The results are even more
> promising than I expected. While it's true that many applications and
> frameworks won't easily benefit, it amazes me that this hasn't been
> explored before. I developed a simple test application that creates a
> table with a single auto increment primary key column, then runs 4
> simple queries x times each:
> ...
> I plan to write documentation, add regression testing, and do general
> cleanup before asking for feedback on the patch itself. Any suggestions
> about performance testing or api design would be nice. I haven't played
> with changing the sync logic yet, but I'm guessing that an api to allow
> manual sync instead of a sync per PQsendQuery will be needed. That could
> make things tricky though with multi-statement queries, because currently
> the only way to detect when results change from one query to the next is
> a ReadyForQuery message.

A good API is crucial for this. It should make it easy to write an application that does pipelining, and to handle all the error conditions in a predictable way. I'd suggest that you write the documentation first, before writing any code, so that we can discuss the API. It doesn't have to be in SGML format yet, a plain-text description of the API will do.

- Heikki
Re: [HACKERS] libpq pipelining
On 12/04/2014 05:08 PM, Heikki Linnakangas wrote:
> A good API is crucial for this. It should make it easy to write an
> application that does pipelining, and to handle all the error conditions
> in a predictable way. I'd suggest that you write the documentation first,
> before writing any code, so that we can discuss the API. It doesn't have
> to be in SGML format yet, a plain-text description of the API will do.

I strongly agree.

Applications need to be able to reliably predict what will happen if there's an error in the middle of a pipeline.

Consideration of implicit transactions (autocommit), the whole pipeline being one transaction, or multiple transactions is needed.

Apps need to be able to wait for the result of a query partway through a pipeline, e.g. scheduling four queries, then waiting for the result of the 2nd.

There are probably plenty of other wrinkly bits to think about.

-- 
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] libpq pipelining
On Thursday, December 04, 2014 10:30:46 PM Craig Ringer wrote: On 12/04/2014 05:08 PM, Heikki Linnakangas wrote: A good API is crucial for this. It should make it easy to write an application that does pipelining, and to handle all the error conditions in a predictable way. I'd suggest that you write the documentation first, before writing any code, so that we can discuss the API. It doesn't have to be in SGML format yet, a plain-text description of the API will do. I strongly agree. First pass at the documentation changes attached, along with a new example that demonstrates pipelining 3 queries, with the middle one resulting in a PGRES_FATAL_ERROR response. With the API i am proposing, only 2 new functions (PQgetFirstQuery, PQgetLastQuery) are required to be able to match each result to the query that caused it. Another function, PQgetNextQuery allows iterating through the pending queries, and PQgetQueryCommand permits getting the original query text. Adding the ability to set a user supplied pointer on the PGquery struct might make it much easier for some frameworks, and other users might want a callback, but I don't think either are required. Applications need to be able to reliably predict what will happen if there's an error in the middle of a pipeline. Yes, the API i am proposing makes it easy to get results for each submitted query independently of the success or failure of previous queries in the pipeline. Consideration of implicit transactions (autocommit), the whole pipeline being one transaction, or multiple transactions is needed. The more I think about this the more confident I am that no extra work is needed. Unless we start doing some preliminary processing of the query inside of libpq, our hands are tied wrt sending a sync at the end of each query. 
The reason for this is that we rely on the ReadyForQuery message to indicate the end of a query, so without the sync there is no way to tell if the next result is from another statement in the current query, or the first statement in the next query. I also don't see a reason to need multiple queries without a sync statement. If the user wants all queries to succeed or fail together it should be no problem to start the pipeline with begin and complete it with commit. But I may be missing some detail...

> Apps need to be able to wait for the result of a query partway through
> a pipeline, e.g. scheduling four queries, then waiting for the result of
> the 2nd.

Right. With the api i am proposing the user does have to process each result until it gets to the one it wants, but it's no problem doing that. It would also be trivial to add a function

PGresult * PQgetNextQueryResult(PQquery *query);

that discards all results from previous queries. Very similar to how a PQexec disregards all results from previous async queries. It would also be possible to queue the results and be able to retrieve them out of order, but I think that adds unnecessary complexity and might also make it easy for users to never retrieve and free some results.

> There are probably plenty of other wrinkly bits to think about.

Yup, I'm sure I'm still missing some significant things at this point...

Matt Newell

/*
 * src/test/examples/testlibpqpipeline2.c
 *
 * testlibpqpipeline.c
 *     this test program tests query pipelining.  It shows how to issue multiple
 *     pipelined queries, and identify from which query a result originated.  It
 *     also demonstrates how failure of one query does not impact subsequent
 *     queries when they are not part of the same transaction.
 */

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#include "libpq-fe.h"

static void
checkResult(PGconn *conn, PGresult *result, PGquery *query, int expectedResultStatus)
{
	if (PQresultStatus(result) != expectedResultStatus)
	{
		printf("Got unexpected result status '%s', expected '%s'\nQuery:%s\n",
			   PQresStatus(PQresultStatus(result)),
			   PQresStatus(expectedResultStatus),
			   PQgetQueryCommand(query));
		PQclear(result);
		PQclear(PQexec(conn, "DROP TABLE test"));
		PQfinish(conn);
		exit(1);
	}
	PQclear(result);
}

int
main(int argc, char **argv)
{
	PGconn	   *conn;
	PGquery	   *query1;
	PGquery	   *query2;
	PGquery	   *query3;
	PGquery	   *curQuery;
	PGresult   *result;

	conn = NULL;
	query1 = query2 = query3 = curQuery = NULL;
	result = NULL;

	/* make a connection to the database */
	conn = PQsetdb(NULL, NULL, NULL, NULL, NULL);

	/* check to see that the backend connection was successfully made */
	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "Connection to database failed: %s", PQerrorMessage(conn));
		exit(1);
	}

	checkResult(conn, PQexec(conn, "DROP TABLE IF EXISTS test"), NULL, PGRES_COMMAND_OK);
	checkResult(conn, PQexec(conn, "CREATE TABLE test ( id SERIAL PRIMARY KEY )"), NULL, PGRES_COMMAND_OK);

	PQsendQuery(conn, "INSERT INTO test(id) VALUES (DEFAULT),(DEFAULT) RETURNING id");
	query1 = PQgetLastQuery(conn);

	/*
Re: [HACKERS] libpq pipelining
On Thu, Dec 4, 2014 at 4:11 PM, Matt Newell <newe...@blur.com> wrote:
> With the API i am proposing, only 2 new functions (PQgetFirstQuery,
> PQgetLastQuery) are required to be able to match each result to the query
> that caused it. Another function, PQgetNextQuery allows iterating through
> the pending queries, and PQgetQueryCommand permits getting the original
> query text. Adding the ability to set a user supplied pointer on the
> PGquery struct might make it much easier for some frameworks, and other
> users might want a callback, but I don't think either are required.

With a pointer on PGquery you wouldn't need any of the above. Whoever wants the query text sets it as a pointer; whoever wants some other struct sets it as a pointer. You would only need to be careful about the lifetime of the pointed struct, but that onus is on the application I'd say. The API only needs to provide some guarantees about how long or short it holds onto that pointer.

I'm thinking this would be somewhat necessary for a python wrapper, like psycopg2 (the wrapper could build a dictionary based on query text, but there's no guarantee that query text will be unique so it'd be very tricky).
Re: [HACKERS] libpq pipelining
On Thursday, December 04, 2014 04:30:27 PM Claudio Freire wrote: On Thu, Dec 4, 2014 at 4:11 PM, Matt Newell newe...@blur.com wrote: With the API i am proposing, only 2 new functions (PQgetFirstQuery, PQgetLastQuery) are required to be able to match each result to the query that caused it. Another function, PQgetNextQuery allows iterating through the pending queries, and PQgetQueryCommand permits getting the original query text. Adding the ability to set a user supplied pointer on the PGquery struct might make it much easier for some frameworks, and other users might want a callback, but I don't think either are required. With a pointer on PGquery you wouldn't need any of the above. Who whants the query text sets it as a pointer, who wants some other struct sets it as a pointer. libpq already stores the (current) query text as it's used in some error cases, so that's not really optional without breaking backwards compatibility. Adding another pointer for the user to optional utilize should be no big deal though if everyone agrees it's a good thing. You would only need to be careful about the lifetime of the pointed struct, but that onus is on the application I'd say. The API only needs to provide some guarantees about how long or short it holds onto that pointer. Agreed. I'm thinking this would be somewhat necessary for a python wrapper, like psycopg2 (the wrapper could build a dictionary based on query text, but there's no guarantee that query text will be unique so it'd be very tricky). While it might make some things simpler, i really don't think it absolutely necessary since the wrapper can maintain a queue that corresponds to libpq's internal queue of PGquery's. ie, each time you call a PQsendQuery* function you push your required state, and each time the return value of PQgetFirstQuery changes you pop from the queue. 
Re: [HACKERS] libpq pipelining
On 12/04/2014 09:11 PM, Matt Newell wrote: With the API i am proposing, only 2 new functions (PQgetFirstQuery, PQgetLastQuery) are required to be able to match each result to the query that caused it. Another function, PQgetNextQuery allows iterating through the pending queries, and PQgetQueryCommand permits getting the original query text. Adding the ability to set a user supplied pointer on the PGquery struct might make it much easier for some frameworks, and other users might want a callback, but I don't think either are required. I don't like exposing the PGquery struct to the application like that. Access to all other libpq objects is done via functions. The application can't (or shouldn't, anyway) directly access the fields of PGresult, for example. It has to call PQnfields(), PQntuples() etc. The user-supplied pointer seems quite pointless. It would make sense if the pointer was passed to PQsendquery(), and you'd get it back in PGquery. You could then use it to tag the query when you send it with whatever makes sense for the application, and use the tag in the result to match it with the original query. But as it stands, I don't see the point. The original query string might be handy for some things, but for others it's useless. It's not enough as a general method to identify the query the result belongs to. A common use case for this is to execute the same query many times with different parameters. So I don't think you've quite nailed the problem of how to match the results to the commands that originated them, yet. One idea is to add a function that can be called after PQgetResult(), to get some identifier of the original command. But there needs to be a mechanism to tag the PQsendQuery() calls. Or you can assign each call a unique ID automatically, and have a way to ask for that ID after calling PQsendQuery(). The explanation of PQgetFirstQuery makes it sound pretty hard to match up the result with the query. You have to pay attention to PQisBusy. 
It would be good to make it explicit when you start a pipelined operation. Currently, you get an error if you call PQsendQuery() twice in a row, without reading the result inbetween. That's a good thing, to catch application errors, when you're not trying to do pipelining. Otherwise, if you forget to get the result of a query you've sent, and then send another query, you'll merrily read the result of the first query and think that it belongs to the second. Are you trying to support continous pipelining, where you send new queries all the time, and read results as they arrive, without ever draining the pipe? Or are you just trying to do batches, where you send a bunch of queries, and wait for all the results to arrive, before sending more? A batched API would be easier to understand and work with, although a continuous pipeline could be more efficient for an application that can take advantage of it. Consideration of implicit transactions (autocommit), the whole pipeline being one transaction, or multiple transactions is needed. The more I think about this the more confident I am that no extra work is needed. Unless we start doing some preliminary processing of the query inside of libpq, our hands are tied wrt sending a sync at the end of each query. The reason for this is that we rely on the ReadyForQuery message to indicate the end of a query, so without the sync there is no way to tell if the next result is from another statement in the current query, or the first statement in the next query. I also don't see a reason to need multiple queries without a sync statement. If the user wants all queries to succeed or fail together it should be no problem to start the pipeline with begin and complete it commit. But I may be missing some detail... True. It makes me a bit uneasy, though, to not be sure that the whole batch is committed or rolled back as one unit. There are many ways the user can shoot himself in the foot with that. 
Error handling would be a lot simpler if you would only send one Sync for the whole batch. Tom explained it better on this recent thread: http://www.postgresql.org/message-id/32086.1415063...@sss.pgh.pa.us. Another thought is that for many applications, it would actually be OK to not know which query each result belongs to. For example, if you execute a bunch of inserts, you often just want to get back the total number of inserted, or maybe not even that. Or if you execute a CREATE TEMPORARY TABLE ... ON COMMIT DROP, followed by some insertions to it, some more data manipulations, and finally a SELECT to get the results back. All you want is the last result set. If we could modify the wire protocol, we'd want to have a MiniSync message that is like Sync except that it wouldn't close the current transaction. The server would respond to it with a ReadyForQuery message (which could carry an ID number, to match it up with the MiniSync command). But I really
Re: [HACKERS] libpq pipelining
On Thursday, December 04, 2014 11:39:02 PM Heikki Linnakangas wrote: Adding the ability to set a user supplied pointer on the PGquery struct might make it much easier for some frameworks, and other users might want a callback, but I don't think either are required. I don't like exposing the PGquery struct to the application like that. Access to all other libpq objects is done via functions. The application can't (or shouldn't, anyway) directly access the fields of PGresult, for example. It has to call PQnfields(), PQntuples() etc. Right, my patch doesn't expose it. I was thinking of adding two new functions to get/set the user tag/pointer. The user-supplied pointer seems quite pointless. It would make sense if the pointer was passed to PQsendquery(), and you'd get it back in PGquery. You could then use it to tag the query when you send it with whatever makes sense for the application, and use the tag in the result to match it with the original query. That's exactly what I envisioned, but with a separate call to avoid having to modify/duplicate the PQsendQuery functions: PQsendQuery(conn,...) query = PQgetLastQuery(conn); PQquerySetUserPointer(query,userPtr); ... result = PQgetResult(conn); query = PQgetFirstQuery(conn); userPtr = PQqueryGetUserPointer(query); But as it stands, I don't see the point. I don't need it since it should be easy to keep track without it. It was just an idea. The original query string might be handy for some things, but for others it's useless. It's not enough as a general method to identify the query the result belongs to. A common use case for this is to execute the same query many times with different parameters. Right, I'm only saving the query text because that's how things were done already. Since it's already there I didn't see a reason not to expose it. So I don't think you've quite nailed the problem of how to match the results to the commands that originated them, yet. 
One idea is to add a function that can be called after PQgetResult(), to get some identifier of the original command. But there needs to be a mechanism to tag the PQsendQuery() calls. Or you can assign each call a unique ID automatically, and have a way to ask for that ID after calling PQsendQuery(). PGquery IS the unique ID, and it is available after calling PQsendQuery by calling PQgetLastQuery. The explanation of PQgetFirstQuery makes it sound pretty hard to match up the result with the query. You have to pay attention to PQisBusy. It's not hard at all and is very natural to use since the whole point of an async api is to avoid blocking, so it's natural to only call PQgetResult when it's not going to block. PQgetFirstQuery should also be valid after calling PQgetResult and then you don't have to worry about PQisBusy, so I should probably change the documentation to indicate that is the preferred usage, or maybe make that the only guaranteed usage, and say the results are undefined if you call it before calling PQgetResult. That usage also makes it consistent with PQgetLastQuery being called immediately after PQsendQuery. Another option would be a function to get the PGquery for any PGresult. This would make things a bit more straightforward for the user, but more complicated in the implementation since multiple PGresults will share the same PGquery. However it's nothing that a reference count wouldn't solve. It would be good to make it explicit when you start a pipelined operation. Currently, you get an error if you call PQsendQuery() twice in a row, without reading the result inbetween. That's a good thing, to catch application errors, when you're not trying to do pipelining. Otherwise, if you forget to get the result of a query you've sent, and then send another query, you'll merrily read the result of the first query and think that it belongs to the second. Agreed, and I think this is the only behavior change currently. 
An easy fix to restore existing behavior by default: PQsetPipelining(PGconn *conn, int arg); should work. Are you trying to support continous pipelining, where you send new queries all the time, and read results as they arrive, without ever draining the pipe? Or are you just trying to do batches, where you send a bunch of queries, and wait for all the results to arrive, before sending more? A batched API would be easier to understand and work with, although a continuous pipeline could be more efficient for an application that can take advantage of it. I don't see any reason to limit it to batches, though it can certainly be used that way. My first test case does continuous pipelining and it provides a huge throughput gain when there's any latency in the connection. I can envision a lot of uses for the continuous approach. Consideration of implicit transactions (autocommit), the whole pipeline being one transaction, or multiple transactions is needed. The more I think about
Re: [HACKERS] libpq pipelining
> The explanation of PQgetFirstQuery makes it sound pretty hard to match
> up the result with the query. You have to pay attention to PQisBusy.

PQgetFirstQuery should also be valid after calling PQgetResult and then you don't have to worry about PQisBusy, so I should probably change the documentation to indicate that is the preferred usage, or maybe make that the only guaranteed usage, and say the results are undefined if you call it before calling PQgetResult. That usage also makes it consistent with PQgetLastQuery being called immediately after PQsendQuery.

I changed my second example to call PQgetFirstQuery after PQgetResult instead of before, and that removes the need to call PQconsumeInput and PQisBusy when you don't mind blocking. It makes the example super simple:

PQsendQuery(conn, "INSERT INTO test(id) VALUES (DEFAULT),(DEFAULT) RETURNING id");
query1 = PQgetLastQuery(conn);

/* Duplicate primary key error */
PQsendQuery(conn, "UPDATE test SET id=2 WHERE id=1");
query2 = PQgetLastQuery(conn);

PQsendQuery(conn, "SELECT * FROM test");
query3 = PQgetLastQuery(conn);

while( (result = PQgetResult(conn)) != NULL )
{
    curQuery = PQgetFirstQuery(conn);

    if (curQuery == query1)
        checkResult(conn,result,curQuery,PGRES_TUPLES_OK);
    if (curQuery == query2)
        checkResult(conn,result,curQuery,PGRES_FATAL_ERROR);
    if (curQuery == query3)
        checkResult(conn,result,curQuery,PGRES_TUPLES_OK);
}

Note that the curQuery == queryX check will work no matter how many results a query produces.

Matt Newell
[HACKERS] libpq pipelining
Hi,

The recent discussion about pipelining in the jodbc driver prompted me to look at what it would take for libpq.

I have a proof of concept patch working. The results are even more promising than I expected. While it's true that many applications and frameworks won't easily benefit, it amazes me that this hasn't been explored before.

I developed a simple test application that creates a table with a single auto increment primary key column, then runs 4 simple queries x times each:

INSERT INTO test() VALUES ()
SELECT * FROM test LIMIT 1
SELECT * FROM test
DELETE FROM test

The parameters to testPipelinedSeries are (number of times to execute each query, maximum number of queued queries).

Results against local server:

testPipelinedSeries(10,1) took 0.020884
testPipelinedSeries(10,3) took 0.020630, speedup 1.01
testPipelinedSeries(10,10) took 0.006265, speedup 3.33
testPipelinedSeries(100,1) took 0.042731
testPipelinedSeries(100,3) took 0.043035, speedup 0.99
testPipelinedSeries(100,10) took 0.037222, speedup 1.15
testPipelinedSeries(100,25) took 0.031223, speedup 1.37
testPipelinedSeries(100,50) took 0.032482, speedup 1.32
testPipelinedSeries(100,100) took 0.031356, speedup 1.36

Results against remote server through ssh tunnel (30-40ms rtt):

testPipelinedSeries(10,1) took 3.2461736
testPipelinedSeries(10,3) took 1.1008443, speedup 2.44
testPipelinedSeries(10,10) took 0.342399, speedup 7.19
testPipelinedSeries(100,1) took 26.25882588
testPipelinedSeries(100,3) took 8.8509234, speedup 3.04
testPipelinedSeries(100,10) took 3.2866285, speedup 9.03
testPipelinedSeries(100,25) took 2.1472847, speedup 17.57
testPipelinedSeries(100,50) took 1.957510, speedup 27.03
testPipelinedSeries(100,100) took 0.690682, speedup 37.47

I plan to write documentation, add regression testing, and do general cleanup before asking for feedback on the patch itself. Any suggestions about performance testing or api design would be nice.
I haven't played with changing the sync logic yet, but I'm guessing that an api to allow manual sync instead of a sync per PQsendQuery will be needed. That could make things tricky though with multi-statement queries, because currently the only way to detect when results change from one query to the next is a ReadyForQuery message.

Matt Newell

/*
 * src/test/examples/testlibpqpipeline.c
 *
 * testlibpqpipeline.c
 *     this test program tests query pipelining and its performance impact
 */

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#include "libpq-fe.h"

/* If defined we won't issue more sql commands if the socket's
 * write buffer is full */
//#define MIN_LOCAL_Q
//#define PRINT_QUERY_PROGRESS

static int testPipelined(PGconn *conn, int totalQueries, int totalQueued, const char *sql);
static int testPipelinedSeries(PGconn *conn, int totalQueries, int totalQueued, int baseline_usecs);

int
testPipelined(PGconn *conn, int totalQueries, int totalQueued, const char *sql)
{
	int			nQueriesQueued;
	int			nQueriesTotal;
	PGresult   *result;
	PGquery    *firstQuery;
	PGquery    *curQuery;

	nQueriesQueued = nQueriesTotal = 0;
	result = NULL;
	firstQuery = curQuery = NULL;

	while (nQueriesQueued > 0 || nQueriesTotal < totalQueries)
	{
		if (PQconsumeInput(conn) == 0)
		{
			printf("PQconsumeInput ERROR: %s\n", PQerrorMessage(conn));
			return 1;
		}

		do
		{
			curQuery = PQgetFirstQuery(conn);

			/* firstQuery is finished */
			if (firstQuery != curQuery)
			{
				//printf("%p done, curQuery=%p\n", firstQuery, curQuery);
#ifdef PRINT_QUERY_PROGRESS
				printf("-");
#endif
				firstQuery = curQuery;
				nQueriesQueued--;
			}

			/* Break if no queries are ready */
			if (!firstQuery || PQisBusy(conn))
				break;

			if ((result = PQgetResult(conn)) != 0)
				PQclear(result);
		} while (1);

		if (nQueriesTotal < totalQueries && nQueriesQueued < totalQueued)
		{
#ifdef MIN_LOCAL_Q
			int			flushResult = PQflush(conn);

			if (flushResult == -1)
			{
				printf("PQflush ERROR: %s\n", PQerrorMessage(conn));
				return 1;
			}
			else if (flushResult == 1)
				continue;
#endif
			PQsendQuery(conn, sql);
			if (firstQuery == NULL)
				firstQuery = PQgetFirstQuery(conn);
			nQueriesTotal++;
			nQueriesQueued++;
#ifdef PRINT_QUERY_PROGRESS
			printf("+");
#endif
		}
	}
#ifdef PRINT_QUERY_PROGRESS
	printf("\n");
#endif
	return 0;
}

int
testPipelinedSeries(PGconn *conn, int totalQueries, int totalQueued, int baseline_usecs)
{
	int			result;
	struct timeval tv1,
				tv2;
	int			secs,
				usecs;

	gettimeofday(&tv1, NULL);

#define TEST_P(q) \
	if ((result = testPipelined(conn, totalQueries, totalQueued, q)) != 0) \
		return result;

	TEST_P("INSERT INTO test() VALUES ()");
	TEST_P("SELECT * FROM test LIMIT 1");
	TEST_P("SELECT * FROM test");
	TEST_P("DELETE FROM test");

	gettimeofday(&tv2, NULL);
	secs = tv2.tv_sec - tv1.tv_sec;
	usecs = secs * 1000000 + tv2.tv_usec - tv1.tv_usec;
	printf("testPipelinedSeries(%i,%i) took