Re: [HACKERS] libpq pipelining

2014-12-10 Thread Matt Newell
On Friday, December 05, 2014 12:22:38 PM Heikki Linnakangas wrote:
 Oh, so that's how the PQgetLastQuery/PQgetNextQuery functions work! I
 didn't understand that before. I'd suggest renaming them to something
 like PQgetSentQuery() and PQgetResultQuery(). The first/last/next names
 made me think that they're used to iterate a list of queries, but in
 fact they're supposed to be used at very different stages.
 
 - Heikki


Okay, I have renamed them as you suggested, and added
PQsetPipelining/PQgetPipelining, defaulting to pipelining off.  There should be
no behavior change unless pipelining is enabled.

Documentation should be mostly complete except the possible addition of an 
example and maybe a general pipelining overview paragraph.

I have implemented async query support (that takes advantage of pipelining) in
Qt, along with a couple of test cases.  This led me to discover a bug in my
last patch where a PGquery object could be reused twice in a row.  I have fixed
that.  I contemplated not reusing the PGquery objects at all, but that
wouldn't solve the problem because it's very possible that malloc will return
a recently freed block of the same size anyway.  Making the guarantee that a
PGquery won't be reused twice in a row should be sufficient; the only
alternative is to add a unique id, but that would add further complexity that
I don't think is warranted.

Feedback is very welcome and appreciated.

Thanks,
Matt Newell


diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index d829a4b..4e0431e 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3947,9 +3947,14 @@ int PQsendQuery(PGconn *conn, const char *command);
 
   After successfully calling <function>PQsendQuery</function>, call
   <function>PQgetResult</function> one or more times to obtain the
-   results.  <function>PQsendQuery</function> cannot be called again
-   (on the same connection) until <function>PQgetResult</function>
-   has returned a null pointer, indicating that the command is done.
+   results.  If pipelining is enabled, <function>PQsendQuery</function>
+   may be called multiple times before reading the results.  See
+   <function>PQsetPipelining</function> and <function>PQisPipelining</function>.
+   Call <function>PQgetSentQuery</function> to get a <structname>PGquery</structname>
+   which can be used to identify which results obtained from
+   <function>PQgetResult</function> belong to each pipelined query.
+   If only one query is dispatched at a time, you can call <function>PQgetResult</function>
+   until a NULL value is returned to indicate the end of the query.
  </para>
 </listitem>
</varlistentry>
@@ -4133,8 +4138,8 @@ PGresult *PQgetResult(PGconn *conn);
 
  <para>
   <function>PQgetResult</function> must be called repeatedly until
-   it returns a null pointer, indicating that the command is done.
-   (If called when no command is active,
+   it returns a null pointer, indicating that all dispatched commands
+   are done. (If called when no command is active,
   <function>PQgetResult</function> will just return a null pointer
   at once.) Each non-null result from
   <function>PQgetResult</function> should be processed using the
@@ -4144,14 +4149,17 @@ PGresult *PQgetResult(PGconn *conn);
   <function>PQgetResult</function> will block only if a command is
   active and the necessary response data has not yet been read by
   <function>PQconsumeInput</function>.
+   If query pipelining is being used, <function>PQgetResultQuery</function>
+   can be called after <function>PQgetResult</function> to match the result to the query.
  </para>
 
  <note>
   <para>
    Even when <function>PQresultStatus</function> indicates a fatal
-    error, <function>PQgetResult</function> should be called until it
-    returns a null pointer, to allow <application>libpq</> to
-    process the error information completely.
+    error, <function>PQgetResult</function> should be called until the
+    query has no more results (a null pointer return if not using query
+    pipelining, otherwise see <function>PQgetResultQuery</function>),
+    to allow <application>libpq</> to process the error information completely.
   </para>
  </note>
 </listitem>
@@ -4385,6 +4393,158 @@ int PQflush(PGconn *conn);
read-ready and then read the response as described above.
  </para>
 
+ <variablelist>
+  <varlistentry id="libpq-pqsetpipelining">
+   <term>
+    <function>PQsetPipelining</function>
+    <indexterm>
+     <primary>PQsetPipelining</primary>
+    </indexterm>
+   </term>
+
+   <listitem>
+    <para>
+     Enables or disables query pipelining.
+<synopsis>
+int PQsetPipelining(PGconn *conn, int arg);
+</synopsis>
+    </para>
+
+    <para>
+     Enables pipelining for the connection if arg is 1, or disables it
+     if arg is 0.  When pipelining is enabled multiple async queries can
+     be sent before processing the results of the first.  If pipelining
+     is disabled an

Re: [HACKERS] libpq pipelining

2014-12-05 Thread Heikki Linnakangas

On 12/05/2014 02:30 AM, Matt Newell wrote:



The explanation of PQgetFirstQuery makes it sound pretty hard to match
up the result with the query. You have to pay attention to PQisBusy.


PQgetFirstQuery should also be valid after
calling PQgetResult and then you don't have to worry about PQisBusy, so I
should probably change the documentation to indicate that is the preferred
usage, or maybe make that the only guaranteed usage, and say the results
are undefined if you call it before calling PQgetResult.  That usage also
makes it consistent with PQgetLastQuery being called immediately after
PQsendQuery.


I changed my second example to call PQgetFirstQuery after PQgetResult instead
of before, and that removes the need to call PQconsumeInput and PQisBusy when
you don't mind blocking.  It makes the example super simple:

PQsendQuery(conn, "INSERT INTO test(id) VALUES (DEFAULT),(DEFAULT)
RETURNING id");
query1 = PQgetLastQuery(conn);

/* Duplicate primary key error */
PQsendQuery(conn, "UPDATE test SET id=2 WHERE id=1");
query2 = PQgetLastQuery(conn);

PQsendQuery(conn, "SELECT * FROM test");
query3 = PQgetLastQuery(conn);

while( (result = PQgetResult(conn)) != NULL )
{
curQuery = PQgetFirstQuery(conn);

if (curQuery == query1)
checkResult(conn,result,curQuery,PGRES_TUPLES_OK);
if (curQuery == query2)
checkResult(conn,result,curQuery,PGRES_FATAL_ERROR);
if (curQuery == query3)
checkResult(conn,result,curQuery,PGRES_TUPLES_OK);
}

Note that the curQuery == queryX check will work no matter how many results a
query produces.


Oh, so that's how the PQgetLastQuery/PQgetNextQuery functions work! I
didn't understand that before. I'd suggest renaming them to something 
like PQgetSentQuery() and PQgetResultQuery(). The first/last/next names 
made me think that they're used to iterate a list of queries, but in 
fact they're supposed to be used at very different stages.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq pipelining

2014-12-04 Thread Heikki Linnakangas

On 12/04/2014 03:11 AM, Matt Newell wrote:

The recent discussion about pipelining in the jodbc driver prompted me to look
at what it would take for libpq.


Great!


I have a proof of concept patch working.  The results are even more promising
than I expected.

While it's true that many applications and frameworks won't easily benefit, it
amazes me that this hasn't been explored before.

I developed a simple test application that creates a table with a single
auto-increment primary key column, then runs 4 simple queries x times each:
...

I plan to write documentation, add regression testing, and do general cleanup
before asking for feedback on the patch itself.  Any suggestions about
performance testing or api design would be nice.  I haven't played with
changing the sync logic yet, but I'm guessing that an api to allow manual sync
instead of a sync per PQsendQuery will be needed.  That could make things
tricky though with multi-statement queries, because currently the only way to
detect when results change from one query to the next is a ReadyForQuery
message.


A good API is crucial for this. It should make it easy to write an 
application that does pipelining, and to handle all the error conditions 
in a predictable way. I'd suggest that you write the documentation 
first, before writing any code, so that we can discuss the API. It 
doesn't have to be in SGML format yet, a plain-text description of the 
API will do.


- Heikki





Re: [HACKERS] libpq pipelining

2014-12-04 Thread Craig Ringer
On 12/04/2014 05:08 PM, Heikki Linnakangas wrote:

 
 A good API is crucial for this. It should make it easy to write an
 application that does pipelining, and to handle all the error conditions
 in a predictable way. I'd suggest that you write the documentation
 first, before writing any code, so that we can discuss the API. It
 doesn't have to be in SGML format yet, a plain-text description of the
 API will do.

I strongly agree.

Applications need to be able to reliably predict what will happen if
there's an error in the middle of a pipeline.

Consideration of implicit transactions (autocommit), the whole pipeline
being one transaction, or multiple transactions is needed.

Apps need to be able to wait for the result of a query partway through a
pipeline, e.g. scheduling four queries, then waiting for the result of
the 2nd.

There are probably plenty of other wrinkly bits to think about.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] libpq pipelining

2014-12-04 Thread Matt Newell
On Thursday, December 04, 2014 10:30:46 PM Craig Ringer wrote:
 On 12/04/2014 05:08 PM, Heikki Linnakangas wrote:
  A good API is crucial for this. It should make it easy to write an
  application that does pipelining, and to handle all the error conditions
  in a predictable way. I'd suggest that you write the documentation
  first, before writing any code, so that we can discuss the API. It
  doesn't have to be in SGML format yet, a plain-text description of the
  API will do.
 
 I strongly agree.
 
First pass at the documentation changes attached, along with a new example 
that demonstrates pipelining 3 queries, with the middle one resulting in a 
PGRES_FATAL_ERROR response.

With the API I am proposing, only 2 new functions (PQgetFirstQuery,
PQgetLastQuery) are required to be able to match each result to the query that
caused it.  Another function, PQgetNextQuery, allows iterating through the
pending queries, and PQgetQueryCommand permits getting the original query
text.

Adding the ability to set a user supplied pointer on the PGquery struct might 
make it much easier for some frameworks, and other users might want a 
callback, but I don't think either are required.

 Applications need to be able to reliably predict what will happen if
 there's an error in the middle of a pipeline.
 
Yes, the API I am proposing makes it easy to get results for each submitted
query independently of the success or failure of previous queries in the
pipeline.

 Consideration of implicit transactions (autocommit), the whole pipeline
 being one transaction, or multiple transactions is needed.
The more I think about this the more confident I am that no extra work is 
needed.

Unless we start doing some preliminary processing of the query inside of 
libpq, our hands are tied wrt sending a sync at the end of each query.  The 
reason for this is that we rely on the ReadyForQuery message to indicate the 
end of a query, so without the sync there is no way to tell if the next result 
is from another statement in the current query, or the first statement in the 
next query.

I also don't see a reason to need multiple queries without a sync statement.
If the user wants all queries to succeed or fail together it should be no
problem to start the pipeline with begin and complete it with commit.  But I
may be missing some detail...
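That begin/commit bracketing can be sketched against the proposed API (untested pseudocode; PQsetPipelining is from this patch, and the table and column names are made up for illustration):

```
PQsetPipelining(conn, 1);

PQsendQuery(conn, "BEGIN");
PQsendQuery(conn, "UPDATE accounts SET balance = balance - 100 WHERE id = 1");
PQsendQuery(conn, "UPDATE accounts SET balance = balance + 100 WHERE id = 2");
PQsendQuery(conn, "COMMIT");

/* Drain the pipeline.  Because each PQsendQuery ends with a Sync, an
 * error in either UPDATE aborts the transaction, the later statements
 * fail with "current transaction is aborted", and the final COMMIT
 * reports a rollback, so the queries still succeed or fail as a unit. */
while ((result = PQgetResult(conn)) != NULL)
    PQclear(result);
```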


 
 Apps need to be able to wait for the result of a query partway through a
 pipeline, e.g. scheduling four queries, then waiting for the result of
 the 2nd.
 
Right.  With the api I am proposing the user does have to process each result
until it gets to the one it wants, but it's no problem doing that.  It would
also be trivial to add a function

PGresult * PQgetNextQueryResult(PGquery *query);

that discards all results from previous queries.  Very similar to how a PQexec 
disregards all results from previous async queries.

It would also be possible to queue the results and be able to retrieve them
out of order, but I think that adds unnecessary complexity and might also make
it easy for users to never retrieve and free some results.

 There are probably plenty of other wrinkly bits to think about.

Yup, I'm sure I'm still missing some significant things at this point...

Matt Newell
/*
 * src/test/examples/testlibpqpipeline2.c
 *
 *
 * testlibpqpipeline.c
 *		this test program tests query pipelining.  It shows how to issue multiple
 *  pipelined queries, and identify from which query a result originated.  It 
 *  also demonstrates how failure of one query does not impact subsequent queries
 *  when they are not part of the same transaction.
 *
 *
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#include "libpq-fe.h"

static void checkResult(PGconn *conn, PGresult *result, PGquery *query, int expectedResultStatus)
{
	if (PQresultStatus(result) != expectedResultStatus)
	{
		printf( "Got unexpected result status '%s', expected '%s'\nQuery:%s\n", 
			PQresStatus(PQresultStatus(result)), PQresStatus(expectedResultStatus),
			PQgetQueryCommand(query));
		PQclear(result);
		PQclear(PQexec(conn,"DROP TABLE test"));
		PQfinish(conn);
		exit(1);
	}
	PQclear(result);
}

int
main(int argc, char **argv)
{
	PGconn * conn;
	PGquery * query1;
	PGquery * query2;
	PGquery * query3;
	PGquery * curQuery;
	PGresult * result;
	
	conn = NULL;
	query1 = query2 = query3 = curQuery = NULL;
	result = NULL;
	
	/* make a connection to the database */
	conn = PQsetdb(NULL, NULL, NULL, NULL, NULL);

	/* check to see that the backend connection was successfully made */
	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "Connection to database failed: %s",
PQerrorMessage(conn));
		exit(1);
	}

	checkResult(conn,PQexec(conn,"DROP TABLE IF EXISTS test"),NULL,PGRES_COMMAND_OK);
	checkResult(conn,PQexec(conn,"CREATE TABLE test ( id SERIAL PRIMARY KEY )"),NULL,PGRES_COMMAND_OK);
	
	PQsendQuery(conn, "INSERT INTO test(id) VALUES (DEFAULT),(DEFAULT) RETURNING id");
	query1 = PQgetLastQuery(conn);
	
	/* 

Re: [HACKERS] libpq pipelining

2014-12-04 Thread Claudio Freire
On Thu, Dec 4, 2014 at 4:11 PM, Matt Newell newe...@blur.com wrote:
 With the API i am proposing, only 2 new functions (PQgetFirstQuery,
 PQgetLastQuery) are required to be able to match each result to the query that
 caused it.  Another function, PQgetNextQuery allows iterating through the
 pending queries, and PQgetQueryCommand permits getting the original query
 text.

 Adding the ability to set a user supplied pointer on the PGquery struct might
 make it much easier for some frameworks, and other users might want a
 callback, but I don't think either are required.

With a pointer on PGquery you wouldn't need any of the above. Whoever
wants the query text sets it as the pointer; whoever wants some other
struct sets that as the pointer.

You would only need to be careful about the lifetime of the pointed
struct, but that onus is on the application I'd say. The API only
needs to provide some guarantees about how long or short it holds onto
that pointer.

I'm thinking this would be somewhat necessary for a python wrapper,
like psycopg2 (the wrapper could build a dictionary based on query
text, but there's no guarantee that query text will be unique so it'd
be very tricky).




Re: [HACKERS] libpq pipelining

2014-12-04 Thread Matt Newell
On Thursday, December 04, 2014 04:30:27 PM Claudio Freire wrote:
 On Thu, Dec 4, 2014 at 4:11 PM, Matt Newell newe...@blur.com wrote:
  With the API i am proposing, only 2 new functions (PQgetFirstQuery,
  PQgetLastQuery) are required to be able to match each result to the query
  that caused it.  Another function, PQgetNextQuery allows iterating
  through the pending queries, and PQgetQueryCommand permits getting the
  original query text.
  
  Adding the ability to set a user supplied pointer on the PGquery struct
  might make it much easier for some frameworks, and other users might want
  a callback, but I don't think either are required.
 
 With a pointer on PGquery you wouldn't need any of the above. Whoever
 wants the query text sets it as the pointer; whoever wants some other
 struct sets that as the pointer.
 
libpq already stores the (current) query text as it's used in some error
cases, so that's not really optional without breaking backwards compatibility.
Adding another pointer for the user to optionally utilize should be no big
deal though if everyone agrees it's a good thing.

 You would only need to be careful about the lifetime of the pointed
 struct, but that onus is on the application I'd say. The API only
 needs to provide some guarantees about how long or short it holds onto
 that pointer.
Agreed.

 
 I'm thinking this would be somewhat necessary for a python wrapper,
 like psycopg2 (the wrapper could build a dictionary based on query
 text, but there's no guarantee that query text will be unique so it'd
 be very tricky).
While it might make some things simpler, I really don't think it's absolutely
necessary, since the wrapper can maintain a queue that corresponds to libpq's
internal queue of PGquery's.  ie, each time you call a PQsendQuery* function
you push your required state, and each time the return value of
PQgetFirstQuery changes you pop from the queue.






Re: [HACKERS] libpq pipelining

2014-12-04 Thread Heikki Linnakangas

On 12/04/2014 09:11 PM, Matt Newell wrote:

With the API I am proposing, only 2 new functions (PQgetFirstQuery,
PQgetLastQuery) are required to be able to match each result to the query that
caused it.  Another function, PQgetNextQuery allows iterating through the
pending queries, and PQgetQueryCommand permits getting the original query
text.

Adding the ability to set a user supplied pointer on the PGquery struct might
make it much easier for some frameworks, and other users might want a
callback, but I don't think either are required.


I don't like exposing the PGquery struct to the application like that. 
Access to all other libpq objects is done via functions. The application 
can't (or shouldn't, anyway) directly access the fields of PGresult, for 
example. It has to call PQnfields(), PQntuples() etc.


The user-supplied pointer seems quite pointless. It would make sense if 
the pointer was passed to PQsendQuery(), and you'd get it back in 
PGquery. You could then use it to tag the query when you send it with 
whatever makes sense for the application, and use the tag in the result 
to match it with the original query. But as it stands, I don't see the 
point.


The original query string might be handy for some things, but for others 
it's useless. It's not enough as a general method to identify the query 
the result belongs to. A common use case for this is to execute the same 
query many times with different parameters.


So I don't think you've quite nailed the problem of how to match the 
results to the commands that originated them, yet. One idea is to add a 
function that can be called after PQgetResult(), to get some identifier 
of the original command. But there needs to be a mechanism to tag the 
PQsendQuery() calls. Or you can assign each call a unique ID 
automatically, and have a way to ask for that ID after calling 
PQsendQuery().


The explanation of PQgetFirstQuery makes it sound pretty hard to match 
up the result with the query. You have to pay attention to PQisBusy.


It would be good to make it explicit when you start a pipelined 
operation. Currently, you get an error if you call PQsendQuery() twice 
in a row, without reading the result inbetween. That's a good thing, to 
catch application errors, when you're not trying to do pipelining. 
Otherwise, if you forget to get the result of a query you've sent, and 
then send another query, you'll merrily read the result of the first 
query and think that it belongs to the second.


Are you trying to support continuous pipelining, where you send new 
queries all the time, and read results as they arrive, without ever 
draining the pipe? Or are you just trying to do batches, where you 
send a bunch of queries, and wait for all the results to arrive, before 
sending more? A batched API would be easier to understand and work with, 
although a continuous pipeline could be more efficient for an 
application that can take advantage of it.



Consideration of implicit transactions (autocommit), the whole pipeline
being one transaction, or multiple transactions is needed.

The more I think about this the more confident I am that no extra work is
needed.

Unless we start doing some preliminary processing of the query inside of
libpq, our hands are tied wrt sending a sync at the end of each query.  The
reason for this is that we rely on the ReadyForQuery message to indicate the
end of a query, so without the sync there is no way to tell if the next result
is from another statement in the current query, or the first statement in the
next query.

I also don't see a reason to need multiple queries without a sync statement.
If the user wants all queries to succeed or fail together it should be no
problem to start the pipeline with begin and complete it commit.  But I may be
missing some detail...


True. It makes me a bit uneasy, though, to not be sure that the whole 
batch is committed or rolled back as one unit. There are many ways the 
user can shoot himself in the foot with that. Error handling would be a 
lot simpler if you would only send one Sync for the whole batch. Tom 
explained it better on this recent thread: 
http://www.postgresql.org/message-id/32086.1415063...@sss.pgh.pa.us.


Another thought is that for many applications, it would actually be OK 
to not know which query each result belongs to. For example, if you 
execute a bunch of inserts, you often just want to get back the total 
number of rows inserted, or maybe not even that. Or if you execute a CREATE 
TEMPORARY TABLE ... ON COMMIT DROP, followed by some insertions to it, 
some more data manipulations, and finally a SELECT to get the results 
back. All you want is the last result set.


If we could modify the wire protocol, we'd want to have a MiniSync 
message that is like Sync except that it wouldn't close the current 
transaction. The server would respond to it with a ReadyForQuery message 
(which could carry an ID number, to match it up with the MiniSync 
command). But I really 

Re: [HACKERS] libpq pipelining

2014-12-04 Thread Matt Newell
On Thursday, December 04, 2014 11:39:02 PM Heikki Linnakangas wrote:
  Adding the ability to set a user supplied pointer on the PGquery struct
  might make it much easier for some frameworks, and other users might want
  a callback, but I don't think either are required.
 
 I don't like exposing the PGquery struct to the application like that.
 Access to all other libpq objects is done via functions. The application
 can't (or shouldn't, anyway) directly access the fields of PGresult, for
 example. It has to call PQnfields(), PQntuples() etc.
 
Right, my patch doesn't expose it.  I was thinking of adding two new functions 
to get/set the user tag/pointer.

 The user-supplied pointer seems quite pointless. It would make sense if
 the pointer was passed to PQsendquery(), and you'd get it back in
 PGquery. You could then use it to tag the query when you send it with
 whatever makes sense for the application, and use the tag in the result
 to match it with the original query.
That's exactly what I envisioned, but with a separate call to avoid having to 
modify/duplicate the PQsendQuery functions:

PQsendQuery(conn,...)
query = PQgetLastQuery(conn);
PQquerySetUserPointer(query,userPtr);

...
result = PQgetResult(conn);
query = PQgetFirstQuery(conn);
userPtr = PQqueryGetUserPointer(query);

 But as it stands, I don't see the
 point.
I don't need it since it should be easy to keep track without it.  It was just 
an idea.

 The original query string might be handy for some things, but for others
 it's useless. It's not enough as a general method to identify the query
 the result belongs to. A common use case for this is to execute the same
 query many times with different parameters.
 
Right, I'm only saving the query text because that's how things were done 
already.  Since it's already there I didn't see a reason not to expose it.

 So I don't think you've quite nailed the problem of how to match the
 results to the commands that originated them, yet. One idea is to add a
 function that can be called after PQgetResult(), to get some identifier
 of the original command. But there needs to be a mechanism to tag the
 PQsendQuery() calls. Or you can assign each call a unique ID
 automatically, and have a way to ask for that ID after calling
 PQsendQuery().
PGquery IS the unique ID, and it is available after calling PQsendQuery by 
calling PQgetLastQuery.  

 
 The explanation of PQgetFirstQuery makes it sound pretty hard to match
 up the result with the query. You have to pay attention to PQisBusy.
 
It's not hard at all and is very natural to use since the whole point of an 
async api is to avoid blocking, so it's natural to only call PQgetResult when 
it's not going to block.  PQgetFirstQuery should also be valid after calling 
PQgetResult and then you don't have to worry about PQisBusy, so I should 
probably change the documentation to indicate that is the preferred usage, or 
maybe make that the only guaranteed usage, and say the results are undefined if 
you call it before calling PQgetResult.  That usage also makes it consistent 
with PQgetLastQuery being called immediately after PQsendQuery.

Another option would be a function to get the PGquery for any PGresult.  This 
would make things a bit more straightforward for the user, but more 
complicated in the implementation since multiple PGresults will share the same 
PGquery.  However it's nothing that a reference count wouldn't solve.

 It would be good to make it explicit when you start a pipelined
 operation. Currently, you get an error if you call PQsendQuery() twice
 in a row, without reading the result inbetween. That's a good thing, to
 catch application errors, when you're not trying to do pipelining.
 Otherwise, if you forget to get the result of a query you've sent, and
 then send another query, you'll merrily read the result of the first
 query and think that it belongs to the second.
Agreed, and I think this is the only behavior change currently. An easy fix to 
restore existing behavior by default:

PQsetPipelining(PGconn *conn, int arg); 

should work.

 
 Are you trying to support continuous pipelining, where you send new
 queries all the time, and read results as they arrive, without ever
 draining the pipe? Or are you just trying to do batches, where you
 send a bunch of queries, and wait for all the results to arrive, before
 sending more? A batched API would be easier to understand and work with,
 although a continuous pipeline could be more efficient for an
 application that can take advantage of it.
 
I don't see any reason to limit it to batches, though it can certainly be used
that way.  My first test case does continuous pipelining and it provides a
huge throughput gain when there's any latency in the connection.  I can
envision a lot of uses for the continuous approach.

  Consideration of implicit transactions (autocommit), the whole pipeline
  being one transaction, or multiple transactions is needed.
  
  The more I think about 

Re: [HACKERS] libpq pipelining

2014-12-04 Thread Matt Newell
 
  The explanation of PQgetFirstQuery makes it sound pretty hard to match
  up the result with the query. You have to pay attention to PQisBusy.
 
 PQgetFirstQuery should also be valid after
 calling PQgetResult and then you don't have to worry about PQisBusy, so I
 should probably change the documentation to indicate that is the preferred
 usage, or maybe make that the only guaranteed usage, and say the results
 are undefined if you call it before calling PQgetResult.  That usage also
 makes it consistent with PQgetLastQuery being called immediately after
 PQsendQuery.
 
I changed my second example to call PQgetFirstQuery after PQgetResult instead 
of before, and that removes the need to call PQconsumeInput and PQisBusy when 
you don't mind blocking.  It makes the example super simple:

PQsendQuery(conn, "INSERT INTO test(id) VALUES (DEFAULT),(DEFAULT) 
RETURNING id");
query1 = PQgetLastQuery(conn);

/* Duplicate primary key error */
PQsendQuery(conn, "UPDATE test SET id=2 WHERE id=1");
query2 = PQgetLastQuery(conn);

PQsendQuery(conn, "SELECT * FROM test");
query3 = PQgetLastQuery(conn);

while( (result = PQgetResult(conn)) != NULL )
{
curQuery = PQgetFirstQuery(conn);

if (curQuery == query1)
checkResult(conn,result,curQuery,PGRES_TUPLES_OK);
if (curQuery == query2)
checkResult(conn,result,curQuery,PGRES_FATAL_ERROR);
if (curQuery == query3)
checkResult(conn,result,curQuery,PGRES_TUPLES_OK);
}

Note that the curQuery == queryX check will work no matter how many results a 
query produces.

Matt Newell





[HACKERS] libpq pipelining

2014-12-03 Thread Matt Newell

Hi,

The recent discussion about pipelining in the jodbc driver prompted me to look 
at what it would take for libpq.

I have a proof of concept patch working.  The results are even more promising 
than I expected.

While it's true that many applications and frameworks won't easily benefit, it 
amazes me that this hasn't been explored before.  

I developed a simple test application that creates a table with a single
auto-increment primary key column, then runs 4 simple queries x times each:

INSERT INTO test() VALUES ()
SELECT * FROM test LIMIT 1
SELECT * FROM test
DELETE FROM test

The parameters to testPipelinedSeries are (number of times to execute each 
query, maximum number of queued queries).

Results against local server:

testPipelinedSeries(10,1) took 0.020884
testPipelinedSeries(10,3) took 0.020630, speedup 1.01
testPipelinedSeries(10,10) took 0.006265, speedup 3.33
testPipelinedSeries(100,1) took 0.042731
testPipelinedSeries(100,3) took 0.043035, speedup 0.99
testPipelinedSeries(100,10) took 0.037222, speedup 1.15
testPipelinedSeries(100,25) took 0.031223, speedup 1.37
testPipelinedSeries(100,50) took 0.032482, speedup 1.32
testPipelinedSeries(100,100) took 0.031356, speedup 1.36

Results against remote server through ssh tunnel(30-40ms rtt):

testPipelinedSeries(10,1) took 3.2461736
testPipelinedSeries(10,3) took 1.1008443, speedup 2.44
testPipelinedSeries(10,10) took 0.342399, speedup 7.19
testPipelinedSeries(100,1) took 26.25882588
testPipelinedSeries(100,3) took 8.8509234, speedup 3.04
testPipelinedSeries(100,10) took 3.2866285, speedup 9.03
testPipelinedSeries(100,25) took 2.1472847, speedup 17.57
testPipelinedSeries(100,50) took 1.957510, speedup 27.03
testPipelinedSeries(100,100) took 0.690682, speedup 37.47

I plan to write documentation, add regression testing, and do general cleanup
before asking for feedback on the patch itself.  Any suggestions about
performance testing or api design would be nice.  I haven't played with
changing the sync logic yet, but I'm guessing that an api to allow manual sync
instead of a sync per PQsendQuery will be needed.  That could make things
tricky though with multi-statement queries, because currently the only way to
detect when results change from one query to the next is a ReadyForQuery
message.

Matt Newell

/*
 * src/test/examples/testlibpqpipeline.c
 *
 *
 * testlibpqpipeline.c
 *		this test program tests query pipelining and its performance impact
 *
 *
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#include "libpq-fe.h"

// If defined we won't issue more sql commands if the socket's
// write buffer is full
//#define MIN_LOCAL_Q

//#define PRINT_QUERY_PROGRESS

static int testPipelined( PGconn * conn, int totalQueries, int totalQueued, const char * sql );
static int testPipelinedSeries( PGconn * conn, int totalQueries, int totalQueued, int baseline_usecs );


int
testPipelined( PGconn * conn, int totalQueries, int totalQueued, const char * sql )
{
	int nQueriesQueued;
	int nQueriesTotal;
	PGresult * result;
	PGquery * firstQuery;
	PGquery * curQuery;
	
	nQueriesQueued = nQueriesTotal = 0;
	result = NULL;
	firstQuery = curQuery = NULL;
	
	while( nQueriesQueued > 0 || nQueriesTotal < totalQueries ) {
		
		if( PQconsumeInput(conn) == 0 ) {
			printf( "PQconsumeInput ERROR: %s\n", PQerrorMessage(conn) );
			return 1;
		}
		
		do {
			curQuery = PQgetFirstQuery(conn);
			
			/* firstQuery is finished */
			if( firstQuery != curQuery )
			{
//printf( "%p done, curQuery=%p\n", firstQuery, curQuery );
#ifdef PRINT_QUERY_PROGRESS
printf("-");
#endif
firstQuery = curQuery;
nQueriesQueued--;
			}
			
			/* Break if no queries are ready */
			if( !firstQuery || PQisBusy(conn) )
break;
			
			if( (result = PQgetResult(conn)) != 0 )
PQclear(result);
		}
		while(1);
		
		if( nQueriesTotal < totalQueries && nQueriesQueued < totalQueued ) {
#ifdef MIN_LOCAL_Q
			int flushResult = PQflush(conn);
			 if( flushResult == -1 ) {
printf( "PQflush ERROR: %s\n", PQerrorMessage(conn) );
return 1;
			} else if ( flushResult == 1 )
continue;
#endif
			PQsendQuery(conn,sql);
			if( firstQuery == NULL )
firstQuery = PQgetFirstQuery(conn);
			nQueriesTotal++;
			nQueriesQueued++;
#ifdef PRINT_QUERY_PROGRESS
			printf( "+" );
#endif
		}
	}
#ifdef PRINT_QUERY_PROGRESS
	printf( "\n" );
#endif
	return 0;
}

int testPipelinedSeries( PGconn * conn, int totalQueries, int totalQueued, int baseline_usecs )
{
	int result;
	struct timeval tv1, tv2;
	int secs, usecs;
	
	gettimeofday(&tv1,NULL);
#define TEST_P(q) \
	if( (result = testPipelined(conn,totalQueries,totalQueued,q)) != 0 ) \
		return result;
	TEST_P("INSERT INTO test() VALUES ()");
	TEST_P("SELECT * FROM test LIMIT 1");
	TEST_P("SELECT * FROM test");
	TEST_P("DELETE FROM test");
	gettimeofday(&tv2,NULL);
	secs = tv2.tv_sec - tv1.tv_sec;
	usecs = secs * 1000000 + tv2.tv_usec - tv1.tv_usec;
	printf( "testPipelinedSeries(%i,%i) took