I would like to add truly asynchronous query processing to libpq,
enabling command pipelining. The idea is to allow applications to
auto-tune to the bandwidth-delay product and reduce the number of
context switches when running against a local server.
Here's a sketch of what the interface would look like. I believe it can
be implemented in principle within the existing libpq structure,
although the interaction with single-row mode looks complicated.
If the application is not interested in intermediate query results, it
would use something like this:
PGAsyncMode oldMode = PQsetsendAsyncMode(conn, PQASYNC_DISCARD);
PGresult *res = NULL;
while (more_data()) {
    int ret = PQsendQueryParams(conn, "INSERT ...", ...);
    if (ret == 0) {
        // handle low-level error
    }
    // We can issue a second command. There is no need to fetch
    // the result immediately.
    ret = PQsendQueryParams(conn, "INSERT ...", ...);
    if (ret == 0) {
        // handle low-level error
    }
    res = PQgetResultNoWait(conn);
    if (res != NULL) {
        // Only returns non-NULL in case of error. Does not wait for
        // data to arrive from the server. If there is an error,
        // drains all pending results and reports only the first error.
        break;
    }
}
if (res == NULL) {
    // No error seen so far. The final PQgetResult drains all
    // pending results and reports the first error, if any.
    // (Necessarily syncs with the server.)
    res = PQgetResult(conn);
}
PQsetsendAsyncMode(conn, oldMode);
if (res != NULL) {
    // Handle error result.
    ...
    PQclear(res);
    throw_error();
}
If there is no need to exit from the loop early (say, because errors are
expected to be extremely rare), the PQgetResultNoWait call can be left out.
Internally, libpq will check in PQsendQueryParams (and similar
functions) whether data is available for reading. If yes, it is
consumed, discarded unless it is an error, and stored for later
retrieval with PQgetResultNoWait or PQgetResult. (After the first
error, all responses are discarded—this is necessary to avoid
deadlocks.) PQgetResultNoWait returns immediately if there is no stored
error object. If there is one, it syncs with the server (draining all
pending results). PQgetResult just drains all pending results,
reporting the first error.
The general expectation here is that commands are sent within a
transaction block, so only the first error is interesting. (It makes
sense for fatal errors to overtake regular errors.)
If the programmer needs results, not just the errors, a different mode
is needed:
PGAsyncMode oldMode = PQsetsendAsyncMode(conn, PQASYNC_RESULT);
bool more_data;
do {
    more_data = ...;
    if (more_data) {
        int ret = PQsendQueryParams(conn,
                                    "INSERT ... RETURNING ...", ...);
        if (ret == 0) {
            // handle low-level error
        }
    }
    // Consume all pending results.
    while (1) {
        PGresult *res;
        if (more_data) {
            res = PQgetResultNoWait(conn);
        } else {
            res = PQgetResult(conn);
        }
        if (res == NULL) {
            // No result so far.
            break;
        }
        // Result data is available. Check if it is an error.
        switch (PQresultStatus(res)) {
            case PGRES_TUPLES_OK:
                // Process asynchronous result data.
                ...
                break;
            // Error handling.
            case PGRES_BAD_RESPONSE:
            case PGRES_NONFATAL_ERROR:
            case PGRES_FATAL_ERROR:
                // Store error somewhere.
                ...
                // Keep draining results, eventually
                // synchronizing with the server.
                more_data = false;
                break;
        }
        PQclear(res);
    }
} while (more_data);
PQsetsendAsyncMode(conn, oldMode);
(This suggests that PQgetResultNoWait should have a different name and a
flag argument that tells it whether to block or not.)
In PQASYNC_RESULT mode, libpq has to build up a list of pending results
each time it notices available data during a PQsendQueryParams
operation. The number of responses that need to be stored this way is
not unbounded: it is related to what fits into the bandwidth-delay
product of the link. This, and the fairly elaborate processing logic,
is the reason why I think it makes sense to have PQASYNC_DISCARD as
well.
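The pending-result list is essentially a FIFO that PQsendQueryParams appends to whenever it drains data, and that PQgetResultNoWait pops from. A minimal sketch, with hypothetical names (a real implementation would store PGresult pointers):

```c
/* Sketch of the pending-result FIFO PQASYNC_RESULT mode would need.
   Names are hypothetical; libpq would store PGresult * here. */
#include <stdlib.h>

typedef struct PendingResult {
    struct PendingResult *next;
    void *res;                       /* stand-in for a PGresult * */
} PendingResult;

typedef struct {
    PendingResult *head, *tail;
} ResultQueue;

/* Append a drained result; called from the send path. */
static void rq_push(ResultQueue *q, void *res)
{
    PendingResult *n = malloc(sizeof *n);
    n->next = NULL;
    n->res = res;
    if (q->tail)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
}

/* Pop the oldest result; NULL when nothing is buffered,
   which is what PQgetResultNoWait would report. */
static void *rq_pop(ResultQueue *q)
{
    PendingResult *n = q->head;
    void *res;
    if (n == NULL)
        return NULL;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = NULL;
    res = n->res;
    free(n);
    return res;
}
```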
Instead of buffering the results, we could buffer the encoded command
messages in PQASYNC_RESULT mode. This means that PQsendQueryParams
would not block when it cannot send the (complete) command message, but
would store it in the connection object so that the subsequent
PQgetResultNoWait and PQgetResult calls would send it. This might work
better with single-row mode. Either way, we cannot avoid buffering
multiple queries or multiple responses if we want to utilize the link
bandwidth, or we'd risk deadlocks.
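The send-side alternative amounts to a growable byte buffer of encoded messages, drained opportunistically by later calls. A sketch under those assumptions (names hypothetical, only the buffering logic is modeled):

```c
/* Sketch of the alternative: buffer encoded command messages when the
   socket would block; later PQgetResult(NoWait) calls flush them.
   Hypothetical names; models only the buffering, not the protocol. */
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *buf;
    size_t len, cap;
} OutBuffer;

/* Queue a message that could not be sent in full; returns 0 on OOM. */
static int outbuf_queue(OutBuffer *ob, const char *msg, size_t n)
{
    if (ob->len + n > ob->cap) {
        size_t newcap = ob->cap ? ob->cap * 2 : 64;
        char *p;
        while (newcap < ob->len + n)
            newcap *= 2;
        p = realloc(ob->buf, newcap);
        if (p == NULL)
            return 0;
        ob->buf = p;
        ob->cap = newcap;
    }
    memcpy(ob->buf + ob->len, msg, n);
    ob->len += n;
    return 1;
}

/* Discard the n bytes that a (possibly partial) flush managed to write. */
static void outbuf_flushed(OutBuffer *ob, size_t n)
{
    memmove(ob->buf, ob->buf + n, ob->len - n);
    ob->len -= n;
}
```

Partial flushes are the interesting case: the buffer must survive a short write and resume from the unsent tail, which is why `outbuf_flushed` takes a byte count rather than a message count.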
There'd also be the default PQASYNC_DEFAULT mode, where it is an error
to call PQsendQueryParams multiple times without a PQgetResult call
in-between.
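The PQASYNC_DEFAULT rule is just a one-query-in-flight check on the connection. A sketch of that state machine, with hypothetical names mirroring the proposed modes:

```c
/* Sketch of the PQASYNC_DEFAULT check: a second send without an
   intervening PQgetResult is rejected; the async modes allow it.
   Names are hypothetical, mirroring the proposal. */
typedef enum { PQASYNC_DEFAULT, PQASYNC_DISCARD, PQASYNC_RESULT } PGAsyncMode;

typedef struct {
    PGAsyncMode mode;
    int query_in_flight;
} ConnState;

/* Models the entry check of PQsendQueryParams; returns 0 on error. */
static int conn_send(ConnState *c)
{
    if (c->mode == PQASYNC_DEFAULT && c->query_in_flight)
        return 0;                /* error: previous result not collected */
    c->query_in_flight = 1;
    return 1;
}

/* Models PQgetResult collecting the outstanding result. */
static void conn_get_result(ConnState *c)
{
    c->query_in_flight = 0;
}
```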
In PQASYNC_DISCARD mode, we could implicitly synchronize with the server
when a synchronous API is used, reporting a pending error against the
synchronous command. (I think this is what the delayed error reporting
in X11 does.) Typically, that would be the "COMMIT" at the end of the
transaction. With a result-buffering implementation of PQASYNC_RESULT,
we could do that as well, but this might be too much magic.
I thought I'd ask for comments before starting coding because this looks
a bit more complicated than I expected. :)
--
Florian Weimer / Red Hat Product Security Team
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)