Re: [HACKERS] Pipelining executions to postgresql server
On 11/04/2014 07:56 AM, Mikko Tiihonen wrote:
> I also think the async I/O is the way to go. Luckily that has already been
> done in the pgjdbc-ng (https://github.com/impossibl/pgjdbc-ng), built on top
> of the netty java NIO library. It has quite good feature parity with the
> original pgjdbc driver.

Huh, it does seem to now. The big omission, unless I'm blind, is support for COPY. (It also lacks support for JDBC3 and JDBC4, requiring JDK 7 and JDBC 4.1, but that's reasonable enough.) It now has LISTEN/NOTIFY by the looks, and of course it's built around binary protocol support.

I freely admit I haven't looked at it too closely. I'm a tad frustrated by the parallel development of a different driver when the "official" driver has so much in the way of low hanging fruit to improve.

I have to say I like some aspects of how pgjdbc-ng is being done - in particular the use of Maven and being built around non-blocking I/O. OTOH, the use of Google Guava I find pretty inexplicable in a JDBC driver, and since it hard-depends on JDK 7 I don't understand why Netty was used instead of the async / non-blocking features of the core JVM. That may simply be my inexperience with writing non-blocking socket I/O code in Java, though. I'm also concerned about how it'll handle new JDBC versions, as it seems to lack the JDBC interface abstraction of the core driver.

> I do not think the JDBC batch interface even allows doing updates to multiple
> tables when using prepared statements?

Ah, I missed this before. That's correct. You get prepared statements _or_ multiple different statements. That's a more understandable reason to concoct a new API, and explains why you're not just solving the issues with batches. Though I still think you're going to have to fix the buffer management code before you do anything like this.
--
Craig Ringer   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pipelining executions to postgresql server
On 11/04/2014 07:56 AM, Mikko Tiihonen wrote:
> I do not quite grasp why not sending Sync is so important.

Well, at the moment the PgJDBC driver relies on the following flow to manage its buffers and avoid a logjam where both server and client are waiting for the other to consume input:

* Send some stuff
* Sync
* Consume input until Sync response received

... since at this point it knows the server's send buffer should be empty, or near enough.

There's no fundamental reason why you *can't* send a Sync after each query, AFAIK. You just have to change how the driver handles Syncs so it knows that there might be more work pipelined. Or fix the driver's buffer management, as described in the github issues I linked.

It doesn't make much sense unless you're running in autocommit mode, though, and I'm not convinced pipelining work in autocommit mode makes a lot of sense. The biggest problem is exception handling - you can't throw an exception at some statement execution in the past. So you need a new interface for pipelining... or to just use the existing batch interface.

--
Craig Ringer   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Pipelining executions to postgresql server
On 11/04/2014 09:10 AM, Tom Lane wrote:
> Mikko Tiihonen writes:
>> I do not quite grasp why not sending Sync is so important. My proof of
>> concept setup was for queries with autocommit enabled.
>
> [snip] It'll be very much like
> sending a fixed (predetermined) SQL script to the server using a scripting
> language that lacks any conditionals.

... which is part of why I think the batch interface for JDBC is the appropriate UI, and the existing support in PgJDBC just needs to be extended to support returning result sets.

The JDBC driver supports multiple result sets, and a batch execution looks just like a procedure that returned multiple result sets (or rather, a mix of result sets, affected rowcounts, and no-info success messages). See:

http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#addBatch(java.lang.String)
http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#executeBatch()

The docs are admittedly mostly oriented toward DML, but nothing stops us returning SUCCESS_NO_INFO and a resultset. Or, if we choose, returning a rowcount and a resultset. It'd be worth testing what other drivers do before doing that, but nothing in the spec:

https://jcp.org/aboutJava/communityprocess/mrel/jsr221/index.html

seems to stop us doing it.

The batch interface doesn't offer any way to set scrollable/editable resultset options, but we don't want to allow that for batches anyway.

--
Craig Ringer   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Pipelining executions to postgresql server
Mikko Tiihonen writes:
> I do not quite grasp why not sending Sync is so important. My proof of
> concept setup was for queries with autocommit enabled.

The point is that that will be very, very much harder to use than doing it the other way. It's fairly easy to reason about the results of single-transaction pipelined queries: they're all or nothing. If you fire off queries that are going to autocommit independently, then you have to worry about all combinations of success/failure, and you won't have the opportunity to adjust on the fly. It'll be very much like sending a fixed (predetermined) SQL script to the server using a scripting language that lacks any conditionals.

People certainly do use fixed scripts sometimes, but they usually find out they want them wrapped into single transactions, because otherwise they're left with a horrible mess if the script goes off the rails at all.

regards, tom lane
Re: [HACKERS] Pipelining executions to postgresql server
> Craig Ringer wrote:
> On 11/02/2014 09:27 PM, Mikko Tiihonen wrote:
>> Is the following summary correct:
>> - the network protocol supports pipelining
>
> Yes.
>
> All you have to do is *not* send a Sync message and be aware that the
> server will discard all input until the next Sync, so pipelining +
> autocommit doesn't make a ton of sense for error handling reasons.

I do not quite grasp why not sending Sync is so important. My proof of concept setup was for queries with autocommit enabled.

When looking with wireshark I could observe that the client sent 3-10 P/B/D/E/S messages to the server before the server started sending the corresponding 1/2/T/D/C/Z replies for each request. Every 10 requests the test application waited for all the replies to arrive, so as not to overflow the network buffers (which is known to cause deadlocks with the current pg jdbc driver).

If I want separate error handling for each execution, then shouldn't I be sending a separate Sync for each pipelined operation?

>> - the server handles operations in order, starting the processing of next
>> operation only after fully processing the previous one
>> - thus pipelining is invisible to the server
>
> As far as I know, yes. The server just doesn't care.
>
>> - libpq driver does not support pipelining, but that is due to internal
>> limitations
>
> Yep.
>
>> - if proper error handling is done by the client then there is no reason
>> why pipelining could not be supported by any pg client
>
> Indeed, and most should support it. Sending batches of related queries
> would make things a LOT faster.
>
> PgJDBC's batch support is currently write-oriented. There is no
> fundamental reason it can't be expanded for reads. I've already written
> a patch to do just that for the case of returning generated keys.
>
> https://github.com/ringerc/pgjdbc/tree/batch-returning-support
>
> and just need to rebase it so I can send a pull for upstream PgJDBC.
> It's already linked in the issues documenting the limitations in batch
> support.

Your code looked good. Returning inserts are an important thing to optimize.

> If you want to have more general support for batches that return rowsets
> there's no fundamental technical reason why it can't be added. It just
> requires some tedious refactoring of the driver to either:
>
> - Sync and wait before it fills its *send* buffer, rather than trying
>   to manage its receive buffer (the server send buffer), so it can
>   reliably avoid deadlocks; or
>
> - Do async I/O in a select()-like loop over a protocol state machine,
>   so it can simultaneously read and write on the wire.

I also think the async I/O is the way to go. Luckily that has already been done in the pgjdbc-ng (https://github.com/impossibl/pgjdbc-ng), built on top of the netty java NIO library. It has quite good feature parity with the original pgjdbc driver. I'll try next if I can enable the pipelining with it, now that I have tried the proof of concept with the original pgjdbc driver.

> I might need to do some of that myself soon, but it's a big (and
> therefore error-prone) job I've so far avoided by making smaller, more
> targeted changes.
>
> For JDBC the JDBC batch interface is the right place to do this, and you
> should not IMO attempt to add pipelining outside that interface.
> (Multiple open resultsets from portals, yes, but not pipelining of queries.)

I do not think the JDBC batch interface even allows doing updates to multiple tables when using prepared statements?

-Mikko
Re: [HACKERS] Pipelining executions to postgresql server
On 11/02/2014 09:27 PM, Mikko Tiihonen wrote:
> Is the following summary correct:
> - the network protocol supports pipelining

Yes.

All you have to do is *not* send a Sync message and be aware that the server will discard all input until the next Sync, so pipelining + autocommit doesn't make a ton of sense for error handling reasons.

> - the server handles operations in order, starting the processing of next
> operation only after fully processing the previous one
> - thus pipelining is invisible to the server

As far as I know, yes. The server just doesn't care.

> - libpq driver does not support pipelining, but that is due to internal
> limitations

Yep.

> - if proper error handling is done by the client then there is no reason
> why pipelining could not be supported by any pg client

Indeed, and most should support it. Sending batches of related queries would make things a LOT faster.

PgJDBC's batch support is currently write-oriented. There is no fundamental reason it can't be expanded for reads. I've already written a patch to do just that for the case of returning generated keys:

https://github.com/ringerc/pgjdbc/tree/batch-returning-support

and just need to rebase it so I can send a pull for upstream PgJDBC. It's already linked in the issues documenting the limitations in batch support.

If you want to have more general support for batches that return rowsets there's no fundamental technical reason why it can't be added. It just requires some tedious refactoring of the driver to either:

- Sync and wait before it fills its *send* buffer, rather than trying to manage its receive buffer (the server send buffer), so it can reliably avoid deadlocks; or

- Do async I/O in a select()-like loop over a protocol state machine, so it can simultaneously read and write on the wire.

I might need to do some of that myself soon, but it's a big (and therefore error-prone) job I've so far avoided by making smaller, more targeted changes.
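[Editor's note: the "don't send Sync until the end" idea discussed in this thread can be seen at the wire level. Below is a minimal sketch in plain Java, with no driver involved, of how a pipelined batch is framed in the v3 protocol. The Parse payload is simplified and the Bind/Describe/Execute messages are elided, so this illustrates only the framing, not a working client.]

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class PipelineSketch {
    // Frame one v3-protocol message: type byte, int32 length (self-inclusive), payload.
    static byte[] message(char type, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + payload.length);
        buf.put((byte) type);
        buf.putInt(4 + payload.length);
        buf.put(payload);
        return buf.array();
    }

    // NUL-terminated string, as the protocol requires.
    static byte[] cString(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[b.length + 1]; // trailing byte is already zero
        System.arraycopy(b, 0, out, 0, b.length);
        return out;
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        String[] queries = { "INSERT ...", "UPDATE ...", "SELECT ..." };
        for (String q : queries) {
            // Simplified Parse payload: empty statement name, query text,
            // and zero parameter-type OIDs. Bind/Describe/Execute elided.
            ByteArrayOutputStream p = new ByteArrayOutputStream();
            p.write(cString(""));
            p.write(cString(q));
            p.write(0); p.write(0); // int16 count of parameter type OIDs
            wire.write(message('P', p.toByteArray()));
        }
        // One Sync for the whole batch: on error the backend discards
        // everything up to this point and answers it with ReadyForQuery.
        wire.write(message('S', new byte[0]));

        byte[] bytes = wire.toByteArray();
        System.out.println("first message type: " + (char) bytes[0]);
        // Sync is 5 bytes (type + int32 length), so its type byte is here:
        System.out.println("last message type: " + (char) bytes[bytes.length - 5]);
    }
}
```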
Doing async I/O using Java nio channels is by far the better approach, but also the more invasive one. The driver currently sends data on the wire where it generates it and blocks to receive expected data.

Switching to send-side buffer management doesn't have the full performance gains that doing bidirectional I/O via channels does, though, and may be a significant performance _loss_ if you're sending big queries but getting small replies.

For JDBC the JDBC batch interface is the right place to do this, and you should not IMO attempt to add pipelining outside that interface. (Multiple open resultsets from portals, yes, but not pipelining of queries.)

--
Craig Ringer   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
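[Editor's note: the "select()-like loop over a protocol state machine" mentioned above maps onto java.nio's Selector. A minimal self-contained sketch follows; a Pipe stands in for the server socket so it runs without a database, and the "ReadyForQuery" string is just a stand-in for a real protocol message.]

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class SelectLoopSketch {
    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();                  // stands in for the server connection
        pipe.source().configureBlocking(false);   // non-blocking read side
        Selector selector = Selector.open();
        pipe.source().register(selector, SelectionKey.OP_READ);

        // Simulate a server reply arriving while the client could still be writing.
        pipe.sink().write(ByteBuffer.wrap("ReadyForQuery".getBytes()));

        ByteBuffer in = ByteBuffer.allocate(64);
        // The real driver would loop here forever, multiplexing reads and
        // writes; one iteration is enough to show the shape of the loop.
        if (selector.select(1000) > 0) {
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isReadable()) {
                    pipe.source().read(in);       // feed the protocol state machine
                }
            }
            selector.selectedKeys().clear();
        }
        in.flip();
        System.out.println(new String(in.array(), 0, in.limit()));
    }
}
```

The point of the structure is that the same loop can also register OP_WRITE interest and flush the send buffer whenever the socket is writable, which is what lets the driver read and write on the wire simultaneously.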
Re: [HACKERS] Pipelining executions to postgresql server
On 11/01/2014 10:04 PM, Mikko Tiihonen wrote:
> Hi,
>
> I created a proof of concept patch for the postgresql JDBC driver that allows
> the caller to do pipelining of requests within a transaction. The pipelining
> here means the same as for HTTP: the client can send the next execution
> already before waiting for the response of the previous request to be fully
> processed.

... but ... it already does that. Use batches, and that's exactly what it'll do.

Statement.addBatch(String) or PreparedStatement.addBatch()

> What kind of problems could pipelining cause (assuming we limit it to rather
> simple use cases only)?

Client/server pipeline deadlocks, due to how PgJDBC manages it. See the details:

https://github.com/pgjdbc/pgjdbc/issues/195

and

https://github.com/pgjdbc/pgjdbc/issues/194

--
Craig Ringer   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
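[Editor's note: for readers unfamiliar with the batch API named above, here is a minimal usage sketch. The connection URL, credentials, and the events table are hypothetical, and a running PostgreSQL server with the JDBC driver on the classpath is assumed.]

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; requires a running PostgreSQL server.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "secret")) {
            conn.setAutoCommit(false);
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO events(payload) VALUES (?)");
            for (int i = 0; i < 100; i++) {
                ps.setString(1, "event-" + i);
                ps.addBatch();                // queued client-side, pipelined on the wire
            }
            int[] counts = ps.executeBatch(); // all 100 sent without waiting per-statement
            conn.commit();
        }
    }
}
```

Note the restriction discussed later in the thread: with a PreparedStatement the whole batch reuses the one statement, so mixing statements against different tables requires the plain Statement.addBatch(String) form, which gives up server-side prepared-statement reuse.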
Re: [HACKERS] Pipelining executions to postgresql server
> > > From: Andres Freund
> > > On 2014-11-01 14:04:05 +, Mikko Tiihonen wrote:
> > > > I created a proof of concept patch for postgresql JDBC driver that
> > > > allows the caller to do pipelining of requests within a
> > > > transaction. The pipelining here means same as for HTTP: the client
> > > > can send the next execution already before waiting for the response of
> > > > the previous request to be fully processed.
> > >
> > > Slightly confused here. To my knowledge the jdbc driver already employs
> > > some pipelining? There's some conditions where it's disabled (IIRC
> > > RETURNING for DML is one of them), but otherwise it's available.
> > >
> > > I'm very far from a pgjdbc expert, but that's what I gathered from the
> > > code when investigating issues a while back and from my colleague Craig.
> >
> > Most DB interfaces make the server operations look synchronous.
>
> You IIRC can use jdbc's batch interface.

Yes, there is a limited batch interface for inserts and updates. But for example when using prepared statements you can only do batches of the same statement (with different parameters, of course).

> > I should have searched earlier a better reference to libpq. I am planning
> > on adding support for something similar to
> > http://www.postgresql.org/docs/9.3/static/libpq-async.html
> > More specifically operations like:
> > int PQsendQuery(PGconn *conn, const char *command);
> > PGresult *PQgetResult(PGconn *conn);
>
> The network protocol allows for pipelining, yes. But libpq unfortunately
> doesn't.
> You should read the protocol definition.

I have studied the protocol; that is why I concluded that it would be possible to add support for pipelining for clients.

> > It should be, if the server started to process (read in) the next query
> > only after it has sent the previous response out.
>
> There's complexities around error handling and such making it slightly
> more complex.
Are you referring to some complexities on the server side related to error handling, or on the client side?

Is the following summary correct:
- the network protocol supports pipelining
- the server handles operations in order, starting the processing of the next operation only after fully processing the previous one - thus pipelining is invisible to the server
- the libpq driver does not support pipelining, but that is due to internal limitations
- if proper error handling is done by the client then there is no reason why pipelining could not be supported by any pg client

-Mikko
Re: [HACKERS] Pipelining executions to postgresql server
On 2014-11-02 12:11:49 +, Mikko Tiihonen wrote:
> > From: Andres Freund
> > On 2014-11-01 14:04:05 +, Mikko Tiihonen wrote:
> > > I created a proof of concept patch for postgresql JDBC driver that
> > > allows the caller to do pipelining of requests within a
> > > transaction. The pipelining here means same as for HTTP: the client
> > > can send the next execution already before waiting for the response of
> > > the previous request to be fully processed.
> >
> > Slightly confused here. To my knowledge the jdbc driver already employs
> > some pipelining? There's some conditions where it's disabled (IIRC
> > RETURNING for DML is one of them), but otherwise it's available.
> >
> > I'm very far from a pgjdbc expert, but that's what I gathered from the
> > code when investigating issues a while back and from my colleague Craig.
>
> Most DB interfaces make the server operations look synchronous.

You IIRC can use jdbc's batch interface.

> I should have searched earlier a better reference to libpq. I am planning on
> adding support for something similar to
> http://www.postgresql.org/docs/9.3/static/libpq-async.html
> More specifically operations like:
> int PQsendQuery(PGconn *conn, const char *command);
> PGresult *PQgetResult(PGconn *conn);

That doesn't really support pipelining though - you can still only send one query. It's nonblocking, but that's a different thing.

> In libpq terms: Is it safe to do multiple PQsendQuery operations
> before invoking PQgetResult as many times?

The network protocol allows for pipelining, yes. But libpq unfortunately doesn't. You should read the protocol definition.

> It should be, if the server started to process (read in) the next query
> only after it has sent the previous response out.

There's complexities around error handling and such making it slightly more complex.
Greetings,

Andres Freund

--
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Pipelining executions to postgresql server
> From: Andres Freund
> On 2014-11-01 14:04:05 +, Mikko Tiihonen wrote:
> > I created a proof of concept patch for postgresql JDBC driver that
> > allows the caller to do pipelining of requests within a
> > transaction. The pipelining here means same as for HTTP: the client
> > can send the next execution already before waiting for the response of
> > the previous request to be fully processed.
>
> Slightly confused here. To my knowledge the jdbc driver already employs
> some pipelining? There's some conditions where it's disabled (IIRC
> RETURNING for DML is one of them), but otherwise it's available.
>
> I'm very far from a pgjdbc expert, but that's what I gathered from the
> code when investigating issues a while back and from my colleague Craig.

Most DB interfaces make the server operations look synchronous. For JDBC the only standard interface is similar to libpq's:

PGresult *PQexec(PGconn *conn, const char *command);

I should have searched earlier for a better reference to libpq. I am planning on adding support for something similar to http://www.postgresql.org/docs/9.3/static/libpq-async.html - more specifically, operations like:

int PQsendQuery(PGconn *conn, const char *command);
PGresult *PQgetResult(PGconn *conn);

The Java API will of course be custom to the postgresql jdbc driver, since there is no official java api for async db operations. The above I can do purely on the jdbc driver side. But my question was about pipelining. In libpq terms: Is it safe to do multiple PQsendQuery operations before invoking PQgetResult as many times? It should be, if the server started to process (read in) the next query only after it has sent the previous response out.

And do any pg connection poolers support having multiple executions on-the-fly at the same time for the same connection?

-Mikko
Re: [HACKERS] Pipelining executions to postgresql server
On 2014-11-01 14:04:05 +, Mikko Tiihonen wrote:
> I created a proof of concept patch for postgresql JDBC driver that
> allows the caller to do pipelining of requests within a
> transaction. The pipelining here means same as for HTTP: the client
> can send the next execution already before waiting for the response of
> the previous request to be fully processed.

Slightly confused here. To my knowledge the jdbc driver already employs some pipelining? There's some conditions where it's disabled (IIRC RETURNING for DML is one of them), but otherwise it's available.

I'm very far from a pgjdbc expert, but that's what I gathered from the code when investigating issues a while back and from my colleague Craig.

Greetings,

Andres Freund

--
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Pipelining executions to postgresql server
Mikko Tiihonen writes:
> I created a proof of concept patch for the postgresql JDBC driver that allows
> the caller to do pipelining of requests within a transaction. The pipelining
> here means the same as for HTTP: the client can send the next execution
> already before waiting for the response of the previous request to be fully
> processed.

In principle this can work if you think through the error handling carefully. The way the FE/BE protocol is intended to handle the case is that all the pipelined statements are part of one transaction, so that they all succeed or fail together. Otherwise the user has to work through and predict the outcome of all the combinations of "Q1 succeeded, Q2 failed, Q3 succeeded", which is mighty error-prone, especially if the commands are interdependent in any way at all.

Having legislated that (and forbidden any transaction-boundary commands in a pipelined group), what you do is not issue a Sync until after the last command of the set. Then, if any command fails, the backend just automatically discards remaining messages until it gets the Sync, and so you can safely issue new commands before knowing the results of the previous ones.

I mention this because, although I don't know the innards of the JDBC driver at all, it sure doesn't look like that's the way you've done it. If I'm right that you've left the error recovery strategy to the user, I think this is gonna be very hard to use safely.

regards, tom lane
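[Editor's note: the discard-until-Sync recovery scheme described above can be sketched as a small client-side loop over response message tags. The tag sequence below is simulated, not read from a socket: three statements were pipelined, the second one failed, so the backend emits nothing for the rest of the batch and answers the Sync with ReadyForQuery ('Z').]

```java
import java.util.List;

public class SyncRecoverySketch {
    public static void main(String[] args) {
        // Simulated response tags: statement 1 succeeds (ParseComplete '1',
        // BindComplete '2', CommandComplete 'C'), statement 2 draws an
        // ErrorResponse ('E'), statement 3 is discarded by the backend, and
        // ReadyForQuery ('Z') arrives only for the batch's single Sync.
        List<Character> tags = List.of('1', '2', 'C', 'E', 'Z');

        boolean failed = false;
        int completed = 0;
        for (char tag : tags) {
            if (failed && tag != 'Z') continue; // backend discards until Sync anyway
            switch (tag) {
                case 'C': completed++; break;   // CommandComplete
                case 'E': failed = true; break; // whole batch fails together
                case 'Z': System.out.println(
                        "batch done: completed=" + completed + " failed=" + failed);
            }
        }
    }
}
```

Because everything before the Sync is one transaction under Tom's scheme, the client only has to report "the batch failed at statement N", not reason about partial-commit combinations.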