Re: [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Craig Ringer
On 11/04/2014 07:56 AM, Mikko Tiihonen wrote:
> I also think the async I/O is the way to go. Luckily that has already been 
> done
> in pgjdbc-ng (https://github.com/impossibl/pgjdbc-ng), built on top
> of the Netty Java NIO library. It has quite good feature parity with the
> original pgjdbc driver. 

Huh, it does seem to now. The big omission, unless I'm blind, is support
for COPY. (It also lacks support for JDBC3 and JDBC4, requiring JDK 7
and JDBC 4.1, but that's reasonable enough.)

It now has LISTEN/NOTIFY by the looks, and of course it's built around
binary protocol support.

I freely admit I haven't looked at it too closely. I'm a tad frustrated
by the parallel development of a different driver when the "official"
driver has so much in the way of low hanging fruit to improve.

I have to say I like some aspects of how pgjdbc-ng is being done - in
particular use of Maven and being built around non-blocking I/O.

OTOH, the use of Google Guava I find pretty inexplicable in a JDBC
driver, and since it hard-depends on JDK 7 I don't understand why Netty
was used instead of the async / non-blocking features of the core JVM.
That may simply be my inexperience with writing non-blocking socket I/O
code on Java though.

I'm also concerned about how it'll handle new JDBC versions, as it seems
to lack the JDBC interface abstraction of the core driver.

> I do not think the JDBC batch interface even allows doing updates to multiple
> tables when using prepared statements?

Ah, I missed this before.

That's correct. You get prepared statements _or_ multiple different
statements.

That's a more understandable reason to concoct a new API, and explains
why you're not just solving the issues with batches. Though I still
think you're going to have to fix the buffer management code before you
do anything like this.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Craig Ringer
On 11/04/2014 07:56 AM, Mikko Tiihonen wrote:
> I do not quite grasp why not sending Sync is so important.

Well, at the moment the PgJDBC driver relies on the following flow to
manage its buffers and avoid a logjam where both server and client are
waiting for the other to consume input:

* Send some stuff
* Sync
* Consume input until Sync response received

... since at this point it knows the server's send buffer should be
empty, or near enough.

There's no fundamental reason why you *can't* send a Sync after each
query, AFAIK. You just have to change how the driver handles syncs so it
knows that there might be more work pipelined. Or fix the driver's
buffer management, as described in the github issues I linked.

It doesn't make much sense to do so unless you're running in autocommit
mode, though, and I'm not convinced pipelining work in autocommit mode makes
a lot of sense. The biggest problem is exception handling - you can't
throw an exception at some statement execution in the past. So you need
a new interface for pipelining... or to just use the existing batch
interface.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Craig Ringer
On 11/04/2014 09:10 AM, Tom Lane wrote:
> Mikko Tiihonen  writes:
>> I do not quite grasp why not sending Sync is so important. My proof of 
>> concept setup was for queries with autocommit enabled.
> 
> [snip] It'll be very much like
> sending a fixed (predetermined) SQL script to the server using a scripting
> language that lacks any conditionals.

... which is part of why I think the batch interface for JDBC is the
appropriate UI, and the existing support in PgJDBC just needs to be
extended to support returning result sets.

The JDBC driver supports multiple result sets, and a batch execution
looks just like a procedure that returned multiple result sets (or
rather, a mix of result sets, affected rowcounts, and no-info success
messages).

See

http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#addBatch(java.lang.String)

http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#executeBatch()



The docs are admittedly mostly oriented toward DML, but nothing stops us
returning SUCCESS_NO_INFO and a resultset. Or if we choose, returning a
rowcount and resultset.
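For reference, the caller sees those outcomes through the int[] that executeBatch() returns, where the JDBC javadoc defines SUCCESS_NO_INFO (success, row count unknown) and EXECUTE_FAILED. A minimal sketch of interpreting such an array (the array itself is made up here; a real one would come from a live connection):

```java
import java.sql.Statement;

// Sketch: classifying the per-statement codes in an executeBatch() result.
// The sample array is hypothetical; only the constants come from java.sql.
public class BatchResultDemo {
    // Classify one element of an executeBatch() result array.
    public static String classify(int code) {
        if (code == Statement.SUCCESS_NO_INFO) return "ok (row count unknown)";
        if (code == Statement.EXECUTE_FAILED)  return "failed";
        return "ok (" + code + " rows)";
    }

    public static void main(String[] args) {
        int[] results = {1, Statement.SUCCESS_NO_INFO, 3};  // hypothetical
        for (int r : results) System.out.println(classify(r));
    }
}
```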

It'd be worth testing what other drivers do before doing that, but
nothing in the spec:

https://jcp.org/aboutJava/communityprocess/mrel/jsr221/index.html

seems to stop us doing it.

The batch interface doesn't offer any way to set scrollable/editable
resultset options, but we don't want to allow that for batches anyway.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Tom Lane
Mikko Tiihonen  writes:
> I do not quite grasp why not sending Sync is so important. My proof of 
> concept setup was for queries with autocommit enabled.

The point is that that will be very, very much harder to use than doing
it the other way.  It's fairly easy to reason about the results of
single-transaction pipelined queries: they're all or nothing.  If you
fire off queries that are going to autocommit independently, then you
have to worry about all combinations of success/failure, and you won't
have the opportunity to adjust on the fly.  It'll be very much like
sending a fixed (predetermined) SQL script to the server using a scripting
language that lacks any conditionals.  People certainly do use fixed
scripts sometimes, but they usually find out they want them wrapped into
single transactions, because otherwise they're left with a horrible mess
if the script goes off the rails at all.

regards, tom lane




Re: [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Mikko Tiihonen
> Craig Ringer wrote:
> On 11/02/2014 09:27 PM, Mikko Tiihonen wrote:
> > Is the following summary correct:
> > - the network protocol supports pipelining
> Yes.
> 
> All you have to do is *not* send a Sync message and be aware that the
> server will discard all input until the next Sync, so pipelining +
> autocommit doesn't make a ton of sense for error handling reasons.

I do not quite grasp why not sending Sync is so important. My proof of concept 
setup was for queries with autocommit enabled.
When looking with Wireshark I could observe that the client sent 3-10 
P/B/D/E/S messages to the server before the server started sending the 
corresponding 1/2/T/D/C/Z replies for each request. Every 10 requests the test 
application waited for all the replies to arrive so as not to overflow the 
network buffers (which is known to cause deadlocks with the current pg jdbc driver).
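The flushing scheme above can be sketched roughly as follows (a simplified stand-in: the wire is modelled with a queue of request ids rather than real sockets and protocol messages):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of windowed pipelining: keep at most WINDOW requests in
// flight, then drain all replies before sending more, so neither side's
// socket buffer can fill up while the other end is blocked.
public class WindowedPipelineDemo {
    static final int WINDOW = 10;

    public static int run(int totalRequests) {
        Deque<Integer> inFlight = new ArrayDeque<>();
        int repliesConsumed = 0;
        for (int i = 0; i < totalRequests; i++) {
            inFlight.add(i);                    // "send" P/B/D/E/S for request i
            if (inFlight.size() == WINDOW) {    // window full: drain replies
                while (!inFlight.isEmpty()) {
                    inFlight.remove();          // "consume" 1/2/T/D/C/Z for one request
                    repliesConsumed++;
                }
            }
        }
        while (!inFlight.isEmpty()) {           // drain the tail
            inFlight.remove();
            repliesConsumed++;
        }
        return repliesConsumed;
    }

    public static void main(String[] args) {
        System.out.println(run(25));  // prints: 25
    }
}
```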

If I want separate error handling for each execution then shouldn't I be 
sending a separate Sync for each pipelined operation?

> > - the server handles operations in order, starting the processing of the next 
> > operation only after fully processing the previous one 
> > - thus pipelining is invisible to the server
> 
> As far as I know, yes. The server just doesn't care.
> 
> > - libpq driver does not support pipelining, but that is due to internal 
> > limitations
> 
> Yep.
> 
> > - if proper error handling is done by the client then there is no reason 
> > why pipelining could not be supported by any pg client
> 
> Indeed, and most should support it. Sending batches of related queries
> would make things a LOT faster.
> 
> PgJDBC's batch support is currently write-oriented. There is no
> fundamental reason it can't be expanded for reads. I've already written
> a patch to do just that for the case of returning generated keys.
>
> https://github.com/ringerc/pgjdbc/tree/batch-returning-support
> 
> and just need to rebase it so I can send a pull for upstream PgJDBC.
> It's already linked in the issues documenting the limitations in batch
> support.

Your code looked good. Returning inserts are an important thing to 
optimize.

> If you want to have more general support for batches that return rowsets
> there's no fundamental technical reason why it can't be added. It just
> requires some tedious refactoring of the driver to either:
> 
> - Sync and wait before it fills its *send* buffer, rather than trying
>   to manage its receive buffer (the server send buffer), so it can
>   reliably avoid deadlocks; or
> 
> - Do async I/O in a select()-like loop over a protocol state machine,
>   so it can simultaneously read and write on the wire.

I also think the async I/O is the way to go. Luckily that has already been done
in pgjdbc-ng (https://github.com/impossibl/pgjdbc-ng), built on top
of the Netty Java NIO library. It has quite good feature parity with the
original pgjdbc driver. I'll try next whether I can enable pipelining with it
now that I have tried the proof of concept with the original pgjdbc driver.

> I might need to do some of that myself soon, but it's a big (and
> therefore error-prone) job I've so far avoided by making smaller, more
> targeted changes.
> 
> For JDBC the JDBC batch interface is the right place to do this, and you
> should not IMO attempt to add pipelining outside that interface.
> (Multiple open resultsets from portals, yes, but not pipelining of queries).

I do not think the JDBC batch interface even allows doing updates to multiple
tables when using prepared statements?





Re: [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Craig Ringer
On 11/02/2014 09:27 PM, Mikko Tiihonen wrote:
> Is the following summary correct:
> - the network protocol supports pipelining

Yes.

All you have to do is *not* send a Sync message and be aware that the
server will discard all input until the next Sync, so pipelining +
autocommit doesn't make a ton of sense for error handling reasons.

> - the server handles operations in order, starting the processing of the next 
> operation only after fully processing the previous one - thus pipelining is 
> invisible to the server

As far as I know, yes. The server just doesn't care.

> - libpq driver does not support pipelining, but that is due to internal 
> limitations

Yep.

> - if proper error handling is done by the client then there is no reason why 
> pipelining could not be supported by any pg client

Indeed, and most should support it. Sending batches of related queries
would make things a LOT faster.

PgJDBC's batch support is currently write-oriented. There is no
fundamental reason it can't be expanded for reads. I've already written
a patch to do just that for the case of returning generated keys.

https://github.com/ringerc/pgjdbc/tree/batch-returning-support

and just need to rebase it so I can send a pull for upstream PgJDBC.
It's already linked in the issues documenting the limitations in batch
support.


If you want to have more general support for batches that return rowsets
there's no fundamental technical reason why it can't be added. It just
requires some tedious refactoring of the driver to either:

- Sync and wait before it fills its *send* buffer, rather than trying
  to manage its receive buffer (the server send buffer), so it can
  reliably avoid deadlocks; or

- Do async I/O in a select()-like loop over a protocol state machine,
  so it can simultaneously read and write on the wire.
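The second option is essentially a classic readiness loop. A toy sketch with java.nio (a Pipe stands in for the server socket so it is self-contained; a driver would register its SocketChannel the same way):

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Toy select()-style loop: a Selector multiplexes readiness events so
// one thread can interleave reads and writes instead of blocking on
// either direction. Purely illustrative, not driver code.
public class SelectLoopSketch {
    public static String demo() throws Exception {
        Pipe pipe = Pipe.open();
        Selector selector = Selector.open();
        pipe.source().configureBlocking(false);
        pipe.source().register(selector, SelectionKey.OP_READ);

        // Pretend the server pushed a ReadyForQuery ('Z') onto the wire.
        pipe.sink().write(ByteBuffer.wrap("Z".getBytes()));

        StringBuilder received = new StringBuilder();
        while (received.length() == 0) {
            selector.select();  // wait until at least one channel is ready
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isReadable()) {
                    ByteBuffer buf = ByteBuffer.allocate(16);
                    pipe.source().read(buf);
                    buf.flip();
                    while (buf.hasRemaining()) received.append((char) buf.get());
                }
            }
            selector.selectedKeys().clear();
        }
        selector.close();
        pipe.sink().close();
        pipe.source().close();
        return received.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());  // prints: Z
    }
}
```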

I might need to do some of that myself soon, but it's a big (and
therefore error-prone) job I've so far avoided by making smaller, more
targeted changes.

Doing async I/O using Java NIO channels is by far the better approach,
but also the more invasive one. The driver currently sends data on the
wire where it generates it and blocks to receive expected data.
Switching to send-side buffer management doesn't have the full
performance gains that doing bidirectional I/O via channels does,
though, and may be a significant performance _loss_ if you're sending
big queries but getting small replies.

For JDBC the JDBC batch interface is the right place to do this, and you
should not IMO attempt to add pipelining outside that interface.
(Multiple open resultsets from portals, yes, but not pipelining of queries).


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Craig Ringer
On 11/01/2014 10:04 PM, Mikko Tiihonen wrote:
> Hi,
> 
> I created a proof of concept patch for the PostgreSQL JDBC driver that allows 
> the caller to do pipelining of requests within a transaction. The pipelining 
> here means same as for HTTP: the client can send the next execution already 
> before waiting for the response of the previous request to be fully processed.

... but ... it already does that.

Use batches, and that's exactly what it'll do.

Statement.addBatch(String)
or
PreparedStatement.addBatch()

> What kind of problems could pipelining cause (assuming we limit it to rather 
> simple use cases only)?

Client/server pipeline deadlocks due to how PgJDBC manages its buffers.

See the details:

https://github.com/pgjdbc/pgjdbc/issues/195

and

https://github.com/pgjdbc/pgjdbc/issues/194

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Pipelining executions to postgresql server

2014-11-02 Thread Mikko Tiihonen
> > > From: Andres Freund 
> > > On 2014-11-01 14:04:05 +, Mikko Tiihonen wrote:
> > > > I created a proof of concept patch for the PostgreSQL JDBC driver that
> > > > allows the caller to do pipelining of requests within a
> > > > transaction. The pipelining here means same as for HTTP: the client
> > > > can send the next execution already before waiting for the response of
> > > > the previous request to be fully processed.
> > >
> > > Slightly confused here. To my knowledge the jdbc driver already employs
> > > some pipelining? There's some conditions where it's disabled (IIRC
> > > RETURNING for DML is one of them), but otherwise it's available.
> > >
> > > I'm very far from a pgjdbc expert, but that's what I gathered from the
> > > code when investigating issues a while back and from my colleague Craig.
> >
> > Most DB interfaces make the server operations look synchronous.
> 
> IIRC you can use JDBC's batch interface.

Yes, there is a limited batch interface for inserts and updates. But for 
example when using prepared statements you can only do batches of the same 
statement (with different parameters of course).

> > I should have searched for a better reference to libpq earlier. I am planning 
> > on adding support for something similar to
> > http://www.postgresql.org/docs/9.3/static/libpq-async.html
> > More specifically operations like:
> >   int PQsendQuery(PGconn *conn, const char *command);
> >   PGresult *PQgetResult(PGconn *conn);
> 
> The network protocol allows for pipelining, yes. But libpq unfortunately
> doesn't.
> You should read the protocol definition.

I have studied the protocol; that is why I concluded that it would be possible 
to add pipelining support for clients.
 
> > It should be, if the server started to process (read in) the next query
> > only after it has sent the previous response out.
> 
> There's complexities around error handling and such making it slightly
> more complex.

Are you referring to some complexities on the server side related to error 
handling or on the client side?

Is the following summary correct:
- the network protocol supports pipelining
- the server handles operations in order, starting the processing of the next 
operation only after fully processing the previous one - thus pipelining is 
invisible to the server
- the libpq driver does not support pipelining, but that is due to internal 
limitations
- if proper error handling is done by the client then there is no reason why 
pipelining could not be supported by any pg client

-Mikko



Re: [HACKERS] Pipelining executions to postgresql server

2014-11-02 Thread Andres Freund
On 2014-11-02 12:11:49 +, Mikko Tiihonen wrote:
> > From: Andres Freund 
> > On 2014-11-01 14:04:05 +, Mikko Tiihonen wrote:
> > > I created a proof of concept patch for the PostgreSQL JDBC driver that
> > > allows the caller to do pipelining of requests within a
> > > transaction. The pipelining here means same as for HTTP: the client
> > > can send the next execution already before waiting for the response of
> > > the previous request to be fully processed.
> > 
> > Slightly confused here. To my knowledge the jdbc driver already employs
> > some pipelining? There's some conditions where it's disabled (IIRC
> > RETURNING for DML is one of them), but otherwise it's available.
> > 
> > I'm very far from a pgjdbc expert, but that's what I gathered from the
> > code when investigating issues a while back and from my colleague Craig.
> 
> Most DB interfaces make the server operations look synchronous.

IIRC you can use JDBC's batch interface.

> I should have searched for a better reference to libpq earlier. I am planning on 
> adding support for something similar to
> http://www.postgresql.org/docs/9.3/static/libpq-async.html
> More specifically operations like:
>   int PQsendQuery(PGconn *conn, const char *command);
>   PGresult *PQgetResult(PGconn *conn);

That doesn't really support pipelining though - you can still only send
one query. It's nonblocking, but that's a different thing.

> In libpq terms: Is it safe to do multiple PQsendQuery operations
> before invoking PQgetResult as many times?

The network protocol allows for pipelining, yes. But libpq unfortunately
doesn't.

You should read the protocol definition.

> It should be, if the server started to process (read in) the next query
> only after it has sent the previous response out.

There's complexities around error handling and such making it slightly
more complex.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Pipelining executions to postgresql server

2014-11-02 Thread Mikko Tiihonen
> From: Andres Freund 
> On 2014-11-01 14:04:05 +, Mikko Tiihonen wrote:
> > I created a proof of concept patch for the PostgreSQL JDBC driver that
> > allows the caller to do pipelining of requests within a
> > transaction. The pipelining here means same as for HTTP: the client
> > can send the next execution already before waiting for the response of
> > the previous request to be fully processed.
> 
> Slightly confused here. To my knowledge the jdbc driver already employs
> some pipelining? There's some conditions where it's disabled (IIRC
> RETURNING for DML is one of them), but otherwise it's available.
> 
> I'm very far from a pgjdbc expert, but that's what I gathered from the
> code when investigating issues a while back and from my colleague Craig.

Most DB interfaces make the server operations look synchronous.
For JDBC the only standard interface is similar to libpq's:
  PGresult *PQexec(PGconn *conn, const char *command);

I should have searched for a better reference to libpq earlier. I am planning on 
adding support for something similar to
http://www.postgresql.org/docs/9.3/static/libpq-async.html
More specifically operations like:
  int PQsendQuery(PGconn *conn, const char *command);
  PGresult *PQgetResult(PGconn *conn);
The Java API will of course be custom to the PostgreSQL JDBC driver since there is 
no official Java API for async DB operations.

The above I can do purely on the JDBC driver side. But my question was about 
pipelining.
In libpq terms: Is it safe to do multiple PQsendQuery operations before 
invoking PQgetResult as many times?
It should be, if the server started to process (read in) the next query only 
after it has sent the previous response out.

And do any pg connection poolers support having multiple executions on-the-fly 
at the same time for the same connection?

-Mikko



Re: [HACKERS] Pipelining executions to postgresql server

2014-11-01 Thread Andres Freund
On 2014-11-01 14:04:05 +, Mikko Tiihonen wrote:
> I created a proof of concept patch for the PostgreSQL JDBC driver that
> allows the caller to do pipelining of requests within a
> transaction. The pipelining here means same as for HTTP: the client
> can send the next execution already before waiting for the response of
> the previous request to be fully processed.

Slightly confused here. To my knowledge the jdbc driver already employs
some pipelining? There's some conditions where it's disabled (IIRC
RETURNING for DML is one of them), but otherwise it's available.

I'm very far from a pgjdbc expert, but that's what I gathered from the
code when investigating issues a while back and from my colleague Craig.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Pipelining executions to postgresql server

2014-11-01 Thread Tom Lane
Mikko Tiihonen  writes:
> I created a proof of concept patch for the PostgreSQL JDBC driver that allows 
> the caller to do pipelining of requests within a transaction. The pipelining 
> here means same as for HTTP: the client can send the next execution already 
> before waiting for the response of the previous request to be fully processed.

In principle this can work if you think through the error handling
carefully.  The way the FE/BE protocol is intended to handle the case
is that all the pipelined statements are part of one transaction so
that they all succeed or fail together.  Otherwise the user has to work
through and predict the outcome of all the combinations of "Q1 succeeded,
Q2 failed, Q3 succeeded" which is mighty error-prone, especially if the
commands are interdependent in any way at all.  Having legislated that
(and forbidden any transaction-boundary commands in a pipelined group),
what you do is not issue a Sync until after the last command of the set.
Then, if any command fails, the backend just automatically discards
remaining messages until it gets the Sync, and so you can safely issue
new commands before knowing the results of the previous ones.
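To illustrate, a toy Java model of that discard-until-Sync behaviour (message names abbreviated to single letters; purely illustrative, not backend code):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: after an error, the "server" skips every remaining message
// until it sees a Sync ("S"), then returns to a ready state, so queued
// commands after a failure are never half-executed.
public class SyncRecoveryDemo {
    public static List<String> run(List<String> messages) {
        List<String> replies = new ArrayList<>();
        boolean inError = false;
        for (String msg : messages) {
            if (msg.equals("S")) {            // Sync: error state is cleared
                inError = false;
                replies.add("ReadyForQuery");
            } else if (inError) {
                // discarded: server ignores messages until the next Sync
            } else if (msg.equals("E:bad")) { // a failing Execute (made-up marker)
                inError = true;
                replies.add("ErrorResponse");
            } else {
                replies.add("CommandComplete");
            }
        }
        return replies;
    }

    public static void main(String[] args) {
        // Three pipelined statements, one Sync at the end; the second fails.
        System.out.println(run(List.of("E:q1", "E:bad", "E:q3", "S")));
    }
}
```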

I mention this because, although I don't know the innards of the JDBC
driver at all, it sure doesn't look like that's the way you've done it.
If I'm right that you've left the error recovery strategy to the user,
I think this is gonna be very hard to use safely.

regards, tom lane

