Hi,

On 2017-08-10 15:23:06 +1000, Vaishnavi Prabakaran wrote:
> Andres Freund wrote:
> > If you were to send a gigabyte of queries, it'd buffer them all up in
> > memory... So some more intelligence is going to be needed.
>
> Firstly, sorry for the delayed response as I got busy with other
> activities.
No worries - development of new features was slowed down anyway, due to the
v10 feature freeze.

> To buffer up the queries before flushing them to the socket, I think
> "conn->outCount>=65536" is ok to use, as 64k is considered to be safe in
> Windows as per comments in pqSendSome() API.
>
> Attached the code patch to replace pqFlush calls with pqBatchFlush in the
> asynchronous libpq function calls flow.
> Still pqFlush is used in "PQbatchSyncQueue" and
> "PQexitBatchMode" functions.
> Am failing to see the benefit in allowing user to set
> PQBatchAutoFlush(true|false) property? Is it really needed?

I'm inclined not to introduce that for now. If somebody comes up with a
convincing use case and numbers, we can add it later. Libpq API is set in
stone, so I'd rather not introduce unnecessary stuff...

> + <para>
> +  Much like asynchronous query mode, there is no performance disadvantage to
> +  using batching and pipelining. It increases client application complexity
> +  and extra caution is required to prevent client/server deadlocks but
> +  can sometimes offer considerable performance improvements.
> + </para>

That's not necessarily true, is it? Unless you count always doing batches
of exactly size 1.

> + <para>
> +  Batching is most useful when the server is distant, i.e. network latency
> +  (<quote>ping time</quote>) is high, and when many small operations are
> +  being performed in rapid sequence. There is usually less benefit in using
> +  batches when each query takes many multiples of the client/server
> +  round-trip time to execute. A 100-statement operation run on a server
> +  300ms round-trip-time away would take 30 seconds in network latency alone
> +  without batching; with batching it may spend as little as 0.3s waiting for
> +  results from the server.
> + </para>

I'd add a remark that this is frequently beneficial even in cases of
minimal latency - as e.g. shown by the numbers I presented upthread.

> + <para>
> +  Use batches when your application does lots of small
> +  <literal>INSERT</literal>, <literal>UPDATE</literal> and
> +  <literal>DELETE</literal> operations that can't easily be transformed into
> +  operations on sets or into a
> +  <link linkend="libpq-copy"><literal>COPY</literal></link> operation.
> + </para>

Aren't SELECTs also a major beneficiary of this?

> + <para>
> +  Batching is less useful when information from one operation is required by
> +  the client before it knows enough to send the next operation.

s/less/not/

> + <note>
> +  <para>
> +   The batch API was introduced in PostgreSQL 10.0, but clients using
> +   PostgresSQL 10.0 version of libpq can use batches on server versions 8.4
> +   and newer. Batching works on any server that supports the v3 extended
> +   query protocol.
> +  </para>
> + </note>

Where's the 8.4 coming from?

> + <para>
> +  The client uses libpq's asynchronous query functions to dispatch work,
> +  marking the end of each batch with <function>PQbatchSyncQueue</function>.
> +  And to get results, it uses <function>PQgetResult</function> and
> +  <function>PQbatchProcessQueue</function>. It may eventually exit
> +  batch mode with <function>PQexitBatchMode</function> once all results are
> +  processed.
> + </para>
> +
> + <note>
> +  <para>
> +   It is best to use batch mode with <application>libpq</application> in
> +   <link linkend="libpq-pqsetnonblocking">non-blocking mode</link>. If used in
> +   blocking mode it is possible for a client/server deadlock to occur. The
> +   client will block trying to send queries to the server, but the server
> +   will block trying to send results from queries it has already processed to
> +   the client. This only occurs when the client sends enough queries to fill
> +   its output buffer and the server's receive buffer before switching to
> +   processing input from the server, but it's hard to predict exactly when
> +   that'll happen so it's best to always use non-blocking mode.
> +  </para>
> + </note>

Mention that nonblocking only actually helps if send/recv is done as
required, and can essentially require unbounded memory? We probably should
either document or implement some smarts about when to signal read/write
readiness. Otherwise we e.g. might be receiving tons of result data without
having sent the next query - or the other way round.
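To illustrate what I mean, roughly the shape of loop I'd expect us to
document (an untested sketch written against the patch's API as I read it;
dispatch_one_query() and handle_result() stand in for application code):

#include <sys/select.h>
#include <stdbool.h>
#include "libpq-fe.h"

extern void dispatch_one_query(PGconn *conn);   /* PQsendQueryParams() etc. */
extern void handle_result(PGresult *res);       /* application-specific */

static void
batch_pump(PGconn *conn, int queries_remaining, int results_expected)
{
    int     sock = PQsocket(conn);
    bool    have_pending_send = (PQflush(conn) == 1);

    while (queries_remaining > 0 || results_expected > 0)
    {
        fd_set  read_fds, write_fds;

        FD_ZERO(&read_fds);
        FD_ZERO(&write_fds);
        FD_SET(sock, &read_fds);
        /* only ask for write readiness while libpq has unsent data */
        if (have_pending_send)
            FD_SET(sock, &write_fds);

        if (select(sock + 1, &read_fds, &write_fds, NULL, NULL) < 0)
            return;

        if (FD_ISSET(sock, &read_fds))
        {
            if (!PQconsumeInput(conn))
                return;
            /* drain whatever is ready, without blocking */
            while (!PQisBusy(conn))
            {
                PGresult   *res = PQgetResult(conn);

                if (res != NULL)
                {
                    handle_result(res);
                    PQclear(res);
                }
                else if (PQbatchProcessQueue(conn))
                    results_expected--;     /* finished one queued query */
                else
                    break;                  /* nothing more to process yet */
            }
        }

        if (FD_ISSET(sock, &write_fds) && queries_remaining > 0)
        {
            dispatch_one_query(conn);
            queries_remaining--;
        }

        have_pending_send = (PQflush(conn) == 1);
    }
}

The point being that write readiness is only interesting while there
actually is unsent data, and that the application - not libpq - has to
bound how much work it queues up.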
> + <sect3 id="libpq-batch-sending">
> +  <title>Issuing queries</title>
> +
> +  <para>
> +   After entering batch mode the application dispatches requests
> +   using normal asynchronous <application>libpq</application> functions such as
> +   <function>PQsendQueryParams</function>, <function>PQsendPrepare</function>,
> +   <function>PQsendQueryPrepared</function>, <function>PQsendDescribePortal</function>,
> +   <function>PQsendDescribePrepared</function>.
> +   The asynchronous requests are followed by a <link
> +   linkend="libpq-PQbatchSyncQueue"><function>PQbatchSyncQueue(conn)</function></link> call to mark
> +   the end of the batch. The client <emphasis>does not</emphasis> need to call
> +   <function>PQgetResult</function> immediately after dispatching each
> +   operation. <link linkend="libpq-batch-results">Result processing</link>
> +   is handled separately.
> +  </para>
> +
> +  <para>
> +   Batched operations will be executed by the server in the order the client
> +   sends them. The server will send the results in the order the statements
> +   executed. The server may begin executing the batch before all commands
> +   in the batch are queued and the end of batch command is sent. If any
> +   statement encounters an error the server aborts the current transaction and
> +   skips processing the rest of the batch. Query processing resumes after the
> +   end of the failed batch.
> +  </para>

Maybe note that multiple batches can be "in flight"? I.e.
PQbatchSyncQueue() is about error handling, nothing else? Don't have a
great idea, but we might want to rename...
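I.e., as far as I understand the patch, something like this is supposed to
be legal, and an example along these lines in the docs would make the point
(untested sketch, made-up statements):

    PQenterBatchMode(conn);

    PQsendQueryParams(conn, "UPDATE a SET x = 1", 0, NULL, NULL, NULL, NULL, 0);
    PQbatchSyncQueue(conn);     /* ends batch #1 - just an error recovery point */

    PQsendQueryParams(conn, "UPDATE b SET y = 2", 0, NULL, NULL, NULL, NULL, 0);
    PQbatchSyncQueue(conn);     /* ends batch #2, still nothing read back */

    /* only now start consuming results for both batches, with
     * PQgetResult() / PQbatchProcessQueue() */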
> + <note>
> +  <para>
> +   The client must not assume that work is committed when it
> +   <emphasis>sends</emphasis> a <literal>COMMIT</literal>, only when the
> +   corresponding result is received to confirm the commit is complete.
> +   Because errors arrive asynchronously the application needs to be able to
> +   restart from the last <emphasis>received</emphasis> committed change and
> +   resend work done after that point if something goes wrong.
> +  </para>
> + </note>

This seems fairly independent of batching.

> + </sect3>
> +
> + <sect3 id="libpq-batch-interleave">
> +  <title>Interleaving result processing and query dispatch</title>
> +
> +  <para>
> +   To avoid deadlocks on large batches the client should be structured around
> +   a nonblocking I/O loop using a function like <function>select</function>,
> +   <function>poll</function>, <function>epoll</function>,
> +   <function>WaitForMultipleObjectEx</function>, etc.
> +  </para>
> +
> +  <para>
> +   The client application should generally maintain a queue of work still to
> +   be dispatched and a queue of work that has been dispatched but not yet had
> +   its results processed.

Hm. Why? If queries are just issued, no such queue is required?

>    When the socket is writable it should dispatch more
> +   work. When the socket is readable it should read results and process them,
> +   matching them up to the next entry in its expected results queue. Batches
> +   should be scoped to logical units of work, usually (but not always) one
> +   transaction per batch. There's no need to exit batch mode and re-enter it
> +   between batches or to wait for one batch to finish before sending the next.
> +  </para>

This really needs to take memory usage into account.

> + </variablelist>
> +
> + </listitem>
> + </varlistentry>
> +
> + <varlistentry id="libpq-PQenterBatchMode">
> +  <term>
> +   <function>PQenterBatchMode</function>
> +   <indexterm>
> +    <primary>PQenterBatchMode</primary>
> +   </indexterm>
> +  </term>
> +
> +  <listitem>
> +   <para>
> +    Causes a connection to enter batch mode if it is currently idle or
> +    already in batch mode.
> +
> +<synopsis>
> +int PQenterBatchMode(PGconn *conn);
> +</synopsis>
> +
> +   </para>
> +   <para>
> +    Returns 1 for success. Returns 0 and has no
> +    effect if the connection is not currently idle, i.e. it has a result
> +    ready, is waiting for more input from the server, etc. This function
> +    does not actually send anything to the server, it just changes the
> +    <application>libpq</application> connection state.
> +
> +   </para>
> +  </listitem>
> + </varlistentry>
> +
> + <varlistentry id="libpq-PQexitBatchMode">
> +  <term>
> +   <function>PQexitBatchMode</function>
> +   <indexterm>
> +    <primary>PQexitBatchMode</primary>
> +   </indexterm>
> +  </term>
> +
> +  <listitem>
> +   <para>
> +    Causes a connection to exit batch mode if it is currently in batch mode
> +    with an empty queue and no pending results.
> +<synopsis>
> +int PQexitBatchMode(PGconn *conn);
> +</synopsis>
> +   </para>
> +   <para>Returns 1 for success.
> +    Returns 1 and takes no action if not in batch mode. If the connection has

"returns 1"?

> + <varlistentry id="libpq-PQbatchSyncQueue">
> +  <term>
> +   <function>PQbatchSyncQueue</function>
> +   <indexterm>
> +    <primary>PQbatchSyncQueue</primary>
> +   </indexterm>
> +  </term>
> +
> +  <listitem>
> +   <para>
> +    Delimits the end of a set of a batched commands by sending a <link
> +    linkend="protocol-flow-ext-query">sync message</link> and flushing
> +    the send buffer. The end of a batch serves as
> +    the delimiter of an implicit transaction and
> +    an error recovery point; see <link linkend="libpq-batch-errors">
> +    error handling</link>.

I wonder why this isn't framed as PQbatchIssue/Send/...()? Syncing seems to
mostly make sense from a protocol POV.

> + <varlistentry id="libpq-PQbatchQueueCount">
> +  <term>
> +   <function>PQbatchQueueCount</function>
> +   <indexterm>
> +    <primary>PQbatchQueueCount</primary>
> +   </indexterm>
> +  </term>
> +
> +  <listitem>
> +   <para>
> +    Returns the number of queries still in the queue for this batch, not
> +    including any query that's currently having results being processed.
> +    This is the number of times <function>PQbatchProcessQueue</function> has to be
> +    called before the query queue is empty again.
> +
> +<synopsis>
> +int PQbatchQueueCount(PGconn *conn);
> +</synopsis>
> +
> +   </para>
> +  </listitem>
> + </varlistentry>

Given that apps are supposed to track this, I'm not sure why we have this?

> +/*
> + * PQrecyclePipelinedCommand
> + *	Push a command queue entry onto the freelist. It must be a dangling entry
> + *	with null next pointer and not referenced by any other entry's next pointer.
> + */
> +static void
> +PQrecyclePipelinedCommand(PGconn *conn, PGcommandQueueEntry * entry)
> +{
> +	if (entry == NULL)
> +		return;
> +	if (entry->next != NULL)
> +	{
> +		fprintf(stderr, libpq_gettext("tried to recycle non-dangling command queue entry"));
> +		abort();

Don't think we use abort() in libpq like that. There's some Assert()s tho.
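I.e. I'd expect roughly the following instead (sketch only, untested):

    static void
    PQrecyclePipelinedCommand(PGconn *conn, PGcommandQueueEntry *entry)
    {
        if (entry == NULL)
            return;
        /* must be a dangling entry, not still linked into the queue */
        Assert(entry->next == NULL);
        /* ... push onto the freelist as before ... */
    }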
>  static bool
>  PQsendQueryStart(PGconn *conn)
> @@ -1377,20 +1486,59 @@ PQsendQueryStart(PGconn *conn)
>  					  libpq_gettext("no connection to the server\n"));
>  		return false;
>  	}
> -	/* Can't send while already busy, either. */
> -	if (conn->asyncStatus != PGASYNC_IDLE)
> +	/* Can't send while already busy, either, unless enqueuing for later */
> +	if (conn->asyncStatus != PGASYNC_IDLE && conn->batch_status == PQBATCH_MODE_OFF)
>  	{
>  		printfPQExpBuffer(&conn->errorMessage,
>  				  libpq_gettext("another command is already in progress\n"));
>  		return false;
>  	}
>
> -	/* initialize async result-accumulation state */
> -	pqClearAsyncResult(conn);
> +	if (conn->batch_status != PQBATCH_MODE_OFF)
> +	{
> +		/*

Weirdly indented.

> +	if (conn->batch_status != PQBATCH_MODE_OFF)
> +	{
> +		pipeCmd = PQmakePipelinedCommand(conn);
> +
> +		if (pipeCmd == NULL)
> +			return 0;			/* error msg already set */
> +
> +		last_query = &pipeCmd->query;
> +		queryclass = &pipeCmd->queryclass;
> +	}
> +	else
> +	{
> +		last_query = &conn->last_query;
> +		queryclass = &conn->queryclass;
> +	}

The amount of complexity / branches we're adding to all of these is more
than a bit unsightly.

> +/*
> + * PQbatchQueueCount
> + *	Return number of queries currently pending in batch mode
> + */
> +int
> +PQbatchQueueCount(PGconn *conn)
> +{
> +	int			count = 0;
> +	PGcommandQueueEntry *entry;
> +
> +	if (PQbatchStatus(conn) == PQBATCH_MODE_OFF)
> +		return 0;
> +
> +	entry = conn->cmd_queue_head;
> +	while (entry != NULL)
> +	{
> +		++count;
> +		entry = entry->next;
> +	}
> +	return count;
> +}

Ugh, O(N)? In that case I'd rather just remove this.
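If we do keep it, the queue length should just be maintained when entries
are added/removed rather than walking the list - something like (sketch;
the counter field is made up):

    /* in struct pg_conn: an int cmd_queue_len, incremented when a command
     * is appended to the cmd_queue_head list and decremented when one is
     * dequeued */
    int
    PQbatchQueueCount(PGconn *conn)
    {
        if (!conn || PQbatchStatus(conn) == PQBATCH_MODE_OFF)
            return 0;
        return conn->cmd_queue_len;
    }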
> +/*
> + * PQbatchBegin

Mismatched w/ actual function name.

> + *	Put an idle connection in batch mode. Commands submitted after this
> + *	can be pipelined on the connection, there's no requirement to wait for
> + *	one to finish before the next is dispatched.
> + *
> + *	Queuing of new query or syncing during COPY is not allowed.

+"a"?

> + *	A set of commands is terminated by a PQbatchQueueSync. Multiple sets of batched
> + *	commands may be sent while in batch mode. Batch mode can be exited by
> + *	calling PQbatchEnd() once all results are processed.
> + *
> + *	This doesn't actually send anything on the wire, it just puts libpq
> + *	into a state where it can pipeline work.
> + */
> +int
> +PQenterBatchMode(PGconn *conn)
> +{
> +	if (!conn)
> +		return false;

true/false isn't quite in line with int return code.

> +/*
> + * PQbatchEnd

wrong name.

> + *	End batch mode and return to normal command mode.
> + *
> + *	Has no effect unless the client has processed all results
> + *	from all outstanding batches and the connection is idle.

That seems wrong - will lead to hard to diagnose errors.

> + *	Returns true if batch mode ended.
> + */
> +int
> +PQexitBatchMode(PGconn *conn)
> +{
> +	if (!conn)
> +		return false;
> +
> +	if (conn->batch_status == PQBATCH_MODE_OFF)
> +		return true;
> +
> +	switch (conn->asyncStatus)
> +	{
> +		case PGASYNC_IDLE:
> +			printfPQExpBuffer(&conn->errorMessage,
> +							  libpq_gettext_noop("internal error, IDLE in batch mode"));
> +			break;
> +		case PGASYNC_COPY_IN:
> +		case PGASYNC_COPY_OUT:
> +		case PGASYNC_COPY_BOTH:
> +			printfPQExpBuffer(&conn->errorMessage,
> +							  libpq_gettext_noop("internal error, COPY in batch mode"));
> +			break;

So we'll still check the queue in this case, that's a bit weird?

> +/*
> + * PQbatchQueueSync

Out of sync.

> + *	End a batch submission by sending a protocol sync. The connection will
> + *	remain in batch mode and unavailable for new non-batch commands until all
> + *	results from the batch are processed by the client.

"unavailable for new non-batch commands" - that's hard to follow, and seems
pretty redundant with PQendBatchMode (or however it's called).

> + *	It's legal to start submitting another batch immediately, without waiting
> + *	for the results of the current batch. There's no need to end batch mode
> + *	and start it again.
> + *
> + *	If a command in a batch fails, every subsequent command up to and including
> + *	the PQbatchQueueSync command result gets set to PGRES_BATCH_ABORTED state. If the
> + *	whole batch is processed without error, a PGresult with PGRES_BATCH_END is
> + *	produced.

Hm, should probably mention that that's only true for commands since the
last PQbatchQueueSync?

> +/*
> + * PQbatchQueueProcess

Out of sync.

> + *	Returns false if the current query isn't done yet, the connection
> + *	is not in a batch, or there are no more queries to process.

Last complaint about this - think this forgiving mode is a mistake.

> + */
> +int
> +PQbatchProcessQueue(PGconn *conn)
> +{
> +	/* This command's results will come in immediately.
> +	 * Initialize async result-accumulation state */
> +	pqClearAsyncResult(conn);

I'm not following?

>  /*
>   * PQgetResult
> @@ -1749,10 +2228,32 @@ PQgetResult(PGconn *conn)
> +			if (conn->batch_status != PQBATCH_MODE_OFF)
> +			{
> +				/*
> +				 * batched queries aren't followed by a Sync to put us back in
> +				 * PGASYNC_IDLE state, and when we do get a sync we could
> +				 * still have another batch coming after this one.

This needs rephrasing.

> +				 * The connection isn't idle since we can't submit new
> +				 * nonbatched commands. It isn't also busy since the current
> +				 * command is done and we need to process a new one.
> +				 */
> +				conn->asyncStatus = PGASYNC_QUEUED;

Not sure I like the name.

> +	if (conn->asyncStatus == PGASYNC_QUEUED || conn->batch_status != PQBATCH_MODE_OFF)
> +	{
> +		printfPQExpBuffer(&conn->errorMessage,
> +						  libpq_gettext("Synchronous command execution functions are not allowed in batch mode\n"));
> +		return false;
> +	}

Why do we need the PGASYNC_QUEUED test here?

> +/* pqBatchFlush
> + * In batch mode, data will be flushed only when the out buffer reaches the threshold value.
> + * In non-batch mode, data will be flushed all the time.
> + */
> +static int
> +pqBatchFlush(PGconn *conn)
> +{
> +	if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount>=65536))
> +		return(pqFlush(conn));
> +	return 0;					/* Just to keep compiler quiet */
> +}

This should be defined in a macro or such, rather than hardcoded.
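I.e. something like (sketch; name and value obviously up for discussion):

    /* flush once this much unsent data has accumulated in batch mode */
    #define OUTBUFFER_THRESHOLD    65536

    static int
    pqBatchFlush(PGconn *conn)
    {
        if (conn->batch_status == PQBATCH_MODE_OFF ||
            conn->outCount >= OUTBUFFER_THRESHOLD)
            return pqFlush(conn);
        return 0;
    }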
Falling over now. This seems like enough feedback for a bit of work anyway.

Regards,

Andres